1
Li R, Hu M, Jiang X, Xu C. Metagenomic insights into the microbiota involved in lactate and butyrate production and manipulating their synthesis in alfalfa silage. J Appl Microbiol 2023; 134:lxad197. [PMID: 37660237 DOI: 10.1093/jambio/lxad197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 08/04/2023] [Accepted: 08/30/2023] [Indexed: 09/04/2023]
Abstract
AIMS: Lactate and butyrate are important indicators of silage quality. However, the microorganisms and mechanisms responsible for lactate and butyrate production in silage are not well documented. METHODS AND RESULTS: Whole-metagenomic sequencing was used to analyse metabolic pathways, microbiota composition, functional genes, and their contributions to lactate and butyrate production in alfalfa silage with (SA) and without (CK) sucrose addition. Carbon metabolism was the most abundant metabolic pathway. We identified 11 and 2 functional genes associated with lactate and butyrate metabolism, respectively. Among them, D-lactate dehydrogenase (ldhA) and L-lactate dehydrogenase (ldhB) were most important for the transition between D/L-lactate and pyruvate and were primarily related to Lactobacillus in the SA group. The genes encoding L-lactate dehydrogenase (lldD), which decomposes lactate, were the most abundant and primarily associated with Enterobacter cloacae. Butyrate-related genes, mainly encoding butyryl-CoA:acetate CoA-transferase (but), were predominantly associated with Klebsiella oxytoca and Escherichia coli in the CK group. CONCLUSIONS: Enterobacteriaceae and Lactobacillaceae were mainly responsible for butyrate and lactate formation, respectively.
Affiliation(s)
- Rongrong Li
  - College of Engineering, China Agricultural University, Beijing 100083, China
  - College of Environment and Life Sciences, Weinan Normal University, Weinan 714099, China
- Ming Hu
  - College of Environment and Life Sciences, Weinan Normal University, Weinan 714099, China
- Xin Jiang
  - College of Engineering, China Agricultural University, Beijing 100083, China
- Chuncheng Xu
  - College of Engineering, China Agricultural University, Beijing 100083, China
2
Rong Lee M, Kim JC, Eun Park S, Kim WJ, Su Kim J. Detection of Viral Genes in Metarhizium anisopliae JEF-290-infected longhorned tick, Haemaphysalis longicornis using transcriptome analysis. J Invertebr Pathol 2023; 198:107926. [PMID: 37087092 DOI: 10.1016/j.jip.2023.107926] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/15/2022] [Revised: 04/13/2023] [Accepted: 04/16/2023] [Indexed: 04/24/2023]
Abstract
Ticks are carriers of viruses that can cause disease in humans and animals. The longhorned tick (Haemaphysalis longicornis; LHT), for example, transmits the severe fever with thrombocytopenia syndrome virus (SFTSV) to humans, and the population of ticks is growing due to increases in temperature caused by climate change. As ticks carry primarily RNA viruses, there is a need to study the possibility of detecting new viruses through tick virome analysis. In this study, viruses in LHTs collected in Korea were investigated and virus titers in ticks exposed to the entomopathogenic fungus Metarhizium anisopliae JEF-290 were analyzed. Total RNA was extracted from the collected ticks, and short reads were obtained from Illumina sequencing. A total of 50,024 contigs with coding capacity were obtained after de novo assembly of the reads in the metaSPAdes genome assembler. A series of BLAST-based analyses using the GenBank database was performed to screen viral contigs, and three putative virus species were identified from the tick meta-transcriptome: Alongshan virus (ALSV), Denso virus and Taggert virus. Measurements of virus expression levels in infected and non-infected LHTs detected no substantial differences. However, we suggest that LHT can spread not only SFTSV, but also various other disease-causing viruses over large areas of the world. From the phylogenetic analysis of ALSV glycoproteins, genetic differences in the ALSV could be due to host differences as well as regional differences. Viral metagenome analysis can be used as a tool to manage future outbreaks of disease caused by ticks by detecting unknown viruses.
Affiliation(s)
- Mi Rong Lee
  - Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54596, Korea
- Jong-Cheol Kim
  - Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54596, Korea
- So Eun Park
  - Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54596, Korea
- Jae Su Kim
  - Department of Agricultural Biology, College of Agriculture & Life Sciences, Jeonbuk National University, Jeonju 54596, Korea
  - Department of Agricultural Convergence Technology, Jeonbuk National University, Jeonju 54596, Republic of Korea
3
Li J, Liang Y, Miao Y, Wang D, Jia S, Liu CH. Metagenomic insights into aniline effects on microbial community and biological sulfate reduction pathways during anaerobic treatment of high-sulfate wastewater. The Science of the Total Environment 2020; 742:140537. [PMID: 32623173 DOI: 10.1016/j.scitotenv.2020.140537] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/23/2020] [Revised: 06/10/2020] [Accepted: 06/24/2020] [Indexed: 06/11/2023]
Abstract
To gain comprehensive insight into how the sulfate reduction pathway, the microbial community, and sulfate reduction performance respond to toxic stress, we built a laboratory-scale expanded granular sludge bed (EGSB) reactor treating high-sulfate wastewater with aniline concentrations increasing from 0 to 480 mg/L. High-throughput sequencing and metagenomic approaches were applied to decipher the molecular mechanisms of sulfate reduction under aniline stress through taxonomic and functional profiles. Increasing aniline in the anaerobic system induced the accumulation of volatile fatty acids (VFA), which acidified the bioreactor; this acidification was the principal reason for the deterioration of system performance and finally resulted in the accumulation of toxic free sulfide. Moreover, aniline triggered changes in the bacterial community and in genes related to sulfate reduction pathways. The increase of aniline from 0 to 320 mg/L enriched total sulfate-reducing bacteria (SRB), and the most abundant genus was Desulfomicrobium, accounting for 66.85-91.25% of total SRB. The assimilatory sulfate reduction pathway was clearly inhibited when aniline exceeded 160 mg/L, while genes associated with dissimilatory sulfate reduction pathways all showed an upward trend with increasing aniline content. The enrichment of aniline-resistant SRB (e.g. Desulfomicrobium) carrying genes associated with the dissimilatory sulfate reduction pathway also supported the underlying mechanism that sulfate reduction shifted toward dissimilation under high-aniline conditions. Taken together, these results provide solid evidence for the effects of aniline on biological sulfate reduction during treatment of high-sulfate wastewater and for the underlying molecular mechanisms, which may highlight the important roles of SRB and related sulfate reduction genes during treatment.
Affiliation(s)
- Jun Li
  - State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Ying Liang
  - State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Yu Miao
  - Department of Civil and Environmental Engineering, University of California, Los Angeles, CA 90095, United States
- Depeng Wang
  - State Key Laboratory of Pollution Control and Resources Reuse, School of the Environment, Nanjing University, Nanjing 210023, China
- Shuyu Jia
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
- Chang-Hong Liu
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210023, China
4
Meaker GA, Hair EJ, Gorochowski TE. Advances in engineering CRISPR-Cas9 as a molecular Swiss Army knife. Synth Biol (Oxf) 2020; 5:ysaa021. [PMID: 33344779 PMCID: PMC7737000 DOI: 10.1093/synbio/ysaa021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/15/2020] [Revised: 09/29/2020] [Accepted: 10/01/2020] [Indexed: 02/06/2023] Open
Abstract
The RNA-guided endonuclease system CRISPR-Cas9 has been extensively modified since its discovery, allowing its capabilities to extend far beyond double-stranded cleavage to high fidelity insertions, deletions and single base edits. Such innovations have been possible due to the modular architecture of CRISPR-Cas9 and the robustness of its component parts to modifications and the fusion of new functional elements. Here, we review the broad toolkit of CRISPR-Cas9-based systems now available for diverse genome-editing tasks. We provide an overview of their core molecular structure and mechanism and distil the design principles used to engineer their diverse functionalities. We end by looking beyond the biochemistry and toward the societal and ethical challenges that these CRISPR-Cas9 systems face if their transformative capabilities are to be deployed in a safe and acceptable manner.
Affiliation(s)
- Grace A Meaker
  - School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, UK
  - School of Biosciences, Cardiff University, Cardiff CF10 3AT, UK
- Emma J Hair
  - School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, UK
- Thomas E Gorochowski
  - School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, UK
  - BrisSynBio, University of Bristol, Bristol BS8 1TQ, UK
5
Padovani de Souza K, Setubal JC, Ponce de Leon F de Carvalho AC, Oliveira G, Chateau A, Alves R. Machine learning meets genome assembly. Brief Bioinform 2020; 20:2116-2129. [PMID: 30137230 DOI: 10.1093/bib/bby072] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/18/2018] [Revised: 07/11/2018] [Accepted: 07/22/2018] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION: With the recent advances in DNA sequencing technologies, the study of the genetic composition of living organisms has become more accessible for researchers. Several advances have been achieved because of it, especially in the health sciences. However, many challenges that emerge from the complexity of sequencing projects remain unsolved. Among them is the task of assembling DNA fragments from previously unsequenced organisms, which is classified as an NP-hard (nondeterministic polynomial time hard) problem, for which no efficient computational solution with reasonable execution time exists. Nevertheless, several tools that produce approximate solutions have been used with results that have facilitated scientific discoveries, although there is ample room for improvement. As with other NP-hard problems, machine learning algorithms have been one of the approaches used in recent years in an attempt to find better solutions to the DNA fragment assembly problem, although still at a low scale. RESULTS: This paper presents a broad review of pioneering literature comprising artificial intelligence-based DNA assemblers, particularly the ones that use machine learning, to provide an overview of state-of-the-art approaches and to serve as a starting point for further study in this field.
Affiliation(s)
- João Carlos Setubal
  - University of São Paulo, Brazil
  - Department of Computer Science, University of São Paulo, Brazil
- Annie Chateau
  - Vale Technology Institute-Sustainable Development, Brazil
- Ronnie Alves
  - Federal University of Pará, Brazil
  - University of Montpellier, LIRMM, France
6
Wang Z, Wang Y, Fuhrman JA, Sun F, Zhu S. Assessment of metagenomic assemblers based on hybrid reads of real and simulated metagenomic sequences. Brief Bioinform 2020; 21:777-790. [PMID: 30860572 DOI: 10.1093/bib/bbz025] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 11/14/2018] [Revised: 01/25/2019] [Indexed: 12/19/2022] Open
Abstract
In metagenomic studies of microbial communities, the short reads come from mixtures of genomes. Read assembly is usually an essential first step for the follow-up studies in metagenomic research. Understanding the power and limitations of various read assembly programs in practice is important for researchers to choose which programs to use in their investigations. Many studies evaluating different assembly programs used either simulated metagenomes or real metagenomes with unknown genome compositions. However, the simulated datasets may not reflect the real complexities of metagenomic samples and the estimated assembly accuracy could be misleading due to the unknown genomes in real metagenomes. Therefore, hybrid strategies are required to evaluate the various read assemblers for metagenomic studies. In this paper, we benchmark the metagenomic read assemblers by mixing reads from real metagenomic datasets with reads from known genomes and evaluating the integrity, contiguity and accuracy of the assembly using the reads from the known genomes. We selected four advanced metagenome assemblers, MEGAHIT, MetaSPAdes, IDBA-UD and Faucet, for evaluation. We showed the strengths and weaknesses of these assemblers in terms of integrity, contiguity and accuracy for different variables, including the genetic difference of the real genomes with the genome sequences in the real metagenomic datasets and the sequencing depth of the simulated datasets. Overall, MetaSPAdes performs best in terms of integrity and continuity at the species-level, followed by MEGAHIT. Faucet performs best in terms of accuracy at the cost of worst integrity and continuity, especially at low sequencing depth. MEGAHIT has the highest genome fractions at the strain-level and MetaSPAdes has the overall best performance at the strain-level. MEGAHIT is the most efficient in our experiments. Availability: The source code is available at https://github.com/ziyewang/MetaAssemblyEval.
Affiliation(s)
- Ziye Wang
  - School of Mathematical Sciences and the Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
- Ying Wang
  - Department of Automation, Xiamen University, Xiamen, China
- Jed A Fuhrman
  - Department of Biological Sciences and Wrigley Institute for Environmental Studies, University of Southern California, Los Angeles, California, United States of America
- Fengzhu Sun
  - Department of Biological Sciences, University of Southern California, Los Angeles, California, United States of America
- Shanfeng Zhu
  - Shanghai Key Lab of Intelligent Information Processing, the School of Computer Science and the Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China
7
Olson ND, Treangen TJ, Hill CM, Cepeda-Espinoza V, Ghurye J, Koren S, Pop M. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes. Brief Bioinform 2020; 20:1140-1150. [PMID: 28968737 DOI: 10.1093/bib/bbx098] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 05/02/2017] [Revised: 07/13/2017] [Indexed: 01/09/2023] Open
Abstract
Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential for impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.
8
Chen LX, Anantharaman K, Shaiber A, Eren AM, Banfield JF. Accurate and complete genomes from metagenomes. Genome Res 2020; 30:315-333. [PMID: 32188701 PMCID: PMC7111523 DOI: 10.1101/gr.258640.119] [Citation(s) in RCA: 185] [Impact Index Per Article: 46.3] [Indexed: 12/11/2022]
Abstract
Genomes are an integral component of the biological information about an organism; thus, the more complete the genome, the more informative it is. Historically, bacterial and archaeal genomes were reconstructed from pure (monoclonal) cultures, and the first reported sequences were manually curated to completion. However, the bottleneck imposed by the requirement for isolates precluded genomic insights for the vast majority of microbial life. Shotgun sequencing of microbial communities, referred to initially as community genomics and subsequently as genome-resolved metagenomics, can circumvent this limitation by obtaining metagenome-assembled genomes (MAGs); but gaps, local assembly errors, chimeras, and contamination by fragments from other genomes limit the value of these genomes. Here, we discuss genome curation to improve and, in some cases, achieve complete (circularized, no gaps) MAGs (CMAGs). To date, few CMAGs have been generated, although notably some are from very complex systems such as soil and sediment. Through analysis of about 7000 published complete bacterial isolate genomes, we verify the value of cumulative GC skew in combination with other metrics to establish bacterial genome sequence accuracy. The analysis of cumulative GC skew identified potential misassemblies in some reference genomes of isolated bacteria and the repeat sequences that likely gave rise to them. We discuss methods that could be implemented in bioinformatic approaches for curation to ensure that metabolic and evolutionary analyses can be based on very high-quality genomes.
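The cumulative GC skew metric mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' exact procedure, so the function name `cumulative_gc_skew`, the non-overlapping windowing, and the default window size below are assumptions made purely for illustration. The sketch uses the common definition of GC skew per window, (G - C) / (G + C), and accumulates it along the sequence.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Return the running sum of per-window GC skew, (G - C) / (G + C).

    Hypothetical helper: non-overlapping windows and the default window
    size are illustrative assumptions, not taken from the cited paper.
    """
    seq = seq.upper()
    running_total = 0.0
    cumulative = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C contribute zero skew.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        running_total += skew
        cumulative.append(running_total)
    return cumulative
```

For a correctly assembled circular bacterial genome, a plot of this curve is generally expected to show one clear minimum and one clear maximum (near the replication origin and terminus); abrupt departures from that shape are the kind of signal the abstract associates with potential misassemblies.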
Affiliation(s)
- Lin-Xing Chen
  - Department of Earth and Planetary Sciences, University of California, Berkeley, California 94720, USA
- Karthik Anantharaman
  - Department of Earth and Planetary Sciences, University of California, Berkeley, California 94720, USA
- Alon Shaiber
  - Graduate Program in Biophysical Sciences, University of Chicago, Chicago, Illinois 60637, USA
  - Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
- A Murat Eren
  - Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
  - Bay Paul Center, Marine Biological Laboratory, Woods Hole, Massachusetts 02543, USA
- Jillian F Banfield
  - Department of Earth and Planetary Sciences, University of California, Berkeley, California 94720, USA
  - Department of Environmental Science, Policy, and Management, University of California, Berkeley, California 94720, USA
  - Earth and Environmental Sciences, Lawrence Berkeley National Laboratory, University of California, Berkeley, California 94720, USA
9
Garrido-Sanz L, Senar MÀ, Piñol J. Estimation of the relative abundance of species in artificial mixtures of insects using low-coverage shotgun metagenomics. Metabarcoding and Metagenomics 2020. [DOI: 10.3897/mbmg.4.48281] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022] Open
Abstract
Amplicon metabarcoding is an established technique to analyse the taxonomic composition of communities of organisms using high-throughput DNA sequencing, but there are doubts about its ability to quantify the relative proportions of the species, as opposed to the species list. Here, we bypass the enrichment step and avoid the PCR bias by directly sequencing the extracted DNA using shotgun metagenomics. This approach is common practice in prokaryotes, but not in eukaryotes, because of the low number of sequenced genomes of eukaryotic species. We tested the metagenomics approach using insect species whose genomes are already sequenced and assembled to an advanced degree. We shotgun-sequenced, at low coverage, 18 species of insects in 22 single-species and 6 mixed-species libraries and mapped the reads against 110 reference genomes of insects. We used the single-species libraries to calibrate the assignment of reads to species and the libraries created from species mixtures to evaluate the ability of the method to quantify the relative species abundance. Our results showed that the shotgun metagenomic method is easily able to set apart closely related insect species, like four species of Drosophila included in the artificial libraries. However, to avoid counting rare misclassified reads in samples, it was necessary to use a rather stringent detection limit of 0.001, so species with a lower relative abundance are ignored. We also found that approximately half the raw reads were informative for taxonomic purposes. Finally, using the mixed-species libraries, we showed that it was feasible to quantify with confidence the relative abundance of individual species in the mixtures.
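The detection limit described in this abstract can be sketched as a simple filter on per-species read counts. This is an illustrative sketch only: the function name `relative_abundance`, the input format (a mapping from species name to assigned read count), and the choice to renormalise the surviving species so that they sum to 1 are assumptions, since the abstract states the 0.001 limit but not how the remaining abundances were computed.

```python
def relative_abundance(
    read_counts: dict[str, int], detection_limit: float = 0.001
) -> dict[str, float]:
    """Convert assigned read counts into relative abundances.

    Species whose share of assigned reads is below `detection_limit`
    are ignored. Renormalising the remaining species is an assumption
    made for this sketch, not a detail taken from the cited study.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    kept = {
        species: count
        for species, count in read_counts.items()
        if count / total >= detection_limit
    }
    kept_total = sum(kept.values())
    return {species: count / kept_total for species, count in kept.items()}
```

With this rule, a species holding 5 of 10,000 assigned reads (0.0005) would be dropped as a probable misclassification, while one holding 10 reads (0.001) would be retained.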
10
Chan AWY, Naphtali J, Schellhorn HE. High-throughput DNA sequencing technologies for water and wastewater analysis. Sci Prog 2019; 102:351-376. [PMID: 31818206 PMCID: PMC10424514 DOI: 10.1177/0036850419881855] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
Conventional microbiological water monitoring uses culture-dependent techniques to screen indicator microbial species such as Escherichia coli and fecal coliforms. With high-throughput, second-generation sequencing technologies becoming less expensive, water quality monitoring programs can now leverage the massively parallel nature of second-generation sequencing technologies for batch sample processing to simultaneously obtain compositional and functional information of culturable and as yet uncultured microbial organisms. This review provides an introduction to the technical capabilities and considerations necessary for the use of second-generation sequencing technologies, specifically 16S rDNA amplicon and whole-metagenome sequencing, to investigate the composition and functional potential of microbiomes found in water and wastewater systems.
Affiliation(s)
- James Naphtali
  - Department of Biology, McMaster University, Hamilton, ON, Canada
11
Laudadio I, Fulci V, Stronati L, Carissimi C. Next-Generation Metagenomics: Methodological Challenges and Opportunities. OMICS: A Journal of Integrative Biology 2019; 23:327-333. [PMID: 31188063 DOI: 10.1089/omi.2019.0073] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 12/11/2022]
Abstract
Metagenomics is not only one of the newest omics system science technologies but also one that has arguably the broadest set of applications and impacts globally. Metagenomics has found vast utility not only in environmental sciences, ecology, and public health but also in clinical medicine and, looking into the future, in planetary health. In line with the One Health concept, metagenomics solicits collaboration between molecular biologists, geneticists, microbiologists, clinicians, computational biologists, plant biologists, veterinarians, and other health care professionals. Almost every ecological niche of our planet hosts an extremely diverse community of organisms that are still poorly characterized. Detailed characterization of the features of such communities is instrumental to our comprehension of ecological, biological, and clinical complexity. This expert review article evaluates how metagenomics is improving our knowledge of microbiota composition from environmental to human samples. Furthermore, we offer an analysis of the common technical and methodological challenges and potential pitfalls arising from metagenomics approaches, such as metagenomics study design, data processing, and interpretation. All in all, at this critical juncture of further growth of the metagenomics field, it is time to critically reflect on the lessons learned and the future prospects of next-generation metagenomics science, technology, and conceivable applications, particularly from a methodological perspective.
Affiliation(s)
- Ilaria Laudadio
  - Department of Molecular Medicine, "Sapienza" University of Rome, Rome, Italy
- Valerio Fulci
  - Department of Molecular Medicine, "Sapienza" University of Rome, Rome, Italy
- Laura Stronati
  - Department of Molecular Medicine, "Sapienza" University of Rome, Rome, Italy
- Claudia Carissimi
  - Department of Molecular Medicine, "Sapienza" University of Rome, Rome, Italy
12
Sutton TDS, Clooney AG, Ryan FJ, Ross RP, Hill C. Choice of assembly software has a critical impact on virome characterisation. Microbiome 2019; 7:12. [PMID: 30691529 PMCID: PMC6350398 DOI: 10.1186/s40168-019-0626-5] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Received: 10/16/2018] [Accepted: 01/14/2019] [Indexed: 05/19/2023]
Abstract
BACKGROUND The viral component of microbial communities plays a vital role in driving bacterial diversity, facilitating nutrient turnover and shaping community composition. Despite their importance, the vast majority of viral sequences are poorly annotated and share little or no homology to reference databases. As a result, investigation of the viral metagenome (virome) relies heavily on de novo assembly of short sequencing reads to recover compositional and functional information. Metagenomic assembly is particularly challenging for virome data, often resulting in fragmented assemblies and poor recovery of viral community members. Despite the essential role of assembly in virome analysis and difficulties posed by these data, current assembly comparisons have been limited to subsections of virome studies or bacterial datasets. DESIGN This study presents the most comprehensive virome assembly comparison to date, featuring 16 metagenomic assembly approaches which have featured in human virome studies. Assemblers were assessed using four independent virome datasets, namely, simulated reads, two mock communities, viromes spiked with a known phage and human gut viromes. RESULTS Assembly performance varied significantly across all test datasets, with SPAdes (meta) performing consistently well. Performance of MIRA and VICUNA varied, highlighting the importance of using a range of datasets when comparing assembly programs. It was also found that while some assemblers addressed the challenges of virome data better than others, all assemblers had limitations. Low read coverage and genomic repeats resulted in assemblies with poor genome recovery, high degrees of fragmentation and low-accuracy contigs across all assemblers. These limitations must be considered when setting thresholds for downstream analysis and when drawing conclusions from virome data.
Affiliation(s)
- Thomas D S Sutton
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
- Adam G Clooney
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
- Feargal J Ryan
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
  - Present Address: South Australian Health and Medical Research Institute, Adelaide, Australia
- R Paul Ross
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
  - Teagasc Food Research Centre, Fermoy, Cork, Ireland
- Colin Hill
  - APC Microbiome Ireland, Cork, Ireland
  - School for Microbiology, University College Cork, Cork, Ireland
13
Batut B, Gravouil K, Defois C, Hiltemann S, Brugère JF, Peyretaillade E, Peyret P. ASaiM: a Galaxy-based framework to analyze microbiota data. Gigascience 2018; 7:5001424. [PMID: 29790941 PMCID: PMC6007547 DOI: 10.1093/gigascience/giy057] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 09/08/2017] [Accepted: 05/10/2018] [Indexed: 12/24/2022] Open
Abstract
Background New generations of sequencing platforms coupled to numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions out of microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings We therefore developed ASaiM, an Open-Source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore, and visualize microbiota information from raw metataxonomic, metagenomic, or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets but also can be used on a normal PC. The associated source code is available under Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io). Conclusions Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation, and training to scientists working on complex microorganism communities. It makes analysis and exploration analyses of microbiota data easy, quick, transparent, reproducible, and shareable.
Affiliation(s)
- Bérénice Batut
- Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
- Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
- Kévin Gravouil
- Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
- Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France
- Université Clermont Auvergne, CNRS, LMGE, 63000 Clermont-Ferrand, France
- Université Clermont Auvergne, CNRS, LIMOS, 63000 Clermont-Ferrand, France
- Clémence Defois
- Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
- Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France
- Saskia Hiltemann
- Department of Bioinformatics, Erasmus University Medical Center, Rotterdam, 3015 CE, Netherlands
- Jean-François Brugère
- Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
- Eric Peyretaillade
- Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
- Université Clermont Auvergne, CNRS, LMGE, 63000 Clermont-Ferrand, France
- Pierre Peyret
- Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
- Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France
14
Roux S, Emerson JB, Eloe-Fadrosh EA, Sullivan MB. Benchmarking viromics: an in silico evaluation of metagenome-enabled estimates of viral community composition and diversity. PeerJ 2017; 5:e3817. [PMID: 28948103 PMCID: PMC5610896 DOI: 10.7717/peerj.3817] [Citation(s) in RCA: 157] [Impact Index Per Article: 22.4] [Received: 06/24/2017] [Accepted: 08/26/2017] [Indexed: 12/20/2022]
Abstract
Background Viral metagenomics (viromics) is increasingly used to obtain uncultivated viral genomes, evaluate community diversity, and assess ecological hypotheses. While viromic experimental methods are relatively mature and widely accepted by the research community, robust bioinformatics standards remain to be established. Here we used in silico mock viral communities to evaluate the viromic sequence-to-ecological-inference pipeline, including (i) read pre-processing and metagenome assembly, (ii) thresholds applied to estimate viral relative abundances based on read mapping to assembled contigs, and (iii) normalization methods applied to the matrix of viral relative abundances for alpha and beta diversity estimates. Results Tools designed for metagenomes, specifically metaSPAdes, MEGAHIT, and IDBA-UD, were the most effective at assembling viromes. Read pre-processing, such as partitioning, had virtually no impact on assembly output, but may be useful when hardware is limited. Viral populations with 2–5× coverage typically assembled well, whereas lower coverage led to fragmented assembly. Strain heterogeneity within populations hampered assembly, especially when strains were closely related (average nucleotide identity, or ANI, ≥97%) and when the most abundant strain represented <50% of the population. Viral community composition assessments based on read recruitment were generally accurate when the following thresholds for detection were applied: (i) ≥10 kb contig lengths to define populations, (ii) coverage defined from reads mapping at ≥90% identity, and (iii) ≥75% of contig length with ≥1× coverage. Finally, although data are limited to the most abundant viruses in a community, alpha and beta diversity patterns were robustly estimated (±10%) when comparing samples of similar sequencing depth, but more divergent (up to 80%) when sequencing depth was uneven across the dataset. In the latter cases, the use of normalization methods specifically developed for metagenomes provided the best estimates. Conclusions These simulations provide benchmarks for selecting analysis cut-offs and establish that an optimized sample-to-ecological-inference viromics pipeline is robust for making ecological inferences from natural viral communities. Continued development to better access RNA, rare, and/or diverse viral populations and improved reference viral genome availability will alleviate many of viromics' remaining limitations.
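The three detection thresholds quoted in this abstract can be read as a simple filter. The following Python sketch is an illustration only and is not taken from the paper: the function names, the `(start, end, identity)` alignment tuples, and the half-open coordinate convention are assumptions made for the example.

```python
# Hypothetical sketch of the detection thresholds named in the abstract.
# Alignment format (start, end, identity) with 0-based half-open coordinates
# is an assumption for illustration, not the authors' pipeline.
MIN_CONTIG_LENGTH_BP = 10_000   # (i) contigs of at least 10 kb
MIN_READ_IDENTITY = 0.90        # (ii) reads mapping at >= 90% identity
MIN_COVERED_FRACTION = 0.75     # (iii) >= 75% of contig length at >= 1x


def covered_fraction(contig_length, alignments, min_identity=MIN_READ_IDENTITY):
    """Fraction of contig positions covered by at least one qualifying read."""
    intervals = sorted(
        (start, end) for start, end, identity in alignments
        if identity >= min_identity
    )
    covered = 0
    furthest_end = 0  # right edge of the region already counted
    for start, end in intervals:
        start = max(start, furthest_end)
        end = min(end, contig_length)
        if end > start:
            covered += end - start
            furthest_end = end
    return covered / contig_length


def is_detected(contig_length, alignments):
    """Apply the length, identity, and breadth-of-coverage cut-offs together."""
    if contig_length < MIN_CONTIG_LENGTH_BP:
        return False
    return covered_fraction(contig_length, alignments) >= MIN_COVERED_FRACTION
```

For example, a 12 kb contig whose qualifying reads span positions 0 to 9,500 has about 79% breadth of coverage and would count as detected under these cut-offs, while the same reads on a contig shorter than 10 kb would not.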
Affiliation(s)
- Simon Roux
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Joanne B Emerson
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Emiley A Eloe-Fadrosh
- Joint Genome Institute, Department of Energy, Walnut Creek, CA, United States of America
- Matthew B Sullivan
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, United States of America
15
van der Walt AJ, van Goethem MW, Ramond JB, Makhalanyane TP, Reva O, Cowan DA. Assembling metagenomes, one community at a time. BMC Genomics 2017; 18:521. [PMID: 28693474 PMCID: PMC5502489 DOI: 10.1186/s12864-017-3918-9] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 04/04/2017] [Accepted: 07/02/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Metagenomics allows unprecedented access to uncultured environmental microorganisms. The analysis of metagenomic sequences facilitates gene prediction and annotation, and enables the assembly of draft genomes, including uncultured members of a community. However, while several platforms have been developed for this critical step, there is currently no clear framework for the assembly of metagenomic sequence data. RESULTS To assist with selection of an appropriate metagenome assembler we evaluated the capabilities of nine prominent assembly tools on nine publicly-available environmental metagenomes, as well as three simulated datasets. Overall, we found that SPAdes provided the largest contigs and highest N50 values across 6 of the 9 environmental datasets, followed by MEGAHIT and metaSPAdes. MEGAHIT emerged as a computationally inexpensive alternative to SPAdes, assembling the most complex dataset using less than 500 GB of RAM and within 10 hours. CONCLUSIONS We found that assembler choice ultimately depends on the scientific question, the available resources and the bioinformatic competence of the researcher. We provide a concise workflow for the selection of the best assembly tool.
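The N50 values this abstract uses to rank assemblers follow a standard definition: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. A minimal Python sketch of that definition follows. The function name is chosen for illustration, and the paper's own evaluation tooling is not described here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the cumulative length reaches half of the total.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("N50 is undefined for an empty assembly")
```

Because N50 depends only on the length distribution, a higher value indicates a more contiguous assembly but says nothing about its correctness.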
Affiliation(s)
- Andries Johannes van der Walt
- Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028, South Africa
- Centre for Bioinformatics and Computational Biology, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Marc Warwick van Goethem
- Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028, South Africa
- Jean-Baptiste Ramond
- Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028, South Africa
- Thulani Peter Makhalanyane
- Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028, South Africa
- Oleg Reva
- Centre for Bioinformatics and Computational Biology, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Don Arthur Cowan
- Centre for Microbial Ecology and Genomics (CMEG), Department of Genetics, University of Pretoria, Natural Sciences Building 2, Lynnwood Road, Pretoria, 0028, South Africa