1. Gao Y, Luo H, Lyu H, Yang H, Yousuf S, Huang S, Liu YX. Benchmarking short-read metagenomics tools for removing host contamination. Gigascience 2025; 14:giaf004. [PMID: 40036691; PMCID: PMC11878760; DOI: 10.1093/gigascience/giaf004] [Received: 08/06/2024] [Revised: 10/31/2024] [Accepted: 01/09/2025] [Indexed: 03/06/2025]
Abstract
BACKGROUND: The rapid evolution of metagenomic sequencing technology offers remarkable opportunities to explore the intricate roles of the microbiome in host health and disease, as well as to uncover the unknown structure and functions of microbial communities. However, the swift accumulation of metagenomic data poses substantial challenges for data analysis. Contamination from host DNA can substantially compromise result accuracy and, by including nontarget sequences, increase the computational resources required.
RESULTS: In this study, we assessed the impact of computational host DNA decontamination on downstream analyses, highlighting its importance in producing accurate results efficiently. We also evaluated the performance of conventional tools such as KneadData, Bowtie2, BWA, KMCP, Kraken2, and KrakenUniq, each offering unique advantages for different applications. Furthermore, we highlighted the importance of an accurate host reference genome, noting that its absence negatively affected decontamination performance across all tools.
CONCLUSIONS: Our findings underscore the need for careful selection of decontamination tools and reference genomes to enhance the accuracy of metagenomic analyses. These insights provide valuable guidance for improving the reliability and reproducibility of microbiome research.
Affiliation(s)
- Yunyun Gao
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Hao Luo
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Hujie Lyu
  - Department of Life Sciences, Imperial College of London, London SW7 2AZ, UK
- Haifei Yang
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
  - College of Life Sciences, Qingdao Agricultural University, Qingdao 266000, China
- Salsabeel Yousuf
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Shi Huang
  - Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
- Yong-Xin Liu
  - Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
2. Yepes-García J, Falquet L. Metagenome quality metrics and taxonomical annotation visualization through the integration of MAGFlow and BIgMAG. F1000Res 2024; 13:640. [PMID: 39360247; PMCID: PMC11445639; DOI: 10.12688/f1000research.152290.2] [Accepted: 09/03/2024] [Indexed: 10/04/2024]
Abstract
Background: Building Metagenome-Assembled Genomes (MAGs) from highly complex metagenomics datasets involves a series of steps, from cleaning the sequences and assembling them to finally grouping them into bins. Along the way, multiple tools that assess the quality and integrity of each MAG are applied. Nonetheless, even when incorporated within end-to-end pipelines, the outputs of these tools must be visualized and analyzed manually, without integration into a complete framework.
Methods: We developed a Nextflow pipeline (MAGFlow) for estimating the quality of MAGs through a wide variety of approaches (BUSCO, CheckM2, GUNC and QUAST), as well as for taxonomically annotating the metagenomes using GTDB-Tk2. MAGFlow is coupled to a Python-Dash application (BIgMAG) that displays the concatenated outcomes from the tools included in MAGFlow, highlighting the most important metrics in a single interactive environment along with a comparison/clustering of the input data.
Results: By using MAGFlow/BIgMAG, the user will be able to benchmark the MAGs obtained through different workflows or establish the quality of the MAGs belonging to different samples following the divide and rule methodology.
Conclusions: MAGFlow/BIgMAG represents a unique tool that integrates state-of-the-art tools to study different quality metrics and visually extract as much information as possible from a wide range of genome features.
Affiliation(s)
- Jeferyd Yepes-García
  - Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
  - Department of Biology, University of Fribourg, Fribourg, Canton of Fribourg, 1700, Switzerland
- Laurent Falquet
  - Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
  - Department of Biology, University of Fribourg, Fribourg, Canton of Fribourg, 1700, Switzerland
3. Arcioni L, Arcieri M, Martino JD, Liberati F, Bottoni P, Castrignanò T. HPC-T-Annotator: an HPC tool for de novo transcriptome assembly annotation. BMC Bioinformatics 2024; 25:272. [PMID: 39169276; PMCID: PMC11340092; DOI: 10.1186/s12859-024-05887-3] [Received: 05/21/2024] [Accepted: 07/30/2024] [Indexed: 08/23/2024]
Abstract
BACKGROUND: The availability of transcriptomic data for species without a reference genome enables the construction of de novo transcriptome assemblies as alternative reference resources from RNA-Seq data. A transcriptome provides direct information about a species' protein-coding genes under specific experimental conditions. The de novo assembly process produces a unigenes file in FASTA format, subsequently targeted for the annotation. Homology-based annotation, a method to infer the function of sequences by estimating similarity with other sequences in a reference database, is a computationally demanding procedure.
RESULTS: To mitigate the computational burden, we introduce HPC-T-Annotator, a tool for de novo transcriptome homology annotation on high performance computing (HPC) infrastructures, designed for straightforward configuration via a Web interface. Once the configuration data are given, the entire parallel computing software for annotation is automatically generated and can be launched on a supercomputer using a simple command line. The output data can then be easily viewed using post-processing utilities in the form of Python notebooks integrated in the proposed software.
CONCLUSIONS: HPC-T-Annotator expedites homology-based annotation in de novo transcriptome assemblies. Its efficient parallelization strategy on HPC infrastructures significantly reduces computational load and execution times, enabling large-scale transcriptome analysis and comparison projects, while its intuitive graphical interface extends accessibility to users without IT skills.
Affiliation(s)
- Lorenzo Arcioni
  - Department of Computer Science, Sapienza University of Rome, Viale Regina Elena 295, 00166, Rome, Italy
- Manuel Arcieri
  - Department of Health Technology, Technical University of Denmark, Anker Engelunds Vej 101, 2800, Kongens Lyngby, Denmark
- Jessica Di Martino
  - Department of Ecological and Biological Sciences, University of Tuscia, Viale dell'Università s.n.c., 01100, Viterbo, Italy
- Franco Liberati
  - Department of Computer Science, Sapienza University of Rome, Viale Regina Elena 295, 00166, Rome, Italy
  - Department of Ecological and Biological Sciences, University of Tuscia, Viale dell'Università s.n.c., 01100, Viterbo, Italy
- Paolo Bottoni
  - Department of Computer Science, Sapienza University of Rome, Viale Regina Elena 295, 00166, Rome, Italy
- Tiziana Castrignanò
  - Department of Ecological and Biological Sciences, University of Tuscia, Viale dell'Università s.n.c., 01100, Viterbo, Italy
4. Valencia EM, Maki KA, Dootz JN, Barb JJ. Mock community taxonomic classification performance of publicly available shotgun metagenomics pipelines. Sci Data 2024; 11:81. [PMID: 38233447; PMCID: PMC10794705; DOI: 10.1038/s41597-023-02877-7] [Received: 05/26/2023] [Accepted: 12/22/2023] [Indexed: 01/19/2024]
Abstract
Shotgun metagenomic sequencing comprehensively samples the DNA of a microbial sample. Choosing the best bioinformatics processing package can be daunting due to the wide variety of tools available. Here, we assessed publicly available shotgun metagenomics processing packages/pipelines including bioBakery, Just a Microbiology System (JAMS), Whole metaGenome Sequence Assembly V2 (WGSA2), and Woltka using 19 publicly available mock community samples and a set of five constructed pathogenic gut microbiome samples. Also included is a workflow for labelling bacterial scientific names with NCBI taxonomy identifiers for better resolution in assessing results. The Aitchison distance, a sensitivity metric, and total False Positive Relative Abundance were used for accuracy assessments for all pipelines and mock samples. Overall, bioBakery4 performed best on most of the accuracy metrics, while JAMS and WGSA2 had the highest sensitivities. Furthermore, bioBakery is commonly used and only requires a basic knowledge of command line usage. This work provides an unbiased assessment of shotgun metagenomics packages and presents results assessing the performance of the packages using mock community sequence data.
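The Aitchison distance named in this abstract can be illustrated with a short sketch. This is not the authors' code: the abstract does not say how the distance was computed or how zero abundances were handled, so the function names `clr` and `aitchison_distance` and the pseudocount default are assumptions made here for illustration. The sketch uses the standard definition, the Euclidean distance between centered log-ratio (CLR) transformed compositions.

```python
import math


def clr(abundances, pseudocount=1e-6):
    """Centered log-ratio transform of one abundance vector."""
    # Assumption: a small pseudocount keeps log() defined when a taxon is absent.
    logs = [math.log(a + pseudocount) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y, pseudocount=1e-6):
    """Euclidean distance between the CLR transforms of two compositions."""
    if len(x) != len(y):
        raise ValueError("both compositions must list the same taxa in the same order")
    cx = clr(x, pseudocount)
    cy = clr(y, pseudocount)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


if __name__ == "__main__":
    # Hypothetical four-taxon mock community: designed versus observed abundances.
    expected = [0.25, 0.25, 0.25, 0.25]
    observed = [0.40, 0.30, 0.20, 0.10]
    print(aitchison_distance(expected, observed))
```

Because the CLR transform subtracts the mean log abundance, the distance depends only on the ratios between taxa, so rescaling a profile (for example, counts versus relative abundances) leaves it unchanged when no pseudocount is added. The pseudocount value does influence the result when zeros are present, which is why it is exposed as a parameter.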
Affiliation(s)
- E Michael Valencia
  - Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD, 20814, USA
- Katherine A Maki
  - Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD, 20814, USA
- Jennifer N Dootz
  - Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Jennifer J Barb
  - Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD, 20814, USA
5. Yu J, Lee JYY, Tang SN, Lee PKH. Niche differentiation in microbial communities with stable genomic traits over time in engineered systems. The ISME Journal 2024; 18:wrae042. [PMID: 38470313; PMCID: PMC10987969; DOI: 10.1093/ismejo/wrae042] [Received: 12/02/2023] [Revised: 02/21/2024] [Accepted: 03/07/2024] [Indexed: 03/13/2024]
Abstract
Microbial communities in full-scale engineered systems undergo dynamic compositional changes. However, mechanisms governing assembly of such microbes and succession of their functioning and genomic traits under various environmental conditions are unclear. In this study, we used the activated sludge and anaerobic treatment systems of four full-scale industrial wastewater treatment plants as models to investigate the niches of microbes in communities and the temporal succession patterns of community compositions. High-quality representative metagenome-assembled genomes revealed that taxonomic, functional, and trait-based compositions were strongly shaped by environmental selection, with replacement processes primarily driving variations in taxonomic and functional compositions. Plant-specific indicators were associated with system environmental conditions and exhibited strong determinism and trajectory directionality over time. The partitioning of microbes in a co-abundance network according to groups of plant-specific indicators, together with significant between-group differences in genomic traits, indicated the occurrence of niche differentiation. The indicators of the treatment plant with rich nutrient input and high substrate removal efficiency exhibited a faster predicted growth rate, lower guanine-cytosine content, smaller genome size, and higher codon usage bias than the indicators of the other plants. In individual plants, taxonomic composition displayed a more rapid temporal succession than functional and trait-based compositions. The succession of taxonomic, functional, and trait-based compositions was correlated with the kinetics of treatment processes in the activated sludge systems. This study provides insights into ecological niches of microbes in engineered systems and succession patterns of their functions and traits, which will aid microbial community management to improve treatment performance.
Affiliation(s)
- Jinjin Yu
  - School of Energy and Environment, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Justin Y Y Lee
  - School of Energy and Environment, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Siang Nee Tang
  - Facility Management and Environmental Engineering, TAL Group, Kowloon, Hong Kong SAR, China
- Patrick K H Lee
  - School of Energy and Environment and State Key Laboratory of Marine Pollution, City University of Hong Kong, Kowloon, Hong Kong SAR, China
6. Schmidt RL, Azarbad H, Bainard L, Tremblay J, Yergeau E. Intermittent water stress favors microbial traits that better help wheat under drought. ISME Communications 2024; 4:ycae074. [PMID: 38863723; PMCID: PMC11165427; DOI: 10.1093/ismeco/ycae074] [Received: 11/22/2023] [Revised: 05/01/2024] [Accepted: 05/16/2024] [Indexed: 06/13/2024]
Abstract
Microorganisms can improve plant resistance to drought through various mechanisms, such as the production of plant hormones, osmolytes, antioxidants, and exopolysaccharides. It is, however, unclear how previous exposure to water stress affects the functional capacity of the soil microbial community to help plants resist drought. We compared two soils that had either a continuous or intermittent water stress history (WSH) for almost 40 years. We grew wheat in these soils and subjected it to water stress, after which we collected the rhizosphere soil and shotgun sequenced its metagenome. Wheat growing in soil with an intermittent WSH maintained a higher biomass when subjected to water stress. Genes related to indole-acetic acid and osmolyte production were more abundant in the metagenome of the soil with an intermittent WSH as compared to the soil with a continuous WSH. We suggest that an intermittent WSH selects traits beneficial for life under water stress.
Affiliation(s)
- Ruth Lydia Schmidt
  - Centre Armand-Frappier Santé Biotechnologie, Institut national de la recherche scientifique, Laval, QC, H7V 1B7, Canada
- Hamed Azarbad
  - Department of Biology, Evolutionary Ecology of Plants, Philipps-University Marburg, Karl-von-Frisch-Strasse 8, 35043 Marburg, Germany
- Luke Bainard
  - Agassiz Research and Development Centre, Agriculture and Agri-Food Canada, 6947 #7 Highway, Agassiz, BC, V0M 1A2, Canada
- Julien Tremblay
  - Centre Armand-Frappier Santé Biotechnologie, Institut national de la recherche scientifique, Laval, QC, H7V 1B7, Canada
- Etienne Yergeau
  - Centre Armand-Frappier Santé Biotechnologie, Institut national de la recherche scientifique, Laval, QC, H7V 1B7, Canada
7. Schreiber L, Hunnie B, Altshuler I, Góngora E, Ellis M, Maynard C, Tremblay J, Wasserscheid J, Fortin N, Lee K, Stern G, Greer CW. Long-term biodegradation of crude oil in high-arctic backshore sediments: The Baffin Island Oil Spill (BIOS) after nearly four decades. Environmental Research 2023; 233:116421. [PMID: 37327845; DOI: 10.1016/j.envres.2023.116421] [Received: 03/09/2023] [Revised: 05/30/2023] [Accepted: 06/13/2023] [Indexed: 06/18/2023]
Abstract
With an on-going disproportional warming of the Arctic Ocean and the reduction of the sea ice cover, the risk of an accidental oil spill from ships or future oil exploration is increasing. It is hence important to know how crude oil weathers in this environment and what factors affect oil biodegradation in the Arctic. However, this topic is currently poorly studied. In the 1980s, the Baffin Island Oil Spill (BIOS) project carried out a series of simulated oil spills in the backshore zone of beaches located on Baffin Island in the Canadian High Arctic. In this study two BIOS sites were re-visited, offering the unique opportunity to study the long-term weathering of crude oil under Arctic conditions. Here we show that residual oil remains present at these sites even after almost four decades since the original oiling. Oil at both BIOS sites appears to have attenuated very slowly with estimated loss rates of 1.8-2.7% per year. The presence of residual oil continues to significantly affect sediment microbial communities at the sites as manifested by a significantly decreased diversity, differences in the abundance of microorganisms and an enrichment of putative oil-degrading bacteria in oiled sediments. Reconstructed genomes of putative oil degraders suggest that only a subset is specifically adapted for growth under psychrothermic conditions, further reducing the time for biodegradation during the already short Arctic summers. Altogether, this study shows that crude oil spilled in the Arctic can persist and significantly affect the Arctic ecosystem for a long time, in the order of several decades.
Affiliation(s)
- Lars Schreiber
  - Energy, Mining and Environment Research Centre, National Research Council Canada, Montreal, Quebec, Canada
- Blake Hunnie
  - Centre for Earth Observation Science, University of Manitoba, Winnipeg, Manitoba, Canada
- Ianina Altshuler
  - Department of Natural Resource Sciences, McGill University, Montreal, Quebec, Canada
- Esteban Góngora
  - Department of Natural Resource Sciences, McGill University, Montreal, Quebec, Canada
- Madison Ellis
  - Department of Natural Resource Sciences, McGill University, Montreal, Quebec, Canada
- Christine Maynard
  - Energy, Mining and Environment Research Centre, National Research Council Canada, Montreal, Quebec, Canada
- Julien Tremblay
  - Energy, Mining and Environment Research Centre, National Research Council Canada, Montreal, Quebec, Canada
- Jessica Wasserscheid
  - Energy, Mining and Environment Research Centre, National Research Council Canada, Montreal, Quebec, Canada
- Nathalie Fortin
  - Energy, Mining and Environment Research Centre, National Research Council Canada, Montreal, Quebec, Canada
- Kenneth Lee
  - Fisheries and Oceans Canada, Ecosystem Science, Ottawa, Ontario, Canada
- Gary Stern
  - Centre for Earth Observation Science, University of Manitoba, Winnipeg, Manitoba, Canada
- Charles W Greer
  - Energy, Mining and Environment Research Centre, National Research Council Canada, Montreal, Quebec, Canada
  - Department of Natural Resource Sciences, McGill University, Montreal, Quebec, Canada
8. Salas-Espejo E, Terrón-Camero LC, Ruiz JL, Molina NM, Andrés-León E. Exploring the Microbiome in Human Reproductive Tract: High-Throughput Methods for the Taxonomic Characterization of Microorganisms. Semin Reprod Med 2023; 41:125-143. [PMID: 38320576; DOI: 10.1055/s-0044-1779025] [Indexed: 02/08/2024]
Abstract
Microorganisms are important due to their widespread presence and multifaceted roles across various domains of life, ecology, and industries. In humans, they underlie the proper functioning of multiple systems crucial to well-being, including immunological and metabolic functions. Emerging research addressing the presence and roles of microorganisms within human reproduction is increasingly relevant. Studies implementing new methodologies (e.g., to investigate vaginal, uterine, and semen microenvironments) can now provide relevant insights into fertility, reproductive health, or pregnancy outcomes. In that sense, cutting-edge sequencing techniques, as well as others such as meta-metabolomics, culturomics, and meta-proteomics, are becoming more popular and accessible worldwide, allowing the characterization of microbiomes at unprecedented resolution. However, they frequently involve rather complex laboratory protocols and bioinformatics analyses, for which researchers may lack the required expertise. A suitable pipeline would successfully enable both taxonomic classification and functional profiling of the microbiome, providing easy-to-understand biological interpretations. However, the selection of an appropriate methodology would be crucial, as it directly impacts the reproducibility, accuracy, and quality of the results and observations. This review focuses on the different current microbiome-related techniques in the context of human reproduction, encompassing niches like vagina, endometrium, and seminal fluid. The most standard and reliable methods are 16S rRNA gene sequencing, metagenomics, and meta-transcriptomics, together with complementary approaches including meta-proteomics, meta-metabolomics, and culturomics. Finally, we also offer case examples and general recommendations about the most appropriate methods and workflows and discuss strengths and shortcomings for each technique.
Affiliation(s)
- Eduardo Salas-Espejo
  - Department of Biochemistry and Molecular Biology, Faculty of Sciences, University of Granada, Granada, Spain
- Laura C Terrón-Camero
  - Bioinformatics Unit, Institute of Parasitology and Biomedicine "López-Neyra" (IPBLN), CSIC, Granada, Spain
- José L Ruiz
  - Bioinformatics Unit, Institute of Parasitology and Biomedicine "López-Neyra" (IPBLN), CSIC, Granada, Spain
- Nerea M Molina
  - Department of Biochemistry and Molecular Biology, Faculty of Sciences, University of Granada, Granada, Spain
- Eduardo Andrés-León
  - Bioinformatics Unit, Institute of Parasitology and Biomedicine "López-Neyra" (IPBLN), CSIC, Granada, Spain
9. Pande PM, Azarbad H, Tremblay J, St-Arnaud M, Yergeau E. Metatranscriptomic response of the wheat holobiont to decreasing soil water content. ISME Communications 2023; 3:30. [PMID: 37061589; PMCID: PMC10105728; DOI: 10.1038/s43705-023-00235-7] [Received: 09/29/2022] [Revised: 03/17/2023] [Accepted: 03/23/2023] [Indexed: 04/17/2023]
Abstract
Crops associate with microorganisms that help their resistance to biotic stress. However, it is not clear how the different partners of this association react during exposure to stress. This knowledge is needed to target the right partners when trying to adapt crops to climate change. Here, we grew wheat in the field under rainout shelters that let through 100%, 75%, 50% and 25% of the precipitation. At the peak of the growing season, we sampled plant roots and rhizosphere, and extracted and sequenced their RNA. We compared the 100% and the 25% treatments using differential abundance analysis. In the roots, most of the differentially abundant (DA) transcripts belonged to the fungi, and most were more abundant in the 25% precipitation treatment. About 10% of the DA transcripts belonged to the plant and most were less abundant in the 25% precipitation treatment. In the rhizosphere, most of the DA transcripts belonged to the bacteria and were generally more abundant in the 25% precipitation treatment. Taken together, our results show that the transcriptomic response of the wheat holobiont to decreasing precipitation levels is stronger for the fungal and bacterial partners than for the plant.
Affiliation(s)
- Pranav M Pande
  - Institut national de la recherche scientifique, Centre Armand-Frappier Santé Biotechnologie, Laval, Québec, H7V 1B7, Canada
- Hamed Azarbad
  - Department of Biology, Evolutionary Ecology of Plants, Philipps-University Marburg, Marburg, Germany
- Julien Tremblay
  - National Research Council of Canada, Energy Mining and Environment, Montréal, Québec, Canada
- Marc St-Arnaud
  - Institut de recherche en biologie végétale, Université de Montréal et Jardin Botanique de Montréal, Montréal, Québec, Canada
- Etienne Yergeau
  - Institut national de la recherche scientifique, Centre Armand-Frappier Santé Biotechnologie, Laval, Québec, H7V 1B7, Canada