1
Qayyum H, Talib MS, Ali A, Kayani MUR. Evaluating the potential of assembler-binner combinations in recovering low-abundance and strain-resolved genomes from human metagenomes. Heliyon 2025; 11:e41938. [PMID: 39897886 PMCID: PMC11786835 DOI: 10.1016/j.heliyon.2025.e41938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 01/08/2025] [Accepted: 01/13/2025] [Indexed: 02/04/2025] Open
Abstract
Human-associated microbial communities are a complex mixture of bacterial species and diverse strains prevalent at varying abundances. Due to the inherent limitations of metagenomic assemblers and genome binning tools in recovering low-abundance species (<1%) and strains, we lack comprehensive insight into these communities. Although many bioinformatics approaches are available for recovering metagenome-assembled genomes, their effectiveness in recovering low-abundance species and strains is often questioned. Moreover, each tool has its trade-offs, making selecting the right tools challenging. In this study, we investigated the combinatory effect of various assemblers and binning tools on the recovery of low-abundance species and strain-resolved genomes from real and simulated human metagenomes. We evaluated the performance of nine combinations of metagenome assemblers and genome binning tools for their potential to recover genomes of useable quality. Our results revealed that the metaSPAdes-MetaBAT2 combination is highly effective in recovering low-abundance species, while MEGAHIT-MetaBAT2 excels in recovering strain-resolved genomes. These findings highlight the significant variation in the performance of different combinations, even when aiming for the same objective. This suggests the profound impact of selecting the right assembler-binner combination for metagenome analyses. We believe this study will be a cornerstone for the scientific community, guiding the choice of tools by highlighting their complementary effects. Furthermore, it underscores the potential of existing tools to address the current challenges in the field, improving the recovery of information from metagenomes.
Affiliation(s)
- Hajra Qayyum
- Integrative Biology Laboratory, Department of Microbiology and Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Srinagar Highway, Sector H-12, Islamabad, Pakistan
- Capital University of Science & Technology, Islamabad Expressway, Kahuta Road Zone-V Sihala, Islamabad, Pakistan
- Muhammad Sarfraz Talib
- Integrative Biology Laboratory, Department of Microbiology and Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Srinagar Highway, Sector H-12, Islamabad, Pakistan
- Amjad Ali
- Integrative Biology Laboratory, Department of Microbiology and Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Srinagar Highway, Sector H-12, Islamabad, Pakistan
- Masood Ur Rehman Kayani
- Metagenomics Discovery Lab, School of Interdisciplinary Engineering and Sciences (SINES), National University of Sciences and Technology (NUST), Srinagar Highway, Sector H-12, Islamabad, Pakistan
2
Mallawaarachchi V, Wickramarachchi A, Xue H, Papudeshi B, Grigson SR, Bouras G, Prahl RE, Kaphle A, Verich A, Talamantes-Becerra B, Dinsdale EA, Edwards RA. Solving genomic puzzles: computational methods for metagenomic binning. Brief Bioinform 2024; 25:bbae372. [PMID: 39082646 PMCID: PMC11289683 DOI: 10.1093/bib/bbae372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 06/05/2024] [Accepted: 07/15/2024] [Indexed: 08/03/2024] Open
Abstract
Metagenomics involves the study of genetic material obtained directly from communities of microorganisms living in natural environments. The field of metagenomics has provided valuable insights into the structure, diversity and ecology of microbial communities. Once an environmental sample is sequenced and processed, metagenomic binning clusters the sequences into bins representing different taxonomic groups such as species, genera, or higher levels. Several computational tools have been developed to automate the process of metagenomic binning. These tools have enabled the recovery of novel draft genomes of microorganisms allowing us to study their behaviors and functions within microbial communities. This review classifies and analyzes different approaches of metagenomic binning and different refinement, visualization, and evaluation techniques used by these methods. Furthermore, the review highlights the current challenges and areas of improvement present within the field of research.
Affiliation(s)
- Vijini Mallawaarachchi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Anuradha Wickramarachchi
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Hansheng Xue
- School of Computing, National University of Singapore, Singapore 119077, Singapore
- Bhavya Papudeshi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Susanna R Grigson
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- George Bouras
- Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
- The Department of Surgery—Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, SA 5011, Australia
- Rosa E Prahl
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Anubhav Kaphle
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Andrey Verich
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- The Kirby Institute, The University of New South Wales, Randwick, Sydney, NSW 2052, Australia
- Berenice Talamantes-Becerra
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- Elizabeth A Dinsdale
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
- Robert A Edwards
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
3
Jia L, Wu Y, Dong Y, Chen J, Chen WH, Zhao XM. A survey on computational strategies for genome-resolved gut metagenomics. Brief Bioinform 2023; 24:7145904. [PMID: 37114640 DOI: 10.1093/bib/bbad162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 03/20/2023] [Accepted: 04/04/2023] [Indexed: 04/29/2023] Open
Abstract
Recovering high-quality metagenome-assembled genomes (HQ-MAGs) is critical for exploring microbial compositions and microbe-phenotype associations. However, multiple sequencing platforms and computational tools for this purpose may confuse researchers and thus call for extensive evaluation. Here, we systematically evaluated a total of 40 combinations of popular computational tools and sequencing platforms (i.e. strategies), involving eight assemblers, eight metagenomic binners and four sequencing technologies, including short-, long-read and metaHiC sequencing. We identified the best tools for the individual tasks (e.g. the assembly and binning) and combinations (e.g. generating more HQ-MAGs) depending on the availability of the sequencing data. We found that the combination of the hybrid assemblies and metaHiC-based binning performed best, followed by the hybrid and long-read assemblies. More importantly, both long-read and metaHiC sequencings link more mobile elements and antibiotic resistance genes to bacterial hosts and improve the quality of public human gut reference genomes with 32% (34/105) HQ-MAGs that were either of better quality than those in the Unified Human Gastrointestinal Genome catalog version 2 or novel.
Affiliation(s)
- Longhao Jia
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Yingjian Wu
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Yanqi Dong
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Jingchao Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Wei-Hua Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
- Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai 264003, China
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Ministry of Education, Shanghai 200433, China
- MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, China
4
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. Front Bioinform 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- David Paez-Espino
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Nefeli K. Venetsianou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Anastasis Oulas
- The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
- Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Nikos C. Kyrpides
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
- Hellenic Army Academy, Vari, Greece
5
Zachariasen T, Petersen AØ, Brejnrod A, Vestergaard GA, Eklund A, Nielsen HB. Identification of representative species-specific genes for abundance measurements. Bioinform Adv 2023; 3:vbad060. [PMID: 37213867 PMCID: PMC10199311 DOI: 10.1093/bioadv/vbad060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Revised: 04/14/2023] [Accepted: 05/05/2023] [Indexed: 05/23/2023]
Abstract
Motivation: Metagenomic binning facilitates the reconstruction of genomes and identification of Metagenomic Species Pan-genomes or Metagenomic Assembled Genomes. We propose a method for identifying a set of de novo representative genes, termed signature genes, which can be used to measure the relative abundance and used as markers of each metagenomic species with high accuracy. Results: An initial set of the 100 genes that correlate with the median gene abundance profile of the entity is selected. A variant of the coupon collector's problem was utilized to evaluate the probability of identifying a certain number of unique genes in a sample. This allows us to reject the abundance measurements of strains exhibiting a significantly skewed gene representation. A rank-based negative binomial model is employed to assess the performance of different gene sets across a large set of samples, facilitating identification of an optimal signature gene set for the entity. When benchmarked on a synthetic gene catalog, our optimized signature gene sets estimate relative abundance significantly closer to the true relative abundance compared to the starting gene sets extracted from the metagenomic species. The method was able to replicate results from a study with real data and identify around three times as many metagenomic entities. Availability and implementation: The code used for the analysis is available on GitHub: https://github.com/trinezac/SG_optimization. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
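The coupon collector step mentioned in this abstract can be illustrated with a small calculation. The sketch below is a simplified illustration and not the authors' implementation: it assumes that reads fall uniformly and independently on `n_genes` equally likely signature genes, and the function names `distinct_gene_distribution` and `prob_at_least` are hypothetical. It computes the probability of observing at least `k` distinct genes after `n_reads` reads.

```python
def distinct_gene_distribution(n_genes: int, n_reads: int) -> list[float]:
    """Return P(exactly j distinct genes seen) for j = 0..n_genes.

    Assumption (not from the paper): each read hits one of n_genes genes
    uniformly at random, independently of the other reads.
    """
    probs = [0.0] * (n_genes + 1)
    probs[0] = 1.0  # before any read, zero genes have been seen
    for _ in range(n_reads):
        new = [0.0] * (n_genes + 1)
        for j, p in enumerate(probs):
            if p == 0.0:
                continue
            # The read hits a gene that was already seen.
            new[j] += p * j / n_genes
            # The read hits a gene that was not seen before.
            if j < n_genes:
                new[j + 1] += p * (n_genes - j) / n_genes
        probs = new
    return probs


def prob_at_least(n_genes: int, n_reads: int, k: int) -> float:
    """Probability of seeing at least k distinct genes after n_reads reads."""
    return sum(distinct_gene_distribution(n_genes, n_reads)[k:])
```

For example, `prob_at_least(100, 200, 80)` gives the chance that 200 reads hit at least 80 of 100 genes under these assumptions. A sample whose observed number of distinct genes is improbably low under such a model would be a candidate for rejection.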
Affiliation(s)
- Asker Brejnrod
- Department of Health and Technology, Technical University of Denmark, Lyngby 2800, Denmark
- Aron Eklund
- Clinical Microbiomics A/S, Copenhagen 2100, Denmark
6
Fournier P, Pellan L, Barroso-Bergadà D, Bohan DA, Candresse T, Delmotte F, Dufour MC, Lauvergeat V, Le Marrec C, Marais A, Martins G, Masneuf-Pomarède I, Rey P, Sherman D, This P, Frioux C, Labarthe S, Vacher C. The functional microbiome of grapevine throughout plant evolutionary history and lifetime. Adv Ecol Res 2022. [DOI: 10.1016/bs.aecr.2022.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
7
Gil-Gil T, Ochoa-Sánchez LE, Baquero F, Martínez JL. Antibiotic resistance: Time of synthesis in a post-genomic age. Comput Struct Biotechnol J 2021; 19:3110-3124. [PMID: 34141134 PMCID: PMC8181582 DOI: 10.1016/j.csbj.2021.05.034] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/26/2021] [Revised: 05/13/2021] [Accepted: 05/20/2021] [Indexed: 12/20/2022] Open
Abstract
Antibiotic resistance has been highlighted by international organizations, including the World Health Organization, the World Bank and the United Nations, as one of the most relevant global health problems. Classical approaches to study this problem have focused on infected humans, mainly at hospitals. Nevertheless, antibiotic resistance can expand through different ecosystems and geographical allocations, hence constituting a One-Health, Global-Health problem, requiring specific integrative analytic tools. Antibiotic resistance evolution and transmission are multilayer, hierarchically organized processes with several elements (from genes to the whole microbiome) involved. However, their study has been traditionally gene-centric, with each element studied independently. The development of robust, economically affordable whole genome sequencing approaches, as well as other -omic techniques such as transcriptomics and proteomics, is changing this panorama. These technologies allow the description of a system, either a cell or a microbiome as a whole, overcoming the problems associated with gene-centric approaches. We are currently at the time of combining the information derived from -omic studies to have a more holistic view of the evolution and spread of antibiotic resistance. This synthesis process requires the accurate integration of -omic information into computational models that serve to analyse the causes and the consequences of acquiring AR, fed by curated databases capable of identifying the elements involved in the acquisition of resistance. In this review, we analyse the capacities and drawbacks of the tools that are currently in use for the global analysis of AR, aiming to identify the more useful targets for effective corrective interventions.
Affiliation(s)
- Teresa Gil-Gil
- Centro Nacional de Biotecnología, CSIC, Darwin 3, 28049 Madrid, Spain
- Fernando Baquero
- Department of Microbiology, Hospital Universitario Ramón y Cajal (IRYCIS), Madrid, Spain
- CIBER en Epidemiología y Salud Pública (CIBER-ESP), Madrid, Spain