1
|
Herazo-Álvarez J, Mora M, Cuadros-Orellana S, Vilches-Ponce K, Hernández-García R. A review of neural networks for metagenomic binning. Brief Bioinform 2025; 26:bbaf065. [PMID: 40131312 PMCID: PMC11934572 DOI: 10.1093/bib/bbaf065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2024] [Revised: 01/02/2025] [Accepted: 03/07/2025] [Indexed: 03/26/2025] Open
Abstract
One of the main goals of metagenomic studies is to describe the taxonomic diversity of microbial communities. A crucial step in metagenomic analysis is metagenomic binning, which involves the (supervised) classification or (unsupervised) clustering of metagenomic sequences. Various machine learning models have been applied to address this task. In this review, the contributions of artificial neural networks (ANN) in the context of metagenomic binning are detailed, addressing both supervised, unsupervised, and semi-supervised approaches. 34 ANN-based binning tools are systematically compared, detailing their architectures, input features, datasets, advantages, disadvantages, and other relevant aspects. The findings reveal that deep learning approaches, such as convolutional neural networks and autoencoders, achieve higher accuracy and scalability than traditional methods. Gaps in benchmarking practices are highlighted, and future directions are proposed, including standardized datasets and optimization of architectures, for third-generation sequencing. This review provides support to researchers in identifying trends and selecting suitable tools for the metagenomic binning problem.
Collapse
Affiliation(s)
- Jair Herazo-Álvarez
- Doctorado en Modelamiento Matemático Aplicado, Universidad Católica del Maule, Talca, Maule 3480564, Chile
- Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile
| | - Marco Mora
- Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile
- Departamento de Computación e Industrias, Facultad de Ciencias de la Ingeniería, Universidad Católica del Maule, Talca, Maule 3480564, Chile
| | - Sara Cuadros-Orellana
- Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile
- Centro de Biotecnología de los Recursos Naturales (CENBio), Universidad Católica del Maule, Talca, Maule 3480564, Chile
| | - Karina Vilches-Ponce
- Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile
| | - Ruber Hernández-García
- Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile
- Departamento de Computación e Industrias, Facultad de Ciencias de la Ingeniería, Universidad Católica del Maule, Talca, Maule 3480564, Chile
| |
Collapse
|
2
|
Bird E, Pickens V, Molik D, Silver K, Nayduch D. BALROG-MON: a high-throughput pipeline for Bacterial AntimicrobiaL Resistance annOtation of Genomes-Metagenomic Oxford Nanopore. MICROPUBLICATION BIOLOGY 2025; 2025:10.17912/micropub.biology.001427. [PMID: 40046038 PMCID: PMC11880933 DOI: 10.17912/micropub.biology.001427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Figures] [Subscribe] [Scholar Register] [Received: 11/21/2024] [Revised: 02/11/2025] [Accepted: 02/11/2025] [Indexed: 03/09/2025]
Abstract
BALROG-MON is a Nextflow pipeline for automated analysis of metagenomic long-read data to detect pathogens, annotate antimicrobial resistance genes (ARGs), link ARGs to specific pathogens, predict ARG origin (e.g., plasmid, chromosomal) and optionally perform steps like community analysis. With both assembly-based and assembly-free workflows, BALROG-MON is applicable to a wide range of sample types with low or high coverage, varying complexities and origins. Optional genome binning provides a comprehensive overview of ARGs within the dataset. BALROG-MON additionally presents results in summarized reports, overall serving as a flexible analysis tool for exploring diverse metagenomic samples for pathogens and antibiotic resistance.
Collapse
Affiliation(s)
- Edward Bird
- Entomology, Kansas State University, Manhattan, Kansas, United States
| | - Victoria Pickens
- Entomology, Kansas State University, Manhattan, Kansas, United States
| | - David Molik
- Arthropod-Borne Animal Diseases Research Unit, Agricultural Research Service, United States Department of Agriculture, Manhattan, KS, United States
| | - Kristopher Silver
- Entomology, Kansas State University, Manhattan, Kansas, United States
| | - Dana Nayduch
- Arthropod-Borne Animal Diseases Research Unit, Agricultural Research Service, United States Department of Agriculture, Manhattan, KS, United States
| |
Collapse
|
3
|
Fulke AB, Eranezhath S, Raut S, Jadhav HS. Recent toolset of metagenomics for taxonomical and functional annotation of marine associated viruses: A review. REGIONAL STUDIES IN MARINE SCIENCE 2024; 77:103728. [DOI: 10.1016/j.rsma.2024.103728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/02/2025]
|
4
|
Weber CC. Disentangling cobionts and contamination in long-read genomic data using sequence composition. G3 (BETHESDA, MD.) 2024; 14:jkae187. [PMID: 39148415 PMCID: PMC11540323 DOI: 10.1093/g3journal/jkae187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/30/2024] [Revised: 08/02/2024] [Accepted: 08/02/2024] [Indexed: 08/17/2024]
Abstract
The recent acceleration in genome sequencing targeting previously unexplored parts of the tree of life presents computational challenges. Samples collected from the wild often contain sequences from several organisms, including the target, its cobionts, and contaminants. Effective methods are therefore needed to separate sequences. Though advances in sequencing technology make this task easier, it remains difficult to taxonomically assign sequences from eukaryotic taxa that are not well represented in databases. Therefore, reference-based methods alone are insufficient. Here, I examine how we can take advantage of differences in sequence composition between organisms to identify symbionts, parasites, and contaminants in samples, with minimal reliance on reference data. To this end, I explore data from the Darwin Tree of Life project, including hundreds of high-quality HiFi read sets from insects. Visualizing two-dimensional representations of read tetranucleotide composition learned by a variational autoencoder can reveal distinct components of a sample. Annotating the embeddings with additional information, such as coding density, estimated coverage, or taxonomic labels allows rapid assessment of the contents of a dataset. The approach scales to millions of sequences, making it possible to explore unassembled read sets, even for large genomes. Combined with interactive visualization tools, it allows a large fraction of cobionts reported by reference-based screening to be identified. Crucially, it also facilitates retrieving genomes for which suitable reference data are absent.
Collapse
Affiliation(s)
- Claudia C Weber
- Tree of Life, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| |
Collapse
|
5
|
Mallawaarachchi V, Wickramarachchi A, Xue H, Papudeshi B, Grigson SR, Bouras G, Prahl RE, Kaphle A, Verich A, Talamantes-Becerra B, Dinsdale EA, Edwards RA. Solving genomic puzzles: computational methods for metagenomic binning. Brief Bioinform 2024; 25:bbae372. [PMID: 39082646 PMCID: PMC11289683 DOI: 10.1093/bib/bbae372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2024] [Revised: 06/05/2024] [Accepted: 07/15/2024] [Indexed: 08/03/2024] Open
Abstract
Metagenomics involves the study of genetic material obtained directly from communities of microorganisms living in natural environments. The field of metagenomics has provided valuable insights into the structure, diversity and ecology of microbial communities. Once an environmental sample is sequenced and processed, metagenomic binning clusters the sequences into bins representing different taxonomic groups such as species, genera, or higher levels. Several computational tools have been developed to automate the process of metagenomic binning. These tools have enabled the recovery of novel draft genomes of microorganisms allowing us to study their behaviors and functions within microbial communities. This review classifies and analyzes different approaches of metagenomic binning and different refinement, visualization, and evaluation techniques used by these methods. Furthermore, the review highlights the current challenges and areas of improvement present within the field of research.
Collapse
Affiliation(s)
- Vijini Mallawaarachchi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
| | - Anuradha Wickramarachchi
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
| | - Hansheng Xue
- School of Computing, National University of Singapore, Singapore 119077, Singapore
| | - Bhavya Papudeshi
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
| | - Susanna R Grigson
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
| | - George Bouras
- Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
- The Department of Surgery—Otolaryngology Head and Neck Surgery, University of Adelaide and the Basil Hetzel Institute for Translational Health Research, Central Adelaide Local Health Network, Adelaide, SA 5011, Australia
| | - Rosa E Prahl
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
| | - Anubhav Kaphle
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
| | - Andrey Verich
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
- The Kirby Institute, The University of New South Wales, Randwick, Sydney, NSW 2052, Australia
| | - Berenice Talamantes-Becerra
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Westmead, NSW 2145, Australia
| | - Elizabeth A Dinsdale
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
| | - Robert A Edwards
- Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, SA 5042, Australia
| |
Collapse
|
6
|
Agustinho DP, Fu Y, Menon VK, Metcalf GA, Treangen TJ, Sedlazeck FJ. Unveiling microbial diversity: harnessing long-read sequencing technology. Nat Methods 2024; 21:954-966. [PMID: 38689099 PMCID: PMC11955098 DOI: 10.1038/s41592-024-02262-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Accepted: 03/29/2024] [Indexed: 05/02/2024]
Abstract
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are not only due to improvements in sequencing accuracy, but also happening across rapidly changing analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of available analytical methods to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
Collapse
Affiliation(s)
- Daniel P Agustinho
- Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA
| | - Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Vipin K Menon
- Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA
- Senior research project manager, Human Genetics, Genentech, South San Francisco, CA, USA
| | - Ginger A Metcalf
- Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA
| | - Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
Collapse
|
7
|
Greenman N, Hassouneh SAD, Abdelli LS, Johnston C, Azarian T. Improving Bacterial Metagenomic Research through Long-Read Sequencing. Microorganisms 2024; 12:935. [PMID: 38792764 PMCID: PMC11124196 DOI: 10.3390/microorganisms12050935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2024] [Revised: 04/22/2024] [Accepted: 04/25/2024] [Indexed: 05/26/2024] Open
Abstract
Metagenomic sequencing analysis is central to investigating microbial communities in clinical and environmental studies. Short-read sequencing remains the primary approach for metagenomic research; however, long-read sequencing may offer advantages of improved metagenomic assembly and resolved taxonomic identification. To compare the relative performance for metagenomic studies, we simulated short- and long-read datasets using increasingly complex metagenomes comprising 10, 20, and 50 microbial taxa. Additionally, we used an empirical dataset of paired short- and long-read data generated from mouse fecal pellets to assess real-world performance. We compared metagenomic assembly quality, taxonomic classification, and metagenome-assembled genome (MAG) recovery rates. We show that long-read sequencing data significantly improve taxonomic classification and assembly quality. Metagenomic assemblies using simulated long reads were more complete and more contiguous with higher rates of MAG recovery. This resulted in more precise taxonomic classifications. Principal component analysis of empirical data demonstrated that sequencing technology affects compositional results as samples clustered by sequence type, not sample type. Overall, we highlight strengths of long-read metagenomic sequencing for microbiome studies, including improving the accuracy of classification and relative abundance estimates. These results will aid researchers when considering which sequencing approaches to use for metagenomic projects.
Collapse
Affiliation(s)
- Noah Greenman
- College of Medicine, University of Central Florida, Orlando, FL 32827, USA; (N.G.); (S.A.-D.H.); (C.J.)
| | - Sayf Al-Deen Hassouneh
- College of Medicine, University of Central Florida, Orlando, FL 32827, USA; (N.G.); (S.A.-D.H.); (C.J.)
| | - Latifa S. Abdelli
- Department of Health Science, College of Health Professions and Sciences, University of Central Florida, Orlando, FL 32816, USA;
| | - Catherine Johnston
- College of Medicine, University of Central Florida, Orlando, FL 32827, USA; (N.G.); (S.A.-D.H.); (C.J.)
| | - Taj Azarian
- College of Medicine, University of Central Florida, Orlando, FL 32827, USA; (N.G.); (S.A.-D.H.); (C.J.)
| |
Collapse
|
8
|
Kim C, Pongpanich M, Porntaveetus T. Unraveling metagenomics through long-read sequencing: a comprehensive review. J Transl Med 2024; 22:111. [PMID: 38282030 PMCID: PMC10823668 DOI: 10.1186/s12967-024-04917-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2023] [Accepted: 01/21/2024] [Indexed: 01/30/2024] Open
Abstract
The study of microbial communities has undergone significant advancements, starting from the initial use of 16S rRNA sequencing to the adoption of shotgun metagenomics. However, a new era has emerged with the advent of long-read sequencing (LRS), which offers substantial improvements over its predecessor, short-read sequencing (SRS). LRS produces reads that are several kilobases long, enabling researchers to obtain more complete and contiguous genomic information, characterize structural variations, and study epigenetic modifications. The current leaders in LRS technologies are Pacific Biotechnologies (PacBio) and Oxford Nanopore Technologies (ONT), each offering a distinct set of advantages. This review covers the workflow of long-read metagenomics sequencing, including sample preparation (sample collection, sample extraction, and library preparation), sequencing, processing (quality control, assembly, and binning), and analysis (taxonomic annotation and functional annotation). Each section provides a concise outline of the key concept of the methodology, presenting the original concept as well as how it is challenged or modified in the context of LRS. Additionally, the section introduces a range of tools that are compatible with LRS and can be utilized to execute the LRS process. This review aims to present the workflow of metagenomics, highlight the transformative impact of LRS, and provide researchers with a selection of tools suitable for this task.
Collapse
Affiliation(s)
- Chankyung Kim
- Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
- Graduate Program in Bioinformatics and Computational Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
| | - Monnat Pongpanich
- Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Center of Excellence for Cancer and Inflammation, Chulalongkorn University, Bangkok, Thailand
| | - Thantrira Porntaveetus
- Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand.
- Graduate Program in Geriatric and Special Patients Care, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand.
| |
Collapse
|
9
|
Rodriguez Ruiz A, Van Dam AR. Metagenomic binning of PacBio HiFi data prior to assembly reveals a complete genome of Cosmopolites sordidus (Germar) (Coleopterea: Curculionidae, Dryophthorinae) the most damaging arthropod pest of bananas and plantains. PeerJ 2023; 11:e16276. [PMID: 38025758 PMCID: PMC10676084 DOI: 10.7717/peerj.16276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2022] [Accepted: 09/20/2023] [Indexed: 12/01/2023] Open
Abstract
PacBio HiFi sequencing was employed in combination with metagenomic binning to produce a high-quality reference genome of Cosmopolites sordidus. We compared k-mer and alignment reference based pre-binning and post-binning approaches to remove contamination. We were also interested to know if the post-binning approach had interspersed bacterial contamination within intragenic regions of Arthropoda binned contigs. Our analyses identified 3,433 genes that were composed with reads identified as of putative bacterial origins. The pre-binning approach yielded a C. sordidus genome of 1.07 Gb genome composed of 3,089 contigs with 98.6% and 97.1% complete and single copy genome and protein BUSCO scores respectively. In this article we demonstrate that in this case the pre-binning approach does not sacrifice assembly quality for more stringent metagenomic filtering. We also determine post-binning allows for increased intragenic contamination increased with increasing coverage, but the frequency of gene contamination increased with lower coverage. Future work should focus on developing reference free pre-binning approaches for HiFi reads produced from eukaryotic based metagenomic samples.
Collapse
Affiliation(s)
- Alfredo Rodriguez Ruiz
- Departamento de Biología, Universidad de Puerto Rico Recinto Universitario de Mayagüez, Mayagüez, Puerto Rico, United States of America
| | - Alex R. Van Dam
- Departamento de Biología, Universidad de Puerto Rico Recinto Universitario de Mayagüez, Mayagüez, Puerto Rico, United States of America
| |
Collapse
|
10
|
Baker JL. Illuminating the oral microbiome and its host interactions: recent advancements in omics and bioinformatics technologies in the context of oral microbiome research. FEMS Microbiol Rev 2023; 47:fuad051. [PMID: 37667515 PMCID: PMC10503653 DOI: 10.1093/femsre/fuad051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2023] [Revised: 08/02/2023] [Accepted: 09/01/2023] [Indexed: 09/06/2023] Open
Abstract
The oral microbiota has an enormous impact on human health, with oral dysbiosis now linked to many oral and systemic diseases. Recent advancements in sequencing, mass spectrometry, bioinformatics, computational biology, and machine learning are revolutionizing oral microbiome research, enabling analysis at an unprecedented scale and level of resolution using omics approaches. This review contains a comprehensive perspective of the current state-of-the-art tools available to perform genomics, metagenomics, phylogenomics, pangenomics, transcriptomics, proteomics, metabolomics, lipidomics, and multi-omics analysis on (all) microbiomes, and then provides examples of how the techniques have been applied to research of the oral microbiome, specifically. Key findings of these studies and remaining challenges for the field are highlighted. Although the methods discussed here are placed in the context of their contributions to oral microbiome research specifically, they are pertinent to the study of any microbiome, and the intended audience of this includes researchers would simply like to get an introduction to microbial omics and/or an update on the latest omics methods. Continued research of the oral microbiota using omics approaches is crucial and will lead to dramatic improvements in human health, longevity, and quality of life.
Collapse
Affiliation(s)
- Jonathon L Baker
- Department of Oral Rehabilitation & Biosciences, School of Dentistry, Oregon Health & Science University, 3181 Sam Jackson Park Road, Portland, OR 97202, United States
- Genomic Medicine Group, J. Craig Venter Institute, La Jolla, CA 92037, United States
- Department of Pediatrics, UC San Diego School of Medicine, La Jolla, CA 92093, United States
| |
Collapse
|
11
|
Millan Arias P, Hill KA, Kari L. iDeLUCS: a deep learning interactive tool for alignment-free clustering of DNA sequences. Bioinformatics 2023; 39:btad508. [PMID: 37589603 PMCID: PMC10483029 DOI: 10.1093/bioinformatics/btad508] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2022] [Revised: 07/18/2023] [Accepted: 08/16/2023] [Indexed: 08/18/2023] Open
Abstract
SUMMARY We present an interactive Deep Learning-based software tool for Unsupervised Clustering of DNA Sequences (iDeLUCS), that detects genomic signatures and uses them to cluster DNA sequences, without the need for sequence alignment or taxonomic identifiers. iDeLUCS is scalable and user-friendly: its graphical user interface, with support for hardware acceleration, allows the practitioner to fine-tune the different hyper-parameters involved in the training process without requiring extensive knowledge of deep learning. The performance of iDeLUCS was evaluated on a diverse set of datasets: several real genomic datasets from organisms in kingdoms Animalia, Protista, Fungi, Bacteria, and Archaea, three datasets of viral genomes, a dataset of simulated metagenomic reads from microbial genomes, and multiple datasets of synthetic DNA sequences. The performance of iDeLUCS was compared to that of two classical clustering algorithms (k-means++ and GMM) and two clustering algorithms specialized in DNA sequences (MeShClust v3.0 and DeLUCS), using both intrinsic cluster evaluation metrics and external evaluation metrics. In terms of unsupervised clustering accuracy, iDeLUCS outperforms the two classical algorithms by an average of ∼20%, and the two specialized algorithms by an average of ∼12%, on the datasets of real DNA sequences analyzed. Overall, our results indicate that iDeLUCS is a robust clustering method suitable for the clustering of large and diverse datasets of unlabeled DNA sequences. AVAILABILITY AND IMPLEMENTATION iDeLUCS is available at https://github.com/Kari-Genomics-Lab/iDeLUCS under the terms of the MIT licence.
Collapse
Affiliation(s)
- Pablo Millan Arias
- Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
| | - Kathleen A Hill
- Department of Biology, University of Western Ontario, London, ON N6A 5B7, Canada
| | - Lila Kari
- Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
| |
Collapse
|
12
|
Hesketh-Best PJ, Bosco-Santos A, Garcia SL, O’Beirne MD, Werne JP, Gilhooly WP, Silveira CB. Viruses of sulfur oxidizing phototrophs encode genes for pigment, carbon, and sulfur metabolisms. COMMUNICATIONS EARTH & ENVIRONMENT 2023; 4:126. [PMID: 38665202 PMCID: PMC11041744 DOI: 10.1038/s43247-023-00796-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Accepted: 04/05/2023] [Indexed: 04/28/2024]
Abstract
Viral infections modulate bacterial metabolism and ecology. Here, we investigated the hypothesis that viruses influence the ecology of purple and green sulfur bacteria in anoxic and sulfidic lakes, analogs of euxinic oceans in the geologic past. By screening metagenomes from lake sediments and water column, in addition to publicly-available genomes of cultured purple and green sulfur bacteria, we identified almost 300 high and medium-quality viral genomes. Viruses carrying the gene psbA, encoding the small subunit of photosystem II protein D1, were ubiquitous, suggesting viral interference with the light reactions of sulfur oxidizing autotrophs. Viruses predicted to infect these autotrophs also encoded auxiliary metabolic genes for reductive sulfur assimilation as cysteine, pigment production, and carbon fixation. These observations show that viruses have the genomic potential to modulate the production of metabolic markers of phototrophic sulfur bacteria that are used to identify photic zone euxinia in the geologic past.
Collapse
Affiliation(s)
| | - Alice Bosco-Santos
- Institute of Earth Surface Dynamics, University of Lausanne, Lausanne, Switzerland
| | - Sofia L. Garcia
- Department of Biology, University of Miami, Coral Gables, FL USA
| | - Molly D. O’Beirne
- Department of Geology & Environmental Science, University of Pittsburgh, Pittsburgh, PA USA
| | - Josef P. Werne
- Department of Geology & Environmental Science, University of Pittsburgh, Pittsburgh, PA USA
| | - William P. Gilhooly
- Department of Earth Sciences, Indiana University-Purdue University Indianapolis, Indianapolis, IN USA
| | | |
Collapse
|
13
|
Ibañez-Lligoña M, Colomer-Castell S, González-Sánchez A, Gregori J, Campos C, Garcia-Cehic D, Andrés C, Piñana M, Pumarola T, Rodríguez-Frias F, Antón A, Quer J. Bioinformatic Tools for NGS-Based Metagenomics to Improve the Clinical Diagnosis of Emerging, Re-Emerging and New Viruses. Viruses 2023; 15:587. [PMID: 36851800 PMCID: PMC9965957 DOI: 10.3390/v15020587] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2023] [Revised: 02/16/2023] [Accepted: 02/17/2023] [Indexed: 02/24/2023] Open
Abstract
Epidemics and pandemics have occurred since the beginning of time, resulting in millions of deaths. Many such disease outbreaks are caused by viruses. Some viruses, particularly RNA viruses, are characterized by their high genetic variability, and this can affect certain phenotypic features: tropism, antigenicity, and susceptibility to antiviral drugs, vaccines, and the host immune response. The best strategy to face the emergence of new infectious genomes is prompt identification. However, currently available diagnostic tests are often limited for detecting new agents. High-throughput next-generation sequencing technologies based on metagenomics may be the solution to detect new infectious genomes and properly diagnose certain diseases. Metagenomic techniques enable the identification and characterization of disease-causing agents, but they require a large amount of genetic material and involve complex bioinformatic analyses. A wide variety of analytical tools can be used in the quality control and pre-processing of metagenomic data, filtering of untargeted sequences, assembly and quality control of reads, and taxonomic profiling of sequences to identify new viruses and ones that have been sequenced and uploaded to dedicated databases. Although there have been huge advances in the field of metagenomics, there is still a lack of consensus about which of the various approaches should be used for specific data analysis tasks. In this review, we provide some background on the study of viral infections, describe the contribution of metagenomics to this field, and place special emphasis on the bioinformatic tools (with their capabilities and limitations) available for use in metagenomic analyses of viral pathogens.
Collapse
Affiliation(s)
- Marta Ibañez-Lligoña
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Sergi Colomer-Castell
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Alejandra González-Sánchez
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Josep Gregori
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Carolina Campos
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Damir Garcia-Cehic
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
| | - Cristina Andrés
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Maria Piñana
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Tomàs Pumarola
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Francisco Rodríguez-Frias
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Department of Basic Sciences, Universitat Internacional de Catalunya, Sant Cugat del Vallès, 08195 Barcelona, Spain
| | - Andrés Antón
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Josep Quer
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| |
Collapse
|