1
Meyer F, Robertson G, Deng ZL, Koslicki D, Gurevich A, McHardy AC. CAMI Benchmarking Portal: online evaluation and ranking of metagenomic software. Nucleic Acids Res 2025:gkaf369. [PMID: 40331433 DOI: 10.1093/nar/gkaf369] [Received: 03/26/2025] [Revised: 04/17/2025] [Accepted: 04/23/2025] [Indexed: 05/08/2025] Open
Abstract
Finding appropriate software and parameter settings to process shotgun metagenome data is essential for meaningful metagenomic analyses. To enable objective and comprehensive benchmarking of metagenomic software, the community-led initiative for the Critical Assessment of Metagenome Interpretation (CAMI) promotes standards and best practices. Since 2015, CAMI has provided comprehensive datasets, benchmarking guidelines, and challenges. However, benchmarking had to be conducted offline, requiring substantial time and technical expertise and leading to gaps in results between challenges. We introduce the CAMI Benchmarking Portal, a central repository of CAMI resources and a web server for the evaluation and ranking of metagenome assembly, binning, and taxonomic profiling software. The portal simplifies evaluation, enabling users to easily compare their results with previous submissions and with those of other users through a variety of metrics and visualizations. As a demonstration, we benchmark software performance on the marine dataset of the CAMI II challenge. The portal currently hosts 28 675 results and is freely available at https://cami-challenge.org/.
Affiliation(s)
- Fernando Meyer
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), 38124 Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany
  - Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
- Gary Robertson
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), 38124 Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany
  - Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
- Zhi-Luo Deng
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), 38124 Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany
  - Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
- David Koslicki
  - Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
  - Computer Science and Engineering, Penn State University, University Park, PA 16802, United States
  - Biology, Penn State University, University Park, PA 16802, United States
- Alexey Gurevich
  - Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), 66123 Saarbrücken, Germany
  - Center for Bioinformatics Saar and Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Alice C McHardy
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), 38124 Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany
  - Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
  - German Center for Infection Research (DZIF), partner site Hannover Braunschweig, 38124 Braunschweig, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, 30625 Hannover, Germany
2
Refahi M, Sokhansanj BA, Mell JC, Brown JR, Yoo H, Hearne G, Rosen GL. Enhancing nucleotide sequence representations in genomic analysis with contrastive optimization. Commun Biol 2025; 8:517. [PMID: 40155693 PMCID: PMC11953366 DOI: 10.1038/s42003-025-07902-6] [Received: 07/30/2024] [Accepted: 03/07/2025] [Indexed: 04/01/2025] Open
Abstract
Analysis of genomic and metagenomic sequences is inherently more challenging than that of amino acid sequences due to the higher divergence among evolutionarily related nucleotide sequences, variable k-mer and codon usage within and among genomes of diverse species, and poorly understood selective constraints. We introduce Scorpio (Sequence Contrastive Optimization for Representation and Predictive Inference on DNA), a versatile framework designed for nucleotide sequences that employs contrastive learning to improve embeddings. By leveraging pre-trained genomic language models and k-mer frequency embeddings, Scorpio demonstrates competitive performance in diverse applications, including taxonomic and gene classification, antimicrobial resistance (AMR) gene identification, and promoter detection. A key strength of Scorpio is its ability to generalize to novel DNA sequences and taxa, addressing a significant limitation of alignment-based methods. Scorpio has been tested on multiple datasets with DNA sequences of varying lengths (long and short) and shows robust inference capabilities. Additionally, we provide an analysis of the biological information underlying this representation, including correlations among codon adaptation index as a gene expression factor, sequence similarity, and taxonomy, as well as the functional and structural information of genes.
Affiliation(s)
- Bahrad A Sokhansanj
  - Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Joshua C Mell
  - College of Medicine, Drexel University, Philadelphia, PA, USA
- James R Brown
  - Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Hyunwoo Yoo
  - Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Gavin Hearne
  - Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Gail L Rosen
  - Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA