1
Wang S, Zhou Y, Wang Y, Tang K, Wang D, Hong J, Wang P, Ye S, Yan J, Li S, Zhou Z, Du J. Genetic landscape and evolution of Acinetobacter pittii, an underestimated emerging nosocomial pathogen. Commun Biol 2025; 8:738. [PMID: 40360786 PMCID: PMC12075791 DOI: 10.1038/s42003-025-08156-y] [Received: 10/21/2024] [Accepted: 05/01/2025] [Indexed: 05/15/2025]
Abstract
As a member of Acinetobacter calcoaceticus-baumannii complex, Acinetobacter pittii has been an emerging concern in nosocomial infection due to its increasing prevalence and multidrug resistance (MDR). However, its population structure remains broadly unknown, hampering efficient tracing of its transmission and evolution. In this study, we developed a distributed core genome multilocus sequence typing (dcgMLST) for A. pittii based on 750 genomes and employed it to map the genetic landscape and evolution of A. pittii. The results demonstrated that two hierarchical clustering (HC) levels effectively correspond to genetic diversity from species (HC1100) to natural populations (HC450), as well as that a predominant lineage, HC1100_4, accounts for 33.9% of A. pittii strains. Subsequent analysis revealed that specific gene gain and loss events within HC1100_4 are linked to adaptations to environmental stress. Moreover, we identified a cluster of multidrug-resistant plasmids PT_712 responsible for the dissemination of blaNDM-1 genes within the genus of Acinetobacter. This study provides a framework for characterizing genetic diversity, evolutionary dynamics, molecular population distribution, and tracing of A. pittii, which has the potential to improve infection control strategies and public health policy.
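The hierarchical clustering levels named in the abstract (HC450, HC1100) group genomes by the number of differing core-genome alleles. The following is a minimal sketch of that idea, assuming single-linkage clustering over allelic profiles; the function names, the toy profiles, and the treatment of missing loci are illustrative assumptions, not the authors' dcgMLST implementation.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci whose allele numbers differ; loci missing (None) in either profile are skipped."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def single_linkage_clusters(profiles, threshold):
    """Label each profile with the smallest index in its single-linkage cluster.

    Two profiles are linked when their allelic distance is <= threshold
    (hypothetically, threshold=450 would correspond to an HC450-style level).
    """
    n = len(profiles)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if allelic_distance(profiles[i], profiles[j]) <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    # Attach the larger root to the smaller so labels stay minimal.
                    parent[max(root_i, root_j)] = min(root_i, root_j)
    return [find(i) for i in range(n)]


# Toy profiles over four loci (real schemes use thousands of loci).
toy_profiles = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 2), (9, 9, 9, 9)]
labels_at_1 = single_linkage_clusters(toy_profiles, threshold=1)
```

Running the same function at several thresholds yields nested cluster labels, which is the sense in which one level can approximate species and another natural populations.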
Affiliation(s)
- Shengke Wang
- Wenzhou Key Laboratory of Sanitary Microbiology, Department of Microbiology and Immunology, School of Laboratory Medicine, Institute of One Health, Wenzhou Medical University, Wenzhou, China
- Yan Zhou
- School of Medicine, Zhejiang University, Hangzhou, China
- Yuezhuo Wang
- Key Laboratory of Alkene-carbon Fibres-based Technology & Application for Detection of Major Infectious Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China
- Keshu Tang
- Wenzhou Key Laboratory of Sanitary Microbiology, Department of Microbiology and Immunology, School of Laboratory Medicine, Institute of One Health, Wenzhou Medical University, Wenzhou, China
- Danqi Wang
- Wenzhou Key Laboratory of Sanitary Microbiology, Department of Microbiology and Immunology, School of Laboratory Medicine, Institute of One Health, Wenzhou Medical University, Wenzhou, China
- Jiawen Hong
- Wenzhou Key Laboratory of Sanitary Microbiology, Department of Microbiology and Immunology, School of Laboratory Medicine, Institute of One Health, Wenzhou Medical University, Wenzhou, China
- Taizhou Hospital of Zhejiang Province Affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
- Pengcheng Wang
- Wenzhou Key Laboratory of Sanitary Microbiology, Department of Microbiology and Immunology, School of Laboratory Medicine, Institute of One Health, Wenzhou Medical University, Wenzhou, China
- Sheng Ye
- Wenzhou Key Laboratory of Sanitary Microbiology, Department of Microbiology and Immunology, School of Laboratory Medicine, Institute of One Health, Wenzhou Medical University, Wenzhou, China
- Jie Yan
- School of Medicine, Zhejiang University, Hangzhou, China.
- Shengkai Li
- Key Laboratory of Alkene-carbon Fibres-based Technology & Application for Detection of Major Infectious Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China.
- Zhemin Zhou
- Key Laboratory of Alkene-carbon Fibres-based Technology & Application for Detection of Major Infectious Diseases, MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China.
- Jimei Du
- Wenzhou Key Laboratory of Sanitary Microbiology, Department of Microbiology and Immunology, School of Laboratory Medicine, Institute of One Health, Wenzhou Medical University, Wenzhou, China.
2
Zhang T, Yin Z, Xu X, Yan L, Zhu F, Duan X, Schmidt B, Liu W. RabbitSketch: a high-performance sketching library for genome analysis. Bioinformatics 2025; 41:btaf249. [PMID: 40286290 PMCID: PMC12054975 DOI: 10.1093/bioinformatics/btaf249] [Received: 01/13/2025] [Revised: 03/31/2025] [Accepted: 04/24/2025] [Indexed: 04/29/2025]
Abstract
SUMMARY: We present RabbitSketch, a highly optimized library of sketching algorithms such as MinHash, OrderMinHash, and HyperLogLog that can exploit the power of modern multi-core CPUs. It provides significant speedups compared to existing implementations, ranging from 2.30× to 49.55×, as well as flexible and easy-to-use interfaces for both Python and C++. As a result, the similarity analysis of 455 GB genomic data can be completed in only 5 minutes using RabbitSketch with merely 20 lines of Python code. As a case study, we enhanced RabbitTClust by integrating RabbitSketch's Kssd algorithm, resulting in a 1.54× speedup with no loss in accuracy. AVAILABILITY AND IMPLEMENTATION: RabbitSketch is available at https://github.com/RabbitBio/RabbitSketch with an archived version at Zenodo: https://doi.org/10.5281/zenodo.14903962. Detailed API documentation is available at https://rabbitsketch.readthedocs.io/en/latest.
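To make the MinHash idea named in the abstract concrete, here is a minimal pure-Python sketch of bottom-s MinHash and the Jaccard estimate derived from it. It deliberately does not use the RabbitSketch API; the function names, the k-mer length, the sketch size, and the choice of BLAKE2b as the hash are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in a DNA string."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Stable 64-bit hash of a k-mer (BLAKE2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k=5, size=16):
    """Bottom-s MinHash: keep the `size` smallest k-mer hashes, sorted."""
    return sorted(hash64(km) for km in kmers(seq, k))[:size]


def jaccard_estimate(sketch_a, sketch_b, size=16):
    """Estimate Jaccard similarity from two bottom-s sketches.

    The `size` smallest hashes of the merged sketches act as a sample of the
    union; the fraction of them present in both sketches is the estimate.
    """
    union_sample = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not union_sample:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in union_sample if h in shared) / len(union_sample)


genome = "ACGTTGCAAGCTTAGGCTAACGTTAGC"
self_similarity = jaccard_estimate(minhash_sketch(genome), minhash_sketch(genome))
```

Because each sketch has a fixed size regardless of genome length, comparing two sketches costs the same for a plasmid as for a chromosome, which is what makes large all-against-all comparisons tractable.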
Affiliation(s)
- Tong Zhang
- School of Software, Shandong University, Jinan 250101, China
- Zekun Yin
- School of Software, Shandong University, Jinan 250101, China
- Xiaoming Xu
- School of Software, Shandong University, Jinan 250101, China
- Lifeng Yan
- School of Software, Shandong University, Jinan 250101, China
- Fangjin Zhu
- School of Software, Shandong University, Jinan 250101, China
- Xiaohui Duan
- School of Software, Shandong University, Jinan 250101, China
- Bertil Schmidt
- Institute for Computer Science, Johannes Gutenberg University, Mainz 55128, Germany
- Weiguo Liu
- School of Software, Shandong University, Jinan 250101, China
3
Fu C, Niskanen EA, Wei GH, Yang Z, Sanvicente-García M, Güell M, Cheng L. k-mer manifold approximation and projection for visualizing DNA sequences. Genome Res 2025; 35:1234-1246. [PMID: 40210440 PMCID: PMC12047656 DOI: 10.1101/gr.279458.124] [Received: 04/12/2024] [Accepted: 02/20/2025] [Indexed: 04/12/2025]
Abstract
Identifying and illustrating patterns in DNA sequences are crucial tasks in various biological data analyses. In this task, patterns are often represented by sets of k-mers, the fundamental building blocks of DNA sequences. To visually unveil these patterns, one could project each k-mer onto a point in two-dimensional (2D) space. However, this projection poses challenges owing to the high-dimensional nature of k-mers and their unique mathematical properties. Here, we establish a mathematical system to address the peculiarities of the k-mer manifold. Leveraging this k-mer manifold theory, we develop a statistical method named KMAP for detecting k-mer patterns and visualizing them in 2D space. We applied KMAP to three distinct data sets to showcase its utility. KMAP achieves a comparable performance to the classical method MEME, with ∼90% similarity in motif discovery from HT-SELEX data. In the analysis of H3K27ac ChIP-seq data from Ewing sarcoma (EWS), we find that BACH1, OTX2, and KCNH2 might affect EWS prognosis by binding to promoter and enhancer regions across the genome. We also observe potential colocalization of BACH1, OTX2, and the motif CCCAGGCTGGAGTGC in ∼70 bp windows in the enhancer regions. Furthermore, we find that FLI1 binds to the enhancer regions after ETV6 degradation, indicating competitive binding between ETV6 and FLI1. Moreover, KMAP identifies four prevalent patterns in gene editing data of the AAVS1 locus, aligning with findings reported in the literature. These applications underscore that KMAP can be a valuable tool across various biological contexts.
Affiliation(s)
- Chengbo Fu
- Department of Computer Science, School of Science, Aalto University, 02150 Espoo, Finland
- Einari A Niskanen
- Institute of Biomedicine, University of Eastern Finland, 70211 Kuopio, Finland
- Gong-Hong Wei
- Fudan University Shanghai Cancer Center & MOE Key Laboratory of Metabolism and Molecular Medicine and Department of Biochemistry and Molecular Biology of School of Basic Medical Sciences, Shanghai Medical College of Fudan University, 200032 Shanghai, China
- Disease Networks Research Unit, Faculty of Biochemistry and Molecular Medicine, Biocenter Oulu, University of Oulu, 90220 Oulu, Finland
- Zhirong Yang
- Department of Computer Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway
- Jinhua Institute of Zhejiang University, 321032 Zhejiang, China
- Marc Güell
- Department of Medicine and Life Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, ICREA, 08003 Barcelona, Spain
- Lu Cheng
- Department of Computer Science, School of Science, Aalto University, 02150 Espoo, Finland
- Institute of Biomedicine, University of Eastern Finland, 70211 Kuopio, Finland
4
Li S, Jiang G, Wang S, Wang M, Wu Y, Zhang J, Liu X, Zhong L, Zhou M, Xie S, Ren Y, He P, Lou Y, Li H, Du J, Zhou Z. Emergence and global spread of a dominant multidrug-resistant clade within Acinetobacter baumannii. Nat Commun 2025; 16:2787. [PMID: 40118837 PMCID: PMC11928498 DOI: 10.1038/s41467-025-58106-9] [Received: 04/05/2024] [Accepted: 03/12/2025] [Indexed: 03/24/2025]
Abstract
The proliferation of multi-drug resistant (MDR) bacteria is driven by the global spread of epidemic lineages that accumulate antimicrobial resistance genes (ARGs). Acinetobacter baumannii, a leading cause of nosocomial infections, displays resistance to most frontline antimicrobials and represents a significant challenge to public health. In this study, we conduct a comprehensive genomic analysis of over 15,000 A. baumannii genomes to identify a predominant epidemic super-lineage (ESL) accounting for approximately 70% of global isolates. Through hierarchical classification of the ESL into distinct lineages, clusters, and clades, we identified a stepwise evolutionary trajectory responsible for the worldwide expansion and transmission of A. baumannii over the last eight decades. We observed the rise and global spread of a previously unrecognized Clade 2.5.6, which emerged in East Asia in 2006. The epidemic of the clade is linked to the ongoing acquisition of ARGs and virulence factors facilitated by genetic recombination. Our results highlight the necessity for One Health-oriented research and interventions to address the spread of this MDR pathogen.
Affiliation(s)
- Shengkai Li
- MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China
- Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, China
- Guilai Jiang
- MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China
- Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou, China
- Shengke Wang
- Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, China
- Min Wang
- Department of Clinical Laboratory, The Second Affiliated Hospital of Soochow University, Suzhou, China
- Yilei Wu
- MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China
- Department of Life Sciences, Imperial College London, London, UK
- Jinzhi Zhang
- Department of Critical Care Medicine, Zhejiang Provincial People's Hospital, Hangzhou, China
- Xiao Liu
- MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China
- Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou, China
- Ling Zhong
- MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China
- Min Zhou
- Department of Immunology and Microbiology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shichang Xie
- MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China
- Iotabiome Biotechnology Inc., Suzhou, China
- Yi Ren
- Iotabiome Biotechnology Inc., Suzhou, China
- Ping He
- Department of Immunology and Microbiology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yongliang Lou
- Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, China.
- Heng Li
- MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China.
- Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou, China.
- Jimei Du
- Key Laboratory of Laboratory Medicine, Ministry of Education, School of Laboratory Medicine and Life Sciences, Wenzhou Medical University, Wenzhou, China.
- Zhemin Zhou
- MOE Key Laboratory of Geriatric Diseases and Immunology, Cancer Institute, Suzhou Medical College, Soochow University, Suzhou, China.
- Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou, China.
- Department of Clinical Laboratory, The Second Affiliated Hospital of Soochow University, Suzhou, China.
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China.
5
Roberts MD, Davis O, Josephs EB, Williamson RJ. K-mer-based Approaches to Bridging Pangenomics and Population Genetics. Mol Biol Evol 2025; 42:msaf047. [PMID: 40111256 PMCID: PMC11925024 DOI: 10.1093/molbev/msaf047] [Received: 09/18/2024] [Revised: 01/10/2025] [Accepted: 02/04/2025] [Indexed: 03/12/2025]
Abstract
Many commonly studied species now have more than one chromosome-scale genome assembly, revealing a large amount of genetic diversity previously missed by approaches that map short reads to a single reference. However, many species still lack multiple reference genomes, and correctly aligning references to build pangenomes can be challenging, limiting our ability to study this missing genomic variation in population genetics. Here, we argue that k-mers are a very useful but underutilized tool for bridging the reference-focused paradigms of population genetics with the reference-free paradigms of pangenomics. We review current literature on the uses of k-mers for performing three core components of most population genetics analyses: identifying, measuring, and explaining patterns of genetic variation. We also demonstrate how different k-mer-based measures of genetic variation behave in population genetic simulations according to the choice of k, depth of sequencing coverage, and degree of data compression. Overall, we find that k-mer-based measures of genetic diversity scale consistently with pairwise nucleotide diversity (π) up to values of about π = 0.025 (R² = 0.97) for neutrally evolving populations. For populations with even more variation, using shorter k-mers will maintain the scalability up to at least π = 0.1. Furthermore, in our simulated populations, k-mer dissimilarity values can be reliably approximated from counting bloom filters, highlighting a potential avenue to decreasing the memory burden of k-mer-based genomic dissimilarity analyses. For future studies, there is a great opportunity to further develop methods for identifying selected loci using k-mers.
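As an illustration of what a k-mer-based dissimilarity measure looks like in practice, the sketch below computes one presence/absence measure (Jaccard dissimilarity) and one count-based measure (Bray-Curtis dissimilarity) between two sequences. Whether these match the exact measures evaluated in the paper is an assumption; the function names and toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer occurrence in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def jaccard_dissimilarity(counts_a, counts_b):
    """1 - |A ∩ B| / |A ∪ B| over the sets of distinct k-mers."""
    keys_a, keys_b = set(counts_a), set(counts_b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return 1.0 - len(keys_a & keys_b) / len(union)


def bray_curtis_dissimilarity(counts_a, counts_b):
    """1 - 2 * (sum of shared minimum counts) / (total k-mer occurrences)."""
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        return 0.0
    shared = sum(min(counts_a[km], counts_b[km]) for km in counts_a.keys() & counts_b.keys())
    return 1.0 - 2.0 * shared / total


# Two toy haplotypes differing by a single substitution (G -> A at position 8).
hap_1 = kmer_counts("AAACCCGGGTTT", k=3)
hap_2 = kmer_counts("AAACCCGAGTTT", k=3)
jaccard_value = jaccard_dissimilarity(hap_1, hap_2)
bray_curtis_value = bray_curtis_dissimilarity(hap_1, hap_2)
```

In this toy case a single substitution removes three of the ten 3-mers from each haplotype, which illustrates why the choice of k affects how dissimilarity scales with nucleotide diversity: each variant disrupts up to k overlapping k-mers.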
Affiliation(s)
- Miles D Roberts
- Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48824, USA
- Olivia Davis
- Department of Computer Science and Software Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
- Emily B Josephs
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI 48824, USA
- Robert J Williamson
- Department of Computer Science and Software Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
- Department of Biology and Biomedical Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
6
Rouzé T, Martayan I, Marchet C, Limasset A. Fractional hitting sets for efficient multiset sketching. Algorithms Mol Biol 2025; 20:1. [PMID: 39923117 PMCID: PMC11807336 DOI: 10.1186/s13015-024-00268-0] [Received: 10/31/2023] [Accepted: 12/01/2024] [Indexed: 02/10/2025]
Abstract
The exponential increase in publicly available sequencing data and genomic resources necessitates the development of highly efficient methods for data processing and analysis. Locality-sensitive hashing techniques have successfully transformed large datasets into smaller, more manageable sketches while maintaining comparability using metrics such as Jaccard and containment indices. However, fixed-size sketches encounter difficulties when applied to divergent datasets. Scalable sketching methods, such as sourmash, provide valuable solutions but still lack resource-efficient, tailored indexing. Our objective is to create lighter sketches with comparable results while enhancing efficiency. We introduce the concept of Fractional Hitting Sets, a generalization of Universal Hitting Sets, which cover a specified fraction of the k-mer space. In theory and practice, we demonstrate the feasibility of achieving such coverage with simple but highly efficient schemes. By encoding the covered k-mers as super-k-mers, we provide a space-efficient exact representation that also enables optimized comparisons. Our novel tool, supersampler, implements this scheme, and experimental results with real bacterial collections closely match our theoretical findings. In comparison to sourmash, supersampler achieves similar outcomes while utilizing an order of magnitude less space and memory and operating several times faster. This highlights the potential of our approach in addressing the challenges presented by the ever-expanding landscape of genomic data. supersampler is open-source software and can be accessed at https://github.com/TimRouze/supersampler. The data required to reproduce the results presented in this manuscript are available at https://github.com/TimRouze/supersampler/experiments.
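One way to picture covering a fixed fraction of the k-mer space is to subsample k-mers according to their minimizer. The sketch below is a simplified illustration under stated assumptions: a k-mer is kept when a salted hash of its minimizer falls below `fraction` of the 64-bit hash range. The function names, the salt, and the use of BLAKE2b are hypothetical choices and should not be read as the scheme implemented in supersampler.

```python
import hashlib


def hash64(text, salt=b""):
    """Stable 64-bit hash; the optional salt gives an independent hash function."""
    digest = hashlib.blake2b(text.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minimizer(kmer, m):
    """Return the m-mer inside `kmer` with the smallest (unsalted) hash."""
    return min((kmer[i:i + m] for i in range(len(kmer) - m + 1)), key=hash64)


def is_selected(mmer, fraction):
    """Keep a minimizer when its salted hash lies in the lowest `fraction` of the range."""
    return hash64(mmer, salt=b"select") < fraction * 2 ** 64


def sample_kmers(seq, k, m, fraction):
    """Collect the k-mers of `seq` whose minimizer is selected."""
    selected = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if is_selected(minimizer(kmer, m), fraction):
            selected.add(kmer)
    return selected


sequence = "ACGTTGCAAGCTTAGGCTAACGTTAGCCGATAGGCTTAAC"
quarter_sample = sample_kmers(sequence, k=11, m=5, fraction=0.25)
```

Because consecutive k-mers often share a minimizer, the kept k-mers tend to form contiguous runs, which is what makes a super-k-mer style encoding of the sample compact.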
Affiliation(s)
- Timothé Rouzé
- G5 - SeqBio, Institut Pasteur, Université Paris Cité, 75724, Paris, France.
- UMR9189 CRIStAL, Univ Lille, CNRS, Centrale, 59000, Lille, France.
- Sorbonne Université, Collège Doctoral, 75005, Paris, France.
- Igor Martayan
- UMR9189 CRIStAL, Univ Lille, CNRS, Centrale, 59000, Lille, France
- Camille Marchet
- UMR9189 CRIStAL, Univ Lille, CNRS, Centrale, 59000, Lille, France
- Antoine Limasset
- UMR9189 CRIStAL, Univ Lille, CNRS, Centrale, 59000, Lille, France
7
Yang H, Lu X, Chang J, Chang Q, Zheng W, Chen Z, Yi H. Kssdtree: an interactive Python package for phylogenetic analysis based on sketching technique. Bioinformatics 2024; 40:btae566. [PMID: 39298462 PMCID: PMC11467128 DOI: 10.1093/bioinformatics/btae566] [Received: 03/12/2024] [Revised: 09/14/2024] [Accepted: 09/18/2024] [Indexed: 09/21/2024]
Abstract
SUMMARY: Sketching technologies have recently emerged as a promising solution for real-time, large-scale phylogenetic analysis. However, existing sketching-based phylogenetic tools exhibit drawbacks, including platform restrictions, deficiencies in tree visualization, and inherent distance estimation bias. These limitations collectively impede the overall convenience and efficiency of the analysis. In this study, we introduce Kssdtree, an interactive Python package designed to address these challenges. Kssdtree surpasses other sketching-based tools by demonstrating superior performance in terms of both accuracy and time efficiency on comprehensive benchmarking datasets. Notably, Kssdtree offers key advantages such as intra-species phylogenomic analysis and GTDB-based phylogenetic placement analysis, significantly enhancing the scope and depth of phylogenetic investigations. Through extensive evaluations and comparisons, Kssdtree stands out as an efficient and versatile method for real-time, large-scale phylogenetic analysis. AVAILABILITY AND IMPLEMENTATION: The Kssdtree Python package is freely accessible at https://pypi.org/project/kssdtree and source code is available at https://github.com/yhlink/kssdtree. The documentation and instantiation for the software is available at https://kssdtree.readthedocs.io/en/latest. The video tutorial is available at https://youtu.be/_6hg59Yn-Ws.
Affiliation(s)
- Hang Yang
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong 030600, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518055, China
- Xiaoxin Lu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518055, China
- Jiaxing Chang
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong 030600, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518055, China
- Qing Chang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518055, China
- Wen Zheng
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong 030600, China
- Zehua Chen
- College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong 030600, China
- Huiguang Yi
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518055, China
8
Acheampong DA, Jenjaroenpun P, Wongsurawat T, Kurilung A, Pomyen Y, Kandel S, Kunadirek P, Chuaypen N, Kusonmano K, Nookaew I. CAIM: coverage-based analysis for identification of microbiome. Brief Bioinform 2024; 25:bbae424. [PMID: 39222062 PMCID: PMC11367759 DOI: 10.1093/bib/bbae424] [Received: 04/24/2024] [Revised: 06/26/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024]
Abstract
Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital to gain insights into microbial ecology. Recent advancements in sequencing technologies have contributed tremendously toward understanding these microbes at species resolution through a whole shotgun metagenomic approach. In this study, we developed a new bioinformatics tool, coverage-based analysis for identification of microbiome (CAIM), for accurate taxonomic classification and quantification within both long- and short-read metagenomic samples using an alignment-based method. CAIM depends on two different containment techniques to identify species in metagenomic samples, using their genome coverage information to filter out false positives rather than the traditional approach of relative abundance. In addition, we propose a nucleotide-count-based abundance estimation, which yields a lower root mean square error than the traditional read-count approach. We evaluated the performance of CAIM on 28 metagenomic mock communities and 2 synthetic datasets by comparing it with other top-performing tools. Compared with other tools, CAIM maintained consistently good performance across datasets in identifying microbial taxa and in estimating relative abundances. CAIM was then applied to a real dataset sequenced on both Nanopore (with and without amplification) and Illumina sequencing platforms and showed high similarity of taxonomic profiles between the sequencing platforms. Lastly, CAIM was applied to fecal shotgun metagenomic datasets of 232 colorectal cancer patients and 229 controls obtained from 4 different countries and 44 primary liver cancer patients and 76 controls. The predictive performance of models using the genome-coverage cutoff was better than that of models using the relative-abundance cutoffs in discriminating colorectal cancer and primary liver cancer patients from healthy controls with highly confident species markers.
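The contrast between read-count and nucleotide-count abundance, and the use of genome coverage as a filter, can be shown with a small worked example. This is a sketch under stated assumptions: the input format (species name plus aligned length, and half-open alignment intervals), the function names, and the toy numbers are hypothetical and are not taken from CAIM itself.

```python
def relative_abundance(totals):
    """Normalize per-species totals so that they sum to 1."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {species: value / grand_total for species, value in totals.items()}


def abundance_estimates(alignments):
    """Return (read-count abundance, nucleotide-count abundance).

    `alignments` is a list of (species, aligned_length) tuples, one per read.
    """
    read_totals, nucleotide_totals = {}, {}
    for species, aligned_length in alignments:
        read_totals[species] = read_totals.get(species, 0) + 1
        nucleotide_totals[species] = nucleotide_totals.get(species, 0) + aligned_length
    return relative_abundance(read_totals), relative_abundance(nucleotide_totals)


def coverage_breadth(intervals, genome_length):
    """Fraction of genome positions covered by at least one half-open (start, end) interval."""
    covered = 0
    covered_up_to = 0
    for start, end in sorted(intervals):
        start = max(start, covered_up_to)  # skip positions already counted
        if end > start:
            covered += end - start
            covered_up_to = end
    return covered / genome_length


# Two short reads for species A, one long read for species B.
by_reads, by_nucleotides = abundance_estimates([("A", 100), ("A", 100), ("B", 10000)])
breadth = coverage_breadth([(0, 100), (50, 150), (300, 400)], genome_length=1000)
```

With mixed read lengths the two estimates diverge sharply: species A holds two thirds of the reads but under 2% of the aligned nucleotides. A breadth threshold (0.25 in the toy example) is one way a genome whose reads pile up on a small region could be flagged as a likely false positive.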
Affiliation(s)
- Daniel A Acheampong
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States
- Stowers Institute for Medical Research, 1000 E 50 St, Kansas City, MO 64110, United States
- Piroon Jenjaroenpun
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States
- Division of Medical Bioinformatics, Department of Research, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wang Lang Road, Siriraj, Bangkok Noi, Bangkok 10700, Thailand
- Thidathip Wongsurawat
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States
- Division of Medical Bioinformatics, Department of Research, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wang Lang Road, Siriraj, Bangkok Noi, Bangkok 10700, Thailand
- Alongkorn Kurilung
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States
- Yotsawat Pomyen
- Translational Research Unit, Chulabhorn Research Institute, 54 Kamphaeng Phet Rd., Laksi, Bangkok 10210, Thailand
- Sangam Kandel
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States
- Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, 575 Science Drive, Madison, WI 53711, United States
- Pattapon Kunadirek
- Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Rama 4 road, Pathumwan, Bangkok 10330, Thailand
- Natthaya Chuaypen
- Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Rama 4 road, Pathumwan, Bangkok 10330, Thailand
- Kanthida Kusonmano
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi, 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Road, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Systems Biology and Bioinformatics Research Laboratory, Pilot Plant Development and Training Institute, King Mongkut’s University of Technology Thonburi, 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Road, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Intawat Nookaew
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States
- Division of Endocrinology, Department of Medicine, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States
- Department of Physiology and Cell Biology, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States
- Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wang Lang Road, Siriraj, Bangkok Noi, Bangkok 10700, Thailand
9
Acheampong DA, Jenjaroenpun P, Wongsurawat T, Krulilung A, Pomyen Y, Kandel S, Kunadirek P, Chuaypen N, Kusonmano K, Nookaew I. CAIM: Coverage-based Analysis for Identification of Microbiome. bioRxiv 2024:2024.04.25.591018. [PMID: 38746391 PMCID: PMC11091946 DOI: 10.1101/2024.04.25.591018] [Indexed: 05/16/2024]
Abstract
Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital to gain insights into microbial ecology. Recent advancements in sequencing technologies have contributed tremendously toward understanding these microbes at species resolution through a whole shotgun metagenomic (WMS) approach. In this study, we developed a new bioinformatics tool, CAIM, for accurate taxonomic classification and quantification within both long- and short-read metagenomic samples using an alignment-based method. CAIM depends on two different containment techniques to identify species in metagenomic samples, using their genome coverage information to filter out false positives rather than the traditional approach of relative abundance. In addition, we propose a nucleotide-count-based abundance estimation, which yields a lower root mean square error than the traditional read-count approach. We evaluated the performance of CAIM on 28 metagenomic mock communities and 2 synthetic datasets by comparing it with other top-performing tools. Compared with other tools, CAIM maintained consistently good performance across datasets in identifying microbial taxa and in estimating relative abundances. CAIM was then applied to a real dataset sequenced on both Nanopore (with and without amplification) and Illumina sequencing platforms and showed high similarity of taxonomic profiles between the sequencing platforms. Lastly, CAIM was applied to fecal shotgun metagenomic datasets of 232 colorectal cancer patients and 229 controls obtained from 4 different countries and 44 primary liver cancer patients and 76 controls. The predictive performance of models using the genome-coverage cutoff was better than that of models using the relative-abundance cutoffs in discriminating colorectal cancer and primary liver cancer patients from healthy controls with highly confident species markers.
Affiliation(s)
- Daniel A. Acheampong
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Piroon Jenjaroenpun
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Division of Medical Bioinformatics, Department of Research, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
| | - Thidathip Wongsurawat
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Division of Medical Bioinformatics, Department of Research, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
| | - Alongkorn Krulilung
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Yotsawat Pomyen
- Translational Research Unit, Chulabhorn Research Institute, Bangkok, 10210, Thailand
| | - Sangam Kandel
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Pattapon Kunadirek
- Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
| | - Natthaya Chuaypen
- Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
| | - Kanthida Kusonmano
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi, Bangkok, 10150, Thailand
- Systems Biology and Bioinformatics Research Laboratory, Pilot Plant Development and Training Institute, King Mongkut’s University of Technology Thonburi, Bangkok, 10150, Thailand
| | - Intawat Nookaew
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
|
10
|
Wang F, Wang Y, Zeng X, Zhang S, Yu J, Li D, Zhang X. MIKE: an ultrafast, assembly-, and alignment-free approach for phylogenetic tree construction. Bioinformatics 2024; 40:btae154. [PMID: 38547397 PMCID: PMC10990684 DOI: 10.1093/bioinformatics/btae154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 02/06/2024] [Indexed: 04/05/2024] Open
Abstract
MOTIVATION: Constructing a phylogenetic tree requires calculating the evolutionary distance between samples or species via large-scale resequencing data, a process that is both time-consuming and computationally demanding. Striking the right balance between accuracy and efficiency is a significant challenge. RESULTS: To address this, we introduce a new algorithm, MIKE (MinHash-based k-mer algorithm). This algorithm is designed for the swift calculation of the Jaccard coefficient directly from raw sequencing reads and enables the construction of phylogenetic trees based on the resultant Jaccard coefficient. Simulation results highlight the superior speed of MIKE compared to existing state-of-the-art methods. We used MIKE to reconstruct a phylogenetic tree, incorporating 238 yeast, 303 Zea, 141 Ficus, 67 Oryza, and 43 Saccharum spontaneum samples. MIKE demonstrated accurate performance across varying evolutionary scales, reproductive modes, and ploidy levels, proving itself as a powerful tool for phylogenetic tree construction. AVAILABILITY AND IMPLEMENTATION: MIKE is publicly available on GitHub at https://github.com/Argonum-Clever2/mike.git.
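As a rough illustration of how a Jaccard coefficient can be estimated from reads with MinHash, the following sketch builds a bottom-s sketch of hashed k-mers for each sample and compares two sketches. It is a generic textbook-style MinHash, not MIKE's code: the function names, the choice of BLAKE2b as hash, and the default `k` and `size` values are assumptions, and practical tools typically also use canonical k-mers and discard low-frequency (likely erroneous) k-mers, which is omitted here.

```python
import hashlib


def kmers(seq, k):
    """Return the set of k-mers in a sequence (no canonicalisation, for brevity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(reads, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest k-mer hashes across all reads."""
    hashes = set()
    for read in reads:
        hashes.update(hash64(km) for km in kmers(read, k))
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate the Jaccard coefficient from two bottom-s sketches.

    The `size` smallest hashes of the union form a sample of the union;
    the fraction of that sample present in both sketches estimates J.
    """
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    a = sketch(["ACGTACGTAC"], k=3, size=100)
    b = sketch(["ACGTTTTT"], k=3, size=100)
    print(jaccard_estimate(a, b, size=100))
```

When the sketch size exceeds the number of distinct k-mers, as in the toy usage above, the estimate equals the exact Jaccard coefficient; with real data the sketch is far smaller than the k-mer set and the value is an approximation.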
Affiliation(s)
- Fang Wang
- College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Yibin Wang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Xiaofei Zeng
- Department of Human Cell Biology and Genetics, Joint Laboratory of Guangdong-Hong Kong Universities for Vascular Homeostasis and Diseases, School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong 508055, China
| | - Shengcheng Zhang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Jiaxin Yu
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
| | - Dongxi Li
- College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, Shanxi 030024, China
| | - Xingtan Zhang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
|
11
|
Zhong L, Zhang M, Sun L, Yang Y, Wang B, Yang H, Shen Q, Xia Y, Cui J, Hang H, Ren Y, Pang B, Deng X, Zhan Y, Li H, Zhou Z. Distributed genotyping and clustering of Neisseria strains reveal continual emergence of epidemic meningococcus over a century. Nat Commun 2023; 14:7706. [PMID: 38001084 PMCID: PMC10673917 DOI: 10.1038/s41467-023-43528-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 11/13/2023] [Indexed: 11/26/2023] Open
Abstract
Core genome multilocus sequence typing (cgMLST) is commonly used to classify bacterial strains into different types, for taxonomical and epidemiological applications. However, cgMLST schemes require central databases for the nomenclature of new alleles and sequence types, which must be synchronized worldwide and involve increasingly intensive calculation and storage demands. Here, we describe a distributed cgMLST (dcgMLST) scheme that does not require a central database of allelic sequences and apply it to study evolutionary patterns of epidemic and endemic strains of the genus Neisseria. We classify 69,994 worldwide Neisseria strains into multi-level clusters that assign species, lineages, and local disease outbreaks. We divide Neisseria meningitidis into 168 endemic lineages and three epidemic lineages responsible for at least 9 epidemics in the past century. According to our analyses, the epidemic and endemic lineages experienced very different population dynamics in the past 100 years. Epidemic lineages repetitively emerged from endemic lineages, disseminated worldwide, and apparently disappeared rapidly afterward. We propose a stepwise model for the evolutionary trajectory of epidemic lineages in Neisseria, and expect that the development of similar dcgMLST schemes will facilitate epidemiological studies of other bacterial pathogens.
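The abstract does not say how the distributed scheme names alleles without a central database. One conceivable approach, assumed here purely for illustration and not taken from the paper, is to derive each allele identifier deterministically from the allele sequence itself, so that independent laboratories obtain the same identifier without synchronizing. The function names and the truncated SHA-256 identifier below are hypothetical.

```python
import hashlib


def allele_id(sequence):
    """Derive an allele identifier from the sequence itself (assumed scheme)."""
    return hashlib.sha256(sequence.upper().encode()).hexdigest()[:16]


def allelic_profile(genes):
    """Map each core-genome locus to the identifier of its allele.

    genes: dict mapping locus name to allele nucleotide sequence.
    """
    return {locus: allele_id(seq) for locus, seq in genes.items()}


def allelic_distance(profile_a, profile_b):
    """Count loci present in both profiles whose alleles differ."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared_loci if profile_a[locus] != profile_b[locus])


if __name__ == "__main__":
    # Two laboratories type their strains independently, with no shared database.
    lab1 = allelic_profile({"locus1": "ATGAAA", "locus2": "ATGCCC", "locus3": "ATGGGG"})
    lab2 = allelic_profile({"locus1": "atgaaa", "locus2": "ATGCCT", "locus3": "ATGGGG"})
    print(allelic_distance(lab1, lab2))
```

Pairwise allelic distances of this kind are the usual input to multi-level clustering of cgMLST profiles; the clustering step itself is not shown here.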
Affiliation(s)
- Ling Zhong
- Pasteurien College, Suzhou Medical College, Soochow University, Suzhou, 215123, China
- Key Laboratory of Alkene-Carbon Fibers-Based Technology & Application for Detection of Major Infectious Diseases, Soochow University, Suzhou, 215123, China
| | - Menghan Zhang
- Suzhou Center for Disease Control and Prevention, Suzhou, 215004, China
| | - Libing Sun
- Department of Pathology, East District of Suzhou Municipal Hospital, Suzhou, 215000, China
| | - Yu Yang
- Pasteurien College, Suzhou Medical College, Soochow University, Suzhou, 215123, China
| | - Bo Wang
- Suzhou Center for Disease Control and Prevention, Suzhou, 215004, China
| | - Haibing Yang
- Suzhou Center for Disease Control and Prevention, Suzhou, 215004, China
| | - Qiang Shen
- Suzhou Center for Disease Control and Prevention, Suzhou, 215004, China
| | - Yu Xia
- Suzhou Center for Disease Control and Prevention, Suzhou, 215004, China
| | - Jiarui Cui
- Suzhou Center for Disease Control and Prevention, Suzhou, 215004, China
| | - Hui Hang
- Suzhou Center for Disease Control and Prevention, Suzhou, 215004, China
| | - Yi Ren
- Iotabiome Biotechnology Inc, Suzhou, 215000, China
| | - Bo Pang
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
| | - Xiangyu Deng
- Center for Food Safety, University of Georgia, Griffin, GA, USA
| | - Yahui Zhan
- Suzhou Center for Disease Control and Prevention, Suzhou, 215004, China
| | - Heng Li
- Pasteurien College, Suzhou Medical College, Soochow University, Suzhou, 215123, China
- Key Laboratory of Alkene-Carbon Fibers-Based Technology & Application for Detection of Major Infectious Diseases, Soochow University, Suzhou, 215123, China
- Suzhou Key Laboratory of Pathogen Bioscience and Anti-infective Medicine, Soochow University, Suzhou, 215123, China
| | - Zhemin Zhou
- Pasteurien College, Suzhou Medical College, Soochow University, Suzhou, 215123, China
- Key Laboratory of Alkene-Carbon Fibers-Based Technology & Application for Detection of Major Infectious Diseases, Soochow University, Suzhou, 215123, China
- National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
|
12
|
Xu X, Yin Z, Yan L, Yi H, Wang H, Schmidt B, Liu W. RabbitKSSD: accelerating genome distance estimation on modern multi-core architectures. Bioinformatics 2023; 39:btad695. [PMID: 37971961 PMCID: PMC10681859 DOI: 10.1093/bioinformatics/btad695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 11/07/2023] [Accepted: 11/15/2023] [Indexed: 11/19/2023] Open
Abstract
SUMMARY: We propose RabbitKSSD, a high-speed genome distance estimation tool. Specifically, we leverage load-balanced task partitioning, fast I/O, efficient intermediate result accesses, and high-performance data structures to improve overall efficiency. Our performance evaluation demonstrates that RabbitKSSD achieves speedups ranging from 5.7× to 19.8× over Kssd for the time-consuming sketch generation and distance computation on commonly used workstations. In addition, it significantly outperforms Mash, BinDash, and Dashing2. Moreover, RabbitKSSD can efficiently perform all-vs-all distance computation for all RefSeq complete bacterial genomes (455 GB in FASTA format) in just 2 min on a 64-core workstation. AVAILABILITY AND IMPLEMENTATION: RabbitKSSD is available at https://github.com/RabbitBio/RabbitKSSD.
Affiliation(s)
- Xiaoming Xu
- School of Software, Shandong University, Jinan, China
| | - Zekun Yin
- School of Software, Shandong University, Jinan, China
| | - Lifeng Yan
- School of Software, Shandong University, Jinan, China
| | - Huiguang Yi
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Hua Wang
- School of Software, Shandong University, Jinan, China
| | - Bertil Schmidt
- Institute for Computer Science, Johannes Gutenberg University, Mainz, Germany
| | - Weiguo Liu
- School of Software, Shandong University, Jinan, China
|
13
|
Xu X, Yin Z, Yan L, Zhang H, Xu B, Wei Y, Niu B, Schmidt B, Liu W. RabbitTClust: enabling fast clustering analysis of millions of bacteria genomes with MinHash sketches. Genome Biol 2023; 24:121. [PMID: 37198663 PMCID: PMC10190105 DOI: 10.1186/s13059-023-02961-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/14/2022] [Accepted: 05/05/2023] [Indexed: 05/19/2023] Open
Abstract
We present RabbitTClust, a fast and memory-efficient genome clustering tool based on sketch-based distance estimation. Our approach enables efficient processing of large-scale datasets by combining dimensionality reduction techniques with streaming and parallelization on modern multi-core platforms. 113,674 complete bacterial genome sequences from RefSeq, 455 GB in FASTA format, can be clustered within less than 6 min and 1,009,738 GenBank assembled bacterial genomes, 4.0 TB in FASTA format, within only 34 min on a 128-core workstation. Our results further identify 1269 redundant genomes, with identical nucleotide content, in the RefSeq bacterial genomes database.
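To make "clustering based on sketch-based distance estimation" concrete, the sketch below converts a Jaccard estimate into a Mash-style distance and groups genomes by single-linkage clustering under a distance threshold, using a union-find structure. The abstract does not state which clustering algorithms or thresholds RabbitTClust uses, so the single-linkage choice, the function names, and the `threshold` value are assumptions; the distance formula is the one published for Mash.

```python
import math


def mash_distance(jaccard, k):
    """Mash distance from a Jaccard estimate: D = -ln(2j / (1 + j)) / k."""
    if jaccard <= 0:
        return 1.0
    return max(0.0, min(1.0, -math.log(2 * jaccard / (1 + jaccard)) / k))


def single_linkage(n, distances, threshold):
    """Cluster n genomes: any pair closer than `threshold` ends up in one cluster.

    distances: dict mapping (i, j) index pairs to a distance.
    Returns a sorted list of clusters, each a sorted list of genome indices.
    """
    parent = list(range(n))

    def find(x):
        # Find the cluster representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), dist in distances.items():
        if dist <= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    pairwise = {(0, 1): 0.01, (1, 2): 0.03, (2, 3): 0.5}
    print(single_linkage(4, pairwise, threshold=0.05))
```

Single linkage chains genomes together through intermediate neighbours, so genomes 0 and 2 can share a cluster even if their direct distance exceeds the threshold; other linkage or greedy schemes behave differently.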
Affiliation(s)
- Xiaoming Xu
- School of Software, Shandong University, Jinan, China
| | - Zekun Yin
- School of Software, Shandong University, Jinan, China
- Shenzhen Research Institute of Shandong University, Shandong University, Shenzhen, China
| | - Lifeng Yan
- School of Software, Shandong University, Jinan, China
- Shenzhen Research Institute of Shandong University, Shandong University, Shenzhen, China
| | - Hao Zhang
- School of Software, Shandong University, Jinan, China
- Shenzhen Research Institute of Shandong University, Shandong University, Shenzhen, China
| | - Borui Xu
- School of Software, Shandong University, Jinan, China
| | - Yanjie Wei
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Beifang Niu
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
| | - Bertil Schmidt
- Institute for Computer Science, Johannes Gutenberg University, Mainz, Germany
| | - Weiguo Liu
- School of Software, Shandong University, Jinan, China
|
14
|
Liu J, Sun J, Liu Y. Effective Identification of Bacterial Genomes From Short and Long Read Sequencing Data. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2806-2816. [PMID: 34232887 DOI: 10.1109/tcbb.2021.3095164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
With the development of sequencing technology, microbial genome sequencing analysis has attracted extensive attention. For inexperienced users without sufficient bioinformatics skills, making sense of sequencing data for microbial identification, especially bacterial identification, through read analysis is still challenging. To address the challenge of effectively analyzing genomic information, in this paper we develop an effective approach and automatic bioinformatics pipeline called PBGI for bacterial genome identification, which performs automated and customized bioinformatics analysis using short-read or long-read sequencing data produced by multiple platforms such as Illumina, PacBio and Oxford Nanopore. An evaluation of the proposed approach on a practical data set is presented, showing that PBGI provides a user-friendly way to perform bacterial identification through short- or long-read analysis and can provide accurate analysis results. The source code of PBGI is freely available at https://github.com/lyotvincent/PBGI.
|