1
Silva MKDP, Nicoleti VYU, Rodrigues BDPP, Araujo ASF, Ellwanger JH, de Almeida JM, Lemos LN. Exploring deep learning in phage discovery and characterization. Virology 2025; 609:110559. [PMID: 40359589 DOI: 10.1016/j.virol.2025.110559] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/12/2024] [Revised: 03/24/2025] [Accepted: 04/28/2025] [Indexed: 05/15/2025]
Abstract
Bacteriophages, or bacterial viruses, play diverse ecological roles by shaping bacterial populations and also hold significant biotechnological and medical potential, including the treatment of infections caused by multidrug-resistant bacteria. The discovery of novel bacteriophages using large-scale metagenomic data has been accelerated by the accessibility of deep learning (Artificial Intelligence), the increased computing power of graphical processing units (GPUs), and new bioinformatics tools. This review addresses the recent revolution in bacteriophage research, ranging from the adoption of neural network algorithms applied to metagenomic data to the use of pre-trained language models, such as BERT, which have improved the reconstruction of viral metagenome-assembled genomes (vMAGs). This article also discusses the main aspects of bacteriophage biology using deep learning, highlighting the advances and limitations of this approach. Finally, prospects of deep-learning-based metagenomic algorithms and recommendations for future investigations are described.
Affiliation(s)
- Vitória Yumi Uetuki Nicoleti
- Ilum School of Science, Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, Brazil.
- Joel Henrique Ellwanger
- Laboratory of Immunobiology and Immunogenetics, Department of Genetics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil.
- James Moraes de Almeida
- Ilum School of Science, Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, Brazil.
- Leandro Nascimento Lemos
- Ilum School of Science, Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, São Paulo, Brazil.
2
Godsil M, Ritz NL, Venkatesh S, Meeske AJ. Gut phages and their interactions with bacterial and mammalian hosts. J Bacteriol 2025; 207:e0042824. [PMID: 39846747 PMCID: PMC11844821 DOI: 10.1128/jb.00428-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2025] Open
Abstract
The mammalian gut microbiome is a dense and diverse community of microorganisms that reside in the distal gastrointestinal tract. In recent decades, the bacterial members of the gut microbiome have been the subject of intense research. Less well studied is the large community of bacteriophages that reside in the gut, which number in the billions of viral particles per gram of feces, and consist of considerable unknown viral "dark matter." This community of gut-residing bacteriophages, called the gut "phageome," plays a central role in the gut microbiome through predation and transformation of native gut bacteria, and through interactions with their mammalian hosts. In this review, we will summarize what is known about the composition and origins of the gut phageome, as well as its role in microbiome homeostasis and host health. Furthermore, we will outline the interactions of gut phages with their bacterial and mammalian hosts, and plot a course for the mechanistic study of these systems.
Affiliation(s)
- Marshall Godsil
- Department of Microbiology, University of Washington, Seattle, Washington, USA
- Alexander J. Meeske
- Department of Microbiology, University of Washington, Seattle, Washington, USA
3
Wei A, Xiao Z, Fu L, Zhao W, Jiang X. Predicting phage-host interactions via feature augmentation and regional graph convolution. Brief Bioinform 2024; 26:bbae672. [PMID: 39756070 PMCID: PMC11671694 DOI: 10.1093/bib/bbae672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Revised: 11/05/2024] [Accepted: 12/14/2024] [Indexed: 12/28/2024] Open
Abstract
Identifying phage-host interactions (PHIs) is a crucial step in developing phage therapy, a promising solution to antibiotic resistance in superbugs. However, the lifestyle of phages, which strongly depends on their host for life activities, limits their cultivability, making the study of PHIs time-consuming and labor-intensive for traditional wet lab experiments. Although many deep learning (DL) approaches have been applied to PHI prediction, most DL methods are predominantly based on sequence information, failing to comprehensively model the intricate relationships within PHIs. Moreover, most existing approaches achieve only sub-optimal performance because of the risk of overfitting induced by the high data sparsity of the PHI prediction task. In this study, we propose a novel approach called MI-RGC, which introduces mutual information for feature augmentation and employs regional graph convolution to learn meaningful representations. Specifically, MI-RGC treats the presence status of phages in environmental samples as random variables, and derives the mutual information between these random variables as the dependency relationships among phages. Consequently, a mutual information-based heterogeneous network is constructed as feature augmentation for sequence information of phages, which is utilized for building a sequence information-based heterogeneous network. By considering the different contributions of neighboring nodes at varying distances, a regional graph convolutional model is designed, in which the neighboring nodes are segmented into different regions and a regional-level attention mechanism is employed to derive node embeddings. Finally, the embeddings learned from these two networks are aggregated through an attention mechanism, on which the prediction of PHIs is conducted accordingly. Experimental results on three benchmark datasets demonstrate that MI-RGC achieves superior performance over other methods on the task of PHI prediction.
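The abstract describes treating phage presence in environmental samples as random variables and using their mutual information as a dependency measure. The following is a minimal sketch of that one step only, assuming binary presence/absence vectors; the function name, the phage labels, and the toy data are hypothetical and are not taken from the MI-RGC implementation.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length presence vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))   # joint counts of (x_i, y_i)
    px = Counter(x)              # marginal counts for x
    py = Counter(y)              # marginal counts for y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Hypothetical presence (1) / absence (0) of three phages across eight samples.
presence = {
    "phage_A": [1, 1, 0, 0, 1, 0, 1, 0],
    "phage_B": [1, 1, 0, 0, 1, 0, 1, 0],  # always co-occurs with phage_A
    "phage_C": [1, 0, 1, 0, 1, 0, 0, 1],  # independent of phage_A in this toy data
}

mi_ab = mutual_information(presence["phage_A"], presence["phage_B"])
mi_ac = mutual_information(presence["phage_A"], presence["phage_C"])
```

In this toy data the perfectly co-occurring pair scores higher than the independent pair, which is the kind of signal that could be used to weight phage-phage edges in a network.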
Affiliation(s)
- Ankang Wei
- School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- Zhen Xiao
- School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- Lingling Fu
- School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- Weizhong Zhao
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
- Xingpeng Jiang
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan 430079, China
- School of Computer Science, Central China Normal University, Wuhan 430079, China
- National Language Resources Monitoring & Research Center for Network Media, Central China Normal University, Wuhan 430079, China
4
Liu F, Zhao Z, Liu Y. PHPGAT: predicting phage hosts based on multimodal heterogeneous knowledge graph with graph attention network. Brief Bioinform 2024; 26:bbaf017. [PMID: 39833104 PMCID: PMC11745545 DOI: 10.1093/bib/bbaf017] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/25/2024] [Revised: 12/18/2024] [Accepted: 01/07/2025] [Indexed: 01/22/2025] Open
Abstract
Antibiotic resistance poses a significant threat to global health, making the development of alternative strategies to combat bacterial pathogens increasingly urgent. One such promising approach is the strategic use of bacteriophages (or phages) to specifically target and eradicate antibiotic-resistant bacteria. Phages, being among the most prevalent life forms on Earth, play a critical role in maintaining ecological balance by regulating bacterial communities and driving genetic diversity. Accurate prediction of phage hosts is essential for successfully applying phage therapy. However, existing prediction models may not fully encapsulate the complex dynamics of phage-host interactions in diverse microbial environments, indicating a need for improved accuracy through more sophisticated modeling techniques. In response to this challenge, this study introduces a novel phage-host prediction model, PHPGAT, which leverages a multimodal heterogeneous knowledge graph with the advanced GATv2 (Graph Attention Network v2) framework. The model first constructs a multimodal heterogeneous knowledge graph by integrating phage-phage, host-host, and phage-host interactions to capture the intricate connections between biological entities. GATv2 is then employed to extract deep node features and learn dynamic interdependencies, generating context-aware embeddings. Finally, an inner product decoder is designed to compute the likelihood of interaction between a phage and host pair based on the embedding vectors produced by GATv2. Evaluation results using two datasets demonstrate that PHPGAT achieves precise phage host predictions and outperforms other models. PHPGAT is available at https://github.com/ZhaoZMer/PHPGAT.
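The final step described above, an inner product decoder over node embeddings, can be sketched as follows. This is an illustration under stated assumptions, not the PHPGAT code: the sigmoid squashing, the function name, and the three-dimensional embeddings are all hypothetical.

```python
import math


def inner_product_score(phage_emb, host_emb):
    """Sigmoid of the dot product of two embeddings, read as an interaction score."""
    if len(phage_emb) != len(host_emb):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(p * h for p, h in zip(phage_emb, host_emb))
    # Assumption: a sigmoid maps the raw inner product into (0, 1).
    return 1.0 / (1.0 + math.exp(-dot))


# Hypothetical embeddings such as a graph encoder might produce.
phage = [0.8, -0.2, 0.5]
likely_host = [0.9, -0.1, 0.4]     # points in a similar direction
unlikely_host = [-0.7, 0.3, -0.6]  # points in the opposite direction

score_likely = inner_product_score(phage, likely_host)
score_unlikely = inner_product_score(phage, unlikely_host)
```

Embeddings that point in similar directions give a score above 0.5 and opposing ones a score below it, so candidate hosts can be ranked by this value.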
Affiliation(s)
- Fu Liu
- College of Communication Engineering, Jilin University, No. 2699 Qianjin Street, Chaoyang District, Changchun 130012, China
- Zhimiao Zhao
- School of Artificial Intelligence, Jilin University, No. 5988 Renmin Street, Nanguan District, Changchun 130022, China
- Yun Liu
- College of Communication Engineering, Jilin University, No. 2699 Qianjin Street, Chaoyang District, Changchun 130012, China
5
Androsiuk L, Maane S, Tal S. CRISPR spacers acquired from plasmids primarily target backbone genes, making them valuable for predicting potential hosts and host range. Microbiol Spectr 2024; 12:e0010424. [PMID: 39508585 PMCID: PMC11619364 DOI: 10.1128/spectrum.00104-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 09/16/2024] [Indexed: 11/15/2024] Open
Abstract
In recent years, there has been a surge in metagenomic studies focused on identifying plasmids in environmental samples. Although these studies have unearthed numerous novel plasmids, enriching our understanding of their environmental roles, a significant gap remains: the scarcity of information regarding the bacterial hosts of these newly discovered plasmids. Furthermore, even when plasmids are identified within bacterial isolates, the reported host is typically limited to the original isolate, with no insights into alternative hosts or the plasmid's potential host range. Given that plasmids depend on hosts for their existence, investigating plasmids without the knowledge of potential hosts offers only a partial perspective. This study introduces a method for identifying potential hosts and host ranges for plasmids through alignment with CRISPR spacers. To validate the method, we compared the PLSDB plasmids database with the CRISPR spacers database, yielding host predictions for 46% of the plasmids. When compared with reported hosts, our predictions achieved 84% concordance at the family level and 99% concordance at the phylum level. Moreover, the method frequently identified multiple potential hosts for a plasmid, thereby enabling predictions of alternative hosts and the host range. Notably, we found that CRISPR spacers predominantly target plasmid backbone genes while sparing functional genes, such as those linked to antibiotic resistance, aligning with our hypothesis that CRISPR spacers are acquired from plasmid-specific regions rather than insertion elements from diverse sources. Finally, we illustrate the network of connections among different bacterial taxa through plasmids, revealing potential pathways for horizontal gene transfer.
IMPORTANCE: Plasmids are notorious for their role in distributing antibiotic resistance genes, but they may also carry and distribute other environmentally important genes. Since plasmids are not free-living entities and rely on host bacteria for survival and propagation, predicting their hosts is essential. This study presents a method for predicting potential hosts for plasmids and offers insights into the potential paths for spreading functional genes between different bacteria. Understanding plasmid-host relationships is crucial for comprehending the ecological and clinical impact of plasmids and implications for various biological processes.
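The idea of linking a plasmid to candidate hosts through CRISPR spacers can be illustrated with a deliberately simplified sketch. The study relies on sequence alignment against spacer databases; the version below swaps in exact substring matching on both strands as a stand-in, and every sequence, host label, and function name is hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def predict_hosts(plasmid_seq, spacers_by_host):
    """Return hosts with at least one spacer found on either strand of the plasmid.

    Simplification: exact substring matching stands in for real alignment.
    """
    rc = reverse_complement(plasmid_seq)
    hosts = set()
    for host, spacers in spacers_by_host.items():
        if any(s in plasmid_seq or s in rc for s in spacers):
            hosts.add(host)
    return sorted(hosts)


# Hypothetical toy data: one short plasmid and spacers from three hosts.
plasmid = "ATGCGTACGTTAGCCGATAGGCTA"
spacers_by_host = {
    "host_X": ["CGTACGTTAG"],  # matches the forward strand
    "host_Y": ["CCTATCGGC"],   # matches the reverse strand only
    "host_Z": ["TTTTTTTTTT"],  # no match
}

predicted = predict_hosts(plasmid, spacers_by_host)
```

Because several hosts can carry spacers matching the same plasmid, the result is a list, which mirrors how the method can suggest alternative hosts and a host range rather than a single host.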
Affiliation(s)
- Lucy Androsiuk
- Marine Biology and Biotechnology Program, Department of Life Sciences, Ben-Gurion University of the Negev Eilat Campus, Eilat, Israel
- Israel Oceanographic & Limnological Research Ltd., National Center for Mariculture, Eilat, Israel
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Sivan Maane
- Department of Life Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Shay Tal
- Israel Oceanographic & Limnological Research Ltd., National Center for Mariculture, Eilat, Israel
6
Chen J, Sun C, Dong Y, Jin M, Lai S, Jia L, Zhao X, Wang H, Gao NL, Bork P, Liu Z, Chen W, Zhao X. Efficient Recovery of Complete Gut Viral Genomes by Combined Short- and Long-Read Sequencing. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2305818. [PMID: 38240578 PMCID: PMC10987132 DOI: 10.1002/advs.202305818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 12/01/2023] [Indexed: 04/04/2024]
Abstract
Current metagenome-assembled human gut phage catalogs contain mostly fragmented genomes. Here, a comprehensive gut virome detection procedure is developed involving virus-like particle (VLP) enrichment from ≈500 g feces and combined short- and long-read sequencing. Applied to 135 samples, a Chinese Gut Virome Catalog (CHGV) is assembled consisting of 21,499 non-redundant viral operational taxonomic units (vOTUs) that are significantly longer than those obtained by short-read sequencing and contain ≈35% (7,675) complete genomes, a proportion ≈nine times higher than that in the Gut Virome Database (GVD, ≈4%, 1,443). Interestingly, the majority (≈60%, 13,356) of the CHGV vOTUs are obtained by either long-read or hybrid assemblies, with little overlap with those assembled from only the short-read data. With this dataset, the vast diversity of the gut virome is elucidated, including the identification of 32% (6,962) novel vOTUs compared to public gut virome databases, dozens of phages that are more prevalent than the crAssphages and/or Gubaphages, and several viral clades that are more diverse than the two. Finally, the functional capacities of the CHGV-encoded proteins are characterized and a viral-host interaction network is constructed to facilitate future research and applications.
Affiliation(s)
- Jingchao Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Chuqing Sun
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yanqi Dong
- Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain‐Inspired Intelligence, Fudan University, Shanghai 200433, China
- Menglu Jin
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- College of Life Science, Henan Normal University, Xinxiang, Henan 453007, China
- Senying Lai
- Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain‐Inspired Intelligence, Fudan University, Shanghai 200433, China
- Longhao Jia
- Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain‐Inspired Intelligence, Fudan University, Shanghai 200433, China
- Xueyang Zhao
- College of Life Science, Henan Normal University, Xinxiang, Henan 453007, China
- Huarui Wang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Na L. Gao
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Department of Laboratory Medicine, Zhongnan Hospital of Wuhan University, Wuhan University, Wuhan 430071, China
- Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117 Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany
- Yonsei Frontier Lab (YFL), Yonsei University, 03722 Seoul, South Korea
- Department of Bioinformatics, Biocenter, University of Würzburg, 97070 Würzburg, Germany
- Zhi Liu
- Department of Biotechnology, College of Life Science and Technology, Huazhong University of Science and Technology, 430074 Wuhan, China
- Wei‐Hua Chen
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, Center for Artificial Intelligence Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- College of Life Science, Henan Normal University, Xinxiang, Henan 453007, China
- Institution of Medical Artificial Intelligence, Binzhou Medical University, Yantai 264003, China
- Xing‐Ming Zhao
- Department of Neurology, Zhongshan Hospital and Institute of Science and Technology for Brain‐Inspired Intelligence, Fudan University, Shanghai 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain‐Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
- State Key Laboratory of Medical Neurobiology, Institute of Brain Science, Fudan University, Shanghai 200433, China
- International Human Phenome Institutes (Shanghai), Shanghai 200433, China
7
Liu X, Liu Y, Liu J, Zhang H, Shan C, Guo Y, Gong X, Cui M, Li X, Tang M. Correlation between the gut microbiome and neurodegenerative diseases: a review of metagenomics evidence. Neural Regen Res 2024; 19:833-845. [PMID: 37843219 PMCID: PMC10664138 DOI: 10.4103/1673-5374.382223] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 02/23/2023] [Revised: 04/19/2023] [Accepted: 06/17/2023] [Indexed: 10/17/2023] Open
Abstract
A growing body of evidence suggests that the gut microbiota contributes to the development of neurodegenerative diseases via the microbiota-gut-brain axis. As a contributing factor, microbiota dysbiosis always occurs in pathological changes of neurodegenerative diseases, such as Alzheimer's disease, Parkinson's disease, and amyotrophic lateral sclerosis. High-throughput sequencing technology has helped to reveal that the bidirectional communication between the central nervous system and the enteric nervous system is facilitated by the microbiota's diverse microorganisms, and for both neuroimmune and neuroendocrine systems. Here, we summarize the bioinformatics analysis and wet-biology validation for the gut metagenomics in neurodegenerative diseases, with an emphasis on multi-omics studies and the gut virome. The pathogen-associated signaling biomarkers for identifying brain disorders and potential therapeutic targets are also elucidated. Finally, we discuss the role of diet, prebiotics, probiotics, postbiotics and exercise interventions in remodeling the microbiome and reducing the symptoms of neurodegenerative diseases.
Affiliation(s)
- Xiaoyan Liu
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Yi Liu
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Institute of Animal Husbandry, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu Province, China
- Junlin Liu
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Hantao Zhang
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Chaofan Shan
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Yinglu Guo
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
- Xun Gong
- Department of Rheumatology & Immunology, Affiliated Hospital of Jiangsu University, Zhenjiang, Jiangsu Province, China
- Mengmeng Cui
- Department of Neurology, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province, China
- Xiubin Li
- Department of Neurology, The Second Affiliated Hospital of Shandong First Medical University, Taian, Shandong Province, China
- Min Tang
- School of Life Sciences, Jiangsu University, Zhenjiang, Jiangsu Province, China
8
Mahony J. Biological and bioinformatic tools for the discovery of unknown phage-host combinations. Curr Opin Microbiol 2024; 77:102426. [PMID: 38246125 DOI: 10.1016/j.mib.2024.102426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 12/21/2023] [Accepted: 01/02/2024] [Indexed: 01/23/2024]
Abstract
The field of microbial ecology has been transformed by metagenomics in recent decades and has culminated in vast datasets that facilitate the bioinformatic dissection of complex microbial communities. Recently, attention has turned from defining the microbiota composition to the interactions and relationships that occur between members of the microbiota. Within complex microbiota, the identification of bacteriophage-host combinations has been a major challenge. Recent developments in artificial intelligence tools to predict protein structure and function as well as the relationships between bacteria and their infecting bacteriophages allow a strategic approach to identifying and validating phage-host relationships. However, biological validation of these predictions remains essential and will serve to improve the existing predictive tools. In this review, I provide an overview of the most recent developments in both bioinformatic and experimental approaches to predicting and experimentally validating unknown phage-host combinations.
Affiliation(s)
- Jennifer Mahony
- School of Microbiology & APC Microbiome Ireland, University College Cork, Western Road, T12 YT20 Cork, Ireland.
9
Yin H, Wu S, Tan J, Guo Q, Li M, Guo J, Wang Y, Jiang X, Zhu H. IPEV: identification of prokaryotic and eukaryotic virus-derived sequences in virome using deep learning. Gigascience 2024; 13:giae018. [PMID: 38649300 PMCID: PMC11034026 DOI: 10.1093/gigascience/giae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/14/2024] [Accepted: 03/25/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND: The virome obtained through virus-like particle enrichment contains a mixture of prokaryotic and eukaryotic virus-derived fragments. Accurate identification and classification of these elements are crucial to understanding their roles and functions in microbial communities. However, the rapid mutation rates of viral genomes pose challenges in developing high-performance tools for classification, potentially limiting downstream analyses. FINDINGS: We present IPEV, a novel method to distinguish prokaryotic and eukaryotic viruses in viromes, with a 2-dimensional convolutional neural network combining trinucleotide pair relative distance and frequency. Cross-validation assessments of IPEV demonstrate its state-of-the-art precision, significantly improving the F1-score by approximately 22% on an independent test set compared to existing methods when query viruses share less than 30% sequence similarity with known viruses. Furthermore, IPEV outperforms other methods in accuracy on marine and gut virome samples based on annotations by sequence alignments. IPEV reduces runtime by up to 1,225 times compared to existing methods under the same computing configuration. We also utilized IPEV to analyze longitudinal samples and found that the gut virome exhibits a higher degree of temporal stability than previously observed in persistent personal viromes, providing novel insights into the resilience of the gut virome in individuals. CONCLUSIONS: IPEV is a high-performance, user-friendly tool that assists biologists in identifying and classifying prokaryotic and eukaryotic viruses within viromes. The tool is available at https://github.com/basehc/IPEV.
Affiliation(s)
- Hengchuang Yin
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Shufang Wu
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Jie Tan
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Qian Guo
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Mo Li
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- School of Life Sciences, Peking University, Beijing 100871, China
- Jinyuan Guo
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Yaqi Wang
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Xiaoqing Jiang
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing 100101, China
- Huaiqiu Zhu
- Department of Biomedical Engineering, College of Future Technology, and Center for Quantitative Biology, Peking University, Beijing 100871, China
- School of Life Sciences, Peking University, Beijing 100871, China
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
10
Zhang YZ, Liu Y, Bai Z, Fujimoto K, Uematsu S, Imoto S. Zero-shot-capable identification of phage-host relationships with whole-genome sequence representation by contrastive learning. Brief Bioinform 2023; 24:bbad239. [PMID: 37466138 PMCID: PMC10516345 DOI: 10.1093/bib/bbad239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 05/17/2023] [Accepted: 06/08/2023] [Indexed: 07/20/2023] Open
Abstract
Accurately identifying phage-host relationships from their genome sequences is still challenging, especially for those phages and hosts with less homologous sequences. In this work, focusing on identifying the phage-host relationships at the species and genus level, we propose a contrastive learning based approach to learn whole-genome sequence embeddings that can take account of phage-host interactions (PHIs). Contrastive learning is used to make phages infecting the same hosts close to each other in the new representation space. Specifically, we rephrase whole-genome sequences with frequency chaos game representation (FCGR) and learn latent embeddings that 'encapsulate' phages and host relationships through contrastive learning. The contrastive learning method works well on the imbalanced dataset. Based on the learned embeddings, a proposed pipeline named CL4PHI can predict known hosts and unseen hosts in training. We compare our method with two recently proposed state-of-the-art learning-based methods on their benchmark datasets. The experiment results demonstrate that the proposed method using contrastive learning improves the prediction accuracy on known hosts and demonstrates a zero-shot prediction capability on unseen hosts. In terms of potential applications, the rapid pace of genome sequencing across different species has resulted in a vast amount of whole-genome sequencing data that require efficient computational methods for identifying phage-host interactions. The proposed approach is expected to address this need by efficiently processing whole-genome sequences of phages and prokaryotic hosts and capturing features related to phage-host relationships for genome sequence representation. This approach can be used to accelerate the discovery of phage-host interactions and aid in the development of phage-based therapies for infectious diseases.
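The frequency chaos game representation (FCGR) mentioned above turns a genome into a fixed-size grid of k-mer counts. The following is a minimal sketch of one common convention, not the CL4PHI code: the corner assignment, the choice to let the last base of each k-mer select the coarsest quadrant, and the function name are assumptions made for illustration.

```python
# Assumed corner convention (x, y) for the four nucleotides.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Return a 2^k x 2^k grid of k-mer counts laid out as in a chaos game plot."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNER for base in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        # The last base of the k-mer contributes the most significant bit.
        for base in reversed(kmer):
            bx, by = CORNER[base]
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y][x] += 1
    return grid


# Hypothetical short sequence; real inputs would be whole genomes.
matrix = fcgr("ATGCGTACGTTAGC", 3)
```

The grid has the same shape for any genome length, which is what makes it usable as an image-like input to an encoder; counts can be divided by their total if relative frequencies are preferred.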
Affiliation(s)
- Yao-zhong Zhang
- Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
| | - Yunjie Liu
- Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
| | - Zeheng Bai
- Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
| | - Kosuke Fujimoto
- Department of Immunology and Genomics, Graduate School of Medicine, Osaka Metropolitan University, Asahi-machi 1-4-3, Abeno-ku, 545-8585 Osaka, Japan
- Division of Metagenome Medicine, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
| | - Satoshi Uematsu
- Department of Immunology and Genomics, Graduate School of Medicine, Osaka Metropolitan University, Asahi-machi 1-4-3, Abeno-ku, 545-8585 Osaka, Japan
- Division of Metagenome Medicine, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
| | - Seiya Imoto
- Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Shirokanedai 4-6-1, Minato-ku, 108-8639 Tokyo, Japan
| |
|
11
|
Gonzales MEM, Ureta JC, Shrestha AMS. Protein embeddings improve phage-host interaction prediction. PLoS One 2023; 18:e0289030. [PMID: 37486915 PMCID: PMC10365317 DOI: 10.1371/journal.pone.0289030] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/21/2023] [Accepted: 07/07/2023] [Indexed: 07/26/2023]
Abstract
With the growing interest in using phages to combat antimicrobial resistance, computational methods for predicting phage-host interactions have been explored to help shortlist candidate phages. Most existing models consider entire proteomes and rely on manual feature engineering, which poses difficulty in selecting the most informative sequence properties to serve as input to the model. In this paper, we framed phage-host interaction prediction as a multiclass classification problem that takes as input the embeddings of a phage's receptor-binding proteins, which are known to be the key machinery for host recognition, and predicts the host genus. We explored different protein language models to automatically encode these protein sequences into dense embeddings without the need for additional alignment or structural information. We show that the use of embeddings of receptor-binding proteins presents improvements over handcrafted genomic and protein sequence features. The highest performance was obtained using the transformer-based protein language model ProtT5, resulting in a 3% to 4% increase in weighted F1 and recall scores across different prediction confidence thresholds, compared to using selected handcrafted sequence features.
Affiliation(s)
- Mark Edward M Gonzales
- Bioinformatics Laboratory, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila, Philippines
- Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
| | - Jennifer C Ureta
- Bioinformatics Laboratory, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila, Philippines
- Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
| | - Anish M S Shrestha
- Bioinformatics Laboratory, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila, Philippines
- Systems and Computational Biology Research Unit, Center for Natural Sciences and Environmental Research, De La Salle University, Manila, Philippines
- Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
| |
|
12
|
Roux S, Camargo AP, Coutinho FH, Dabdoub SM, Dutilh BE, Nayfach S, Tritt A. iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria. PLoS Biol 2023; 21:e3002083. [PMID: 37083735 PMCID: PMC10155999 DOI: 10.1371/journal.pbio.3002083] [Citation(s) in RCA: 134] [Impact Index Per Article: 67.0] [Received: 08/30/2022] [Revised: 05/03/2023] [Accepted: 03/15/2023] [Indexed: 04/22/2023]
Abstract
The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.
Affiliation(s)
- Simon Roux
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Antonio Pedro Camargo
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | | | - Shareef M Dabdoub
- Division of Biostatistics and Computational Biology, University of Iowa College of Dentistry, Iowa City, Iowa, United States of America
| | - Bas E Dutilh
- Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University, Jena, Germany
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, the Netherlands
| | - Stephen Nayfach
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Andrew Tritt
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| |
|
13
|
Viral Metagenomic Analysis of the Fecal Samples in Domestic Dogs (Canis lupus familiaris). Viruses 2023; 15:v15030685. [PMID: 36992396 PMCID: PMC10058366 DOI: 10.3390/v15030685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 02/24/2023] [Accepted: 03/02/2023] [Indexed: 03/08/2023]
Abstract
Canine diarrhea is a common intestinal illness that is usually caused by viruses, bacteria, and parasites, and it may cause morbidity and mortality in domestic dogs if treated improperly. Recently, viral metagenomics has been applied to investigate the signatures of the enteric virome in mammals. In this research, the characteristics of the gut virome in healthy dogs and dogs with diarrhea were analyzed and compared using viral metagenomics. The alpha diversity analysis indicated that the richness and diversity of the gut virome in the dogs with diarrhea were much higher than in the healthy dogs, while the beta diversity analysis revealed that the gut viromes of the two groups were quite different. At the family level, the predominant viruses in the canine gut virome were identified as Microviridae, Parvoviridae, Siphoviridae, Inoviridae, Podoviridae, Myoviridae, and others. At the genus level, the predominant viruses in the canine gut virome were identified as Protoparvovirus, Inovirus, Chlamydiamicrovirus, Lambdavirus, Dependoparvovirus, Lightbulbvirus, Kostyavirus, Punavirus, Lederbergvirus, Fibrovirus, Peduovirus, and others. However, the viral communities of the two groups differed significantly. The unique viral taxa identified in the healthy dogs group were Chlamydiamicrovirus and Lightbulbvirus, while the unique viral taxa identified in the dogs with diarrhea group were Inovirus, Protoparvovirus, Lambdavirus, Dependoparvovirus, Kostyavirus, Punavirus, and other viruses. Phylogenetic analysis based on the near-complete genome sequences showed that the CPV strains collected in this study together with other CPV Chinese isolates clustered into a separate branch, while the identified CAV-2 strain D5-8081 and AAV-5 strain AAV-D5 were both the first near-complete genome sequences in China. Moreover, the predicted bacterial hosts of phages were identified as Campylobacter, Escherichia, Salmonella, Pseudomonas, Acinetobacter, Moraxella, Mediterraneibacter, and other commensal microbiota. In conclusion, the enteric viromes of the healthy dogs group and the dogs with diarrhea group were investigated and compared using viral metagenomics, and the viral communities might influence canine health and disease by interacting with the commensal gut microbiome.
|
14
|
Bajiya N, Dhall A, Aggarwal S, Raghava GPS. Advances in the field of phage-based therapy with special emphasis on computational resources. Brief Bioinform 2023; 24:6961791. [PMID: 36575815 DOI: 10.1093/bib/bbac574] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/28/2022] [Revised: 11/07/2022] [Accepted: 11/25/2022] [Indexed: 12/29/2022]
Abstract
In the current era, one of the major challenges is to manage the treatment of drug/antibiotic-resistant strains of bacteria. Phage therapy, a century-old technique, may serve as an alternative to antibiotics in treating bacterial infections caused by drug-resistant strains of bacteria. In this review, a systematic attempt has been made to summarize phage-based therapy in depth. This review has been divided into the following two sections: general information and computer-aided phage therapy (CAPT). In the case of general information, we cover the history of phage therapy, the mechanism of action, the status of phage-based products (approved and clinical trials) and the challenges. This review emphasizes CAPT, where we have covered primary phage-associated resources, phage prediction methods and pipelines. This review covers a wide range of databases and resources, including viral genomes and proteins, phage receptors, host genomes of phages, phage-host interactions and lytic proteins. In the post-genomic era, identifying the most suitable phage for lysing a drug-resistant strain of bacterium is crucial for developing alternative treatments for drug-resistant bacteria, and this remains a challenging problem. Thus, we compile all phage-associated prediction methods that include the prediction of phages for a bacterial strain, the host for a phage and the identification of interacting phage-host pairs. Most of these methods have been developed using machine learning and deep learning techniques. This review also discusses recent advances in the field of CAPT, where we briefly describe computational tools available for predicting phage virions, the life cycle of phages and prophage identification. Finally, we describe phage-based therapy's advantages, challenges and opportunities.
Affiliation(s)
- Nisha Bajiya
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Suchet Aggarwal
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| |
|
15
|
Andrianjakarivony HF, Bettarel Y, Armougom F, Desnues C. Phage-Host Prediction Using a Computational Tool Coupled with 16S rRNA Gene Amplicon Sequencing. Viruses 2022; 15:76. [PMID: 36680116 PMCID: PMC9862649 DOI: 10.3390/v15010076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 12/13/2022] [Accepted: 12/20/2022] [Indexed: 12/29/2022]
Abstract
Metagenomics studies have revealed tremendous viral diversity in aquatic environments. Yet, while the genomic data they have provided is extensive, it is unannotated. For example, most phage sequences lack accurate information about their bacterial host, which prevents reliable phage identification and the investigation of phage-host interactions. This study aimed to take this knowledge further, using a viral metagenomic framework to decipher the composition and diversity of phage communities and to predict their bacterial hosts. To this end, we used water and sediment samples collected from seven sites with varying contamination levels in the Ebrié Lagoon in Abidjan, Ivory Coast. The bacterial communities were characterized using the 16S rRNA metabarcoding approach, and a framework was developed to investigate the virome datasets that: (1) identified phage contigs with VirSorter and VIBRANT; (2) classified these contigs with MetaPhinder using the phage database (taxonomic annotation); and (3) predicted the phages' bacterial hosts with a machine learning-based tool: the Prokaryotic Virus-Host Predictor. The findings showed that the taxonomic profiles of phages and bacteria were specific to sediment or water samples. Phage sequences assigned to the Microviridae family were widespread in sediment samples, whereas phage sequences assigned to the Siphoviridae, Myoviridae and Podoviridae families were predominant in water samples. In terms of bacterial communities, the phyla Latescibacteria, Zixibacteria, Bacteroidetes, Acidobacteria, Calditrichaeota, Gemmatimonadetes, Cyanobacteria and Patescibacteria were most widespread in sediment samples, while the phyla Epsilonbacteraeota, Tenericutes, Margulisbacteria, Proteobacteria, Actinobacteria, Planctomycetes and Marinimicrobia were most prevalent in water samples. 
Notably, the relative abundances of bacterial communities (at the major phylum level) estimated by 16S rRNA metabarcoding and by phage-host prediction were significantly similar. These results demonstrate the reliability of this novel approach for predicting the bacterial hosts of phages from shotgun metagenomic sequencing data.
Affiliation(s)
- Harilanto Felana Andrianjakarivony
- Microbes, Evolution, Phylogeny, and Infection (MEΦI), IHU—Méditerranée Infection, 19-21 Boulevard Jean Moulin, 13005 Marseille, France
- Microbiologie Environnementale Biotechnologie (MEB), Mediterranean Institute of Oceanography (MIO), 163 Avenue de Luminy, 13009 Marseille, France
| | - Yvan Bettarel
- MARBEC, Marine Biodiversity, Exploitation & Conservation, Université de Montpellier, CNRS, Ifremer, IRD, 093 Place Eugène Bataillon, 34090 Montpellier, France
| | - Fabrice Armougom
- Microbiologie Environnementale Biotechnologie (MEB), Mediterranean Institute of Oceanography (MIO), 163 Avenue de Luminy, 13009 Marseille, France
| | - Christelle Desnues
- Microbes, Evolution, Phylogeny, and Infection (MEΦI), IHU—Méditerranée Infection, 19-21 Boulevard Jean Moulin, 13005 Marseille, France
- Microbiologie Environnementale Biotechnologie (MEB), Mediterranean Institute of Oceanography (MIO), 163 Avenue de Luminy, 13009 Marseille, France
| |
|
16
|
Abstract
Motivation: Phage–host associations play important roles in microbial communities. But in natural communities, where phages are discovered and characterized metagenomically (as opposed to culture-based lab studies), their hosts are generally not known. Several programs have been developed for predicting which phage infects which host based on various sequence similarity measures or machine learning approaches. These are often based on whole viral and host genomes, but in metagenomics-based studies, we rarely have whole genomes but rather must rely on contigs that are sometimes only a few hundred bp long. Therefore, we need programs that predict hosts of phage contigs on the basis of these short contigs. Although most existing programs can be applied to metagenomic datasets for these predictions, their accuracies are generally low. Here, we develop ContigNet, a convolutional neural network-based model capable of predicting phage–host matches based on relatively short contigs, and compare it to previously published VirHostMatcher (VHM) and WIsH. Results: On the validation set, ContigNet achieves 72–85% area under the receiver operating characteristic curve (AUROC) scores, compared to the maximum of 68% by VHM or WIsH, for contigs of lengths between 200 bp and 50 kbp. We also apply the model to the Metagenomic Gut Virus (MGV) catalogue, a dataset containing a wide range of draft genomes from metagenomic samples, and achieve 60–70% AUROC scores compared to 52% for VHM and WIsH. Surprisingly, ContigNet can also be used to predict plasmid-host contig associations with high accuracy, indicating a similar genetic exchange between mobile genetic elements and their hosts. Availability and implementation: The source code of ContigNet and related datasets can be downloaded from https://github.com/tianqitang1/ContigNet.
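The comparison above is reported in AUROC. As a reading aid, the sketch below computes AUROC directly from its pairwise definition: the probability that a randomly chosen true phage-host pair scores higher than a randomly chosen non-pair, with ties counted as one half. The function name `auroc` and the toy inputs are illustrative assumptions and are unrelated to the ContigNet code base.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (1 = true phage-host pair).
    scores: iterable of model scores, higher = more likely a match.
    Runs in O(P * N) time, which is fine for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
example = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A value of 0.5 corresponds to random ranking, which gives context for the 52% figure quoted for the baseline tools on the MGV catalogue.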
Affiliation(s)
- Tianqi Tang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Shengwei Hou
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Jed A Fuhrman
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Fengzhu Sun
- To whom correspondence should be addressed. E-mail:
| |
|
17
|
Shang J, Sun Y. Predicting the hosts of prokaryotic viruses using GCN-based semi-supervised learning. BMC Biol 2021; 19:250. [PMID: 34819064 PMCID: PMC8611875 DOI: 10.1186/s12915-021-01180-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 06/07/2021] [Accepted: 10/29/2021] [Indexed: 11/23/2022]
Abstract
Background: Prokaryotic viruses, which infect bacteria and archaea, are the most abundant and diverse biological entities in the biosphere. To understand their regulatory roles in various ecosystems and to harness the potential of bacteriophages for use in therapy, more knowledge of virus-host relationships is required. High-throughput sequencing and its application to the microbiome have offered new opportunities for computational approaches for predicting which hosts particular viruses can infect. However, there are two main challenges for computational host prediction. First, the empirically known virus-host relationships are very limited. Second, although sequence similarity between viruses and their prokaryote hosts has been used as a major feature for host prediction, the alignment is either missing or ambiguous in many cases. Thus, there is still a need to improve the accuracy of host prediction. Results: In this work, we present a semi-supervised learning model, named HostG, to conduct host prediction for novel viruses. We construct a knowledge graph by utilizing both virus-virus protein similarity and virus-host DNA sequence similarity. A graph convolutional network (GCN) is then adopted to exploit viruses with or without known hosts in training to enhance the learning ability. During the GCN training, we minimize the expected calibrated error (ECE) to ensure the confidence of the predictions. We tested HostG on both simulated and real sequencing data and compared its performance with other state-of-the-art methods specifically designed for virus host classification (VHM-net, WIsH, PHP, HoPhage, RaFAH, vHULK, and VPF-Class). Conclusion: HostG outperforms other popular methods, demonstrating the efficacy of using a GCN-based semi-supervised learning approach. A particular advantage of HostG is its ability to predict hosts from new taxa. Supplementary Information: The online version contains supplementary material available at 10.1186/s12915-021-01180-4.
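The abstract above refers to minimizing the expected calibrated error (ECE), a quantity more often written as expected calibration error. The sketch below shows the standard equal-width binned estimate of that metric on a set of predictions; it is an evaluation-side illustration under that assumption, not HostG's training objective, and the function name and the default of 10 bins are choices made here.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the chosen class, in [0, 1].
    correct: booleans, True where the prediction matched the true host.
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("no predictions given")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy example: both occupied bins are off by 0.1, so the estimate is 0.1.
toy_ece = expected_calibration_error(
    [0.9, 0.9, 0.6, 0.6], [True, True, True, False]
)
```

A low value means that predictions made with, say, 90% confidence are right about 90% of the time, which is what makes a confidence cutoff on host predictions meaningful.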
Affiliation(s)
- Jiayu Shang
- Electrical Engineering, City University of Hong Kong, Hong Kong, China
| | - Yanni Sun
- Electrical Engineering, City University of Hong Kong, Hong Kong, China.
| |
|