1
Wang RH, Ng YK, Zhang X, Wang J, Li SC. Coding genomes with gapped pattern graph convolutional network. Bioinformatics 2024; 40:btae188. [PMID: 38603603 PMCID: PMC11034989 DOI: 10.1093/bioinformatics/btae188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 03/11/2024] [Accepted: 04/05/2024] [Indexed: 04/13/2024] Open
Abstract
MOTIVATION: Genome sequencing technologies reveal a huge amount of genomic sequences. Neural network-based methods can be prime candidates for retrieving insights from these sequences because of their applicability to large and diverse datasets. However, the highly variable lengths of genome sequences severely impair the presentation of sequences as input to the neural network. Genetic variations further complicate tasks that involve sequence comparison or alignment. RESULTS: Inspired by the theory and applications of "spaced seeds," we propose a graph representation of genome sequences called "gapped pattern graph." These graphs can be transformed through a Graph Convolutional Network to form lower-dimensional embeddings for downstream tasks. On the basis of the gapped pattern graphs, we implemented a neural network model and demonstrated its performance on diverse tasks involving microbe and mammalian genome data. Our method consistently outperformed all the other state-of-the-art methods across various metrics on all tasks, especially for the sequences with limited homology to the training data. In addition, our model was able to identify distinct gapped pattern signatures from the sequences. AVAILABILITY AND IMPLEMENTATION: The framework is available at https://github.com/deepomicslab/GCNFrame.
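The spaced-seed idea behind gapped patterns can be illustrated with a minimal Python sketch. It is not the GCNFrame implementation and builds neither the graph nor the convolutional network; the function name `gapped_patterns` and the mask `1101` are illustrative assumptions. A mask marks positions to keep (`1`) and positions to ignore (`0`), so two windows that differ only at an ignored position yield the same pattern.

```python
from collections import Counter


def gapped_patterns(seq, mask):
    """Count gapped patterns of `seq` under a spaced-seed mask.

    `mask` is a string of '1' (keep this position) and '0' (ignore it).
    This helper is hypothetical and only illustrates the spaced-seed idea.
    """
    keep = [i for i, m in enumerate(mask) if m == "1"]
    counts = Counter()
    # Slide a window of len(mask) over the sequence, one base at a time.
    for start in range(len(seq) - len(mask) + 1):
        window = seq[start:start + len(mask)]
        # Keep only the bases at the positions the mask marks with '1'.
        counts["".join(window[i] for i in keep)] += 1
    return counts


counts = gapped_patterns("ACGTACGA", "1101")
```

Because the third position is ignored, `gapped_patterns("ACGT", "1101")` and `gapped_patterns("ACTT", "1101")` give the same counts, which is the tolerance to point variation that spaced seeds are designed to provide.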
Affiliation(s)
- Ruo Han Wang
- Department of Computer Science, City University of Hong Kong Shenzhen Research Institute, Shen Zhen, 518063, China
- Department of Computer Science, City University of Hong Kong, Hong Kong, 999077, China
- Yen Kaow Ng
- Department of Computer Science, City University of Hong Kong Shenzhen Research Institute, Shen Zhen, 518063, China
- Department of Computer Science, City University of Hong Kong, Hong Kong, 999077, China
- Xianglilan Zhang
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- Jianping Wang
- Department of Computer Science, City University of Hong Kong Shenzhen Research Institute, Shen Zhen, 518063, China
- Department of Computer Science, City University of Hong Kong, Hong Kong, 999077, China
- Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong Shenzhen Research Institute, Shen Zhen, 518063, China
- Department of Computer Science, City University of Hong Kong, Hong Kong, 999077, China
2
Nie W, Qiu T, Wei Y, Ding H, Guo Z, Qiu J. Advances in phage-host interaction prediction: in silico method enhances the development of phage therapies. Brief Bioinform 2024; 25:bbae117. [PMID: 38555471 PMCID: PMC10981677 DOI: 10.1093/bib/bbae117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Revised: 01/15/2024] [Accepted: 03/02/2024] [Indexed: 04/02/2024] Open
Abstract
Phages can specifically recognize and kill bacteria, which lead to important application value of bacteriophage in bacterial identification and typing, livestock aquaculture and treatment of human bacterial infection. Considering the variety of human-infected bacteria and the continuous discovery of numerous pathogenic bacteria, screening suitable therapeutic phages that are capable of infecting pathogens from massive phage databases has been a principal step in phage therapy design. Experimental methods to identify phage-host interaction (PHI) are time-consuming and expensive; high-throughput computational method to predict PHI is therefore a potential substitute. Here, we systemically review bioinformatic methods for predicting PHI, introduce reference databases and in silico models applied in these methods and highlight the strengths and challenges of current tools. Finally, we discuss the application scope and future research direction of computational prediction methods, which contribute to the performance improvement of prediction models and the development of personalized phage therapy.
Affiliation(s)
- Wanchun Nie
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Tianyi Qiu
- Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
- Shanghai Institute of Infectious Disease and Biosecurity, Fudan University, Shanghai, 200032, China
- Yiwen Wei
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Hao Ding
- Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
- Zhixiang Guo
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Jingxuan Qiu
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
3
Choudhary A, Midha T, Gulati I, Baranwal S. Isolation, Genomic Characterization of Shigella prophage fPSFA that effectively infects multi-drug resistant Shigella isolates from the Indian Poultry Sector. Microb Pathog 2024; 188:106538. [PMID: 38184177 DOI: 10.1016/j.micpath.2024.106538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 12/28/2023] [Accepted: 01/04/2024] [Indexed: 01/08/2024]
Abstract
Because of uncontrolled use of antibiotics, emergence of multidrug-resistant Shigella species poses a huge potential of zoonotic transfer from poultry sector. With increasing resistance to current antibiotics, there is a critical need to explore antibiotic alternatives. Using a Shigella flexneri reference strain, we isolated a novel fPSFA phage after inducing with mitomycin C. The phage was found to be stable for wide ranges of temperature -20 °C-65 °C and pH 3 to 11. fPSFA shows a latent period that ranges from 20 to 30 min and generation times of 50-60 min. The genome analysis of phage reveals two major contigs of 23788 bp and 23285 bp with 50.16 % and 39.33 % G + C content containing a total of 80 CDS and 2 tRNA genes. The phage belongs to Straboviridae family and lacks any virulence or antimicrobial resistance gene, thus making it a suitable candidate for treatment of drug-resistant infections. To confirm lytic ability of novel phage, we isolated 54 multidrug-resistant Shigella species from thirty-five poultry fecal samples that shows multiple antibiotic resistance index ranging from 0.15 to 0.75 (from 3 Indian states). The fPSFA showed lytic activity against multidrug-resistant Shigella isolates (73.08 %) (MARI≥0.50). The wide host ranges of fPSFA phage demonstrate its potential to be used as a biocontrol agent.
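The multiple antibiotic resistance (MAR) index cited in this abstract is conventionally the number of antibiotics an isolate resists divided by the number of antibiotics tested. A minimal sketch of that arithmetic follows; the panel size of 20 antibiotics is a hypothetical assumption, since the abstract does not state how many antibiotics were tested, and `mar_index` is an illustrative name.

```python
def mar_index(resistant, tested):
    """Return the MAR index: antibiotics resisted / antibiotics tested."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    if not 0 <= resistant <= tested:
        raise ValueError("resistant must lie between 0 and tested")
    return resistant / tested


# Hypothetical panel of 20 antibiotics (an assumption, not from the abstract).
low = mar_index(3, 20)    # isolate resistant to 3 of 20 antibiotics
high = mar_index(15, 20)  # isolate resistant to 15 of 20 antibiotics
```

Under this assumed panel, the two calls reproduce the ends of the 0.15 to 0.75 range quoted in the abstract, and an isolate meets the MARI≥0.50 cut-off once it resists at least 10 of the 20 antibiotics.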
Affiliation(s)
- Aaina Choudhary
- Department of Microbiology, School of Basic Sciences, Central University of Punjab, VPO Ghudda, Bathinda, 151401, India
- Tushar Midha
- Department of Microbiology, School of Basic Sciences, Central University of Punjab, VPO Ghudda, Bathinda, 151401, India
- Ishita Gulati
- Department of Microbiology, School of Basic Sciences, Central University of Punjab, VPO Ghudda, Bathinda, 151401, India
- Somesh Baranwal
- Department of Microbiology, School of Basic Sciences, Central University of Punjab, VPO Ghudda, Bathinda, 151401, India.
4
Cisneros-Martínez AM, Rodriguez-Cruz UE, Alcaraz LD, Becerra A, Eguiarte LE, Souza V. Comparative evaluation of bioinformatic tools for virus-host prediction and their application to a highly diverse community in the Cuatro Ciénegas Basin, Mexico. PLoS One 2024; 19:e0291402. [PMID: 38300968 PMCID: PMC10833507 DOI: 10.1371/journal.pone.0291402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 12/27/2023] [Indexed: 02/03/2024] Open
Abstract
Due to the enormous diversity of non-culturable viruses, new viruses must be characterized using culture-independent techniques. The associated host is an important phenotypic feature that can be inferred from metagenomic viral contigs thanks to the development of several bioinformatic tools. Here, we compare the performance of recently developed virus-host prediction tools on a dataset of 1,046 virus-host pairs and then apply the best-performing tools to a metagenomic dataset derived from a highly diverse transiently hypersaline site known as the Archaean Domes (AD) within the Cuatro Ciénegas Basin, Coahuila, Mexico. Among host-dependent methods, alignment-based approaches had a precision of 66.07% and a sensitivity of 24.76%, while alignment-free methods had an average precision of 75.7% and a sensitivity of 57.5%. RaFAH, a virus-dependent alignment-based tool, had the best overall performance (F1_score = 95.7%). However, when predicting the host of AD viruses, methods based on public reference databases (such as RaFAH) showed lower inter-method agreement than host-dependent methods run against custom databases constructed from prokaryotes inhabiting AD. Methods based on custom databases also showed the greatest agreement between the source environment and the predicted host taxonomy, habitat, lifestyle, or metabolism. This highlights the value of including custom data when predicting hosts on a highly diverse metagenomic dataset, and suggests that using a combination of methods and qualitative validations related to the source environment and predicted host biology can increase the number of correct predictions. 
Finally, these predictions suggest that AD viruses infect halophilic archaea as well as a variety of bacteria that may be halophilic, halotolerant, alkaliphilic, thermophilic, oligotrophic, sulfate-reducing, or marine, which is consistent with the specific environment and the known geological and biological evolution of the Cuatro Ciénegas Basin and its microorganisms.
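The abstract reports precision and sensitivity for two method families but an F1 score only for RaFAH. As a rough aid to comparison, the sketch below applies the standard definition of F1 (the harmonic mean of precision and recall, where recall is the same quantity as sensitivity) to the quoted figures. The resulting values are derived here and are not reported in the abstract; the alignment-free inputs are averages across tools, so that result is only indicative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, converted from percentages to fractions.
alignment_based = f1_score(0.6607, 0.2476)
alignment_free = f1_score(0.757, 0.575)

print(f"alignment-based F1: {alignment_based:.3f}")
print(f"alignment-free F1:  {alignment_free:.3f}")
```

Both derived values fall well below the 95.7% reported for RaFAH, which matches the abstract's ranking of the tools on the benchmark dataset.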
Affiliation(s)
- Alejandro Miguel Cisneros-Martínez
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, México
- Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México, Ciudad de México, México
- Ulises E. Rodriguez-Cruz
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, México
- Doctorado en Ciencias Biomédicas, Universidad Nacional Autónoma de México, Ciudad de México, México
- Luis D. Alcaraz
- Departamento de Biología Celular, Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad de México, México
- Arturo Becerra
- Departamento de Biología Evolutiva, Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad de México, México
- Luis E. Eguiarte
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, México
- Valeria Souza
- Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Ciudad de México, México
- Centro de Estudios del Cuaternario de Fuego-Patagonia y Antártica (CEQUA), Punta Arenas, Chile
5
Qiu J, Nie W, Ding H, Dai J, Wei Y, Li D, Zhang Y, Xie J, Tian X, Wu N, Qiu T. PB-LKS: a python package for predicting phage-bacteria interaction through local K-mer strategy. Brief Bioinform 2024; 25:bbae010. [PMID: 38344864 PMCID: PMC10859729 DOI: 10.1093/bib/bbae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/16/2023] [Accepted: 01/05/2024] [Indexed: 02/15/2024] Open
Abstract
Bacteriophages can help the treatment of bacterial infections yet require in-silico models to deal with the great genetic diversity between phages and bacteria. Despite the tolerable prediction performance, the application scope of current approaches is limited to the prediction at the species level, which cannot accurately predict the relationship of phages across strain mutants. This has hindered the development of phage therapeutics based on the prediction of phage-bacteria relationships. In this paper, we present, PB-LKS, to predict the phage-bacteria interaction based on local K-mer strategy with higher performance and wider applicability. The utility of PB-LKS is rigorously validated through (i) large-scale historical screening, (ii) case study at the class level and (iii) in vitro simulation of bacterial antiphage resistance at the strain mutant level. The PB-LKS approach could outperform the current state-of-the-art methods and illustrate potential clinical utility in pre-optimized phage therapy design.
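To give a sense of what a "local" k-mer feature might look like, the sketch below counts k-mers inside sliding windows of a sequence rather than over the whole genome. It does not reproduce PB-LKS or its API: the function name `local_kmer_profiles` and the values of `k`, `window`, and `step` are hypothetical choices made purely for illustration.

```python
from collections import Counter


def local_kmer_profiles(seq, k=3, window=8, step=4):
    """Return one k-mer count profile per sliding window of `seq`.

    All parameter defaults are illustrative assumptions, not PB-LKS settings.
    """
    profiles = []
    # max(..., 0) ensures a sequence shorter than one window is still profiled.
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        segment = seq[start:start + window]
        # Count every overlapping k-mer inside this window only.
        kmers = (segment[i:i + k] for i in range(len(segment) - k + 1))
        profiles.append(Counter(kmers))
    return profiles


profiles = local_kmer_profiles("ACGTACGTACGTACGT")
```

Keeping one profile per window preserves where a k-mer signal occurs, so a change confined to a short region alters only the profiles that overlap it. Whether PB-LKS compares windows in this way is not stated in the abstract.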
Affiliation(s)
- Jingxuan Qiu
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Wanchun Nie
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Hao Ding
- Institute of Clinical Science, Zhongshan Hospital, Shanghai Institute of Infectious Disease and Biosecurity, Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
- Jia Dai
- Shanghai Institute of Phage, Shanghai Public Health Clinical Center, Fudan University, Shanghai, 201508, China
- Yiwen Wei
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Dezhi Li
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Yuxi Zhang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Junting Xie
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Xinxin Tian
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China
- Nannan Wu
- Shanghai Institute of Phage, Shanghai Public Health Clinical Center, Fudan University, Shanghai, 201508, China
- Tianyi Qiu
- Institute of Clinical Science, Zhongshan Hospital, Shanghai Institute of Infectious Disease and Biosecurity, Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
6
Du ZH, Zhong JP, Liu Y, Li JQ. Prokaryotic virus host prediction with graph contrastive augmentaion. PLoS Comput Biol 2023; 19:e1011671. [PMID: 38039280 PMCID: PMC10691718 DOI: 10.1371/journal.pcbi.1011671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 11/07/2023] [Indexed: 12/03/2023] Open
Abstract
Prokaryotic viruses, also known as bacteriophages, play crucial roles in regulating microbial communities and have the potential for phage therapy applications. Accurate prediction of phage-host interactions is essential for understanding the dynamics of these viruses and their impacts on bacterial populations. Numerous computational methods have been developed to tackle this challenging task. However, most existing prediction models can be constrained due to the substantial number of unknown interactions in comparison to the constrained diversity of available training data. To solve the problem, we introduce a model for prokaryotic virus host prediction with graph contrastive augmentation (PHPGCA). Specifically, we construct a comprehensive heterogeneous graph by integrating virus-virus protein similarity and virus-host DNA sequence similarity information. As the backbone encoder for learning node representations in the virus-prokaryote graph, we employ LGCN, a state-of-the-art graph embedding technique. Additionally, we apply graph contrastive learning to augment the node representations without the need for additional labels. We further conducted two case studies aimed at predicting the host range of multi-species phages, helping to understand the phage ecology and evolution.
Affiliation(s)
- Zhi-Hua Du
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guang-dong, China
- Jun-Peng Zhong
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guang-dong, China
- Yun Liu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guang-dong, China
- Jian-Qiang Li
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guang-dong, China
7
da Silva JD, Melo LDR, Santos SB, Kropinski AM, Xisto MF, Dias RS, da Silva Paes I, Vieira MS, Soares JJF, Porcellato D, da Silva Duarte V, de Paula SO. Genomic and proteomic characterization of vB_SauM-UFV_DC4, a novel Staphylococcus jumbo phage. Appl Microbiol Biotechnol 2023; 107:7231-7250. [PMID: 37741937 PMCID: PMC10638138 DOI: 10.1007/s00253-023-12743-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 04/03/2023] [Accepted: 08/21/2023] [Indexed: 09/25/2023]
Abstract
Staphylococcus aureus is one of the most relevant mastitis pathogens in dairy cattle, and the acquisition of antimicrobial resistance genes presents a significant health issue in both veterinary and human fields. Among the different strategies to tackle S. aureus infection in livestock, bacteriophages have been thoroughly investigated in the last decades; however, few specimens of the so-called jumbo phages capable of infecting S. aureus have been described. Herein, we report the biological, genomic, and structural proteomic features of the jumbo phage vB_SauM-UFV_DC4 (DC4). DC4 exhibited a remarkable killing activity against S. aureus isolated from the veterinary environment and stability at alkaline conditions (pH 4 to 12). The complete genome of DC4 is 263,185 bp (GC content: 25%), encodes 263 predicted CDSs (80% without an assigned function), 1 tRNA (Phe-tRNA), multisubunit RNA polymerase, and an RNA-dependent DNA polymerase. Moreover, comparative analysis revealed that DC4 can be considered a new viral species belonging to a new genus DC4 and showed a similar set of lytic proteins and depolymerase activity with closely related jumbo phages. The characterization of a new S. aureus jumbo phage increases our understanding of the diversity of this group and provides insights into the biotechnological potential of these viruses. KEY POINTS: • vB_SauM-UFV_DC4 is a new viral species belonging to a new genus within the class Caudoviricetes. • vB_SauM-UFV_DC4 carries a set of RNA polymerase subunits and an RNA-directed DNA polymerase. • vB_SauM-UFV_DC4 and closely related jumbo phages showed a similar set of lytic proteins.
Affiliation(s)
- Jéssica Duarte da Silva
- Department of Microbiology, Federal University of Viçosa, Av. Peter Henry Rolfs, S/N, Campus Universitário, Viçosa, Minas Gerais, 36570-900, Brazil
- Luís D R Melo
- Centre of Biological Engineering - CEB, University of Minho, 4710-057, Braga, Portugal
- LABBELS - Associate Laboratory, Braga, Portugal
- Sílvio B Santos
- Centre of Biological Engineering - CEB, University of Minho, 4710-057, Braga, Portugal
- LABBELS - Associate Laboratory, Braga, Portugal
- Andrew M Kropinski
- Department of Pathobiology, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Mariana Fonseca Xisto
- Department of General Biology, Federal University of Viçosa, Av. Peter Henry Rolfs, S/N, Campus Universitário, Viçosa, Minas Gerais, 36570-900, Brazil
- Roberto Sousa Dias
- Department of General Biology, Federal University of Viçosa, Av. Peter Henry Rolfs, S/N, Campus Universitário, Viçosa, Minas Gerais, 36570-900, Brazil
- Isabela da Silva Paes
- Department of General Biology, Federal University of Viçosa, Av. Peter Henry Rolfs, S/N, Campus Universitário, Viçosa, Minas Gerais, 36570-900, Brazil
- Marcella Silva Vieira
- Department of General Biology, Federal University of Viçosa, Av. Peter Henry Rolfs, S/N, Campus Universitário, Viçosa, Minas Gerais, 36570-900, Brazil
- José Júnior Ferreira Soares
- Department of General Biology, Federal University of Viçosa, Av. Peter Henry Rolfs, S/N, Campus Universitário, Viçosa, Minas Gerais, 36570-900, Brazil
- Davide Porcellato
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, P.O. Box 5003, 1432, Ås, Norway
- Vinícius da Silva Duarte
- Department of Microbiology, Federal University of Viçosa, Av. Peter Henry Rolfs, S/N, Campus Universitário, Viçosa, Minas Gerais, 36570-900, Brazil.
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, P.O. Box 5003, 1432, Ås, Norway.
- Sérgio Oliveira de Paula
- Department of General Biology, Federal University of Viçosa, Av. Peter Henry Rolfs, S/N, Campus Universitário, Viçosa, Minas Gerais, 36570-900, Brazil
8
Pan J, You Z, You W, Zhao T, Feng C, Zhang X, Ren F, Ma S, Wu F, Wang S, Sun Y. PTBGRP: predicting phage-bacteria interactions with graph representation learning on microbial heterogeneous information network. Brief Bioinform 2023; 24:bbad328. [PMID: 37742053 DOI: 10.1093/bib/bbad328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 08/14/2023] [Accepted: 08/30/2023] [Indexed: 09/25/2023] Open
Abstract
Identifying the potential bacteriophages (phage) candidate to treat bacterial infections plays an essential role in the research of human pathogens. Computational approaches are recognized as a valid way to predict bacteria and target phages. However, most of the current methods only utilize lower-order biological information without considering the higher-order connectivity patterns, which helps to improve the predictive accuracy. Therefore, we developed a novel microbial heterogeneous interaction network (MHIN)-based model called PTBGRP to predict new phages for bacterial hosts. Specifically, PTBGRP first constructs an MHIN by integrating phage-bacteria interaction (PBI) and six bacteria-bacteria interaction networks with their biological attributes. Then, different representation learning methods are deployed to extract higher-level biological features and lower-level topological features from MHIN. Finally, PTBGRP employs a deep neural network as the classifier to predict unknown PBI pairs based on the fused biological information. Experiment results demonstrated that PTBGRP achieves the best performance on the corresponding ESKAPE pathogens and PBI dataset when compared with state-of-art methods. In addition, case studies of Klebsiella pneumoniae and Staphylococcus aureus further indicate that the consideration of rich heterogeneous information enables PTBGRP to accurately predict PBI from a more comprehensive perspective. The webserver of the PTBGRP predictor is freely available at http://120.77.11.78/PTBGRP/.
Affiliation(s)
- Jie Pan
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Zhuhong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- Wencai You
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Tian Zhao
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Chenlu Feng
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Xuexia Zhang
- North China Pharmaceutical Group, Shijiazhuang 050015, Hebei, China
- National Microbial Medicine Engineering & Research Center, Shijiazhuang 050015, Hebei, China
- Fengzhi Ren
- North China Pharmaceutical Group, Shijiazhuang 050015, Hebei, China
- National Microbial Medicine Engineering & Research Center, Shijiazhuang 050015, Hebei, China
- Sanxing Ma
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Fan Wu
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Shiwei Wang
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
- Yanmei Sun
- Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, Provincial Key Laboratory of Biotechnology of Shaanxi Province, the College of Life Sciences, Northwest University, Xi'an 710069, China
9
Gonzales MEM, Ureta JC, Shrestha AMS. Protein embeddings improve phage-host interaction prediction. PLoS One 2023; 18:e0289030. [PMID: 37486915 PMCID: PMC10365317 DOI: 10.1371/journal.pone.0289030] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Accepted: 07/07/2023] [Indexed: 07/26/2023] Open
Abstract
With the growing interest in using phages to combat antimicrobial resistance, computational methods for predicting phage-host interactions have been explored to help shortlist candidate phages. Most existing models consider entire proteomes and rely on manual feature engineering, which poses difficulty in selecting the most informative sequence properties to serve as input to the model. In this paper, we framed phage-host interaction prediction as a multiclass classification problem that takes as input the embeddings of a phage's receptor-binding proteins, which are known to be the key machinery for host recognition, and predicts the host genus. We explored different protein language models to automatically encode these protein sequences into dense embeddings without the need for additional alignment or structural information. We show that the use of embeddings of receptor-binding proteins presents improvements over handcrafted genomic and protein sequence features. The highest performance was obtained using the transformer-based protein language model ProtT5, resulting in a 3% to 4% increase in weighted F1 and recall scores across different prediction confidence thresholds, compared to using selected handcrafted sequence features.
Affiliation(s)
- Mark Edward M Gonzales
- Bioinformatics Laboratory, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila, Philippines
- Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
- Jennifer C Ureta
- Bioinformatics Laboratory, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila, Philippines
- Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
- Anish M S Shrestha
- Bioinformatics Laboratory, Advanced Research Institute for Informatics, Computing and Networking, De La Salle University, Manila, Philippines
- Systems and Computational Biology Research Unit, Center for Natural Sciences and Environmental Research, De La Salle University, Manila, Philippines
- Department of Software Technology, College of Computer Studies, De La Salle University, Manila, Philippines
10
Sultan-Alolama MI, Amin A, Vijayan R, El-Tarabily KA. Isolation, Characterization, and Comparative Genomic Analysis of Bacteriophage Ec_MI-02 from Pigeon Feces Infecting Escherichia coli O157:H7. Int J Mol Sci 2023; 24:ijms24119506. [PMID: 37298457 DOI: 10.3390/ijms24119506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 05/23/2023] [Accepted: 05/26/2023] [Indexed: 06/12/2023] Open
Abstract
The most significant serotype of Shiga-toxigenic Escherichia coli that causes foodborne illnesses is Escherichia coli O157:H7. Elimination of E. coli O157:H7 during food processing and storage is a possible solution. Bacteriophages have a significant impact on bacterial populations in nature due to their ability to lyse their bacterial host. In the current study, a virulent bacteriophage, Ec_MI-02, was isolated from the feces of a wild pigeon in the United Arab Emirates (UAE) for potential future use as a bio-preservative or in phage therapy. Using a spot test and an efficiency of plating analysis, Ec_MI-02 was found to infect in addition to the propagation host, E. coli O157:H7 NCTC 12900, five different serotypes of E. coli O157:H7 (three clinical samples from infected patients, one from contaminated green salad, and one from contaminated ground beef). Based on morphology and genome analysis, Ec_MI-02 belongs to the genus Tequatrovirus under the order Caudovirales. The adsorption rate constant (K) of Ec_MI-02 was found to be 1.55 × 10⁻⁸ mL/min. The latent period was 50 min with a burst size of almost 10 plaque forming units (pfu)/host cell in the one-step growth curve when the phage Ec_MI-02 was cultivated using the propagation host E. coli O157:H7 NCTC 12900. Ec_MI-02 was found to be stable at a wide range of pH, temperature, and commonly used laboratory disinfectants. Its genome is 165,454 bp long with a GC content of 35.5% and encodes 266 protein coding genes. Ec_MI-02 has genes encoding for rI, rII, and rIII lysis inhibition proteins, which supports the observation of delayed lysis in the one-step growth curve. The current study provides additional evidence that wild birds could also be a good natural reservoir for bacteriophages that do not carry antibiotic resistance genes and could be good candidates for phage therapy.
In addition, studying the genetic makeup of bacteriophages that infect human pathogens is crucial for ensuring their safe usage in the food industry.
Affiliation(s)
- Mohamad Ismail Sultan-Alolama
- Zayed Complex for Herbal Research and Traditional Medicine, Research and Innovation Center, Department of Health, Abu Dhabi 5674, United Arab Emirates
- Department of Biology, College of Science, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Amr Amin
- Department of Biology, College of Science, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
| | - Ranjit Vijayan
- Department of Biology, College of Science, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
- The Big Data Analytics Center, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
- Zayed Center for Health Sciences, United Arab Emirates University, Al Ain P.O. Box 17666, United Arab Emirates
| | - Khaled A El-Tarabily
- Department of Biology, College of Science, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
- Khalifa Center for Genetic Engineering and Biotechnology, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
- Harry Butler Institute, Murdoch University, Murdoch, WA 6150, Australia
| |
|
11
|
Baláž A, Kajsik M, Budiš J, Szemes T, Turňa J. PHERI-Phage Host ExploRation Pipeline. Microorganisms 2023; 11:1398. [PMID: 37374901 DOI: 10.3390/microorganisms11061398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 05/23/2023] [Accepted: 05/24/2023] [Indexed: 06/29/2023] Open
Abstract
Antibiotic resistance is becoming a common problem in medicine, food, and industry, with multidrug-resistant bacterial strains occurring in all regions. One of the possible future solutions is the use of bacteriophages. Phages are the most abundant form of life in the biosphere, so it is highly likely that a specific phage can be purified against each target bacterium. The identification and consistent characterization of individual phages has been a common form of phage work and has included determining the host specificity of bacteriophages. With the advent of modern sequencing methods, a problem arose with the detailed characterization of phages identified in the environment by metagenome analysis. The solution to this problem may be to use a bioinformatic approach in the form of prediction software capable of determining a bacterial host based on the phage whole-genome sequence. The result of our research is the machine learning algorithm-based tool called PHERI. PHERI predicts the suitable bacterial host genus for the purification of individual viruses from different samples. In addition, it can identify and highlight protein sequences that are important for host selection.
Affiliation(s)
- Andrej Baláž
- Geneton Ltd., Ilkovicova 8, 841 04 Bratislava, Slovakia
- Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University, Mlynska dolina F1, 842 48 Bratislava, Slovakia
| | - Michal Kajsik
- Science Park, Comenius University, Ilkovicova 8, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Ilkovicova 6, 841 04 Bratislava, Slovakia
- Medirex Group Academy n.o., Novozamocka 1, 949 05 Nitra, Slovakia
| | - Jaroslav Budiš
- Geneton Ltd., Ilkovicova 8, 841 04 Bratislava, Slovakia
- Science Park, Comenius University, Ilkovicova 8, 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information (SCSTI), Lamacska Cesta 8/A, 811 04 Bratislava, Slovakia
| | - Tomáš Szemes
- Geneton Ltd., Ilkovicova 8, 841 04 Bratislava, Slovakia
- Science Park, Comenius University, Ilkovicova 8, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Ilkovicova 6, 841 04 Bratislava, Slovakia
| | - Ján Turňa
- Science Park, Comenius University, Ilkovicova 8, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, Ilkovicova 6, 841 04 Bratislava, Slovakia
| |
|
12
|
Lopez MES, Gontijo MTP, Cardoso RR, Batalha LS, Eller MR, Bazzolli DMS, Vidigal PMP, Mendonça RCS. Complete genome analysis of Tequatrovirus ufvareg1, a Tequatrovirus species inhibiting Escherichia coli O157:H7. Front Cell Infect Microbiol 2023; 13:1178248. [PMID: 37274318 PMCID: PMC10236363 DOI: 10.3389/fcimb.2023.1178248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 04/27/2023] [Indexed: 06/06/2023] Open
Abstract
Introduction Bacteriophages infecting human pathogens have been considered potential biocontrol agents, and studying their genetic content is essential to their safe use in the food industry. Tequatrovirus ufvareg1 is a bacteriophage named UFV-AREG1, isolated from cowshed wastewater and previously tested for its ability to inhibit Escherichia coli O157:H7. Methods T. ufvareg1 was previously isolated using E. coli O157:H7 (ATCC 43895) as a bacterial host. The same strain was used for bacteriophage propagation and the one-step growth curve. The genome of T. ufvareg1 was sequenced using 305 Illumina HiSeq, and the genome comparison was calculated by VIRIDIC and ViPTree. Results Here, we characterize its genome and compare it to other Tequatrovirus. T. ufvareg1 virions have an icosahedral head (114 x 86 nm) and a contracted tail (117 x 23 nm), with a latent period of 25 min and an average burst size of 18 phage particles per infected E. coli cell. The genome of the bacteriophage T. ufvareg1 contains 268 coding DNA sequences (CDS) and ten tRNA genes distributed in both negative and positive strands. The T. ufvareg1 genome also contains 40 promoters in its regulatory regions and two rho-independent terminators. T. ufvareg1 shares an average intergenomic similarity (VIRIDIC) of 88.77% and an average genomic similarity score (ViPTree) of 88.91% with eight four reference genomes for Tequatrovirus available in the NCBI RefSeq database. The pan-genomic analysis confirmed the high conservation of Tequatrovirus genomes. Among all CDS annotated in the T. ufvareg1 genome, there are 123 core genes, 38 softcore genes, 94 shell genes, and 13 cloud genes. None of the 268 CDS was classified as being exclusive to T. ufvareg1. Conclusion The results in this paper, combined with other previously published findings, indicate that the T. ufvareg1 bacteriophage is a potential candidate for protecting foods against E. coli O157:H7.
Affiliation(s)
- Maryoris Elisa Soto Lopez
- Departamento de Tecnologia de Alimentos, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Departamento de Ingeniería de Alimentos, Universidad de Córdoba, Montería, Colombia
| | - Marco Tulio Pardini Gontijo
- Departamento de Tecnologia de Alimentos, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Duke University, Durham, NC, United States
| | - Rodrigo Rezende Cardoso
- Departamento de Tecnologia de Alimentos, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
| | - Laís Silva Batalha
- Departamento de Tecnologia de Alimentos, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
| | - Monique Renon Eller
- Departamento de Tecnologia de Alimentos, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
| | | | | | | |
|
13
|
Zheng K, Dong Y, Liang Y, Liu Y, Zhang X, Zhang W, Wang Z, Shao H, Sung YY, Mok WJ, Wong LL, McMinn A, Wang M. Genomic diversity and ecological distribution of marine Pseudoalteromonas phages. MARINE LIFE SCIENCE & TECHNOLOGY 2023; 5:271-285. [PMID: 37275543 PMCID: PMC10232697 DOI: 10.1007/s42995-022-00160-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Accepted: 12/01/2022] [Indexed: 06/07/2023]
Abstract
Pseudoalteromonas, with a ubiquitous distribution, is one of the most abundant marine bacterial genera. It is especially abundant in the deep sea and polar seas, where it has been found to have a broad metabolic capacity and unique co-existence strategies with other organisms. However, only a few Pseudoalteromonas phages have so far been isolated and investigated and their genomic diversity and distribution patterns are still unclear. Here, the genomes, taxonomic features and distribution patterns of Pseudoalteromonas phages are systematically analyzed, based on the microbial and viral genomes and metagenome datasets. A total of 143 complete or nearly complete Pseudoalteromonas-associated phage genomes (PSAPGs) were identified, including 34 Pseudoalteromonas phage isolates, 24 proviruses, and 85 Pseudoalteromonas-associated uncultured viral genomes (UViGs); these were assigned to 47 viral clusters at the genus level. Many integrated proviruses (n = 24) and filamentous phages were detected (n = 32), suggesting the prevalence of viral lysogenic life cycle in Pseudoalteromonas. PSAPGs encoded 66 types of 249 potential auxiliary metabolic genes (AMGs) relating to peptidases and nucleotide metabolism. They may also participate in marine biogeochemical cycles through the manipulation of the metabolism of their hosts, especially in the phosphorus and sulfur cycles. Siphoviral and filamentous PSAPGs were the predominant viral lineages found in polar areas, while some myoviral and siphoviral PSAPGs encoding transposase were more abundant in the deep sea. This study has expanded our understanding of the taxonomy, phylogenetic and ecological scope of marine Pseudoalteromonas phages and deepens our knowledge of viral impacts on Pseudoalteromonas. It will provide a baseline for the study of interactions between phages and Pseudoalteromonas in the ocean. Supplementary Information The online version contains supplementary material available at 10.1007/s42995-022-00160-z.
Affiliation(s)
- Kaiyang Zheng
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, and Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao, 266100 China
| | - Yue Dong
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, and Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao, 266100 China
| | - Yantao Liang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, and Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao, 266100 China
- UMT-OUC Joint Center for Marine Studies, Qingdao, 266003 China
| | - Yundan Liu
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, and Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao, 266100 China
| | - Xinran Zhang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, and Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao, 266100 China
| | - Wenjing Zhang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, and Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao, 266100 China
| | - Ziyue Wang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, and Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao, 266100 China
| | - Hongbing Shao
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, and Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao, 266100 China
- UMT-OUC Joint Center for Marine Studies, Qingdao, 266003 China
| | - Yeong Yik Sung
- UMT-OUC Joint Center for Marine Studies, Qingdao, 266003 China
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu (UMT), 21030 Kuala Nerus, Malaysia
| | - Wen Jye Mok
- UMT-OUC Joint Center for Marine Studies, Qingdao, 266003 China
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu (UMT), 21030 Kuala Nerus, Malaysia
| | - Li Lian Wong
- UMT-OUC Joint Center for Marine Studies, Qingdao, 266003 China
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu (UMT), 21030 Kuala Nerus, Malaysia
| | - Andrew McMinn
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, and Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao, 266100 China
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart, Australia
| | - Min Wang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, and Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao, 266100 China
- UMT-OUC Joint Center for Marine Studies, Qingdao, 266003 China
- Haide College, Ocean University of China, Qingdao, 266100 China
- The Affiliated Hospital of Qingdao University, Qingdao, 266000 China
| |
|
14
|
Roux S, Camargo AP, Coutinho FH, Dabdoub SM, Dutilh BE, Nayfach S, Tritt A. iPHoP: An integrated machine learning framework to maximize host prediction for metagenome-derived viruses of archaea and bacteria. PLoS Biol 2023; 21:e3002083. [PMID: 37083735 PMCID: PMC10155999 DOI: 10.1371/journal.pbio.3002083] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Received: 08/30/2022] [Revised: 05/03/2023] [Accepted: 03/15/2023] [Indexed: 04/22/2023] Open
Abstract
The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.
Affiliation(s)
- Simon Roux
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Antonio Pedro Camargo
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | | | - Shareef M Dabdoub
- Division of Biostatistics and Computational Biology, University of Iowa College of Dentistry, Iowa City, Iowa, United States of America
| | - Bas E Dutilh
- Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University, Jena, Germany
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, the Netherlands
| | - Stephen Nayfach
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Andrew Tritt
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| |
|
15
|
do Socorro Fôro Ramos E, Bahia SL, de Oliveira Ribeiro G, Villanova F, de Pádua Milagres FA, Brustulin R, Pandey RP, Deng X, Delwart E, da Costa AC, Leal É. Characterization of Phietavirus Henu 2 in the virome of individuals with acute gastroenteritis. Virus Genes 2023; 59:464-472. [PMID: 37004601 DOI: 10.1007/s11262-023-01990-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 03/15/2023] [Indexed: 04/04/2023]
Abstract
There is a growing interest in phages as potential biotechnological tools in human health owing to the antibacterial activity of these viruses. In this study, we characterized a new member (named PhiV_005_BRA/2016) of the recently identified phage species Phietavirus Henu 2. PhiV_005_BRA/2016 was detected through metagenomic analysis of stool samples of individuals with acute gastroenteritis. PhiV_005_BRA/2016 contains double-stranded linear DNA (dsDNA) and has a genome of 43,513 base pairs (bp), with a high identity score (99%) with phages of the genus Phietavirus, species Phietavirus Henu 2. Lifestyle prediction indicated that PhiV_005_BRA/2016 is a lysogenic phage whose main host is methicillin-resistant Staphylococcus aureus (MRSA). Indeed, we found PhiV_005_BRA/2016 partially integrated in the genomes of distinct MRSA strains. Our findings highlight the importance of large-scale screening of bacteriophages to better understand the emergence of multidrug-resistant bacteria.
Affiliation(s)
- Endrya do Socorro Fôro Ramos
- Laboratório de Diversidade Viral, Instituto de Ciências Biológicais, Universidade Federal do Pará, Belém, Pará, 66075-000, Brazil
| | - Santana Lobato Bahia
- Laboratório de Diversidade Viral, Instituto de Ciências Biológicais, Universidade Federal do Pará, Belém, Pará, 66075-000, Brazil
| | - Geovani de Oliveira Ribeiro
- Laboratório de Diversidade Viral, Instituto de Ciências Biológicais, Universidade Federal do Pará, Belém, Pará, 66075-000, Brazil
| | - Fabiola Villanova
- Laboratório de Diversidade Viral, Instituto de Ciências Biológicais, Universidade Federal do Pará, Belém, Pará, 66075-000, Brazil
| | - Flávio Augusto de Pádua Milagres
- Secretaria de Saúde do Tocantins, Palmas, Tocantins, 77453-000, Brazil
- Laboratório Central de Saúde Pública do Tocantins (LACEN/TO), Palmas, Tocantins, 77016-330, Brazil
| | - Rafael Brustulin
- Secretaria de Saúde do Tocantins, Palmas, Tocantins, 77453-000, Brazil
- Laboratório Central de Saúde Pública do Tocantins (LACEN/TO), Palmas, Tocantins, 77016-330, Brazil
| | - Ramendra Pati Pandey
- Centre for Drug Design Discovery and Development (C4D), SRM University Delhi-NCR, Rajiv Gandhi Education City, Sonepat, Haryana, 131029, India
| | - Xutao Deng
- Vitalant Research Institute, 270 Masonic Avenue, San Francisco, CA, 94118-4417, USA
- Department Laboratory Medicine, University of California San Francisco, San Francisco, CA, 94143, USA
| | - Eric Delwart
- Vitalant Research Institute, 270 Masonic Avenue, San Francisco, CA, 94118-4417, USA
- Department Laboratory Medicine, University of California San Francisco, San Francisco, CA, 94143, USA
| | | | - Élcio Leal
- Laboratório de Diversidade Viral, Instituto de Ciências Biológicais, Universidade Federal do Pará, Belém, Pará, 66075-000, Brazil.
| |
|
16
|
Beamud B, García-González N, Gómez-Ortega M, González-Candelas F, Domingo-Calap P, Sanjuan R. Genetic determinants of host tropism in Klebsiella phages. Cell Rep 2023; 42:112048. [PMID: 36753420 PMCID: PMC9989827 DOI: 10.1016/j.celrep.2023.112048] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 06/02/2022] [Revised: 11/25/2022] [Accepted: 01/13/2023] [Indexed: 02/08/2023] Open
Abstract
Bacteriophages play key roles in bacterial ecology and evolution and are potential antimicrobials. However, the determinants of phage-host specificity remain elusive. Here, we isolate 46 phages to challenge 138 representative clinical isolates of Klebsiella pneumoniae, a widespread opportunistic pathogen. Spot tests show a narrow host range for most phages, with <2% of 6,319 phage-host combinations tested yielding detectable interactions. Bacterial capsule diversity is the main factor restricting phage host range. Consequently, phage-encoded depolymerases are key determinants of host tropism, and depolymerase sequence types are associated with the ability to infect specific capsular types across phage families. However, all phages with a broader host range found do not encode canonical depolymerases, suggesting alternative modes of entry. These findings expand our knowledge of the complex interactions between bacteria and their viruses and point out the feasibility of predicting the first steps of phage infection using bacterial and phage genome sequences.
Affiliation(s)
- Beatriz Beamud
- Joint Research Unit Infection and Public Health, FISABIO-Universitat de València, 46020 València, Spain; Institute for Integrative Systems Biology (I(2)SysBio), Universitat de València-CSIC, 46980 Paterna, Spain
| | - Neris García-González
- Joint Research Unit Infection and Public Health, FISABIO-Universitat de València, 46020 València, Spain; Institute for Integrative Systems Biology (I(2)SysBio), Universitat de València-CSIC, 46980 Paterna, Spain
| | - Mar Gómez-Ortega
- Joint Research Unit Infection and Public Health, FISABIO-Universitat de València, 46020 València, Spain
| | - Fernando González-Candelas
- Joint Research Unit Infection and Public Health, FISABIO-Universitat de València, 46020 València, Spain; Institute for Integrative Systems Biology (I(2)SysBio), Universitat de València-CSIC, 46980 Paterna, Spain.
| | - Pilar Domingo-Calap
- Institute for Integrative Systems Biology (I(2)SysBio), Universitat de València-CSIC, 46980 Paterna, Spain.
| | - Rafael Sanjuan
- Institute for Integrative Systems Biology (I(2)SysBio), Universitat de València-CSIC, 46980 Paterna, Spain.
| |
|
17
|
Bajiya N, Dhall A, Aggarwal S, Raghava GPS. Advances in the field of phage-based therapy with special emphasis on computational resources. Brief Bioinform 2023; 24:6961791. [PMID: 36575815 DOI: 10.1093/bib/bbac574] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/28/2022] [Revised: 11/07/2022] [Accepted: 11/25/2022] [Indexed: 12/29/2022] Open
Abstract
In the current era, one of the major challenges is to manage the treatment of drug/antibiotic-resistant strains of bacteria. Phage therapy, a century-old technique, may serve as an alternative to antibiotics in treating bacterial infections caused by drug-resistant strains of bacteria. In this review, a systematic attempt has been made to summarize phage-based therapy in depth. This review has been divided into the following two sections: general information and computer-aided phage therapy (CAPT). In the case of general information, we cover the history of phage therapy, the mechanism of action, the status of phage-based products (approved and clinical trials) and the challenges. This review emphasizes CAPT, where we have covered primary phage-associated resources, phage prediction methods and pipelines. This review covers a wide range of databases and resources, including viral genomes and proteins, phage receptors, host genomes of phages, phage-host interactions and lytic proteins. In the post-genomic era, identifying the most suitable phage for lysing a drug-resistant strain of bacterium is crucial for developing alternate treatments for drug-resistant bacteria and this remains a challenging problem. Thus, we compile all phage-associated prediction methods that include the prediction of phages for a bacterial strain, the host for a phage and the identification of interacting phage-host pairs. Most of these methods have been developed using machine learning and deep learning techniques. This review also discussed recent advances in the field of CAPT, where we briefly describe computational tools available for predicting phage virions, the life cycle of phages and prophage identification. Finally, we describe phage-based therapy's advantages, challenges and opportunities.
Affiliation(s)
- Nisha Bajiya
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Suchet Aggarwal
- Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
| |
|
18
|
Iuchi H, Kawasaki J, Kubo K, Fukunaga T, Hokao K, Yokoyama G, Ichinose A, Suga K, Hamada M. Bioinformatics approaches for unveiling virus-host interactions. Comput Struct Biotechnol J 2023; 21:1774-1784. [PMID: 36874163 PMCID: PMC9969756 DOI: 10.1016/j.csbj.2023.02.044] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/17/2022] [Revised: 02/22/2023] [Accepted: 02/22/2023] [Indexed: 03/03/2023] Open
Abstract
The coronavirus disease-2019 (COVID-19) pandemic has elucidated major limitations in the capacity of medical and research institutions to appropriately manage emerging infectious diseases. We can improve our understanding of infectious diseases by unveiling virus-host interactions through host range prediction and protein-protein interaction prediction. Although many algorithms have been developed to predict virus-host interactions, numerous issues remain to be solved, and the entire network remains veiled. In this review, we comprehensively surveyed algorithms used to predict virus-host interactions. We also discuss the current challenges, such as dataset biases toward highly pathogenic viruses, and the potential solutions. The complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.
Affiliation(s)
- Hitoshi Iuchi
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
| | - Junna Kawasaki
- Faculty of Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Kento Kubo
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan; School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Tsukasa Fukunaga
- Waseda Institute for Advanced Study, Waseda University, Nishi Waseda, Shinjuku-ku, Tokyo 169-0051, Japan
| | - Koki Hokao
- School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Gentaro Yokoyama
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan; School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Akiko Ichinose
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Kanta Suga
- School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan
| | - Michiaki Hamada
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan; Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan; School of Advanced Science and Engineering, Waseda University, Okubo Shinjuku-ku, Tokyo 169-8555, Japan; Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
| |
|
19
|
Fan X, Ji M, Sun K, Li Q. Microbial and phage communities as well as their interaction in PO saponification wastewater treatment systems. WATER SCIENCE AND TECHNOLOGY: A JOURNAL OF THE INTERNATIONAL ASSOCIATION ON WATER POLLUTION RESEARCH 2023; 87:354-365. [PMID: 36706286 DOI: 10.2166/wst.2022.422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Viruses, or phages, are considered to affect microbial community composition, metabolic processes, and biogeochemical cycles. However, phage communities and their potential associations with the microbial community are not well understood in the activated sludge (AS) of wastewater treatment plants (WWTPs). In this study, we explored the interactions between phages and the microbial community using propylene oxide (PO) saponification WWTPs as an example. Bacterial, eukaryal and archaeal communities were investigated and 34 phage contigs (>10 kb) were recovered from PO saponification WWTPs. At least 3 complete phage genomes were assembled. Hosts were predicted for 21 of the 34 phages. The association network analysis showed that abundant phages were associated with abundant microorganisms, a result consistent with the Kill-the-Winner model. Notably, 45 auxiliary metabolic genes (AMGs) were identified from phage genomes (including small contig fragments). They influenced bacterial metabolism by facilitating phage replication and avoiding host death. Collectively, our results suggest that the phage community affects the microbial community and metabolic pathways by killing hosts and transferring AMGs in the AS of PO saponification WWTPs.
Affiliation(s)
- Xiangyu Fan
- School of Biological Science and Technology, University of Jinan, Jinan 250022, China
| | - Mengzhi Ji
- School of Biological Science and Technology, University of Jinan, Jinan 250022, China
| | - Kaili Sun
- School of Biological Science and Technology, University of Jinan, Jinan 250022, China
| | - Qiang Li
- School of Biological Science and Technology, University of Jinan, Jinan 250022, China
| |
|
20
|
Andrianjakarivony HF, Bettarel Y, Armougom F, Desnues C. Phage-Host Prediction Using a Computational Tool Coupled with 16S rRNA Gene Amplicon Sequencing. Viruses 2022; 15:76. [PMID: 36680116 PMCID: PMC9862649 DOI: 10.3390/v15010076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 12/13/2022] [Accepted: 12/20/2022] [Indexed: 12/29/2022] Open
Abstract
Metagenomics studies have revealed tremendous viral diversity in aquatic environments. Yet, while the genomic data they have provided is extensive, it is unannotated. For example, most phage sequences lack accurate information about their bacterial host, which prevents reliable phage identification and the investigation of phage-host interactions. This study aimed to take this knowledge further, using a viral metagenomic framework to decipher the composition and diversity of phage communities and to predict their bacterial hosts. To this end, we used water and sediment samples collected from seven sites with varying contamination levels in the Ebrié Lagoon in Abidjan, Ivory Coast. The bacterial communities were characterized using the 16S rRNA metabarcoding approach, and a framework was developed to investigate the virome datasets that: (1) identified phage contigs with VirSorter and VIBRANT; (2) classified these contigs with MetaPhinder using the phage database (taxonomic annotation); and (3) predicted the phages' bacterial hosts with a machine learning-based tool: the Prokaryotic Virus-Host Predictor. The findings showed that the taxonomic profiles of phages and bacteria were specific to sediment or water samples. Phage sequences assigned to the Microviridae family were widespread in sediment samples, whereas phage sequences assigned to the Siphoviridae, Myoviridae and Podoviridae families were predominant in water samples. In terms of bacterial communities, the phyla Latescibacteria, Zixibacteria, Bacteroidetes, Acidobacteria, Calditrichaeota, Gemmatimonadetes, Cyanobacteria and Patescibacteria were most widespread in sediment samples, while the phyla Epsilonbacteraeota, Tenericutes, Margulisbacteria, Proteobacteria, Actinobacteria, Planctomycetes and Marinimicrobia were most prevalent in water samples. 
Notably, the relative abundances of bacterial communities (at the major phylum level) estimated by 16S rRNA metabarcoding and by phage-host prediction were significantly similar. These results demonstrate the reliability of this novel approach for predicting the bacterial hosts of phages from shotgun metagenomic sequencing data.
Affiliation(s)
- Harilanto Felana Andrianjakarivony
  - Microbes, Evolution, Phylogeny, and Infection (MEΦI), IHU—Méditerranée Infection, 19-21 Boulevard Jean Moulin, 13005 Marseille, France
  - Microbiologie Environnementale Biotechnologie (MEB), Mediterranean Institute of Oceanography (MIO), 163 Avenue de Luminy, 13009 Marseille, France
- Yvan Bettarel
  - MARBEC, Marine Biodiversity, Exploitation & Conservation, Université de Montpellier, CNRS, Ifremer, IRD, 093 Place Eugène Bataillon, 34090 Montpellier, France
- Fabrice Armougom
  - Microbiologie Environnementale Biotechnologie (MEB), Mediterranean Institute of Oceanography (MIO), 163 Avenue de Luminy, 13009 Marseille, France
- Christelle Desnues
  - Microbes, Evolution, Phylogeny, and Infection (MEΦI), IHU—Méditerranée Infection, 19-21 Boulevard Jean Moulin, 13005 Marseille, France
  - Microbiologie Environnementale Biotechnologie (MEB), Mediterranean Institute of Oceanography (MIO), 163 Avenue de Luminy, 13009 Marseille, France
|
21
|
Microbiome-phage interactions in inflammatory bowel disease. Clin Microbiol Infect 2022:S1198-743X(22)00506-7. [PMID: 36191844 DOI: 10.1016/j.cmi.2022.08.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 08/23/2022] [Accepted: 08/29/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Inflammatory bowel diseases (IBD) constitute a group of auto-inflammatory disorders impacting the gastrointestinal tract and other systemic organs. The gut microbiome contributes to IBD pathology through multiple mechanisms. Bacteriophages (hence termed phages) are viruses that are able to specifically infect bacteria. Considered as part of the gut microbiome, phages may impact bacterial community structure in various clinical contexts. Additionally, exogenous phage administration may represent a means of suppressing IBD-associated pathobionts, yet utilization of phage therapy remains at an early developmental phase. OBJECTIVES Herein, we summarize the latest advances in understanding endogenous phage impacts on the gut microbiome in health and in IBD. We highlight the prospect of phage utilization as a targeted mode of pathobiont eradication, in preventing and treating IBD manifestations and complications. SOURCES Selected peer-reviewed publications regarding the role of phages in health and in IBD, published between 2013 and 2022. CONTENT The human gut microbiome is increasingly suggested to play a significant role in the onset and progression of multiple non-communicable diseases such as IBD. Several studies suggest that this effect may be mediated by discrete disease-contributing commensals. However, eradication of such pathogenic bacteria remains a daunting unmet task. Altered community structure in IBD may be influenced by blooms of phages within the gut bacterial ecosystem. Moreover, combinations of phages specifically targeting disease-contributing pathobiont strain clades may be harnessed as potential eradication treatment preventing and treating IBD, while bearing minimal adverse impacts on the surrounding bacterial microbiome. 
IMPLICATIONS Understanding endogenous phage-gut commensal interactions in health and in IBD may enable phage utilization in precision gut microbiome editing, towards treating IBD and other non-communicable microbiome-associated diseases. Nevertheless, developing phage combination-mediated IBD pathobiont eradication treatment modalities will likely necessitate better strain-level bacterial target identification and resolution of treatment-related challenges, such as phage delivery, off-target effects, and bacterial resistance.
|
22
|
Monshizadeh M, Zomorodi S, Mortensen K, Ye Y. Revealing bacteria-phage interactions in human microbiome through the CRISPR-Cas immune systems. Front Cell Infect Microbiol 2022; 12:933516. [PMID: 36250060 PMCID: PMC9554610 DOI: 10.3389/fcimb.2022.933516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2022] [Accepted: 09/09/2022] [Indexed: 11/13/2022] Open
Abstract
The human gut microbiome is composed of a diverse consortium of microorganisms. Relatively little is known about the diversity of the bacteriophage population and their interactions with microbial organisms in the human microbiome. Due to the persistent rivalry between microbial organisms (hosts) and phages (invaders), genetic traces of phages are found in the hosts’ CRISPR-Cas adaptive immune system. Mobile genetic elements (MGEs) found in bacteria include genetic material from phages and plasmids, often resulting from invasion events. We developed a computational pipeline (BacMGEnet), which can be used for inference and exploratory analysis of putative interactions between microbial organisms and MGEs (phages and plasmids) and their interaction network. Given a collection of genomes as the input, BacMGEnet utilizes computational tools we have previously developed to characterize CRISPR-Cas systems in the genomes, which are then used to identify putative invaders from publicly available collections of phage/prophage sequences. In addition, BacMGEnet uses a greedy algorithm to summarize identified putative interactions to produce a bacteria-MGE network in a standard network format. Inferred networks can be utilized to assist further examination of the putative interactions and for discovery of interaction patterns. Here we apply the BacMGEnet pipeline to a few collections of genomic/metagenomic datasets to demonstrate its utility. BacMGEnet revealed a complex interaction network of the Phocaeicola vulgatus pangenome with its phage invaders, and the modularity analysis of the resulting network suggested differential activities of the different P. vulgatus CRISPR-Cas systems (Type I-C and Type II-C) against some phages. Analysis of the phage-bacteria interaction network of the human gut microbiome revealed a mixture of phages with a broad host range (resulting in large modules with many bacteria and phages) and phages with a narrow host range. We also showed that BacMGEnet can be used to infer phages that invade bacteria and their interactions in the wound microbiome. We anticipate that BacMGEnet will become an important tool for studying the interactions between bacteria and their invaders for microbiome research.
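The idea of using CRISPR spacers to link hosts to their invaders can be illustrated with a minimal sketch. This is not BacMGEnet's implementation: it assumes exact substring matching on either strand instead of a real alignment step, and the function names, genome names, and sequences are all hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def spacer_edges(spacers_by_host, phage_genomes):
    """Build host-phage edges weighted by the number of matching spacers.

    Assumption: a spacer "matches" a phage if it occurs exactly in the
    phage genome on either strand. Real pipelines tolerate mismatches.
    """
    edges = {}
    for host, spacers in spacers_by_host.items():
        for spacer in spacers:
            rc = reverse_complement(spacer)
            for phage, genome in phage_genomes.items():
                if spacer in genome or rc in genome:
                    edges[(host, phage)] = edges.get((host, phage), 0) + 1
    return edges


# Hypothetical toy data: two bacterial genomes and two phage genomes.
SPACERS = {
    "bact_1": ["ACGTAGGCTA", "TTGACCGATC"],
    "bact_2": ["GGATCCTTAA"],
}
PHAGES = {
    "phage_X": "AAACGTAGGCTACCGATCGGTCAATT",  # first spacer forward, second reversed
    "phage_Y": "CCGGATCCTTAAGG",
}

EDGES = spacer_edges(SPACERS, PHAGES)
for (host, phage), weight in sorted(EDGES.items()):
    print(f"{host}\t{phage}\t{weight}")  # simple tab-separated edge list
```

The weighted edge list printed at the end is one plain way to hand such a network to a graph tool; the network format that BacMGEnet itself emits is not specified in the abstract.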
|
23
|
Li J, Yang F, Xiao M, Li A. Advances and challenges in cataloging the human gut virome. Cell Host Microbe 2022; 30:908-916. [PMID: 35834962 DOI: 10.1016/j.chom.2022.06.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/16/2022] [Revised: 06/02/2022] [Accepted: 06/07/2022] [Indexed: 11/17/2022]
Abstract
The human gut virome, which is often referred to as the "dark matter" of the gut microbiome, remains understudied. A better understanding of the composition and variations of the gut virome across populations is critical for exploring its impact on diseases and health. A series of advances in the characterization of human gut virome have unveiled high genetic diversity and various functional potentials of gut viruses. Here, we summarize the recently available human gut virome databases and discuss their features, procedures, and challenges with the intention to provide a reference to researchers to use while choosing a profiling database. We also propose a "best practice" for cataloging the viral population.
Affiliation(s)
- Junhua Li
  - BGI-Shenzhen, Shenzhen 518083, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen 518083, China
- Minfeng Xiao
  - BGI-Shenzhen, Shenzhen 518083, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen 518083, China
- Aixin Li
  - BGI-Shenzhen, Shenzhen 518083, China; Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen 518083, China
|
24
|
Andrade-Martínez JS, Camelo Valera LC, Chica Cárdenas LA, Forero-Junco L, López-Leal G, Moreno-Gallego JL, Rangel-Pineros G, Reyes A. Computational Tools for the Analysis of Uncultivated Phage Genomes. Microbiol Mol Biol Rev 2022; 86:e0000421. [PMID: 35311574 PMCID: PMC9199400 DOI: 10.1128/mmbr.00004-21] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 12/15/2022] Open
Abstract
Over a century of bacteriophage research has uncovered a plethora of fundamental aspects of their biology, ecology, and evolution. Furthermore, the introduction of community-level studies through metagenomics has revealed unprecedented insights on the impact that phages have on a range of ecological and physiological processes. It was not until the introduction of viral metagenomics that we began to grasp the astonishing breadth of genetic diversity encompassed by phage genomes. Novel phage genomes have been reported from a diverse range of biomes at an increasing rate, which has prompted the development of computational tools that support the multilevel characterization of these novel phages based solely on their genome sequences. The impact of these technologies has been so large that, together with MAGs (Metagenomic Assembled Genomes), we now have UViGs (Uncultivated Viral Genomes), which are now officially recognized by the International Committee for the Taxonomy of Viruses (ICTV), and new taxonomic groups can now be created based exclusively on genomic sequence information. Even though the available tools have immensely contributed to our knowledge of phage diversity and ecology, the ongoing surge in software programs makes it challenging to keep up with them and the purpose each one is designed for. Therefore, in this review, we describe a comprehensive set of currently available computational tools designed for the characterization of phage genome sequences, focusing on five specific analyses: (i) assembly and identification of phage and prophage sequences, (ii) phage genome annotation, (iii) phage taxonomic classification, (iv) phage-host interaction analysis, and (v) phage microdiversity.
Affiliation(s)
- Juan Sebastián Andrade-Martínez
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Laura Carolina Camelo Valera
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Luis Alberto Chica Cárdenas
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Laura Forero-Junco
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - Department of Plant and Environmental Science, University of Copenhagen, Frederiksberg, Denmark
- Gamaliel López-Leal
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- J. Leonardo Moreno-Gallego
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - Department of Microbiome Science, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Guillermo Rangel-Pineros
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Alejandro Reyes
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, USA
|
25
|
Venhorst J, van der Vossen JMBM, Agamennone V. Battling Enteropathogenic Clostridia: Phage Therapy for Clostridioides difficile and Clostridium perfringens. Front Microbiol 2022; 13:891790. [PMID: 35770172 PMCID: PMC9234517 DOI: 10.3389/fmicb.2022.891790] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/08/2022] [Accepted: 04/19/2022] [Indexed: 12/17/2022] Open
Abstract
The pathogenic Clostridioides difficile and Clostridium perfringens are responsible for many health care-associated infections as well as systemic and enteric diseases. Therefore, they represent a major health threat to both humans and animals. Concerns regarding increasing antibiotic resistance (related to C. difficile and C. perfringens) have caused a surge in the pursuit of novel strategies that effectively combat pathogenic infections, including those caused by both pathogenic species. The ban on antibiotic growth promoters in the poultry industry has added to the urgency of finding novel antimicrobial therapeutics for C. perfringens. These efforts have resulted in various therapeutics, of which bacteriophages (in short, phages) show much promise, as evidenced by the Eliava Phage Therapy Center in Tbilisi, Georgia (https://eptc.ge/). Bacteriophages are a type of virus that infects bacteria. In this review, the (clinical) impact of clostridial infections in intestinal diseases is recapitulated, followed by an analysis of the current knowledge and applicability of bacteriophages and phage-derived endolysins in this disease indication. Limitations of phage and phage endolysin therapy are identified and require consideration. These include phage stability in the gastrointestinal tract, influence on gut microbiota structure/function, phage resistance development, limited host range for specific pathogenic strains, phage involvement in horizontal gene transfer, and, for phage endolysins, endolysin resistance, safety, and immunogenicity. Methods to optimize features of these therapeutic modalities, such as mutagenesis and fusion proteins, are also addressed. The future success of phage and endolysin therapies requires reliable clinical trial data for phage(-derived) products. Meanwhile, additional research efforts are essential to expand the potential of exploiting phages and their endolysins for mitigating the severe diseases caused by C. difficile and C. perfringens.
Affiliation(s)
- Jennifer Venhorst
  - Biomedical Health, Netherlands Organisation for Applied Scientific Research (TNO), Utrecht, Netherlands
- Jos M. B. M. van der Vossen
  - Microbiology and Systems Biology, Netherlands Organisation for Applied Scientific Research (TNO), Zeist, Netherlands
- Valeria Agamennone
  - Microbiology and Systems Biology, Netherlands Organisation for Applied Scientific Research (TNO), Zeist, Netherlands
|
26
|
Zhou F, Gan R, Zhang F, Ren C, Yu L, Si Y, Huang Z. PHISDetector: A Tool to Detect Diverse In Silico Phage-host Interaction Signals for Virome Studies. Genomics Proteomics Bioinformatics 2022; 20:508-523. [PMID: 35272051 PMCID: PMC9801046 DOI: 10.1016/j.gpb.2022.02.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/30/2021] [Revised: 12/22/2021] [Accepted: 02/28/2022] [Indexed: 01/26/2023]
Abstract
Phage-microbe interactions are appealing systems to study coevolution, and have also been increasingly emphasized due to their roles in human health, disease, and the development of novel therapeutics. Phage-microbe interactions leave diverse signals in bacterial and phage genomic sequences, defined as phage-host interaction signals (PHISs), which include clustered regularly interspaced short palindromic repeats (CRISPR) targeting, prophage, and protein-protein interaction signals. In the present study, we developed a novel tool, phage-host interaction signal detector (PHISDetector), to predict phage-host interactions by detecting and integrating diverse in silico PHISs, and scoring the probability of phage-host interactions using machine learning models based on PHIS features. We evaluated the performance of PHISDetector on multiple benchmark datasets and application cases. When tested on a dataset of 758 annotated phage-host pairs, PHISDetector yielded prediction accuracies of 0.51 and 0.73 at the species and genus levels, respectively, outperforming other phage-host prediction tools. When applied to 125,842 metagenomic viral contigs (mVCs) derived from 3042 geographically diverse samples, a detection rate of 54.54% could be achieved. Furthermore, PHISDetector could predict infecting phages for 85.6% of 368 multidrug-resistant (MDR) bacteria and 30% of 454 human gut bacteria obtained from the National Institutes of Health (NIH) Human Microbiome Project (HMP). The PHISDetector can be run either as a web server (http://www.microbiome-bigdata.com/PHISDetector/) for general users to study individual inputs or as a stand-alone version (https://github.com/HIT-ImmunologyLab/PHISDetector) to process massive phage contigs from virome studies.
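To make the species-level versus genus-level accuracies concrete, here is a minimal sketch of how such scores could be computed from predicted and annotated host names. It assumes hosts are given as binomial names ("Genus species") and that a genus-level hit only requires the first word to agree; the function name and the toy data are hypothetical and are not taken from PHISDetector.

```python
def host_accuracy(predicted, actual):
    """Return (species_accuracy, genus_accuracy) for paired host labels.

    Assumption: each label is a binomial name such as "Escherichia coli",
    so the genus is the first whitespace-separated token.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(predicted, actual))
    species_hits = sum(p == a for p, a in pairs)
    genus_hits = sum(p.split()[0] == a.split()[0] for p, a in pairs)
    return species_hits / len(pairs), genus_hits / len(pairs)


# Hypothetical predictions for four phages.
PREDICTED = ["Escherichia coli", "Salmonella enterica",
             "Bacillus subtilis", "Pseudomonas putida"]
ACTUAL = ["Escherichia coli", "Salmonella bongori",
          "Bacillus subtilis", "Vibrio cholerae"]

SPECIES_ACC, GENUS_ACC = host_accuracy(PREDICTED, ACTUAL)
print(SPECIES_ACC, GENUS_ACC)
```

Because every species-level hit is also a genus-level hit, genus accuracy can never be lower than species accuracy, which is consistent with the 0.51 versus 0.73 pattern reported above.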
Affiliation(s)
- Fengxia Zhou
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Rui Gan
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Fan Zhang
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Chunyan Ren
  - Department of Hematology/Oncology, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
- Ling Yu
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Yu Si
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China
- Zhiwei Huang
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150080, China (corresponding author)
|
27
|
Hungaro HM, Vidigal PMP, do Nascimento EC, Gomes da Costa Oliveira F, Gontijo MTP, Lopez MES. Genomic Characterisation of UFJF_PfDIW6: A Novel Lytic Pseudomonas fluorescens-Phage with Potential for Biocontrol in the Dairy Industry. Viruses 2022; 14:v14030629. [PMID: 35337036 PMCID: PMC8951688 DOI: 10.3390/v14030629] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/30/2022] [Revised: 03/12/2022] [Accepted: 03/15/2022] [Indexed: 02/04/2023] Open
Abstract
In this study, we present the genomic characterisation of UFJF_PfDIW6, a novel lytic Pseudomonas fluorescens-phage with potential for biocontrol in the dairy industry. This phage showed a short linear double-stranded DNA genome (~42 kb) with a GC content of 58.3% and more than 50% of the genes encoding proteins with unknown functions. Nevertheless, UFJF_PfDIW6’s genome was organised into five functional modules: DNA packaging, structural proteins, DNA metabolism, lysogeny, and host lysis. Comparative genome analysis revealed that UFJF_PfDIW6’s genome is distinct from other viral genomes available in NCBI databases, displaying a maximum coverage of 5% among all alignments. Curiously, this phage showed higher sequence coverages (38–49%) when aligned with uncharacterised prophages integrated into Pseudomonas genomes. Phages compared in this study share conserved locally collinear blocks comprising genes of the DNA packaging and structural protein modules but were primarily differentiated by the composition of the DNA metabolism and lysogeny modules. Strategies for taxonomy assignment showed that UFJF_PfDIW6 was clustered into an unclassified genus in the Podoviridae clade. Therefore, our findings indicate that this phage could represent a novel genus belonging to the Podoviridae family.
Affiliation(s)
- Humberto Moreira Hungaro
  - Departamento de Ciências Farmacêuticas, Faculdade de Farmácia, Universidade Federal de Juiz de Fora (UFJF), Juiz de Fora 36036-900, MG, Brazil
  - Correspondence: Tel.: +55-32-2102-3804 (H.M.H.); +57-310-469-02-04 (M.E.S.L.)
- Pedro Marcus Pereira Vidigal
  - Núcleo de Análise de Biomoléculas (NuBioMol), Campus da UFV, Universidade Federal de Viçosa (UFV), Viçosa 36570-900, MG, Brazil
- Edilane Cristina do Nascimento
  - Departamento de Ciências Farmacêuticas, Faculdade de Farmácia, Universidade Federal de Juiz de Fora (UFJF), Juiz de Fora 36036-900, MG, Brazil
- Felipe Gomes da Costa Oliveira
  - Departamento de Ciências Farmacêuticas, Faculdade de Farmácia, Universidade Federal de Juiz de Fora (UFJF), Juiz de Fora 36036-900, MG, Brazil
- Marco Túlio Pardini Gontijo
  - Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas (UNICAMP), Campinas 13083-872, SP, Brazil
- Maryoris Elisa Soto Lopez
  - Departamento de Engenharia de Alimentos, Universidade de Córdoba (UNICORDOBA), Córdoba 230002, Colombia
  - Correspondence: Tel.: +55-32-2102-3804 (H.M.H.); +57-310-469-02-04 (M.E.S.L.)
|
28
|
In Vitro Demonstration of Targeted Phage Therapy and Competitive Exclusion as a Novel Strategy for Decolonization of Extended-Spectrum-Cephalosporin-Resistant Escherichia coli. Appl Environ Microbiol 2022; 88:e0227621. [PMID: 35254097 DOI: 10.1128/aem.02276-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022] Open
Abstract
Extended-spectrum cephalosporin-resistant (ESC-R) Escherichia coli have disseminated in food-producing animals globally, attributed to horizontal transmission of blaCTX-M variants, as seen in the IncI1-blaCTX-M-1 plasmid. This ease of transmission, coupled with its demonstrated long-term persistence, presents a significant One Health antimicrobial resistance (AMR) risk. Bacteriophage (phage) therapy is a potential strategy in eliminating ESC-R E. coli in food-producing animals; however, it is hindered by the development of phage-resistant bacteria and phage biosafety concerns. Another alternative to antimicrobials is probiotics, with this study demonstrating that AMR-free commensal E. coli, termed competitive exclusion clones (CECs), can be used to competitively exclude ESC-R E. coli. This study isolated and characterized phages that lysed E. coli clones harboring the IncI1-blaCTX-M-1 plasmid, before investigation of the effect and synergy of phage therapy and competitive exclusion as a novel strategy for decolonizing ESC-resistant E. coli. In vitro testing demonstrated superiority in the combined therapy, reducing and possibly eliminating ESC-R E. coli through phage-mediated lysis coupled with simultaneous prevention of regrowth of phage-resistant mutants due to competitive exclusion with the CEC. Further investigation into this combined therapy in vivo is warranted, with on-farm application possibly reducing ESC-R prevalence, while constricting newly emergent ESC-R E. coli outbreaks prior to their dissemination throughout food-producing animals or humans. IMPORTANCE The emergence and global dissemination of resistance toward critically important antimicrobials, including extended-spectrum cephalosporins in the livestock sector, deepens the One Health threat of antimicrobial resistance. This resistance has the potential to disseminate to humans, directly or indirectly, nullifying these last lines of defense in life-threatening human infections. 
This study explores a novel strategy, the coadministration of bacteriophages (phages) and a competitive exclusion clone (antimicrobial-susceptible commensal E. coli), to revert an antimicrobial-resistant population to a susceptible population. While phage therapy is vulnerable to the emergence of phage-resistant bacteria, no phage-resistant bacteria emerged when a competitive exclusion clone was used in combination with the phage. Novel strategies that reduce the prevalence and slow the dissemination of extended-spectrum cephalosporin-resistant E. coli in food-producing animals have the potential to extend the time frame in which antimicrobials remain available for effective use in animal and human health.
|
29
|
Abstract
The field of metagenomics has rapidly expanded to become the go-to method for complex microbial community analyses. However, there is currently no straightforward route from metagenomics to traditional culture-based methods of strain isolation, particularly in (bacterio)phage biology, leading to an investigative bottleneck. Here, we describe a method that exploits the specific interaction between a phage receptor binding protein (RBP) and its host cell surface receptor, enabling isolation of phage-host combinations from an environmental sample. The method was successfully applied to two complex sample types, a dairy-derived whey sample and an infant fecal sample, enabling retrieval of specific and culturable phage hosts. IMPORTANCE PhRACS aims to bridge the current divide between in silico genetic analyses (i.e., phageomic studies) and traditional culture-based methodology. Through the labeling of specific bacterial hosts with fluorescently tagged recombinant phage receptor binding proteins and the isolation of tagged cells using flow cytometry, PhRACS allows the full potential of phageomic data to be realized in the wet laboratory.
|
30
|
Willenbücher K, Wibberg D, Huang L, Conrady M, Ramm P, Gätcke J, Busche T, Brandt C, Szewzyk U, Schlüter A, Barrero Canosa J, Maus I. Phage Genome Diversity in a Biogas-Producing Microbiome Analyzed by Illumina and Nanopore GridION Sequencing. Microorganisms 2022; 10:368. [PMID: 35208823 PMCID: PMC8879888 DOI: 10.3390/microorganisms10020368] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/22/2021] [Revised: 02/02/2022] [Accepted: 02/03/2022] [Indexed: 11/16/2022] Open
Abstract
The microbial biogas network is complex and intertwined, and therefore relatively stable in its overall functionality. However, if key functional groups of microorganisms are affected by biotic or abiotic factors, the entire efficacy may be impaired. Bacteriophages are hypothesized to alter the steering process of the microbial network. In this study, an enriched fraction of virus-like particles was extracted from a mesophilic biogas reactor and sequenced on the Illumina MiSeq and Nanopore GridION sequencing platforms. Metagenome data analysis resulted in identifying 375 metagenome-assembled viral genomes (MAVGs). Two-thirds of the classified sequences were only assigned to the superkingdom Viruses and the remaining third to the family Siphoviridae, followed by Myoviridae, Podoviridae, Tectiviridae, and Inoviridae. The metavirome showed a close relationship to the phage genomes that infect members of the classes Clostridia and Bacilli. Using publicly available biogas metagenomic data, a fragment recruitment approach showed the widespread distribution of the MAVGs studied in other biogas microbiomes. In particular, phage sequences from mesophilic microbiomes were highly similar to the phage sequences of this study. Accordingly, the virus particle enrichment approach and metavirome sequencing provided additional genome sequence information for novel virome members, thus expanding the current knowledge of viral genetic diversity in biogas reactors.
Affiliation(s)
- Katharina Willenbücher
  - System Microbiology, Department Bioengineering, Leibniz Institute for Agricultural Engineering and Bioeconomy (ATB), Max-Eyth-Allee 100, 14469 Potsdam, Germany
  - Environmental Microbiology, Faculty of Process Sciences, Institute of Environmental Technology, Technische Universität Berlin, Ernst-Reuter-Platz 1, 10587 Berlin, Germany
- Daniel Wibberg
  - Center for Biotechnology (CeBiTec), Genome Research of Industrial Microorganisms, Bielefeld University, Universitätsstr. 27, 33615 Bielefeld, Germany
- Liren Huang
  - Faculty of Technology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
- Marius Conrady
  - Institute of Agricultural and Urban Ecological Projects, Berlin Humboldt University (IASP), Philippstr. 13, 10115 Berlin, Germany
- Patrice Ramm
  - Institute of Agricultural and Urban Ecological Projects, Berlin Humboldt University (IASP), Philippstr. 13, 10115 Berlin, Germany
- Julia Gätcke
  - Biophysics of Photosynthesis, Institute for Biology, Humboldt-Universität zu Berlin, Philippstrasse 13, 10115 Berlin, Germany
- Tobias Busche
  - Center for Biotechnology (CeBiTec), Genome Research of Industrial Microorganisms, Bielefeld University, Universitätsstr. 27, 33615 Bielefeld, Germany
- Christian Brandt
  - Institute for Infection Medicine and Hospital Hygiene, University Hospital Jena, Kastanienstraße 1, 07747 Jena, Germany
- Ulrich Szewzyk
  - Environmental Microbiology, Faculty of Process Sciences, Institute of Environmental Technology, Technische Universität Berlin, Ernst-Reuter-Platz 1, 10587 Berlin, Germany
- Andreas Schlüter
  - Center for Biotechnology (CeBiTec), Genome Research of Industrial Microorganisms, Bielefeld University, Universitätsstr. 27, 33615 Bielefeld, Germany
- Jimena Barrero Canosa
  - Environmental Microbiology, Faculty of Process Sciences, Institute of Environmental Technology, Technische Universität Berlin, Ernst-Reuter-Platz 1, 10587 Berlin, Germany
- Irena Maus
  - Center for Biotechnology (CeBiTec), Genome Research of Industrial Microorganisms, Bielefeld University, Universitätsstr. 27, 33615 Bielefeld, Germany
|
31
|
Versoza CJ, Pfeifer SP. Computational Prediction of Bacteriophage Host Ranges. Microorganisms 2022; 10:149. [PMID: 35056598 PMCID: PMC8778386 DOI: 10.3390/microorganisms10010149] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 12/17/2021] [Revised: 01/06/2022] [Accepted: 01/11/2022] [Indexed: 12/27/2022] Open
Abstract
Increased antibiotic resistance has prompted the development of bacteriophage agents for a multitude of applications in agriculture, biotechnology, and medicine. A key factor in the choice of agents for these applications is the host range of a bacteriophage, i.e., the bacterial genera, species, and strains a bacteriophage is able to infect. Although experimental explorations of host ranges remain the gold standard, such investigations are inherently limited to a small number of viruses and bacteria amenable to cultivation. Here, we review recently developed bioinformatic tools that offer a promising and high-throughput alternative by computationally predicting the putative host ranges of bacteriophages, including those challenging to grow in laboratory environments.
Affiliation(s)
- Cyril J. Versoza
  - Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
- Susanne P. Pfeifer
  - Center for Mechanisms of Evolution, School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA
32
Menor-Flores M, Vega-Rodríguez MA, Molina F. Computational design of phage cocktails based on phage-bacteria infection networks. Comput Biol Med 2022; 142:105186. [PMID: 34998221 DOI: 10.1016/j.compbiomed.2021.105186] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/08/2021] [Revised: 12/22/2021] [Accepted: 12/26/2021] [Indexed: 01/16/2023]
Abstract
The misuse and overuse of antibiotics have boosted the proliferation of multidrug-resistant (MDR) bacteria, which are considered a major public health issue in the twenty-first century. Phage therapy may be a promising way in the treatment of infections caused by MDR pathogens, without the side effects of the current available antimicrobials. Phage therapy is based on phage cocktails, that is, combinations of phages able to lyse the target bacteria. In this work, we present and explain in detail two innovative computational methods to design phage cocktails taking into account a given phage-bacteria infection network. One of the methods (Exhaustive Search) always generates the best possible phage cocktail, while the other method (Network Metrics) always keeps a very reduced runtime (a few milliseconds). Both methods have been included in a Cytoscape application that is available for any user. A complete experimental study has been performed, evaluating and comparing the biological quality, runtime, and the impact when additional phages are included in the cocktail.
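The exhaustive-search idea mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not state the scoring function, so the objective used here (maximize the number of distinct bacteria lysed by a cocktail of fixed size) is an assumption, and the function name `best_cocktail` and the toy network are hypothetical. This is not the authors' Cytoscape implementation.

```python
from itertools import combinations


def best_cocktail(infects, size):
    """Exhaustively search all phage subsets of a given size.

    infects: dict mapping phage name -> set of bacteria it can lyse
             (a toy stand-in for a phage-bacteria infection network).
    Returns the subset covering the most bacteria, plus the covered set.
    Assumed objective: number of distinct bacteria covered.
    """
    best, best_covered = (), set()
    for combo in combinations(sorted(infects), size):
        # Union of the host sets of every phage in this candidate cocktail.
        covered = set().union(*(infects[p] for p in combo))
        if len(covered) > len(best_covered):
            best, best_covered = combo, covered
    return best, best_covered


# Hypothetical infection network, for illustration only.
network = {
    "P1": {"B1", "B2"},
    "P2": {"B2", "B3"},
    "P3": {"B3", "B4", "B5"},
    "P4": {"B1"},
}
cocktail, covered = best_cocktail(network, 2)
print(cocktail, sorted(covered))
```

The number of candidate cocktails grows combinatorially with the number of phages, which is consistent with the abstract's contrast between an exhaustive method and a faster heuristic one.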
Affiliation(s)
- Manuel Menor-Flores
  - Escuela Politécnica, Universidad de Extremadura, Avda. de la Universidad s/n, 10 003, Cáceres, Spain
- Miguel A Vega-Rodríguez
  - Escuela Politécnica, Universidad de Extremadura, Avda. de la Universidad s/n, 10 003, Cáceres, Spain
- Felipe Molina
  - Facultad de Ciencias, Universidad de Extremadura, Avda. de Elvas s/n, 06 006, Badajoz, Spain
33
Zielezinski A, Barylski J, Karlowski WM. Taxonomy-aware, sequence similarity ranking reliably predicts phage-host relationships. BMC Biol 2021; 19:223. [PMID: 34625070 PMCID: PMC8501573 DOI: 10.1186/s12915-021-01146-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/30/2021] [Accepted: 09/06/2021] [Indexed: 12/02/2022] Open
Abstract
Background Characterizing phage–host interactions is critical to understanding the ecological role of both partners and effective isolation of phage therapeuticals. Unfortunately, experimental methods for studying these interactions are markedly slow, low-throughput, and unsuitable for phages or hosts difficult to maintain in laboratory conditions. Therefore, a number of in silico methods emerged to predict prokaryotic hosts based on viral sequences. One of the leading approaches is the application of the BLAST tool that searches for local similarities between viral and microbial genomes. However, this prediction method has three major limitations: (i) top-scoring sequences do not always point to the actual host; (ii) mosaic virus genomes may match to many, typically related, bacteria; and (iii) viral and host sequences may diverge beyond the point where their relationship can be detected by a BLAST alignment. Results We created an extension to BLAST, named Phirbo, that improves host prediction quality beyond what is obtainable from standard BLAST searches. The tool harnesses information concerning sequence similarity and bacteria relatedness to predict phage–host interactions. Phirbo was evaluated on three benchmark sets of known virus–host pairs, and it improved precision and recall by 11–40 percentage points over currently available, state-of-the-art, alignment-based, alignment-free, and machine-learning host prediction tools. Moreover, the discriminatory power of Phirbo for the recognition of virus–host relationships surpassed the results of other tools by at least 10 percentage points (area under the curve = 0.95), yielding a mean host prediction accuracy of 57% and 68% at the genus and family levels, respectively; accuracy drops by 12 percentage points when only a fraction of the viral genome sequence (3 kb) is used. Finally, we provide insights into a repertoire of protein and ncRNA genes that are shared between phages and hosts and may be prone to horizontal transfer during infection. Conclusions Our results suggest that Phirbo is a simple and effective tool for predicting phage–host relationships. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-021-01146-6.
Affiliation(s)
- Andrzej Zielezinski
  - Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
- Jakub Barylski
  - Molecular Virology Research Unit, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
- Wojciech M Karlowski
  - Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
34
Ruohan W, Xianglilan Z, Jianping W, Shuai Cheng LI. DeepHost: phage host prediction with convolutional neural network. Brief Bioinform 2021; 23:6374063. [PMID: 34553750 DOI: 10.1093/bib/bbab385] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/08/2021] [Revised: 08/10/2021] [Accepted: 08/27/2021] [Indexed: 01/21/2023] Open
Abstract
Next-generation sequencing rapidly expands the set of known phage genomes. Unlike phages isolated by culture-based methods, the hosts of phages discovered from next-generation sequencing data remain uncharacterized. The high diversity of the phage genomes makes the host assignment task challenging. To solve the issue, we proposed a phage host prediction tool, DeepHost. To encode the phage genomes into matrices, we designed a genome encoding method that applies various spaced $k$-mer pairs to tolerate sequence variations, including insertions, deletions, and mutations. DeepHost applies a convolutional neural network to predict host taxonomies. DeepHost achieves a prediction accuracy of 96.05% at the genus level (72 taxonomies) and 90.78% at the species level (118 taxonomies), which outperforms the existing phage host prediction tools by 10.16-30.48% and achieves comparable results to BLAST. For the genomes without hits in BLAST, DeepHost obtains an accuracy of 38.00% at the genus level and 26.47% at the species level, making it suitable for genomes with limited homology to the sequences in existing datasets. DeepHost is alignment-free, and it is faster than BLAST, especially for large datasets. DeepHost is available at https://github.com/deepomicslab/DeepHost.
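To make the notion of a spaced k-mer concrete, the following Python sketch counts spaced k-mers under a binary mask and shows why they tolerate a point substitution better than contiguous k-mers. It is only an illustration: the abstract does not specify the masks DeepHost uses or how k-mer pairs are formed into matrices, so the mask `"101"`, the function name `spaced_kmers`, and the toy sequences are all assumptions.

```python
from collections import Counter


def spaced_kmers(seq, mask):
    """Count spaced k-mers of seq under a binary mask.

    mask: string of '1' (position is kept) and '0' (position is ignored),
          e.g. "101" keeps the first and third base of each 3-base window.
    """
    keep = [i for i, m in enumerate(mask) if m == "1"]
    counts = Counter()
    for start in range(len(seq) - len(mask) + 1):
        window = seq[start:start + len(mask)]
        counts["".join(window[i] for i in keep)] += 1
    return counts


ref = "ACGTAC"
var = "ACTTAC"  # same sequence with one substitution (G -> T)

# The spaced mask ignores the middle base of each window.
shared_spaced = set(spaced_kmers(ref, "101")) & set(spaced_kmers(var, "101"))
# A contiguous 3-mer is the same function with an all-ones mask.
shared_contiguous = set(spaced_kmers(ref, "111")) & set(spaced_kmers(var, "111"))
print(sorted(shared_spaced), sorted(shared_contiguous))
```

In this toy case the two sequences share two spaced k-mers but only one contiguous 3-mer, because the window whose ignored position falls on the substitution is unaffected.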
Affiliation(s)
- Wang Ruohan
  - Department of Computer Science at City University of Hong Kong
- Zhang Xianglilan
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology
- Wang Jianping
  - Department of Computer Science at City University of Hong Kong
- Li Shuai Cheng
  - Department of Computer Science at City University of Hong Kong
35
Sørensen AN, Woudstra C, Sørensen MCH, Brøndsted L. Subtypes of tail spike proteins predicts the host range of Ackermannviridae phages. Comput Struct Biotechnol J 2021; 19:4854-4867. [PMID: 34527194 PMCID: PMC8432352 DOI: 10.1016/j.csbj.2021.08.030] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/22/2021] [Revised: 08/19/2021] [Accepted: 08/19/2021] [Indexed: 12/01/2022] Open
Abstract
Phages belonging to the Ackermannviridae family encode up to four tail spike proteins (TSPs), each recognizing a specific receptor of their bacterial hosts. Here, we determined the TSP diversity of 99 Ackermannviridae phages by performing a comprehensive in silico analysis. Based on sequence diversity, we assigned all TSPs into distinctive subtypes of TSP1, TSP2, TSP3 and TSP4, and found each TSP subtype to be specifically associated with the genera (Kuttervirus, Agtrevirus, Limestonevirus, Taipeivirus) of the Ackermannviridae family. Further analysis showed that the N-terminal XD1 and XD2 domains in TSP2 and TSP4, hinging the four TSPs together, are preserved. In contrast, the C-terminal receptor binding modules were only conserved within TSP subtypes, except for some Kuttervirus TSP1s and TSP3s that were similar to specific TSP4s. A conserved motif in TSP1, TSP3 and TSP4 of Kuttervirus phages may allow recombination between receptor binding modules, thus altering host recognition. The receptors for numerous uncharacterized phages expressing TSPs in the same subtypes were predicted using previous host range data. To validate our predictions, we experimentally determined the host recognition of three of the four TSPs expressed by kuttervirus S117. We confirmed that S117 TSP1 and TSP2 bind to their predicted host receptors, and identified the receptor for TSP3, which is shared by 51 other Kuttervirus phages. Kuttervirus phages were thus shown to encode a vast genetic diversity of potentially exchangeable TSPs influencing host recognition. Overall, our study demonstrates that comprehensive in silico and host range analysis of TSPs can predict host recognition of Ackermannviridae phages.
Key Words
- ANI, Average nucleotide identity
- Ackermannviridae family
- Bacteriophage
- CPS, Capsular polysaccharide
- EOP, Efficiency of plating
- Escherichia coli O:157
- Host range
- LB, Luria-Bertani
- LPS, Lipopolysaccharide
- NCBI, National Center for Biotechnology Information
- O-antigen
- ORF, Open reading frame
- PFU, Plaque formation unit
- RBP, Receptor binding protein
- Receptor-binding proteins
- Salmonella
- TSP, Tail spike protein
- Tail spike proteins
- VriC, Virulence-associated protein
Affiliation(s)
- Anders Nørgaard Sørensen
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Stigbøjlen 4, 1870 Frederiksberg C, Denmark
- Cedric Woudstra
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Stigbøjlen 4, 1870 Frederiksberg C, Denmark
- Martine C Holst Sørensen
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Stigbøjlen 4, 1870 Frederiksberg C, Denmark
- Lone Brøndsted
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Stigbøjlen 4, 1870 Frederiksberg C, Denmark
36
Wu S, Fang Z, Tan J, Li M, Wang C, Guo Q, Xu C, Jiang X, Zhu H. DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach. Gigascience 2021; 10:giab056. [PMID: 34498685 PMCID: PMC8427542 DOI: 10.1093/gigascience/giab056] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Prokaryotic viruses referred to as phages can be divided into virulent and temperate phages. Distinguishing virulent and temperate phage-derived sequences in metavirome data is important for elucidating their different roles in interactions with bacterial hosts and regulation of microbial communities. However, there is no experimental or computational approach to effectively classify their sequences in culture-independent metavirome. We present a new computational method, DeePhage, which can directly and rapidly judge each read or contig as a virulent or temperate phage-derived fragment. FINDINGS DeePhage uses a "one-hot" encoding form to represent DNA sequences in detail. Sequence signatures are detected via a convolutional neural network to obtain valuable local features. The accuracy of DeePhage on 5-fold cross-validation reaches as high as 89%, nearly 10% and 30% higher than that of 2 similar tools, PhagePred and PHACTS. On real metavirome, DeePhage correctly predicts the highest proportion of contigs when using BLAST as annotation, without apparent preferences. Besides, DeePhage reduces running time vs PhagePred and PHACTS by 245 and 810 times, respectively, under the same computational configuration. By direct detection of the temperate viral fragments from metagenome and metavirome, we furthermore propose a new strategy to explore phage transformations in the microbial community. The ability to detect such transformations provides us a new insight into the potential treatment for human disease. CONCLUSIONS DeePhage is a novel tool developed to rapidly and efficiently identify 2 kinds of phage fragments especially for metagenomics analysis. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.
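The "one-hot" representation named in this abstract can be sketched in a few lines of Python. The column order A, C, G, T and the all-zero row for ambiguous bases such as N are assumptions made for illustration; the abstract does not say how DeePhage orders channels or handles ambiguity, and the function name `one_hot` is hypothetical.

```python
def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (assumed order: A, C, G, T).

    Bases outside A/C/G/T (for example N) map to an all-zero row,
    which is one common convention and an assumption here.
    """
    table = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    return [table.get(base, [0, 0, 0, 0]) for base in seq.upper()]


encoded = one_hot("acgn")
print(encoded)
```

A matrix of this shape (sequence length by 4) is the usual kind of input on which a convolutional network can detect local sequence signatures.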
Affiliation(s)
- Shufang Wu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Zhencheng Fang
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Jie Tan
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Mo Li
  - Peking University-Tsinghua University - National Institute of Biological Sciences (PTN) joint PhD program, School of Life Sciences, Peking University, Beijing 100871, Beijing, China
- Chunhui Wang
  - Peking University-Tsinghua University - National Institute of Biological Sciences (PTN) joint PhD program, School of Life Sciences, Peking University, Beijing 100871, Beijing, China
- Qian Guo
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, GA 30332, Atlanta, USA
- Congmin Xu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, GA 30332, Atlanta, USA
- Xiaoqing Jiang
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
- Huaiqiu Zhu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, Beijing, China
  - Center for Quantitative Biology, Peking University, Beijing 100871, Beijing, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, GA 30332, Atlanta, USA
  - Institute of Medical Technology, Peking University Health Science Center, Beijing 100191, Beijing, China
37
Li M, Wang Y, Li F, Zhao Y, Liu M, Zhang S, Bin Y, Smith AI, Webb GI, Li J, Song J, Xia J. A Deep Learning-Based Method for Identification of Bacteriophage-Host Interaction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1801-1810. [PMID: 32813660 PMCID: PMC8703204 DOI: 10.1109/tcbb.2020.3017386] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 06/11/2023]
Abstract
Multi-drug resistance (MDR) has become one of the greatest threats to human health worldwide, and novel treatment methods of infections caused by MDR bacteria are urgently needed. Phage therapy is a promising alternative to solve this problem, to which the key is correctly matching target pathogenic bacteria with the corresponding therapeutic phage. Deep learning is powerful for mining complex patterns to generate accurate predictions. In this study, we develop PredPHI (Predicting Phage-Host Interactions), a deep learning-based tool capable of predicting the host of phages from sequence data. We collect >3000 phage-host pairs along with their protein sequences from PhagesDB and GenBank databases and extract a set of features. Then we select high-quality negative samples based on the K-Means clustering method and construct a balanced training set. Finally, we employ a deep convolutional neural network to build the predictive model. The results indicate that PredPHI can achieve a predictive performance of 81 percent in terms of the area under the receiver operating characteristic curve on the test set, and the clustering-based method is significantly more robust than that based on randomly selecting negative samples. These results highlight that PredPHI is a useful and accurate tool for identifying phage-host interactions from sequence data.
38
Li M, Zhang W. PHIAF: prediction of phage-host interactions with GAN-based data augmentation and sequence-based feature fusion. Brief Bioinform 2021; 23:6362109. [PMID: 34472593 DOI: 10.1093/bib/bbab348] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/02/2021] [Revised: 07/05/2021] [Accepted: 07/18/2021] [Indexed: 01/01/2023] Open
Abstract
Phage therapy has become one of the most promising alternatives to antibiotics in the treatment of bacterial diseases, and identifying phage-host interactions (PHIs) helps to understand the possible mechanism through which a phage infects bacteria to guide the development of phage therapy. Compared with wet experiments, computational methods of identifying PHIs can reduce costs and save time and are more effective and economic. In this paper, we propose a PHI prediction method with a generative adversarial network (GAN)-based data augmentation and sequence-based feature fusion (PHIAF). First, PHIAF applies a GAN-based data augmentation module, which generates pseudo PHIs to alleviate the data scarcity. Second, PHIAF fuses the features originated from DNA and protein sequences for better performance. Third, PHIAF utilizes an attention mechanism to consider different contributions of DNA/protein sequence-derived features, which also provides interpretability of the prediction model. In computational experiments, PHIAF outperforms other state-of-the-art PHI prediction methods when evaluated via 5-fold cross-validation (AUC and AUPR are 0.88 and 0.86, respectively). An ablation study shows that data augmentation, feature fusion and an attention mechanism are all beneficial to improve the prediction performance of PHIAF. Additionally, four new PHIs with the highest PHIAF score in the case study were verified by recent literature. In conclusion, PHIAF is a promising tool to accelerate the exploration of phage therapy.
Affiliation(s)
- Menglu Li
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
39
Guo Q, Li M, Wang C, Guo J, Jiang X, Tan J, Wu S, Wang P, Xiao T, Zhou M, Fang Z, Xiao Y, Zhu H. Predicting hosts based on early SARS-CoV-2 samples and analyzing the 2020 pandemic. Sci Rep 2021; 11:17422. [PMID: 34465838 PMCID: PMC8408148 DOI: 10.1038/s41598-021-96903-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/09/2021] [Accepted: 08/18/2021] [Indexed: 11/16/2022] Open
Abstract
The SARS-CoV-2 pandemic has raised concerns in the identification of the hosts of the virus since the early stages of the outbreak. To address this problem, we proposed a deep learning method, DeepHoF, based on extracting viral genomic features automatically, to predict the host likelihood scores on five host types, including plant, germ, invertebrate, non-human vertebrate and human, for novel viruses. DeepHoF made up for the lack of an accurate tool, reaching a satisfactory AUC of 0.975 in the five-classification, and could make a reliable prediction for the novel viruses without close neighbors in phylogeny. Additionally, to fill the gap in the efficient inference of host species for SARS-CoV-2 using existing tools, we conducted a deep analysis on the host likelihood profile calculated by DeepHoF. Using the isolates sequenced in the earliest stage of the COVID-19 pandemic, we inferred that minks, bats, dogs and cats were potential hosts of SARS-CoV-2, while minks might be one of the most noteworthy hosts. Several genes of SARS-CoV-2 demonstrated their significance in determining the host range. Furthermore, a large-scale genome analysis, based on DeepHoF's computation for the later pandemic in 2020, disclosed the uniformity of host range among SARS-CoV-2 samples and the strong association of SARS-CoV-2 between humans and minks.
Affiliation(s)
- Qian Guo
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, 30332, USA
- Mo Li
  - Peking University-Tsinghua University-National Institute of Biological Sciences (PTN) Joint PhD Program, School of Life Sciences, Peking University, Beijing, 100871, China
- Chunhui Wang
  - Peking University-Tsinghua University-National Institute of Biological Sciences (PTN) Joint PhD Program, School of Life Sciences, Peking University, Beijing, 100871, China
- Jinyuan Guo
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, 30332, USA
- Xiaoqing Jiang
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
  - Institute of Medical Technology, Peking University Health Science Center, Beijing, 100191, China
- Jie Tan
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
- Shufang Wu
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Peihong Wang
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
- Tingting Xiao
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, 310006, China
- Man Zhou
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Zhencheng Fang
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Yonghong Xiao
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, 310006, China
- Huaiqiu Zhu
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, 30332, USA
  - Institute of Medical Technology, Peking University Health Science Center, Beijing, 100191, China
40
Tan J, Fang Z, Wu S, Guo Q, Jiang X, Zhu H. HoPhage: an ab initio tool for identifying hosts of phage fragments from metaviromes. Bioinformatics 2021; 38:543-545. [PMID: 34383025 PMCID: PMC8723153 DOI: 10.1093/bioinformatics/btab585] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/10/2021] [Revised: 07/27/2021] [Accepted: 08/10/2021] [Indexed: 02/03/2023] Open
Abstract
SUMMARY We present HoPhage (Host of Phage) to identify the host of a given phage fragment from metavirome data at the genus level. HoPhage integrates two modules using a deep learning algorithm and a Markov chain model, respectively. HoPhage achieves 47.90% and 82.47% mean accuracy at the genus and phylum levels for ∼1-kb long artificial phage fragments when predicting host among 50 genera, representing 7.54-20.22% and 13.55-24.31% improvement, respectively. By testing on three real virome samples, HoPhage yields 81.11% mean accuracy at the genus level within a much broader candidate host range. AVAILABILITY AND IMPLEMENTATION HoPhage is available at http://cqb.pku.edu.cn/ZhuLab/HoPhage/data/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jie Tan
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Zhencheng Fang
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Shufang Wu
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Qian Guo
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
- Xiaoqing Jiang
  - State Key Laboratory for Turbulence and Complex Systems, Department of Biomedical Engineering, College of Engineering and Center for Quantitative Biology, Peking University, Beijing 100871, China
41
Abstract
Viruses are the most abundant biological entity on Earth, infect cellular organisms from all domains of life, and are central players in the global biosphere. Over the last century, the discovery and characterization of viruses have progressed steadily alongside much of modern biology. In terms of outright numbers of novel viruses discovered, however, the last few years have been by far the most transformative for the field. Advances in methods for identifying viral sequences in genomic and metagenomic datasets, coupled to the exponential growth of environmental sequencing, have greatly expanded the catalog of known viruses and fueled the tremendous growth of viral sequence databases. Development and implementation of new standards, along with careful study of the newly discovered viruses, have transformed and will continue to transform our understanding of microbial evolution, ecology, and biogeochemical cycles, leading to new biotechnological innovations across many diverse fields, including environmental, agricultural, and biomedical sciences.
Affiliation(s)
- Lee Call
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Stephen Nayfach
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Nikos C Kyrpides
  - Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
42
Nami Y, Imeni N, Panahi B. Application of machine learning in bacteriophage research. BMC Microbiol 2021; 21:193. [PMID: 34174831 PMCID: PMC8235560 DOI: 10.1186/s12866-021-02256-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/13/2021] [Accepted: 06/08/2021] [Indexed: 12/20/2022] Open
Abstract
Phages are one of the key components in the structure, dynamics, and interactions of microbial communities in different bins. They have a clear impact on human health and the food industry. Bacteriophage characterization using in vitro approaches is time-consuming, costly, and laborious. On the other hand, with the advent of new high-throughput sequencing technology, the development of a powerful computational framework to characterize the newly identified bacteriophages is inevitable for future research. Machine learning includes powerful techniques that enable the analysis of complex datasets for knowledge discovery and pattern recognition. In this study, we have conducted a comprehensive review of machine learning methods, using different types of features, that were applied in various aspects of bacteriophage research, including automated curation, identification, classification, host species recognition, virion protein identification, and life cycle prediction. Moreover, potential limitations and advantages of the developed frameworks are discussed.
Affiliation(s)
- Yousef Nami
  - Department of Food Biotechnology, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
- Nazila Imeni
  - Young Researchers and Elite Club, Marand Branch, Islamic Azad University, Marand, Iran
- Bahman Panahi
  - Department of Genomics, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Tabriz, Iran
43
Coutinho FH, Zaragoza-Solas A, López-Pérez M, Barylski J, Zielezinski A, Dutilh BE, Edwards R, Rodriguez-Valera F. RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content. Patterns 2021; 2:100274. [PMID: 34286299 PMCID: PMC8276007 DOI: 10.1016/j.patter.2021.100274] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 10/28/2020] [Revised: 11/23/2020] [Accepted: 05/07/2021] [Indexed: 02/06/2023]
Abstract
Culture-independent approaches have recently shed light on the genomic diversity of viruses of prokaryotes. One fundamental question when trying to understand their ecological roles is: which host do they infect? To tackle this issue we developed a machine-learning approach named Random Forest Assignment of Hosts (RaFAH), that uses scores to 43,644 protein clusters to assign hosts to complete or fragmented genomes of viruses of Archaea and Bacteria. RaFAH displayed performance comparable with that of other methods for virus-host prediction in three different benchmarks encompassing viruses from RefSeq, single amplified genomes, and metagenomes. RaFAH was applied to assembled metagenomic datasets of uncultured viruses from eight different biomes of medical, biotechnological, and environmental relevance. Our analyses led to the identification of 537 sequences of archaeal viruses representing unknown lineages, whose genomes encode novel auxiliary metabolic genes, shedding light on how these viruses interfere with the host molecular machinery. RaFAH is available at https://sourceforge.net/projects/rafah/.
- RaFAH was developed to predict the hosts of viruses of Bacteria and Archaea
- RaFAH displayed comparable or superior performance to other host-prediction tools
- RaFAH performed well across viromes from eight different ecosystems
- RaFAH identified hundreds of genomic sequences as derived from viruses of Archaea
Viruses that infect Bacteria and Archaea are ubiquitous and extremely abundant. Recent advances have led to the discovery of many thousands of complete and partial genomes of these biological entities. Understanding the biology of these viruses and how they influence their ecosystems depends on knowing which hosts they infect. We developed a tool that uses data from complete or fragmented genomes to predict the hosts of viruses using a machine-learning approach. Our tool, RaFAH, displayed performance comparable with or superior to that of other host-prediction tools. In addition, it identified hundreds of sequences as derived from the genomes of viruses of Archaea, which are one of the least characterized fractions of the global virosphere.
Affiliation(s)
- Felipe Hernandes Coutinho
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Aptdo. 18., Ctra. Alicante-Valencia N-332, s/n, San Juan de Alicante, 03550 Alicante, Spain
- Asier Zaragoza-Solas
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Aptdo. 18., Ctra. Alicante-Valencia N-332, s/n, San Juan de Alicante, 03550 Alicante, Spain
- Mario López-Pérez
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Aptdo. 18., Ctra. Alicante-Valencia N-332, s/n, San Juan de Alicante, 03550 Alicante, Spain
- Jakub Barylski
- Molecular Virology Research Unit, Faculty of Biology, Adam Mickiewicz University Poznan, 61-614 Poznan, Poland
- Andrzej Zielezinski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, 61-614 Poznan, Poland
- Bas E Dutilh
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Centre/Radboud Institute for Molecular Life Sciences, 6525 GA Nijmegen, the Netherlands; Theoretical Biology and Bioinformatics, Science for Life, Utrecht University (UU), 3584 CH Utrecht, the Netherlands
- Robert Edwards
- College of Science and Engineering, Flinders University, Bedford Park, SA 5042, Australia
- Francisco Rodriguez-Valera
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, Aptdo. 18., Ctra. Alicante-Valencia N-332, s/n, San Juan de Alicante, 03550 Alicante, Spain; Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
|
44
|
Pratama AA, Bolduc B, Zayed AA, Zhong ZP, Guo J, Vik DR, Gazitúa MC, Wainaina JM, Roux S, Sullivan MB. Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation. PeerJ 2021; 9:e11447. [PMID: 34178438 PMCID: PMC8210812 DOI: 10.7717/peerj.11447] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 11/27/2020] [Accepted: 04/22/2021] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tool, (ii) virus taxonomic classification, and (iii) identification and curation of auxiliary metabolic genes (AMGs). RESULTS The in silico benchmarking of five commonly used virus identification tools show that gene-content-based tools consistently performed well for long (≥3 kbp) contigs, while k-mer- and blast-based tools were uniquely able to detect viruses from short (≤3 kbp) contigs. Notably, however, the performance increase of k-mer- and blast-based tools for short contigs was obtained at the cost of increased false positives (sometimes up to ∼5% for virome and ∼75% bulk samples), particularly when eukaryotic or mobile genetic element sequences were included in the test datasets. For viral classification, variously sized genome fragments were assessed using gene-sharing network analytics to quantify drop-offs in taxonomic assignments, which revealed correct assignations ranging from ∼95% (whole genomes) down to ∼80% (3 kbp sized genome fragments). A similar trend was also observed for other viral classification tools such as VPF-class, ViPTree and VIRIDIC, suggesting that caution is warranted when classifying short genome fragments and not full genomes. 
Finally, we highlight how fragmented assemblies can lead to erroneous identification of AMGs and outline a best-practices workflow to curate candidate AMGs in viral genomes assembled from metagenomes. CONCLUSION Together, these benchmarking experiments and annotation guidelines should aid researchers seeking to best detect, classify, and characterize the myriad viruses 'hidden' in diverse sequence datasets.
Affiliation(s)
- Akbar Adjie Pratama
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Benjamin Bolduc
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Ahmed A. Zayed
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Zhi-Ping Zhong
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, United States of America
- Jiarong Guo
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Dean R. Vik
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- James M. Wainaina
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Infectious Diseases Institute at The Ohio State University, Ohio State University, Columbus, OH, United States of America
- Simon Roux
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Matthew B. Sullivan
- Department of Microbiology, Ohio State University, Columbus, OH, United States of America
- Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America
- Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, United States of America
|
45
|
Global overview and major challenges of host prediction methods for uncultivated phages. Curr Opin Virol 2021; 49:117-126. [PMID: 34126465 DOI: 10.1016/j.coviro.2021.05.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 04/01/2021] [Revised: 05/20/2021] [Accepted: 05/22/2021] [Indexed: 12/14/2022]
Abstract
Bacterial communities play critical roles across all of Earth's biomes, affecting human health and global ecosystem functioning. They do so under strong constraints exerted by viruses, that is, bacteriophages or 'phages'. Phages can reshape bacterial communities' structure, influence long-term evolution of bacterial populations, and alter host cell metabolism during infection. Metagenomics approaches, that is, shotgun sequencing of environmental DNA or RNA, recently enabled large-scale exploration of phage genomic diversity, yielding several millions of phage genomes now to be further analyzed and characterized. One major challenge however is the lack of direct host information for these phages. Several methods and tools have been proposed to bioinformatically predict the potential host(s) of uncultivated phages based only on genome sequence information. Here we review these different approaches and highlight their distinct strengths and limitations. We also outline complementary experimental assays which are being proposed to validate and refine these bioinformatic predictions.
|
46
|
Gabashvili E, Kobakhidze S, Koulouris S, Robinson T, Kotetishvili M. Bi- and Multi-directional Gene Transfer in the Natural Populations of Polyvalent Bacteriophages, and Their Host Species Spectrum Representing Foodborne Versus Other Human and/or Animal Pathogens. Food and Environmental Virology 2021; 13:179-202. [PMID: 33484405 DOI: 10.1007/s12560-021-09460-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/17/2020] [Accepted: 01/06/2021] [Indexed: 06/12/2023]
Abstract
Unraveling the trends of phage-host versus phage-phage coevolution is critical for avoiding possible undesirable outcomes from the use of phage preparations intended for therapeutic, food safety or environmental safety purposes. We aimed to investigate the phenomenon of intergeneric recombination and its trajectories across the natural populations of phages predominantly linked to foodborne pathogens. The results from the recombination analyses, using a large array of the recombination detection algorithms embedded in the SplitsTree, RDP4, and Simplot software packages, provided strong evidence (fit: 100; P ≤ 0.014) for both bi- and multi-directional intergeneric recombination of the genetic loci involved collectively in phage morphogenesis, host specificity, virulence, replication, and persistence. Intergeneric recombination was determined to occur not only among conspecifics of the virulent versus temperate phages but also between the phages with these different lifestyles. The recombining polyvalent phages were suggested to interact with fairly large host species networks, sometimes including genetically very distinct species, such as Salmonella enterica and/or Escherichia coli versus Staphylococcus aureus or Yersinia pestis. Further studies are needed to understand whether phage-driven intergeneric recombination can lead to undesirable changes of intestinal and other microbiota in humans and animals.
Affiliation(s)
- Ekaterine Gabashvili
- School of Natural Sciences and Medicine, Ilia State University, 1 Giorgi Tsereteli exit, 0162, Tbilisi, Georgia
- Division of Risk Assessment, Scientific-Research Center of Agriculture, 6 Marshal Gelovani ave., 0159, Tbilisi, Georgia
- Saba Kobakhidze
- Division of Risk Assessment, Scientific-Research Center of Agriculture, 6 Marshal Gelovani ave., 0159, Tbilisi, Georgia
- Stylianos Koulouris
- Engagement and Cooperation Unit, European Food Safety Authority, Via Carlo Magno 1A, 43126, Parma, Italy
- Tobin Robinson
- Scientific Committee, and Emerging Risks Unit, European Food Safety Authority, Via Carlo Magno 1A, 43126, Parma, Italy
- Mamuka Kotetishvili
- Division of Risk Assessment, Scientific-Research Center of Agriculture, 6 Marshal Gelovani ave., 0159, Tbilisi, Georgia
- Hygiene and Medical Ecology, G. Natadze Scientific-Research Institute of Sanitation, 78 D. Uznadze St., 0102, Tbilisi, Georgia
|
47
|
Abdelsattar AS, Dawoud A, Makky S, Nofal R, Aziz RK, El-Shibiny A. Bacteriophages: from isolation to application. Curr Pharm Biotechnol 2021; 23:337-360. [PMID: 33902418 DOI: 10.2174/1389201022666210426092002] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/24/2020] [Revised: 01/29/2021] [Accepted: 03/11/2021] [Indexed: 11/22/2022]
Abstract
Bacteriophages are considered a potential alternative for fighting pathogenic bacteria in the era of antibiotic resistance. With their high specificity, they are widely used in various applications: medicine, the food industry, agriculture, animal farms, biotechnology, diagnosis, etc. Many techniques have been designed by different researchers for phage isolation, purification, and amplification, each of which has strengths and weaknesses. However, all aim at obtaining a reasonably pure phage sample that can be further characterized. Phages can be characterized using physiological, morphological, or inactivation tests. Microscopy, in particular, has opened a wide gate not only for visualizing phage morphological structure, but also for monitoring biochemistry and behavior. Meanwhile, computational analysis of phage genomes provides more details about phage history, lifestyle, and potential for toxigenic or lysogenic conversion, which translate to safety in biocontrol and phage therapy applications. This review summarizes phage application pipelines at different levels and addresses specific restrictions and knowledge gaps in the field. Recently developed computational approaches, which are used in phage genome analysis, are critically assessed. We hope that this assessment provides researchers with useful insights for selecting suitable approaches for phage-related research aims and applications.
Affiliation(s)
- Abdallah S Abdelsattar
- Center for Microbiology and Phage Therapy, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza, 12578, Egypt
- Alyaa Dawoud
- Center for Microbiology and Phage Therapy, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza, 12578, Egypt
- Salsabil Makky
- Center for Microbiology and Phage Therapy, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza, 12578, Egypt
- Rana Nofal
- Center for Microbiology and Phage Therapy, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza, 12578, Egypt
- Ramy K Aziz
- Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Qasr El-Ainy St, Cairo, Egypt
- Ayman El-Shibiny
- Center for Microbiology and Phage Therapy, Zewail City of Science and Technology, October Gardens, 6th of October City, Giza, 12578, Egypt
|
48
|
Moon K, Cho JC. Metaviromics coupled with phage-host identification to open the viral 'black box'. J Microbiol 2021; 59:311-323. [PMID: 33624268 DOI: 10.1007/s12275-021-1016-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/11/2021] [Revised: 01/28/2021] [Accepted: 01/28/2021] [Indexed: 12/22/2022]
Abstract
Viruses are found in almost all biomes on Earth, with bacteriophages (phages) accounting for the majority of viral particles in most ecosystems. Phages have been isolated from natural environments using the plaque assay and liquid medium-based dilution culturing. However, phage cultivation is restricted by the current limitations in the number of culturable bacterial strains. Unlike prokaryotes, which possess universally conserved 16S rRNA genes, phages lack universal marker genes for viral taxonomy, thus restricting culture-independent analyses of viral diversity. To circumvent these limitations, shotgun viral metagenome sequencing (i.e., metaviromics) has been developed to enable the extensive sequencing of a variety of viral particles present in the environment and is now widely used. Using metaviromics, numerous studies on viral communities have been conducted in oceans, lakes, rivers, and soils, resulting in many novel phage sequences. Furthermore, auxiliary metabolic genes such as ammonia monooxygenase C and β-lactamase have been discovered in viral contigs assembled from viral metagenomes. Current attempts to identify putative bacterial hosts of viral metagenome sequences based on sequence homology have been limited due to viral sequence variations. Therefore, culture-independent approaches have been developed to predict bacterial hosts using single-cell genomics and fluorescent labeling. This review focuses on recent viral metagenome studies conducted in natural environments, especially in aquatic ecosystems, and their contributions to phage ecology. Here, we conclude that although metaviromics is a key tool for the study of viral ecology, this approach must be supplemented with phage-host identification, which in turn requires the cultivation of phage-bacteria systems.
Affiliation(s)
- Kira Moon
- Biological Resources Utilization Division, Honam National Institute of Biological Resources, Mokpo, 58762, Republic of Korea
- Jang-Cheon Cho
- Department of Biological Sciences and Bioengineering, Inha University, Incheon, 22212, Republic of Korea
|
49
|
Boeckaerts D, Stock M, Criel B, Gerstmans H, De Baets B, Briers Y. Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins. Sci Rep 2021; 11:1467. [PMID: 33446856 PMCID: PMC7809048 DOI: 10.1038/s41598-021-81063-4] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 05/04/2020] [Accepted: 12/30/2020] [Indexed: 12/04/2022] Open
Abstract
Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.
Affiliation(s)
- Dimitri Boeckaerts
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
- Michiel Stock
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Bjorn Criel
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
- Hans Gerstmans
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
- Laboratory of Gene Technology, Department of Biosystems, KU Leuven, Leuven, Belgium
- MeBioS-Biosensors group, Department of BioSystems, KU Leuven, Leuven, Belgium
- Bernard De Baets
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Yves Briers
- Laboratory of Applied Biotechnology, Department of Biotechnology, Ghent University, Ghent, Belgium
|
50
|
Lu C, Zhang Z, Cai Z, Zhu Z, Qiu Y, Wu A, Jiang T, Zheng H, Peng Y. Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics. BMC Biol 2021; 19:5. [PMID: 33441133 PMCID: PMC7807511 DOI: 10.1186/s12915-020-00938-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/29/2020] [Accepted: 12/09/2020] [Indexed: 12/19/2022] Open
Abstract
Background Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools to determine the taxonomic assignment, the range of hosts, and the biological properties of the virus. Here we focus on prokaryotic viruses, which include phages and archaeal viruses, and for which identifying the viral host is an essential step in characterizing the virus, as the virus relies on the host for survival. Currently, the method for determining the viral host is either to culture the virus, which is low-throughput, time-consuming, and expensive, or to computationally predict the viral hosts, which needs improvement in both accuracy and usability. Here we develop a Gaussian model to predict hosts for prokaryotic viruses with better performance than previous computational methods. Results We present here Prokaryotic virus Host Predictor (PHP), a software tool using a Gaussian model, to predict hosts for prokaryotic viruses using the differences of k-mer frequencies between viral and host genomic sequences as features. PHP gave a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and a host prediction accuracy of 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. The prediction accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28–34%, genus level). PHP also substantially outperformed these two alignment-free methods (24–38% vs 18–20%, genus level) when predicting hosts for prokaryotic viruses which cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
Conclusions The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.
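The feature described in the abstract above, the difference in k-mer frequencies between a viral and a host sequence, can be illustrated with a minimal Python sketch. This is not the PHP implementation: the function names, the choice of k = 4, and the handling of ambiguous bases are assumptions made for illustration, and the Gaussian model itself is not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    k=4 is an illustrative default, not a value taken from the abstract.
    """
    seq = seq.upper()
    # Fixed ordering of all 4**k possible k-mers so vectors are comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A/C/G/T are counted; windows with other
    # characters (e.g. N) are ignored (an assumption of this sketch).
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def frequency_difference(virus_seq, host_seq, k=4):
    """Element-wise difference of virus and host k-mer frequency vectors."""
    v = kmer_frequencies(virus_seq, k)
    h = kmer_frequencies(host_seq, k)
    return [a - b for a, b in zip(v, h)]
```

A vector produced by the hypothetical `frequency_difference` has one entry per possible k-mer (256 entries for k = 4) and could serve as input to any downstream classifier or scoring model.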
Affiliation(s)
- Congyu Lu
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Zheng Zhang
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Zena Cai
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Zhaozhong Zhu
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Ye Qiu
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Aiping Wu
- Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China; Suzhou Institute of Systems Medicine, Suzhou, 215123, Jiangsu, China
- Taijiao Jiang
- Center for Systems Medicine, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100005, China; Suzhou Institute of Systems Medicine, Suzhou, 215123, Jiangsu, China
- Heping Zheng
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
- Yousong Peng
- Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, China
|