1
Breimann S, Frishman D. AAclust: k-optimized clustering for selecting redundancy-reduced sets of amino acid scales. Bioinformatics Advances 2024; 4:vbae165. [PMID: 39544628 PMCID: PMC11562964 DOI: 10.1093/bioadv/vbae165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 09/10/2024] [Accepted: 10/23/2024] [Indexed: 11/17/2024]
Abstract
Summary Amino acid scales are crucial for sequence-based protein prediction tasks, yet no gold standard scale set or simple scale selection methods exist. We developed AAclust, a wrapper for clustering models that require a pre-defined number of clusters k, such as k-means. AAclust obtains redundancy-reduced scale sets by clustering and selecting one representative scale per cluster, where k can either be optimized by AAclust or defined by the user. The utility of AAclust scale selections was assessed by applying machine learning models to 24 protein benchmark datasets. We found that top-performing scale sets were different for each benchmark dataset and significantly outperformed scale sets used in previous studies. Noteworthy is the strong dependence of the model performance on the scale set size. AAclust enables a systematic optimization of scale-based feature engineering in machine learning applications. Availability and implementation The AAclust algorithm is part of AAanalysis, a Python-based framework for interpretable sequence-based protein prediction, which is documented and accessible at https://aaanalysis.readthedocs.io/en/latest and https://github.com/breimanntools/aaanalysis.
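The cluster-then-select idea described in this abstract can be sketched in a few lines. This is a minimal illustration, not the AAclust or AAanalysis API: the helper names (`farthest_point_init`, `kmeans`, `select_representatives`), the use of Euclidean k-means, and the choice of the scale closest to each cluster center as the representative are all assumptions made for the example. Each row of `X` stands for one amino acid scale (20 values, one per residue).

```python
import numpy as np

def assign(X, centers):
    """Label each row of X with the index of its nearest center (Euclidean)."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def farthest_point_init(X, k):
    """Deterministic seeding (an assumption): start at row 0, then keep
    adding the row that is farthest from every center chosen so far."""
    chosen = [0]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(d.argmax()))
    return X[chosen].astype(float)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations for a user-defined k; returns (labels, centers)."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

def select_representatives(X, k):
    """Return one row index per cluster: the scale closest to its cluster center."""
    X = np.asarray(X, dtype=float)
    labels, centers = kmeans(X, k)
    reps = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if len(idx):
            d = np.linalg.norm(X[idx] - centers[j], axis=1)
            reps.append(int(idx[d.argmin()]))
    return sorted(reps)
```

Calling `select_representatives(X, k)` on a scales-by-residues matrix returns the row indices of a redundancy-reduced subset of size at most `k`. Optimizing `k` itself, which the abstract mentions as an option, is not covered by this sketch.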
Affiliation(s)
- Stephan Breimann
  - Department of Bioinformatics, School of Life Sciences, Technical University of Munich (TUM), Freising, 85354, Germany
  - Division of Metabolic Biochemistry, Biomedical Center (BMC), LMU Munich, Munich, 81377, Germany
  - Biochemistry of γ-Secretase, German Center for Neurodegenerative Diseases (DZNE), Munich, 81377, Germany
- Dmitrij Frishman
  - Department of Bioinformatics, School of Life Sciences, Technical University of Munich (TUM), Freising, 85354, Germany
2
Zhou F, Yang H, Si Y, Gan R, Yu L, Chen C, Ren C, Wu J, Zhang F. PhageTailFinder: A tool for phage tail module detection and annotation. Front Genet 2023; 14:947466. [PMID: 36755570 PMCID: PMC9901426 DOI: 10.3389/fgene.2023.947466] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/18/2022] [Accepted: 01/05/2023] [Indexed: 01/24/2023]
Abstract
Decades of overconsumption of antimicrobials in the treatment and prevention of bacterial infections have resulted in the increasing emergence of drug-resistant bacteria, which poses a significant challenge to public health, driving the urgent need to find alternatives to conventional antibiotics. Bacteriophages are viruses infecting specific bacterial hosts, often destroying the infected bacterial hosts. Phages attach to and enter their potential hosts using their tail proteins, with the composition of the tail determining the range of potentially infected bacteria. To aid the exploitation of bacteriophages for therapeutic purposes, we developed the PhageTailFinder algorithm to predict tail-related proteins and identify the putative tail module in previously uncharacterized phages. The PhageTailFinder relies on a two-state hidden Markov model (HMM) to predict the probability of a given protein being tail-related. The process takes into account the natural modularity of phage tail-related proteins, rather than simply considering amino acid properties or secondary structures for each protein in isolation. The PhageTailFinder exhibited robust predictive power for phage tail proteins in novel phages due to this sequence-independent operation. The performance of the prediction model was evaluated in 13 extensively studied phages and a sample of 992 complete phages from the NCBI database. The algorithm achieved a high true-positive prediction rate (>80%) in over half (571) of the studied phages, and the ROC value was 0.877 using general models and 0.968 using corresponding morphologic models. It is notable that the median ROC value of 992 complete phages is more than 0.75 even for novel phages, indicating the high accuracy and specificity of the PhageTailFinder. When applied to a dataset containing 189,680 viral genomes derived from 11,810 bulk metagenomic human stool samples, the ROC value was 0.895. 
In addition, tail protein clusters could be identified for further studies using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The developed PhageTailFinder tool can be accessed either as a web server (http://www.microbiome-bigdata.com/PHISDetector/index/tools/PhageTailFinder) or as a stand-alone program on a standard desktop computer (https://github.com/HIT-ImmunologyLab/PhageTailFinder).
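To make the two-state HMM idea concrete, the following sketch computes, for each gene along a genome, the posterior probability of the "tail" state with the standard forward-backward algorithm. It is an illustration only and not PhageTailFinder's code: the function name, the transition matrix, the start distribution, and the per-gene emission likelihoods are hypothetical values chosen for the example.

```python
import numpy as np

def posterior_tail_probability(emission_lik, trans, start):
    """Scaled forward-backward for a 2-state HMM.

    emission_lik: (n_genes, 2) likelihood of each gene's observation under
                  state 0 (non-tail) and state 1 (tail).
    trans:        (2, 2) transition matrix, trans[i, j] = P(next=j | current=i).
    start:        (2,) initial state distribution.
    Returns the posterior P(state = tail) for every gene.
    """
    emission_lik = np.asarray(emission_lik, dtype=float)
    trans = np.asarray(trans, dtype=float)
    start = np.asarray(start, dtype=float)
    n = len(emission_lik)

    fwd = np.zeros((n, 2))
    bwd = np.ones((n, 2))

    # Forward pass, renormalized at every step to avoid underflow.
    fwd[0] = start * emission_lik[0]
    fwd[0] /= fwd[0].sum()
    for t in range(1, n):
        fwd[t] = (fwd[t - 1] @ trans) * emission_lik[t]
        fwd[t] /= fwd[t].sum()

    # Backward pass with the same per-step renormalization.
    for t in range(n - 2, -1, -1):
        bwd[t] = trans @ (emission_lik[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()

    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 1]

# Hypothetical example: the middle gene is uninformative on its own,
# but both of its neighbors look tail-like.
lik = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [0.1, 0.9], [0.9, 0.1]]
trans = [[0.9, 0.1], [0.1, 0.9]]  # "sticky" states model gene modularity
start = [0.5, 0.5]
probs = posterior_tail_probability(lik, trans, start)
```

With sticky transitions, the ambiguous middle gene inherits a tail probability above 0.5 from its neighbors, which is the modularity effect the abstract contrasts with scoring each protein in isolation.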
Affiliation(s)
- Fengxia Zhou
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Han Yang
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yu Si
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Rui Gan
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Ling Yu
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Chuangeng Chen
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Chunyan Ren
  - Department of Hematology, Department of Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, United States
- Jiqiu Wu
  - Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Fan Zhang
  - HIT Center for Life Sciences, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
  - Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
3
Microbiome-phage interactions in inflammatory bowel disease. Clin Microbiol Infect 2022:S1198-743X(22)00506-7. [PMID: 36191844 DOI: 10.1016/j.cmi.2022.08.027] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/11/2022] [Revised: 08/23/2022] [Accepted: 08/29/2022] [Indexed: 11/22/2022]
Abstract
BACKGROUND Inflammatory bowel diseases (IBD) constitute a group of auto-inflammatory disorders impacting the gastrointestinal tract and other systemic organs. The gut microbiome contributes to IBD pathology through multiple mechanisms. Bacteriophages (hereafter termed phages) are viruses that are able to specifically infect bacteria. Considered part of the gut microbiome, phages may impact bacterial community structure in various clinical contexts. Additionally, exogenous phage administration may represent a means of suppressing IBD-associated pathobionts, yet utilization of phage therapy remains at an early developmental phase. OBJECTIVES Herein, we summarize the latest advances in understanding endogenous phage impacts on the gut microbiome in health and in IBD. We highlight the prospect of phage utilization as a targeted mode of pathobiont eradication, in preventing and treating IBD manifestations and complications. SOURCES Selected peer-reviewed publications regarding the role of phages in health and in IBD, published between 2013 and 2022. CONTENT The human gut microbiome is increasingly suggested to play a significant role in the onset and progression of multiple non-communicable diseases such as IBD. Several studies suggest that this effect may be mediated by discrete disease-contributing commensals. However, eradication of such pathogenic bacteria remains a daunting unmet task. Altered community structure in IBD may be influenced by blooms of phages within the gut bacterial ecosystem. Moreover, combinations of phages specifically targeting disease-contributing pathobiont strain clades may be harnessed as potential eradication treatments for preventing and treating IBD, while bearing minimal adverse impacts on the surrounding bacterial microbiome. 
IMPLICATIONS Understanding endogenous phage-gut commensal interactions in health and in IBD may enable phage utilization in precision gut microbiome editing, towards treating IBD and other non-communicable microbiome-associated diseases. Nevertheless, developing phage combination-mediated IBD pathobiont eradication treatment modalities will likely necessitate better strain-level bacterial target identification and resolution of treatment-related challenges, such as phage delivery, off-target effects, and bacterial resistance.
4
Fang Z, Feng T, Zhou H, Chen M. DeePVP: Identification and classification of phage virion proteins using deep learning. Gigascience 2022; 11:giac076. [PMID: 35950840 PMCID: PMC9366990 DOI: 10.1093/gigascience/giac076] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/08/2022] [Revised: 06/08/2022] [Accepted: 07/11/2022] [Indexed: 01/04/2023]
Abstract
BACKGROUND Many biological properties of phages are determined by phage virion proteins (PVPs), and the poor annotation of PVPs is a bottleneck for many areas of viral research, such as viral phylogenetic analysis, viral host identification, and antibacterial drug design. Because of the high diversity of PVP sequences, the PVP annotation of a phage genome remains a particularly challenging bioinformatic task. FINDINGS Based on deep learning, we developed DeePVP. The main module of DeePVP aims to discriminate PVPs from non-PVPs within a phage genome, while the extended module of DeePVP can further classify predicted PVPs into the 10 major classes of PVPs. Compared with the present state-of-the-art tools, the main module of DeePVP performs better, with a 9.05% higher F1-score in the PVP identification task. Moreover, the overall accuracy of the extended module of DeePVP in the PVP classification task is approximately 3.72% higher than that of PhANNs. Two application cases show that the predictions of DeePVP are more reliable and can better reveal the compact PVP-enriched region than the current state-of-the-art tools. Particularly, in the Escherichia phage phiEC1 genome, a novel PVP-enriched region that is conserved in many other Escherichia phage genomes was identified, indicating that DeePVP will be a useful tool for the analysis of phage genomic structures. CONCLUSIONS DeePVP outperforms state-of-the-art tools. The program is optimized in both a virtual machine with graphical user interface and a docker so that the tool can be easily run by noncomputer professionals. DeePVP is freely available at https://github.com/fangzcbio/DeePVP/.
Affiliation(s)
- Zhencheng Fang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Tao Feng
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Hongwei Zhou
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Muxuan Chen
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
5
Chu Y, Guo S, Cui D, Fu X, Ma Y. DeephageTP: a convolutional neural network framework for identifying phage-specific proteins from metagenomic sequencing data. PeerJ 2022; 10:e13404. [PMID: 35698617 PMCID: PMC9188312 DOI: 10.7717/peerj.13404] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/06/2021] [Accepted: 04/18/2022] [Indexed: 01/14/2023]
Abstract
Bacteriophages (phages) are the most abundant and diverse biological entities on Earth. Owing to the lack of universal gene markers and database representatives, about 50-90% of phage genes cannot be assigned functions. This makes it challenging to identify phage genomes and annotate the functions of phage genes efficiently by large-scale homology search, especially for newly discovered phages. Portal (portal protein), TerL (large terminase subunit protein), and TerS (small terminase subunit protein) are three proteins specific to Caudovirales phages. Here, we developed a CNN (convolutional neural network)-based framework, DeephageTP, to identify these three proteins from metagenomic data. The framework takes one-hot encoded protein sequences as input and automatically extracts predictive features during modeling. To overcome the false-positive problem, a cutoff-loss-value strategy is introduced based on the distributions of the loss values of protein sequences within the same category. The proposed model with a set of cutoff-loss-values demonstrates high precision in identifying TerL and Portal sequences (94% and 90%, respectively) from the mimic metagenomic dataset. Finally, we tested the efficacy of the framework using three real metagenomic datasets, and the results showed that, compared to conventional alignment-based methods, our framework had a particular advantage in identifying novel phage-specific Portal and TerL protein sequences with remote homology to their counterparts in the training datasets. In summary, our study develops, for the first time, a CNN-based framework for identifying phage-specific protein sequences with high complexity and low conservation, and this framework will help find novel phages in metagenomic sequencing data. DeephageTP is available at https://github.com/chuym726/DeephageTP.
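The one-hot input encoding mentioned in this abstract can be illustrated as follows. This is a generic sketch rather than DeephageTP's own preprocessing: the alphabet order, the fixed `max_len` with zero padding and truncation, and the all-zero row for non-standard residues are assumptions made for the example.

```python
import numpy as np

# The 20 standard amino acids; the ordering here is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(seq, max_len):
    """Encode a protein sequence as a (max_len, 20) one-hot matrix.

    Sequences longer than max_len are truncated, shorter ones are padded
    with all-zero rows, and non-standard residues (e.g. 'X') are also
    left as all-zero rows.
    """
    mat = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq[:max_len].upper()):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            mat[pos, idx] = 1.0
    return mat

encoded = one_hot_encode("MKVX", max_len=6)
```

A matrix of this shape can be fed directly to a 1D convolutional layer, with sequence position as the length axis and the 20 residue indicators as channels.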
Affiliation(s)
- Yunmeng Chu
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
  - Department of Bioengineering and Biotechnology, Huaqiao University, Xiamen, Fujian, P.R. China
- Shun Guo
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
- Dachao Cui
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
- Xiongfei Fu
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
- Yingfei Ma
  - Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese, Shenzhen, Guangdong, P.R. China
6
Martínez-Ruiz EB, Cooper M, Barrero-Canosa J, Haryono MAS, Bessarab I, Williams RBH, Szewzyk U. Genome analysis of Pseudomonas sp. OF001 and Rubrivivax sp. A210 suggests multicopper oxidases catalyze manganese oxidation required for cylindrospermopsin transformation. BMC Genomics 2021; 22:464. [PMID: 34157973 PMCID: PMC8218464 DOI: 10.1186/s12864-021-07766-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2020] [Accepted: 06/03/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Cylindrospermopsin is a highly persistent cyanobacterial secondary metabolite toxic to humans and other living organisms. Strains OF001 and A210 are manganese-oxidizing bacteria (MOB) able to transform cylindrospermopsin during the oxidation of Mn2+. So far, the enzymes involved in manganese oxidation in strains OF001 and A210 are unknown. Therefore, we analyzed the genomes of two cylindrospermopsin-transforming MOB, Pseudomonas sp. OF001 and Rubrivivax sp. A210, to identify enzymes that could catalyze the oxidation of Mn2+. We also investigated specific metabolic features related to pollutant degradation and explored the metabolic potential of these two MOB with respect to the role they may play in biotechnological applications and/or in the environment. RESULTS Strain OF001 encodes two multicopper oxidases and one haem peroxidase potentially involved in Mn2+ oxidation, with a high similarity to manganese-oxidizing enzymes described for Pseudomonas putida GB-1 (80, 83 and 42% respectively). Strain A210 encodes one multicopper oxidase potentially involved in Mn2+ oxidation, with a high similarity (59%) to the manganese-oxidizing multicopper oxidase in Leptothrix discophora SS-1. Strains OF001 and A210 have genes that might confer on them the ability to remove aromatic compounds via the catechol meta- and ortho-cleavage pathway, respectively. Based on the genomic content, both strains may grow over a wide range of O2 concentrations, including microaerophilic conditions, fix nitrogen, and reduce nitrate and sulfate in an assimilatory fashion. Moreover, strain A210 encodes genes which may convey the ability to reduce nitrate in a dissimilatory manner, and fix carbon via the Calvin cycle. Both MOB encode CRISPR-Cas systems, several predicted genomic islands, and phage proteins, which likely contribute to their genome plasticity. CONCLUSIONS The genomes of Pseudomonas sp. OF001 and Rubrivivax sp. A210 encode sequences with high similarity to already described MCOs which may catalyze manganese oxidation required for cylindrospermopsin transformation. Furthermore, the analysis of the general metabolism of two MOB strains may contribute to a better understanding of the niches of cylindrospermopsin-removing MOB in natural habitats and their implementation in biotechnological applications to treat water.
Affiliation(s)
- Erika Berenice Martínez-Ruiz
  - Chair of Environmental Microbiology, Technische Universität Berlin, Institute of Environmental Technology, Straße des 17. Juni 135, 10623, Berlin, Germany
- Myriel Cooper
  - Chair of Environmental Microbiology, Technische Universität Berlin, Institute of Environmental Technology, Straße des 17. Juni 135, 10623, Berlin, Germany
- Jimena Barrero-Canosa
  - Chair of Environmental Microbiology, Technische Universität Berlin, Institute of Environmental Technology, Straße des 17. Juni 135, 10623, Berlin, Germany
- Mindia A S Haryono
  - Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, 119077, Singapore
- Irina Bessarab
  - Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, 119077, Singapore
- Rohan B H Williams
  - Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, 119077, Singapore
- Ulrich Szewzyk
  - Chair of Environmental Microbiology, Technische Universität Berlin, Institute of Environmental Technology, Straße des 17. Juni 135, 10623, Berlin, Germany
7
Abstract
The enormous diversity of RNA viruses in insects is continuously validated. Parasitoid wasps, as biocontrol insects which are widely used against insect pests in agroecosystems, may also carry many “good” RNA viruses. In this study, many virus-like fragments were obtained from transcriptomes of three wasp species, including Anisopteromalus calandrae (8), Lariophagus distinguendus (3), and Theocolax elegans (18), which can parasitize and control rice weevil Sitophilus oryzae, a serious insect pest of farm-stored grains. By further bioinformatic analysis and sequencing, we identified six novel RNA viruses with complete genomes and named them WWPSRV-1, WWPSRV-2, AcPSRV-1, AcNSRV-1, AcNSRV-2, and LdNSRV-1. PCR-based detection revealed that WWPSRV-1 and WWPSRV-2 had the possibility of interspecies virus transmission, especially WWPSRV-2, which was also present in the rice weevil adults. Phylogenetically, three out of these six viruses appeared to be members of order Picornavirales: WWPSRV-1 belonged to unassigned virus families of this order, whereas WWPSRV-2 and AcPSRV-1 belonged to families Iflaviridae and Dicistroviridae, respectively. The conserved picornavirus-typical domains helicase, protease, and RNA-dependent RNA polymerase could be found in the nonstructural protein encoded by the three viruses, whose genomes consisted of the different numbers of open reading frames (ORFs). The other three RNA viruses could be classified to order Mononegavirales: AcNSRV-1 and AcNSRV-2 belonged to family Lispiviridae, whereas LdNSRV-1 belonged to a big family Rhabdoviridae. The genomes of the three viruses contained at least five ORFs, encoding deduced proteins in the following order: 3′-N-P-M-G-L-5′. All the ORFs were separated by conserved intergenic sequences which likely regulated the transcription termination and initiation. 
Our findings enhance the understanding of RNA viruses in weevil wasps and set the foundation for the future study of the association among weevils, weevil wasps, and RNA viruses. IMPORTANCE The enormous diversity of RNA viruses in insects is continuously validated. Parasitoid wasps, as biocontrol insects which are widely used against insect pests in agroecosystems, may also carry many “good” RNA viruses. Some RNA viruses in parasitoid wasps have been reported to affect the host wasps or the wasps’ host. Here, six novel RNA viruses with complete genomes were identified in three parasitoid wasps of the rice weevil. One of these viruses was also detected in the rice weevil adults. Phylogenetically, WWPSRV-1 was the first unambiguous detection of Nora-like virus in insect parasitoids. WWPSRV-2 and AcPSRV-1 belong to families Iflaviridae and Dicistroviridae, some viruses of which can result in lethal infections in silkworms and honeybees. The other three RNA viruses belong to order Mononegavirales, which comprises many well-known insect-associated viruses.
8
Gil P, Dupuy V, Koual R, Exbrayat A, Loire E, Fall AG, Gimonneau G, Biteye B, Talla Seck M, Rakotoarivony I, Marie A, Frances B, Lambert G, Reveillaud J, Balenghien T, Garros C, Albina E, Eloit M, Gutierrez S. A library preparation optimized for metagenomics of RNA viruses. Mol Ecol Resour 2021; 21:1788-1807. [PMID: 33713395 DOI: 10.1111/1755-0998.13378] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/09/2019] [Revised: 02/23/2021] [Accepted: 02/25/2021] [Indexed: 11/28/2022]
Abstract
Our understanding of the viral communities associated with animals has not yet reached the level attained for the bacteriome. This situation is due, among other factors, to technical challenges in adapting metagenomics using high-throughput sequencing to the study of RNA viromes in animals. Although important developments have been achieved in most steps of viral metagenomics, one key step has received little attention: the library preparation. This situation differs from bacteriome studies, in which developments in library preparation have largely contributed to the democratization of metagenomics. Here, we present a library preparation optimized for metagenomics of RNA viruses from insect vectors of viral diseases. The library design allows a simple PCR-based preparation, such as those routinely used in bacterial metabarcoding, that is adapted to shotgun sequencing as required in viral metagenomics. We first optimized our library preparation using mock viral communities and then validated a full metagenomic approach incorporating our preparation in two pilot studies with field-caught insect vectors, one including a comparison with a published metagenomic protocol. Our approach provided a fold increase in virus-like sequences compared to other studies, and nearly-full genomes from new virus species. Moreover, our results suggested conserved trends in virome composition within a population of a mosquito species. Finally, the sensitivity of our approach was compared to a commercial diagnostic PCR for the detection of an arbovirus in field-caught insect vectors. Our approach could facilitate studies on viral communities from animals and the democratization of metagenomics in community ecology of viruses.
Affiliation(s)
- Patricia Gil
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
- Virginie Dupuy
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
- Rachid Koual
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
- Antoni Exbrayat
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
- Etienne Loire
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
- Assane G Fall
  - Laboratoire National de l'Elevage et de Recherches Vétérinaires, Institut Sénégalais de Recherches Agricoles (ISRA), Dakar-Hann, Senegal
- Geoffrey Gimonneau
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
  - Laboratoire National de l'Elevage et de Recherches Vétérinaires, Institut Sénégalais de Recherches Agricoles (ISRA), Dakar-Hann, Senegal
- Biram Biteye
  - Laboratoire National de l'Elevage et de Recherches Vétérinaires, Institut Sénégalais de Recherches Agricoles (ISRA), Dakar-Hann, Senegal
- Momar Talla Seck
  - Laboratoire National de l'Elevage et de Recherches Vétérinaires, Institut Sénégalais de Recherches Agricoles (ISRA), Dakar-Hann, Senegal
- Ignace Rakotoarivony
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
- Julie Reveillaud
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
- Thomas Balenghien
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
- Claire Garros
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
- Emmanuel Albina
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
- Marc Eloit
  - Pathogen Discovery Laboratory, Institut Pasteur, Paris, France
  - The OIE Collaborating Centre for Detection and Identification in Humans of Emerging Animal Pathogens, Institut Pasteur, Paris, France
  - École nationale vétérinaire d'Alfort, Maisons-Alfort, France
- Serafin Gutierrez
  - ASTRE, Cirad, INRAE, University of Montpellier, Montpellier, France
  - Cirad, UMR ASTRE, Montpellier, F-34398, France
9
Fang Z, Zhou H. VirionFinder: Identification of Complete and Partial Prokaryote Virus Virion Protein From Virome Data Using the Sequence and Biochemical Properties of Amino Acids. Front Microbiol 2021; 12:615711. [PMID: 33613485 PMCID: PMC7894196 DOI: 10.3389/fmicb.2021.615711] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/09/2020] [Accepted: 01/04/2021] [Indexed: 01/22/2023]
Abstract
Viruses are some of the most abundant biological entities on Earth, and prokaryote viruses are the dominant members of the viral community. Because of the diversity of prokaryote viruses, functional annotation cannot be performed on a large number of genes from newly discovered prokaryote viruses by searching the current database; therefore, the development of an alignment-free algorithm for functional annotation of prokaryote virus proteins is important to understand the viral community. The identification of prokaryote virus virion proteins (PVVPs) is a critical step for many viral analyses, such as species classification, phylogenetic analysis and the exploration of how prokaryote viruses interact with their hosts. Although a series of PVVP prediction tools have been developed, the performance of these tools is still not satisfactory. Moreover, viral metagenomic data contain fragmented sequences, leading to the existence of some incomplete genes. Therefore, a tool that can identify partial prokaryote virus proteins is also needed. In this work, we present a novel algorithm, called VirionFinder, to distinguish complete and partial PVVPs from non-prokaryote virus virion proteins (non-PVVPs). VirionFinder uses the sequence and biochemical properties of the 20 amino acids as the mathematical model to encode the protein sequences and uses a deep learning technique to identify whether a given protein is a PVVP. Compared with the state-of-the-art tools using artificial benchmark datasets, the results show that under the same specificity (Sp), the sensitivity (Sn) of VirionFinder is approximately 10-34% higher than the Sn of these tools on both complete and partial proteins. When evaluating related tools using real virome data, the recognition rate of PVVP-like sequences of VirionFinder is also much higher than that of the other tools. 
We expect that VirionFinder will be a powerful tool for identifying novel virion proteins from both complete prokaryote virus genomes and viral metagenomic data. VirionFinder is freely available at https://github.com/zhenchengfang/VirionFinder.
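Comparing tools "under the same specificity" means fixing the false-positive rate first and then reading off the sensitivity. The sketch below shows one way to do that for a single scoring tool. It is an illustration only: the function name, the thresholding rule (predict positive when the score is strictly above the threshold), and the toy scores are assumptions, not the evaluation code of the cited study.

```python
import numpy as np

def sensitivity_at_specificity(scores, labels, target_sp):
    """Return (sensitivity, threshold) at the lowest threshold whose
    specificity on the negatives is at least target_sp.

    A sample is predicted positive when its score is strictly greater
    than the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]

    # Candidate thresholds in ascending order; -inf predicts everything positive.
    candidates = np.concatenate(([-np.inf], np.unique(scores)))
    for thr in candidates:
        specificity = np.mean(neg <= thr)  # fraction of negatives rejected
        if specificity >= target_sp:
            return float(np.mean(pos > thr)), float(thr)
    raise ValueError("target specificity is not reachable")

# Toy scores (hypothetical): 1 = virion protein, 0 = non-virion protein.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
sn_75, thr_75 = sensitivity_at_specificity(toy_scores, toy_labels, 0.75)
sn_100, thr_100 = sensitivity_at_specificity(toy_scores, toy_labels, 1.0)
```

Running the same function on each tool's scores with a shared `target_sp` yields sensitivities that can be compared directly, because every tool is held to the same false-positive budget.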
Affiliation(s)
- Zhencheng Fang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - Center for Quantitative Biology, Peking University, Beijing, China
- Hongwei Zhou
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - State Key Laboratory of Organ Failure Research, Southern Medical University, Guangzhou, China
10
Cantu VA, Salamon P, Seguritan V, Redfield J, Salamon D, Edwards RA, Segall AM. PhANNs, a fast and accurate tool and web server to classify phage structural proteins. PLoS Comput Biol 2020; 16:e1007845. [PMID: 33137102 PMCID: PMC7660903 DOI: 10.1371/journal.pcbi.1007845] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 03/30/2020] [Revised: 11/12/2020] [Accepted: 09/26/2020] [Indexed: 02/07/2023]
Abstract
For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50–90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an “other” category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if not predicted to fall in one of the ten classes, as “other,” providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally. Bacteriophages (phages, viruses that infect bacteria) are the most abundant biological entity on Earth. They outnumber bacteria by a factor of ten. As phages are very different from each other and from bacteria, and we have relatively few phage genes in our database compared to bacterial genes, we are unable to assign function to 50–90% of phage genes. In this work, we developed PhANNs, a machine learning tool that can classify a phage gene as one of ten structural roles, or “other”. This approach does not require a similar gene to be known.
Affiliation(s)
- Vito Adrian Cantu
  - Computational Science Research Center, San Diego State University, San Diego, United States of America
  - Viral Information Institute, San Diego State University, San Diego, United States of America
- Peter Salamon
  - Viral Information Institute, San Diego State University, San Diego, United States of America
  - Department of Mathematics and Statistics, San Diego State University, San Diego, United States of America
- Victor Seguritan
  - Computational Science Research Center, San Diego State University, San Diego, United States of America
- Jackson Redfield
  - Department of Biology, San Diego State University, San Diego, United States of America
- David Salamon
  - Department of Mathematics and Statistics, San Diego State University, San Diego, United States of America
- Robert A. Edwards
  - Computational Science Research Center, San Diego State University, San Diego, United States of America
  - Viral Information Institute, San Diego State University, San Diego, United States of America
  - Department of Biology, San Diego State University, San Diego, United States of America
- Anca M. Segall
  - Computational Science Research Center, San Diego State University, San Diego, United States of America
  - Viral Information Institute, San Diego State University, San Diego, United States of America
  - Department of Biology, San Diego State University, San Diego, United States of America
11
Bojko J, Jennings LA, Behringer DC. A novel positive single-stranded RNA virus from the crustacean parasite, Probopyrinella latreuticola (Peracarida: Isopoda: Bopyridae). J Invertebr Pathol 2020; 177:107494. [PMID: 33115693 DOI: 10.1016/j.jip.2020.107494] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/05/2020] [Revised: 09/22/2020] [Accepted: 10/15/2020] [Indexed: 12/29/2022]
Abstract
A positive, single-stranded RNA virus is identified from the transcriptome of Probopyrinella latreuticola Gissler, 1882; a bopyrid isopod parasite of the Sargassum shrimp, Latreutes fucorum Fabricius, 1789. The viral sequence is 13,098 bp in length (including polyA), encoding four open reading frames (ORF). ORF-1 encodes a polyprotein, with three computationally discernible functional domains: viral methyltransferase; viral helicase; and RNA-directed RNA polymerase. The remaining ORFs encode a transmembrane protein, a capsid protein and a protein of undetermined function. The raw transcriptomic data reveal a low level of background single nucleotide mutations within the data. Comparison of the protein sequence data and synteny with other viral isolates reveals that the greatest protein similarity (<39%) is shared with the Negevirus group, a group that exclusively infects insects. Phylogenetic assessment of the individual polyprotein domains revealed a mixed prediction of phylogenetic origins, suggesting with low confidence that the novel +ssRNA virus could be present in multiple places throughout the individual gene trees. A concatenated approach strongly suggested that this new virus is an early diverging isolate, branching before the Negevirus and Cilevirus groups. Alongside the new isolate are other marine viruses, also present toward the base of the tree. The isopod virosphere, with the addition of this novel virus, is discussed relative to viral genomics/systematics. A great diversity of nege-like viruses appears to be present in marine invertebrate hosts, which require greater efforts for discovery and identification.
Affiliation(s)
- Jamie Bojko
  - School of Health and Life Sciences, Teesside University, Teesside, Middlesbrough TS1 3BA, UK; National Horizons Centre, Teesside University, Darlington DL1 1HG, UK
- Lucas A Jennings
  - Fisheries and Aquatic Sciences, University of Florida, Gainesville FL 32653, USA
- Donald C Behringer
  - Fisheries and Aquatic Sciences, University of Florida, Gainesville FL 32653, USA; Emerging Pathogens Institute, University of Florida, Gainesville FL 32611, USA
12
Bojko J, Subramaniam K, Waltzek TB, Stentiford GD, Behringer DC. Genomic and developmental characterisation of a novel bunyavirus infecting the crustacean Carcinus maenas. Sci Rep 2019; 9:12957. [PMID: 31506463 PMCID: PMC6736955 DOI: 10.1038/s41598-019-49260-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/14/2019] [Accepted: 08/15/2019] [Indexed: 12/20/2022] Open
Abstract
Carcinus maenas is in the top 100 globally invasive species and harbours a wide diversity of pathogens, including viruses. We provide a detailed description for a novel bunyavirus (Carcinus maenas Portunibunyavirus 1) infecting C. maenas from its native range in the Faroe Islands. The virus genome is tripartite, including large (L) (6766 bp), medium (M) (3244 bp) and small (S) (1608 bp) negative sense, single-stranded RNA segments. Individual genomic segments are flanked by 4 bp regions of similarity (CCUG). The segments encode an RNA-dependent RNA-polymerase, glycoprotein, non-structural protein with a Zinc-Finger domain and a nucleoprotein. Most show highest identity to the 'Wenling Crustacean Virus 9' from an unidentified crustacean host. Phylogenomics of crustacean-infecting bunyaviruses place them across multiple bunyavirus families. We discuss the diversity of crustacean bunyaviruses and provide an overview of how these viruses may affect the health and survival of crustacean hosts, including those inhabiting niches outside of their native range.
Affiliation(s)
- Jamie Bojko
  - Fisheries and Aquatic Science, University of Florida, Gainesville, Florida, 32653, USA; Emerging Pathogens Institute, University of Florida, Gainesville, Florida, 32611, USA
- Kuttichantran Subramaniam
  - Department of Infectious Diseases and Immunology, College of Veterinary Medicine, University of Florida, Gainesville, FL, 32610, USA
- Thomas B Waltzek
  - Department of Infectious Diseases and Immunology, College of Veterinary Medicine, University of Florida, Gainesville, FL, 32610, USA
- Grant D Stentiford
  - International Centre of Excellence for Aquatic Animal Health, Centre for Environment, Fisheries and Aquaculture Science, Weymouth, Dorset, DT4 8UB, UK; Centre for Sustainable Aquaculture Futures, College of Life and Environmental Sciences, University of Exeter, Stocker Road, Exeter, EX4 4QD, UK
- Donald C Behringer
  - Fisheries and Aquatic Science, University of Florida, Gainesville, Florida, 32653, USA; Emerging Pathogens Institute, University of Florida, Gainesville, Florida, 32611, USA
13
Rosani U, Gerdol M. A bioinformatics approach reveals seven nearly-complete RNA-virus genomes in bivalve RNA-seq data. Virus Res 2016; 239:33-42. [PMID: 27769778 DOI: 10.1016/j.virusres.2016.10.009] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 08/15/2016] [Revised: 10/17/2016] [Accepted: 10/17/2016] [Indexed: 01/17/2023]
Abstract
Viral metagenomics (viromics) can contribute greatly to expanding our knowledge of viruses and their relationships with their hosts. Viromic studies on marine organisms are still at a very early stage, and little effort has so far been spent on identifying viruses associated with marine invertebrates, leaving the complexity of marine viromes associated with bivalve hosts almost completely unexplored. However, the potential of viromic approaches for managing viral diseases affecting aquacultured species has recently been demonstrated by the flourishing of studies on the Ostreid herpesvirus type-1, which has been associated with bivalve mortality events. Herein we discuss an effective pipeline to retrieve and reconstruct nearly complete and previously unreported viral genomes from existing host RNA-seq data. As a case study, we report the identification of seven RNA-virus genomes within a highly diversified viral community that characterizes both Crassostrea gigas and Mytilus galloprovincialis samples collected from the lagoon of Goro (Italy).
Affiliation(s)
- Umberto Rosani
  - Dept. of Biology, University of Padua, Via U. Bassi 58/B, 35121 Padova, Italy
- Marco Gerdol
  - Dept. of Life Sciences, University of Trieste, Via L. Giorgieri 5, 34127 Trieste, Italy