1. Lu Y, Ma X, Yang L, Zhang T, Liu Y, Chu Q, He T, Li Y, Ouyang W. GUPNet++: Geometry Uncertainty Propagation Network for Monocular 3D Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:900-915. [PMID: 39374292 DOI: 10.1109/tpami.2024.3475583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/09/2024]
Abstract
Geometry plays a significant role in monocular 3D object detection. It can be used to estimate object depth through the perspective projection between an object's physical size and its 2D projection in the image plane, which introduces mathematical priors into deep models. However, this projection process also introduces error amplification, where the error of the estimated height is amplified and reflected in the projected depth. This leads to unreliable depth inferences and also impairs training stability. To tackle this problem, we propose a novel Geometry Uncertainty Propagation Network (GUPNet++) that models geometry projection in a probabilistic manner. This ensures depth predictions are well-bounded and associated with a reasonable uncertainty. The significance of introducing such geometric uncertainty is two-fold: (1) it models the uncertainty propagation relationship of the geometry projection during training, improving the stability and efficiency of end-to-end model learning; (2) it can be used to derive a highly reliable confidence that indicates the quality of the 3D detection result, enabling more reliable detection inference. Experiments show that the proposed approach not only obtains state-of-the-art (SOTA) performance in image-based monocular 3D detection but also demonstrates superior efficacy with a simplified framework. The code and model will be released at https://github.com/SuperMHP/GUPNet_Plus.
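The error amplification described in this abstract can be illustrated with the plain pinhole relation, depth = focal length × 3D height / 2D height. The sketch below is not the GUPNet++ network or its uncertainty model. It assumes that only the 3D height is uncertain and that the 2D height is exact, and the function names and numbers are hypothetical.

```python
def project_depth(focal_px: float, height_3d_m: float, height_2d_px: float) -> float:
    """Pinhole projection: depth = f * H_3d / h_2d."""
    if height_2d_px <= 0:
        raise ValueError("2D height must be positive")
    return focal_px * height_3d_m / height_2d_px


def propagate_height_std(focal_px: float, height_3d_std_m: float, height_2d_px: float) -> float:
    """Standard deviation of the projected depth.

    Assumption: only H_3d is uncertain and h_2d is exact, so the depth is a
    linear function of H_3d and its std scales by the same factor f / h_2d.
    """
    if height_2d_px <= 0:
        raise ValueError("2D height must be positive")
    return focal_px * height_3d_std_m / height_2d_px


if __name__ == "__main__":
    focal = 700.0        # hypothetical focal length in pixels
    height_3d = 1.5      # hypothetical object height in metres
    height_3d_std = 0.1  # hypothetical std of the height estimate in metres
    for height_2d in (70.0, 35.0, 17.5):
        depth = project_depth(focal, height_3d, height_2d)
        depth_std = propagate_height_std(focal, height_3d_std, height_2d)
        print(f"h_2d={height_2d:5.1f}px  depth={depth:5.1f}m  depth_std={depth_std:4.1f}m")
```

Under these assumptions the same 0.1 m height error grows as the object appears smaller in the image, which is the amplification effect the abstract refers to.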

2. Li X, Zhang J, Ma D, Fan X, Zheng X, Liu YX. Exploring protein natural diversity in environmental microbiomes with DeepMetagenome. Cell Reports Methods 2024; 4:100896. [PMID: 39515333 PMCID: PMC11705764 DOI: 10.1016/j.crmeth.2024.100896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 06/21/2024] [Accepted: 10/15/2024] [Indexed: 11/16/2024]
Abstract
Protein natural diversity offers a vast sequence space for protein engineering, and deep learning enables its detection from metagenomes/proteomes without prior assumptions. DeepMetagenome, a Python-based method, explores protein diversity through modules for training and analyzing sequence datasets. The deep learning model includes Embedding, Conv1D, LSTM, and Dense layers, with sequence feature analysis for data cleaning. Applied to metallothioneins from a database of over 146 million coding features, DeepMetagenome identified over 500 high-confidence metallothionein sequences, outperforming DIAMOND and CNN-based models. It showed stable performance compared to a Transformer-based model over 25 epochs. Among 23 synthesized sequences, 20 exhibited metal resistance. The tool also successfully explored the diversity of three additional protein families and is freely available on GitHub with detailed instructions.
Affiliation(s)
- Xiaofang Li
  - Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang 050021, China
- Jun Zhang
  - College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding 071000, China
- Dan Ma
  - College of Life Sciences, Hebei University, Baoding 071002, China
- Xiaofei Fan
  - College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding 071000, China
- Xin Zheng
  - Center for Agricultural Resources Research, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Shijiazhuang 050021, China
- Yong-Xin Liu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China

3. Wintachai P, Thaion F, Clokie MRJ, Thomrongsuwannakij T. Isolation and Characterization of a Novel Escherichia Bacteriophage with Potential to Control Multidrug-Resistant Avian Pathogenic Escherichia coli and Biofilms. Antibiotics (Basel) 2024; 13:1083. [PMID: 39596776 PMCID: PMC11590954 DOI: 10.3390/antibiotics13111083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 10/29/2024] [Accepted: 11/11/2024] [Indexed: 11/29/2024] Open
Abstract
Background/Objectives: Avian pathogenic Escherichia coli (APEC) infection is a significant problem for the global chicken industry, as it decreases animal welfare and is associated with substantial economic losses. Traditionally, APEC infections have been controlled through the use of antibiotics, which has led to an increased prevalence of antibiotic-resistant E. coli. Therefore, developing alternative treatments for APEC infection is crucial. Methods: In this study, an Escherichia phage specific to multidrug-resistant (MDR) APEC, designated as phage vB_EcoP_PW8 (phage vECPW8), was isolated. The morphology, phage adsorption to host cells, one-step growth curve, thermal stability, pH stability, whole-genome sequencing, antibacterial ability, and antibiofilm efficacy of phage vECPW8 were evaluated. Results: The results demonstrated that phage vECPW8 has a Podoviridae morphology and is effective at lysing bacteria. Phage vECPW8 exhibited a high adsorption rate to bacterial cells (more than 85% within 10 min) and had a latent period of 20 min, with a burst size of 143 plaque-forming units per cell. Additionally, phage vECPW8 showed good temperature and pH stability. The phage displayed strong antibacterial activity in vitro, and its efficacy in controlling bacteria was confirmed through scanning electron microscopy. Whole-genome sequencing revealed that the phage has a linear genome with 69,579 base pairs. The genome analysis supported the safety of the phage, as no toxin, virulence, or resistance-related genes were detected. Phage vECPW8 was identified as a novel lytic phage in the Gamaleyavirus genus and Schitoviridae family. The phage also demonstrated antibiofilm efficacy by reducing and preventing biofilm formation, as evidenced by biofilm biomass and bacterial cell viability measurements. Conclusions: These results indicate that phage vECPW8 is a promising candidate for the effective treatment of MDR APEC infections in poultry.
Affiliation(s)
- Phitchayapak Wintachai
  - Bacteriophage Laboratory, Walailak University, Thasala, Nakhon Si Thammarat 80161, Thailand
  - School of Science, Walailak University, Thasala, Nakhon Si Thammarat 80161, Thailand
  - Functional Materials and Nanotechnology Center of Excellence, Walailak University, Thasala, Nakhon Si Thammarat 80161, Thailand
- Fahsai Thaion
  - Bacteriophage Laboratory, Walailak University, Thasala, Nakhon Si Thammarat 80161, Thailand
  - School of Science, Walailak University, Thasala, Nakhon Si Thammarat 80161, Thailand
- Martha R. J. Clokie
  - Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
- Thotsapol Thomrongsuwannakij
  - Akkhraratchakumari Veterinary College, Walailak University, Nakhon Si Thammarat 80161, Thailand
  - Centre for One Health, Walailak University, Nakhon Si Thammarat 80161, Thailand

4. Sui J, Chen J, Chen Y, Iwamori N, Sun J. GASIDN: identification of sub-Golgi proteins with multi-scale feature fusion. BMC Genomics 2024; 25:1019. [PMID: 39478465 PMCID: PMC11526662 DOI: 10.1186/s12864-024-10954-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Accepted: 10/24/2024] [Indexed: 11/02/2024] Open
Abstract
The Golgi apparatus is a crucial component of the inner membrane system in eukaryotic cells, playing a central role in protein biosynthesis. Dysfunction of the Golgi apparatus has been linked to neurodegenerative diseases. Accurate identification of sub-Golgi protein types is therefore essential for developing effective treatments for such diseases. Because experimental methods for identifying sub-Golgi protein types are expensive and time-consuming, various computational methods have been developed as identification tools. However, the majority of these methods rely solely on neighboring features in the protein sequence and neglect the crucial spatial structure information of the protein. To discover alternative methods for accurately identifying sub-Golgi proteins, we have developed a model called GASIDN. The GASIDN model extracts multi-dimension features by utilizing a 1D convolution module on protein sequences and a graph learning module on contact maps constructed from AlphaFold2. The model utilizes the deep representation learning model SeqVec to initialize protein sequences. GASIDN achieved accuracy values of 98.4% and 96.4% in independent testing and ten-fold cross-validation, respectively, outperforming the majority of previous predictors. To the best of our knowledge, this is the first method that utilizes multi-scale feature fusion to identify and locate sub-Golgi proteins. To assess the generalizability and scalability of our model, we applied it to the identification of proteins from other organelles, including plant vacuoles and peroxisomes. These experiments yielded promising results, indicating the effectiveness and versatility of our model. The source code and datasets can be accessed at https://github.com/SJNNNN/GASIDN.
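As a rough illustration of the "contact maps" mentioned in this abstract, the sketch below builds a binary residue contact map from 3D coordinates and lists the resulting graph edges. It is not the GASIDN code. It assumes one coordinate per residue (for example the C-alpha atom of a predicted structure) and an 8 Å distance cutoff, which is a common convention but is not stated in the abstract. The function names and toy coordinates are hypothetical.

```python
import math


def contact_map(coords, threshold=8.0):
    """Binary contact map: entry [i][j] is 1 when residues i and j are within
    `threshold` (assumed to be in angstroms) of each other. Self-contacts on
    the diagonal are left at 0."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def edge_list(cmap):
    """Undirected edges (i, j) with i < j, as a graph learning module might consume."""
    return [(i, j) for i, row in enumerate(cmap) for j, v in enumerate(row) if v and i < j]


if __name__ == "__main__":
    # Hypothetical toy coordinates: three nearby residues and one distant residue.
    toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(edge_list(contact_map(toy_coords)))
```

The map is symmetric by construction, so it can be read directly as the adjacency matrix of an undirected residue graph.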
Affiliation(s)
- Jianan Sui
  - School of Information Science and Engineering, University of Jinan, Jinan, China
- Jiazi Chen
  - Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-shi, Fukuoka, Japan
- Yuehui Chen
  - School of Artificial Intelligence Institute and Information Science and Engineering, University of Jinan, Jinan, China
- Naoki Iwamori
  - Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-shi, Fukuoka, Japan
- Jin Sun
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China

5. Albin D, Ramsahoye M, Kochavi E, Alistar M. PhageScanner: a reconfigurable machine learning framework for bacteriophage genomic and metagenomic feature annotation. Front Microbiol 2024; 15:1446097. [PMID: 39355420 PMCID: PMC11442244 DOI: 10.3389/fmicb.2024.1446097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Accepted: 08/23/2024] [Indexed: 10/03/2024] Open
Abstract
Bacteriophages are the most prolific organisms on Earth, yet many of their genomes and assemblies from metagenomic sources lack protein sequences with identified functions. While most bacteriophage proteins are structural proteins, categorized as Phage Virion Proteins (PVPs), a considerable number remain unclassified. Complicating matters further, traditional lab-based methods for PVP identification can be tedious. To expedite the process of identifying PVPs, machine-learning models are increasingly being employed. Existing tools have developed models for predicting PVPs from protein sequences as input. However, none of these efforts have built software allowing for both genomic and metagenomic data as input. In addition, there is currently no framework available for easily curating data and creating new types of machine learning models. In response, we introduce PhageScanner, an open-source platform that streamlines data collection for genomic and metagenomic datasets, model training and testing, and includes a prediction pipeline for annotating genomic and metagenomic data. PhageScanner also features a graphical user interface (GUI) for visualizing annotations on genomic and metagenomic data. We further introduce a BLAST-based classifier that outperforms ML-based models and an efficient Long Short-Term Memory (LSTM) classifier. We then showcase the capabilities of PhageScanner by predicting PVPs in six previously uncharacterized bacteriophage genomes. In addition, we create a new model that predicts phage-encoded toxins within bacteriophage genomes, thus displaying the utility of the framework.
Affiliation(s)
- Dreycey Albin
  - Department of Computer Science, University of Colorado at Boulder, Boulder, CO, United States
- Michelle Ramsahoye
  - Department of Computer Science, University of Colorado at Boulder, Boulder, CO, United States
- Eitan Kochavi
  - Department of Computer Science, University of Colorado at Boulder, Boulder, CO, United States
- Mirela Alistar
  - Department of Computer Science, University of Colorado at Boulder, Boulder, CO, United States
  - ATLAS Institute, University of Colorado at Boulder, Boulder, CO, United States

6. Parker DR, Nugen SR. Bacteriophage-Based Bioanalysis. Annual Review of Analytical Chemistry (Palo Alto, Calif.) 2024; 17:393-410. [PMID: 39018352 DOI: 10.1146/annurev-anchem-071323-084224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/19/2024]
Abstract
Bacteriophages, which are viral predators of bacteria, have evolved to efficiently recognize, bind, infect, and lyse their host, resulting in the release of tens to hundreds of propagated viruses. These abilities have attracted biosensor developers who have developed new methods to detect bacteria. Recently, several comprehensive reviews have covered many of the advances made regarding the performance of phage-based biosensors. Therefore, in this review, we first describe the landscape of phage-based biosensors and then cover advances in other aspects of phage biology and engineering that can be used to make high-impact contributions to biosensor development. Many of these advances are in fields adjacent to analytical chemistry such as synthetic biology, machine learning, and genetic engineering and will allow those looking to develop phage-based biosensors to start taking alternative approaches, such as a bottom-up design and synthesis of custom phages with the singular task of detecting their host.
Affiliation(s)
- David R Parker
  - Department of Food Science, Cornell University, Ithaca, New York, USA
- Sam R Nugen
  - Department of Food Science, Cornell University, Ithaca, New York, USA

7. Flamholz ZN, Biller SJ, Kelly L. Large language models improve annotation of prokaryotic viral proteins. Nat Microbiol 2024; 9:537-549. [PMID: 38287147 PMCID: PMC11311208 DOI: 10.1038/s41564-023-01584-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Accepted: 12/08/2023] [Indexed: 01/31/2024]
Abstract
Viral genomes are poorly annotated in metagenomic samples, representing an obstacle to understanding viral diversity and function. Current annotation approaches rely on alignment-based sequence homology methods, which are limited by the paucity of characterized viral proteins and divergence among viral sequences. Here we show that protein language models can capture prokaryotic viral protein function, enabling new portions of viral sequence space to be assigned biologically meaningful labels. When applied to global ocean virome data, our classifier expanded the annotated fraction of viral protein families by 29%. Among previously unannotated sequences, we highlight the identification of an integrase defining a mobile element in marine picocyanobacteria and a capsid protein that anchors globally widespread viral elements. Furthermore, improved high-level functional annotation provides a means to characterize similarities in genomic organization among diverse viral sequences. Protein language models thus enhance remote homology detection of viral proteins, serving as a useful complement to existing approaches.
Affiliation(s)
- Zachary N Flamholz
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
- Steven J Biller
  - Department of Biological Sciences, Wellesley College, Wellesley, MA, USA
- Libusha Kelly
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
  - Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, NY, USA

8. Wu S, Feng T, Tang W, Qi C, Gao J, He X, Wang J, Zhou H, Fang Z. metaProbiotics: a tool for mining probiotic from metagenomic binning data based on a language model. Brief Bioinform 2024; 25:bbae085. [PMID: 38487846 PMCID: PMC10940841 DOI: 10.1093/bib/bbae085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/26/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024] Open
Abstract
Beneficial bacteria remain largely unexplored. Lacking systematic methods, understanding probiotic community traits becomes challenging, leading to various conclusions about their probiotic effects among different publications. We developed language model-based metaProbiotics to rapidly detect probiotic bins from metagenomes, demonstrating superior performance in simulated benchmark datasets. Testing on gut metagenomes from probiotic-treated individuals, it revealed the probioticity of intervention strains-derived bins and other probiotic-associated bins beyond the training data, such as a plasmid-like bin. Analyses of these bins revealed various probiotic mechanisms and bai operon as probiotic Ruminococcaceae's potential marker. In different health-disease cohorts, these bins were more common in healthy individuals, signifying their probiotic role, but relevant health predictions based on the abundance profiles of these bins faced cross-disease challenges. To better understand the heterogeneous nature of probiotics, we used metaProbiotics to construct a comprehensive probiotic genome set from global gut metagenomic data. Module analysis of this set shows that diseased individuals often lack certain probiotic gene modules, with significant variation of the missing modules across different diseases. Additionally, different gene modules on the same probiotic have heterogeneous effects on various diseases. We thus believe that gene function integrity of the probiotic community is more crucial in maintaining gut homeostasis than merely increasing specific gene abundance, and adding probiotics indiscriminately might not boost health. We expect that the innovative language model-based metaProbiotics tool will promote novel probiotic discovery using large-scale metagenomic data and facilitate systematic research on bacterial probiotic effects. The metaProbiotics program can be freely downloaded at https://github.com/zhenchengfang/metaProbiotics.
Affiliation(s)
- Shufang Wu
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Tao Feng
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Waijiao Tang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Cancan Qi
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Jie Gao
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - Department of Gastroenterology, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Xiaolong He
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Jiaxuan Wang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Hongwei Zhou
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Zhencheng Fang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China

9. Feng T, Wu S, Zhou H, Fang Z. MOBFinder: a tool for mobilization typing of plasmid metagenomic fragments based on a language model. Gigascience 2024; 13:giae047. [PMID: 39101782 PMCID: PMC11299106 DOI: 10.1093/gigascience/giae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 05/31/2024] [Accepted: 06/24/2024] [Indexed: 08/06/2024] Open
Abstract
BACKGROUND: Mobilization typing (MOB) is a classification scheme for plasmid genomes based on their relaxase gene. The host ranges of plasmids of different MOB categories are diverse, and MOB is crucial for investigating plasmid mobilization, especially the transmission of resistance genes and virulence factors. However, MOB typing of plasmid metagenomic data is challenging due to the highly fragmented characteristics of metagenomic contigs. RESULTS: We developed MOBFinder, an 11-class classifier, for categorizing plasmid fragments into 10 MOB types and a nonmobilizable category. We first performed MOB typing to classify complete plasmid genomes according to relaxase information and then constructed an artificial benchmark dataset of plasmid metagenomic fragments (PMFs) from those complete plasmid genomes whose MOB types are well annotated. Next, based on natural language models, we used word vectors to characterize the PMFs. Several random forest classification models were trained and integrated to predict fragments of different lengths. Evaluating the tool using the benchmark dataset, we found that MOBFinder outperforms previous tools such as MOBscan and MOB-suite, with an overall accuracy approximately 59% higher than that of MOB-suite. Moreover, the balanced accuracy, harmonic mean, and F1-score reached up to 99% for some MOB types. When applied to a cohort of patients with type 2 diabetes (T2D), MOBFinder offered insights suggesting that the MOBF type plasmid, which is widely present in Escherichia and Klebsiella, and the MOBQ type plasmid might accelerate antibiotic resistance transmission in patients with T2D. CONCLUSIONS: To the best of our knowledge, MOBFinder is the first tool for MOB typing of PMFs. The tool is freely available at https://github.com/FengTaoSMU/MOBFinder.
Affiliation(s)
- Tao Feng
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Shufang Wu
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Hongwei Zhou
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Zhencheng Fang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China

10. Grigson SR, Giles SK, Edwards RA, Papudeshi B. Knowing and Naming: Phage Annotation and Nomenclature for Phage Therapy. Clin Infect Dis 2023; 77:S352-S359. [PMID: 37932119 PMCID: PMC10627814 DOI: 10.1093/cid/ciad539] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/08/2023] Open
Abstract
Bacteriophages, or phages, are viruses that infect bacteria, shaping microbial communities and ecosystems. They have gained attention as potential agents against antibiotic resistance. In phage therapy, lytic phages are preferred for their bacteria-killing ability, while temperate phages, which can transfer antibiotic resistance or toxin genes, are avoided. Selection relies on plaque morphology and genome sequencing. This review outlines annotating genomes, identifying critical genomic features, and assigning functional labels to protein-coding sequences. These annotations prevent the transfer of unwanted genes, such as antimicrobial resistance or toxin genes, during phage therapy. Additionally, it covers the International Committee on Taxonomy of Viruses (ICTV), an established phage nomenclature system for simplified classification and communication. Accurate phage genome annotation and nomenclature provide insights into phage-host interactions, replication strategies, and evolution, accelerating our understanding of the diversity and evolution of phages and facilitating the development of phage-based therapies.
Affiliation(s)
- Susanna R Grigson
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
- Sarah K Giles
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
- Robert A Edwards
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia
- Bhavya Papudeshi
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Adelaide, Australia

11. Sui J, Chen J, Chen Y, Iwamori N, Sun J. Identification of plant vacuole proteins by using graph neural network and contact maps. BMC Bioinformatics 2023; 24:357. [PMID: 37740195 PMCID: PMC10517492 DOI: 10.1186/s12859-023-05475-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 09/12/2023] [Indexed: 09/24/2023] Open
Abstract
Plant vacuoles are essential organelles in the growth and development of plants, and accurate identification of their proteins is crucial for understanding their biological properties. In this study, we developed a novel model called GraphIdn for the identification of plant vacuole proteins. The model uses SeqVec, a deep representation learning model, to initialize the amino acid sequence. We utilized the AlphaFold2 algorithm to obtain the structural information of corresponding plant vacuole proteins, and then fed the calculated contact maps into a graph convolutional neural network. GraphIdn achieved accuracy values of 88.51% and 89.93% in independent testing and fivefold cross-validation, respectively, outperforming previous state-of-the-art predictors. As far as we know, this is the first model to use predicted protein topology structure graphs to identify plant vacuole proteins. Furthermore, we assessed the effectiveness and generalization capability of our GraphIdn model by applying it to identify and locate peroxisomal proteins, which yielded promising outcomes. The source code and datasets can be accessed at https://github.com/SJNNNN/GraphIdn.
Affiliation(s)
- Jianan Sui
  - School of Information Science and Engineering, University of Jinan, Jinan, China
- Jiazi Chen
  - Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-Shi, Fukuoka, Japan
- Yuehui Chen
  - School of Artificial Intelligence Institute and Information Science and Engineering, University of Jinan, Jinan, China
- Naoki Iwamori
  - Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-Shi, Fukuoka, Japan
- Jin Sun
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China

12. Shang J, Peng C, Tang X, Sun Y. PhaVIP: Phage VIrion Protein classification based on chaos game representation and Vision Transformer. Bioinformatics 2023; 39:i30-i39. [PMID: 37387136 DOI: 10.1093/bioinformatics/btad229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: As viruses that mainly infect bacteria, phages are key players across a wide range of ecosystems. Analyzing phage proteins is indispensable for understanding phages' functions and roles in microbiomes. High-throughput sequencing enables us to obtain phages in different microbiomes at low cost. However, compared to the fast accumulation of newly identified phages, phage protein classification remains difficult. In particular, a fundamental need is to annotate virion proteins, the structural proteins, such as major tail, baseplate, etc. Although there are experimental methods for virion protein identification, they are too expensive or time-consuming, leaving a large number of proteins unclassified. Thus, there is a great demand to develop a computational method for fast and accurate phage virion protein (PVP) classification. RESULTS: In this work, we adapted the state-of-the-art image classification model, Vision Transformer, to conduct virion protein classification. By encoding protein sequences into unique images using chaos game representation, we can leverage Vision Transformer to learn both local and global features from sequence "images". Our method, PhaVIP, has two main functions: classifying PVP and non-PVP sequences and annotating the types of PVP, such as capsid and tail. We tested PhaVIP on several datasets with increasing difficulty and benchmarked it against alternative tools. The experimental results show that PhaVIP has superior performance. After validating the performance of PhaVIP, we investigated two applications that can use the output of PhaVIP: phage taxonomy classification and phage host prediction. The results showed the benefit of using classified proteins over all proteins. AVAILABILITY AND IMPLEMENTATION: The web server of PhaVIP is available via: https://phage.ee.cityu.edu.hk/phavip. The source code of PhaVIP is available via: https://github.com/KennthShang/PhaVIP.
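The idea of turning a protein sequence into an image with chaos game representation can be sketched generically as follows. This is not PhaVIP's encoding, which the abstract does not specify. It assumes one polygon vertex per standard amino acid on the unit circle, a contraction ratio of 0.5, and a 32 × 32 count grid. All of these choices, and the function names, are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical layout: one vertex per amino acid, evenly spaced on the unit circle.
VERTICES = {
    aa: (
        math.cos(2 * math.pi * k / len(AMINO_ACIDS)),
        math.sin(2 * math.pi * k / len(AMINO_ACIDS)),
    )
    for k, aa in enumerate(AMINO_ACIDS)
}


def cgr_points(seq, ratio=0.5):
    """Chaos game walk: start at the origin and, for each residue, move a
    fraction `ratio` of the way towards that residue's vertex."""
    x, y = 0.0, 0.0
    points = []
    for aa in seq:
        vx, vy = VERTICES[aa]  # raises KeyError for non-standard residues
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


def cgr_image(seq, size=32):
    """Rasterize the walk into a size x size grid of visit counts."""
    image = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int((x + 1) / 2 * size), size - 1)
        row = min(int((1 - y) / 2 * size), size - 1)  # row 0 is the top of the image
        image[row][col] += 1
    return image


if __name__ == "__main__":
    img = cgr_image("MKTAYIAKQR")  # hypothetical toy sequence
    print(sum(map(sum, img)), "points placed on a", len(img), "x", len(img[0]), "grid")
```

Each residue contributes exactly one point, so the grid counts sum to the sequence length. An image classifier would then take such a grid as its input.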
Affiliation(s)
- Jiayu Shang
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
- Cheng Peng
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
- Xubo Tang
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China
- Yanni Sun
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong (SAR), China

13. Flamholz ZN, Biller SJ, Kelly L. Large language models improve annotation of viral proteins. Research Square 2023:rs.3.rs-2852098. [PMID: 37205395 PMCID: PMC10187409 DOI: 10.21203/rs.3.rs-2852098/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Viral sequences are poorly annotated in environmental samples, a major roadblock to understanding how viruses influence microbial community structure. Current annotation approaches rely on alignment-based sequence homology methods, which are limited by available viral sequences and sequence divergence in viral proteins. Here, we show that protein language model representations capture viral protein function beyond the limits of remote sequence homology by targeting two axes of viral sequence annotation: systematic labeling of protein families and function identification for biologic discovery. Protein language model representations capture protein functional properties specific to viruses and expand the annotated fraction of ocean virome viral protein sequences by 37%. Among unannotated viral protein families, we identify a novel DNA editing protein family that defines a new mobile element in marine picocyanobacteria. Protein language models thus significantly enhance remote homology detection of viral proteins and can be utilized to enable new biological discovery across diverse functional categories.
Affiliation(s)
- Zachary N. Flamholz
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine; Bronx, NY, USA
- Steve J. Biller
  - Department of Biological Sciences, Wellesley College; Wellesley, MA, USA
- Libusha Kelly
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine; Bronx, NY, USA
  - Department of Microbiology and Immunology, Albert Einstein College of Medicine; Bronx, NY, USA
|
14
|
Fang Z, Feng T, Zhou H, Chen M. DeePVP: Identification and classification of phage virion proteins using deep learning. Gigascience 2022; 11:giac076. [PMID: 35950840 PMCID: PMC9366990 DOI: 10.1093/gigascience/giac076] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/08/2022] [Revised: 06/08/2022] [Accepted: 07/11/2022] [Indexed: 01/04/2023]
Abstract
BACKGROUND Many biological properties of phages are determined by phage virion proteins (PVPs), and the poor annotation of PVPs is a bottleneck for many areas of viral research, such as viral phylogenetic analysis, viral host identification, and antibacterial drug design. Because of the high diversity of PVP sequences, the PVP annotation of a phage genome remains a particularly challenging bioinformatic task. FINDINGS Based on deep learning, we developed DeePVP. The main module of DeePVP aims to discriminate PVPs from non-PVPs within a phage genome, while the extended module of DeePVP can further classify predicted PVPs into the 10 major classes of PVPs. Compared with the present state-of-the-art tools, the main module of DeePVP performs better, with a 9.05% higher F1-score in the PVP identification task. Moreover, the overall accuracy of the extended module of DeePVP in the PVP classification task is approximately 3.72% higher than that of PhANNs. Two application cases show that the predictions of DeePVP are more reliable and can better reveal the compact PVP-enriched region than the current state-of-the-art tools. Particularly, in the Escherichia phage phiEC1 genome, a novel PVP-enriched region that is conserved in many other Escherichia phage genomes was identified, indicating that DeePVP will be a useful tool for the analysis of phage genomic structures. CONCLUSIONS DeePVP outperforms state-of-the-art tools. The program is optimized in both a virtual machine with graphical user interface and a docker so that the tool can be easily run by noncomputer professionals. DeePVP is freely available at https://github.com/fangzcbio/DeePVP/.
Affiliation(s)
- Zhencheng Fang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Tao Feng
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Hongwei Zhou
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
- Muxuan Chen
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510280, China
|