1
Peng Y, Wu J, Sun Y, Zhang Y, Wang Q, Shao S. Contrastive-learning of language embedding and biological features for cross modality encoding and effector prediction. Nat Commun 2025; 16:1299. [PMID: 39900608 PMCID: PMC11791096 DOI: 10.1038/s41467-025-56526-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Accepted: 01/15/2025] [Indexed: 02/05/2025]
Abstract
Identifying and characterizing virulence proteins secreted by Gram-negative bacteria are fundamental for deciphering microbial pathogenicity as well as aiding the development of therapeutic strategies. Effector predictors utilizing pre-trained protein language models (PLMs) have shown sound performance by leveraging extensive evolutionary and sequential protein features. However, the accuracy and sensitivity of effector prediction remain challenging. Here, we introduce a model named Contrastive-learning of Language Embedding and Biological Features (CLEF), which leverages contrastive learning to integrate PLM representations with supplementary biological features. Biological information is captured in learned contextualized embeddings to yield meaningful representations. With cross-modality biological features, CLEF outperforms state-of-the-art (SOTA) models in predicting type III, type IV, and type VI secreted effectors (T3SEs/T4SEs/T6SEs) in enteric pathogens. All experimentally verified effectors in Enterohemorrhagic Escherichia coli and 41 of 43 experimentally verified T3SEs of Salmonella Typhimurium are recognized. Moreover, 12 predicted T3SEs and 11 predicted T6SEs are validated by extensive experiments in Edwardsiella piscicida. Furthermore, integrating omics data via the CLEF framework enhances protein representations to illustrate effector-effector interactions and determine in vivo colonization-essential genes. Collectively, CLEF provides a blueprint to bridge the gap between the capacity of in silico PLMs and experimental biological information to fulfill complicated tasks.
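The abstract does not spell out the training objective behind CLEF. As a rough illustration of what contrastive alignment between two modalities can look like, the sketch below computes a symmetric InfoNCE-style loss between PLM embeddings and biological-feature embeddings of the same proteins. The function names, the temperature value, and the toy vectors are assumptions made for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(plm_embs, bio_embs, temperature=0.1):
    """Symmetric InfoNCE-style loss (illustrative, not the authors' code).

    Row i of plm_embs and row i of bio_embs describe the same protein
    (positive pair); every other row in the batch acts as a negative.
    """
    # Pairwise similarity logits, scaled by the temperature.
    logits = [[cosine(p, b) / temperature for b in bio_embs] for p in plm_embs]

    def cross_entropy_diag(mat):
        # Mean of -log softmax(row_i)[i] over all rows.
        total = 0.0
        for i, row in enumerate(mat):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(mat)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

# Toy data (hypothetical): three proteins, 3-dimensional embeddings per modality.
plm = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bio_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
bio_shuffled = [bio_aligned[1], bio_aligned[2], bio_aligned[0]]

loss_aligned = info_nce(plm, bio_aligned)
loss_shuffled = info_nce(plm, bio_shuffled)
```

In this toy setting the loss is lower when each PLM embedding is paired with the biological-feature embedding of the same protein than when the pairing is shuffled, which is the behaviour a contrastive objective rewards.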
Affiliation(s)
- Yue Peng
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
- Junze Wu
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
- Yi Sun
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
- Yuanxing Zhang
  - Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), 519000, Zhuhai, China
- Qiyao Wang
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
  - Shanghai Engineering Research Center of Maricultured Animal Vaccines, Shanghai, China
  - Laboratory of Aquatic Animal Diseases of MOA, Shanghai, China
- Shuai Shao
  - State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
  - Shanghai Engineering Research Center of Maricultured Animal Vaccines, Shanghai, China
  - Laboratory of Aquatic Animal Diseases of MOA, Shanghai, China
2
Gao M, Song C, Liu T. PLM-T3SE: Accurate Prediction of Type III Secretion Effectors Using Protein Language Model Embeddings. J Cell Biochem 2025; 126:e30642. [PMID: 39164870 DOI: 10.1002/jcb.30642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Revised: 08/04/2024] [Accepted: 08/07/2024] [Indexed: 08/22/2024]
Abstract
The Type III secretion effectors (T3SEs) are bacterial proteins synthesized by Gram-negative pathogens and delivered into host cells via the Type III secretion system (T3SS). These effectors usually play a pivotal role in the interactions between bacteria and hosts. Hence, the precise identification of T3SEs aids researchers in exploring the pathogenic mechanisms of bacterial infections. Since the diversity and complexity of T3SE sequences often make traditional experimental methods time-consuming, it is imperative to explore more efficient and convenient computational approaches for T3SE prediction. Inspired by the promising potential exhibited by pre-trained language models in protein recognition tasks, we proposed a method called PLM-T3SE that utilizes protein language models (PLMs) for effective recognition of T3SEs. First, we utilized PLM embeddings and evolutionary features from the position-specific scoring matrix (PSSM) profiles to transform protein sequences into fixed-length vectors for model training. Second, we employed the extreme gradient boosting (XGBoost) algorithm to rank these features based on their importance. Finally, an MLP neural network model was used to predict T3SEs based on the selected optimal feature set. Experimental results from the cross-validation and independent test demonstrated that our model exhibited superior performance compared to the existing models. Specifically, our model achieved an accuracy of 98.1%, which is 1.8%-42.4% higher than the state-of-the-art predictors on the same independent test data set. These findings highlight the superiority of PLM-T3SE and the remarkable characterization ability of PLM embeddings for T3SE prediction.
Affiliation(s)
- Mengru Gao
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
- Chen Song
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
- Taigang Liu
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
3
Hu Y, Wang Y, Hu X, Chao H, Li S, Ni Q, Zhu Y, Hu Y, Zhao Z, Chen M. T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors. Comput Struct Biotechnol J 2024; 23:801-812. [PMID: 38328004 PMCID: PMC10847861 DOI: 10.1016/j.csbj.2024.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 01/20/2024] [Accepted: 01/20/2024] [Indexed: 02/09/2024]
Abstract
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp.
Affiliation(s)
- Yueming Hu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yejun Wang
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
  - Department of Cell Biology and Genetics, College of Basic Medicine, Shenzhen University Medical School, Shenzhen, China
- Xiaotian Hu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Haoyu Chao
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Sida Li
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Qinyang Ni
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yanyan Zhu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yixue Hu
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
- Ziyi Zhao
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
- Ming Chen
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
  - Institute of Hematology, Zhejiang University School of Medicine, The First Affiliated Hospital, Zhejiang University, Hangzhou 310058, China
4
Zhao M, Lei C, Zhou K, Huang Y, Fu C, Yang S, Zhang Z. POOE: predicting oomycete effectors based on a pre-trained large protein language model. mSystems 2024; 9:e0100423. [PMID: 38078741 PMCID: PMC10804963 DOI: 10.1128/msystems.01004-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 10/23/2023] [Indexed: 01/24/2024]
Abstract
Oomycetes are fungus-like eukaryotic microorganisms which can cause catastrophic diseases in many plants. Successful infection of oomycetes depends highly on their effector proteins that are secreted into plant cells to subvert plant immunity. Thus, systematic identification of effectors from the oomycete proteomes remains an initial but crucial step in understanding plant-pathogen relationships. However, the number of experimentally identified oomycete effectors is still limited. Currently, only a few bioinformatics predictors exist to detect potential effectors, and their prediction performance needs to be improved. Here, we used the sequence embeddings from a pre-trained large protein language model (ProtTrans) as input and developed a support vector machine-based method called POOE for predicting oomycete effectors. POOE could achieve a highly accurate performance with an area under the precision-recall curve of 0.804 (area under the receiver operating characteristic curve = 0.893, accuracy = 0.874, precision = 0.777, recall = 0.684, and specificity = 0.936) in the fivefold cross-validation, considerably outperforming various combinations of popular machine learning algorithms and other commonly used sequence encoding schemes. A similar prediction performance was also observed in the independent test. Compared with the existing oomycete effector prediction methods, POOE provided very competitive and promising performance, suggesting that ProtTrans effectively captures rich protein semantic information and dramatically improves the prediction task. We anticipate that POOE can accelerate the identification of oomycete effectors and provide new hints to systematically understand the functional roles of effectors in plant-pathogen interactions. The web server of POOE is freely accessible at http://zzdlab.com/pooe/index.php. 
The corresponding source code and data sets are also available at https://github.com/zzdlabzm/POOE. IMPORTANCE: In this work, we use the sequence representations from a pre-trained large protein language model (ProtTrans) as input and develop a Support Vector Machine-based method called POOE for predicting oomycete effectors. POOE could achieve a highly accurate performance in the independent test set, considerably outperforming existing oomycete effector prediction methods. We expect that this new bioinformatics tool will accelerate the identification of oomycete effectors and further guide the experimental efforts to interrogate the functional roles of effectors in plant-pathogen interaction.
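The POOE abstract reports accuracy, precision, recall, and specificity. As a reminder of how these four figures relate to one another, the following minimal sketch derives them from the counts of a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not the POOE results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall, and specificity
    from binary confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),      # fraction of predicted effectors that are real
        "recall": tp / (tp + fn),         # fraction of real effectors that are found
        "specificity": tn / (tn + fp),    # fraction of non-effectors correctly rejected
    }

# Hypothetical counts for illustration only (not taken from the paper).
metrics = classification_metrics(tp=70, fp=20, tn=280, fn=30)
```

When non-effectors greatly outnumber effectors, accuracy and specificity can stay high while recall is noticeably lower, which is one reason such abstracts also report the area under the precision-recall curve.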
Affiliation(s)
- Miao Zhao
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Chenping Lei
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Kewei Zhou
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Yan Huang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Chen Fu
  - School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing, China
- Shiping Yang
  - State Key Laboratory of Plant Environmental Resilience, College of Biological Sciences, China Agricultural University, Beijing, China
- Ziding Zhang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
5
Nielsen H. Protein Sorting Prediction. Methods Mol Biol 2024; 2715:27-63. [PMID: 37930519 DOI: 10.1007/978-1-0716-3445-5_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2023]
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global property-based, and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
  - Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
6
Zhang Y, Guan J, Li C, Wang Z, Deng Z, Gasser RB, Song J, Ou HY. DeepSecE: A Deep-Learning-Based Framework for Multiclass Prediction of Secreted Proteins in Gram-Negative Bacteria. Research (Washington, D.C.) 2023; 6:0258. [PMID: 37886621 PMCID: PMC10599158 DOI: 10.34133/research.0258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/03/2023] [Accepted: 10/08/2023] [Indexed: 10/28/2023]
Abstract
Proteins secreted by Gram-negative bacteria are tightly linked to the virulence and adaptability of these microbes to environmental changes. Accurate identification of such secreted proteins can facilitate the investigations of infections and diseases caused by these bacterial pathogens. However, current bioinformatic methods for predicting bacterial secreted substrate proteins have limited computational efficiency and application scope on a genome-wide scale. Here, we propose a novel deep-learning-based framework, DeepSecE, for the simultaneous inference of multiple distinct groups of secreted proteins produced by Gram-negative bacteria. DeepSecE remarkably improves their classification from nonsecreted proteins using a pretrained protein language model and transformer, achieving a macro-average accuracy of 0.883 on 5-fold cross-validation. Performance benchmarking suggests that DeepSecE achieves competitive performance with the state-of-the-art binary predictors specialized for individual types of secreted substrates. The attention mechanism corroborates salient patterns and motifs at the N or C termini of the protein sequences. Using this pipeline, we further investigate the genome-wide prediction of novel secreted proteins and their taxonomic distribution across ~1,000 Gram-negative bacterial genomes. The present analysis demonstrates that DeepSecE has major potential for the discovery of disease-associated secreted proteins in a diverse range of Gram-negative bacteria. An online web server of DeepSecE is also publicly available to predict and explore various secreted substrate proteins via the input of bacterial genome sequences.
Affiliation(s)
- Yumeng Zhang
  - State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Shanghai Key Laboratory of Veterinary Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Jiahao Guan
  - State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Chen Li
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Zhikang Wang
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Zixin Deng
  - State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Robin B. Gasser
  - Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
  - Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
- Hong-Yu Ou
  - State Key Laboratory of Microbial Metabolism, Joint International Laboratory on Metabolic & Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Shanghai Key Laboratory of Veterinary Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
7
Jerez SA, Plaza N, Bravo V, Urrutia IM, Blondel CJ. Vibrio type III secretion system 2 is not restricted to the Vibrionaceae and encodes differentially distributed repertoires of effector proteins. Microb Genom 2023; 9:mgen000973. [PMID: 37018030 PMCID: PMC10210961 DOI: 10.1099/mgen.0.000973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 02/01/2023] [Indexed: 04/06/2023]
Abstract
Vibrio parahaemolyticus is the leading cause of seafood-borne gastroenteritis worldwide. A distinctive feature of the O3:K6 pandemic clone, and its derivatives, is the presence of a second, phylogenetically distinct, type III secretion system (T3SS2) encoded within the genomic island VPaI-7. The T3SS2 allows the delivery of effector proteins directly into the cytosol of infected eukaryotic cells to subvert key host-cell processes, critical for V. parahaemolyticus to colonize and cause disease. Furthermore, the T3SS2 also increases the environmental fitness of V. parahaemolyticus in its interaction with bacterivorous protists; hence, it has been proposed that it contributed to the global oceanic spread of the pandemic clone. Several reports have identified T3SS2-related genes in Vibrio and non-Vibrio species, suggesting that the T3SS2 gene cluster is not restricted to the Vibrionaceae and can mobilize through horizontal gene transfer events. In this work, we performed a large-scale genomic analysis to determine the phylogenetic distribution of the T3SS2 gene cluster and its repertoire of effector proteins. We identified putative T3SS2 gene clusters in 1130 bacterial genomes from 8 bacterial genera, 5 bacterial families and 47 bacterial species. A hierarchical clustering analysis allowed us to define six T3SS2 subgroups (I-VI) with different repertoires of effector proteins, redefining the concepts of T3SS2 core and accessory effector proteins. Finally, we identified a subset of the T3SS2 gene clusters (subgroup VI) that lacks most T3SS2 effector proteins described to date and provided a list of 10 novel effector candidates for this subgroup through bioinformatic analysis. Collectively, our findings indicate that the T3SS2 extends beyond the family Vibrionaceae and suggest that different effector protein repertoires could have a differential impact on the pathogenic potential and environmental fitness of each bacterium that has acquired the Vibrio T3SS2 gene cluster.
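The abstract mentions a hierarchical clustering of effector repertoires but gives neither the distance measure nor the linkage rule. The sketch below shows one plausible setup under stated assumptions: each gene cluster is described by the set of effectors it carries, dissimilarity is the Jaccard distance, and groups are merged by average linkage until a target number remains. The locus ids, effector names, and function names are placeholders, not data or methods from the paper.

```python
def jaccard_distance(a, b):
    """One minus (intersection size / union size) for two sets of effector names."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def agglomerate(profiles, n_groups):
    """Average-linkage agglomerative clustering of effector repertoires.

    profiles: dict mapping a locus id to the set of effectors it encodes.
    Returns a sorted list of groups, each a sorted list of locus ids.
    """
    groups = [[name] for name in sorted(profiles)]

    def linkage(g1, g2):
        # Average pairwise Jaccard distance between members of two groups.
        dists = [jaccard_distance(profiles[x], profiles[y]) for x in g1 for y in g2]
        return sum(dists) / len(dists)

    while len(groups) > n_groups:
        # Find the closest pair of groups and merge them.
        i, j = min(
            ((a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))),
            key=lambda ab: linkage(groups[ab[0]], groups[ab[1]]),
        )
        merged = sorted(groups[i] + groups[j])
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return sorted(groups)

# Hypothetical presence/absence data; ids and effector names are placeholders.
profiles = {
    "locus_1": {"effA", "effB", "effC"},
    "locus_2": {"effA", "effB"},
    "locus_3": {"effD", "effE"},
    "locus_4": {"effD", "effE", "effF"},
    "locus_5": {"effG"},
}
subgroups = agglomerate(profiles, n_groups=3)
```

With these toy repertoires, loci that share most of their effectors end up in the same subgroup, while the locus with a unique effector stays on its own.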
Affiliation(s)
- Sebastian A. Jerez
  - Instituto de Ciencias Biomédicas, Facultad de Medicina y Facultad de Ciencias de la Vida, Universidad Andrés Bello, Santiago, Chile
- Nicolas Plaza
  - Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud, Universidad Autónoma de Chile, Santiago, Chile
- Veronica Bravo
  - Programa Centro de Investigación Biomédica y Aplicada (CIBAP), Escuela de Medicina, Facultad de Ciencias Médicas, Universidad de Santiago de Chile, Santiago, Chile
- Italo M. Urrutia
  - Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud, Universidad Autónoma de Chile, Santiago, Chile
- Carlos J. Blondel
  - Instituto de Ciencias Biomédicas, Facultad de Medicina y Facultad de Ciencias de la Vida, Universidad Andrés Bello, Santiago, Chile
8
De Ryck J, Van Damme P, Goormachtig S. From prediction to function: Current practices and challenges towards the functional characterization of type III effectors. Front Microbiol 2023; 14:1113442. [PMID: 36846751 PMCID: PMC9945535 DOI: 10.3389/fmicb.2023.1113442] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/01/2022] [Accepted: 01/19/2023] [Indexed: 02/10/2023]
Abstract
The type III secretion system (T3SS) is a well-studied pathogenicity determinant of many bacteria through which effectors (T3Es) are translocated into the host cell, where they exercise a wide range of functions to deceive the host cell's immunity and to establish a niche. Here we look at the different approaches that are used to functionally characterize a T3E. Such approaches include host localization studies, virulence screenings, biochemical activity assays, and large-scale omics, such as transcriptomics, interactomics, and metabolomics, among others. By means of the phytopathogenic Ralstonia solanacearum species complex (RSSC) as a case study, the current advances of these methods will be explored, alongside the progress made in understanding effector biology. Data obtained by such complementary methods provide crucial information to comprehend the entire function of the effectome and will eventually lead to a better understanding of the phytopathogen, opening opportunities to tackle it.
Affiliation(s)
- Joren De Ryck
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Center for Plant Systems Biology, VIB, Ghent, Belgium
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
- Petra Van Damme
  - iRIP Unit, Laboratory of Microbiology, Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
- Sofie Goormachtig
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Center for Plant Systems Biology, VIB, Ghent, Belgium
9
Wagner N, Alburquerque M, Ecker N, Dotan E, Zerah B, Pena MM, Potnis N, Pupko T. Natural language processing approach to model the secretion signal of type III effectors. Frontiers in Plant Science 2022; 13:1024405. [PMID: 36388586 PMCID: PMC9659976 DOI: 10.3389/fpls.2022.1024405] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/21/2022] [Accepted: 10/11/2022] [Indexed: 06/16/2023]
Abstract
Type III effectors are proteins injected by Gram-negative bacteria into eukaryotic hosts. In many plant and animal pathogens, these effectors manipulate host cellular processes to the benefit of the bacteria. Type III effectors are secreted by a type III secretion system that must "classify" each bacterial protein into one of two categories: proteins that should be translocated and proteins that should not. It was previously shown that type III effectors have a secretion signal within their N-terminus. However, despite numerous efforts, the exact biochemical identity of this secretion signal is generally unknown. Computational characterization of the secretion signal is important for the identification of novel effectors and for better understanding the molecular translocation mechanism. In this work we developed novel machine-learning algorithms for characterizing the secretion signal in both plant and animal pathogens. Specifically, we represented each protein as a vector in high-dimensional space using Facebook's protein language model. Classification algorithms were next used to separate effectors from non-effector proteins. We subsequently curated a benchmark dataset of hundreds of effectors and thousands of non-effector proteins. We showed that on this curated dataset, our novel approach yielded substantially better classification accuracy compared to previously developed methodologies. We have also tested the hypothesis that plant and animal pathogen effectors are characterized by different secretion signals. Finally, we integrated the novel approach in Effectidor, a web-server for predicting type III effector proteins, leading to a more accurate classification of effectors from non-effectors.
Affiliation(s)
- Naama Wagner
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michael Alburquerque
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Noa Ecker
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Edo Dotan
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ben Zerah
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michelle Mendonca Pena
  - Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- Neha Potnis
  - Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
10
Wagner N, Avram O, Gold-Binshtok D, Zerah B, Teper D, Pupko T. Effectidor: an automated machine-learning-based web server for the prediction of type-III secretion system effectors. Bioinformatics 2022; 38:2341-2343. [PMID: 35157036 DOI: 10.1093/bioinformatics/btac087] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/20/2021] [Revised: 01/31/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Type-III secretion systems are utilized by many Gram-negative bacteria to inject type-3 effectors (T3Es) into eukaryotic cells. These effectors manipulate host processes for the benefit of the bacteria and thus promote disease. They can also function as host-specificity determinants through their recognition as avirulence proteins that elicit an immune response. Identifying the full effector repertoire within a set of bacterial genomes is of great importance to develop appropriate treatments against the associated pathogens.
RESULTS: We present Effectidor, a user-friendly web server that harnesses several machine-learning techniques to predict T3Es within bacterial genomes. We compared the performance of Effectidor to other available tools for the same task on three pathogenic bacteria. Effectidor outperformed these tools in terms of classification accuracy (area under the precision-recall curve above 0.98 in all cases).
AVAILABILITY AND IMPLEMENTATION: Effectidor is available at: https://effectidor.tau.ac.il, and the source code is available at: https://github.com/naamawagner/Effectidor.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
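The Effectidor abstract summarises performance as the area under the precision-recall curve. One common step-wise estimate of that area is average precision, sketched below under the assumption that higher scores mean "more effector-like" and that tied scores do not need special handling. The function name, labels, and scores are illustrative only and are not Effectidor outputs.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for effector, 0 for non-effector.
    scores: classifier scores; higher means more effector-like.
    Ties between scores are not treated specially in this sketch.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of this true effector
    return total / n_pos

# Hypothetical scores for six proteins (illustration only).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.20]
ap = average_precision(labels, scores)
```

A ranking that places every true effector above every non-effector gives an average precision of 1.0, so values close to 1 indicate that few non-effectors are ranked ahead of real ones.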
Affiliation(s)
- Naama Wagner
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Oren Avram
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dafna Gold-Binshtok
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Ben Zerah
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Doron Teper
  - Department of Plant Pathology and Weed Research, Institute of Plant Protection Agricultural Research Organization (ARO), Volcani Center, Rishon LeZion 7505101, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
11
Jin Y, Yang Y. ProtPlat: an efficient pre-training platform for protein classification based on FastText. BMC Bioinformatics 2022; 23:66. [PMID: 35148686 PMCID: PMC8832758 DOI: 10.1186/s12859-022-04604-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2021] [Accepted: 02/02/2022] [Indexed: 11/24/2022]
Abstract
Background: Over the past decades, benefitting from the rapid growth of protein sequence data in public databases, many machine learning methods have been developed to predict physicochemical properties or functions of proteins using amino acid sequence features. However, the prediction performance often suffers from the lack of labeled data. In recent years, pre-training methods have been widely studied to address the small-sample issue in the computer vision and natural language processing fields, while specific pre-training techniques for protein sequences are few.
Results: In this paper, we propose a pre-training platform for representing protein sequences, called ProtPlat, which uses the Pfam database to train a three-layer neural network, and then uses specific training data from downstream tasks to fine-tune the model. ProtPlat can learn good representations for amino acids, and at the same time achieve efficient classification. We conduct experiments on three protein classification tasks, including the identification of type III secreted effectors, the prediction of subcellular localization, and the recognition of signal peptides. The experimental results show that the pre-training can enhance model performance effectively and that ProtPlat is competitive with the state-of-the-art predictors, especially for small datasets. We implement the ProtPlat platform as a web service (https://compbio.sjtu.edu.cn/protplat) that is accessible to the public.
Conclusions: To enhance the feature representation of protein amino acid sequences and improve the performance of sequence-based classification tasks, we develop ProtPlat, a general platform for the pre-training of protein sequences, which features large-scale supervised training based on the Pfam database and an efficient learning model, FastText. The experimental results of three downstream classification tasks demonstrate the efficacy of ProtPlat.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04604-2.
Affiliation(s)
- Yuan Jin
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, 200240, China
- Yang Yang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, 200240, China
|
12
|
Genomic and Functional Dissections of Dickeya zeae Shed Light on the Role of Type III Secretion System and Cell Wall-Degrading Enzymes to Host Range and Virulence. Microbiol Spectr 2022; 10:e0159021. [PMID: 35107329 PMCID: PMC8809351 DOI: 10.1128/spectrum.01590-21] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/20/2022] Open
Abstract
Dickeya zeae is a worldwide destructive pathogen that causes soft rot diseases on various hosts such as rice, maize, banana, and potato. The strain JZL7 we recently isolated from clivia represents the first monocot-specific D. zeae and also has reduced pathogenicity compared to that of other D. zeae strains (e.g., EC1 and MS2). To elucidate the molecular mechanisms underlying its more restricted host range and weakened pathogenicity, we sequenced the complete genome of JZL7 and performed comparative genomic and functional analyses of JZL7 and other D. zeae strains. We found that, while having the largest genome among D. zeae strains, JZL7 lost almost the entire type III secretion system (T3SS), which is a key component of the virulence suite of many bacterial pathogens. Importantly, the deletion of T3SS in MS2 substantially diminished the expression of most type III secreted effectors (T3SEs) and MS2's pathogenicity on both dicots and monocots. Moreover, although JZL7 and MS2 share almost the same repertoire of cell wall-degrading enzymes (CWDEs), we found broad reduction in the production of CWDEs and expression levels of CWDE genes in JZL7. The lower expression of CWDEs, pectin lyases in particular, would probably make it difficult for JZL7 to break down the cell wall of dicots, which is rich in pectin. Together, our results suggest that the loss of T3SS and reduced CWDE activity might have contributed to the host specificity and virulence of JZL7. Our findings also shed light on the pathogenic mechanism of Dickeya and other soft rot Pectobacteriaceae species in general.
IMPORTANCE: Dickeya zeae is an important, aggressive bacterial phytopathogen that can cause severe diseases in many crops and ornamental plants, thus leading to substantial economic losses. Strains from different sources showed significant diversity in their natural hosts, suggesting a complicated evolutionary history and complicated pathogenic mechanisms. However, the molecular mechanisms that cause the differences in the host range of D. zeae strains remain poorly understood. This study carried out genomic and functional dissections of JZL7, a D. zeae strain with restricted host range, and revealed the type III secretion system (T3SS) and cell wall-degrading enzymes (CWDEs) as two major factors contributing to the host range and virulence of D. zeae, which will provide a valuable reference for the exploration of pathogenic mechanisms in other bacteria and present new insights for the control of bacterial soft rot diseases on crops.
|
13
|
Jing R, Wen T, Liao C, Xue L, Liu F, Yu L, Luo J. DeepT3 2.0: improving type III secreted effector predictions by an integrative deep learning framework. NAR Genom Bioinform 2021; 3:lqab086. [PMID: 34617013 PMCID: PMC8489581 DOI: 10.1093/nargab/lqab086] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/19/2021] [Revised: 08/12/2021] [Accepted: 09/09/2021] [Indexed: 11/13/2022] Open
Abstract
Type III secretion systems (T3SSs) are bacterial membrane-embedded nanomachines that allow a number of human, plant and animal pathogens to inject virulence factors directly into the cytoplasm of eukaryotic cells. Export of effectors through T3SSs is critical for motility and virulence of most Gram-negative pathogens. Current computational methods can predict type III secreted effectors (T3SEs) from amino acid sequences, but due to algorithmic constraints, reliable and large-scale prediction of T3SEs in Gram-negative bacteria remains a challenge. Here, we present DeepT3 2.0 (http://advintbioinforlab.com/deept3/), a novel web server that integrates different deep learning models for genome-wide prediction of T3SEs from a bacterium of interest. DeepT3 2.0 combines various deep learning architectures including convolutional, recurrent, convolutional-recurrent and multilayer neural networks to learn N-terminal representations of proteins specifically for T3SE prediction. Outcomes from the different models are processed and integrated for discriminating T3SEs and non-T3SEs. Because it leverages diverse models and an integrative deep learning framework, DeepT3 2.0 outperforms existing methods in validation datasets. In addition, the features learned from networks are analyzed and visualized to explain how models make their predictions. We propose DeepT3 2.0 as an integrated and accurate tool for the discovery of T3SEs.
Affiliation(s)
- Runyu Jing
  - School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
- Tingke Wen
  - School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
- Chengxiang Liao
  - School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
- Li Xue
  - School of Public Health, Southwest Medical University, Luzhou 646000, China
- Fengjuan Liu
  - School of Geography and Resources, Guizhou Education University, Guiyang 550018, China
- Lezheng Yu
  - School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550018, China
- Jiesi Luo
  - Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
|
14
|
Wang Y, Zhou M, Zou Q, Xu L. Machine learning for phytopathology: from the molecular scale towards the network scale. Brief Bioinform 2021; 22:6204793. [PMID: 33787847 DOI: 10.1093/bib/bbab037] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/15/2020] [Revised: 01/09/2021] [Accepted: 01/26/2021] [Indexed: 01/16/2023] Open
Abstract
With the increasing volume of high-throughput sequencing data from a variety of omics techniques in the field of plant-pathogen interactions, sorting, retrieving, processing and visualizing biological information have become a great challenge. Within the explosion of data, machine learning offers powerful tools to process these complex omics data by various algorithms, such as Bayesian reasoning, support vector machine and random forest. Here, we introduce the basic frameworks of machine learning in dissecting plant-pathogen interactions and discuss the applications and advances of machine learning in plant-pathogen interactions from molecular to network biology, including the prediction of pathogen effectors, plant disease resistance protein monitoring and the discovery of protein-protein networks. The aim of this review is to provide a summary of advances in plant defense and pathogen infection and to indicate the important developments of machine learning in phytopathology.
Affiliation(s)
- Yansu Wang
  - Postdoctoral Innovation Practice Base, Shenzhen Polytechnic, China
- Quan Zou
  - University of Electronic Science and Technology of China
- Lei Xu
  - Shenzhen Polytechnic, China
|
15
|
Computational prediction of secreted proteins in gram-negative bacteria. Comput Struct Biotechnol J 2021; 19:1806-1828. [PMID: 33897982 PMCID: PMC8047123 DOI: 10.1016/j.csbj.2021.03.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/19/2020] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 12/29/2022] Open
Abstract
Gram-negative bacteria harness multiple protein secretion systems and secrete a large proportion of the proteome. Proteins can be exported to the periplasmic space, integrated into the membrane, transported into the extracellular milieu, or translocated into the cytoplasm of contacting cells. Accurate, genome-wide annotation of the secreted proteins and their secretion pathways is important. In this review, we systematically classified the secreted proteins according to the types of secretion systems in Gram-negative bacteria, summarized the known features of these proteins, and reviewed the algorithms and tools for their prediction.
|
16
|
Wang J, Li J, Hou Y, Dai W, Xie R, Marquez-Lago TT, Leier A, Zhou T, Torres V, Hay I, Stubenrauch C, Zhang Y, Song J, Lithgow T. BastionHub: a universal platform for integrating and analyzing substrates secreted by Gram-negative bacteria. Nucleic Acids Res 2021; 49:D651-D659. [PMID: 33084862 PMCID: PMC7778982 DOI: 10.1093/nar/gkaa899] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 08/30/2020] [Revised: 09/22/2020] [Accepted: 10/01/2020] [Indexed: 01/08/2023] Open
Abstract
Gram-negative bacteria utilize secretion systems to export substrates into their surrounding environment or directly into neighboring cells. These substrates are proteins that function to promote bacterial survival by facilitating nutrient collection, disabling competitor species or, for pathogens, disabling host defenses. Following a rapid development of computational techniques, a growing number of substrates have been discovered and subsequently validated by wet lab experiments. To date, several online databases have been developed to catalogue these substrates, but they have limited user options for in-depth analysis and typically focus on a single type of secreted substrate. We therefore developed a universal platform, BastionHub, that incorporates extensive functional modules to facilitate substrate analysis and integrates the five major Gram-negative secreted substrate types (i.e. from types I-IV and VI secretion systems). To our knowledge, BastionHub is not only the most comprehensive online database available but also the first to incorporate substrates secreted by type I or type II secretion systems. By providing the most up-to-date details of secreted substrates and state-of-the-art prediction and visualized relationship analysis tools, BastionHub will be an important platform that can assist biologists in uncovering novel substrates and formulating new hypotheses. BastionHub is freely available at http://bastionhub.erc.monash.edu/.
Affiliation(s)
- Jiawei Wang
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- Jiahui Li
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia; Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China; School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Yi Hou
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Wei Dai
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia; School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Ruopeng Xie
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Tatiana T Marquez-Lago
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA; Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA; Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tieli Zhou
  - Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- Von Torres
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- Iain Hay
  - School of Biological Sciences, The University of Auckland, Auckland 1010, New Zealand
- Christopher Stubenrauch
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- Yanju Zhang
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Jiangning Song
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia; Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia; ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, VIC 3800, Australia
- Trevor Lithgow
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
|
17
|
iT3SE-PX: Identification of Bacterial Type III Secreted Effectors Using PSSM Profiles and XGBoost Feature Selection. Comput Math Methods Med 2021; 2021:6690299. [PMID: 33505516 PMCID: PMC7806399 DOI: 10.1155/2021/6690299] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/20/2020] [Revised: 12/24/2020] [Accepted: 12/26/2020] [Indexed: 11/18/2022]
Abstract
Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in the field of bioinformatics due to its crucial role in understanding host-pathogen interaction and developing better therapeutic targets against the pathogens. However, the recognition of all effector proteins by using traditional experimental approaches is often time-consuming and laborious. Therefore, development of computational methods to accurately predict putative novel effectors is important in reducing the number of biological experiments for validation. In this study, we proposed a method, called iT3SE-PX, to identify T3SEs solely based on protein sequences. First, three kinds of features were extracted from the position-specific scoring matrix (PSSM) profiles to help train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm was used to rank these features based on their classification ability. Finally, the optimal features were selected as inputs to a support vector machine (SVM) classifier to predict T3SEs. Based on the two benchmark datasets, we conducted a 100-time randomized 5-fold cross validation (CV) and an independent test, respectively. The experimental results demonstrated that the proposed method achieved superior performance compared to most of the existing methods and could serve as a useful tool for identifying putative T3SEs, given only the sequence information.
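The "100-time randomized 5-fold cross validation" mentioned in this abstract can be illustrated with a minimal sketch. This is not the iT3SE-PX code: the function name `repeated_kfold_indices`, the seed, and the toy sample count are assumptions made for illustration, and the feature extraction, XGBoost ranking and SVM steps are left out.

```python
import random

def repeated_kfold_indices(n_samples, n_folds=5, n_repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated randomized k-fold CV.

    Hypothetical helper: every repeat reshuffles the samples, then each fold
    takes every n_folds-th shuffled sample as its test set.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh random partition for every repeat
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            yield repeat, fold, train_idx, test_idx

# Toy usage with 23 samples (an arbitrary number chosen for the example).
splits = list(repeated_kfold_indices(n_samples=23, n_folds=5, n_repeats=100))
```

Each repeat yields five disjoint test folds that together cover every sample once, so a metric averaged over all 500 splits is less sensitive to any single random partition.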
|
18
|
Jing R, Li Y, Xue L, Liu F, Li M, Luo J. autoBioSeqpy: A Deep Learning Tool for the Classification of Biological Sequences. J Chem Inf Model 2020; 60:3755-3764. [PMID: 32786512 DOI: 10.1021/acs.jcim.0c00409] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/30/2022]
Abstract
Deep learning has proven to be a powerful method with applications in various fields including image, language, and biomedical data. Thanks to libraries and toolkits such as TensorFlow, PyTorch, and Keras, researchers can use different deep learning architectures and data sets for rapid modeling. However, the available implementations of neural networks using these toolkits are usually designed for a specific research project and are difficult to transfer to other work. Here, we present autoBioSeqpy, a tool that uses deep learning for biological sequence classification. The advantage of this tool is its simplicity. Users only need to prepare the input data set and then use a command line interface. Then, autoBioSeqpy automatically executes a series of customizable steps including text reading, parameter initialization, sequence encoding, model loading, training, and evaluation. In addition, the tool provides various ready-to-apply and adaptable model templates to improve the usability of these networks. We introduce the application of autoBioSeqpy on three biological sequence problems: the prediction of type III secreted proteins, protein subcellular localization, and CRISPR/Cas9 sgRNA activity. autoBioSeqpy is freely available with examples at https://github.com/jingry/autoBioSeqpy.
Affiliation(s)
- Runyu Jing
  - College of Cybersecurity, Sichuan University, Chengdu 610065, China
- Yizhou Li
  - College of Cybersecurity, Sichuan University, Chengdu 610065, China
- Li Xue
  - School of Public Health, Southwest Medical University, Luzhou, Sichuan 646000, China
- Fengjuan Liu
  - School of Geography and Resources, Guizhou Education University, Guiyang 550018, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu 610065, China
- Jiesi Luo
  - Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, Sichuan 646000, China
|
19
|
Abstract
Many Gram-negative bacteria infect hosts and cause diseases by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. However, despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains challenging. Traditional prediction models have focused on atypical sequence features buried in the N-terminal peptides of T3SEs, but unfortunately, these models have had high false-positive rates. In this research, we integrated promoter information along with characteristic protein features for signal regions, chaperone-binding domains, and effector domains for T3SE prediction. Machine learning algorithms, including deep learning, were adopted to predict the atypical features mainly buried in signal sequences of T3SEs, followed by development of a voting-based ensemble model integrating the individual prediction results. We assembled this into a unified T3SE prediction pipeline, T3SEpp, which integrated the results of individual modules, resulting in high accuracy (i.e., ∼0.94) and >1-fold reduction in the false-positive rate compared to that of state-of-the-art software tools. The T3SEpp pipeline and sequence features observed here will facilitate the accurate identification of new T3SEs, with numerous benefits for future studies on host-pathogen interactions.
IMPORTANCE: Type III secreted effector (T3SE) prediction remains a big computational challenge. In practical applications, current software tools often suffer from high false-positive rates. One of the causal factors could be the relatively unitary type of biological features used for the design and training of the models. In this research, we made a comprehensive survey of the sequence-based features of T3SEs, including signal sequences, chaperone-binding domains, effector domains, and transcription factor binding promoter sites, and assembled a unified prediction pipeline integrating multi-aspect biological features within homology-based and multiple machine learning models. To our knowledge, we have compiled the most comprehensive biological sequence feature analysis for T3SEs in this research. The T3SEpp pipeline integrating the variety of features and assembling different models showed high accuracy, which should facilitate more accurate identification of T3SEs in new and existing bacterial whole-genome sequences.
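The idea of a voting-based ensemble that integrates individual prediction results can be sketched as follows. The abstract does not say how T3SEpp weights or thresholds its modules, so the unweighted majority rule, the function name `majority_vote`, and the module names below are assumptions for illustration only.

```python
def majority_vote(module_calls):
    """Return True when more than half of the modules call the protein a T3SE.

    `module_calls` maps a module name to its binary call. An unweighted
    majority is assumed here; ties count as a negative call.
    """
    votes = list(module_calls.values())
    return sum(votes) > len(votes) / 2

# Hypothetical per-module calls for one candidate protein.
calls = {
    "signal_sequence": True,
    "chaperone_binding": False,
    "effector_domain": True,
    "promoter": False,
    "homology": True,
}
is_t3se = majority_vote(calls)
```

Requiring agreement among modules built on different feature types is one plausible way for an ensemble to lower false positives relative to a single N-terminal signal model, which is the motivation the abstract gives.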
|
20
|
Laflamme B, Dillon MM, Martel A, Almeida RND, Desveaux D, Guttman DS. The pan-genome effector-triggered immunity landscape of a host-pathogen interaction. Science 2020; 367:763-768. [PMID: 32054757 DOI: 10.1126/science.aax4079] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 03/20/2019] [Revised: 10/18/2019] [Accepted: 01/17/2020] [Indexed: 12/24/2022]
Abstract
Effector-triggered immunity (ETI), induced by host immune receptors in response to microbial effectors, protects plants against virulent pathogens. However, a systematic study of ETI prevalence against species-wide pathogen diversity is lacking. We constructed the Pseudomonas syringae Type III Effector Compendium (PsyTEC) to reduce the pan-genome complexity of 5127 unique effector proteins, distributed among 70 families from 494 strains, to 529 representative alleles. We screened PsyTEC on the model plant Arabidopsis thaliana and identified 59 ETI-eliciting alleles (11.2%) from 19 families (27.1%), with orthologs distributed among 96.8% of P. syringae strains. We also identified two previously undescribed host immune receptors, including CAR1, which recognizes the conserved effectors AvrE and HopAA1, and found that 94.7% of strains harbor alleles predicted to be recognized by either CAR1 or ZAR1.
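The percentages quoted in this abstract follow directly from the stated counts, as the short check below shows. The helper name `percent` is an assumption for illustration, and the fold reduction is a derived figure that the abstract does not state.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

eti_alleles = percent(59, 529)    # ETI-eliciting alleles among representative alleles
eti_families = percent(19, 70)    # effector families with at least one ETI-eliciting allele
fold_reduction = round(5127 / 529, 1)  # derived here: unique effectors per representative allele
```

With these inputs the helper reproduces the reported 11.2% and 27.1%, and the compendium compresses the pan-genome roughly 9.7-fold.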
Affiliation(s)
- Bradley Laflamme
  - Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
- Marcus M Dillon
  - Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
- Alexandre Martel
  - Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
- Renan N D Almeida
  - Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
- Darrell Desveaux
  - Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 3B2, Canada
- David S Guttman
  - Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 3B2, Canada; Center for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON M5S 3B2, Canada
|
21
|
ACNNT3: Attention-CNN Framework for Prediction of Sequence-Based Bacterial Type III Secreted Effectors. Comput Math Methods Med 2020; 2020:3974598. [PMID: 32328150 PMCID: PMC7157791 DOI: 10.1155/2020/3974598] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/06/2020] [Revised: 03/09/2020] [Accepted: 03/17/2020] [Indexed: 12/18/2022]
Abstract
The type III secretion system (T3SS) is a special protein delivery system in Gram-negative bacteria which delivers T3SS-secreted effectors (T3SEs) to host cells causing pathological changes. Numerous experiments have verified that T3SEs play important roles in many biological activities and in host-pathogen interactions. Accurate identification of T3SEs is therefore essential to help understand the pathogenic mechanism of bacteria; however, many existing biological experimental methods are time-consuming and expensive. New deep-learning methods have recently been successfully applied to T3SE recognition, but improving the recognition accuracy of T3SEs is still a challenge. In this study, we developed a new deep-learning framework, ACNNT3, based on the attention mechanism. We converted 100 residues of the N-terminal of the protein sequence into a fusion feature vector of protein primary structure information (one-hot encoding) and position-specific scoring matrix (PSSM) which are used as the feature input of the network model. We then embedded the attention layer into CNN to learn the characteristic preferences of type III effector proteins, which can accurately classify any protein directly as either T3SEs or non-T3SEs. We found that the introduction of new protein features can improve the recognition accuracy of the model. Our method combines the advantages of CNN and the attention mechanism and is superior in many indicators when compared to other popular methods. Using the common independent dataset, our method is more accurate than the previous method, showing an improvement of 4.1-20.0%.
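The one-hot part of the input described in this abstract (100 N-terminal residues converted to a primary-structure encoding) can be sketched as below. This is not the ACNNT3 code: the function name, the residue ordering, and the treatment of short sequences and non-standard residues as all-zero rows are assumptions, and the PSSM half of the fusion feature is omitted because it requires an external alignment search.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering is arbitrary)
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_n_terminus(sequence, length=100):
    """Encode the first `length` residues as a length x 20 one-hot matrix.

    Assumed conventions: positions past the end of a short sequence and
    non-standard residues (e.g. X) are left as all-zero rows.
    """
    matrix = []
    for pos in range(length):
        row = [0] * len(AMINO_ACIDS)
        if pos < len(sequence):
            idx = AA_INDEX.get(sequence[pos].upper())
            if idx is not None:
                row[idx] = 1
        matrix.append(row)
    return matrix

# Hypothetical toy sequence, far shorter than a real protein.
encoded = one_hot_n_terminus("MKTX")
```

In a fusion feature of the kind the abstract describes, each 20-value row would be concatenated with the PSSM row for the same position before being passed to the network.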
|
22
|
Li J, Wei L, Guo F, Zou Q. EP3: an ensemble predictor that accurately identifies type III secreted effectors. Brief Bioinform 2020; 22:1918-1928. [PMID: 32043137 DOI: 10.1093/bib/bbaa008] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/27/2019] [Revised: 12/25/2019] [Accepted: 01/10/2020] [Indexed: 01/09/2023] Open
Abstract
Type III secretion systems (T3SS) can be found in many pathogenic bacteria, such as Dysentery bacillus, Salmonella typhimurium, Vibrio cholerae and pathogenic Escherichia coli. The routes of infection of these bacteria include the T3SS transferring a large number of type III secreted effectors (T3SE) into host cells, thereby blocking or adjusting the communication channels of the host cells. Therefore, the accurate identification of T3SEs is the precondition for the further study of pathogenic bacteria. In this article, a new ensemble predictor of T3SEs was developed, which can accurately distinguish T3SEs from any unknown protein. In the course of the experiment, methods and models are strictly trained and tested. Compared with other methods, EP3 demonstrates better performance, including the absence of overfitting, strong robustness and powerful predictive ability. EP3 (an ensemble predictor that accurately identifies T3SEs) is designed to simplify users' (especially nonprofessional users') access to T3SEs for further investigation, which will have a significant impact on understanding the progression of pathogenic bacterial infections. Based on the integrated model that we proposed, a web server has been established to distinguish T3SEs from non-T3SEs; it offers two models, EP3_1 and EP3_2. Users can choose the model according to the species of the samples to be tested. Our related tools and data can be accessed through the link http://lab.malab.cn/~lijing/EP3.html.
|
23
|
Fu X, Yang Y. WEDeepT3: predicting type III secreted effectors based on word embedding and deep learning. Quant Biol 2019. [DOI: 10.1007/s40484-019-0184-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
|
24
|
Sabbagh CRR, Carrere S, Lonjon F, Vailleau F, Macho AP, Genin S, Peeters N. Pangenomic type III effector database of the plant pathogenic Ralstonia spp. PeerJ 2019; 7:e7346. [PMID: 31579561 PMCID: PMC6762002 DOI: 10.7717/peerj.7346] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 05/13/2019] [Accepted: 06/25/2019] [Indexed: 12/21/2022] Open
Abstract
Background: The bacterial plant pathogenic Ralstonia species belong to the beta-proteobacteria class and are soil-borne pathogens causing vascular bacterial wilt disease, affecting a wide range of plant hosts. These bacteria form a heterogeneous group considered as a “species complex” gathering three newly defined species. Like many other Gram-negative plant pathogens, Ralstonia pathogenicity relies on a type III secretion system, enabling bacteria to secrete/inject a large repertoire of type III effectors into their plant host cells. Type III-secreted effectors (T3Es) are thought to participate in generating a favorable environment for the pathogen (countering plant immunity and modifying the host metabolism and physiology).
Methods: Expert genome annotation, followed by specific type III-dependent secretion, allowed us to improve our Hidden-Markov-Model and Blast profiles for the prediction of type III effectors.
Results: We curated the T3E repertoires of 12 plant pathogenic Ralstonia strains spread over the different groups of the species complex. This generated a pangenome repertoire of 102 T3E genes and 16 hypothetical T3E genes. Using this database, we scanned for the presence of T3Es in the 155 available genomes representing 140 distinct plant pathogenic Ralstonia strains isolated from different host plants in different areas of the globe. All this information is presented in a searchable database. A presence/absence analysis, modulated by a strain sequence/gene annotation quality score, enabled us to redefine core and accessory T3E repertoires.
Affiliation(s)
- Fabien Lonjon
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Alberto P Macho
  - Shanghai Center for Plant Stress Biology, CAS Center for Excellence in Molecular Plant Sciences, Shanghai Institutes of Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Stephane Genin
  - LIPM, Université de Toulouse, INRA, CNRS, Castanet-tolosan, France
- Nemo Peeters
  - LIPM, Université de Toulouse, INRA, CNRS, Castanet-tolosan, France
|
25
|
Zeng C, Zou L. An account of in silico identification tools of secreted effector proteins in bacteria and future challenges. Brief Bioinform 2019; 20:110-129. [PMID: 28981574 DOI: 10.1093/bib/bbx078] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/10/2017] [Indexed: 01/08/2023] Open
Abstract
Bacterial pathogens secrete numerous effector proteins via six secretion systems, type I to type VI secretion systems, to adapt to new environments or to promote virulence by bacterium-host interactions. Many computational approaches have been used in the identification of effector proteins before the subsequent experimental verification because they tolerate laborious biological procedures and are genome scale, automated and highly efficient. Prevalent examples include machine learning methods and statistical techniques. In this article, we summarize the computational progress toward predicting secreted effector proteins in bacteria, opening with an introduction to the features that are used to discriminate effectors from non-effectors. The mechanisms, contributions and deficiencies of previously developed detection tools are presented, and the tools are further benchmarked on a curated testing data set. According to the results of benchmarking, potential improvements of the prediction performance are discussed, which include (1) more informative features for discriminating the effectors from non-effectors; (2) the construction of a comprehensive training data set for the machine learning algorithms; (3) the advancement of reliable prediction methods and (4) a better interpretation of the mechanisms behind the molecular processes. The future of in silico identification of bacterial secreted effectors includes both opportunities and challenges.
Collapse
Affiliation(s)
- Cong Zeng
  - Bioinformatics Center, Third Military Medical University (TMMU), China
26
Rangel LT, Marden J, Colston S, Setubal JC, Graf J, Gogarten JP. Identification and characterization of putative Aeromonas spp. T3SS effectors. PLoS One 2019; 14:e0214035. [PMID: 31163020 PMCID: PMC6548356 DOI: 10.1371/journal.pone.0214035] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 03/02/2019] [Accepted: 05/21/2019] [Indexed: 11/23/2022]
Abstract
The genetic determinants of bacterial pathogenicity are highly variable between species and strains. However, a factor that is commonly associated with virulent Gram-negative bacteria, including many Aeromonas spp., is the type 3 secretion system (T3SS), which is used to inject effector proteins into target eukaryotic cells. In this study, we developed a bioinformatics pipeline to identify T3SS effector proteins, applied this approach to the genomes of 105 Aeromonas strains isolated from environmental, mutualistic, or pathogenic contexts and evaluated the cytotoxicity of the identified effectors through their heterologous expression in yeast. The developed pipeline uses a two-step approach, where candidate Aeromonas gene families are initially selected using Hidden Markov Model (HMM) profile searches against the Virulence Factors DataBase (VFDB), followed by strict comparisons against positive and negative control datasets, greatly reducing the number of false positives. This approach identified 21 Aeromonas T3SS likely effector families, of which 8 represent known or characterized effectors, while the remaining 13 have not previously been described in Aeromonas. We experimentally validated our in silico findings by assessing the cytotoxicity of representative effectors in Saccharomyces cerevisiae BY4741, with 15 out of 21 assayed proteins eliciting a cytotoxic effect in yeast. The results of this study demonstrate the utility of our approach, combining a novel in silico search method with in vivo experimental validation, and will be useful in future research aimed at identifying and authenticating bacterial effector proteins from other genera.
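The two-step selection described in this abstract (a profile search first, then a comparison against positive and negative control sets) can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Candidate` fields, the E-value cutoff, and the "positive score must exceed negative score" rule are hypothetical and are not taken from the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    family: str
    hmm_evalue: float      # best E-value from a profile-HMM search (hypothetical field)
    positive_score: float  # best similarity to a positive-control set (hypothetical field)
    negative_score: float  # best similarity to a negative-control set (hypothetical field)


def two_step_filter(candidates, evalue_cutoff=1e-5, min_margin=0.0):
    """Keep families that pass both steps; cutoffs are illustrative only."""
    # Step 1: retain families with a sufficiently strong profile-HMM hit.
    step1 = [c for c in candidates if c.hmm_evalue <= evalue_cutoff]
    # Step 2: retain families that resemble the positive controls more than
    # the negative controls, which is what removes false positives here.
    return [c for c in step1 if c.positive_score - c.negative_score > min_margin]


candidates = [
    Candidate("famA", 1e-30, 250.0, 40.0),  # strong hit, closer to positives
    Candidate("famB", 1e-8, 60.0, 90.0),    # hit, but closer to negatives
    Candidate("famC", 0.5, 300.0, 10.0),    # no significant HMM hit
]
kept = [c.family for c in two_step_filter(candidates)]
print(kept)
```

In this toy input only `famA` survives both steps; `famB` is dropped by the control comparison and `famC` by the profile search.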
Affiliation(s)
- Luiz Thiberio Rangel
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
  - Interunidades em Bioinformática, Universidade de São Paulo, São Paulo, Brasil
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brasil
- Jeremiah Marden
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- Sophie Colston
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
- João Carlos Setubal
  - Interunidades em Bioinformática, Universidade de São Paulo, São Paulo, Brasil
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brasil
- Joerg Graf
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
  - Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, United States of America
- Johann Peter Gogarten
  - Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut, United States of America
  - Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, United States of America
27
Dillon MM, Almeida RN, Laflamme B, Martel A, Weir BS, Desveaux D, Guttman DS. Molecular Evolution of Pseudomonas syringae Type III Secreted Effector Proteins. Front Plant Sci 2019; 10:418. [PMID: 31024592 PMCID: PMC6460904 DOI: 10.3389/fpls.2019.00418] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 12/21/2018] [Accepted: 03/19/2019] [Indexed: 05/02/2023]
Abstract
Diverse Gram-negative pathogens like Pseudomonas syringae employ type III secreted effector (T3SE) proteins as primary virulence factors that combat host immunity and promote disease. T3SEs can also be recognized by plant hosts and activate an effector triggered immune (ETI) response that shifts the interaction back toward plant immunity. Consequently, T3SEs are pivotal in determining the virulence potential of individual P. syringae strains, and ultimately help to restrict P. syringae pathogens to a subset of potential hosts that are unable to recognize their repertoires of T3SEs. While a number of effector families are known to be present in the P. syringae species complex, one of the most persistent challenges has been documenting the complex variation in T3SE contents across a diverse collection of strains. Using the entire pan-genome of 494 P. syringae strains isolated from more than 100 hosts, we conducted a global analysis of all known and putative T3SEs. We identified a total of 14,613 putative T3SEs, 4,636 of which were unique at the amino acid level, and show that T3SE repertoires of different P. syringae strains vary dramatically, even among strains isolated from the same hosts. We also find substantial diversification within many T3SE families, and in many cases find strong signatures of positive selection. Furthermore, we identify multiple gene gain and loss events for several families, demonstrating an important role of horizontal gene transfer (HGT) in the evolution of P. syringae T3SEs. These analyses provide insight into the evolutionary history of P. syringae T3SEs as they co-evolve with the host immune system, and dramatically expand the database of P. syringae T3SEs alleles.
Affiliation(s)
- Marcus M. Dillon
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Renan N.D. Almeida
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Bradley Laflamme
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Alexandre Martel
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
- Darrell Desveaux
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
  - Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON, Canada
- David S. Guttman
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
  - Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, ON, Canada
28
Li J, Tai C, Deng Z, Zhong W, He Y, Ou HY. VRprofile: gene-cluster-detection-based profiling of virulence and antibiotic resistance traits encoded within genome sequences of pathogenic bacteria. Brief Bioinform 2019; 19:566-574. [PMID: 28077405 DOI: 10.1093/bib/bbw141] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 09/29/2016] [Indexed: 11/13/2022]
Abstract
VRprofile is a Web server that facilitates rapid investigation of virulence and antibiotic resistance genes, as well as the genetic contexts related to the transfer of these traits, in newly sequenced pathogenic bacterial genomes. The backend database, MobilomeDB, was first built on sets of known gene cluster loci of bacterial type III/IV/VI/VII secretion systems and mobile genetic elements, including integrative and conjugative elements, prophages, class I integrons, IS elements and pathogenicity/antibiotic resistance islands. VRprofile is thus able to co-localize the homologs of these conserved gene clusters using HMMer or BLASTp searches. With the integration of the homologous gene cluster search module with a sequence composition module, VRprofile has exhibited better performance for island-like region predictions than the other widely used methods. In addition, VRprofile also provides an integrated Web interface for aligning and visualizing identified gene clusters with MobilomeDB-archived gene clusters, or a varied set of bacterial genomes. VRprofile might help meet the increasing demand for re-annotation of bacterial variable regions, and aid in the real-time definition of disease-relevant gene clusters in pathogenic bacteria of interest. VRprofile is freely available at http://bioinfo-mml.sjtu.edu.cn/VRprofile.
Affiliation(s)
- Jun Li
  - College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, P.R. China
- Cui Tai
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Zixin Deng
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Weihong Zhong
  - College of Biotechnology and Bioengineering, Zhejiang University of Technology, Hangzhou, P.R. China
- Yongqun He
  - Department of microbiology and immunology research, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Hong-Yu Ou
  - School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
29
Borah SM, Jha AN. Identification and analysis of structurally critical fragments in HopS2. BMC Bioinformatics 2019; 19:552. [PMID: 30717655 PMCID: PMC7394326 DOI: 10.1186/s12859-018-2551-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/11/2018] [Accepted: 11/30/2018] [Indexed: 12/02/2022]
Abstract
Background: Among the diverse roles of the Type III secretion system (T3SS), one of the notable functions is that it serves as unique nano machineries in Gram-negative bacteria that facilitate the translocation of effector proteins from bacteria into their host. These effector proteins serve as potential targets to control the pathogenicity conferred to the bacteria. Despite being ideal choices to disrupt bacterial systems, it has been quite an ordeal in recent times to experimentally reveal and establish a concrete sequence-structure-function relationship for these effector proteins. This work focuses on the disease-causing spectrum of an effector protein, HopS2, secreted by the phytopathogen Pseudomonas syringae pv. tomato DC3000.
Results: The study addresses the structural attributes of HopS2 via a bioinformatics approach to by-pass some of the experimental shortcomings, resulting in mining some critical regions in the effector protein. We have elucidated the functionally important regions of HopS2 with the assistance of sequence and structural analyses. The sequence-based data supports the presence of important regions in HopS2 that are present in the other functional parts of Hop family proteins. Furthermore, these regions have been validated by an ab initio structure prediction of the protein followed by a 100 ns long molecular dynamics (MD) simulation. The assessment of these secondary structural regions has revealed the stability and importance of these regions in the protein structure.
Conclusions: The analysis has provided insights on important functional regions that may be vital to the effector functioning. In dearth of ample experimental evidence, such a bioinformatics approach has helped in the revelation of a few structural regions which will aid in future experiments to attain and evaluate the structural and functional aspects of this protein family.
Affiliation(s)
- Sapna M Borah
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam, 784028, India
- Anupam Nath Jha
  - Department of Molecular Biology and Biotechnology, Tezpur University, Tezpur, Assam, 784028, India
30
Recombination of ecologically and evolutionarily significant loci maintains genetic cohesion in the Pseudomonas syringae species complex. Genome Biol 2019; 20:3. [PMID: 30606234 PMCID: PMC6317194 DOI: 10.1186/s13059-018-1606-y] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Received: 01/09/2018] [Accepted: 12/06/2018] [Indexed: 01/25/2023]
Abstract
Background: Pseudomonas syringae is a highly diverse bacterial species complex capable of causing a wide range of serious diseases on numerous agronomically important crops. We examine the evolutionary relationships of 391 agricultural and environmental strains using whole-genome sequencing and evolutionary genomic analyses.
Results: We describe the phylogenetic distribution of all 77,728 orthologous gene families in the pan-genome, reconstruct the core genome phylogeny using the 2410 core genes, hierarchically cluster the accessory genome, identify the diversity and distribution of type III secretion systems and their effectors, predict ecologically and evolutionarily relevant loci, and establish the molecular evolutionary processes operating on gene families. Phylogenetic and recombination analyses reveal that the species complex is subdivided into primary and secondary phylogroups, with the former primarily comprised of agricultural isolates, including all of the well-studied P. syringae strains. In contrast, the secondary phylogroups include numerous environmental isolates. These phylogroups also have levels of genetic diversity typically found among distinct species. An analysis of rates of recombination within and between phylogroups revealed a higher rate of recombination within primary phylogroups than between primary and secondary phylogroups. We also find that “ecologically significant” virulence-associated loci and “evolutionarily significant” loci under positive selection are over-represented among loci that undergo inter-phylogroup genetic exchange.
Conclusions: While inter-phylogroup recombination occurs relatively rarely, it is an important force maintaining the genetic cohesion of the species complex, particularly among primary phylogroup strains. This level of genetic cohesion, and the shared plant-associated niche, argues for considering the primary phylogroups as a single biological species.
31
Xue L, Tang B, Chen W, Luo J. DeepT3: deep convolutional neural networks accurately identify Gram-negative bacterial type III secreted effectors using the N-terminal sequence. Bioinformatics 2018; 35:2051-2057. [DOI: 10.1093/bioinformatics/bty931] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 06/26/2018] [Revised: 10/22/2018] [Accepted: 11/07/2018] [Indexed: 11/12/2022]
Affiliation(s)
- Li Xue
  - School of Public Health, Southwest Medical University, Luzhou, Sichuan, PR China
- Bin Tang
  - Basic Medical College of Southwest Medical University, Luzhou, Sichuan, PR China
- Wei Chen
  - Integrative Genomics Core, City of Hope National Medical Center, Duarte, CA, USA
- Jiesi Luo
  - Key Laboratory for Aging and Regenerative Medicine, Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, Sichuan, China
32
Wang J, Li J, Yang B, Xie R, Marquez-Lago TT, Leier A, Hayashida M, Akutsu T, Zhang Y, Chou KC, Selkrig J, Zhou T, Song J, Lithgow T. Bastion3: a two-layer ensemble predictor of type III secreted effectors. Bioinformatics 2018; 35:2017-2028. [PMID: 30388198 PMCID: PMC7963071 DOI: 10.1093/bioinformatics/bty914] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 08/31/2018] [Revised: 10/15/2018] [Accepted: 10/31/2018] [Indexed: 01/31/2023]
Abstract
MOTIVATION: Type III secreted effectors (T3SEs) can be injected into host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Due to their relevance in pathogen-host interactions, significant computational efforts have been put toward identification of T3SEs and these in turn have stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, these existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (or incorporating also the C-terminus) instead of the proteins' complete sequences, and (ii) the underlying models (trained with classic algorithms) employed only few features, most of which were extracted based on sequence-information alone. To achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to effectively integrate these into a comprehensive model.
RESULTS: In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features, from various types, trains single models based on these features and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM, and further boosted the models' performances through a novel genetic algorithm (GA) based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 achieves a much better performance compared to commonly used methods, with an ACC value of 0.959, F-value of 0.958, MCC value of 0.917 and AUC value of 0.956, which comprehensively outperformed all other toolkits by more than 5.6% in ACC value, 5.7% in F-value, 12.4% in MCC value and 5.8% in AUC value. Based on our proposed two-layer ensemble model, we further developed a user-friendly online toolkit, maximizing convenience for experimental scientists toward T3SE prediction. With its design to ease future discoveries of novel T3SEs and improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction.
AVAILABILITY AND IMPLEMENTATION: http://bastion3.erc.monash.edu/.
CONTACT: selkrig@embl.de or wyztli@163.com or trevor.lithgow@monash.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
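The benchmark figures quoted in this abstract (ACC, F-value, MCC) are standard confusion-matrix statistics. The sketch below shows how they are usually computed, assuming the F-value is the F1 score. The function name `binary_metrics` and the counts are illustrative only and are not taken from the Bastion3 benchmark; AUC is omitted because it needs ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F1 and Matthews correlation coefficient from confusion counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC stays informative when effectors and non-effectors are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F": f1, "MCC": mcc}


# Made-up counts: 100 effectors and 100 non-effectors in a test set.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
```

MCC uses all four cells of the confusion matrix, which is why it usually differs most between tools, as in the improvements reported above.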
Affiliation(s)
- Jiawei Wang
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
- Jiahui Li
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
  - Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Bingjiao Yang
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Ruopeng Xie
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Tatiana T Marquez-Lago
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Morihiro Hayashida
  - National Institute of Technology, Matsue College, Matsue, Shimane, Japan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Yanju Zhang
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA, USA
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- Joel Selkrig
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Tieli Zhou
  - Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Trevor Lithgow
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
33
An Y, Wang J, Li C, Leier A, Marquez-Lago T, Wilksch J, Zhang Y, Webb GI, Song J, Lithgow T. Comprehensive assessment and performance improvement of effector protein predictors for bacterial secretion systems III, IV and VI. Brief Bioinform 2018; 19:148-161. [PMID: 27777222 DOI: 10.1093/bib/bbw100] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 07/30/2016] [Indexed: 11/15/2022]
Abstract
Bacterial effector proteins secreted by various protein secretion systems play crucial roles in host-pathogen interactions. In this context, computational tools capable of accurately predicting effector proteins of the various types of bacterial secretion systems are highly desirable. Existing computational approaches use different machine learning (ML) techniques and heterogeneous features derived from protein sequences and/or structural information. These predictors differ not only in terms of the used ML methods but also with respect to the used curated data sets, the features selection and their prediction performance. Here, we provide a comprehensive survey and benchmarking of currently available tools for the prediction of effector proteins of bacterial types III, IV and VI secretion systems (T3SS, T4SS and T6SS, respectively). We review core algorithms, feature selection techniques, tool availability and applicability and evaluate the prediction performance based on carefully curated independent test data sets. In an effort to improve predictive performance, we constructed three ensemble models based on ML algorithms by integrating the output of all individual predictors reviewed. Our benchmarks demonstrate that these ensemble models outperform all the reviewed tools for the prediction of effector proteins of T3SS and T4SS. The webserver of the proposed ensemble methods for T3SS and T4SS effector protein prediction is freely available at http://tbooster.erc.monash.edu/index.jsp. We anticipate that this survey will serve as a useful guide for interested users and that the new ensemble predictors will stimulate research into host-pathogen relationships and inspiration for the development of new bioinformatics tools for predicting effector proteins of T3SS, T4SS and T6SS.
34
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
  - Technical University of Denmark, Kemitorvet, Building 208, DK-2800, Kgs. Lyngby, Denmark
35
Yang S, Li H, He H, Zhou Y, Zhang Z. Critical assessment and performance improvement of plant–pathogen protein–protein interaction prediction methods. Brief Bioinform 2017; 20:274-287. [DOI: 10.1093/bib/bbx123] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 07/26/2017] [Indexed: 01/15/2023]
Affiliation(s)
- Shiping Yang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University
- Hong Li
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University
- Huaqin He
  - College of Life Sciences, Fujian Agriculture and Forestry University
- Yuan Zhou
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University
- Ziding Zhang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University
36
An Y, Wang J, Li C, Revote J, Zhang Y, Naderer T, Hayashida M, Akutsu T, Webb GI, Lithgow T, Song J. SecretEPDB: a comprehensive web-based resource for secreted effector proteins of the bacterial types III, IV and VI secretion systems. Sci Rep 2017; 7:41031. [PMID: 28112271 PMCID: PMC5253721 DOI: 10.1038/srep41031] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 10/18/2016] [Accepted: 12/14/2016] [Indexed: 12/28/2022]
Abstract
Bacteria translocate effector molecules to host cells through highly evolved secretion systems. By definition, the function of these effector proteins is to manipulate host cell biology and the sequence, structural and functional annotations of these effector proteins will provide a better understanding of how bacterial secretion systems promote bacterial survival and virulence. Here we developed a knowledgebase, termed SecretEPDB (Bacterial Secreted Effector Protein DataBase), for effector proteins of type III secretion system (T3SS), type IV secretion system (T4SS) and type VI secretion system (T6SS). SecretEPDB provides enriched annotations of the aforementioned three classes of effector proteins by manually extracting and integrating structural and functional information from currently available databases and the literature. The database is conservative and strictly curated to ensure that every effector protein entry is supported by experimental evidence that demonstrates it is secreted by a T3SS, T4SS or T6SS. The annotations of effector proteins documented in SecretEPDB are provided in terms of protein characteristics, protein function, protein secondary structure, Pfam domains, metabolic pathway and evolutionary details. It is our hope that this integrated knowledgebase will serve as a useful resource for biological investigation and the generation of new hypotheses for research efforts aimed at bacterial secretion systems.
Affiliation(s)
- Yi An
  - College of Information Engineering, Northwest A&F University, Yangling 712100, China
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Jiawei Wang
  - School of Electronic and Computer Engineering, Peking University, Beijing 100871, China
- Chen Li
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Jerico Revote
  - Monash Bioinformatics Platform, Monash University, Melbourne, VIC 3800, Australia
- Yang Zhang
  - College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Thomas Naderer
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Morihiro Hayashida
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Geoffrey I Webb
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Trevor Lithgow
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Jiangning Song
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
37
Goldberg T, Rost B, Bromberg Y. Computational prediction shines light on type III secretion origins. Sci Rep 2016; 6:34516. [PMID: 27713481 PMCID: PMC5054392 DOI: 10.1038/srep34516] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 06/08/2016] [Accepted: 09/15/2016] [Indexed: 01/27/2023]
Abstract
Type III secretion system is a key bacterial symbiosis and pathogenicity mechanism responsible for a variety of infectious diseases, ranging from food-borne illnesses to the bubonic plague. In many Gram-negative bacteria, the type III secretion system transports effector proteins into host cells, converting resources to bacterial advantage. Here we introduce a computational method that identifies type III effectors by combining homology-based inference with de novo predictions, reaching up to 3-fold higher performance than existing tools. Our work reveals that signals for recognition and transport of effectors are distributed over the entire protein sequence instead of being confined to the N-terminus, as was previously thought. Our scan of hundreds of prokaryotic genomes identified previously unknown effectors, suggesting that type III secretion may have evolved prior to the archaea/bacteria split. Crucially, our method performs well for short sequence fragments, facilitating evaluation of microbial communities and rapid identification of bacterial pathogenicity – no genome assembly required. pEffect and its data sets are available at http://services.bromberglab.org/peffect.
Affiliation(s)
- Tatyana Goldberg
  - Department of Informatics, Bioinformatics & Computational Biology - I12, TUM, Garching, Germany
  - Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), TUM, Garching, Germany
- Burkhard Rost
  - Department of Informatics, Bioinformatics & Computational Biology - I12, TUM, Garching, Germany
  - Institute for Advanced Study (TUM-IAS), Garching, Germany
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Yana Bromberg
  - Institute for Advanced Study (TUM-IAS), Garching, Germany
  - Department of Biochemistry and Microbiology, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, NJ, USA
38
Eichinger V, Nussbaumer T, Platzer A, Jehl MA, Arnold R, Rattei T. EffectiveDB--updates and novel features for a better annotation of bacterial secreted proteins and Type III, IV, VI secretion systems. Nucleic Acids Res 2015; 44:D669-74. [PMID: 26590402 PMCID: PMC4702896 DOI: 10.1093/nar/gkv1269] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Received: 09/16/2015] [Accepted: 11/03/2015] [Indexed: 11/17/2022]
Abstract
Protein secretion systems play a key role in the interaction of bacteria and hosts. EffectiveDB (http://effectivedb.org) contains pre-calculated predictions of bacterial secreted proteins and of intact secretion systems. Here we describe a major update of the database, which was previously featured in the NAR Database Issue. EffectiveDB bundles various tools to recognize Type III secretion signals, conserved binding sites of Type III chaperones, Type IV secretion peptides, eukaryotic-like domains and subcellular targeting signals in the host. Beyond the analysis of arbitrary protein sequence collections, the new release of EffectiveDB also provides a ‘genome-mode’, in which protein sequences from nearly complete genomes or metagenomic bins can be screened for the presence of three important secretion systems (Type III, IV, VI). EffectiveDB contains pre-calculated predictions for currently 1677 bacterial genomes from the EggNOG 4.0 database and for additional bacterial genomes from NCBI RefSeq. The new, user-friendly and informative web portal offers a submission tool for running the EffectiveDB prediction tools on user-provided data.
Affiliation(s)
- Valerie Eichinger
  - Division of Computational System Biology, Department of Microbiology and Ecosystem Science, University of Vienna, 1090 Vienna, Austria
- Thomas Nussbaumer
  - Division of Computational System Biology, Department of Microbiology and Ecosystem Science, University of Vienna, 1090 Vienna, Austria
- Alexander Platzer
  - Division of Computational System Biology, Department of Microbiology and Ecosystem Science, University of Vienna, 1090 Vienna, Austria
- Marc-André Jehl
  - Division of Computational System Biology, Department of Microbiology and Ecosystem Science, University of Vienna, 1090 Vienna, Austria
- Roland Arnold
  - Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1X8, Canada
- Thomas Rattei
  - Division of Computational System Biology, Department of Microbiology and Ecosystem Science, University of Vienna, 1090 Vienna, Austria
|