1. Peng Y, Wu J, Sun Y, Zhang Y, Wang Q, Shao S. Contrastive-learning of language embedding and biological features for cross modality encoding and effector prediction. Nat Commun 2025; 16:1299. [PMID: 39900608 PMCID: PMC11791096 DOI: 10.1038/s41467-025-56526-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Accepted: 01/15/2025] [Indexed: 02/05/2025] [Open access]
Abstract
Identifying and characterizing virulence proteins secreted by Gram-negative bacteria are fundamental for deciphering microbial pathogenicity as well as aiding the development of therapeutic strategies. Effector predictors utilizing pre-trained protein language models (PLMs) have shown sound performance by leveraging extensive evolutionary and sequential protein features. However, improving the accuracy and sensitivity of effector prediction remains challenging. Here, we introduce a model named Contrastive-learning of Language Embedding and Biological Features (CLEF) that uses contrastive learning to integrate PLM representations with supplementary biological features. Biological information is captured in learned contextualized embeddings to yield meaningful representations. With cross-modality biological features, CLEF outperforms state-of-the-art (SOTA) models in predicting type III, type IV, and type VI secreted effectors (T3SEs/T4SEs/T6SEs) in enteric pathogens. All experimentally verified effectors in Enterohemorrhagic Escherichia coli and 41 of 43 experimentally verified T3SEs of Salmonella Typhimurium are recognized. Moreover, 12 predicted T3SEs and 11 predicted T6SEs are validated by extensive experiments in Edwardsiella piscicida. Furthermore, integrating omics data via the CLEF framework enhances protein representations to illustrate effector-effector interactions and determine in vivo colonization-essential genes. Collectively, CLEF provides a blueprint to bridge the gap between the in silico capacity of PLMs and experimental biological information to fulfill complicated tasks.
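The abstract describes contrastive learning that aligns PLM representations with biological features, but does not give CLEF's exact objective. As a hedged illustration only, a symmetric InfoNCE (CLIP-style) loss is one common way to align two modalities: matching pairs (the same protein seen through both modalities) are pulled together and mismatched pairs pushed apart. The sketch assumes both modalities have already been projected into a shared d-dimensional space and that row i of each matrix describes the same protein; the function names and the temperature of 0.07 are illustrative assumptions, not CLEF's code.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(plm_emb: np.ndarray, bio_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss between two (n, d) embedding matrices.

    Assumption: both inputs are already projected into the same d-dimensional
    space and row i of each matrix belongs to the same protein, so the
    positive pairs lie on the diagonal of the similarity matrix.
    """
    z1 = l2_normalize(plm_emb)
    z2 = l2_normalize(bio_emb)
    logits = z1 @ z2.T / temperature            # (n, n) scaled cosine similarities
    idx = np.arange(len(z1))
    # PLM -> biological direction: each row should pick its own partner.
    loss_plm_to_bio = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Biological -> PLM direction: each column should pick its own partner.
    loss_bio_to_plm = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_plm_to_bio + loss_bio_to_plm))
```

Aligned pairs should give a much lower loss than unrelated pairs; minimizing this quantity during training is what pushes the two views of each protein toward each other.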
Affiliation(s)
- Yue Peng
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
- Junze Wu
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
- Yi Sun
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
- Yuanxing Zhang
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), 519000, Zhuhai, China
- Qiyao Wang
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
- Shanghai Engineering Research Center of Maricultured Animal Vaccines, Shanghai, China
- Laboratory of Aquatic Animal Diseases of MOA, Shanghai, China
- Shuai Shao
- State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai, China
- Shanghai Engineering Research Center of Maricultured Animal Vaccines, Shanghai, China
- Laboratory of Aquatic Animal Diseases of MOA, Shanghai, China

2. Hu Y, Wang Y, Hu X, Chao H, Li S, Ni Q, Zhu Y, Hu Y, Zhao Z, Chen M. T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors. Comput Struct Biotechnol J 2024; 23:801-812. [PMID: 38328004 PMCID: PMC10847861 DOI: 10.1016/j.csbj.2024.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 01/20/2024] [Accepted: 01/20/2024] [Indexed: 02/09/2024] [Open access]
Abstract
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp.
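The reported figures (about 0.98 accuracy at a specificity of about 0.99) follow the standard confusion-matrix definitions. The minimal sketch below spells out those definitions on made-up labels; it does not reproduce T4SEpp's validation data or evaluation code, and the helper names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # effectors predicted as effectors
    fp: int  # non-effectors predicted as effectors
    tn: int  # non-effectors predicted as non-effectors
    fn: int  # effectors missed by the predictor

def confusion(y_true: list[int], y_pred: list[int]) -> Confusion:
    """Tally binary outcomes; 1 marks an effector, 0 a non-effector."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )

def accuracy(c: Confusion) -> float:
    """Share of all proteins classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)

def specificity(c: Confusion) -> float:
    """Share of true non-effectors correctly rejected."""
    return c.tn / (c.tn + c.fp)

def sensitivity(c: Confusion) -> float:
    """Share of true effectors recovered (recall)."""
    return c.tp / (c.tp + c.fn)
```

Because non-effectors greatly outnumber effectors in a genome, even a small loss of specificity can produce many false-positive candidates, which is one reason to report performance at a high specificity.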
Affiliation(s)
- Yueming Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yejun Wang
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
- Department of Cell Biology and Genetics, College of Basic Medicine, Shenzhen University Medical School, Shenzhen, China
- Xiaotian Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Haoyu Chao
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Sida Li
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Qinyang Ni
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yanyan Zhu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yixue Hu
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
- Ziyi Zhao
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Institute of Hematology, Zhejiang University School of Medicine, The First Affiliated Hospital, Zhejiang University, Hangzhou 310058, China

3. Ran Z, Wang C, Sun H, Pan S, Li F. Characterizing Secretion System Effector Proteins With Structure-Aware Graph Neural Networks and Pre-Trained Language Models. IEEE J Biomed Health Inform 2024; 28:5649-5657. [PMID: 38865232 DOI: 10.1109/jbhi.2024.3413146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2024]
Abstract
The Type III Secretion Systems (T3SSs) play a pivotal role in host-pathogen interactions by mediating the secretion of type III secretion system effectors (T3SEs) into host cells. These T3SEs mimic host cell protein functions, influencing interactions between Gram-negative bacterial pathogens and their hosts. Identifying T3SEs is essential in biomedical research for comprehending bacterial pathogenesis and its implications for human cells. This study presents EDIFIER, a novel multi-channel model designed for accurate T3SE prediction. It incorporates a graph structural channel, which uses graph convolutional networks (GCN) to capture protein 3D structural features, and a sequence channel based on the ProteinBERT pre-trained model to extract the sequence context features of T3SEs. Rigorous benchmarking tests, including ablation studies and comparative analysis, validate that EDIFIER outperforms current state-of-the-art tools in T3SE prediction. To enhance EDIFIER's accessibility to the broader scientific community, we developed a webserver that is publicly accessible at http://edifier.unimelb-biotools.cloud.edu.au/. We anticipate EDIFIER will contribute to the field by providing reliable T3SE predictions, thereby advancing our understanding of host-pathogen dynamics.
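EDIFIER's structural channel applies graph convolutional networks to protein 3D structure. The abstract does not say how the residue graph is built, so the sketch below assumes a contact map that links residues whose C-alpha atoms lie within 8 Å of each other (a common but, here, hypothetical choice) and applies one layer of the symmetric-normalized GCN propagation rule of Kipf and Welling, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). It illustrates the technique and is not EDIFIER's implementation; the function names and cutoff are assumptions.

```python
import numpy as np

def contact_map(coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Residue graph: connect residues whose C-alpha atoms are closer than `cutoff` angstroms.

    The 8 A cutoff is an illustrative assumption, not taken from EDIFIER.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)           # self-loops are added during normalization
    return adj

def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: average neighbour features, project with W, apply ReLU."""
    return np.maximum(normalized_adjacency(adj) @ h @ w, 0.0)
```

Stacking several such layers lets each residue's representation absorb information from spatially nearby residues, which a purely sequential model cannot see directly.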

4. Tang X, Luo L, Wang S. TSE-ARF: An adaptive prediction method of effectors across secretion system types. Anal Biochem 2024; 686:115407. [PMID: 38030053 DOI: 10.1016/j.ab.2023.115407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 11/12/2023] [Accepted: 11/20/2023] [Indexed: 12/01/2023]
Abstract
Bacterial effector proteins are secreted by a variety of protein secretion systems and play an important role in the interaction between the host and pathogenic bacteria. Therefore, it is important to find a fast and inexpensive method to discover bacterial effectors. In this study, we propose a multi-type secretion effector adaptive random forest (TSE-ARF) to adaptively identify secretion effectors across T1SE-T4SE and T6SE based only on protein sequences. First, we proposed two new feature descriptors based on characteristic protein information and fused them with universal features to form a 290-dimensional feature vector with good versatility. Then, TSE-ARF, which integrates the Shuffled Frog Leaping Algorithm with a random forest, adapted its parameters to each type of secretion effector to make classification predictions. The excellent performance of TSE-ARF under different datasets and settings shows its considerable generalization ability, and the model was used to screen additional candidate effectors across the whole genome. Source code is available at https://github.com/AIMOVE/TSE-ARF.
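The abstract does not list which universal features make up TSE-ARF's 290-dimensional vector. Amino acid composition (AAC) is one widely used descriptor of this kind that needs only the protein sequence. The sketch below computes AAC for the whole protein and for a hypothetical 50-residue N-terminal window and concatenates them into a 40-value vector, to illustrate how sequence-only descriptors are fused into a fixed-length input for a classifier such as a random forest. The window length and function names are assumptions, and this is not TSE-ARF's feature set.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(seq: str) -> list[float]:
    """Fraction of each standard amino acid; non-standard letters (e.g. X) are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]

def feature_vector(seq: str, n_terminal: int = 50) -> list[float]:
    """Concatenate whole-protein AAC with AAC of a hypothetical N-terminal window (40 values)."""
    return aac(seq) + aac(seq[:n_terminal])
```

Every protein maps to a vector of the same length regardless of its sequence length, which is what allows descriptors of this kind to be stacked into a feature matrix.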
Affiliation(s)
- Xianjun Tang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Longfei Luo
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China; Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming, Yunnan, China.

5. Nielsen H. Protein Sorting Prediction. Methods Mol Biol 2024; 2715:27-63. [PMID: 37930519 DOI: 10.1007/978-1-0716-3445-5_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2023]
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global property-based, and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark.

6. Zhao Z, Hu Y, Hu Y, White AP, Wang Y. Features and algorithms: facilitating investigation of secreted effectors in Gram-negative bacteria. Trends Microbiol 2023; 31:1162-1178. [PMID: 37349207 DOI: 10.1016/j.tim.2023.05.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/15/2023] [Revised: 05/22/2023] [Accepted: 05/22/2023] [Indexed: 06/24/2023]
Abstract
Gram-negative bacteria deliver effector proteins into host cells through type III, IV, or VI secretion systems (T3SSs, T4SSs, and T6SSs), causing infections and diseases. In general, the effector proteins of each of these distinct secretion systems lack homology and are difficult to identify. Sequence analysis has disclosed many common features, helping us to understand the evolution, function, and secretion mechanisms of the effectors. Combined with various algorithms, these known common features have enabled accurate prediction of new effectors. Ensemble methods or integrated pipelines, which combine multiple computational models or modules with multidimensional features, achieve better prediction performance. Natural language processing (NLP) models have also shown merit: they could enable the discovery of novel features and, in turn, more precise effector prediction, extending our knowledge of each secretion mechanism.
Affiliation(s)
- Ziyi Zhao
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen 518060, China
- Yixue Hu
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen 518060, China
- Yueming Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Aaron P White
- Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Yejun Wang
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen 518060, China; Department of Cell Biology and Genetics, College of Basic Medicine, Shenzhen University Medical School, Shenzhen 518060, China.

7. Wagner N, Alburquerque M, Ecker N, Dotan E, Zerah B, Pena MM, Potnis N, Pupko T. Natural language processing approach to model the secretion signal of type III effectors. Front Plant Sci 2022; 13:1024405. [PMID: 36388586 PMCID: PMC9659976 DOI: 10.3389/fpls.2022.1024405] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/21/2022] [Accepted: 10/11/2022] [Indexed: 06/16/2023]
Abstract
Type III effectors are proteins injected by Gram-negative bacteria into eukaryotic hosts. In many plant and animal pathogens, these effectors manipulate host cellular processes to the benefit of the bacteria. Type III effectors are secreted by a type III secretion system that must "classify" each bacterial protein into one of two categories: either the protein should be translocated or not. It was previously shown that type III effectors have a secretion signal within their N-terminus; however, despite numerous efforts, the exact biochemical identity of this secretion signal is generally unknown. Computational characterization of the secretion signal is important for the identification of novel effectors and for better understanding the molecular translocation mechanism. In this work we developed novel machine-learning algorithms for characterizing the secretion signal in both plant and animal pathogens. Specifically, we represented each protein as a vector in high-dimensional space using Facebook's protein language model. Classification algorithms were next used to separate effectors from non-effector proteins. We subsequently curated a benchmark dataset of hundreds of effectors and thousands of non-effector proteins. We showed that on this curated dataset, our novel approach yielded substantially better classification accuracy compared to previously developed methodologies. We have also tested the hypothesis that plant and animal pathogen effectors are characterized by different secretion signals. Finally, we integrated the novel approach into Effectidor, a web server for predicting type III effector proteins, leading to a more accurate classification of effectors from non-effectors.
Affiliation(s)
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michael Alburquerque
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Noa Ecker
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Edo Dotan
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ben Zerah
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Michelle Mendonca Pena
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- Neha Potnis
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel

8. Meparambu Prabhakaran D, Patel HR, Sivakumar Krishnankutty Chandrika S, Thomas S. Genomic attributes differ between Vibrio parahaemolyticus environmental and clinical isolates including pathotypes. Environ Microbiol Rep 2022; 14:365-375. [PMID: 34461673 DOI: 10.1111/1758-2229.13000] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/03/2021] [Revised: 08/07/2021] [Accepted: 08/10/2021] [Indexed: 06/13/2023]
Abstract
Vibrio parahaemolyticus is a marine bacterium that causes opportunistic gastroenteritis in humans. Clinical strains of V. parahaemolyticus contain haemolysin and type III secretion systems (T3SS) that define their pathotype. A growing number of strains recently isolated from the environment have acquired these virulence genes, constituting a pool of potential pathogens. This study used comparative genomics to identify genetic factors that delineate environmental and clinical V. parahaemolyticus populations and to understand the similarities and differences between the T3SS2 phylotypes. The comparative analysis revealed a cluster of bacterial cellulose synthesis (bcs) genes in isolates of environmental origin. This cluster, previously unreported in V. parahaemolyticus, exhibits significant similarity to that of Aliivibrio fischeri and might represent a new mechanism of environmental adaptation and persistence. The study also identified many genes, predicted in silico to be T3SS effectors, that are unique to the T3SS2β of the tdh- trh+ and tdh+ trh+ pathotypes and have no identifiable homologue in the tdh+ trh- T3SS2α. Overall, these findings highlight the importance of understanding the genes and strategies V. parahaemolyticus uses for its myriad interactions with its hosts, whether marine invertebrates or humans.
Affiliation(s)
- Divya Meparambu Prabhakaran
- Cholera and Biofilm Research Lab, Department of Pathogen Biology, Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, India
- Hardip R Patel
- The John Curtin School of Medical Research, The Australian National University, Canberra, ACT, 2601, Australia
- Sabu Thomas
- Cholera and Biofilm Research Lab, Department of Pathogen Biology, Rajiv Gandhi Centre for Biotechnology, Thiruvananthapuram, Kerala, India

9. Wagner N, Avram O, Gold-Binshtok D, Zerah B, Teper D, Pupko T. Effectidor: an automated machine-learning-based web server for the prediction of type-III secretion system effectors. Bioinformatics 2022; 38:2341-2343. [PMID: 35157036 DOI: 10.1093/bioinformatics/btac087] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/20/2021] [Revised: 01/31/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023] [Open access]
Abstract
MOTIVATION Type-III secretion systems are utilized by many Gram-negative bacteria to inject type-3 effectors (T3Es) into eukaryotic cells. These effectors manipulate host processes for the benefit of the bacteria and thus promote disease. They can also function as host-specificity determinants through their recognition as avirulence proteins that elicit an immune response. Identifying the full effector repertoire within a set of bacterial genomes is of great importance for developing appropriate treatments against the associated pathogens. RESULTS We present Effectidor, a user-friendly web server that harnesses several machine-learning techniques to predict T3Es within bacterial genomes. We compared the performance of Effectidor to other available tools for the same task on three pathogenic bacteria. Effectidor outperformed these tools in terms of classification accuracy (area under the precision-recall curve above 0.98 in all cases). AVAILABILITY AND IMPLEMENTATION Effectidor is available at: https://effectidor.tau.ac.il, and the source code is available at: https://github.com/naamawagner/Effectidor. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
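Effectidor's accuracy is reported as the area under the precision-recall curve (AUPRC). A common estimator of that area is average precision: rank proteins by predicted score and average the precision observed at the rank of each true effector. The sketch below implements that estimator; whether Effectidor computed AUPRC exactly this way is not stated in the abstract, and the function name is hypothetical. Ties in score keep their input order, which is a simplifying assumption.

```python
def average_precision(y_true: list[int], scores: list[float]) -> float:
    """Average precision: mean of the precision values at the rank of each true positive."""
    # Rank proteins from most to least confidently predicted effector (stable for ties).
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive examples")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank   # precision among the top `rank` predictions
    return precision_sum / n_pos
```

A perfect ranking yields 1.0; for example, labels [1, 0, 1, 0] with scores [0.9, 0.8, 0.7, 0.1] give (1/1 + 2/3) / 2 = 5/6. Unlike ROC AUC, this measure is sensitive to false positives among top-ranked predictions, which matters when effectors are a small fraction of a genome.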
Affiliation(s)
- Naama Wagner
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dafna Gold-Binshtok
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Ben Zerah
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Doron Teper
- Department of Plant Pathology and Weed Research, Institute of Plant Protection Agricultural Research Organization (ARO), Volcani Center, Rishon LeZion 7505101, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel

10. Molecular and Genomic Characterization of the Pseudomonas syringae Phylogroup 4: An Emerging Pathogen of Arabidopsis thaliana and Nicotiana benthamiana. Microorganisms 2022; 10:microorganisms10040707. [PMID: 35456758 PMCID: PMC9030749 DOI: 10.3390/microorganisms10040707] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/28/2022] [Revised: 03/15/2022] [Accepted: 03/21/2022] [Indexed: 12/10/2022] [Open access]
Abstract
Environmental fluctuations such as increased temperature, water availability, and atmospheric CO2 concentration triggered by climate change influence plant disease dynamics by affecting hosts, pathogens, and their interactions. Here, we describe a newly discovered Pseudomonas syringae strain found in a natural population of Arabidopsis thaliana collected in southwestern France. This strain, called Psy RAYR-BL, is highly virulent on natural Arabidopsis accessions, the Arabidopsis model accession Columbia 0, and tobacco plants. Despite the severe disease phenotype caused by the Psy RAYR-BL strain, genomic sequencing revealed a reduced repertoire of putative Type III virulence effectors compared to P. syringae pv tomato (Pst) DC3000. Furthermore, hopBJ1Psy is present in the Psy RAYR-BL genome but absent from the Pst DC3000 genome. Expression of HopBJ1Psy in plants induces ROS accumulation and cell death. In addition, HopBJ1Psy participates as a virulence factor in this plant-pathogen interaction, likely explaining the severity of the disease symptoms. This research describes the characterization of a newly discovered plant pathogen strain and possible virulence mechanisms underlying the infection process shaped by natural and changing environmental conditions.

11. Genomic and Functional Dissections of Dickeya zeae Shed Light on the Role of Type III Secretion System and Cell Wall-Degrading Enzymes to Host Range and Virulence. Microbiol Spectr 2022; 10:e0159021. [PMID: 35107329 PMCID: PMC8809351 DOI: 10.1128/spectrum.01590-21] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/20/2022] [Open access]
Abstract
Dickeya zeae is a worldwide destructive pathogen that causes soft rot diseases on various hosts such as rice, maize, banana, and potato. The strain JZL7, which we recently isolated from clivia, represents the first monocot-specific D. zeae and also has reduced pathogenicity compared to that of other D. zeae strains (e.g., EC1 and MS2). To elucidate the molecular mechanisms underlying its more restricted host range and weakened pathogenicity, we sequenced the complete genome of JZL7 and performed comparative genomic and functional analyses of JZL7 and other D. zeae strains. We found that, while having the largest genome among D. zeae strains, JZL7 lost almost the entire type III secretion system (T3SS), which is a key component of the virulence suite of many bacterial pathogens. Importantly, the deletion of T3SS in MS2 substantially diminished the expression of most type III secreted effectors (T3SEs) and MS2's pathogenicity on both dicots and monocots. Moreover, although JZL7 and MS2 share almost the same repertoire of cell wall-degrading enzymes (CWDEs), we found a broad reduction in the production of CWDEs and in the expression levels of CWDE genes in JZL7. The lower expression of CWDEs, pectin lyases in particular, would probably make it difficult for JZL7 to break down the cell wall of dicots, which is rich in pectin. Together, our results suggest that the loss of T3SS and reduced CWDE activity might have contributed to the host specificity and virulence of JZL7. Our findings also shed light on the pathogenic mechanism of Dickeya and other soft rot Pectobacteriaceae species in general. IMPORTANCE Dickeya zeae is an important, aggressive bacterial phytopathogen that can cause severe diseases in many crops and ornamental plants, thus leading to substantial economic losses. Strains from different sources showed significant diversity in their natural hosts, suggesting a complicated evolutionary history and pathogenic mechanisms. 
However, molecular mechanisms that cause the differences in the host range of D. zeae strains remain poorly understood. This study carried out genomic and functional dissections of JZL7, a D. zeae strain with restricted host range, and revealed type III secretion system (T3SS) and cell wall-degrading enzymes (CWDEs) as two major factors contributing to the host range and virulence of D. zeae, which will provide a valuable reference for the exploration of pathogenic mechanisms in other bacteria and present new insights for the control of bacterial soft rot diseases on crops.

12. Jing R, Wen T, Liao C, Xue L, Liu F, Yu L, Luo J. DeepT3 2.0: improving type III secreted effector predictions by an integrative deep learning framework. NAR Genom Bioinform 2021; 3:lqab086. [PMID: 34617013 PMCID: PMC8489581 DOI: 10.1093/nargab/lqab086] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/19/2021] [Revised: 08/12/2021] [Accepted: 09/09/2021] [Indexed: 11/13/2022] [Open access]
Abstract
Type III secretion systems (T3SSs) are bacterial membrane-embedded nanomachines that allow a number of human, plant and animal pathogens to inject virulence factors directly into the cytoplasm of eukaryotic cells. Export of effectors through T3SSs is critical for the motility and virulence of most Gram-negative pathogens. Current computational methods can predict type III secreted effectors (T3SEs) from amino acid sequences, but due to algorithmic constraints, reliable and large-scale prediction of T3SEs in Gram-negative bacteria remains a challenge. Here, we present DeepT3 2.0 (http://advintbioinforlab.com/deept3/), a novel web server that integrates different deep learning models for genome-wide prediction of T3SEs in a bacterium of interest. DeepT3 2.0 combines various deep learning architectures, including convolutional, recurrent, convolutional-recurrent and multilayer neural networks, to learn N-terminal representations of proteins specifically for T3SE prediction. Outcomes from the different models are processed and integrated to discriminate T3SEs from non-T3SEs. Because it leverages diverse models and an integrative deep learning framework, DeepT3 2.0 outperforms existing methods on validation datasets. In addition, the features learned by the networks are analyzed and visualized to explain how the models make their predictions. We propose DeepT3 2.0 as an integrated and accurate tool for the discovery of T3SEs.
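The abstract states that outputs from several architectures are "processed and integrated" but does not specify the rule. Two simple, widely used integration schemes are soft voting (average the predicted probabilities) and hard voting (majority of per-model calls). The sketch below shows both under that assumption; the model names in the example and the 0.5 threshold are hypothetical, and DeepT3 2.0 may integrate its models differently.

```python
from statistics import mean

def _columns(model_scores: dict[str, list[float]]) -> list[list[float]]:
    """Return per-model score lists, checking they all cover the same proteins."""
    columns = list(model_scores.values())
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("every model must score the same proteins")
    return columns

def soft_vote(model_scores: dict[str, list[float]]) -> list[float]:
    """Average each protein's effector probability across models."""
    columns = _columns(model_scores)
    return [mean(col[i] for col in columns) for i in range(len(columns[0]))]

def hard_vote(model_scores: dict[str, list[float]], threshold: float = 0.5) -> list[bool]:
    """Call a protein an effector when a strict majority of models exceed the threshold."""
    columns = _columns(model_scores)
    return [sum(col[i] > threshold for col in columns) * 2 > len(columns)
            for i in range(len(columns[0]))]
```

With hypothetical scores {"cnn": [0.9, 0.2, 0.6], "rnn": [0.8, 0.4, 0.3], "mlp": [0.7, 0.1, 0.55]}, hard voting calls the third protein an effector (two of three models exceed 0.5) while its soft-vote average (about 0.48) falls just below 0.5, showing that the choice of integration rule can change borderline calls.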
Affiliation(s)
- Runyu Jing
- School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
- Tingke Wen
- School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
- Chengxiang Liao
- School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
- Li Xue
- School of Public Health, Southwest Medical University, Luzhou 646000, China
- Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang 550018, China
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang 550018, China
- Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou 646000, China

13. Strugnell R, Lithgow T. Why predicting secreted effectors and what they do is important: Comment on "An elegant nano-injection machinery for sabotaging the host: Role of Type III secretion systems in virulence of different human and animal pathogenic bacteria" by Hajra, Nair and Chakravortty. Phys Life Rev 2021; 39:85-87. [PMID: 34452849 DOI: 10.1016/j.plrev.2021.08.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2021] [Accepted: 08/05/2021] [Indexed: 10/20/2022]
Affiliation(s)
- Richard Strugnell
- Department of Microbiology & Immunology, The University of Melbourne at the Doherty Institute, Melbourne VIC, Australia.
- Trevor Lithgow
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton VIC, Australia

14. Marian M, Fujikawa T, Shimizu M. Genome analysis provides insights into the biocontrol ability of Mitsuaria sp. strain TWR114. Arch Microbiol 2021; 203:3373-3388. [PMID: 33880605 DOI: 10.1007/s00203-021-02327-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/04/2020] [Revised: 04/07/2021] [Accepted: 04/09/2021] [Indexed: 12/31/2022]
Abstract
Mitsuaria sp. TWR114 is a biocontrol agent against tomato bacterial wilt (TBW). We aimed to gain genomic insights relevant to the biocontrol mechanisms and colonization ability of this strain. The draft genome size was found to be 5,632,523 bp, with a GC content of 69.5%, assembled into 1144 scaffolds. Genome annotation predicted a total of 4675 protein coding sequences (CDSs), 914 pseudogenes, 49 transfer RNAs, 3 noncoding RNAs, and 2 ribosomal RNAs. Genome analysis identified multiple CDSs associated with various pathways for the metabolism and transport of amino acids and carbohydrates, motility and chemotactic capacities, protection against stresses (oxidative, antibiotic, and phage), production of secondary metabolites, peptidases, quorum-quenching enzymes, and indole-3-acetic acid, as well as protein secretion systems and their related appendages. The genome resource will extend our understanding of the genomic features related to TWR114's biocontrol and colonization abilities and facilitate its development as a new biopesticide against TBW.
Affiliation(s)
- Malek Marian
- Faculty of Applied Biological Sciences, Gifu University, Gifu, 501-1193, Japan
- College of Agriculture, Ibaraki University, Ami, Inashiki, Ibaraki, 300-0393, Japan
- Takashi Fujikawa
- Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization (NARO), Tsukuba, Ibaraki, 305-8605, Japan
- Masafumi Shimizu
- Faculty of Applied Biological Sciences, Gifu University, Gifu, 501-1193, Japan

15
Computational prediction of secreted proteins in gram-negative bacteria. Comput Struct Biotechnol J 2021; 19:1806-1828. [PMID: 33897982 PMCID: PMC8047123 DOI: 10.1016/j.csbj.2021.03.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/19/2020] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 12/29/2022]
Abstract
Gram-negative bacteria harness multiple protein secretion systems and secrete a large proportion of their proteome. Proteins can be exported to the periplasmic space, integrated into the membrane, transported into the extracellular milieu, or translocated into the cytoplasm of contacting cells. Accurate, genome-wide annotation of the secreted proteins and their secretion pathways is therefore important. In this review, we systematically classified the secreted proteins according to the types of secretion systems in Gram-negative bacteria, summarized the known features of these proteins, and reviewed the algorithms and tools for their prediction.

16
Nguyen HN, Markin A, Friedberg I, Eulenstein O. Finding orthologous gene blocks in bacteria: the computational hardness of the problem and novel methods to address it. Bioinformatics 2021; 36:i668-i674. [PMID: 33381825 PMCID: PMC7773486 DOI: 10.1093/bioinformatics/btaa794] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 09/07/2020] [Indexed: 11/25/2022]
Abstract
Motivation The evolution of complexity is one of the most fascinating and challenging problems in modern biology, and tracing the evolution of complex traits is an open problem. In bacteria, operons and gene blocks provide a model of tractable evolutionary complexity at the genomic level. Gene blocks are structures of co-located genes with related functions, and operons are gene blocks whose genes are co-transcribed on a single mRNA molecule. The genes in operons and gene blocks typically work together in the same system or molecular complex. Previously, we proposed a method that explains the evolution of orthologous gene blocks (orthoblocks) as a combination of a small set of events that take place in vertical evolution from common ancestors, together with a heuristic method to solve this problem. However, the computational complexity of the problem had not been studied. Results Here, we establish that the problem of finding homologous gene blocks is NP-hard and APX-hard. We have developed a greedy algorithm that runs in polynomial time and guarantees an O(ln n) approximation. In addition, we formalize the problem as an integer linear program and solve it using the PuLP package and the standard CPLEX solver. Our exploration of several candidate operons reveals that the new method yields solutions closer to optimal than the heuristic approach and is significantly faster. Availability and implementation The software and data accompanying this paper are available under the GPLv3 and CC0 licenses, respectively, at https://github.com/nguyenngochuy91/Relevant-Operon.
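The O(ln n) guarantee quoted above is the classic bound achieved by greedy set cover. The sketch below shows only that generic greedy strategy; it assumes the problem has already been cast as covering a universe of elements with candidate subsets, which the abstract does not spell out, and the names `greedy_set_cover`, `universe` and `subsets` are hypothetical rather than taken from the authors' software.

```python
from typing import Dict, FrozenSet, Hashable, List, Set


def greedy_set_cover(universe: Set[Hashable],
                     subsets: Dict[str, FrozenSet[Hashable]]) -> List[str]:
    """Classic greedy set cover: repeatedly pick the subset that covers the
    most still-uncovered elements (an H(n) <= ln n + 1 approximation)."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Locally optimal choice: largest overlap with what is still uncovered.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    # Toy, hypothetical instance: elements "a".."e" and four candidate subsets.
    toy = {"S1": frozenset("abc"), "S2": frozenset("cd"),
           "S3": frozenset("de"), "S4": frozenset("a")}
    print(greedy_set_cover(set("abcde"), toy))
```

Each step is cheap (one pass over the candidates), which is what keeps the whole procedure polynomial; the logarithmic factor is the price of never revisiting an earlier choice.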
Affiliation(s)
- Huy N Nguyen
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Alexey Markin
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- Interdepartmental Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Interdepartmental Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA

17
Yu L, Liu F, Li Y, Luo J, Jing R. DeepT3_4: A Hybrid Deep Neural Network Model for the Distinction Between Bacterial Type III and IV Secreted Effectors. Front Microbiol 2021; 12:605782. [PMID: 33552038 PMCID: PMC7858263 DOI: 10.3389/fmicb.2021.605782] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/13/2020] [Accepted: 01/04/2021] [Indexed: 01/17/2023]
Abstract
Gram-negative bacteria can deliver secreted proteins (also known as secreted effectors) directly into host cells through the type III secretion system (T3SS), type IV secretion system (T4SS), and type VI secretion system (T6SS), causing various diseases. These secreted effectors are heavily involved in the interactions between bacteria and host cells, so their identification is crucial for the discovery and development of novel antibacterial drugs. It is currently challenging to accurately distinguish type III secreted effectors (T3SEs) from type IV secreted effectors (T4SEs) because neither T3SEs nor T4SEs contain N-terminal signal peptides, and some of these effectors have similar evolutionarily conserved profiles and sequence motifs. To address this challenge, we develop a deep learning (DL) approach called DeepT3_4 to correctly classify T3SEs and T4SEs. We generate an amino-acid character dictionary and sequence-based features extracted from effector proteins and subsequently feed these features into a hybrid model that integrates recurrent neural networks (RNNs) and deep neural networks (DNNs). After training, the hybrid neural network classifies secreted effectors into the two classes with accuracy, F-value, and recall all above 80.0%. Our approach is the first DL approach for the classification of T3SEs and T4SEs, providing a promising supplementary tool for further secretome studies.
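A character dictionary of the kind mentioned above typically maps each residue to an integer so that a sequence can be passed to an embedding or recurrent layer. The sketch below is an assumption about how such an encoding might look, not DeepT3_4's code: the 20-letter alphabet order, padding index 0, the catch-all index for non-standard residues and the `max_len` default are all illustrative choices.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical dictionary: 0 is reserved for padding, 1..20 for the standard
# residues, and 21 for any non-standard letter (e.g. X, U, B).
CHAR_DICT: Dict[str, int] = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
UNKNOWN = len(AMINO_ACIDS) + 1


def encode_sequence(seq: str, max_len: int = 100) -> List[int]:
    """Map residues to integer indices, truncate to max_len, right-pad with 0."""
    ids = [CHAR_DICT.get(aa, UNKNOWN) for aa in seq.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    print(encode_sequence("MAX", max_len=5))
```

With this ordering, `encode_sequence("MAX", max_len=5)` returns `[11, 1, 21, 0, 0]`. Fixed-length output is what lets a batch of proteins of different lengths be stacked into one tensor.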
Affiliation(s)
- Lezheng Yu
- School of Chemistry and Materials Science, Guizhou Education University, Guiyang, China
- Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang, China
- Yizhou Li
- College of Cybersecurity, Sichuan University, Chengdu, China
- Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, China
- Runyu Jing
- College of Cybersecurity, Sichuan University, Chengdu, China

18
Littmann M, Heinzinger M, Dallago C, Olenyi T, Rost B. Embeddings from deep learning transfer GO annotations beyond homology. Sci Rep 2021; 11:1160. [PMID: 33441905 PMCID: PMC7806674 DOI: 10.1038/s41598-020-80786-0] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 09/07/2020] [Accepted: 12/24/2020] [Indexed: 11/09/2022]
Abstract
Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with < 20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
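Annotation transfer by proximity in embedding space amounts to a nearest-neighbour lookup. The sketch below assumes each annotated protein already has a fixed-length embedding vector (for example, residue embeddings averaged over the protein) and uses Euclidean distance; the function name, the `k` parameter, the toy vectors and the pairing of GO identifiers with proteins are hypothetical and do not reproduce the authors' pipeline or its reliability scoring.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def transfer_annotations(query: Sequence[float],
                         lookup: Dict[str, Tuple[Sequence[float], Set[str]]],
                         k: int = 1) -> Set[str]:
    """Return the union of GO terms of the k annotated proteins whose
    embeddings are closest to `query` (Euclidean distance)."""
    ranked = sorted(lookup.items(), key=lambda item: math.dist(query, item[1][0]))
    terms: Set[str] = set()
    for _, (_, go_terms) in ranked[:k]:
        terms |= go_terms
    return terms


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real per-protein embeddings are much longer.
    lookup = {
        "P1": ([0.0, 0.0], {"GO:0005737"}),
        "P2": ([1.0, 1.0], {"GO:0005576"}),
        "P3": ([5.0, 5.0], {"GO:0016020"}),
    }
    print(transfer_annotations([0.9, 1.2], lookup))
```

Because only distances between vectors are compared, the query needs no detectable sequence similarity to its neighbours, which is the point the abstract makes about transfer beyond homology.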
Affiliation(s)
- Maria Littmann
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany.
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
- Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Christian Dallago
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Tobias Olenyi
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany
- School of Life Sciences Weihenstephan (TUM-WZW), TUM (Technical University of Munich), Alte Akademie 8, Freising, Germany
- Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West 168th Street, New York, NY, 10032, USA

19
iT3SE-PX: Identification of Bacterial Type III Secreted Effectors Using PSSM Profiles and XGBoost Feature Selection. Computational and Mathematical Methods in Medicine 2021; 2021:6690299. [PMID: 33505516 PMCID: PMC7806399 DOI: 10.1155/2021/6690299] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/20/2020] [Revised: 12/24/2020] [Accepted: 12/26/2020] [Indexed: 11/18/2022]
Abstract
Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in bioinformatics because of their crucial role in understanding host-pathogen interactions and developing better therapeutic targets against pathogens. However, recognizing all effector proteins with traditional experimental approaches is often time-consuming and laborious. Therefore, developing computational methods that accurately predict putative novel effectors is important for reducing the number of biological experiments needed for validation. In this study, we proposed a method, called iT3SE-PX, to identify T3SEs solely based on protein sequences. First, three kinds of features were extracted from the position-specific scoring matrix (PSSM) profiles to help train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm was used to rank these features by their classification ability. Finally, the optimal features were selected as inputs to a support vector machine (SVM) classifier to predict T3SEs. On the two benchmark datasets, we conducted 100 rounds of randomized 5-fold cross-validation (CV) and an independent test, respectively. The experimental results demonstrated that the proposed method outperformed most existing methods and could serve as a useful tool for identifying putative T3SEs from sequence information alone.
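The "100-time randomized 5-fold cross validation" mentioned above means reshuffling the data before each round of 5-fold splitting, so that the reported score averages over many different partitions. A minimal pure-Python sketch of how such splits might be generated follows; the function name and `seed` parameter are hypothetical, and unlike scikit-learn's `RepeatedStratifiedKFold` this version does not preserve class proportions in each fold.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n_samples: int, n_splits: int = 5, n_repeats: int = 100,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for n_repeats rounds of shuffled k-fold CV."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        rng.shuffle(indices)  # a fresh random partition for every round
        for fold in range(n_splits):
            # Every n_splits-th shuffled index, offset by `fold`, forms the test fold,
            # so the folds of one round are disjoint and together cover all samples.
            test = sorted(indices[fold::n_splits])
            test_set = set(test)
            train = [i for i in range(n_samples) if i not in test_set]
            yield train, test


if __name__ == "__main__":
    for train, test in repeated_kfold(10, n_splits=5, n_repeats=1, seed=1):
        print(test)
```

A model would be trained on each `train` split and scored on the matching `test` split; with the defaults that is 5 × 100 = 500 fits, whose scores are then averaged.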

20
Jing R, Li Y, Xue L, Liu F, Li M, Luo J. autoBioSeqpy: A Deep Learning Tool for the Classification of Biological Sequences. J Chem Inf Model 2020; 60:3755-3764. [PMID: 32786512 DOI: 10.1021/acs.jcim.0c00409] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
Abstract
Deep learning has proven to be a powerful method with applications in various fields, including image, language, and biomedical data. Thanks to libraries and toolkits such as TensorFlow, PyTorch, and Keras, researchers can use different deep learning architectures and data sets for rapid modeling. However, the available neural network implementations built with these toolkits are usually designed for a specific study and are difficult to transfer to other work. Here, we present autoBioSeqpy, a tool that uses deep learning for biological sequence classification. The advantage of this tool is its simplicity: users only need to prepare the input data set and then use a command line interface. autoBioSeqpy then automatically executes a series of customizable steps, including text reading, parameter initialization, sequence encoding, model loading, training, and evaluation. In addition, the tool provides various ready-to-use, adaptable model templates to improve the usability of these networks. We demonstrate autoBioSeqpy on three biological sequence problems: the prediction of type III secreted proteins, protein subcellular localization, and CRISPR/Cas9 sgRNA activity. autoBioSeqpy is freely available with examples at https://github.com/jingry/autoBioSeqpy.
Affiliation(s)
- Runyu Jing
- College of Cybersecurity, Sichuan University, Chengdu 610065, China
- Yizhou Li
- College of Cybersecurity, Sichuan University, Chengdu 610065, China
- Li Xue
- School of Public Health, Southwest Medical University, Luzhou, Sichuan 646000, China
- Fengjuan Liu
- School of Geography and Resources, Guizhou Education University, Guiyang 550018, China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu 610065, China
- Jiesi Luo
- Department of Pharmacology, School of Pharmacy, Southwest Medical University, Luzhou, Sichuan 646000, China

21
Abstract
Many Gram-negative bacteria infect hosts and cause diseases by translocating a variety of type III secreted effectors (T3SEs) into the host cell cytoplasm. However, despite a dramatic increase in the number of available whole-genome sequences, accurate prediction of T3SEs remains challenging. Traditional prediction models have focused on atypical sequence features buried in the N-terminal peptides of T3SEs, but these models have had high false-positive rates. In this research, we integrated promoter information with characteristic protein features of signal regions, chaperone-binding domains, and effector domains for T3SE prediction. Machine learning algorithms, including deep learning, were adopted to predict the atypical features mainly buried in the signal sequences of T3SEs, followed by the development of a voting-based ensemble model integrating the individual prediction results. We assembled these into a unified T3SE prediction pipeline, T3SEpp, which integrated the results of the individual modules, resulting in high accuracy (∼0.94) and a >1-fold reduction in the false-positive rate compared with state-of-the-art software tools. The T3SEpp pipeline and the sequence features observed here will facilitate the accurate identification of new T3SEs, with numerous benefits for future studies on host-pathogen interactions.
IMPORTANCE Type III secreted effector (T3SE) prediction remains a major computational challenge. In practical applications, current software tools often suffer from high false-positive rates. One contributing factor could be the relatively narrow range of biological features used to design and train the models.
In this research, we made a comprehensive survey on the sequence-based features of T3SEs, including signal sequences, chaperone-binding domains, effector domains, and transcription factor binding promoter sites, and assembled a unified prediction pipeline integrating multi-aspect biological features within homology-based and multiple machine learning models. To our knowledge, we have compiled the most comprehensive biological sequence feature analysis for T3SEs in this research. The T3SEpp pipeline integrating the variety of features and assembling different models showed high accuracy, which should facilitate more accurate identification of T3SEs in new and existing bacterial whole-genome sequences.
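T3SEpp combines the outputs of several modules through a voting-based ensemble. A minimal sketch of a threshold vote over binary module calls follows, assuming each module emits 1 (effector) or 0 per protein; the module names and the `min_votes` parameter are hypothetical, and the real pipeline may weight modules or combine scores rather than hard calls.

```python
from typing import Dict, List, Optional, Sequence


def vote(module_calls: Dict[str, Sequence[int]],
         min_votes: Optional[int] = None) -> List[int]:
    """Combine binary calls (1 = effector) from several prediction modules.

    A protein is called an effector when at least `min_votes` modules agree;
    by default this is a strict majority of the modules. All call lists are
    assumed to cover the same proteins in the same order.
    """
    n_modules = len(module_calls)
    if min_votes is None:
        min_votes = n_modules // 2 + 1
    per_protein = zip(*module_calls.values())  # one tuple of calls per protein
    return [int(sum(calls) >= min_votes) for calls in per_protein]


if __name__ == "__main__":
    calls = {
        "signal": [1, 0, 1, 0],
        "chaperone": [1, 0, 0, 0],
        "promoter": [0, 1, 1, 0],
    }
    print(vote(calls))
```

Raising `min_votes` trades sensitivity for fewer false positives, which is the lever an ensemble offers for the false-positive problem the abstract describes.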

22
Schubert AF, Nguyen JV, Franklin TG, Geurink PP, Roberts CG, Sanderson DJ, Miller LN, Ovaa H, Hofmann K, Pruneda JN, Komander D. Identification and characterization of diverse OTU deubiquitinases in bacteria. EMBO J 2020; 39:e105127. [PMID: 32567101 PMCID: PMC7396840 DOI: 10.15252/embj.2020105127] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 03/28/2020] [Revised: 05/19/2020] [Accepted: 05/29/2020] [Indexed: 02/06/2023]
Abstract
Manipulation of host ubiquitin signaling is becoming an increasingly apparent evolutionary strategy among bacterial and viral pathogens. By removing host ubiquitin signals, for example, invading pathogens can inactivate immune response pathways and evade detection. The ovarian tumor (OTU) family of deubiquitinases regulates diverse ubiquitin signals in humans. Viral pathogens have also extensively co-opted the OTU fold to subvert host signaling, but the extent to which bacteria utilize the OTU fold was unknown. We have predicted and validated a set of OTU deubiquitinases encoded by several classes of pathogenic bacteria. Biochemical assays highlight the ubiquitin and polyubiquitin linkage specificities of these bacterial deubiquitinases. By determining the ubiquitin-bound structures of two examples, we demonstrate the novel strategies that have evolved to both thread an OTU fold and recognize a ubiquitin substrate. With these new examples, we perform the first cross-kingdom structural analysis of the OTU fold that highlights commonalities among distantly related OTU deubiquitinases.
Affiliation(s)
- Alexander F Schubert
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Present address: Department of Structural Biology, Genentech Inc., South San Francisco, CA, USA
- Justine V Nguyen
- Department of Molecular Microbiology & Immunology, Oregon Health & Science University, Portland, OR, USA
- Tyler G Franklin
- Department of Molecular Microbiology & Immunology, Oregon Health & Science University, Portland, OR, USA
- Paul P Geurink
- Oncode Institute & Department of Cell and Chemical Biology, Leiden University Medical Centre, Leiden, The Netherlands
- Cameron G Roberts
- Department of Molecular Microbiology & Immunology, Oregon Health & Science University, Portland, OR, USA
- Daniel J Sanderson
- Department of Molecular Microbiology & Immunology, Oregon Health & Science University, Portland, OR, USA
- Lauren N Miller
- Department of Molecular Microbiology & Immunology, Oregon Health & Science University, Portland, OR, USA
- Huib Ovaa
- Oncode Institute & Department of Cell and Chemical Biology, Leiden University Medical Centre, Leiden, The Netherlands
- Kay Hofmann
- Institute for Genetics, University of Cologne, Cologne, Germany
- Jonathan N Pruneda
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Department of Molecular Microbiology & Immunology, Oregon Health & Science University, Portland, OR, USA
- David Komander
- Medical Research Council Laboratory of Molecular Biology, Cambridge, UK
- Ubiquitin Signalling Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, The University of Melbourne, Melbourne, VIC, Australia

23
Nguyen HN, Jain A, Eulenstein O, Friedberg I. Tracing the ancestry of operons in bacteria. Bioinformatics 2020; 35:2998-3004. [PMID: 30689726 DOI: 10.1093/bioinformatics/btz053] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/31/2018] [Revised: 01/11/2019] [Accepted: 01/21/2019] [Indexed: 02/02/2023]
Abstract
MOTIVATION Complexity is a fundamental attribute of life. Complex systems are made of parts that together perform functions that a single component, or subsets of components, cannot. Examples of complex molecular systems include protein structures such as the F1Fo-ATPase, the ribosome, or the flagellar motor: each one of these structures requires most or all of its components to function properly. Given the ubiquity of complex systems in the biosphere, understanding the evolution of complexity is central to biology. At the molecular level, operons are classic examples of a complex system. An operon's genes are co-transcribed under the control of a single promoter to a polycistronic mRNA molecule, and the operon's gene products often form molecular complexes or metabolic pathways. With the large number of complete bacterial genomes available, we now have the opportunity to explore the evolution of these complex entities, by identifying possible intermediate states of operons. RESULTS In this work, we developed a maximum parsimony algorithm to reconstruct ancestral operon states, and show a simple vertical evolution model of how operons may evolve from the individual component genes. We describe several ancestral states that are plausible functional intermediate forms leading to the full operon. We also offer Reconstruction of Ancestral Gene blocks Using Events or ROAGUE as a software tool for those interested in exploring gene block and operon evolution. AVAILABILITY AND IMPLEMENTATION The software accompanying this paper is available under GPLv3 license on: https://github.com/nguyenngochuy91/Ancestral-Blocks-Reconstruction. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
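ROAGUE reconstructs ancestral gene blocks with an event-based maximum parsimony model. As a simpler illustration of the parsimony idea, the sketch below applies the classic Fitch bottom-up pass to a single binary character (presence "1" or absence "0" of one gene) on a rooted binary tree. This is a swapped-in textbook technique, not ROAGUE's algorithm, and the species names and states are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A tree is either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, leaf_states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Bottom-up pass of Fitch's small-parsimony algorithm.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes needed to explain the observed leaf states.
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes yet
        return frozenset({leaf_states[tree]}), 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:  # children can agree: no extra change at this node
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    tree = (("Ecoli", "Salmonella"), ("Vibrio", "Yersinia"))
    states = {"Ecoli": "1", "Salmonella": "1", "Vibrio": "0", "Yersinia": "1"}
    print(fitch(tree, states))
```

For the toy tree the root set is `{"1"}` with one change: the gene was most parsimoniously present in the ancestor and lost once on the Vibrio branch.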
Affiliation(s)
- Huy N Nguyen
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA
- Department of Computer Science, Iowa State University, Ames, IA, USA
- Ashish Jain
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA
- Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, USA
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, IA, USA
- Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, USA
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA
- Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, USA

24
Oliver C, Sánchez P, Valenzuela K, Hernández M, Pontigo JP, Rauch MC, Garduño RA, Avendaño-Herrera R, Yáñez AJ. Subcellular Location of Piscirickettsia salmonis Heat Shock Protein 60 (Hsp60) Chaperone by Using Immunogold Labeling and Proteomic Analysis. Microorganisms 2020; 8:microorganisms8010117. [PMID: 31952216 PMCID: PMC7023422 DOI: 10.3390/microorganisms8010117] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/12/2019] [Revised: 12/30/2019] [Accepted: 12/31/2019] [Indexed: 12/29/2022]
Abstract
Piscirickettsia salmonis is the causative bacterial agent of piscirickettsiosis, a systemic fish disease that significantly impacts the Chilean salmon industry. This bacterium possesses a type IV secretion system (T4SS), several proteins of the type III secretion system (T3SS), and a single heat shock protein 60 (Hsp60/GroEL). It has been suggested that due to its high antigenicity, the P. salmonis Hsp60 could be surface-exposed, translocated across the membrane, and (or) secreted into the extracellular matrix. This study tests the hypothesis that P. salmonis Hsp60 could be located on the bacterial surface. Immunogold electron microscopy and proteomic analyses suggested that although P. salmonis Hsp60 was predominantly associated with the bacterial cell cytoplasm, Hsp60-positive spots also exist on the bacterial cell envelope. IgY antibodies against P. salmonis Hsp60 protected SHK-1 cells against infection. Several bioinformatics approaches were used to assess Hsp60 translocation by the T4SS, T3SS, and T6SS, with negative results. These data support the hypothesis that small amounts of Hsp60 must reach the bacterial cell surface in a manner probably not mediated by currently characterized secretion systems, and that they remain biologically active during P. salmonis infection, possibly mediating adherence and (or) invasion.
Affiliation(s)
- Cristian Oliver
- Laboratorio de Inmunología y Estrés de Organismos Acuáticos, Instituto de Patología Animal, Facultad de Ciencias Veterinarias, Universidad Austral de Chile, Valdivia 5090000, Chile
- Patricio Sánchez
- Interdisciplinary Center for Aquaculture Research (INCAR), Concepción 4070386, Chile
- Instituto de Bioquímica y Microbiología, Facultad de Ciencias, Universidad Austral de Chile, Valdivia 5090000, Chile
- Karla Valenzuela
- Microbiology and Immunology Department, Dalhousie University, Halifax, NS B3H 4R2, Canada
- Mauricio Hernández
- Austral-OMICS, Faculty of Sciences, Universidad Austral de Chile, Valdivia 5090000, Chile
- Juan Pablo Pontigo
- Instituto de Bioquímica y Microbiología, Facultad de Ciencias, Universidad Austral de Chile, Valdivia 5090000, Chile
- Maria C. Rauch
- Instituto de Bioquímica y Microbiología, Facultad de Ciencias, Universidad Austral de Chile, Valdivia 5090000, Chile
- Rafael A. Garduño
- Microbiology and Immunology Department, Dalhousie University, Halifax, NS B3H 4R2, Canada
- Canadian Food Inspection Agency, Dartmouth Laboratory, Dartmouth, NS B3B 1Y9, Canada
- Ruben Avendaño-Herrera
- Interdisciplinary Center for Aquaculture Research (INCAR), Concepción 4070386, Chile
- Universidad Andrés Bello, Laboratorio de Patología de Organismos Acuáticos y Biotecnología Acuícola, Facultad Ciencias de la Vida, Viña del Mar 2531015, Chile
- Corresponding author
- Alejandro J. Yáñez
- Interdisciplinary Center for Aquaculture Research (INCAR), Concepción 4070386, Chile
- Facultad de Ciencias, Universidad Austral de Chile, Valdivia 5090000, Chile
- Corresponding author

25
Fu X, Yang Y. WEDeepT3: predicting type III secreted effectors based on word embedding and deep learning. Quantitative Biology 2019. [DOI: 10.1007/s40484-019-0184-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]

26
Zeng C, Zou L. An account of in silico identification tools of secreted effector proteins in bacteria and future challenges. Brief Bioinform 2019; 20:110-129. [PMID: 28981574 DOI: 10.1093/bib/bbx078] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 03/10/2017] [Indexed: 01/08/2023]
Abstract
Bacterial pathogens secrete numerous effector proteins via six secretion systems (type I to type VI) to adapt to new environments or to promote virulence through bacterium-host interactions. Many computational approaches have been used to identify effector proteins before subsequent experimental verification, because they reduce laborious biological procedures and are genome-scale, automated and highly efficient. Prevalent examples include machine learning methods and statistical techniques. In this article, we summarize the computational progress toward predicting secreted effector proteins in bacteria, beginning with an introduction to the features used to discriminate effectors from non-effectors. The mechanisms, contributions and deficiencies of previously developed detection tools are presented, and the tools are further benchmarked on a curated testing data set. Based on the benchmarking results, potential improvements in prediction performance are discussed, including (1) more informative features for discriminating effectors from non-effectors; (2) the construction of comprehensive training data sets for the machine learning algorithms; (3) the advancement of reliable prediction methods and (4) a better interpretation of the mechanisms behind the molecular processes. The future of in silico identification of bacterial secreted effectors holds both opportunities and challenges.
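One of the simplest sequence features used to discriminate effectors from non-effectors is amino acid composition: the fraction of each standard residue in a sequence or region. The sketch below computes it in plain Python; the function name is hypothetical, and ignoring non-standard letters is an illustrative choice rather than the convention of any specific tool reviewed here.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> List[float]:
    """Return the fraction of each of the 20 standard residues in `seq`,
    in AMINO_ACIDS order; non-standard letters are ignored."""
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    features = amino_acid_composition("MSTNPLQAASSKAEX")
    print([round(f, 3) for f in features])
```

Because many effectors carry their secretion signal near the N-terminus, the same function can be applied to a slice such as `seq[:50]` to obtain a region-specific 20-dimensional vector for a classifier.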
Affiliation(s)
- Cong Zeng
- Bioinformatics Center, Third Military Medical University (TMMU), China

27
Wang J, Li J, Yang B, Xie R, Marquez-Lago TT, Leier A, Hayashida M, Akutsu T, Zhang Y, Chou KC, Selkrig J, Zhou T, Song J, Lithgow T. Bastion3: a two-layer ensemble predictor of type III secreted effectors. Bioinformatics 2018; 35:2017-2028. [PMID: 30388198 PMCID: PMC7963071 DOI: 10.1093/bioinformatics/bty914] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 08/31/2018] [Revised: 10/15/2018] [Accepted: 10/31/2018] [Indexed: 01/31/2023]
Abstract
MOTIVATION Type III secreted effectors (T3SEs) can be injected into host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Due to their relevance in pathogen-host interactions, significant computational efforts have been put toward identification of T3SEs and these in turn have stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, these existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (or incorporating also the C-terminus) instead of the proteins' complete sequences, and (ii) the underlying models (trained with classic algorithms) employed only few features, most of which were extracted based on sequence-information alone. To achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to effectively integrate these into a comprehensive model. RESULTS In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features, from various types, trains single models based on these features and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM and further boosted the models' performances through a novel genetic algorithm (GA) based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 achieves a much better performance compared to commonly used methods, with an ACC value of 0.959, F-value of 0.958, MCC value of 0.917 and AUC value of 0.956, which comprehensively outperformed all other toolkits by more than 5.6% in ACC value, 5.7% in F-value, 12.4% in MCC value and 5.8% in AUC value. 
Based on our proposed two-layer ensemble model, we further developed a user-friendly online toolkit to make T3SE prediction convenient for experimental scientists. Designed to ease future discoveries of novel T3SEs and offering improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction. AVAILABILITY AND IMPLEMENTATION http://bastion3.erc.monash.edu/. CONTACT selkrig@embl.de or wyztli@163.com or or trevor.lithgow@monash.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
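The ACC, F-value and MCC figures quoted in the Bastion3 abstract are standard confusion-matrix statistics for a binary effector/non-effector classifier. The following is a minimal sketch of how they are conventionally computed. The helper name `binary_metrics` and the example counts are hypothetical and are not taken from the Bastion3 benchmark. AUC is omitted here because it requires ranked scores rather than a single confusion matrix.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return ACC, F-value (F1) and MCC for one confusion matrix.

    Hypothetical helper for illustration; effectors are the positive class.
    Undefined ratios (zero denominators) are reported as 0.0.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Matthews correlation coefficient: balanced even when classes are skewed.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "F": f1, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts for a balanced test set of 200 proteins.
    print(binary_metrics(tp=95, fp=4, tn=96, fn=5))
```

MCC is included alongside ACC because, unlike accuracy, it uses all four cells of the confusion matrix and stays informative when effectors are a small minority of the proteome.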
Affiliation(s)
- Jiawei Wang
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
- Jiahui Li
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
  - Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Bingjiao Yang
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Ruopeng Xie
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Tatiana T Marquez-Lago
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- André Leier
  - Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
  - Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Morihiro Hayashida
  - National Institute of Technology, Matsue College, Matsue, Shimane, Japan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Yanju Zhang
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA, USA
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- Joel Selkrig
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Tieli Zhou
  - Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Trevor Lithgow
  - Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
28
HrpE, the major component of the Xanthomonas type three protein secretion pilus, elicits plant immunity responses. Sci Rep 2018; 8:9842. [PMID: 29959345 PMCID: PMC6026121 DOI: 10.1038/s41598-018-27869-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 01/15/2018] [Accepted: 06/11/2018] [Indexed: 02/06/2023]
Abstract
Like several pathogenic bacteria, Xanthomonas species infect host plants through the secretion of effector proteins by the Hrp pilus of the Type Three Protein Secretion System (T3SS). The HrpE protein was identified as the major structural component of this pilus. Here, using Xanthomonas citri subsp. citri (Xcc) HrpE as a model, we found a novel role for this protein as an elicitor of plant defense responses. HrpE triggers defense responses in host and non-host plants, as revealed by the development of plant lesions, callose deposition, hydrogen peroxide production and increased expression of genes related to plant defense responses. Moreover, pre-infiltration of citrus or tomato leaves with HrpE impairs later Xanthomonas infections. In particular, the HrpE C-terminal region, which is conserved among Xanthomonas species, was sufficient to elicit these responses. HrpE was able to interact with plant Glycine-Rich Proteins from citrus (CsGRP) and Arabidopsis (AtGRP-3). Moreover, an Arabidopsis atgrp-3 knockout mutant lost the capacity to respond to HrpE. This work demonstrates that plants can recognize the conserved C-terminal region of the T3SS pilus protein HrpE as a danger signal to defend themselves against Xanthomonas, triggering defense responses that may be mediated by GRPs.
29
Tchagang CF, Xu R, Doumbou CL, Tambong JT. Genome analysis of two novel Pseudomonas strains exhibiting differential hypersensitivity reactions on tobacco seedlings reveals differences in nonflagellar T3SS organization and predicted effector proteins. Microbiologyopen 2018; 7:e00553. [PMID: 29464939 PMCID: PMC5911992 DOI: 10.1002/mbo3.553] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/17/2017] [Revised: 07/17/2017] [Accepted: 07/18/2017] [Indexed: 11/06/2022]
Abstract
Multilocus sequence analysis (MLSA) of two new biological control strains of Pseudomonas (S1E40 and S3E12) was performed to assess their taxonomic position relative to close lineages, and comparative genomics was employed to investigate whether these strains differ in key genetic features involved in hypersensitivity responses (HRs). Strain S3E12, at high concentration, incites HRs on tobacco and corn plantlets, while S1E40 does not. Phylogenies based on individual genes and on concatenated 16S rRNA-gyrB-rpoB-rpoD sequence data show strains S1E40 and S3E12 clustering in distinct groups. Strain S3E12 consistently clustered with Pseudomonas marginalis, a bacterium causing soft rots on plant tissues. MLSA data suggest that strains S1E40 and S3E12 are novel genotypes. This is consistent with their genome-based DNA-DNA homology values, which fall below the proposed species cutoff. Comparative genomic analysis of the two strains revealed major differences in the type III secretion systems (T3SS) as well as in the predicted T3SS-secreted effector proteins (T3Es). One nonflagellar (NF-T3SS) and two flagellar T3SS (F-T3SS) clusters were identified in both strains. While the F-T3SS clusters were relatively conserved in both strains, the NF-T3SS clusters differed in the number of core components present. The predicted T3Es also differed in the type and number of protein-coding sequences (CDSs), with both strains having unique predicted protease-related effectors. In addition, the T1SS organization of the S3E12 genome includes CDSs encoding key factors such as T1SS-secreted agglutinin repeats-toxins (a group of cytolysins and cytotoxins), a membrane fusion protein (LapC), a T1SS ATPase of the LssB family (LapB), and a T1SS-associated transglutaminase-like cysteine proteinase (LapP). In contrast, strain S1E40, but not S3E12, has all CDSs for the seven-gene operon (pelA-pelG) required for Pel biosynthesis, suggesting that biofilm formation is modulated differently in these strains.
The data presented here provide insight into the genome organization of these two phytobacterial strains.
Affiliation(s)
- Caetanie F. Tchagang
  - Ottawa Research and Development Centre, Ottawa, ON, Canada
  - Institut des sciences de santé et de la vie, Collège La Cité, Ottawa, ON, Canada
- Renlin Xu
  - Ottawa Research and Development Centre, Ottawa, ON, Canada
- Cyr Lézin Doumbou
  - Institut des sciences de santé et de la vie, Collège La Cité, Ottawa, ON, Canada
30
Abstract
Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches are described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.
Affiliation(s)
- Henrik Nielsen
  - Technical University of Denmark, Kemitorvet, Building 208, DK-2800, Kgs. Lyngby, Denmark
31
Orfanoudaki G, Markaki M, Chatzi K, Tsamardinos I, Economou A. MatureP: prediction of secreted proteins with exclusive information from their mature regions. Sci Rep 2017; 7:3263. [PMID: 28607462 PMCID: PMC5468347 DOI: 10.1038/s41598-017-03557-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 01/23/2017] [Accepted: 04/28/2017] [Indexed: 11/09/2022]
Abstract
More than a third of the cellular proteome is non-cytoplasmic. Most secretory proteins use the Sec system for export and are targeted to membranes using signal peptides and mature domains. To specifically analyze bacterial mature domain features, we developed MatureP, a classifier that predicts secretory sequences through features computed exclusively from their mature domains. MatureP was trained using Just Add Data Bio, an automated machine learning tool. Mature domains are predicted efficiently with ~92% success, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC). Predictions were validated using experimental datasets of mutated secretory proteins. The features selected by MatureP reveal prominent differences in amino acid content between secreted and cytoplasmic proteins. Amino-terminal mature domain sequences have enhanced disorder, more hydroxyl and polar residues, and fewer hydrophobic residues. Cytoplasmic proteins have prominent amino-terminal hydrophobic stretches and charged regions downstream. Presumably, secretory mature domains comprise a distinct protein class. They balance the flexibility required to maintain non-folded states during targeting and secretion with the ability to fold after secretion. These findings provide novel insight into protein trafficking, sorting and folding mechanisms and may benefit protein secretion biotechnology.
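MatureP's ~92% figure is an AUC, which can be read as the probability that a randomly chosen secretory protein receives a higher score than a randomly chosen cytoplasmic one (the Mann-Whitney formulation of ROC AUC). The sketch below illustrates only that definition. The helper name `roc_auc` and the toy scores are hypothetical; this is not code from MatureP or Just Add Data Bio.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney U divided by P*N).

    labels: 1 for a positive (e.g. secretory), 0 for a negative (cytoplasmic).
    scores: classifier outputs; a higher score means "more likely positive".
    Ties between a positive and a negative count as half a win.
    This O(P*N) loop is meant for small illustrative inputs only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: two secretory and two cytoplasmic proteins.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so ~0.92 means a secretory mature domain outranks a cytoplasmic protein in roughly 92% of such pairs.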
Affiliation(s)
- Georgia Orfanoudaki
  - Institute of Molecular Biology and Biotechnology-FORTH and Department of Biology-University of Crete, PO Box 1385, Heraklion, Crete, Greece
- Maria Markaki
  - Computer Science Department, University of Crete, Heraklion, Greece
- Katerina Chatzi
  - KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, B-3000, Leuven, Belgium
- Ioannis Tsamardinos
  - Computer Science Department, University of Crete, Heraklion, Greece
  - Gnosis Data Analysis PC, Heraklion, Greece
- Anastassios Economou
  - Institute of Molecular Biology and Biotechnology-FORTH and Department of Biology-University of Crete, PO Box 1385, Heraklion, Crete, Greece
  - KU Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, B-3000, Leuven, Belgium