1
Geller AM, Shalom M, Zlotkin D, Blum N, Levy A. Identification of type VI secretion system effector-immunity pairs using structural bioinformatics. Mol Syst Biol 2024:10.1038/s44320-024-00035-8. [PMID: 38658795 DOI: 10.1038/s44320-024-00035-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 03/24/2024] [Accepted: 04/09/2024] [Indexed: 04/26/2024] Open
Abstract
The type VI secretion system (T6SS) is an important mediator of microbe-microbe and microbe-host interactions. Gram-negative bacteria use the T6SS to inject T6SS effectors (T6Es), which are usually proteins with toxic activity, into neighboring cells. Antibacterial effectors have cognate immunity proteins that neutralize self-intoxication. Here, we applied novel structural bioinformatic tools to perform systematic discovery and functional annotation of T6Es and their cognate immunity proteins from a dataset of 17,920 T6SS-encoding bacterial genomes. Using structural clustering, we identified 517 putative T6E families, outperforming sequence-based clustering. We developed a logistic regression model to reliably quantify protein-protein interaction of new T6E-immunity pairs, yielding candidate immunity proteins for 231 out of the 517 T6E families. We used sensitive structure-based annotation which yielded functional annotations for 51% of the T6E families, again outperforming sequence-based annotation. Next, we validated four novel T6E-immunity pairs using basic experiments in E. coli. In particular, we showed that the Pfam domain DUF3289 is a homolog of Colicin M and that DUF943 acts as its cognate immunity protein. Furthermore, we discovered a novel T6E that is a structural homolog of SleB, a lytic transglycosylase, and identified a specific glutamate that acts as its putative catalytic residue. Overall, this study applies novel structural bioinformatic tools to T6E-immunity pair discovery, and provides an extensive database of annotated T6E-immunity pairs.
Affiliation(s)
- Alexander M Geller
  - Department of Plant Pathology and Microbiology, The Institute of Environmental Science, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Maor Shalom
  - Department of Plant Pathology and Microbiology, The Institute of Environmental Science, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- David Zlotkin
  - Department of Plant Pathology and Microbiology, The Institute of Environmental Science, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Noam Blum
  - Department of Plant Pathology and Microbiology, The Institute of Environmental Science, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Asaf Levy
  - Department of Plant Pathology and Microbiology, The Institute of Environmental Science, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
2
Jänes J, Beltrao P. Deep learning for protein structure prediction and design-progress and applications. Mol Syst Biol 2024; 20:162-169. [PMID: 38291232 PMCID: PMC10912668 DOI: 10.1038/s44320-024-00016-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 12/21/2023] [Accepted: 01/11/2024] [Indexed: 02/01/2024] Open
Abstract
Proteins are the key molecular machines that orchestrate all biological processes of the cell. Most proteins fold into three-dimensional shapes that are critical for their function. Studying the 3D shape of proteins can inform us of the mechanisms that underlie biological processes in living cells and can have practical applications in the study of disease mutations or the discovery of novel drug treatments. Here, we review the progress made in sequence-based prediction of protein structures with a focus on applications that go beyond the prediction of single monomer structures. This includes the application of deep learning methods for the prediction of structures of protein complexes, different conformations, the evolution of protein structures and the application of these methods to protein design. These developments create new opportunities for research that will have impact across many areas of biomedical research.
Affiliation(s)
- Jürgen Jänes
  - Institute of Molecular Systems Biology, ETH Zürich, 8093, Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pedro Beltrao
  - Institute of Molecular Systems Biology, ETH Zürich, 8093, Zürich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
3
Varadi M, Bordin N, Orengo C, Velankar S. The opportunities and challenges posed by the new generation of deep learning-based protein structure predictors. Curr Opin Struct Biol 2023; 79:102543. [PMID: 36807079 DOI: 10.1016/j.sbi.2023.102543] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/11/2022] [Revised: 01/04/2023] [Accepted: 01/13/2023] [Indexed: 02/21/2023]
Abstract
The function of proteins can often be inferred from their three-dimensional structures. Experimental structural biologists spent decades studying these structures, but the accelerated pace of protein sequencing continuously increases the gaps between sequences and structures. The early 2020s saw the advent of a new generation of deep learning-based protein structure prediction tools that offer the potential to predict structures based on any number of protein sequences. In this review, we give an overview of the impact of this new generation of structure prediction tools, with examples of the impacted field in the life sciences. We discuss the novel opportunities and new scientific and technical challenges these tools present to the broader scientific community. Finally, we highlight some potential directions for the future of computational protein structure prediction.
Affiliation(s)
- Mihaly Varadi
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK. https://twitter.com/nicolabordin
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
4
Levine TP. Structural bioinformatics predicts that the Retinitis Pigmentosa-28 protein of unknown function FAM161A is a homologue of the microtubule nucleation factor Tpx2. F1000Res 2020; 9:1052. [PMID: 33093951 PMCID: PMC7551519 DOI: 10.12688/f1000research.25870.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/13/2020] [Indexed: 11/20/2022] Open
Abstract
Background: FAM161A is a microtubule-associated protein conserved widely across eukaryotes, which is mutated in the inherited blinding disease Retinitis Pigmentosa-28. FAM161A is also a centrosomal protein, being a core component of a complex that forms an internal skeleton of centrioles. Despite these observations about the importance of FAM161A, current techniques used to examine its sequence reveal no homologies to other proteins. Methods: Sequence profiles derived from multiple sequence alignments of FAM161A homologues were constructed by PSI-BLAST and HHblits, and then used by the profile-profile search tool HHsearch, implemented online as HHpred, to identify homologues. These in turn were used to create profiles for reverse searches and pair-wise searches. Multiple sequence alignments were also used to identify amino acid usage in functional elements. Results: FAM161A has a single homologue: the targeting protein for Xenopus kinesin-like protein-2 (Tpx2), which is a strong hit across more than 200 residues. Tpx2 is also a microtubule-associated protein, and it has been shown previously by a cryo-EM molecular structure to nucleate microtubules through two small elements: an extended loop and a short helix. The homology between FAM161A and Tpx2 includes these elements, as FAM161A has three copies of the loop, and one helix that has many, but not all, properties of the one in Tpx2. Conclusions: FAM161A and its homologues are predicted to be a previously unknown variant of Tpx2, and hence bind microtubules in the same way. This prediction allows precise, testable molecular models to be made of FAM161A-microtubule complexes.
Affiliation(s)
- Timothy P Levine
- UCL Institute of Ophthalmology, University College London, London, EC1V 9EL, UK
5
Polanco C, Samaniego Mendoza JL, Buhse T, Uversky VN, Bañuelos Chao IP, Bañuelos Cedano MA, Tavera FM, Tavera DM, Falconi M, Ponce de León AV. On the Regularities of the Polar Profiles of Proteins Related to Ebola Virus Infection and their Functional Domains. Cell Biochem Biophys 2018; 76:411-31. [PMID: 29511990 DOI: 10.1007/s12013-018-0839-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2017] [Accepted: 02/16/2018] [Indexed: 11/25/2022]
Abstract
The number of fatalities and economic losses caused by the Ebola virus infection across the planet culminated in the havoc that occurred between August and November 2014. However, little is known about the molecular protein profile of this devastating virus. This work represents a thorough bioinformatics analysis of the regularities of charge distribution (polar profiles) in two groups of proteins and their functional domains associated with Ebola virus disease: Ebola virus proteins and human proteins interacting with Ebola virus. Our analysis reveals that a fragment exists in each of these proteins (one named the “functional domain”) with a polar profile similar to the polar profile of the protein that contains it. Each protein is formed by a group of short sub-sequences, where each fragment has a different and distinctive polar profile and where the polar profile between adjacent short sub-sequences changes orderly and gradually to coincide with the polar profile of the whole protein. When using the charge distribution as a metric, it was observed that it effectively discriminates the proteins from their functional domains. As a counterexample, the same test was applied to a set of synthetic proteins built for that purpose, revealing that none of the regularities reported here for the Ebola virus proteins and human proteins interacting with Ebola virus were present in the synthetic proteins. Our results indicate that the polar profile of each protein studied and its corresponding functional domain are similar. Thus, when building each protein from its functional domain, adding one amino acid at a time and plotting its polar profile each time, it was observed that the resulting graphs can be divided into groups with similar polar profiles.
6
Larcombe L, Hendricusdottir R, Attwood TK, Bacall F, Beard N, Bellis LJ, Dunn WB, Hancock JM, Nenadic A, Orengo C, Overduin B, Sansone SA, Thurston M, Viant MR, Winder CL, Goble CA, Ponting CP, Rustici G. ELIXIR-UK role in bioinformatics training at the national level and across ELIXIR. F1000Res 2017; 6. [PMID: 28781748 PMCID: PMC5521157 DOI: 10.12688/f1000research.11837.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 06/19/2017] [Indexed: 11/20/2022] Open
Abstract
ELIXIR-UK is the UK node of ELIXIR, the European infrastructure for life science data. Since its foundation in 2014, ELIXIR-UK has played a leading role in training both within the UK and in the ELIXIR Training Platform, which coordinates and delivers training across all ELIXIR members. ELIXIR-UK contributes to the Training Platform’s coordination and supports the development of training to address key skill gaps amongst UK scientists. As part of this work it acts as a conduit for nationally-important bioinformatics training resources to promote their activities to the ELIXIR community. ELIXIR-UK also leads ELIXIR’s flagship Training Portal, TeSS, which collects information about a diverse range of training and makes it easily accessible to the community. ELIXIR-UK also works with others to provide key digital skills training, partnering with the Software Sustainability Institute to provide Software Carpentry training to the ELIXIR community and to establish the Data Carpentry initiative, and taking a lead role amongst national stakeholders to deliver the StaTS project – a coordinated effort to drive engagement with training in statistics.
Affiliation(s)
- L Larcombe
  - MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
- R Hendricusdottir
  - MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
- T K Attwood
  - School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
- F Bacall
  - School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
- N Beard
  - School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
- L J Bellis
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- W B Dunn
  - Birmingham Metabolomics Training Centre, School of Biosciences, University of Birmingham, Birmingham, B15 2TT, UK
- A Nenadic
  - The Software Sustainability Institute, University of Manchester, Manchester, M13 9PL, UK
- C Orengo
  - University College London, London, WC1E 6BT, UK
- B Overduin
  - Edinburgh Genomics, University of Edinburgh, Edinburgh, EH9 3FL, UK
- S-A Sansone
  - Oxford e-Research Centre, University of Oxford, Oxford, OX1 3QG, UK
- M Thurston
  - Oxford e-Research Centre, University of Oxford, Oxford, OX1 3QG, UK
- M R Viant
  - Birmingham Metabolomics Training Centre, School of Biosciences, University of Birmingham, Birmingham, B15 2TT, UK
- C L Winder
  - Birmingham Metabolomics Training Centre, School of Biosciences, University of Birmingham, Birmingham, B15 2TT, UK
- C A Goble
  - School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
- C P Ponting
  - MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
- G Rustici
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
7
Abstract
Calculating solvent accessible surface areas (SASA) is a run-of-the-mill calculation in structural biology. Although there are many programs available for this calculation, there are no free-standing, open-source tools designed for easy tool-chain integration. FreeSASA is an open source C library for SASA calculations that provides both command-line and Python interfaces in addition to its C API. The library implements both Lee and Richards’ and Shrake and Rupley’s approximations, and is highly configurable to allow the user to control molecular parameters, accuracy and output granularity. It only depends on standard C libraries and should therefore be easy to compile and install on any platform. The library is well-documented, stable and efficient. The command-line interface can easily replace closed source legacy programs, with comparable or better accuracy and speed, and with some added functionality.
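To make the Shrake and Rupley approximation mentioned in this abstract concrete, here is a minimal pure-Python sketch. It does not use FreeSASA or its API. The function names, the 1.4 Å probe radius, the 200 test points per atom, and the golden-spiral point layout are all assumptions chosen for illustration. Each atom is expanded by the probe radius, test points are placed on the expanded sphere, and the exposed area is the fraction of points that fall outside every neighbouring expanded sphere.

```python
import math


def sphere_points(n):
    """Return n roughly evenly spaced points on the unit sphere.

    Uses a golden-spiral lattice (an illustrative choice, not
    necessarily what any particular library uses).
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def shrake_rupley(atoms, probe=1.4, n_points=200):
    """Approximate per-atom SASA.

    atoms: list of (x, y, z, radius) tuples, in angstroms.
    Returns a list of areas in square angstroms, one per atom.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the probe-expanded sphere
        # Keep only atoms whose expanded spheres can overlap this one.
        neighbours = []
        for j, (xj, yj, zj, rj) in enumerate(atoms):
            if j == i:
                continue
            big_rj = rj + probe
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 < (big_r + big_rj) ** 2:
                neighbours.append((xj, yj, zj, big_rj * big_rj))
        # Count test points not buried inside any neighbour.
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < rj2
                for xj, yj, zj, rj2 in neighbours
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


if __name__ == "__main__":
    # Two hypothetical atoms of radius 1.6 angstroms, 2 angstroms apart.
    print(shrake_rupley([(0.0, 0.0, 0.0, 1.6), (2.0, 0.0, 0.0, 1.6)]))
```

An isolated atom returns the full area of its expanded sphere, while overlapping atoms each lose the part covered by their neighbour. Increasing `n_points` trades speed for accuracy, which corresponds to the configurable accuracy the abstract describes.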