1
|
Shirali A, Stebliankin V, Karki U, Shi J, Chapagain P, Narasimhan G. A comprehensive survey of scoring functions for protein docking models. BMC Bioinformatics 2025; 26:25. [PMID: 39844036 PMCID: PMC11755896 DOI: 10.1186/s12859-024-05991-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2024] [Accepted: 11/18/2024] [Indexed: 01/24/2025] Open
Abstract
BACKGROUND While protein-protein docking is fundamental to our understanding of how proteins interact, scoring protein-protein complex conformations is a critical component of successful docking programs. Without accurate and efficient scoring functions to differentiate between native and non-native binding complexes, the accuracy of current docking tools cannot be guaranteed. Although many innovative scoring functions have been proposed, a good scoring function for docking remains elusive. Deep learning models offer alternatives to using explicit empirical or mathematical functions for scoring protein-protein complexes. RESULTS In this study, we perform a comprehensive survey of the state-of-the-art scoring functions by considering the most popular and highly performant approaches, both classical and deep learning-based, for scoring protein-protein complexes. The methods were also compared based on their runtime as it directly impacts their use in large-scale docking applications. CONCLUSIONS We evaluate the strengths and weaknesses of classical and deep learning-based approaches across seven public and popular datasets to aid researchers in understanding the progress made in this field.
Collapse
Affiliation(s)
- Azam Shirali
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
| | - Vitalii Stebliankin
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
| | - Ukesh Karki
- Department of Physics, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
| | - Jimeng Shi
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
| | - Prem Chapagain
- Department of Physics, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, 33199, USA
| | - Giri Narasimhan
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA.
- Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, 33199, USA.
| |
Collapse
|
2
|
Oriol F, Alberto M, Joachim AP, Patrick G, M BP, Ruben MF, Jaume B, Altair CH, Ferran P, Oriol G, Narcis FF, Baldo O. Structure-based learning to predict and model protein-DNA interactions and transcription-factor co-operativity in cis-regulatory elements. NAR Genom Bioinform 2024; 6:lqae068. [PMID: 38867914 PMCID: PMC11167492 DOI: 10.1093/nargab/lqae068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2023] [Revised: 04/18/2024] [Accepted: 05/23/2024] [Indexed: 06/14/2024] Open
Abstract
Transcription factor (TF) binding is a key component of genomic regulation. There are numerous high-throughput experimental methods to characterize TF-DNA binding specificities. Their application, however, is both laborious and expensive, which makes profiling all TFs challenging. For instance, the binding preferences of ∼25% human TFs remain unknown; they neither have been determined experimentally nor inferred computationally. We introduce a structure-based learning approach to predict the binding preferences of TFs and the automated modelling of TF regulatory complexes. We show the advantage of using our approach over the classical nearest-neighbor prediction in the limits of remote homology. Starting from a TF sequence or structure, we predict binding preferences in the form of motifs that are then used to scan a DNA sequence for occurrences. The best matches are either profiled with a binding score or collected for their subsequent modeling into a higher-order regulatory complex with DNA. Co-operativity is modelled by: (i) the co-localization of TFs and (ii) the structural modeling of protein-protein interactions between TFs and with co-factors. We have applied our approach to automatically model the interferon-β enhanceosome and the pioneering complexes of OCT4, SOX2 (or SOX11) and KLF4 with a nucleosome, which are compared with the experimentally known structures.
Collapse
Affiliation(s)
- Fornes Oriol
- Centre for Molecular Medicine and Therapeutics. BC Children's Hospital Research Institute. Department of Medical Genetics. University of British Columbia, Vancouver, BC V5Z 4H4, Canada
| | - Meseguer Alberto
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
| | | | - Gohl Patrick
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
| | - Bota Patricia M
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
| | - Molina-Fernández Ruben
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
| | - Bonet Jaume
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
- Laboratory of Protein Design & Immunoengineering. School of Engineering. Ecole Polytechnique Federale de Lausanne. Lausanne 1015, Vaud, Switzerland
| | - Chinchilla-Hernandez Altair
- Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
| | - Pegenaute Ferran
- Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
| | - Gallego Oriol
- Live-Cell Structural Biology. Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
| | - Fernandez-Fuentes Narcis
- Institute of Biological, Environmental and Rural Science. Aberystwyth University, SY23 3DA Aberystwyth, UK
| | - Oliva Baldo
- Structural Bioinformatics Lab (GRIB-IMIM). Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona 08005 Catalonia, Spain
| |
Collapse
|
3
|
Sen N, Madhusudhan MS. A structural database of chain–chain and domain–domain interfaces of proteins. Protein Sci 2022. [DOI: 10.1002/pro.4406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Affiliation(s)
- Neeladri Sen
- Indian Institute of Science Education and Research Pune India
- Institute of Structural and Molecular Biology University College London London UK
| | | |
Collapse
|
4
|
Sen N, Anishchenko I, Bordin N, Sillitoe I, Velankar S, Baker D, Orengo C. Characterizing and explaining the impact of disease-associated mutations in proteins without known structures or structural homologs. Brief Bioinform 2022; 23:bbac187. [PMID: 35641150 PMCID: PMC9294430 DOI: 10.1093/bib/bbac187] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2021] [Revised: 04/23/2022] [Accepted: 04/27/2022] [Indexed: 12/12/2022] Open
Abstract
Mutations in human proteins lead to diseases. The structure of these proteins can help understand the mechanism of such diseases and develop therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Databank. We noticed that the model quality was higher and the Root mean square deviation (RMSD) lower between AlphaFold and RoseTTAFold models for domains that could be assigned to CATH families as compared to those which could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein-protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. When compared to polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, predicted as destabilizing and pathogenic. Usage of models from the two state-of-the-art techniques provide better confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models which could not be explained based solely on AlphaFold models.
Collapse
Affiliation(s)
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
| | - Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| | - Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| |
Collapse
|
5
|
Rajendran M, Ferran MC, Babbitt GA. Identifying vaccine escape sites via statistical comparisons of short-term molecular dynamics. BIOPHYSICAL REPORTS 2022; 2:100056. [PMID: 35403093 PMCID: PMC8978532 DOI: 10.1016/j.bpr.2022.100056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Accepted: 03/31/2022] [Indexed: 01/08/2023]
Abstract
The identification of viral mutations that confer escape from antibodies is crucial for understanding the interplay between immunity and viral evolution. We describe a molecular dynamics (MD)-based approach that goes beyond contact mapping, scales well to a desktop computer with a modern graphics processor, and enables the user to identify functional protein sites that are prone to vaccine escape in a viral antigen. We first implement our MD pipeline to employ site-wise calculation of Kullback-Leibler divergence in atom fluctuation over replicate sets of short-term MD production runs thus enabling a statistical comparison of the rapid motion of influenza hemagglutinin (HA) in both the presence and absence of three well-known neutralizing antibodies. Using this simple comparative method applied to motions of viral proteins, we successfully identified in silico all previously empirically confirmed sites of escape in influenza HA, predetermined via selection experiments and neutralization assays. Upon the validation of our computational approach, we then surveyed potential hotspot residues in the receptor binding domain of the SARS-CoV-2 virus in the presence of COVOX-222 and S2H97 antibodies. We identified many single sites in the antigen-antibody interface that are similarly prone to potential antibody escape and that match many of the known sites of mutations arising in the SARS-CoV-2 variants of concern. In the Omicron variant, we find only minimal adaptive evolutionary shifts in the functional binding profiles of both antibodies. In summary, we provide an inexpensive and accurate computational method to monitor hotspots of functional evolution in antibody binding footprints.
Collapse
|
6
|
Guo L, He J, Lin P, Huang SY, Wang J. TRScore: a three-dimensional RepVGG-based scoring method for ranking protein docking models. Bioinformatics 2022; 38:2444-2451. [PMID: 35199137 DOI: 10.1093/bioinformatics/btac120] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2021] [Revised: 01/19/2022] [Accepted: 02/21/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Protein-protein interactions (PPI) play important roles in cellular activities. Due to the technical difficulty and high cost of experimental methods, there are considerable interests towards the development of computational approaches, such as protein docking, to decipher PPI patterns. One of the important and difficult aspects in protein docking is recognizing near-native conformations from a set of decoys, but unfortunately traditional scoring functions still suffer from limited accuracy. Therefore, new scoring methods are pressingly needed in methodological and/or practical implications. RESULTS We present a new deep learning-based scoring method for ranking protein-protein docking models based on a three-dimensional (3D) RepVGG network, named TRScore. To recognize near-native conformations from a set of decoys, TRScore voxelizes the protein-protein interface into a 3D grid labeled by the number of atoms in different physicochemical classes. Benefiting from the deep convolutional RepVGG architecture, TRScore can effectively capture the subtle differences between energetically favorable near-native models and unfavorable non-native decoys without needing extra information. TRScore was extensively evaluated on diverse test sets including protein-protein docking benchmark 5.0 update set, DockGround decoy set, as well as realistic CAPRI decoy set, and overall obtained a significant improvement over existing methods in cross validation and independent evaluations. AVAILABILITY Codes available at: https://github.com/BioinformaticsCSU/TRScore.
Collapse
Affiliation(s)
- Linyuan Guo
- School of Computer Science, Central South University, Changsha, Hunan 410083, China
| | - Jiahua He
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
| | - Jianxin Wang
- School of Computer Science, Central South University, Changsha, Hunan 410083, China
| |
Collapse
|
7
|
Sharma VK, Lahiri M. Interplay between p300 and HDAC1 regulate acetylation and stability of Api5 to regulate cell proliferation. Sci Rep 2021; 11:16427. [PMID: 34385547 PMCID: PMC8361156 DOI: 10.1038/s41598-021-95941-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022] Open
Abstract
Api5, is a known anti-apoptotic and nuclear protein that is responsible for inhibiting cell death in serum-starved conditions. The only known post-translational modification of Api5 is acetylation at lysine 251 (K251). K251 acetylation of Api5 is responsible for maintaining its stability while the de-acetylated form of Api5 is unstable. This study aimed to find out the enzymes regulating acetylation and deacetylation of Api5 and the effect of acetylation on its function. Our studies suggest that acetylation of Api5 at lysine 251 is mediated by p300 histone acetyltransferase while de-acetylation is carried out by HDAC1. Inhibition of acetylation by p300 leads to a reduction in Api5 levels while inhibition of deacetylation by HDAC1 results in increased levels of Api5. This dynamic switch between acetylation and deacetylation regulates the localisation of Api5 in the cell. This study also demonstrates that the regulation of acetylation and deacetylation of Api5 is an essential factor for the progression of the cell cycle.
Collapse
Affiliation(s)
- Virender Kumar Sharma
- Department of Biology, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pune, Maharashtra, 411008, India
| | - Mayurika Lahiri
- Department of Biology, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pune, Maharashtra, 411008, India.
| |
Collapse
|