1
McGuffin LJ, Alhaddad SN, Behzadi B, Edmunds NS, Genc AG, Adiyaman R. Prediction and quality assessment of protein quaternary structure models using the MultiFOLD2 and ModFOLDdock2 servers. Nucleic Acids Res 2025:gkaf336. [PMID: 40276971 DOI: 10.1093/nar/gkaf336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2025] [Revised: 04/02/2025] [Accepted: 04/15/2025] [Indexed: 04/26/2025] Open
Abstract
Understanding the structures of protein complexes is pivotal for breakthroughs in health, agriculture, bioengineering, and beyond. MultiFOLD2 and ModFOLDdock2 are leading servers for protein quaternary structure prediction and model quality assessment, respectively. MultiFOLD2 includes integrated stoichiometry prediction for quaternary structures and improved sampling and scoring, leading to high performance in continuous independent benchmarks such as CAMEO. ModFOLDdock2 uses a hybrid consensus approach to generate global and local quality scores for predicted quaternary structures. ModFOLDdock2 is integrated with MultiFOLD2 while also being available as a stand-alone server, enabling the independent evaluation of quaternary structure models from any source. Both servers have been independently and rigorously evaluated, demonstrating high performance and ranking among the top servers in their respective categories in the recent CASP16 experiment. The MultiFOLD2 and ModFOLDdock2 servers are freely accessible through user-friendly web interfaces at https://www.reading.ac.uk/bioinf/.
Affiliation(s)
- Liam J McGuffin
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6DH, United Kingdom
- Shaima N Alhaddad
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6DH, United Kingdom
- Behnosh Behzadi
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6DH, United Kingdom
- Nicholas S Edmunds
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6DH, United Kingdom
- Ahmet G Genc
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6DH, United Kingdom
- Recep Adiyaman
  - School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6DH, United Kingdom
2
Liu J, Neupane P, Cheng J. Improving AlphaFold2 and 3-based protein complex structure prediction with MULTICOM4 in CASP16. bioRxiv 2025:2025.03.06.641913. [PMID: 40161604 PMCID: PMC11952293 DOI: 10.1101/2025.03.06.641913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
With AlphaFold achieving high-accuracy tertiary structure prediction for most single-chain proteins (monomers), the next major challenge in protein structure prediction is accurately modeling multi-chain protein complexes (multimers). We developed MULTICOM4, the latest version of the MULTICOM system, to improve protein complex structure prediction by integrating transformer-based AlphaFold2, diffusion model-based AlphaFold3, and our in-house techniques. These include protein complex stoichiometry prediction, diverse multiple sequence alignment (MSA) generation leveraging both sequence and structure comparison, modeling exception handling, and deep learning-based model quality assessment. MULTICOM4 was blindly evaluated in the 16th community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP16) in 2024. In Phase 0 of CASP16, where stoichiometry information was unavailable, MULTICOM predictors performed best, with MULTICOM_human achieving a TM-score of 0.752 and a DockQ score of 0.584 for top-ranked predictions on average. In Phase 1 of CASP16, with stoichiometry information provided, MULTICOM_human remained among the top predictors, attaining a TM-score of 0.797 and a DockQ score of 0.558 on average. The CASP16 results demonstrate that integrating complementary AlphaFold2 and 3 with enhanced MSA inputs, comprehensive model ranking, exception handling, and accurate stoichiometry prediction can effectively improve protein complex structure prediction.
3
Liu J, Neupane P, Cheng J. Estimating Protein Complex Model Accuracy Using Graph Transformers and Pairwise Similarity Graphs. bioRxiv 2025:2025.02.04.636562. [PMID: 39975041 PMCID: PMC11838578 DOI: 10.1101/2025.02.04.636562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Motivation: Estimation of protein complex structure accuracy is an essential step in protein complex structure prediction and is also important for users to select good structural models for various applications, such as protein function analysis and drug design. Despite the success of structure prediction methods such as AlphaFold2 and AlphaFold3, predicting the quality of predicted complex structures (structural models) and selecting top ones from large model pools remains challenging. Results: We present GATE, a novel method that uses graph transformers on pairwise model similarity graphs to predict the quality (accuracy) of complex structural models. By integrating single-model and multi-model quality features, GATE captures both the characteristics of individual models and the geometric similarity between them to make robust predictions. On the dataset of the 15th Critical Assessment of Protein Structure Prediction (CASP15), GATE achieved the highest Pearson's correlation (0.748) and the lowest ranking loss (0.1191) compared to existing methods. In the blind CASP16 experiment, GATE was ranked 4th according to the overall sum of z-scores of multiple metrics based on both TM-score and Oligo-GDTTS scores. In terms of per-target average metrics based on TM-score, GATE achieved a Pearson's correlation of 0.7076 (1st place among all methods), a Spearman's correlation of 0.4514 (3rd place), a ranking loss of 0.1221 (3rd place), and an Area Under the Curve (AUC) score of 0.6680 (3rd place), highlighting its strong, balanced ability of estimating complex model accuracy and selecting good models. Availability: The source code of GATE is freely available at https://github.com/BioinfoMachineLearning/GATE.
Affiliation(s)
- Jian Liu
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, 65211, MO, USA
- Pawan Neupane
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, 65211, MO, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, 65211, MO, USA
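The abstract of entry 3 reports per-target Pearson's correlation and ranking loss. As a minimal sketch of how such metrics are commonly computed for one target, the functions below assume that ranking loss means the true score of the best model in the pool minus the true score of the model ranked first by the predictor. This is an assumed, commonly used definition and not code taken from GATE; the function names and the toy scores are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation between predicted and true quality scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranking_loss(predicted, true):
    """True score of the best model minus true score of the top-predicted model.

    Assumed definition: 0.0 means the predictor selected the best model.
    """
    if len(predicted) != len(true) or not predicted:
        raise ValueError("need two equally long, non-empty lists")
    top_idx = max(range(len(predicted)), key=predicted.__getitem__)
    return max(true) - true[top_idx]


# Hypothetical scores for three models of a single target (e.g. TM-score).
predicted_scores = [0.80, 0.60, 0.70]
true_scores = [0.75, 0.55, 0.78]
print(pearson(predicted_scores, true_scores))
print(ranking_loss(predicted_scores, true_scores))
```

Per-target averages such as those quoted in the abstract would then be the mean of these values over all targets.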
4
Wang H, Sun M, Xie L, Liu D, Zhang G. Physical-aware model accuracy estimation for protein complex using deep learning method. Comput Struct Biotechnol J 2025; 27:478-487. [PMID: 39916698 PMCID: PMC11799971 DOI: 10.1016/j.csbj.2025.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 01/18/2025] [Accepted: 01/21/2025] [Indexed: 02/09/2025] Open
Abstract
With the breakthrough of AlphaFold2 on monomers, the research focus of structure prediction has shifted to protein complexes, driving the continued development of new methods for multimer structure prediction. Therefore, it is crucial to accurately estimate quality scores for the multimer model independent of the used prediction methods. In this work, we propose a physical-aware deep learning method, DeepUMQA-PA, to evaluate the residue-wise quality of protein complex models. Given the input protein complex model, the residue-based contact area and orientation features were first constructed using Voronoi tessellation, representing the potential physical interactions and hydrophobic properties. Then, the relationship between local residues and the overall complex topology as well as the inter-residue evolutionary information are characterized by geometry-based features, protein language model embedding representation, and knowledge-based statistical potential features. Finally, these features are fed into a fused network architecture employing equivalent graph neural network and ResNet network to estimate residue-wise model accuracy. Experimental results on the CASP15 test set demonstrate that our method outperforms the state-of-the-art method DeepUMQA3 by 3.69 % and 3.49 % on Pearson and Spearman, respectively. Notably, our method achieved 16.8 % and 15.5 % improvement in Pearson and Spearman, respectively, for the evaluation of nanobody-antigens. In addition, DeepUMQA-PA achieved better MAE scores than AlphaFold-Multimer and AlphaFold3 self-assessment methods on 43 % and 50 % of the targets, respectively. All these results suggest that physical-aware information based on the area and orientation of atom-atom and atom-solvent contacts has the potential to capture sequence-structure-quality relationships of proteins, especially in the case of flexible proteins. The DeepUMQA-PA server is freely available at http://zhanglab-bioinf.com/DeepUMQA-PA/.
Affiliation(s)
- Haodong Wang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Meng Sun
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Lei Xie
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
5
Olechnovič K, Banciul R, Dapkūnas J, Venclovas Č. FTDMP: A Framework for Protein-Protein, Protein-DNA, and Protein-RNA Docking and Scoring. Proteins 2025. [PMID: 39748638 DOI: 10.1002/prot.26792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Revised: 11/27/2024] [Accepted: 12/18/2024] [Indexed: 01/04/2025]
Abstract
FTDMP is a software framework for biomolecular docking and scoring. It can perform docking of subunits containing one or more protein, DNA, or RNA chains, followed by subsequent scoring of the resulting models. FTDMP can also be used for the ranking of user-provided models of biomolecular complexes, generated by any structure prediction method. FTDMP evaluates models according to the consensus-based method VoroIF-jury, which combines individual scores derived from the Voronoi tessellation of biomolecular structures. In addition to the default scoring mode, FTDMP can easily adopt additional scores; thus, it may be used as a tool to assess newly developed scoring functions. FTDMP was evaluated during blind testing in recent CAPRI experiments and using protein-protein, protein-DNA, and protein-RNA docking benchmarks. It proved to be a useful tool for different research tasks, related to modeling biomolecular interactions. The software, cleaned docking benchmarks, and benchmarking results are available at https://bioinformatics.lt/software/ftdmp/.
Affiliation(s)
- Kliment Olechnovič
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
  - Université Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France
- Rita Banciul
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Justas Dapkūnas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
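The abstract of entry 5 describes ranking models by a consensus that combines several individual scores. The sketch below illustrates the general idea with a plain mean-rank aggregation. It is not the VoroIF-jury algorithm, whose details are not given in the abstract; the function name, the score names, and the assumption that higher scores are better are all hypothetical.

```python
def mean_rank_consensus(score_table):
    """Order models by their average rank across several scoring functions.

    score_table maps a score name to {model_id: score}, higher = better
    (an assumption made for this illustration).
    """
    if not score_table:
        raise ValueError("score_table must contain at least one score")
    models = sorted(next(iter(score_table.values())))
    rank_sums = {model: 0 for model in models}
    for scores in score_table.values():
        if set(scores) != set(models):
            raise ValueError("every score must cover the same models")
        # Rank 1 is the best model under this score; ties keep id order.
        ordered = sorted(models, key=lambda m: scores[m], reverse=True)
        for rank, model in enumerate(ordered, start=1):
            rank_sums[model] += rank
    n_scores = len(score_table)
    # Lowest mean rank first; the model id breaks ties deterministically.
    return sorted(models, key=lambda m: (rank_sums[m] / n_scores, m))


# Hypothetical scores for three models under two scoring functions.
example = {
    "score_a": {"m1": 0.9, "m2": 0.5, "m3": 0.7},
    "score_b": {"m1": 0.6, "m2": 0.4, "m3": 0.8},
}
print(mean_rank_consensus(example))
```

Averaging ranks instead of raw values avoids having to put differently scaled scores on a common scale, which is one reason rank-based aggregation is a common baseline for this kind of consensus.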
6
Shuvo MH, Bhattacharya D. EquiRank: Improved protein-protein interface quality estimation using protein language-model-informed equivariant graph neural networks. Comput Struct Biotechnol J 2024; 27:160-170. [PMID: 39850657 PMCID: PMC11755013 DOI: 10.1016/j.csbj.2024.12.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 12/18/2024] [Accepted: 12/20/2024] [Indexed: 01/25/2025] Open
Abstract
Quality estimation of the predicted interaction interface of protein complex structural models is not only important for complex model evaluation and selection but also useful for protein-protein docking. Despite recent progress fueled by symmetry-aware deep learning architectures and pretrained protein language models (pLMs), existing methods for estimating protein complex quality have yet to fully exploit the collective potential of these advances for accurate estimation of protein-protein interface quality. Here we present EquiRank, an improved protein-protein interface quality estimation method that leverages the strength of a symmetry-aware E(3) equivariant deep graph neural network (EGNN) and integrates pLM embeddings from the pretrained ESM-2 model. Our method estimates the quality of the protein-protein interface through an effective graph-based representation of interacting residue pairs, incorporating a diverse set of features, including ESM-2 embeddings, and then learning the representation using symmetry-aware EGNNs. Our experimental results demonstrate improved ranking performance on diverse datasets and across different performance evaluation metrics over the latest existing protein complex quality estimation methods, including the top-performing CASP15 protein complex quality estimation method VoroIF_GNN and the self-assessment module of AlphaFold-Multimer repurposed for protein complex scoring. Additionally, our ablation studies demonstrate the contributions of both pLMs and the equivariant nature of EGNN for improved protein-protein interface quality estimation performance. EquiRank is freely available at https://github.com/mhshuvo1/EquiRank.
Affiliation(s)
- Md Hossain Shuvo
  - Department of Computer Science, Prairie View A&M University, Prairie View, 77446, TX, USA
7
Liang F, Sun M, Xie L, Zhao X, Liu D, Zhao K, Zhang G. Recent advances and challenges in protein complex model accuracy estimation. Comput Struct Biotechnol J 2024; 23:1824-1832. [PMID: 38707538 PMCID: PMC11066466 DOI: 10.1016/j.csbj.2024.04.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024] Open
Abstract
Estimation of model accuracy plays a crucial role in protein structure prediction, aiming to evaluate the quality of predicted protein structure models accurately and objectively. This process is not only key to screening candidate models that are close to the real structure, but also provides guidance for further optimization of protein structures. With the significant advancements made by AlphaFold2 in monomer structure, the problem of single-domain protein structure prediction has been widely solved. Correspondingly, the importance of assessing the quality of single-domain protein models decreased, and the research focus has shifted to estimation of model accuracy of protein complexes. In this review, our goal is to provide a comprehensive overview of the reference and statistical metrics, as well as representative methods, and the current challenges within four distinct facets (Topology Global Score, Interface Total Score, Interface Residue-Wise Score, and Tertiary Residue-Wise Score) in the field of complex EMA.
Affiliation(s)
- Lei Xie
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xuanfeng Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
8
Siciliano AJ, Zhao C, Liu T, Wang Z. EGG: Accuracy Estimation of Individual Multimeric Protein Models Using Deep Energy-Based Models and Graph Neural Networks. Int J Mol Sci 2024; 25:6250. [PMID: 38892437 PMCID: PMC11173161 DOI: 10.3390/ijms25116250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Revised: 05/25/2024] [Accepted: 05/29/2024] [Indexed: 06/21/2024] Open
Abstract
Reliable and accurate methods of estimating the accuracy of predicted protein models are vital to understanding their respective utility. Discerning how the quaternary structure conforms can significantly improve our collective understanding of cell biology, systems biology, disease formation, and disease treatment. Accurately determining the quality of multimeric protein models is still computationally challenging, as the space of possible conformations is significantly larger when proteins form in complex with one another. Here, we present EGG (energy and graph-based architectures) to assess the accuracy of predicted multimeric protein models. We implemented message-passing and transformer layers to infer the overall fold and interface accuracy scores of predicted multimeric protein models. When evaluated with CASP15 targets, our methods achieved promising results against single model predictors: fourth and third place for determining the highest-quality model when estimating overall fold accuracy and overall interface accuracy, respectively, and first place for determining the top three highest quality models when estimating both overall fold accuracy and overall interface accuracy.
Affiliation(s)
- Andrew Jordan Siciliano
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL 33124, USA
- Chenguang Zhao
  - Computer Information Sciences Department, St. Ambrose University, 518 W. Locust Street, Davenport, IA 52803, USA
- Tong Liu
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL 33124, USA
- Zheng Wang
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL 33124, USA
9
Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
- Nolan Park
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
10
Parvathy J, Yazhini A, Srinivasan N, Sowdhamini R. Interfacial residues in protein-protein complexes are in the eyes of the beholder. Proteins 2024; 92:509-528. [PMID: 37982321 DOI: 10.1002/prot.26628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Revised: 10/14/2023] [Accepted: 10/17/2023] [Indexed: 11/21/2023]
Abstract
Interactions between proteins are vital in almost all biological processes. The characterization of protein-protein interactions helps us understand the mechanistic basis of biological processes, thereby enabling the manipulation of proteins for biotechnological and clinical purposes. The interface residues of a protein-protein complex are assumed to have the following two properties: (a) they always interact with a residue of a partner protein, which forms the basis for distance-based interface residue identification methods, and (b) they are solvent-exposed in the isolated form of the protein and become buried in the complex form, which forms the basis for Accessible Surface Area (ASA)-based methods. The study interrogates this popular assumption by recognizing interface residues in protein-protein complexes through these two methods. The results show that a few residues are identified uniquely by each method, and the extent of conservation, propensities, and their contribution to the stability of protein-protein interaction varies substantially between these residues. The case study analyses showed that interface residues, unique to distance, participate in crucial interactions that hold the proteins together, whereas the interface residues unique to the ASA method have a potential role in the recognition, dynamics, and specificity of the complex and can also be a hotspot. Overall, the study recommends applying both distance and ASA methods so that some interface residues missed by either method but crucial to the stability, recognition, dynamics, and function of protein-protein complexes are identified in a complementary manner.
Affiliation(s)
- Jayadevan Parvathy
  - Interdisciplinary Mathematical Sciences Initiative (IMI), Indian Institute of Science, Bangalore, India
  - Molecular Biophysics Unit (MBU), Indian Institute of Science, Bangalore, India
- Ramanathan Sowdhamini
  - Molecular Biophysics Unit (MBU), Indian Institute of Science, Bangalore, India
  - National Center for Biological Sciences (NCBS), Bangalore, India
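The abstract of entry 10 contrasts distance-based and ASA-based identification of interface residues. The sketch below illustrates only the distance-based criterion and the set comparison between the two methods. The 5.0 Å cutoff, the data layout, and the hand-written ASA result are assumptions made for illustration and are not taken from the study.

```python
from math import dist


def interface_by_distance(chain_a, chain_b, cutoff=5.0):
    """Residues with any atom within `cutoff` of an atom in the partner chain.

    Each chain maps a residue id, e.g. ("A", 42), to a list of (x, y, z)
    atom coordinates. The default cutoff is a hypothetical choice.
    """
    interface = set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                interface.add(res_a)
                interface.add(res_b)
    return interface


# Toy coordinates: only residues A1 and B1 lie within 5 of each other.
chain_a = {("A", 1): [(0.0, 0.0, 0.0)], ("A", 2): [(10.0, 0.0, 0.0)]}
chain_b = {("B", 1): [(3.0, 0.0, 0.0)], ("B", 2): [(30.0, 0.0, 0.0)]}
by_distance = interface_by_distance(chain_a, chain_b)

# Hypothetical output of an ASA-based method (not computed here).
by_asa = {("A", 1), ("A", 2)}

print("found by both:", sorted(by_distance & by_asa))
print("unique to distance:", sorted(by_distance - by_asa))
print("unique to ASA:", sorted(by_asa - by_distance))
```

The set differences correspond to the residues "identified uniquely by each method" that the abstract discusses, and the union would be the complementary interface definition the study recommends.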
11
Morehead A, Liu J, Cheng J. Protein structure accuracy estimation using geometry-complete perceptron networks. Protein Sci 2024; 33:e4932. [PMID: 38380738 PMCID: PMC10880424 DOI: 10.1002/pro.4932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Revised: 01/05/2024] [Accepted: 02/01/2024] [Indexed: 02/22/2024]
Abstract
Estimating the accuracy of protein structural models is a critical task in protein bioinformatics. The need for robust methods in the estimation of protein model accuracy (EMA) is prevalent in the field of protein structure prediction, where computationally-predicted structures need to be screened rapidly for the reliability of the positions predicted for each of their amino acid residues and their overall quality. Current methods proposed for EMA are either coupled tightly to existing protein structure prediction methods or evaluate protein structures without sufficiently leveraging the rich, geometric information available in such structures to guide accuracy estimation. In this work, we propose a geometric message passing neural network referred to as the geometry-complete perceptron network for protein structure EMA (GCPNet-EMA), where we demonstrate through rigorous computational benchmarks that GCPNet-EMA's accuracy estimations are 47% faster and more than 10% (6%) more correlated with ground-truth measures of per-residue (per-target) structural accuracy compared to baseline state-of-the-art methods for tertiary (multimer) structure EMA including AlphaFold 2. The source code and data for GCPNet-EMA are available on GitHub, and a public web server implementation is freely available.
Affiliation(s)
- Alex Morehead
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
12
Olechnovič K, Valančauskas L, Dapkūnas J, Venclovas Č. Prediction of protein assemblies by structure sampling followed by interface-focused scoring. Proteins 2023; 91:1724-1733. [PMID: 37578163 DOI: 10.1002/prot.26569] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/08/2023] [Revised: 07/12/2023] [Accepted: 07/31/2023] [Indexed: 08/15/2023]
Abstract
Proteins often function as part of permanent or transient multimeric complexes, and understanding function of these assemblies requires knowledge of their three-dimensional structures. While the ability of AlphaFold to predict structures of individual proteins with unprecedented accuracy has revolutionized structural biology, modeling structures of protein assemblies remains challenging. To address this challenge, we developed a protocol for predicting structures of protein complexes involving model sampling followed by scoring focused on the subunit-subunit interaction interface. In this protocol, we diversified AlphaFold models by varying construction and pairing of multiple sequence alignments as well as increasing the number of recycles. In cases when AlphaFold failed to assemble a full protein complex or produced unreliable results, additional diverse models were constructed by docking of monomers or subcomplexes. All the models were then scored using a newly developed method, VoroIF-jury, which relies only on structural information. Notably, VoroIF-jury is independent of AlphaFold self-assessment scores and therefore can be used to rank models originating from different structure prediction methods. We tested our protocol in CASP15 and obtained top results, significantly outperforming the standard AlphaFold-Multimer pipeline. Analysis of our results showed that the accuracy of our assembly models was capped mainly by structure sampling rather than model scoring. This observation suggests that better sampling, especially for the antibody-antigen complexes, may lead to further improvement. Our protocol is expected to be useful for modeling and/or scoring protein assemblies.
Affiliation(s)
- Kliment Olechnovič
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Lukas Valančauskas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Justas Dapkūnas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
13
Liu J, Liu D, Zhang GJ. DeepUMQA3: a web server for accurate assessment of interface residue accuracy in protein complexes. Bioinformatics 2023; 39:btad591. [PMID: 37740296 PMCID: PMC10560100 DOI: 10.1093/bioinformatics/btad591] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/24/2023] [Revised: 08/21/2023] [Accepted: 09/21/2023] [Indexed: 09/24/2023] Open
Abstract
MOTIVATION: Model quality assessment is a crucial part of protein structure prediction and a gateway to proper usage of models in biomedical applications. Many methods have been proposed for assessing the quality of structural models of protein monomers, but few methods exist for evaluating protein complex models. As protein complex structure prediction becomes a new challenge, there is an urgent need for model quality assessment methods that can accurately assess the accuracy of interface residues of complex structures. RESULTS: Here, we present DeepUMQA3, a web server for evaluating the accuracy of interface residues of protein complex structures using deep neural networks. For an input complex structure, features are extracted from three levels of overall complex, intra-monomer, and inter-monomer, and an improved deep residual neural network is used to predict per-residue lDDT and interface residue accuracy. DeepUMQA3 ranks first in the blind test of interface residue accuracy estimation in CASP15, with Pearson, Spearman, and AUC of 0.564, 0.535, and 0.755 under the lDDT measurement, which are 17.6%, 23.6%, and 10.9% higher than the second best method, respectively. DeepUMQA3 can also assess the accuracy of all residues in the entire complex and distinguish high- and low-precision residues. AVAILABILITY AND IMPLEMENTATION: The web server of DeepUMQA3 is freely available at http://zhanglab-bioinf.com/DeepUMQA_server/.
Affiliation(s)
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China