1
Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
- Nolan Park
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
2
Larrea-Sebal A, Jebari-Benslaiman S, Galicia-Garcia U, Jose-Urteaga AS, Uribe KB, Benito-Vicente A, Martín C. Predictive Modeling and Structure Analysis of Genetic Variants in Familial Hypercholesterolemia: Implications for Diagnosis and Protein Interaction Studies. Curr Atheroscler Rep 2023; 25:839-859. [PMID: 37847331 PMCID: PMC10618353 DOI: 10.1007/s11883-023-01154-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/15/2023] [Indexed: 10/18/2023]
Abstract
PURPOSE OF REVIEW: Familial hypercholesterolemia (FH) is a hereditary condition characterized by elevated levels of low-density lipoprotein cholesterol (LDL-C), which increases the risk of cardiovascular disease if left untreated. This review aims to discuss the role of bioinformatics tools in evaluating the pathogenicity of missense variants associated with FH. Specifically, it highlights the use of predictive models based on protein sequence, structure, evolutionary conservation, and other relevant features in identifying genetic variants within LDLR, APOB, and PCSK9 genes that contribute to FH.
RECENT FINDINGS: In recent years, various bioinformatics tools have emerged as valuable resources for analyzing missense variants in FH-related genes. Tools such as REVEL, Varity, and CADD use diverse computational approaches to predict the impact of genetic variants on protein function. These tools consider factors such as sequence conservation, structural alterations, and receptor binding to aid in interpreting the pathogenicity of identified missense variants. While these predictive models offer valuable insights, the accuracy of predictions can vary, especially for proteins with unique characteristics that might not be well represented in the databases used for training. This review emphasizes the significance of utilizing bioinformatics tools for assessing the pathogenicity of FH-associated missense variants. Despite their contributions, a definitive diagnosis of a genetic variant necessitates functional validation through in vitro characterization or cascade screening. This step ensures the precise identification of FH-related variants, leading to more accurate diagnoses. Integrating genetic data with reliable bioinformatics predictions and functional validation can enhance our understanding of the genetic basis of FH, enabling improved diagnosis, risk stratification, and personalized treatment for affected individuals.
The comprehensive approach outlined in this review promises to advance the management of this inherited disorder, potentially leading to better health outcomes for those affected by FH.
Affiliation(s)
- Asier Larrea-Sebal
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
  - Fundación Biofisika Bizkaia, 48940, Leioa, Spain
- Shifa Jebari-Benslaiman
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Unai Galicia-Garcia
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- Ane San Jose-Urteaga
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Kepa B Uribe
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
- Asier Benito-Vicente
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
- César Martín
  - Department of Biochemistry and Molecular Biology, Universidad del País Vasco UPV/EHU, 48080, Bilbao, Spain
  - Department of Molecular Biophysics, Biofisika Institute, University of Basque Country and Consejo Superior de Investigaciones Científicas (UPV/EHU, CSIC), 48940, Leioa, Spain
3
Chen Z, Liu N, Huang Y, Min X, Zeng X, Ge S, Zhang J, Xia N. PointDE: Protein Docking Evaluation Using 3D Point Cloud Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3128-3138. [PMID: 37220029 DOI: 10.1109/tcbb.2023.3279019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Protein-protein interactions (PPIs) play essential roles in many vital movements and the determination of protein complex structure is helpful to discover the mechanism of PPI. Protein-protein docking is being developed to model the structure of the protein. However, there is still a challenge to selecting the near-native decoys generated by protein-protein docking. Here, we propose a docking evaluation method using 3D point cloud neural network named PointDE. PointDE transforms protein structure to the point cloud. Using the state-of-the-art point cloud network architecture and a novel grouping mechanism, PointDE can capture the geometries of the point cloud and learn the interaction information from the protein interface. On public datasets, PointDE surpasses the state-of-the-art method using deep learning. To further explore the ability of our method in different types of protein structures, we developed a new dataset generated by high-quality antibody-antigen complexes. The result in this antibody-antigen dataset shows the strong performance of PointDE, which will be helpful for the understanding of PPI mechanisms.
4
Barradas-Bautista D, Almajed A, Oliva R, Kalnis P, Cavallo L. Improving classification of correct and incorrect protein-protein docking models by augmenting the training set. Bioinformatics Advances 2023; 3:vbad012. [PMID: 36789292 PMCID: PMC9923443 DOI: 10.1093/bioadv/vbad012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/27/2022] [Revised: 01/20/2023] [Accepted: 02/01/2023] [Indexed: 02/04/2023]
Abstract
Motivation: Protein-protein interactions drive many relevant biological events, such as infection, replication and recognition. To control or engineer such events, we need to access the molecular details of the interaction provided by experimental 3D structures. However, such experiments take time and are expensive; moreover, the current technology cannot keep up with the high discovery rate of new interactions. Computational modeling, like protein-protein docking, can help to fill this gap by generating docking poses. Protein-protein docking generally consists of two parts, sampling and scoring. The sampling is an exhaustive search of the tridimensional space. The caveat of the sampling is that it generates a large number of incorrect poses, producing a highly unbalanced dataset. This limits the utility of the data to train machine learning classifiers.
Results: Using weak supervision, we developed a data augmentation method that we named hAIkal. Using hAIkal, we increased the labeled training data to train several algorithms. We trained and obtained different classifiers; the best classifier has 81% accuracy and 0.51 Matthews' correlation coefficient on the test set, surpassing the state-of-the-art scoring functions.
Availability and implementation: Docking models from Benchmark 5 are available at https://doi.org/10.5281/zenodo.4012018. Processed tabular data are available at https://repository.kaust.edu.sa/handle/10754/666961. Google colab is available at https://colab.research.google.com/drive/1vbVrJcQSf6_C3jOAmZzgQbTpuJ5zC1RP?usp=sharing.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
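The abstract above reports both accuracy and Matthews' correlation coefficient (MCC) and notes that docking poses form a highly unbalanced dataset. The following is a minimal sketch of why the two metrics are usually reported together in that setting. The function names and the confusion-matrix counts are hypothetical illustrations and are not taken from the paper.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all poses that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews' correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical unbalanced set: 1000 poses, only 50 of them correct.
# A classifier that labels every pose "incorrect" looks accurate
# but carries no information, which MCC exposes.
always_incorrect = dict(tp=0, tn=950, fp=0, fn=50)
print(accuracy(**always_incorrect))  # high accuracy
print(mcc(**always_incorrect))       # no correlation with the labels
```

On such data a high accuracy alone says little, so an MCC well above zero is the more informative part of the reported result.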
Affiliation(s)
- Ali Almajed
  - Computer, Electrical and Mathematical Science and Engineering Division, Kaust Extreme Computing Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Romina Oliva
  - Department of Sciences and Technologies, University of Naples “Parthenope”, I-80143 Naples, Italy
- Panos Kalnis
  - Computer, Electrical and Mathematical Science and Engineering Division, Kaust Extreme Computing Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Luigi Cavallo
  - Physical Sciences and Engineering Division, Kaust Catalysis Center, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
5
Jung Y, Geng C, Bonvin AMJJ, Xue LC, Honavar VG. MetaScore: A Novel Machine-Learning-Based Approach to Improve Traditional Scoring Functions for Scoring Protein-Protein Docking Conformations. Biomolecules 2023; 13:121. [PMID: 36671507 PMCID: PMC9855734 DOI: 10.3390/biom13010121] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/01/2022] [Revised: 12/22/2022] [Accepted: 12/26/2022] [Indexed: 01/11/2023] Open
Abstract
Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to the expensive and time-consuming experimental approaches for determining the 3D structures of protein complexes. Despite recent progress, identifying near-native models from a large set of conformations sampled by docking, the so-called scoring problem, still has considerable room for improvement. We present MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore utilizes a random forest (RF) classifier trained to distinguish near-native from non-native conformations using their protein-protein interfacial features. The features include physicochemical properties, energy terms, interaction-propensity-based features, geometric properties, interface topology features, evolutionary conservation, and also scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of the nine traditional SFs included in this work in terms of success rate and hit rate evaluated over conformations ranked among the top 10; (ii) an ensemble method, MetaScore-Ensemble, that combines 10 variants of MetaScore obtained by combining the RF score with each of the traditional SFs outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved upon by using machine learning to judiciously leverage protein-protein interfacial features and by using ensemble methods to combine multiple scoring functions.
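The averaging step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two scores are brought onto a common scale, so the min-max normalization, the `lower_is_better` flag, and all function names here are hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to 0.5 (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def combined_scores(rf_probs, sf_scores, lower_is_better=True):
    """Average an RF near-native probability with a normalized traditional score.

    rf_probs: classifier probabilities in [0, 1], one per conformation.
    sf_scores: raw scores from one traditional scoring function.
    lower_is_better: assumed True for energy-like scores, which are negated
    so that a higher normalized value always means a better conformation.
    """
    oriented = [-s for s in sf_scores] if lower_is_better else list(sf_scores)
    normalized = min_max(oriented)
    return [(p + s) / 2 for p, s in zip(rf_probs, normalized)]

def top_k(names, scores, k=10):
    """Return the names of the k highest-scoring conformations."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical example with three docked conformations.
names = ["conf_a", "conf_b", "conf_c"]
scores = combined_scores([0.9, 0.2, 0.6], [-120.0, -80.0, -100.0])
print(top_k(names, scores, k=2))
```

Repeating this with each traditional scoring function in turn would give one combined ranking per function, which is the kind of per-variant output an ensemble could then merge.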
Affiliation(s)
- Yong Jung
  - Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, PA 16802, USA
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Cunliang Geng
  - Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Alexandre M. J. J. Bonvin
  - Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Li C. Xue
  - Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
  - Center for Molecular and Biomolecular Informatics, Radboudumc, Greet Grooteplein 26-28, 6525 GA Nijmegen, The Netherlands
- Vasant G. Honavar
  - Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
  - Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, PA 16802, USA
  - Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
  - Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, PA 16802, USA
  - College of Information Sciences & Technology, Pennsylvania State University, University Park, PA 16802, USA
  - Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA 16802, USA
  - Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA 16823, USA
6
Bernau CR, Knödler M, Emonts J, Jäpel RC, Buyel JF. The use of predictive models to develop chromatography-based purification processes. Front Bioeng Biotechnol 2022; 10:1009102. [PMID: 36312533 PMCID: PMC9605695 DOI: 10.3389/fbioe.2022.1009102] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/01/2022] [Accepted: 09/23/2022] [Indexed: 11/13/2022] Open
Abstract
Chromatography is the workhorse of biopharmaceutical downstream processing because it can selectively enrich a target product while removing impurities from complex feed streams. This is achieved by exploiting differences in molecular properties, such as size, charge and hydrophobicity (alone or in different combinations). Accordingly, many parameters must be tested during process development in order to maximize product purity and recovery, including resin and ligand types, conductivity, pH, gradient profiles, and the sequence of separation operations. The number of possible experimental conditions quickly becomes unmanageable. Although the range of suitable conditions can be narrowed based on experience, the time and cost of the work remain high even when using high-throughput laboratory automation. In contrast, chromatography modeling using inexpensive, parallelized computer hardware can provide expert knowledge, predicting conditions that achieve high purity and efficient recovery. The prediction of suitable conditions in silico reduces the number of empirical tests required and provides in-depth process understanding, which is recommended by regulatory authorities. In this article, we discuss the benefits and specific challenges of chromatography modeling. We describe the experimental characterization of chromatography devices and settings prior to modeling, such as the determination of column porosity. We also consider the challenges that must be overcome when models are set up and calibrated, including the cross-validation and verification of data-driven and hybrid (combined data-driven and mechanistic) models. This review will therefore support researchers intending to establish a chromatography modeling workflow in their laboratory.
Affiliation(s)
- C. R. Bernau
  - Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
- M. Knödler
  - Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
  - Institute for Molecular Biotechnology, RWTH Aachen University, Aachen, Germany
- J. Emonts
  - Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
- R. C. Jäpel
  - Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
  - Institute for Molecular Biotechnology, RWTH Aachen University, Aachen, Germany
- J. F. Buyel
  - Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Aachen, Germany
  - Institute for Molecular Biotechnology, RWTH Aachen University, Aachen, Germany
  - University of Natural Resources and Life Sciences, Vienna (BOKU), Department of Biotechnology (DBT), Institute of Bioprocess Science and Engineering (IBSE), Vienna, Austria
  - *Correspondence: J. F. Buyel
7
Xu G, Wang Y, Wang Q, Ma J. Studying protein-protein interaction through side-chain modeling method OPUS-Mut. Brief Bioinform 2022; 23:6663639. [PMID: 35959990 DOI: 10.1093/bib/bbac330] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2022] [Revised: 07/17/2022] [Accepted: 07/20/2022] [Indexed: 12/12/2022] Open
Abstract
Protein side chains are vitally important to many biological processes such as protein-protein interaction. In this study, we evaluate the performance of our previously released side-chain modeling method OPUS-Mut, together with some other methods, on three oligomer datasets: CASP14 (11), CAMEO-Homo (65) and CAMEO-Hetero (21). The results show that OPUS-Mut outperforms the other methods whether measured over all residues or over the interfacial residues only. We also demonstrate our method on evaluating protein-protein docking poses on the Oligomer-Dock dataset (75), created using the top 10 predictions from ZDOCK 3.0.2. Our scoring function correctly identifies the native pose as the top-1 in 45 out of 75 targets. Unlike traditional scoring functions, our method is based on the overall side-chain packing favorableness in accordance with the local packing environment. It emphasizes the significance of side chains and provides a new and effective scoring term for studying protein-protein interaction.
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
  - Shanghai AI Laboratory, Shanghai 200030, China
- Yilin Wang
  - Georgetown Preparatory School, North Bethesda, MD 20852, USA
- Qinghua Wang
  - Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai, China
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
  - Shanghai AI Laboratory, Shanghai 200030, China
8
Yin R, Feng BY, Varshney A, Pierce BG. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci 2022; 31:e4379. [PMID: 35900023 PMCID: PMC9278006 DOI: 10.1002/pro.4379] [Citation(s) in RCA: 183] [Impact Index Per Article: 61.0] [Received: 01/06/2022] [Revised: 06/06/2022] [Accepted: 06/09/2022] [Indexed: 12/17/2022]
Abstract
High-resolution experimental structural determination of protein-protein interactions has led to valuable mechanistic insights, yet due to the massive number of interactions and experimental limitations there is a need for computational methods that can accurately model their structures. Here we explore the use of the recently developed deep learning method, AlphaFold, to predict structures of protein complexes from sequence. With a benchmark of 152 diverse heterodimeric protein complexes, multiple implementations and parameters of AlphaFold were tested for accuracy. Remarkably, many cases (43%) had near-native models (medium or high critical assessment of predicted interactions accuracy) generated as top-ranked predictions by AlphaFold, greatly surpassing the performance of unbound protein-protein docking (9% success rate for near-native top-ranked models), however AlphaFold modeling of antibody-antigen complexes within our set was unsuccessful. We identified sequence and structural features associated with lack of AlphaFold success, and we also investigated the impact of multiple sequence alignment input. Benchmarking of a multimer-optimized version of AlphaFold (AlphaFold-Multimer) with a set of recently released antibody-antigen structures confirmed a low rate of success for antibody-antigen complexes (11% success), and we found that T cell receptor-antigen complexes are likewise not accurately modeled by that algorithm, showing that adaptive immune recognition poses a challenge for the current AlphaFold algorithm and model. Overall, our study demonstrates that end-to-end deep learning can accurately model many transient protein complexes, and highlights areas of improvement for future developments to reliably model any protein-protein interaction of interest.
Affiliation(s)
- Rui Yin
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Brandon Y. Feng
  - Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Amitabh Varshney
  - Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Brian G. Pierce
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
  - Marlene and Stewart Greenebaum Comprehensive Cancer Center, University of Maryland School of Medicine, Baltimore, Maryland, USA
9
Barradas-Bautista D, Cao Z, Vangone A, Oliva R, Cavallo L. A random forest classifier for protein-protein docking models. Bioinformatics Advances 2021; 2:vbab042. [PMID: 36699405 PMCID: PMC9710594 DOI: 10.1093/bioadv/vbab042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/05/2021] [Revised: 11/11/2021] [Accepted: 12/06/2021] [Indexed: 01/28/2023]
Abstract
Herein, we present the results of a machine learning approach we developed to single out correct 3D docking models of protein-protein complexes obtained by popular docking software. To this aim, we generated 3 × 10^4 docking models for each of the 230 complexes in the protein-protein benchmark, version 5, using three different docking programs (HADDOCK, FTDock and ZDOCK), for a cumulative set of ≈ 7 × 10^6 docking models. Three different machine learning approaches (Random Forest, Support Vector Machine and Perceptron) were used to train classifiers with 158 different scoring functions (features). The Random Forest algorithm outperformed the other two algorithms and was selected for further optimization. Using a features selection algorithm, and optimizing the random forest hyperparameters, allowed us to train and validate a random forest classifier, named COnservation Driven Expert System (CoDES). Testing of CoDES on independent datasets, as well as results of its comparative performance with machine learning methods recently developed in the field for the scoring of docking decoys, confirm its state-of-the-art ability to discriminate correct from incorrect decoys both in terms of global parameters and in terms of decoys ranked at the top positions.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Software and data availability statement: The docking models are available at https://doi.org/10.5281/zenodo.4012018. The programs underlying this article will be shared on request to the corresponding authors.
Affiliation(s)
- Didier Barradas-Bautista
  - Kaust Catalysis Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Saudi Arabia. To whom correspondence should be addressed.
- Zhen Cao
  - Kaust Catalysis Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Saudi Arabia
- Anna Vangone
  - Pharma Research and Early Development, Therapeutic Modalities, Roche Innovation Center Munich Large Molecule Research, 82377 Penzberg, Germany
- Romina Oliva
  - Department of Sciences and Technologies, University Parthenope of Naples, Centro Direzionale Isola C4, I-80143 Naples, Italy. To whom correspondence should be addressed.
- Luigi Cavallo
  - Kaust Catalysis Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Saudi Arabia. To whom correspondence should be addressed.
10
Abstract
The biological significance of proteins attracted the scientific community in exploring their characteristics. The studies shed light on the interaction patterns and functions of proteins in a living body. Due to their practical difficulties, reliable experimental techniques pave the way for introducing computational methods in the interaction prediction. Automated methods reduced the difficulties but could not yet replace experimental studies as the field is still evolving. Interaction prediction problem being critical needs highly accurate results, but none of the existing methods could offer reliable performance that can parallel with experimental results yet. This article aims to assess the existing computational docking algorithms, their challenges, and future scope. Blind docking techniques are quite helpful when no information other than the individual structures are available. As more and more complex structures are being added to different databases, information-driven approaches can be a good alternative. Artificial intelligence, ruling over the major fields, is expected to take over this domain very shortly.
11
Jandova Z, Vargiu AV, Bonvin AMJJ. Native or Non-Native Protein-Protein Docking Models? Molecular Dynamics to the Rescue. J Chem Theory Comput 2021; 17:5944-5954. [PMID: 34342983 PMCID: PMC8444332 DOI: 10.1021/acs.jctc.1c00336] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/06/2021] [Indexed: 11/29/2022]
Abstract
Molecular docking excels at creating a plethora of potential models of protein-protein complexes. To correctly distinguish the favorable, native-like models from the remaining ones remains, however, a challenge. We assessed here if a protocol based on molecular dynamics (MD) simulations would allow distinguishing native from non-native models to complement scoring functions used in docking. To this end, the first models for 25 protein-protein complexes were generated using HADDOCK. Next, MD simulations complemented with machine learning were used to discriminate between native and non-native complexes based on a combination of metrics reporting on the stability of the initial models. Native models showed higher stability in almost all measured properties, including the key ones used for scoring in the Critical Assessment of PRedicted Interaction (CAPRI) competition, namely the positional root mean square deviations and fraction of native contacts from the initial docked model. A random forest classifier was trained, reaching a 0.85 accuracy in correctly distinguishing native from non-native complexes. Reasonably modest simulation lengths of the order of 50-100 ns are sufficient to reach this accuracy, which makes this approach applicable in practice.
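One of the stability metrics named in the abstract, the fraction of native contacts relative to the initial docked model, can be sketched as below. This is a simplified illustration rather than the protocol used in the paper: each residue is represented by a single point, the 5.0 Å cutoff is an assumed value, and the function names and coordinates are hypothetical.

```python
import math

def interface_contacts(chain_a, chain_b, cutoff=5.0):
    """Return the set of (residue_a, residue_b) pairs closer than the cutoff.

    chain_a, chain_b: dicts mapping a residue id to one (x, y, z) point.
    Using one point per residue is a simplification made for this sketch.
    """
    contacts = set()
    for res_a, xyz_a in chain_a.items():
        for res_b, xyz_b in chain_b.items():
            if math.dist(xyz_a, xyz_b) <= cutoff:
                contacts.add((res_a, res_b))
    return contacts

def fraction_native_contacts(reference, frame):
    """Share of the reference contacts that are still present in a frame."""
    if not reference:
        return 0.0
    return len(reference & frame) / len(reference)

# Hypothetical two-residue chains: the initial docked model and a later frame
# in which the second residue of chain B has drifted away from the interface.
chain_a = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0)}
chain_b_start = {1: (3.0, 0.0, 0.0), 2: (12.0, 0.0, 0.0)}
chain_b_later = {1: (3.0, 0.0, 0.0), 2: (20.0, 0.0, 0.0)}

reference = interface_contacts(chain_a, chain_b_start)
later = interface_contacts(chain_a, chain_b_later)
print(fraction_native_contacts(reference, later))
```

Tracking this value over the frames of a trajectory gives one of the per-model stability features that a classifier such as a random forest could then consume.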
Affiliation(s)
- Zuzana Jandova
  - Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
- Attilio Vittorio Vargiu
  - Physics Department, University of Cagliari, Cittadella Universitaria, S.P. 8 km 0.700, 09042 Monserrato, Italy
- Alexandre M. J. J. Bonvin
  - Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
12
Agamennone M, Nicoli A, Bayer S, Weber V, Borro L, Gupta S, Fantacuzzi M, Di Pizio A. Protein-protein interactions at a glance: Protocols for the visualization of biomolecular interactions. Methods Cell Biol 2021; 166:271-307. [PMID: 34752337 DOI: 10.1016/bs.mcb.2021.06.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
Protein-protein interactions (PPIs) play a key role in many biological processes and are intriguing targets for drug discovery campaigns. Advancements in experimental and computational techniques are leading to a growth of data accessibility, and, with it, an increased need for the analysis of PPIs. In this respect, visualization tools are essential instruments to represent and analyze biomolecular interactions. In this chapter, we reviewed some of the available tools, highlighting their features, and describing their functions with practical information on their usage.
Affiliation(s)
| | - Alessandro Nicoli
- Leibniz-Institute for Food Systems Biology at the Technical University of Munich, Freising, Germany
| | - Sebastian Bayer
- Leibniz-Institute for Food Systems Biology at the Technical University of Munich, Freising, Germany
| | - Verena Weber
- Leibniz-Institute for Food Systems Biology at the Technical University of Munich, Freising, Germany
| | - Luca Borro
- Department of Imaging, Advanced Cardiovascular Imaging Unit, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
| | - Shailendra Gupta
- Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
| | | | - Antonella Di Pizio
- Leibniz-Institute for Food Systems Biology at the Technical University of Munich, Freising, Germany.
| |
|
13
|
Sotudian S, Desta IT, Hashemi N, Zarbafian S, Kozakov D, Vakili P, Vajda S, Paschalidis IC. Improved cluster ranking in protein-protein docking using a regression approach. Comput Struct Biotechnol J 2021; 19:2269-2278. [PMID: 33995918 PMCID: PMC8102165 DOI: 10.1016/j.csbj.2021.04.028] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/31/2021] [Revised: 04/08/2021] [Accepted: 04/09/2021] [Indexed: 11/21/2022] Open
Abstract
We develop a Regression-based Ranking by Pairwise Cluster Comparisons (RRPCC) method to rank clusters of similar protein complex conformations generated by an underlying docking program. The method leverages robust regression to predict the relative quality difference between any pair of clusters and combines these pairwise assessments to form a ranked list of clusters, from higher to lower quality. We apply RRPCC to clusters produced by the automated docking server ClusPro and, depending on the training/validation strategy, we show improvement by 24-100% in ranking acceptable or better quality clusters first, and by 15-100% in ranking medium or better quality clusters first. We compare the RRPCC-ClusPro combination to a number of alternatives, and show that very different machine learning approaches to scoring docked structures yield similar success rates. Finally, we discuss the current limitations on sampling and scoring, looking ahead to further improvements. Interestingly, some features important for improved scoring are internal energy terms that occur only due to the local energy minimization applied in the refinement stage following rigid body docking.
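The idea of combining pairwise assessments into a single ranked list can be sketched as follows. The abstract does not state how RRPCC aggregates its pairwise predictions, so the sum-of-advantages rule used here is an assumption chosen for simplicity, and the function name and input format are hypothetical.

```python
from collections import defaultdict


def rank_from_pairwise(pairwise_diff):
    """Rank items from predicted pairwise quality differences.

    `pairwise_diff[(a, b)]` is the predicted quality of `a` minus that of `b`.
    Each item's score is the sum of its predicted advantages over the others
    (an illustrative aggregation rule, not necessarily the one used by RRPCC).
    """
    score = defaultdict(float)
    for (a, b), diff in pairwise_diff.items():
        score[a] += diff
        score[b] -= diff
    return sorted(score, key=lambda item: score[item], reverse=True)


# Hypothetical predicted differences between three clusters.
predicted = {("c1", "c2"): 0.4, ("c1", "c3"): -0.2, ("c2", "c3"): -0.5}
ranking = rank_from_pairwise(predicted)  # highest predicted quality first
```

In a real pipeline the differences would come from a fitted regression model applied to each pair of clusters; only the aggregation step is shown here.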
Affiliation(s)
| | | | - Nasser Hashemi
- Division of Systems Engineering, Boston University, Boston, USA
| | | | - Dima Kozakov
- Laufer Center for Physical and Quantitative Biology, Institute for Advanced Computational Sciences, Stony Brook University, Stony Brook, USA
| | - Pirooz Vakili
- Division of Systems Engineering, Boston University, Boston, USA
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University
- Department of Chemistry, Boston University
| | - Ioannis Ch. Paschalidis
- Division of Systems Engineering, Boston University, Boston, USA
- Department of Biomedical Engineering, Boston University
- Department of Electrical & Computer Engineering, and Faculty for Computing & Data Sciences, Boston University
| |
|
14
|
Guest JD, Vreven T, Zhou J, Moal I, Jeliazkov JR, Gray JJ, Weng Z, Pierce BG. An expanded benchmark for antibody-antigen docking and affinity prediction reveals insights into antibody recognition determinants. Structure 2021; 29:606-621.e5. [PMID: 33539768 DOI: 10.1016/j.str.2021.01.005] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 03/08/2020] [Revised: 11/15/2020] [Accepted: 01/11/2021] [Indexed: 01/04/2023]
Abstract
Accurate predictive modeling of antibody-antigen complex structures and structure-based antibody design remain major challenges in computational biology, with implications for biotherapeutics, immunity, and vaccines. Through a systematic search for high-resolution structures of antibody-antigen complexes and unbound antibody and antigen structures, in conjunction with identification of experimentally determined binding affinities, we have assembled a non-redundant set of test cases for antibody-antigen docking and affinity prediction. This benchmark more than doubles the number of antibody-antigen complexes and corresponding affinities available in our previous benchmarks, providing an unprecedented view of the determinants of antibody recognition and insights into molecular flexibility. Initial assessments of docking and affinity prediction tools highlight the challenges posed by this diverse set of cases, which includes camelid nanobodies, therapeutic monoclonal antibodies, and broadly neutralizing antibodies targeting viral glycoproteins. This dataset will enable development of advanced predictive modeling and design methods for this therapeutically relevant class of protein-protein interactions.
Affiliation(s)
- Johnathan D Guest
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA; Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
| | - Thom Vreven
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Jing Zhou
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Iain Moal
- Computational Sciences, GlaxoSmithKline Research and Development, Stevenage SG1 2NY, UK
| | - Jeliazko R Jeliazkov
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA.
| | - Brian G Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA; Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA.
| |
|
15
|
Desta IT, Porter KA, Xia B, Kozakov D, Vajda S. Performance and Its Limits in Rigid Body Protein-Protein Docking. Structure 2020; 28:1071-1081.e3. [PMID: 32649857 DOI: 10.1016/j.str.2020.06.006] [Citation(s) in RCA: 419] [Impact Index Per Article: 83.8] [Received: 01/14/2020] [Revised: 04/19/2020] [Accepted: 06/19/2020] [Indexed: 12/13/2022]
Abstract
The development of fast Fourier transform (FFT) algorithms enabled the sampling of billions of complex conformations and thus revolutionized protein-protein docking. FFT-based methods are now widely available and have been used in hundreds of thousands of docking calculations. Although the methods perform "soft" docking, which allows for some overlap of component proteins, the rigid body assumption clearly introduces limitations on accuracy and reliability. In addition, the method can work only with energy expressions represented by sums of correlation functions. In this paper we use a well-established protein-protein docking benchmark set to evaluate the results of these limitations by focusing on the performance of the docking server ClusPro, which implements one of the best rigid body methods. Furthermore, we explore the theoretical limits of accuracy when using established energy terms for scoring, provide comparison with flexible docking algorithms, and review the historical performance of servers in the CAPRI docking experiment.
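The remark that FFT-based docking works only with energies written as sums of correlation functions rests on the correlation theorem, which a one-dimensional toy can illustrate. The sketch below is not ClusPro's implementation: the grids are hypothetical 1D occupancy arrays, a naive O(n²) discrete Fourier transform stands in for a real FFT library, and actual docking evaluates several such terms on 3D grids over many rotations.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (a stand-in for a real FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * cmath.pi * k * x / n) for x, v in enumerate(values))
        for k in range(n)
    ]
    return [o / n for o in out] if inverse else out


def correlation_direct(receptor, ligand):
    """score[t] = sum_x receptor[x] * ligand[(x + t) mod n], computed explicitly."""
    n = len(receptor)
    return [sum(receptor[x] * ligand[(x + t) % n] for x in range(n)) for t in range(n)]


def correlation_fourier(receptor, ligand):
    """Same scores for all shifts at once: inverse DFT of conj(F[receptor]) * F[ligand]."""
    fr, fl = dft(receptor), dft(ligand)
    spectrum = [a.conjugate() * b for a, b in zip(fr, fl)]
    return [c.real for c in dft(spectrum, inverse=True)]


# Toy 1D "grids": the ligand pattern overlaps the receptor best at shift 2.
receptor = [0, 1, 1, 0, 0, 0, 0, 0]
ligand = [0, 0, 0, 1, 1, 0, 0, 0]
direct = correlation_direct(receptor, ligand)
fourier = correlation_fourier(receptor, ligand)
best_shift = direct.index(max(direct))
```

With a true FFT the Fourier route costs O(n log n) per grid instead of O(n²), which is what makes exhaustive translational sampling tractable.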
Affiliation(s)
- Israel T Desta
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
| | - Kathryn A Porter
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
| | - Bing Xia
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, USA
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA.
| |
|
16
|
Rosell M, Fernández-Recio J. Docking approaches for modeling multi-molecular assemblies. Curr Opin Struct Biol 2020; 64:59-65. [PMID: 32615514 PMCID: PMC7324114 DOI: 10.1016/j.sbi.2020.05.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/31/2020] [Revised: 05/13/2020] [Accepted: 05/21/2020] [Indexed: 12/12/2022]
Abstract
Computational docking approaches aim to overcome the limited availability of experimental structural data on protein-protein interactions, which are key in biology. The field is rapidly moving from the traditional docking methodologies for modeling of binary complexes to more integrative approaches using template-based, data-driven modeling of multi-molecular assemblies. We will review here the predictive capabilities of current docking methods in blind conditions, based on the results from the most recent community-wide blind experiments. Integration of template-based and ab initio docking approaches is emerging as the optimal strategy for modeling protein complexes and multimolecular assemblies. We will also review the new methodological advances on ab initio docking and integrative modeling.
Affiliation(s)
- Mireia Rosell
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de La Rioja - Gobierno de La Rioja, 26007 Logroño, Spain
| | - Juan Fernández-Recio
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de La Rioja - Gobierno de La Rioja, 26007 Logroño, Spain.
| |
|
17
|
Abstract
Many of the biological functions of the cell are driven by protein-protein interactions. However, determining which proteins interact, and exactly how they do so to enable their functions, remains a major research question. Functional interactions are dependent on a number of complicated factors; therefore, modeling the three-dimensional structure of protein-protein complexes is still considered a complex endeavor. Nevertheless, the rewards for modeling protein interactions to atomic level detail are substantial, and there are numerous examples of how models can provide useful information for drug design, protein engineering, systems biology, and understanding of the immune system. Here, we provide practical guidelines for docking proteins using the web server SwarmDock, a flexible protein-protein docking method. Moreover, we provide an overview of the factors that need to be considered when deciding whether docking is likely to be successful.
Affiliation(s)
- Iain H Moal
- European Bioinformatics Institute, Hinxton, UK
| | | | | | - Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK.
| |
|
18
|
Geng C, Jung Y, Renaud N, Honavar V, Bonvin AMJJ, Xue LC. iScore: a novel graph kernel-based function for scoring protein-protein docking models. Bioinformatics 2020; 36:112-121. [PMID: 31199455 PMCID: PMC6956772 DOI: 10.1093/bioinformatics/btz496] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 12/18/2018] [Revised: 05/08/2019] [Accepted: 06/11/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Protein complexes play critical roles in many aspects of biological functions. Three-dimensional (3D) structures of protein complexes are critical for gaining insights into structural bases of interactions and their roles in the biomolecular pathways that orchestrate key cellular processes. Because of the expense and effort associated with experimental determinations of 3D protein complex structures, computational docking has evolved as a valuable tool to predict 3D structures of biomolecular complexes. Despite recent progress, reliably distinguishing near-native docking conformations from a large number of candidate conformations, the so-called scoring problem, remains a major challenge. RESULTS: Here we present iScore, a novel approach to scoring docked conformations that combines HADDOCK energy terms with a score obtained using a graph representation of the protein-protein interfaces and a measure of evolutionary conservation. It achieves a scoring performance competitive with, or superior to, that of state-of-the-art scoring functions on two independent datasets: (i) docking software-specific models and (ii) the CAPRI score set generated by a wide variety of docking approaches (i.e. docking software-non-specific). iScore ranks among the top scoring approaches on the CAPRI score set (13 targets) when compared with the 37 scoring groups in CAPRI. The results demonstrate the utility of combining evolutionary, topological and energetic information for scoring docked conformations. This work represents the first successful application of graph kernels to protein interfaces for effective discrimination of near-native and non-native conformations of protein complexes. AVAILABILITY AND IMPLEMENTATION: The iScore code is freely available from GitHub: https://github.com/DeepRank/iScore (DOI: 10.5281/zenodo.2630567). The docking models used are available from SBGrid: https://data.sbgrid.org/dataset/684.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cunliang Geng
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht 3584 CH, The Netherlands
| | - Yong Jung
- Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
- Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, PA 16823, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
| | - Nicolas Renaud
- Netherlands eScience Center, Amsterdam 1098 XG, The Netherlands
| | - Vasant Honavar
- Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
- Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, PA 16823, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA 16823, USA
- Institute for Cyberscience, University Park, PA 16802, USA
- Clinical and Translational Sciences Institute, University Park, PA 16802, USA
- College of Information Sciences & Technology, Pennsylvania State University, University Park, PA 16802, USA
| | - Alexandre M J J Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht 3584 CH, The Netherlands
| | - Li C Xue
- Bijvoet Center for Biomolecular Research, Faculty of Science – Chemistry, Utrecht University, Utrecht 3584 CH, The Netherlands
| |
|
19
|
Renaud N, Jung Y, Honavar V, Geng C, Bonvin AM, Xue LC. iScore: An MPI supported software for ranking protein-protein docking models based on a random walk graph kernel and support vector machines. SOFTWAREX 2020; 11:100462. [PMID: 35419466 PMCID: PMC9005067 DOI: 10.1016/j.softx.2020.100462] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/14/2023]
Abstract
Computational docking is a promising tool to model three-dimensional (3D) structures of protein-protein complexes, which provides fundamental insights into protein functions in cellular life. Singling out near-native models from the huge pool of generated docking models (referred to as the scoring problem) remains a major challenge in computational docking. We recently published iScore, a novel graph kernel based scoring function. iScore ranks docking models based on their interface graph similarities to the training interface graph set. iScore uses a support vector machine approach with random-walk graph kernels to classify and rank protein-protein interfaces. Here, we present the software for iScore. The software provides executable scripts that fully automate the computational workflow. In addition, the creation and analysis of the interface graph can be distributed across different processes using the Message Passing Interface (MPI) and can be offloaded to GPUs thanks to dedicated CUDA kernels.
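To give a feel for what a random-walk graph kernel measures, the sketch below counts label-matching walks shared by two small node-labelled graphs via their direct product graph. It is a generic textbook-style variant, not iScore's kernel: the graph encoding, the `max_length` truncation, the `decay` weight, and the function names are assumptions for illustration, and the abstract does not specify the node or edge features iScore actually uses.

```python
def _adjacency(n, edges):
    """Undirected adjacency sets for a graph with nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def random_walk_kernel(g1, g2, max_length=3, decay=0.5):
    """Weighted count of common walks (up to `max_length` steps) in two graphs.

    Each graph is a hypothetical (labels, edges) pair. Nodes of the direct
    product graph are pairs of equally labelled nodes, and a walk in the
    product graph corresponds to a walk present in both input graphs.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    adj1 = _adjacency(len(labels1), edges1)
    adj2 = _adjacency(len(labels2), edges2)
    nodes = [
        (u, v)
        for u, lu in enumerate(labels1)
        for v, lv in enumerate(labels2)
        if lu == lv
    ]
    node_set = set(nodes)
    walks = {node: 1.0 for node in nodes}  # walks of length 0 ending at each node
    total = 0.0
    for k in range(max_length + 1):
        total += decay ** k * sum(walks.values())
        # Extend every walk by one step along product-graph edges.
        walks = {
            (u, v): sum(
                walks[(u2, v2)]
                for u2 in adj1[u]
                for v2 in adj2[v]
                if (u2, v2) in node_set
            )
            for (u, v) in nodes
        }
    return total


# Toy interface graphs: node labels could stand for residue types.
g_ab = (["A", "B"], [(0, 1)])
g_ac = (["A", "C"], [(0, 1)])
same = random_walk_kernel(g_ab, g_ab)       # identical graphs share many walks
different = random_walk_kernel(g_ab, g_ac)  # only the single "A" node matches
```

A kernel matrix of such pairwise similarities is the kind of input a support vector machine can consume directly, which is how a kernel of this family would slot into a classifier.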
Affiliation(s)
- Nicolas Renaud
- Netherlands eScience Center, Science Park 140, 1098 XG, Amsterdam, The Netherlands
| | - Yong Jung
- Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
| | - Vasant Honavar
- Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
- College of Information Science & Technology, Pennsylvania State University, University Park, PA 16802, USA
| | - Cunliang Geng
- Netherlands eScience Center, Science Park 140, 1098 XG, Amsterdam, The Netherlands
- Bijvoet Centre for Biomolecular Research Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Alexandre M.J.J. Bonvin
- Bijvoet Centre for Biomolecular Research Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Li C. Xue
- Bijvoet Centre for Biomolecular Research Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Center for Molecular and Biomolecular Informatics, Radboudumc, Nijmegen, The Netherlands
| |
|
20
|
Rosell M, Rodríguez‐Lumbreras LA, Romero‐Durana M, Jiménez‐García B, Díaz L, Fernández‐Recio J. Integrative modeling of protein‐protein interactions with pyDock for the new docking challenges. Proteins 2019; 88:999-1008. [DOI: 10.1002/prot.25858] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/02/2019] [Revised: 10/30/2019] [Accepted: 11/15/2019] [Indexed: 01/12/2023]
Affiliation(s)
- Mireia Rosell
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Instituto de Ciencias de la Vid y del Vino (CSIC, Universidad de La Rioja, Gobierno de La Rioja), Logroño, Spain
| | - Luis A. Rodríguez‐Lumbreras
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Instituto de Ciencias de la Vid y del Vino (CSIC, Universidad de La Rioja, Gobierno de La Rioja), Logroño, Spain
| | - Miguel Romero‐Durana
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Instituto de Ciencias de la Vid y del Vino (CSIC, Universidad de La Rioja, Gobierno de La Rioja), Logroño, Spain
- Structural Biology Unit, Instituto de Biología Molecular de Barcelona (IBMB‐CSIC), Barcelona, Spain
| | | | - Lucía Díaz
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Juan Fernández‐Recio
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Instituto de Ciencias de la Vid y del Vino (CSIC, Universidad de La Rioja, Gobierno de La Rioja), Logroño, Spain
- Structural Biology Unit, Instituto de Biología Molecular de Barcelona (IBMB‐CSIC), Barcelona, Spain
| |
|
21
|
Jankauskaite J, Jiménez-García B, Dapkunas J, Fernández-Recio J, Moal IH. SKEMPI 2.0: an updated benchmark of changes in protein-protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics 2019; 35:462-469. [PMID: 30020414 PMCID: PMC6361233 DOI: 10.1093/bioinformatics/bty635] [Citation(s) in RCA: 189] [Impact Index Per Article: 31.5] [Received: 05/16/2018] [Accepted: 07/17/2018] [Indexed: 11/18/2022] Open
Abstract
Motivation: Understanding the relationship between the sequence, structure, binding energy, binding kinetics and binding thermodynamics of protein–protein interactions is crucial to understanding cellular signaling, the assembly and regulation of molecular complexes, the mechanisms through which mutations lead to disease, and protein engineering. Results: We present SKEMPI 2.0, a major update to our database of binding free energy changes upon mutation for structurally resolved protein–protein interactions. This version now contains manually curated binding data for 7085 mutations, an increase of 133%, including changes in kinetics for 1844 mutations, enthalpy and entropy changes for 443 mutations, and 440 mutations that abolish detectable binding. Availability and implementation: The database is available as supplementary data and at https://life.bsc.es/pid/skempi2/. Supplementary information: Supplementary data are available at Bioinformatics online.
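A binding free energy change upon mutation can be derived from measured dissociation constants through the standard relation ΔΔG = RT ln(Kd,mut / Kd,wt). The sketch below applies that relation; the function name, the default temperature of 298.15 K, and the sign convention (positive values mean weaker binding of the mutant) are assumptions for illustration and are not taken from the SKEMPI documentation.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd(kd_wild_type, kd_mutant, temperature=298.15):
    """Binding free energy change upon mutation, in kcal/mol.

    Uses ddG = RT * ln(Kd_mutant / Kd_wild_type); both Kd values must share
    the same units. A positive result means the mutant binds more weakly.
    """
    return GAS_CONSTANT_KCAL * temperature * math.log(kd_mutant / kd_wild_type)


# Hypothetical example: a mutation that weakens binding tenfold (1 nM -> 10 nM).
ddg = ddg_from_kd(1e-9, 1e-8)  # about 1.36 kcal/mol at 298.15 K
```

Because the measurement temperature enters directly, it is worth passing the temperature reported for each experiment rather than relying on the default.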
Affiliation(s)
- Justina Jankauskaite
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Brian Jiménez-García
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; Bijvoet Center for Biomolecular Research, Faculty of Science, Utrecht University, Utrecht, the Netherlands
| | - Justas Dapkunas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Juan Fernández-Recio
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; Institut de Biologia Molecular de Barcelona (IBMB), CSIC, Barcelona, Spain
| | - Iain H Moal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
| |
|
22
|
Porter KA, Desta I, Kozakov D, Vajda S. What method to use for protein-protein docking? Curr Opin Struct Biol 2019; 55:1-7. [PMID: 30711743 PMCID: PMC6669123 DOI: 10.1016/j.sbi.2018.12.010] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 10/29/2018] [Accepted: 12/22/2018] [Indexed: 10/27/2022]
Abstract
A number of well-established servers perform 'free' docking of proteins of known structures. In contrast, template-based docking can start from sequences if structures are available for complexes that are homologous to the target. On the basis of the results of the CAPRI-CASP structure prediction experiments, template-based methods yield more accurate predictions if good templates can be found, but generally fail without such templates. However, free global docking, or focused docking around even poor quality template-based models, can still generate acceptable docked structures in these cases. In accordance with the analysis of a benchmark set, free docking of heterodimers yields acceptable or better predictions in the top 10 models for around 40% of structures. However, it is likely that a combination of template-based and free docking methods can perform better for targets that have template structures available. Another way of improving the reliability of predictions is adding experimental information as restraints, an option built into several docking servers.
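The "acceptable or better predictions in the top 10 models" statistic quoted in this abstract is a top-N success rate, which can be computed as in the sketch below. The input format (a dictionary mapping each target to its ranked list of CAPRI quality labels) and the function name are assumptions for illustration; the toy data are invented and do not reproduce any benchmark result.

```python
def top_n_success_rate(ranked_quality, n=10, acceptable=("acceptable", "medium", "high")):
    """Fraction of targets with at least one acceptable-or-better model in the top `n`.

    `ranked_quality` maps a target name to its models' quality labels,
    ordered from best-ranked to worst-ranked.
    """
    if not ranked_quality:
        return 0.0
    hits = sum(
        any(label in acceptable for label in models[:n])
        for models in ranked_quality.values()
    )
    return hits / len(ranked_quality)


# Invented toy results for four targets, three ranked models each.
results = {
    "T1": ["incorrect", "acceptable", "incorrect"],
    "T2": ["incorrect", "incorrect", "incorrect"],
    "T3": ["medium", "incorrect", "incorrect"],
    "T4": ["incorrect", "incorrect", "high"],
}
top1 = top_n_success_rate(results, n=1)
top2 = top_n_success_rate(results, n=2)
top3 = top_n_success_rate(results, n=3)
```

Reporting the rate at several values of N makes it easier to separate sampling failures (no good model anywhere) from ranking failures (a good model exists but is ranked low).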
Affiliation(s)
- Kathryn A Porter
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
| | - Israel Desta
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, NY, USA; Laufer Center for Physical and Quantitative Biology, Stony Brook University, NY, USA.
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA; Department of Chemistry, Boston University, Boston, MA 02215, USA.
| |
|
23
|
Geng C, Xue LC, Roel‐Touris J, Bonvin AMJJ. Finding the ΔΔG spot: Are predictors of binding affinity changes upon mutations in protein–protein interactions ready for it? WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2019. [DOI: 10.1002/wcms.1410] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 01/01/2023]
Affiliation(s)
- Cunliang Geng
- Bijvoet Center for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Li C. Xue
- Bijvoet Center for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Jorge Roel‐Touris
- Bijvoet Center for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Alexandre M. J. J. Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Utrecht, The Netherlands
| |
|
24
|
Pfeiffenberger E, Bates PA. Refinement of protein-protein complexes in contact map space with metadynamics simulations. Proteins 2019; 87:12-22. [PMID: 30370948 PMCID: PMC6492248 DOI: 10.1002/prot.25612] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/06/2018] [Revised: 09/21/2018] [Accepted: 09/26/2018] [Indexed: 12/18/2022]
Abstract
Accurate protein-protein complex prediction, to atomic detail, is a challenging problem. For flexible docking cases, current state-of-the-art docking methods are limited in their ability to exhaustively search the high dimensionality of the problem space. In this study, to obtain more accurate models, an investigation into the local optimization of initial docked solutions is presented with respect to a reference crystal structure. We show how physics-based refinement of protein-protein complexes in contact map space (CMS), within a metadynamics protocol, can be performed. The method uses five replicate 10 ns simulations for sampling and ranks the generated conformational snapshots with ZRANK to identify an ensemble of n snapshots for final model building. Furthermore, we investigated whether the reconstructed free energy surface (FES), or a combination of both FES and ZRANK, referred to as CSα, can help to reduce snapshot ranking error.
Affiliation(s)
- Erik Pfeiffenberger
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Paul A. Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom
| |
|
25
|
Zarbafian S, Moghadasi M, Roshandelpoor A, Nan F, Li K, Vakli P, Vajda S, Kozakov D, Paschalidis IC. Protein docking refinement by convex underestimation in the low-dimensional subspace of encounter complexes. Sci Rep 2018; 8:5896. [PMID: 29650980 PMCID: PMC5955889 DOI: 10.1038/s41598-018-23982-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/18/2017] [Accepted: 03/21/2018] [Indexed: 01/18/2023] Open
Abstract
We propose a novel stochastic global optimization algorithm with applications to the refinement stage of protein docking prediction methods. Our approach can process conformations sampled from multiple clusters, each roughly corresponding to a different binding energy funnel. These clusters are obtained using a density-based clustering method. In each cluster, we identify a smooth “permissive” subspace which avoids high-energy barriers and then underestimate the binding energy function using general convex polynomials in this subspace. We use the underestimator to bias sampling towards its global minimum. Sampling and subspace underestimation are repeated several times and the conformations sampled at the last iteration form a refined ensemble. We report computational results on a comprehensive benchmark of 224 protein complexes, establishing that our refined ensemble significantly improves the quality of the conformations of the original set given to the algorithm. We also devise a method to enhance the ensemble from which near-native models are selected.
Affiliation(s)
- Shahrooz Zarbafian
- Department of Mechanical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Mohammad Moghadasi
- Division of Systems Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Athar Roshandelpoor
- Division of Systems Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Feng Nan
- Division of Systems Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Keyong Li
- Division of Systems Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Pirooz Vakli
- Division of Systems Engineering, Boston University, Boston, Massachusetts, United States of America; Department of Mechanical Engineering, Boston University, Boston, Massachusetts, United States of America
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America.
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics and Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, United States of America.
| | - Ioannis Ch Paschalidis
- Division of Systems Engineering, Boston University, Boston, Massachusetts, United States of America; Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America; Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts, United States of America; 8 Saint Mary's St., Boston, MA, 02215, United States of America.
| |
|
26
|
Abstract
The atomic structures of protein complexes can provide useful information for drug design, protein engineering, systems biology, and understanding pathology. Obtaining this information experimentally can be challenging. However, if the structures of the subunits are known, then it is often possible to model the complex computationally. This chapter provides practical guidelines for docking proteins using the SwarmDock flexible protein-protein docking method, giving an overview of the factors that need to be considered when deciding whether docking is likely to be successful, the preparation of structural input, generation of docked poses, analysis and ranking of docked poses, and the validation of models using external data.
Affiliation(s)
- Iain H Moal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.
| | | | - Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
| |
|