1
Guo L, Wang J. GSScore: a novel Graphormer-based shell-like scoring method for protein-ligand docking. Brief Bioinform 2024; 25:bbae201. [PMID: 38706316 PMCID: PMC11070652 DOI: 10.1093/bib/bbae201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 02/05/2024] [Accepted: 04/16/2024] [Indexed: 05/07/2024]
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery. Because experimental methods are complex and costly, there is great demand for computational approaches, such as protein-ligand docking, that recognize PLI patterns. In recent years, a growing number of machine learning models have been developed to directly predict the root mean square deviation (RMSD) of a ligand docking pose with reference to its native binding pose. However, new scoring methods are still urgently needed for more accurate RMSD prediction. We present GSScore, a new deep learning-based scoring method for RMSD prediction of protein-ligand docking poses, built on Graphormer and a shell-like graph architecture. To recognize near-native conformations from a set of poses, GSScore treats atoms as nodes and represents the protein-ligand docking interface as multiple bipartite graphs within different shell ranges. Benefiting from the Graphormer and shell-like graph architecture, GSScore can effectively capture the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses without extra information. GSScore was extensively evaluated on diverse test sets, including a subset of PDBbind version 2019, CASF2016 and DUD-E, and obtained significant improvements over existing methods in terms of RMSE, R (Pearson correlation coefficient), Spearman correlation coefficient and docking power.
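The abstract describes splitting the docking interface into several bipartite graphs by shell range. A minimal sketch of that idea, assuming plain Euclidean atom-atom distances and made-up shell cut-offs (`SHELLS` and `shell_bipartite_edges` are hypothetical names, not GSScore's code), only builds the per-shell edge lists; node features and the Graphormer model itself are out of scope.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Shell = Tuple[float, float]


def shell_bipartite_edges(
    ligand: Sequence[Coord],
    protein: Sequence[Coord],
    shells: Sequence[Shell],
) -> Dict[Shell, List[Tuple[int, int]]]:
    """Build one ligand-protein bipartite edge list per distance shell.

    An edge (i, j) joins ligand atom i and protein atom j and is placed in the
    shell [lo, hi) that contains their Euclidean distance (in angstroms).
    Pairs farther than the outermost shell get no edge.
    """
    graphs: Dict[Shell, List[Tuple[int, int]]] = {shell: [] for shell in shells}
    for i, lig_atom in enumerate(ligand):
        for j, prot_atom in enumerate(protein):
            d = math.dist(lig_atom, prot_atom)
            for lo, hi in shells:
                if lo <= d < hi:
                    graphs[(lo, hi)].append((i, j))
                    break  # shells are assumed non-overlapping: stop at first match
    return graphs


# Hypothetical shell ranges in angstroms; not GSScore's published settings.
SHELLS = [(0.0, 4.0), (4.0, 6.0), (6.0, 8.0)]

if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0)]
    protein = [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0), (7.5, 0.0, 0.0), (9.0, 0.0, 0.0)]
    for shell, edges in shell_bipartite_edges(ligand, protein, SHELLS).items():
        print(shell, edges)
```

Each shell's edge list can then be treated as a separate graph, which is one simple way to let a model see contacts at different ranges without mixing them.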
Affiliation(s)
- Linyuan Guo
- School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
2
Guo L, Qiu T, Wang J. ViTScore: A Novel Three-Dimensional Vision Transformer Method for Accurate Prediction of Protein-Ligand Docking Poses. IEEE Trans Nanobioscience 2023; 22:734-743. [PMID: 37159314 DOI: 10.1109/tnb.2023.3274640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery, and because experimental methods are complex and costly, there is great demand for computational approaches, such as protein-ligand docking, to decipher PLI patterns. One of the most challenging aspects of protein-ligand docking is identifying near-native conformations from a set of poses, but traditional scoring functions still have limited accuracy. Therefore, new scoring methods are urgently needed for both methodological and practical purposes. We present ViTScore, a novel deep learning-based scoring function for ranking protein-ligand docking poses based on the Vision Transformer (ViT). To recognize near-native poses from a set of poses, ViTScore voxelizes the protein-ligand interaction pocket into a 3D grid labeled by the occupancy contribution of atoms in different physicochemical classes. This allows ViTScore to capture the subtle differences between spatially and energetically favorable near-native poses and unfavorable non-native poses without needing extra information. ViTScore then outputs the predicted root mean square deviation (rmsd) of a docking pose with reference to the native binding pose. ViTScore was extensively evaluated on diverse test sets, including PDBbind2019 and CASF2016, and obtained significant improvements over existing methods in terms of RMSE, R and docking power. Moreover, the results demonstrate that ViTScore is a promising scoring function for protein-ligand docking that can accurately identify near-native poses from a set of poses. Additionally, ViTScore could be used to identify potential drug targets and to design new drugs with improved efficacy and safety.
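The abstract states only that the pocket is voxelized into a 3D grid labeled by the occupancy contribution of atoms in different physicochemical classes. A minimal sketch, assuming a cubic box centred on the pocket, hard nearest-voxel occupancy (each atom adds 1.0 to one voxel) and caller-chosen class labels; ViTScore's actual grid size, resolution, class definitions and any smoothing are not given here, and `voxelize` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (physicochemical class, x, y, z)


def voxelize(
    atoms: Sequence[Atom],
    center: Tuple[float, float, float],
    classes: Sequence[str],
    size: int = 24,
    resolution: float = 1.0,
) -> List[List[List[List[float]]]]:
    """Return a grid indexed as grid[channel][x][y][z].

    Each atom adds 1.0 to the voxel containing it, in the channel of its class.
    Atoms of unknown classes or outside the cubic box are ignored.
    """
    half = size * resolution / 2.0
    channel = {name: k for k, name in enumerate(classes)}
    grid = [
        [[[0.0] * size for _ in range(size)] for _ in range(size)]
        for _ in classes
    ]
    for label, x, y, z in atoms:
        if label not in channel:
            continue
        # Floor division maps each coordinate to its voxel index (works for negatives too).
        idx = [int((v - c + half) // resolution) for v, c in zip((x, y, z), center)]
        if all(0 <= t < size for t in idx):
            i, j, k = idx
            grid[channel[label]][i][j][k] += 1.0
    return grid
```

The resulting channels-by-voxels array is the kind of input a 3D patch-based transformer could consume; a Gaussian occupancy instead of a hard count is a common alternative.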
3
Zhang X, Shen C, Wang T, Deng Y, Kang Y, Li D, Hou T, Pan P. ML-PLIC: a web platform for characterizing protein-ligand interactions and developing machine learning-based scoring functions. Brief Bioinform 2023; 24:bbad295. [PMID: 37738401 DOI: 10.1093/bib/bbad295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 07/17/2023] [Accepted: 07/31/2023] [Indexed: 09/24/2023]
Abstract
Cracking the intricate code of protein-ligand interaction (PLI) is of great importance to structure-based drug design and discovery. Different physical and biochemical representations, such as energy terms and interaction fingerprints, can be used to describe PLI and analyzed by machine learning (ML) algorithms to create ML-based scoring functions (MLSFs). Here, we propose the ML-based PLI capturer (ML-PLIC), a web platform that automatically characterizes PLI and generates MLSFs to identify potential binders of a specific protein target through virtual screening (VS). ML-PLIC comprises five modules: Docking for ligand docking, Descriptors for PLI generation, Modeling for MLSF training, Screening for VS and Pipeline for integrating the aforementioned functions. We validated the MLSFs constructed by ML-PLIC on three benchmark datasets (Directory of Useful Decoys-Enhanced, Active as Decoys and TocoDecoy), demonstrating accuracy that outperforms traditional docking tools and performance competitive with the deep learning-based SF, and provided a case study of the serine/threonine-protein kinase WEE1 in which MLSFs were developed using the ML-based VS pipeline in ML-PLIC. The latest version of ML-PLIC is a powerful platform that incorporates physical and biological knowledge about PLI and integrates PLI characterization and MLSF generation into the design of structure-based VS pipelines. The ML-PLIC web platform is freely available at http://cadd.zju.edu.cn/plic/.
Affiliation(s)
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Tianyue Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yafeng Deng
- Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Dan Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
4
Liu C, Kutchukian P, Nguyen ND, AlQuraishi M, Sorger PK. A Hybrid Structure-Based Machine Learning Approach for Predicting Kinase Inhibition by Small Molecules. J Chem Inf Model 2023; 63:5457-5472. [PMID: 37595065 PMCID: PMC10498990 DOI: 10.1021/acs.jcim.3c00347] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/06/2023] [Indexed: 08/20/2023]
Abstract
Kinases have been the focus of drug discovery programs for three decades leading to over 70 therapeutic kinase inhibitors and biophysical affinity measurements for over 130,000 kinase-compound pairs. Nonetheless, the precise target spectrum for many kinases remains only partly understood. In this study, we describe a computational approach to unlocking qualitative and quantitative kinome-wide binding measurements for structure-based machine learning. Our study has three components: (i) a Kinase Inhibitor Complex (KinCo) data set comprising in silico predicted kinase structures paired with experimental binding constants, (ii) a machine learning loss function that integrates qualitative and quantitative data for model training, and (iii) a structure-based machine learning model trained on KinCo. We show that our approach outperforms methods trained on crystal structures alone in predicting binary and quantitative kinase-compound interaction affinities; relative to structure-free methods, our approach also captures known kinase biochemistry and more successfully generalizes to distant kinase sequences and compound scaffolds.
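The abstract does not spell out the KinCo loss function. One common way to let a regression model learn from both exact affinities and qualitative bounds (for example "pKd < 5", i.e. weaker than a threshold) is a censored squared error. The sketch below is a hypothetical illustration of that general idea, not the authors' loss, and assumes labels on a pKd-like scale where larger values mean tighter binding; `mixed_affinity_loss` is an illustrative name.

```python
from typing import Sequence


def mixed_affinity_loss(
    preds: Sequence[float],
    targets: Sequence[float],
    relations: Sequence[str],
) -> float:
    """Mean squared error that also accepts inequality (qualitative) labels.

    relations[k] is "=" for an exact measurement, ">" when only a lower bound
    is known (true value > target) and "<" when only an upper bound is known.
    Bounded labels are penalised only when the prediction violates the bound.
    """
    if not (len(preds) == len(targets) == len(relations)) or not preds:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, t, rel in zip(preds, targets, relations):
        if rel == "=":
            err = p - t
        elif rel == ">":
            err = min(p - t, 0.0)  # only a prediction below the bound is wrong
        elif rel == "<":
            err = max(p - t, 0.0)  # only a prediction above the bound is wrong
        else:
            raise ValueError(f"unknown relation {rel!r}")
        total += err * err
    return total / len(preds)
```

For instance, a prediction of 5.0 for a compound labelled "> 6.0" costs 1.0, while a prediction of 7.0 for the same label costs nothing.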
Affiliation(s)
- Changchang Liu
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, Massachusetts 02115, United States
- Peter Kutchukian
- Novartis Institutes for Biomedical Research, Cambridge, Massachusetts 02139, United States
- Nhan D. Nguyen
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Mohammed AlQuraishi
- Department of Systems Biology, Columbia University, New York, New York 10032, United States
- Peter K. Sorger
- Laboratory of Systems Pharmacology, Department of Systems Biology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, Massachusetts 02115, United States
5
Zhang X, Shen C, Jiang D, Zhang J, Ye Q, Xu L, Hou T, Pan P, Kang Y. TB-IECS: an accurate machine learning-based scoring function for virtual screening. J Cheminform 2023; 15:63. [PMID: 37403155 DOI: 10.1186/s13321-023-00731-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/07/2023] [Accepted: 06/18/2023] [Indexed: 07/06/2023]
Abstract
Machine learning-based scoring functions (MLSFs) have shown potential for improving virtual screening capabilities over classical scoring functions (SFs). Because feature generation is computationally expensive, the number of descriptors used in MLSFs and the characterization of protein-ligand interactions are always limited, which may affect overall accuracy and efficiency. Here, we propose a new SF called TB-IECS (theory-based interaction energy component score), which combines energy terms from Smina and NNScore version 2 and uses the eXtreme Gradient Boosting (XGBoost) algorithm for model training. In this study, the energy terms decomposed from 15 traditional SFs were first categorized based on their formulas and physicochemical principles, and 324 feature combinations were generated accordingly. The five best feature combinations were selected for further evaluation of model performance with respect to feature vectors of various lengths, interaction types and ML algorithms. The virtual screening power of TB-IECS was assessed on the DUD-E and LIT-PCBA datasets, as well as seven target-specific datasets from the ChemDiv database. The results showed that TB-IECS outperformed classical SFs, including Glide SP and Dock, and effectively balanced efficiency and accuracy for practical virtual screening.
Affiliation(s)
- Xujun Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Jintu Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Qing Ye
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Peichen Pan
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
6
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Front Bioinform 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023]
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
7
Fundamental considerations in drug design. Computer Aided Drug Design (CADD): From Ligand-Based Methods to Structure-Based Approaches 2022:17-55. [PMCID: PMC9212230 DOI: 10.1016/b978-0-323-90608-1.00005-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/25/2023]
Abstract
The drug discovery paradigm has been very time-consuming, challenging and expensive, while diseases caused by bacteria, viruses, protozoa, fungi and other microorganisms are steadily increasing. For instance, COVID-19, the latest viral infection, has affected millions of people and severely damaged the world's economy. Therefore, the search for novel and potent drug compounds against deadly pathogens is crucial at the moment. Despite the many drawbacks of drug discovery and development and its associated technologies, advances in the field must be taken into account so that time and cost can be minimized. In this chapter, basic principles of drug design and discovery are discussed together with advances in drug development.
8
Masoudi-Sobhanzadeh Y, Jafari B, Parvizpour S, Pourseif MM, Omidi Y. A novel multi-objective metaheuristic algorithm for protein-peptide docking and benchmarking on the LEADS-PEP dataset. Comput Biol Med 2021; 138:104896. [PMID: 34601392 DOI: 10.1016/j.compbiomed.2021.104896] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/26/2021] [Revised: 09/22/2021] [Accepted: 09/22/2021] [Indexed: 01/03/2023]
Abstract
Protein-peptide interactions have attracted the attention of many drug discovery scientists because of their potential druggability in most key biological activities, such as regulating disease-related signaling pathways and enhancing the immune system's responses. Various studies have used protein-peptide-specific docking algorithms/methods to predict protein-peptide interactions. However, the existing algorithms/methods suffer from two serious limitations that make them unsuitable for protein-peptide docking problems. First, the prevalent approaches appear to need modification and remodeling to weight the non-bonded forces between a protein and a peptide. Second, they do not employ state-of-the-art search algorithms for detecting the 3D pose of a peptide relative to a protein. To address these limitations, the present study introduces a novel multi-objective algorithm that first generates candidate 3D poses of a peptide and then improves them through its operators. The candidate solutions are further evaluated using Multi-Objective Pareto Front (MOPF) optimization concepts. To this end, van der Waals, electrostatic, solvation and hydrogen bond energies between the atoms of a protein and the designated peptide are computed. To evaluate the algorithm, it is first applied to the LEADS-PEP dataset containing 53 protein-peptide complexes with up to 53 rotatable branches/bonds and then compared with three popular/efficient algorithms. The results indicate that the MOPF-based approaches that reduce the backbone RMSD between the original and predicted states achieve significantly better success rates in predicting near-native structures.
In addition, a comparison between different types of search algorithms reveals that efficient ones, such as the multi-objective Trader and differential evolution algorithms, can predict protein-peptide interactions better than popular algorithms such as the multi-objective genetic and particle swarm optimization algorithms.
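The candidate-evaluation step relies on Pareto-front concepts: with several energy terms to minimise, a pose is kept if no other pose is at least as good in every term and strictly better in at least one. A minimal O(n²) sketch of that non-dominated filter, assuming every objective is an energy to be minimised; the search operators of the published algorithm are not reproduced, and the function names are illustrative.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the non-dominated points, in input order."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical (van der Waals, electrostatic) energies for five candidate poses.
    poses = [(1.0, 5.0), (2.0, 2.0), (3.0, 1.0), (4.0, 4.0), (2.0, 3.0)]
    print(pareto_front(poses))
```

Here poses 3 and 4 are dropped because pose 1 is at least as good in both energies and strictly better in one.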
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Behzad Jafari
- Department of Medicinal Chemistry, Faculty of Pharmacy, Urmia University of Medical Sciences, Urmia, Iran
- Sepideh Parvizpour
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Mohammad M Pourseif
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran
- Yadollah Omidi
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Florida, 33328, USA
9
Kadukova M, Machado KDS, Chacón P, Grudinin S. KORP-PL: a coarse-grained knowledge-based scoring function for protein-ligand interactions. Bioinformatics 2021; 37:943-950. [PMID: 32840574 DOI: 10.1093/bioinformatics/btaa748] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/23/2020] [Revised: 07/27/2020] [Accepted: 08/18/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION Despite the progress made in studying protein-ligand interactions and the widespread application of docking and affinity prediction tools, improving their precision and efficiency remains a challenge. Computational approaches based on scoring docking conformations with statistical potentials constitute a popular alternative to more accurate but costly physics-based thermodynamic sampling methods. In this context, a minimalist and fast sidechain-free knowledge-based potential with high docking and screening power can be very useful when screening a large number of putative docking conformations. RESULTS Here, we present a novel coarse-grained potential defined by a 3D joint probability distribution function that depends only on the pairwise orientation and position between protein backbone and ligand atoms. Despite its extreme simplicity, our approach yields results very competitive with state-of-the-art scoring functions, especially in docking and screening tasks. For example, we observed a twofold improvement in the median 5% enrichment factor on the DUD-E benchmark compared with AutoDock Vina. Moreover, our results show that a coarse sidechain-free potential is sufficient for very successful docking pose prediction. AVAILABILITY AND IMPLEMENTATION The standalone version of KORP-PL, with the corresponding tests and benchmarks, is available at https://team.inria.fr/nano-d/korp-pl/ and https://chaconlab.org/modeling/korp-pl. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
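The median 5% enrichment factor quoted above is the fraction of actives among the top-ranked 5% of compounds divided by the fraction of actives in the whole library. A minimal sketch of that standard definition, assuming lower scores rank first (as with Vina-style energies) and that the size of the top subset is rounded to the nearest integer (at least one compound); benchmarks differ in rounding and tie handling, so values may not match published numbers exactly. `enrichment_factor` is an illustrative name.

```python
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float],
    is_active: Sequence[bool],
    fraction: float = 0.05,
    lower_is_better: bool = True,
) -> float:
    """EF at the given fraction: active rate in the top-ranked subset / overall active rate."""
    n = len(scores)
    n_actives = sum(is_active)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, round(n * fraction))  # rounding convention is an assumption
    order = sorted(range(n), key=lambda i: scores[i], reverse=not lower_is_better)
    hits = sum(is_active[i] for i in order[:n_top])
    return (hits / n_top) / (n_actives / n)
```

With 20 compounds, 2 actives and the best-scored compound active, EF at 5% is 10, the maximum possible for that library.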
Affiliation(s)
- Maria Kadukova
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
- Karina Dos Santos Machado
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
- Computational Biology Laboratory, Centro de Ciências Computacionais, Universidade Federal do Rio Grande - FURG, Rio Grande, RS 96201-090, Brazil
- Pablo Chacón
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry C.S.I.C, Madrid 28006, Spain
- Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
10
Bao J, He X, Zhang JZH. DeepBSP-a Machine Learning Method for Accurate Prediction of Protein-Ligand Docking Structures. J Chem Inf Model 2021; 61:2231-2240. [PMID: 33979150 DOI: 10.1021/acs.jcim.1c00334] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Indexed: 12/16/2022]
Abstract
In recent years, machine-learning-based scoring functions have significantly improved scoring power. However, many of these methods do not perform well in distinguishing the native structure from docked decoy poses because their training data lack decoy structural information. Here, we developed a machine-learning model, named DeepBSP, that directly predicts the root mean square deviation (rmsd) of a ligand docking pose with reference to its native binding pose. Unlike the binding affinity, the rmsd of a docking pose with reference to its native structure can be determined straightforwardly. Trained on a generated data set of 11,925 native complexes and more than 165,000 docked poses, our model shows excellent docking power on our test set and on the CASF-2016 docking decoy set compared with other major scoring functions. Thus, by combining molecular docking, which generates many poses, with DeepBSP, one can more accurately predict the binding pose closest to the native complex structure. DeepBSP should be very useful for picking out poses close to their natives from the many poses generated by a docking application.
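Because DeepBSP regresses the rmsd itself, docking power reduces to picking, for each complex, the pose with the lowest predicted rmsd and checking its true rmsd. A minimal sketch, assuming the pose and native share the receptor frame and atom order (no superposition or symmetry correction) and the commonly used 2 Å success cut-off; DeepBSP's own evaluation code may differ, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(pose: Sequence[Coord], native: Sequence[Coord]) -> float:
    """In-place RMSD: no superposition and no symmetry correction; atoms must be in matching order."""
    if len(pose) != len(native) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(pose, native))
    return math.sqrt(sq / len(pose))


def docking_success_rate(
    predicted: Sequence[Sequence[float]],
    true_rmsd: Sequence[Sequence[float]],
    cutoff: float = 2.0,
) -> float:
    """Fraction of complexes whose top-ranked pose (lowest predicted rmsd) is within cutoff of the native."""
    hits = 0
    for preds, trues in zip(predicted, true_rmsd):
        best = min(range(len(preds)), key=preds.__getitem__)
        hits += trues[best] <= cutoff  # bool counts as 0 or 1
    return hits / len(predicted)
```

Symmetric ligands (for example a freely rotating phenyl ring) can inflate an in-place rmsd, which is why published benchmarks often use symmetry-corrected variants.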
Affiliation(s)
- Jingxiao Bao
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- Xiao He
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
- John Z H Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York, New York 10003, United States
- Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
11
Bao J, He X, Zhang JZ. Development of a New Scoring Function for Virtual Screening: APBScore. J Chem Inf Model 2020; 60:6355-6365. [DOI: 10.1021/acs.jcim.0c00474] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/23/2022]
Affiliation(s)
- Jingxiao Bao
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- Xiao He
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
- John Z.H. Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200062, China
- Department of Chemistry, New York University, New York, New York 10003, United States
- Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
12
Improving the binding affinity estimations of protein-ligand complexes using machine-learning facilitated force field method. J Comput Aided Mol Des 2020; 34:817-830. [PMID: 32185583 DOI: 10.1007/s10822-020-00305-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/16/2019] [Accepted: 03/07/2020] [Indexed: 10/24/2022]
Abstract
Scoring functions are routinely deployed in structure-based drug design to quantify the potential for protein-ligand (PL) complex formation. Here, we present a new scoring function, Bappl+, designed to predict the binding affinities of non-metallo and metallo PL complexes. Bappl+ outperforms other state-of-the-art scoring functions, achieving a Pearson correlation coefficient of up to ~0.76 with low standard deviations. The biggest contributors to the improved performance are the use of a machine-learning model and the enlarged training dataset. We have also evaluated Bappl+ on target-specific proteins, which highlighted the limitations of our function and suggested directions for further improvement. We believe that the Bappl+ methodology could prove valuable in ranking candidate molecules against a metallo or non-metallo target protein by reliably predicting their binding affinities, thus aiding the drug discovery process.
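Scoring power, such as the Pearson correlation reported for Bappl+, is usually summarised with a few standard regression metrics: RMSE, Pearson's R and Spearman's rank correlation. A dependency-free sketch of these textbook formulas; Spearman is computed as Pearson on average ranks, so tied values are handled, and the inputs are assumed non-empty and non-constant.

```python
import math
from typing import List, Sequence


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean square error between predictions and measurements."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient R."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman rewards getting the ranking right even when the predicted values are on the wrong scale, which is why it is often reported next to R.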
13
Shen C, Hu Y, Wang Z, Zhang X, Zhong H, Wang G, Yao X, Xu L, Cao D, Hou T. Can machine learning consistently improve the scoring power of classical scoring functions? Insights into the role of machine learning in scoring functions. Brief Bioinform 2020; 22:497-514. [PMID: 31982914 DOI: 10.1093/bib/bbz173] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 10/01/2019] [Revised: 12/10/2019] [Accepted: 11/21/2019] [Indexed: 01/12/2023]
Abstract
How to accurately estimate protein-ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, the binding affinities predicted by classical scoring functions (SFs) have been shown not to correlate well with experimentally measured biological activities. In the past few years, machine learning (ML)-based SFs have gradually emerged as potential alternatives and have outperformed classical SFs in a series of studies. In this study, to better recognize the potential of classical SFs, we conducted a comparative assessment of 25 commonly used SFs. Their scoring power was systematically estimated using state-of-the-art ML methods that replaced the original multiple linear regression method to refit the individual energy terms. The results show that the newly developed ML-based SFs consistently performed better than the classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarities were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed when sufficiently many similar targets were contained in the training set. Moreover, the effect of combining features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs could yield the best scoring power. However, it was not possible to derive a generic target-specific SF or SF combination.
14
DLIGAND2: an improved knowledge-based energy function for protein-ligand interactions using the distance-scaled, finite, ideal-gas reference state. J Cheminform 2019; 11:52. [PMID: 31392430 PMCID: PMC6686496 DOI: 10.1186/s13321-019-0373-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/02/2019] [Accepted: 07/27/2019] [Indexed: 12/14/2022]
Abstract
Performance of structure-based molecular docking largely depends on the accuracy of scoring functions. One important type of scoring function is the knowledge-based potential, derived from known three-dimensional structures of proteins and/or protein–ligand complexes. This study seeks to improve a knowledge-based protein–ligand potential based on the distance-scaled, finite, ideal-gas reference (DFIRE) state (DLIGAND) by expanding the representation of protein atoms from 13 mol2 atom types to 167 residue-specific atom types, and by employing a recently updated dataset containing 12,450 monomer protein chains for training. We found that the updated version, DLIGAND2, consistently improves over DLIGAND in predicting binding affinities for both native complex structures and docking-generated poses. More importantly, DLIGAND2 achieves a 52% increase over DLIGAND in the enrichment factor for the top 1% of predictions on the DUD-E decoy set, and consistently improves over AutoDock Vina and other statistical energy functions in all three benchmark tests. We further found that DLIGAND2 outperforms the empirical and machine-learning methods compared for virtual screening on new targets that are not homologous to the DUD-E training set. Given its best performance as a parameter-free statistical potential, and its place among the best on all performance measures, DLIGAND2 should be useful for re-assessing the poses generated by docking software, or as one term in other scoring functions. The program is available at https://github.com/sysu-yanglab/DLIGAND2.
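DLIGAND and DLIGAND2 build on the DFIRE reference state. The sketch below is hedged: it assumes the published DFIRE form $\bar{u}(r) = -RT \ln \frac{N_{obs}(r)}{(r/r_{cut})^{\alpha}\,(\Delta r/\Delta r_{cut})\,N_{obs}(r_{cut})}$ with $\alpha = 1.61$. That exponent comes from the original DFIRE work and is assumed here, not taken from DLIGAND2. The sketch computes the per-bin energy for one atom-type pair from an observed distance histogram. The function name, the binning (last bin treated as $r_{cut}$) and the pseudo-count are illustrative assumptions, not the DLIGAND2 code.

```python
import math
from typing import List, Sequence

# Exponent of the DFIRE reference state as published for the original DFIRE
# potential; assumed here, not taken from the DLIGAND2 source.
DFIRE_ALPHA = 1.61


def dfire_energy(n_obs: Sequence[float],
                 bin_centers: Sequence[float],
                 bin_widths: Sequence[float],
                 alpha: float = DFIRE_ALPHA,
                 pseudo_count: float = 1e-6) -> List[float]:
    """Per-bin DFIRE-style energy, in units of RT, for one atom-type pair.

    The last bin plays the role of r_cut, so its energy is 0 by construction.
    pseudo_count avoids log(0) for empty bins (an illustrative choice).
    """
    r_cut, dr_cut, n_cut = bin_centers[-1], bin_widths[-1], n_obs[-1]
    energies = []
    for n, r, dr in zip(n_obs, bin_centers, bin_widths):
        # Reference count: the r_cut count scaled by (r/r_cut)^alpha and by
        # the relative bin width.
        expected = (r / r_cut) ** alpha * (dr / dr_cut) * n_cut
        energies.append(-math.log((n + pseudo_count) / (expected + pseudo_count)))
    return energies


if __name__ == "__main__":
    # Hypothetical pair counts for one protein-ligand atom-type pair.
    print(dfire_energy([0.0, 100.0, 10.0, 10.0], [1.0, 2.0, 3.0, 4.0], [0.5] * 4))
```

In a full scoring function, such a table would be built for every protein-ligand atom-type pair (167 residue-specific protein atom types in DLIGAND2). A pose's score would then be the sum of table look-ups over all atom pairs within the cutoff. Over-represented distances get negative (favorable) energies, and under-represented ones get positive energies.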
15
Discrimination power of knowledge-based potential dictated by the dominant energies in native protein structures. Amino Acids 2019; 51:1029-1038. [PMID: 31098784 DOI: 10.1007/s00726-019-02743-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/20/2018] [Accepted: 05/08/2019] [Indexed: 01/20/2023]
Abstract
Extracting a well-designed energy function is important for protein structure evaluation. Knowledge-based potential functions are one type of energy function that can be obtained from known protein structures. The pairwise potential between atom types is approximated using Boltzmann's law, which relates the frequency of atom types to their potential. The total energy is approximated as the sum of the pairwise potentials over all atomic pairs. In the present study, the performance of the knowledge-based potential function was assessed based on the strength of interaction between groups of amino acids. The dominant energies involved in the pairwise potentials were revealed by eigenvalue analysis of a matrix whose elements represent the energies between amino acids. For this purpose, a matrix of the mean energies of residue-residue interaction types was constructed from 500 native protein structures. The matrix has a dominant eigenvalue, and LEU, VAL, ILE, PHE, TYR, ALA and TRP have high values along the dominant eigenvector. The results show that this ranking of amino acids is consistent with their power to discriminate native structures in the K-alphabet reduced model. In the reduced model, only a subset of the 20 amino acids, along with their interactions, is considered when assessing the energy; the reduced structures are constructed from only the K chosen amino acid types. The dominant K-alphabet reduced model, derived from the first K amino acids in the list [LEU, VAL, PHE, ILE, TYR, ALA, TRP], gives the best discrimination of native structures among all possible K-alphabet reduced models. Knowledge-based potentials might be improved with a new strategy.
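The two computational steps in this abstract can be sketched as follows: inverse-Boltzmann pair potentials, and eigenvalue analysis of a residue-residue energy matrix. This illustrates the ideas under stated assumptions and is not the authors' code. Three choices are made here: the reference frequency passed in, power iteration to find the largest-magnitude eigenvalue, and ranking residues by the magnitude of their eigenvector components. The 2×2 matrix in the usage lines is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def boltzmann_potential(f_obs: float, f_ref: float, kt: float = 1.0) -> float:
    """Inverse-Boltzmann pair potential: E = -kT ln(f_obs / f_ref)."""
    return -kt * math.log(f_obs / f_ref)


def dominant_eigenpair(matrix: Sequence[Sequence[float]], iters: int = 1000,
                       tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Power iteration for the largest-magnitude eigenvalue of a symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        # Rayleigh quotient w^T A w (w has unit length) estimates the eigenvalue.
        aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        new_eigenvalue = sum(wi * awi for wi, awi in zip(w, aw))
        done = abs(new_eigenvalue - eigenvalue) < tol
        v, eigenvalue = w, new_eigenvalue
        if done:
            break
    return eigenvalue, v


def rank_residues(names: Sequence[str],
                  energy_matrix: Sequence[Sequence[float]]) -> Tuple[float, List[str]]:
    """Order residue types by the magnitude of their dominant-eigenvector component."""
    eigenvalue, vec = dominant_eigenpair(energy_matrix)
    # The eigenvector's overall sign is arbitrary, so rank by absolute value.
    order = sorted(range(len(names)), key=lambda i: abs(vec[i]), reverse=True)
    return eigenvalue, [names[i] for i in order]


if __name__ == "__main__":
    # Hypothetical mean residue-residue energies (symmetric, in kT units).
    print(boltzmann_potential(0.02, 0.01))
    print(rank_residues(["LEU", "ALA"], [[-3.0, -1.0], [-1.0, -1.0]]))
```

A pair observed twice as often as the reference gets $-kT \ln 2$, i.e. a favorable energy. For a realistic 20×20 matrix, the same routine would return the dominant eigenvalue and an ordering of residue types along the dominant eigenvector.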
16
Li J, Fu A, Zhang L. An Overview of Scoring Functions Used for Protein-Ligand Interactions in Molecular Docking. Interdiscip Sci 2019; 11:320-328. [PMID: 30877639 DOI: 10.1007/s12539-019-00327-w] [Citation(s) in RCA: 166] [Impact Index Per Article: 33.2] [Received: 11/04/2018] [Revised: 02/06/2019] [Accepted: 03/06/2019] [Indexed: 12/17/2022]
Abstract
Molecular docking has become a key tool in drug discovery and molecular modeling applications. The reliability of molecular docking depends on the accuracy of the adopted scoring function, which guides the selection of the ligand pose among the thousands of candidate poses generated. The scoring function can be used to determine the binding mode and site of a ligand, predict binding affinity and identify potential drug leads for a given protein target. Despite intensive research over the years, accurate and rapid prediction of protein-ligand interactions is still a challenge in molecular docking. For this reason, this study reviews four basic types of scoring functions (physics-based, empirical, knowledge-based and machine learning-based) according to an up-to-date classification scheme. We discuss the foundations, suitable application areas and shortcomings of the four types of scoring functions, as well as the challenges and potential directions for future study.
Affiliation(s)
- Jin Li
- College of Computer and Information Science, Southwest University, Chongqing, 400715, China
- School of Medical Information and Engineering, Southwest Medical University, Luzhou, 646000, China
- Ailing Fu
- College of Pharmaceutical Sciences, Southwest University, Chongqing, 400715, China
- Le Zhang
- College of Computer and Information Science, Southwest University, Chongqing, 400715, China
- College of Computer Science, Sichuan University, Chengdu, 610065, China
- Medical Big Data Center, Sichuan University, Chengdu, 610065, China
- Zdmedical, Information Polytron Technologies Inc Chongqing, Chongqing, 401320, China
17
Dittrich J, Schmidt D, Pfleger C, Gohlke H. Converging a Knowledge-Based Scoring Function: DrugScore2018. J Chem Inf Model 2018; 59:509-521. [DOI: 10.1021/acs.jcim.8b00582] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 11/29/2022]
Affiliation(s)
- Jonas Dittrich
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Denis Schmidt
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Christopher Pfleger
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Holger Gohlke
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC) & Institute for Complex Systems−Structural Biochemistry (ICS-6), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
18
Zheng Z, Pei J, Bansal N, Liu H, Song LF, Merz KM. Generation of Pairwise Potentials Using Multidimensional Data Mining. J Chem Theory Comput 2018; 14:5045-5067. [PMID: 30183299 DOI: 10.1021/acs.jctc.8b00516] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
The rapid development of molecular structural databases gives the chemistry community access to an enormous array of experimental data that can be used to build and validate computational models. Using radial distribution functions collected from experimentally available X-ray and NMR structures, a number of so-called statistical potentials have been developed over the years with this structural data mining strategy. These potentials were developed within the framework of the two-particle Kirkwood equation, by extending its original use for isotropic monatomic systems to anisotropic biomolecular systems. However, their limited accuracy and unclear physical meaning have long been the central arguments against such methods. In this work, we present a new approach to generating molecular energy functions through structural data mining. Instead of employing the Kirkwood equation and introducing the "reference state" approximation, we model the multidimensional probability distributions of the molecular system using graphical models and generate the target pairwise Boltzmann probabilities using Bayesian field theory. Unlike current statistical potentials, which mimic the "knowledge-based" PMF based on the two-particle Kirkwood equation, the graphical-model-based, structure-derived potential developed in this study focuses on generating lower-dimensional Boltzmann distributions of atoms through dimensionality reduction. We have named this new scoring function GARF; in this work we focus on the mathematical derivation of the approach, followed by validation studies of its ability to predict protein-ligand interactions.
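For contrast with GARF, consider the conventional statistical-potential route that the abstract describes and that GARF is designed to avoid. It turns a radial distribution function into a potential of mean force, $w(r) = -k_BT \ln g(r)$. Here $g(r)$ is the observed pair count in a distance shell divided by the count expected for a uniform, ideal-gas distribution at the same number density. This is a minimal sketch of that baseline only, not of GARF. The function names, the uniform-density reference and the equal-width binning are assumptions.

```python
import math
from typing import List, Sequence


def radial_distribution(distances: Sequence[float], n_centers: int, density: float,
                        r_max: float, n_bins: int) -> List[float]:
    """Histogram pair distances and normalise by the ideal-gas expectation.

    g(r) = observed pairs in shell / (n_centers * density * shell volume).
    """
    dr = r_max / n_bins
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            # min() guards against floating-point rounding just below r_max.
            counts[min(int(d / dr), n_bins - 1)] += 1
    g = []
    for k, c in enumerate(counts):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        g.append(c / (n_centers * density * shell_volume))
    return g


def potential_of_mean_force(g: Sequence[float], kt: float = 1.0) -> List[float]:
    """w(r) = -kT ln g(r); bins never observed get +inf."""
    return [-kt * math.log(x) if x > 0 else math.inf for x in g]


if __name__ == "__main__":
    # Hypothetical distances (in Å) from one reference atom to partner atoms.
    g = radial_distribution([3.1, 3.4, 3.6, 4.8, 5.5], n_centers=1,
                            density=0.03, r_max=6.0, n_bins=6)
    print(potential_of_mean_force(g))
```

Bins where pairs are enriched relative to the uniform reference ($g > 1$) receive negative, attractive values. The reference-state choice hidden in `density` is exactly the approximation the GARF abstract argues against.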
Affiliation(s)
- Zheng Zheng
- Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Jun Pei
- Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Nupur Bansal
- Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Hao Liu
- Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Lin Frank Song
- Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M Merz
- Department of Chemistry, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
19
Guedes IA, Pereira FSS, Dardenne LE. Empirical Scoring Functions for Structure-Based Virtual Screening: Applications, Critical Aspects, and Challenges. Front Pharmacol 2018; 9:1089. [PMID: 30319422 PMCID: PMC6165880 DOI: 10.3389/fphar.2018.01089] [Citation(s) in RCA: 134] [Impact Index Per Article: 22.3] [Received: 07/01/2018] [Accepted: 09/07/2018] [Indexed: 12/19/2022]
Abstract
Structure-based virtual screening (VS) is a widely used approach that uses knowledge of the three-dimensional structure of the target of interest to design new lead compounds through large-scale molecular docking experiments. By predicting the binding mode and affinity of a small molecule within the binding site of the target, it is possible to understand important properties of the binding process. Empirical scoring functions are widely used for pose and affinity prediction. Although pose prediction is performed with satisfactory accuracy, correctly predicting binding affinity is still a challenging task and is crucial for the success of structure-based VS experiments. Efforts on several fronts aim to develop more sophisticated and accurate models for filtering and ranking large libraries of compounds. This paper covers some recent successful applications and methodological advances, including strategies to account for ligand entropy and solvent effects, training with sophisticated machine-learning techniques, and the use of quantum mechanics. Particular emphasis is given to critical aspects and further directions for the development of more accurate empirical scoring functions.
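Rankings of large libraries, as discussed above, are usually evaluated with the enrichment factor, the same metric the DLIGAND2 abstract reports for the top 1% of predictions. It is the hit rate among the top-ranked fraction divided by the hit rate of the whole library. The minimal sketch below assumes that lower docking scores are better and that the size of the top slice is rounded to the nearest whole compound, with at least one. Conventions differ between benchmarks, so treat these as adjustable assumptions.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01, lower_is_better: bool = True) -> float:
    """EF at a screening fraction: hit rate in the top slice / overall hit rate."""
    if not scores or len(scores) != len(is_active):
        raise ValueError("scores and labels must be non-empty and equal in length")
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")
    n_total = len(scores)
    # Size of the top slice, rounded and kept at one compound or more.
    n_top = max(1, round(fraction * n_total))
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=not lower_is_better)
    actives_top = sum(1 for i in order[:n_top] if is_active[i])
    return (actives_top / n_top) / (total_actives / n_total)


if __name__ == "__main__":
    # Hypothetical library of 100 compounds with 5 actives; one active ranks first.
    scores = [float(i) for i in range(100)]
    labels = [i in (0, 50, 51, 52, 53) for i in range(100)]
    print(enrichment_factor(scores, labels, fraction=0.01))
```

A random ranking gives an EF of about 1. In the usage lines, the single top-1% compound is active while actives make up 5% of the library, giving an EF of 20.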
Affiliation(s)
- Isabella A Guedes
- Grupo de Modelagem Molecular em Sistemas Biológicos, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
- Felipe S S Pereira
- Grupo de Modelagem Molecular em Sistemas Biológicos, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
- Laurent E Dardenne
- Grupo de Modelagem Molecular em Sistemas Biológicos, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
20
Padhorny D, Hall DR, Mirzaei H, Mamonov AB, Moghadasi M, Alekseenko A, Beglov D, Kozakov D. Protein-ligand docking using FFT based sampling: D3R case study. J Comput Aided Mol Des 2018; 32:225-230. [PMID: 29101520 PMCID: PMC5767528 DOI: 10.1007/s10822-017-0069-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 07/01/2017] [Accepted: 09/16/2017] [Indexed: 12/15/2022]
Abstract
Fast Fourier transform (FFT)-based approaches have been successful in modeling relatively rigid protein-protein complexes. Recently, we have been able to adapt the FFT methodology to the treatment of flexible protein-peptide interactions. Here, we report our latest attempt to extend the FFT approach to the treatment of flexible protein-ligand interactions, applied to the D3R PL-2016-1 challenge. Based on the D3R assessment, our FFT approach combined with off-grid Monte Carlo minimization refinement was among the top-performing methods in the challenge. The potential advantage of our method is its ability to globally sample the protein-ligand interaction landscape, which will be explored in further applications.
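FFT-based docking gets its speed from the correlation theorem. The overlap score for every translation of the ligand relative to the receptor comes from a few Fourier transforms rather than an explicit loop over shifts: $c = \mathcal{F}^{-1}[\overline{\mathcal{F}(R)} \cdot \mathcal{F}(L)]$. The sketch below shows only that core idea, on hypothetical one-dimensional grids with periodic boundaries and a length that is a power of two. Real FFT docking correlates 3D grids of several energy terms for each sampled ligand rotation, and none of the names here come from the authors' software.

```python
import cmath
from typing import List, Sequence


def fft(a: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; the inverse is left unnormalised."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    sign = 1 if inverse else -1
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_correlation_scores(receptor: Sequence[float],
                           ligand: Sequence[float]) -> List[float]:
    """Score every cyclic shift t at once: c[t] = sum_x R[x] * L[(x + t) % n]."""
    n = len(receptor)
    r_hat = fft(list(receptor))
    l_hat = fft(list(ligand))
    # Correlation theorem for real grids: F(c) = conj(F(R)) * F(L).
    prod = [rh.conjugate() * lh for rh, lh in zip(r_hat, l_hat)]
    return [v.real / n for v in fft(prod, inverse=True)]


def brute_force_scores(receptor: Sequence[float],
                       ligand: Sequence[float]) -> List[float]:
    """Direct O(n^2) evaluation of the same correlation, for checking."""
    n = len(receptor)
    return [sum(receptor[x] * ligand[(x + t) % n] for x in range(n)) for t in range(n)]


if __name__ == "__main__":
    # Hypothetical receptor and ligand grids on 8 points.
    rec = [1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    lig = [0.0, 1.0, 0.0, 0.0, 3.0, 0.0, 0.0, 2.0]
    print(fft_correlation_scores(rec, lig))
    print(brute_force_scores(rec, lig))
```

Scoring all n shifts directly costs O(n²), while the FFT route costs O(n log n). On 3D grids repeated for many rotations, this difference is what makes exhaustive (global) translational sampling affordable.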
Affiliation(s)
- Dzmitry Padhorny
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, 11794, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, 11794, USA
- Hanieh Mirzaei
- Department of Biomedical Engineering, Boston University, Boston, MA, 02215, USA
- Artem B Mamonov
- Department of Biomedical Engineering, Boston University, Boston, MA, 02215, USA
- Mohammad Moghadasi
- Department of Biomedical Engineering, Boston University, Boston, MA, 02215, USA
- Andrey Alekseenko
- Moscow Institute of Physics and Technology (State University), Institutskii per. 9, Dolgoprudny, Moscow Oblast, Russia, 141700
- Institute of Computer Aided Design of the Russian Academy of Sciences, 19/18, 2-nd Brestskaya St, Moscow, Russia, 123056
- Dmitri Beglov
- Department of Biomedical Engineering, Boston University, Boston, MA, 02215, USA
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, 11794, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, 11794, USA
21
Melkikh AV, Meijer DK. On a generalized Levinthal's paradox: The role of long- and short range interactions in complex bio-molecular reactions, including protein and DNA folding. Prog Biophys Mol Biol 2018; 132:57-79. [DOI: 10.1016/j.pbiomolbio.2017.09.018] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 04/18/2017] [Revised: 08/27/2017] [Accepted: 09/17/2017] [Indexed: 01/06/2023]