1
Li Y, Fan Z, Rao J, Chen Z, Chu Q, Zheng M, Li X. An overview of recent advances and challenges in predicting compound-protein interaction (CPI). Med Rev (2021) 2023; 3:465-486. [PMID: 38282802 PMCID: PMC10808869 DOI: 10.1515/mr-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/30/2023] [Indexed: 01/30/2024]
Abstract
Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based CPI prediction ML models, highlighting their performance and achievements. It also offers insights into CPI prediction-related datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
Affiliation(s)
- Yanbei Li
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhehuan Fan
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jingxin Rao
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhiyi Chen
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Qinyu Chu
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
  - School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xutong Li
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
2
Szwabowski GL, Baker DL, Parrill AL. Application of computational methods for class A GPCR Ligand discovery. J Mol Graph Model 2023; 121:108434. [PMID: 36841204 DOI: 10.1016/j.jmgm.2023.108434] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/03/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/22/2023]
Abstract
G protein-coupled receptors (GPCR) are integral membrane proteins of considerable interest as targets for drug development due to their role in transmitting cellular signals in a multitude of biological processes. Of the six classes categorizing GPCR (A, B, C, D, E, and F), class A contains the largest number of therapeutically relevant GPCR. Despite their importance as drug targets, many challenges exist for the discovery of novel class A GPCR ligands serving as drug precursors. Though knowledge of the structural and functional characteristics of GPCR has grown significantly over the past 20 years, a large portion of GPCR lack reported, experimentally determined structures. Furthermore, many GPCR have no known endogenous and/or synthetic ligands, limiting further exploration of their biochemical, cellular, and physiological roles. While many successes in GPCR ligand discovery have resulted from experimental high-throughput screening, computational methods have played an increasingly important role in GPCR ligand identification in the past decade. Here we discuss computational techniques applied to GPCR ligand discovery. This review summarizes class A GPCR structure/function and provides an overview of many obstacles currently faced in GPCR ligand discovery. Furthermore, we discuss applications and recent successes of computational techniques used to predict GPCR structure as well as present a summary of ligand- and structure-based methods used to identify potential GPCR ligands. Finally, we discuss computational hit list generation and refinement and provide comprehensive workflows for GPCR ligand identification.
Affiliation(s)
- Daniel L Baker
  - Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA
- Abby L Parrill
  - Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA
3
Wang Q, Wang Z, Tian S, Wang L, Tang R, Yu Y, Ge J, Hou T, Hao H, Sun H. Determination of Molecule Category of Ligands Targeting the Ligand-Binding Pocket of Nuclear Receptors with Structural Elucidation and Machine Learning. J Chem Inf Model 2022; 62:3993-4007. [PMID: 36040137 DOI: 10.1021/acs.jcim.2c00851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
The mechanism of transcriptional activation/repression of the nuclear receptors (NRs) involves two main conformations of the NR protein, namely, the active (agonistic) and inactive (antagonistic) conformations. Binding of agonists or antagonists to the ligand-binding pocket (LBP) of NRs can regulate the downstream signaling pathways with different physiological effects. However, it is still hard to determine the molecular type of a LBP-bound ligand because both the agonists and antagonists bind to the same position of the protein. Therefore, it is necessary to develop precise and efficient methods to facilitate the discrimination of agonists and antagonists targeting the LBP of NRs. Here, combining structural and energetic analyses with machine-learning (ML) algorithms, we constructed a series of structure-based ML models to determine the molecular category of the LBP-bound ligands. We show that the proposed models work robustly and with high accuracy (ACC > 0.9) for determining the category of molecules derived from docking-based and crystallized poses. Furthermore, the models are also capable of determining the molecular category of ligands with dual opposite functions on different NRs (i.e., working as an agonist in one NR target, whereas functioning as an antagonist in another) with reasonable accuracy. The proposed method is expected to facilitate the determination of the molecular properties of ligands targeting the LBP of NRs with structural interpretation.
Affiliation(s)
- Qinghua Wang
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
- Zhe Wang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Sheng Tian
  - Department of Medicinal Chemistry, College of Pharmaceutical Sciences, Soochow University, Suzhou 215123, P. R. China
- Lingling Wang
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
- Rongfan Tang
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
- Yang Yu
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
- Jingxuan Ge
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Haiping Hao
  - State Key Laboratory of Natural Medicines, Key Lab of Drug Metabolism and Pharmacokinetics, China Pharmaceutical University, 210009 Nanjing, China
- Huiyong Sun
  - Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, P. R. China
4
Velasquez-López Y, Tejera E, Perez-Castillo Y. Can docking scoring functions guarantee success in virtual screening? Virtual Screening and Drug Docking 2022. [DOI: 10.1016/bs.armc.2022.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
5
Demerdash ONA. Using diverse potentials and scoring functions for the development of improved machine-learned models for protein-ligand affinity and docking pose prediction. J Comput Aided Mol Des 2021; 35:1095-123. [PMID: 34708263 DOI: 10.1007/s10822-021-00423-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/03/2021] [Accepted: 10/11/2021] [Indexed: 10/20/2022]
Abstract
The advent of computational drug discovery holds the promise of significantly reducing the effort of experimentalists, along with monetary cost. More generally, predicting the binding of small organic molecules to biological macromolecules has far-reaching implications for a range of problems, including metabolomics. However, problems such as predicting the bound structure of a protein-ligand complex along with its affinity have proven to be an enormous challenge. In recent years, machine learning-based methods have proven to be more accurate than older methods, many based on simple linear regression. Nonetheless, there remains room for improvement, as these methods are often trained on a small set of features, with a single functional form for any given physical effect, and often with little mention of the rationale behind choosing one functional form over another. Moreover, it is not entirely clear why one machine learning method is favored over another. In this work, we endeavor to undertake a comprehensive effort towards developing high-accuracy, machine-learned scoring functions, systematically investigating the effects of machine learning method and choice of features, and, when possible, providing insights into the relevant physics using methods that assess feature importance. Here, we show synergism among disparate features, yielding adjusted R² with experimental binding affinities of up to 0.871 on an independent test set and enrichment for native bound structures of up to 0.913. When purely physical terms that model enthalpic and entropic effects are used in the training, we use feature importance assessments to probe the relevant physics and hopefully guide future investigators working on this and other computational chemistry problems.
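The abstract above reports an adjusted R² against experimental binding affinities. As a reading aid only, the following minimal sketch shows how that statistic is conventionally computed from predictions; it is not taken from the cited paper, and the function names and toy numbers are illustrative assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Penalize R^2 for the number of features used by the model.

    Standard formula: 1 - (1 - R^2) * (n - 1) / (n - p - 1).
    """
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical affinities (e.g. pKd) and predictions, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(measured, predicted)
adj = adjusted_r_squared(r2, n_samples=len(measured), n_features=1)
```

The adjustment matters when scoring functions combine many feature sets, because plain R² never decreases as features are added.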
6
Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, Chen X, Hou T, Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. WIREs Comput Mol Sci 2021. [DOI: 10.1002/wcms.1567] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022]
Affiliation(s)
- Guoli Xiong
  - Xiangya School of Pharmaceutical Sciences Central South University Changsha China
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
- Ziyi Yang
  - Xiangya School of Pharmaceutical Sciences Central South University Changsha China
- Dejun Jiang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
  - College of Computer Science and Technology Zhejiang University Hangzhou China
- Shao Liu
  - Department of Pharmacy Xiangya Hospital, Central South University Changsha China
- Aiping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
- Xiang Chen
  - Department of Dermatology, Hunan Engineering Research Center of Skin Health and Disease, Hunan Key Laboratory of Skin Cancer and Psoriasis Xiangya Hospital, Central South University Changsha China
- Tingjun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences Zhejiang University Hangzhou China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences Central South University Changsha China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine Hong Kong Baptist University Hong Kong SAR China
7
Ji B, He X, Zhai J, Zhang Y, Man VH, Wang J. Machine learning on ligand-residue interaction profiles to significantly improve binding affinity prediction. Brief Bioinform 2021; 22:6184410. [PMID: 33758923 DOI: 10.1093/bib/bbab054] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/08/2020] [Revised: 01/06/2021] [Accepted: 02/02/2021] [Indexed: 01/01/2023]
Abstract
Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still a challenge to accurately predict the binding affinity of an arbitrary molecule binds to a drug target and prioritize top ligands from an SBVS. In this study, we developed a novel method, using ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models, to significantly improve the screening performance in SBVSs. Such a kind of the prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods before interaction energy calculation and different ML algorithms. Using six drug targets with each having hundreds of known ligands, we conducted a critical evaluation on the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Its scoring power, ranking power and screening power significantly outperformed the Glide SF. First, compared with Glide, the average values of mean absolute error and root mean square error of GBDT/MIN + GB decreased about 38 and 36%, respectively. Second, the mean values of squared correlation coefficient and predictive index increased about 225 and 73%, respectively. Third, more encouragingly, the average value of the areas under the curve of receiver operating characteristic for six targets by GBDT, 0.87, is significantly better than that by Glide, which is only 0.71. Thus, we expected IP-SFs to have broad and promising applications in SBVSs.
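The abstract above compares scoring functions by mean absolute error, root mean square error, and area under the ROC curve. The sketch below illustrates how those three metrics are commonly computed; it is not the authors' code, the function names and toy data are assumptions, and it presumes that a higher score means "more likely active" (docking energies, where lower is better, would need their sign flipped first).

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the mean squared deviation."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random active outscores a random
    decoy, counting ties as one half (the Mann-Whitney U formulation).

    labels: 1 for active, 0 for decoy. scores: higher means more active-like.
    """
    actives = [s for l, s in zip(labels, scores) if l == 1]
    decoys = [s for l, s in zip(labels, scores) if l == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Hypothetical values, for illustration only.
mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
rmse = root_mean_square_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise AUC loop is quadratic in the number of compounds, which is fine for a small illustration; a rank-based implementation would be preferable for large screening libraries.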
Affiliation(s)
- Beihong Ji
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Xibing He
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Jingchen Zhai
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Yuzhao Zhang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Viet Hoang Man
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
8
Shen C, Hu Y, Wang Z, Zhang X, Pang J, Wang G, Zhong H, Xu L, Cao D, Hou T. Beware of the generic machine learning-based scoring functions in structure-based virtual screening. Brief Bioinform 2020; 22:5850047. [PMID: 32484221 DOI: 10.1093/bib/bbaa070] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 01/28/2020] [Revised: 04/17/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022]
Abstract
Machine learning-based scoring functions (MLSFs) have attracted extensive attention recently and are expected to be potential rescoring tools for structure-based virtual screening (SBVS). However, a major concern nowadays is whether MLSFs trained for generic uses rather than a given target can consistently be applicable for VS. In this study, a systematic assessment was carried out to re-evaluate the effectiveness of 14 reported MLSFs in VS. Overall, most of these MLSFs could hardly achieve satisfactory results for any dataset, and they could even not outperform the baseline of classical SFs such as Glide SP. An exception was observed for RFscore-VS trained on the Directory of Useful Decoys-Enhanced dataset, which showed its superiority for most targets. However, in most cases, it clearly illustrated rather limited performance on the targets that were dissimilar to the proteins in the corresponding training sets. We also used the top three docking poses rather than the top one for rescoring and retrained the models with the updated versions of the training set, but only minor improvements were observed. Taken together, generic MLSFs may have poor generalization capabilities to be applicable for the real VS campaigns. Therefore, it should be quite cautious to use this type of methods for VS.
Affiliation(s)
- Ye Hu
  - Central South University, China
- Lei Xu
  - Central South University, China
9
Ye WL, Shen C, Xiong GL, Ding JJ, Lu AP, Hou TJ, Cao DS. Improving Docking-Based Virtual Screening Ability by Integrating Multiple Energy Auxiliary Terms from Molecular Docking Scoring. J Chem Inf Model 2020; 60:4216-4230. [PMID: 32352294 DOI: 10.1021/acs.jcim.9b00977] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Indexed: 12/18/2022]
Abstract
Virtual Screening (VS) based on molecular docking is an efficient method used for retrieving novel hit compounds in drug discovery. However, the accuracy of the current docking scoring function (SF) is usually insufficient. In this study, in order to improve the screening power of SF, a novel approach named EAT-Score was proposed by directly utilizing the energy auxiliary terms (EAT) provided by molecular docking scoring through eXtreme Gradient Boosting (XGBoost). Here, EAT specifically refers to the output of the Molecular Operating Environment (MOE) scoring, including the energy scores of five different classical SFs and the Protein-Ligand Interaction Fingerprint (PLIF) terms. The performance of EAT-Score to discriminate actives from decoys was strictly validated on the DUD-E diverse subset by using different performance metrics. The results showed that EAT-Score performed much better than classical SFs in VS, with its AUC values exhibiting an improvement of around 0.3. Meanwhile, EAT-Score could achieve comparable even better prediction performance compared with other state-of-the-art VS methods, such as some machine learning (ML)-based SFs and classical SFs implemented in docking programs, in terms of AUC, LogAUC, or BEDROC. Furthermore, the EAT-Score model can capture important binding pattern information from protein-ligand complexes by Shapley additive explanations (SHAP) analysis, which may be very helpful in interpreting the ligand binding mechanism for a certain target and thereby guiding drug design.
Affiliation(s)
- Wen-Ling Ye
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Guo-Li Xiong
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
- Jun-Jie Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
- Ting-Jun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
10
Affiliation(s)
- Hongjian Li
  - Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258) Marseille France
  - CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
- Kam‐Heung Sze
  - CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
- Gang Lu
  - CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences Chinese University of Hong Kong Shatin Hong Kong
- Pedro J. Ballester
  - Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258) Marseille France
11
Abstract
The identification and optimization of lead compounds are inalienable components in drug design and discovery pipelines. As a powerful computational approach for the identification of hits with novel structural scaffolds, structure-based virtual screening (SBVS) has exhibited a remarkably increasing influence in the early stages of drug discovery. During the past decade, a variety of techniques and algorithms have been proposed and tested with different purposes in the scope of SBVS. Although SBVS has been a common and proven technology, it still shows some challenges and problems that are needed to be addressed, where the negative influence regardless of protein flexibility and the inaccurate prediction of binding affinity are the two major challenges. Here, focusing on these difficulties, we summarize a series of combined strategies or workflows developed by our group and others. Furthermore, several representative successful applications from recent publications are also discussed to demonstrate the effectiveness of the combined SBVS strategies in drug discovery campaigns.
Affiliation(s)
- Zhe Wang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.
- Huiyong Sun
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.
- Xueping Hu
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.
- Junbo Gao
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.
- Dan Li
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, P. R. China.
- Tingjun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.
12
Shen C, Hu Y, Wang Z, Zhang X, Zhong H, Wang G, Yao X, Xu L, Cao D, Hou T. Can machine learning consistently improve the scoring power of classical scoring functions? Insights into the role of machine learning in scoring functions. Brief Bioinform 2020; 22:497-514. [PMID: 31982914 DOI: 10.1093/bib/bbz173] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 10/01/2019] [Revised: 12/10/2019] [Accepted: 11/21/2019] [Indexed: 01/12/2023]
Abstract
How to accurately estimate protein-ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, it has been shown that the binding affinities predicted by classical scoring functions (SFs) cannot correlate well with experimentally measured biological activities. In the past few years, machine learning (ML)-based SFs have gradually emerged as potential alternatives and outperformed classical SFs in a series of studies. In this study, to better recognize the potential of classical SFs, we have conducted a comparative assessment of 25 commonly used SFs. Accordingly, the scoring power was systematically estimated by using the state-of-the-art ML methods that replaced the original multiple linear regression method to refit individual energy terms. The results show that the newly-developed ML-based SFs consistently performed better than classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly-developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarities were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed when sufficient similar targets were contained in the training set. Moreover, the effect of the combinations of features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs could yield the best scoring power. However, it was not applicable to derive a generic target-specific SF or SF combination.
13
Whitfield TW, Ragland DA, Zeldovich KB, Schiffer CA. Characterizing Protein-Ligand Binding Using Atomistic Simulation and Machine Learning: Application to Drug Resistance in HIV-1 Protease. J Chem Theory Comput 2020; 16:1284-1299. [PMID: 31877249 DOI: 10.1021/acs.jctc.9b00781] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 01/24/2023]
Abstract
Over the past several decades, atomistic simulations of biomolecules, whether carried out using molecular dynamics or Monte Carlo techniques, have provided detailed insights into their function. Comparing the results of such simulations for a few closely related systems has guided our understanding of the mechanisms by which changes such as ligand binding or mutation can alter the function. The general problem of detecting and interpreting such mechanisms from simulations of many related systems, however, remains a challenge. This problem is addressed here by applying supervised and unsupervised machine learning techniques to a variety of thermodynamic observables extracted from molecular dynamics simulations of different systems. As an important test case, these methods are applied to understand the evasion by human immunodeficiency virus type-1 (HIV-1) protease of darunavir, a potent inhibitor to which resistance can develop via the simultaneous mutation of multiple amino acids. Complex mutational patterns have been observed among resistant strains, presenting a challenge to developing a mechanistic picture of resistance in the protease. In order to dissect these patterns and gain mechanistic insight into the role of specific mutations, molecular dynamics simulations were carried out on a collection of HIV-1 protease variants, chosen to include highly resistant strains and susceptible controls, in complex with darunavir. Using a machine learning approach that takes advantage of the hierarchical nature in the relationships among the sequence, structure, and function, an integrative analysis of these trajectories reveals key details of the resistance mechanism, including changes in the protein structure, hydrogen bonding, and protein-ligand contacts.
Affiliation(s)
- Troy W Whitfield
  - Department of Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
- Debra A Ragland
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
- Konstantin B Zeldovich
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
- Celia A Schiffer
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, United States
14
Shen C, Ding J, Wang Z, Cao D, Ding X, Hou T. From machine learning to deep learning: Advances in scoring functions for protein–ligand docking. WIREs Comput Mol Sci 2019. [DOI: 10.1002/wcms.1429] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 12/18/2022]
Affiliation(s)
- Chao Shen
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| | - Junjie Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing P. R. China
| | - Zhe Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University Changsha P. R. China
| | - Xiaoqin Ding
- Beijing Institute of Pharmaceutical Chemistry Beijing P. R. China
| | - Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University Hangzhou P. R. China
|
15
|
Li J, Fu A, Zhang L. An Overview of Scoring Functions Used for Protein-Ligand Interactions in Molecular Docking. Interdiscip Sci 2019; 11:320-328. [PMID: 30877639 DOI: 10.1007/s12539-019-00327-w] [Citation(s) in RCA: 165] [Impact Index Per Article: 33.0] [Received: 11/04/2018] [Revised: 02/06/2019] [Accepted: 03/06/2019] [Indexed: 12/17/2022]
Abstract
Molecular docking is becoming a key tool in drug discovery and molecular modeling applications. The reliability of molecular docking depends on the accuracy of the adopted scoring function, which guides the search and ranks the candidates when thousands of possible poses of a ligand are generated. The scoring function can be used to determine the binding mode and site of a ligand, predict binding affinity, and identify potential drug leads for a given protein target. Despite intensive research over the years, accurate and rapid prediction of protein-ligand interactions is still a challenge in molecular docking. For this reason, this study reviews four basic types of scoring functions (physics-based, empirical, knowledge-based, and machine learning-based) according to an up-to-date classification scheme. We discuss not only the foundations, suitable application areas, and shortcomings of the four types of scoring functions, but also the challenges and potential future study directions.
Affiliation(s)
- Jin Li
- College of Computer and Information Science, Southwest University, Chongqing, 400715, China.,School of Medical Information and Engineering, Southwest Medical University, Luzhou, 646000, China
| | - Ailing Fu
- College of Pharmaceutical Sciences, Southwest University, Chongqing, 400715, China
| | - Le Zhang
- College of Computer and Information Science, Southwest University, Chongqing, 400715, China. .,College of Computer Science, Sichuan University, Chengdu, 610065, China. .,Medical Big Data Center, Sichuan University, Chengdu, 610065, China. .,Zdmedical, Information Polytron Technologies Inc Chongqing, Chongqing, 401320, China.
|
16
|
Abstract
Molecular docking enables large-scale prediction of whether and how small molecules bind to a macromolecular target. Machine-learning scoring functions are particularly well suited to predict the strength of this interaction. Here we describe how to build RF-Score, a scoring function utilizing the machine-learning technique known as Random Forest (RF). We also point out how to use different data, features, and regression models using either R or Python programming languages.
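As a rough illustration of the kind of input a machine-learning scoring function consumes, the sketch below counts protein-ligand element-pair contacts within a distance cutoff. The abstract does not specify the features, so the element-pair contact counts, the 12 Å cutoff, the element vocabularies, and the names `Atom` and `contact_counts` are all assumptions made for illustration. The resulting vector could be passed to any random forest regressor, which is not shown here.

```python
from itertools import product
from math import dist
from typing import NamedTuple


class Atom(NamedTuple):
    element: str   # chemical symbol, e.g. "C"
    xyz: tuple     # (x, y, z) coordinates in angstroms


# Hypothetical element vocabularies; a real study would choose its own.
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "S")


def contact_counts(protein, ligand, cutoff=12.0):
    """Count protein-ligand atom pairs closer than `cutoff`, per element pair."""
    counts = {pair: 0 for pair in product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS)}
    for p_atom, l_atom in product(protein, ligand):
        pair = (p_atom.element, l_atom.element)
        if pair in counts and dist(p_atom.xyz, l_atom.xyz) < cutoff:
            counts[pair] += 1
    # Fixed key ordering so every complex yields a vector with the same layout.
    return [counts[pair] for pair in sorted(counts)]


# Toy complex: three protein atoms and two ligand atoms on a line.
protein = [Atom("C", (0.0, 0.0, 0.0)), Atom("N", (1.5, 0.0, 0.0)), Atom("O", (30.0, 0.0, 0.0))]
ligand = [Atom("C", (3.0, 0.0, 0.0)), Atom("O", (4.0, 0.0, 0.0))]
features = contact_counts(protein, ligand)
```

The distant protein oxygen contributes no contacts, so only four of the sixteen element-pair counts are non-zero in this toy case.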
Affiliation(s)
| | - Pawel Siedlecki
- Institute of Biochemistry and Biophysics PAS, Warsaw, Poland
- Department of Systems Biology, Institute of Experimental Plant Biology and Biotechnology, University of Warsaw, Warsaw, Poland
| | - Pedro J Ballester
- Cancer Research Center of Marseille, INSERM U1068, Marseille, France.
- Institut Paoli-Calmettes, Marseille, France.
- Aix-Marseille Université, Marseille, France.
- CNRS UMR7258, Marseille, France.
|
17
|
Guedes IA, Pereira FSS, Dardenne LE. Empirical Scoring Functions for Structure-Based Virtual Screening: Applications, Critical Aspects, and Challenges. Front Pharmacol 2018; 9:1089. [PMID: 30319422 PMCID: PMC6165880 DOI: 10.3389/fphar.2018.01089] [Citation(s) in RCA: 134] [Impact Index Per Article: 22.3] [Received: 07/01/2018] [Accepted: 09/07/2018] [Indexed: 12/19/2022] Open
Abstract
Structure-based virtual screening (VS) is a widely used approach that employs the knowledge of the three-dimensional structure of the target of interest in the design of new lead compounds from large-scale molecular docking experiments. Through the prediction of the binding mode and affinity of a small molecule within the binding site of the target of interest, it is possible to understand important properties related to the binding process. Empirical scoring functions are widely used for pose and affinity prediction. Although pose prediction is performed with satisfactory accuracy, the correct prediction of binding affinity is still a challenging task and crucial for the success of structure-based VS experiments. Efforts are under way on several distinct fronts to develop more sophisticated and accurate models for filtering and ranking large libraries of compounds. This paper covers some recent successful applications and methodological advances, including strategies to explore the ligand entropy and solvent effects, training with sophisticated machine-learning techniques, and the use of quantum mechanics. Particular emphasis is given to the discussion of critical aspects and further directions for the development of more accurate empirical scoring functions.
Affiliation(s)
- Isabella A Guedes
- Grupo de Modelagem Molecular em Sistemas Biológicos, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
| | - Felipe S S Pereira
- Grupo de Modelagem Molecular em Sistemas Biológicos, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
| | - Laurent E Dardenne
- Grupo de Modelagem Molecular em Sistemas Biológicos, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
|
18
|
Li H, Peng J, Leung Y, Leung KS, Wong MH, Lu G, Ballester PJ. The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction. Biomolecules 2018. [PMID: 29538331 PMCID: PMC5871981 DOI: 10.3390/biom8010012] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 12/22/2022] Open
Abstract
It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future.
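The similarity-based training sets described above can be pictured as nested subsets obtained by tightening a similarity cutoff. The following is a minimal sketch under stated assumptions: the function name `filter_by_similarity`, the complex IDs, the similarity values, and the cutoffs are all invented for illustration and are not taken from the study.

```python
def filter_by_similarity(train_ids, max_similarity_to_test, cutoff):
    """Keep training complexes whose most similar test protein is at or below `cutoff`."""
    return [cid for cid in train_ids if max_similarity_to_test[cid] <= cutoff]


# Toy data: complex ID -> highest similarity (0..1) to any test-set protein.
max_sim = {"c1": 0.95, "c2": 0.40, "c3": 0.72, "c4": 0.10, "c5": 1.00}
train_ids = sorted(max_sim)

# Nested training sets: a stricter cutoff removes more of the similar proteins.
nested_sets = {
    cutoff: filter_by_similarity(train_ids, max_sim, cutoff)
    for cutoff in (1.0, 0.8, 0.5, 0.2)
}

# Fraction of the full training set removed at each cutoff.
removed_fraction = {
    cutoff: 1 - len(ids) / len(train_ids) for cutoff, ids in nested_sets.items()
}
```

Retraining each scoring function on every subset and evaluating on the same test set would then show how performance changes as similar proteins are removed.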
Affiliation(s)
- Hongjian Li
- SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, New Territories, Hong Kong, China.
- Institute of Future Cities, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
| | - Jiangjun Peng
- Institute of Future Cities, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China.
| | - Yee Leung
- Institute of Future Cities, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
| | - Kwong-Sak Leung
- Institute of Future Cities, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
| | - Man-Hon Wong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
| | - Gang Lu
- School of Biomedical Sciences, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China.
| | - Pedro J Ballester
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France.
- Institut Paoli-Calmettes, F-13009 Marseille, France.
- Aix-Marseille Université, F-13284 Marseille, France.
- CNRS UMR7258, F-13009 Marseille, France.
|
19
|
Passeri GI, Trisciuzzi D, Alberga D, Siragusa L, Leonetti F, Mangiatordi GF, Nicolotti O. Strategies of Virtual Screening in Medicinal Chemistry. International Journal of Quantitative Structure-Property Relationships 2018. [DOI: 10.4018/ijqspr.2018010108] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/09/2022]
Abstract
Virtual screening is an effective computational strategy for raising the chances of finding new bioactive compounds and shortening the time needed to move from an initial intuition to market. Classically, the most pursued approaches rely on ligand- and structure-based studies, the former employed when structural information about the target is missing and the latter when X-ray/NMR-solved structures or homology models are available for the target. The authors focus on the most advanced techniques applied in this area. In particular, they survey the key concepts of virtual screening by discussing how to properly select chemical libraries, how to curate databases, how to apply ligand- and structure-based techniques, and how to wisely use post-processing methods. Emphasis is also given to the most meaningful databases used in VS protocols. For ease of discussion, several examples are presented.
Affiliation(s)
| | - Daniela Trisciuzzi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | - Domenico Alberga
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | - Lydia Siragusa
- Molecular Discovery Ltd., Pinner, Middlesex, London, United Kingdom
| | - Francesco Leonetti
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
| | - Giuseppe F. Mangiatordi
- Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo Moro”, Bari, Italy
|
20
|
|
21
|
Wójcikowski M, Ballester PJ, Siedlecki P. Performance of machine-learning scoring functions in structure-based virtual screening. Sci Rep 2017; 7:46710. [PMID: 28440302 PMCID: PMC5404222 DOI: 10.1038/srep46710] [Citation(s) in RCA: 184] [Impact Index Per Article: 26.3] [Received: 12/01/2016] [Accepted: 03/23/2017] [Indexed: 12/23/2022] Open
Abstract
Classical scoring functions have reached a plateau in their performance in virtual screening and binding affinity prediction. Recently, machine-learning scoring functions trained on protein-ligand complexes have shown great promise in small tailored studies. They have also raised controversy, specifically concerning model overfitting and applicability to novel targets. Here we provide a new ready-to-use scoring function (RF-Score-VS) trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets. We use the full DUD-E data sets along with three docking tools, five classical and three machine-learning scoring functions for model building and performance assessment. Our results show RF-Score-VS can substantially improve virtual screening performance: the RF-Score-VS top 1% provides a 55.6% hit rate, whereas that of Vina is only 16.2% (for smaller percentages the difference is even more encouraging: the RF-Score-VS top 0.1% achieves an 88.6% hit rate versus 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observe comparable results. We provide full data sets to facilitate further research in this area (http://github.com/oddt/rfscorevs) as well as ready-to-use RF-Score-VS (http://github.com/oddt/rfscorevs_binary).
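The two metrics quoted in this abstract, the hit rate among the top-ranked fraction of a screened library and the Pearson correlation with measured affinity, can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: the function names, the rounding rule for the size of the top fraction, and the toy scores and labels are assumptions.

```python
from math import sqrt


def hit_rate_top_fraction(scores, is_active, fraction, higher_is_better=True):
    """Fraction of actives among the best-scoring `fraction` of molecules."""
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0],
                    reverse=higher_is_better)
    n_top = max(1, round(len(ranked) * fraction))  # keep at least one molecule
    return sum(active for _, active in ranked[:n_top]) / n_top


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy screen: ten molecules with predicted scores and activity labels.
scores = [9.1, 8.7, 8.5, 7.9, 6.0, 5.5, 5.1, 4.8, 4.0, 3.2]
actives = [True, True, False, True, False, False, False, True, False, False]

top20 = hit_rate_top_fraction(scores, actives, 0.2)  # best 2 of 10 molecules
top50 = hit_rate_top_fraction(scores, actives, 0.5)  # best 5 of 10 molecules
r = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The `higher_is_better` flag is there because some scoring functions report more favorable binding as lower values, in which case the ranking must be reversed.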
Affiliation(s)
- Maciej Wójcikowski
- Institute of Biochemistry and Biophysics PAS, Pawinskiego 5a, 02-106 Warsaw, Poland
| | - Pedro J Ballester
- Centre de Recherche en Cancérologie de Marseille (CRCM), Inserm, U1068, Marseille, F-13009, France.,CNRS, UMR7258, Marseille, F-13009, France.,Aix-Marseille University, UM 105, F-13284, Marseille, France
| | - Pawel Siedlecki
- Institute of Biochemistry and Biophysics PAS, Pawinskiego 5a, 02-106 Warsaw, Poland.,Department of Systems Biology, Institute of Experimental Plant Biology and Biotechnology, University of Warsaw, Miecznikowa 1, 02-096 Warsaw, Poland
|
22
|
Li Y, Yang J. Structural and Sequence Similarity Makes a Significant Impact on Machine-Learning-Based Scoring Functions for Protein–Ligand Interactions. J Chem Inf Model 2017; 57:1007-1012. [DOI: 10.1021/acs.jcim.7b00049] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Indexed: 12/12/2022]
Affiliation(s)
- Yang Li
- College of Life Sciences, Nankai University, Tianjin 300071, China
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
|
23
|
Chen F, Sun H, Liu H, Li D, Li Y, Hou T. Prediction of luciferase inhibitors by the high-performance MIEC-GBDT approach based on interaction energetic patterns. Phys Chem Chem Phys 2017; 19:10163-10176. [DOI: 10.1039/c6cp08232g] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 01/21/2023]
Abstract
The MIEC-GBDT model can be used as a powerful tool to identify potential interference compounds in luciferase-based high-throughput screening.
Affiliation(s)
- Fu Chen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Huiyong Sun
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Hui Liu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Dan Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Youyong Li
- Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, P. R. China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- State Key Lab of CAD&CG
|
24
|
Wang C, Zhang Y. Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest. J Comput Chem 2016; 38:169-177. [PMID: 27859414 DOI: 10.1002/jcc.24667] [Citation(s) in RCA: 160] [Impact Index Per Article: 20.0] [Received: 07/04/2016] [Revised: 09/06/2016] [Accepted: 10/26/2016] [Indexed: 12/16/2022]
Abstract
The development of new protein-ligand scoring functions using machine learning algorithms, such as random forest, has been of significant interest. By efficiently utilizing expanded feature sets and a large set of experimental data, random forest based scoring functions (RFbScore) can achieve better correlations to experimental protein-ligand binding data with known crystal structures; however, more extensive tests indicate that such enhancement in scoring power comes with significant under-performance in docking and screening power tests compared to traditional scoring functions. In this work, to improve scoring-docking-screening powers of protein-ligand docking functions simultaneously, we have introduced a Δvina RF parameterization and feature selection framework based on random forest. Our developed scoring function Δvina RF20, which employs 20 descriptors in addition to the AutoDock Vina score, can achieve superior performance in all power tests of both CASF-2013 and CASF-2007 benchmarks compared to classical scoring functions. The Δvina RF20 scoring function and its code are freely available on the web at: https://www.nyu.edu/projects/yzhang/DeltaVina. © 2016 Wiley Periodicals, Inc.
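One way to read a "Δ" scoring function is as a baseline docking score plus a learned correction trained on the residual between experiment and baseline. The abstract does not spell out the functional form, so that reading is an assumption. The sketch below also swaps the random forest for a one-descriptor least-squares line to stay dependency-free; the function names and all numbers are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept (stand-in for a random forest)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


# Toy training data (hypothetical values, all in the same affinity units).
baseline = [5.0, 6.0, 7.0, 8.0]        # baseline docking score per complex
descriptor = [0.0, 1.0, 2.0, 3.0]      # a single extra descriptor per complex
experimental = [5.5, 7.0, 8.5, 10.0]   # measured affinity per complex

# The correction model is trained on what the baseline gets wrong.
residual = [e - b for e, b in zip(experimental, baseline)]
slope, intercept = fit_line(descriptor, residual)


def corrected_score(baseline_score, descriptor_value):
    """Baseline score plus the learned correction term."""
    return baseline_score + slope * descriptor_value + intercept


prediction = corrected_score(6.0, 1.0)
```

Keeping the baseline term explicit means the learned part only has to model the baseline's error, which is one plausible motivation for this kind of design.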
Affiliation(s)
- Cheng Wang
- Department of Chemistry, New York University, New York, New York, 10003
| | - Yingkai Zhang
- Department of Chemistry, New York University, New York, New York, 10003.,NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
|
25
|
Bjerrum EJ. Machine learning optimization of cross docking accuracy. Comput Biol Chem 2016; 62:133-44. [DOI: 10.1016/j.compbiolchem.2016.04.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 01/11/2016] [Revised: 04/08/2016] [Accepted: 04/09/2016] [Indexed: 12/13/2022]
|
26
|
Sun H, Pan P, Tian S, Xu L, Kong X, Li Y, Li D, Hou T. Constructing and Validating High-Performance MIEC-SVM Models in Virtual Screening for Kinases: A Better Way for Actives Discovery. Sci Rep 2016; 6:24817. [PMID: 27102549 PMCID: PMC4840416 DOI: 10.1038/srep24817] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 10/19/2015] [Accepted: 04/06/2016] [Indexed: 01/23/2023] Open
Abstract
The MIEC-SVM approach, which combines molecular interaction energy components (MIEC) derived from free energy decomposition and support vector machine (SVM), has been found effective in capturing the energetic patterns of protein-peptide recognition. However, the performance of this approach in identifying small molecule inhibitors of drug targets has not been well assessed and validated by experiments. Here, by combining different model construction protocols, the issues related to developing the best MIEC-SVM models were first examined for three kinase targets (ABL, ALK, and BRAF). For the investigated targets, the optimized MIEC-SVM models performed much better than the models based on the default SVM parameters and Autodock on the tested datasets. Then, the proposed strategy was used to screen the Specs database for potential inhibitors of the ALK kinase. The experimental results showed that the optimized MIEC-SVM model identified 7 actives with IC50 < 10 μM from 50 purchased compounds (a hit rate of 14%, with 4 at the nM level) and performed much better than Autodock (3 actives with IC50 < 10 μM from 50 purchased compounds, a hit rate of 6%, with 2 at the nM level), suggesting that the proposed strategy is a powerful tool in structure-based virtual screening.
Affiliation(s)
- Huiyong Sun
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Peichen Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Sheng Tian
- Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, P. R. China
| | - Lei Xu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Xiaotian Kong
- Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, P. R. China
| | - Youyong Li
- Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, P. R. China
| | - Dan Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- State Key Lab of CAD&CG, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
|
27
|
Li N, Ainsworth RI, Wu M, Ding B, Wang W. MIEC-SVM: automated pipeline for protein peptide/ligand interaction prediction. Bioinformatics 2016; 32:940-2. [PMID: 26568623 PMCID: PMC4907390 DOI: 10.1093/bioinformatics/btv666] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/09/2015] [Revised: 10/13/2015] [Accepted: 11/07/2015] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: MIEC-SVM is a structure-based method for predicting protein recognition specificity. Here, we present an automated MIEC-SVM pipeline providing an integrated and user-friendly workflow for construction and application of the MIEC-SVM models. This pipeline can handle standard amino acids and those with post-translational modifications (PTMs) or small molecules. Moreover, multi-threading and support for Sun Grid Engine (SGE) are implemented to significantly boost the computational efficiency. AVAILABILITY AND IMPLEMENTATION: The program is available at http://wanglab.ucsd.edu/MIEC-SVM. CONTACT: wei-wang@ucsd.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nan Li
- Department of Chemistry and Biochemistry, UC, San Diego, La Jolla, CA 92093-0359 USA
| | - Richard I Ainsworth
- Department of Chemistry and Biochemistry, UC, San Diego, La Jolla, CA 92093-0359 USA
| | - Meixin Wu
- Department of Chemistry and Biochemistry, UC, San Diego, La Jolla, CA 92093-0359 USA
| | - Bo Ding
- Department of Chemistry and Biochemistry, UC, San Diego, La Jolla, CA 92093-0359 USA
| | - Wei Wang
- Department of Chemistry and Biochemistry, UC, San Diego, La Jolla, CA 92093-0359 USA
|
28
|
Ain QU, Aleksandrova A, Roessler FD, Ballester PJ. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdiscip Rev Comput Mol Sci 2015; 5:405-424. [PMID: 27110292 PMCID: PMC4832270 DOI: 10.1002/wcms.1225] [Citation(s) in RCA: 186] [Impact Index Per Article: 20.7] [Received: 06/03/2015] [Revised: 07/17/2015] [Accepted: 07/18/2015] [Indexed: 12/29/2022]
Abstract
Docking tools to predict whether and how a small molecule binds to a target can be applied if a structural model of such target is available. The reliability of docking depends, however, on the accuracy of the adopted scoring function (SF). Despite intense research over the years, improving the accuracy of SFs for structure-based binding affinity prediction or virtual screening has proven to be a challenging task for any class of method. New SFs based on modern machine-learning regression models, which do not impose a predetermined functional form and thus are able to exploit effectively much larger amounts of experimental data, have recently been introduced. These machine-learning SFs have been shown to outperform a wide range of classical SFs at both binding affinity prediction and virtual screening. The emerging picture from these studies is that the classical approach of using linear regression with a small number of expert-selected structural features can be strongly improved by a machine-learning approach based on nonlinear regression allied with comprehensive data-driven feature selection. Furthermore, the performance of classical SFs does not grow with larger training datasets and hence this performance gap is expected to widen as more training data becomes available in the future. Other topics covered in this review include predicting the reliability of a SF on a particular target class, generating synthetic data to improve predictive performance and modeling guidelines for SF development.
Affiliation(s)
- Qurrat Ul Ain
- Department of Chemistry, Centre for Molecular Informatics University of Cambridge Cambridge UK
| | | | - Florian D Roessler
- Department of Chemistry, Centre for Molecular Informatics University of Cambridge Cambridge UK
| | - Pedro J Ballester
- Cancer Research Center of Marseille, (INSERM U1068, Institut Paoli-Calmettes, Aix-Marseille Université, CNRS UMR7258) Marseille France
|
29
|
Li N, Ainsworth RI, Ding B, Hou T, Wang W. Using Hierarchical Virtual Screening To Combat Drug Resistance of the HIV-1 Protease. J Chem Inf Model 2015; 55:1400-12. [DOI: 10.1021/acs.jcim.5b00056] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Affiliation(s)
- Nan Li
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0359, United States
| | - Richard I. Ainsworth
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0359, United States
| | - Bo Ding
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0359, United States
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
| | - Wei Wang
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0359, United States
|
30
|
Li H, Leung KS, Wong MH, Ballester PJ. Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest. Molecules 2015; 20:10947-62. [PMID: 26076113 PMCID: PMC6272292 DOI: 10.3390/molecules200610947] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 03/13/2015] [Revised: 06/04/2015] [Accepted: 06/09/2015] [Indexed: 12/17/2022] Open
Abstract
Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental for its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to which extent training a scoring function with data containing low-quality structural and binding data is detrimental for predictive performance. We actually found that low-quality data is not only non-detrimental, but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is less important than that coming from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting to a smaller set of higher data quality.
Affiliation(s)
- Hongjian Li
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong.
| | - Kwong-Sak Leung
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong.
| | - Man-Hon Wong
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong.
| | - Pedro J Ballester
- Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France.
|
31
|
Lionta E, Spyrou G, Vassilatis DK, Cournia Z. Structure-based virtual screening for drug discovery: principles, applications and recent advances. Curr Top Med Chem 2015; 14:1923-38. [PMID: 25262799 PMCID: PMC4443793 DOI: 10.2174/1568026614666140929124445] [Citation(s) in RCA: 513] [Impact Index Per Article: 57.0] [Received: 08/27/2013] [Revised: 01/01/2014] [Accepted: 02/18/2014] [Indexed: 02/06/2023]
Abstract
Structure-based drug discovery (SBDD) is becoming an essential tool in assisting fast and cost-efficient lead discovery and optimization. The application of rational, structure-based drug design is proven to be more efficient than the traditional way of drug discovery since it aims to understand the molecular basis of a disease and utilizes the knowledge of the three-dimensional structure of the biological target in the process. In this review, we focus on the principles and applications of Virtual Screening (VS) within the context of SBDD and examine different procedures ranging from the initial stages of the process that include receptor and library pre-processing, to docking, scoring and post-processing of top-scoring hits. Recent improvements in structure-based virtual screening (SBVS) efficiency through ensemble docking, induced fit and consensus docking are also discussed. The review highlights advances in the field within the framework of several success studies that have led to nM inhibition directly from VS and provides recent trends in library design as well as discusses limitations of the method. Applications of SBVS in the design of substrates for engineered proteins that enable the discovery of new metabolic and signal transduction pathways and the design of inhibitors of multifunctional proteins are also reviewed. Finally, we contribute two promising VS protocols recently developed by us that aim to increase inhibitor selectivity. In the first protocol, we describe the discovery of micromolar inhibitors through SBVS designed to inhibit the mutant H1047R PI3Kα kinase. Second, we discuss a strategy for the identification of selective binders for the RXRα nuclear receptor. In this protocol, a set of target structures is constructed for ensemble docking based on binding site shape characterization and clustering, aiming to enhance the hit rate of selective inhibitors for the desired protein target through the SBVS process.
Affiliation(s)
- Zoe Cournia
- Biomedical Research Foundation of the Academy of Athens, 4 Soranou Ephessiou, 11527 Athens, Greece.
| |
|
32
|
Yuriev E, Holien J, Ramsland PA. Improvements, trends, and new ideas in molecular docking: 2012-2013 in review. J Mol Recognit 2015; 28:581-604. [PMID: 25808539 DOI: 10.1002/jmr.2471] [Citation(s) in RCA: 159] [Impact Index Per Article: 17.7] [Received: 11/23/2014] [Revised: 01/16/2015] [Accepted: 02/05/2015] [Indexed: 12/11/2022]
Abstract
Molecular docking is a computational method for predicting the placement of ligands in the binding sites of their receptor(s). In this review, we discuss the methodological developments that occurred in the docking field in 2012 and 2013, with a particular focus on the more difficult aspects of this computational discipline. The main challenges and therefore focal points for developments in docking, covered in this review, are receptor flexibility, solvation, scoring, and virtual screening. We specifically deal with such aspects of molecular docking and its applications as selection criteria for constructing receptor ensembles, target dependence of scoring functions, integration of higher-level theory into scoring, implicit and explicit handling of solvation in the binding process, and comparison and evaluation of docking and scoring methods.
Affiliation(s)
- Elizabeth Yuriev
- Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, 3052, Australia
| | - Jessica Holien
- ACRF Rational Drug Discovery Centre and Structural Biology Laboratory, St. Vincent's Institute of Medical Research, Fitzroy, Victoria, 3065, Australia
| | - Paul A Ramsland
- Centre for Biomedical Research, Burnet Institute, Melbourne, Victoria, 3004, Australia; Department of Surgery Austin Health, University of Melbourne, Melbourne, Victoria, 3084, Australia; Department of Immunology, Monash University, Alfred Medical Research and Education Precinct, Melbourne, Victoria, 3004, Australia; School of Biomedical Sciences, CHIRI Biosciences, Curtin University, Perth, Western Australia, 6845, Australia
| |
|
33
|
Li H, Leung KS, Wong MH, Ballester PJ. Improving AutoDock Vina Using Random Forest: The Growing Accuracy of Binding Affinity Prediction by the Effective Exploitation of Larger Data Sets. Mol Inform 2015; 34:115-26. [PMID: 27490034 DOI: 10.1002/minf.201400132] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Received: 09/23/2014] [Accepted: 12/06/2014] [Indexed: 12/28/2022]
Abstract
There is a growing body of evidence showing that machine learning regression results in more accurate structure-based prediction of protein-ligand binding affinity. Docking methods that aim at optimizing the affinity of ligands for a target rely on how accurate their predicted ranking is. However, despite their proven advantages, machine-learning scoring functions are still not widely applied. This seems to be due to insufficient understanding of their properties and the lack of user-friendly software implementing them. Here we present a study where the accuracy of AutoDock Vina, arguably the most commonly-used docking software, is strongly improved by following a machine learning approach. We also analyse the factors that are responsible for this improvement and their generality. Most importantly, with the help of a proposed benchmark, we demonstrate that this improvement will be larger as more data becomes available for training Random Forest models, as regression models implying additive functional forms do not improve with more training data. We discuss how the latter opens the door to new opportunities in scoring function development. In order to facilitate the translation of this advance to enhance structure-based molecular design, we provide software to directly re-score Vina-generated poses and thus strongly improve their predicted binding affinity. The software is available at http://istar.cse.cuhk.edu.hk/rf-score-3.tgz and http://crcm.marseille.inserm.fr/fileadmin/rf-score-3.tgz.
Affiliation(s)
- Hongjian Li
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Kwong-Sak Leung
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Man-Hon Wong
- Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Pedro J Ballester
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK; Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France, Institut Paoli-Calmettes, F-13009 Marseille, France, Aix-Marseille Université, F-13284 Marseille, France, CNRS UMR7258, F-13009 Marseille, France.
| |
|
34
|
Li H, Leung K, Wong M, Ballester PJ. The Use of Random Forest to Predict Binding Affinity in Docking. Bioinformatics and Biomedical Engineering 2015. [DOI: 10.1007/978-3-319-16480-9_24] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/12/2022]
|
35
|
Ding B, Li N, Wang W. Characterizing Binding of Small Molecules. II. Evaluating the Potency of Small Molecules to Combat Resistance Based on Docking Structures. J Chem Inf Model 2013; 53:1213-22. [DOI: 10.1021/ci400011c] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 01/02/2023]
Affiliation(s)
- Bo Ding
- Department of Chemistry and Biochemistry, Department of Cellular and Molecular Medicine, UCSD, La Jolla, California 92093-0359, United States
| | - Nan Li
- Department of Chemistry and Biochemistry, Department of Cellular and Molecular Medicine, UCSD, La Jolla, California 92093-0359, United States
| | - Wei Wang
- Department of Chemistry and Biochemistry, Department of Cellular and Molecular Medicine, UCSD, La Jolla, California 92093-0359, United States
| |
|