1
|
Moshawih S, Bu ZH, Goh HP, Kifli N, Lee LH, Goh KW, Ming LC. Consensus holistic virtual screening for drug discovery: a novel machine learning model approach. J Cheminform 2024; 16:62. [PMID: 38807196 PMCID: PMC11134635 DOI: 10.1186/s13321-024-00855-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 05/10/2024] [Indexed: 05/30/2024] Open
Abstract
In drug discovery, virtual screening is crucial for identifying potential hit compounds. This study aims to present a novel pipeline that employs machine learning models to amalgamate various conventional screening methods. A diverse array of protein targets was selected, and their corresponding datasets were subjected to active/decoy distribution analysis prior to scoring using four distinct methods: QSAR, pharmacophore, docking, and 2D shape similarity, which were ultimately integrated into a single consensus score. The fine-tuned machine learning models were ranked using the novel formula "w_new", consensus scores were calculated, and an enrichment study was performed for each target. Distinctively, consensus scoring outperformed other methods in specific protein targets such as PPARG and DPP4, achieving AUC values of 0.90 and 0.84, respectively. Remarkably, this approach consistently prioritized compounds with higher experimental pIC50 values compared to all other screening methodologies. Moreover, the models demonstrated a range of moderate to high performance in terms of R² values during external validation. In conclusion, this novel workflow consistently delivered superior results, emphasizing the significance of a holistic approach in drug discovery, where both quantitative metrics and active enrichment play pivotal roles in identifying the best virtual screening methodology. Scientific contribution: We presented a novel consensus scoring workflow in virtual screening, merging diverse methods for enhanced compound selection. We also introduced "w_new", a metric that refines machine learning model rankings by weighing various model-specific parameters, with potential use in drug discovery as well as other domains.
Affiliation(s)
- Said Moshawih
  - PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
  - Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia
- Zhen Hui Bu
  - Faculty of Computing and Engineering, Quest International University, Ipoh, Malaysia
- Hui Poh Goh
  - PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Nurolaini Kifli
  - PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Lam Hong Lee
  - Faculty of Computing and Engineering, Quest International University, Ipoh, Malaysia
- Khang Wen Goh
  - Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia
- Long Chiau Ming
  - PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
  - School of Medical and Life Sciences, Sunway University, Sunway City, Malaysia
|
2
|
Azevedo PHRDA, Peçanha BRDB, Flores-Junior LAP, Alves TF, Dias LRS, Muri EMF, Lima CHDS. In silico drug repurposing by combining machine learning classification model and molecular dynamics to identify a potential OGT inhibitor. J Biomol Struct Dyn 2024; 42:1417-1428. [PMID: 37054524 DOI: 10.1080/07391102.2023.2199868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Accepted: 04/01/2023] [Indexed: 04/15/2023]
Abstract
O-linked N-acetylglucosamine (O-GlcNAc) is a unique intracellular post-translational glycosylation at the hydroxyl group of serine or threonine residues in nuclear, cytoplasmic and mitochondrial proteins. The enzyme O-GlcNAc transferase (OGT) is responsible for adding GlcNAc, and anomalies in this process can lead to the development of diseases associated with metabolic imbalance, such as diabetes and cancer. Repurposing approved drugs can be an attractive tool to discover new targets, reducing time and costs in drug design. This work focuses on drug repurposing to OGT targets by virtual screening of FDA-approved drugs through consensus machine learning (ML) models from an imbalanced dataset. We developed a classification model using docking scores and ligand descriptors. The SMOTE approach to resampling the dataset showed excellent statistical values in five of the seven ML algorithms used to create models from the training set, with sensitivity, specificity and accuracy over 90% and a Matthews correlation coefficient greater than 0.8. The pose analysis obtained by molecular docking showed only H-bond interaction with the OGT C-Cat domain. The molecular dynamics simulation showed that the lack of H-bond interactions with the C- and N-catalytic domains allowed the drug to exit the binding site. Our results showed that the non-steroidal anti-inflammatory celecoxib could be a potential OGT inhibitor.
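The four statistics quoted in this abstract (sensitivity, specificity, accuracy and the Matthews correlation coefficient) all follow from a binary confusion matrix. The block below is a minimal sketch for orientation only, not the authors' code; the function name `classification_metrics` and the counts in the usage line are hypothetical, and it assumes both classes are non-empty.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes at least one true positive-class and one true negative-class sample.
    """
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
print(classification_metrics(tp=90, fp=5, tn=95, fn=10))
```

Unlike accuracy, the Matthews coefficient uses all four cells of the matrix, which is why it is often reported alongside resampling methods such as SMOTE on imbalanced data.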
Affiliation(s)
- Tatiana Fialho Alves
  - Laboratório de Química Medicinal, Faculdade de Farmácia, Universidade Federal Fluminense, Niterói, RJ, Brazil
- Luiza Rosaria Sousa Dias
  - Laboratório de Química Medicinal, Faculdade de Farmácia, Universidade Federal Fluminense, Niterói, RJ, Brazil
- Estela Maris Freitas Muri
  - Laboratório de Química Medicinal, Faculdade de Farmácia, Universidade Federal Fluminense, Niterói, RJ, Brazil
|
3
|
Khan M, Kandwal S, Fayne D. DataPype: A Fully Automated Unified Software Platform for Computer-Aided Drug Design. ACS Omega 2023; 8:39468-39480. [PMID: 37901539 PMCID: PMC10601415 DOI: 10.1021/acsomega.3c05207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 09/26/2023] [Indexed: 10/31/2023]
Abstract
With the advent of computer-aided drug design (CADD), traditional physical testing of thousands of molecules has now been replaced by target-focused drug discovery, where potentially bioactive molecules are predicted by computer software before their physical synthesis. However, despite being a significant breakthrough, CADD still faces various limitations and challenges. The increasing availability of data on small molecules has created a need to streamline the sourcing of data from different databases and automate the processing and cleaning of data into a form that can be used by multiple CADD software applications. Several standalone software packages are available to aid the drug designer, each with its own specific application, requiring specialized knowledge and expertise for optimal use. These applications require their own input and output files, making it a challenge for nonexpert users or multidisciplinary discovery teams. Here, we have developed a new software platform called DataPype, which wraps around these different software packages. It provides a unified automated workflow to search for hit compounds using specialist software. Additionally, multiple virtual screening packages can be used in the one workflow, and if different ways of looking at potential hit compounds all predict the same set of molecules, we have higher confidence that we should make or purchase and test the molecules. Importantly, DataPype can run on computer servers, speeding up the virtual screening for new compounds. Combining access to multiple CADD tools within one interface will enhance the early stage of drug discovery, increase usability, and enable the use of parallel computing.
Affiliation(s)
- Mohemmed Faraz Khan
  - Molecular Design Group, School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin 2, Ireland
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Integral University, Lucknow U.P., 226026, India
- Shubhangi Kandwal
  - Molecular Design Group, School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin 2, Ireland
- Darren Fayne
  - Molecular Design Group, School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin 2, Ireland
|
4
|
Nhat Phuong D, Flower DR, Chattopadhyay S, Chattopadhyay AK. Towards Effective Consensus Scoring in Structure-Based Virtual Screening. Interdiscip Sci 2023; 15:131-145. [PMID: 36550341 PMCID: PMC9941253 DOI: 10.1007/s12539-022-00546-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2022] [Revised: 12/11/2022] [Accepted: 12/12/2022] [Indexed: 12/24/2022]
Abstract
Virtual screening (VS) is a computational strategy that uses in silico automated protein docking inter alia to rank potential ligands, or by extension rank protein-ligand pairs, identifying potential drug candidates. Most docking methods use preferred sets of physicochemical descriptors (PCDs) to model the interactions between host and guest molecules. Thus, conventional VS is often data-specific, method-dependent and with demonstrably differing utility in identifying candidate drugs. This study proposes four universality classes of novel consensus scoring (CS) algorithms that combine docking scores, derived from ten docking programs (ADFR, DOCK, Gemdock, Ledock, PLANTS, PSOVina, QuickVina2, Smina, Autodock Vina and VinaXB), using decoys from the DUD-E repository ( http://dude.docking.org/ ) against 29 MRSA-oriented targets to create a general VS formulation that can identify active ligands for any suitable protein target. Our results demonstrate that CS provides improved ligand-protein docking fidelity when compared to individual docking platforms. This approach requires only a small number of docking combinations and can serve as a viable and parsimonious alternative to more computationally expensive docking approaches. Predictions from our CS algorithm are compared against independent machine learning evaluations using the same docking data, complementing the CS outcomes. Our method is a reliable approach for identifying protein targets and high-affinity ligands that can be tested as high-probability candidates for drug repositioning.
Affiliation(s)
- Do Nhat Phuong
  - Department of Mathematics, College of Engineering and Physical Sciences, Aston University, Birmingham, B4 7ET UK
- Darren R. Flower
  - Life and Health Sciences, Aston University, Birmingham, B4 7ET UK
- Amit K. Chattopadhyay
  - Department of Mathematics, College of Engineering and Physical Sciences, Aston University, Birmingham, B4 7ET UK
|
5
|
Blanes-Mira C, Fernández-Aguado P, de Andrés-López J, Fernández-Carvajal A, Ferrer-Montiel A, Fernández-Ballester G. Comprehensive Survey of Consensus Docking for High-Throughput Virtual Screening. Molecules 2022; 28:175. [PMID: 36615367 PMCID: PMC9821981 DOI: 10.3390/molecules28010175] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/29/2022] [Revised: 12/19/2022] [Accepted: 12/21/2022] [Indexed: 12/28/2022] Open
Abstract
The rapid advances of 3D techniques for the structural determination of proteins and the development of numerous computational methods and strategies have led to identifying highly active compounds in computer drug design. Molecular docking is a method widely used in high-throughput virtual screening campaigns to filter potential ligands targeted to proteins. A great variety of docking programs are currently available, which differ in the algorithms and approaches used to predict the binding mode and the affinity of the ligand. All programs heavily rely on scoring functions to accurately predict ligand binding affinity, and despite differences in performance, none of these docking programs is preferable to the others. To overcome this problem, consensus scoring methods improve the outcome of virtual screening by averaging the rank or score of individual molecules obtained from different docking programs. The successful application of consensus docking in high-throughput virtual screening highlights the need to optimize the predictive power of molecular docking methods.
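The rank-averaging form of consensus scoring mentioned in this abstract can be sketched in a few lines. This is an illustrative sketch only, not code from the surveyed paper: the function name, the program labels and the scores are hypothetical, lower (more negative) scores are assumed to be better, and every molecule is assumed to be scored by every program.

```python
def average_rank_consensus(scores_by_program):
    """Rank molecules within each docking program, then average the ranks.

    scores_by_program maps a program name to a dict of molecule -> score,
    where a lower score is assumed to mean a better predicted binder.
    Returns (molecule, mean_rank) pairs sorted from best to worst.
    """
    rank_sums = {}
    for scores in scores_by_program.values():
        # Rank 1 goes to the lowest (best) score within this program.
        ordered = sorted(scores, key=scores.get)
        for rank, molecule in enumerate(ordered, start=1):
            rank_sums[molecule] = rank_sums.get(molecule, 0) + rank
    n_programs = len(scores_by_program)
    mean_ranks = {m: total / n_programs for m, total in rank_sums.items()}
    return sorted(mean_ranks.items(), key=lambda item: item[1])


# Hypothetical docking scores from two programs.
example = {
    "program_A": {"mol1": -9.0, "mol2": -7.5, "mol3": -8.0},
    "program_B": {"mol1": -8.2, "mol2": -8.5, "mol3": -6.0},
}
print(average_rank_consensus(example))
```

Averaging ranks rather than raw scores avoids having to put scoring functions with different units and ranges onto a common scale.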
|
6
|
Morris CJ, Stern JA, Stark B, Christopherson M, Della Corte D. MILCDock: Machine Learning Enhanced Consensus Docking for Virtual Screening in Drug Discovery. J Chem Inf Model 2022; 62:5342-5350. [PMID: 36342217 DOI: 10.1021/acs.jcim.2c00705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Molecular docking tools are regularly used to computationally identify new molecules in virtual screening for drug discovery. However, docking tools suffer from inaccurate scoring functions with widely varying performance on different proteins. To enable more accurate ranking of active over inactive ligands in virtual screening, we created a machine learning consensus docking tool, MILCDock, that uses predictions from five traditional molecular docking tools to predict the probability a ligand binds to a protein. MILCDock was trained and tested on data from both the DUD-E and LIT-PCBA docking datasets and shows improved performance over traditional molecular docking tools and other consensus docking methods on the DUD-E dataset. LIT-PCBA targets proved to be difficult for all methods tested. We also find that DUD-E data, although biased, can be effective in training machine learning tools if care is taken to avoid DUD-E's biases during training.
Affiliation(s)
- Connor J Morris
  - Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
- Jacob A Stern
  - Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
  - Department of Computer Science, Brigham Young University, Provo, Utah 84602, United States
- Brenden Stark
  - Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
- Max Christopherson
  - Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
- Dennis Della Corte
  - Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
|
7
|
Liu XQ, Yi YJ, Kong Y, Yu P, Zhao LG, Li DD. Consensus scoring model: A novel approach to the study of EGFR kinase inhibitors. Chem Phys Lett 2022. [DOI: 10.1016/j.cplett.2022.139650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
|
8
|
McGibbon M, Money-Kyrle S, Blay V, Houston DR. SCORCH: Improving structure-based virtual screening with machine learning classifiers, data augmentation, and uncertainty estimation. J Adv Res 2022; 46:135-147. [PMID: 35901959 PMCID: PMC10105235 DOI: 10.1016/j.jare.2022.07.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/05/2022] [Revised: 07/08/2022] [Accepted: 07/09/2022] [Indexed: 11/17/2022] Open
Abstract
INTRODUCTION: The discovery of a new drug is a costly and lengthy endeavour. The computational prediction of which small molecules can bind to a protein target can accelerate this process if the predictions are fast and accurate enough. Recent machine-learning scoring functions re-evaluate the output of molecular docking to achieve more accurate predictions. However, previous scoring functions were trained on crystallised protein-ligand complexes and datasets of decoys. The limited availability of crystal structures and biases in the decoy datasets can lower the performance of scoring functions. OBJECTIVES: To address key limitations of previous scoring functions and thus improve the predictive performance of structure-based virtual screening. METHODS: A novel machine-learning scoring function was created, named SCORCH (Scoring COnsensus for RMSD-based Classification of Hits). To develop SCORCH, training data is augmented by considering multiple ligand poses and labelling poses based on their RMSD from the native pose. Decoy bias is addressed by generating property-matched decoys for each ligand and using the same methodology for preparing and docking decoys and ligands. A consensus of 3 different machine learning approaches is also used to improve performance. RESULTS: We find that multi-pose augmentation in SCORCH improves its docking power and screening power on independent benchmark datasets. SCORCH outperforms an equivalent scoring function trained on single poses, with a 1% enrichment factor (EF) of 13.78 vs. 10.86 on 18 DEKOIS 2.0 targets and a mean native pose rank of 5.9 vs. 30.4 on CSAR 2014. Additionally, SCORCH outperforms widely used scoring functions in virtual screening and pose prediction on independent benchmark datasets. CONCLUSION: By rationally addressing key limitations of previous scoring functions, SCORCH improves the performance of virtual screening. SCORCH also provides an estimate of its uncertainty, which can help reduce the cost and time required for drug discovery.
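The enrichment factor (EF) reported here compares the hit rate in the top-ranked fraction of a screened library with the hit rate in the whole library. The sketch below uses the common textbook definition and is not taken from the SCORCH code base; the function name, the convention that higher scores rank first, and the rounding rule for the cutoff are assumptions.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction of a ranked library.

    scores: predicted scores, higher assumed to mean more likely active.
    labels: 1 for active, 0 for decoy/inactive, aligned with scores.
    EF = (actives in top fraction / size of top fraction)
         / (total actives / library size)
    """
    n = len(scores)
    total_actives = sum(labels)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    # Assumed convention: keep at least one compound in the top fraction.
    n_top = max(1, int(round(n * fraction)))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_in_top = sum(labels[i] for i in order[:n_top])
    return (hits_in_top / n_top) / (total_actives / n)


# Hypothetical library: 100 compounds, 10 actives, 5 of them ranked in the top 10.
scores = list(range(100, 0, -1))
labels = [1 if i < 5 or 50 <= i < 55 else 0 for i in range(100)]
print(enrichment_factor(scores, labels, fraction=0.10))
```

An EF of 1 corresponds to random selection, and the maximum attainable value depends on both the cutoff and the proportion of actives in the library.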
Affiliation(s)
- Miles McGibbon
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
- Sam Money-Kyrle
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
- Vincent Blay
  - Department of Microbiology and Environmental Toxicology, University of California at Santa Cruz, Santa Cruz, CA 95064, USA
  - Institute for Integrative Systems Biology (I2SysBio), Universitat de València and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Douglas R Houston
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, University of Edinburgh, Edinburgh, Scotland EH9 3BF, UK
|
9
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Frontiers in Bioinformatics 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Garrett M. Morris
  - Department of Statistics, University of Oxford, Oxford, United Kingdom
- Philip C. Biggin
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
10
|
Yau MQ, Loo JSE. Consensus scoring evaluated using the GPCR-Bench dataset: Reconsidering the role of MM/GBSA. J Comput Aided Mol Des 2022; 36:427-441. [PMID: 35581483 DOI: 10.1007/s10822-022-00456-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/14/2021] [Accepted: 04/28/2022] [Indexed: 01/09/2023]
Abstract
The recent availability of large numbers of GPCR crystal structures has provided an unprecedented opportunity to evaluate their performance in virtual screening protocols using established benchmarking datasets. In this study, we evaluated the ability of MM/GBSA in consensus scoring-based virtual screening enrichment together with nine classical scoring functions, using the GPCR-Bench dataset consisting of 24 GPCR crystal structures and 254,646 actives and decoys. While the performance of consensus scoring was modest overall, combinations which included MM/GBSA performed relatively well compared to combinations of classical scoring functions. Combinations of MM/GBSA and good-performing scoring functions provided the highest proportion of improvements, with improvements observed in 32% and 19% of all combinations across all targets at the EF1% and EF5% levels respectively. Combinations of MM/GBSA and poor-performing scoring functions still outperformed classical scoring functions, with improvements observed in 26% and 17% of all combinations at the EF1% and EF5% levels. In comparison, only 14-22% and 6-11% of combinations of classical scoring functions produced improvements at EF1% and EF5% respectively. Efforts to improve performance by increasing the number of scoring functions in consensus scoring to three were mostly ineffective. We also observed that consensus scoring performed better for individual scoring functions possessing initially low enrichment factors, potentially implying their benefits are more relevant in such scenarios. Overall, this study demonstrated the first implementation of MM/GBSA in consensus scoring using the GPCR-Bench dataset and could provide a valuable benchmark of the performance of MM/GBSA in comparison to classical scoring functions in consensus scoring for GPCRs.
Affiliation(s)
- Mei Qian Yau
  - Centre for Drug Discovery and Molecular Pharmacology, Faculty of Health and Medical Sciences, Taylor's University, No. 1 Jalan Taylor's, 47500, Subang Jaya, Selangor, Malaysia
  - School of Pharmacy, Faculty of Health and Medical Sciences, Taylor's University, No. 1 Jalan Taylor's, 47500, Subang Jaya, Selangor, Malaysia
- Jason S E Loo
  - Centre for Drug Discovery and Molecular Pharmacology, Faculty of Health and Medical Sciences, Taylor's University, No. 1 Jalan Taylor's, 47500, Subang Jaya, Selangor, Malaysia
  - School of Pharmacy, Faculty of Health and Medical Sciences, Taylor's University, No. 1 Jalan Taylor's, 47500, Subang Jaya, Selangor, Malaysia
|
11
|
Tarín-Pelló A, Suay-García B, Pérez-Gracia MT. Antibiotic resistant bacteria: current situation and treatment options to accelerate the development of a new antimicrobial arsenal. Expert Rev Anti Infect Ther 2022; 20:1095-1108. [PMID: 35576494 DOI: 10.1080/14787210.2022.2078308] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Indexed: 12/11/2022]
Abstract
INTRODUCTION: Antibiotic resistance is one of the biggest public health threats worldwide. Currently, antibiotic-resistant bacteria kill 700,000 people every year. These data represent the near future in which we find ourselves, a "post-antibiotic era" where the identification and development of new treatments are key. This review is focused on the current and emerging antimicrobial therapies which can solve this global threat. AREAS COVERED: Through a literature search using databases such as Medline and Web of Science, and search engines such as Google Scholar, different antimicrobial therapies were analyzed, including pathogen-oriented therapy, phagotherapy, microbiota and antivirulent therapy. Additionally, the development pathways of new antibiotics were described, emphasizing the potential advantages that the combination of a drug repurposing strategy with the application of mathematical prediction models could bring to solve the problem of AMRs. EXPERT OPINION: This review offers several starting points to solve a single problem: reducing the number of AMR. The data suggest that the strategies described could provide many benefits to improve antimicrobial treatments. However, the development of new antimicrobials remains necessary. Drug repurposing, with the application of mathematical prediction models, is considered to be of interest due to its rapid and effective potential to increase the current therapeutic arsenal.
Affiliation(s)
- Antonio Tarín-Pelló
  - Área de Microbiología, Departamento de Farmacia, Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud
- Beatriz Suay-García
  - ESI International Chair@CEU-UCH, Departamento de Matemáticas, Física y Ciencias Tecnológicas, Universidad Cardenal Herrera-CEU, CEU Universities, C/ Santiago Ramón y Cajal, 46115 Alfara del Patriarca, Valencia, Spain
- María-Teresa Pérez-Gracia
  - Área de Microbiología, Departamento de Farmacia, Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud
|
12
|
Shimazaki T, Tachikawa M. Collaborative Approach between Explainable Artificial Intelligence and Simplified Chemical Interactions to Explore Active Ligands for Cyclin-Dependent Kinase 2. ACS Omega 2022; 7:10372-10381. [PMID: 35382271 PMCID: PMC8973106 DOI: 10.1021/acsomega.1c06976] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/14/2021] [Accepted: 03/09/2022] [Indexed: 05/13/2023]
Abstract
To improve virtual screening for drug discovery, we present a collaborative approach between explainable artificial intelligence (AI) and simplified chemical interaction scores to efficiently search for active ligands bound to the target receptor. In particular, we focus on cyclin-dependent kinase 2 (CDK2), which is well known as a cancer target protein. Docking simulation alone is insufficient to distinguish active ligands from decoy molecules. To identify active ligands, in this paper, machine learning is employed together with scoring functions that simplify the screened Coulomb and Lennard-Jones interactions between the ligands and residues of the target receptor. We demonstrate that these simplified interaction scores can significantly improve the classification ability of machine learning models. We also demonstrate that explainable AI together with the simplified scoring method can highlight the important residues of CDK2 for recognizing active ligands.
Affiliation(s)
- Tomomi Shimazaki
  - Graduate School of Nanobioscience, Yokohama City University, 22-2 Seto, Yokohama, Kanagawa 236-0027, Japan
- Masanori Tachikawa
  - Graduate School of Data Science, Yokohama City University, 22-2 Seto, Yokohama, Kanagawa 236-0027, Japan
|
13
|
Zhang W, Huang J. EViS: An Enhanced Virtual Screening Approach Based on Pocket-Ligand Similarity. J Chem Inf Model 2022; 62:498-510. [PMID: 35084171 DOI: 10.1021/acs.jcim.1c00944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Virtual screening (VS) is a popular technology in drug discovery to identify a new scaffold of actives for a specific drug target, which can be classified into ligand-based and structure-based approaches. As the number of protein-ligand complex structures available in public databases increases, it would be possible to develop a template searching-based VS approach that utilizes such information. In this work, we proposed an enhanced VS approach, which is termed EViS, to integrate ligand docking, protein pocket template searching, and ligand template shape similarity calculation. A novel and simple PL-score to characterize local pocket-ligand template similarity was used to evaluate the screening compounds. Benchmark tests were performed on three datasets including DUDE, LIT-PCBA, and DEKOIS. EViS achieved the average enrichment factors (EFs) of 27.8 and 23.4 at a 1% cutoff for experimental and predicted structures on the widely used DUDE dataset, respectively. Detailed data analysis shows that EViS benefits from obtaining favorable ligand poses from docking and using such ligand geometric information to perform three-dimensional (3D) ligand similarity calculations, and the PL-score is efficient to screen compounds based on template searching in the protein-ligand structure database.
Affiliation(s)
- Wenyi Zhang
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
  - Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
- Jing Huang
  - Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
  - Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, Zhejiang 310024, China
|
14
|
Can docking scoring functions guarantee success in virtual screening? Virtual Screening and Drug Docking 2022. [DOI: 10.1016/bs.armc.2022.08.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
15
|
Abdulhakeem Mansour Alhasbary A, Hashimah Ahamed Hassain Malim N. Turbo Similarity Searching: Effect of Partial Ranking and Fusion Rules on ChEMBL Database. Mol Inform 2021; 41:e2100106. [PMID: 34878229 DOI: 10.1002/minf.202100106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2021] [Accepted: 11/25/2021] [Indexed: 11/08/2022]
Abstract
Turbo Similarity Searching (TSS) is the simplest and most recent chemical similarity searching (SS) approach, which improves the effectiveness of SS by performing a multi-target searching. TSS has four important elements, namely structural representation, similarity coefficient, number of nearest neighbours (NNs), and fusion rule, and any changes in these elements could affect the TSS results. A previous study suggested the advantage of using large numbers of reference compounds with small fractions of the database structures to obtain a better recall in group fusion. Therefore, this study aims to investigate the effect of partial ranking on TSS utilising different fusion rules and different numbers of NNs on the ChEMBL database and to evaluate whether these observations hold in TSS. Furthermore, the objective is to observe the effect of the indirect relationship feature of TSS on the partial ranking investigation. The results showed that the effect of using partial ranking on TSS was significant. This study also found that the performance of TSS improved as the database proportions used in the fusion process decreased and by using a small number of NNs. In addition, fusion rules based on reciprocal rank positions (RKP), maximum similarity score (sMAX), and sMNZ were superior to all the other fusion rules.
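The three fusion rules singled out at the end of this abstract can be illustrated on a toy example. The sketch below follows the usual data-fusion definitions (maximum similarity, sum of reciprocal rank positions, and sum of similarities multiplied by the number of lists in which the compound appears); whether the paper uses exactly these forms is an assumption, and the function name and similarity values are hypothetical. Each input dict stands for one (possibly partial) ranked list produced by one reference structure.

```python
def fuse(similarity_lists):
    """Fuse several similarity lists with the sMAX, RKP and sMNZ rules.

    similarity_lists: list of dicts mapping compound -> similarity, one dict
    per reference structure. A compound missing from a partial list simply
    contributes nothing for that list.
    """
    # Rank position of each compound inside each list (1 = most similar).
    rank_lists = []
    for sims in similarity_lists:
        ordered = sorted(sims, key=sims.get, reverse=True)
        rank_lists.append({c: pos for pos, c in enumerate(ordered, start=1)})

    fused = {}
    for compound in set().union(*similarity_lists):
        present = [sims[compound] for sims in similarity_lists if compound in sims]
        fused[compound] = {
            "sMAX": max(present),
            "RKP": sum(1.0 / r[compound] for r in rank_lists if compound in r),
            "sMNZ": sum(present) * len(present),
        }
    return fused


# Hypothetical partial lists from two reference structures.
lists = [{"a": 0.9, "b": 0.5}, {"a": 0.6, "c": 0.8}]
for compound, values in sorted(fuse(lists).items()):
    print(compound, values)
```

Compounds would then be re-ranked by whichever fused value is chosen, with larger values ranked first under all three rules.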
|
16
|
Harigua-Souiai E, Heinhane MM, Abdelkrim YZ, Souiai O, Abdeljaoued-Tej I, Guizani I. Deep Learning Algorithms Achieved Satisfactory Predictions When Trained on a Novel Collection of Anticoronavirus Molecules. Front Genet 2021; 12:744170. [PMID: 34912370 PMCID: PMC8667578 DOI: 10.3389/fgene.2021.744170] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2021] [Accepted: 09/30/2021] [Indexed: 12/26/2022] Open
Abstract
Drug discovery and repurposing against COVID-19 is a highly relevant topic with huge efforts dedicated to delivering novel therapeutics targeting SARS-CoV-2. In this context, computer-aided drug discovery is of interest in orienting the early high throughput screenings and in optimizing the hit identification rate. We herein propose a pipeline for Ligand-Based Drug Discovery (LBDD) against SARS-CoV-2. Through an extensive search of the literature and multiple steps of filtering, we integrated information on 2,610 molecules having a validated effect against SARS-CoV and/or SARS-CoV-2. The chemical structures of these molecules were encoded through multiple systems to be readily useful as input to conventional machine learning (ML) algorithms or deep learning (DL) architectures. We assessed the performances of seven ML algorithms and four DL algorithms in achieving molecule classification into two classes: active and inactive. The Random Forests (RF), Graph Convolutional Network (GCN), and Directed Acyclic Graph (DAG) models achieved the best performances. These models were further optimized through hyperparameter tuning and achieved ROC-AUC scores through cross-validation of 85, 83, and 79% for RF, GCN, and DAG models, respectively. An external validation step on the FDA-approved drugs collection revealed a superior potential of DL algorithms to achieve drug repurposing against SARS-CoV-2 based on the dataset herein presented. Namely, GCN and DAG achieved more than 50% of the true positive rate assessed on the confirmed hits of a PubChem bioassay.
Affiliation(s)
- Emna Harigua-Souiai
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Mohamed Mahmoud Heinhane
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Yosser Zina Abdelkrim
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Oussama Souiai
- Laboratory of BioInformatics BioMathematics and BioStatistics (BIMS)-LR20IPT09, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
| | - Ines Abdeljaoued-Tej
- Laboratory of BioInformatics BioMathematics and BioStatistics (BIMS)-LR20IPT09, Institut Pasteur de Tunis, University of Tunis El Manar, Tunis, Tunisia
- Engineering School of Statistics and Information Analysis, University of Carthage, Ariana, Tunisia
| | - Ikram Guizani
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| |
|
17
|
Ricci-Lopez J, Aguila SA, Gilson MK, Brizuela CA. Improving Structure-Based Virtual Screening with Ensemble Docking and Machine Learning. J Chem Inf Model 2021; 61:5362-5376. [PMID: 34652141 DOI: 10.1021/acs.jcim.1c00511] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
One of the main challenges of structure-based virtual screening (SBVS) is the incorporation of the receptor's flexibility, as its explicit representation in every docking run implies a high computational cost. Therefore, a common alternative to include the receptor's flexibility is the approach known as ensemble docking. Ensemble docking consists of using a set of receptor conformations and performing the docking assays over each of them. However, there is still no agreement on how to combine the ensemble docking results to obtain the final ligand ranking. A common choice is to use consensus strategies to aggregate the ensemble docking scores, but these strategies exhibit slight improvement regarding the single-structure approach. Here, we claim that using machine learning (ML) methodologies over the ensemble docking results could improve the predictive power of SBVS. To test this hypothesis, four proteins were selected as study cases: CDK2, FXa, EGFR, and HSP90. Protein conformational ensembles were built from crystallographic structures, whereas the evaluated compound library comprised up to three benchmarking data sets (DUD, DEKOIS 2.0, and CSAR-2012) and cocrystallized molecules. Ensemble docking results were processed through 30 repetitions of 4-fold cross-validation to train and validate two ML classifiers: logistic regression and gradient boosting trees. Our results indicate that the ML classifiers significantly outperform traditional consensus strategies and even the best performance case achieved with single-structure docking. We provide statistical evidence that supports the effectiveness of ML to improve the ensemble docking performance.
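The traditional consensus strategies that this abstract uses as a baseline can be sketched as follows. The two aggregations shown (best score and mean score across receptor conformations) are illustrative choices, not necessarily the ones evaluated in the paper, and the function names are hypothetical. The sketch assumes that a more negative docking score is better.

```python
from statistics import mean


def consensus_scores(ensemble_scores, strategy="min"):
    """Collapse per-conformation docking scores into one score per ligand.

    ensemble_scores: dict mapping ligand id -> list of docking scores, one per
    receptor conformation (more negative = better, an assumed convention).
    strategy: "min" takes the best score, "mean" averages over conformations.
    """
    aggregate = {"min": min, "mean": mean}[strategy]
    return {lig: aggregate(scores) for lig, scores in ensemble_scores.items()}


def rank_ligands(scores):
    """Return ligand ids ordered from best (lowest score) to worst."""
    return sorted(scores, key=scores.get)


docked = {
    "lig1": [-7.0, -9.0, -8.0],  # one strong pose, otherwise average
    "lig2": [-8.5, -8.5, -8.5],  # uniformly good across conformations
}
print(rank_ligands(consensus_scores(docked, "min")))
print(rank_ligands(consensus_scores(docked, "mean")))
```

The two strategies rank the same ligands differently here, which illustrates why there is no agreed way to combine ensemble docking results.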
Affiliation(s)
- Joel Ricci-Lopez
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California C.P. 22860, Mexico; Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México (UNAM), Ensenada, Baja California C.P. 22860, Mexico
| | - Sergio A Aguila
- Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México (UNAM), Ensenada, Baja California C.P. 22860, Mexico
| | - Michael K Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, La Jolla, San Diego, California 92093, United States
| | - Carlos A Brizuela
- Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California C.P. 22860, Mexico
| |
|
18
|
Discovery of inhibitors targeting protein tyrosine phosphatase 1B using a combined virtual screening approach. Mol Divers 2021; 26:2159-2174. [PMID: 34655403 DOI: 10.1007/s11030-021-10323-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2021] [Accepted: 09/21/2021] [Indexed: 10/20/2022]
Abstract
Protein tyrosine phosphatase 1B (PTP1B) acts as a therapeutic target for type 2 diabetes. However, the major challenges of PTP1B drug discovery are the poor selectivity and the weak oral bioavailability. In this study, we performed a combined virtual screening approach including multicomplex pharmacophore, molecular docking-based screening, van der Waals energy normalization, pose scaling factor, ADMET evaluation, and molecular dynamics simulation to select PTP1B inhibitors from three databases (PubChem, ChEMBL, and ZINC). We identified three potential PTP1B inhibitors, compounds 1, 4, and 5, with favorable binding energy and good oral bioavailability. The energetic and geometrical analyses show that the three compounds are stably bound to PTP1B, via occupying both the catalytic site (site A) and the proximal noncatalytic site (site B or C). Such occupancy may improve the selectivity. This work not only provided a feasible virtual screening protocol, but also suggested three potential PTP1B inhibitors for the treatment of type 2 diabetes.
|
19
|
High-resolution home location prediction from Twitter activities using consensus deep learning. SOCIAL NETWORK ANALYSIS AND MINING 2021. [DOI: 10.1007/s13278-021-00808-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
20
|
dockECR: Open consensus docking and ranking protocol for virtual screening of small molecules. J Mol Graph Model 2021; 109:108023. [PMID: 34555725 PMCID: PMC8442548 DOI: 10.1016/j.jmgm.2021.108023] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2021] [Revised: 08/26/2021] [Accepted: 09/02/2021] [Indexed: 12/17/2022]
Abstract
The development of open computational pipelines to accelerate the discovery of treatments for emerging diseases allows finding novel solutions in shorter periods of time. Consensus molecular docking is one of these approaches, and its main purpose is to increase the detection of real actives within virtual screening campaigns. Here we present dockECR, an open consensus docking and ranking protocol that implements the exponential consensus ranking method to prioritize molecular candidates. The protocol uses four open source molecular docking programs: AutoDock Vina, Smina, LeDock and rDock, to rank the molecules. In addition, we introduce a scoring strategy based on the average RMSD obtained from comparing the best poses from each single program to complement the consensus ranking with information about the predicted poses. The protocol was benchmarked using 15 relevant protein targets with known actives and decoys, and applied using the main protease of the SARS-CoV-2 virus. For the application, different crystal structures of the protease, and frames obtained from molecular dynamics simulations were used to dock a library of 79 molecules derived from previously co-crystallized fragments. The ranking obtained with dockECR was used to prioritize eight candidates, which were evaluated in terms of the interactions generated with key residues from the protease. The protocol can be implemented in any virtual screening campaign involving proteins as molecular targets. The dockECR code is publicly available at: https://github.com/rochoa85/dockECR.
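The exponential consensus ranking step named in this abstract can be sketched in a few lines. The abstract does not give the formula; the sketch assumes the form in which each molecule's score is the sum over docking programs of exp(-rank / sigma) / sigma. The function name and the `sigma` value are hypothetical, and the code is not taken from the dockECR repository.

```python
import math


def exponential_consensus_rank(rankings, sigma):
    """Combine per-program rankings into one consensus ordering.

    rankings: list of lists, each an ordering of molecule ids (best first),
              one list per docking program.
    sigma: width of the exponential; larger values let lower ranks contribute.
    """
    scores = {}
    for ordering in rankings:
        for rank, mol in enumerate(ordering, start=1):
            # Assumed ECR term: exp(-rank / sigma) / sigma, summed over programs.
            scores[mol] = scores.get(mol, 0.0) + math.exp(-rank / sigma) / sigma
    return sorted(scores, key=scores.get, reverse=True)


per_program = [
    ["m1", "m2", "m3"],
    ["m2", "m1", "m3"],
    ["m2", "m3", "m1"],
]
print(exponential_consensus_rank(per_program, sigma=1.0))
```

Because only ranks enter the score, programs with different score units or scales can be combined without normalization.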
|
21
|
Meli R, Anighoro A, Bodkin MJ, Morris GM, Biggin PC. Learning protein-ligand binding affinity with atomic environment vectors. J Cheminform 2021; 13:59. [PMID: 34391475 PMCID: PMC8364054 DOI: 10.1186/s13321-021-00536-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2021] [Accepted: 07/21/2021] [Indexed: 12/03/2022] Open
Abstract
Scoring functions for the prediction of protein-ligand binding affinity have seen renewed interest in recent years when novel machine learning and deep learning methods started to consistently outperform classical scoring functions. Here we explore the use of atomic environment vectors (AEVs) and feed-forward neural networks, the building blocks of several neural network potentials, for the prediction of protein-ligand binding affinity. The AEV-based scoring function, which we term AEScore, is shown to perform as well or better than other state-of-the-art scoring functions on binding affinity prediction, with an RMSE of 1.22 pK units and a Pearson’s correlation coefficient of 0.83 for the CASF-2016 benchmark. However, AEScore does not perform as well in docking and virtual screening tasks, for which it has not been explicitly trained. Therefore, we show that the model can be combined with the classical scoring function AutoDock Vina in the context of Δ-learning, where corrections to the AutoDock Vina scoring function are learned instead of the protein-ligand binding affinity itself. Combined with AutoDock Vina, Δ-AEScore has an RMSE of 1.32 pK units and a Pearson’s correlation coefficient of 0.80 on the CASF-2016 benchmark, while retaining the docking and screening power of the underlying classical scoring function.
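The idea of learning a correction to a baseline score, rather than the affinity itself, can be shown with a small sketch. The paper trains a neural network on atomic environment vectors; here a one-feature least-squares line stands in for that model purely for illustration. All names and numbers are hypothetical.

```python
def fit_delta(features, baseline, target):
    """Fit a line to the residuals (target - baseline) by least squares.

    features: one descriptor value per complex (stand-in for a learned model).
    baseline: classical scoring-function predictions for the same complexes.
    target:   experimental affinities in the same units as the baseline.
    """
    residuals = [t - b for t, b in zip(target, baseline)]
    n = len(features)
    mean_x = sum(features) / n
    mean_r = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(features, residuals))
    slope = sxr / sxx
    intercept = mean_r - slope * mean_x
    return slope, intercept


def predict(feature, baseline_score, model):
    """Final prediction = baseline score + learned correction."""
    slope, intercept = model
    return baseline_score + slope * feature + intercept


model = fit_delta(features=[0.0, 1.0, 2.0],
                  baseline=[5.0, 5.0, 5.0],
                  target=[6.0, 7.0, 8.0])
print(predict(3.0, 5.0, model))
```

Because the baseline score remains part of every prediction, the combined model inherits the baseline's behaviour wherever the learned correction is small.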
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, UK
| | | | | | | | - Philip C Biggin
- Department of Biochemistry, University of Oxford, Oxford, UK.
| |
|
22
|
Llanos MA, Gantner ME, Rodriguez S, Alberca LN, Bellera CL, Talevi A, Gavernet L. Strengths and Weaknesses of Docking Simulations in the SARS-CoV-2 Era: the Main Protease (Mpro) Case Study. J Chem Inf Model 2021; 61:3758-3770. [PMID: 34313128 DOI: 10.1021/acs.jcim.1c00404] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
The scientific community is working against the clock to arrive at therapeutic interventions to treat patients with COVID-19. Among the strategies for drug discovery, virtual screening approaches have the capacity to search potential hits within millions of chemical structures in days, with the appropriate computing infrastructure. In this article, we first analyzed the published research targeting the inhibition of the main protease (Mpro), one of the most studied targets of SARS-CoV-2, by docking-based methods. An alarming finding was the lack of an adequate validation of the docking protocols (i.e., pose prediction and virtual screening accuracy) before applying them in virtual screening campaigns. The performance of the docking protocols was tested at some level in 57.7% of the 168 investigations analyzed. However, we found only three examples of a complete retrospective analysis of the scoring functions to quantify the virtual screening accuracy of the methods. Moreover, only two publications reported some experimental evaluation of the proposed hits until preparing this manuscript. All of these findings led us to carry out a retrospective performance validation of three different docking protocols, through the analysis of their pose prediction and screening accuracy. Surprisingly, we found that even though all tested docking protocols have a good pose prediction, their screening accuracy is quite limited as they fail to correctly rank a test set of compounds. These results highlight the importance of conducting an adequate validation of the docking protocols before carrying out virtual screening campaigns, and to experimentally confirm the predictions made by the models before drawing bold conclusions. Finally, successful structure-based drug discovery investigations published during the redaction of this manuscript allow us to propose the inclusion of target flexibility and consensus scoring as alternatives to improve the accuracy of the methods.
Affiliation(s)
- Manuel A Llanos
- Laboratory of Bioactive Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), 47&115, La Plata (B1900ADU), Buenos Aires, Argentina
| | - Melisa E Gantner
- Laboratory of Bioactive Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), 47&115, La Plata (B1900ADU), Buenos Aires, Argentina
| | - Santiago Rodriguez
- Laboratory of Bioactive Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), 47&115, La Plata (B1900ADU), Buenos Aires, Argentina
| | - Lucas N Alberca
- Laboratory of Bioactive Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), 47&115, La Plata (B1900ADU), Buenos Aires, Argentina
| | - Carolina L Bellera
- Laboratory of Bioactive Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), 47&115, La Plata (B1900ADU), Buenos Aires, Argentina
| | - Alan Talevi
- Laboratory of Bioactive Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), 47&115, La Plata (B1900ADU), Buenos Aires, Argentina
| | - Luciana Gavernet
- Laboratory of Bioactive Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), 47&115, La Plata (B1900ADU), Buenos Aires, Argentina
| |
|
23
|
Bitencourt-Ferreira G, Rizzotto C, de Azevedo Junior WF. Machine Learning-Based Scoring Functions, Development and Applications with SAnDReS. Curr Med Chem 2021; 28:1746-1756. [PMID: 32410551 DOI: 10.2174/0929867327666200515101820] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2019] [Revised: 04/06/2020] [Accepted: 04/07/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. OBJECTIVE: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. METHODS: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding and thermodynamic data to create targeted scoring functions. RESULTS: Recent applications of the program SAnDReS to drug targets such as Coagulation factor Xa, cyclin-dependent kinases and HIV-1 protease were able to create targeted scoring functions to predict inhibition of these proteins. These targeted models outperform classical scoring functions. CONCLUSION: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker and AutoDock Vina.
Affiliation(s)
| | - Camila Rizzotto
- Pontifical Catholic University of Rio Grande do Sul - PUCRS, Porto Alegre-RS, Brazil
| | | |
|
24
|
King E, Qi R, Li H, Luo R, Aitchison E. Estimating the Roles of Protonation and Electronic Polarization in Absolute Binding Affinity Simulations. J Chem Theory Comput 2021; 17:2541-2555. [PMID: 33764050 DOI: 10.1021/acs.jctc.0c01305] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Accurate prediction of binding free energies is critical to streamlining the drug development and protein design process. With the advent of GPU acceleration, absolute alchemical methods, which simulate the removal of ligand electrostatics and van der Waals interactions with the protein, have become routinely accessible and provide a physically rigorous approach that enables full consideration of flexibility and solvent interaction. However, standard explicit solvent simulations are unable to model protonation or electronic polarization changes upon ligand transfer from water to the protein interior, leading to inaccurate prediction of binding affinities for charged molecules. Here, we perform extensive simulation totaling ∼540 μs to benchmark the impact of modeling conditions on predictive accuracy for absolute alchemical simulations. Binding to urokinase plasminogen activator (UPA), a protein frequently overexpressed in metastatic tumors, is evaluated for a set of 10 inhibitors with extended flexibility, highly charged character, and titratable properties. We demonstrate that the alchemical simulations can be adapted to utilize the MBAR/PBSA method to improve the accuracy upon incorporating electronic polarization, highlighting the importance of polarization in alchemical simulations of binding affinities. Comparison of binding energy prediction at various protonation states indicates that proper electrostatic setup is also crucial in binding affinity prediction of charged systems, prompting us to propose an alternative binding mode with protonated ligand phenol and Hid-46 at the binding site, a testable hypothesis for future experimental validation.
Affiliation(s)
| | - Ruxi Qi
- Cryo-EM Center, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
| | | | | | | |
|
25
|
Narayanan H, Dingfelder F, Butté A, Lorenzen N, Sokolov M, Arosio P. Machine Learning for Biologics: Opportunities for Protein Engineering, Developability, and Formulation. Trends Pharmacol Sci 2021; 42:151-165. [DOI: 10.1016/j.tips.2020.12.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2020] [Revised: 12/10/2020] [Accepted: 12/16/2020] [Indexed: 12/19/2022]
|
26
|
Bitencourt-Ferreira G, Duarte da Silva A, Filgueira de Azevedo W. Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2. Curr Med Chem 2021; 28:253-265. [PMID: 31729287 DOI: 10.2174/2213275912666191102162959] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2019] [Revised: 08/22/2019] [Accepted: 09/24/2019] [Indexed: 11/22/2022]
Abstract
BACKGROUND: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. OBJECTIVE: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. METHODS: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. RESULTS: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. CONCLUSION: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.
Affiliation(s)
- Gabriela Bitencourt-Ferreira
- Laboratory of Computational Systems Biology, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre/RS 90619-900, Brazil
| | - Amauri Duarte da Silva
- Specialization Program in Bioinformatics, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre/RS 90619-900, Brazil
| | - Walter Filgueira de Azevedo
- Laboratory of Computational Systems Biology, Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681 Porto Alegre/RS 90619-900, Brazil
| |
|
27
|
Guedes IA, Barreto AMS, Marinho D, Krempser E, Kuenemann MA, Sperandio O, Dardenne LE, Miteva MA. New machine learning and physics-based scoring functions for drug discovery. Sci Rep 2021; 11:3198. [PMID: 33542326 PMCID: PMC7862620 DOI: 10.1038/s41598-021-82410-1] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2020] [Accepted: 01/20/2021] [Indexed: 12/11/2022] Open
Abstract
Scoring functions are essential for modern in silico drug discovery. However, the accurate prediction of binding affinity by scoring functions remains a challenging task. The performance of scoring functions is very heterogeneous across different target classes. Scoring functions based on precise physics-based descriptors better representing protein–ligand recognition process are strongly needed. We developed a set of new empirical scoring functions, named DockTScore, by explicitly accounting for physics-based terms combined with machine learning. Target-specific scoring functions were developed for two important drug targets, proteases and protein–protein interactions, representing an original class of molecules for drug discovery. Multiple linear regression (MLR), support vector machine and random forest algorithms were employed to derive general and target-specific scoring functions involving optimized MMFF94S force-field terms, solvation and lipophilic interactions terms, and an improved term accounting for ligand torsional entropy contribution to ligand binding. DockTScore scoring functions demonstrated to be competitive with the current best-evaluated scoring functions in terms of binding energy prediction and ranking on four DUD-E datasets and will be useful for in silico drug design for diverse proteins as well as for specific targets such as proteases and protein–protein interactions. Currently, the MLR DockTScore is available at www.dockthor.lncc.br.
Affiliation(s)
- Isabella A Guedes
- Laboratório Nacional de Computação Científica, Petrópolis, 25651-075, Brazil; Inserm U973, Université Paris Diderot, Paris, France
| | - André M S Barreto
- Laboratório Nacional de Computação Científica, Petrópolis, 25651-075, Brazil
| | - Diogo Marinho
- Laboratório Nacional de Computação Científica, Petrópolis, 25651-075, Brazil
| | | | | | - Olivier Sperandio
- Inserm U973, Université Paris Diderot, Paris, France; Structural Bioinformatics Unit, CNRS UMR3528, Institut Pasteur, 75015, Paris, France
| | - Laurent E Dardenne
- Laboratório Nacional de Computação Científica, Petrópolis, 25651-075, Brazil.
| | - Maria A Miteva
- Inserm U973, Université Paris Diderot, Paris, France; Inserm U1268 "Medicinal Chemistry and Translational Research", CiTCoM, UMR 8038, CNRS, Université de Paris, 75006, Paris, France.
| |
|
28
|
Mehta SD, Okal D, Otieno F, Green SJ, Nordgren RK, Huibner S, Bailey RC, Bhaumik DK, Landay A, Kaul R. Schistosomiasis is associated with rectal mucosal inflammation among Kenyan men who have sex with men. Int J STD AIDS 2021; 32:694-703. [PMID: 33533314 DOI: 10.1177/0956462420985973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Background: Schistosoma mansoni infection is hyperendemic in Lake Victoria communities and associated with cervicovaginal immune alterations and HIV acquisition. We assessed the hypothesis that schistosomiasis correlates with greater rectal inflammation in men who have sex with men (MSM) in Kisumu, Kenya. Methods: In this cross-sectional study of 38 HIV-negative MSM aged 18-35 years, schistosomiasis was diagnosed by urine circulating cathodic antigen (CCA). Microbiome was assessed in rectal swabs by 16S rRNA gene amplicon sequencing, and rectal inflammation by quartile normalized summative score of inflammatory cytokines (IL-1α, IL-1β, IL-8, and TNF-α). Elastic net (EN) regression identified taxa associated with inflammation. Multivariable linear regression estimated the association between inflammation score and schistosomiasis and bacteria identified in EN. Results: Most men were CCA positive (24/38; 63%), and median rectal inflammation score was significantly higher in these participants (11 vs. 8, p = 0.04). In multivariable regression, CCA-positive men had 2.85-point greater inflammation score (p = 0.009). The relative abundance of Succinivibrio (coefficient = -1.13, p = 0.002) and Pseudomonas (coefficient = -1.04, p = 0.001) were negatively associated with inflammation. Discussion: CCA positivity was associated with rectal mucosal inflammation, controlling for rectal microbiome composition. Given its high prevalence and contribution to inflammation, schistosomiasis may have important implications for HIV transmission in this vulnerable population.
Affiliation(s)
- Supriya D Mehta
- Division of Epidemiology & Biostatistics, University of Illinois at Chicago School of Public Health, Chicago, USA
| | - Duncan Okal
- Nyanza Reproductive Health Society, Kisumu, Kenya
| | | | - Stefan J Green
- Sequencing Core, Research Resources Center, University of Illinois at Chicago, Chicago, USA
| | - Rachel K Nordgren
- Division of Epidemiology & Biostatistics, University of Illinois at Chicago School of Public Health, Chicago, USA
| | - Sanja Huibner
- Division of Infectious Diseases, University of Toronto School of Medicine, Toronto, Canada
| | - Robert C Bailey
- Division of Epidemiology & Biostatistics, University of Illinois at Chicago School of Public Health, Chicago, USA
| | - Dulal K Bhaumik
- Division of Epidemiology & Biostatistics, University of Illinois at Chicago School of Public Health, Chicago, USA
| | - Alan Landay
- Department of Internal Medicine, Rush University, Chicago, USA
| | - Rupert Kaul
- Division of Infectious Diseases, University of Toronto School of Medicine, Toronto, Canada
| |
|
29
|
SSnet: A Deep Learning Approach for Protein-Ligand Interaction Prediction. Int J Mol Sci 2021; 22:ijms22031392. [PMID: 33573266 PMCID: PMC7869013 DOI: 10.3390/ijms22031392] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2020] [Revised: 01/24/2021] [Accepted: 01/27/2021] [Indexed: 12/15/2022] Open
Abstract
Computational prediction of Protein-Ligand Interaction (PLI) is an important step in the modern drug discovery pipeline as it mitigates the cost, time, and resources required to screen novel therapeutics. Deep Neural Networks (DNN) have recently shown excellent performance in PLI prediction. However, the performance is highly dependent on protein and ligand features utilized for the DNN model. Moreover, in current models, the deciphering of how protein features determine the underlying principles that govern PLI is not trivial. In this work, we developed a DNN framework named SSnet that utilizes secondary structure information of proteins extracted as the curvature and torsion of the protein backbone to predict PLI. We demonstrate the performance of SSnet by comparing against a variety of currently popular machine and non-Machine Learning (ML) models using various metrics. We visualize the intermediate layers of SSnet to show a potential latent space for proteins, in particular to extract structural elements in a protein that the model finds influential for ligand binding, which is one of the key features of SSnet. We observed in our study that SSnet learns information about locations in a protein where a ligand can bind, including binding sites, allosteric sites and cryptic sites, regardless of the conformation used. We further observed that SSnet is not biased to any specific molecular interaction and extracts the protein fold information critical for PLI prediction. Our work forms an important gateway to the general exploration of secondary structure-based Deep Learning (DL), which is not just confined to protein-ligand interactions, and as such will have a large impact on protein research, while being readily accessible for de novo drug designers as a standalone package.
|
30
|
Machine learning, artificial intelligence, and data science breaking into drug design and neglected diseases. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2021. [DOI: 10.1002/wcms.1513] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/17/2022]
|
31
|
Zhang F, Zhao M, Braun DR, Ericksen SS, Piotrowski JS, Nelson J, Peng J, Ananiev GE, Chanana S, Barns K, Fossen J, Sanchez H, Chevrette MG, Guzei IA, Zhao C, Guo L, Tang W, Currie CR, Rajski SR, Audhya A, Andes DR, Bugni TS. A marine microbiome antifungal targets urgent-threat drug-resistant fungi. Science 2020; 370:974-978. [PMID: 33214279 PMCID: PMC7756952 DOI: 10.1126/science.abd6919] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 07/23/2020] [Accepted: 10/05/2020] [Indexed: 12/29/2022]
Abstract
New antifungal drugs are urgently needed to address the emergence and transcontinental spread of fungal infectious diseases, such as pandrug-resistant Candida auris. Leveraging the microbiomes of marine animals and cutting-edge metabolomics and genomic tools, we identified encouraging lead antifungal molecules with in vivo efficacy. The most promising lead, turbinmicin, displays potent in vitro and mouse-model efficacy toward multiple-drug-resistant fungal pathogens, exhibits a wide safety index, and functions through a fungal-specific mode of action, targeting Sec14 of the vesicular trafficking pathway. The efficacy, safety, and mode of action distinct from other antifungal drugs make turbinmicin a highly promising antifungal drug lead to help address devastating global fungal pathogens such as C. auris.
Affiliation(s)
- Fan Zhang
  - Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, WI, USA
- Miao Zhao
  - Department of Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Doug R Braun
  - Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, WI, USA
- Spencer S Ericksen
  - Small Molecule Screening Facility, University of Wisconsin Carbone Cancer Center, Madison, WI, USA
- Jian Peng
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Gene E Ananiev
  - Small Molecule Screening Facility, University of Wisconsin Carbone Cancer Center, Madison, WI, USA
- Shaurya Chanana
  - Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, WI, USA
- Kenneth Barns
  - Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, WI, USA
- Jen Fossen
  - Department of Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Hiram Sanchez
  - Department of Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Marc G Chevrette
  - Department of Genetics, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
  - Wisconsin Institute for Discovery and Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI, USA
- Ilia A Guzei
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Changgui Zhao
  - Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, WI, USA
- Le Guo
  - Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, WI, USA
- Weiping Tang
  - Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, WI, USA
- Cameron R Currie
  - Department of Genetics, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Scott R Rajski
  - Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, WI, USA
- Anjon Audhya
  - Department of Biomolecular Chemistry, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, USA
- David R Andes
  - Department of Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Tim S Bugni
  - Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, WI, USA
|
32
|
Saikia S, Bordoloi M. Molecular Docking: Challenges, Advances and its Use in Drug Discovery Perspective. Curr Drug Targets 2020; 20:501-521. [PMID: 30360733 DOI: 10.2174/1389450119666181022153016] [Citation(s) in RCA: 191] [Impact Index Per Article: 47.8] [Received: 05/25/2018] [Revised: 06/08/2018] [Accepted: 08/28/2018] [Indexed: 01/21/2023]
Abstract
Molecular docking is a process through which small molecules are docked into macromolecular structures to score their complementarity at the binding sites. It is a vibrant research area with dynamic utility in structure-based drug design, lead optimization, and the study of biochemical pathways, drug design being its most attractive application. The two pillars of a successful docking experiment are correct pose prediction and affinity prediction. Each program has its own advantages and drawbacks with respect to docking accuracy, ranking accuracy and time consumption, so a general conclusion cannot be drawn. Moreover, users do not always consider sufficient diversity in their test sets, which results in certain programs outperforming others. This review focuses on the challenges of docking and troubleshooting in existing programs, the underlying algorithmic background of docking, preferences regarding the use of docking programs for best results illustrated with examples, and a comparison of the performance of existing tools and algorithms; the state of the art in docking, recent trends in diseases and the current drug industry, and evidence from clinical trials and post-marketing surveillance are also discussed. These aspects of the molecular drug design paradigm are quite controversial and challenging, and this review would be an asset to the bioinformatics and drug design communities.
Affiliation(s)
- Surovi Saikia
  - Natural Products Chemistry Group, CSIR North East Institute of Science & Technology, Jorhat-785006, Assam, India
- Manobjyoti Bordoloi
  - Natural Products Chemistry Group, CSIR North East Institute of Science & Technology, Jorhat-785006, Assam, India
|
33
|
Ye WL, Shen C, Xiong GL, Ding JJ, Lu AP, Hou TJ, Cao DS. Improving Docking-Based Virtual Screening Ability by Integrating Multiple Energy Auxiliary Terms from Molecular Docking Scoring. J Chem Inf Model 2020; 60:4216-4230. [PMID: 32352294 DOI: 10.1021/acs.jcim.9b00977] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 12/18/2022]
Abstract
Virtual Screening (VS) based on molecular docking is an efficient method used for retrieving novel hit compounds in drug discovery. However, the accuracy of current docking scoring functions (SFs) is usually insufficient. In this study, in order to improve the screening power of SFs, a novel approach named EAT-Score was proposed by directly utilizing the energy auxiliary terms (EAT) provided by molecular docking scoring through eXtreme Gradient Boosting (XGBoost). Here, EAT specifically refers to the output of the Molecular Operating Environment (MOE) scoring, including the energy scores of five different classical SFs and the Protein-Ligand Interaction Fingerprint (PLIF) terms. The performance of EAT-Score in discriminating actives from decoys was strictly validated on the DUD-E diverse subset by using different performance metrics. The results showed that EAT-Score performed much better than classical SFs in VS, with its AUC values exhibiting an improvement of around 0.3. Meanwhile, EAT-Score could achieve comparable or even better prediction performance compared with other state-of-the-art VS methods, such as some machine learning (ML)-based SFs and classical SFs implemented in docking programs, in terms of AUC, LogAUC, or BEDROC. Furthermore, the EAT-Score model can capture important binding pattern information from protein-ligand complexes by Shapley additive explanations (SHAP) analysis, which may be very helpful in interpreting the ligand binding mechanism for a certain target and thereby guiding drug design.
Affiliation(s)
- Wen-Ling Ye
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Guo-Li Xiong
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
- Jun-Jie Ding
  - Beijing Institute of Pharmaceutical Chemistry, Beijing 102205, P. R. China
- Ai-Ping Lu
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
- Ting-Jun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Dong-Sheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410003, P. R. China
  - Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR, P. R. China
|
34
|
Li H, Sze K, Lu G, Ballester PJ. Machine‐learning scoring functions for structure‐based virtual screening. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2020. [DOI: 10.1002/wcms.1478] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 12/14/2022]
Affiliation(s)
- Hongjian Li
  - Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258), Marseille, France
  - CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, Chinese University of Hong Kong, Shatin, Hong Kong
- Kam‐Heung Sze
  - CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, Chinese University of Hong Kong, Shatin, Hong Kong
- Gang Lu
  - CUHK‐SDU Joint Laboratory on Reproductive Genetics, School of Biomedical Sciences, Chinese University of Hong Kong, Shatin, Hong Kong
- Pedro J. Ballester
  - Cancer Research Center of Marseille (INSERM U1068, Institut Paoli‐Calmettes, Aix‐Marseille Université UM105, CNRS UMR7258), Marseille, France
|
35
|
Willems H, De Cesco S, Svensson F. Computational Chemistry on a Budget: Supporting Drug Discovery with Limited Resources. J Med Chem 2020; 63:10158-10169. [DOI: 10.1021/acs.jmedchem.9b02126] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/16/2022]
Affiliation(s)
- Henriëtte Willems
  - The ALBORADA Drug Discovery Institute, University of Cambridge, Island Research Building, Cambridge Biomedical Campus, Hills Road, Cambridge CB2 0AH, U.K.
- Stephane De Cesco
  - Alzheimer’s Research UK Oxford Drug Discovery Institute, University of Oxford, NDM Research Building, Old Road Campus, Roosevelt Drive, Oxford OX3 7FZ, U.K.
- Fredrik Svensson
  - Alzheimer’s Research UK UCL Drug Discovery Institute, University College London, The Cruciform Building, Gower Street, London WC1E 6BT, U.K.
|
36
|
Monteiro AFM, de Oliveira Viana J, Muratov E, Scotti MT, Scotti L. In Silico Studies against Viral Sexually Transmitted Diseases. Curr Protein Pept Sci 2020; 20:1135-1150. [PMID: 30854957 DOI: 10.2174/1389203720666190311142747] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/10/2018] [Revised: 01/17/2019] [Accepted: 01/18/2019] [Indexed: 01/02/2023]
Abstract
Sexually Transmitted Diseases (STDs) refer to a variety of clinical syndromes and infections caused by pathogens that can be acquired and transmitted through sexual activity. Among the STDs widely reported in the literature, viral sexual diseases have been increasing in number of cases globally. This emphasizes the need for prevention and treatment. Among the methods widely used in drug planning are Computer-Aided Drug Design (CADD) studies and molecular docking, which aim to investigate molecular interactions between two molecules to better understand the three-dimensional structural characteristics of the compounds. This review will discuss molecular docking studies applied to viral STDs, such as Ebola virus, Herpes virus and HIV, and reveal promising new drug candidates with high levels of specificity to their respective targets.
Affiliation(s)
- Alex F M Monteiro
  - Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraíba, Joao Pessoa-PB, Brazil
- Jessika de Oliveira Viana
  - Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraíba, Joao Pessoa-PB, Brazil
- Eugene Muratov
  - Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products, Eshelman School of Pharmacy, University of North Carolina, Beard Hall 301, CB#7568, Chapel Hill, NC, 27599, United States
- Marcus T Scotti
  - Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraíba, Joao Pessoa-PB, Brazil
- Luciana Scotti
  - Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraíba, Joao Pessoa-PB, Brazil
  - Teaching and Research Management - University Hospital, Federal University of Paraíba, Campus I, 58051-900, João Pessoa-PB, Brazil
|
37
|
Dos Santos Maia M, Soares Rodrigues GC, Silva Cavalcanti AB, Scotti L, Scotti MT. Consensus Analyses in Molecular Docking Studies Applied to Medicinal Chemistry. Mini Rev Med Chem 2020; 20:1322-1340. [PMID: 32013847 DOI: 10.2174/1389557520666200204121129] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/10/2019] [Revised: 10/31/2019] [Accepted: 11/04/2019] [Indexed: 02/08/2023]
Abstract
The increasing number of computational studies in medicinal chemistry involving molecular docking has put the technique forward as promising in Computer-Aided Drug Design. As docking is the main method in structure-based virtual screening, consensus analysis of docking has been applied in several studies to overcome the limitations of the algorithms of different programs and, mainly, to increase the reliability of the results and reduce the number of false positives. However, some consensus scoring strategies are difficult to apply and, in some cases, are not reliable due to the small number of datasets tested. Thus, for such a methodology to be successful, it is necessary to understand why, when and how to use consensus docking. Therefore, the present study aims to present different approaches to docking consensus, applications, and several scoring strategies that have been successful and can be applied in future studies.
Affiliation(s)
- Mayara Dos Santos Maia
  - Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraiba, Joao Pessoa-PB, Brazil
- Gabriela Cristina Soares Rodrigues
  - Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraiba, Joao Pessoa-PB, Brazil
- Andreza Barbosa Silva Cavalcanti
  - Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraiba, Joao Pessoa-PB, Brazil
- Luciana Scotti
  - Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraiba, Joao Pessoa-PB, Brazil
- Marcus Tullius Scotti
  - Program of Natural and Synthetic Bioactive Products (PgPNSB), Health Sciences Center, Federal University of Paraiba, Joao Pessoa-PB, Brazil
|
38
|
Macalino SJY, Billones JB, Organo VG, Carrillo MCO. In Silico Strategies in Tuberculosis Drug Discovery. Molecules 2020; 25:E665. [PMID: 32033144 PMCID: PMC7037728 DOI: 10.3390/molecules25030665] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 11/28/2019] [Revised: 12/15/2019] [Accepted: 12/17/2019] [Indexed: 12/16/2022] Open
Abstract
Tuberculosis (TB) remains a serious threat to global public health, responsible for an estimated 1.5 million mortalities in 2018. While there are available therapeutics for this infection, slow-acting drugs, poor patient compliance, drug toxicity, and drug resistance require the discovery of novel TB drugs. Discovering new and more potent antibiotics that target novel TB protein targets is an attractive strategy towards controlling the global TB epidemic. In silico strategies can be applied at multiple stages of the drug discovery paradigm to expedite the identification of novel anti-TB therapeutics. In this paper, we discuss the current TB treatment, emergence of drug resistance, and the effective application of computational tools to the different stages of TB drug discovery when combined with traditional biochemical methods. We will also highlight the strengths and points of improvement in in silico TB drug discovery research, as well as possible future perspectives in this field.
Affiliation(s)
- Stephani Joy Y. Macalino
  - Chemistry Department, De La Salle University, 2401 Taft Avenue, Manila 0992, Philippines
  - OVPAA-EIDR Program, “Computer-Aided Discovery of Compounds for the Treatment of Tuberculosis in the Philippines”, Department of Physical Sciences and Mathematics, College of Arts and Sciences, University of the Philippines Manila, Manila 1000, Philippines
- Junie B. Billones
  - OVPAA-EIDR Program, “Computer-Aided Discovery of Compounds for the Treatment of Tuberculosis in the Philippines”, Department of Physical Sciences and Mathematics, College of Arts and Sciences, University of the Philippines Manila, Manila 1000, Philippines
- Voltaire G. Organo
  - OVPAA-EIDR Program, “Computer-Aided Discovery of Compounds for the Treatment of Tuberculosis in the Philippines”, Department of Physical Sciences and Mathematics, College of Arts and Sciences, University of the Philippines Manila, Manila 1000, Philippines
- Maria Constancia O. Carrillo
  - OVPAA-EIDR Program, “Computer-Aided Discovery of Compounds for the Treatment of Tuberculosis in the Philippines”, Department of Physical Sciences and Mathematics, College of Arts and Sciences, University of the Philippines Manila, Manila 1000, Philippines
|
39
|
Wang Z, Sun H, Shen C, Hu X, Gao J, Li D, Cao D, Hou T. Combined strategies in structure-based virtual screening. Phys Chem Chem Phys 2020; 22:3149-3159. [PMID: 31995074 DOI: 10.1039/c9cp06303j] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Indexed: 12/29/2022]
Abstract
The identification and optimization of lead compounds are indispensable components of drug design and discovery pipelines. As a powerful computational approach for the identification of hits with novel structural scaffolds, structure-based virtual screening (SBVS) has exhibited a remarkably increasing influence in the early stages of drug discovery. During the past decade, a variety of techniques and algorithms have been proposed and tested with different purposes in the scope of SBVS. Although SBVS has become a common and proven technology, it still faces challenges that need to be addressed, the two major ones being the negative influence of disregarding protein flexibility and the inaccurate prediction of binding affinity. Here, focusing on these difficulties, we summarize a series of combined strategies or workflows developed by our group and others. Furthermore, several representative successful applications from recent publications are also discussed to demonstrate the effectiveness of the combined SBVS strategies in drug discovery campaigns.
Affiliation(s)
- Zhe Wang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Huiyong Sun
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Chao Shen
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Xueping Hu
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Junbo Gao
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dan Li
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, P. R. China
- Tingjun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
|
40
|
Masters L, Eagon S, Heying M. Evaluation of consensus scoring methods for AutoDock Vina, smina and idock. J Mol Graph Model 2020; 96:107532. [PMID: 31991303 DOI: 10.1016/j.jmgm.2020.107532] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 09/13/2019] [Revised: 12/29/2019] [Accepted: 01/06/2020] [Indexed: 12/27/2022]
Abstract
We investigated the application of consensus scoring using the freely available and open source structure-based virtual screening docking programs AutoDock Vina, smina and idock. These individual programs and several simple consensus scoring methods were tested for their ability to identify hits against 20 DUD-E benchmark targets using the AUC and EF1 metrics. We found that all of the consensus scoring methods, however normalized, fared worse, on average, than simply using the output from a single program, smina. Additionally, the effect of a significant increase in the run time of all three programs was tested to find if a longer run time yielded improved results. Our results indicated that a longer run time than the default had little impact on the performance of these three programs or on consensus scoring methods based on their output. Thus, we have found that using the smina program alone at default settings is the best approach for researchers that do not have access to a suite of commercial docking software packages.
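As a rough illustration of the kind of evaluation this abstract describes, the Python sketch below averages per-program ranks into a consensus score and computes an enrichment factor for the top fraction of the ranked list. Rank averaging is only one of several possible normalizations and is not necessarily one the authors tested; the function names, the toy scores, and the assumption that more negative docking scores are better are all hypothetical choices made for this example.

```python
import math

def ranks(scores):
    """Rank compounds by docking score; rank 1 = most negative (best) score.

    Ties are broken by input order (assumption made for simplicity).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result

def rank_consensus(score_table):
    """Average each compound's rank across programs (lower = better)."""
    per_program = [ranks(s) for s in score_table]
    n = len(score_table[0])
    return [sum(p[i] for p in per_program) / len(per_program) for i in range(n)]

def enrichment_factor(consensus, is_active, fraction=0.01):
    """Hit rate in the top `fraction` of the ranking divided by the overall hit rate."""
    n = len(consensus)
    top_n = max(1, math.ceil(n * fraction))
    order = sorted(range(n), key=lambda i: consensus[i])
    hits_top = sum(is_active[i] for i in order[:top_n])
    return (hits_top / top_n) / (sum(is_active) / n)

# Hypothetical scores (kcal/mol) for five compounds from three programs.
vina = [-9.1, -7.2, -8.5, -6.0, -5.5]
smina = [-9.4, -7.0, -8.0, -6.3, -5.1]
idock = [-8.8, -7.5, -8.9, -5.9, -5.0]
is_active = [1, 0, 1, 0, 0]  # 1 = known active, 0 = decoy

consensus = rank_consensus([vina, smina, idock])
ef = enrichment_factor(consensus, is_active, fraction=0.4)
print(consensus, ef)
```

With a realistic benchmark, `fraction=0.01` would correspond to the EF1 metric; the toy set uses 0.4 only because it contains five compounds.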
Affiliation(s)
- Lily Masters
  - Department of Chemistry and Biochemistry, California Polytechnic State University, 1 Grand Avenue, San Luis Obispo, CA, 93407, USA
- Scott Eagon
  - Department of Chemistry and Biochemistry, California Polytechnic State University, 1 Grand Avenue, San Luis Obispo, CA, 93407, USA
- Michael Heying
  - Department of Chemistry and Biochemistry, California Polytechnic State University, 1 Grand Avenue, San Luis Obispo, CA, 93407, USA
|
41
|
Lu J, Hou X, Wang C, Zhang Y. Incorporating Explicit Water Molecules and Ligand Conformation Stability in Machine-Learning Scoring Functions. J Chem Inf Model 2019; 59:4540-4549. [PMID: 31638801 DOI: 10.1021/acs.jcim.9b00645] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 12/15/2022]
Abstract
Structure-based drug design is critically dependent on the accuracy of molecular docking scoring functions, and there is significant interest in advancing scoring functions with machine learning approaches. In this work, by judiciously expanding the training set, exploring new features related to explicit mediating water molecules as well as ligand conformation stability, and applying extreme gradient boosting (XGBoost) with Δ-Vina parametrization, we have improved the robustness and applicability of machine-learning scoring functions. The new scoring function ΔvinaXGB not only performs consistently among the top, compared with classical scoring functions, on the CASF-2016 benchmark but also achieves significantly better prediction accuracy in different types of structures that mimic real docking applications.
Affiliation(s)
- Jianing Lu
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Xuben Hou
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Department of Medicinal Chemistry, Key Laboratory of Chemical Biology (Ministry of Education), School of Pharmaceutical Science, Shandong University, Jinan, Shandong 250012, China
- Cheng Wang
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
|
42
|
Moumbock AF, Li J, Mishra P, Gao M, Günther S. Current computational methods for predicting protein interactions of natural products. Comput Struct Biotechnol J 2019; 17:1367-1376. [PMID: 31762960 PMCID: PMC6861622 DOI: 10.1016/j.csbj.2019.08.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 03/19/2019] [Revised: 08/09/2019] [Accepted: 08/23/2019] [Indexed: 01/08/2023] Open
Abstract
Natural products (NPs) are an indispensable source of drugs and they have a better coverage of the pharmacological space than synthetic compounds, owing to their high structural diversity. The prediction of their interaction profiles with druggable protein targets remains a major challenge in modern drug discovery. Experimental (off-)target predictions of NPs are cost- and time-consuming, whereas computational methods, on the other hand, are much faster and cheaper. As a result, computational predictions are preferentially used in the first instance for NP profiling, prior to experimental validations. This review covers recent advances in computational approaches which have been developed to aid the annotation of unknown drug-target interactions (DTIs), by focusing on three broad classes, namely: ligand-based, target-based, and target-ligand-based (hybrid) approaches. Computational DTI prediction methods have the potential to significantly advance the discovery and development of novel selective drugs exhibiting minimal side effects. We highlight some inherent caveats of these methods which must be overcome to enable them to realize their full potential, and a future outlook is given.
Affiliation(s)
- Stefan Günther
  - Institute of Pharmaceutical Sciences, Research Group Pharmaceutical Bioinformatics, Albert-Ludwigs-Universität Freiburg, Germany
|
43
|
Comparing AutoDock and Vina in Ligand/Decoy Discrimination for Virtual Screening. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9214538] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Indexed: 12/14/2022]
Abstract
AutoDock and Vina are two of the most widely used protein–ligand docking programs. The fact that these programs are free and available under an open source license also makes them a very popular first choice for many users and a common starting point for many virtual screening campaigns, particularly in academia. Here, we evaluated the performance of AutoDock and Vina against an unbiased dataset containing 102 protein targets, 22,432 active compounds and 1,380,513 decoy molecules. In general, the results showed that the overall performance of Vina and AutoDock was comparable in discriminating between actives and decoys. However, the results varied significantly with the type of target. AutoDock was better in discriminating ligands and decoys in more hydrophobic, poorly polar and poorly charged pockets, while Vina tended to give better results for polar and charged binding pockets. For the type of ligand, the tendency was the same for both Vina and AutoDock. Bigger and more flexible ligands still presented a bigger challenge for these docking programs. A set of guidelines was formulated, based on the strengths and weaknesses of both docking programs and their limits of validation.
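Active/decoy discrimination of the kind evaluated here is commonly summarized with the ROC AUC, which equals the probability that a randomly chosen active scores better than a randomly chosen decoy. The Python sketch below computes it by direct pairwise comparison; the abstract does not state which metric the authors used, so this is an assumed illustration, and the function name, the toy scores, and the convention that lower docking scores are better are hypothetical.

```python
def roc_auc(active_scores, decoy_scores):
    """Fraction of (active, decoy) pairs in which the active has the lower score.

    Ties count as half a win. The pairwise loop is quadratic, which is fine
    for a toy example but too slow for benchmark-sized decoy sets.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))

# Hypothetical docking scores (kcal/mol) for one target.
actives = [-9.2, -8.1, -6.5]
decoys = [-8.5, -7.0, -6.0, -5.5]

auc = roc_auc(actives, decoys)
print(auc)  # 0.75: 9 of the 12 active/decoy pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so per-target AUC values allow the two programs to be compared pocket by pocket.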
|
44
|
Torres PHM, Sodero ACR, Jofily P, Silva-Jr FP. Key Topics in Molecular Docking for Drug Design. Int J Mol Sci 2019; 20:E4574. [PMID: 31540192 PMCID: PMC6769580 DOI: 10.3390/ijms20184574] [Citation(s) in RCA: 167] [Impact Index Per Article: 33.4] [Received: 06/03/2019] [Revised: 07/09/2019] [Accepted: 07/10/2019] [Indexed: 12/18/2022] Open
Abstract
Molecular docking has been widely employed as a fast and inexpensive technique in the past decades, both in academic and industrial settings. Although this discipline has now had enough time to consolidate, many aspects remain challenging and there is still not a straightforward and accurate route to readily pinpoint true ligands among a set of molecules, nor to identify with precision the correct ligand conformation within the binding pocket of a given target molecule. Nevertheless, new approaches continue to be developed and the volume of published works grows at a rapid pace. In this review, we present an overview of the method and attempt to summarise recent developments regarding four main aspects of molecular docking approaches: (i) the available benchmarking sets, highlighting their advantages and caveats, (ii) the advances in consensus methods, (iii) recent algorithms and applications using fragment-based approaches, and (iv) the use of machine learning algorithms in molecular docking. These recent developments incrementally contribute to an increase in accuracy and are expected, given time, and together with advances in computing power and hardware capability, to eventually accomplish the full potential of this area.
Affiliation(s)
- Pedro H M Torres
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Ana C R Sodero
  - Department of Drugs and Medicines; School of Pharmacy; Federal University of Rio de Janeiro, Rio de Janeiro 21949-900, RJ, Brazil
- Paula Jofily
  - Laboratório de Modelagem e Dinâmica Molecular, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21949-900, RJ, Brazil
- Floriano P Silva-Jr
  - Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, FIOCRUZ, Rio de Janeiro 21949-900, RJ, Brazil
|
45
|
CompScore: Boosting Structure-Based Virtual Screening Performance by Incorporating Docking Scoring Function Components into Consensus Scoring. J Chem Inf Model 2019; 59:3655-3666. [DOI: 10.1021/acs.jcim.9b00343] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/29/2022]
|
46
|
Wang D, Cui C, Ding X, Xiong Z, Zheng M, Luo X, Jiang H, Chen K. Improving the Virtual Screening Ability of Target-Specific Scoring Functions Using Deep Learning Methods. Front Pharmacol 2019; 10:924. [PMID: 31507420 PMCID: PMC6713720 DOI: 10.3389/fphar.2019.00924] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 05/20/2019] [Accepted: 07/22/2019] [Indexed: 01/29/2023] Open
Abstract
Scoring functions play an important role in structure-based virtual screening. It has been widely accepted that target-specific scoring functions (TSSFs) may achieve better performance compared with universal scoring functions in actual drug research and development processes. A method that can effectively construct TSSFs will be of great value to drug design and discovery. In this work, we proposed a deep learning–based model named DeepScore to achieve this goal. DeepScore adopted the form of PMF scoring function to calculate protein–ligand binding affinity. However, different from PMF scoring function, in DeepScore, the score for each protein–ligand atom pair was calculated using a feedforward neural network. Our model significantly outperformed Glide Gscore on validation data set DUD-E. The average ROC-AUC on 102 targets was 0.98. We also combined Gscore and DeepScore together using a consensus method and put forward a consensus model named DeepScoreCS. The comparison results showed that DeepScore outperformed other machine learning–based TSSFs building methods. Furthermore, we presented a strategy to visualize the prediction of DeepScore. All of these results clearly demonstrated that DeepScore would be a useful model in constructing TSSFs and represented a novel way incorporating deep learning and drug design.
Affiliation(s)
- Dingyan Wang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - College of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Chen Cui
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - College of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Xiaoyu Ding
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - College of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Zhaoping Xiong
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Mingyue Zheng
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Xiaomin Luo
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- Hualiang Jiang
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Kaixian Chen
  - Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
|
47
|
Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening. PLoS One 2019; 14:e0220113. [PMID: 31430292 PMCID: PMC6701836 DOI: 10.1371/journal.pone.0220113] [Citation(s) in RCA: 104] [Impact Index Per Article: 20.8] [Received: 03/24/2019] [Accepted: 06/25/2019] [Indexed: 12/13/2022] Open
Abstract
Recently much effort has been invested in using convolutional neural network (CNN) models trained on 3D structural images of protein-ligand complexes to distinguish binding from non-binding ligands for virtual screening. However, the dearth of reliable protein-ligand x-ray structures and binding affinity data has required the use of constructed datasets for the training and evaluation of CNN molecular recognition models. Here, we outline various sources of bias in one such widely-used dataset, the Directory of Useful Decoys: Enhanced (DUD-E). We have constructed and performed tests to investigate whether CNN models developed using DUD-E are properly learning the underlying physics of molecular recognition, as intended, or are instead learning biases inherent in the dataset itself. We find that superior enrichment efficiency in CNN models can be attributed to the analogue and decoy bias hidden in the DUD-E dataset rather than successful generalization of the pattern of protein-ligand interactions. Comparing additional deep learning models trained on PDBbind datasets, we found that their enrichment performances using DUD-E are not superior to the performance of the docking program AutoDock Vina. Together, these results suggest that biases that could be present in constructed datasets should be thoroughly evaluated before applying them to machine learning based methodology development.
|
48
|
Predicting kinase inhibitors using bioactivity matrix derived informer sets. PLoS Comput Biol 2019; 15:e1006813. [PMID: 31381559 PMCID: PMC6695194 DOI: 10.1371/journal.pcbi.1006813] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/24/2019] [Revised: 08/15/2019] [Accepted: 07/13/2019] [Indexed: 12/21/2022] Open
Abstract
Prediction of compounds that are active against a desired biological target is a common step in drug discovery efforts. Virtual screening methods seek some active-enriched fraction of a library for experimental testing. Where data are too scarce to train supervised learning models for compound prioritization, initial screening must provide the necessary data. Commonly, such an initial library is selected on the basis of chemical diversity by some pseudo-random process (for example, the first few plates of a larger library) or by selecting an entire smaller library. These approaches may not produce a sufficient number or diversity of actives. An alternative approach is to select an informer set of screening compounds on the basis of chemogenomic information from previous testing of compounds against a large number of targets. We compare different ways of using chemogenomic data to choose a small informer set of compounds based on previously measured bioactivity data. We develop this Informer-Based-Ranking (IBR) approach using the Published Kinase Inhibitor Sets (PKIS) as the chemogenomic data to select the informer sets. We test the informer compounds on a target that is not part of the chemogenomic data, then predict the activity of the remaining compounds based on the experimental informer data and the chemogenomic data. Through new chemical screening experiments, we demonstrate the utility of IBR strategies in a prospective test on three kinase targets not included in the PKIS. In the early stages of drug discovery efforts, computational models are used to predict activity and prioritize compounds for experimental testing. New targets commonly lack the data necessary to build effective models, and the screening needed to generate that experimental data can be costly. We seek to improve the efficiency of the initial screening phase, and of the process of prioritizing compounds for subsequent screening. 
We choose a small informer set of compounds based on publicly available prior screening data on distinct targets. We then collect experimental data on these informer compounds and use that data to predict the activity of other compounds in the set for the target of interest. Computational and statistical tools are needed to identify informer compounds and to prioritize other compounds for subsequent phases of screening. We find that selection of informer compounds on the basis of bioactivity data from previous screening efforts is superior to the traditional approach of selection of a chemically diverse subset of compounds. We demonstrate the success of this approach in retrospective tests on the Published Kinase Inhibitor Sets (PKIS) chemogenomic data and in prospective experimental screens against three additional non-human kinase targets.
|
49
|
Liu Z, Singh SB, Zheng Y, Lindblom P, Tice C, Dong C, Zhuang L, Zhao Y, Kruk BA, Lala D, Claremon DA, McGeehan GM, Gregg RD, Cain R. Discovery of Potent Inhibitors of 11β-Hydroxysteroid Dehydrogenase Type 1 Using a Novel Growth-Based Protocol of in Silico Screening and Optimization in CONTOUR. J Chem Inf Model 2019; 59:3422-3436. [PMID: 31355641 DOI: 10.1021/acs.jcim.9b00198] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023]
Affiliation(s)
- Zhijie Liu
  - Allergan Plc, 2525 Dupont Drive, Irvine, California 92612, United States
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Suresh B. Singh
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Yajun Zheng
  - Allergan Plc, 2525 Dupont Drive, Irvine, California 92612, United States
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Peter Lindblom
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Colin Tice
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Chengguo Dong
  - Allergan Plc, 2525 Dupont Drive, Irvine, California 92612, United States
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Linghang Zhuang
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Yi Zhao
  - Allergan Plc, 2525 Dupont Drive, Irvine, California 92612, United States
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Barbara A. Kruk
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Deepak Lala
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- David A. Claremon
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Gerard M. McGeehan
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Richard D. Gregg
  - Vitae Pharmaceuticals, Inc., 502 West Office Center Drive, Fort Washington, Pennsylvania 19034, United States
- Robert Cain
  - Allergan Plc, 2525 Dupont Drive, Irvine, California 92612, United States
|
50
|
Wang E, Sun H, Wang J, Wang Z, Liu H, Zhang JZH, Hou T. End-Point Binding Free Energy Calculation with MM/PBSA and MM/GBSA: Strategies and Applications in Drug Design. Chem Rev 2019; 119:9478-9508. [DOI: 10.1021/acs.chemrev.9b00055] [Citation(s) in RCA: 578] [Impact Index Per Article: 115.6] [Indexed: 12/19/2022]
Affiliation(s)
- Ercheng Wang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Huiyong Sun
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Junmei Wang
  - Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Zhe Wang
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Hui Liu
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- John Z. H. Zhang
  - Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200122, China
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
- Tingjun Hou
  - Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
|