1
|
Popov KI, Wellnitz J, Maxfield T, Tropsha A. HIt Discovery using docking ENriched by GEnerative Modeling (HIDDEN GEM): A novel computational workflow for accelerated virtual screening of ultra-large chemical libraries. Mol Inform 2024; 43:e202300207. [PMID: 37802967 PMCID: PMC11156482 DOI: 10.1002/minf.202300207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Revised: 10/03/2023] [Accepted: 10/06/2023] [Indexed: 10/08/2023]
Abstract
Recent rapid expansion of make-on-demand, purchasable, chemical libraries comprising dozens of billions or even trillions of molecules has challenged the efficient application of traditional structure-based virtual screening methods that rely on molecular docking. We present a novel computational methodology termed HIDDEN GEM (HIt Discovery using Docking ENriched by GEnerative Modeling) that greatly accelerates virtual screening. This workflow uniquely integrates machine learning, generative chemistry, massive chemical similarity searching and molecular docking of small, selected libraries in the beginning and the end of the workflow. For each target, HIDDEN GEM nominates a small number of top-scoring virtual hits prioritized from ultra-large chemical libraries. We have benchmarked HIDDEN GEM by conducting virtual screening campaigns for 16 diverse protein targets using Enamine REAL Space library comprising 37 billion molecules. We show that HIDDEN GEM yields the highest enrichment factors as compared to state of the art accelerated virtual screening methods, while requiring the least computational resources. HIDDEN GEM can be executed with any docking software and employed by users with limited computational resources.
Collapse
Affiliation(s)
- Konstantin I. Popov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- These authors contributed equally: Konstantin I. Popov, James Wellnitz, Travis Maxfield
| | - James Wellnitz
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- These authors contributed equally: Konstantin I. Popov, James Wellnitz, Travis Maxfield
| | - Travis Maxfield
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- These authors contributed equally: Konstantin I. Popov, James Wellnitz, Travis Maxfield
| | - Alexander Tropsha
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
| |
Collapse
|
2
|
Rouse I, Lobaskin V. Machine-learning based prediction of small molecule-surface interaction potentials. Faraday Discuss 2023; 244:306-335. [PMID: 37092299 DOI: 10.1039/d2fd00155a] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
Predicting the adsorption affinity of a small molecule to a target surface is of importance to a range of fields, from catalysis to drug delivery and human safety, but a complex task to perform computationally when taking into account the effects of the surrounding medium. We present a flexible machine-learning approach to predict potentials of mean force (PMFs) and adsorption energies for chemical-surface pairs from the separate interaction potentials of each partner with a set of probe atoms. We use a pre-existing library of PMFs obtained via atomistic molecular dynamics simulations for a variety of inorganic materials and molecules to train the model. We find good agreement between original and predicted PMFs in both training and validation groups, confirming the predictive power of this approach, and demonstrate the flexibility of the model by producing PMFs for molecules and surfaces outside the training set.
Collapse
Affiliation(s)
- Ian Rouse
- School of Physics, University College Dublin, Belfield, Dublin 4, Ireland.
| | - Vladimir Lobaskin
- School of Physics, University College Dublin, Belfield, Dublin 4, Ireland.
| |
Collapse
|
3
|
Yang S, Gong W, Zhou T, Sun X, Chen L, Zhou W, Li C. emPDBA: protein-DNA binding affinity prediction by combining features from binding partners and interface learned with ensemble regression model. Brief Bioinform 2023:7165253. [PMID: 37193676 DOI: 10.1093/bib/bbad192] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2023] [Revised: 04/26/2023] [Accepted: 04/29/2023] [Indexed: 05/18/2023] Open
Abstract
Protein-deoxyribonucleic acid (DNA) interactions are important in a variety of biological processes. Accurately predicting protein-DNA binding affinity has been one of the most attractive and challenging issues in computational biology. However, the existing approaches still have much room for improvement. In this work, we propose an ensemble model for Protein-DNA Binding Affinity prediction (emPDBA), which combines six base models with one meta-model. The complexes are classified into four types based on the DNA structure (double-stranded or other forms) and the percentage of interface residues. For each type, emPDBA is trained with the sequence-based, structure-based and energy features from binding partners and complex structures. Through feature selection by the sequential forward selection method, it is found that there do exist considerable differences in the key factors contributing to intermolecular binding affinity. The complex classification is beneficial for the important feature extraction for binding affinity prediction. The performance comparison of our method with other peer ones on the independent testing dataset shows that emPDBA outperforms the state-of-the-art methods with the Pearson correlation coefficient of 0.53 and the mean absolute error of 1.11 kcal/mol. The comprehensive results demonstrate that our method has a good performance for protein-DNA binding affinity prediction. Availability and implementation: The source code is available at https://github.com/ChunhuaLiLab/emPDBA/.
Collapse
Affiliation(s)
- Shuang Yang
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
| | - Weikang Gong
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
| | - Tong Zhou
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
| | - Xiaohan Sun
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
| | - Lei Chen
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
| | - Wenxue Zhou
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
| | - Chunhua Li
- Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
| |
Collapse
|
4
|
Kekenes-Huskey PM, Burgess DE, Sun B, Bartos DC, Rozmus ER, Anderson CL, January CT, Eckhardt LL, Delisle BP. Mutation-Specific Differences in Kv7.1 ( KCNQ1) and Kv11.1 ( KCNH2) Channel Dysfunction and Long QT Syndrome Phenotypes. Int J Mol Sci 2022; 23:7389. [PMID: 35806392 PMCID: PMC9266926 DOI: 10.3390/ijms23137389] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2022] [Revised: 06/22/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022] Open
Abstract
The electrocardiogram (ECG) empowered clinician scientists to measure the electrical activity of the heart noninvasively to identify arrhythmias and heart disease. Shortly after the standardization of the 12-lead ECG for the diagnosis of heart disease, several families with autosomal recessive (Jervell and Lange-Nielsen Syndrome) and dominant (Romano-Ward Syndrome) forms of long QT syndrome (LQTS) were identified. An abnormally long heart rate-corrected QT-interval was established as a biomarker for the risk of sudden cardiac death. Since then, the International LQTS Registry was established; a phenotypic scoring system to identify LQTS patients was developed; the major genes that associate with typical forms of LQTS were identified; and guidelines for the successful management of patients advanced. In this review, we discuss the molecular and cellular mechanisms for LQTS associated with missense variants in KCNQ1 (LQT1) and KCNH2 (LQT2). We move beyond the "benign" to a "pathogenic" binary classification scheme for different KCNQ1 and KCNH2 missense variants and discuss gene- and mutation-specific differences in K+ channel dysfunction, which can predispose people to distinct clinical phenotypes (e.g., concealed, pleiotropic, severe, etc.). We conclude by discussing the emerging computational structural modeling strategies that will distinguish between dysfunctional subtypes of KCNQ1 and KCNH2 variants, with the goal of realizing a layered precision medicine approach focused on individuals.
Collapse
Affiliation(s)
- Peter M. Kekenes-Huskey
- Department of Cell and Molecular Physiology, Stritch School of Medicine, Loyola University Chicago, Maywood, IL 60153, USA
| | - Don E. Burgess
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA; (D.E.B.); (E.R.R.)
| | - Bin Sun
- Department of Pharmacology, Harbin Medical University, Harbin 150081, China;
| | | | - Ezekiel R. Rozmus
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA; (D.E.B.); (E.R.R.)
| | - Corey L. Anderson
- Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA; (C.L.A.); (C.T.J.); (L.L.E.)
| | - Craig T. January
- Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA; (C.L.A.); (C.T.J.); (L.L.E.)
| | - Lee L. Eckhardt
- Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA; (C.L.A.); (C.T.J.); (L.L.E.)
| | - Brian P. Delisle
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA; (D.E.B.); (E.R.R.)
| |
Collapse
|
5
|
Li R, Tian Y, Yang Z, Ji Y, Ding J, Yan A. Classification models and SAR analysis on HDAC1 inhibitors using machine learning methods. Mol Divers 2022:10.1007/s11030-022-10466-w. [PMID: 35737257 DOI: 10.1007/s11030-022-10466-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2022] [Accepted: 05/19/2022] [Indexed: 10/17/2022]
Abstract
Histone deacetylase (HDAC) 1, a member of the histone deacetylases family, plays a pivotal role in various tumors. In this study, we collected 7313 human HDAC1 inhibitors with bioactivities to form a dataset. Then, the dataset was divided into a training set and a test set using two splitting methods: (1) Kohonen's self-organizing map and (2) random splitting. The molecular structures were represented by MACCS fingerprints, RDKit fingerprints, topological torsions fingerprints and ECFP4 fingerprints. A total of 80 classification models were built by using five machine learning methods, including decision tree (DT), random forest, support vector machine, eXtreme Gradient Boosting and deep neural network. Model 15A_2 built by the XGBoost algorithm based on ECFP4 fingerprints showed the best performance, with an accuracy of 88.08% and an MCC value of 0.76 on the test set. Finally, we clustered the 7313 HDAC1 inhibitors into 31 subsets, and the substructural features in each subset were investigated. Moreover, using DT algorithm we analyzed the structure-activity relationship of HDAC1 inhibitors. It may conclude that some substructures have a significant effect on high activity, such as N-(2-amino-phenyl)-benzamide, benzimidazole, AR-42 analogues, hydroxamic acid with a middle chain alkyl and 4-aryl imidazole with a midchain of alkyl whose α carbon is chiral.
Collapse
Affiliation(s)
- Rourou Li
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, China
| | - Yujia Tian
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, China
| | - Zhenwu Yang
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, China
| | - Yueshan Ji
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, China
| | - Jiaqi Ding
- School of International Education, Beijing University of Chemical Technology, Beijing, China
| | - Aixia Yan
- State Key Laboratory of Chemical Resource Engineering, Department of Pharmaceutical Engineering, Beijing University of Chemical Technology, Beijing, China.
| |
Collapse
|
6
|
Casadio R, Martelli PL, Savojardo C. Machine learning solutions for predicting protein–protein interactions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Affiliation(s)
- Rita Casadio
- Biocomputing Group University of Bologna Bologna Italy
| | | | | |
Collapse
|
7
|
Veit-Acosta M, de Azevedo Junior WF. Computational Prediction of Binding Affinity for CDK2-ligand Complexes. A Protein Target for Cancer Drug Discovery. Curr Med Chem 2021; 29:2438-2455. [PMID: 34365938 DOI: 10.2174/0929867328666210806105810] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2021] [Revised: 06/15/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND CDK2 participates in the control of eukaryotic cell-cycle progression. Due to the great interest in CDK2 for drug development and the relative easiness in crystallizing this enzyme, we have over 400 structural studies focused on this protein target. This structural data is the basis for the development of computational models to estimate CDK2-ligand binding affinity. OBJECTIVE This work focuses on the recent developments in the application of supervised machine learning modeling to develop scoring functions to predict the binding affinity of CDK2. METHOD We employed the structures available at the protein data bank and the ligand information accessed from the BindingDB, Binding MOAD, and PDBbind to evaluate the predictive performance of machine learning techniques combined with physical modeling used to calculate binding affinity. We compared this hybrid methodology with classical scoring functions available in docking programs. RESULTS Our comparative analysis of previously published models indicated that a model created using a combination of a mass-spring system and cross-validated Elastic Net to predict the binding affinity of CDK2-inhibitor complexes outperformed classical scoring functions available in AutoDock4 and AutoDock Vina. CONCLUSION All studies reviewed here suggest that targeted machine learning models are superior to classical scoring functions to calculate binding affinities. Specifically for CDK2, we see that the combination of physical modeling with supervised machine learning techniques exhibits improved predictive performance to calculate the protein-ligand binding affinity. These results find theoretical support in the application of the concept of scoring function space.
Collapse
Affiliation(s)
- Martina Veit-Acosta
- Western Michigan University, 1903 Western, Michigan Ave, Kalamazoo, MI 49008. United States
| | | |
Collapse
|
8
|
Hastings J, Glauer M, Memariani A, Neuhaus F, Mossakowski T. Learning chemistry: exploring the suitability of machine learning for the task of structure-based chemical ontology classification. J Cheminform 2021; 13:23. [PMID: 33726837 PMCID: PMC7962259 DOI: 10.1186/s13321-021-00500-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2020] [Accepted: 02/26/2021] [Indexed: 12/22/2022] Open
Abstract
Chemical data is increasingly openly available in databases such as PubChem, which contains approximately 110 million compound entries as of February 2021. With the availability of data at such scale, the burden has shifted to organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used for navigation and filtering of the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in life science contexts. However, ChEBI is manually maintained and as such cannot easily scale to the full scope of public chemical data. There is a need for tools that are able to automatically classify chemical data into chemical ontologies, which can be framed as a hierarchical multi-class classification problem. In this paper we evaluate machine learning approaches for this task, comparing different learning frameworks including logistic regression, decision trees and long short-term memory artificial neural networks, and different encoding approaches for the chemical structures, including cheminformatics fingerprints and character-based encoding from chemical line notation representations. We find that classical learning approaches such as logistic regression perform well with sets of relatively specific, disjoint chemical classes, while the neural network is able to handle larger sets of overlapping classes but needs more examples per class to learn from, and is not able to make a class prediction for every molecule. Future work will explore hybrid and ensemble approaches, as well as alternative network architectures including neuro-symbolic approaches.
Collapse
Affiliation(s)
- Janna Hastings
- Department of Computer Science, Otto-von-Guericke University of Magdeburg, Magdeburg, Germany
| | - Martin Glauer
- Department of Computer Science, Otto-von-Guericke University of Magdeburg, Magdeburg, Germany
| | - Adel Memariani
- Department of Computer Science, Otto-von-Guericke University of Magdeburg, Magdeburg, Germany
| | - Fabian Neuhaus
- Department of Computer Science, Otto-von-Guericke University of Magdeburg, Magdeburg, Germany
| | - Till Mossakowski
- Department of Computer Science, Otto-von-Guericke University of Magdeburg, Magdeburg, Germany
| |
Collapse
|
9
|
Mahmoud AH, Masters MR, Yang Y, Lill MA. Elucidating the multiple roles of hydration for accurate protein-ligand binding prediction via deep learning. Commun Chem 2020; 3:19. [PMID: 36703428 PMCID: PMC9814895 DOI: 10.1038/s42004-020-0261-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2019] [Accepted: 01/16/2020] [Indexed: 01/29/2023] Open
Abstract
Accurate and efficient prediction of protein-ligand interactions has been a long-lasting dream of practitioners in drug discovery. The insufficient treatment of hydration is widely recognized to be a major limitation for accurate protein-ligand scoring. Using an integration of molecular dynamics simulations on thousands of protein structures with novel big-data analytics based on convolutional neural networks and deep Taylor decomposition, we consistently identify here three different patterns of hydration to be essential for protein-ligand interactions. In addition to desolvation and water-mediated interactions, the formation of enthalpically favorable networks of first-shell water molecules around solvent-exposed ligand moieties is identified to be essential for protein-ligand binding. Despite being currently neglected in drug discovery, this hydration phenomenon could lead to new avenues in optimizing the free energy of ligand binding. Application of deep neural networks incorporating hydration to docking provides 89% accuracy in binding pose ranking, an essential step for rational structure-based drug design.
Collapse
Affiliation(s)
- Amr H. Mahmoud
- grid.169077.e0000 0004 1937 2197Department of Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, 575 Stadium Mall Drive, West Lafayette, IN 47906 USA
| | - Matthew R. Masters
- grid.169077.e0000 0004 1937 2197Department of Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, 575 Stadium Mall Drive, West Lafayette, IN 47906 USA
| | - Ying Yang
- grid.169077.e0000 0004 1937 2197Department of Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, 575 Stadium Mall Drive, West Lafayette, IN 47906 USA
| | - Markus A. Lill
- grid.169077.e0000 0004 1937 2197Department of Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, 575 Stadium Mall Drive, West Lafayette, IN 47906 USA ,grid.6612.30000 0004 1937 0642Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland
| |
Collapse
|
10
|
Matsuzaka Y, Uesawa Y. DeepSnap-Deep Learning Approach Predicts Progesterone Receptor Antagonist Activity With High Performance. Front Bioeng Biotechnol 2020; 7:485. [PMID: 32039185 PMCID: PMC6987043 DOI: 10.3389/fbioe.2019.00485] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2019] [Accepted: 12/30/2019] [Indexed: 12/16/2022] Open
Abstract
The progesterone receptor (PR) is important therapeutic target for many malignancies and endocrine disorders due to its role in controlling ovulation and pregnancy via the reproductive cycle. Therefore, the modulation of PR activity using its agonists and antagonists is receiving increasing interest as novel treatment strategy. However, clinical trials using the PR modulators have not yet been found conclusive evidences. Recently, increasing evidence from several fields shows that the classification of chemical compounds, including agonists and antagonists, can be done with recent improvements in deep learning (DL) using deep neural network. Therefore, we recently proposed a novel DL-based quantitative structure-activity relationship (QSAR) strategy using transfer learning to build prediction models for agonists and antagonists. By employing this novel approach, referred as DeepSnap-DL method, which uses images captured from 3-dimension (3D) chemical structure with multiple angles as input data into the DL classification, we constructed prediction models of the PR antagonists in this study. Here, the DeepSnap-DL method showed a high performance prediction of the PR antagonists by optimization of some parameters and image adjustment from 3D-structures. Furthermore, comparison of the prediction models from this approach with conventional machine learnings (MLs) indicated the DeepSnap-DL method outperformed these MLs. Therefore, the models predicted by DeepSnap-DL would be powerful tool for not only QSAR field in predicting physiological and agonist/antagonist activities, toxicity, and molecular bindings; but also for identifying biological or pathological phenomena.
Collapse
Affiliation(s)
| | - Yoshihiro Uesawa
- Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Tokyo, Japan
| |
Collapse
|