1
Morales N, Valdés-Muñoz E, González J, Valenzuela-Hormazábal P, Palma JM, Galarza C, Catagua-González Á, Yáñez O, Pereira A, Bustos D. Machine Learning-Driven Classification of Urease Inhibitors Leveraging Physicochemical Properties as Effective Filter Criteria. Int J Mol Sci 2024; 25:4303. [PMID: 38673888 PMCID: PMC11049951 DOI: 10.3390/ijms25084303] [Received: 03/15/2024] [Revised: 04/03/2024] [Accepted: 04/08/2024] [Indexed: 04/28/2024]
Abstract
Urease, a pivotal enzyme in nitrogen metabolism, plays a crucial role in various microorganisms, including the pathogenic Helicobacter pylori. Inhibiting urease activity offers a promising approach to combating infections and associated ailments, such as chronic kidney diseases and gastric cancer. However, identifying potent urease inhibitors remains challenging due to resistance issues that hinder traditional approaches. Recently, machine learning (ML)-based models have demonstrated the ability to predict the bioactivity of molecules rapidly and effectively. In this study, we present ML models designed to predict urease inhibitors by leveraging essential physicochemical properties. The methodological approach involved constructing a dataset of urease inhibitors through an extensive literature search. Subsequently, these inhibitors were characterized based on physicochemical properties calculations. An exploratory data analysis was then conducted to identify and analyze critical features. Ultimately, 252 classification models were trained, utilizing a combination of seven ML algorithms, three attribute selection methods, and six different strategies for categorizing inhibitory activity. The investigation unveiled discernible trends distinguishing urease inhibitors from non-inhibitors. This differentiation enabled the identification of essential features that are crucial for precise classification. Through a comprehensive comparison of ML algorithms, tree-based methods like random forest, decision tree, and XGBoost exhibited superior performance. Additionally, incorporating the "chemical family type" attribute significantly enhanced model accuracy. Strategies involving a gray-zone categorization demonstrated marked improvements in predictive precision. This research underscores the transformative potential of ML in predicting urease inhibitors. 
The meticulous methodology outlined herein offers actionable insights for developing robust predictive models within biochemical systems.
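The abstract does not define its gray-zone strategy. One common reading is that compounds with intermediate activity are treated as ambiguous and left out of training, so that the classifier only learns from clearly active and clearly inactive examples. The Python sketch below illustrates that idea under stated assumptions: the 10 µM and 50 µM IC50 cutoffs and the compound identifiers are hypothetical, not values from the study.

```python
from typing import List, Optional, Tuple

# Hypothetical IC50 cutoffs in µM; the paper's actual thresholds are not stated in the abstract.
ACTIVE_MAX_UM = 10.0    # at or below this IC50: labeled inhibitor
INACTIVE_MIN_UM = 50.0  # at or above this IC50: labeled non-inhibitor


def gray_zone_label(ic50_um: float,
                    active_max: float = ACTIVE_MAX_UM,
                    inactive_min: float = INACTIVE_MIN_UM) -> Optional[int]:
    """Return 1 (inhibitor), 0 (non-inhibitor), or None for the gray zone."""
    if ic50_um <= active_max:
        return 1
    if ic50_um >= inactive_min:
        return 0
    return None  # intermediate activity: treated as ambiguous


def build_training_set(records: List[Tuple[str, float]]) -> List[Tuple[str, int]]:
    """Keep only compounds whose IC50 falls outside the gray zone."""
    labeled = []
    for compound_id, ic50_um in records:
        label = gray_zone_label(ic50_um)
        if label is not None:
            labeled.append((compound_id, label))
    return labeled


if __name__ == "__main__":
    # Hypothetical compounds, not data from the study
    data = [("cpd_a", 2.5), ("cpd_b", 25.0), ("cpd_c", 120.0)]
    print(build_training_set(data))  # [('cpd_a', 1), ('cpd_c', 0)]
```

Widening the gap between the two cutoffs trades training-set size for cleaner labels, which is one plausible reason such a strategy could improve precision.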
Affiliation(s)
- Natalia Morales
- Magíster en Ciencias de la Computación, Universidad Católica del Maule, Talca 3460000, Chile
- Elizabeth Valdés-Muñoz
- Doctorado en Biotecnología Traslacional, Centro de Biotecnología de los Recursos Naturales, Universidad Católica del Maule, Talca 3480094, Chile
- Jaime González
- Magíster en Ciencias de la Computación, Universidad Católica del Maule, Talca 3460000, Chile
- Paulina Valenzuela-Hormazábal
- Departamento de Farmacología, Facultad de Ciencias Biológicas, Universidad de Concepción, Concepción 4030000, Chile
- Jonathan M. Palma
- Facultad de Ingeniería, Universidad de Talca, Curicó 3344158, Chile
- Christian Galarza
- Departamento de Matemáticas, Facultad de Ciencias Naturales y Matemáticas, Escuela Superior Politécnica del Litoral, Guayaquil EC090903, Ecuador
- Ángel Catagua-González
- Departamento de Matemáticas, Facultad de Ciencias Naturales y Matemáticas, Escuela Superior Politécnica del Litoral, Guayaquil EC090903, Ecuador
- Osvaldo Yáñez
- Núcleo de Investigación en Data Science, Facultad de Ingeniería y Negocios, Universidad de las Américas, Santiago 7500000, Chile
- Alfredo Pereira
- Facultad de Ingeniería, Arquitectura y Diseño, Universidad San Sebastián, Bellavista 7, Santiago 8420524, Chile
- Daniel Bustos
- Laboratorio de Bioinformática y Química Computacional, Departamento de Medicina Traslacional, Facultad de Medicina, Universidad Católica del Maule, Talca 3480094, Chile
2
Amaya-Rodriguez CA, Carvajal-Zamorano K, Bustos D, Alegría-Arcos M, Castillo K. A journey from molecule to physiology and in silico tools for drug discovery targeting the transient receptor potential vanilloid type 1 (TRPV1) channel. Front Pharmacol 2024; 14:1251061. [PMID: 38328578 PMCID: PMC10847257 DOI: 10.3389/fphar.2023.1251061] [Received: 06/30/2023] [Accepted: 12/14/2023] [Indexed: 02/09/2024]
Abstract
The heat and capsaicin receptor TRPV1 is widely expressed in the nerve terminals of dorsal root ganglia (DRGs) and trigeminal ganglia, which innervate the body and face, respectively, as well as in other tissues and organs, including the central nervous system. TRPV1 is a versatile receptor that detects noxious heat, painful stimuli, and various endogenous and exogenous ligands; hence, it operates as a polymodal sensory channel. Many pathological conditions, including neuroinflammation, cancer, psychiatric disorders, and pathological pain, are linked to abnormal TRPV1 function in peripheral tissues. Intense biomedical research is underway to discover compounds that can modulate the channel and provide pain relief. The molecular mechanisms underlying temperature sensing remain largely unknown, although they are closely linked to pain transduction. Because prolonged exposure to capsaicin produces analgesia, numerous capsaicin analogs have been developed in the search for efficient analgesics. The emergence of in silico tools has provided powerful molecular modeling and machine learning techniques for identifying druggable sites in the channel and for repositioning current drugs toward TRPV1. Here we recapitulate the physiological and pathophysiological functions of the TRPV1 channel, including structural models obtained through cryo-EM, pharmacological compounds tested on TRPV1, and in silico tools for drug discovery and repositioning.
Affiliation(s)
- Cesar A. Amaya-Rodriguez
- Centro Interdisciplinario de Neurociencia de Valparaíso, Facultad de Ciencias, Universidad de Valparaíso, Valparaíso, Chile
- Departamento de Fisiología y Comportamiento Animal, Facultad de Ciencias Naturales, Exactas y Tecnología, Universidad de Panamá, Ciudad de Panamá, Panamá
- Karina Carvajal-Zamorano
- Centro Interdisciplinario de Neurociencia de Valparaíso, Facultad de Ciencias, Universidad de Valparaíso, Valparaíso, Chile
- Daniel Bustos
- Centro de Investigación de Estudios Avanzados del Maule (CIEAM), Vicerrectoría de Investigación y Postgrado Universidad Católica del Maule, Talca, Chile
- Laboratorio de Bioinformática y Química Computacional, Departamento de Medicina Traslacional, Facultad de Medicina, Universidad Católica del Maule, Talca, Chile
- Melissa Alegría-Arcos
- Núcleo de Investigación en Data Science, Facultad de Ingeniería y Negocios, Universidad de las Américas, Santiago, Chile
- Karen Castillo
- Centro Interdisciplinario de Neurociencia de Valparaíso, Facultad de Ciencias, Universidad de Valparaíso, Valparaíso, Chile
- Centro de Investigación de Estudios Avanzados del Maule (CIEAM), Vicerrectoría de Investigación y Postgrado Universidad Católica del Maule, Talca, Chile
3
Ashraf FB, Akter S, Mumu SH, Islam MU, Uddin J. Bio-activity prediction of drug candidate compounds targeting SARS-Cov-2 using machine learning approaches. PLoS One 2023; 18:e0288053. [PMID: 37669264 PMCID: PMC10479925 DOI: 10.1371/journal.pone.0288053] [Received: 02/22/2023] [Accepted: 06/18/2023] [Indexed: 09/07/2023]
Abstract
The SARS-CoV-2 3CLpro protein is one of the key therapeutic targets for COVID-19 because of its critical role in viral replication, the availability of various high-quality crystal structures, and its suitability as a basis for computationally screening compounds with improved inhibitory activity, bioavailability, and ADMETox properties. The ChEMBL and PubChem databases contain experimental data from screening small molecules against SARS-CoV-2 3CLpro, which creates an opportunity to learn patterns and design a computational model that can predict the potency of any drug compound against the coronavirus before in vitro and in vivo testing. In this study, we evaluated 27 machine learning classifiers using several descriptors. We also developed a neural network model that identifies bioactive and inactive chemicals with 91% accuracy on ChEMBL data and 93% accuracy on combined ChEMBL and PubChem data. The F1-scores for inactive and active compounds were 93% and 94%, respectively. SHAP (SHapley Additive exPlanations) was applied to an XGBoost (XGB) classifier to find the most important fingerprints among the PaDEL descriptors for this task. The results indicated that the PaDEL descriptors were effective in predicting bioactivity, that the proposed neural network design was efficient, and that the SHAP explanations correctly identified the important fingerprints. In addition, we validated the effectiveness of the proposed model on a large dataset of over 100,000 molecules. This research employed various molecular descriptors to discover the optimal one for this task. Further in vitro and in vivo research is required to evaluate the effectiveness of these candidate drugs against SARS-CoV-2.
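Per-class F1-scores, like the 93% (inactive) and 94% (active) quoted above, are the harmonic mean of precision and recall computed with each class in turn treated as the positive label. The Python sketch below shows that calculation; the toy labels are hypothetical and not taken from the study.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 score for one class, treating `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study
    y_true = ["active", "active", "inactive", "inactive"]
    y_pred = ["active", "inactive", "inactive", "inactive"]
    for label in ("active", "inactive"):
        print(label, round(f1_for_class(y_true, y_pred, label), 3))
```

Reporting F1 per class, rather than accuracy alone, shows whether a model performs well on both actives and inactives.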
Affiliation(s)
- Faisal Bin Ashraf
- Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh
- Department of Computer Science and Engineering, University of California, Riverside, California, United States of America
- Sanjida Akter
- Department of Cell Molecular and Developmental Biology, University of California, Riverside, California, United States of America
- Sumona Hoque Mumu
- School of Kinesiology, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
- Muhammad Usama Islam
- School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, Louisiana, United States of America
- Jasim Uddin
- Department of Applied Computing and Engineering, Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, Wales, United Kingdom
4
Venkatraman V. FP-MAP: an extensive library of fingerprint-based molecular activity prediction tools. Front Chem 2023; 11:1239467. [PMID: 37649967 PMCID: PMC10462816 DOI: 10.3389/fchem.2023.1239467] [Received: 06/13/2023] [Accepted: 07/31/2023] [Indexed: 09/01/2023]
Abstract
Discovering new drugs for disease treatment is challenging, requiring a multidisciplinary effort as well as time and resources. To improve hit discovery and lead compound identification, machine learning (ML) approaches are increasingly used in the decision-making process. Although a number of ML-based studies have been published, most report only a fraction of the wider range of bioactivities, with each model typically focusing on a particular disease. This study introduces FP-MAP, an extensive atlas of fingerprint-based prediction models that covers a diverse range of activities, including neglected tropical diseases (caused by viral, bacterial and parasitic pathogens) as well as other targets implicated in diseases such as Alzheimer's. To arrive at the best predictive models, the performance of ≈4,000 classification/regression models was evaluated on different bioactivity data sets using 12 different molecular fingerprints. The best-performing models, which achieved test set AUC values of 0.62-0.99, have been integrated into an easy-to-use graphical user interface that can be downloaded from https://gitlab.com/vishsoft/fpmap.
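The AUC values quoted above can be read as the probability that a randomly chosen active compound receives a higher predicted score than a randomly chosen inactive one: 0.5 corresponds to random ranking and 1.0 to perfect separation. The Python sketch below computes AUC directly from that pairwise (Mann-Whitney) definition; it is an illustration with hypothetical scores, not FP-MAP's own evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation). This is
    O(n_pos * n_neg), which is fine for illustration but slow for large test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted activity scores for two actives (1) and two inactives (0)
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Because AUC depends only on the ranking of scores, it is insensitive to the choice of classification threshold, which makes it convenient for comparing many models across different data sets.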
Affiliation(s)
- Vishwesh Venkatraman
- Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
5
Choudhary R, Walhekar V, Muthal A, Kumar D, Bagul C, Kulkarni R. Machine learning facilitated structural activity relationship approach for the discovery of novel inhibitors targeting EGFR. J Biomol Struct Dyn 2023; 41:12445-12463. [PMID: 36762704 DOI: 10.1080/07391102.2023.2175263] [Received: 11/10/2022] [Accepted: 01/03/2023] [Indexed: 02/11/2023]
Abstract
This research aims to find the most effective epidermal growth factor receptor (EGFR) inhibitors among millions of in-house compounds using machine learning (ML) techniques. ML-based structure-activity relationship (SAR) models were validated to predict the biological activity of untested novel molecules. Six ML algorithms, namely k-nearest neighbours (KNN), decision tree (DT), logistic regression, support vector machine (SVM), multilinear regression (MLR), and random forest (RF), were used to build activity-prediction models. Among these, the RF classifier (training- and test-set accuracies of 90% and 81%) and the RF regressor (R² and MSE of 0.83 and 0.29 on the training set and 0.69 and 0.46 on the test set) showed good predictive performance. In addition, the six features that most strongly affect biological activity and contribute to model development were selected with a variable importance technique. The RF regression model was used to predict the biological activity, expressed as pIC50, of nearly ten million molecules, while the RF classification model classified those molecules as active, moderately active, or least active according to their predicted pIC50. Based on the two models, a thousand of these molecules with higher predicted pIC50 values that were classified as active were selected for molecular docking. Based on the docking scores, predicted pIC50, and binding interactions with residue MET769, compounds Zinc257233137, Zinc257232249, and Zinc101379788 were identified as potential EGFR inhibitors with predicted pIC50 values of 7.72, 7.85, and 7.70, respectively. Molecular dynamics studies performed on Zinc257233137 showed that it has a favourable binding free energy and stable hydrogen-bonding interactions with EGFR. These molecules can be used in further research and may prove to be novel EGFR-targeted drugs for cancer treatment. Communicated by Ramaswamy H. Sarma.
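pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so for an IC50 given in nM it equals 9 − log10(IC50). The Python sketch below shows that conversion and a three-way binning analogous to the active / moderately active / least active classes mentioned above. The class boundaries of 7.0 and 6.0 are hypothetical, since the abstract does not state the cutoffs the authors used.

```python
import math

# Hypothetical class boundaries on the pIC50 scale; the cutoffs used in the study
# are not given in the abstract.
ACTIVE_MIN = 7.0
MODERATE_MIN = 6.0


def pic50_from_nm(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); for IC50 in nM this equals 9 - log10(IC50)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Inverse conversion: IC50 in nM from a pIC50 value."""
    return 10 ** (9.0 - pic50)


def activity_class(pic50: float) -> str:
    """Three-way binning analogous to active / moderately active / least active."""
    if pic50 >= ACTIVE_MIN:
        return "active"
    if pic50 >= MODERATE_MIN:
        return "moderately active"
    return "least active"


if __name__ == "__main__":
    for p in (7.72, 7.85, 7.70):  # predicted pIC50 values quoted in the abstract
        print(p, round(ic50_nm_from_pic50(p), 1), activity_class(p))
```

On this scale each unit of pIC50 is a tenfold change in potency; a predicted pIC50 of 7.72, for example, corresponds to an IC50 of roughly 19 nM.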
Affiliation(s)
- Rekha Choudhary
- Department of Pharmaceutical Chemistry, BVDU'S Poona College of Pharmacy, Pune, Maharashtra, India
- Vinayak Walhekar
- Department of Pharmaceutical Chemistry, BVDU'S Poona College of Pharmacy, Pune, Maharashtra, India
- Amol Muthal
- Department of Pharmacology, BVDU'S Poona College of Pharmacy, Pune, Maharashtra, India
- Dilip Kumar
- Department of Pharmaceutical Chemistry, BVDU'S Poona College of Pharmacy, Pune, Maharashtra, India
- Department of Entomology, University of California, Davis, Davis, California, USA
- UC Davis Comprehensive Cancer Centre, University of California, Davis, Davis, California, USA
- Chandrakant Bagul
- Department of Pharmaceutical Chemistry, BVDU'S Poona College of Pharmacy, Pune, Maharashtra, India
- Ravindra Kulkarni
- Department of Pharmaceutical Chemistry, BVDU'S Poona College of Pharmacy, Pune, Maharashtra, India
6
Bonanni D, Pinzi L, Rastelli G. Development of machine learning classifiers to predict compound activity on prostate cancer cell lines. J Cheminform 2022; 14:77. [PMID: 36348374 PMCID: PMC9641853 DOI: 10.1186/s13321-022-00647-y] [Received: 07/29/2022] [Accepted: 09/27/2022] [Indexed: 11/11/2022]
Abstract
Prostate cancer is the most common type of cancer in men. The disease has good survival rates if treated at an early stage; however, its most aggressive variant still lacks effective therapies. Therefore, novel effective therapeutics are urgently needed. On these premises, we developed a series of machine learning models, based on compounds with highly homogeneous reported cell-based antiproliferative assay data, that predict the activity of ligands towards the PC-3 and DU-145 prostate cancer cell lines. The data used to develop the computational models were fine-tuned with respect to a series of thresholds for classifying compounds as active or inactive and to the number of features to be implemented, and 10 different machine learning algorithms were applied. Model evaluation allowed us to identify the best combination of activity thresholds and ML algorithms for classifying active compounds, achieving Matthews correlation coefficient (MCC) values above 0.60 for PC-3 and DU-145 cells. Moreover, in silico models based on the combined PC-3 and DU-145 data were also developed and showed excellent precision. Finally, an analysis of the activity annotations reported for the ligands in the curated datasets was conducted, suggesting associations between cellular activity and biological targets that might be explored in the future for the design of more effective prostate cancer antiproliferative agents.
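The Matthews correlation coefficient (MCC) summarizes all four cells of a binary confusion matrix in one number and, unlike plain accuracy, stays informative when actives are much rarer than inactives. The Python sketch below implements the standard formula; the confusion-matrix counts in the example are hypothetical, not results from the study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance) to +1
    (perfect prediction). Returns 0.0 when any marginal total is zero, a common
    convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical confusion matrix, not taken from the study
    print(round(mcc(tp=40, tn=45, fp=5, fn=10), 3))
```

With these made-up counts the MCC is roughly 0.70, which would sit above the 0.60 level quoted in the abstract.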
7
Rasool A, Batool Z, Khan M, Halim SA, Shafiq Z, Temirak A, Salem MA, Ali TE, Khan A, Al-Harrasi A. Bis-pharmacophore of cinnamaldehyde-clubbed thiosemicarbazones as potent carbonic anhydrase-II inhibitors. Sci Rep 2022; 12:16095. [PMID: 36167735 PMCID: PMC9515202 DOI: 10.1038/s41598-022-19975-y] [Received: 05/15/2022] [Accepted: 09/07/2022] [Indexed: 11/24/2022]
Abstract
Here, we report the synthesis, carbonic anhydrase-II (CA-II) inhibition, and structure–activity relationship studies of cinnamaldehyde-clubbed thiosemicarbazone derivatives. The derivatives showed potent activities, with IC50 values in the range of 10.3 ± 0.62–46.6 ± 0.62 µM. Among all the synthesized derivatives, compounds 3n (IC50 = 10.3 ± 0.62 µM), 3g (IC50 = 12.1 ± 1.01 µM), and 3h (IC50 = 13.4 ± 0.52 µM) showed higher inhibitory activity than the standard inhibitor, acetazolamide. Furthermore, molecular docking of all the active compounds was carried out to predict their binding modes. The docking results indicate that the most active hit (3n) specifically mediates an ionic interaction with the Zn ion in the active site of CA-II. Furthermore, Thr199 and Thr200 support the binding of the thiosemicarbazide moiety of 3n, while Gln92 supports the interactions of all the compounds through hydrogen bonding. In addition to Gln92, a few other residues, including Asn62, Asn67, Thr199, and Thr200, play an important role in stabilizing these molecules in the active site, specifically by providing H-bonds to the thiosemicarbazide moiety of the compounds. The docking scores of the active hits range from −6.75 to −4.42 kcal/mol, which indicates that the computational predictions correlate well with the in vitro results.
Affiliation(s)
- Asif Rasool
- Institute of Chemical Sciences, Bahauddin Zakariya University, Multan, 60800, Pakistan
- Zahra Batool
- Institute of Chemical Sciences, Bahauddin Zakariya University, Multan, 60800, Pakistan
- Majid Khan
- Natural and Medical Sciences Research Center, University of Nizwa, Nizwa, Sultanate of Oman
- Sobia Ahsan Halim
- Natural and Medical Sciences Research Center, University of Nizwa, Nizwa, Sultanate of Oman
- Zahid Shafiq
- Institute of Chemical Sciences, Bahauddin Zakariya University, Multan, 60800, Pakistan
- Department of Pharmaceutical and Medicinal Chemistry, University of Bonn, An der Immenburg 4, 53121, Bonn, Germany
- Ahmed Temirak
- National Research Centre, Chemistry of Natural and Microbial Products Department, Pharmaceutical and Drug Industries Research Institute, Dokki, P.O. Box 12622, Cairo, Egypt
- Mohamed A Salem
- Department of Chemistry, Faculty of Science and Arts, King Khalid University, Muhayil, Assir, Saudi Arabia
- Department of Chemistry, Faculty of Science, Al-Azhar University, 11284 Nasr City, Cairo, Egypt
- Tarik E Ali
- Department of Chemistry, Faculty of Science, King Khalid University, Abha, Saudi Arabia
- Department of Chemistry, Faculty of Education, Ain Shams University, Cairo, Egypt
- Ajmal Khan
- Natural and Medical Sciences Research Center, University of Nizwa, Nizwa, Sultanate of Oman
- Ahmed Al-Harrasi
- Natural and Medical Sciences Research Center, University of Nizwa, Nizwa, Sultanate of Oman
8
Khatua S, Taraphder S. In the footsteps of an inhibitor unbinding from the active site of human carbonic anhydrase II. J Biomol Struct Dyn 2022; 41:3187-3204. [PMID: 35257634 DOI: 10.1080/07391102.2022.2048075] [Indexed: 10/18/2022]
Abstract
The crystal structure of human carbonic anhydrase (HCA) II bound to an inhibitor molecule, 6-hydroxy-2-thioxocoumarin (FC5), shows FC5 located in a hydrophobic pocket at the active site. The present work employs classical molecular dynamics (MD) simulation to follow the FC5 molecule for 1 μs as it unbinds from its binding site and takes the substrate/product diffusion path (path 1) to leave the active site at around 75 ns. It then undergoes repeated binding and unbinding at different locations on the surface of the enzyme in water. Several transient excursions through different regions of the enzyme are also observed before its exit from the active site. These transient paths are combined with functionally relevant cavities/channels to identify five additional pathways (paths 2-6). Pathways 1-6 are subsequently explored using steered MD and umbrella sampling simulations. A free energy barrier of 0.969 kcal mol⁻¹ is encountered along path 1, while barriers in the range of 0.57-2.84 kcal mol⁻¹ are obtained along paths 2, 4 and 5. We also analyze in detail the interaction between FC5 and the enzyme along each path as FC5 leaves the active site of HCA II. Our results indicate that path 1 is the major exit pathway for FC5, although competing contributions may also come from paths 2, 4 and 5. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Satyajit Khatua
- Department of Chemistry, Indian Institute of Technology, Kharagpur, India
- Srabani Taraphder
- Department of Chemistry, Indian Institute of Technology, Kharagpur, India
9
Yu Y, Wang R, Teo RD. Machine Learning Approaches for Metalloproteins. Molecules 2022; 27:1277. [PMID: 35209064 PMCID: PMC8878495 DOI: 10.3390/molecules27041277] [Received: 02/02/2022] [Revised: 02/10/2022] [Accepted: 02/11/2022] [Indexed: 01/10/2023]
Abstract
Metalloproteins are a family of proteins characterized by metal ion binding, whereby the presence of these ions confers key catalytic and ligand-binding properties. Because metalloproteins are ubiquitous in biological systems, researchers have made immense efforts to predict their structural and functional roles. Ultimately, a comprehensive understanding of metalloproteins will lead to tangible applications, such as designing potent inhibitors in drug discovery. Recently, the number of studies applying machine learning to predict metalloprotein properties has accelerated, driven primarily by the advent of more sophisticated machine learning algorithms. This review covers how machine learning tools have consolidated and expanded our comprehension of various aspects of metalloproteins (structure, function, stability, ligand-binding interactions, and inhibitors). Future avenues of exploration are also discussed.
Affiliation(s)
- Yue Yu
- Division of Natural and Applied Sciences, Duke Kunshan University, Kunshan, Jiangsu 215316, China
- Department of Physics, Duke University, Durham, NC 27708, USA
- Ruobing Wang
- Department of Chemistry, Duke University, Durham, NC 27708, USA
- Ruijie D. Teo
- Department of Chemistry, Duke University, Durham, NC 27708, USA
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA