1
OréMaldonado KA, Cuesta SA, Mora JR, Loroño MA, Paz JL. Discovering New Tyrosinase Inhibitors by Using In Silico Modelling, Molecular Docking, and Molecular Dynamics. Pharmaceuticals (Basel) 2025; 18:418. [PMID: 40143194 PMCID: PMC11946302 DOI: 10.3390/ph18030418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2024] [Revised: 03/09/2025] [Accepted: 03/13/2025] [Indexed: 03/28/2025] Open
Abstract
Background/Objectives: This study used in silico modelling to search for potential tyrosinase inhibitors, using a database of compounds with different core structures for IC50 prediction. Methods: Four machine learning algorithms and topographical descriptors were tested for model construction. Results: A multiple linear regression model with only six descriptors was the most robust; it was validated by the Tropsha test, with statistical parameters R2 = 0.8687, Q2LOO = 0.8030, and Q2ext = 0.9151. pIC50 values were calculated for 15,424 structures from a screening of FDA-approved drugs and natural products. The applicability domain analysis covered 100% of the external dataset and 71.22% and 73.26% of the two screening datasets. Fifteen candidates with pIC50 above 7.6 were identified, and five structures were proposed as potential tyrosinase inhibitors and underwent ADME analysis. Conclusions: Molecular docking was performed for the dataset used in the training-test process and for the fifteen screening structures with potential tyrosinase inhibition, followed by molecular dynamics studies of the five candidates with the highest predicted pIC50 values. The new use of these five candidates in tyrosinase inhibition is highlighted based on their promising application in melanoma treatment.
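The R2 and Q2LOO statistics quoted in this abstract can be illustrated with a small sketch. It is not the authors' workflow: it fits an ordinary least-squares model to made-up descriptor values (`X_TOY` and `Y_TOY` are hypothetical) and computes leave-one-out Q2 as 1 - PRESS/SS_tot, with SS_tot taken about the mean of the full training set. That convention is an assumption here, since the abstract does not state which definition was used.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("descriptor matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def total_sum_of_squares(y: Sequence[float]) -> float:
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def r_squared(X: List[List[float]], y: List[float]) -> float:
    """Goodness of fit on the training set itself."""
    coef = fit_ols(X, y)
    ss_res = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    return 1.0 - ss_res / total_sum_of_squares(y)


def q2_loo(X: List[List[float]], y: List[float]) -> float:
    """Leave-one-out Q2: refit without sample i, then predict sample i."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    return 1.0 - press / total_sum_of_squares(y)


# Hypothetical toy data: two descriptors per molecule and noisy pIC50 values.
X_TOY = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
Y_TOY = [2.1, 4.3, 5.15, 7.45, 8.2, 10.4]
```

With the shared SS_tot used above, Q2LOO cannot exceed R2 for a least-squares fit, so a large gap between the two is a common warning sign of overfitting.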
Affiliation(s)
- Kevin A. OréMaldonado
  - Departamento Académico de Química Fisicoquímica, Facultad de Química e Ingeniería Química, Universidad Nacional Mayor de San Marcos, Lima 15081, Peru
- Sebastián A. Cuesta
  - Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, Universidad San Francisco de Quito, Diego de Robles y Vía Interoceánica, Quito 170901, Ecuador
  - Department of Chemistry, Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, UK
- José R. Mora
  - Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, Universidad San Francisco de Quito, Diego de Robles y Vía Interoceánica, Quito 170901, Ecuador
- Marcos A. Loroño
  - Departamento Académico de Química Fisicoquímica, Facultad de Química e Ingeniería Química, Universidad Nacional Mayor de San Marcos, Lima 15081, Peru
- José L. Paz
  - Departamento Académico de Química Inorgánica, Facultad de Química e Ingeniería Química, Universidad Nacional Mayor de San Marcos, Lima 15081, Peru
2
García MC, Cuesta SA, Mora JR, Paz JL, Marrero-Ponce Y, Alexis F, Márquez EA. Using computer modeling to find new LRRK2 inhibitors for Parkinson's disease. Sci Rep 2025; 15:4085. [PMID: 39900949 PMCID: PMC11790940 DOI: 10.1038/s41598-025-86926-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2024] [Accepted: 01/15/2025] [Indexed: 02/05/2025] Open
Abstract
Parkinson's disease (PD) is a complex neurodegenerative disorder that affects multiple neurotransmitters, and its exact cause is still unknown. Developing new drugs for PD is a lengthy and expensive process, making it difficult to find new treatments. This study aims to create a detailed dataset to build strong predictive models with various machine learning algorithms. An ensemble modeling approach was employed to screen the DrugBank database, aiming to repurpose approved medications as potential treatments for PD. The dataset was constructed using pIC50 values of various compounds targeting the inhibition of leucine-rich repeat kinase 2 (LRRK2). The best ensemble model showed exceptional predictive performance, with five-fold cross-validation and external validation metrics exceeding 0.8 (Q2cv = 0.864 and Q2ext = 0.873). The DrugBank screening resulted in three promising drugs (triamterene, phenazopyridine, and CRA_1801) with predicted pIC50 values greater than 7, warranting further investigation as novel PD treatments. Molecular docking and molecular dynamics simulations were performed to provide a comprehensive understanding of the interactions between LRRK2 and both the inhibitors in the dataset and the best molecules from the screening. Binding free energy calculations, along with hydrogen bond occupancy analysis and the RMSD of the ligand in the pocket, show CRA_1801 to be the best candidate for repurposing as an LRRK2 inhibitor.
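For readers less used to the pIC50 scale in this abstract: pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so the threshold "pIC50 greater than 7" corresponds to an IC50 below 100 nM. The helper names in this small sketch are hypothetical and are not taken from the paper.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Inverse conversion: pIC50 back to an IC50 in nanomolar."""
    return 10.0 ** (-pic50) * 1e9
```

Because the scale is logarithmic, each additional pIC50 unit means a tenfold lower IC50.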
Affiliation(s)
- María C García
  - Departamento de Ingeniería Química, Diego de Robles y Vía Interoceánica, Universidad San Francisco de Quito, 170901, Quito, Ecuador
- Sebastián A Cuesta
  - Departamento de Ingeniería Química, Diego de Robles y Vía Interoceánica, Universidad San Francisco de Quito, 170901, Quito, Ecuador
  - Department of Chemistry, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
- José R Mora
  - Departamento de Ingeniería Química, Diego de Robles y Vía Interoceánica, Universidad San Francisco de Quito, 170901, Quito, Ecuador
- Jose L Paz
  - Departamento Académico de Química Inorgánica, Facultad de Química e Ingeniería Química, Universidad Nacional Mayor de San Marcos, Lima, Perú
- Yovani Marrero-Ponce
  - Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, 1200-841, Quito, Ecuador
- Frank Alexis
  - Departamento de Ingeniería Química, Diego de Robles y Vía Interoceánica, Universidad San Francisco de Quito, 170901, Quito, Ecuador
- Edgar A Márquez
  - Grupo de Investigaciones en Química y Biología, Departamento de Química y Biología, Facultad de Ciencias Básicas, Universidad del Norte, Carrera 51B, Km 5, vía Puerto Colombia, Barranquilla, 081007, Colombia
3
Liu L, Zhang Q, Ma Y, Lin L, Liu W, Ding A, Wang C, Zhou S, Cai J, Tang H. Recent Developments in Drug Design of Oral Synthetic Free Fatty Acid Receptor 1 Agonists. Drug Des Devel Ther 2024; 18:5961-5983. [PMID: 39679134 PMCID: PMC11646431 DOI: 10.2147/dddt.s487469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Accepted: 11/12/2024] [Indexed: 12/17/2024] Open
Abstract
Over the past two decades, synthetic FFAR1 agonists such as TAK-875 and TSL1806 have undergone meticulous design and extensive clinical trials. However, due to issues primarily related to hepatotoxicity, no FFAR1 agonist has yet received regulatory approval. Research into the sources of hepatotoxicity suggests that one potential cause lies in the molecular structure itself. These structures typically feature lipid-like carboxylic acid head groups, which tend to generate toxic metabolites. Strategies to mitigate these risks focus on optimizing chemical groups to reduce lipophilicity and prevent the formation of reactive metabolites. Recent studies have concentrated on developing low-molecular-weight compounds that more closely resemble natural products, with CPL207280 showing promising potential and liver safety, currently in Phase II clinical trials. Moreover, ongoing research continues to explore the potential applications of FFAR1 agonists in diabetes management, as well as in conditions such as non-alcoholic fatty liver disease (NAFLD) and cerebrovascular diseases. Utilizing advanced technologies such as artificial intelligence and computer-aided design, the development of compact molecules that mimic natural structures represents a hopeful direction for future research and development.
Affiliation(s)
- Lei Liu
  - Tasly Academy, Tasly Pharma Co., Ltd., Tianjin, People’s Republic of China
  - Tasly Academy Jiangsu Branch, Jiangsu Tasly Diyi Pharmaceutical Co., Ltd., Huaian, Jiangsu, People’s Republic of China
- Qinghua Zhang
  - Tasly Academy, Tasly Pharma Co., Ltd., Tianjin, People’s Republic of China
  - Tasly Academy Jiangsu Branch, Jiangsu Tasly Diyi Pharmaceutical Co., Ltd., Huaian, Jiangsu, People’s Republic of China
- Yichuan Ma
  - China Medical University (CMU)-The Queen’s University of Belfast (QUB) Joint College, Shenyang, Liaoning, People’s Republic of China
- Ling Lin
  - Tasly Academy Jiangsu Branch, Jiangsu Tasly Diyi Pharmaceutical Co., Ltd., Huaian, Jiangsu, People’s Republic of China
- Wenli Liu
  - Tasly Academy Jiangsu Branch, Jiangsu Tasly Diyi Pharmaceutical Co., Ltd., Huaian, Jiangsu, People’s Republic of China
- Aizhong Ding
  - Tasly Academy Jiangsu Branch, Jiangsu Tasly Diyi Pharmaceutical Co., Ltd., Huaian, Jiangsu, People’s Republic of China
- Chunjian Wang
  - Tasly Academy Jiangsu Branch, Jiangsu Tasly Diyi Pharmaceutical Co., Ltd., Huaian, Jiangsu, People’s Republic of China
- Shuiping Zhou
  - Tasly Academy, Tasly Pharma Co., Ltd., Tianjin, People’s Republic of China
- Jinyong Cai
  - Tasly Academy, Tasly Pharma Co., Ltd., Tianjin, People’s Republic of China
- Hai Tang
  - Tasly Academy, Tasly Pharma Co., Ltd., Tianjin, People’s Republic of China
  - Tasly Academy Jiangsu Branch, Jiangsu Tasly Diyi Pharmaceutical Co., Ltd., Huaian, Jiangsu, People’s Republic of China
4
De La Torre S, Cuesta SA, Calle L, Mora JR, Paz JL, Espinoza-Montero PJ, Flores-Sumoza M, Márquez EA. Computational approaches for lead compound discovery in dipeptidyl peptidase-4 inhibition using machine learning and molecular dynamics techniques. Comput Biol Chem 2024; 112:108145. [PMID: 39002224 DOI: 10.1016/j.compbiolchem.2024.108145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 07/01/2024] [Accepted: 07/08/2024] [Indexed: 07/15/2024]
Abstract
Predicting possible lead compounds with DPP-4 inhibitory activity from already-known drugs implies an advantage in drug development, in terms of the time and cost of finding alternative medicines for the treatment of Type 2 Diabetes Mellitus (T2DM). The inhibition of dipeptidyl peptidase-4 (DPP-4) has been one of the most explored strategies to develop potential drugs against this condition. A diverse dataset of molecules with known experimental inhibitory activity against DPP-4 was constructed and used to develop predictive models using different machine-learning algorithms. Model M36 is the most promising one based on its internal and external performance, with Q2CV = 0.813 and Q2EXT = 0.803. The applicability domain evaluation and Tropsha's analysis were conducted to validate M36, indicating its robustness and accuracy in predicting pIC50 values for organic molecules within the established domain. The physicochemical properties of the ligands, including electronegativity, polarizability, and van der Waals volume, were relevant for predicting the inhibition process. The model was then employed in the virtual screening of potential DPP-4 inhibitors, finding 448 compounds from DrugBank and 9 from DiaNat with potential inhibitory activity. Molecular docking and molecular dynamics simulations were used to gain insight into the ligand-protein interaction. From the screening and the favorable molecular dynamics results, several compounds, including skimmin (pIC50 = 3.54, binding energy = -8.86 kcal/mol), bergenin (pIC50 = 2.69, binding energy = -13.90 kcal/mol), and DB07272 (pIC50 = 3.97, binding energy = -25.28 kcal/mol), seem to be promising hits to be tested and optimized for the treatment of T2DM. These results imply an important reduction in the cost and time of applying these drugs, because all the information about their metabolism is already available.
Affiliation(s)
- Sandra De La Torre
  - Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, Universidad San Francisco de Quito, Diego de Robles y Vía Interoceánica, Quito 170901, Ecuador
- Sebastián A Cuesta
  - Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, Universidad San Francisco de Quito, Diego de Robles y Vía Interoceánica, Quito 170901, Ecuador
  - Department of Chemistry, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
- Luis Calle
  - Facultad de Ciencias Médicas, Instituto de Investigación e Innovación en Salud Integral, Universidad Católica Santiago de Guayaquil, Guayaquil 09013493, Ecuador
- José R Mora
  - Grupo de Química Computacional y Teórica (QCT-USFQ), Departamento de Ingeniería Química, Universidad San Francisco de Quito, Diego de Robles y Vía Interoceánica, Quito 170901, Ecuador
- Jose L Paz
  - Departamento Académico de Química Inorgánica, Facultad de Química e Ingeniería Química, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Máryury Flores-Sumoza
  - Facultad de Ciencias Básicas y Biomédicas, Programa de Química y Farmacia, Universidad Simón Bolívar, carrera 59 N° 59-65, Barranquilla 080002, Colombia
- Edgar A Márquez
  - Grupo de Investigaciones en Química y Biología, Departamento de Química y Biología, Facultad de Ciencias Básicas, Universidad del Norte, Carrera 51B, Km 5, vía Puerto Colombia, Barranquilla 081007, Colombia
5
Bathula S, Sankaranarayanan M, Malgija B, Kaliappan I, Bhandare RR, Shaik AB. 2-Amino Thiazole Derivatives as Prospective Aurora Kinase Inhibitors against Breast Cancer: QSAR, ADMET Prediction, Molecular Docking, and Molecular Dynamic Simulation Studies. ACS Omega 2023; 8:44287-44311. [PMID: 38027360 PMCID: PMC10666282 DOI: 10.1021/acsomega.3c07003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 10/05/2023] [Accepted: 10/23/2023] [Indexed: 12/01/2023]
Abstract
Aurora kinase is a key enzyme implicated in tumor growth. Research has revealed that small molecules targeting aurora kinase have beneficial effects as anticancer agents. In the present study, to identify potential anti-breast cancer agents with aurora kinase inhibitory activity, we employed QSARINS software to perform a quantitative structure-activity relationship (QSAR) analysis. The statistical values resulting from the study include R2 = 0.8902, CCCtr = 0.7580, Q2LOO = 0.7875, Q2LMO = 0.7624, CCCcv = 0.7535, R2ext = 0.8735, and CCCext = 0.8783. Among the four generated models, the two best models encompass five important variables: PSA, EstateVSA5, MoRSEP3, MATSp5, and RDFC24. Parameters including atomic volume, atomic charges, and Sanderson's electronegativity played an important role in designing newer lead compounds. Based on these data, we designed six series of compounds: 1a-e, 2a-e, 3a-e, 4a-e, 5a-e, and 6a-e. All of these compounds were subjected to molecular docking studies using AutoDock v4.2.6 against the aurora kinase protein (1MQ4). Among these 30 compounds, the 2-amino thiazole derivatives 1a, 2a, 3e, 4d, 5d, and 6d have excellent binding interactions with the active site of 1MQ4. Compound 1a had the highest docking score (-9.67) and hence was additionally subjected to a 100 ns molecular dynamics simulation. The stable binding of compound 1a with 1MQ4 was verified by RMSD, RMSF, RoG, H-bond, molecular mechanics-generalized Born surface area (MM-GBSA), binding free energy calculations, and solvent-accessible surface area (SASA) analyses. Furthermore, the newly designed compound 1a exhibited excellent ADMET properties. Based on these findings, we propose that the designed compound 1a may be used as the best theoretical lead for future experimental research on selective inhibition of aurora kinase, thereby assisting in the creation of new anti-breast cancer drugs.
Affiliation(s)
- Sivakumar Bathula
  - Department of Pharmaceutical Chemistry, SRM College of Pharmacy, SRM Institute of Science and Technology, Kattankulathur 603203, Chengalpattu District, Tamil Nadu, India
- Murugesan Sankaranarayanan
  - Medicinal Chemistry Research Laboratory, Department of Pharmacy, Birla Institute of Technology & Science (BITS) Pilani, Pilani Campus, Pilani 333031, Rajasthan, India
- Beutline Malgija
  - MCC-MRF Innovation Park, Madras Christian College, Chennai 600059, Tamil Nadu, India
- Ilango Kaliappan
  - Department of Pharmaceutical Chemistry, SRM College of Pharmacy, SRM Institute of Science and Technology, Kattankulathur 603203, Chengalpattu District, Tamil Nadu, India
- Richie R. Bhandare
  - Department of Pharmaceutical Sciences, College of Pharmacy and Health Sciences, Ajman University, P.O. Box 346, Ajman 61001, United Arab Emirates
  - Centre of Medical and Bio-allied Health Sciences Research, Ajman University, P.O. Box 346, Ajman 61001, United Arab Emirates
- Afzal B. Shaik
  - St. Mary’s College of Pharmacy, St. Mary’s Group of Institutions Guntur, Affiliated to Jawaharlal Nehru Technological University Kakinada, Chebrolu, Guntur 522212, Andhra Pradesh, India
  - Center for Global Health Research, Saveetha Medical College, Saveetha Institute of Medical and Technical Sciences, Chennai 602105, Tamil Nadu, India
6
Galantamine Based Novel Acetylcholinesterase Enzyme Inhibitors: A Molecular Modeling Design Approach. Molecules 2023; 28:1035. [PMID: 36770702 PMCID: PMC9919016 DOI: 10.3390/molecules28031035] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 10/01/2022] [Revised: 12/31/2022] [Accepted: 01/09/2023] [Indexed: 01/22/2023] Open
Abstract
Acetylcholinesterase (AChE) enzymes play an essential role in the development of Alzheimer's disease (AD). Their excessive activity causes several neuronal problems, particularly psychopathies and neuronal cell death. A bioactive pose at the B site of the human acetylcholinesterase (hAChE) enzyme, obtained from the Protein Data Bank (PDB ID 4EY6), was employed in this investigation and allowed for the prediction of the binding affinity and binding free energy between the protein and the ligand. Virtual screening was performed to obtain structures similar to galantamine (GNT) with potential hAChE activity. The top 200 hit compounds were prioritized through the use of filters in ZincPharmer, with special features related to the pharmacophore. Critical analyses were carried out, such as hierarchical clustering analysis (HCA), ADME/Tox predictions, molecular docking, molecular simulation studies, synthetic accessibility (SA), lipophilicity, water solubility, and hot spots, to confirm the stable binding of the two promising molecules (ZINC16951574-LMQC2 and ZINC08342556-LMQC5). The metabolism prediction indicated that metabolite M3-2, formed by a glutathionation reaction (Phase II), and metabolites M1-2 and M2-2, formed by S-oxidation and aliphatic hydroxylation reactions (Phase I), were reactive but with no side effects. Theoretical synthetic routes and a prediction of synthetic accessibility for the most promising compounds are also proposed. In conclusion, this study shows that in silico modeling can be used to create new drug candidate inhibitors for hAChE. The compounds ZINC16951574-LMQC2 and ZINC08342556-LMQC5 are particularly promising for oral administration because they have a favorable drug-likeness profile, excellent lipid solubility, high bioavailability, and adequate pharmacokinetics.
7
Kharisma VD, Utami SL, Rizky WC, Dings TGA, Ullah ME, Jakhmola V, Nugraha AP. Molecular docking study of Zingiber officinale Roscoe compounds as a mumps virus nucleoprotein inhibitor. Dental Journal (Majalah Kedokteran Gigi) 2023; 56:23-29. [DOI: 10.20473/j.djmkg.v56.i1.p23-29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2025]
Abstract
Background: Mumps virus (MuV) can trigger severe infections, such as parotitis, epididymo-orchitis, and meningitis. The effectiveness of MuV vaccine administration has been proven, but current outbreaks warrant the development of antivirals against MuV. Zingiber officinale var. Roscoe or ginger is often used as an alternative remedy. Currently, there are no known in vitro or in vivo studies that investigate ginger as an MuV antiviral. Purpose: This study aims to evaluate the antiviral potency of the bioactive compounds in Zingiber officinale var. Roscoe against MuV. Methods: Antiviral activity screening was conducted by druglikeness analysis, antiviral probability, molecular docking, and molecular dynamic simulation. Results: As an antiviral, 6-shogaol from Zingiber officinale var. Roscoe has potency against MuV. It has a good binding affinity and can establish interactions with the binding domain of the target protein by forming hydrogen, Van der Waals, and alkyl bonds. Conclusion: The complex of 6-shogaol_NP was predicted to be volatile but stable for triggering inhibitory activity. However, these results must be proved by in vivo and in vitro approaches to strengthen the scientific evidence.
8
Searching glycolate oxidase inhibitors based on QSAR, molecular docking, and molecular dynamic simulation approaches. Sci Rep 2022; 12:19969. [PMID: 36402831 PMCID: PMC9675741 DOI: 10.1038/s41598-022-24196-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/11/2022] [Accepted: 11/11/2022] [Indexed: 11/21/2022] Open
Abstract
Primary hyperoxaluria type 1 (PHT1) treatment is mainly focused on inhibiting the enzyme glycolate oxidase, which plays a pivotal role in the production of glyoxylate, which in turn undergoes oxidation to produce oxalate. When the renal secretion capacity is exceeded, calcium oxalate forms stones that accumulate in the kidneys. In this respect, detailed QSAR analysis, molecular docking, and dynamics simulations of a series of inhibitors containing glycolic, glyoxylic, and salicylic acid groups have been performed employing different regression machine learning techniques. Three robust models with fewer than 9 descriptors were found, based on tenfold cross-validation (Q2 CV) and external validation (Q2 EXT): MLR1 (Q2 CV = 0.893, Q2 EXT = 0.897), RF1 (Q2 CV = 0.889, Q2 EXT = 0.907), and IBK1 (Q2 CV = 0.891, Q2 EXT = 0.907). An ensemble model was built by averaging the predicted pIC50 of the three models, obtaining Q2 EXT = 0.933. Physicochemical properties such as charge, electronegativity, hardness, softness, van der Waals volume, and polarizability were considered as attributes to build the models. To gain more insight into the potential biological activity of the compounds studied herein, docking and dynamics analyses were carried out, finding that hydrophobic and polar residues show important interactions with the ligands. A screening of the DrugBank database V.5.1.7 was performed, leading to the proposal of seven commercial drugs within the applicability domain of the models that can be suggested as possible PHT1 treatments.
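The ensemble step described in this abstract, averaging the pIC50 predicted by three models, can be sketched as follows. The prediction values are invented for illustration, and the external Q2 is computed with one common definition (PRESS on the external set divided by the external set's squared deviations from the training-set mean); the abstract does not say which variant the authors used, so treat that choice as an assumption.

```python
from typing import List, Sequence


def ensemble_average(model_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-compound predictions of several models."""
    n = len(model_predictions[0])
    if any(len(p) != n for p in model_predictions):
        raise ValueError("all models must predict the same compounds")
    return [sum(values) / len(values) for values in zip(*model_predictions)]


def q2_ext(y_obs: Sequence[float], y_pred: Sequence[float], train_mean: float) -> float:
    """External Q2 = 1 - PRESS / sum((y_obs - training mean)^2)."""
    if len(y_obs) != len(y_pred):
        raise ValueError("observed and predicted lengths differ")
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_ext = sum((o - train_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss_ext


# Hypothetical external-set pIC50 values and predictions from three models.
Y_EXT = [5.0, 6.0, 7.0]
PRED_MLR = [5.3, 5.9, 6.6]
PRED_RF = [4.8, 6.2, 7.1]
PRED_IBK = [5.2, 5.7, 7.3]
TRAIN_MEAN = 5.8  # hypothetical mean pIC50 of the training set

PRED_ENSEMBLE = ensemble_average([PRED_MLR, PRED_RF, PRED_IBK])
Q2_ENSEMBLE = q2_ext(Y_EXT, PRED_ENSEMBLE, TRAIN_MEAN)
```

Averaging tends to help when the individual models make partly uncorrelated errors, which is one plausible reading of why the reported ensemble Q2 EXT exceeds that of each single model.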
9
Prada Gori DN, Llanos MA, Bellera CL, Talevi A, Alberca LN. iRaPCA and SOMoC: Development and Validation of Web Applications for New Approaches for the Clustering of Small Molecules. J Chem Inf Model 2022; 62:2987-2998. [PMID: 35687523 DOI: 10.1021/acs.jcim.2c00265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
The clustering of small molecules implies the organization of a group of chemical structures into smaller subgroups with similar features. Clustering has important applications in sampling chemical datasets or libraries in a representative manner (e.g., to choose, from a virtual screening hit list, a chemically diverse subset of compounds to be submitted to experimental confirmation, or to split datasets into representative training and validation sets when implementing machine learning models). Most strategies for clustering molecules are based on molecular fingerprints and hierarchical clustering algorithms. Here, two open-source in-house methodologies for clustering of small molecules are presented: iterative Random subspace Principal Component Analysis clustering (iRaPCA), an iterative approach based on feature bagging, dimensionality reduction, and K-means optimization; and Silhouette Optimized Molecular Clustering (SOMoC), which combines molecular fingerprints with the Uniform Manifold Approximation and Projection (UMAP) and Gaussian Mixture Model (GMM) algorithms. In a benchmarking exercise, the performance of both clustering methods was examined across 29 datasets containing between 100 and 5000 small molecules, and the results were compared with those given by two other well-known clustering methods, Ward and Butina. iRaPCA and SOMoC consistently showed the best performance across these 29 datasets, in terms of both within-cluster and between-cluster distances. Both iRaPCA and SOMoC have been implemented as free Web Apps and standalone applications, to allow their use by a wide audience within the scientific community.
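The silhouette criterion that gives SOMoC its name can be illustrated on its own. The sketch below is not the published iRaPCA or SOMoC code and omits fingerprints, UMAP, and GMM entirely: it only computes the mean silhouette coefficient of a given cluster assignment over plain numeric points with Euclidean distance, which is the kind of score one would maximise when comparing candidate clusterings. All names and the toy points are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical 2D embedding of four molecules and two candidate labelings.
POINTS = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
LABELS_GOOD = [0, 0, 1, 1]
LABELS_BAD = [0, 1, 0, 1]
```

Scores close to 1 indicate compact, well-separated clusters, while negative scores suggest that points sit closer to another cluster than to their own.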
Affiliation(s)
- Denis N Prada Gori
  - Laboratory of Bioactive Compounds Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), La Plata B1900ADU, Argentina
- Manuel A Llanos
  - Laboratory of Bioactive Compounds Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), La Plata B1900ADU, Argentina
- Carolina L Bellera
  - Laboratory of Bioactive Compounds Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), La Plata B1900ADU, Argentina
- Alan Talevi
  - Laboratory of Bioactive Compounds Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), La Plata B1900ADU, Argentina
- Lucas N Alberca
  - Laboratory of Bioactive Compounds Research and Development (LIDeB), Department of Biological Sciences, Faculty of Exact Sciences, National University of La Plata (UNLP), La Plata B1900ADU, Argentina