1
Atasever S. Enhancing HCV NS3 Inhibitor Classification with Optimized Molecular Fingerprints Using Random Forest. Int J Mol Sci 2025; 26:2680. [PMID: 40141322 PMCID: PMC11943357 DOI: 10.3390/ijms26062680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2025] [Revised: 03/09/2025] [Accepted: 03/11/2025] [Indexed: 03/28/2025]
Abstract
The classification of Hepatitis C virus (HCV) NS3 inhibitors is essential for identifying potential antiviral agents through computational methods. This study aims to develop an optimized machine learning (ML) model using random forest (RF) and molecular fingerprints to accurately classify HCV NS3 inhibitors. A dataset of 965 molecules was retrieved from the ChEMBL database, and 290 bioactive compounds were selected for model training. Twelve molecular fingerprint descriptors were tested, and the CDK graph-only fingerprint yielded the best performance. In addition to RF, performance comparisons of other classifiers such as instance-based k-nearest neighbor (IBk), logistic regression (LR), AdaBoost, and OneR were conducted using WEKA with various molecular fingerprint descriptors. The optimized RF model achieved an accuracy of 89.6552%, a mean absolute error (MAE) of 0.2114, a root mean square error (RMSE) of 0.3304, and a Matthews correlation coefficient (MCC) of 0.7950 on the test set. These results highlight the effectiveness of optimized molecular fingerprints in enhancing virtual screening (VS) for HCV inhibitors. This approach offers a data-driven method for drug discovery.
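The metrics quoted in this abstract can be recomputed from a binary confusion matrix and from predicted class probabilities. The sketch below is illustrative only. The confusion-matrix counts are hypothetical and are not taken from the paper; they were chosen so that 26 of 29 predictions are correct, which gives 89.6552%. Whether the actual test set held 29 molecules is an assumption. The MAE and RMSE helpers use a simplified per-sample form on the probability of the positive class and may not match the averaging WEKA applies.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions in a binary confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def mae(y_true, y_prob):
    """Mean absolute error between 0/1 labels and predicted probabilities."""
    return sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true)

def rmse(y_true, y_prob):
    """Root mean square error between 0/1 labels and predicted probabilities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))

# Hypothetical counts (NOT from the paper): 29 test molecules, 3 misclassified.
tp, tn, fp, fn = 13, 13, 1, 2
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4%}")  # 89.6552%
print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")       # 0.7952
```

Accuracy alone hides how the errors split between the two classes, which is why MCC is usually reported alongside it.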
Affiliation(s)
- Sema Atasever
- Department of Computer Engineering, Faculty of Engineering and Architecture, Nevsehir Haci Bektas Veli University, 50300 Nevşehir, Turkey
2
Nabi T, Riyed TH, Ornob A. Deep learning based predictive modeling to screen natural compounds against TNF-alpha for the potential management of rheumatoid arthritis: Virtual screening to comprehensive in silico investigation. PLoS One 2024; 19:e0303954. [PMID: 39636801 PMCID: PMC11620472 DOI: 10.1371/journal.pone.0303954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Accepted: 10/02/2024] [Indexed: 12/07/2024]
Abstract
Rheumatoid arthritis (RA) affects an estimated 0.1% to 2.0% of the world's population, leading to a substantial impact on global health. The adverse effects and toxicity associated with conventional RA treatment pathways underscore the critical need to seek potential new therapeutic candidates, particularly those of natural sources that can treat the condition with minimal side effects. To address this challenge, this study employed a deep-learning (DL) based approach to conduct a virtual assessment of natural compounds against the Tumor Necrosis Factor-alpha (TNF-α) protein. TNF-α stands out as the primary pro-inflammatory cytokine, crucial in the development of RA. Our predictive model demonstrated appreciable performance, achieving MSE of 0.6, MAPE of 10%, and MAE of 0.5. The model was then deployed to screen a comprehensive set of 2563 natural compounds obtained from the Selleckchem database. Utilizing their predicted bioactivity (pIC50), the top 128 compounds were identified. Among them, 68 compounds were taken for further analysis based on drug-likeness analysis. Subsequently, selected compounds underwent additional evaluation using molecular docking (< −8.7 kcal/mol) and ADMET resulting in four compounds posing nominal toxicity, which were finally subjected to MD simulation for 200 ns. Later on, the stability of complexes was assessed via analysis encompassing RMSD, RMSF, Rg, H-Bonds, SASA, and Essential Dynamics. Ultimately, based on the total binding free energy estimated using the MM/GBSA method, Imperialine, Veratramine, and Gelsemine are proven to be potential natural inhibitors of TNF-α.
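The screening step described in this abstract ranks compounds by predicted pIC50 and reports regression errors (MSE, MAE, MAPE). A minimal sketch of those calculations follows. It uses the standard definition pIC50 = −log10(IC50 in mol/L). The function names and the toy values in the usage comment are hypothetical, not taken from the study.

```python
import math

def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 (1 nM = 1e-9 mol/L)."""
    return 9.0 - math.log10(ic50_nm)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def top_k(predictions, k):
    """Return the k (name, predicted pIC50) pairs with the highest prediction."""
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical usage with made-up compound names and predictions:
# top_k({"cmpd_a": 5.0, "cmpd_b": 7.0, "cmpd_c": 6.0}, 2)
```

Because pIC50 is a logarithmic scale, an MAE of 0.5 corresponds to roughly a threefold error in IC50.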
Affiliation(s)
- Tasnia Nabi
- Department of Biomedical Engineering, Military Institute of Science and Technology (MIST), Dhaka, Bangladesh
- Tanver Hasan Riyed
- Department of Biomedical Engineering, Military Institute of Science and Technology (MIST), Dhaka, Bangladesh
- Akid Ornob
- Department of Biomedical Engineering, Military Institute of Science and Technology (MIST), Dhaka, Bangladesh
3
Xuan Y, Zhou Y, Yue Y, Zhang N, Sun G, Fan T, Zhao L, Zhong R. Identification of potential natural product derivatives as CK2 inhibitors based on GA-MLR QSAR modeling, synthesis and biological evaluation. Med Chem Res 2024; 33:1611-1624. [DOI: 10.1007/s00044-024-03271-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/01/2024] [Accepted: 06/26/2024] [Indexed: 01/12/2025]
4
Setiya A, Jani V, Sonavane U, Joshi R. MolToxPred: small molecule toxicity prediction using machine learning approach. RSC Adv 2024; 14:4201-4220. [PMID: 38292268 PMCID: PMC10826801 DOI: 10.1039/d3ra07322j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 01/23/2024] [Indexed: 02/01/2024]
Abstract
Different types of chemicals and products may exhibit various health risks when administered into the human body. For toxicity reasons, the number of new drugs entering the market through the conventional drug development process has been reduced over the years. However, with the advent of big data and artificial intelligence, machine learning techniques have emerged as a potential solution for predicting toxicity and ensuring efficient drug development and chemical safety. An ML model for toxicity prediction can reduce experimental costs and time while addressing ethical concerns by drastically reducing the need for animals and clinical trials. Herein, MolToxPred, an ML-based tool, has been developed using a stacked model approach to predict the potential toxicity of small molecules and metabolites. The stacked model consists of random forest, multi-layer perceptron, and LightGBM as base classifiers and Logistic Regression as the meta classifier. For training and validation purposes, a comprehensive set of toxic and non-toxic molecules is curated. Different structural and physicochemical-based features in the form of molecular descriptors and fingerprints were employed. MolToxPred utilizes a comprehensive feature selection process and optimizes its hyperparameters through Bayesian optimization with stratified 5-fold cross-validation. In the evaluation phase, MolToxPred achieved an AUROC of 87.76% on the test set and 88.84% on an external validation set. The McNemar test was used as the post-hoc test to determine if the stacked models' performance was significantly different compared to the base learners. The developed stacked model outperformed its base classifiers and an existing tool in the literature, reaffirming its better performance. The hypothesis is that the incorporation of a diverse set of data, the subsequent feature selection, and a stacked ensemble approach give MolToxPred the edge over other methods. 
In addition to this, an attempt has been made to identify structural alerts responsible for endpoints of the Tox21 data to determine the association of a molecule with a plausible downstream pathway of action. MolToxPred may be helpful for drug discovery and regulatory pipelines in pharmaceutical and other industries for in silico toxicity prediction of small molecule candidates.
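The abstract mentions the McNemar test for comparing the stacked model with its base learners. A minimal sketch of the exact (binomial) form of that test is given below. It uses only the discordant counts, that is, the samples on which exactly one of the two classifiers is correct. Which variant of the test the authors applied (exact or chi-squared, with or without continuity correction) is not stated in the abstract, so the exact form here is an assumption.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: samples classifier A got right and classifier B got wrong
    c: samples classifier A got wrong and classifier B got right
    Under the null hypothesis each discordant sample is equally likely
    to fall in b or c, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant samples: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical counts: 10 discordant samples, all favouring one classifier.
print(mcnemar_exact(0, 10))  # 0.001953125
```

Samples that both classifiers get right, or both get wrong, do not enter the statistic, so two models with similar accuracy can still differ significantly.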
Affiliation(s)
- Anjali Setiya
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
- Vinod Jani
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
- Uddhavesh Sonavane
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
- Rajendra Joshi
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
5
Devillers J, Sartor V, Devillers H. Predicting mosquito repellents for clothing application from molecular fingerprint-based artificial neural network SAR models. SAR QSAR Environ Res 2022; 33:729-751. [PMID: 36106833 DOI: 10.1080/1062936x.2022.2124014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Accepted: 09/06/2022] [Indexed: 06/15/2023]
Abstract
Spraying repellents on clothing limits toxicity and allergy problems that can occur when the repellents are directly applied to skin. This also allows the use of higher doses to ensure longer lasting effects. As the number of repellents available on the market is limited, it is necessary to propose new ones, especially by using in silico methods that reduce costs and time. In this context SAR models were built from a dataset of 2027 chemicals for which repellent activity on clothing was measured against Aedes aegypti. The interest of using either the ECFP or MACCS fingerprints as input neurons of a three-layer perceptron was evaluated. Transformation of MACCS bit strings into disjunctive tables led to interesting results. Models obtained with both types of fingerprints were compared to a model including physicochemical and topological descriptors.
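The abstract does not spell out how MACCS bit strings were turned into disjunctive tables. One plausible reading, assumed here, is complete disjunctive coding: every binary key is expanded into two indicator columns, one for "absent" and one for "present", so that each key contributes exactly one active input neuron. The sketch below shows that expansion on a toy bit string. The function name is hypothetical and the example is not a real MACCS fingerprint.

```python
def to_disjunctive(bits):
    """Expand a '0'/'1' string into (absent, present) indicator pairs.

    '0' becomes [1, 0] and '1' becomes [0, 1], so the output has
    twice as many columns as the input has bits.
    """
    row = []
    for b in bits:
        if b not in ("0", "1"):
            raise ValueError(f"unexpected character in bit string: {b!r}")
        row.extend([1, 0] if b == "0" else [0, 1])
    return row

# Toy 3-bit example (not a real MACCS key set):
print(to_disjunctive("101"))  # [0, 1, 1, 0, 0, 1]
```

Under this coding the absence of a substructure is an explicit input to the network rather than a zero that contributes nothing to the weighted sum.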
Affiliation(s)
- V Sartor
- Laboratoire des IMRCP, Université de Toulouse, CNRS UMR 5623, Université Toulouse III - Paul Sabatier, Toulouse, France
- H Devillers
- SPO, Univ Montpellier, INRAE, Institut Agro, Montpellier, France
6
Kwapien K, Nittinger E, He J, Margreitter C, Voronov A, Tyrchan C. Implications of Additivity and Nonadditivity for Machine Learning and Deep Learning Models in Drug Design. ACS Omega 2022; 7:26573-26581. [PMID: 35936431 PMCID: PMC9352238 DOI: 10.1021/acsomega.2c02738] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/03/2022] [Accepted: 07/08/2022] [Indexed: 05/20/2023]
Abstract
Matched molecular pairs (MMPs) are nowadays a commonly applied concept in drug design. They are used in many computational tools for structure-activity relationship analysis, biological activity prediction, or optimization of physicochemical properties. However, until now it has not been shown in a rigorous way that MMPs, that is, changing only one substituent between two molecules, can be predicted with higher accuracy and precision in contrast to any other chemical compound pair. It is expected that any model should be able to predict such a defined change with high accuracy and reasonable precision. In this study, we examine the predictability of four classical properties relevant for drug design ranging from simple physicochemical parameters (log D and solubility) to more complex cell-based ones (permeability and clearance), using different data sets and machine learning algorithms. Our study confirms that additive data are the easiest to predict, which highlights the importance of recognition of nonadditivity events and the challenging complexity of predicting properties in case of scaffold hopping. Despite deep learning being well suited to model nonlinear events, these methods do not seem to be an exception of this observation. Though they are in general performing better than classical machine learning methods, this leaves the field with a still standing challenge.
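Additivity is commonly checked on a cycle of four compounds that combine two substituent changes: if the changes act independently, the effect of the first change is the same whether or not the second is present. The sketch below computes the deviation from that expectation. It is one common formulation and is not necessarily the exact procedure used in the study. The property values in the usage lines are made up.

```python
def nonadditivity(p_base, p_change1, p_change2, p_both):
    """Deviation from additivity in a double-transformation cycle.

    p_base:    property of the compound with neither change
    p_change1: property with only the first substituent change
    p_change2: property with only the second substituent change
    p_both:    property with both changes
    Returns 0 when the two changes contribute independently.
    """
    effect_alone = p_change1 - p_base        # first change, second absent
    effect_with_other = p_both - p_change2   # first change, second present
    return effect_with_other - effect_alone

# Hypothetical log D values:
print(nonadditivity(1.0, 1.5, 2.0, 2.5))  # 0.0 (additive)
print(nonadditivity(1.0, 1.5, 2.0, 3.5))  # 1.0 (nonadditive)
```

A value whose magnitude exceeds the experimental uncertainty of the four measurements suggests the substituents interact.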
Affiliation(s)
- Karolina Kwapien
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 83, Sweden
- Eva Nittinger
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 83, Sweden
- Jiazhen He
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 431 83, Sweden
- Alexey Voronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg 431 83, Sweden
- Christian Tyrchan
- Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 83, Sweden
7
Bi M, Guan Z, Fan T, Zhang N, Wang J, Sun G, Zhao L, Zhong R. Identification of Pharmacophoric Fragments of DYRK1A Inhibitors Using Machine Learning Classification Models. Molecules 2022; 27:1753. [PMID: 35335117 PMCID: PMC8954712 DOI: 10.3390/molecules27061753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 03/04/2022] [Accepted: 03/05/2022] [Indexed: 11/17/2022]
Abstract
Dual-specific tyrosine phosphorylation regulated kinase 1 (DYRK1A) has been regarded as a potential therapeutic target of neurodegenerative diseases, and considerable progress has been made in the discovery of DYRK1A inhibitors. Identification of pharmacophoric fragments provides valuable information for structure- and fragment-based design of potent and selective DYRK1A inhibitors. In this study, seven machine learning methods along with five molecular fingerprints were employed to develop qualitative classification models of DYRK1A inhibitors, which were evaluated by cross-validation, test set, and external validation set with four performance indicators of predictive classification accuracy (CA), the area under receiver operating characteristic (AUC), Matthews correlation coefficient (MCC), and balanced accuracy (BA). The PubChem fingerprint-support vector machine model (CA = 0.909, AUC = 0.933, MCC = 0.717, BA = 0.855) and PubChem fingerprint along with the artificial neural model (CA = 0.862, AUC = 0.911, MCC = 0.705, BA = 0.870) were considered as the optimal modes for training set and test set, respectively. A hybrid data balancing method SMOTETL, a combination of synthetic minority over-sampling technique (SMOTE) and Tomek link (TL) algorithms, was applied to explore the impact of balanced learning on the performance of models. Based on the frequency analysis and information gain, pharmacophoric fragments related to DYRK1A inhibition were also identified. All the results will provide theoretical supports and clues for the screening and design of novel DYRK1A inhibitors.
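The SMOTETL balancing method named in this abstract combines two ideas: SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its minority neighbours, and Tomek-link cleaning removes pairs of opposite-class samples that are each other's nearest neighbour. The sketch below illustrates both building blocks on toy 2-D points. It is a simplified illustration under stated assumptions (Euclidean distance, a single pre-chosen neighbour for SMOTE, hypothetical function names) and is not the implementation used in the study.

```python
import math
import random

def nearest(i, points):
    """Index of the point closest to points[i], excluding itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))

def tomek_links(points, labels):
    """Pairs (i, j), i < j, of mutual nearest neighbours with different labels."""
    links = []
    for i in range(len(points)):
        j = nearest(i, points)
        if i < j and labels[i] != labels[j] and nearest(j, points) == i:
            links.append((i, j))
    return links

def smote_sample(x, neighbor, rng):
    """One synthetic point on the segment between x and a minority neighbour."""
    u = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + u * (b - a) for a, b in zip(x, neighbor))

# Toy data: points 0 and 1 are close together but belong to different classes.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 1, 1, 1]
print(tomek_links(points, labels))  # [(0, 1)]
print(smote_sample((0.0, 0.0), (2.0, 4.0), random.Random(0)))
```

In full SMOTE the neighbour is drawn at random from the k nearest minority samples. Maintained implementations exist, for example the `SMOTETomek` resampler in imbalanced-learn, and would normally be preferred over hand-written code.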
Affiliation(s)
- Mengzhou Bi
- Key Laboratory of Environmental and Viral Oncology, College of Life Science and Chemistry, Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, China
- Zhen Guan
- Beijing Municipal Key Laboratory of Child Development and Nutriomics, Translational Medicine Laboratory, Capital Institute of Pediatrics, Beijing 100020, China
- Tengjiao Fan
- Key Laboratory of Environmental and Viral Oncology, College of Life Science and Chemistry, Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, China
- Department of Medical Technology, Beijing Pharmaceutical University of Staff and Workers, Beijing 100079, China
- Na Zhang
- Key Laboratory of Environmental and Viral Oncology, College of Life Science and Chemistry, Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, China
- Jianhua Wang
- Beijing Municipal Key Laboratory of Child Development and Nutriomics, Translational Medicine Laboratory, Capital Institute of Pediatrics, Beijing 100020, China
- Guohui Sun
- Key Laboratory of Environmental and Viral Oncology, College of Life Science and Chemistry, Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, China
- Lijiao Zhao
- Key Laboratory of Environmental and Viral Oncology, College of Life Science and Chemistry, Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, China
- Rugang Zhong
- Key Laboratory of Environmental and Viral Oncology, College of Life Science and Chemistry, Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, China