1
Banerjee A, Roy K. The multiclass ARKA framework for developing improved q-RASAR models for environmental toxicity endpoints. Environmental Science. Processes & Impacts 2025; 27:1229-1243. [PMID: 40227888 DOI: 10.1039/d5em00068h] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2025]
Abstract
Quick, accurate, and efficient methods for filling the gaps in the toxicity data of commercial chemicals are urgently needed. Thus, it has become essential to develop simple and improved modeling strategies that aim to generate more accurate predictions. Recently, quantitative Read-Across Structure-Activity Relationship (q-RASAR) modeling has been reported to enhance the external predictivity of QSAR models. However, the cross-validation metrics of some q-RASAR models show compromised values compared to those of the corresponding QSAR models. We report here an improved q-RASAR workflow coupled with the Arithmetic Residuals in K-groups Analysis (ARKA) framework. This improved workflow (ARKA-RASAR) considers two important aspects: the contribution of different QSAR descriptors to different experimental response ranges, and the identification of similarity among close congeners based on both the selected QSAR descriptors and the contribution of different QSAR descriptors to different experimental response ranges. A simple, free, and user-friendly Java-based tool, Multiclass ARKA-v1.0, has been developed to compute the multiclass ARKA descriptors. In this study, five different toxicity datasets previously used for the development of QSAR and q-RASAR models were considered. We developed hybrid ARKA models that consist of a combination of QSAR descriptors and ARKA descriptors. These hybrid feature spaces were used to compute RASAR descriptors and develop ARKA-RASAR models. For a fair comparison, we applied the same modeling strategies that were used to develop the previously reported QSAR and q-RASAR models. Additionally, these modeling algorithms are straightforward, reproducible, and transferable. A multi-criteria decision-making statistical approach, the Sum of Ranking Differences (SRD), indicated that the ARKA-RASAR models are the best-performing models, considering training, test, and cross-validation statistics.
The least significant difference procedure ensured that the SRD values were significantly different for most models, presenting an unbiased workflow. True external validation using a set of pesticide metabolites and predicting their early-stage acute fish toxicity using relevant ARKA-RASAR models was also carried out and yielded encouraging results. The promising results and the ease of computation of ARKA and RASAR descriptors using our tools suggest that the ARKA-RASAR modeling framework may be a potential choice for developing highly robust and predictive models for filling the gaps in environmental toxicity data.
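The Sum of Ranking Differences idea mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' tool: the function names `average_ranks` and `srd_scores` are hypothetical, the row-wise mean is assumed as the reference ranking (other references, such as the row maximum, are also used in practice), and the toy numbers are not taken from the paper. The least significant difference step is not shown.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srd_scores(table, reference=None):
    """Compute one SRD value per model (lower = closer to the reference).

    table: dict mapping a model name to its list of metric values
           (same metrics, same order, for every model).
    reference: optional list of reference values, one per metric;
               ASSUMPTION: defaults to the row-wise mean across models.
    """
    names = list(table)
    n_rows = len(table[names[0]])
    if reference is None:
        reference = [sum(table[m][r] for m in names) / len(names)
                     for r in range(n_rows)]
    ref_ranks = average_ranks(reference)
    return {
        m: sum(abs(a - b) for a, b in zip(average_ranks(table[m]), ref_ranks))
        for m in names
    }


# Toy, made-up values for three hypothetical models and three metrics.
toy = {"A": [1, 2, 3], "B": [3, 2, 1], "C": [1, 3, 2]}
print(srd_scores(toy))
```

A model whose ranking of the metrics matches the reference ranking gets an SRD of zero; larger values indicate a ranking that deviates more from the consensus.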
Affiliation(s)
- Arkaprava Banerjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.
2
Ghosh S, Pandey SK, Roy K. Predictive classification-based read-across for diverse functional vitiligo-linked chemical exposomes (ViCE): A new approach for the assessment of chemical safety for the vitiligo disease in humans. Toxicol In Vitro 2025; 104:106018. [PMID: 39922550 DOI: 10.1016/j.tiv.2025.106018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 01/27/2025] [Accepted: 02/04/2025] [Indexed: 02/10/2025]
Abstract
We have explored a new approach that uses a hypothesis derived from similarity measure-based read-across to address the precise risk assessment of vitiligo-active chemicals. In this analysis, we first prepared a data set by combining vitiligo-active compounds taken from the previous literature with non-vitiligo chemicals, namely non-skin sensitizers reported in a separate source. We then curated the data set manually. The optimum similarity measure was identified from a validation set using a pool of 47 descriptors obtained from the analysis of the most discriminating features. The identified optimum similarity measure (i.e., Euclidean distance-based similarity along with seven close source compounds) was then used in read-across-derived, similarity-based classification of target compounds from their close source congeners. We also identified the features contributing positively and negatively to vitiligo potential, and the target chemicals were estimated with better accuracy. The applicability domain status of the reported compounds was also studied, and the outliers were identified. To the best of our knowledge, there are no comparable studies, so this is the first report on the in silico identification of potential vitiligo-linked chemical exposomes (ViCE) based on the similarity measure of read-across.
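The setting named in this abstract (Euclidean distance-based similarity with seven close source compounds) can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, the conversion of distance to similarity as 1/(1 + d) is an assumption, and the similarity-weighted vote with a 0.5 cutoff is one simple way to turn neighbor labels into a class call. Descriptors are assumed to be scaled beforehand.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def read_across_classify(target, sources, k=7):
    """Classify a target from its k most similar source compounds.

    target:  descriptor vector of the query compound (assumed pre-scaled).
    sources: list of (descriptor_vector, label) pairs, label 1 = active,
             0 = inactive.
    Returns (predicted_label, weighted_fraction_of_active_neighbors).
    """
    if not sources:
        raise ValueError("at least one source compound is required")
    # ASSUMPTION: similarity = 1 / (1 + distance), so identical compounds
    # score 1 and similarity decays toward 0 with distance.
    scored = sorted(
        ((1.0 / (1.0 + euclidean(target, vec)), label) for vec, label in sources),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    positive = sum(sim for sim, label in scored if label == 1)
    score = positive / total
    return (1 if score >= 0.5 else 0), score


# Toy source set with two well-separated groups (made-up values).
toy_sources = [([0, 0], 0), ([0, 1], 0), ([5, 5], 1), ([5, 6], 1), ([6, 5], 1)]
print(read_across_classify([5, 5.5], toy_sources, k=3))
```

Weighting the vote by similarity lets the closest source compounds dominate the call, which is the intuition behind read-across from close congeners.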
Affiliation(s)
- Shilpayan Ghosh
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Sapna Kumari Pandey
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
3
Hossain MM, Roy K. The development of classification-based machine-learning models for the toxicity assessment of chemicals associated with plastic packaging. Journal of Hazardous Materials 2025; 484:136702. [PMID: 39637787 DOI: 10.1016/j.jhazmat.2024.136702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2024] [Revised: 11/24/2024] [Accepted: 11/26/2024] [Indexed: 12/07/2024]
Abstract
Assessing chemical toxicity in materials like plastic packaging is critical to safeguarding public health. This study presents the development of classification-based machine learning models to predict the toxicity of chemicals associated with plastic packaging. Using an extensive dataset of chemical structures, we trained multiple machine learning models (Random Forest, Support Vector Machine, Linear Discriminant Analysis, and Logistic Regression) targeting endpoints such as Neurotoxicity, Hepatotoxicity, Dermatotoxicity, Carcinogenicity, Reproductive Toxicity, Skin Sensitization, and Toxic Pneumonitis. The dataset was pre-processed by selecting 2D molecular descriptors as feature inputs, with resampling methods (ADASYN, Borderline SMOTE, Random Over-sampler, SVMSMOTE, Cluster Centroid, Near Miss, Random Under Sampler) applied to balance classes for accurate classification. A five-fold cross-validation technique was used to optimize model performance, with model parameters fine-tuned using grid search. The model performance was evaluated using accuracy (Acc), sensitivity (Se), specificity (Sp), and area under the receiver operating characteristic curve (AUC-ROC) metrics. In most cases, the model accuracy was 0.8 or above for both training and test sets. Additionally, SHAP (SHapley Additive exPlanations) values were utilized for feature importance analysis, highlighting significant descriptors contributing to toxicity predictions. The models were ranked using the Sum of Ranking Differences (SRD) method to systematically select the most effective model. The optimal models demonstrated high predictive accuracy and interpretability, providing a scalable and efficient solution for toxicity assessment compared to traditional methods. This approach offers a valuable tool for rapidly screening potentially hazardous chemicals in plastic packaging.
Affiliation(s)
- Md Mobarak Hossain
  - Drug Theoretics and Cheminformatics (DTC) Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics (DTC) Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
4
Dasgupta I, Barik H, Gayen S. Modelling of intrinsic membrane permeability of drug molecules by explainable ML-based q-RASPR approach towards better pharmacokinetics and toxicokinetics properties. SAR and QSAR in Environmental Research 2025; 36:127-143. [PMID: 40190164 DOI: 10.1080/1062936x.2025.2478118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2025] [Accepted: 03/04/2025] [Indexed: 05/17/2025]
Abstract
Drug discovery's success lies in potent inhibition against a target and optimum pharmacokinetic and toxicokinetic properties of drug molecules. Membrane permeability is a crucial factor in determining the absorption, distribution, metabolism, and excretion of drug molecules, thereby determining the pharmacokinetic and toxicokinetic properties important for drug development. Intrinsic permeability (P0) is more crucial than apparent permeability (Papp) in assessing the transport of drug molecules across a membrane. It gives more consistent results due to its non-dependency on external/site-specific factors. In the present work, our focus is on the construction of a machine learning (ML)-based quantitative read-across structure-property relationship (q-RASPR) model of intrinsic permeability of drug molecules by utilizing both linear and non-linear algorithms. The Support Vector Regression (SVR) q-RASPR model was found to be the best model having superior predictive ability (Q2F1 = 0.788, Q2F2 = 0.785, MAEtest = 0.637). The contribution of important descriptors in the final model is explained to get a mechanistic interpretation of intrinsic permeability. Overall, the present study unveils the application of the q-RASPR framework for significant improvement of the external predictivity of the traditional QSPR model in the case of intrinsic permeability to get a better assessment of the total permeability of drug molecules.
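The external validation metrics quoted in this abstract (Q2F1, Q2F2, and the test-set MAE) follow standard definitions, which can be sketched as below. The function name `external_metrics` is hypothetical and the numbers in the usage line are toy values, not results from the paper. Q2F1 compares the prediction error against deviations from the training-set mean, while Q2F2 uses the test-set mean.

```python
def external_metrics(y_train, y_test, y_pred):
    """Return Q2F1, Q2F2, and MAE for a set of external (test) predictions.

    y_train: observed responses of the training set (only its mean is used).
    y_test:  observed responses of the test set.
    y_pred:  model predictions for the test set, in the same order as y_test.
    """
    mean_train = sum(y_train) / len(y_train)
    mean_test = sum(y_test) / len(y_test)
    # Predictive residual sum of squares on the test set.
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))
    # Reference sums of squares: deviations of the test observations from
    # the training mean (Q2F1) or from the test mean (Q2F2).
    ss_train_mean = sum((obs - mean_train) ** 2 for obs in y_test)
    ss_test_mean = sum((obs - mean_test) ** 2 for obs in y_test)
    mae = sum(abs(obs - pred) for obs, pred in zip(y_test, y_pred)) / len(y_test)
    return {
        "Q2F1": 1 - press / ss_train_mean,
        "Q2F2": 1 - press / ss_test_mean,
        "MAE": mae,
    }


# Toy values only, chosen so the arithmetic is easy to follow.
print(external_metrics([1, 2, 3], [2, 4], [3, 3]))
```

Because the two metrics differ only in the reference mean, they diverge when the test-set responses are centered away from the training-set mean.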
Affiliation(s)
- I Dasgupta
  - Laboratory of Drug Design and Discovery, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- H Barik
  - Laboratory of Drug Design and Discovery, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- S Gayen
  - Laboratory of Drug Design and Discovery, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
5
Banerjee A, Roy K. Machine learning assisted classification RASAR modeling for the nephrotoxicity potential of a curated set of orally active drugs. Sci Rep 2025; 15:808. [PMID: 39755865 PMCID: PMC11700179 DOI: 10.1038/s41598-024-85063-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Accepted: 12/30/2024] [Indexed: 01/06/2025] Open
Abstract
We have adopted the classification Read-Across Structure-Activity Relationship (c-RASAR) approach in the present study for machine-learning (ML)-based model development from a recently reported curated dataset of nephrotoxicity potential of orally active drugs. We initially developed ML models using nine different algorithms separately on topological descriptors (referred to simply as "descriptors" in the subsequent sections of the manuscript) and MACCS fingerprints (referred to as "fingerprints" in the subsequent sections of the manuscript), thus generating 18 different ML QSAR models. Using the chemical spaces defined by the modeling descriptors and fingerprints, the similarity and error-based RASAR descriptors were computed, and the most discriminating RASAR descriptors were used to develop another set of 18 different ML c-RASAR models. All 36 models were cross-validated 20 times with a fivefold cross-validation strategy, and their predictivity was checked on the test set data. A multi-criteria decision-making strategy, the Sum of Ranking Differences (SRD) approach, was adopted to identify the best-performing model based on robustness and external validation parameters. This statistical analysis suggested that the c-RASAR models had an overall good performance, while the best-performing model was also a c-RASAR model (LDA c-RASAR model derived from topological descriptors, with MCC values of 0.229 and 0.431 for the training and test sets, respectively). This model was used to screen a true external data set prepared from the known nephrotoxic compounds of DrugBankDB, demonstrating good predictivity.
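The Matthews correlation coefficient (MCC) reported in this abstract is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the function name `mcc` and the convention of returning 0.0 when the denominator vanishes are choices made here for illustration, and the toy labels are not from the study.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: if any marginal total is zero, report 0.0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels: perfect agreement, then perfect disagreement.
print(mcc([1, 1, 0, 0], [1, 1, 0, 0]))
print(mcc([1, 1, 0, 0], [0, 0, 1, 1]))
```

MCC ranges from -1 to 1 and stays informative on imbalanced data, which is one reason it is often preferred over plain accuracy for toxicity classification.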
Affiliation(s)
- Arkaprava Banerjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700 032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700 032, India.
6
Das S, Bhattacharjee A, Ojha PK. First report on q-RASTR modelling of hazardous dose (HD5) for acute toxicity of pesticides: an efficient and reliable approach towards safeguarding the sensitive avian species. SAR and QSAR in Environmental Research 2025; 36:39-55. [PMID: 39931931 DOI: 10.1080/1062936x.2025.2462559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2024] [Accepted: 01/27/2025] [Indexed: 02/25/2025]
Abstract
Pesticides are crucial in modern agriculture, significantly enhancing crop productivity by managing pests. It is important to evaluate their toxicity to minimize health risks to bird species and preserve ecosystem balance. Traditional parameters including lethal concentration (LC50) or median lethal dose (LD50) often underestimate hazards due to limited data and uncertainty about the most sensitive species tested. This limitation can be addressed using extrapolation factors like HD5, which accounts for 50% mortality of the most sensitive 5% of bird species. In this research, a QSTR model was developed utilizing a diverse set of 480 pesticides using partial least squares (PLS) regression with 2D descriptors. Additionally, a PLS-based quantitative read-across structure-toxicity relationship (q-RASTR) model and classification-based models were constructed. The q-RASTR model outperformed traditional QSTR approaches, achieving robust statistical performance with internal validation metrics r2 = 0.623, Q2 = 0.569 and external validation metrics Q2F1 = 0.541, Q2F2 = 0.540. Key factors influencing avian toxicity were identified. The q-RASTR model was used to screen the Pesticide Properties Database (PPDB) to recognize the most and least toxic pesticides for avian species, aligning well with real-world data. This work provides a more economical and ethical alternative to conventional in vivo testing methods, aiding regulatory bodies and industries in developing safer, environmentally friendly pesticides.
Affiliation(s)
- S Das
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- A Bhattacharjee
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- P K Ojha
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
7
Khatun S, Dasgupta I, Sen S, Amin SA, Qureshi IA, Jha T, Gayen S. Histone deacetylase 8 in focus: Decoding structural prerequisites for innovative epigenetic intervention beyond hydroxamates. Int J Biol Macromol 2025; 284:138119. [PMID: 39608552 DOI: 10.1016/j.ijbiomac.2024.138119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 11/21/2024] [Accepted: 11/25/2024] [Indexed: 11/30/2024]
Abstract
Histone deacetylase 8 (HDAC8) inhibitors play a pivotal role in epigenetic regulation. Numerous non-hydroxamate HDAC8 inhibitors (HDAC8is) have been identified to date, and a few of them exhibit antiproliferative activity that is on par with hydroxamates. While many non-hydroxamate-based HDAC8is have demonstrated selectivity, hydroxamate-based HDAC8is, like Vorinostat and TSA, tend to be non-specific across the different HDAC isoforms. Moreover, because of the unfavorable toxic side effects, there are significant concerns surrounding the use of hydroxamate derivatives as therapeutic agents in cancer as well as other chronic diseases. Consequently, the research on non-hydroxamate-based HDAC8is is of utmost priority. The present study comprehensively examines the structural requirements of non-hydroxamate-based HDAC8is using a diverse set of 866 compounds. The study utilized Classification-based Quantitative Structure-Activity Relationship (QSAR) analysis, incorporating Bayesian classification, Recursive partitioning, and other machine learning methods to pinpoint the key structural features essential for HDAC8 inhibition. To underscore and gain deeper insights into the identified structural features, molecular docking and molecular dynamics (MD) simulation studies were conducted. The integration of these computational approaches unveiled key structural motifs essential for potent HDAC8 inhibitory activity, shedding light on the molecular basis of HDAC8 inhibition using non-hydroxamates.
Affiliation(s)
- Samima Khatun
  - Laboratory of Drug Design and Discovery, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, West Bengal, India
- Indrasis Dasgupta
  - Laboratory of Drug Design and Discovery, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, West Bengal, India
- Sourish Sen
  - Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Hyderabad 500 046, Telangana, India
- Sk Abdul Amin
  - Department of Pharmaceutical Technology, JIS University, 81, Nilgunj Road, Agarpara, Kolkata 700109, West Bengal, India; Department of Pharmacy, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
- Insaf Ahmed Qureshi
  - Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Hyderabad 500 046, Telangana, India
- Tarun Jha
  - Natural Science Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, West Bengal, India
- Shovanlal Gayen
  - Laboratory of Drug Design and Discovery, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, West Bengal, India.
8
Pandey SK, Roy K. Development of hybrid models by the integration of the read-across hypothesis with the QSAR framework for the assessment of developmental and reproductive toxicity (DART) tested according to OECD TG 414. Toxicol Rep 2024; 13:101822. [PMID: 39649380 PMCID: PMC11621937 DOI: 10.1016/j.toxrep.2024.101822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Revised: 11/15/2024] [Accepted: 11/18/2024] [Indexed: 12/10/2024] Open
Abstract
The governing laws mandate animal testing guidelines (TG) to assess the developmental and reproductive toxicity (DART) potential of new and current chemical compounds for the categorization, hazard identification, and labeling. In silico modeling has evolved as a promising, economical, and animal-friendly technique for assessing a chemical's potential for DART testing. The complexity of the endpoint has presented a problem for Quantitative Structure-Activity Relationship (QSAR) model developers as various facets of the chemical have to be appropriately analyzed to predict the DART. For the next-generation risk assessment (NGRA) studies, researchers and governing bodies are exploring various new approach methodologies (NAMs) integrated to address complex endpoints like repeated dose toxicity and DART. We have developed four hybrid computational models for DART studies of rodents and rabbits for their adult and fetal life stages separately. The hybrid models were created by integrating QSAR features with similarities-derived features (obtained from read-across hypotheses). This analysis has identified that this integrated method gives a better statistical quality compared to the traditional QSAR models, and the predictivity and transferability of the model are also enhanced in this new approach.
9
Banerjee A, Kar S, Roy K, Patlewicz G, Charest N, Benfenati E, Cronin MTD. Molecular similarity in chemical informatics and predictive toxicity modeling: from quantitative read-across (q-RA) to quantitative read-across structure-activity relationship (q-RASAR) with the application of machine learning. Crit Rev Toxicol 2024; 54:659-684. [PMID: 39225123 PMCID: PMC12010357 DOI: 10.1080/10408444.2024.2386260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/25/2024] [Accepted: 07/25/2024] [Indexed: 09/04/2024]
Abstract
This article aims to provide a comprehensive critical, yet readable, review of general interest to the chemistry community on molecular similarity as applied to chemical informatics and predictive modeling with a special focus on read-across (RA) and read-across structure-activity relationships (RASAR). Molecular similarity-based computational tools, such as quantitative structure-activity relationships (QSARs) and RA, are routinely used to fill the data gaps for a wide range of properties including toxicity endpoints for regulatory purposes. This review will explore the background of RA starting from how structural information has been used through to how other similarity contexts such as physicochemical, absorption, distribution, metabolism, and elimination (ADME) properties, and biological aspects are being characterized. More recent developments of RA's integration with QSAR have resulted in the emergence of novel models such as ToxRead, generalized read-across (GenRA), and quantitative RASAR (q-RASAR). Conventional QSAR techniques have been excluded from this review except where necessary for context.
Affiliation(s)
- Arkaprava Banerjee
  - Department of Pharmaceutical Technology, Drug Theoretics and Cheminformatics (DTC) Laboratory, Jadavpur University, Kolkata, India
- Supratik Kar
  - Department of Chemistry and Physics, Chemometrics & Molecular Modeling Laboratory, Kean University, Union, NJ, USA
- Kunal Roy
  - Department of Pharmaceutical Technology, Drug Theoretics and Cheminformatics (DTC) Laboratory, Jadavpur University, Kolkata, India
- Grace Patlewicz
  - Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Nathaniel Charest
  - Center for Computational Toxicology and Exposure, US Environmental Protection Agency, Research Triangle Park, NC, USA
- Emilio Benfenati
  - Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
- Mark T. D. Cronin
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
10
Banerjee A, Roy K. The application of chemical similarity measures in an unconventional modeling framework c-RASAR along with dimensionality reduction techniques to a representative hepatotoxicity dataset. Sci Rep 2024; 14:20812. [PMID: 39242880 PMCID: PMC11379871 DOI: 10.1038/s41598-024-71892-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Accepted: 09/02/2024] [Indexed: 09/09/2024] Open
Abstract
With the exponential progress in the field of cheminformatics, the conventional modeling approaches have so far been to employ supervised and unsupervised machine learning (ML) and deep learning models, utilizing the standard molecular descriptors, which represent the structural, physicochemical, and electronic properties of a particular compound. Deviating from the conventional approach, in this investigation, we have employed the classification Read-Across Structure-Activity Relationship (c-RASAR), which involves the amalgamation of the concepts of classification-based quantitative structure-activity relationship (QSAR) and Read-Across to incorporate Read-Across-derived similarity and error-based descriptors into a statistical and machine learning modeling framework. ML models developed from these RASAR descriptors use similarity-based information from the close source neighbors of a particular query compound. We have employed different classification modeling algorithms on the selected QSAR and RASAR descriptors to develop predictive models for efficient prediction of query compounds' hepatotoxicity. The predictivity of each of these models was evaluated on a large number of test set compounds. The best-performing model was also used to screen a true external data set. The concepts of explainable AI (XAI) coupled with Read-Across were used to interpret the contributions of the RASAR descriptors in the best c-RASAR model and to explain the chemical diversity in the dataset. The application of various unsupervised dimensionality reduction techniques like t-SNE and UMAP and the supervised ARKA framework showed the usefulness of the RASAR descriptors over the selected QSAR descriptors in their ability to group similar compounds, enhancing the modelability of the dataset and efficiently identifying activity cliffs. 
Furthermore, the activity cliffs were also identified from Read-Across by observing the nature of compounds constituting the nearest neighbors for a particular query compound. On comparing our simple linear c-RASAR model with the previously reported models developed using the same dataset derived from the US FDA Orange Book ( https://www.accessdata.fda.gov/scripts/cder/ob/index.cfm ), it was observed that our model is simple, reproducible, transferable, and highly predictive. The performance of the LDA c-RASAR model on the true external set supersedes that of the previously reported work. Therefore, the present simple LDA c-RASAR model can efficiently be used to predict the hepatotoxicity of query chemicals.
Affiliation(s)
- Arkaprava Banerjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700 032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700 032, India.
11
Banerjee A, Roy K. How to correctly develop q-RASAR models for predictive cheminformatics. Expert Opin Drug Discov 2024; 19:1017-1022. [PMID: 38966910 DOI: 10.1080/17460441.2024.2376651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 07/02/2024] [Indexed: 07/06/2024]
Affiliation(s)
- Arkaprava Banerjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
12
An S, Park IG, Hwang SY, Gong J, Lee Y, Ahn S, Noh M. Cheminformatic Read-Across Approach Revealed Ultraviolet Filter Cinoxate as an Obesogenic Peroxisome Proliferator-Activated Receptor γ Agonist. Chem Res Toxicol 2024; 37:1344-1355. [PMID: 39095321 DOI: 10.1021/acs.chemrestox.4c00091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]
Abstract
This study introduces a novel cheminformatic read-across approach designed to identify potential environmental obesogens, substances capable of disrupting metabolism and inducing obesity by mainly influencing nuclear hormone receptors (NRs). Leveraging real-valued two-dimensional features derived from chemical fingerprints of 8435 Tox21 compounds, cluster analysis and subsequent statistical testing revealed 385 clusters enriched with compounds associated with specific NR targets. Notably, one cluster exhibited selective enrichment in peroxisome proliferator-activated receptor γ (PPARγ) agonist activity, prominently featuring methoxy cinnamate ultraviolet (UV) filters and obesogen-related compounds. Experimental validation confirmed that 2-ethoxyethyl 4-methoxycinnamate, an organic UV filter cinoxate, could selectively bind to PPARγ (Ki = 18.0 μM), eliciting an obesogenic phenotype in human bone marrow-derived mesenchymal stem cells during adipogenic differentiation. Molecular docking and further experiments identified cinoxate as a potent PPARγ full agonist, demonstrating a preference for coactivator SRC3 recruitment. Moreover, cinoxate upregulated transcription levels of genes encoding lipid metabolic enzymes in normal human epidermal keratinocytes as primary cells exposed during clinical usage. This study provides compelling evidence for the efficacy of cheminformatic read-across analysis in prioritizing potential obesogens, showcasing its utility in unveiling cinoxate as an obesogenic PPARγ agonist.
Affiliation(s)
- Seungchan An
  - College of Pharmacy, Natural Products Research Institute, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- In Guk Park
  - College of Pharmacy, Natural Products Research Institute, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Seok Young Hwang
  - College of Pharmacy, Natural Products Research Institute, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Junpyo Gong
  - College of Pharmacy, Natural Products Research Institute, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Yeonjin Lee
  - College of Pharmacy, Natural Products Research Institute, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Sungjin Ahn
  - College of Pharmacy, Natural Products Research Institute, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Minsoo Noh
  - College of Pharmacy, Natural Products Research Institute, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
13
Gomatam A, Hirlekar BU, Singh KD, Murty US, Dixit VA. Improved QSAR models for PARP-1 inhibition using data balancing, interpretable machine learning, and matched molecular pair analysis. Mol Divers 2024; 28:2135-2152. [PMID: 38374474 DOI: 10.1007/s11030-024-10809-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/17/2023] [Accepted: 01/07/2024] [Indexed: 02/21/2024]
Abstract
The poly (ADP-ribose) polymerase-1 (PARP-1) enzyme is an important target in the treatment of breast cancer. Currently, treatment options include the drugs Olaparib, Niraparib, Rucaparib, and Talazoparib; however, these drugs can cause severe side effects including hematological toxicity and cardiotoxicity. Although in silico models for the prediction of PARP-1 activity have been developed, the drawbacks of these models include low specificity, a narrow applicability domain, and a lack of interpretability. To address these issues, a comprehensive machine learning (ML)-based quantitative structure-activity relationship (QSAR) approach for the informed prediction of PARP-1 activity is presented. Data balancing with the Synthetic Minority Oversampling Technique (SMOTE) gave robust and predictive classification models based on the K-nearest neighbor algorithm (accuracy 0.86, sensitivity 0.88, specificity 0.80). Regression models were built on structurally congeneric datasets, with the models for the phthalazinone class and fused cyclic compounds giving the best performance. In accordance with the Organization for Economic Cooperation and Development (OECD) guidelines, a mechanistic interpretation is proposed using Shapley Additive Explanations (SHAP) to identify the important topological features that differentiate between PARP-1 actives and inactives. Moreover, an analysis of the PARP-1 dataset revealed the prevalence of activity cliffs, which may negatively impact the model's predictive performance. Finally, a set of chemical transformation rules was extracted using matched molecular pair analysis (MMPA), which provides mechanistic insights and can guide medicinal chemists in the design of novel PARP-1 inhibitors.
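The data-balancing step named in this abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. The function name, the toy 2D data, and the parameter defaults are assumptions made for illustration; the abstract does not describe the authors' implementation or settings.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the chosen one.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (p - b) for b, p in zip(base, partner))
        )
    return synthetic


# Toy minority class (hypothetical descriptor values for four "active" compounds).
minority_class = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote(minority_class, n_new=6)
print(new_samples)
```

Oversampling of this kind is normally applied to the training set only, so that the test set keeps its original class ratio.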
Affiliation(s)
- Anish Gomatam
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, (NIPER Guwahati), Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), Dist: Kamrup, P.O.: Changsari, Guwahati, Assam, 781101, India
- Bhakti Umesh Hirlekar
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, (NIPER Guwahati), Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), Dist: Kamrup, P.O.: Changsari, Guwahati, Assam, 781101, India
- Krishan Dev Singh
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, (NIPER Guwahati), Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), Dist: Kamrup, P.O.: Changsari, Guwahati, Assam, 781101, India
- Upadhyayula Suryanarayana Murty
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, (NIPER Guwahati), Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), Dist: Kamrup, P.O.: Changsari, Guwahati, Assam, 781101, India
- Vaibhav A Dixit
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, (NIPER Guwahati), Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Govt. of India, Sila Katamur (Halugurisuk), Dist: Kamrup, P.O.: Changsari, Guwahati, Assam, 781101, India.
14
Srisongkram T. DeepRA: A novel deep learning-read-across framework and its application in non-sugar sweeteners mutagenicity prediction. Comput Biol Med 2024; 178:108731. [PMID: 38870727 DOI: 10.1016/j.compbiomed.2024.108731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 05/07/2024] [Accepted: 06/08/2024] [Indexed: 06/15/2024]
Abstract
Non-sugar sweeteners (NSSs), or artificial sweeteners, have been used as food chemicals since World War II. NSSs, however, also raise concerns about mutagenicity. Evaluating the mutagenic potential of NSSs is crucial for food safety; this step is needed for every new chemical registration in the food and pharmaceutical industries. A computational assessment requires less time, less money, and fewer animals than in vivo experiments; thus, this study developed a novel computational method, called DeepRA, that combines an ensemble convolutional deep neural network with read-across algorithms to classify the mutagenicity of chemicals. The mutagenicity data were obtained from the curated Ames test data set. The DeepRA model was developed using both molecular descriptors and molecular fingerprints. The resulting DeepRA model provided accurate and reliable mutagenicity classification on an independent test set. This model was then used to examine NSS-related chemicals, enabling the evaluation of mutagenicity for NSS-like substances. Finally, the model has been made publicly available at https://github.com/taraponglab/deepra for further use in chemical regulation and risk assessment.
Affiliation(s)
- Tarapong Srisongkram
- Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, 40002, Thailand.
15
Khatun S, Dasgupta I, Islam R, Amin SA, Jha T, Dhaked DK, Gayen S. Unveiling critical structural features for effective HDAC8 inhibition: a comprehensive study using quantitative read-across structure-activity relationship (q-RASAR) and pharmacophore modeling. Mol Divers 2024; 28:2197-2215. [PMID: 38871969 DOI: 10.1007/s11030-024-10903-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Accepted: 05/20/2024] [Indexed: 06/15/2024]
Abstract
Histone deacetylases constitute a group of enzymes that participate in several biological processes. Notably, inhibiting HDAC8 has become a therapeutic strategy for various diseases. The current inhibitors for HDAC8 lack selectivity and target multiple HDACs. Consequently, there is a growing recognition of the need for selective HDAC8 inhibitors to enhance the effectiveness of therapeutic interventions. In our current study, we have utilized a multi-faceted approach, including Quantitative Structure-Activity Relationship (QSAR) combined with Quantitative Read-Across Structure-Activity Relationship (q-RASAR) modeling, pharmacophore mapping, molecular docking, and molecular dynamics (MD) simulations. The developed q-RASAR model has a high statistical significance and predictive ability (Q2F1:0.778, Q2F2:0.775). The contributions of important descriptors are discussed in detail to gain insight into the crucial structural features in HDAC8 inhibition. The best pharmacophore hypothesis exhibits a high regression coefficient (0.969) and a low root mean square deviation (0.944), highlighting the importance of correctly orienting hydrogen bond acceptor (HBA), ring aromatic (RA), and zinc-binding group (ZBG) features in designing potent HDAC8 inhibitors. To confirm the results of q-RASAR and pharmacophore mapping, molecular docking analysis of the five potent compounds (44, 54, 82, 102, and 118) was performed to gain further insights into these structural features crucial for interaction with the HDAC8 enzyme. Lastly, MD simulation studies of the most active compound (54, mapped correctly with the pharmacophore hypothesis) and the least active compound (34, mapped poorly with the pharmacophore hypothesis) were carried out to validate the observations of the studies above. 
This study not only refines our understanding of essential structural features for HDAC8 inhibition but also provides a robust framework for the rational design of novel selective HDAC8 inhibitors which may offer insights to medicinal chemists and researchers engaged in the development of HDAC8-targeted therapeutics.
Affiliation(s)
- Samima Khatun
- Laboratory of Drug Design and Discovery, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
- Indrasis Dasgupta
- Laboratory of Drug Design and Discovery, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
- Rakibul Islam
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), Kolkata, West Bengal, 700054, India
- Sk Abdul Amin
- Department of Pharmaceutical Technology, JIS University, 81, Nilgunj Road, Agarpara, Kolkata, West Bengal, India
- Tarun Jha
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
- Devendra Kumar Dhaked
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), Kolkata, West Bengal, 700054, India
- Shovanlal Gayen
- Laboratory of Drug Design and Discovery, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India.
16
Banerjee A, Roy K. ARKA: a framework of dimensionality reduction for machine-learning classification modeling, risk assessment, and data gap-filling of sparse environmental toxicity data. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2024; 26:991-1007. [PMID: 38743054 DOI: 10.1039/d4em00173g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Due to the lack of experimental toxicity data for environmental chemicals, there arises a need to fill data gaps by in silico approaches. One of the most commonly used in silico approaches for toxicity assessment of small datasets is the Quantitative Structure-Activity Relationship (QSAR), which generates predictive models for the efficient prediction of query compounds. However, the reliability of the predictions from QSARs derived from small datasets is often questionable from a statistical point of view. This is due to the presence of a larger number of descriptors as compared to the number of training compounds, which reduces the degree of freedom of the developed model. To reduce the overall prediction error for a particular QSAR model, we have proposed here the computation of the novel Arithmetic Residuals in K-groups Analysis (ARKA) descriptors. We have reduced the number of modeling descriptors in a supervised manner by partitioning them into K classes (K = 2 here) depending on the higher mean normalized values of the descriptors to a particular response class, thus preventing the loss of chemical information. A scatter plot of the data points using the values of two ARKA descriptors (ARKA_2 vs. ARKA_1) can potentially identify activity cliffs, less confident data points, and less modelable data points. We have used here five representative environmentally relevant endpoints (skin sensitization, earthworm toxicity, milk/plasma partitioning, algal toxicity, and rodent carcinogenicity of hazardous chemicals) with graded responses to which the ARKA framework was applied for classification modeling. 
On comparing the performance of the models generated using conventional QSAR descriptors and the ARKA descriptors, the prediction quality of the models derived from ARKA descriptors was found, based on multiple graded-data validation metrics-derived decision criteria, much better than the models derived from QSAR descriptors signifying the potential of ARKA descriptors in ecotoxicological classification modeling of small data sets. Additionally, this holds true for the Read-Across approach as well, since the Read-Across predictions using ARKA descriptors supersede the predictions generated from QSAR descriptors. For the ease of users, a Java-based expert system has been developed that computes the ARKA descriptors from the input of QSAR descriptors.
Affiliation(s)
- Arkaprava Banerjee
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.
17
Ghosh S, Roy K. Quantitative read-across structure-activity relationship (q-RASAR): A novel approach to estimate the subchronic oral safety (NOAEL) of diverse organic chemicals in rats. Toxicology 2024; 505:153824. [PMID: 38705560 DOI: 10.1016/j.tox.2024.153824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 04/28/2024] [Accepted: 04/29/2024] [Indexed: 05/07/2024]
Abstract
We have developed a quantitative safety prediction model for subchronic repeated doses of diverse organic chemicals on rats using the novel quantitative read-across structure-activity relationship (q-RASAR) approach, which uses similarity-based descriptors for predictive model generation. The experimental -Log (NOAEL) values have been used here as a potential indicator of oral subchronic safety on rats as it determines the maximum dose level for which no observed adverse effects of chemicals are found. A total of 186 data points of diverse organic chemicals have been used for the model generation using structural and physicochemical (0D-2D) descriptors. The read-across-derived similarity, error, and concordance measures (RASAR descriptors) have been extracted from the preliminary 0D-2D descriptors. Then, the combined pool of RASAR and the identified 0D-2D descriptors of the training set were employed to develop the final models by using the partial least squares (PLS) algorithm. The developed PLS model was rigorously validated by various internal and external validation metrics as suggested by the Organization for Economic Co-operation and Development (OECD). The final q-RASAR model is proven to be statistically sound, robust and externally predictive (R2 = 0.85, Q2LOO = 0.82 and Q2F1 = 0.94), superseding the internal as well as external predictivity of the corresponding quantitative structure-activity relationship (QSAR) model as well as previously reported subchronic repeated dose toxicity model found in the literature. In a nutshell, the q-RASAR is an effective approach that has the potential to be used as a good alternative way to improve external predictivity, interpretability, and transferability for subchronic oral safety prediction as well as ecotoxicity risk identification.
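The external validation metric Q2F1 quoted in this abstract (and Q2F2, quoted in the HDAC8 entry above) has a standard definition: both divide the squared prediction error on the test set by a reference sum of squares, taken around the training-set mean for Q2F1 and around the test-set mean for Q2F2. The sketch below uses hypothetical toy values, not data from the paper.

```python
def q2_f1(y_test, y_pred, y_train):
    """Q2F1 = 1 - sum((y - y_hat)^2) / sum((y - mean(y_train))^2) over the test set."""
    train_mean = sum(y_train) / len(y_train)
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))
    ss_ref = sum((obs - train_mean) ** 2 for obs in y_test)
    return 1.0 - press / ss_ref


def q2_f2(y_test, y_pred):
    """Q2F2 uses the test-set mean as the reference instead of the training mean."""
    test_mean = sum(y_test) / len(y_test)
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))
    ss_ref = sum((obs - test_mean) ** 2 for obs in y_test)
    return 1.0 - press / ss_ref


# Hypothetical responses (for example, -log(NOAEL) values) used only as a demo.
y_train = [1.0, 2.0, 3.0, 4.0]
y_test = [2.0, 3.0, 5.0]
y_pred = [2.5, 2.5, 4.5]
print(round(q2_f1(y_test, y_pred, y_train), 4))
print(round(q2_f2(y_test, y_pred), 4))
```

Because the test-set mean minimizes the reference sum of squares, Q2F2 can never exceed Q2F1 for the same predictions, which is why the two are often reported together.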
Affiliation(s)
- Shilpayan Ghosh
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
18
Zhou Y, Wang Z, Huang Z, Li W, Chen Y, Yu X, Tang Y, Liu G. In silico prediction of ocular toxicity of compounds using explainable machine learning and deep learning approaches. J Appl Toxicol 2024; 44:892-907. [PMID: 38329145 DOI: 10.1002/jat.4586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/16/2024] [Accepted: 01/16/2024] [Indexed: 02/09/2024]
Abstract
The accurate identification of chemicals with ocular toxicity is of paramount importance in health hazard assessment. In contemporary chemical toxicology, there is a growing emphasis on refining, reducing, and replacing animal testing in safety evaluations. Therefore, the development of robust computational tools is crucial for regulatory applications. The performance of predictive models is heavily reliant on the quality and quantity of data. In this investigation, we amalgamated the most extensive dataset (4901 compounds) sourced from governmental GHS-compliant databases and literature to develop binary classification models of chemical ocular toxicity. We employed 12 molecular representations in conjunction with six machine learning algorithms and two deep learning algorithms to create a series of binary classification models. The findings indicated that the deep learning method GCN outperformed the machine learning models in cross-validation, achieving an impressive AUC of 0.915. However, the top-performing machine learning model (RF-Descriptor) demonstrated excellent performance with an AUC of 0.869 on the test set and was therefore selected as the best model. To enhance model interpretability, we conducted the SHAP method and attention weights analysis. The two approaches offered visual depictions of the relevance of key descriptors and substructures in predicting ocular toxicity of chemicals. Thus, we successfully struck a delicate balance between data quality and model interpretability, rendering our model valuable for predicting and comprehending potential ocular-toxic compounds in the early stages of drug discovery.
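The AUC values reported here summarize how well a binary classifier ranks toxic compounds above non-toxic ones. A minimal sketch of the pairwise (Mann-Whitney) formulation follows; the labels and scores are hypothetical, and the function is not taken from the paper's code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    compound receives the higher score; ties count as half a win."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for two toxic (1) and two non-toxic (0) compounds.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))
```

The pairwise form is quadratic in the number of compounds, which is fine for a demonstration; library implementations use a sorted-rank equivalent.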
Affiliation(s)
- Yiqing Zhou
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Ze Wang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Zejun Huang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Weihua Li
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Yuanting Chen
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Xinxin Yu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Yun Tang
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Guixia Liu
- Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
19
Kumar V, Banerjee A, Roy K. Breaking the Barriers: Machine-Learning-Based c-RASAR Approach for Accurate Blood-Brain Barrier Permeability Prediction. J Chem Inf Model 2024; 64:4298-4309. [PMID: 38700741 DOI: 10.1021/acs.jcim.4c00433] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/28/2024]
Abstract
The intricate nature of the blood-brain barrier (BBB) poses a significant challenge in predicting drug permeability, which is crucial for assessing central nervous system (CNS) drug efficacy and safety. This research utilizes an innovative approach, the classification read-across structure-activity relationship (c-RASAR) framework, that leverages machine learning (ML) to enhance the accuracy of BBB permeability predictions. The c-RASAR framework seamlessly integrates principles from both read-across and QSAR methodologies, underscoring the need to consider similarity-related aspects during the development of the c-RASAR model. It is crucial to note that the primary goal of this research is not to introduce yet another model for predicting BBB permeability but rather to showcase the refinement in predicting the BBB permeability of organic compounds through the introduction of a c-RASAR approach. This groundbreaking methodology aims to elevate the accuracy of assessing neuropharmacological implications and streamline the process of drug development. In this study, an ML-based c-RASAR linear discriminant analysis (LDA) model was developed using a dataset of 7807 compounds, encompassing both BBB-permeable and -nonpermeable substances sourced from the B3DB database (freely accessible from https://github.com/theochem/B3DB), for predicting BBB permeability in lead discovery for CNS drugs. The model's predictive capability was then validated using three external sets: one containing 276,518 natural products (NPs) from the LOTUS database (accessible from https://lotus.naturalproducts.net/download) for data gap filling, another comprising 13,002 drug-like/drug compounds from the DrugBank database (available from https://go.drugbank.com/), and a third set of 56 FDA-approved drugs to assess the model's reliability. Further diversifying the predictive arsenal, various other ML-based c-RASAR models were also developed for comparison purposes. 
The proposed c-RASAR framework emerged as a powerful tool for predicting BBB permeability. This research not only advances the understanding of molecular determinants influencing CNS drug permeability but also provides a versatile computational platform for the rapid assessment of diverse compounds, facilitating informed decision-making in drug development and design.
Affiliation(s)
- Vinay Kumar
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Arkaprava Banerjee
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
20
Pore S, Banerjee A, Roy K. Application of machine learning-based read-across structure-property relationship (RASPR) as a new tool for predictive modelling: Prediction of power conversion efficiency (PCE) for selected classes of organic dyes in dye-sensitized solar cells (DSSCs). Mol Inform 2024; 43:e202300210. [PMID: 38374528 DOI: 10.1002/minf.202300210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 12/31/2023] [Accepted: 02/04/2024] [Indexed: 02/21/2024]
Abstract
The application of various in silico approaches for predicting the properties of materials has been an effective alternative to experimental methods. Recently, the concepts of the quantitative structure-property relationship (QSPR) and read-across (RA) methods were merged to develop a new chemoinformatic tool: the read-across structure-property relationship (RASPR). The RASPR method is applicable to both large and small datasets, as it uses various similarity and error-based measures. It has also been observed that RASPR models tend to have an increased external predictivity compared to the corresponding QSPR models. In this study, we have modeled the power conversion efficiency (PCE) of organic dyes used in dye-sensitized solar cells (DSSCs) by using the quantitative RASPR (q-RASPR) method. We have used relatively large classes of organic dyes for modelling: phenothiazines (n=207), porphyrins (n=281), and triphenylamines (n=229). We have divided each of the datasets into training and test sets in 3 different combinations, and with the training sets we have developed three different QSPR models with structural and physicochemical descriptors and validated them with the corresponding test sets. The descriptors selected in these models were used to calculate the RASPR descriptors using a Java-based tool, RASAR Descriptor Calculator v2.0 (https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home), and then data fusion was performed by pooling the previously selected structural and physicochemical descriptors with the calculated RASPR descriptors. A further feature selection step was then employed to develop the final RASPR PLS models.
Here, we also developed different machine learning (ML) models with the descriptors selected in the QSPR PLS and RASPR PLS models, and it was found that models with RASPR descriptors superseded in external predictivity the models with only structural and physicochemical descriptors: RMSEP reduced for phenothiazines from 1.16-1.25 to 1.07-1.18, for porphyrins from 1.60-1.79 to 1.45-1.53, for triphenylamines from 1.27-1.54 to 1.20-1.47.
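The similarity and error-based measures mentioned in this abstract can be illustrated with a generic read-across sketch: for a query compound, take the k closest training compounds in descriptor space, weight their responses by a Gaussian-kernel similarity, and report the weighted mean, the weighted standard deviation, and the maximum similarity. These three quantities are only loosely analogous to RASPR descriptors. The function name, kernel, and parameters are assumptions and do not reproduce the definitions used by RASAR Descriptor Calculator v2.0.

```python
from math import exp, sqrt


def read_across_features(query, train_x, train_y, k=3, sigma=1.0):
    """Similarity-weighted summary of the k closest training compounds.

    Returns (weighted mean response, weighted SD of responses, max similarity).
    Assumes at least one neighbor has non-zero similarity to the query.
    """
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(zip(train_x, train_y), key=lambda xy: dist(query, xy[0]))[:k]
    sims = [exp(-dist(query, x) ** 2 / (2.0 * sigma ** 2)) for x, _ in ranked]
    total = sum(sims)
    weights = [s / total for s in sims]
    mean = sum(w * y for w, (_, y) in zip(weights, ranked))
    var = sum(w * (y - mean) ** 2 for w, (_, y) in zip(weights, ranked))
    return mean, sqrt(var), max(sims)


# Hypothetical two-descriptor training set with PCE-like responses.
train_x = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
train_y = [1.0, 3.0, 9.0]
print(read_across_features((1.0, 0.0), train_x, train_y, k=2))
```

In a data-fusion workflow of the kind described, values like these would be appended as extra columns next to the original descriptors before feature selection and model fitting.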
Affiliation(s)
- Souvik Pore
- Drug Theoretics and Chemoinformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032, Kolkata, India
- Arkaprava Banerjee
- Drug Theoretics and Chemoinformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032, Kolkata, India
- Kunal Roy
- Drug Theoretics and Chemoinformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, 188 Raja S C Mullick Road, 700032, Kolkata, India
21
Wu X, Gong J, Ren S, Tan F, Wang Y, Zhao H. A machine learning-based QSAR model reveals important molecular features for understanding the potential inhibition mechanism of ionic liquids to acetylcholinesterase. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 915:169974. [PMID: 38199350 DOI: 10.1016/j.scitotenv.2024.169974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 01/02/2024] [Accepted: 01/04/2024] [Indexed: 01/12/2024]
Abstract
The broad application of ionic liquids (ILs) has been hindered by uncertainties surrounding their ecotoxicity. In this work, a Quantitative Structure-Activity Relationship (QSAR) model was devised to predict the inhibition of acetylcholinesterase (AChE) activity by ILs, employing both Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) machine learning approaches. Fourteen kinds of essential molecular descriptors were screened from an initial roster of 244 descriptors through the application of a feature importance index, and they showed a significant impact on AChE activity. The two models based solely on the 14 most critical molecular descriptors maintained their robustness and reliability. The correlation analysis between these 14 descriptors and the inhibition of AChE activity revealed the potential impact of the molecular characteristics on IL toxicity. The results underscored the dominant influence of the cations in ILs on the inhibitory activity towards the AChE enzyme. Specifically, hydrophobic cations were found to exert more potent inhibitory effects on the AChE enzyme. In addition, other properties of the cations, such as the degree of branching, atomic weight, and partial charge, also modulated their inhibition potential. This study enhances the comprehension of the structure-activity relationship between ILs and AChE inhibition, providing a reference for designing safer and greener ILs.
Affiliation(s)
- Xuri Wu
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Jixiang Gong
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Suyu Ren
- School of Environmental and Material Engineering, Yantai University, Yantai 264005, China
- Feng Tan
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China.
- Yan Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Hongxia Zhao
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
22
Huang Z, Yu J, He W, Yu J, Deng S, Yang C, Zhu W, Shao X. AI-enhanced chemical paradigm: From molecular graphs to accurate prediction and mechanism. JOURNAL OF HAZARDOUS MATERIALS 2024; 465:133355. [PMID: 38198864 DOI: 10.1016/j.jhazmat.2023.133355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 12/19/2023] [Accepted: 12/21/2023] [Indexed: 01/12/2024]
Abstract
The development of accurate and interpretable models for predicting reaction constants of organic compounds with hydroxyl radicals is vital for advancing quantitative structure-activity relationships (QSAR) in pollutant degradation. Methods like molecular descriptors, molecular fingerprinting, and group contribution methods have limitations, as traditional machine learning struggles to capture all intramolecular information simultaneously. To address this, we established an integrated graph neural network (GNN) with approximately 12 million learnable parameters. A GNN represents atoms as nodes and chemical bonds as edges, thus transforming molecules into graph structures, effectively capturing microscopic properties while depicting atom connectivity in non-Euclidean space. Our dataset comprises 1401 pollutants, which were used to develop an integrated GNN model with Bayesian optimization; the model achieves root mean square errors of 0.165, 0.172, and 0.189 on the training, validation, and test datasets, respectively. Furthermore, we assess molecular structure similarity using molecular fingerprints to enhance the model's applicability. Finally, we propose a gradient weight mapping method for model explainability, uncovering the key functional groups in chemical reactions from an artificial intelligence perspective, which could advance chemistry by exploiting the computational power of artificial intelligence.
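The graph representation described in this abstract (atoms as nodes, bonds as edges) can be made concrete with a small sketch. It encodes the heavy atoms of ethanol as a graph and performs one round of untrained sum-aggregation message passing. There are no learnable weights, and the one-hot features, function names, and readout are illustrative assumptions, not the authors' architecture.

```python
def build_adjacency(n_atoms, bonds):
    """Undirected adjacency list: each bond (i, j) links both atoms."""
    adjacency = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    return adjacency


def message_pass(features, adjacency):
    """One step: each atom's new feature vector is its own vector plus the
    sum of its bonded neighbors' vectors (no weights, no nonlinearity)."""
    return [
        [own + sum(features[j][d] for j in adjacency[i])
         for d, own in enumerate(features[i])]
        for i in range(len(features))
    ]


# Ethanol heavy atoms: C0 - C1 - O2 (hydrogens left implicit).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
one_hot = {"C": [1, 0], "O": [0, 1]}  # feature order: [carbon, oxygen]

adjacency = build_adjacency(len(atoms), bonds)
features = [one_hot[a] for a in atoms]
updated = message_pass(features, adjacency)
# Graph-level readout: sum the atom vectors into one molecule vector.
molecule_vector = [sum(col) for col in zip(*updated)]
print(updated, molecule_vector)
```

After one step each atom "sees" its direct neighbors; stacking further steps widens that view by one bond per step, which is how a trained GNN picks up functional-group context.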
Affiliation(s)
- Zhi Huang
- Department of Environmental Science and Engineering, College of Architecture and Environment, Sichuan University, Chengdu 610065, PR China
- Jiang Yu
- Department of Environmental Science and Engineering, College of Architecture and Environment, Sichuan University, Chengdu 610065, PR China; Institute of New Energy and Low Carbon Technology, Sichuan University, Chengdu 610065, PR China; Yibin Institute of Industrial Technology, Sichuan University, Yibin 644000, PR China.
- Wei He
- Chengdu Jin Sheng Water Engineering Co, PR China
- Jie Yu
- Department of Environmental Science and Engineering, College of Architecture and Environment, Sichuan University, Chengdu 610065, PR China; Institute of New Energy and Low Carbon Technology, Sichuan University, Chengdu 610065, PR China
- Siwei Deng
- Department of Environmental Science and Engineering, College of Architecture and Environment, Sichuan University, Chengdu 610065, PR China
- Chun Yang
- Ministry of Education and School of Mathematics Sciences, Sichuan Normal University, PR China
| | - Weiwei Zhu
- Department of Environmental Science and Engineering, College of Architecture and Environment, Sichuan University, Chengdu 610065, PR China
| | - Xiao Shao
- School of Agriculture and Environment, University of Western Australia, Perth 6907, Western Australia, Australia
23
|
Banjare P, Singh R, Pandey NK, Matore BW, Murmu A, Singh J, Roy PP. In silico soil degradation and ecotoxicity analysis of veterinary pharmaceuticals on terrestrial species: first report. Toxicol Res (Camb) 2024; 13:tfae020. [PMID: 38496320 PMCID: PMC10939401 DOI: 10.1093/toxres/tfae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 02/01/2024] [Accepted: 02/02/2024] [Indexed: 03/19/2024]
Abstract
With the aim of analysing the persistence and ecotoxicological impact of veterinary pharmaceuticals on different terrestrial species, different classes of veterinary pharmaceuticals (n = 37) with soil degradation data (DT50) were gathered and subjected to QSAR and q-RASAR model development. The models were developed from 2D descriptors under Organisation for Economic Co-operation and Development (OECD) guidelines, using multiple linear regression along with a genetic algorithm. All developed QSAR and q-RASAR models were statistically significant (internal: R2adj = 0.721-0.861, Q2LOO = 0.609-0.757; external: Q2Fn = 0.597-0.933, MAEext = 0.174-0.260). Further, the leverage approach to the applicability domain assured the models' reliability. The veterinary pharmaceuticals with no experimental values were classified based on their persistence level. Further, the terrestrial toxicity analysis of persistent veterinary pharmaceuticals was done using toxicity prediction by computer assisted technology and in-house built quantitative structure toxicity relationship models to prioritize the toxic and persistent veterinary pharmaceuticals. This study will be helpful in estimating the persistence and toxicity of existing and upcoming veterinary pharmaceuticals.
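The internal and external statistics quoted in this abstract follow standard definitions, which the sketch below computes in plain Python. It is not the authors' code: the function names and the toy numbers are assumptions. Q2F1 is taken here as the external metric that references the training-set mean, and the leave-one-out routine is written for a single-descriptor least-squares line only, to keep the example short.

```python
def q2_f1(y_test, y_pred, y_train_mean):
    """External Q2F1: 1 - PRESS / sum of squared deviations from the TRAINING mean."""
    press = sum((o - p) ** 2 for o, p in zip(y_test, y_pred))
    tss = sum((o - y_train_mean) ** 2 for o in y_test)
    return 1.0 - press / tss


def mae(y_obs, y_pred):
    """Mean absolute error of the predictions."""
    return sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / len(y_obs)


def r2_adjusted(r2, n_compounds, n_descriptors):
    """Adjusted R2, penalising the number of descriptors in the model."""
    return 1.0 - (1.0 - r2) * (n_compounds - 1) / (n_compounds - n_descriptors - 1)


def q2_loo(x, y):
    """Leave-one-out Q2 for a one-descriptor least-squares line (illustrative only)."""
    n = len(x)
    press = 0.0
    for k in range(n):
        xs, ys = x[:k] + x[k + 1:], y[:k] + y[k + 1:]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((a - mx) ** 2 for a in xs)
        slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
        intercept = my - slope * mx
        press += (y[k] - (intercept + slope * x[k])) ** 2
    y_mean = sum(y) / n
    return 1.0 - press / sum((b - y_mean) ** 2 for b in y)


# Hypothetical toy values, not taken from the study.
external_q2 = q2_f1([1.0, 2.0, 3.0], [1.5, 2.0, 2.5], y_train_mean=2.0)
external_mae = mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
```

Multi-descriptor models would refit the full regression inside the leave-one-out loop; the structure of the calculation is otherwise the same.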
Affiliation(s)
- Purusottam Banjare
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur 495009, Chhattisgarh, India
  - Department of Pharmaceutical Chemistry, Apollo College of Pharmacy, Anjora, Durg 491001, Chhattisgarh, India
- Rekha Singh
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur 495009, Chhattisgarh, India
- Nilesh Kumar Pandey
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur 495009, Chhattisgarh, India
- Balaji Wamanrao Matore
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur 495009, Chhattisgarh, India
- Anjali Murmu
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur 495009, Chhattisgarh, India
- Jagadish Singh
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur 495009, Chhattisgarh, India
- Partha Pratim Roy
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur 495009, Chhattisgarh, India
24
|
Pandey NK, Murmu A, Banjare P, Matore BW, Singh J, Roy PP. Integrated predictive QSAR, Read Across, and q-RASAR analysis for diverse agrochemical phytotoxicity in oat and corn: A consensus-based approach for risk assessment and prioritization. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024; 31:12371-12386. [PMID: 38228952 DOI: 10.1007/s11356-024-31872-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/26/2023] [Accepted: 01/02/2024] [Indexed: 01/18/2024]
Abstract
In the modern fast-paced lifestyle, time-efficient and nutritionally rich foods like corn and oat have gained popularity for their amino acid and antioxidant contents. The increasing demand for these cereals necessitates higher production, which leads to dependency on agrochemicals; these can pose health risks through residues present in the plant products. To provide the first report of phytotoxicity for corn and oat, our study employs QSAR, quantitative Read-Across, and quantitative RASAR (q-RASAR). All developed QSAR and q-RASAR models were equally robust (R2 = 0.680-0.762, Q2LOO = 0.593-0.693, Q2F1 = 0.680-0.860), with each approach proving superior for either the oat or the corn model based on the MAE criterion. AD and PRI analyses were performed, which confirm the reliability and predictability of the models. The mechanistic interpretation reveals that the symmetrical arrangement of electronegative atoms and polar groups directly influences the toxicity of compounds. The final phytotoxicity assessment and prioritization are performed by the consensus approach, which results in the selection of the 15 most toxic compounds for both species.
Affiliation(s)
- Nilesh Kumar Pandey
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, 495009, India
- Anjali Murmu
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, 495009, India
- Balaji Wamanrao Matore
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, 495009, India
- Jagadish Singh
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, 495009, India
- Partha Pratim Roy
  - Department of Pharmacy, Guru Ghasidas Vishwavidyalaya (A Central University), Bilaspur, 495009, India.
25
|
Ahmadi M, Ayyoubzadeh SM, Ghorbani-Bidkorpeh F. Toxicity prediction of nanoparticles using machine learning approaches. Toxicology 2024; 501:153697. [PMID: 38056590 DOI: 10.1016/j.tox.2023.153697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/21/2023] [Accepted: 12/01/2023] [Indexed: 12/08/2023]
Abstract
Nanoparticle toxicity analysis is critical for evaluating the safety of nanomaterials due to their potential harm to the biological system. However, traditional experimental methods for evaluating nanoparticle toxicity are expensive and time-consuming. As an alternative approach, machine learning offers a solution for predicting cellular responses to nanoparticles. This study focuses on developing ML models for nanoparticle toxicity prediction. The training dataset used for building these models includes the physicochemical properties of nanoparticles, exposure conditions, and cellular responses of different cell lines. The impact of each parameter on cell death was assessed using the Gini index. Five classifiers, namely Decision Tree, Random Forest, Support Vector Machine, Naïve Bayes, and Artificial Neural Network, were employed to predict toxicity. The models' performance was compared based on accuracy, sensitivity, specificity, area under the curve, F measure, K-fold validation, and classification error. The Gini index indicated that cell line, exposure dose, and tissue are the most influential factors in cell death. Among the models tested, Random Forest exhibited the highest performance in the given dataset. Other models demonstrated lower performance compared to Random Forest. Researchers can utilize the Random Forest model to predict nanoparticle toxicity, resulting in cost and time savings for toxicity analysis.
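The abstract ranks input parameters by the Gini index without giving the exact computation. As a hedged illustration, the sketch below uses the standard Gini impurity from decision-tree learning and scores a categorical feature by how much a split on it reduces impurity. The function names and the four-row toy dataset are assumptions, not data from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(feature_values, labels):
    """Impurity decrease obtained by splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


# Hypothetical toy records: outcome per exposure, with two candidate features.
outcome = ["dead", "dead", "alive", "alive"]
cell_line = ["A", "A", "B", "B"]      # separates the outcomes perfectly
dose = ["low", "high", "low", "high"]  # carries no information here

cell_line_score = gini_decrease(cell_line, outcome)
dose_score = gini_decrease(dose, outcome)
```

A larger decrease marks a more influential feature; in this toy case the cell line removes all impurity while the dose removes none.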
Affiliation(s)
- Mahnaz Ahmadi
  - Medical Nanotechnology and Tissue Engineering Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed Mohammad Ayyoubzadeh
  - Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran; Health Information Management Research Center, Tehran University of Medical Sciences, Tehran, Iran.
- Fatemeh Ghorbani-Bidkorpeh
  - Department of Pharmaceutics and Pharmaceutical Nanotechnology, School of Pharmacy, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
26
|
Duchowicz PR, Fioressi SE, Bacelo DE, Quispe AQ, Yapu EL, Castañeta H. QSPR predicting the vapor pressure of pesticides into high/low volatility classes. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024; 31:1395-1402. [PMID: 38038924 DOI: 10.1007/s11356-023-31235-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 11/21/2023] [Indexed: 12/02/2023]
Abstract
In this work, the vapor pressure of pesticides is employed as an indicator of their volatility potential. Quantitative Structure-Property Relationship models are established to predict the classification of compounds according to their volatility, into the high and low binary classes separated by the 1-mPa limit. A large dataset of 1005 structurally diverse pesticides with known experimental vapor pressure data at 20 °C is compiled from the publicly available Pesticide Properties DataBase (PPDB) and used for model development. The freely available PaDEL-Descriptor and ISIDA/Fragmentor molecular descriptor programs provide a large number of 19,947 non-conformational molecular descriptors that are analyzed through multivariable linear regressions and the Replacement Method technique. Through the selection of appropriate molecular descriptors of the substructure fragment type and the use of different standard classification metrics of model's quality, the classification of the structure-property relationship achieves acceptable results for discerning between the high and low volatility classes. Finally, an application of the obtained QSPR model is performed to predict the classes for 504 pesticides not having experimentally measured vapor pressures.
Affiliation(s)
- Pablo R Duchowicz
  - Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas (INIFTA), CONICET, UNLP, Diag. 113 y 64, C.C. 16, Sucursal 4, 1900, La Plata, Argentina.
- Silvina E Fioressi
  - Facultad de Ciencias Exactas y Naturales, Universidad de Belgrano, CONICET, Villanueva 1324, 1426, Buenos Aires, Argentina
- Daniel E Bacelo
  - Facultad de Ciencias Exactas y Naturales, Universidad de Belgrano, CONICET, Villanueva 1324, 1426, Buenos Aires, Argentina
- Alexander Q Quispe
  - Carrera de Ciencias Químicas, Universidad Mayor de San Andrés, 303, La Paz, Bolivia
- Ebbe L Yapu
  - Carrera de Ciencias Químicas, Universidad Mayor de San Andrés, 303, La Paz, Bolivia
- Heriberto Castañeta
  - Instituto de Investigaciones Químicas, Universidad Mayor de San Andrés, 303, La Paz, Bolivia
27
|
Ghosh V, Bhattacharjee A, Kumar A, Ojha PK. q-RASTR modelling for prediction of diverse toxic chemicals towards T. pyriformis. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2024; 35:11-30. [PMID: 38193248 DOI: 10.1080/1062936x.2023.2298452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 12/16/2023] [Indexed: 01/10/2024]
Abstract
A series of diverse organic compounds impose serious detrimental effects on the health of living organisms and the environment. Determination of the structural aspects of compounds that impart toxicity and evaluation of the same is crucial before public usage. The present study aims to determine the structural characteristics of compounds for Tetrahymena pyriformis toxicity using the q-RASTR (Quantitative Read Across Structure-Toxicity Relationship) model. It was developed using RASTR and 2-D descriptors for a dataset of 1792 compounds with defined endpoint (pIGC50) against a model organism, T. pyriformis. For the current study, the whole dataset was divided based on activity/property into the training and test sets, and the q-RASTR model was developed employing six descriptors (three latent variables) having r2, Q2F1 and Q2 values of 0.739, 0.767, and 0.735, respectively. The generated model was thoroughly validated using internationally recognized internal and external validation criteria to assess the model's dependability and predictability. It was highlighted that high molecular weight, aromatic hydroxyls, nitrogen, double bonds, and hydrophobicity increase the toxicity of organic compounds. The current study demonstrates the applicability of the RASTR algorithm in QSTR model development for the prediction of toxic chemicals (pIGC50) towards T. pyriformis.
Affiliation(s)
- V Ghosh
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- A Bhattacharjee
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- A Kumar
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
- P K Ojha
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
28
|
Srisongkram T. Ensemble Quantitative Read-Across Structure-Activity Relationship Algorithm for Predicting Skin Cytotoxicity. Chem Res Toxicol 2023; 36:1961-1972. [PMID: 38047785 DOI: 10.1021/acs.chemrestox.3c00238] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/05/2023]
Abstract
Read-across (RA) and quantitative structure-activity relationship (QSAR) are two alternative methods commonly used to fill data gaps in chemical registrations. These approaches use physicochemical properties or molecular fingerprints of source substances to predict the properties of unknown substances that have similar chemical structures or physicochemical properties. Research on RA and QSAR is essential to minimize the time, money, and animal testing needed to determine biological properties that are not currently known. This study developed a stacked ensemble quantitative read-across structure-activity relationship algorithm (enQRASAR) for predicting skin irritation toxicity based on negative log cell viability inhibition concentration at 50% (pIC50) against skin keratinocytes as the end point. The goodness-of-fit and predictability of this algorithm were validated using leave-one-out cross-validation and external test data sets. The results obtained were statistically reliable in terms of goodness-of-fit, robustness, and predictability metrics. Additionally, the developed model demonstrated a low prediction error when predicting FDA-approved drugs. These results confirm that the enQRASAR algorithm can be used to predict skin cytotoxicity of chemicals. Therefore, this model has been made publicly available to further facilitate toxicity predictions of unknown compounds in chemical registrations.
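The basic read-across idea summarised here, predicting an unknown substance from similar source substances, can be sketched as a similarity-weighted average. This is a plain illustration and not the enQRASAR algorithm, which is a stacked ensemble; the set-based fingerprints, the function names, the choice of Tanimoto similarity, and the toy activities are all assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def read_across(query_fp, sources, k=2):
    """Similarity-weighted mean activity of the k most similar source compounds.

    sources: list of (fingerprint, activity) pairs with known activity.
    """
    scored = sorted(
        ((tanimoto(query_fp, fp), activity) for fp, activity in sources),
        reverse=True,
    )[:k]
    total = sum(similarity for similarity, _ in scored)
    if total == 0:
        raise ValueError("no source compound shares any feature with the query")
    return sum(similarity * activity for similarity, activity in scored) / total


# Hypothetical source compounds: (on-bits of a fingerprint, pIC50-like value).
sources = [({1, 2, 3}, 5.0), ({1, 2, 4}, 6.0), ({7, 8}, 2.0)]
prediction = read_across({1, 2, 3, 4}, sources, k=2)
```

The two nearest sources are equally similar to the query in this toy case, so the prediction falls midway between their activities, and the dissimilar third compound is ignored.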
Affiliation(s)
- Tarapong Srisongkram
  - Division of Pharmaceutical Chemistry, Faculty of Pharmaceutical Sciences, Khon Kaen University, Khon Kaen 40000, Thailand
29
|
Baran K, Kloskowski A. Graph Neural Networks and Structural Information on Ionic Liquids: A Cheminformatics Study on Molecular Physicochemical Property Prediction. J Phys Chem B 2023; 127:10542-10555. [PMID: 38015981 PMCID: PMC10726349 DOI: 10.1021/acs.jpcb.3c05521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 11/01/2023] [Accepted: 11/16/2023] [Indexed: 11/30/2023]
Abstract
Ionic liquids (ILs) provide a promising solution in many industrial applications, such as solvents, absorbents, electrolytes, catalysts, lubricants, and many others. However, due to the enormous variety of their structures, uncovering or designing those with optimal attributes requires expensive and exhaustive simulations and experiments. For these reasons, searching for an efficient theoretical tool for finding the relationship between the IL structure and properties has been the subject of many research studies. Recently, special attention has been paid to machine learning tools, especially multilayer perceptron and convolutional neural networks, among many other algorithms in the field of artificial neural networks. For the latter, graph neural networks (GNNs) seem to be a powerful cheminformatic tool yet not well enough studied for dual molecular systems such as ILs. In this work, the usage of GNNs in structure-property studies is critically evaluated for predicting the density, viscosity, and surface tension of ILs. The problem of data availability and integrity is discussed to show how well GNNs deal with mislabeled chemical data. Providing more training data is proven to be more important than ensuring that they are immaculate. Great attention is paid to how GNNs process different ions to give graph transformations and electrostatic information. Clues on how GNNs should be applied to predict the properties of ILs are provided. Differences, especially regarding handling mislabeled data, favoring the use of GNNs over classical quantitative structure-property models are discussed.
Affiliation(s)
- Karol Baran
  - Department of Physical Chemistry, Faculty of Chemistry, Gdansk University of Technology, Narutowicza Street 11/12, 80-233 Gdansk, Poland
- Adam Kloskowski
  - Department of Physical Chemistry, Faculty of Chemistry, Gdansk University of Technology, Narutowicza Street 11/12, 80-233 Gdansk, Poland
30
|
Pandey SK, Roy K. Development of a read-across-derived classification model for the predictions of mutagenicity data and its comparison with traditional QSAR models and expert systems. Toxicology 2023; 500:153676. [PMID: 37993082 DOI: 10.1016/j.tox.2023.153676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 11/06/2023] [Accepted: 11/17/2023] [Indexed: 11/24/2023]
Abstract
Mutagenicity is considered an important endpoint from the regulatory, environmental and medical points of view. Due to the wide number of compounds that may be of concern and the enormous expenses (in terms of time, money, and animals) associated with rodent mutagenicity bioassays, this endpoint is a major target for the development of alternative approaches for screening and prediction. The majority of old-aged expert systems and quantitative structure-activity relationship (QSAR) models may show reduced performance over time for their application on newer chemical candidates; thus, researchers constantly try to improve the modeling strategies. In our report, we initially performed traditional classification-based linear discriminant analysis (LDA) QSAR modeling using the benchmark Ames dataset of diverse chemicals (6512 compounds) to recognize the relationship between the molecules and their potential mutagenic behavior. The classical LDA QSAR model is developed from a selected set of 2D descriptors. The LDA QSAR model was developed by using a total of 31 descriptors identified from the analysis of the most discriminating features. Additionally, we have used similarity-derived features obtained from the read-across (RA) to develop an RA-based QSAR model. The developed RA-based LDA QSAR model has better predictivity, transferability, and interpretability compared to the LDA QSAR model, and it uses a very small number of descriptors compared to the classical QSAR model. Different machine learning (ML) models were also developed using the descriptors appearing in the read-across-based LDA QSAR model for comparative studies. We have checked the prediction quality of 216 true external set compounds using the novel similarity-derived RA model. The performance of the OECD toolbox is also compared with the RA-derived LDA QSAR model for a true external set. 
The current study aimed to explore the significance of the read-across-based algorithm and its application to the most current experimental mutagenicity data to complement already available expert systems.
Affiliation(s)
- Sapna Kumari Pandey
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
31
|
Banerjee A, Roy K. Read-across-based intelligent learning: development of a global q-RASAR model for the efficient quantitative predictions of skin sensitization potential of diverse organic chemicals. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2023; 25:1626-1644. [PMID: 37682520 DOI: 10.1039/d3em00322a] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 09/09/2023]
Abstract
Environmental chemicals and contaminants cause a wide array of harmful implications to terrestrial and aquatic life which ranges from skin sensitization to acute oral toxicity. The current study aims to assess the quantitative skin sensitization potential of a large set of industrial and environmental chemicals acting through different mechanisms using the novel quantitative Read-Across Structure-Activity Relationship (q-RASAR) approach. Based on the identified important set of structural and physicochemical features, Read-Across-based hyperparameters were optimized using the training set compounds followed by the calculation of similarity and error-based RASAR descriptors. Data fusion, further feature selection, and removal of prediction confidence outliers were performed to generate a partial least squares (PLS) q-RASAR model, followed by the application of various Machine Learning (ML) tools to check the quality of predictions. The PLS model was found to be the best among different models. A simple user-friendly Java-based software tool was developed based on the PLS model, which efficiently predicts the toxicity value(s) of query compound(s) along with their status of Applicability Domain (AD) in terms of leverage values. This model has been developed using structurally diverse compounds and is expected to predict efficiently and quantitatively the skin sensitization potential of environmental chemicals to estimate their occupational and health hazards.
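The abstract reports Applicability Domain status in terms of leverage values. A minimal sketch of that check follows, restricted to a model with one descriptor plus an intercept so that the leverage has a closed form and no matrix inversion is needed. The function names and training values are assumptions, and the warning threshold h* = 3(p + 1)/n is the commonly used convention rather than something stated in the abstract.

```python
def leverage(x_query, x_train):
    """Leverage of a query point for a one-descriptor model with intercept."""
    n = len(x_train)
    mean = sum(x_train) / n
    sxx = sum((x - mean) ** 2 for x in x_train)
    return 1.0 / n + (x_query - mean) ** 2 / sxx


def warning_leverage(n_train, n_descriptors):
    """Conventional threshold h* = 3(p + 1)/n (assumed, not from the source)."""
    return 3.0 * (n_descriptors + 1) / n_train


def inside_domain(x_query, x_train, n_descriptors=1):
    """True when the query's leverage does not exceed the warning threshold."""
    return leverage(x_query, x_train) <= warning_leverage(len(x_train), n_descriptors)


# Hypothetical descriptor values for six training compounds.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
central_query = inside_domain(3.5, x_train)  # near the training mean
distant_query = inside_domain(9.0, x_train)  # far outside the training range
```

With several descriptors the leverage becomes the diagonal of the hat matrix, x(XᵀX)⁻¹xᵀ, but the decision rule against h* is unchanged.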
Affiliation(s)
- Arkaprava Banerjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.