1
Computation of molecular description of supramolecular Fuchsine model useful in medical data. Sci Rep 2024; 14:10933. [PMID: 38740796 DOI: 10.1038/s41598-024-60284-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 04/21/2024] [Indexed: 05/16/2024] Open
Abstract
Supramolecular chemistry is a fascinating field that explores the interactions between molecules to create higher-order structures. In the case of the supramolecular chain of Fuchsine acid, which is a type of dye molecule, several chemical applications are possible. Fuchsine acid helps to make better medicine carriers that deliver drugs where they're needed in the body, making treatments more effective and reducing side effects. It also helps create smart materials like sensors and self-fixing plastics, which are useful in electronics, keeping our environment clean, and making new materials. In sensing and detection, the supramolecular chain of Fuchsine acid can be used as a sensor or detector for specific analytes. In drug delivery, the supramolecular chains of Fuchsine acid can be incorporated into drug delivery systems. In recent years, a common method has been to link a graph to a chemical structure and use topological descriptors to study it. This technique is becoming increasingly important over time. Topological descriptors give very useful information when studying the topology of a chemical graph. In this paper, we have computed the 3D structure of the supramolecular graph of Fuchsine acid. We have computed explicit expressions of the ABC index, GA index, general Randić index, first and second Zagreb indices, hyper Zagreb index, H-index and F-index of the supramolecular structure of Fuchsine acid.
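The degree-based indices named in this abstract can be illustrated with a short sketch. It uses the standard edge-sum definitions of each index and a hypothetical six-membered ring as the test graph, not the Fuchsine acid structure from the paper. Reading "H-index" as the harmonic index is an assumption, as are the function name and the default exponent for the general Randić index.

```python
from collections import Counter
from math import sqrt


def degree_based_indices(edges, alpha=-0.5):
    """Return common degree-based topological indices of a simple graph."""
    # Vertex degree = number of edges incident to the vertex.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    names = ("M1", "M2", "HM", "F", "Randic", "ABC", "GA", "H")
    result = dict.fromkeys(names, 0.0)
    for u, v in edges:
        du, dv = degree[u], degree[v]
        result["M1"] += du + dv                  # first Zagreb index
        result["M2"] += du * dv                  # second Zagreb index
        result["HM"] += (du + dv) ** 2           # hyper Zagreb index
        result["F"] += du ** 2 + dv ** 2         # forgotten (F) index
        result["Randic"] += (du * dv) ** alpha   # general Randic index
        result["ABC"] += sqrt((du + dv - 2) / (du * dv))  # atom-bond connectivity
        result["GA"] += 2 * sqrt(du * dv) / (du + dv)     # geometric-arithmetic
        result["H"] += 2 / (du + dv)             # harmonic index (assumed meaning of "H-index")
    return result


# Hypothetical test graph: a six-membered ring (every vertex has degree 2).
ring = [(i, (i + 1) % 6) for i in range(6)]
indices = degree_based_indices(ring)
```

Because every index is a sum over edges of a function of the two end-vertex degrees, a closed-form expression for a repeating structure follows from counting how many edges fall into each degree pair.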
2
Advancing chronic toxicity risk assessment in freshwater ecology by molecular characterization-based machine learning. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2024; 342:123093. [PMID: 38072027 DOI: 10.1016/j.envpol.2023.123093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/26/2024]
Abstract
The continuously increasing production of various chemicals and their release into the environment pose potential negative effects on ecological health. However, traditional labor-intensive assessment methods cannot effectively and rapidly evaluate these hazards, especially for chronic risk. In this study, machine learning (ML) was employed to construct quantitative structure-activity relationship (QSAR) models, enabling the prediction of chronic toxicity to aquatic organisms by leveraging the molecular characteristics of pollutants, namely, the molecular descriptors, fingerprints, and graphs. The limited dataset size hindered the notable advantages of the graph attention network (GAT) model for the molecular graphs. Considering computational efficiency and performance (R² = 0.78; RMSE = 0.77), XGBoost (XGB) was used for reliable QSAR-ML models predicting chronic toxicity using small- or medium-sized tabular data and the molecular descriptors. Further kernel density estimation analysis confirmed the high accuracy of the model for pollutant concentrations ranging from 10⁻³ to 10² mg/L, effectively aligning with most environmental scenarios. Model interpretation showed SlogP and exposure duration as the primary influential factors. SlogP, representing the distribution coefficient of a molecule between lipophilic and hydrophilic environments, had a negative effect on the toxicity outcomes. Additionally, the exposure duration played a crucial role in determining the chronic toxicity. Finally, the chronic toxicity data of bisphenol A validated the robustness and reliability of the model established in this research. Our study provided a robust and feasible methodology for chronic ecological risk evaluation of various types of pollutants and could facilitate and increase the use of ML applications in environmental fields.
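The two performance figures quoted in this abstract, the coefficient of determination and the root mean squared error, can be computed as in the minimal sketch below. The observed and predicted values are hypothetical and the function name is an assumption; nothing here reproduces the study's data or its XGBoost model.

```python
from math import sqrt


def r2_and_rmse(y_true, y_pred):
    """Return (coefficient of determination, root mean squared error)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / n)


# Hypothetical toxicity values on a log scale.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
r2, rmse = r2_and_rmse(observed, predicted)
```

Both metrics depend on the scale of the response, so an RMSE is only comparable between models trained on the same transformed endpoint.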
3
Inhibitor design for TMPRSS2: insights from computational analysis of its backbone hydrogen bonds using a simple descriptor. EUROPEAN BIOPHYSICS JOURNAL : EBJ 2024; 53:27-46. [PMID: 38157015 PMCID: PMC10853362 DOI: 10.1007/s00249-023-01695-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 12/04/2023] [Accepted: 12/07/2023] [Indexed: 01/03/2024]
Abstract
Transmembrane protease serine 2 (TMPRSS2) is an important drug target due to its role in the infection mechanism of coronaviruses including SARS-CoV-2. Current understanding regarding the molecular mechanisms of known inhibitors and insights required for inhibitor design are limited. This study investigates the effect of inhibitor binding on the intramolecular backbone hydrogen bonds (BHBs) of TMPRSS2 using the concept of hydrogen bond wrapping, which is the phenomenon of stabilization of a hydrogen bond in a solvent environment as a result of being surrounded by non-polar groups. A molecular descriptor which quantifies the extent of wrapping around BHBs is introduced for this. First, virtual screening for TMPRSS2 inhibitors is performed by molecular docking using the program DOCK 6 with a Generalized Born surface area (GBSA) scoring function. The docking results are then analyzed using this descriptor and its relationship to the solvent-accessible surface area term ΔGsa of the GBSA score is demonstrated with machine learning regression and principal component analysis. The effect of binding of the inhibitors camostat, nafamostat, and 4-guanidinobenzoic acid (GBA) on the wrapping of important BHBs in TMPRSS2 is also studied using molecular dynamics. For BHBs with a large increase in wrapping groups due to these inhibitors, the radial distribution function of water revealed that certain residues involved in these BHBs, like Gln438, Asp440, and Ser441, undergo preferential desolvation. The findings offer valuable insights into the mechanisms of these inhibitors and may prove useful in the design of new inhibitors.
4
Interpretable attention-based multi-encoder transformer based QSPR model for assessing toxicity and environmental impact of chemicals. CHEMOSPHERE 2024; 350:141086. [PMID: 38163464 DOI: 10.1016/j.chemosphere.2023.141086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 12/28/2023] [Accepted: 12/29/2023] [Indexed: 01/03/2024]
Abstract
The rising demand from the consumer goods and pharmaceutical industries is driving a fast expansion of newly developed chemicals. The conventional toxicity testing of unknown chemicals is expensive, time-consuming, and raises ethical concerns. The quantitative structure-property relationship (QSPR) is an efficient computational method because it saves time, resources, and animal experimentation. Advances in machine learning have improved chemical analysis in QSPR studies, but the real-world application of machine learning-based QSPR studies has been limited by the unexplainable 'black box' nature of machine learning models. In this study, a multi-encoder structure-to-toxicity (S2T)-transformer based QSPR model was developed to estimate the properties of polychlorinated biphenyls (PCBs) and endocrine disrupting chemicals (EDCs). Simplified molecular input line entry systems (SMILES) and molecular descriptors calculated by the Dragon 6 software were simultaneously considered as input of the QSPR model. Furthermore, an attention-based framework is proposed to describe the relationship between the molecular structure and toxicity of hazardous chemicals. The S2T-transformer model achieved the highest R² scores of 0.918, 0.856, and 0.907 for logarithm of octanol-water partition coefficient (Log KOW), octanol-air partition coefficient (Log KOA), and bioconcentration factor (Log BCF) estimation of PCBs, respectively. Moreover, the attention weights were able to properly interpret the lateral (meta, para) chlorination associated with PCBs toxicity and environmental impact.
5
Global classification models for predicting acute toxicity of chemicals towards Daphnia magna. ENVIRONMENTAL RESEARCH 2023; 238:117239. [PMID: 37778597 DOI: 10.1016/j.envres.2023.117239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 09/10/2023] [Accepted: 09/18/2023] [Indexed: 10/03/2023]
Abstract
Molecular descriptors reflecting structural information on hydrophobicity, reactivity, polarizability, hydrogen bonds and charged groups were used to predict the toxicity (pLC50) of chemicals towards Daphnia magna with global quantitative structure-activity/toxicity relationship (QSAR/QSTR) models. A sufficiently large dataset of toxicity values (pLC50) for 1517 chemicals towards Daphnia magna was divided into a training set (758 pLC50) and a test set (759 pLC50). By applying the random forest (RF) algorithm, two classification models, Class Model A and Class Model B, were developed, with prediction accuracy, sensitivity and specificity above 85% for Class 1 (with pLC50 ≤ 4.48) and Class 2 (with pLC50 > 4.48). Class Model A was based on nine molecular descriptors and RF parameters of nodesize = 1, ntree = 80 and mtry = 2, and yielded accuracy of 92.3% (training set), 85.6% (test set) and 88.9% (total data set). Class Model B was based on ten descriptors and parameters of nodesize = 1, ntree = 90 and mtry = 2, and produced accuracy of 88.3% (training set), 86.8% (test set) and 87.5% (total data set). The two classification models were satisfactory compared with other classification models reported in the literature, although the classification models in this work dealt with more samples. Thus, the two classification models with a larger applicability domain provided efficient tools for assessing chemical aquatic toxicity towards Daphnia magna.
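The class split at pLC50 = 4.48 and the three reported metrics can be sketched as follows. The pLC50 values and predictions are hypothetical, the helper names are assumptions, and treating Class 2 (the more toxic class) as the "positive" class is also an assumption, since the abstract does not say which class defines sensitivity.

```python
THRESHOLD = 4.48  # class boundary stated in the abstract


def to_class(plc50):
    """Class 1 if pLC50 <= 4.48, otherwise Class 2."""
    return 2 if plc50 > THRESHOLD else 1


def classification_metrics(true_classes, predicted_classes, positive=2):
    """Return (accuracy, sensitivity, specificity) for a two-class problem."""
    pairs = list(zip(true_classes, predicted_classes))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # fraction of positives recovered
    specificity = tn / (tn + fp)   # fraction of negatives recovered
    return accuracy, sensitivity, specificity


# Hypothetical measured pLC50 values and hypothetical model predictions.
measured = [3.9, 4.48, 5.2, 6.1, 4.0, 5.0]
true_classes = [to_class(x) for x in measured]
predicted_classes = [1, 2, 2, 2, 1, 1]
accuracy, sensitivity, specificity = classification_metrics(
    true_classes, predicted_classes
)
```

The sketch assumes both classes occur in the data; an empty class would make one of the ratios undefined.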
6
PyL3dMD: Python LAMMPS 3D molecular descriptors package. J Cheminform 2023; 15:69. [PMID: 37507792 PMCID: PMC10385924 DOI: 10.1186/s13321-023-00737-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 07/16/2023] [Indexed: 07/30/2023] Open
Abstract
Molecular descriptors characterize the biological, physical, and chemical properties of molecules and have long been used for understanding molecular interactions and facilitating materials design. Some of the most robust descriptors are derived from geometrical representations of molecules, called 3-dimensional (3D) descriptors. When calculated from molecular dynamics (MD) simulation trajectories, 3D descriptors can also capture the effects of operating conditions such as temperature or pressure. However, extracting 3D descriptors from MD trajectories is non-trivial, which hinders their wide use by researchers developing advanced quantitative-structure-property-relationship models using machine learning. Here, we describe a suite of open-source Python-based post-processing routines, called PyL3dMD, for calculating 3D descriptors from MD simulations. PyL3dMD is compatible with the popular simulation package LAMMPS and enables users to compute more than 2000 3D molecular descriptors from atomic trajectories generated by MD simulations. PyL3dMD is freely available via GitHub and can be easily installed and used as a highly flexible Python package on all major platforms (Windows, Linux, and macOS). A performance benchmark study used descriptors calculated by PyL3dMD to develop a neural network and the results showed that PyL3dMD is fast and efficient in calculating descriptors for large and complex molecular systems with long simulation durations. PyL3dMD facilitates the calculation of 3D molecular descriptors using MD simulations, making it a valuable tool for cheminformatics studies.
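To illustrate what a geometry-derived descriptor averaged over a trajectory looks like, the sketch below computes the mass-weighted radius of gyration for each frame and then its time average. This is not the PyL3dMD API; the function name, the two-atom frames, and the choice of radius of gyration as the example descriptor are all assumptions made for illustration.

```python
from math import sqrt


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one molecule in one frame."""
    total_mass = sum(masses)
    # Center of mass, one component per Cartesian axis.
    com = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted mean squared distance from the center of mass.
    rg_squared = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    ) / total_mass
    return sqrt(rg_squared)


# Hypothetical trajectory: two frames of a two-atom molecule that stretches.
frames = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
masses = [1.0, 1.0]
rg_per_frame = [radius_of_gyration(frame, masses) for frame in frames]
mean_rg = sum(rg_per_frame) / len(rg_per_frame)
```

Averaging a per-frame descriptor over the trajectory is one simple way such a value can reflect temperature or pressure, because those conditions change the conformations that are sampled.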
7
Aging affects isomer-specific occurrence of dechlorane plus in soil profiles: A case study in a geographically isolated landfill from the Tibetan Plateau. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 878:163119. [PMID: 36996972 DOI: 10.1016/j.scitotenv.2023.163119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 03/20/2023] [Accepted: 03/23/2023] [Indexed: 05/13/2023]
Abstract
Two major structural isomers in commercial dechlorane plus (DP) mixtures, anti-DP and syn-DP, generally displayed varied desorption and partitioning efficiencies in soils, which may be linked to their different aging rates. However, the molecular parameters that govern the degree of aging and its associated effects on the occurrence of DP isomers have not been comprehensively investigated. In this study, the relative abundance of rapid desorption concentration (Rrapid) was measured for anti-DP, syn-DP, anti-Cl11-DP, anti-Cl10-DP, Dechlorane-604 (Dec-604), and Dechlorane-602 (Dec-602) at a geographically isolated landfill area in the Tibetan Plateau. The Rrapid values were used as an indicator of aging degree, exhibiting a close correlation with the three-dimensional conformation of the molecules for the dechlorane series compounds. This observation suggested that planar molecules may have a greater tendency to accumulate in the condensed phase of organic matter and undergo more rapid aging. The fractional abundances and dechlorinated products of anti-DP were found to be predominantly controlled by the aging degree of DP isomers. The multiple nonlinear regression model indicated that differences in aging between anti-DP and syn-DP were primarily driven by the total desorption concentration and soil organic matter content. Aging plays a significant role in both the transport processes and metabolism of DP isomers and should be taken into account to refine the assessment of their environmental behaviors.
8
Study on the characterization of pesticide modes of action similarity and the multi-endpoint combined toxicity of pesticide mixtures to Caenorhabditis elegans. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023:164918. [PMID: 37327899 DOI: 10.1016/j.scitotenv.2023.164918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 05/19/2023] [Accepted: 06/13/2023] [Indexed: 06/18/2023]
Abstract
With the widespread use of pesticides, the coexistence of multiple low-residue pesticides in environmental media has increased significantly, and the "cocktail" effect caused by this phenomenon has garnered increasing attention. However, owing to the scarcity of information regarding the modes of action (MOAs) of chemicals, the application of concentration addition (CA) models for evaluating and predicting the toxicity of mixtures with similar MOAs is limited. Additionally, the joint toxicity laws of complex mixture systems to different toxicity endpoints in organisms remain unclear, and effective methods to test the mixture toxicity on lifespan and reproductive inhibition are lacking. Therefore, in this study, the similarity of pesticide MOAs was characterized using molecular electronegativity-distance vector (MEDV-13) descriptors based on eight pesticides (aldicarb, methomyl, imidacloprid, thiamethoxam, dichlorvos, dimethoate, methamidophos and triazophos). Additionally, the methods of lifespan and reproduction inhibition microplate toxicity analysis of elegans (EL-MTA and ER-MTA) were established to test the lifespan and reproduction inhibition toxicity of Caenorhabditis elegans. Finally, a unified scale synergistic-antagonistic heatmap (SAHscale) method was proposed to explore the combined toxicity of the mixtures on the lifespan, reproduction, and mortality of nematodes. The results showed that the MEDV-13 descriptor could effectively characterize the similarity in MOAs. The lifespan and reproductive ability of Caenorhabditis elegans were significantly inhibited when the pesticide exposure concentration was one order of magnitude lower than the lethal dose. The sensitivity of lifespan and reproductive endpoints to mixtures was dependent on the concentration ratio. The same rays in the mixture had consistent toxicity interactions on the lifespan and reproductive endpoints of Caenorhabditis elegans.
In conclusion, we demonstrated the feasibility of MEDV-13 in characterizing the similarity of MOAs, and provided a theoretical basis for exploring the mechanism of chemical mixtures by studying the apparent toxicity of mixtures on nematode lifespan and reproduction endpoints.
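The concentration addition (CA) model mentioned in this abstract predicts a mixture's effect concentration from those of its components: for components with mixture fractions p_i and individual effect concentrations EC_i, the mixture value is 1 / Σ(p_i / EC_i). The sketch below applies that standard formula to hypothetical numbers; the function name and the EC50 values are assumptions, not data from the study.

```python
def ca_mixture_ec(fractions, component_ecs):
    """Effect concentration of a mixture predicted by concentration addition.

    fractions      -- share of each component in the mixture (must sum to 1)
    component_ecs  -- effect concentration (e.g. EC50) of each component alone
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("mixture fractions must sum to 1")
    # CA: the reciprocal of the mixture EC is the fraction-weighted sum
    # of the reciprocals of the single-component ECs.
    return 1.0 / sum(p / ec for p, ec in zip(fractions, component_ecs))


# Hypothetical binary mixture at a 1:1 ratio with EC50 values 1.0 and 4.0 mg/L.
ec50_mix = ca_mixture_ec([0.5, 0.5], [1.0, 4.0])
```

An observed mixture EC50 below the CA prediction is usually read as synergism and one above it as antagonism, which is the kind of comparison a synergistic-antagonistic heatmap summarizes.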
9
Reference dose prediction by using CDK molecular descriptors: A non-experimental method. CHEMOSPHERE 2022; 305:135460. [PMID: 35752312 DOI: 10.1016/j.chemosphere.2022.135460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/05/2022] [Revised: 06/17/2022] [Accepted: 06/21/2022] [Indexed: 06/15/2023]
Abstract
Reference dose (RfD) is an estimate of a daily dose to which an individual can be exposed chronically without obvious deleterious effects during a lifetime. In toxicology, researchers typically estimate RfD with the traditional approach of employing NOAEL/LOAEL, the benchmark dose (BMD), or other dose-response approaches. Although widely used, these methods have certain limitations. In this study, we present a novel method for estimating the reference dose without experiments. Information on the organic chemicals is available from the Integrated Risk Information System (IRIS) of the USEPA. Molecular descriptors for each molecular structure were calculated by an integrated platform, and the chemicals were classified into four categories based on molecular similarity: 128 contained benzene rings, 47 were heteroaromatics, 104 contained halogen substituents and 44 were halogenated aliphatic hydrocarbons. The predictive model of RfD was constructed by the multiple linear stepwise regression (MLR) method. Approximately 95% and 82% of the data points differ by less than 10-fold and 5-fold, respectively, between the predicted values and the true values. The non-experimental method improves estimation efficiency and can serve as a reference for predicting RfD.
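The "within 10-fold" and "within 5-fold" figures in this abstract describe the share of chemicals whose predicted and true values differ by less than a given ratio in either direction. A minimal sketch of that calculation follows; the RfD values are hypothetical and the function name is an assumption.

```python
def fraction_within_fold(predicted, observed, fold):
    """Share of pairs whose ratio (in either direction) is below `fold`."""
    hits = sum(
        1
        for p, o in zip(predicted, observed)
        # Taking the larger of p/o and o/p makes the test symmetric for
        # over- and under-prediction; both values must be positive.
        if max(p / o, o / p) < fold
    )
    return hits / len(predicted)


# Hypothetical RfD values in mg/kg/day.
observed_rfd = [0.01, 0.1, 1.0, 0.005]
predicted_rfd = [0.02, 0.9, 0.05, 0.004]
within_10 = fraction_within_fold(predicted_rfd, observed_rfd, 10)
within_5 = fraction_within_fold(predicted_rfd, observed_rfd, 5)
```

A fold-based criterion is a natural fit for quantities such as RfD that span several orders of magnitude, where absolute errors would be dominated by the largest values.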
10
Random forest algorithm-based accurate prediction of chemical toxicity to Tetrahymena pyriformis. Toxicology 2022; 480:153325. [PMID: 36115645 DOI: 10.1016/j.tox.2022.153325] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/19/2022] [Revised: 09/09/2022] [Accepted: 09/13/2022] [Indexed: 12/01/2022]
Abstract
The random forest (RF) algorithm, together with ten Dragon descriptors, was used to develop a quantitative structure-toxicity/activity relationship (QSTR/QSAR) model for a larger data set of toxicity values (pIGC50) of 1792 chemicals towards Tetrahymena pyriformis. The optimal RF model (ntree = 300 and mtry = 3) yielded root mean square (rms) errors of 0.261 for the training set (1434 chemicals) and 0.348 for the test set (358 chemicals). Compared with other QSTR models reported in the literature, the optimal RF model in this paper is more accurate. The feasibility of applying the RF algorithm to predict chemical toxicity pIGC50 towards Tetrahymena pyriformis has been verified.
11
Development of binary classification models for assessment of drug-induced liver injury in humans using a large set of FDA-approved drugs. J Pharmacol Toxicol Methods 2022; 116:107185. [PMID: 35623583 DOI: 10.1016/j.vascn.2022.107185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/29/2021] [Revised: 04/13/2022] [Accepted: 05/18/2022] [Indexed: 02/05/2023]
Abstract
Drug-induced liver injury (DILI) has been identified as one of the major causes of drug withdrawal from the market, and even of termination during the late stages of development. Therefore, it is imperative to evaluate the DILI potential of lead compounds during the research and development process. Although various computational models have been developed to predict DILI, most of them relied on DILI data extracted from preclinical sources. In this investigation, in silico prediction models for DILI were constructed based on 1140 FDA-approved drugs by using a naïve Bayes classifier approach. The genetic algorithm method was applied for molecular descriptor selection. Among the established prediction models, the NB-11 model based on eight molecular descriptors combined with ECFP_18 showed the best prediction performance for DILI, giving 91.7% overall prediction accuracy for the training set and 68.9% concordance for the external test set. Therefore, the established NB-11 prediction model can be used as a reliable virtual screening tool to predict DILI adverse effects in the early stages of drug design. In addition, some new structural alerts for DILI were identified, which could be used by medicinal chemists for structural optimization in future drug design.
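How a naïve Bayes classifier works on binary fingerprint bits can be sketched in a few lines. This is a generic Bernoulli naïve Bayes with Laplace smoothing, not the specific implementation, descriptors, or ECFP_18 fingerprints used for the NB-11 model; the three-bit fingerprints, labels, and function names are hypothetical.

```python
from math import log


def train_bernoulli_nb(fingerprints, labels):
    """Fit per-class log priors and per-bit 'on' probabilities."""
    n_bits = len(fingerprints[0])
    model = {}
    for cls in sorted(set(labels)):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        log_prior = log(len(rows) / len(fingerprints))
        # Laplace smoothing (+1 / +2) keeps probabilities away from 0 and 1.
        p_on = [
            (sum(row[j] for row in rows) + 1) / (len(rows) + 2)
            for j in range(n_bits)
        ]
        model[cls] = (log_prior, p_on)
    return model


def predict(model, fingerprint):
    """Return the class with the highest posterior log score."""
    def score(cls):
        log_prior, p_on = model[cls]
        # Bits are treated as conditionally independent given the class.
        return log_prior + sum(
            log(p) if bit else log(1.0 - p)
            for bit, p in zip(fingerprint, p_on)
        )
    return max(model, key=score)


# Hypothetical 3-bit fingerprints; label 1 = toxic, 0 = non-toxic.
train_x = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
train_y = [1, 1, 1, 0, 0, 0]
nb_model = train_bernoulli_nb(train_x, train_y)
prediction_a = predict(nb_model, [1, 1, 0])
prediction_b = predict(nb_model, [0, 0, 1])
```

Bits whose "on" probability differs strongly between the two classes contribute most to the score, which is one route by which substructure alerts can be read out of a fitted model.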
12
Chemical characterization of anemia-inducing aniline-related substances and their application to the construction of a decision tree-based anemia prediction model. Food Chem Toxicol 2021; 157:112548. [PMID: 34509582 DOI: 10.1016/j.fct.2021.112548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2021] [Revised: 08/11/2021] [Accepted: 09/07/2021] [Indexed: 10/20/2022]
Abstract
Anemia is a well-observed toxicity of chemical substances, and aniline is a typical anemia-inducing substance. However, it remains unclear whether all aniline-like substances with various substituents could induce anemia. We thus investigated the physicochemical characteristics of anemia-inducing substances by decision tree analyses. Training and validation substances were selected from a publicly available database of rat repeated-dose toxicity studies, and discrimination models were constructed by decision tree and bootstrapping methods with molecular descriptors as explanatory variables. To improve the accuracy of discrimination, we individually evaluated the explanatory variables to modify them, established "prerules" that were applied before subjecting a substance to a decision tree by considering metabolism, such as azo reduction and N-dealkylation, and introduced the idea of "partly negative" evaluation for substances having multiple aniline-like substructures. The final model obtained showed 79.2% and 77.5% accuracy for the training and validation dataset, respectively. In addition, we identified some chemical properties that reduce the anemia inducibility of aniline-like substances, including the addition of a sulfonate or carboxy functional group and/or a bulky multiring structure to anilines. In conclusion, the present findings will provide a novel insight into the mechanistic understanding of chemically induced anemia and help to develop a prediction system.
13
A comprehensive study on retention of selected model substances in β-cyclodextrin-modified high performance liquid chromatography. J Chromatogr A 2021; 1645:462120. [PMID: 33839575 DOI: 10.1016/j.chroma.2021.462120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 03/21/2021] [Accepted: 03/24/2021] [Indexed: 10/21/2022]
Abstract
Quantitative structure-retention relationship (QSRR) models are employed not only in retention behaviour prediction, but also in gaining an in-depth understanding of complex chromatographic systems. The goal of the present research is to enable a comprehensive understanding of the retention underlying the separation in β-cyclodextrin (CD) modified reversed-phase high performance liquid chromatography (RP-HPLC) systems, through the development of mixed QSRR models. Moreover, the amount of β-CD adsorbed on the stationary phase surface (β-CDA) is added as a model input in order to evaluate its contribution to both model performance and retention. Nuclear magnetic resonance (NMR) experiments were conducted to confirm the predicted inclusion complex structures and support the application of in silico tools. The most significant descriptors revealed that retention is governed by the steric factors 7.5 Å distant from the geometrical centre of a molecule, the 3D arrangement of atoms determining the molecular size and shape, lipophilicity indicated by topological distances, as well as the unbound system's energy, related to the inclusion complex formation. In addition, a notable effect of the pH of the aqueous phase on the retention of ionizable analytes was shown. In the case of pH of the aqueous phase and β-CDA, the change in retention behaviour of the studied analytes was observed only at the highest β-CDA value (5.17 μM/m²), but it was not related to the ionization state of analytes. When the analytes did not change their ionization form across the investigated pH range, and the acetonitrile content in the mobile phase was 25% (v/v), the retention factor had low values regardless of the β-CDA; under these circumstances the retention is probably acetonitrile driven.
14
On neighborhood Zagreb index of product graphs. J Mol Struct 2021; 1223:129210. [PMID: 32921807 PMCID: PMC7474663 DOI: 10.1016/j.molstruc.2020.129210] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 07/17/2020] [Revised: 08/17/2020] [Accepted: 09/04/2020] [Indexed: 11/27/2022]
Abstract
The properties and activities of chemicals are strongly related to their molecular structures. Topological indices defined on these molecular structures are capable of predicting those properties and activities. In this article, a new topological index named the neighborhood Zagreb index (M_N) is presented. Here the chemical importance of the M_N index is investigated and it is shown that the newly introduced index is useful in predicting physico-chemical properties with high accuracy compared to some well-established and often used indices. The isomer-discrimination ability of M_N is also examined. To demonstrate how simple and convenient the computational formula of the novel index is for chemical compounds, the chemical structures of favipiravir and hydroxychloroquine are used. In addition, some explicit results for this index of different product graphs such as Cartesian, tensor and wreath product are derived. Some of these results are applied to obtain the M_N index of some special structures.
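A small sketch can show how a neighborhood-degree index is evaluated on a graph and on a Cartesian product. It assumes the definition in which each vertex v contributes δ(v)², where δ(v) is the sum of the degrees of the neighbors of v; the abstract itself does not state the formula, so this definition, the function names, and the test graphs are all assumptions.

```python
from itertools import product


def adjacency(edges):
    """Build an adjacency map {vertex: set of neighbors} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighborhood_zagreb(adj):
    """Sum over vertices of (sum of neighbor degrees) squared (assumed definition)."""
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    return sum(sum(degree[u] for u in neighbors) ** 2 for neighbors in adj.values())


def cartesian_product(adj_g, adj_h):
    """Cartesian product: (a, b) ~ (a2, b) if a ~ a2, and (a, b) ~ (a, b2) if b ~ b2."""
    return {
        (a, b): {(a2, b) for a2 in adj_g[a]} | {(a, b2) for b2 in adj_h[b]}
        for a, b in product(adj_g, adj_h)
    }


# Hypothetical examples: the path on four vertices and the product of two single edges.
path4 = adjacency([(0, 1), (1, 2), (2, 3)])
edge = adjacency([(0, 1)])
square = cartesian_product(edge, edge)  # a 4-cycle
mn_path4 = neighborhood_zagreb(path4)
mn_square = neighborhood_zagreb(square)
```

Checking a closed-form product-graph result against a brute-force evaluation like this on small graphs is a cheap way to catch algebra slips.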
15
Machine learning guided prediction of liquid chromatography-mass spectrometry ionization efficiency for genotoxic impurities in pharmaceutical products. J Pharm Biomed Anal 2020; 194:113781. [PMID: 33280999 DOI: 10.1016/j.jpba.2020.113781] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/17/2020] [Revised: 11/14/2020] [Accepted: 11/16/2020] [Indexed: 10/23/2022]
Abstract
The limitation and control of genotoxic impurities (GTIs) has continued to receive attention from pharmaceutical companies and authorities for several decades. Because GTIs have the ability to damage deoxyribonucleic acid (DNA) and the potential to cause cancer, low-level quantitation is required to protect patients. A quick and easy method of determining the liquid chromatography-mass spectrometry (LC/MS) conditions for high-sensitivity analysis of GTIs may prospectively accelerate pharmaceutical development. In this study, a quantitative structure-property relationship (QSPR) model was developed for predicting the ionization efficiency of compounds using liquid-chromatography-mass spectrometry (LC/MS) parameters and molecular descriptors. Before implementing the QSPR prediction model, linear regression analysis was performed to model the relationship between the ionization efficiency and the LC/MS parameters for each compound. Comparison of the predicted peak areas with the experimentally observed peak areas showed good agreement based on the coefficient of determination (R² > 0.96). The machine learning-based QSPR approach begins with computation of the molecular descriptors expressing the physicochemical properties of a compound, followed by a genetic algorithm-based feature selection. Linear and nonlinear regression were performed, and support vector machine (SVM) was selected as the best machine learning algorithm for the prediction. The SVM algorithm was developed and optimized using 1031 experimental data points for nine compounds, including well-known GTIs. Validation of the model by comparison of the predicted and observed relative ionization efficiencies (RIE) showed a high coefficient of determination (R² = 0.96) and low root mean squared error value (RMSE = 0.118).
Finally, this established prediction model was applied to hydrophilic interaction liquid chromatography coupled with MS for a new compound in new mobile phase compositions and new MS parameter settings. The RMSE of the predicted versus observed RIE was 0.203. This prediction accuracy was sufficient to determine the starting point of the LC/MS method development. The methodology demonstrated in this study can be used to determine the LC/MS conditions for high sensitivity analysis of GTIs.
16
Developing novel computational prediction models for assessing chemical-induced neurotoxicity using naïve Bayes classifier technique. Food Chem Toxicol 2020; 143:111513. [PMID: 32621845 DOI: 10.1016/j.fct.2020.111513] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/08/2020] [Revised: 06/02/2020] [Accepted: 06/04/2020] [Indexed: 02/08/2023]
Abstract
Development of reliable and efficient alternative in vivo methods for evaluation of the chemicals with potential neurotoxicity is an urgent need in the early stages of drug design. In this investigation, computational prediction models for drug-induced neurotoxicity were developed by using the classical naïve Bayes classifier. Eight molecular properties closely relevant to neurotoxicity were selected. Then, 110 classification models were developed using the eight important molecular descriptors and 10 types of fingerprints with 11 different maximum diameters. Among these 110 prediction models, the prediction model (NB-03) based on eight molecular descriptors combined with ECFP_10 fingerprints showed the best prediction performance, giving 90.5% overall prediction accuracy for the training set and 82.1% concordance for the external test set. In addition, compared to the naïve Bayes classifier, the recursive partitioning classifier displayed worse predictive performance for neurotoxicity. Therefore, the established NB-03 prediction model can be used as a reliable virtual screening tool to predict neurotoxicity in the early stages of drug design. Moreover, some structural alerts for characterizing neurotoxicity were identified in this research, which could provide important guidance for chemists in structural modification and optimization to reduce the potential neurotoxicity of chemicals.
|
17
|
Prediction of hERG potassium channel blockage using ensemble learning methods and molecular fingerprints. Toxicol Lett 2020; 332:88-96. [PMID: 32629073 DOI: 10.1016/j.toxlet.2020.07.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2020] [Revised: 06/16/2020] [Accepted: 07/02/2020] [Indexed: 11/30/2022]
Abstract
The human ether-a-go-go-related gene (hERG) encodes a tetrameric potassium channel called Kv11.1. This channel can be blocked by certain drugs, which leads to long QT syndrome, causing cardiotoxicity. This is a significant problem during drug development. Using computer models to predict compound cardiotoxicity during the early stages of drug design will help to solve this problem. In this study, we used a dataset of 1865 compounds exhibiting known hERG inhibitory activities as a training set. Thirty cardiotoxicity classification models were established using three machine learning algorithms based on molecular fingerprints and molecular descriptors. Using these models as base classifiers, a new cardiotoxicity classification model with better predictive performance was developed using an ensemble learning method. The accuracy of the best base classifier, which was generated using the XGBoost method with molecular descriptors, was 84.8%, and the area under the receiver-operating characteristic curve (AUC) was 0.876 in five-fold cross-validation. However, all of the ensemble models that we developed had higher predictive performance than the base classifiers in five-fold cross-validation. The best predictive performance was achieved by the Ensemble-Top7 model, with an accuracy of 84.9% and an AUC of 0.887. We also tested the ensemble model using external validation data and achieved an accuracy of 85.0% and an AUC of 0.786. Furthermore, we identified several hERG-related substructures, which provide valuable information for designing drug candidates.
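One common way to combine base classifiers is soft voting, where the predicted probabilities are averaged and the mean is thresholded. The abstract does not say how Ensemble-Top7 combines its members, so the sketch below is only an illustration under that assumption; the function name `soft_vote` and the probability values are hypothetical.

```python
from statistics import mean


def soft_vote(probabilities, threshold=0.5):
    """Average per-model blocker probabilities for one compound and threshold the mean."""
    avg = mean(probabilities)
    return avg, avg >= threshold


# Hypothetical outputs of three base classifiers for a single compound.
base_model_probs = [0.62, 0.48, 0.71]
avg, is_blocker = soft_vote(base_model_probs)
print(f"mean probability = {avg:.3f}, predicted blocker = {is_blocker}")
```

Here one base model votes "non-blocker" (0.48), but the averaged probability still crosses the threshold, which is the kind of smoothing an ensemble is meant to provide.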
|
18
|
Inclusion of molecular descriptors in predictive models improves pesticide soil-air partitioning estimates. CHEMOSPHERE 2020; 248:126031. [PMID: 32032877 DOI: 10.1016/j.chemosphere.2020.126031] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/03/2019] [Revised: 01/23/2020] [Accepted: 01/24/2020] [Indexed: 06/10/2023]
Abstract
The soil-air exchange of pesticides is one potential fate and exposure pathway, and this process is generally thought to be governed by soil properties and environmental conditions. The experimental determination of the soil-air partitioning coefficient (Ksa) is laborious and costly, and Ksa values are typically predicted from a semiempirical or a simple linear regression approach with soil and environmental variables. Here we developed a model that combined linear regression of soil, environmental and molecular parameters with the quantitative structure-property relationship (QSPR) to predict Ksa for pesticides. The values of theoretical descriptors of pesticides were calculated and the best descriptors were selected using the Boruta algorithm. Seventy-six experimental logKsa values for 17 pesticides were used in model development. Multiple linear regression (MLR) with a soil descriptor (organic carbon fraction), a physicochemical descriptor (octanol-air partitioning coefficient), environmental descriptors (temperature and humidity) and a molecular descriptor (Gmin, a 2D E-state molecular parameter), called the MLR-QSPR combined model, exhibited better predictability (adj. r2 = 0.95) of logKsa than MLR (adj. r2 = 0.87) or QSPR (adj. r2 = 0.82) alone. MLR-QSPR also showed the best performance in five-fold cross-validation (adj. r2 = 0.94) and test set verification (adj. r2 = 0.96). The developed model was validated and characterized by the applicability domain. Results showed that the proposed MLR-QSPR approach is highly predictive and statistically robust, with >95% of predictions within ±0.5 log unit of the measured Ksa. Therefore, this approach can be used to estimate the soil-air partitioning of pesticides and to better predict their fate and transport in the environment.
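The adjusted r2 values quoted above penalize the ordinary coefficient of determination for the number of predictors. A minimal sketch of the standard formula follows; the raw r2 of 0.953 is a hypothetical input chosen for illustration (it is not reported in the abstract), while the 76 observations and five predictors follow the counts given in the text.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical raw r2; 76 logKsa values and 5 predictors as described in the abstract.
print(round(adjusted_r2(0.953, n_samples=76, n_predictors=5), 3))
```

With only five predictors against 76 observations the penalty is small, which is why the adjusted value stays close to the raw one.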
|
19
|
Prediction of chemical toxicity to Tetrahymena pyriformis with four-descriptor models. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2020; 190:110146. [PMID: 31923753 DOI: 10.1016/j.ecoenv.2019.110146] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/09/2019] [Revised: 12/27/2019] [Accepted: 12/28/2019] [Indexed: 06/10/2023]
Abstract
A quantitative structure-toxicity relationship (QSTR) model based on four descriptors was successfully developed for 1163 chemical toxicants against Tetrahymena pyriformis by applying a general regression neural network (GRNN). The training set consisting of 600 organic compounds was used to train GRNN models that were evaluated with the test set of 563 compounds. For the optimal GRNN model, the training set has a coefficient of determination (R2) of 0.86 and a root mean square (rms) error of 0.41, and the test set has an R2 of 0.80 and an rms error of 0.41. The results indicate that the optimal GRNN model is accurate, although it has only four descriptors and a comparatively large test set.
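A GRNN prediction is, in its textbook form, a Gaussian-kernel-weighted average of the training targets (a Nadaraya-Watson estimator) with a single smoothing parameter. The sketch below illustrates that general idea only; the data, the `sigma` value and the function name are hypothetical and do not reproduce the model in the abstract.

```python
import math


def grnn_predict(x, train_X, train_y, sigma=0.5):
    """Kernel-weighted average of training targets (textbook GRNN output)."""
    weights = []
    for xi in train_X:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical two-descriptor training set with toxicity-like targets.
train_X = [[0.0, 0.0], [2.0, 0.0]]
train_y = [1.0, 3.0]
print(grnn_predict([1.0, 0.0], train_X, train_y))  # query midway between both points
```

Because the only tunable quantity is `sigma`, such a model can be fitted with few descriptors, at the cost of storing the whole training set.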
|
20
|
Development of Natural Compound Molecular Fingerprint (NC-MFP) with the Dictionary of Natural Products (DNP) for natural product-based drug development. J Cheminform 2020; 12:6. [PMID: 33431009 PMCID: PMC6977316 DOI: 10.1186/s13321-020-0410-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2019] [Accepted: 01/11/2020] [Indexed: 12/21/2022] Open
Abstract
Computer-aided research on the relationship between the molecular structures of natural compounds (NC) and their biological activities has been carried out extensively because the molecular structures of new drug candidates are usually analogous to or derived from the molecular structures of NC. To express this relationship in a physically realistic way on a computer, it is essential to have a molecular descriptor set that can adequately represent the characteristics of the molecular structures belonging to the NC's chemical space. Although several topological descriptors have been developed to describe the physical, chemical, and biological properties of organic molecules, especially synthetic compounds, and have been widely used in drug discovery research, these descriptors have limitations in expressing NC-specific molecular structures. To overcome this, we developed a novel molecular fingerprint, called Natural Compound Molecular Fingerprints (NC-MFP), for explaining NC structures related to biological activities and for applying it to natural product (NP)-based drug development. NC-MFP was developed to reflect the structural characteristics of NCs and the commonly used NP classification system. NC-MFP is a scaffold-based molecular fingerprint method comprising scaffolds, scaffold-fragment connection points (SFCP), and fragments. The scaffolds of the NC-MFP have a hierarchical structure. In this study, we introduce 16 structural classes of NPs in the Dictionary of Natural Products database (DNP), and the hierarchical scaffolds of each class were calculated using the Bemis and Murcko (BM) method. The scaffold library in NC-MFP comprises 676 scaffolds. To compare how well the NC-MFP represents the structural features of NCs compared to the molecular fingerprints that have been widely used for organic molecular representation, two kinds of binary classification tasks were performed.
Task I is a binary classification of the compounds in a commercially available library database as NCs or synthetic compounds. Task II is classifying whether NCs with inhibitory activity against seven biological target proteins are active or inactive. Both tasks were carried out with several molecular fingerprints, including NC-MFP, using the 1-nearest neighbor (1-NN) method. The performance in task I showed that NC-MFP is a practical molecular fingerprint for classifying NC structures from the data set compared with other molecular fingerprints. In task II, NC-MFP outperformed the other molecular fingerprints, suggesting that NC-MFP is useful for explaining NC structures related to biological activities. In conclusion, NC-MFP is a robust molecular fingerprint for classifying NC structures and explaining the biological activities of NC structures. Therefore, we suggest NC-MFP as a potent molecular descriptor for the virtual screening of NCs in natural product-based drug development.
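A 1-NN classifier over binary fingerprints assigns a query compound the label of its most similar reference compound. The abstract does not name the similarity measure, so the sketch below assumes Tanimoto similarity, a common choice for fingerprints; the on-bit sets and labels are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def one_nn_label(query, references):
    """Return the label of the reference fingerprint most similar to the query."""
    _, best_label = max(references, key=lambda item: tanimoto(query, item[0]))
    return best_label


# Hypothetical on-bit sets; real fingerprints would come from a cheminformatics toolkit.
references = [
    ({1, 4, 7, 9}, "natural compound"),
    ({2, 3, 5, 8}, "synthetic compound"),
]
query = {1, 4, 9, 10}
print(one_nn_label(query, references))
```

Since 1-NN has no trainable parameters, differences in classification performance can be attributed largely to the fingerprint itself, which suits a fingerprint comparison.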
|
21
|
When global and local molecular descriptors are more than the sum of its parts: Simple, But Not Simpler? Mol Divers 2019; 24:913-932. [PMID: 31659696 DOI: 10.1007/s11030-019-10002-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2019] [Accepted: 10/09/2019] [Indexed: 01/29/2023]
Abstract
In this report, we introduce a set of aggregation operators (AOs) to calculate global and local (group and atom type) molecular descriptors (MDs) as a generalization of the classical approach of molecular encoding using the sum of the atomic (or fragment) contributions. These AOs are implemented in a new, free software package named MD-LOVIs (http://tomocomd.com/md-lovis), which allows for the calculation of MDs from atomic weight vectors and LOVIs (local vertex invariants). This software was developed in the Java programming language and employs the Chemistry Development Kit (CDK) library for handling chemical structures and calculating atomic weights. An analysis of the complexities of the algorithms presented herein demonstrates that they were efficiently implemented. The calculation speed experiments show that the MD-LOVIs software behaves satisfactorily when compared to software such as PaDEL, CDKDescriptor, DRAGON and Bluecal. Shannon's entropy (SE)-based variability studies demonstrate that MD-LOVIs yields indices with greater information content than those of popular academic and commercial software. A principal component analysis reveals that our approach captures chemical information orthogonal to that codified by the DRAGON, PaDEL and Mold2 software, as a result of the several generalizations in MD-LOVIs not used in other programs. Lastly, three QSARs were built using multiple linear regression with genetic algorithms, and the statistical parameters of these models demonstrate that the MD-LOVIs indices obtained with AOs yield better performance than those obtained when the summation operator is used exclusively. Moreover, the MD-LOVIs indices yield models with comparable to superior performance relative to other QSAR methodologies reported in the literature, despite their simplicity.
The studies performed herein collectively demonstrate that the MD-LOVIs software generates indices that are as simple as possible, but not simpler, and that the use of AOs enhances the diversity of the chemical information codified, which consequently improves the performance of traditional MDs.
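The core idea, replacing the plain sum over atomic contributions with other aggregation operators, can be illustrated in a few lines. This is a sketch only: the operator set and the local vertex invariant values below are hypothetical and are not taken from MD-LOVIs.

```python
import math
from statistics import mean, pstdev

# Hypothetical local vertex invariants (one value per atom); not MD-LOVIs output.
lovis = [1.0, 2.0, 2.0, 3.0]

aggregation_operators = {
    "sum": sum,  # the classical summation-based descriptor
    "arithmetic_mean": mean,
    "geometric_mean": lambda v: math.prod(v) ** (1.0 / len(v)),
    "std_dev": pstdev,
    "range": lambda v: max(v) - min(v),
}

# One global descriptor per operator, all derived from the same atomic vector.
descriptors = {name: op(lovis) for name, op in aggregation_operators.items()}
for name, value in descriptors.items():
    print(f"{name}: {value:.4f}")
```

Each operator summarizes a different aspect of the same atomic vector (total, central tendency, spread), which is how a single set of LOVIs can yield several non-redundant descriptors.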
|
22
|
Estimating Some General Molecular Descriptors of Saturated Hydrocarbons. Mol Inform 2019; 38:e1900007. [PMID: 31589808 DOI: 10.1002/minf.201900007] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2019] [Accepted: 03/13/2019] [Indexed: 11/06/2022]
Abstract
Three general molecular descriptors, namely the general sum-connectivity index, general Platt index and ordinary generalized geometric-arithmetic index, are studied here. Best possible bounds for the aforementioned descriptors of arbitrary saturated hydrocarbons are derived under certain constraints. These bounds are expressed in terms of number of carbon atoms and number of carbon-carbon bonds of the considered hydrocarbons.
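For orientation, the general sum-connectivity index is usually defined as the sum of (d_u + d_v)^α over all edges uv, where d denotes vertex degree. The sketch below evaluates that definition on the hydrogen-suppressed graph of 2-methylbutane; the edge list and function name are illustrative assumptions, and the bounds derived in the paper are not reproduced.

```python
from collections import Counter


def general_sum_connectivity(edges, alpha):
    """Sum of (deg(u) + deg(v)) ** alpha over all edges of a simple graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sum((degree[u] + degree[v]) ** alpha for u, v in edges)


# Carbon skeleton of 2-methylbutane: C0-C1, C1-C2, C2-C3, with a methyl C4 on C1.
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
print(general_sum_connectivity(edges, 1))     # alpha = 1 gives the first Zagreb index
print(general_sum_connectivity(edges, -0.5))  # alpha = -1/2 gives the sum-connectivity index
```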
|
23
|
In silico prediction of drug-induced developmental toxicity by using machine learning approaches. Mol Divers 2019; 24:1281-1290. [PMID: 31486961 DOI: 10.1007/s11030-019-09991-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2019] [Accepted: 08/28/2019] [Indexed: 02/05/2023]
Abstract
Some drugs and xenobiotics have the potential to disturb homeostasis, normal growth, differentiation, development or behavior during prenatal development or postnatally until puberty. Assessment of developmental toxicity is one of the important safety considerations incorporated by international regulatory agencies. In this investigation, seven machine learning methods, including naïve Bayes, support vector machine, recursive partitioning, k-nearest neighbor, C4.5 decision tree, random forest and AdaBoost, were used to build binary classification models for developmental toxicity. Among these models, the naïve Bayes classifier showed the best predictive performance and stability, giving 91.11% overall prediction accuracy, 91.50% balanced accuracy and 0.818 MCC for the training set, and 83.93% concordance, 81.85% balanced accuracy and 0.627 MCC for the test set. The application domains were analyzed, and only one chemical in the test set was identified as outside the application domain. In addition, 10 important molecular descriptors related to developmental toxicity were selected by the genetic algorithm, which may help to explain the mechanisms of developmental toxicants. The best naïve Bayes classification model could be employed as an alternative method for qualitative prediction of chemical-induced developmental toxicity in the early stages of drug development.
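Balanced accuracy and the Matthews correlation coefficient (MCC) quoted above are both computed from the four cells of a binary confusion matrix. A minimal sketch of the standard formulas follows; the counts used in the example are hypothetical and are not the study's data.

```python
import math


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts for a test set of 100 compounds.
tp, tn, fp, fn = 40, 45, 5, 10
print(f"balanced accuracy = {balanced_accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")
```

Both metrics are less sensitive to class imbalance than overall accuracy, which is presumably why they are reported alongside it.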
|
24
|
BCL::Mol2D-a robust atom environment descriptor for QSAR modeling and lead optimization. J Comput Aided Mol Des 2019; 33:477-486. [PMID: 30955193 PMCID: PMC6824857 DOI: 10.1007/s10822-019-00199-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2018] [Accepted: 03/18/2019] [Indexed: 12/28/2022]
Abstract
Comparing fragment-based molecular fingerprints of drug-like molecules is one of the most robust and frequently used approaches in computer-assisted drug discovery. Molprint2D, a popular atom environment (AE) descriptor, yielded the best enrichment of active compounds across a diverse set of targets in a recent large-scale study. We present here BCL::Mol2D descriptors that outperformed Molprint2D on nine PubChem datasets spanning a wide range of protein classes. Because BCL::Mol2D records the number of AEs from a universal AE library, a novel aspect of BCL::Mol2D over Molprint2D is its reversibility. This property enables decomposition of predictions from machine learning models into particular molecular substructures. Artificial neural networks with dropout, when trained on BCL::Mol2D descriptors, outperform those trained on Molprint2D descriptors by up to 26% in the logAUC metric. When combined with the Reduced Short Range descriptor set, our previously published set of descriptors optimized for QSARs, BCL::Mol2D yields a modest improvement. Finally, we demonstrate how the reversibility of BCL::Mol2D enables visualization of a 'pharmacophore map' that could guide lead optimization for serine/threonine kinase 33 inhibitors.
|
25
|
Alkanes with the First Three Maximal/Minimal Modified First Zagreb Connection Indices. Mol Inform 2019; 38:e1800116. [PMID: 30614630 DOI: 10.1002/minf.201800116] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2018] [Accepted: 11/01/2018] [Indexed: 11/11/2022]
Abstract
The modified first Zagreb connection index ($ZC_1^*$) is a molecular descriptor that initially appeared within a formula for the total electron energy of alternant hydrocarbons in 1972. In a recent paper [A. Ali, N. Trinajstić, A novel/old modification of the first Zagreb index, Mol. Inform. 37 (2018) 1800008], it was observed that the molecular descriptor $ZC_1^*$ correlates well with the entropy and acentric factor of octane isomers. In this article, the molecules with the first three maximal $ZC_1^*$ values as well as the first three minimal $ZC_1^*$ values are determined from the family of all alkanes with $n$ carbon atoms, for $n \geq 6$. This extends the main results of the aforementioned paper.
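As a sketch, and assuming the definition commonly given for this index (the sum over all vertices of the degree multiplied by the connection number, i.e. the number of vertices at distance two), the descriptor can be computed directly from an adjacency list. The function name and the two hexane isomers below are illustrative choices, not examples taken from the paper.

```python
def modified_first_zagreb_connection(adjacency):
    """Sum over vertices of degree * (number of vertices at distance exactly 2)."""
    total = 0
    for v, neighbours in adjacency.items():
        second_shell = set()
        for u in neighbours:
            second_shell.update(adjacency[u])
        second_shell -= set(neighbours)  # drop vertices at distance 1
        second_shell.discard(v)          # drop the vertex itself
        total += len(neighbours) * len(second_shell)
    return total


# Hydrogen-suppressed carbon skeletons of two hexane isomers.
n_hexane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
dimethylbutane_22 = {0: [1], 1: [0, 2, 4, 5], 2: [1, 3], 3: [2], 4: [1], 5: [1]}
print(modified_first_zagreb_connection(n_hexane))
print(modified_first_zagreb_connection(dimethylbutane_22))
```

The branched isomer gives the larger value in this example, in line with the index being sensitive to branching.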
|
26
|
Abstract
Molecular descriptors are widely employed to represent molecular characteristics in cheminformatics. Various molecular-descriptor-calculation software programs have been developed. However, users of those programs must contend with several issues, including software bugs, insufficient update frequencies, and software licensing constraints. To address these issues, we propose Mordred, a descriptor-calculation software application that can calculate more than 1800 two- and three-dimensional descriptors. It is freely available via GitHub. Mordred can be easily installed and used in the command line interface, as a web application, or as a high-flexibility Python package on all major platforms (Windows, Linux, and macOS). Performance benchmark results show that Mordred is at least twice as fast as the well-known PaDEL-Descriptor and that it can calculate descriptors for large molecules, which cannot be accomplished by other software. Owing to its good performance, convenience, number of descriptors, and lax licensing constraints, Mordred is a promising choice of molecular descriptor calculation software that can be utilized for cheminformatics studies, such as those on quantitative structure-property relationships.
|
27
|
Clustering pesticides according to their molecular properties, fate, and effects by considering additional ecotoxicological parameters in the TyPol method. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2018; 25:4728-4738. [PMID: 29197062 DOI: 10.1007/s11356-017-0758-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/24/2017] [Accepted: 11/14/2017] [Indexed: 05/05/2023]
Abstract
Understanding the fate and ecotoxicological effects of pesticides largely depends on their molecular properties. We recently developed "TyPol" (Typology of Pollutants), a classification method for organic compounds based on statistical analyses. It combines several environmental parameters (sorption coefficient, degradation half-life) and one ecotoxicological parameter (bioconcentration factor) with structural molecular descriptors (number of atoms in the molecule, molecular surface, dipole moment, energy of orbitals, etc.). The present study attempts to extend TyPol to the ecotoxicological effects of pesticides on non-target organisms, based on data analysis from available literature and databases. It revealed that relevant ecotoxicological endpoints for terrestrial organisms (e.g., soil microorganisms, invertebrates) that support a range of ecosystem services are lacking as compared to aquatic organisms. The availability of ecotoxicological parameters was also lower for chronic than for acute ecotoxicity endpoints. Consequently, seven parameters were included for acute (EC50, LC50) and chronic (NOEC) ecotoxicological effects for one terrestrial (Eisenia sp.) and three aquatic (Daphnia sp., algae, Lemna sp.) organisms. In this new configuration, we used TyPol to classify 50 pesticides into different clusters that gather molecules with similar environmental behaviors and ecotoxicological effects. The classification results revealed relationships between molecular descriptors, environmental parameters, and the added ecotoxicological endpoints. This proof-of-concept study also showed that TyPol in silico classification can successfully address new scientific questions and be expanded with other parameters of interest.
|
28
|
Abstract
Human pluripotent stem cells such as embryonic stem (ES) and induced pluripotent stem (iPS) cells, combined with sophisticated bioinformatics methods, are powerful tools to predict developmental chemical toxicity. Because cell differentiation is not necessary, these cells can facilitate cost-effective assays, thus providing a practical system for the toxicity assessment of various types of chemicals. Here we describe how to apply machine learning techniques to different types of data, such as qRT-PCRs, gene networks, and molecular descriptors, for toxic chemicals, as well as how to integrate these data to predict toxicity categories. Interestingly, our results using 20 chemical data for neurotoxins (NTs), genotoxic carcinogens (GCs), and nongenotoxic carcinogens (NGCs) demonstrated that the highest and most robust prediction performance was obtained by using gene networks as the input. We also observed that qRT-PCR and molecular descriptors tend to contribute to specific toxicity categories.
|
29
|
Prediction of aquatic toxicity of benzene derivatives using molecular descriptor from atomic weighted vectors. ENVIRONMENTAL TOXICOLOGY AND PHARMACOLOGY 2017; 56:314-321. [PMID: 29091819 DOI: 10.1016/j.etap.2017.10.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/26/2017] [Revised: 10/09/2017] [Accepted: 10/11/2017] [Indexed: 06/07/2023]
Abstract
Several descriptors from atom weighted vectors are used in the prediction of the aquatic toxicity of a set of 392 benzene derivatives to the ciliated protozoan Tetrahymena pyriformis (log(IGC50)-1). These descriptors are calculated using the MD-LOVIs software, and various aggregation operators are examined with the aim of comparing their performance in predicting aquatic toxicity. Variability analysis is used to quantify the information content of these molecular descriptors by means of an information theory-based algorithm. Multiple linear regression with genetic algorithms is used to obtain models of the structure-toxicity relationships; the best model shows values of Q2=0.830 and R2=0.837 using six variables. Our models compare favorably with previously published models that use the same data set. The obtained results suggest that these descriptors provide an effective alternative for determining the aquatic toxicity of benzene derivatives.
|
30
|
Machine learning-based models to predict modes of toxic action of phenols to Tetrahymena pyriformis. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2017; 28:735-747. [PMID: 29022372 DOI: 10.1080/1062936x.2017.1376705] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2017] [Accepted: 09/01/2017] [Indexed: 06/07/2023]
Abstract
Phenols are structurally heterogeneous pollutants that present a variety of modes of toxic action (MOA), including polar narcotics, weak acid respiratory uncouplers, pro-electrophiles, and soft electrophiles. Because it is often difficult to correctly determine the mechanism of action of a compound, quantitative structure-activity relationship (QSAR) methods, which have proved their value in toxicity prediction, can be used. In this work, several QSAR models for the prediction of the MOA of 221 phenols to the ciliated protozoan Tetrahymena pyriformis, using Chemistry Development Kit descriptors, are reported. Four machine learning (ML) techniques, namely k-nearest neighbours, support vector machine, classification trees, and artificial neural networks, have been used to develop several models with high accuracies and predictive capabilities for distinguishing between four MOAs. They showed global accuracy values between 95.9% and 97.7% and area under the receiver operating characteristic curve values between 0.978 and 0.998; additionally, false alarm rate values were below 8.2% for the training set. To validate our models, 10-fold cross-validation and external test-set validation were performed, with good behaviour in all cases. These models, obtained with ML techniques, were compared with others previously reported by other researchers, and the improvement was significant.
|
31
|
In vivo toxicity of nitroaromatics: A comprehensive quantitative structure-activity relationship study. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY 2017; 36:2227-2233. [PMID: 28169452 DOI: 10.1002/etc.3761] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/25/2016] [Revised: 11/01/2016] [Accepted: 02/06/2017] [Indexed: 06/06/2023]
Abstract
The toxicity data of 90 nitroaromatic compounds related to their 50% lethal dose concentration for rats (LD50) were analyzed to develop quantitative structure-activity relationship (QSAR) models. Quantum-chemically calculated descriptors together with molecular descriptors generated by DRAGON, PaDEL, and HiT-QSAR software were utilized to build QSAR models. Quality and validity of the models were determined by internal and external validation techniques. The results show that the toxicity of nitroaromatic compounds depends on various factors, such as the number of nitro-groups, the topological state, and the presence of certain structural fragments. The developed models based on the largest (to date) dataset of nitroaromatics in vivo toxicity showed a good predictive ability. The results provide important input that could be applied in a preliminary assessment of nitroaromatic compounds' toxicity to mammals. Environ Toxicol Chem 2017;36:2227-2233. © 2017 SETAC.
|
32
|
Deciphering molecular properties and docking studies of hepatitis C and non-hepatitis C antiviral inhibitors - A computational approach. Life Sci 2017; 174:8-14. [PMID: 28259653 DOI: 10.1016/j.lfs.2017.02.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2016] [Revised: 02/24/2017] [Accepted: 02/28/2017] [Indexed: 11/19/2022]
Abstract
BACKGROUND Hepatitis C is an infectious liver disease with a high mortality rate that is caused by the hepatitis C virus (HCV). Several treatment methods have been applied to combat this deadly virus, including interferons, vaccines and direct-acting antivirals (DAAs). The latter show promising effects in HCV treatment with fewer adverse effects. Specifically, the DAAs target the non-structural proteins (NS3 and NS5B). PURPOSE The objective of the present study is to hypothesize an alternative antiviral inhibitor for HCV from other available antivirals. METHODS 2D molecular descriptors were computed for the selected antiviral inhibitors, followed by clustering of the descriptor features. The closely clustered compounds were subjected to interaction studies against the HCV target proteins to validate the clustering result. RESULTS AND DISCUSSION The clustering result showed that indinavir (an HIV inhibitor) and AT-130 (an HBV inhibitor) are close to the HCV inhibitors. Indinavir complexed with the NS3 protein shows a binding energy of -5.33 kcal/mol, and AT-130 complexed with the NS5B protein has a binding energy of -8.87 kcal/mol. The docking interaction study indicated a better binding affinity than other viral inhibitors. CONCLUSION From the descriptor-based feature similarity analysis and the interaction study, it can be concluded that indinavir and AT-130 could be potential alternative agents for HCV treatment.
|
33
|
How frequently do clusters occur in hierarchical clustering analysis? A graph theoretical approach to studying ties in proximity. J Cheminform 2016; 8:4. [PMID: 26816532 PMCID: PMC4727313 DOI: 10.1186/s13321-016-0114-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2015] [Accepted: 01/08/2016] [Indexed: 11/24/2022] Open
Abstract
Background Hierarchical cluster analysis (HCA) is a widely used classificatory technique in many areas of scientific knowledge. Applications usually yield a dendrogram from an HCA run over a given data set, using a grouping algorithm and a similarity measure. However, even when such parameters are fixed, ties in proximity (i.e. two clusters equidistant from a third one) may produce several different dendrograms, having different possible clustering patterns (different classifications). This situation is usually disregarded and conclusions are based on a single result, leading to questions concerning the permanence of clusters in all the resulting dendrograms; this happens, for example, when using HCA for grouping molecular descriptors to select the least similar ones in QSAR studies. Results Representing dendrograms in graph theoretical terms allowed us to introduce four measures of cluster frequency in a canonical way, and use them to calculate cluster frequencies over the set of all possible dendrograms, taking all ties in proximity into account. A toy example of well separated clusters was used, as well as a set of 1666 molecular descriptors calculated for a group of molecules having hepatotoxic activity, to show how our functions may be used for studying the effect of ties in HCA analysis. Such functions were not restricted to the tie case; the possibility of using them to derive cluster stability measurements on arbitrary sets of dendrograms having the same leaves is discussed, e.g. dendrograms from variations of HCA parameters. It was found that ties occurred frequently, some yielding tens of thousands of dendrograms, even for small data sets. Conclusions Our approach was able to detect trends in clustering patterns by offering a simple way of measuring their frequency, which is often very low. This would imply that inferences and models based on descriptor classifications (e.g. QSAR) are likely to be biased, thereby requiring an assessment of their reliability. Moreover, any classification of molecular descriptors is likely to be far from unique. Our results highlight the need for evaluating the effect of ties on clustering patterns before classification results can be used accurately.
Graphical abstract: Four cluster contrast functions identifying statistically sound clusters within dendrograms considering ties in proximity.
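A tie in proximity can be detected at any agglomeration step by checking whether more than one pair attains the minimum distance. The sketch below does this for the first step only, on hypothetical 2D points with Euclidean distance; it illustrates the problem and does not implement the four cluster frequency measures introduced in the paper.

```python
import math
from itertools import combinations


def closest_pairs(points, tol=1e-12):
    """Return every pair of point indices whose distance equals the minimum pairwise distance."""
    dists = {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }
    d_min = min(dists.values())
    return [pair for pair, d in dists.items() if abs(d - d_min) <= tol]


# Hypothetical data: points 0 and 2 are equidistant from point 1.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
tied = closest_pairs(points)
print(tied)
print("first merge is ambiguous" if len(tied) > 1 else "first merge is unique")
```

When more than one pair is returned, a typical HCA implementation merges whichever pair it encounters first, so the resulting dendrogram depends on input order rather than on the data alone.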
|
34
|
Effect of imidazolium-based ionic liquids on bacterial growth inhibition investigated via experimental and QSAR modelling studies. JOURNAL OF HAZARDOUS MATERIALS 2015; 297:198-206. [PMID: 25965417 DOI: 10.1016/j.jhazmat.2015.04.082] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 03/30/2015] [Revised: 04/28/2015] [Accepted: 04/29/2015] [Indexed: 06/04/2023]
Abstract
Tuning the characteristics of solvents to fit industrial requirements has become a major interest in both academic and industrial communities, notably in the field of room-temperature ionic liquids (RTILs), which are considered one of the most promising green alternatives to molecular organic solvents. In this work, several sets of imidazolium-based ionic liquids were synthesized, and their toxicities towards four human pathogenic bacteria were assessed to investigate how tunability affects this characteristic. Additionally, the toxicity of particular RTILs bearing an amino acid anion was examined. EC50 values (50% effective concentration) were established, and significant variations were observed: although all studied ILs contained an imidazolium moiety, the toxicity values ranged from 0.05 mM for the most toxic to 85.57 mM for the least toxic. Linear quantitative structure-activity relationship models were then developed using the charge density distributions (σ-profiles) as molecular descriptors, yielding accuracies as high as 95%.
|
35
|
QSAR prediction of HIV-1 protease inhibitory activities using docking derived molecular descriptors. J Theor Biol 2015; 369:13-22. [PMID: 25600056 DOI: 10.1016/j.jtbi.2015.01.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 09/01/2014] [Revised: 01/10/2015] [Accepted: 01/12/2015] [Indexed: 01/30/2023]
Abstract
In this study, the application of a new hybrid docking-quantitative structure-activity relationship (QSAR) methodology to model and predict the HIV-1 protease inhibitory activities of a series of newly synthesized chemicals is reported. This hybrid docking-QSAR approach can provide valuable information about the most important chemical and structural features of the ligands that affect their inhibitory activities. Docking studies were used to find the actual conformations of the chemicals in the active site of HIV-1 protease, and the molecular descriptors were then calculated from these conformations. Multiple linear regression (MLR) and least squares support vector machine (LS-SVM) were used as QSAR models. The results show that the statistical parameters of the LS-SVM model are better than those of the MLR model, which indicates that there are some non-linear relations between the selected molecular descriptors and the anti-HIV activities of the chemicals of interest. The correlation coefficient (R), root mean square error (RMSE) and average absolute error (AAE) for LS-SVM are R=0.988, RMSE=0.207 and AAE=0.145 for the training set, and R=0.965, RMSE=0.403 and AAE=0.338 for the test set. A leave-one-out cross-validation test was used to assess the predictive power and validity of the models, which gave cross-validation correlation coefficients of 0.864 and 0.850 and standardized predicted relative error sums of squares (SPRESS) of 0.553 and 0.581 for the LS-SVM and MLR models, respectively.
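The statistics quoted in this abstract can be sketched as follows. This is an illustrative example only, not the authors' code: AAE is assumed to be the mean absolute error, leave-one-out is shown for a single-descriptor least-squares line rather than the MLR or LS-SVM models of the paper, and the function names and toy data are hypothetical.

```python
import math

def pearson_r(y_true, y_pred):
    """Correlation coefficient R between observed and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def aae(y_true, y_pred):
    """Average absolute error (assumed here to be the mean absolute error)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def loo_predictions(x, y):
    """Leave-one-out: predict each point from a model fitted to all the others."""
    preds = []
    for k in range(len(x)):
        a, b = fit_line(x[:k] + x[k + 1:], y[:k] + y[k + 1:])
        preds.append(a + b * x[k])
    return preds

# Hypothetical descriptor values and activities, not taken from the paper.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_loo = loo_predictions(x, y)
print(pearson_r(y, y_loo), rmse(y, y_loo), aae(y, y_loo))
```

Comparing these statistics for the fitted and the leave-one-out predictions gives a rough indication of how much a model relies on individual data points.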
|
36
|
Predicting network of drug-enzyme interaction based on machine learning method. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2013; 1844:214-23. [PMID: 23907006 DOI: 10.1016/j.bbapap.2013.07.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/01/2012] [Revised: 07/16/2013] [Accepted: 07/18/2013] [Indexed: 12/11/2022]
Abstract
In modern drug research, it is important to map drugs and enzymes correctly and efficiently to their possible interaction network. In this work, a novel approach was introduced to encode drug and enzyme molecules with physicochemical molecular descriptors and pseudo amino acid composition, respectively. Based on this encoding method, Random Forest was adopted to build the drug-enzyme interaction network. After selecting the optimal features able to represent the main factors of drug-enzyme interaction in our prediction, a total of 129 features were obtained, which can be clustered into nine categories: Elemental Analysis, Geometry, Chemistry, Amino Acid Composition, Secondary Structure, Polarity, Molecular Volume, Codon Diversity and Electrostatic Charge. Geometry features were further found to be the most important of all the features. Our prediction model achieved an MCC of 0.915 and a sensitivity of 87.9% at a specificity of 99.8% in the 10-fold cross-validation test, and an MCC of 0.895 and a sensitivity of 95.7% at a specificity of 95.4% in the independent set test. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.
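For readers unfamiliar with the reported measures, the sketch below computes sensitivity, specificity and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the counts are hypothetical and are not taken from the paper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC); assumes both classes are present."""
    sensitivity = tp / (tp + fn)   # fraction of true interactions recovered
    specificity = tn / (tn + fp)   # fraction of non-interactions rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical confusion-matrix counts for illustration only.
sens, spec, mcc = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} MCC={mcc:.3f}")
```

Unlike sensitivity or specificity alone, MCC uses all four cells of the confusion matrix, which is why it is often reported for imbalanced interaction data sets.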
|
37
|
Prediction of boiling points of organic compounds by QSPR tools. J Mol Graph Model 2013; 44:113-9. [PMID: 23792208 DOI: 10.1016/j.jmgm.2013.04.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 01/08/2013] [Accepted: 04/24/2013] [Indexed: 10/26/2022]
Abstract
Novel electronegativity topological descriptors, YC and WC, were derived from molecular structure using the equilibrium electronegativity of atoms and the relative bond lengths of the molecule. Quantitative structure-property relationships (QSPR) between the descriptors YC and WC, together with the path number parameter P3, and the normal boiling points of 80 alkanes, 65 unsaturated hydrocarbons and 70 alcohols were obtained separately. The high quality of the prediction models was evidenced by the coefficient of determination (R²), the standard error (S), the average absolute error (AAE) and the predictive parameters (Q²ext, R²CV, R²m). From the regression equations, the influences of carbon backbone length, molecular size, degree of branching and functional groups on the normal boiling point were analyzed. Comparison with reference models demonstrated that the novel topological descriptors based on the equilibrium electronegativity of atoms and the relative bond length are useful molecular descriptors for predicting the normal boiling points of organic compounds.
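The abstract does not define the path number parameter P3. Assuming it denotes the number of paths of length three (three consecutive bonds) in the hydrogen-suppressed molecular graph, as is common for path counts in chemical graph theory, a minimal sketch is given below. The function name `count_paths` and the adjacency-list encoding are illustrative choices, not the authors' implementation.

```python
def count_paths(adjacency, length):
    """Count simple paths with `length` edges (length >= 1) in an undirected graph."""
    def extend(path, remaining):
        if remaining == 0:
            return 1
        return sum(
            extend(path + [nxt], remaining - 1)
            for nxt in adjacency[path[-1]]
            if nxt not in path  # keep the path simple (no repeated atoms)
        )
    # Each undirected path is found once from each end, so halve the total.
    return sum(extend([v], length) for v in adjacency) // 2

# Hydrogen-suppressed carbon skeletons; atom labels are arbitrary.
n_butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
isopentane = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}  # 2-methylbutane

print(count_paths(n_butane, 3), count_paths(isopentane, 3))
```

Under this assumed definition, the count rises as branches are added near the middle of a chain, which is how such a parameter can encode the degree of branching mentioned in the abstract.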
|