Reference Citation Analysis

Total Articles: 37 (from Reference Citation Analysis)
Article PDFs: 4
Cited by > 0: 28
Searched Name: Molecular descriptor

Number	Citation Analysis
1	Computation of molecular description of supramolecular Fuchsine model useful in medical data. Sci Rep 2024;14:10933. [PMID: 38740796 DOI: 10.1038/s41598-024-60284-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 04/21/2024] [Indexed: 05/16/2024] Abstract: Supramolecular chemistry is a field that explores the interactions between molecules that create higher-order structures. The supramolecular chain of Fuchsine acid, a dye molecule, has several possible chemical applications. Fuchsine acid helps to make better drug carriers that deliver drugs where they are needed in the body, making treatments more effective and reducing side effects. It also helps create smart materials such as sensors and self-healing plastics, which are useful in electronics, environmental protection, and the development of new materials. In sensing and detection, the supramolecular chain of Fuchsine acid is used as a sensor or detector for specific analytes; in drug delivery, supramolecular chains of Fuchsine acid are incorporated into drug delivery systems. In recent years, a common method has been to associate a graph with a chemical structure and study it using topological descriptors, a technique of growing importance. Topological descriptors give useful information about the topology of a chemical graph. In this paper, we have computed the 3D structure of the supramolecular graph of Fuchsine acid and derived explicit expressions for the ABC index, GA index, general Randić index, first and second Zagreb indices, hyper-Zagreb index, H-index, and F-index of the supramolecular structure of Fuchsine acid. Key Words: Fuchsine C₂₀H₁₉N₃·HCl; Molecular descriptor; Topological descriptors.
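The degree-based indices named in this abstract have standard definitions as sums over the edges uv of a molecular graph, with vertex degrees d(u) and d(v): M1 = Σ(d(u)+d(v)), M2 = Σ d(u)d(v), hyper-Zagreb HM = Σ(d(u)+d(v))², F = Σ(d(u)²+d(v)²), general Randić R_α = Σ(d(u)d(v))^α, ABC = Σ√((d(u)+d(v)-2)/(d(u)d(v))) and GA = Σ 2√(d(u)d(v))/(d(u)+d(v)). The sketch below is not the authors' code and does not reproduce the Fuchsine structure; it assumes a simple, hydrogen-suppressed graph given as an edge list, the function name `degree_indices` is hypothetical, and the H-index is left out because the abstract does not define it.

```python
import math
from collections import Counter


def degree_indices(edges, alpha=-0.5):
    """Degree-based topological indices of a simple undirected graph.

    `edges` is a list of (u, v) pairs without duplicates or self-loops;
    `alpha` is the exponent of the general Randic index (default -1/2).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # One (d(u), d(v)) pair per edge; every index below is an edge sum.
    pairs = [(degree[u], degree[v]) for u, v in edges]
    return {
        "M1": sum(a + b for a, b in pairs),                   # first Zagreb
        "M2": sum(a * b for a, b in pairs),                   # second Zagreb
        "HM": sum((a + b) ** 2 for a, b in pairs),            # hyper-Zagreb
        "F": sum(a * a + b * b for a, b in pairs),            # F-index
        "R_alpha": sum((a * b) ** alpha for a, b in pairs),   # general Randic
        "ABC": sum(math.sqrt((a + b - 2) / (a * b)) for a, b in pairs),
        "GA": sum(2 * math.sqrt(a * b) / (a + b) for a, b in pairs),
    }


# Hydrogen-suppressed carbon skeleton of propane: C1-C2-C3.
propane = [("C1", "C2"), ("C2", "C3")]
```

For the propane skeleton, `degree_indices(propane)` gives M1 = 6, M2 = 4, HM = 18 and F = 10; the edge-sum form is used throughout so that every index is read off the same list of degree pairs.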
2	Advancing chronic toxicity risk assessment in freshwater ecology by molecular characterization-based machine learning. Environmental Pollution (Barking, Essex: 1987) 2024;342:123093. [PMID: 38072027 DOI: 10.1016/j.envpol.2023.123093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/26/2024] Abstract: The continuously increasing production of various chemicals and their release into the environment have raised concerns about potential negative effects on ecological health. However, traditional labor-intensive assessment methods cannot evaluate these hazards effectively and rapidly, especially chronic risk. In this study, machine learning (ML) was employed to construct quantitative structure-activity relationship (QSAR) models that predict chronic toxicity to aquatic organisms from the molecular characteristics of pollutants, namely molecular descriptors, fingerprints, and graphs. The limited dataset size prevented the graph attention network (GAT) model from showing notable advantages on the molecular graphs. Considering computational efficiency and performance (R² = 0.78; RMSE = 0.77), XGBoost (XGB) with molecular descriptors was used to build reliable QSAR-ML models for predicting chronic toxicity from small- or medium-sized tabular data. Kernel density estimation analysis further confirmed the high accuracy of the model for pollutant concentrations ranging from 10⁻³ to 10² mg/L, which covers most environmental scenarios. Model interpretation identified SlogP and exposure duration as the primary influential factors. SlogP, representing the distribution coefficient of a molecule between lipophilic and hydrophilic environments, had a negative effect on the toxicity outcomes, and exposure duration played a crucial role in determining chronic toxicity. Finally, chronic toxicity data for bisphenol A validated the robustness and reliability of the model established in this research. Our study provides a robust and feasible methodology for chronic ecological risk evaluation of various types of pollutants and could facilitate wider use of ML applications in environmental fields. Key Words: Ecological risk; Graph attention network; Molecular descriptor; Molecular fingerprint; Species sensitivity distribution; XGBoost. MeSH Headings: Reproducibility of Results; Risk Assessment; Machine Learning; Quantitative Structure-Activity Relationship; Environmental Pollutants.
3	Inhibitor design for TMPRSS2: insights from computational analysis of its backbone hydrogen bonds using a simple descriptor. European Biophysics Journal: EBJ 2024;53:27-46. [PMID: 38157015 PMCID: PMC10853362 DOI: 10.1007/s00249-023-01695-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 12/04/2023] [Accepted: 12/07/2023] [Indexed: 01/03/2024] Abstract: Transmembrane protease serine 2 (TMPRSS2) is an important drug target due to its role in the infection mechanism of coronaviruses, including SARS-CoV-2. Current understanding of the molecular mechanisms of known inhibitors, and of the insights required for inhibitor design, is limited. This study investigates the effect of inhibitor binding on the intramolecular backbone hydrogen bonds (BHBs) of TMPRSS2 using the concept of hydrogen bond wrapping, the stabilization of a hydrogen bond in a solvent environment that results from being surrounded by non-polar groups. For this purpose, a molecular descriptor that quantifies the extent of wrapping around BHBs is introduced. First, virtual screening for TMPRSS2 inhibitors is performed by molecular docking using the program DOCK 6 with a Generalized Born surface area (GBSA) scoring function. The docking results are then analyzed using this descriptor, and its relationship to the solvent-accessible surface area term ΔGsa of the GBSA score is demonstrated with machine learning regression and principal component analysis. The effect of binding of the inhibitors camostat, nafamostat, and 4-guanidinobenzoic acid (GBA) on the wrapping of important BHBs in TMPRSS2 is also studied using molecular dynamics. For BHBs with a large increase in wrapping groups due to these inhibitors, the radial distribution function of water revealed that certain residues involved in these BHBs, such as Gln438, Asp440, and Ser441, undergo preferential desolvation. The findings offer valuable insights into the mechanisms of these inhibitors and may prove useful in the design of new inhibitors. Key Words: Hydrogen bond wrapping; Molecular descriptor; Protease inhibitors; SARS-CoV-2; TMPRSS2. MeSH Headings: Hydrogen Bonding; Molecular Docking Simulation; SARS-CoV-2; Solvents; Water; Humans.
4	Interpretable attention-based multi-encoder transformer based QSPR model for assessing toxicity and environmental impact of chemicals. Chemosphere 2024;350:141086. [PMID: 38163464 DOI: 10.1016/j.chemosphere.2023.141086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 12/28/2023] [Accepted: 12/29/2023] [Indexed: 01/03/2024] Abstract: Rising demand from the consumer goods and pharmaceutical industries is driving a rapid expansion of newly developed chemicals. Conventional toxicity testing of unknown chemicals is expensive and time-consuming and raises ethical concerns. Quantitative structure-property relationship (QSPR) modeling is an efficient computational method because it saves time and resources and reduces animal experimentation. Advances in machine learning have improved chemical analysis in QSPR studies, but the real-world application of machine learning-based QSPR has been limited by the unexplainable "black box" nature of machine learning models. In this study, a multi-encoder structure-to-toxicity (S2T) transformer-based QSPR model was developed to estimate the properties of polychlorinated biphenyls (PCBs) and endocrine-disrupting chemicals (EDCs). Simplified molecular-input line-entry system (SMILES) strings and molecular descriptors calculated by the Dragon 6 software were used simultaneously as model inputs. Furthermore, an attention-based framework is proposed to describe the relationship between the molecular structure and toxicity of hazardous chemicals. The S2T-transformer model achieved the highest R² scores of 0.918, 0.856, and 0.907 for estimating the logarithm of the octanol-water partition coefficient (Log KOW), the octanol-air partition coefficient (Log KOA), and the bioconcentration factor (Log BCF) of PCBs, respectively. Moreover, the attention weights properly interpreted the lateral (meta, para) chlorination associated with PCB toxicity and environmental impact. Key Words: Attention mechanism; Molecular descriptor; Multi-encoder transformer structure; Quantitative structure-activity/property relationship; SMILES. MeSH Headings: Animals; Polychlorinated Biphenyls/analysis; Octanols/chemistry; Water/chemistry; Software; Quantitative Structure-Activity Relationship; Environment.
5	Global classification models for predicting acute toxicity of chemicals towards Daphnia magna. Environmental Research 2023;238:117239. [PMID: 37778597 DOI: 10.1016/j.envres.2023.117239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 09/10/2023] [Accepted: 09/18/2023] [Indexed: 10/03/2023] Abstract: Molecular descriptors reflecting structural information on hydrophobicity, reactivity, polarizability, hydrogen bonding, and charged groups were used to predict the toxicity (pLC50) of chemicals towards Daphnia magna with global quantitative structure-activity/toxicity relationship (QSAR/QSTR) models. A sufficiently large dataset of 1517 chemical toxicity values for Daphnia magna was divided into a training set (758 pLC50 values) and a test set (759 pLC50 values). Using the random forest (RF) algorithm, two classification models, Class Model A and Class Model B, were developed, with prediction accuracy, sensitivity, and specificity above 85% for Class 1 (pLC50 ≤ 4.48) and Class 2 (pLC50 > 4.48). Class Model A was based on nine molecular descriptors with RF parameters nodesize = 1, ntree = 80, and mtry = 2, and yielded accuracies of 92.3% (training set), 85.6% (test set), and 88.9% (total data set). Class Model B was based on ten descriptors with parameters nodesize = 1, ntree = 90, and mtry = 2, and produced accuracies of 88.3% (training set), 86.8% (test set), and 87.5% (total data set). The two classification models were satisfactory compared with other classification models reported in the literature, even though the models in this work dealt with more samples. Thus, the two classification models, with a larger applicability domain, provide efficient tools for assessing chemical aquatic toxicity towards Daphnia magna. Key Words: Molecular descriptor; QSAR; QSTR; Random forest. MeSH Headings: Animals; Daphnia; Water Pollutants, Chemical/chemistry; Quantitative Structure-Activity Relationship; Random Forest.
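The accuracy, sensitivity, and specificity quoted in this abstract follow from a confusion matrix once each chemical is assigned to Class 1 (pLC50 ≤ 4.48) or Class 2 (pLC50 > 4.48). The sketch below is a hypothetical helper, not the authors' random forest pipeline: it takes predicted classes from any model and assumes Class 2 (the more toxic class) is the positive class, which the abstract does not state.

```python
# Threshold from the abstract: Class 1 if pLC50 <= 4.48, else Class 2.
PLC50_THRESHOLD = 4.48


def to_class(plc50):
    """Map an observed pLC50 value to Class 1 or Class 2."""
    return 1 if plc50 <= PLC50_THRESHOLD else 2


def classification_metrics(observed_plc50, predicted_classes, positive=2):
    """Accuracy, sensitivity and specificity for a two-class model.

    Assumption: `positive` (default Class 2) is the class whose recall
    is reported as sensitivity; the other class gives specificity.
    """
    if len(observed_plc50) != len(predicted_classes) or not observed_plc50:
        raise ValueError("need two non-empty sequences of equal length")
    tp = tn = fp = fn = 0
    for plc50, predicted in zip(observed_plc50, predicted_classes):
        actual = to_class(plc50)
        if actual == positive and predicted == positive:
            tp += 1
        elif actual == positive:
            fn += 1          # positive chemical missed by the model
        elif predicted == positive:
            fp += 1          # negative chemical flagged as positive
        else:
            tn += 1

    def ratio(num, den):
        # Undefined when a class is absent from the observed data.
        return num / den if den else float("nan")

    return {
        "accuracy": (tp + tn) / len(observed_plc50),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }
```

For example, `classification_metrics([3.0, 4.48, 5.0, 6.2, 4.0], [1, 2, 2, 1, 1])` gives an accuracy of 0.6, a sensitivity of 0.5, and a specificity of about 0.667. Note that a value exactly at 4.48 falls in Class 1, matching the "≤" in the abstract.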
6	PyL3dMD: Python LAMMPS 3D molecular descriptors package. J Cheminform 2023;15:69. [PMID: 37507792 PMCID: PMC10385924 DOI: 10.1186/s13321-023-00737-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 07/16/2023] [Indexed: 07/30/2023] Abstract: Molecular descriptors characterize the biological, physical, and chemical properties of molecules and have long been used to understand molecular interactions and facilitate materials design. Some of the most robust descriptors, called three-dimensional (3D) descriptors, are derived from geometrical representations of molecules. When calculated from molecular dynamics (MD) simulation trajectories, 3D descriptors can also capture the effects of operating conditions such as temperature or pressure. However, extracting 3D descriptors from MD trajectories is non-trivial, which hinders their wide use by researchers developing advanced quantitative structure-property relationship models with machine learning. Here, we describe a suite of open-source Python-based post-processing routines, called PyL3dMD, for calculating 3D descriptors from MD simulations. PyL3dMD is compatible with the popular simulation package LAMMPS and enables users to compute more than 2000 3D molecular descriptors from atomic trajectories generated by MD simulations. PyL3dMD is freely available via GitHub and can be easily installed and used as a highly flexible Python package on all major platforms (Windows, Linux, and macOS). In a performance benchmark study, descriptors calculated by PyL3dMD were used to develop a neural network, and the results showed that PyL3dMD is fast and efficient in calculating descriptors for large and complex molecular systems with long simulation durations. PyL3dMD facilitates the calculation of 3D molecular descriptors from MD simulations, making it a valuable tool for cheminformatics studies. Key Words: Cheminformatics; LAMMPS; MD simulations; Molecular descriptor; Python; QSPR.
7	Aging affects isomer-specific occurrence of dechlorane plus in soil profiles: A case study in a geographically isolated landfill from the Tibetan Plateau. The Science of the Total Environment 2023;878:163119. [PMID: 36996972 DOI: 10.1016/j.scitotenv.2023.163119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 03/20/2023] [Accepted: 03/23/2023] [Indexed: 05/13/2023] Abstract: Two major structural isomers in commercial dechlorane plus (DP) mixtures, anti-DP and syn-DP, generally display different desorption and partitioning efficiencies in soils, which may be linked to their different aging rates. However, the molecular parameters that govern the degree of aging, and its effects on the occurrence of DP isomers, have not been comprehensively investigated. In this study, the relative abundance of rapid desorption concentration (R_rapid) was measured for anti-DP, syn-DP, anti-Cl₁₁-DP, anti-Cl₁₀-DP, Dechlorane-604 (Dec-604), and Dechlorane-602 (Dec-602) at a geographically isolated landfill area on the Tibetan Plateau. The R_rapid values were used as an indicator of aging degree and correlated closely with the three-dimensional conformation of the molecules for the dechlorane series compounds. This observation suggests that planar molecules may have a greater tendency to accumulate in the condensed phase of organic matter and to age more rapidly. The fractional abundances and dechlorinated products of anti-DP were found to be predominantly controlled by the aging degree of DP isomers. A multiple nonlinear regression model indicated that differences in aging between anti-DP and syn-DP were driven primarily by the total desorption concentration and soil organic matter content. Aging plays a significant role in both the transport and the metabolism of DP isomers and should be taken into account to refine assessments of their environmental behavior. Key Words: Aging; Dechlorane plus; Fractional abundance; Molecular descriptor; Tibetan Plateau.
8	Study on the characterization of pesticide modes of action similarity and the multi-endpoint combined toxicity of pesticide mixtures to Caenorhabditis elegans. The Science of the Total Environment 2023:164918. [PMID: 37327899 DOI: 10.1016/j.scitotenv.2023.164918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 05/19/2023] [Accepted: 06/13/2023] [Indexed: 06/18/2023] Abstract: With the widespread use of pesticides, the coexistence of multiple low-residue pesticides in environmental media has increased significantly, and the resulting "cocktail" effect has garnered increasing attention. However, owing to the scarcity of information on the modes of action (MOAs) of chemicals, the application of concentration addition (CA) models for evaluating and predicting the toxicity of mixtures with similar MOAs is limited. Additionally, the joint toxicity laws of complex mixture systems for different toxicity endpoints in organisms remain unclear, and effective methods to test mixture toxicity on lifespan and reproductive inhibition are lacking. Therefore, in this study, the similarity of pesticide MOAs was characterized using molecular electronegativity-distance vector (MEDV-13) descriptors for eight pesticides (aldicarb, methomyl, imidacloprid, thiamethoxam, dichlorvos, dimethoate, methamidophos, and triazophos). Additionally, lifespan and reproduction inhibition microplate toxicity analysis methods for C. elegans (EL-MTA and ER-MTA) were established to test the lifespan and reproduction inhibition toxicity of Caenorhabditis elegans. Finally, a unified-scale synergistic-antagonistic heatmap (SAHscale) method was proposed to explore the combined toxicity of the mixtures on the lifespan, reproduction, and mortality of nematodes. The results showed that the MEDV-13 descriptor could effectively characterize the similarity in MOAs. The lifespan and reproductive ability of Caenorhabditis elegans were significantly inhibited when the pesticide exposure concentration was one order of magnitude lower than the lethal dose. The sensitivity of the lifespan and reproductive endpoints to mixtures depended on the concentration ratio. The same mixture rays had consistent toxicity interactions on the lifespan and reproductive endpoints of Caenorhabditis elegans. In conclusion, we demonstrated the feasibility of MEDV-13 in characterizing the similarity of MOAs and provided a theoretical basis for exploring the mechanism of chemical mixtures by studying their apparent toxicity on nematode lifespan and reproduction endpoints. Key Words: APTox; Chronic toxicity; Mixture design; Molecular descriptor; Synergistic-antagonistic heatmap.
9	Reference dose prediction by using CDK molecular descriptors: A non-experimental method. Chemosphere 2022;305:135460. [PMID: 35752312 DOI: 10.1016/j.chemosphere.2022.135460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/05/2022] [Revised: 06/17/2022] [Accepted: 06/21/2022] [Indexed: 06/15/2023] Abstract: Reference dose (RfD) is an estimate of the daily dose to which an individual can be chronically exposed without obvious deleterious effects during a lifetime. In toxicology, researchers traditionally estimate RfD from the NOAEL/LOAEL, the benchmark dose (BMD), or other dose-response approaches. Despite being widely used, these methods have certain limitations. In this study, we present a novel method for estimating the reference dose without experiments. Information on the organic chemicals was obtained from the USEPA Integrated Risk Information System (IRIS). Molecular descriptors for each molecular structure were calculated with an integrated platform, and the chemicals were classified into four categories based on molecular similarity: 128 contained benzene rings, 47 were heteroaromatics, 104 contained halogen substituents, and 44 were halogenated aliphatic hydrocarbons. The predictive model of RfD was constructed by multiple linear stepwise regression (MLR). For approximately 95% and 82% of the data points, the predicted and true values differ by less than 10-fold and 5-fold, respectively. The non-experimental method improves estimation efficiency and has some reference value for prediction. Key Words: Molecular descriptor; Molecular similarity; Multiple linear stepwise regression; Reference dose. MeSH Headings: Benchmarking; No-Observed-Adverse-Effect Level; Reference Values; Risk Assessment/methods; United States; United States Environmental Protection Agency.
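The agreement statistic in this abstract, the share of chemicals whose predicted and true RfD differ by less than 10-fold or 5-fold, takes only a few lines to compute. The helper below is hypothetical and not taken from the paper; it assumes strictly positive dose values and reads "differ by less than k-fold" as max(predicted/true, true/predicted) < k.

```python
def fraction_within_fold(predicted, observed, fold):
    """Share of (predicted, observed) pairs whose ratio is below `fold`.

    The ratio is taken as larger/smaller, so over- and under-prediction
    are treated symmetrically. Values must be strictly positive.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    if any(v <= 0 for v in (*predicted, *observed)):
        raise ValueError("fold comparisons need strictly positive values")
    hits = sum(
        1 for p, o in zip(predicted, observed) if max(p / o, o / p) < fold
    )
    return hits / len(predicted)


# Made-up illustration values, not data from the study.
predicted = [1.0, 7.0, 0.5, 300.0]
observed = [2.0, 1.0, 0.4, 10.0]
```

With these illustration values the pairwise ratios are 2, 7, 1.25, and 30, so `fraction_within_fold(predicted, observed, 5)` returns 0.5 and `fraction_within_fold(predicted, observed, 10)` returns 0.75.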
10	Random forest algorithm-based accurate prediction of chemical toxicity to Tetrahymena pyriformis. Toxicology 2022;480:153325. [PMID: 36115645 DOI: 10.1016/j.tox.2022.153325] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/19/2022] [Revised: 09/09/2022] [Accepted: 09/13/2022] [Indexed: 12/01/2022] Abstract: The random forest (RF) algorithm, together with ten Dragon descriptors, was used to develop a quantitative structure-toxicity/activity relationship (QSTR/QSAR) model for a larger data set of 1792 chemical toxicity values (pIGC₅₀) towards Tetrahymena pyriformis. The optimal RF model (ntree = 300 and mtry = 3) yielded root mean square (rms) errors of 0.261 for the training set (1434 chemicals) and 0.348 for the test set (358 chemicals). The optimal RF model in this paper is more accurate than other QSTR models reported in the literature. The feasibility of applying the RF algorithm to predict chemical toxicity pIGC₅₀ towards Tetrahymena pyriformis has been verified. Key Words: Molecular descriptor; QSAR; QSTR; Random forest; Tetrahymena pyriformis; Toxicity.
11	Development of binary classification models for assessment of drug-induced liver injury in humans using a large set of FDA-approved drugs. J Pharmacol Toxicol Methods 2022;116:107185. [PMID: 35623583 DOI: 10.1016/j.vascn.2022.107185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/29/2021] [Revised: 04/13/2022] [Accepted: 05/18/2022] [Indexed: 02/05/2023] Abstract: Drug-induced liver injury (DILI) has been identified as one of the major causes of drug withdrawal from the market, and even of termination during the late stages of development. Therefore, it is imperative to evaluate the DILI potential of lead compounds during research and development. Although various computational models have been developed to predict DILI, most of them used DILI data extracted from preclinical sources. In this investigation, in silico prediction models for DILI were constructed from 1140 FDA-approved drugs using a naïve Bayes classifier approach. A genetic algorithm was applied to select the molecular descriptors. Among the established prediction models, the NB-11 model, based on eight molecular descriptors combined with ECFP_18, showed the best prediction performance for DILI, giving 91.7% overall prediction accuracy for the training set and 68.9% concordance for the external test set. Therefore, the established NB-11 prediction model can be used as a reliable virtual screening tool to predict DILI adverse effects in the early stages of drug design. In addition, some new structural alerts for DILI were identified, which medicinal chemists could use for structural optimization in future drug design. Key Words: Drug induced liver injury; In silico prediction; Molecular descriptor; Naïve Bayes classifier; Structural alerts.
12	Chemical characterization of anemia-inducing aniline-related substances and their application to the construction of a decision tree-based anemia prediction model. Food Chem Toxicol 2021;157:112548. [PMID: 34509582 DOI: 10.1016/j.fct.2021.112548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2021] [Revised: 08/11/2021] [Accepted: 09/07/2021] [Indexed: 10/20/2022] Abstract: Anemia is a well-documented toxic effect of chemical substances, and aniline is a typical anemia-inducing substance. However, it remains unclear whether all aniline-like substances with various substituents can induce anemia. We therefore investigated the physicochemical characteristics of anemia-inducing substances using decision tree analyses. Training and validation substances were selected from a publicly available database of rat repeated-dose toxicity studies, and discrimination models were constructed by decision tree and bootstrapping methods with molecular descriptors as explanatory variables. To improve discrimination accuracy, we individually evaluated and modified the explanatory variables; established "prerules", applied before a substance is subjected to the decision tree, that account for metabolism such as azo reduction and N-dealkylation; and introduced the idea of a "partly negative" evaluation for substances with multiple aniline-like substructures. The final model showed 79.2% and 77.5% accuracy for the training and validation datasets, respectively. In addition, we identified some chemical properties that reduce the anemia inducibility of aniline-like substances, including the addition of a sulfonate or carboxy functional group and/or a bulky multiring structure to anilines. In conclusion, the present findings provide novel insight into the mechanistic understanding of chemically induced anemia and will help in developing a prediction system. Key Words: Anemia; Aniline; Chemical toxicity; Decision tree; Molecular descriptor; Structure-activity relationship.
13	A comprehensive study on retention of selected model substances in β-cyclodextrin-modified high performance liquid chromatography. J Chromatogr A 2021;1645:462120. [PMID: 33839575 DOI: 10.1016/j.chroma.2021.462120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 03/21/2021] [Accepted: 03/24/2021] [Indexed: 10/21/2022] Abstract: Quantitative structure-retention relationship (QSRR) models are employed not only to predict retention behaviour but also to gain an in-depth understanding of complex chromatographic systems. The goal of the present research is a comprehensive understanding of the retention underlying separation in β-cyclodextrin (β-CD)-modified reversed-phase high performance liquid chromatography (RP-HPLC) systems, through the development of mixed QSRR models. Moreover, the amount of β-CD adsorbed on the stationary phase surface (β-CD_A) was added as a model input to evaluate its contribution to both model performance and retention. Nuclear magnetic resonance (NMR) experiments were conducted to confirm the predicted inclusion complex structures and support the application of in silico tools. The most significant descriptors revealed that retention is governed by steric factors 7.5 Å from the geometrical centre of a molecule, the 3D arrangement of atoms determining molecular size and shape, lipophilicity indicated by topological distances, and the unbound system's energy, which is related to inclusion complex formation. In addition, a notable effect of the pH of the aqueous phase on the retention of ionizable analytes was shown. In the case of the aqueous-phase pH and β-CD_A, a change in the retention behaviour of the studied analytes was observed only at the highest β-CD_A value (5.17 μM/m²), but it was not related to the ionization state of the analytes. When the analytes did not change ionization form across the investigated pH range and the acetonitrile content of the mobile phase was 25% (v/v), the retention factor had low values regardless of β-CD_A; under these circumstances, retention is probably acetonitrile-driven. Key Words: Inclusion complex; Molecular descriptor; NMR study; Quantitative structure-retention relationship; RP-HPLC; β-cyclodextrin.
14	On neighborhood Zagreb index of product graphs. J Mol Struct 2021;1223:129210. [PMID: 32921807 PMCID: PMC7474663 DOI: 10.1016/j.molstruc.2020.129210] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 07/17/2020] [Revised: 08/17/2020] [Accepted: 09/04/2020] [Indexed: 11/27/2022] Abstract: The properties and activities of chemicals are strongly related to their molecular structures, and topological indices defined on these structures are capable of predicting those properties and activities. In this article, a new topological index, the neighborhood Zagreb index (M_N), is presented. The chemical importance of the M_N index is investigated, and the newly introduced index is shown to predict physicochemical properties with high accuracy compared with some well-established and frequently used indices. The isomer-discrimination ability of M_N is also examined. The chemical structures of favipiravir and hydroxychloroquine are used to demonstrate that the computational formula of the new index is simple and convenient to apply to chemical compounds. In addition, explicit results for this index are derived for different product graphs, such as the Cartesian, tensor, and wreath products, and some of these results are applied to obtain the M_N index of some special structures. Key Words: Cartesian product; Molecular descriptor; Molecular graph; Neighborhood Zagreb index; Tensor product; Wreath product.
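The abstract does not restate the definition of M_N. Assuming the usual one, M_N(G) = Σ_v δ(v)², where δ(v) is the sum of the degrees of the neighbours of vertex v, the index can be sketched as follows; the function name `neighborhood_zagreb` is hypothetical and this is not the authors' code. The two hydrogen-suppressed C4H10 skeletons (n-butane as a path, isobutane as a star) give a small check of isomer discrimination.

```python
from collections import defaultdict


def neighborhood_zagreb(edges):
    """Neighborhood Zagreb index M_N = sum over vertices v of delta(v)**2.

    delta(v) is assumed to be the sum of the degrees of v's neighbours.
    `edges` is an edge list of a simple undirected graph.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    # Neighbourhood degree sum of each vertex.
    delta = {v: sum(degree[u] for u in nbrs) for v, nbrs in adjacency.items()}
    return sum(s * s for s in delta.values())


# Hydrogen-suppressed graphs of two C4H10 isomers.
n_butane = [(1, 2), (2, 3), (3, 4)]
isobutane = [(1, 2), (1, 3), (1, 4)]
```

Under this definition, `neighborhood_zagreb(n_butane)` returns 26 and `neighborhood_zagreb(isobutane)` returns 36, so M_N separates these two isomers.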
15	Machine learning guided prediction of liquid chromatography-mass spectrometry ionization efficiency for genotoxic impurities in pharmaceutical products. J Pharm Biomed Anal 2020;194:113781. [PMID: 33280999 DOI: 10.1016/j.jpba.2020.113781] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/17/2020] [Revised: 11/14/2020] [Accepted: 11/16/2020] [Indexed: 10/23/2022]
Abstract: The limitation and control of genotoxic impurities (GTIs) have received continued attention from pharmaceutical companies and authorities for several decades. Because GTIs can damage deoxyribonucleic acid (DNA) and potentially cause cancer, low-level quantitation is required to protect patients. A quick and easy method for determining liquid chromatography-mass spectrometry (LC/MS) conditions for high-sensitivity analysis of GTIs could accelerate pharmaceutical development. In this study, a quantitative structure-property relationship (QSPR) model was developed to predict the ionization efficiency of compounds from LC/MS parameters and molecular descriptors. Before the QSPR prediction model was implemented, linear regression analysis was performed to model the relationship between ionization efficiency and the LC/MS parameters for each compound. The predicted peak areas agreed well with the experimentally observed peak areas (coefficient of determination R² > 0.96). The machine learning-based QSPR approach begins by computing molecular descriptors that express the physicochemical properties of a compound, followed by genetic algorithm-based feature selection. Linear and nonlinear regression were performed, and support vector machine (SVM) was selected as the best machine learning algorithm for the prediction. The SVM model was developed and optimized using 1031 experimental data points for nine compounds, including well-known GTIs. Validation of the model by comparing the predicted and observed relative ionization efficiencies (RIE) gave a high coefficient of determination (R² = 0.96) and a low root mean squared error (RMSE = 0.118). Finally, the established prediction model was applied to hydrophilic interaction liquid chromatography coupled with MS for a new compound, with new mobile phase compositions and new MS parameter settings; the RMSE of the predicted versus observed RIE was 0.203. This prediction accuracy was sufficient to determine a starting point for LC/MS method development. The methodology demonstrated in this study can be used to determine LC/MS conditions for high-sensitivity analysis of GTIs.
Key Words: Genetic algorithm; Genotoxic impurities; Ionization efficiency; Liquid chromatography-mass spectrometry; Molecular descriptor; Support vector machine
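The validation statistics quoted in the entry above (R² and RMSE) can be computed as in the following Python sketch. It assumes R² is the coefficient of determination 1 - SS_res/SS_tot; some QSPR studies instead report the squared Pearson correlation between predicted and observed values, and the abstract does not say which was used. The function names and toy values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq_err / len(observed))


# Toy observed vs. predicted relative ionization efficiencies.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
print(r_squared(obs, pred), rmse(obs, pred))  # 0.8 0.5
```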
16	Developing novel computational prediction models for assessing chemical-induced neurotoxicity using naïve Bayes classifier technique. Food Chem Toxicol 2020;143:111513. [PMID: 32621845 DOI: 10.1016/j.fct.2020.111513] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/08/2020] [Revised: 06/02/2020] [Accepted: 06/04/2020] [Indexed: 02/08/2023]
Abstract: Developing reliable and efficient alternative in vivo methods for evaluating chemicals with potential neurotoxicity is an urgent need in the early stages of drug design. In this investigation, computational models for predicting drug-induced neurotoxicity were developed using the classical naïve Bayes classifier. Eight molecular properties closely related to neurotoxicity were selected. Then, 110 classification models were developed using these eight molecular descriptors and 10 types of fingerprints with 11 different maximum diameters. Among these 110 prediction models, the model based on the eight molecular descriptors combined with ECFP_10 fingerprints (NB-03) showed the best prediction performance, with 90.5% overall prediction accuracy for the training set and 82.1% concordance for the external test set. In addition, the recursive partitioning classifier predicted neurotoxicity less well than the naïve Bayes classifier. The established NB-03 model can therefore be used as a reliable virtual screening tool to predict neurotoxicity in the early stages of drug design. Moreover, several structural alerts characterizing neurotoxicity were identified, which could guide chemists in structural modification and optimization to reduce the potential neurotoxicity of chemicals.
Key Words: In silico prediction; Molecular descriptor; Naïve Bayes classifier; Neurotoxicity; Structural alerts
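To illustrate how a naïve Bayes classifier can use binary fingerprint bits, here is a minimal Bernoulli naïve Bayes sketch in Python with Laplace smoothing. It is a generic textbook variant under our own assumptions (toy 3-bit fingerprints, hypothetical labels and function names), not the implementation, descriptors or fingerprint settings used in the study, which the abstract does not detail.

```python
import math


def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Fit log class priors and per-bit 'on' probabilities with Laplace smoothing."""
    model = {}
    n_bits = len(fingerprints[0])
    for c in sorted(set(labels)):
        rows = [fp for fp, lab in zip(fingerprints, labels) if lab == c]
        log_prior = math.log(len(rows) / len(fingerprints))
        p_on = [
            (sum(row[j] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_bits)
        ]
        model[c] = (log_prior, p_on)
    return model


def predict(model, fp):
    """Return the class with the highest log posterior for fingerprint fp."""
    def log_posterior(c):
        log_prior, p_on = model[c]
        return log_prior + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(fp, p_on)
        )
    return max(model, key=log_posterior)


# Toy data: 1 = neurotoxic, 0 = non-toxic (hypothetical labels).
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]
nb = train_bernoulli_nb(X, y)
print(predict(nb, [1, 0, 0]), predict(nb, [0, 0, 1]))  # 1 0
```

Working in log space keeps the product of many per-bit probabilities from underflowing when fingerprints have thousands of bits.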
17	Prediction of hERG potassium channel blockage using ensemble learning methods and molecular fingerprints. Toxicol Lett 2020;332:88-96. [PMID: 32629073 DOI: 10.1016/j.toxlet.2020.07.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/14/2020] [Revised: 06/16/2020] [Accepted: 07/02/2020] [Indexed: 11/30/2022]
Abstract: The human ether-a-go-go-related gene (hERG) encodes a tetrameric potassium channel called Kv11.1. Blockade of this channel by certain drugs can lead to long QT syndrome and thus to cardiotoxicity, which is a significant problem during drug development. Using computer models to predict compound cardiotoxicity in the early stages of drug design can help address this problem. In this study, a dataset of 1865 compounds with known hERG inhibitory activities was used as the training set. Thirty cardiotoxicity classification models were built using three machine learning algorithms based on molecular fingerprints and molecular descriptors. Using these models as base classifiers, a new cardiotoxicity classification model with better predictive performance was developed by ensemble learning. The best base classifier, generated using the XGBoost method with molecular descriptors, had an accuracy of 84.8% and an area under the receiver operating characteristic curve (AUC) of 0.876 in five-fold cross-validation. All of the ensemble models had higher predictive performance than the base classifiers in five-fold cross-validation, and the best performance was achieved by the Ensemble-Top7 model, with an accuracy of 84.9% and an AUC of 0.887. On external validation data, the ensemble model achieved an accuracy of 85.0% and an AUC of 0.786. Furthermore, several hERG-related substructures were identified, which provide valuable information for designing drug candidates.
Key Words: Ensemble model; Machine learning; Molecular descriptor; Molecular fingerprint; hERG
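The AUC values in the entry above can be computed from classifier scores with the rank-based (Mann-Whitney) formulation: the fraction of active/inactive pairs in which the active compound receives the higher score, counting ties as one half. The Python sketch below also averages the scores of several base classifiers as one simple way to form an ensemble; the abstract does not specify how the Ensemble-Top7 model combines its members, so the averaging scheme, the toy scores and all names here are assumptions.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_ensemble(score_lists):
    """Element-wise mean of the scores produced by several base classifiers."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


labels = [1, 0, 1, 0]          # 1 = hERG blocker, 0 = non-blocker (toy data)
base_a = [0.9, 0.8, 0.3, 0.1]
base_b = [0.7, 0.2, 0.6, 0.4]
print(roc_auc(labels, base_a))                               # 0.75
print(roc_auc(labels, average_ensemble([base_a, base_b])))   # 0.75
```

The pairwise loop is quadratic in the number of compounds; for large sets, sorting the scores and summing ranks gives the same value in O(n log n).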
18	Inclusion of molecular descriptors in predictive models improves pesticide soil-air partitioning estimates. CHEMOSPHERE 2020;248:126031. [PMID: 32032877 DOI: 10.1016/j.chemosphere.2020.126031] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/03/2019] [Revised: 01/23/2020] [Accepted: 01/24/2020] [Indexed: 06/10/2023]
Abstract: Soil-air exchange is one potential fate and exposure pathway for pesticides, and this process is generally thought to be governed by soil properties and environmental conditions. Experimental determination of the soil-air partitioning coefficient (Ksa) is laborious and costly, so Ksa values are typically predicted with a semiempirical or simple linear regression approach based on soil and environmental variables. Here we developed a model that combines linear regression on soil, environmental and molecular parameters with a quantitative structure-property relationship (QSPR) to predict Ksa for pesticides. Theoretical descriptors of the pesticides were calculated, and the best descriptors were selected with the Boruta algorithm. Seventy-six experimental logKsa values for 17 pesticides were used in model development. The combined MLR-QSPR model, a multiple linear regression (MLR) on a soil descriptor (organic carbon fraction), a physicochemical descriptor (octanol-air partitioning coefficient), environmental descriptors (temperature and humidity) and a molecular descriptor (Gmin, a 2D E-state molecular parameter), predicted logKsa better (adj. r² = 0.95) than MLR (adj. r² = 0.87) or QSPR (adj. r² = 0.82) alone. MLR-QSPR also performed best in five-fold cross-validation (adj. r² = 0.94) and test set verification (adj. r² = 0.96). The developed model was validated, and its applicability domain was characterized. The results show that the proposed MLR-QSPR approach is highly predictive and statistically robust, with >95% of predictions within ±0.5 log unit of the measured Ksa. This approach can therefore be used to estimate the soil-air partitioning of pesticides and to better predict their fate and transport in the environment.
Key Words: MLR-QSPR; Molecular descriptor; Pesticide; Soil-air partitioning
MESH Headings: Air Pollutants/chemistry; Algorithms; Humidity; Linear Models; Octanols/chemistry; Pesticides/chemistry; Quantitative Structure-Activity Relationship; Reproducibility of Results; Soil/chemistry; Soil Pollutants/chemistry; Temperature
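The adjusted r² values in the entry above penalize the ordinary r² for the number of predictors, which matters when comparing the five-predictor MLR-QSPR model with smaller models. A minimal Python sketch of the standard formula, adj. r² = 1 - (1 - r²)(n - 1)/(n - p - 1), with n observations and p predictors, follows; the function name is hypothetical and the numbers in the usage line are illustrative, not taken from the study.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted coefficient of determination for a linear model with an intercept."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Illustrative values only: r2 = 0.80 from 21 observations and 4 predictors.
print(adjusted_r2(0.80, 21, 4))  # approximately 0.75
```

For the same raw r², adding a predictor lowers the adjusted value, so a higher adjusted r² for the combined model is not an artifact of its extra terms alone.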
19	Prediction of chemical toxicity to Tetrahymena pyriformis with four-descriptor models. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2020;190:110146. [PMID: 31923753 DOI: 10.1016/j.ecoenv.2019.110146] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/09/2019] [Revised: 12/27/2019] [Accepted: 12/28/2019] [Indexed: 06/10/2023]
Abstract: A quantitative structure-toxicity relationship (QSTR) model based on four descriptors was developed for the toxicity of 1163 chemicals to Tetrahymena pyriformis using a general regression neural network (GRNN). A training set of 600 organic compounds was used to train GRNN models, which were evaluated on a test set of 563 compounds. For the optimal GRNN model, the training set gives a coefficient of determination R² of 0.86 and a root-mean-square (rms) error of 0.41, and the test set gives R² = 0.80 and rms = 0.41. The results indicate that the optimal GRNN model is accurate even though it uses only four descriptors and the test set is relatively large.
Key Words: General regression neural network; Molecular descriptor; Structure–property relationship; Tetrahymena pyriformis; Toxicity
MESH Headings: Neural Networks, Computer; Organic Chemicals/toxicity; Quantitative Structure-Activity Relationship; Tetrahymena pyriformis/drug effects; Tetrahymena pyriformis/physiology; Toxicity Tests
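A general regression neural network is, in essence, Gaussian-kernel (Nadaraya-Watson) regression: the prediction is a weighted average of the training targets, with weights exp(-D²/(2σ²)) that decay with the squared Euclidean distance D² to each training compound. The Python sketch below assumes this standard formulation with a single smoothing parameter σ; the study's descriptors, scaling and choice of σ are not given in the abstract, and the function name and toy data are hypothetical.

```python
import math


def grnn_predict(train_x, train_y, x, sigma):
    """GRNN / Nadaraya-Watson estimate at descriptor vector x."""
    d2 = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in train_x]
    shift = min(d2)  # subtracting the minimum avoids underflow of every weight
    weights = [math.exp(-(d - shift) / (2.0 * sigma ** 2)) for d in d2]
    return sum(w * yi for w, yi in zip(weights, train_y)) / sum(weights)


# Toy training set with four descriptors per compound (hypothetical values).
X = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
y = [0.0, 1.0]
print(grnn_predict(X, y, [0.5, 0.5, 0.5, 0.5], sigma=0.5))  # 0.5 (equidistant)
```

Training a GRNN amounts to storing the training set and tuning σ, typically on held-out data; small σ approaches nearest-neighbour behaviour, large σ approaches the global mean.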
20	Development of Natural Compound Molecular Fingerprint (NC-MFP) with the Dictionary of Natural Products (DNP) for natural product-based drug development. J Cheminform 2020;12:6. [PMID: 33431009 PMCID: PMC6977316 DOI: 10.1186/s13321-020-0410-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/16/2019] [Accepted: 01/11/2020] [Indexed: 12/21/2022]
Abstract: Computer-aided research on the relationship between the molecular structures of natural compounds (NCs) and their biological activities has been carried out extensively, because the molecular structures of new drug candidates are usually analogous to or derived from those of NCs. To express this relationship realistically by computer, it is essential to have a molecular descriptor set that adequately represents the characteristics of molecular structures in the NC chemical space. Although several topological descriptors have been developed to describe the physical, chemical, and biological properties of organic molecules, especially synthetic compounds, and have been widely used in drug discovery research, these descriptors have limitations in expressing NC-specific molecular structures. To overcome this, we developed a novel molecular fingerprint, the Natural Compound Molecular Fingerprint (NC-MFP), to describe NC structures related to biological activity and to apply it to natural product (NP)-based drug development. NC-MFP was developed to reflect the structural characteristics of NCs and the commonly used NP classification system. NC-MFP is a scaffold-based molecular fingerprint method comprising scaffolds, scaffold-fragment connection points (SFCP), and fragments, and its scaffolds have a hierarchical structure. In this study, we introduce 16 structural classes of NPs in the Dictionary of Natural Products (DNP) database, and the hierarchical scaffolds of each class were calculated using the Bemis-Murcko (BM) method. The NC-MFP scaffold library comprises 676 scaffolds. To compare how well NC-MFP represents the structural features of NCs relative to molecular fingerprints widely used for organic molecules, two binary classification tasks were performed. Task I classifies compounds in a commercially available library database as NCs or synthetic compounds. Task II classifies NCs tested for inhibitory activity against seven biological target proteins as active or inactive. Both tasks were performed with several molecular fingerprints, including NC-MFP, using the 1-nearest neighbor (1-NN) method. In task I, NC-MFP proved a practical fingerprint for classifying NC structures in the data set compared with the other molecular fingerprints. In task II, NC-MFP outperformed the other molecular fingerprints, suggesting that it is useful for explaining NC structures related to biological activities. In conclusion, NC-MFP is a robust molecular fingerprint for classifying NC structures and explaining their biological activities. We therefore suggest NC-MFP as a potent molecular descriptor for the virtual screening of NCs in natural product-based drug development.
Key Words: Dictionary of Natural Product database (DNP); Molecular descriptor; Natural compound (NC); Natural product (NP); Natural product-based drug development; Virtual screening
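The 1-NN classification used in both tasks of the entry above assigns a query compound the label of its most similar reference compound. The Python sketch below assumes fingerprints stored as sets of "on" bit indices and Tanimoto similarity as the metric, a common choice for binary fingerprints; the abstract does not state which similarity measure was used, and the bit patterns and function names are made up.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def one_nn(reference, query):
    """Return the label of the reference fingerprint most similar to the query."""
    _, best_label = max(reference, key=lambda item: tanimoto(item[0], query))
    return best_label


# Hypothetical reference set: (on-bit indices, label) pairs.
reference = [({1, 2, 3, 5}, "NC"), ({7, 8, 9}, "synthetic")]
print(one_nn(reference, {1, 2, 4}))  # NC
```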
21	When global and local molecular descriptors are more than the sum of its parts: Simple, But Not Simpler? Mol Divers 2019;24:913-932. [PMID: 31659696 DOI: 10.1007/s11030-019-10002-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/31/2019] [Accepted: 10/09/2019] [Indexed: 01/29/2023]
Abstract: In this report, we introduce a set of aggregation operators (AOs) for calculating global and local (group and atom type) molecular descriptors (MDs), as a generalization of the classical approach of encoding a molecule as the sum of atomic (or fragment) contributions. These AOs are implemented in a new, free software package called MD-LOVIs (http://tomocomd.com/md-lovis), which calculates MDs from atomic weight vectors and LOVIs (local vertex invariants). The software was developed in Java and uses the Chemistry Development Kit (CDK) library to handle chemical structures and calculate atomic weights. An analysis of the complexity of the algorithms presented herein shows that they were implemented efficiently. Calculation speed experiments show that MD-LOVIs performs satisfactorily compared with software such as PaDEL, CDKDescriptor, DRAGON and Bluecal. Shannon entropy (SE)-based variability studies show that MD-LOVIs yields indices with greater information content than those of popular academic and commercial software. A principal component analysis reveals that the approach captures chemical information orthogonal to that encoded by DRAGON, PaDEL and Mold2, as a result of several generalizations in MD-LOVIs not used in other programs. Lastly, three QSARs were built using multiple linear regression with genetic algorithms, and their statistical parameters show that MD-LOVIs indices obtained with AOs perform better than those obtained using the summation operator alone. The MD-LOVIs indices also yield models with comparable or superior performance to other QSAR methodologies reported in the literature, despite their simplicity. Collectively, these studies demonstrate that MD-LOVIs generates indices that are as simple as possible, but not simpler, and that the use of AOs increases the diversity of the encoded chemical information, which in turn improves the performance of traditional MDs.
Key Words: Aggregation operator; Atom weight vector; MD-LOVIs software; Molecular descriptor; No free lunch theorem; PCA; QSP(A)R; Shannon entropy
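The idea behind aggregation operators in the entry above can be shown in a few lines: a vector of atom-level values (for example LOVIs or atomic weights) is reduced to one descriptor by several operators rather than by summation alone, and local descriptors are obtained by first restricting the vector to one atom type. The operator set and function names in this Python sketch are a small hypothetical selection, not the set implemented in MD-LOVIs.

```python
import math
import statistics

# A hypothetical handful of aggregation operators; MD-LOVIs offers many more.
AGGREGATORS = {
    "sum": sum,
    "mean": statistics.fmean,
    "max": max,
    "min": min,
    "euclidean_norm": lambda v: math.sqrt(sum(x * x for x in v)),
    "std": statistics.pstdev,
}


def aggregate(atom_values, atom_types=None, group=None):
    """Global descriptors (group=None) or local ones restricted to one atom type."""
    if group is not None:
        atom_values = [v for v, t in zip(atom_values, atom_types) if t == group]
    return {name: op(atom_values) for name, op in AGGREGATORS.items()}


# Two toy molecules with identical sums but different atom-level distributions.
mol_a = aggregate([2.0, 2.0, 2.0])
mol_b = aggregate([1.0, 2.0, 3.0])
print(mol_a["sum"] == mol_b["sum"], mol_a["std"], mol_b["max"])  # True 0.0 3.0
```

The toy pair shows why summation alone can lose information: both molecules share the same sum, while the spread and maximum operators tell them apart.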
22	Estimating Some General Molecular Descriptors of Saturated Hydrocarbons. Mol Inform 2019;38:e1900007. [PMID: 31589808 DOI: 10.1002/minf.201900007] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/11/2019] [Accepted: 03/13/2019] [Indexed: 11/06/2022]
Abstract: Three general molecular descriptors, namely the general sum-connectivity index, the general Platt index and the ordinary generalized geometric-arithmetic index, are studied here. Best-possible bounds on these descriptors for arbitrary saturated hydrocarbons are derived under certain constraints. These bounds are expressed in terms of the number of carbon atoms and the number of carbon-carbon bonds of the hydrocarbon.
Key Words: Molecular descriptor; Topological index; general Platt index; general sum-connectivity index; ordinary generalized geometric-arithmetic index
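The three descriptors in the entry above are sums over the C-C bonds of the hydrogen-suppressed graph. The Python sketch below assumes the definitions commonly used for them (general sum-connectivity χ_α = Σ (d_u + d_v)^α, general Platt index Pl_α = Σ (d_u + d_v - 2)^α, and ordinary generalized geometric-arithmetic index OGA_k = Σ (2√(d_u d_v)/(d_u + d_v))^k, with sums over bonds uv and vertex degrees d); the abstract does not restate them, and the function names are hypothetical.

```python
import math


def degrees(n_atoms, bonds):
    """Vertex degrees of a hydrogen-suppressed molecular graph."""
    deg = [0] * n_atoms
    for u, v in bonds:
        deg[u] += 1
        deg[v] += 1
    return deg


def general_sum_connectivity(n_atoms, bonds, alpha):
    d = degrees(n_atoms, bonds)
    return sum((d[u] + d[v]) ** alpha for u, v in bonds)


def general_platt(n_atoms, bonds, alpha):
    # For alpha < 0 this raises ZeroDivisionError on ethane, where d_u + d_v - 2 = 0.
    d = degrees(n_atoms, bonds)
    return sum((d[u] + d[v] - 2) ** alpha for u, v in bonds)


def ordinary_generalized_ga(n_atoms, bonds, k):
    d = degrees(n_atoms, bonds)
    return sum((2 * math.sqrt(d[u] * d[v]) / (d[u] + d[v])) ** k for u, v in bonds)


# n-butane as a carbon skeleton: 4 carbon atoms, 3 C-C bonds.
butane = (4, [(0, 1), (1, 2), (2, 3)])
print(general_sum_connectivity(*butane, 1), general_platt(*butane, 1))  # 10 4
```

Passing the atom count and bond list mirrors the abstract's parameters (number of carbon atoms and of carbon-carbon bonds), against which the derived bounds are stated.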
23	In silico prediction of drug-induced developmental toxicity by using machine learning approaches. Mol Divers 2019;24:1281-1290. [PMID: 31486961 DOI: 10.1007/s11030-019-09991-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/20/2019] [Accepted: 08/28/2019] [Indexed: 02/05/2023]
Abstract: Some drugs and xenobiotics can disturb homeostasis, normal growth, differentiation, development or behavior during prenatal development or postnatally until puberty. Assessment of developmental toxicity is one of the important safety considerations of international regulatory agencies. In this investigation, seven machine learning methods (naïve Bayes, support vector machine, recursive partitioning, k-nearest neighbor, C4.5 decision tree, random forest and AdaBoost) were used to build binary classification models for developmental toxicity. Among these models, the naïve Bayes classifier showed the best predictive performance and stability, with 91.11% overall prediction accuracy, 91.50% balanced accuracy and an MCC of 0.818 for the training set, and 83.93% concordance, 81.85% balanced accuracy and an MCC of 0.627 for the test set. The applicability domain was analyzed, and only one chemical in the test set was identified as outside it. In addition, 10 important molecular descriptors related to developmental toxicity were selected by a genetic algorithm, which may help explain the mechanisms of developmental toxicants. The best naïve Bayes classification model should be employed as an alternative method for qualitative prediction of chemical-induced developmental toxicity in the early stages of drug development.
Key Words: Developmental toxicity; Genetic algorithm; In silico prediction; Machine learning; Molecular descriptor
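The balanced accuracy and Matthews correlation coefficient (MCC) reported in the entry above are both computed from the binary confusion matrix. A minimal Python sketch with the standard formulas follows; the function names are hypothetical and the counts in the usage line are invented for illustration.

```python
import math


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented counts: 40 TP, 45 TN, 5 FP, 10 FN.
print(balanced_accuracy(40, 45, 5, 10), round(mcc(40, 45, 5, 10), 3))
```

Unlike plain accuracy, both measures stay informative when toxic and non-toxic compounds are unevenly represented.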
24	BCL::Mol2D-a robust atom environment descriptor for QSAR modeling and lead optimization. J Comput Aided Mol Des 2019;33:477-486. [PMID: 30955193 PMCID: PMC6824857 DOI: 10.1007/s10822-019-00199-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/15/2018] [Accepted: 03/18/2019] [Indexed: 12/28/2022]
Abstract: Comparing fragment-based molecular fingerprints of drug-like molecules is one of the most robust and frequently used approaches in computer-assisted drug discovery. Molprint2D, a popular atom environment (AE) descriptor, yielded the best enrichment of active compounds across a diverse set of targets in a recent large-scale study. We present here BCL::Mol2D descriptors, which outperformed Molprint2D on nine PubChem datasets spanning a wide range of protein classes. Because BCL::Mol2D records the number of AEs from a universal AE library, a novel feature of BCL::Mol2D relative to Molprint2D is its reversibility. This property enables predictions from machine learning models to be decomposed into contributions from particular molecular substructures. Artificial neural networks with dropout trained on BCL::Mol2D descriptors outperform those trained on Molprint2D descriptors by up to 26% in the logAUC metric. When combined with the Reduced Short Range descriptor set, our previously published set of descriptors optimized for QSARs, BCL::Mol2D yields a modest improvement. Finally, we demonstrate how the reversibility of BCL::Mol2D enables visualization of a 'pharmacophore map' that could guide lead optimization for serine/threonine kinase 33 inhibitors.
Key Words: Cheminformatics; Molecular descriptor; Pharmacophore mapping; QSAR; Sensitivity analysis
MESH Headings: Algorithms; Drug Design; Drug Discovery/methods; Humans; Ligands; Quantitative Structure-Activity Relationship; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology
Grants: R01 GM099842 (NIGMS NIH HHS); R01 DK097376 (NIDDK NIH HHS); R01 DA046138 (NIDA NIH HHS); R01 MH090192 (NIMH NIH HHS); S10 OD023680 (NIH HHS)
25	Alkanes with the First Three Maximal/Minimal Modified First Zagreb Connection Indices. Mol Inform 2019;38:e1800116. [PMID: 30614630 DOI: 10.1002/minf.201800116] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 09/15/2018] [Accepted: 11/01/2018] [Indexed: 11/11/2022]
Abstract: The modified first Zagreb connection index (ZC1*) is a molecular descriptor that first appeared in a formula for the total electron energy of alternant hydrocarbons in 1972. In a recent paper [A. Ali, N. Trinajstić, A novel/old modification of the first Zagreb index, Mol. Inform. 37 (2018) 1800008], it was observed that ZC1* correlates well with the entropy and acentric factor of octane isomers. In this article, the molecules with the first three maximal and the first three minimal ZC1* values are determined within the family of all alkanes with n carbon atoms, for n ≥ 6. This extends the main results of the aforementioned paper.
Key Words: Alkanes; Extremal values; First Zagreb index; Modified first Zagreb connection index; Molecular descriptor
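For a hydrogen-suppressed alkane graph, ZC1* can be computed from the vertex degrees d_v and the connection numbers τ_v (the number of vertices at distance exactly two from v). The Python sketch below assumes the definition from the cited Ali-Trinajstić paper in the form ZC1* = Σ_v d_v τ_v, equivalently the sum of τ_u + τ_v over all bonds uv, as we understand it; the function names and the two C6 examples are illustrative.

```python
def connection_numbers(adj):
    """tau(v): number of vertices at distance exactly 2 from v."""
    tau = {}
    for v, nbrs in adj.items():
        second = set()
        for u in nbrs:
            second |= adj[u]
        tau[v] = len(second - nbrs - {v})
    return tau


def modified_first_zagreb_connection(adj):
    """ZC1* = sum over vertices of degree(v) * tau(v) (assumed form)."""
    tau = connection_numbers(adj)
    return sum(len(adj[v]) * tau[v] for v in adj)


def from_bonds(bonds):
    """Build adjacency sets from a list of C-C bonds."""
    adj = {}
    for u, v in bonds:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


hexane = from_bonds([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])
dimethylbutane = from_bonds([(0, 1), (1, 2), (1, 3), (1, 4), (4, 5)])  # 2,2-dimethylbutane
print(modified_first_zagreb_connection(hexane),
      modified_first_zagreb_connection(dimethylbutane))  # 14 20
```

Enumerating all alkane trees for a given n and ranking them with this function is one way to check extremal results numerically for small n.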
26	Mordred: a molecular descriptor calculator. J Cheminform 2018;10:4. [PMID: 29411163 PMCID: PMC5801138 DOI: 10.1186/s13321-018-0258-y] [Citation(s) in RCA: 425] [Impact Index Per Article: 70.8] [Received: 06/26/2017] [Accepted: 01/23/2018] [Indexed: 01/05/2023]
Abstract: Molecular descriptors are widely employed to represent molecular characteristics in cheminformatics, and various molecular-descriptor-calculation programs have been developed. However, users of those programs must contend with several issues, including software bugs, infrequent updates, and software licensing constraints. To address these issues, we propose Mordred, a descriptor-calculation software application that can calculate more than 1800 two- and three-dimensional descriptors. It is freely available via GitHub. Mordred can be easily installed and used from the command line, as a web application, or as a highly flexible Python package on all major platforms (Windows, Linux, and macOS). Performance benchmarks show that Mordred is at least twice as fast as the well-known PaDEL-Descriptor, and it can calculate descriptors for large molecules, which other software cannot. Owing to its good performance, convenience, number of descriptors, and permissive license, Mordred is a promising choice of molecular descriptor calculation software for cheminformatics studies, such as those on quantitative structure–property relationships.
Key Words: Calculation software; Cheminformatics; Molecular descriptor; Python; QSPR
27	Clustering pesticides according to their molecular properties, fate, and effects by considering additional ecotoxicological parameters in the TyPol method. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2018;25:4728-4738. [PMID: 29197062 DOI: 10.1007/s11356-017-0758-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/24/2017] [Accepted: 11/14/2017] [Indexed: 05/05/2023]
Abstract: Understanding the fate and ecotoxicological effects of pesticides largely depends on their molecular properties. We recently developed "TyPol" (Typology of Pollutants), a classification method for organic compounds based on statistical analyses. It combines several environmental parameters (sorption coefficient, degradation half-life) and one ecotoxicological parameter (bioconcentration factor) with structural molecular descriptors (number of atoms in the molecule, molecular surface, dipole moment, orbital energies, etc.). The present study attempts to extend TyPol to the ecotoxicological effects of pesticides on non-target organisms, based on an analysis of data from the available literature and databases. The analysis revealed that, compared with aquatic organisms, relevant ecotoxicological endpoints are lacking for terrestrial organisms (e.g., soil microorganisms, invertebrates) that support a range of ecosystem services. Ecotoxicological parameters were also less available for chronic than for acute endpoints. Consequently, seven parameters were included for acute (EC50, LC50) and chronic (NOEC) ecotoxicological effects on one terrestrial organism (Eisenia sp.) and three aquatic organisms (Daphnia sp., algae, Lemna sp.). In this new configuration, TyPol was used to classify 50 pesticides into clusters that gather molecules with similar environmental behaviors and ecotoxicological effects. The classification results revealed relationships between molecular descriptors, environmental parameters, and the added ecotoxicological endpoints. This proof-of-concept study also showed that TyPol in silico classification can successfully address new scientific questions and be expanded with other parameters of interest.
Key Words: Clustering; Ecotoxicity; Fate; Molecular descriptor; Pesticides
MESH Headings: Animals; Chlorophyta/drug effects; Cluster Analysis; Daphnia/drug effects; Ecosystem; Ecotoxicology/methods; Environmental Monitoring/methods; Environmental Pollutants/chemistry; Environmental Pollutants/classification; Environmental Pollutants/toxicity; Lethal Dose 50; Oligochaeta/drug effects; Pesticides/chemistry; Pesticides/classification; Pesticides/toxicity; Toxicity Tests
28	Stem Cell-Based Methods to Predict Developmental Chemical Toxicity. Methods Mol Biol 2018;1800:475-483. [PMID: 29934906 DOI: 10.1007/978-1-4939-7899-1_21] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Abstract: Human pluripotent stem cells, such as embryonic stem (ES) and induced pluripotent stem (iPS) cells, combined with sophisticated bioinformatics methods, are powerful tools for predicting developmental chemical toxicity. Because cell differentiation is not necessary, these cells enable cost-effective assays and thus provide a practical system for assessing the toxicity of various types of chemicals. Here we describe how to apply machine learning techniques to different types of data for toxic chemicals, such as qRT-PCR data, gene networks, and molecular descriptors, and how to integrate these data to predict toxicity categories. Interestingly, our results for 20 chemicals, comprising neurotoxins (NTs), genotoxic carcinogens (GCs), and nongenotoxic carcinogens (NGCs), showed that the highest and most robust prediction performance was obtained with gene networks as the input. We also observed that qRT-PCR data and molecular descriptors tend to contribute to specific toxicity categories.
Key Words: Chemical toxicity prediction; Developmental effect; Embryonic stem cell; Gene network; Molecular descriptor; Multi-kernel support vector machine
29	Prediction of aquatic toxicity of benzene derivatives using molecular descriptor from atomic weighted vectors. ENVIRONMENTAL TOXICOLOGY AND PHARMACOLOGY 2017;56:314-321. [PMID: 29091819 DOI: 10.1016/j.etap.2017.10.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/26/2017] [Revised: 10/09/2017] [Accepted: 10/11/2017] [Indexed: 06/07/2023]
Abstract: Several descriptors derived from atom weighted vectors are used to predict the aquatic toxicity of a set of 392 benzene derivatives to the ciliate protozoan Tetrahymena pyriformis, expressed as log(IGC50)^-1. These descriptors are calculated with the MD-LOVIs software, and various aggregation operators are examined in order to compare their performance in predicting aquatic toxicity. Variability analysis, based on an information theory algorithm, is used to quantify the information content of these molecular descriptors. Multiple linear regression with genetic algorithms is used to obtain models of the structure-toxicity relationships; the best model, with six variables, gives Q² = 0.830 and R² = 0.837. These models compare favorably with previously published models that use the same data set. The results suggest that these descriptors provide an effective alternative for determining the aquatic toxicity of benzene derivatives.
Key Words: Aggregation operator; Aquatic toxicity; Atom weighted vector; Molecular descriptor; Multiple linear regression; Variability analysis
MESH Headings: Algorithms; Benzene Derivatives/toxicity; Models, Molecular; Software; Tetrahymena pyriformis/drug effects; Water Pollutants, Chemical/toxicity
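Q² in the entry above is a cross-validated analogue of R²: each compound is predicted by a model fitted without it, and Q² = 1 - PRESS/SS_tot, where PRESS is the sum of squared prediction errors. The Python sketch below assumes leave-one-out cross-validation, the usual convention, and uses a single-descriptor linear model for brevity, whereas the study used six-variable multiple linear regression; the function names and data are toy values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def q2_loo(xs, ys):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    return 1.0 - press / sum((y - mean_y) ** 2 for y in ys)


# Toy descriptor/toxicity pairs (hypothetical values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 2.0, 4.0, 5.0]
print(round(q2_loo(xs, ys), 3))
```

For least-squares models, Q² cannot exceed the fitted R², so a Q² close to R² (as with 0.830 vs. 0.837 above) suggests little overfitting.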
30	Machine learning-based models to predict modes of toxic action of phenols to Tetrahymena pyriformis. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2017;28:735-747. [PMID: 29022372 DOI: 10.1080/1062936x.2017.1376705] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/26/2017] [Accepted: 09/01/2017] [Indexed: 06/07/2023]
Abstract: Phenols are structurally heterogeneous pollutants with a variety of modes of toxic action (MOA), including polar narcotics, weak acid respiratory uncouplers, pro-electrophiles, and soft electrophiles. Because it is often difficult to determine the mechanism of action of a compound correctly, quantitative structure-activity relationship (QSAR) methods, which have proved useful in toxicity prediction, can be used. In this work, several QSAR models for predicting the MOA of 221 phenols to the ciliated protozoan Tetrahymena pyriformis from Chemistry Development Kit descriptors are reported. Four machine learning (ML) techniques (k-nearest neighbours, support vector machine, classification trees, and artificial neural networks) were used to develop models with high accuracy and predictive capability for distinguishing between the four MOAs. The models showed global accuracies between 95.9% and 97.7% and areas under the receiver operating characteristic (ROC) curve between 0.978 and 0.998; false alarm rates were below 8.2% for the training set. The models were validated by 10-fold cross-validation and with an external test set, performing well in all cases. Compared with models previously reported by other researchers, these ML models showed a significant improvement.
Key Words: Molecular descriptor; QSAR; machine learning technique; mode of toxic action; phenol derivative; pollutant
MESH Headings: Antiprotozoal Agents/pharmacology; Machine Learning; Neural Networks, Computer; Phenols/pharmacology; Quantitative Structure-Activity Relationship; Tetrahymena pyriformis/drug effects
31	In vivo toxicity of nitroaromatics: A comprehensive quantitative structure-activity relationship study. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY 2017;36:2227-2233. [PMID: 28169452 DOI: 10.1002/etc.3761] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 08/25/2016] [Revised: 11/01/2016] [Accepted: 02/06/2017] [Indexed: 06/06/2023]
Abstract: Toxicity data for 90 nitroaromatic compounds, expressed as the 50% lethal dose in rats (LD50), were analyzed to develop quantitative structure-activity relationship (QSAR) models. Quantum-chemically calculated descriptors, together with molecular descriptors generated by the DRAGON, PaDEL and HiT-QSAR software, were used to build the models. Model quality and validity were assessed by internal and external validation techniques. The results show that the toxicity of nitroaromatic compounds depends on various factors, such as the number of nitro groups, the topological state and the presence of certain structural fragments. The models, based on the largest dataset of in vivo nitroaromatic toxicity to date, showed good predictive ability. The results could be applied in a preliminary assessment of the toxicity of nitroaromatic compounds to mammals. © 2017 SETAC.
Key Words: Molecular descriptor; Nitroaromatic; Quantitative structure-activity relationships; Toxic effects; Toxicity mechanisms
MESH Headings: Animals; Environmental Pollutants/chemistry; Environmental Pollutants/toxicity; Lethal Dose 50; Models, Theoretical; Nitrobenzenes/chemistry; Nitrobenzenes/toxicity; Predictive Value of Tests; Quantitative Structure-Activity Relationship; Rats; Software
32	Deciphering molecular properties and docking studies of hepatitis C and non-hepatitis C antiviral inhibitors - A computational approach. Life Sci 2017;174:8-14. [PMID: 28259653 DOI: 10.1016/j.lfs.2017.02.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/14/2016] [Revised: 02/24/2017] [Accepted: 02/28/2017] [Indexed: 11/19/2022]
Abstract: BACKGROUND Hepatitis C is an infectious liver disease with a high mortality rate, caused by the hepatitis C virus (HCV). Several treatments have been applied against this virus, including interferons, vaccines and direct-acting antivirals (DAAs). Of these, DAAs show promising effects in HCV treatment with fewer adverse effects. Specifically, DAAs target the non-structural proteins NS3 and NS5B. PURPOSE The objective of the present study is to propose an alternative HCV inhibitor from among other available antivirals. METHODS 2D molecular descriptors were computed for the selected antiviral inhibitors, and the descriptor features were clustered. Closely clustered compounds were then subjected to interaction (docking) studies against the HCV target proteins to validate the clustering result. RESULTS AND DISCUSSION Clustering showed that indinavir (an HIV inhibitor) and AT-130 (an HBV inhibitor) lie close to the HCV inhibitor. Indinavir complexed with the NS3 protein shows a binding energy of -5.33 kcal/mol, and AT-130 complexed with the NS5B protein shows a binding energy of -8.87 kcal/mol. The docking study indicated better binding affinity than for the other viral inhibitors. CONCLUSION From the descriptor-based feature similarity analysis and the interaction study, it can be concluded that indinavir and AT-130 could be potential alternative agents for HCV treatment.
Key Words: Antiviral inhibitors; Binding energy; Docking studies; Hepatitis C virus; Hierarchical clustering; Molecular descriptor
MESH Headings: Antiviral Agents/chemistry; Antiviral Agents/metabolism; Hepacivirus/drug effects; Hepatitis C/drug therapy; Hepatitis C/virology; Humans; Models, Molecular; Molecular Docking Simulation; Protease Inhibitors/chemistry; Protease Inhibitors/metabolism; Viral Proteins/chemistry; Viral Proteins/metabolism
33	How frequently do clusters occur in hierarchical clustering analysis? A graph theoretical approach to studying ties in proximity. J Cheminform 2016;8:4. [PMID: 26816532 PMCID: PMC4727313 DOI: 10.1186/s13321-016-0114-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/18/2015] [Accepted: 01/08/2016] [Indexed: 11/24/2022]
Abstract: Background Hierarchical cluster analysis (HCA) is a widely used classification technique in many areas of science. Applications usually yield a dendrogram from an HCA run over a given data set, using a grouping algorithm and a similarity measure. However, even when these parameters are fixed, ties in proximity (i.e. two clusters equidistant from a third one) may produce several different dendrograms with different possible clustering patterns (different classifications). This situation is usually disregarded and conclusions are based on a single result, which raises the question of whether clusters persist across all the resulting dendrograms; this happens, for example, when HCA is used to group molecular descriptors in order to select the less similar ones in QSAR studies. Results Representing dendrograms in graph theoretical terms allowed us to introduce four measures of cluster frequency in a canonical way and to use them to calculate cluster frequencies over the set of all possible dendrograms, taking all ties in proximity into account. A toy example of well-separated clusters, as well as a set of 1666 molecular descriptors calculated for a group of molecules with hepatotoxic activity, was used to show how these functions may be used to study the effect of ties in HCA. These functions are not restricted to the tie case; their use for deriving cluster stability measures on arbitrary sets of dendrograms with the same leaves (e.g. dendrograms obtained by varying HCA parameters) is discussed. Ties were found to occur frequently, some yielding tens of thousands of dendrograms, even for small data sets. Conclusions Our approach was able to detect trends in clustering patterns by offering a simple way of measuring their frequency, which is often very low. This would imply that inferences and models based on descriptor classifications (e.g. QSAR) are likely to be biased and require an assessment of their reliability. Moreover, any classification of molecular descriptors is likely to be far from unique. Our results highlight the need to evaluate the effect of ties on clustering patterns before classification results can be used accurately.
34	Effect of imidazolium-based ionic liquids on bacterial growth inhibition investigated via experimental and QSAR modelling studies. JOURNAL OF HAZARDOUS MATERIALS 2015;297:198-206. [PMID: 25965417 DOI: 10.1016/j.jhazmat.2015.04.082] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 03/30/2015] [Revised: 04/28/2015] [Accepted: 04/29/2015] [Indexed: 06/04/2023]
Abstract: Tuning the characteristics of solvents to fit industrial requirements has become a major interest in both academic and industrial communities, notably in the field of room temperature ionic liquids (RTILs), which are considered one of the most promising green alternatives to molecular organic solvents. In this work, several sets of imidazolium-based ionic liquids were synthesized, and their toxicity towards four human pathogenic bacteria was assessed to investigate how tunability affects this property. The toxicity of RTILs bearing an amino acid anion was also examined. EC50 values (50% effective concentration) were determined, and significant variations were observed: although all the ILs studied contained an imidazolium moiety, their toxicity ranged from 0.05 mM for the most toxic to 85.57 mM for the least toxic. Linear quantitative structure-activity relationship (QSAR) models were then developed using the charge density distribution (σ-profiles) as molecular descriptors, yielding accuracies as high as 95%.
Key Words: Antimicrobial activity (EC50); Ionic liquids; Molecular descriptor; QSAR; Synthesis
MESH Headings: Aeromonas hydrophila/drug effects; Aeromonas hydrophila/growth & development; Anions; Anti-Infective Agents/chemistry; Cations; Escherichia coli/drug effects; Escherichia coli/growth & development; Gentamicins/chemistry; Imidazoles/chemistry; Ionic Liquids/chemistry; Linear Models; Listeria monocytogenes/drug effects; Listeria monocytogenes/growth & development; Microbial Sensitivity Tests; Quantitative Structure-Activity Relationship; Regression Analysis; Reproducibility of Results; Solvents/chemistry; Staphylococcus aureus/drug effects; Staphylococcus aureus/growth & development
35	QSAR prediction of HIV-1 protease inhibitory activities using docking derived molecular descriptors. J Theor Biol 2015;369:13-22. [PMID: 25600056 DOI: 10.1016/j.jtbi.2015.01.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 09/01/2014] [Revised: 01/10/2015] [Accepted: 01/12/2015] [Indexed: 01/30/2023]
Abstract: This study reports the application of a new hybrid docking-quantitative structure-activity relationship (QSAR) methodology to model and predict the HIV-1 protease inhibitory activities of a series of newly synthesized compounds. This hybrid approach can provide valuable information about the chemical and structural features of the ligands that most affect their inhibitory activities. Docking was used to determine the conformations of the compounds in the active site of HIV-1 protease, and molecular descriptors were then calculated from these conformations. Multiple linear regression (MLR) and least squares support vector machine (LS-SVM) were used to build the QSAR models. The statistical parameters of the LS-SVM model are better than those of the MLR model, which indicates some non-linear relationships between the selected molecular descriptors and the anti-HIV activities of the compounds studied. The correlation coefficient (R), root mean square error (RMSE) and average absolute error (AAE) for LS-SVM are R = 0.988, RMSE = 0.207 and AAE = 0.145 for the training set, and R = 0.965, RMSE = 0.403 and AAE = 0.338 for the test set. A leave-one-out cross-validation test was used to assess the predictive power and validity of the models, giving cross-validation correlation coefficients of 0.864 and 0.850 and standardized predicted relative error sum of squares (SPRESS) values of 0.553 and 0.581 for the LS-SVM and MLR models, respectively.
Key Words: AutoDock; Hybrid docking; Molecular descriptor; Quantitative structure activity relationship
36	Predicting network of drug-enzyme interaction based on machine learning method. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2013;1844:214-23. [PMID: 23907006 DOI: 10.1016/j.bbapap.2013.07.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/01/2012] [Revised: 07/16/2013] [Accepted: 07/18/2013] [Indexed: 12/11/2022]
Abstract: In modern drug research, it is important to map drugs and enzymes to their possible interaction network correctly and efficiently. In this work, a novel approach was introduced that encodes drug molecules with physicochemical molecular descriptors and enzyme molecules with pseudo amino acid composition. Based on this encoding, Random Forest was used to build the drug-enzyme interaction network. Feature selection retained 129 optimal features representing the main factors of drug-enzyme interaction; these fall into nine categories: Elemental Analysis, Geometry, Chemistry, Amino Acid Composition, Secondary Structure, Polarity, Molecular Volume, Codon Diversity and Electrostatic Charge. Geometry features were found to be the most important of all. The prediction model achieved a Matthews correlation coefficient (MCC) of 0.915 and a sensitivity of 87.9% at a specificity of 99.8% in 10-fold cross-validation, and an MCC of 0.895 and a sensitivity of 95.7% at a specificity of 95.4% on the independent test set. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai.
Key Words: CfsSubset; Drug–enzyme interaction; Machine learning; Molecular descriptor; Pseudo amino acid composition; Random Forest
37	Prediction of boiling points of organic compounds by QSPR tools. J Mol Graph Model 2013;44:113-9. [PMID: 23792208 DOI: 10.1016/j.jmgm.2013.04.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 01/08/2013] [Accepted: 04/24/2013] [Indexed: 10/26/2022]
Abstract: Novel electronegativity-based topological descriptors, YC and WC, were derived from molecular structure using the equilibrium electronegativity of atoms and the relative bond lengths within the molecule. Quantitative structure-property relationships (QSPR) relating YC, WC and the path number parameter P3 to the normal boiling points of 80 alkanes, 65 unsaturated hydrocarbons and 70 alcohols were obtained separately. The quality of the prediction models was supported by the coefficient of determination (R²), the standard error (S), the average absolute error (AAE) and the predictive parameters Q²ext, R²CV and R²m. Based on the regression equations, the influences of carbon backbone length, molecular size, the degree of branching and the role of functional groups on the normal boiling point were analyzed. Comparison with reference models showed that the novel topological descriptors based on the equilibrium electronegativity of atoms and the relative bond length are useful molecular descriptors for predicting the normal boiling points of organic compounds.
Key Words: Equilibrium electro-negativity; Molecular descriptor; Normal boiling point; Organic compound; Quantitative structure–property relationship (QSPR)
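Several of the abstracts in this list report the same family of regression validation statistics for descriptor-based QSAR/QSPR models: R² and Q² for the six-variable MLR model of entry 29, R, RMSE and AAE with leave-one-out cross-validation for entry 35, and R², AAE and R²CV for entry 37. The Python sketch below shows one common way to compute these quantities for an ordinary least squares model. It is a generic illustration, not code from any of the cited papers: the function names (`fit_ols`, `predict`, `regression_metrics`) are hypothetical, the toy data are invented, and Q²_LOO is taken as 1 − PRESS/SS_tot with SS_tot computed about the mean of the full response set, which is only one of several conventions in use.

```python
import math


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    X is a list of rows (one row of descriptor values per compound) and y the
    observed responses. Solves the normal equations (A^T A) b = A^T y by
    Gaussian elimination; assumes the descriptor matrix has full column rank.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # leading 1 = intercept
    n, p = len(A), len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    v = [sum(A[k][i] * y[k] for k in range(n)) for i in range(p)]
    for col in range(p):
        # Partial pivoting: move the row with the largest pivot into place.
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (v[i] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b


def predict(b, row):
    """Predicted response for one compound."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))


def regression_metrics(X, y):
    """Training-set R^2, RMSE, AAE and leave-one-out Q^2 for an OLS model."""
    n = len(y)
    b = fit_ols(X, y)
    y_hat = [predict(b, row) for row in X]
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # PRESS: refit without compound i, then predict compound i.
    press = 0.0
    for i in range(n):
        b_i = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(b_i, X[i])) ** 2
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "AAE": sum(abs(yi - fi) for yi, fi in zip(y, y_hat)) / n,
        "Q2_LOO": 1.0 - press / ss_tot,
    }


if __name__ == "__main__":
    # Hypothetical toy data: one descriptor per compound, invented responses.
    X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(regression_metrics(X, y))
```

For least squares, each leave-one-out residual is at least as large in magnitude as the corresponding training residual, so PRESS ≥ SS_res and Q²_LOO can never exceed R²; a large gap between the two is a common warning sign of overfitting. For real descriptor matrices, a numerically stabler solver such as `numpy.linalg.lstsq` would usually be preferable to forming the normal equations explicitly.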