1. Wang X, Li F, Chen J, Teng Y, Ji C, Wu H. Critical features identification for chemical chronic toxicity based on mechanistic forecast models. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2022; 307:119584. [PMID: 35688391 DOI: 10.1016/j.envpol.2022.119584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Revised: 05/03/2022] [Accepted: 06/03/2022] [Indexed: 06/15/2023]
Abstract
With billions of tons of pollutants entering the ocean each year, aquatic toxicity is becoming a crucial endpoint for evaluating the adverse effects of chemicals on ecosystems. Notably, a large number of toxic chemicals can cause adverse effects at environmentally relevant doses. However, data on the chronic aquatic toxicity of chemicals are much scarcer, especially at the population level. Rotifers are highly sensitive to toxicants even at chronic low doses, and their communities are usually considered effective indicators of the status of aquatic ecosystems. Therefore, the no observed effect concentration (NOEC) for the population abundance of rotifers was selected as the endpoint to develop machine learning models for predicting the chronic aquatic toxicity of chemicals. In this study, forty-eight binary models were built by combining eight types of chemical descriptors with six machine learning algorithms. The best binary model was the 1D & 2D molecular descriptors - random trees (RT) model, with high balanced accuracy (BA) (0.83 for the training set and 0.83 for the validation set) and Matthews correlation coefficient (MCC) (0.72 for the training set and 0.67 for the validation set). Moreover, the optimal model identified the primary factors (SpMAD_Dzp, AMW, MATS2v) and filtered out three highly alerting substructures [c1cc(Cl)cc1, CNCO, CCOP(=S)(OCC)O] influencing chronic aquatic toxicity. These results showed that compounds with low molecular volume, high polarity and molecular weight could contribute to adverse effects on rotifers, facilitating a deeper understanding of chronic toxicity mechanisms. In addition, the forecast models performed better than the common models embedded in the ECOSAR software. This study provided insights into the structural features responsible for the toxicity of different groups of chemicals and thereby allowed for the rational design of greener and safer alternatives.
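The two metrics reported in this abstract, balanced accuracy (BA) and the Matthews correlation coefficient (MCC), can both be derived from a binary confusion matrix. The following is a minimal Python sketch using the standard textbook definitions; the function names and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a toxic / non-toxic classifier (not from the paper).
example_ba = balanced_accuracy(tp=40, fp=10, tn=40, fn=10)
example_mcc = matthews_corrcoef(tp=40, fp=10, tn=40, fn=10)
```

Both metrics are less sensitive to class imbalance than plain accuracy, which is presumably why they are reported for binary toxicity models of this kind.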
Affiliation(s)
- Xiaoqing Wang
  - CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Key Laboratory of Coastal Environmental Processes, YICCAS, Yantai, 264003, PR China; University of Chinese Academy of Sciences, Beijing, 100049, PR China
- Fei Li
  - CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Key Laboratory of Coastal Environmental Processes, YICCAS, Yantai, 264003, PR China; Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, 266071, PR China
- Jingwen Chen
  - Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Linggong Road 2, Dalian, 116024, China
- Yuefa Teng
  - CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Key Laboratory of Coastal Environmental Processes, YICCAS, Yantai, 264003, PR China; University of Chinese Academy of Sciences, Beijing, 100049, PR China
- Chenglong Ji
  - CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Key Laboratory of Coastal Environmental Processes, YICCAS, Yantai, 264003, PR China; Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, 266071, PR China
- Huifeng Wu
  - CAS Key Laboratory of Coastal Environmental Processes and Ecological Remediation, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Key Laboratory of Coastal Environmental Processes, YICCAS, Yantai, 264003, PR China; Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, 266071, PR China

2. Mamada H, Nomura Y, Uesawa Y. Novel QSAR Approach for a Regression Model of Clearance That Combines DeepSnap-Deep Learning and Conventional Machine Learning. ACS OMEGA 2022; 7:17055-17062. [PMID: 35647436 PMCID: PMC9134387 DOI: 10.1021/acsomega.2c00261] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/13/2022] [Accepted: 04/29/2022] [Indexed: 05/03/2023]
Abstract
The toxicity, absorption, distribution, metabolism, and excretion properties of some targets are difficult to predict by quantitative structure-activity relationship analysis. Therefore, there is a need for a new prediction method that performs well for these targets. The aim of this study was to develop a new regression model of rat clearance (CL). We constructed a regression model using 1545 in-house compounds for which we had rat CL data. Molecular descriptors were calculated using Molecular Operating Environment (MOE), alvaDesc, and ADMET Predictor software. A classification model based on DeepSnap and Deep Learning (DeepSnap-DL), with images of the three-dimensional chemical structures of compounds as features, was constructed, and the prediction probabilities for each compound were calculated. For molecular descriptor-based methods that use molecular descriptors and conventional machine learning algorithms selected by DataRobot, the correlation coefficient (R²) and root mean square error (RMSE) were 0.625-0.669 and 0.295-0.318, respectively. We combined molecular descriptors and the prediction probability of DeepSnap-DL as features and developed a novel regression method that we called the combination model. In the combination model with these two types of features and conventional algorithms selected by DataRobot, R² and RMSE were 0.710-0.769 and 0.247-0.278, respectively. This finding shows that the combination model performed better than molecular descriptor-based methods. Our combination model will contribute to the design of more rational compounds for drug discovery. This method may be applicable not only to rat CL but also to other pharmacokinetic and pharmacological activity and toxicity parameters; therefore, applying it to other parameters may help to accelerate drug discovery.
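The regression metrics quoted in this abstract can be illustrated with a short Python sketch. It assumes that R² denotes the usual coefficient of determination (the abstract calls it a correlation coefficient, so this reading is an assumption) and that RMSE is the standard root mean square error; the function names and data are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up values standing in for measured and predicted clearance.
observed_cl = [1.0, 2.0, 3.0, 4.0]
predicted_cl = [1.1, 1.9, 3.2, 3.8]
example_rmse = rmse(observed_cl, predicted_cl)
example_r2 = r_squared(observed_cl, predicted_cl)
```

A better model shows a higher R² together with a lower RMSE, which is the pattern the abstract reports for the combination model.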
Affiliation(s)
- Hideaki Mamada
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose, Tokyo 204-8588, Japan
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yukihiro Nomura
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose, Tokyo 204-8588, Japan
  - Phone: +81-42-495-8983. Fax: +81-42-495-8983

3. Ajjarapu SM, Tiwari A, Ramteke PW, Singh DB, Kumar S. Ligand-based drug designing. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00018-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open

4. Mamada H, Nomura Y, Uesawa Y. Prediction Model of Clearance by a Novel Quantitative Structure-Activity Relationship Approach, Combination DeepSnap-Deep Learning and Conventional Machine Learning. ACS OMEGA 2021; 6:23570-23577. [PMID: 34549154 PMCID: PMC8444299 DOI: 10.1021/acsomega.1c03689] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/13/2021] [Accepted: 08/23/2021] [Indexed: 05/19/2023]
Abstract
Some targets predicted by machine learning (ML) in drug discovery remain a challenge because of poor prediction. In this study, a new prediction model was developed and rat clearance (CL) was selected as a target because it is difficult to predict. A classification model was constructed using 1545 in-house compounds with rat CL data. The molecular descriptors calculated by Molecular Operating Environment (MOE), alvaDesc, and ADMET Predictor software were used to construct the prediction model. In conventional ML using 100 descriptors and random forest selected by DataRobot, the area under the curve (AUC) and accuracy (ACC) were 0.883 and 0.825, respectively. Conversely, the prediction model using DeepSnap and Deep Learning (DeepSnap-DL) with compound features as images had AUC and ACC of 0.905 and 0.832, respectively. We combined the two models (conventional ML and DeepSnap-DL) to develop a novel prediction model. Using the ensemble model with the mean of the predicted probabilities from each model improved the evaluation metrics (AUC = 0.943 and ACC = 0.874). In addition, a consensus model using the results of the agreement between classifications had an increased ACC (0.959). These combination models with a high level of predictive performance can be applied to rat CL as well as other pharmacokinetic parameters, pharmacological activity, and toxicity prediction. Therefore, these models will aid in the design of more rational compounds for the development of drugs.
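The two ways of combining models described in this abstract can be sketched in Python as follows. This is one plausible reading only: the ensemble averages the predicted probabilities of the two models, and the consensus keeps a classification only where both models agree. The 0.5 threshold, the function names and the probabilities are assumptions for illustration.

```python
def ensemble_mean(probs_a, probs_b, threshold=0.5):
    """Classify each compound from the mean of two predicted probabilities."""
    return [int((a + b) / 2 >= threshold) for a, b in zip(probs_a, probs_b)]


def consensus(probs_a, probs_b, threshold=0.5):
    """Return the shared label where both models agree, and None otherwise."""
    labels = []
    for a, b in zip(probs_a, probs_b):
        label_a = int(a >= threshold)
        label_b = int(b >= threshold)
        labels.append(label_a if label_a == label_b else None)
    return labels


# Hypothetical probabilities from a descriptor-based model and an image-based model.
descriptor_probs = [0.9, 0.2, 0.6, 0.4]
image_probs = [0.8, 0.1, 0.3, 0.7]
ensemble_labels = ensemble_mean(descriptor_probs, image_probs)
consensus_labels = consensus(descriptor_probs, image_probs)
```

Under this reading, the consensus accuracy would be computed only over compounds with a non-None label, so a higher accuracy comes at the cost of leaving some compounds unclassified.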
Affiliation(s)
- Hideaki Mamada
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose-shi, Tokyo 204-8588, Japan
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yukihiro Nomura
  - Drug Metabolism and Pharmacokinetics Research Laboratories, Central Pharmaceutical Research Institute, Japan Tobacco Inc., 1-1, Murasaki-cho, Takatsuki, Osaka 569-1125, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 2-522-1, Noshio, Kiyose-shi, Tokyo 204-8588, Japan
  - Tel.: +81-42-495-8983. Fax: +81-42-495-8983

5. Titov IY, Stroylov VS, Rusina P, Svitanko IV. Preliminary modelling as the first stage of targeted organic synthesis. RUSSIAN CHEMICAL REVIEWS 2021. [DOI: 10.1070/rcr5012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
The review aims to present a classification and applicability analysis of methods for preliminary molecular modelling for targeted organic, catalytic and biocatalytic synthesis. The following three main approaches are considered as a primary classification of the methods: modelling of the target-ligand coordination without structural information on both the target and the resulting complex; calculations based on experimentally obtained structural information about the target; and dynamic simulation of the target-ligand complex and the reaction mechanism with calculation of the free energy of the reaction. The review is meant for synthetic chemists to be used as a guide for building an algorithm for preliminary modelling and synthesis of structures with specified properties.
The bibliography includes 353 references.

6. Evaluation of Quantitative Structure Property Relationship Algorithms for Predicting Plasma Protein Binding in Humans. Computational Toxicology 2021; 17:100142. [PMID: 34017929 DOI: 10.1016/j.comtox.2020.100142] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/21/2022]
Abstract
The extent of plasma protein binding is an important compound-specific property that influences a compound's pharmacokinetic behavior and is a critical input parameter for predicting exposure in physiologically based pharmacokinetic (PBPK) modeling. When experimentally determined fraction unbound in plasma (fup) data are not available, quantitative structure-property relationship (QSPR) models can be used for prediction. Because available QSPR models were developed based on training sets containing pharmaceutical-like compounds, we compared their prediction accuracy for environmentally relevant and pharmaceutical compounds. Fup values were calculated using the models of Ingle et al. and Watanabe et al. and ADMET Predictor (Simulations Plus). The test set included 818 pharmaceutical and environmentally relevant compounds with fup values ranging from 0.01 to 1. Overall, the three QSPR models resulted in over-prediction of fup for highly binding compounds and under-prediction for low or moderately binding compounds. For highly binding compounds (0.01 ≤ fup ≤ 0.25), Watanabe et al. performed better, with a lower mean absolute error (MAE) of 6.7% and a lower mean absolute relative prediction error (RPE) of 171.7%, than the other methods. For low to moderately binding compounds, both Ingle et al. and ADMET Predictor performed better than Watanabe et al., with superior MAE and RPE values. The positive polar surface area, the number of basic functional groups and lipophilicity were the most important chemical descriptors for predicting fup. This study demonstrated that the prediction of fup was the most uncertain for highly binding compounds. This suggested that QSPR-predicted fup values should be used with caution in PBPK modeling.
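The gap between a small MAE and a large relative error for highly binding compounds can be illustrated with a short Python sketch. It assumes that the relative prediction error is |predicted - observed| / observed expressed as a percentage; the abstract does not give the formula, so this definition, the function names and the fup values are all assumptions.

```python
def mean_absolute_error(observed, predicted):
    """Mean of |predicted - observed| on the fup scale (0 to 1)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mean_relative_prediction_error(observed, predicted):
    """Mean of |predicted - observed| / observed, in percent (assumed definition)."""
    relative = [abs(p - o) / o * 100.0 for o, p in zip(observed, predicted)]
    return sum(relative) / len(relative)


# Hypothetical compounds with the same absolute error of about 0.03.
highly_bound_obs, highly_bound_pred = [0.02], [0.05]
weakly_bound_obs, weakly_bound_pred = [0.50], [0.53]
rpe_high = mean_relative_prediction_error(highly_bound_obs, highly_bound_pred)
rpe_low = mean_relative_prediction_error(weakly_bound_obs, weakly_bound_pred)
```

Because fup is small for highly binding compounds, the same absolute error yields a much larger relative error there, which is consistent with the abstract's point that predictions are most uncertain for that group.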

7.
Abstract
At the end of her academic career, the author summarizes the main aspects of QSAR modeling, giving comments and suggestions according to her 23 years' experience in QSAR research on environmental topics. The focus is mainly on Multiple Linear Regression, particularly Ordinary Least Squares, using a Genetic Algorithm for variable selection from various theoretical molecular descriptors, but the comments can be useful also for other QSAR methods. The need for rigorous validation, also external, and for applicability domain check to guarantee predictivity and reliability of QSAR models is particularly highlighted. The commented approach is the “predictive” one, based on chemometrics, and is usefully applied to the prioritization of environmental pollutants. All the discussed points and the author's ideas are implemented in the software QSARINS, as a legacy to the QSAR community.

8. Radadiya A, Pickett JA. Characterizing human odorant signals: insights from insect semiochemistry and in silico modelling. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190263. [PMID: 32306882 DOI: 10.1098/rstb.2019.0263] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022] Open
Abstract
Interactions relating to human chemical signalling, although widely acknowledged, are relatively poorly characterized chemically, except for human axillary odour. However, the extensive chemical ecology of insects, involving countless pheromone and other semiochemical identifications, may offer insights into overcoming problems of characterizing human-derived semiochemicals more widely. Current techniques for acquiring insect semiochemicals are discussed, particularly in relation to the need for samples to relate, as closely as possible, to the ecological situation in which they are naturally deployed. Analysis is facilitated by chromatography coupled to electrophysiological preparations from the olfactory organs of insects in vivo. This is not feasible with human olfaction, but there are now potential approaches using molecular genetically reconstructed olfactory preparations already in use with insect systems. There are specific insights of value for characterizing human semiochemicals from advanced studies on semiochemicals of haematophagous insects, which include those involving human hosts, in addition to wider studies on farm and companion animals. The characterization of the precise molecular properties recognized in olfaction could lead to new advances in analogue design and a range of novel semiochemicals for human benefit. There are insights from successful synthetic biology studies on insect semiochemicals using novel biosynthetic precursors. Already, wider opportunities in olfaction emerging from in silico studies, involving a range of theoretical and computational approaches to molecular design and understanding olfactory systems at the molecular level, are showing promise for studying human semiochemistry. This article is part of the Theo Murphy meeting issue 'Olfactory communication in humans'.
Affiliation(s)
- Ashish Radadiya
  - School of Chemistry, Cardiff University, Cardiff CF10 3AT, UK
- John A Pickett
  - School of Chemistry, Cardiff University, Cardiff CF10 3AT, UK

9. Golbraikh A. Value of p-Value. Mol Inform 2019; 38:e1800152. [PMID: 31188542 DOI: 10.1002/minf.201800152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2018] [Accepted: 05/07/2019] [Indexed: 11/09/2022]
Abstract
The goal of this manuscript is to discuss important aspects of external validation of classification and category Quantitative Structure-Activity/Property/Toxicity Relationship (QS/A/P/T/R) models that, to the best of the author's knowledge, are not addressed in publications. Statistical significance (in terms of p-value) and accuracy of prediction (in terms of Correct Classification Rate (CCR)) of external validation set compounds are among the most important characteristics of the models. We assert that in most cases the models built for a classification or category response variable should be statistically significant and predictive for each class or category. We show that three thresholds of the number of compounds in each class or category of the external validation sets should be satisfied. 1) The p-value criterion can never be satisfied if the number of compounds is below the first threshold. 2) If the number of compounds is between the first and the second thresholds, the p-value criterion should be used. 3) If it is higher than the third threshold, the classification or category accuracy criterion should be used. 4) If the number of compounds is between the second and third thresholds, either one or the other criterion should be used depending on the value of the p-value. 5) When the number of compounds in the class approaches infinity, the maximum relative error of prediction approaches the relative expected error. The results are of interest in other areas of multidimensional data analysis.
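The idea behind the first threshold can be illustrated with a one-sided binomial test, sketched below in Python. This is an assumption made for illustration: the abstract does not state which test the author uses, and the chance level of 0.5, the significance level of 0.05 and the function names are all hypothetical.

```python
from math import comb


def binomial_p_value(n_correct, n_total, p_chance=0.5):
    """Probability of at least n_correct correct predictions out of n_total by chance."""
    return sum(
        comb(n_total, k) * p_chance**k * (1 - p_chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


def smallest_significant_class_size(alpha=0.05, p_chance=0.5):
    """Smallest class size at which even perfect prediction gives p < alpha."""
    n = 1
    while p_chance**n >= alpha:
        n += 1
    return n


# With 4 compounds, even 4 out of 4 correct cannot reach p < 0.05 under this test.
p_four = binomial_p_value(4, 4)
p_five = binomial_p_value(5, 5)
first_threshold = smallest_significant_class_size()
```

Under these assumptions a class needs at least five compounds before any result can be significant at the 0.05 level, which mirrors the abstract's statement that the p-value criterion can never be satisfied below a minimum class size.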
Affiliation(s)
- Alexander Golbraikh
  - Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, CB #7360, Chapel Hill, NC 27599

10. Heterologous biosynthesis of triterpenoid ambrein in engineered Escherichia coli. Biotechnol Lett 2017; 40:399-404. [DOI: 10.1007/s10529-017-2483-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/04/2017] [Accepted: 11/16/2017] [Indexed: 01/12/2023]

11. Tromelin A, Chabanet C, Audouze K, Koensgen F, Guichard E. Multivariate statistical analysis of a large odorants database aimed at revealing similarities and links between odorants and odors. FLAVOUR FRAG J 2017. [DOI: 10.1002/ffj.3430] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/15/2022]
Affiliation(s)
- Anne Tromelin
  - UMR CSGA: CNRS, INRA; Université de Bourgogne Franche-Comté; 21000 Dijon France
- Claire Chabanet
  - UMR CSGA: CNRS, INRA; Université de Bourgogne Franche-Comté; 21000 Dijon France
- Karine Audouze
  - MTi, Sorbonne Paris Cité; Université Paris Diderot; INSERM UMR-S 973 75013 Paris France
- Florian Koensgen
  - UMR CSGA: CNRS, INRA; Université de Bourgogne Franche-Comté; 21000 Dijon France
- Elisabeth Guichard
  - UMR CSGA: CNRS, INRA; Université de Bourgogne Franche-Comté; 21000 Dijon France

12. Castillo-González D, Mergny JL, De Rache A, Pérez-Machado G, Cabrera-Pérez MA, Nicolotti O, Introcaso A, Mangiatordi GF, Guédin A, Bourdoncle A, Garrigues T, Pallardó F, Cordeiro MNDS, Paz-y-Miño C, Tejera E, Borges F, Cruz-Monteagudo M. Harmonization of QSAR Best Practices and Molecular Docking Provides an Efficient Virtual Screening Tool for Discovering New G-Quadruplex Ligands. J Chem Inf Model 2015; 55:2094-110. [DOI: 10.1021/acs.jcim.5b00415] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Affiliation(s)
- Daimel Castillo-González
  - ARNA Laboratory, IECB, University of Bordeaux, F-33600 Pessac, France
  - ARNA Laboratory, INSERM, U869, F-33000 Bordeaux, France
- Jean-Louis Mergny
  - ARNA Laboratory, IECB, University of Bordeaux, F-33600 Pessac, France
  - ARNA Laboratory, INSERM, U869, F-33000 Bordeaux, France
- Aurore De Rache
  - ARNA Laboratory, IECB, University of Bordeaux, F-33600 Pessac, France
  - ARNA Laboratory, INSERM, U869, F-33000 Bordeaux, France
- Gisselle Pérez-Machado
  - Molecular Simulation and Drug Design Group, Centro de Bioactivos Químicos (CBQ), Central University of Las Villas, Santa Clara, Villa Clara 54830, Cuba
  - Department of Physiology, Faculty of Medicine, University of Valencia, Valencia 46010, Valencia, Spain
  - Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Burjassot 46100, Valencia, Spain
- Miguel Angel Cabrera-Pérez
  - Molecular Simulation and Drug Design Group, Centro de Bioactivos Químicos (CBQ), Central University of Las Villas, Santa Clara, Villa Clara 54830, Cuba
  - Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Burjassot 46100, Valencia, Spain
  - Department of Engineering, Area of Pharmacy and Pharmaceutical Technology, Miguel Hernández University, 03550 Sant Joan d’Alacant, Alicante, Alicante, Spain
- Orazio Nicolotti
  - Dipartimento di Farmacia-Scienze, Università degli Studi di Bari “Aldo Moro”, Via Orabona 4, 70125 Bari, Bari, Italy
- Antonellina Introcaso
  - Dipartimento di Farmacia-Scienze, Università degli Studi di Bari “Aldo Moro”, Via Orabona 4, 70125 Bari, Bari, Italy
- Giuseppe Felice Mangiatordi
  - Dipartimento di Farmacia-Scienze, Università degli Studi di Bari “Aldo Moro”, Via Orabona 4, 70125 Bari, Bari, Italy
- Aurore Guédin
  - ARNA Laboratory, IECB, University of Bordeaux, F-33600 Pessac, France
  - ARNA Laboratory, INSERM, U869, F-33000 Bordeaux, France
- Anne Bourdoncle
  - ARNA Laboratory, IECB, University of Bordeaux, F-33600 Pessac, France
  - ARNA Laboratory, INSERM, U869, F-33000 Bordeaux, France
- Teresa Garrigues
  - Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Burjassot 46100, Valencia, Spain
- Federico Pallardó
  - Department of Physiology, Faculty of Medicine, University of Valencia, Valencia 46010, Valencia, Spain
- Cesar Paz-y-Miño
  - Instituto de Investigaciones Biomédicas (IIB), Universidad de Las Américas, 170513 Quito, Pichincha, Ecuador
- Eduardo Tejera
  - Instituto de Investigaciones Biomédicas (IIB), Universidad de Las Américas, 170513 Quito, Pichincha, Ecuador
- Maykel Cruz-Monteagudo
  - Instituto de Investigaciones Biomédicas (IIB), Universidad de Las Américas, 170513 Quito, Pichincha, Ecuador

13. Persuy MA, Sanz G, Tromelin A, Thomas-Danguin T, Gibrat JF, Pajot-Augy E. Mammalian olfactory receptors: molecular mechanisms of odorant detection, 3D-modeling, and structure-activity relationships. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2014; 130:1-36. [PMID: 25623335 DOI: 10.1016/bs.pmbts.2014.11.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/21/2022]
Abstract
This chapter describes the main characteristics of olfactory receptor (OR) genes of vertebrates, including generation of this large multigenic family and pseudogenization. OR genes are compared in relation to evolution and among species. OR gene structure and selection of a given gene for expression in an olfactory sensory neuron (OSN) are tackled. The specificities of OR proteins, their expression, and their function are presented. The expression of OR proteins in locations other than the nasal cavity is regulated by different mechanisms, and ORs display various additional functions. A conventional olfactory signal transduction cascade is observed in OSNs, but individual ORs can also mediate different signaling pathways, through the involvement of other molecular partners and depending on the odorant ligand encountered. ORs are engaged in constitutive dimers. Ligand binding induces conformational changes in the ORs that regulate their level of activity depending on odorant dose. When present, odorant binding proteins induce an allosteric modulation of OR activity. Since no 3D structure of an OR has yet been resolved, modeling has to be performed using the closest G-protein-coupled receptor 3D structures available, to facilitate virtual ligand screening using the models. The study of odorant binding modes and affinities may infer best-bet OR ligands, to be subsequently checked experimentally. The relationships between the spatial and steric features of odorants and their activity in terms of perceived odor quality are also fields of research that the development of computing tools may enhance.
Affiliation(s)
- Marie-Annick Persuy
  - INRA UR 1197 NeuroBiologie de l'Olfaction, Domaine de Vilvert, Jouy-en-Josas, France
- Guenhaël Sanz
  - INRA UR 1197 NeuroBiologie de l'Olfaction, Domaine de Vilvert, Jouy-en-Josas, France
- Anne Tromelin
  - INRA UMR 1129 Flaveur, Vision et Comportement du Consommateur, Dijon, France
- Jean-François Gibrat
  - INRA UR1077 Mathématique Informatique et Génome, Domaine de Vilvert, Jouy-en-Josas, France
- Edith Pajot-Augy
  - INRA UR 1197 NeuroBiologie de l'Olfaction, Domaine de Vilvert, Jouy-en-Josas, France

14. Weidlich IE, Pevzner Y, Miller BT, Filippov IV, Woodcock HL, Brooks BR. Development and implementation of (Q)SAR modeling within the CHARMMing web-user interface. J Comput Chem 2014; 36:62-7. [PMID: 25362883 DOI: 10.1002/jcc.23765] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/12/2014] [Revised: 10/03/2014] [Accepted: 10/10/2014] [Indexed: 11/07/2022]
Abstract
The recent availability of large publicly accessible databases of chemical compounds and their biological activities (PubChem, ChEMBL) has inspired us to develop a web-based tool for structure-activity relationship and quantitative structure-activity relationship modeling to add to the services provided by CHARMMing (www.charmming.org). This new module implements some of the most recent advances in modern machine learning algorithms: Random Forest, Support Vector Machine, Stochastic Gradient Descent, Gradient Tree Boosting, and so forth. A user can import training data from PubChem BioAssay data collections directly from our interface or upload his or her own SD files, which contain structures and activity information, to create new models (either categorical or numerical). A user can then track the model generation process and run models on new data to predict activity.
Affiliation(s)
- Iwona E Weidlich
  - Computational Drug Design Systems (CODDES) LLC, Rockville, Maryland, 20852; Laboratory of Computational Biology, NIH, National Heart, Lung, and Blood Institute, Rockville, Maryland, 20852

15. Gunturi SB, Ramamurthi N. A novel approach to generate robust classification models to predict developmental toxicity from imbalanced datasets. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2014; 25:711-727. [PMID: 25102768 DOI: 10.1080/1062936x.2014.942357] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 06/03/2023]
Abstract
Computational models to predict the developmental toxicity of compounds are built on imbalanced datasets wherein the toxicants outnumber the non-toxicants. Consequently, the results are biased towards the majority class (toxicants). To overcome this problem and to obtain sensitive but also accurate classifiers, we followed an integrated approach wherein (i) the Synthetic Minority Over-sampling Technique (SMOTE) is used for re-sampling, (ii) a genetic algorithm (GA) is used for variable selection and (iii) a support vector machine (SVM) is used for model development. The best model, M3, has (i) sensitivity (SE) = 85.54% and specificity (SP) = 85.62% in leave-one-out validation, (ii) classification accuracy of the training set = 99.67%, (iii) classification accuracy of the test set = 92.59%; and (iv) sensitivity = 92.68% and specificity = 92.31% on the test set. Consensus prediction based on models M3-M5 improved these percentages by 5% over M3. From the analysis of results we infer that data imbalance in toxicity studies can be effectively addressed by the application of re-sampling techniques.
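The re-sampling step named in this abstract can be sketched in a few lines of Python. This is a simplified illustration of the general SMOTE idea (a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours), not the authors' implementation; the function name, the parameters and the toy descriptor vectors are assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Requires at least two minority samples, each given as a tuple of floats.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random position between x and the chosen neighbour.
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Toy two-descriptor vectors for the minority class (non-toxicants in the abstract).
minority_class = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_samples = smote_sketch(minority_class, n_new=5)
```

In practice the synthetic samples would be added to the training set only, so that the test set still reflects the original class distribution.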
Affiliation(s)
- S B Gunturi
  - Innovation Labs Hyderabad, Tata Consultancy Services Limited, Madhapur, Hyderabad, India

16. Leahy DE, Sykora V. Automation of decision making in drug design. DRUG DISCOVERY TODAY. TECHNOLOGIES 2014; 10:e437-41. [PMID: 24179997 DOI: 10.1016/j.ddtec.2013.02.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]

17. Kim M, Sedykh A, Chakravarti SK, Saiakhov RD, Zhu H. Critical evaluation of human oral bioavailability for pharmaceutical drugs by using various cheminformatics approaches. Pharm Res 2014; 31:1002-14. [PMID: 24306326 PMCID: PMC3955412 DOI: 10.1007/s11095-013-1222-1] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 08/02/2013] [Accepted: 09/30/2013] [Indexed: 01/07/2023]
Abstract
PURPOSE Oral bioavailability (%F) is a key factor that determines the fate of a new drug in clinical trials. Traditionally, %F is measured using costly and time-consuming experimental tests. Developing computational models to evaluate the %F of new drugs before they are synthesized would be beneficial in the drug discovery process. METHODS We employed Combinatorial Quantitative Structure-Activity Relationship approach to develop several computational %F models. We compiled a %F dataset of 995 drugs from public sources. After generating chemical descriptors for each compound, we used random forest, support vector machine, k nearest neighbor, and CASE Ultra to develop the relevant QSAR models. The resulting models were validated using five-fold cross-validation. RESULTS The external predictivity of %F values was poor (R(2) = 0.28, n = 995, MAE = 24), but was improved (R(2) = 0.40, n = 362, MAE = 21) by filtering unreliable predictions that had a high probability of interacting with MDR1 and MRP2 transporters. Furthermore, classifying the compounds according to the %F values (%F < 50% as "low", %F ≥ 50% as 'high") and developing category QSAR models resulted in an external accuracy of 76%. CONCLUSIONS In this study, we developed predictive %F QSAR models that could be used to evaluate new drug compounds, and integrating drug-transporter interactions data greatly benefits the resulting models.
Affiliation(s)
- Marlene Kim
- Department of Chemistry, Rutgers University, Camden, New Jersey 08102
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102
- Alexander Sedykh
- Division of Chemical Biology and Medicinal Chemistry, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Hao Zhu
- Department of Chemistry, Rutgers University, Camden, New Jersey 08102
- The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102
|
18
|
Mitchell JBO. Machine learning methods in chemoinformatics. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2014; 4:468-481. [PMID: 25285160 PMCID: PMC4180928 DOI: 10.1002/wcms.1183] [Citation(s) in RCA: 238] [Impact Index Per Article: 23.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure-activity relationships (QSAR), many others exist in the technical literature. This discussion is methods-based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k-Nearest Neighbors and naïve Bayes classifiers.
|
19
|
Bonet I, Franco-Montero P, Rivero V, Teijeira M, Borges F, Uriarte E, Morales Helguera A. Classifier ensemble based on feature selection and diversity measures for predicting the affinity of A(2B) adenosine receptor antagonists. J Chem Inf Model 2013; 53:3140-55. [PMID: 24289249 DOI: 10.1021/ci300516w] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
A(2B) adenosine receptor antagonists may be beneficial in treating diseases like asthma, diabetes, diabetic retinopathy, and certain cancers. This has stimulated research for the development of potent ligands for this subtype, based on quantitative structure-affinity relationships. In this work, a new ensemble machine learning algorithm is proposed for classification and prediction of the ligand-binding affinity of A(2B) adenosine receptor antagonists. This algorithm is based on the training of different classifier models with multiple training sets (composed of the same compounds but represented by diverse features). The k-nearest neighbor, decision trees, neural networks, and support vector machines were used as single classifiers. To select the base classifiers for combining into the ensemble, several diversity measures were employed. The final multiclassifier prediction results were computed from the output obtained by using a combination of selected base classifiers output, by utilizing different mathematical functions including the following: majority vote, maximum and average probability. In this work, 10-fold cross- and external validation were used. The strategy led to the following results: i) the single classifiers, together with previous features selections, resulted in good overall accuracy, ii) a comparison between single classifiers, and their combinations in the multiclassifier model, showed that using our ensemble gave a better performance than the single classifier model, and iii) our multiclassifier model performed better than the most widely used multiclassifier models in the literature. The results and statistical analysis demonstrated the supremacy of our multiclassifier approach for predicting the affinity of A(2B) adenosine receptor antagonists, and it can be used to develop other QSAR models.
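The three combination functions named in the abstract (majority vote, maximum probability, average probability) can be sketched as follows. This is a generic illustration of those standard fusion rules, not the authors' ensemble; the `combine` function, its argument layout, and the toy probabilities are assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence


def combine(prob_rows: Sequence[Sequence[float]], rule: str) -> int:
    """Fuse per-classifier class probabilities into one predicted class index.

    prob_rows[i][c] is the probability classifier i assigns to class c.
    rule is one of 'majority', 'maximum', or 'average'.
    """
    n_classes = len(prob_rows[0])
    if rule == "majority":
        # Each classifier votes for its own most probable class.
        votes = [max(range(n_classes), key=lambda c: row[c]) for row in prob_rows]
        return Counter(votes).most_common(1)[0][0]
    if rule == "maximum":
        # Per class, keep the highest probability given by any classifier.
        scores: List[float] = [max(row[c] for row in prob_rows) for c in range(n_classes)]
    elif rule == "average":
        # Per class, average the probabilities over all classifiers.
        scores = [sum(row[c] for row in prob_rows) / len(prob_rows) for c in range(n_classes)]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(range(n_classes), key=lambda c: scores[c])


# Toy outputs of three base classifiers for one compound: [P(inactive), P(active)].
rows = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
by_majority = combine(rows, "majority")
by_maximum = combine(rows, "maximum")
by_average = combine(rows, "average")
```

In this toy case two of three classifiers vote for class 1, while the one confident classifier dominates the maximum and average rules, so the rules can disagree on the same inputs.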
Affiliation(s)
- Isis Bonet
- Escuela de Ingeniería de Antioquia, Envigado, 055428 Antioquia, Colombia
|
20
|
Zhang L, Fourches D, Sedykh A, Zhu H, Golbraikh A, Ekins S, Clark J, Connelly MC, Sigal M, Hodges D, Guiguemde A, Guy RK, Tropsha A. Discovery of novel antimalarial compounds enabled by QSAR-based virtual screening. J Chem Inf Model 2013; 53:475-92. [PMID: 23252936 DOI: 10.1021/ci300421n] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Quantitative structure-activity relationship (QSAR) models have been developed for a data set of 3133 compounds defined as either active or inactive against P. falciparum. Because the data set was strongly biased toward inactive compounds, different sampling approaches were employed to balance the ratio of actives versus inactives, and models were rigorously validated using both internal and external validation approaches. The balanced accuracy for assessing the antimalarial activities of 70 external compounds was between 87% and 100% depending on the approach used to balance the data set. Virtual screening of the ChemBridge database using QSAR models identified 176 putative antimalarial compounds that were submitted for experimental validation, along with 42 putative inactives as negative controls. Twenty five (14.2%) computational hits were found to have antimalarial activities with minimal cytotoxicity to mammalian cells, while all 42 putative inactives were confirmed experimentally. Structural inspection of confirmed active hits revealed novel chemical scaffolds, which could be employed as starting points to discover novel antimalarial agents.
Affiliation(s)
- Liying Zhang
- The Laboratory for Molecular Modeling, Eshelman School of Pharmacy, CB# 7568, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
|
21
|
Muratov EN, Varlamova EV, Artemenko AG, Polishchuk PG, Kuz'min VE. Existing and Developing Approaches for QSAR Analysis of Mixtures. Mol Inform 2012; 31:202-21. [PMID: 27477092 DOI: 10.1002/minf.201100129] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2011] [Accepted: 02/04/2012] [Indexed: 11/10/2022]
Abstract
This review is devoted to the critical analysis of advantages and disadvantages of existing mixture descriptors and their usage in various QSAR/QSPR tasks. We describe good practices for the QSAR modeling of mixtures, data sources for mixtures, a discussion of various mixture descriptors and their application, recommendations about proper external validation specific for mixture QSAR modeling, and future perspectives of this field. The biggest problem in QSAR of mixtures is the lack of reliable data about the mixtures' properties. Various mixture descriptors are used for the modeling of different endpoints. However, these descriptors have certain disadvantages, such as applicability only to 1 : 1 binary mixtures, and additive nature. The field of QSAR of mixtures is still under development, and existing efforts could be considered as a foundation for future approaches and studies. The usage of non-additive mixture descriptors, which are sensitive to interaction effects, in combination with best practices of QSAR model development (e.g., thorough data collection and curation, rigorous external validation, etc.) will significantly improve the quality of QSAR studies of mixtures.
Affiliation(s)
- Eugene N Muratov
- Laboratory of Theoretical Chemistry, Department of Molecular Structure, A. V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa 65080, Ukraine tel: +380487662394, fax: +380487662394
- Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products, Eshelman School of Pharmacy, University of North Carolina, Beard Hall 301, CB#7568, Chapel Hill, NC, 27599, USA tel: +19199663459, fax: +19199660204
- Ekaterina V Varlamova
- Laboratory of Theoretical Chemistry, Department of Molecular Structure, A. V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa 65080, Ukraine tel: +380487662394, fax: +380487662394
- Anatoly G Artemenko
- Laboratory of Theoretical Chemistry, Department of Molecular Structure, A. V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa 65080, Ukraine tel: +380487662394, fax: +380487662394
- Pavel G Polishchuk
- Laboratory of Theoretical Chemistry, Department of Molecular Structure, A. V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa 65080, Ukraine tel: +380487662394, fax: +380487662394
- Victor E Kuz'min
- Laboratory of Theoretical Chemistry, Department of Molecular Structure, A. V. Bogatsky Physical Chemical Institute, National Academy of Sciences of Ukraine, Lustdorfskaya Doroga 86, Odessa 65080, Ukraine tel: +380487662394, fax: +380487662394
|
22
|
Scotti L, Tullius Scotti M, de Oliveira Lima E, Sobral da Silva M, do Carmo Alves de Lima M, da Rocha Pitta I, Olímpio de Moura R, Gonzaga Batista de Oliveira J, Duarte da Cruz RM, Bezerra Mendonça FJ. Experimental methodologies and evaluations of computer-aided drug design methodologies applied to a series of 2-aminothiophene derivatives with antifungal activities. Molecules 2012; 17:2298-315. [PMID: 22367025 PMCID: PMC6269054 DOI: 10.3390/molecules17032298] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2011] [Revised: 02/03/2012] [Accepted: 02/21/2012] [Indexed: 11/16/2022] Open
Abstract
Fifty 2-[(arylidene)amino]-4,5-cycloalkyl[b]thiophene-3-carbonitrile derivatives were screened for their in vitro antifungal activities against Candida krusei and Cryptococcus neoformans. Based on experimentally determined minimum inhibitory concentration (MIC) values, we conducted computer-aided drug design studies [molecular modelling, chemometric tools (CPCA, PCA, PLS) and QSAR-3D] that enable the prediction of three-dimensional structural characteristics that influence the antifungal activities of these derivatives. These predictions provide direction with regard to the syntheses of new derivatives with improved biological activities, which can be used as therapeutic alternatives for the treatment of fungal infections.
Affiliation(s)
- Luciana Scotti
- Centro de Biotecnologia, Universidade Federal da Paraíba, João Pessoa 50670-910, PB, Brazil; (M.S.S.)
- Authors to whom correspondence should be addressed: (L.S.); (F.J.B.M.J.); Tel.: +55-83-3191-1528 (L.S.); Tel.: +55-83-9924-1423 (F.J.B.M.J.); Fax: +55-83-3223-1128
- Marcus Tullius Scotti
- Departamento de Engenharia e Meio Ambiente, Universidade Federal da Paraíba, Campus IV, Rio Tinto 58297-000, PB, Brazil;
- Edeltrudes de Oliveira Lima
- Laboratório Micologia, Departamento de Ciências Farmacêuticas, Centro de Ciências da Saúde, Universidade Federal da Paraíba, João Pessoa 50670-910, PB, Brazil;
- Marcelo Sobral da Silva
- Centro de Biotecnologia, Universidade Federal da Paraíba, João Pessoa 50670-910, PB, Brazil; (M.S.S.)
- Maria do Carmo Alves de Lima
- Laboratório de Planejamento e Síntese de Fármacos, Departamento de Antibióticos, Universidade Federal de Pernambuco, Recife 50670-910, PE, Brazil; (M.C.A.L.); (I.R.P.)
- Ivan da Rocha Pitta
- Laboratório de Planejamento e Síntese de Fármacos, Departamento de Antibióticos, Universidade Federal de Pernambuco, Recife 50670-910, PE, Brazil; (M.C.A.L.); (I.R.P.)
- Ricardo Olímpio de Moura
- Laboratório de Síntese e Vetorização de Moléculas, Departamento de Ciências Biológicas, Universidade Estadual da Paraíba, Rua Horácio Trajano de Oliveira s/n, Cristo Redentor, João Pessoa 58070-450, PB, Brazil; (R.O.M.); (J.G.B.O.); (R.M.D.C.)
- Jaismary Gonzaga Batista de Oliveira
- Laboratório de Síntese e Vetorização de Moléculas, Departamento de Ciências Biológicas, Universidade Estadual da Paraíba, Rua Horácio Trajano de Oliveira s/n, Cristo Redentor, João Pessoa 58070-450, PB, Brazil; (R.O.M.); (J.G.B.O.); (R.M.D.C.)
- Rayssa Marques Duarte da Cruz
- Laboratório de Síntese e Vetorização de Moléculas, Departamento de Ciências Biológicas, Universidade Estadual da Paraíba, Rua Horácio Trajano de Oliveira s/n, Cristo Redentor, João Pessoa 58070-450, PB, Brazil; (R.O.M.); (J.G.B.O.); (R.M.D.C.)
- Francisco Jaime Bezerra Mendonça
- Laboratório de Síntese e Vetorização de Moléculas, Departamento de Ciências Biológicas, Universidade Estadual da Paraíba, Rua Horácio Trajano de Oliveira s/n, Cristo Redentor, João Pessoa 58070-450, PB, Brazil; (R.O.M.); (J.G.B.O.); (R.M.D.C.)
- Authors to whom correspondence should be addressed: (L.S.); (F.J.B.M.J.); Tel.: +55-83-3191-1528 (L.S.); Tel.: +55-83-9924-1423 (F.J.B.M.J.); Fax: +55-83-3223-1128
|
23
|
Zhang S. Application of Machine Leaning in Drug Discovery and Development. Mach Learn 2012. [DOI: 10.4018/978-1-60960-818-7.ch517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Machine learning techniques have been widely used in drug discovery and development, particularly in the areas of cheminformatics, bioinformatics and other types of pharmaceutical research. It has been demonstrated they are suitable for large high dimensional data, and the models built with these methods can be used for robust external predictions. However, various problems and challenges still exist, and new approaches are in great need. In this Chapter, the authors will review the current development of machine learning techniques, and especially focus on several machine learning techniques they developed as well as their application to model building, lead discovery via virtual screening, integration with molecular docking, and prediction of off-target properties. The authors will suggest some potential different avenues to unify different disciplines, such as cheminformatics, bioinformatics and systems biology, for the purpose of developing integrated in silico drug discovery and development approaches.
Affiliation(s)
- Shuxing Zhang
- The University of Texas at M.D. Anderson Cancer Center, USA
|
24
|
Recent trends in statistical QSAR modeling of environmental chemical toxicity. EXPERIENTIA SUPPLEMENTUM (2012) 2012; 101:381-411. [PMID: 22945576 DOI: 10.1007/978-3-7643-8340-4_13] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Quantitative cheminformatics approaches such as QSAR modeling find growing applications in chemical risk assessment. Traditional methods rely on the use of calculated chemical descriptors of molecules and relatively small training sets. However, in recent years, there is a trend toward the increased use of in vitro biological testing approaches to reduce both the length of experimental studies and the animal use for chemical risk assessment. Furthermore, there is also much greater emphasis on model validation using external datasets to enable the reliable use of computational models as part of regulatory decision making. In this chapter, recent trends emphasizing the need for both careful curation of experimental data prior to model development and rigorous model validation are investigated. Furthermore, recent approaches to chemical toxicity prediction that employ both chemical descriptors and in vitro screening data for developing novel hybrid chemical/biological models are being reviewed. Examples of respective application studies that employ novel workflows for model developments are described and recent important efforts by several academic, nonprofit, and industrial groups to start placing both data and, especially, models in the public domain are discussed.
|
25
|
Low Y, Uehara T, Minowa Y, Yamada H, Ohno Y, Urushidani T, Sedykh A, Muratov E, Fourches D, Zhu H, Rusyn I, Tropsha A. Predicting drug-induced hepatotoxicity using QSAR and toxicogenomics approaches. Chem Res Toxicol 2011; 24:1251-62. [PMID: 21699217 PMCID: PMC4281093 DOI: 10.1021/tx200148a] [Citation(s) in RCA: 160] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Quantitative structure-activity relationship (QSAR) modeling and toxicogenomics are typically used independently as predictive tools in toxicology. In this study, we evaluated the power of several statistical models for predicting drug hepatotoxicity in rats using different descriptors of drug molecules, namely, their chemical descriptors and toxicogenomics profiles. The records were taken from the Toxicogenomics Project rat liver microarray database containing information on 127 drugs ( http://toxico.nibio.go.jp/datalist.html ). The model end point was hepatotoxicity in the rat following 28 days of continuous exposure, established by liver histopathology and serum chemistry. First, we developed multiple conventional QSAR classification models using a comprehensive set of chemical descriptors and several classification methods (k nearest neighbor, support vector machines, random forests, and distance weighted discrimination). With chemical descriptors alone, external predictivity (correct classification rate, CCR) from 5-fold external cross-validation was 61%. Next, the same classification methods were employed to build models using only toxicogenomics data (24 h after a single exposure) treated as biological descriptors. The optimized models used only 85 selected toxicogenomics descriptors and had CCR as high as 76%. Finally, hybrid models combining both chemical descriptors and transcripts were developed; their CCRs were between 68 and 77%. Although the accuracy of hybrid models did not exceed that of the models based on toxicogenomics data alone, the use of both chemical and biological descriptors enriched the interpretation of the models. In addition to finding 85 transcripts that were predictive and highly relevant to the mechanisms of drug-induced liver injury, chemical structural alerts for hepatotoxicity were identified. 
These results suggest that concurrent exploration of the chemical features and acute treatment-induced changes in transcript levels will both enrich the mechanistic understanding of subchronic liver injury and afford models capable of accurate prediction of hepatotoxicity from chemical structure and short-term assay results.
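The correct classification rate (CCR) reported above is commonly computed for binary models as the mean of sensitivity and specificity. That definition is assumed here rather than quoted from the abstract, and the function name and toy labels below are hypothetical. A minimal Python sketch:

```python
from typing import Sequence


def correct_classification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """CCR for binary labels (1 = toxic, 0 = non-toxic).

    Assumed definition: 0.5 * (sensitivity + specificity).
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    return 0.5 * (sensitivity + specificity)


# Toy labels (illustration only): four toxic and two non-toxic compounds.
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 1]
toy_ccr = correct_classification_rate(toy_true, toy_pred)
```

Averaging the two per-class rates keeps the score from being inflated by the majority class, which is why it is often preferred over plain accuracy on imbalanced toxicity datasets.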
Affiliation(s)
- Yen Low
- Laboratory for Molecular Modeling, University of North Carolina, Chapel Hill, North Carolina 27599
- Department of Environmental Sciences & Engineering, University of North Carolina, Chapel Hill, North Carolina 27599
- Takeki Uehara
- Department of Environmental Sciences & Engineering, University of North Carolina, Chapel Hill, North Carolina 27599
- Toxicogenomics Informatics Project, National Institute of Biomedical Innovation, Asagi, Osaka, Japan
- Yohsuke Minowa
- Toxicogenomics Informatics Project, National Institute of Biomedical Innovation, Asagi, Osaka, Japan
- Hiroshi Yamada
- Toxicogenomics Informatics Project, National Institute of Biomedical Innovation, Asagi, Osaka, Japan
- Yasuo Ohno
- National Institute of Health Sciences, Kamiyoga, Tokyo, Japan
- Tetsuro Urushidani
- Toxicogenomics Informatics Project, National Institute of Biomedical Innovation, Asagi, Osaka, Japan
- Doshisha Women's College of Liberal Arts, Kodo, Kyoto, Japan
- Alexander Sedykh
- Department of Environmental Sciences & Engineering, University of North Carolina, Chapel Hill, North Carolina 27599
- Eugene Muratov
- Department of Environmental Sciences & Engineering, University of North Carolina, Chapel Hill, North Carolina 27599
- A.V. Bogatsky Physical-Chemical Institute NAS of Ukraine, Odessa, Ukraine
- Denis Fourches
- Laboratory for Molecular Modeling, University of North Carolina, Chapel Hill, North Carolina 27599
- Hao Zhu
- Laboratory for Molecular Modeling, University of North Carolina, Chapel Hill, North Carolina 27599
- Ivan Rusyn
- Department of Environmental Sciences & Engineering, University of North Carolina, Chapel Hill, North Carolina 27599
- Alexander Tropsha
- Laboratory for Molecular Modeling, University of North Carolina, Chapel Hill, North Carolina 27599
|
26
|
Rodríguez O, Teixeira MA, Rodrigues AE. Prediction of odour detection thresholds using partition coefficients. FLAVOUR FRAG J 2011. [DOI: 10.1002/ffj.2076] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Affiliation(s)
- Oscar Rodríguez
- LSRE - Laboratory of Separation and Reaction Engineering; Associate Laboratory LSRE/LCM; Dept. of Chemical Engineering; Faculty of Engineering of University of Porto; Rua Dr. Roberto Frias; 4200-465; Porto; Portugal
- Miguel A Teixeira
- LSRE - Laboratory of Separation and Reaction Engineering; Associate Laboratory LSRE/LCM; Dept. of Chemical Engineering; Faculty of Engineering of University of Porto; Rua Dr. Roberto Frias; 4200-465; Porto; Portugal
- Alírio E Rodrigues
- LSRE - Laboratory of Separation and Reaction Engineering; Associate Laboratory LSRE/LCM; Dept. of Chemical Engineering; Faculty of Engineering of University of Porto; Rua Dr. Roberto Frias; 4200-465; Porto; Portugal
|
27
|
Nasonov AF. Computational methods and software in computer-aided combinatorial library design. RUSS J GEN CHEM+ 2011. [DOI: 10.1134/s1070363210120248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
28
|
Ebalunode JO, Zheng W, Tropsha A. Application of QSAR and shape pharmacophore modeling approaches for targeted chemical library design. Methods Mol Biol 2011; 685:111-33. [PMID: 20981521 DOI: 10.1007/978-1-60761-931-4_6] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Optimization of chemical library composition affords more efficient identification of hits from biological screening experiments. The optimization could be achieved through rational selection of reagents used in combinatorial library synthesis. However, with a rapid advent of parallel synthesis methods and availability of millions of compounds synthesized by many vendors, it may be more efficient to design targeted libraries by means of virtual screening of commercial compound collections. This chapter reviews the application of advanced cheminformatics approaches such as quantitative structure-activity relationships (QSAR) and pharmacophore modeling (both ligand and structure based) for virtual screening. Both approaches rely on empirical SAR data to build models; thus, the emphasis is placed on achieving models of the highest rigor and external predictive power. We present several examples of successful applications of both approaches for virtual screening to illustrate their utility. We suggest that the expert use of both QSAR and pharmacophore models, either independently or in combination, enables users to achieve targeted libraries enriched with experimentally confirmed hit compounds.
Affiliation(s)
- Jerry O Ebalunode
- Department of Pharmaceutical Sciences, BRITE Institute, North Carolina Central University, Durham, NC, USA.
|
29
|
Abstract
Computer-aided approaches have been widely used in pharmaceutical research to improve the efficiency of the drug discovery and development pipeline. To identify and design small molecules as clinically effective therapeutics, various computational methods have been evaluated as promising strategies, depending on the purpose and systems of interest. Both ligand and structure-based drug design approaches are powerful technologies, which can be applied to virtual screening for lead identification and optimization. Here, we review the progress in this field and summarize the application of some new technologies we developed. These state-of-the-art tools have been used for the discovery and development of active agents for various diseases, in particular for cancer therapies. The described protocols are appropriate for all drug discovery stages, but expertise is still needed to perform the studies based on the targets of interest.
Affiliation(s)
- Shuxing Zhang
- Department of Experimental Therapeutics, M.D. Anderson Cancer Center, Houston, TX, USA.
|
30
|
Hajjo R, Grulke C, Golbraikh A, Setola V, Huang XP, Roth BL, Tropsha A. Development, validation, and use of quantitative structure-activity relationship models of 5-hydroxytryptamine (2B) receptor ligands to identify novel receptor binders and putative valvulopathic compounds among common drugs. J Med Chem 2010; 53:7573-86. [PMID: 20958049 PMCID: PMC3438292 DOI: 10.1021/jm100600y] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Some antipsychotic drugs are known to cause valvular heart disease by activating serotonin 5-HT(2B) receptors. We have developed and validated binary classification QSAR models capable of predicting potential 5-HT(2B) actives. The classification accuracies of the models built to discriminate 5-HT(2B) actives from the inactives were as high as 80% for the external test set. These models were used to screen in silico 59,000 compounds included in the World Drug Index, and 122 compounds were predicted as actives with high confidence. Ten of them were tested in radioligand binding assays and nine were found active, suggesting a success rate of 90%. All validated actives were then tested in functional assays, and one compound was identified as a true 5-HT(2B) agonist. We suggest that the QSAR models developed in this study could be used as reliable predictors to flag drug candidates that are likely to cause valvulopathy.
Affiliation(s)
- Rima Hajjo
- Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Christopher Grulke
- Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Alexander Golbraikh
- Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Vincent Setola
- National Institute of Mental Health Psychoactive Drug Screening Program, Division of Medicinal Chemistry and Natural Products and Department of Pharmacology, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Xi-Ping Huang
- National Institute of Mental Health Psychoactive Drug Screening Program, Division of Medicinal Chemistry and Natural Products and Department of Pharmacology, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Bryan L. Roth
- Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- National Institute of Mental Health Psychoactive Drug Screening Program, Division of Medicinal Chemistry and Natural Products and Department of Pharmacology, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Alexander Tropsha
- Division of Medicinal Chemistry and Natural Products, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
|
31
|
Chemometric studies on natural products as potential inhibitors of the NADH oxidase from Trypanosoma cruzi using the VolSurf approach. Molecules 2010; 15:7363-77. [PMID: 20966878 PMCID: PMC6259467 DOI: 10.3390/molecules15107363] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2010] [Revised: 10/05/2010] [Accepted: 10/11/2010] [Indexed: 11/16/2022] Open
Abstract
Natural products have widespread biological activities, including inhibition of mitochondrial enzyme systems. Some of these activities, for example cytotoxicity, may be the result of alteration of cellular bioenergetics. Based on previous computer-aided drug design (CADD) studies and considering reported data on structure-activity relationships (SAR), an assumption regarding the mechanism of action of natural products against parasitic infections involves the NADH-oxidase inhibition. In this study, chemometric tools, such as: Principal Component Analysis (PCA), Consensus PCA (CPCA), and partial least squares regression (PLS), were applied to a set of forty natural compounds, acting as NADH-oxidase inhibitors. The calculations were performed using the VolSurf+ program. The formalisms employed generated good exploratory and predictive results. The independent variables or descriptors having a hydrophobic profile were strongly correlated to the biological data.
|
32
|
Tropsha A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol Inform 2010; 29:476-88. [DOI: 10.1002/minf.201000061] [Citation(s) in RCA: 1086] [Impact Index Per Article: 77.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2010] [Accepted: 06/08/2010] [Indexed: 11/11/2022]
|
33
|
Per aspera ad astra: application of Simplex QSAR approach in antiviral research. Future Med Chem 2010; 2:1205-26. [DOI: 10.4155/fmc.10.194] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open
Abstract
This review explores the application of the Simplex representation of molecular structure (SiRMS) QSAR approach in antiviral research. We provide an introduction to and description of SiRMS, its application in antiviral research and future directions of development of the Simplex approach and the whole QSAR field. In the Simplex approach every molecule is represented as a system of different simplexes (tetratomic fragments with fixed composition, structure, chirality and symmetry). The main advantages of SiRMS are consideration of the different physical–chemical properties of atoms, high adequacy and good interpretability of the models obtained, and clear procedures for molecular design. The reliability of the developed QSAR models as predictive virtual screening tools and their ability to serve as the basis of directed drug design was validated by subsequent synthetic and biological experiments. The SiRMS approach is implemented in the ‘HiT QSAR’ software package, which is available on request.
|
34
|
Gunturi SB, Theerthala SS, Patel NK, Bahl J, Narayanan R. Prediction of skin sensitization potential using D-optimal design and GA-kNN classification methods. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2010; 21:305-335. [PMID: 20544553 DOI: 10.1080/10629361003773955] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
Modelling of skin sensitization data of 255 diverse compounds and 450 calculated descriptors was performed to develop global predictive classification models that are applicable to the whole chemical space. With this aim, we employed two automated procedures: (a) D-optimal design to select optimal members of the training and test sets and (b) the k-Nearest Neighbour classification (kNN) method along with Genetic Algorithms (GA-kNN classification) to select significant and independent descriptors in order to build the models. This methodology helped us to derive multiple models, M1-M5, that are stable and robust. The best among them, model M1 (CCR(train) = 84.3%, CCR(test) = 87.2% and CCR(ext) = 80.4%), is based on six neighbours and nine descriptors and further suggests that: (a) it is stable and robust and performs better than the reported models in literature, and (b) the combination of D-optimal design and GA-kNN classification approach is a very promising approach. Consensus prediction based on the models M1-M5 improved the CCR of training, test and external validation datasets by 3.8%, 4.45% and 3.85%, respectively, over M1. From the analysis of the physical meaning of the selected descriptors, it is inferred that the skin sensitization potential of small organic compounds can be accurately predicted using calculated descriptors that code for the following fundamental properties: (i) lipophilicity, (ii) atomic polarizability, (iii) shape, (iv) electrostatic interactions, and (v) chemical reactivity.
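The abstract reports model quality as a correct classification rate (CCR) without defining it. A minimal sketch follows, assuming the common definition of CCR as the mean of the per-class recalls (for two classes, the average of sensitivity and specificity). The function name and the labels are illustrative and are not taken from the paper.

```python
def correct_classification_rate(y_true, y_pred):
    """Mean of per-class recall (assumed definition of CCR)."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Number of compounds that truly belong to this class
        total = sum(1 for t in y_true if t == cls)
        # Number of those compounds that were predicted correctly
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Made-up labels: 1 = sensitizer, 0 = non-sensitizer
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
ccr = correct_classification_rate(y_true, y_pred)
print(f"CCR = {ccr:.3f}")
```

Averaging per-class recall rather than counting all correct calls keeps the score from being dominated by the larger class when the data set is imbalanced.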
Affiliation(s)
- S B Gunturi
- Innovation Labs Hyderabad, Tata Consultancy Services Limited, #1, Software Units Layout, Madhapur, Hyderabad - 500 081, India
|
35
|
Pérez-Garrido A, Helguera AM, Rodríguez FG, Cordeiro MNDS. QSAR models to predict mutagenicity of acrylates, methacrylates and alpha,beta-unsaturated carbonyl compounds. Dent Mater 2010; 26:397-415. [PMID: 20122717 DOI: 10.1016/j.dental.2009.11.158] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2009] [Revised: 09/08/2009] [Accepted: 11/26/2009] [Indexed: 11/17/2022]
Abstract
OBJECTIVE: The purpose of this study is to develop a quantitative structure-activity relationship (QSAR) model that can distinguish mutagenic from non-mutagenic species with an alpha,beta-unsaturated carbonyl moiety using two endpoints for this activity (the Ames test and the mammalian cell gene mutation test) and also to gather information about the molecular features that contribute most to eliminating the mutagenic effects of these chemicals. METHODS: Two data sets were used for modeling the two mutagenicity endpoints: (1) Ames test and (2) mammalian cells mutagenesis. The first one comprised 220 molecules and the second one 48 substances, ranging from acrylates and methacrylates to alpha,beta-unsaturated carbonyl compounds. The QSAR models were developed by applying linear discriminant analysis (LDA) along with different sets of descriptors computed using the DRAGON software. RESULTS: For both endpoints, there was a concordance of 89% in the prediction and 97% confidentiality by combining the three models for the Ames test mutagenicity. We have also identified several structural alerts to assist the design of new monomers. SIGNIFICANCE: These individual models and especially their combination are attractive from the point of view of molecular modeling and could be used for the prediction and design of new monomers that do not pose a human health risk.
Affiliation(s)
- Alfonso Pérez-Garrido
- Environmental Engineering and Toxicology Dpt., Catholic University of San Antonio, Guadalupe, Murcia, Spain.
|
36
|
Gedeck P, Kramer C, Ertl P. Computational analysis of structure-activity relationships. PROGRESS IN MEDICINAL CHEMISTRY 2010; 49:113-60. [PMID: 20855040 DOI: 10.1016/s0079-6468(10)49004-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Affiliation(s)
- Peter Gedeck
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Forum 1, Novartis Campus, CH-4056 Basel, Switzerland
|
37
|
Adekoya A, Dong X, Ebalunode J, Zheng W. Development of improved models for phosphodiesterase-4 inhibitors with a multi-conformational structure-based QSAR method. CURRENT CHEMICAL GENOMICS 2009; 3:54-61. [PMID: 20161837 PMCID: PMC2802764 DOI: 10.2174/1875397300903010054] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/15/2009] [Revised: 09/15/2009] [Accepted: 09/17/2009] [Indexed: 11/29/2022]
Abstract
Phosphodiesterase-4 (PDE-4) is an important drug target for several diseases, including COPD (chronic obstructive pulmonary disorder) and neurodegenerative diseases. In this paper, we describe the development of improved QSAR (quantitative structure-activity relationship) models using a novel multi-conformational structure-based pharmacophore key (MC-SBPPK) method. Similar to our previous work, this method calculates molecular descriptors based on the matching of a molecule's pharmacophore features with those of the target binding pocket. Therefore, these descriptors are PDE-4-specific and most relevant to the problem under study. Furthermore, this work expands our previous SBPPK QSAR method by explicitly including multiple conformations of the PDE-4 inhibitors in the regression analysis, and thus addresses the issue of molecular flexibility. The nonlinear regression problem resulting from the inclusion of multiple conformations has been transformed into a linear equation and solved by an iterative partial least squares (iPLS) procedure, according to the Lukacova-Balaz scheme. Thirty-five PDE-4 inhibitors have been analyzed with this new method, and predictive models have been developed. Based on the prediction statistics for both the training set and the test set, these new models are more robust and predictive than those obtained by traditional ligand-based QSAR techniques as well as those obtained with the SBPPK method reported in our previous work. As a result, multiple predictive models have been added to the collection of QSAR models for PDE-4 inhibitors. Collectively, these models will be useful for the discovery of new drug candidates targeting the PDE-4 enzyme.
Affiliation(s)
- Adetokunbo Adekoya
- Department of Pharmaceutical Sciences, BRITE Institute, North Carolina Central University, 1801 Fayetteville Street, Durham, NC 27707, USA
|
38
|
Study of the structural and electronic origin of the sandalwood odor of some terpenylcyclohexanols. MONATSHEFTE FUR CHEMIE 2009. [DOI: 10.1007/s00706-009-0208-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
39
|
Peterson YK, Wang XS, Casey PJ, Tropsha A. Discovery of geranylgeranyltransferase-I inhibitors with novel scaffolds by the means of quantitative structure-activity relationship modeling, virtual screening, and experimental validation. J Med Chem 2009; 52:4210-20. [PMID: 19537691 DOI: 10.1021/jm8013772] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Geranylgeranylation is critical to the function of several proteins including Rho, Rap1, Rac, Cdc42, and G-protein gamma subunits. Geranylgeranyltransferase type I (GGTase-I) inhibitors (GGTIs) have therapeutic potential to treat inflammation, multiple sclerosis, atherosclerosis, and many other diseases. Following our standard workflow, we have developed and rigorously validated quantitative structure-activity relationship (QSAR) models for 48 GGTIs using variable selection k nearest neighbor (kNN), automated lazy learning (ALL), and partial least squares (PLS) methods. The QSAR models were employed for virtual screening of 9.5 million commercially available chemicals, yielding 47 diverse computational hits. Seven of these compounds with novel scaffolds and high predicted GGTase-I inhibitory activities were tested in vitro, and all were found to be bona fide and selective micromolar inhibitors. Notably, these novel hits could not be identified using traditional similarity search. These data demonstrate that rigorously developed QSAR models can serve as reliable virtual screening tools, leading to the discovery of structurally novel bioactive compounds.
Affiliation(s)
- Yuri K Peterson
- Department of Pharmacology, Duke University Medical Center, Durham, North Carolina 27710, USA
|
40
|
Puzyn T, Leszczynski J, Cronin MT. Virtual Screening and Molecular Design Based on Hierarchical QSAR Technology. RECENT ADVANCES IN QSAR STUDIES 2009; 8. [PMCID: PMC7120998 DOI: 10.1007/978-1-4020-9783-6_5] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/02/2022]
Abstract
This chapter is devoted to the hierarchical QSAR technology (HiT QSAR) based on simplex representation of molecular structure (SiRMS) and its application to different QSAR/QSPR tasks. The essence of this technology is a sequential solution (with the use of the information obtained on the previous steps) of the QSAR paradigm by a series of enhanced models based on molecular structure description (in a specific order from 1D to 4D). In effect, it is a system of continually improved solutions. Different approaches for applicability domain estimation are implemented in HiT QSAR. In the SiRMS approach every molecule is represented as a system of different simplexes (tetratomic fragments with fixed composition, structure, chirality, and symmetry). The level of detail of the simplex descriptors increases consecutively from the 1D to the 4D representation of the molecular structure. The advantages of the approach presented are an ability to solve QSAR/QSPR tasks for mixtures of compounds, the absence of the “molecular alignment” problem, consideration of different physical–chemical properties of atoms (e.g., charge, lipophilicity), the high adequacy and good interpretability of the obtained models, and clear ways for molecular design. The efficiency of HiT QSAR was demonstrated by its comparison with the most popular modern QSAR approaches on two representative examination sets. Examples of successful application of HiT QSAR for various QSAR/QSPR investigations on the different levels (1D–4D) of the molecular structure description are also highlighted. The reliability of the developed QSAR models as predictive virtual screening tools and their ability to serve as the basis of directed drug design was validated by subsequent synthetic, biological, and other experiments. HiT QSAR is implemented as a suite of computer programs termed the “HiT QSAR” software, which also includes powerful statistical capabilities and a number of useful utilities.
Affiliation(s)
- Tomasz Puzyn
- Dept. Chemistry, University of Gdansk, ul. Jana Sobieskiego 18, Gdansk, 80-952 Poland
- Jerzy Leszczynski
- Dept. Chemistry, Jackson State University, J. R. Lynch St. 1325, Jackson, 39217 U.S.A
- Mark T. Cronin
- Dept. Chemistry, John Moores University, Byrom St., Liverpool, L3 3AF United Kingdom
|
41
|
Kuz'min V, Muratov E, Artemenko A, Varlamova E, Gorb L, Wang J, Leszczynski J. Consensus QSAR Modeling of Phosphor-Containing Chiral AChE Inhibitors. QSAR Comb Sci 2009. [DOI: 10.1002/qsar.200860117] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
42
|
Zhang L, Zhu H, Oprea TI, Golbraikh A, Tropsha A. QSAR Modeling of the Blood–Brain Barrier Permeability for Diverse Organic Compounds. Pharm Res 2008; 25:1902-14. [DOI: 10.1007/s11095-008-9609-0] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2007] [Accepted: 04/23/2008] [Indexed: 01/16/2023]
|
43
|
Wang XS, Tang H, Golbraikh A, Tropsha A. Combinatorial QSAR Modeling of Specificity and Subtype Selectivity of Ligands Binding to Serotonin Receptors 5HT1E and 5HT1F. J Chem Inf Model 2008; 48:997-1013. [DOI: 10.1021/ci700404c] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Affiliation(s)
- Xiang S. Wang
- Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products and Carolina Exploratory Center for Cheminformatics Research, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, and Molecular & Cellular Biophysics Program, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Hao Tang
- Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products and Carolina Exploratory Center for Cheminformatics Research, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, and Molecular & Cellular Biophysics Program, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Alexander Golbraikh
- Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products and Carolina Exploratory Center for Cheminformatics Research, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, and Molecular & Cellular Biophysics Program, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
- Alexander Tropsha
- Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products and Carolina Exploratory Center for Cheminformatics Research, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, and Molecular & Cellular Biophysics Program, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599
|
44
|
Zhu H, Rusyn I, Richard A, Tropsha A. Use of cell viability assay data improves the prediction accuracy of conventional quantitative structure-activity relationship models of animal carcinogenicity. ENVIRONMENTAL HEALTH PERSPECTIVES 2008; 116:506-13. [PMID: 18414635 PMCID: PMC2291015 DOI: 10.1289/ehp.10573] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/19/2007] [Accepted: 01/03/2008] [Indexed: 05/02/2023]
Abstract
BACKGROUND: To develop efficient approaches for rapid evaluation of chemical toxicity and human health risk of environmental compounds, the National Toxicology Program (NTP) in collaboration with the National Center for Chemical Genomics has initiated a project on high-throughput screening (HTS) of environmental chemicals. The first HTS results for a set of 1,408 compounds tested for their effects on cell viability in six different cell lines have recently become available via PubChem. OBJECTIVES: We have explored these data in terms of their utility for predicting adverse health effects of the environmental agents. METHODS AND RESULTS: Initially, the classification k nearest neighbor (kNN) quantitative structure-activity relationship (QSAR) modeling method was applied to the HTS data only, for a curated data set of 384 compounds. The resulting models had prediction accuracies for training, test (containing 275 compounds together), and external validation (109 compounds) sets as high as 89%, 71%, and 74%, respectively. We then asked if HTS results could be of value in predicting rodent carcinogenicity. We identified 383 compounds for which data were available from both the Berkeley Carcinogenic Potency Database and NTP-HTS studies. We found that compounds classified by HTS as "actives" in at least one cell line were likely to be rodent carcinogens (sensitivity 77%); however, HTS "inactives" were far less informative (specificity 46%). Using chemical descriptors only, kNN QSAR modeling resulted in 62.3% prediction accuracy for rodent carcinogenicity applied to this data set. Importantly, the prediction accuracy of the model was significantly improved (72.7%) when chemical descriptors were augmented by HTS data, which were regarded as biological descriptors. CONCLUSIONS: Our studies suggest that combining NTP-HTS profiles with conventional chemical descriptors could considerably improve the predictive power of computational approaches in toxicology.
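The idea of treating HTS readouts as extra "biological descriptors" can be sketched as a simple concatenation of feature vectors before a kNN vote. This is only an illustration under stated assumptions: the helper names are hypothetical, the data are invented, the descriptors are assumed to be already scaled to comparable ranges, and the study's own descriptor sets and variable selection are not reproduced.

```python
import math


def hybrid_descriptors(chem, hts):
    """Append each compound's HTS readouts to its chemical descriptors."""
    return [c + h for c, h in zip(chem, hts)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training compounds (Euclidean)."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Invented toy data: two chemical descriptors and one HTS readout per compound
train_chem = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_hts = [[0.0], [0.0], [1.0], [1.0]]
train_y = [0, 0, 1, 1]  # 1 = carcinogen, 0 = non-carcinogen

query_chem, query_hts = [0.4, 0.4], [1.0]

chem_only = knn_predict(train_chem, train_y, query_chem)
hybrid = knn_predict(hybrid_descriptors(train_chem, train_hts), train_y,
                     query_chem + query_hts)
print(chem_only, hybrid)
```

In this toy case the query sits closer to the inactive compounds in chemical space, and the added HTS column moves it toward the actives. Whether real HTS data help in the same way is an empirical question that the abstract answers for its own data set only.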
Affiliation(s)
- Hao Zhu
- Carolina Environmental Bioinformatics Research Center
- Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products, School of Pharmacy and
- Ivan Rusyn
- Carolina Environmental Bioinformatics Research Center
- Department of Environmental Sciences and Engineering, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina USA
- Ann Richard
- National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Alexander Tropsha
- Carolina Environmental Bioinformatics Research Center
- Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products, School of Pharmacy and
|
45
|
Hsieh JH, Wang XS, Teotico D, Golbraikh A, Tropsha A. Differentiation of AmpC beta-lactamase binders vs. decoys using classification kNN QSAR modeling and application of the QSAR classifier to virtual screening. J Comput Aided Mol Des 2008; 22:593-609. [PMID: 18338225 DOI: 10.1007/s10822-008-9199-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2007] [Accepted: 02/18/2008] [Indexed: 11/24/2022]
Abstract
The use of inaccurate scoring functions in docking algorithms may result in the selection of compounds with high predicted binding affinity that nevertheless are known experimentally not to bind to the target receptor. Such falsely predicted binders have been termed 'binding decoys'. We posed a question as to whether true binders and decoys could be distinguished based only on their structural chemical descriptors using approaches commonly used in ligand-based drug design. We have applied the k-Nearest Neighbor (kNN) classification QSAR approach to a dataset of compounds characterized as binders or binding decoys of AmpC beta-lactamase. Models were subjected to rigorous internal and external validation as part of our standard workflow, and a special QSAR modeling scheme was employed that took into account the imbalanced ratio of inhibitors to non-binders (1:4) in this dataset. 342 predictive models were obtained with a correct classification rate (CCR) of 0.90 or higher for both training and test sets. The prediction accuracy was as high as 100% (CCR = 1.00) for the external validation set composed of 10 compounds (5 true binders and 5 decoys) selected randomly from the original dataset. For an additional external set of 50 known non-binders, we have achieved a CCR of 0.87 using a very conservative model applicability domain threshold. The validated binary kNN QSAR models were further employed for mining the NCGC AmpC screening dataset (69653 compounds). The consensus prediction of 64 compounds identified as screening hits in the AmpC PubChem assay disagreed with their annotation in PubChem but was in agreement with the results of secondary assays. At the same time, 15 compounds were identified as potential binders contrary to their annotation in PubChem. Five of them were tested experimentally and showed inhibitory activities in the millimolar range with the highest binding constant K(i) of 135 microM. Our studies suggest that validated QSAR models could complement structure-based docking and scoring approaches in identifying promising hits by virtual screening of molecular libraries.
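The abstract mentions an applicability domain threshold but does not state how it is computed. One common distance-based formulation, assumed here purely for illustration, accepts a query compound only if its mean distance to its k nearest training compounds does not exceed the training-set mean of that quantity plus z standard deviations. All function names, the parameters `k` and `z`, and the coordinates below are hypothetical.

```python
import math
import statistics


def knn_distances(train, point, k, skip_self=False):
    """Sorted distances from `point` to its k nearest training compounds."""
    dists = sorted(math.dist(point, other) for other in train)
    if skip_self:
        dists = dists[1:]  # drop the zero distance of a compound to itself
    return dists[:k]


def ad_threshold(train, k=1, z=0.5):
    """Mean kNN distance within the training set plus z standard deviations."""
    per_compound = [statistics.mean(knn_distances(train, p, k, skip_self=True))
                    for p in train]
    return statistics.mean(per_compound) + z * statistics.pstdev(per_compound)


def in_domain(train, query, k=1, z=0.5):
    """True if the query lies within the assumed applicability domain."""
    return statistics.mean(knn_distances(train, query, k)) <= ad_threshold(train, k, z)


# Invented descriptor space: four training compounds on a unit square
train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(in_domain(train, [0.5, 0.5]))  # query inside the square
print(in_domain(train, [5.0, 5.0]))  # query far from all training compounds
```

A smaller `z` gives a more conservative domain: fewer compounds receive a prediction, and those that do are closer to the training data.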
Affiliation(s)
- Jui-Hua Hsieh
- Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products, University of North Carolina at Chapel Hill, CB #7360, Beard Hall, Chapel Hill, NC, 27599-7360, USA
|
46
|
Zhu H, Tropsha A, Fourches D, Varnek A, Papa E, Gramatica P, Oberg T, Dao P, Cherkasov A, Tetko IV. Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis. J Chem Inf Model 2008; 48:766-84. [PMID: 18311912 DOI: 10.1021/ci700443v] [Citation(s) in RCA: 188] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Abstract
Selecting the most rigorous quantitative structure-activity relationship (QSAR) approaches is of great importance in the development of robust and predictive models of chemical toxicity. To address this issue in a systematic way, we have formed an international virtual collaboratory consisting of six independent groups with shared interests in computational chemical toxicology. We have compiled an aqueous toxicity data set containing 983 unique compounds tested in the same laboratory over a decade against Tetrahymena pyriformis. A modeling set including 644 compounds was selected randomly from the original set and distributed to all groups that used their own QSAR tools for model development. The remaining 339 compounds in the original set (external set I) as well as 110 additional compounds (external set II) published recently by the same laboratory (after this computational study was already in progress) were used as two independent validation sets to assess the external predictive power of individual models. In total, our virtual collaboratory has developed 15 different types of QSAR models of aquatic toxicity for the training set. The internal prediction accuracy for the modeling set ranged from 0.76 to 0.93 as measured by the leave-one-out cross-validation correlation coefficient ($Q^2_{abs}$). The prediction accuracy for the external validation sets I and II ranged from 0.71 to 0.85 (linear regression coefficient $R^2_{absI}$) and from 0.38 to 0.83 (linear regression coefficient $R^2_{absII}$), respectively. The use of an applicability domain threshold implemented in most models generally improved the external prediction accuracy but at the same time led to a decrease in chemical space coverage. Finally, several consensus models were developed by averaging the predicted aquatic toxicity for every compound using all 15 models, with or without taking into account their respective applicability domains. We find that consensus models afford higher prediction accuracy for the external validation data sets with the highest space coverage as compared to individual constituent models. Our studies prove the power of a collaborative and consensual approach to QSAR model development. The best validated models of aquatic toxicity developed by our collaboratory (both individual and consensus) can be used as reliable computational predictors of aquatic toxicity and are available from any of the participating laboratories.
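Consensus prediction by averaging, with or without applicability domains, can be sketched in a few lines. The function below is a hypothetical illustration rather than the collaboratory's code: it averages per-model predictions for one compound and, when domain flags are supplied, uses only the models whose domain covers that compound. The numbers are invented.

```python
def consensus_prediction(predictions, in_domain=None):
    """Average the per-model predictions for one compound.

    predictions: one predicted toxicity value per model.
    in_domain:   optional booleans, one per model; when given, only models
                 whose applicability domain covers the compound are averaged.
    Returns None when no model covers the compound.
    """
    if in_domain is not None:
        predictions = [p for p, ok in zip(predictions, in_domain) if ok]
    if not predictions:
        return None
    return sum(predictions) / len(predictions)


# Invented predictions from three models for a single compound
preds = [1.0, 2.0, 6.0]
print(consensus_prediction(preds))                         # all models
print(consensus_prediction(preds, [True, True, False]))    # domain-filtered
print(consensus_prediction(preds, [False, False, False]))  # not covered
```

Returning `None` for an uncovered compound makes the coverage cost of domain filtering explicit, which is the trade-off the abstract describes.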
Affiliation(s)
- Hao Zhu
- Laboratory for Molecular Modeling, Division of Medicinal Chemistry and Natural Products, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
47
|
Hierarchical QSAR technology based on the Simplex representation of molecular structure. J Comput Aided Mol Des 2008; 22:403-21. [DOI: 10.1007/s10822-008-9179-6] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2007] [Accepted: 01/10/2008] [Indexed: 10/22/2022]
|
48
|
Tropsha A, Wang SX. QSAR modeling of GPCR ligands: methodologies and examples of applications. Ernst Schering Found Symp Proc 2007:49-73. [PMID: 17703577 DOI: 10.1007/2789_2006_003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/07/2023]
Abstract
GPCR ligands represent not only one of the major classes of current drugs but also the major continuing source of novel potent pharmaceutical agents. Because 3D structures of GPCRs as determined by experimental techniques are still unavailable, ligand-based drug discovery methods remain the major computational molecular modeling approaches to the analysis of growing data sets of tested GPCR ligands. This paper presents an overview of modern Quantitative Structure Activity Relationship (QSAR) modeling. We discuss the critical issue of model validation and the strategy for applying the successfully validated QSAR models to virtual screening of available chemical databases. We present several examples of applications of validated QSAR modeling approaches to GPCR ligands. We conclude with comments on exciting developments in the QSAR modeling of GPCR ligands that focus on the study of emerging data sets of compounds with dual or even multiple activities against two or more GPCRs.
Affiliation(s)
- A Tropsha
- Laboratory for Molecular Modeling, CB#7360, Beard Hall, School of Pharmacy, University of North Carolina at Chapel Hill, 27599-7360 North Carolina, USA.
|
49
|
Abstract
y-Randomization is a tool used in validation of QSPR/QSAR models, whereby the performance of the original model in data description (r2) is compared to that of models built for permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random number pseudoresponse and original descriptors or random number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method result in two different mean highest random r2 values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's r2 to both of these whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, due to the values of all descriptors in the pool for all compounds not being published. In such cases random number experiments as proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress also is reported toward the aim of obtaining the mean highest r2 of random pseudomodels by calculation rather than by tedious multiple simulations on random number variables.
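The y-randomization procedure described above can be sketched for the simplest case of descriptor selection: picking the single best descriptor from a pool. This is a simplification of the multilinear regression setting in the abstract, and the function names, trial count, and synthetic data are assumptions made for illustration. The original model's r2 is compared with the mean highest r2 obtained after shuffling the response and repeating the same selection.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. r2 of a one-descriptor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_r2(pool, y):
    """Descriptor selection: keep the descriptor with the highest r2."""
    return max(r_squared(column, y) for column in pool)


def y_randomization(pool, y, trials=200, seed=0):
    """Return (original r2, mean highest r2 over permuted responses)."""
    rng = random.Random(seed)
    original = best_r2(pool, y)
    random_r2 = []
    for _ in range(trials):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break the descriptor-response link
        random_r2.append(best_r2(pool, shuffled))
    return original, sum(random_r2) / trials


# Synthetic data (not from the paper): 20 compounds, pool of 5 descriptors
data_rng = random.Random(1)
pool = [[data_rng.random() for _ in range(20)] for _ in range(5)]
y = [2.0 * v + 1.0 for v in pool[0]]  # response driven by descriptor 0

original, mean_random = y_randomization(pool, y)
print(f"original r2 = {original:.2f}, mean highest random r2 = {mean_random:.2f}")
```

The pseudodescriptor variant discussed in the abstract would replace `pool` with columns of random numbers while keeping the same selection step, which removes the intercorrelation present among real descriptors.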
|
50
|
Gunturi S, Narayanan R. In Silico ADME Modeling 3: Computational Models to Predict Human Intestinal Absorption Using Sphere Exclusion and kNN QSAR Methods. QSAR Comb Sci 2007. [DOI: 10.1002/qsar.200630094] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|