1
Niazi SK, Mariam Z. Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review. Int J Mol Sci 2023; 24:11488. [PMID: 37511247 PMCID: PMC10380192 DOI: 10.3390/ijms241411488] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 06/09/2023] [Revised: 06/30/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023]
Abstract
In modern drug discovery, the combination of chemoinformatics and quantitative structure-activity relationship (QSAR) modeling has emerged as a formidable alliance, enabling researchers to harness the vast potential of machine learning (ML) techniques for predictive molecular design and analysis. This review delves into the fundamental aspects of chemoinformatics, elucidating the intricate nature of chemical data and the crucial role of molecular descriptors in unveiling the underlying molecular properties. Molecular descriptors, including 2D fingerprints and topological indices, in conjunction with the structure-activity relationships (SARs), are pivotal in unlocking the pathway to small-molecule drug discovery. Technical intricacies of developing robust ML-QSAR models, including feature selection, model validation, and performance evaluation, are discussed. Various ML algorithms, such as regression analysis and support vector machines, are showcased for their ability to predict and comprehend the relationships between molecular structures and biological activities. This review serves as a comprehensive guide for researchers, providing an understanding of the synergy between chemoinformatics, QSAR, and ML. With these cutting-edge technologies, predictive molecular analysis holds promise for expediting the discovery of novel therapeutic agents in the pharmaceutical sciences.
Affiliation(s)
- Sarfaraz K Niazi
  - College of Pharmacy, University of Illinois, Chicago, IL 61820, USA
- Zamara Mariam
  - School of Interdisciplinary Engineering & Sciences (SINES), National University of Sciences & Technology (NUST), Islamabad 24090, Pakistan
2
Goya-Jorge E, Amber M, Gozalbes R, Connolly L, Barigye SJ. Assessing the chemical-induced estrogenicity using in silico and in vitro methods. Environmental Toxicology and Pharmacology 2021; 87:103688. [PMID: 34119701 DOI: 10.1016/j.etap.2021.103688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2021] [Revised: 06/07/2021] [Accepted: 06/08/2021] [Indexed: 06/12/2023]
Abstract
Multiple substances are considered endocrine disrupting chemicals (EDCs). However, there is a significant gap in the early prioritization of EDCs' effects. In this work, in silico and in vitro methods were used to model estrogenicity. Two Quantitative Structure-Activity Relationship (QSAR) models based on Logistic Regression and REPTree algorithms were built using a large and diverse database of estrogen receptor (ESR) agonism. A 10-fold external validation demonstrated their robustness and predictive capacity. Mechanistic interpretations of the molecular descriptors (C-026, nArOH, PW5, B06[Br-Br]) used for modelling suggested that heteroatomic fragments, aromatic hydroxyls, bromines, and the relative bond accessibility areas of molecules are structural determinants of estrogenicity. As validation of the QSARs, ESR transactivity of thirteen persistent organic pollutants (POPs) and suspected EDCs was tested in vitro using the MMV-Luc cell line. A good correspondence between predictions and experimental bioassays demonstrated the value of the QSARs for prioritization of ESR agonist compounds.
Affiliation(s)
- Elizabeth Goya-Jorge
  - ProtoQSAR SL., CEEI (Centro Europeo de Empresas Innovadoras), Parque Tecnológico de Valencia, 12 Av. Benjamin Franklin, 46980, Paterna, Valencia, Spain; Department of Food Science, Faculty of Veterinary Medicine-FARAH, University of Liège, 10 Av. Cureghem, 4000, Sart-Tilman, Liège, Belgium.
- Mazia Amber
  - The Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, BT9 5DL, Belfast, Northern Ireland, United Kingdom.
- Rafael Gozalbes
  - ProtoQSAR SL., CEEI (Centro Europeo de Empresas Innovadoras), Parque Tecnológico de Valencia, 12 Av. Benjamin Franklin, 46980, Paterna, Valencia, Spain; MolDrug AI Systems SL, 45 Olimpia Arozena Torres, 46018, Valencia, Spain.
- Lisa Connolly
  - The Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, BT9 5DL, Belfast, Northern Ireland, United Kingdom.
- Stephen J Barigye
  - ProtoQSAR SL., CEEI (Centro Europeo de Empresas Innovadoras), Parque Tecnológico de Valencia, 12 Av. Benjamin Franklin, 46980, Paterna, Valencia, Spain; MolDrug AI Systems SL, 45 Olimpia Arozena Torres, 46018, Valencia, Spain.
3
Emara Y, Fantke P, Judson R, Chang X, Pradeep P, Lehmann A, Siegert MW, Finkbeiner M. Integrating endocrine-related health effects into comparative human toxicity characterization. The Science of the Total Environment 2021; 762:143874. [PMID: 33401053 DOI: 10.1016/j.scitotenv.2020.143874] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/30/2020] [Revised: 11/10/2020] [Accepted: 11/11/2020] [Indexed: 06/12/2023]
Abstract
Endocrine-disrupting chemicals have the ability to interfere with and alter functions of the hormone system, leading to adverse effects on reproduction, growth and development. Despite growing concerns over their now ubiquitous presence in the environment, endocrine-related human health effects remain largely outside of comparative human toxicity characterization frameworks as applied for example in life cycle impact assessments. In this paper, we propose a new methodological framework to consistently integrate endocrine-related health effects into comparative human toxicity characterization. We present two quantitative and operational approaches for extrapolating towards a common point of departure from both in vivo and dosimetry-adjusted in vitro endocrine-related effect data and deriving effect factors as well as corresponding characterization factors for endocrine-active/endocrine-disrupting chemicals. Following the proposed approaches, we calculated effect factors for 323 chemicals, reflecting their endocrine potency, and related characterization factors for 157 chemicals, expressing their relative endocrine-related human toxicity potential. Developed effect and characterization factors are ready for use in the context of chemical prioritization and substitution as well as life cycle impact assessment and other comparative assessment frameworks. Endocrine-related effect factors were found comparable to existing effect factors for cancer and non-cancer effects, indicating that (1) the chemicals' endocrine potency is not necessarily higher or lower than other effect potencies and (2) using dosimetry-adjusted effect data to derive effect factors does not consistently overestimate the effect of potential endocrine disruptors. Calculated characterization factors span over 8-11 orders of magnitude for different substances and emission compartments and are dominated by the range in endocrine potencies.
Affiliation(s)
- Yasmine Emara
  - Department of Environmental Technology, Technical University Berlin, 10623 Berlin, Germany.
- Peter Fantke
  - Quantitative Sustainability Assessment, Department of Technology, Management and Economics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark.
- Richard Judson
  - Office of Research and Development, Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711.
- Xiaoqing Chang
  - Integrated Laboratory Systems, LLC., Morrisville, NC 27560, United States.
- Prachi Pradeep
  - Biomolecular and Computational Toxicology Division, Center for Computational Toxicology and Exposure, U.S. Environmental Protection Agency, Research Triangle Park, NC 27711.
- Annekatrin Lehmann
  - Department of Environmental Technology, Technical University Berlin, 10623 Berlin, Germany.
- Marc-William Siegert
  - Department of Environmental Technology, Technical University Berlin, 10623 Berlin, Germany.
- Matthias Finkbeiner
  - Department of Environmental Technology, Technical University Berlin, 10623 Berlin, Germany.
4
Schneider M, Pons JL, Bourguet W, Labesse G. Towards accurate high-throughput ligand affinity prediction by exploiting structural ensembles, docking metrics and ligand similarity. Bioinformatics 2020; 36:160-168. [PMID: 31350558 PMCID: PMC6956784 DOI: 10.1093/bioinformatics/btz538] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/25/2019] [Revised: 05/29/2019] [Accepted: 07/19/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION Nowadays, virtual screening (VS) plays a major role in the process of drug development. Nonetheless, an accurate estimation of binding affinities, which is crucial at all stages, is not trivial and may require target-specific fine-tuning. Furthermore, drug design also requires improved predictions for putative secondary targets, among which is Estrogen Receptor alpha (ERα). RESULTS VS based on combinations of Structure-Based VS (SBVS) and Ligand-Based VS (LBVS) is gaining momentum as a way to improve VS performance. In this study, we propose an integrated approach using ligand docking on multiple structural ensembles to reflect receptor flexibility. Then, we investigate the impact of the two different types of features (structure-based and ligand molecular descriptors) on affinity predictions using a random forest algorithm. We find that ligand-based features have lower predictive power (rP = 0.69, R2 = 0.47) than structure-based features (rP = 0.78, R2 = 0.60). Their combination maintains high accuracy (rP = 0.73, R2 = 0.50) on the internal test set, but it shows superior robustness on external datasets. Further improvement, together with extending the training dataset to include xenobiotics, leads to a novel high-throughput affinity prediction method for ERα ligands (rP = 0.85, R2 = 0.71). The presented prediction tool is provided to the community as a dedicated satellite of the @TOME server, to which one can upload a ligand dataset in mol2 format and have the ligands docked and their affinities predicted. AVAILABILITY AND IMPLEMENTATION http://edmon.cbs.cnrs.fr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
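The abstract above reports two agreement metrics, rP and R2, without saying how each was computed. As a minimal sketch, assuming rP is the Pearson correlation coefficient and R2 is the coefficient of determination (both are assumptions, as are the function names), the two can be computed as follows:

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only, not data from the paper.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(measured, predicted)
r2 = r_squared(measured, predicted)
```

The two metrics answer different questions: predictions that are perfectly correlated with the measurements but systematically off in scale give a Pearson coefficient of 1 and a poor coefficient of determination, which is why reporting both is informative.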
Affiliation(s)
- Melanie Schneider
  - Centre de Biochimie Structurale, CNRS, INSERM, Univ Montpellier, 34090 Montpellier, France
- Jean-Luc Pons
  - Centre de Biochimie Structurale, CNRS, INSERM, Univ Montpellier, 34090 Montpellier, France
- William Bourguet
  - Centre de Biochimie Structurale, CNRS, INSERM, Univ Montpellier, 34090 Montpellier, France
- Gilles Labesse
  - Centre de Biochimie Structurale, CNRS, INSERM, Univ Montpellier, 34090 Montpellier, France
5
Jiang W, Chen Q, Zhou B, Wang F. In silico prediction of estrogen receptor subtype binding affinity and selectivity using 3D-QSAR and molecular docking. Med Chem Res 2019. [DOI: 10.1007/s00044-019-02428-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
6
Yan L, Zhang Q, Huang F, Nie WW, Hu CQ, Ying HZ, Dong XW, Zhao MR. Ternary classification models for predicting hormonal activities of chemicals via nuclear receptors. Chem Phys Lett 2018. [DOI: 10.1016/j.cplett.2018.06.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
7
Mansouri K, Grulke CM, Judson RS, Williams AJ. OPERA models for predicting physicochemical properties and environmental fate endpoints. J Cheminform 2018. [PMID: 29520515 PMCID: PMC5843579 DOI: 10.1186/s13321-018-0263-1] [Citation(s) in RCA: 319] [Impact Index Per Article: 45.6] [Indexed: 12/24/2022]
Abstract
The collection of chemical structure information and associated experimental data for quantitative structure–activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models highly depends on the quality of the data and modeling methodology used. This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes. This study primarily uses data from the publicly available PHYSPROP database consisting of a set of 13 common physicochemical and environmental fate properties. These datasets have undergone extensive curation using an automated workflow to select only high-quality data, and the chemical structures were standardized prior to calculation of the molecular descriptors. The modeling procedure was developed based on the five Organization for Economic Cooperation and Development (OECD) principles for QSAR models. A weighted k-nearest neighbor approach was adopted using a minimum number of required descriptors calculated using PaDEL, an open-source software. The genetic algorithms selected only the most pertinent and mechanistically interpretable descriptors (2–15, with an average of 11 descriptors). The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-life to 14,050 chemicals for logP, with an average of 3222 chemicals across all endpoints. The optimal models were built on randomly selected training sets (75%) and validated using fivefold cross-validation (CV) and test sets (25%). The CV Q2 of the models varied from 0.72 to 0.95, with an average of 0.86 and an R2 test value from 0.71 to 0.96, with an average of 0.82. Modeling and performance details are described in QSAR model reporting format and were validated by the European Commission’s Joint Research Center to be OECD compliant. 
All models are freely available as an open-source, command-line application called OPEn structure–activity/property Relationship App (OPERA). OPERA models were applied to more than 750,000 chemicals to produce freely available predicted data on the U.S. Environmental Protection Agency’s CompTox Chemistry Dashboard.
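The modeling procedure in this abstract (a weighted k-nearest neighbor model fitted on a random 75% training set and checked on the remaining 25%) can be illustrated with a short sketch. This is not the OPERA implementation: the inverse-distance weighting, the Euclidean metric, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random

def split_indices(n, test_fraction=0.25, seed=0):
    """Randomly split range(n) into training and test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def weighted_knn_predict(train_X, train_y, x, k=3):
    """Predict a property as the inverse-distance-weighted mean of the
    k nearest training compounds in descriptor space (assumed scheme)."""
    nearest = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    for d, y in nearest:
        if d == 0.0:  # exact descriptor match: return its value directly
            return y
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

# Toy one-descriptor example; the values are illustrative only.
X = [(0.0,), (2.0,)]
y = [0.0, 2.0]
prediction = weighted_knn_predict(X, y, (0.5,), k=2)
train_idx, test_idx = split_indices(8)
```

Closer neighbors dominate the prediction: in the toy example the query sits three times nearer to the first compound, so that compound receives three times the weight.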
Affiliation(s)
- Kamel Mansouri
  - National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA; Oak Ridge Institute for Science and Education, 1299 Bethel Valley Road, Oak Ridge, TN, 37830, USA; ScitoVation LLC, 6 Davis Drive, Research Triangle Park, NC, 27709, USA.
- Chris M Grulke
  - National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Richard S Judson
  - National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
- Antony J Williams
  - National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, 27711, USA
8
Wong JC, Zidar J, Ho J, Wang Y, Lee KK, Zheng J, Sullivan MB, You X, Kriegel R. Assessment of several machine learning methods towards reliable prediction of hormone receptor binding affinity. Chem Data Collect 2017. [DOI: 10.1016/j.cdc.2017.05.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
9
Zhang Q, Yan L, Wu Y, Ji L, Chen Y, Zhao M, Dong X. A ternary classification using machine learning methods of distinct estrogen receptor activities within a large collection of environmental chemicals. The Science of the Total Environment 2017; 580:1268-1275. [PMID: 28011018 DOI: 10.1016/j.scitotenv.2016.12.088] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 10/17/2016] [Revised: 12/12/2016] [Accepted: 12/13/2016] [Indexed: 06/06/2023]
Abstract
Endocrine-disrupting chemicals (EDCs), which can threaten ecological safety and harm human health, have caused wide concern. There is a high demand for efficient methodologies for evaluating potential EDCs in the environment. Herein an evaluation platform was developed using novel and statistically robust ternary models built with different machine learning methods (i.e., linear discriminant analysis, classification and regression tree, and support vector machines). The platform is aimed at effectively classifying chemicals as having agonistic, antagonistic, or no estrogen receptor (ER) activity. A total of 440 chemicals from the literature were selected to derive and optimize the three-class model. A further 109 chemicals that newly appeared on the 2014 EPA list for EDC screening were used to assess predictive performance by comparing their E-screen results with the predictions of the classification models. The best model was obtained using support vector machines (SVM), which recognized agonists and antagonists with accuracies of 76.6% and 75.0%, respectively, on the test set (with an overall predictive accuracy of 75.2%), and achieved a 10-fold cross-validation (CV) accuracy of 73.4%. The external prediction accuracy validated by the E-screen assay was 87.5%, which demonstrated the application value for a virtual alert for EDCs with ER agonistic or antagonistic activities. It was demonstrated that the ternary computational model could be used as a faster and less expensive method to identify EDCs that act through nuclear receptors, and to classify these chemicals into different mechanism groups.
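The abstract reports both per-class accuracies (agonists, antagonists) and an overall accuracy for a three-class model. A minimal sketch of how such figures relate is given below; the class labels, function names, and toy predictions are illustrative assumptions and do not come from the paper.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class that the model labelled correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

def overall_accuracy(y_true, y_pred):
    """Fraction of all chemicals labelled correctly, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy ternary example (hypothetical labels and predictions).
truth = ["agonist", "agonist", "antagonist", "inactive", "inactive", "inactive"]
model = ["agonist", "inactive", "antagonist", "inactive", "inactive", "agonist"]
by_class = per_class_accuracy(truth, model)
overall = overall_accuracy(truth, model)
```

Because the overall figure is weighted by class size, it can sit close to the accuracy of the largest class even when a small class is predicted poorly, so the per-class values are the more telling ones for imbalanced sets.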
Affiliation(s)
- Quan Zhang
  - Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Environment, Zhejiang University of Technology, Hangzhou 310032, China
- Lu Yan
  - Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Environment, Zhejiang University of Technology, Hangzhou 310032, China
- Yan Wu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Li Ji
  - College of Environmental & Resource Sciences, Zhejiang University, Hangzhou 310058, China
- Yuanchen Chen
  - Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Environment, Zhejiang University of Technology, Hangzhou 310032, China
- Meirong Zhao
  - Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Environment, Zhejiang University of Technology, Hangzhou 310032, China.
- Xiaowu Dong
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
10
Tong L, Guo L, Lv X, Li Y. Modification of polychlorinated phenols and evaluation of their toxicity, biodegradation and bioconcentration using three-dimensional quantitative structure–activity relationship models. J Mol Graph Model 2017; 71:1-12. [DOI: 10.1016/j.jmgm.2016.10.012] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 07/01/2016] [Revised: 09/19/2016] [Accepted: 10/14/2016] [Indexed: 01/04/2023]
11
Ng HW, Doughty SW, Luo H, Ye H, Ge W, Tong W, Hong H. Development and Validation of Decision Forest Model for Estrogen Receptor Binding Prediction of Chemicals Using Large Data Sets. Chem Res Toxicol 2015; 28:2343-51. [PMID: 26524122 DOI: 10.1021/acs.chemrestox.5b00358] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 02/04/2023]
Abstract
Some chemicals in the environment possess the potential to interact with the endocrine system in the human body. Multiple receptors are involved in the endocrine system; estrogen receptor α (ERα) plays very important roles in endocrine activity and is the most studied receptor. Understanding and predicting estrogenic activity of chemicals facilitates the evaluation of their endocrine activity. Hence, we have developed a decision forest classification model to predict chemical binding to ERα using a large training data set of 3308 chemicals obtained from the U.S. Food and Drug Administration's Estrogenic Activity Database. We tested the model using cross validations and external data sets of 1641 chemicals obtained from the U.S. Environmental Protection Agency's ToxCast project. The model showed good performance in both internal (92% accuracy) and external validations (∼ 70-89% relative balanced accuracies), where the latter involved the validations of the model across different ER pathway-related assays in ToxCast. The important features that contribute to the prediction ability of the model were identified through informative descriptor analysis and were related to current knowledge of ER binding. Prediction confidence analysis revealed that the model had both high prediction confidence and accuracy for most predicted chemicals. The results demonstrated that the model constructed based on the large training data set is more accurate and robust for predicting ER binding of chemicals than the published models that have been developed using much smaller data sets. The model could be useful for the evaluation of ERα-mediated endocrine activity potential of environmental chemicals.
Affiliation(s)
- Hui Wen Ng
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
- Stephen W Doughty
  - School of Pharmacy, University of Nottingham Malaysia Campus, Jalan Broga, 43500 Semenyih, Selangor, Malaysia
- Heng Luo
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
- Hao Ye
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
- Weigong Ge
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
- Huixiao Hong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, United States
12
Thareja S. Steroidal 5α-Reductase Inhibitors: A Comparative 3D-QSAR Study Review. Chem Rev 2015; 115:2883-94. [DOI: 10.1021/cr5005953] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 12/19/2022]
Affiliation(s)
- Suresh Thareja
  - School of Pharmaceutical Sciences, Guru Ghasidas Central University, Bilaspur, Chhattisgarh 495 009, India
13
Weidlich IE, Pevzner Y, Miller BT, Filippov IV, Woodcock HL, Brooks BR. Development and implementation of (Q)SAR modeling within the CHARMMing web-user interface. J Comput Chem 2014; 36:62-7. [PMID: 25362883 DOI: 10.1002/jcc.23765] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/12/2014] [Revised: 10/03/2014] [Accepted: 10/10/2014] [Indexed: 11/07/2022]
Abstract
The recent availability of large publicly accessible databases of chemical compounds and their biological activities (PubChem, ChEMBL) has inspired us to develop a web-based tool for structure-activity relationship and quantitative structure-activity relationship modeling to add to the services provided by CHARMMing (www.charmming.org). This new module implements some of the most recent advances in modern machine learning algorithms, including Random Forest, Support Vector Machine, Stochastic Gradient Descent, and Gradient Tree Boosting. A user can import training data from PubChem BioAssay data collections directly from our interface or upload his or her own SD files, which contain structures and activity information, to create new models (either categorical or numerical). A user can then track the model generation process and run models on new data to predict activity.
Affiliation(s)
- Iwona E Weidlich
  - Computational Drug Design Systems (CODDES) LLC, Rockville, Maryland, 20852; Laboratory of Computational Biology, NIH, National Heart, Lung, and Blood Institute, Rockville, Maryland, 20852
14
Defining a novel k-nearest neighbours approach to assess the applicability domain of a QSAR model for reliable predictions. J Cheminform 2013; 5:27. [PMID: 23721648 PMCID: PMC3679843 DOI: 10.1186/1758-2946-5-27] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Received: 02/13/2013] [Accepted: 05/23/2013] [Indexed: 11/18/2022]
Abstract
Background With the growing popularity of using QSAR predictions for regulatory purposes, such predictive models are now required to be strictly validated, an essential feature of which is to have the model’s Applicability Domain (AD) defined clearly. Although several different approaches have been proposed in recent years to address this goal, no optimal approach to defining the model’s AD has yet been recognized. Results This study proposes a novel descriptor-based AD method which accounts for the data distribution and exploits the k-Nearest Neighbours (kNN) principle to derive a heuristic decision rule. The proposed method is a three-stage procedure that addresses several key aspects relevant to judging the reliability of QSAR predictions. Inspired by the adaptive kernel method for probability density function estimation, the first stage of the approach defines a pattern of thresholds corresponding to the various training samples, and these thresholds are later used to derive the decision rule. The criterion deciding whether a given test sample will be retained within the AD is defined in the second stage of the approach. Finally, the last stage reflects upon the reliability of the derived results, taking model statistics and prediction error into account. Conclusions The proposed approach is a novel strategy that integrates the kNN principle to define the AD of QSAR models. Relevant features that characterize the proposed AD approach include: a) adaptability to the local density of samples, useful when the underlying multivariate distribution is asymmetric, with wide regions of low data density; b) unlike several kernel density estimators (KDE), effectiveness also in high-dimensional spaces; c) low sensitivity to the smoothing parameter k; and d) versatility to implement various distance measures. The results derived from a case study provided a clear understanding of how the approach works and defines the model’s AD for reliable predictions.
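To make the idea of a kNN-based applicability domain concrete, the sketch below uses a deliberately simplified variant: one global distance threshold derived from the training set, instead of the per-sample adaptive thresholds and three-stage procedure that the abstract describes. The threshold rule, the Euclidean metric, and all names are assumptions for illustration, not the authors' method.

```python
import math

def mean_knn_distance(point, others, k):
    """Mean Euclidean distance from `point` to its k nearest `others`."""
    nearest = sorted(math.dist(point, o) for o in others)[:k]
    return sum(nearest) / len(nearest)

def fit_ad_threshold(train_X, k=2):
    """Simplified rule (an assumption): the largest mean kNN distance
    observed among the training samples themselves."""
    scores = [
        mean_knn_distance(x, train_X[:i] + train_X[i + 1:], k)
        for i, x in enumerate(train_X)
    ]
    return max(scores)

def in_domain(x, train_X, threshold, k=2):
    """A query is inside the AD if it lies no farther from its k nearest
    training samples than any training sample does from its own."""
    return mean_knn_distance(x, train_X, k) <= threshold

# Toy 2-descriptor training set (illustrative values only).
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
threshold = fit_ad_threshold(train, k=2)
inside = in_domain((0.5, 0.5), train, threshold, k=2)
outside = in_domain((5.0, 5.0), train, threshold, k=2)
```

A single global threshold ignores local data density, which is exactly the limitation the adaptive, per-sample thresholds in the abstract are meant to address.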
15
Xu X, Yang W, Li Y, Wang Y. Discovery of estrogen receptor modulators: a review of virtual screening and SAR efforts. Expert Opin Drug Discov 2012; 5:21-31. [PMID: 22823969 DOI: 10.1517/17460440903490395] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 01/22/2023]
Abstract
IMPORTANCE OF THE FIELD Virtual screening (VS) coupled with structural biology is an important approach to increasing the number and enhancing the success of projects in the lead identification stage of the drug discovery process. Recent advances and future directions in estrogen therapy have resulted in great demand for identifying potential estrogen receptor (ER) modulators with greater activity and selectivity. AREAS COVERED IN THIS REVIEW This review presents the current state of the art in VS and structure-activity relationships of ER modulators in recent discovery efforts, and discusses the strengths and weaknesses of the technology. WHAT THE READER WILL GAIN Readers will gain an overview of the current platforms of in silico screening for discovery of ER modulators; they will learn which structural information is significantly correlated with the bioactivity of ER modulators and what novel strategies should be considered for the creation of more effective chemical structures. TAKE HOME MESSAGE With the goal of reducing toxicity and/or improving efficacy, challenges to the successful modeling of endocrine agents are proposed, providing new paradigms for the design of ER inhibitors.
Affiliation(s)
- Xue Xu
  - Northwest A&F University, Center of Bioinformatics, Yangling, Shaanxi, 712100, China
16
Li F, Wu H, Li L, Li X, Zhao J, Peijnenburg WJGM. Docking and QSAR study on the binding interactions between polycyclic aromatic hydrocarbons and estrogen receptor. Ecotoxicology and Environmental Safety 2012; 80:273-279. [PMID: 22503158 DOI: 10.1016/j.ecoenv.2012.03.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 11/29/2011] [Revised: 03/15/2012] [Accepted: 03/18/2012] [Indexed: 05/31/2023]
Abstract
Little is known about the estrogenic activities of polycyclic aromatic hydrocarbons (PAHs), and the mechanisms underlying these activities are still unclear. Molecular docking and quantitative structure-activity relationship (QSAR) modeling were used to understand the relationship between molecular structural features and estrogenic activity, and to predict the binding affinity of PAHs to estrogen receptor α (ERα). From molecular docking analysis, hydrogen bonding as well as hydrophobic and π interactions were found between PAHs and ERα. Based on the docking results, appropriate molecular structural parameters were adopted to develop a QSAR model. Five descriptors were included in the QSAR model, which indicated that molecular size, van der Waals volumes, shape profiles, polarizabilities and electrotopological states were significant parameters explaining the estrogenicity. Comparatively, the developed QSAR model had good robustness, predictive ability and mechanistic interpretability. Moreover, the applicability domain of the model was described.
Collapse
Affiliation(s)
- Fei Li
- Key Laboratory of Coastal Zone Environmental Processes, Yantai Institute of Coastal Zone Research (YIC), Chinese Academy of Sciences (CAS), Shandong Provincial Key Laboratory of Coastal Zone Environmental Processes, YICCAS, Yantai Shandong 264003, PR China
|
17
|
A QSAR study of environmental estrogens based on a novel variable selection method. Molecules 2012; 17:6126-45. [PMID: 22614865 PMCID: PMC6268217 DOI: 10.3390/molecules17056126] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/10/2012] [Revised: 04/19/2012] [Accepted: 04/26/2012] [Indexed: 11/16/2022]
Abstract
A large number of descriptors were employed to characterize the molecular structure of 53 natural, synthetic, and environmental chemicals which are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones and may thus pose a serious threat to the health of humans and wildlife. In this work, a robust quantitative structure-activity relationship (QSAR) model with a novel variable selection method has been proposed for the effective estrogens. The variable selection method is based on variable interaction (VSMVI) with leave-multiple-out cross validation (LMOCV) to select the best subset. During variable selection, model construction and assessment, the Organization for Economic Co-operation and Development (OECD) principles for regulation of QSAR acceptability were fully considered, such as using an unambiguous multiple-linear regression (MLR) algorithm to build the model, using several validation methods to assess the performance of the model, defining the applicability domain, and analyzing the outliers with the results of molecular docking. The performance of the QSAR model indicates that the VSMVI is an effective, feasible and practical tool for rapid screening of the best subset from a large pool of molecular descriptors.
|
18
|
Li C, Colosi LM. Molecular similarity analysis as tool to prioritize research among emerging contaminants in the environment. Sep Purif Technol 2012. [DOI: 10.1016/j.seppur.2011.02.030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
|
19
|
Toropov AA, Toropova AP, Diaza RG, Benfenati E, Gini G. SMILES-based optimal descriptors: QSAR modeling of estrogen receptor binding affinity by correlation balance. Struct Chem 2011. [DOI: 10.1007/s11224-011-9892-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/16/2022]
|
20
|
In silico prediction of estrogen receptor subtype binding affinity and selectivity using statistical methods and molecular docking with 2-arylnaphthalenes and 2-arylquinolines. Int J Mol Sci 2010; 11:3434-58. [PMID: 20957105 PMCID: PMC2956105 DOI: 10.3390/ijms11093434] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 08/03/2010] [Revised: 08/23/2010] [Accepted: 08/27/2010] [Indexed: 11/24/2022]
Abstract
Over the years, the development of selective estrogen receptor (ER) ligands has been of great concern to researchers involved in the chemistry and pharmacology of anticancer drugs, resulting in numerous synthesized selective ER subtype inhibitors. In this work, a data set of 82 ER ligands with ERα and ERβ inhibitory activities was built, and quantitative structure-activity relationship (QSAR) methods based on two linear methods (multiple linear regression, MLR; partial least squares regression, PLSR) and one nonlinear statistical method (Bayesian regularized neural network, BRNN) were applied to investigate the potential relationship of molecular structural features related to the activity and selectivity of these ligands. For ERα and ERβ, the performances of the MLR and PLSR models are superior to the BRNN model, giving more reasonable statistical properties (ERα: for MLR, R²_tr = 0.72, Q²_te = 0.63; for PLSR, R²_tr = 0.92, Q²_te = 0.84. ERβ: for MLR, R²_tr = 0.75, Q²_te = 0.75; for PLSR, R²_tr = 0.98, Q²_te = 0.80). The MLR method is also more powerful than the other two methods for generating the subtype selectivity models, resulting in R²_tr = 0.74 and Q²_te = 0.80. In addition, the molecular docking method was used to explore the possible binding modes of the ligands, and a relationship between the 3D-binding modes and the 2D-molecular structural features of ligands was further explored. The results show that the binding affinity strength for both ERα and ERβ is more correlated with the atom fragment type, polarity, electronegativity and hydrophobicity. The substituent in position 8 of the naphthalene or the quinoline plane and the space orientation of these two planes contribute the most to the subtype selectivity on the basis of similar hydrogen bond interactions between binding ligands and both ER subtypes. The QSAR models built together with the docking procedure should be of great advantage for screening and designing ER ligands with improved affinity and subtype selectivity.
|
21
|
Quantitative structure-activity relationship of compounds binding to estrogen receptor β based on heuristic method. Sci China Chem 2010. [DOI: 10.1007/s11426-010-4077-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
|
22
|
Ghafourian T, Bozorgi AHA. Estimation of drug solubility in water, PEG 400 and their binary mixtures using the molecular structures of solutes. Eur J Pharm Sci 2010; 40:430-40. [DOI: 10.1016/j.ejps.2010.04.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 01/04/2010] [Revised: 03/30/2010] [Accepted: 04/29/2010] [Indexed: 10/19/2022]
|
23
|
Li F, Li X, Shao J, Chi P, Chen J, Wang Z. Estrogenic Activity of Anthraquinone Derivatives: In Vitro and In Silico Studies. Chem Res Toxicol 2010; 23:1349-55. [DOI: 10.1021/tx100118g] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Affiliation(s)
- Fei Li, Xuehua Li, Jianping Shao, Ping Chi, Jingwen Chen, Zijian Wang
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), School of Environmental Science and Technology, Dalian University of Technology, Linggong Road 2, Dalian 116024, P. R. China, and State Key Laboratory of Environmental Aquatic Chemistry, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, P.O. Box 2871, Beijing 100085, P. R. China
|
24
|
Kortagere S, Ekins S. Troubleshooting computational methods in drug discovery. J Pharmacol Toxicol Methods 2010; 61:67-75. [PMID: 20176118 DOI: 10.1016/j.vascn.2010.02.005] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 02/03/2010] [Accepted: 02/11/2010] [Indexed: 10/19/2022]
Abstract
Computational approaches for drug discovery, such as ligand-based and structure-based methods, are increasingly seen as an efficient approach for lead discovery as well as providing insights on absorption, distribution, metabolism, excretion and toxicity (ADME/Tox). What is perhaps less well known and less widely described are the limitations of the different technologies. We have therefore presented a troubleshooting approach to QSAR, homology modeling, docking as well as hybrid methods. If such computational or cheminformatics methods are to become more widely used by non-experts, it is critical that such limitations are brought to the user's attention and addressed during their workflows. This could improve the quality of the models and results that are obtained.
Affiliation(s)
- Sandhya Kortagere
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, PA 19129, USA.
|
25
|
Li J, Gramatica P. The importance of molecular structures, endpoints’ values, and predictivity parameters in QSAR research: QSAR analysis of a series of estrogen receptor binders. Mol Divers 2009; 14:687-96. [DOI: 10.1007/s11030-009-9212-2] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 07/24/2009] [Accepted: 10/24/2009] [Indexed: 11/30/2022]
|
26
|
Gao C, Zhang A, Lin Y, Yin D, Wang L. Quantitative structure-activity relationships of selected phenols with non-monotonic dose-response curves. Sci Bull (Beijing) 2009. [DOI: 10.1007/s11434-009-0174-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/19/2023]
|
27
|
Li F, Chen J, Wang Z, Li J, Qiao X. Determination and prediction of xenoestrogens by recombinant yeast-based assay and QSAR. Chemosphere 2009; 74:1152-1157. [PMID: 19136139 DOI: 10.1016/j.chemosphere.2008.11.081] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 08/23/2008] [Revised: 11/28/2008] [Accepted: 11/30/2008] [Indexed: 05/27/2023]
Abstract
Estrogenic activities, expressed as the logarithm of relative potency (logRP), were determined for 8 xenoestrogens using the recombinant yeast-based assay. The determined logRP values were employed as an independent external data set to validate an estrogenic activity quantitative structure-activity relationship (QSAR) model. The QSAR model was established using partial least squares regression and molecular descriptors derived from DRAGON software. For the training set of the QSAR model, which included 25 xenoestrogens, R² = 0.889 and the leave-one-out cross-validation squared correlation coefficient (Q²_LOO) was 0.897. For the external validation set, the predicted logRP values were consistent with the observed values, with a root mean square error (RMSE) of 0.736 log units and a squared correlation coefficient (Q²_EXT) of 0.775. Six descriptors were included in the QSAR model, which indicated that the logRP value was related to molecular size, shape profiles, symmetry and polarizability. Comparatively, the developed model has good robustness and predictivity. Moreover, the applicability domain of the model was discussed.
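The validation statistics quoted in this abstract can be illustrated with a small sketch. It is not the authors' PLS model: it assumes a one-descriptor ordinary least squares fit, toy data invented for illustration, and hypothetical helper names (`fit_line`, `q2_loo`, `rmse`). Q² is taken here as 1 - PRESS/SS_tot, one common definition, which may differ from the one used in the paper.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def total_sum_of_squares(ys):
    """Sum of squared deviations of the response from its mean."""
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Fitted R^2 on the training data."""
    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q2_loo(xs, ys):
    """Leave-one-out Q^2: refit without compound i, then predict compound i."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical toy data, not taken from the study.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
a, b = fit_line(train_x, train_y)
external_x = [2.5, 4.5]
external_y = [2.4, 4.6]
external_pred = [a + b * x for x in external_x]
print(r_squared(train_x, train_y), q2_loo(train_x, train_y),
      rmse(external_y, external_pred))
```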
Affiliation(s)
- Fei Li
- Key Laboratory of Industrial Ecology and Environmental Engineering (MOE), Department of Environmental Science and Technology, Dalian University of Technology, Linggong Road 2, Dalian 116024, China
|
28
|
Li J, Lei B, Liu H, Li S, Yao X, Liu M, Gramatica P. QSAR study of malonyl-CoA decarboxylase inhibitors using GA-MLR and a new strategy of consensus modeling. J Comput Chem 2008; 29:2636-47. [DOI: 10.1002/jcc.21002] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Indexed: 11/09/2022]
|
29
|
Agatonovic-Kustrin S, Turner J, Glass B. Molecular structural characteristics as determinants of estrogen receptor selectivity. J Pharm Biomed Anal 2008; 48:369-75. [DOI: 10.1016/j.jpba.2008.04.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 09/19/2007] [Revised: 04/07/2008] [Accepted: 04/07/2008] [Indexed: 12/01/2022]
|
30
|
Progress and perspectives of quantitative structure-activity relationships used for ecological risk assessment of toxic organic compounds. ACTA ACUST UNITED AC 2008. [DOI: 10.1007/s11426-008-0076-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
|
31
|
QSAR study on estrogenic activity of structurally diverse compounds using generalized regression neural network. ACTA ACUST UNITED AC 2008. [DOI: 10.1007/s11426-008-0070-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
32
|
Liu H, Papa E, Gramatica P. Evaluation and QSAR modeling on multiple endpoints of estrogen activity based on different bioassays. Chemosphere 2008; 70:1889-97. [PMID: 17884132 DOI: 10.1016/j.chemosphere.2007.07.071] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 01/31/2007] [Revised: 06/22/2007] [Accepted: 07/30/2007] [Indexed: 05/17/2023]
Abstract
There is a great need for an effective means of rapidly assessing endocrine-disrupting activity, especially estrogen-simulating activity, due to the large number of chemicals that have serious adverse effects on the environment. Many approaches using a variety of biological screening assays are used to identify endocrine disrupting chemicals. The present investigation analyzes the consistency and peculiarity of information from different experimental assays collected from a literature survey, by studying the correlation of the different endpoints. In addition, the activity values of the more widely used bioassays have been combined by principal component analysis (PCA) to build one cumulative endpoint, the estrogen activity index (EAI), for priority setting to identify chemicals most likely to possess estrogen activity for early entry into screening. This index was then modeled using only a few theoretical molecular descriptors. The constructed MLR-QSAR model has been statistically validated for its predictive power, and can be proposed as a preliminary evaluative method to screen/prioritize estrogens according to their integrated estrogen activity, starting from molecular structure alone.
Affiliation(s)
- Huanxiang Liu
- Department of Structural and Functional Biology, QSAR Research Unit in Environmental Chemistry and Ecotoxicology, University of Insubria, via Dunant 3, 21100 Varese, Italy
|
33
|
Ji L, Wang X, Yang X, Liu S, Wang L. Back-propagation network improved by conjugate gradient based on genetic algorithm in QSAR study on endocrine disrupting chemicals. ACTA ACUST UNITED AC 2008. [DOI: 10.1007/s11434-007-0484-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
34
|
Luan F, Liu HT, Ma WP, Fan BT. Classification of estrogen receptor-β ligands on the basis of their binding affinities using support vector machine and linear discriminant analysis. Eur J Med Chem 2008; 43:43-52. [PMID: 17459530 DOI: 10.1016/j.ejmech.2007.03.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 09/04/2006] [Revised: 03/03/2007] [Accepted: 03/06/2007] [Indexed: 01/22/2023]
Abstract
Classification models of estrogen receptor-beta ligands were proposed using linear and nonlinear models. The data set was divided into active and inactive classes on the basis of their binding affinities. The two-class problem (active, inactive) was first explored with a linear classifier, linear discriminant analysis (LDA). To obtain a more accurate prediction model, a nonlinear machine learning technique, the support vector machine (SVM), was subsequently applied. The heuristic method (HM) was used to pre-select from the whole descriptor set. The eight-descriptor model built by SVM showed better predictive ability than LDA. The prediction accuracies for the training, test and overall data sets are 92.9%, 85.8% and 91.4% for SVM, and 83.1%, 76.1% and 81.9% for LDA, respectively. The results indicate that SVM can be used as a powerful modeling tool for QSAR studies.
Affiliation(s)
- F Luan
- Department of Applied Chemistry, Yantai University, Yantai, Shandong 264005, PR China.
|
35
|
Fang Y, Feng Y, Li M. Optimal QSAR Analysis of the Carcinogenic Activity of Aromatic and Heteroaromatic Amines. ACTA ACUST UNITED AC 2007. [DOI: 10.1002/qsar.200710077] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
36
|
Abstract
y-Randomization is a tool used in validation of QSPR/QSAR models, whereby the performance of the original model in data description (r2) is compared to that of models built for permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random number pseudoresponse and original descriptors or random number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method result in two different mean highest random r2 values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's r2 to both of these whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, due to the values of all descriptors in the pool for all compounds not being published. In such cases random number experiments as proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress also is reported toward the aim of obtaining the mean highest r2 of random pseudomodels by calculation rather than by tedious multiple simulations on random number variables.
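The double test described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes the "descriptor selection" step is simply picking the single pool descriptor with the highest r², and all function names and data are hypothetical. It compares the original model's r² with the mean highest r² from permuted responses and from random-number pseudodescriptors.

```python
import random


def r2_single(xs, ys):
    """r^2 of a one-descriptor least-squares fit (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_r2(pool, ys):
    """Toy descriptor selection: keep the pool column with the highest r^2."""
    return max(r2_single(column, ys) for column in pool)


def y_randomization(pool, ys, n_trials=200, seed=0):
    """Mean highest r^2 over models refit to permuted (shuffled) responses."""
    rng = random.Random(seed)
    shuffled = list(ys)
    total = 0.0
    for _ in range(n_trials):
        rng.shuffle(shuffled)
        total += best_r2(pool, shuffled)
    return total / n_trials


def pseudodescriptor_test(ys, n_pool, n_trials=200, seed=0):
    """Mean highest r^2 when the pool is replaced by random-number columns."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        pool = [[rng.random() for _ in ys] for _ in range(n_pool)]
        total += best_r2(pool, ys)
    return total / n_trials


# Hypothetical data: 12 compounds, 5 descriptors, response driven by descriptor 0.
data_rng = random.Random(1)
pool = [[data_rng.random() for _ in range(12)] for _ in range(5)]
response = [2.0 * x + 1.0 for x in pool[0]]

original_r2 = best_r2(pool, response)
permuted_r2 = y_randomization(pool, response)
random_pool_r2 = pseudodescriptor_test(response, n_pool=len(pool))
print(original_r2, permuted_r2, random_pool_r2)
```

A model would be considered suspect under this kind of test when its original r² is not clearly above both random baselines.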
|
37
|
Chen D, Cai W, Shao X. Removing uncertain variables based on ensemble partial least squares. Anal Chim Acta 2007; 598:19-26. [PMID: 17693302 DOI: 10.1016/j.aca.2007.07.023] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 12/27/2006] [Revised: 06/02/2007] [Accepted: 07/11/2007] [Indexed: 10/23/2022]
Abstract
A strategy named removing uncertain variables based on ensemble partial least squares (RUV-EPLS) was proposed. In this strategy, the uncertainty in PLS regression coefficients is evaluated by the criterion of stability, and the variables whose regression coefficients carry a relatively large uncertainty are eliminated. Then, a new EPLS model with the remaining variables is constructed. To reasonably control the quality of the PLS member models in the RUV-EPLS, an objective criterion based on the F-test is used, which makes the RUV-EPLS convenient to perform in practice. To validate the effectiveness and universality of the strategy, it was applied to two different sets of near-infrared (NIR) spectra. Notably, the RUV-EPLS was found to be less sensitive to outliers than many other calibration methods, and the selected variables are indeed known to be informative for the corresponding compounds, which results in a reliable and high-quality calibration model. The study reveals that the RUV-EPLS method can improve the stability and predictive ability of multivariate calibration involving complex matrices that may contain a small number of outliers.
Affiliation(s)
- Da Chen
- Research Center for Analytical Sciences, State Key Laboratory of Functional Polymer Materials for Adsorption and Separation, Department of Chemistry, Nankai University, Tianjin 300071, PR China
|
38
|
Korhonen SP, Tuppurainen K, Asikainen A, Laatikainen R, Peräkylä M. SOMFA on Large Diverse Xenoestrogen Dataset: The Effect of Superposition Algorithms and External Regression Tools. ACTA ACUST UNITED AC 2007. [DOI: 10.1002/qsar.200610003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
39
|
Tuppurainen K, Korhonen SP, Ruuskanen J. Performance of multicomponent self-organizing regression (MCSOR) in QSAR, QSPR, and multivariate calibration: comparison with partial least-squares (PLS) and validation with large external data sets. SAR and QSAR in Environmental Research 2006; 17:549-61. [PMID: 17162386 DOI: 10.1080/10629360601033390] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/12/2023]
Abstract
A novel method for underdetermined regression problems, multicomponent self-organizing regression (MCSOR), has been recently introduced. Here, its performance is compared with partial least-squares (PLS), which is perhaps the most widely adopted multivariate method in chemometrics. A potpourri of models is presented, and MCSOR appears to provide highly predictive models that are comparable with or better than the corresponding PLS models in large internal (leave-one-out, LOO) and pseudo-external (leave-many-out, LMO) validation tests. The "blind" external predictive ability of MCSOR and PLS is demonstrated employing large melting point, factor Xa, log P and log S data sets. In a nutshell, MCSOR is fast, conceptually simple (employing multiple linear regression, MLR, as a statistical tool), and applicable to all kinds of multivariate problems with single Y-variable.
Affiliation(s)
- K Tuppurainen
- Department of Chemistry, University of Kuopio, PO Box 1627, Kuopio, Finland.
|
40
|
Li H, Ung CY, Yap CW, Xue Y, Li ZR, Chen YZ. Prediction of estrogen receptor agonists and characterization of associated molecular descriptors by statistical learning methods. J Mol Graph Model 2006; 25:313-23. [PMID: 16497524 DOI: 10.1016/j.jmgm.2006.01.007] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 10/19/2005] [Revised: 12/21/2005] [Accepted: 01/19/2006] [Indexed: 01/04/2023]
Abstract
Specific estrogen receptor (ER) agonists have been used for hormone replacement therapy, contraception, osteoporosis prevention, and prostate cancer treatment. Some ER agonists and partial agonists induce cancer and endocrine function disruption. Methods for predicting ER agonists are useful for facilitating drug discovery and chemical safety evaluation. Structure-activity relationships and rule-based decision forest models have been derived for predicting ER binders at impressive accuracies of 87.1-97.6% for ER binders and 80.2-96.0% for ER non-binders. However, these are not designed for identifying ER agonists and they were developed from a subset of known ER binders. This work explored several statistical learning methods (support vector machines, k-nearest neighbor, probabilistic neural network and C4.5 decision tree) for predicting ER agonists from a comprehensive set of known ER agonists and other compounds. The corresponding prediction systems were developed and tested by using 243 ER agonists and 463 ER non-agonists, respectively, which are significantly larger in number and structural diversity than those in previous studies. A feature selection method was used for selecting molecular descriptors responsible for distinguishing ER agonists from non-agonists, some of which are consistent with those used in other studies and the findings from X-ray crystallography data. The prediction accuracies of these methods are comparable to those of earlier studies despite the use of a significantly more diverse range of compounds. SVM gives the best accuracy of 88.9% for ER agonists and 98.1% for non-agonists. Our study suggests that statistical learning methods such as SVM are potentially useful for facilitating the prediction of ER agonists and for characterizing the molecular descriptors associated with ER agonists.
Affiliation(s)
- H Li
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore
|
41
|
Liu H, Papa E, Gramatica P. QSAR Prediction of Estrogen Activity for a Large Set of Diverse Chemicals under the Guidance of OECD Principles. Chem Res Toxicol 2006; 19:1540-8. [PMID: 17112243 DOI: 10.1021/tx0601509] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.1] [Indexed: 11/30/2022]
Abstract
A large number of environmental chemicals, known as endocrine-disrupting chemicals, are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones, and such chemicals may pose a serious threat to the health of humans and wildlife. They are thought to act through a variety of mechanisms, mainly estrogen-receptor-mediated mechanisms of toxicity. However, it is practically impossible to perform thorough toxicological tests on all potential xenoestrogens, and thus, the quantitative structure-activity relationship (QSAR) provides a promising method for the estimation of a compound's estrogenic activity. Here, QSAR models of the estrogen receptor binding affinity of a large data set of heterogeneous chemicals have been built using theoretical molecular descriptors, giving full consideration to the new OECD principles in regulation for QSAR acceptability, during model construction and assessment. An unambiguous multiple linear regression (MLR) algorithm was used to build the models, and model predictive ability was validated by both internal and external validation. The applicability domain was checked by the leverage approach to verify prediction reliability. The results obtained using several validation paths indicate that the proposed QSAR model is robust and satisfactory, and can provide a feasible and practical tool for the rapid screening of the estrogen activity of organic compounds.
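The leverage approach to the applicability domain mentioned in this abstract can be sketched for the simplest case. The code below assumes a one-descriptor model with an intercept, for which the leverage x^T (X^T X)^-1 x has a closed form, and it assumes the warning cutoff h* = 3(p + 1)/n, a widely used convention that the abstract itself does not state. Function names and data are hypothetical.

```python
def leverages(train_x, query_x):
    """Leverage of each query value for the model y = a + b*x.

    Closed form of x^T (X^T X)^-1 x when X holds an intercept column
    and a single descriptor column built from train_x.
    """
    n = len(train_x)
    mean_x = sum(train_x) / n
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    return [1.0 / n + (x - mean_x) ** 2 / sxx for x in query_x]


def warning_leverage(n_train, n_descriptors):
    """Assumed cutoff h* = 3 * (p + 1) / n; a convention, not from the abstract."""
    return 3.0 * (n_descriptors + 1) / n_train


def in_domain(train_x, query_x, n_descriptors=1):
    """True where a query compound's leverage does not exceed the cutoff."""
    h_star = warning_leverage(len(train_x), n_descriptors)
    return [h <= h_star for h in leverages(train_x, query_x)]


# Hypothetical descriptor values for eight training compounds.
train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(in_domain(train, [4.5, 20.0]))  # prints [True, False]
```

A query compound far from the training descriptor range gets a high leverage, so its prediction would be flagged as an extrapolation.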
Affiliation(s)
- Huanxiang Liu
- Department of Structural and Functional Biology, QSAR Research Unit in Environmental Chemistry and Ecotoxicology, University of Insubria, via Dunant 3, 21100 Varese, Italy
|
42
|
Gramatica P, Giani E, Papa E. Statistical external validation and consensus modeling: a QSPR case study for Koc prediction. J Mol Graph Model 2006; 25:755-66. [PMID: 16890002 DOI: 10.1016/j.jmgm.2006.06.005] [Citation(s) in RCA: 177] [Impact Index Per Article: 9.3] [Received: 02/14/2006] [Revised: 06/26/2006] [Accepted: 06/26/2006] [Indexed: 10/24/2022]
Abstract
The soil sorption partition coefficient (log Koc) of a heterogeneous set of 643 organic non-ionic compounds, with a range of more than 6 log units, is predicted by a statistically validated QSAR modeling approach. The applied multiple linear regression (ordinary least squares, OLS) is based on a variety of theoretical molecular descriptors selected by the genetic algorithms-variable subset selection (GA-VSS) procedure. The models were validated for predictivity by different internal and external validation approaches. For external validation we applied self organizing maps (SOM) to split the original data set: the best four-dimensional model, developed on a reduced training set of 93 chemicals, has a predictivity of 78% when applied on 550 validation chemicals (prediction set). The selected molecular descriptors, which could be interpreted through their mechanistic meaning, were compared with the more common physico-chemical descriptors log Kow and log Sw. The chemical applicability domain of each model was verified by the leverage approach in order to propose only reliable data. The best predicted data were obtained by consensus modeling from 10 different models in the genetic algorithm model population.
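Consensus modeling as mentioned in this abstract can be illustrated with a minimal sketch. It assumes the consensus is the unweighted mean of predictions from several linear models that use different descriptor subsets; the abstract does not specify the averaging scheme, and the descriptor names and coefficients below are invented for illustration.

```python
def predict(model, descriptors):
    """Evaluate one linear model: intercept + sum(coefficient * descriptor)."""
    intercept, coefficients = model
    return intercept + sum(
        coef * descriptors[name] for name, coef in coefficients.items()
    )


def consensus_predict(models, descriptors):
    """Unweighted mean of the individual model predictions (assumed scheme)."""
    predictions = [predict(model, descriptors) for model in models]
    return sum(predictions) / len(predictions)


# Hypothetical models, each using its own descriptor subset.
models = [
    (1.0, {"desc_a": 0.5}),
    (0.0, {"desc_a": 1.0, "desc_b": 2.0}),
]
compound = {"desc_a": 2.0, "desc_b": 1.0}
print(consensus_predict(models, compound))  # prints 3.0
```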
Affiliation(s)
- Paola Gramatica
- Department of Structural and Functional Biology, QSAR Research Unit in Environmental Chemistry and Ecotoxicology, University of Insubria, via Dunant 3, 21100 Varese, Italy.
|
43
|
Devillers J, Marchand-Geneste N, Carpy A, Porcher JM. SAR and QSAR modeling of endocrine disruptors. SAR and QSAR in Environmental Research 2006; 17:393-412. [PMID: 16920661 DOI: 10.1080/10629360600884397] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 05/11/2023]
Abstract
A number of xenobiotics by mimicking natural hormones can disrupt crucial functions in wildlife and humans. These chemicals termed endocrine disruptors are able to exert adverse effects through a variety of mechanisms. Fortunately, there is a growing interest in the study of these structurally diverse chemicals mainly through research programs based on in vitro and in vivo experimentations but also by means of SAR and QSAR models. The goal of our study was to retrieve from the literature all the papers dealing with structure-activity models on endocrine disruptor xenobiotics. A critical analysis of these models was made focusing our attention on the quality of the biological data, the significance of the molecular descriptors and the validity of the statistical tools used for deriving the models. The predictive power and domain of application of these models were also discussed.
Affiliation(s)
- J Devillers
- CTIS, 3 Chemin de la Gravière, 69140 Rillieux La Pape, France.
|
44
|
Asikainen A, Kolehmainen M, Ruuskanen J, Tuppurainen K. Structure-based classification of active and inactive estrogenic compounds by decision tree, LVQ and kNN methods. Chemosphere 2006; 62:658-73. [PMID: 15992856 DOI: 10.1016/j.chemosphere.2005.04.115] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 11/26/2004] [Revised: 04/18/2005] [Accepted: 04/29/2005] [Indexed: 05/03/2023]
Abstract
The performance of decision tree (DT), learning vector quantization (LVQ), and k-nearest neighbour (kNN) methods in classifying active and inactive estrogenic compounds in terms of their structure-activity relationship (SAR) was evaluated. A set of 311 compounds was used for construction of the models, the predictive power of which was verified with separate training and test sets. Principal components derived from molecular descriptors calculated with DRAGON software were used as variables representing the structures of the compounds. Broadly, kNN had the best classification ability and DT the weakest, although the performance of each method was dependent on the group of compounds used for modelling. The best performance was obtained with kNN for the calf estrogen receptor data, averaging 98.3% of correctly classified compounds in the external tests. Overall, the results indicate that all the methods tested are suitable for the SAR classification of estrogenic compounds, producing models with a predictive power ranging from adequate to excellent.
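The workflow described above, principal components of molecular descriptors fed to a kNN classifier and checked on a separate test set, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the study's procedure: the descriptor matrix is synthetic (no DRAGON descriptors are used), the two-component projection and k = 3 are arbitrary choices, and all function names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the column means and the top principal axes of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto principal axes fitted on the training set."""
    return (X - mean) @ components.T

def knn_predict(Z_train, y_train, Z_query, k=3):
    """Majority vote among the k nearest training points (labels 0/1, odd k)."""
    dists = np.linalg.norm(Z_query[:, None, :] - Z_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Synthetic stand-in for a descriptor table: 10 descriptors per compound,
# with "active" compounds shifted away from "inactive" ones.
rng = np.random.default_rng(1)
inactive = rng.normal(0.0, 1.0, size=(30, 10))
active = rng.normal(6.0, 1.0, size=(30, 10))

X_train = np.vstack([inactive[:20], active[:20]])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.vstack([inactive[20:], active[20:]])
y_test = np.array([0] * 10 + [1] * 10)

# Fit the PCA on training compounds only, then reuse it for the test set.
mean, components = pca_fit(X_train, n_components=2)
Z_train = pca_transform(X_train, mean, components)
Z_test = pca_transform(X_test, mean, components)

y_pred = knn_predict(Z_train, y_train, Z_test, k=3)
accuracy = float((y_pred == y_test).mean())
print(f"external test accuracy: {accuracy:.3f}")
```

Fitting the projection on the training compounds alone keeps the test set external, which is what makes the reported percentage of correctly classified compounds a measure of predictive power rather than of fit.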
Affiliation(s)
- Arja Asikainen
- Department of Environmental Sciences, University of Kuopio, P.O. Box 1627, FIN-70211 Kuopio, Finland.
|
45
|
Marini F, Roncaglioni A, Novic M. Variable Selection and Interpretation in Structure-Affinity Correlation Modeling of Estrogen Receptor Binders. J Chem Inf Model 2005; 45:1507-19. [PMID: 16309247 DOI: 10.1021/ci0501645] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
Abstract
A computational approach for the identification and investigation of correlations between a chemical structure and a selected biological property is described. It is based on a set of 132 compounds of known chemical structures, which were tested for their binding affinities to the estrogen receptor. Different multivariate modeling methods, i.e., partial least-squares regression, counterpropagation neural network, and error-back-propagation neural network, were applied, and the prediction ability of each model was tested in order to compare the results of the obtained models. To reduce the extensive set of calculated structural descriptors, two types of variable selection methods were applied, depending on the modeling approach used. In particular, the final partial least-squares regression model was built using the "variable importance in projection" variable selection method, while genetic algorithms were applied in neural network modeling to select the optimal set of descriptors. A thorough statistical study of the variables selected by genetic algorithms is shown. The results were assessed with the aim of gaining insight into the mechanisms involved in the binding of estrogenic compounds to the receptor. The variable selection on the basis of genetic algorithm was controlled with the test set of compounds, extracted from the data set available. To compare the predictive ability of all the optimized models, a leave-one-out cross-validation procedure was applied, the best model being the nonlinear neural network model based on error back-propagation algorithm, which resulted in R2 = 92.2% and Q2 = 70.8%.
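The gap between R2 and Q2 quoted above comes from how each is computed: R2 scores the model on the compounds it was fitted to, while the leave-one-out Q2 scores each compound with a model that never saw it. The sketch below shows that calculation under stated assumptions. It substitutes ordinary least squares for the neural network of the study, uses synthetic data, takes the total sum of squares about the overall mean of y (one common convention among several), and its function names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2_fitted(X, y):
    """R2: the model is scored on the same compounds it was fitted to."""
    resid = y - predict_ols(fit_ols(X, y), X)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Q2: each compound is predicted by a model fitted without it."""
    n = len(y)
    press = 0.0  # predictive residual sum of squares
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        pred = predict_ols(coef, X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in data: 40 compounds, 3 descriptors, noisy linear response.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + rng.normal(scale=0.5, size=40)

r2 = r2_fitted(X, y)
q2 = q2_loo(X, y)
print(f"R2 = {r2:.3f}, Q2 (leave-one-out) = {q2:.3f}")
```

For a least-squares model Q2 can never exceed R2, and a flexible model such as a neural network can show a much wider gap, which is why both figures are usually reported together.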
Affiliation(s)
- Federico Marini
- National Institute of Chemistry, Ljubljana, Slovenia; University of Rome La Sapienza, Rome, Italy
|
46
|
Korhonen SP, Tuppurainen K, Laatikainen R, Peräkylä M. Comparing the Performance of FLUFF-BALL to SEAL-CoMFA with a Large Diverse Estrogen Data Set: From Relevant Superpositions to Solid Predictions. J Chem Inf Model 2005; 45:1874-83. [PMID: 16309295 DOI: 10.1021/ci050021i] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
In this work, a template-based molecular mechanistic superposition algorithm FLUFF (Flexible Ligand Unified Force Field) and an accompanying local coordinate QSAR method BALL (Boundless Adaptive Localized Ligand) are validated against the benchmark techniques SEAL (Steric and Electrostatic Alignment) and CoMFA (Comparative Molecular Field Analysis) using a large diverse set of 245 xenoestrogens extracted from the EDKB (Endocrine Disruptor Knowledge Base) maintained by NCTR (National Centre for Toxicological Research). The results indicate that FLUFF is capable of generating relevant superpositions not only for BALL but also for CoMFA, as both techniques give predictive QSAR models. When the BALL and CoMFA methods are compared, it is clear that the BALL algorithm met or even exceeded the results of the standard 3D-QSAR method CoMFA using alignments either from the tailor-made superposition technique FLUFF or the reference method SEAL. The FLUFF-BALL method can be easily automated, and it is computationally light, thus providing a good computational "sieve" capable of fast screening of large molecule libraries.
Affiliation(s)
- Samuli-Petrus Korhonen
- Department of Chemistry, University of Kuopio, P.O. Box 1627, FIN-70211, Kuopio, Finland.
|