1
Rakhimbekova A, Akhmetshin TN, Minibaeva GI, Nugmanov RI, Gimadiev TR, Madzhidov TI, Baskin II, Varnek A. Cross-validation strategies in QSPR modelling of chemical reactions. SAR QSAR Environ Res 2021; 32:207-219. [PMID: 33601989 DOI: 10.1080/1062936x.2021.1883107] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/18/2020] [Accepted: 01/26/2021]
Abstract
In this article, we consider cross-validation of quantitative structure-property relationship (QSPR) models for reactions and show that the conventional k-fold cross-validation (CV) procedure gives an 'optimistically' biased assessment of prediction performance. To address this issue, we suggest two model cross-validation strategies: 'transformation-out' CV and 'solvent-out' CV. Unlike the conventional k-fold approach, which does not consider the nature of the objects, the proposed procedures provide an unbiased estimate of model performance for novel types of structural transformations in chemical reactions and for reactions proceeding under new conditions. Both strategies have been applied to predict the rate constants of bimolecular elimination and nucleophilic substitution reactions, and of Diels-Alder cycloaddition. All suggested cross-validation methodologies and a tutorial are implemented in the open-source software package CIMtools (https://github.com/cimm-kzn/CIMtools).
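The 'transformation-out' and 'solvent-out' strategies can be read as grouped cross-validation: every reaction that shares a group label (a transformation type or a solvent) is held out together, so a test fold never contains a group seen during training. The following is a minimal plain-Python sketch under that reading, not the CIMtools implementation; the function name `group_k_fold`, the greedy fold balancing and the example labels are hypothetical. scikit-learn's `GroupKFold` offers comparable behaviour.

```python
from collections import defaultdict


def group_k_fold(groups, k):
    """Yield (train, test) index lists so that each group falls entirely in one test fold.

    groups[i] is the label of sample i, e.g. a hypothetical reaction-transformation
    or solvent identifier. Folds are balanced greedily by sample count.
    """
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)
    if not 2 <= k <= len(members):
        raise ValueError("k must be between 2 and the number of distinct groups")
    folds = [[] for _ in range(k)]
    # Largest groups first, each into the fold that currently holds the fewest samples.
    for label in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[label])
    everything = set(range(len(groups)))
    for test in folds:
        yield sorted(everything - set(test)), sorted(test)


# Usage with made-up labels: three transformation types, eight reactions.
labels = ["SN2", "SN2", "E2", "E2", "E2", "DA", "DA", "SN2"]
for train, test in group_k_fold(labels, k=3):
    print(sorted({labels[i] for i in test}), "train size:", len(train))
# prints: ['E2'] train size: 5 / ['SN2'] train size: 5 / ['DA'] train size: 6
```

Passing solvent identifiers instead of transformation labels gives the 'solvent-out' variant of the same sketch.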
Affiliation(s)
- A Rakhimbekova
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- T N Akhmetshin
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, Strasbourg, France
- G I Minibaeva
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- R I Nugmanov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- T R Gimadiev
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo, Japan
- T I Madzhidov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- I I Baskin
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- Department of Materials Science and Engineering, Technion - Israel Institute of Technology, Haifa, Israel
- A Varnek
- Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, Strasbourg, France
- Institute for Chemical Reaction Design and Discovery, Hokkaido University, Sapporo, Japan

2
Kang D, Oh S. Balanced training/test set sampling for proper evaluation of classification models. Intell Data Anal 2020. [DOI: 10.3233/ida-194477] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4]
Affiliation(s)
- Donghoon Kang
- Department of Data Science, Dankook University 152, Gyeonggi-do, Korea
- Sejong Oh
- Department of Software Science, Dankook University 152, Gyeonggi-do, Korea

3
Pandey AK, Shukla DV, Singh P, Dwivedi A. Molecular Docking of a Bio Material [1-{[(Z)-Cyclopentylidene] Amino}-3-Phenylthiourea] by First Principles Study. Polycycl Aromat Compd 2019. [DOI: 10.1080/10406638.2019.1692878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Affiliation(s)
- Anoop Kumar Pandey
- Department of Physics, Saket Mahavidhyalaya, Ayodhya, Uttar Pradesh, India
- D. V. Shukla
- Department of Physics, GLA University, Mathura, Uttar Pradesh, India
- Prashant Singh
- Department of Physics, Rajendra College, Chhapra, Bihar, India
- Apoorva Dwivedi
- Department of Physics, Marwar Business School, Gorakhpur, Uttar Pradesh, India

4
Sheikhpour R, Sarram MA, Rezaeian M, Sheikhpour E. QSAR modelling using combined simple competitive learning networks and RBF neural networks. SAR QSAR Environ Res 2018; 29:257-276. [PMID: 29372662 DOI: 10.1080/1062936x.2018.1424030] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/14/2017] [Accepted: 01/02/2018]
Abstract
The aim of this study was to propose a QSAR modelling approach that combines simple competitive learning (SCL) networks with radial basis function (RBF) neural networks to predict the biological activity of chemical compounds. The proposed method consisted of two phases. In the first phase, an SCL network was used to determine the centres of an RBF neural network. In the second phase, the RBF neural network was used to predict the biological activity of various phenols and Rho kinase (ROCK) inhibitors. The predictive ability of the proposed QSAR models was evaluated by external validation and compared with that of other QSAR models. The results showed that the proposed approach outperforms the other models in predicting the biological activity of chemical compounds, indicating that simple competitive learning networks are effective for determining the centres of RBF neural networks.
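The two-phase scheme in this abstract can be sketched in NumPy as follows: simple competitive learning moves only the winning prototype toward each sample, and the resulting prototypes serve as centres of Gaussian basis functions whose output weights are then fitted by linear least squares. The shared width, learning rate, epoch count, number of centres and toy data are assumptions for illustration, not values from the paper.

```python
import numpy as np


def scl_centres(X, n_centres, lr=0.1, epochs=50, seed=0):
    """Simple competitive learning: only the winning prototype moves toward each sample."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=n_centres, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(((centres - x) ** 2).sum(axis=1))
            centres[winner] += lr * (x - centres[winner])
    return centres


def rbf_features(X, centres, width):
    """Gaussian activations of every sample at every centre, plus a bias column."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(X), 1))])


def fit_scl_rbf(X, y, n_centres=8, width=1.0):
    """Phase 1: SCL picks the centres. Phase 2: least-squares output weights."""
    centres = scl_centres(X, n_centres)
    weights, *_ = np.linalg.lstsq(rbf_features(X, centres, width), y, rcond=None)
    return centres, weights


def predict_scl_rbf(X, centres, weights, width=1.0):
    return rbf_features(X, centres, width) @ weights


# Usage on a toy 1-D problem (no connection to the paper's datasets).
X = np.linspace(0.0, 6.0, 40).reshape(-1, 1)
y = np.sin(X).ravel()
centres, weights = fit_scl_rbf(X, y)
rmse = np.sqrt(np.mean((predict_scl_rbf(X, centres, weights) - y) ** 2))
print(f"training RMSE: {rmse:.3f}")
```

Fitting the output layer by least squares keeps the second phase linear once the centres are fixed, which is what makes the centre-selection step the main design choice.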
Affiliation(s)
- R Sheikhpour
- Department of Computer Engineering, Yazd University, Yazd, Iran
- M A Sarram
- Department of Computer Engineering, Yazd University, Yazd, Iran
- M Rezaeian
- Department of Computer Engineering, Yazd University, Yazd, Iran
- E Sheikhpour
- Hematology and Oncology Research Center, Shahid Sadoughi University of Medical Sciences, Yazd, Iran

5
An improvement on the prediction power of the 3D-QSAR CoMFA models using a hybrid of statistical and machine learning methods: a case study on γ-secretase modulators of Alzheimer’s disease. Med Chem Res 2017. [DOI: 10.1007/s00044-017-1828-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4]
6
Abstract
INTRODUCTION: Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their application to drug discovery requires expert knowledge to choose the most appropriate approach. AREAS COVERED: In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed, including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of the applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models for various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties; characterizing drug-delivery systems; and virtual screening. EXPERT OPINION: Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest that further improvements may be gained in the analysis of large chemical data sets. Neural networks are expected to be used more widely in drug discovery in the future and to be applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.
Affiliation(s)
- Igor I Baskin
- Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia
- David Winkler
- CSIRO Manufacturing, Clayton, VIC, Australia
- Monash Institute for Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia
- Latrobe Institute for Molecular Science, Bundoora, VIC, Australia
- School of Chemical and Physical Sciences, Flinders University, Bedford Park, SA, Australia
- Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Neuherberg, Germany
- BigChem GmbH, Neuherberg, Germany

7
Aksakal F, Shvets N, Dimoglo A. The study of dual COX-2/5-LOX inhibitors by using electronic-topological approach based on data on the ligand–receptor interactions. J Mol Graph Model 2015; 60:79-88. [DOI: 10.1016/j.jmgm.2015.06.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/04/2015] [Revised: 06/14/2015] [Accepted: 06/16/2015]
8
Li L, Hu J, Ho YS. Global Performance and Trend of QSAR/QSPR Research: A Bibliometric Analysis. Mol Inform 2014; 33:655-668. [PMID: 27485301 DOI: 10.1002/minf.201300180] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 12/20/2013] [Accepted: 08/04/2014]
Abstract
A bibliometric analysis based on the Science Citation Index Expanded was conducted to provide insights into the publication performance and research trends of quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) studies from 1993 to 2012. The results show that the number of articles per year quadrupled from 1993 to 2006 and has plateaued since 2007. Journal of Chemical Information and Modeling was the most prolific journal. Internal methodological innovations in acquiring molecular descriptors and in modeling drove the growth of articles in drug design and synthesis and in chemoinformatics, while external regulatory demands for model validation and reliability fueled growth in the environmental sciences. "Prediction endpoints", "statistical algorithms", and "molecular descriptors" were identified as three research hotspots. Articles from developed countries were more numerous and more highly cited, whereas those from developing countries showed higher output growth rates.
Affiliation(s)
- Li Li
- State Key Joint Laboratory for Environmental Simulation and Pollution Control, College of Environmental Sciences and Engineering, Peking University, Beijing 100871, People's Republic of China
- Jianxin Hu
- State Key Joint Laboratory for Environmental Simulation and Pollution Control, College of Environmental Sciences and Engineering, Peking University, Beijing 100871, People's Republic of China
- Yuh-Shan Ho
- Trend Research Centre, Asia University, Taichung 41354, Taiwan
- Department of Environmental Engineering, Peking University, Beijing 100871, People's Republic of China

9
The continuous molecular fields approach to building 3D-QSAR models. J Comput Aided Mol Des 2013; 27:427-42. [PMID: 23719959 DOI: 10.1007/s10822-013-9656-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/31/2013] [Accepted: 05/22/2013]
Abstract
The continuous molecular fields (CMF) approach is based on the application of continuous functions for the description of molecular fields instead of finite sets of molecular descriptors (such as interaction energies computed at grid nodes) commonly used for this purpose. These functions can be encapsulated into kernels and combined with kernel-based machine learning algorithms to provide a variety of novel methods for building classification and regression structure-activity models, visualizing chemical datasets and conducting virtual screening. In this article, the CMF approach is applied to building 3D-QSAR models for 8 datasets through the use of five types of molecular fields (the electrostatic, steric, hydrophobic, hydrogen-bond acceptor and donor ones), the linear convolution molecular kernel with the contribution of each atom approximated with a single isotropic Gaussian function, and the kernel ridge regression data analysis technique. It is shown that the CMF approach even in this simplest form provides either comparable or enhanced predictive performance in comparison with state-of-the-art 3D-QSAR methods.
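Kernel ridge regression, the learning method named in this abstract, fits dual coefficients α = (K + λI)⁻¹y and predicts f(x) = Σᵢ αᵢ k(x, xᵢ). The sketch below assumes a plain Gaussian kernel on made-up descriptor vectors as a stand-in; it does not implement the continuous molecular field kernel itself, and the values of `gamma` and `lam` are arbitrary.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows in A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)


def krr_fit(X, y, lam=1e-2, gamma=1.0):
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, gamma=1.0):
    """f(x) = sum_i alpha_i * k(x, x_i)."""
    return gaussian_kernel(X_new, X_train, gamma) @ alpha


# Usage on made-up descriptor vectors (stand-ins for molecules).
rng = np.random.default_rng(0)
X_train = rng.uniform(-2.0, 2.0, size=(30, 3))
y_train = (X_train ** 2).sum(axis=1)
alpha = krr_fit(X_train, y_train)
X_new = rng.uniform(-2.0, 2.0, size=(5, 3))
print(krr_predict(X_train, alpha, X_new))
```

Swapping in a different kernel only changes `gaussian_kernel`; the fitting and prediction steps stay the same, which is what lets field-based kernels plug into this learner.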
10
Roy K. On some aspects of validation of predictive quantitative structure-activity relationship models. Expert Opin Drug Discov 2013; 2:1567-77. [PMID: 23488901 DOI: 10.1517/17460441.2.12.1567] [Citation(s) in RCA: 183] [Impact Index Per Article: 15.3]
Abstract
The success of any quantitative structure-activity relationship model depends on the accuracy of the input data, selection of appropriate descriptors and statistical tools and, most importantly, the validation of the developed model. Validation is the process by which the reliability and relevance of a procedure are established for a specific purpose. This review focuses on the importance of validation of quantitative structure-activity relationship models and different methods of validation. Some important issues, such as internal versus external validation, method of selection of training set compounds and training set size, applicability domain, variable selection and suitable parameters to indicate external predictivity, are also discussed.
Affiliation(s)
- Kunal Roy
- Jadavpur University, Drug Theoretics and Cheminformatics Lab, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Kolkata 700 032, India

11
Tosco P, Balle T. A 3D-QSAR-Driven Approach to Binding Mode and Affinity Prediction. J Chem Inf Model 2011; 52:302-7. [DOI: 10.1021/ci200411s] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6]
Affiliation(s)
- Paolo Tosco
- Department of Drug Science and Technology, University of Turin, Via Pietro Giuria 9, 10125 Torino, Italy
- Thomas Balle
- Department of Medicinal Chemistry, The Faculty of Pharmaceutical Sciences, University of Copenhagen, 2 Universitetsparken, 2100 Copenhagen, Denmark

12
Macaev F, Ribkovskaia Z, Pogrebnoi S, Boldescu V, Rusu G, Shvets N, Dimoglo A, Geronikaki A, Reynolds R. The structure–antituberculosis activity relationships study in a series of 5-aryl-2-thio-1,3,4-oxadiazole derivatives. Bioorg Med Chem 2011; 19:6792-807. [DOI: 10.1016/j.bmc.2011.09.038] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 06/13/2011] [Revised: 09/11/2011] [Accepted: 09/21/2011]
13
Ning X, Karypis G. In silico structure-activity-relationship (SAR) models from machine learning: a review. Drug Dev Res 2010. [DOI: 10.1002/ddr.20410] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5]
14
ETM-ANN approach application for thiobenzamide and quinolizidine derivatives. J Biomed Biotechnol 2010; 2010. [PMID: 20871848 PMCID: PMC2943087 DOI: 10.1155/2010/693031] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/09/2010] [Revised: 05/15/2010] [Accepted: 06/30/2010]
Abstract
The structure-anti-influenza activity relationships of thiobenzamide and quinolizidine derivatives, which are influenza fusion inhibitors, have been investigated using the electronic-topological method (ETM) and artificial neural networks (ANN). Applying the ETM, molecular fragments specific for active compounds and breaks of activity were calculated for the influenza fusion inhibitors. QSAR descriptors such as molecular weight, EHOMO, ELUMO, ΔE, chemical potential, softness, electrophilicity index and dipole moment were calculated and found to give good statistical quality: 92% of the training set (48 of 52 compounds) and 69% of the external test set (9 of 13 compounds) were classified correctly. Several QSAR models were built by multiple linear regression from the calculated descriptors and the activity data of the compounds. Among these models, the statistically most significant is the one for skeleton 1, with R² = 0.999.
15
Machado A, Tejera E, Cruz-Monteagudo M, Rebelo I. Application of desirability-based multi(bi)-objective optimization in the design of selective arylpiperazine derivates for the 5-HT1A serotonin receptor. Eur J Med Chem 2009; 44:5045-54. [DOI: 10.1016/j.ejmech.2009.09.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 01/29/2009] [Revised: 07/06/2009] [Accepted: 09/06/2009]
16
Quantitative Series Enrichment Analysis (QSEA): a novel procedure for 3D-QSAR analysis. J Comput Aided Mol Des 2008; 22:541-51. [DOI: 10.1007/s10822-008-9195-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 10/31/2007] [Accepted: 02/07/2008]
17
Kovalishyn VV, Kholodovych V, Tetko IV, Welsh WJ. Volume learning algorithm significantly improved PLS model for predicting the estrogenic activity of xenoestrogens. J Mol Graph Model 2007; 26:591-4. [PMID: 17433745 DOI: 10.1016/j.jmgm.2007.03.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 12/21/2006] [Accepted: 03/12/2007]
Abstract
Volume learning algorithm (VLA) artificial neural networks and partial least squares (PLS) were compared using leave-one-out cross-validation for predicting the relative potency of xenoestrogenic compounds to the estrogen receptor. Using the Wilcoxon signed-rank test, we showed that VLA outperformed PLS, producing models with statistically superior results for a structurally diverse set of compounds comprising eight chemical families. Thus, CoMFA/VLA models are successful in predicting the endocrine-disrupting potential of environmental pollutants and can be effectively applied to testing prospective chemicals prior to their exposure to the environment.
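The paired model comparison described above can be illustrated with a plain-Python Wilcoxon signed-rank test on per-compound leave-one-out errors of two models. This is a sketch assuming the normal approximation without tie correction, which is only reasonable for moderately large numbers of compounds; `scipy.stats.wilcoxon` is the usual library route. The function returns (W+, W-, two-sided p), and the error values in the usage lines are invented.

```python
import math


def wilcoxon_signed_rank(errors_a, errors_b):
    """Paired Wilcoxon signed-rank test (normal approximation, no tie correction)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, p_two_sided


# Made-up absolute LOO errors for the same 12 compounds under two models.
pls_err = [0.42, 0.35, 0.61, 0.28, 0.50, 0.47, 0.39, 0.55, 0.33, 0.60, 0.44, 0.38]
vla_err = [0.30, 0.31, 0.45, 0.29, 0.36, 0.40, 0.25, 0.41, 0.30, 0.43, 0.35, 0.28]
print(wilcoxon_signed_rank(pls_err, vla_err))
```

Pairing the errors compound by compound is what distinguishes this test from comparing two overall error summaries.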
Affiliation(s)
- Vasyl V Kovalishyn
- Institute of Bioorganic Chemistry and Petrochemistry, Kyiv, Murmanska 1, 02660, Ukraine
18
Salo OMH, Savinainen JR, Parkkari T, Nevalainen T, Lahtela-Kakkonen M, Gynther J, Laitinen JT, Järvinen T, Poso A. 3D-QSAR Studies on Cannabinoid CB1 Receptor Agonists: G-Protein Activation as Biological Data. J Med Chem 2005; 49:554-66. [PMID: 16420041 DOI: 10.1021/jm0505157] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8]
Abstract
G-protein activation via the CB1 receptor was determined for a group of various CB1 ligands and utilized as biological activity data in subsequent CoMFA and CoMSIA studies. Both manual techniques and automated docking at CB1 receptor models were used to obtain a common alignment of endocannabinoid and classical cannabinoid derivatives. In the final alignment models, the endocannabinoid headgroup occupies a unique region distinct from the classical cannabinoid structures, supporting the hypothesis that these structurally diverse molecules overlap only partially within the receptor binding site. Both CoMFA and CoMSIA produce statistically significant models based on the manual alignment and a docking alignment at one receptor conformer. Leave-half-out cross-validation and progressive scrambling were successfully used in assessing the predictivity of the QSAR models.
Affiliation(s)
- Outi M H Salo
- Department of Pharmaceutical Chemistry, University of Kuopio, FIN-70211 Kuopio, Finland.
19
Willis PG, Pavlova OA, Chefer SI, Vaupel DB, Mukhin AG, Horti AG. Synthesis and Structure-Activity Relationship of a Novel Series of Aminoalkylindoles with Potential for Imaging the Neuronal Cannabinoid Receptor by Positron Emission Tomography. J Med Chem 2005; 48:5813-22. [PMID: 16134948 DOI: 10.1021/jm0502743] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7]
Abstract
A new series of CB1 ligands with high binding affinity (Ki = 0.7-100 nM) and moderate lipophilicity (cLogD7.4 in the range 2.1-4.5) has been synthesized. A structure-activity relationship study demonstrated that, for the studied set of aminoalkylindoles, the molecular dipole of the ground-state conformation was inversely related to affinity. The racemic ligand with the highest affinity (0.7 nM), 3-(4-fluoronaphthoyl)-1-(N-methylpiperidin-2-ylmethyl)indole, was radiolabeled with ¹⁸F. This radioligand specifically labeled CB1 receptors in mouse brain and accumulated in regions of high versus low CB1 receptor density in a ratio of 1.6. In a pretreatment study using the CB1 antagonist N-(piperidinyl)-5-(4-chlorophenyl)-1-(2,4-dichlorophenyl)-4-methyl-1H-pyrazole-3-carboxamide (SR141716), the displaceable radioactivity of one enantiomer in mouse brain was nearly double that of the racemate; this active enantiomer is therefore a candidate for PET studies in animals. A pretreatment study of the other enantiomer found no displaceable radioactivity in the same group of mice, suggesting that this enantiomer is inactive.
Affiliation(s)
- Peter G Willis
- Neuroimaging Research Branch, Intramural Research Program, National Institute on Drug Abuse, NIH, DHHS, 5500 Nathan Shock Drive, Baltimore, Maryland 21224, USA.
20
Guha R, Serra JR, Jurs PC. Generation of QSAR sets with a self-organizing map. J Mol Graph Model 2005; 23:1-14. [PMID: 15331049 DOI: 10.1016/j.jmgm.2004.03.003] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.4] [Received: 07/31/2003] [Revised: 12/01/2003] [Accepted: 03/03/2004]
Abstract
A Kohonen self-organizing map (SOM) is used to classify a data set consisting of dihydrofolate reductase inhibitors with the help of an external set of Dragon descriptors. The resultant classification is used to generate training, cross-validation (CV) and prediction sets for QSAR modeling using the ADAPT methodology. The results are compared to those of QSAR models generated using sets created by activity binning and a sphere exclusion method. The results indicate that the SOM is able to generate QSAR sets that are representative of the composition of the overall data set in terms of similarity. The resulting QSAR models are half the size of those published and have comparable RMS errors. Furthermore, the RMS errors of the QSAR sets are consistent, indicating good predictive capabilities as well as generalizability.
Affiliation(s)
- Rajarshi Guha
- Department of Chemistry, Penn State University, 152 Davey Laboratory, University Park 16802, USA
21
Macaev F, Rusu G, Pogrebnoi S, Gudima A, Stingaci E, Vlad L, Shvets N, Kandemirli F, Dimoglo A, Reynolds R. Synthesis of novel 5-aryl-2-thio-1,3,4-oxadiazoles and the study of their structure–anti-mycobacterial activities. Bioorg Med Chem 2005; 13:4842-50. [PMID: 15993090 DOI: 10.1016/j.bmc.2005.05.011] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.0] [Received: 04/13/2005] [Revised: 05/05/2005] [Accepted: 05/06/2005]
Abstract
The preparation of novel 5-aryl-2-thio-1,3,4-oxadiazoles 4a-4l and the computer-aided study of their in vitro anti-tubercular activity against Mycobacterium tuberculosis H37Rv (ATCC 27294) are reported. The average accuracy of the electronic-topological and neural network methods applied to activity prediction in leave-one-out cross-validation is 80%.
Affiliation(s)
- Fliur Macaev
- Institute of Chemistry, Academy of Sciences of Moldova, Chisinau, MD-2028, Republic of Moldova.
22
Hoshi K, Kawakami J, Kumagai M, Kasahara S, Nishimura N, Nakamura H, Sato K. An Analysis of Thyroid Function Diagnosis Using Bayesian-Type and SOM-Type Neural Networks. Chem Pharm Bull (Tokyo) 2005; 53:1570-4. [PMID: 16327191 DOI: 10.1248/cpb.53.1570] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.2]
Abstract
Thyroid function diagnosis is an important classification problem. We reanalysed human thyroid data, previously analysed by multivariate analysis, using two notable neural networks. The first is the self-organizing map (SOM), which clusters the patients and visually displays the characteristics of their distribution according to laboratory tests. We found that the SOM forms three well-separated clusters corresponding to hyperthyroid, hypothyroid and normal patients, and that more detailed information about each patient can be obtained from its position in the map. In addition, the missing-value SOM that we had introduced for QSAR problems also proved useful for this classification problem. The second is the Bayesian regularized neural network (BRNN): we estimated the classification rates of thyroid disease with it and found that its prediction accuracy is better than that of multivariate analysis. The effectiveness of the automatic relevance determination (ARD) method of the BRNN was confirmed by directly calculating classification rates using a BRNN without ARD for all possible combinations of laboratory tests.
Affiliation(s)
- Kenji Hoshi
- Information Science Center, Tohoku Pharmaceutical University, Komatsushima, Sendai, Japan
23
Murcia-Soler M, Pérez-Giménez F, García-March FJ, Salabert-Salvador MT, Díaz-Villanueva W, Castro-Bleda MJ, Villanueva-Pareja A. Artificial Neural Networks and Linear Discriminant Analysis: A Valuable Combination in the Selection of New Antibacterial Compounds. J Chem Inf Comput Sci 2004; 44:1031-41. [PMID: 15154772 DOI: 10.1021/ci030340e] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0]
Abstract
A set of topological descriptors has been used to discriminate between antibacterial and nonantibacterial drugs. Topological descriptors are simple integers calculated from the molecular structure represented in SMILES format. Linear discriminant analysis (LDA) and multilayer perceptron (MLP) artificial neural networks were used to discriminate antibacterial activity. Frequency distribution diagrams were plotted showing the number of drugs within each value interval of the discriminant function and of the neural network output. Pharmacological distribution diagrams (PDD) were used as a visualization technique for identifying antibacterial agents. The results confirmed the discriminative capacity of the proposed topological descriptors. The combined use of LDA and MLP in the guided search for and selection of new structures with theoretical antibacterial activity proved highly effective, as shown by the in vitro activity and toxicity assays conducted.
Affiliation(s)
- Miguel Murcia-Soler
- Department of Physical Chemistry, Faculty of Pharmacy, Universitat de València, Av. Vicent Andrés Estellés, s/n. 46100 Burjassot, Valencia, Spain
24
Kawakami J, Hoshi K, Ishiyama A, Miyagishima S, Sato K. Application of a Self-Organizing Map to Quantitative Structure-Activity Relationship Analysis of Carboquinone and Benzodiazepine. Chem Pharm Bull (Tokyo) 2004; 52:751-5. [PMID: 15187400 DOI: 10.1248/cpb.52.751] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7]
Abstract
Kohonen's self-organizing map (SOM) seems to be a promising alternative to standard regression for some classification problems encountered in pharmacy. We therefore apply it to quantitative structure-activity relationship (QSAR) analysis of carboquinones and benzodiazepines and show its usefulness. Most QSAR analyses using neural networks have adopted networks with supervised learning. In contrast, the SOM relies on unsupervised learning and originally does not use target data. However, when the number of attributes is large enough, the SOM can still compare similarity even if an appreciable fraction of the data is missing, so QSAR analysis with the SOM becomes possible much as with supervised learning: the target data (observed activity) are treated as one attribute in addition to the other attributes (structural descriptors). The choice of optimal descriptors as input parameters was found to be essential for generating a valuable SOM.
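The idea of treating observed activity as one more SOM attribute can be sketched as below: during training, distances and weight updates use only the attributes that are present, and at prediction time the activity is marked missing, the best-matching unit is found from the descriptors alone, and its activity component is read off. The grid size, Gaussian neighbourhood, linear decay schedules, [0, 1] scaling and data are all assumptions for illustration, not the authors' settings.

```python
import math
import random


def masked_dist2(x, w):
    """Squared distance using only the attributes present in x (None = missing)."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w) if xi is not None)


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, seed=0):
    """Train a small rectangular SOM; missing attributes are skipped in every update."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in data[0]] for _ in grid]
    radius0 = max(rows, cols) / 2
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)  # linearly decaying learning rate
        radius = max(radius0 * (1 - frac), 0.5)  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):
            bmu = min(range(len(grid)), key=lambda k: masked_dist2(x, weights[k]))
            br, bc = grid[bmu]
            for k, (r, c) in enumerate(grid):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for j, xj in enumerate(x):
                    if xj is not None:  # only observed attributes pull the weights
                        weights[k][j] += lr * h * (xj - weights[k][j])
    return weights


def predict_activity(weights, descriptors):
    """Append activity as a missing attribute, find the BMU, read off its activity weight."""
    query = list(descriptors) + [None]
    bmu = min(weights, key=lambda w: masked_dist2(query, w))
    return bmu[-1]


# Made-up data scaled to [0, 1]: (descriptor 1, descriptor 2, activity).
data = [
    (0.10, 0.15, 0.10), (0.15, 0.10, 0.12), (0.12, None, 0.08),
    (0.90, 0.85, 0.90), (0.85, 0.90, 0.88), (None, 0.88, 0.92),
]
som = train_som(data)
print(predict_activity(som, (0.1, 0.1)), predict_activity(som, (0.9, 0.9)))
```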
Affiliation(s)
- Junko Kawakami
- Information Science Center, Tohoku Pharmaceutical University, 4-4-1 Komatsushima, Aoba-ku, Sendai 981-8558, Japan
25
Golbraikh A, Tropsha A. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. Mol Divers 2003; 5:231-43. [PMID: 12549674 DOI: 10.1023/a:1021372108686] [Citation(s) in RCA: 154] [Impact Index Per Article: 7.0]
Abstract
One of the most important characteristics of Quantitative Structure-Activity Relationship (QSAR) models is their predictive power, which can be defined as the ability of a model to accurately predict the target property (e.g., biological activity) of compounds that were not used for model development. We suggest that this goal can be achieved by rationally dividing an experimental SAR dataset into training and test sets, used for model development and validation, respectively. Given that all compounds are represented by points in a multidimensional descriptor space, we argue that the training and test sets must satisfy the following criteria: (i) representative points of the test set must be close to those of the training set; (ii) representative points of the training set must be close to representative points of the test set; (iii) the training set must be diverse. For a quantitative description of these criteria, we use molecular dataset diversity indices introduced recently (Golbraikh, A., J. Chem. Inf. Comput. Sci., 40 (2000) 414-425). For the rational division of a dataset into training and test sets, we use three closely related sphere-exclusion algorithms. Using several experimental datasets, we demonstrate that QSAR models built and validated with our approach have statistically better predictive power than models generated with either random or activity-ranking-based selection of the training and test sets. We suggest that rational approaches to the selection of training and test sets based on diversity principles should be used routinely in all QSAR modeling research.
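A simplified sphere-exclusion split in plain Python is shown below. It is a generic variant written for illustration, not one of the three algorithms used in the paper: the starting compound, the Euclidean distance, the rule for picking the next training compound (the remaining point nearest to the current training set) and the radius are all assumptions. By construction, every test compound lies within `radius` of some training compound, in the spirit of criterion (i), and training compounds are more than `radius` apart from one another, which promotes the diversity of criterion (iii).

```python
import math


def sphere_exclusion_split(points, radius, start=0):
    """Simplified sphere-exclusion split into sorted training and test index lists.

    Each newly chosen training point claims every remaining point within `radius`
    for the test set; the next training point is the remaining point nearest to
    the current training set (ties broken by index).
    """
    remaining = set(range(len(points)))
    train, test = [], []
    current = start
    while True:
        remaining.discard(current)
        train.append(current)
        centre = points[current]
        inside = sorted(i for i in remaining if math.dist(points[i], centre) <= radius)
        test.extend(inside)
        remaining.difference_update(inside)
        if not remaining:
            break
        current = min(
            remaining,
            key=lambda i: (min(math.dist(points[i], points[t]) for t in train), i),
        )
    return sorted(train), sorted(test)


# Made-up 2-D descriptor coordinates for eight compounds.
pts = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9),
       (2.0, 0.0), (2.1, 0.2), (0.0, 2.0), (3.0, 3.0)]
train_idx, test_idx = sphere_exclusion_split(pts, radius=0.5)
print("train:", train_idx, "test:", test_idx)  # train: [0, 2, 5, 6, 7] test: [1, 3, 4]
```

A larger `radius` sends more compounds to the test set, so in practice the radius is tuned to reach a desired training/test ratio.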
Affiliation(s)
- Alexander Golbraikh
- The Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599-7360, USA
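The splitting criteria listed in the abstract above can be illustrated with a simplified sphere-exclusion procedure. This is a minimal sketch under stated assumptions, not any of the three algorithms from the paper: the helper name `sphere_exclusion_split`, the random choice of sphere centres, Euclidean distance and a single fixed `radius` are all hypothetical choices made here. Each unassigned compound picked as a centre goes to the training set, and every unassigned compound inside its sphere goes to the test set. As a result, every test compound lies within `radius` of some training compound (criterion i), and training compounds end up pairwise more than `radius` apart (a crude form of criterion iii).

```python
import numpy as np


def sphere_exclusion_split(X, radius, seed=0):
    """Hypothetical, simplified sphere-exclusion split of the rows of X.

    X: (n_compounds, n_descriptors) descriptor matrix.
    Returns (train_indices, test_indices), both sorted lists of ints.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    unassigned = set(range(len(X)))
    train, test = [], []
    while unassigned:
        # Assumption: sphere centres are picked at random among unassigned compounds.
        centre = int(rng.choice(sorted(unassigned)))
        unassigned.discard(centre)
        train.append(centre)
        remaining = np.array(sorted(unassigned), dtype=int)
        if remaining.size == 0:
            break
        # Euclidean distances in descriptor space from the centre to unassigned compounds.
        dist = np.linalg.norm(X[remaining] - X[centre], axis=1)
        # Compounds inside the sphere go to the test set and are excluded as future centres.
        for idx in remaining[dist <= radius]:
            unassigned.discard(int(idx))
            test.append(int(idx))
    return sorted(train), sorted(test)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    descriptors = rng.normal(size=(20, 3))  # 20 made-up compounds, 3 descriptors
    train, test = sphere_exclusion_split(descriptors, radius=1.0)
    print(len(train), "training /", len(test), "test compounds")
```

A smaller `radius` claims fewer compounds per sphere and therefore yields more centres and a larger training set. How the paper chooses sphere sizes and centres is not described in the abstract and is not reproduced here.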
26
Abstract
To provide an objective QSAR methodology that might accelerate lead optimization, the CoMFA and topomer technologies have been merged, with surprisingly good results. Each input structure in a series is broken into two or more fragments at central acyclic single bonds, and any core fragment structurally common to the entire series is removed. Standard topomer 3D models are automatically constructed for each fragment, and a set of steric and electrostatic fields ("CoMFA column") is generated for each set of topomers. Applications of "topomer CoMFA" to 15 3D-QSAR analyses taken from the literature (847 structures) were all successful, with an average q² of 0.520 (literature average q² = 0.636) and an average standard deviation of true prediction (SDEP) of 0.688 (literature average SDEP = 0.553) for 133 structures. Topomer CoMFA results are particularly promising as queries into virtual libraries already composed of topomer structures, to directly seek structures with increased potency. Accordingly, in 13 of the 15 such "topomer CoMFA searches" attempted, combinations of commercially offered fragments were retrieved that were predicted to be more potent than any structure described in the original publication (average predicted potency increase = 20-fold), showing in principle how optimization could occur.
Affiliation(s)
- Richard D Cramer
- Tripos Inc., 1699 South Hanley Road, St. Louis, Missouri 63144, USA.
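The q² and SDEP figures quoted in the abstract above are conventional QSAR validation statistics. As a hedged illustration, the sketch below uses the definitions commonly found in the CoMFA literature, q² = 1 − PRESS/SS and SDEP = √(PRESS/n), where PRESS is the sum of squared prediction errors and SS is the total sum of squares of the observed values about their mean. The abstract does not state the exact formulas the paper used, and the function names here are hypothetical.

```python
import math


def press(y_obs, y_pred):
    """Predictive residual sum of squares: sum of squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))


def q_squared(y_obs, y_pred_cv):
    """Cross-validated q^2 = 1 - PRESS / SS.

    y_pred_cv should be out-of-fold (e.g. leave-one-out) predictions;
    SS is the total sum of squares of y_obs about its mean.
    """
    mean = sum(y_obs) / len(y_obs)
    ss = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - press(y_obs, y_pred_cv) / ss


def sdep(y_obs, y_pred):
    """Standard deviation of error of prediction: sqrt(PRESS / n)."""
    return math.sqrt(press(y_obs, y_pred) / len(y_obs))


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.0]  # made-up out-of-sample predictions
    print(q_squared(observed, predicted), sdep(observed, predicted))
```

In this toy case PRESS = 0.5 and SS = 5.0, so q² = 0.9 and SDEP = √0.125. A q² close to 1 means the out-of-sample errors are small relative to the spread of the data, while SDEP reports those errors in the units of the property itself.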
27
Tetko IV. Neural network studies. 4. Introduction to associative neural networks. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES 2002; 42:717-28. [PMID: 12086534 DOI: 10.1021/ci010379o] [Citation(s) in RCA: 119] [Impact Index Per Article: 5.2] [Indexed: 11/29/2022]
Abstract
Associative neural network (ASNN) represents a combination of an ensemble of feed-forward neural networks and the k-nearest neighbor technique. This method uses the correlation between ensemble responses as a measure of distance among the analyzed cases for the nearest neighbor technique. This provides an improved prediction by correcting the bias of the neural network ensemble. An associative neural network has a memory that can coincide with the training set. If new data become available, the network further improves its predictive ability and provides a reasonable approximation of the unknown function without the need to retrain the neural network ensemble. This feature of the method dramatically improves its predictive ability over traditional neural networks and k-nearest neighbor techniques, as demonstrated using several artificial data sets and a program to predict lipophilicity of chemical compounds. Another important feature of ASNN is the possibility to interpret neural network results by analyzing correlations between data cases in the space of models. It is shown that analysis of such correlations makes it possible to provide "property-targeted" clustering of data. The possible applications and importance of ASNN in drug design and medicinal and combinatorial chemistry are discussed. The method is available online at http://www.vcclab.org/lab/asnn.
Affiliation(s)
- Igor V Tetko
- Laboratoire de Neuro-Heuristique, Institut de Physiologie, Rue du Bugnon 7, Lausanne, CH-1005, Switzerland.
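The bias-correction idea described in the abstract above can be sketched in a few lines. This is an illustrative reading of the abstract, not the author's implementation. The following are all assumptions made here: the function name `asnn_correct`, Pearson correlation across ensemble members as the similarity, an unweighted mean of the k nearest residuals, and the default `k`. Any ensemble predictions (from neural networks or other models) can be passed in. Each case is described by its vector of predictions from the ensemble members. The corrected prediction for a new case is the ensemble mean plus the average ensemble error on the k "memory" cases whose prediction vectors correlate most strongly with it.

```python
import numpy as np


def _standardize_columns(M):
    """Centre each column across ensemble members and scale it to unit length,
    so the dot product of two columns equals their Pearson correlation."""
    Z = M - M.mean(axis=0)
    norms = np.linalg.norm(Z, axis=0)
    norms = np.where(norms == 0, 1.0, norms)  # constant columns get zero correlation
    return Z / norms


def asnn_correct(memory_ens, y_memory, query_ens, k=3):
    """ASNN-style correction (sketch).

    memory_ens: (n_models, n_memory) ensemble predictions for cases with known values
    y_memory:   (n_memory,) observed values of those cases
    query_ens:  (n_models, n_query) ensemble predictions for new cases
    """
    memory_ens = np.asarray(memory_ens, dtype=float)
    query_ens = np.asarray(query_ens, dtype=float)
    y_memory = np.asarray(y_memory, dtype=float)
    # Ensemble error (bias) on each memory case.
    residuals = y_memory - memory_ens.mean(axis=0)
    # Pearson correlations between ensemble-response vectors: (n_query, n_memory).
    corr = _standardize_columns(query_ens).T @ _standardize_columns(memory_ens)
    # Indices of the k memory cases most correlated with each query case.
    neighbours = np.argsort(-corr, axis=1)[:, :k]
    # Ensemble mean plus the average error of those neighbours.
    return query_ens.mean(axis=0) + residuals[neighbours].mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for 5 trained models predicting 8 memory cases.
    memory_ens = rng.normal(size=(5, 8))
    y_memory = rng.normal(size=8)
    # With k=1 and the memory itself as the query, each case is its own nearest
    # neighbour (correlation 1), so the corrected prediction equals the observed value.
    print(np.allclose(asnn_correct(memory_ens, y_memory, memory_ens, k=1), y_memory))
```

In this sketch the memory is simply the set of known cases and their ensemble predictions. To use newly measured data, append their ensemble predictions as extra columns of `memory_ens` and their values to `y_memory`. The trained models themselves are left untouched, which mirrors the "no retraining" property the abstract describes.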
28
Golbraikh A, Tropsha A. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. J Comput Aided Mol Des 2002; 16:357-69. [PMID: 12489684 DOI: 10.1023/a:1020869118689] [Citation(s) in RCA: 291] [Impact Index Per Article: 12.7] [Indexed: 11/12/2022]
Abstract
One of the most important characteristics of Quantitative Structure Activity Relationships (QSAR) models is their predictive power. The latter can be defined as the ability of a model to predict accurately the target property (e.g., biological activity) of compounds that were not used for model development. We suggest that this goal can be achieved by rational division of an experimental SAR dataset into the training and test sets, which are used for model development and validation, respectively. Given that all compounds are represented by points in multidimensional descriptor space, we argue that training and test sets must satisfy the following criteria: (i) Representative points of the test set must be close to those of the training set; (ii) Representative points of the training set must be close to representative points of the test set; (iii) Training set must be diverse. For quantitative description of these criteria, we use molecular dataset diversity indices introduced recently (Golbraikh, A., J. Chem. Inf. Comput. Sci., 40 (2000) 414-425). For rational division of a dataset into the training and test sets, we use three closely related sphere-exclusion algorithms. Using several experimental datasets, we demonstrate that QSAR models built and validated with our approach have statistically better predictive power than models generated with either random or activity-ranking-based selection of the training and test sets. We suggest that rational approaches to the selection of training and test sets based on diversity principles should be used routinely in all QSAR modeling research.
Affiliation(s)
- Alexander Golbraikh
- The Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599-7360, USA