1
Liu M, Yang Y, Zhao X, Wang Y, Li M, Wang Y, Tian M, Zhou J. Classification and characterization on sorghums based on HS-GC-IMS combined with OPLS-DA and GA-PLS. Curr Res Food Sci 2024; 8:100692. [PMID: 38352629 PMCID: PMC10862501 DOI: 10.1016/j.crfs.2024.100692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/14/2024] [Accepted: 01/30/2024] [Indexed: 02/16/2024]
Abstract
Headspace gas chromatography-ion mobility spectrometry (HS-GC-IMS) was used to analyze 206 fresh and 186 stored sorghum samples covering the three major types used in the Baijiu industry. The fingerprints showed differences in volatile compounds among fresh sorghum types by qualitative analysis and artificial recognition. Organic waxy sorghums contained more nonanal and 2-ethyl-1-hexanol but fewer ketones. Acetoin contents were high in non-glutinous sorghums and organic non-glutinous sorghums. In addition, genetic algorithm-partial least squares (GA-PLS) selected 19 and 32 characteristic volatile compounds in fresh and stored sorghums, respectively. After centering and autoscaling to unit variance, classification models for the three major types (organic waxy sorghum, non-glutinous sorghum and organic non-glutinous sorghum) were established based on orthogonal partial least squares-discriminant analysis (OPLS-DA). The goodness-of-fit (R2Y) and the goodness-of-prediction in cross-validation (Q2) both exceeded 0.9 in the model of fresh sorghum types and 0.8 in the model of stored sorghum types; the correct classification rates of external prediction were 95 % and 100 %, indicating good fit and predictive performance. On this basis, the correct classification rate reached 87 % for organic waxy sorghums adulterated at ratios over 10 %. GC-IMS combined with chemometrics is applicable in practical production for rapid identification of sorghum types and adulterations.
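The preprocessing step named in this abstract (centering followed by scaling to unit variance) can be illustrated with a minimal sketch. It is not the authors' code: the function name `autoscale`, the use of the sample standard deviation (n - 1), the zero-variance fallback, and the example intensities are all assumptions made for illustration.

```python
import math


def autoscale(columns):
    """Center each variable (column) and scale it to unit variance.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    A constant column is returned as zeros to avoid dividing by zero.
    """
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        std = math.sqrt(var)
        if std == 0.0:
            scaled.append([0.0 for _ in col])
        else:
            scaled.append([(x - mean) / std for x in col])
    return scaled


# Hypothetical peak intensities for two volatile compounds in four samples.
peaks = [[120.0, 150.0, 90.0, 140.0], [3.1, 2.9, 3.6, 3.4]]
scaled_peaks = autoscale(peaks)
```

After this step every variable has mean zero and unit variance, so compounds with large absolute intensities do not dominate a model such as OPLS-DA purely because of their measurement scale.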
Affiliation(s)
- Mengjie Liu
  - Luzhou Laojiao Co. Ltd., Luzhou, 646000, China
- Yang Yang
  - Luzhou Laojiao Co. Ltd., Luzhou, 646000, China
- Xiaobo Zhao
  - Luzhou Laojiao Co. Ltd., Luzhou, 646000, China
  - National Engineering Research Center of Solid-State Brewing, Luzhou, 646000, China
- Yao Wang
  - Luzhou Laojiao Co. Ltd., Luzhou, 646000, China
- Meiyin Li
  - Luzhou Laojiao Co. Ltd., Luzhou, 646000, China
- Yu Wang
  - Luzhou Laojiao Co. Ltd., Luzhou, 646000, China
- Min Tian
  - Luzhou Laojiao Co. Ltd., Luzhou, 646000, China
- Jun Zhou
  - Luzhou Laojiao Co. Ltd., Luzhou, 646000, China
  - National Engineering Research Center of Solid-State Brewing, Luzhou, 646000, China
2
Estimation of the late postmortem interval using FTIR spectroscopy and chemometrics in human skeletal remains. Forensic Sci Int 2017; 281:113-120. [DOI: 10.1016/j.forsciint.2017.10.033] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 06/26/2017] [Revised: 09/16/2017] [Accepted: 10/24/2017] [Indexed: 01/25/2023]
3
Wang Q, He H, Li B, Lin H, Zhang Y, Zhang J, Wang Z. UV-Vis and ATR-FTIR spectroscopic investigations of postmortem interval based on the changes in rabbit plasma. PLoS One 2017; 12:e0182161. [PMID: 28753641 PMCID: PMC5533326 DOI: 10.1371/journal.pone.0182161] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 03/09/2017] [Accepted: 07/13/2017] [Indexed: 11/23/2022]
Abstract
Estimating the postmortem interval (PMI) is of great importance in forensic investigations. Although many methods are used to estimate the PMI, few investigations have focused on postmortem redistribution. In this study, ultraviolet-visible (UV-Vis) measurement combined with visual inspection indicated a regular diffusion of hemoglobin into plasma after death, showing the redistribution of postmortem components in blood. Thereafter, attenuated total reflection-Fourier transform infrared (ATR-FTIR) spectroscopy was used to confirm the variations caused by this phenomenon. First, full-spectrum partial least-squares (PLS) and genetic algorithm combined with PLS (GA-PLS) models were constructed to predict the PMI. The performance of the GA-PLS model was better than that of the full-spectrum PLS model based on its root mean square error (RMSE) of cross-validation of 3.46 h (R2 = 0.95) and its RMSE of prediction of 3.46 h (R2 = 0.94). The investigation of the similarity of spectra between blood plasma and formed elements also supported the role of redistribution of components in spectral changes in postmortem plasma. These results demonstrated that ATR-FTIR spectroscopy coupled with advanced mathematical methods could serve as a convenient and reliable tool to study the redistribution of postmortem components and estimate the PMI.
Affiliation(s)
- Qi Wang
  - Department of Forensic Pathology, College of Forensic Medicine, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Haijun He
  - Department of Forensic Pathology, College of Forensic Medicine, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Bing Li
  - Department of Forensic Pathology, College of Forensic Medicine, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Hancheng Lin
  - Department of Forensic Pathology, College of Forensic Medicine, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Yinming Zhang
  - Department of Forensic Pathology, College of Forensic Medicine, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Ji Zhang
  - Department of Forensic Pathology, College of Forensic Medicine, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Zhenyuan Wang
  - Department of Forensic Pathology, College of Forensic Medicine, Xi’an Jiaotong University, Xi’an, Shaanxi, China
4
Comprehensive comparison of twenty structural characterization scales applied as QSAM of antimicrobial dodecapeptides derived from Bac2A against P. aeruginosa. J Mol Graph Model 2017; 71:88-95. [DOI: 10.1016/j.jmgm.2016.11.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/25/2015] [Revised: 11/02/2016] [Accepted: 11/06/2016] [Indexed: 02/04/2023]
5
te Pas MF, Kruijt L, Pierzchala M, Crump RE, Boeren S, Keuning E, Hoving-Bolink R, Hortós M, Gispert M, Arnau J, Diestre A, Mulder HA. Identification of proteomic biomarkers in M. Longissimus dorsi as potential predictors of pork quality. Meat Sci 2013; 95:679-87. [DOI: 10.1016/j.meatsci.2012.12.015] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 05/25/2012] [Revised: 12/06/2012] [Accepted: 12/21/2012] [Indexed: 11/15/2022]
6
Liu Y, Liu SS, Cui SH, Cai SX. A Novel Quantitative Structure-Biodegradability Relationship (QSBR) of Substituted Benzenes Based on MHDV Descriptor. J Chin Chem Soc-Taip 2013. [DOI: 10.1002/jccs.200300047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
7
Liu SS, Yin CS, Wang XD, Wang LS. QSAR Studies on Dipeptides Based on a Combinatorial MHDV-GA-MLR Method. J Chin Chem Soc-Taip 2013. [DOI: 10.1002/jccs.200200157] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
8
Sarkhosh M, Ghasemi JB, Ayati M. A quantitative structure-property relationship of gas chromatographic/mass spectrometric retention data of 85 volatile organic compounds as air pollutant materials by multivariate methods. Chem Cent J 2012; 6 Suppl 2:S4. [PMID: 22594439 PMCID: PMC3395126 DOI: 10.1186/1752-153x-6-s2-s4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
A quantitative structure-property relationship (QSPR) study is suggested for the prediction of retention times of volatile organic compounds. Various kinds of molecular descriptors were calculated to represent the molecular structure of compounds. Modeling of retention times of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR) and artificial neural network (ANN). Stepwise regression was used to select the variables that give the best-fitted models. After variable selection, ANN and MLR methods were used with leave-one-out cross validation for building the regression models. The prediction results are in very good agreement with the experimental values. MLR as the linear regression method shows good ability in the prediction of the retention times of the prediction set. This provided a new and effective method for predicting the chromatography retention index for the volatile organic compounds.
Affiliation(s)
- Maryam Sarkhosh
  - Chemistry Department, Faculty of Sciences, K. N. Toosi University of Technology, Tehran, Iran.
9
Kawase M, Saito T, Yasunaga T, Takagi T, Fukuda I, Ashida H. New Structure Descriptor in the Structure-Activity Relationship Study on the Suppression of Aryl Hydrocarbon Receptor Transformation by Anthraquinones. Food Science and Technology Research 2012. [DOI: 10.3136/fstr.18.173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
10
Lin P, Chen Y, He Y. Identification of Geographical Origin of Olive Oil Using Visible and Near-Infrared Spectroscopy Technique Combined with Chemometrics. Food Bioprocess Tech 2009. [DOI: 10.1007/s11947-009-0302-z] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
11
Zhang Z, Niu J, Zhi X. A QSAR model for predicting mutagenicity of nitronaphthalenes and methylnitronaphthalenes. Bulletin of Environmental Contamination and Toxicology 2008; 81:498-502. [PMID: 18777149 DOI: 10.1007/s00128-008-9540-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/11/2008] [Accepted: 08/22/2008] [Indexed: 05/26/2023]
Abstract
A quantitative structure-activity relationship model for prediction of mutagenicity of nitronaphthalenes and methylnitronaphthalenes was developed using some fundamental quantum chemical descriptors. The cumulative cross-validated regression coefficient value for the optimal quantitative structure-activity relationship model is 0.711, showing a good predictive capability for mutagenicity of nitronaphthalenes and methylnitronaphthalenes. Results from this study indicate that mutagenicity of nitronaphthalenes and methylnitronaphthalenes increases with increasing frontier molecular orbital energy value, i.e., the sum of the energy of the highest occupied molecular orbital and the energy of the lowest unoccupied molecular orbital, or decreasing the energy of the second highest occupied molecular orbital, final heat of formation, and core-core repulsion energy values.
Affiliation(s)
- Zheyun Zhang
  - State Key Laboratory of Water Environment Simulation, School of Environment, Beijing Normal University, Beijing 100875, People's Republic of China
12
Liang G, Yang L, Kang L, Mei H, Li Z. Using multidimensional patterns of amino acid attributes for QSAR analysis of peptides. Amino Acids 2008; 37:583-91. [PMID: 18821054 DOI: 10.1007/s00726-008-0177-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 04/03/2008] [Accepted: 08/25/2008] [Indexed: 10/21/2022]
Abstract
On the basis of exploratory factor analysis, six multidimensional patterns of 516 amino acid attributes, namely, factor analysis scales of generalized amino acid information (FASGAI) involving hydrophobicity, alpha and turn propensities, bulky properties, compositional characteristics, local flexibility and electronic properties, are proposed to represent structures of 48 bitter-tasting dipeptides and 58 angiotensin-converting enzyme inhibitors. Characteristic parameters related to bioactivities of the peptides studied are selected by genetic algorithm, and quantitative structure-activity relationship (QSAR) models are constructed by partial least square (PLS). Our results by a leave-one-out cross validation are compared with the previously known structure representation method and are shown to give slightly superior or comparative performance. Further, two data sets are divided into training sets and test sets to validate the characterization repertoire of FASGAI. Performance of the PLS models developed by training samples by a leave-one-out cross validation and external validation for test samples are satisfying. These results demonstrate that FASGAI is an effective representation technique of peptide structures, and that FASGAI vectors have many preponderant characteristics such as straightforward physicochemical information, high characterization competence and easy manipulation. They can be further applied to investigate the relationship between structures and functions of various peptides, even proteins.
Affiliation(s)
- G Liang
  - College of Bioengineering, Chongqing University, 400030, Chongqing, China.
13
Liang G, Chen G, Niu W, Li Z. Factor Analysis Scales of Generalized Amino Acid Information as Applied in Predicting Interactions between the Human Amphiphysin-1 SH3 Domains and Their Peptide Ligands. Chem Biol Drug Des 2008; 71:345-51. [DOI: 10.1111/j.1747-0285.2008.00641.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
14
15
Liang G, Li Z. Scores of generalized base properties for quantitative sequence-activity modelings for E. coli promoters based on support vector machine. J Mol Graph Model 2007; 26:269-81. [PMID: 17291800 DOI: 10.1016/j.jmgm.2006.12.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 06/30/2006] [Revised: 11/18/2006] [Accepted: 12/10/2006] [Indexed: 10/23/2022]
Abstract
A novel base sequence representation technique, namely SGBP (scores of generalized base properties), was derived from principal component analysis of a matrix of 1209 property parameters including 0D, 1D, 2D and 3D information for five bases such as A, C, G, T and U. It was then employed to represent sequence structures of E. coli promoters. Variables which were used as inputs of partial least square (PLS) and support vector machine (SVM) were selected by genetic arithmetic-partial least square. All samples were divided into train set which was applied to develop quantitative sequence-activity modelings (QSAMs) and test set which was used to validate the predictive power of the resulting models according to D-optimal design. Investigation on QSAM by PLS showed properties of base of position -42, -34, -31, -33, -41, -46 and -29 may yield more influence on strengths, which has thus pointed us further into the direction of strong promoters. Parameters of SVM were determined by response surface methodology. Satisfactory results indicated that the simulative and the predictive abilities for the internal and external samples of QSAM by SVM were better than those of PLS. Those results showed that SGBP is a useful structural representation methodology in QSAMs due to its many advantages including plentiful structural information, easy manipulation, and high characterization competence. Moreover, SGBP-GA-SVM route for sequences design and activities prediction of DNA or RNA can further be applied.
Affiliation(s)
- Guizhao Liang
  - College of Bioengineering, Chongqing University, Chongqing 400030, PR China
16
Chapter 15 A quantitative structure-activity relationship of 1,4-dihydropyridine calcium channel blockers with electronic descriptors produced by quantum chemical topology. 2007. [DOI: 10.1016/s1380-7323(07)80016-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
17
18
Maldonado AG, Doucet JP, Petitjean M, Fan BT. Molecular similarity and diversity in chemoinformatics: from theory to applications. Mol Divers 2006; 10:39-79. [PMID: 16404528 DOI: 10.1007/s11030-006-8697-1] [Citation(s) in RCA: 179] [Impact Index Per Article: 9.9] [Received: 12/17/2004] [Accepted: 06/14/2005] [Indexed: 01/04/2023]
Abstract
This review is dedicated to a survey on molecular similarity and diversity. Key findings reported in recent investigations are selectively highlighted and summarized. Even if this overview is mainly centered in chemoinformatics, applications in other areas (pharmaceutical and medical chemistry, combinatorial chemistry, chemical databases management, etc.) are also introduced. The approaches used to define and describe the concepts of molecular similarity and diversity in the context of chemoinformatics are discussed in the first part of this review. We introduce, in the second and third parts, the descriptions and analyses of different methods and techniques. Finally, current applications and problems are enumerated and discussed in the last part.
Affiliation(s)
- Ana G Maldonado
  - ITODYS, Université Paris 7--Denis Diderot, CNRS UMR-7086, 1 rue Guy de la Brosse, 75005, Paris, France
19
Cabrera Pérez MA, Sanz MB. In silico prediction of central nervous system activity of compounds. Identification of potential pharmacophores by the TOPS–MODE approach. Bioorg Med Chem 2004; 12:5833-43. [PMID: 15498659 DOI: 10.1016/j.bmc.2004.08.038] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 06/09/2004] [Revised: 08/14/2004] [Accepted: 08/26/2004] [Indexed: 10/26/2022]
Abstract
The central nervous system (CNS) activity has been investigated by using a topological substructural molecular approach (TOPS-MODE). A discriminant analysis to classify CNS and non-CNS drugs was developed on a data set (302 compounds) of great structural variability where more than 81% (247/302) were well classified. Randic's orthogonalization procedure was carried out to allow the interpretation of the model and to avoid collinearity among descriptors. The discriminant model was assessed by a leave-n-out (with n varying from 2 to 20) cross-validation procedure (79.94% correct classification), an external prediction set composed of 78 CNS/non-CNS drugs (80.77% correct classification) and a 5-fold full cross-validation (removing 78 compounds in each cycle, 80.00% correct classification). This methodology demonstrated that hydrophobicity increases CNS activity, while the dipole moment and the polar surface area decrease it, evidencing the capacity of the TOPS-MODE descriptors to estimate CNS activity for new drug candidates. The structural contributions to the CNS activity for two compounds are presented on the basis of fragment contributions. The model was also able to identify potential structural pharmacophores, showing its possibilities in lead generation and optimization processes.
Affiliation(s)
- Miguel Angel Cabrera Pérez
  - Drug Design Department, Center of Chemical Bioactive, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba.
20
Venkatraman V, Dalby AR, Yang ZR. Evaluation of Mutual Information and Genetic Programming for Feature Selection in QSAR. Journal of Chemical Information and Computer Sciences 2004; 44:1686-92. [PMID: 15446827 DOI: 10.1021/ci049933v] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
Feature selection is a key step in Quantitative Structure Activity Relationship (QSAR) analysis. Chance correlations and multicollinearity are two major problems often encountered when attempting to find generalized QSAR models for use in drug design. Optimal QSAR models require an objective variable relevance analysis step for producing robust classifiers with low complexity and good predictive accuracy. Genetic algorithms coupled with information theoretic approaches such as mutual information have been used to find near-optimal solutions to such multicriteria optimization problems. In this paper, we describe a novel approach for analyzing QSAR data based on these methods. Our experiments with the Thrombin dataset, previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001 demonstrate the feasibility of this approach. It has been found that it is important to take into account the data distribution, the rule "interestingness", and the need to look at more invariant and monotonic measures of feature selection.
Affiliation(s)
- Vishwesh Venkatraman
  - School of Biological Sciences, University of Exeter, Exeter EX4 4QF, Great Britain
21
Du Y, Liang Y, Li B, Xu C. Orthogonalization of block variables by subspace-projection for quantitative structure property relationship (QSPR) research. Journal of Chemical Information and Computer Sciences 2002; 42:993-1003. [PMID: 12376986 DOI: 10.1021/ci020283+] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
A subspace-projection method is developed to construct orthogonal block variables, which originate from series of topological indices or quantum chemical parameters. With the help of canonical correlation analysis, the orthogonal block variables were used to establish the structure-retention index correlation model. The regression of only a few new orthogonal variables obtained by canonical correlation analysis against retention index shows significant improvement in both the fitting and prediction ability of the correlation model. Moreover, the quantitative intercorrelation between the different block variables of topological indices can also be evaluated with the help of the subspace-projection technique proposed in this work.
Affiliation(s)
- Yiping Du
  - Institute of Chemometrics and Chemical Sensing Technology, Hunan University, Changsha 410082, P R China
22
Liu SS, Yin CS, Wang LS. Combined MEDV-GA-MLR method for QSAR of three panels of steroids, dipeptides, and COX-2 inhibitors. Journal of Chemical Information and Computer Sciences 2002; 42:749-56. [PMID: 12086537 DOI: 10.1021/ci010245a] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
The MEDV-13, molecular electronegativity distance vector based on 13 atomic types, has at most 91 descriptors. It is impossible to directly use multiple linear regression (MLR) to derive a quantitative structure-activity relationship (QSAR) model. Although principal component regression (PCR) or partial least-squares regression (PLSR) can be employed to develop a latent QSAR model, it is still difficult to determine the principal components (PCs) and depict the physical meaning of the PCs. So, a genetic algorithm (GA) is first employed to select an optimal subset of the descriptors from the original MEDV-13 descriptor set. Then MLR is utilized to build a QSAR model between the optimal subset and the biological activities of three sets of compounds. For 31 benchmark steroids, a 5-descriptor QSAR model (M1) between the corticosteroid-binding globulin (CBG) binding affinity of the steroids and a 5-descriptor subset is developed. The root-mean-square error of estimations (RMSEE) and the correlation coefficient of estimations (r) between the CBG binding affinity (BA) observed and the BA estimated by M1 are 0.422 and 0.9182, respectively. The root-mean-square error of predictions (RMSEP) and the correlation coefficient of predictions (q) between the BA observed and the BA predicted by leave-one-out cross validations are 0.504 and 0.8818, respectively. For 58 dipeptides inhibiting angiotensin-converting enzyme (ACE), a 5-variable QSAR model (M2) between the pIC(50) of peptides and a 5-descriptor subset is derived. The M2 has a high quality with RMSEE = 0.339 and r = 0.9398 and RMSEP = 0.370 and q = 0.9280. For 16 indomethacin amides and esters (ImAE) inhibiting cyclooxygenase-2 (COX-2), a 6-variable QSAR model (M3) with RMSEE = 0.079 and r = 0.9839 and RMSEP = 0.151 and q = 0.9413 is built.
Affiliation(s)
- Shu-Shen Liu
  - State Key Laboratory of Pollution Control and Resources Reuse, Department of Environmental Science & Engineering, Nanjing University, Nanjing 210093, People's Republic of China.
23
24
Yasri A, Hartsough D. Toward an optimal procedure for variable selection and QSAR model building. Journal of Chemical Information and Computer Sciences 2001; 41:1218-27. [PMID: 11604021 DOI: 10.1021/ci010291a] [Citation(s) in RCA: 140] [Impact Index Per Article: 6.1] [Indexed: 11/29/2022]
Abstract
In this work, we report the development of a novel QSAR technique combining genetic algorithms and neural networks for selecting a subset of relevant descriptors and building the optimal neural network architecture for QSAR studies. This technique uses a neural network to map the dependent property of interest with the descriptors preselected by the genetic algorithm. This technique differs from other variable selection techniques combining genetic algorithms to neural networks by two main features: (1) The variable selection search performed by the genetic algorithm is not constrained to a defined number of descriptors. (2) The optimal neural network architecture is explored in parallel with the variable selection by dynamically modifying the size of the hidden layer. By using both artificial data and real biological data, we show that this technique can be used to build both classification and regression models and outperforms simpler variable selection techniques mainly for nonlinear data sets. The results obtained on real data are compared to previous work using other modeling techniques. We also discuss some important issues in building QSAR models and good practices for QSAR studies.
Affiliation(s)
- A Yasri
  - Computational Design Group, ArQule Inc., 19 Presidential Way, Woburn, MA 01801, USA.
25
Burden FR, Ford MG, Whitley DC, Winkler DA. Use of automatic relevance determination in QSAR studies using Bayesian neural networks. Journal of Chemical Information and Computer Sciences 2000; 40:1423-30. [PMID: 11128101 DOI: 10.1021/ci000450a] [Citation(s) in RCA: 99] [Impact Index Per Article: 4.1] [Indexed: 11/29/2022]
Abstract
We describe the use of Bayesian regularized artificial neural networks (BRANNs) coupled with automatic relevance determination (ARD) in the development of quantitative structure-activity relationship (QSAR) models. These BRANN-ARD networks have the potential to solve a number of problems which arise in QSAR modeling such as the following: choice of model; robustness of model; choice of validation set; size of validation effort; and optimization of network architecture. The ARD method ensures that irrelevant or highly correlated indices used in the modeling are neglected as well as showing which are the most important variables in modeling the activity data. The application of the methods to QSAR of compounds active at the benzodiazepine and muscarinic receptors as well as some toxicological data of the effect of substituted benzenes on Tetrahymena pyriformis is illustrated.
Affiliation(s)
- F R Burden
  - School of Chemistry, Monash University, Victoria, Australia.
26
So SS, van Helden SP, van Geerestein VJ, Karplus M. Quantitative structure-activity relationship studies of progesterone receptor binding steroids. Journal of Chemical Information and Computer Sciences 2000; 40:762-72. [PMID: 10850780 DOI: 10.1021/ci990130v] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
The selection of appropriate descriptors is an important step in the successful formulation of quantitative structure-activity relationships (QSARs). This paper compares a number of feature selection routines and mapping methods that are in current use. They include forward stepping regression (FSR), genetic function approximation (GFA), generalized simulated annealing (GSA), and genetic neural network (GNN). On the basis of a data set of steroids of known in vitro binding affinity to the progesterone receptor, a number of QSAR models are constructed. A comparison of the predictive qualities for both training and test compounds demonstrates that the GNN protocol achieves the best results among the 2D QSAR that are considered. Analysis of the choice of descriptors by the GNN method shows that the results are consistent with established SARs on this series of compounds.
Affiliation(s)
- S S So
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA
27
Nonparametric regression applied to quantitative structure-activity relationships. Journal of Chemical Information and Computer Sciences 2000; 40:452-9. [PMID: 10761152 DOI: 10.1021/ci990082e] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Several nonparametric regressors have been applied to modeling quantitative structure-activity relationship (QSAR) data. The simplest regressor, the Nadaraya-Watson, was assessed in a genuine multivariate setting. Other regressors, the local linear and the shifted Nadaraya-Watson, were implemented within additive models, a computationally more expedient approach, better suited for low-density designs. Performances were benchmarked against the nonlinear method of smoothing splines. A linear reference point was provided by multilinear regression (MLR). Variable selection was explored using systematic combinations of different variables and combinations of principal components. For the data set examined, 47 inhibitors of dopamine beta-hydroxylase, the additive nonparametric regressors have greater predictive accuracy (as measured by the mean absolute error of the predictions or the Pearson correlation in cross-validation trials) than MLR. The use of principal components did not improve the performance of the nonparametric regressors over use of the original descriptors, since the original descriptors are not strongly correlated. It remains to be seen if the nonparametric regressors can be successfully coupled with better variable selection and dimensionality reduction in the context of high-dimensional QSARs.
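For readers unfamiliar with the Nadaraya-Watson regressor mentioned in this abstract, the following is a minimal multivariate sketch. The abstract does not state which kernel or bandwidth the authors used; the Gaussian kernel, the single shared `bandwidth` parameter, and the function name are assumptions made for illustration.

```python
import math


def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Predict y at x_query as a kernel-weighted average of y_train.

    Assumption: a Gaussian kernel on the Euclidean distance between
    descriptor vectors, with one bandwidth shared by all descriptors.
    """
    weights = []
    for x in x_train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_query))
        weights.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, y_train)) / total


# Hypothetical one-descriptor example: the query sits midway between two
# training points, so both receive equal weight.
midpoint_prediction = nadaraya_watson([[0.0], [2.0]], [1.0, 3.0], [1.0], bandwidth=0.5)
```

Because the prediction is a weighted average of observed activities, it always lies between the smallest and largest training value, which is one reason the estimator struggles in sparsely sampled, high-dimensional descriptor spaces.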
|
28
|
Hasegawa K, Funatsu K. Partial least squares modeling and genetic algorithm optimization in quantitative structure-activity relationships. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2000; 11:189-209. [PMID: 10969871 DOI: 10.1080/10629360008033231] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.2] [Indexed: 05/23/2023]
Abstract
Quantitative structure-activity relationship (QSAR) studies based on chemometric techniques are reviewed. Partial least squares (PLS) is introduced as a novel robust method to replace classical methods such as multiple linear regression (MLR). Advantages of PLS compared to MLR are illustrated with typical applications. Genetic algorithm (GA) is a novel optimization technique which can be used as a search engine in variable selection. A novel hybrid approach comprising GA and PLS for variable selection developed in our group (GAPLS) is described. A more advanced method for comparative molecular field analysis (CoMFA) modeling, called GA-based region selection (GARGS), is described as well. Applications of GAPLS and GARGS to QSAR and 3D-QSAR problems are shown with some representative examples. GA can be hybridized with nonlinear modeling methods such as artificial neural networks (ANN) to provide useful tools in chemometrics and QSAR.
Affiliation(s)
- K Hasegawa
- Nippon Roche Research Center, Nippon Roche K.K., Kanagawa, Japan
|
29
|
Development of new 3D-QSAR method by Kohonen network and 3wayPLS analysis. JOURNAL OF COMPUTER AIDED CHEMISTRY 2000. [DOI: 10.2751/jcac.1.22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
|
30
|
Liu L, Guo QX. Novel Prediction for the Driving Force and Guest Orientation in the Complexation of α- and β-Cyclodextrin with Benzene Derivatives. J Phys Chem B 1999. [DOI: 10.1021/jp984545f] [Citation(s) in RCA: 93] [Impact Index Per Article: 3.7] [Indexed: 11/29/2022]
|
31
|
Hasegawa K, Kimura T, Funatsu K. GA strategy for variable selection in QSAR studies: application of GA-based region selection to a 3D-QSAR study of acetylcholinesterase inhibitors. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES 1999; 39:112-20. [PMID: 10094610 DOI: 10.1021/ci980088o] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Abstract
Comparative molecular field analysis (CoMFA) with partial least squares (PLS) is one of the most frequently used tools in three-dimensional quantitative structure-activity relationship (3D-QSAR) studies. Although many successful CoMFA applications have proved the value of this approach, there are some problems in its proper application. In particular, the inability of PLS to handle the low signal-to-noise ratio (sample-to-variable ratio) has attracted much attention from QSAR researchers as an exciting research target, and several variable selection methods have been proposed. More recently, we have developed a novel variable selection method for CoMFA modeling (GARGS: genetic algorithm-based region selection), and its utility was demonstrated in a previous paper (Kimura, T., et al. J. Chem. Inf. Comput. Sci. 1998, 38, 276-282). The purpose of this study is to evaluate whether GARGS can pinpoint known molecular interactions in 3D space. We have used a published set of acetylcholinesterase (AChE) inhibitors as a test example. By applying GARGS to this data set, several improved models with high internal predictivity and a low number of field variables were obtained. External validation was performed to select a final model among them. The coefficient contour maps of the final GARGS model were compared with the properties of the active site in AChE, and the consistency between them was evaluated.
Affiliation(s)
- K Hasegawa
- Tokyo Research Laboratories, Kowa Co. Ltd., Higashimurayama, Japan
|