1
Yan H, Song X, Tian K, Gao J, Li Q, Xiong Y, Min S. A modification of the bootstrapping soft shrinkage approach for spectral variable selection in the issue of over-fitting, model accuracy and variable selection credibility. Spectrochim Acta A Mol Biomol Spectrosc 2019; 210:362-371. [PMID: 30502724 DOI: 10.1016/j.saa.2018.10.034] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/13/2018] [Revised: 10/04/2018] [Accepted: 10/20/2018] [Indexed: 06/09/2023]
Abstract
In this study, we proposed a new computational method for variable selection, the stabilized bootstrapping soft shrinkage approach (SBOSS). It builds on the bootstrapping soft shrinkage approach (BOSS) and can improve the extraction of chemically relevant information from the large number of variables within overlapping absorption bands. In SBOSS, variables are selected by an index of the stability of their regression coefficients rather than by the absolute values of the coefficients. In each loop, weighted bootstrap sampling (WBS) is applied to generate sub-models, and the sampling weights are updated by model population analysis (MPA) of the stability of the regression coefficients (RC) of these sub-models. Finally, the subset with the lowest RMSECV is chosen as the optimal variable set. The performance of SBOSS was evaluated on one simulated dataset and three NIR datasets. The results show that, compared with Monte Carlo uninformative variable elimination (MCUVE), genetic algorithm (GA), competitive adaptive reweighted sampling (CARS), stability competitive adaptive reweighted sampling (SCARS) and BOSS, SBOSS selects fewer variables and yields PLS models with the lowest RMSEP, the fewest latent variables and the best stability.
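The SBOSS loop summarized above can be sketched in Python/NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ordinary least squares stands in for the PLS sub-models, the stability index is assumed to be |mean| / std of each variable's coefficients across sub-models (the abstract names the index but not its formula), and the final RMSECV-based choice of subset is omitted. All function names are hypothetical.

```python
import numpy as np


def stability_weights(coefs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Turn sub-model coefficients into sampling weights (assumed stability index).

    coefs has shape (n_submodels, n_vars); a variable that was not sampled in a
    sub-model contributes a coefficient of 0 for that sub-model.
    """
    mean = coefs.mean(axis=0)
    std = coefs.std(axis=0)
    # Stability: large when a coefficient is both big and consistent across sub-models.
    stability = np.abs(mean) / (std + eps)
    total = stability.sum()
    if total == 0:
        # No informative variable left: fall back to uniform weights.
        return np.full(coefs.shape[1], 1.0 / coefs.shape[1])
    return stability / total


def weighted_bootstrap_variables(weights: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """WBS: draw n_vars indices with replacement (probabilities = weights), keep unique ones."""
    n_vars = weights.size
    draws = rng.choice(n_vars, size=n_vars, replace=True, p=weights)
    return np.unique(draws)


def sboss_iteration(X, y, weights, n_submodels, rng):
    """One soft-shrinkage loop: fit sub-models on WBS subsets, then update the weights."""
    Xc = X - X.mean(axis=0)  # centre so no intercept term is needed
    yc = y - y.mean()
    coefs = np.zeros((n_submodels, X.shape[1]))
    for k in range(n_submodels):
        cols = weighted_bootstrap_variables(weights, rng)
        # Hypothetical stand-in: OLS (minimum-norm lstsq) instead of a PLS sub-model.
        beta, *_ = np.linalg.lstsq(Xc[:, cols], yc, rcond=None)
        coefs[k, cols] = beta
    return stability_weights(coefs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 20))
    y = X[:, 3] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=40)
    w = np.full(20, 1.0 / 20)  # start with equal weights
    for _ in range(5):
        w = sboss_iteration(X, y, w, n_submodels=100, rng=rng)
    print("indices with the largest weights:", np.argsort(w)[::-1][:5])
```

A variable whose weight reaches zero is never drawn again, so the variable set shrinks gradually through the weights rather than by hard elimination in each loop.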
Affiliation(s)
- Hong Yan
- College of Science, China Agricultural University, Beijing 100193, PR China
- Xiangzhong Song
- College of Science, China Agricultural University, Beijing 100193, PR China
- Kuangda Tian
- College of Science, China Agricultural University, Beijing 100193, PR China
- Jingxian Gao
- College of Science, China Agricultural University, Beijing 100193, PR China
- Qianqian Li
- School of Marine Science, China University of Geosciences, Beijing 100083, PR China
- Yanmei Xiong
- College of Science, China Agricultural University, Beijing 100193, PR China
- Shungeng Min
- College of Science, China Agricultural University, Beijing 100193, PR China

2
Bayden AS, Gomez EF, Audie J, Chakravorty DK, Diller DJ. A combined cheminformatic and bioinformatic approach to address the proteolytic stability challenge in peptide-based drug discovery. Biopolymers 2015; 104:775-89. [PMID: 26270398 DOI: 10.1002/bip.22711] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/17/2015] [Revised: 07/22/2015] [Accepted: 08/09/2015] [Indexed: 11/10/2022]
Abstract
We have created models to predict cleavage sites for several human proteases, including caspase-1, caspase-3, caspase-6, caspase-7, cathepsin B, cathepsin D, cathepsin G, cathepsin K, cathepsin L, elastase-2, granzyme A, granzyme B, matrix metallopeptidase-2 (MMP2), MMP7, MMP9, thrombin, and trypsin-1. Rather than representing the sequence pattern around a potential cleavage site as a series of flags, each representing one of the 20 standard amino acids, we first represent each amino acid by its calculated properties. For these properties, we use validated cheminformatic descriptors of the individual amino acids, such as molecular weight, logP, and polar surface area. The cleavage site-specific descriptors are then calculated through various combinations of the individual amino acid descriptors for the residues surrounding the cleavage site. Some of these combinations ignore the position of a residue, as long as it lies within a prescribed neighborhood of the potential cleavage site, whereas others are sensitive to the precise order of the residues in the sequence. The key advantage of this approach is that it allows meaningful calculations with nonstandard amino acids for which few or no data exist. Finally, using both docking and molecular dynamics simulations, we examine the potential and the limitations of protease crystal structures for guiding the design of proteolytically stable peptides.
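The two kinds of combination described above (position-independent and order-sensitive) can be sketched as below. This is a hypothetical illustration, not the authors' descriptor set: the window of four residues on each side of the scissile bond (P4 to P4' in Schechter-Berger notation), the use of mean and max as the order-free combinations, and the single molecular-weight property are all assumptions; logP or polar surface area would simply be extra columns in each property tuple.

```python
import numpy as np

# Free amino-acid molecular weights (g/mol), one property per residue.
MOLECULAR_WEIGHT = {
    "Gly": (75.07,), "Ala": (89.09,), "Ser": (105.09,), "Leu": (131.17,),
    "Lys": (146.19,),
    "Nle": (131.17,),  # norleucine: a nonstandard residue needs only its descriptors
}


def cleavage_site_descriptors(residues, bond, props, half_width=4):
    """Descriptors for the bond between residues[bond - 1] and residues[bond].

    Returns (order_free, order_sensitive) feature vectors for the window
    residues[bond - half_width : bond + half_width].
    """
    start, stop = bond - half_width, bond + half_width
    if start < 0 or stop > len(residues):
        raise ValueError("window extends past the sequence ends")
    window = np.array([props[r] for r in residues[start:stop]], dtype=float)
    # Position-independent: summary statistics over the neighbourhood.
    order_free = np.concatenate([window.mean(axis=0), window.max(axis=0)])
    # Order-sensitive: one block of properties per position, P4 ... P4'.
    order_sensitive = window.ravel()
    return order_free, order_sensitive


if __name__ == "__main__":
    seq = ["Ala", "Leu", "Gly", "Lys", "Nle", "Ser", "Gly", "Ala"]
    free, ordered = cleavage_site_descriptors(seq, 4, MOLECULAR_WEIGHT)
    print("order-free:", free)
    print("order-sensitive:", ordered)
```

Reversing the residues in the window leaves the order-free vector unchanged but changes the order-sensitive one, which is the distinction the abstract draws.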
Affiliation(s)
- Edwin F Gomez
- Department of Chemistry, University of New Orleans, New Orleans, LA
- Joseph Audie
- CMDBioscience Inc., 5 Science Park, New Haven, CT

3
Zare-Shahabadi V, Lotfizadeh M, Gandomani ARA, Papari MM. Determination of boiling points of azeotropic mixtures using quantitative structure–property relationship (QSPR) strategy. J Mol Liq 2013. [DOI: 10.1016/j.molliq.2013.09.037] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/26/2022]

4
Li Z, Lu H, Yang J, Zeng X, Zhao L, Li H, Liao Q, Peng S, Zhou M, Wu M, Xiang J, Wang Y, Li G. Analysis of the raw serum peptidomic pattern in glioma patients. Clin Chim Acta 2013; 425:221-6. [DOI: 10.1016/j.cca.2013.08.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/20/2013] [Revised: 07/17/2013] [Accepted: 08/02/2013] [Indexed: 12/19/2022]

6
Abbasitabar F, Zare-Shahabadi V. Development predictive QSAR models for artemisinin analogues by various feature selection methods: a comparative study. SAR QSAR Environ Res 2012; 23:1-15. [PMID: 22040327 DOI: 10.1080/1062936x.2011.623316] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 05/27/2023]
Abstract
Quantitative structure-activity relationship (QSAR) models were derived for 179 analogues of artemisinin, a potent antimalarial agent. The activities of these compounds were modelled by multiple linear regression (MLR). To select relevant descriptors, several methods were employed, including stepwise selection, the successive projection algorithm and an ant colony optimization algorithm (called memorized_ACS). A wide variety of molecular descriptors encoding various structural properties was calculated for each molecule. Two matrices (D1 and D2) of molecular properties were built. The D1 matrix contained the calculated descriptors, and the D2 matrix contained the first to third orders of the calculated descriptors and the logarithms of their absolute values. For both data matrices, significant QSAR models were obtained with the memorized_ACS algorithm. The reactive and PEOE (partial equalization of orbital electronegativity) descriptors had the greatest impact on antimalarial activity. The PEOE descriptors are partial-charge descriptors, and the reactive descriptor indicates the presence of reactive groups in the molecule. The best MLR model has a training error of 0.71 log RA units (r² = 0.81) and a prediction error of 0.48 log RA units (r² = 0.88).
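The construction of the D2 matrix and the MLR fit described above can be sketched as follows. This is an assumption-laden illustration, not the authors' code: "first to third orders" is read as the first, second and third powers of each descriptor, a small hypothetical `eps` is added before taking logarithms to avoid log(0), and the reported errors are assumed to be root-mean-square errors. Function names are hypothetical.

```python
import numpy as np


def expand_descriptors(d1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Build a D2-style matrix from D1 (n_molecules x n_descriptors).

    Columns are D1, D1**2, D1**3 and log|D1|; eps (assumed) avoids log(0).
    """
    return np.hstack([d1, d1 ** 2, d1 ** 3, np.log(np.abs(d1) + eps)])


def fit_mlr(x: np.ndarray, y: np.ndarray):
    """Ordinary least-squares MLR with an intercept; returns (coefficients, RMSE, r^2)."""
    a = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    resid = y - a @ coef
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, rmse, r2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d1 = rng.normal(size=(20, 3))
    d2 = expand_descriptors(d1)
    y = 0.5 * d2[:, 3] - d2[:, 0] + 0.1 * rng.normal(size=20)
    # In the study a selection method picks the columns; here two are fixed by hand.
    coef, rmse, r2 = fit_mlr(d2[:, [0, 3]], y)
    print("RMSE:", rmse, "r^2:", r2)
```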
Affiliation(s)
- F Abbasitabar
- Department of Chemistry, Marvdasht Branch, Islamic Azad University, Marvdasht, Iran

7
Ghasemi-Varnamkhasti M, Mohtasebi SS, Rodriguez-Mendez ML, Gomes AA, Araújo MCU, Galvão RK. Screening analysis of beer ageing using near infrared spectroscopy and the Successive Projections Algorithm for variable selection. Talanta 2012; 89:286-91. [DOI: 10.1016/j.talanta.2011.12.030] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 08/30/2011] [Revised: 11/30/2011] [Accepted: 12/06/2011] [Indexed: 10/14/2022]

8
Hemmateenejad B, Shamsipur M, Zare-Shahabadi V, Akhond M. Building optimal regression tree by ant colony system–genetic algorithm: Application to modeling of melting points. Anal Chim Acta 2011; 704:57-62. [DOI: 10.1016/j.aca.2011.08.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 03/22/2011] [Revised: 07/22/2011] [Accepted: 08/04/2011] [Indexed: 10/17/2022]

9
Bayden AS, Yakovlev VA, Graves PR, Mikkelsen RB, Kellogg GE. Factors influencing protein tyrosine nitration--structure-based predictive models. Free Radic Biol Med 2011; 50:749-62. [PMID: 21172423 PMCID: PMC3039091 DOI: 10.1016/j.freeradbiomed.2010.12.016] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 10/08/2010] [Revised: 11/15/2010] [Accepted: 12/10/2010] [Indexed: 01/30/2023]
Abstract
Models for exploring tyrosine nitration in proteins have been created based on 3D structural features of 20 proteins for which high-resolution X-ray crystallographic or NMR data are available and for which nitration of 35 total tyrosines has been experimentally proven under oxidative stress. Factors suggested in previous work to enhance nitration were examined with quantitative structural descriptors. The role of neighboring acidic and basic residues is complex: for the majority of nitrated tyrosines, the distance to the heteroatom of the closest charged side chain corresponds to the distance needed for suspected nitrating species to form hydrogen-bond bridges between the tyrosine and that charged amino acid. This suggests that such bridges play an important role in tyrosine nitration. Nitration is generally hindered for tyrosines that are buried and for tyrosines with insufficient space for the nitro group. For in vitro nitration, closed environments with nearby heteroatoms or unsaturated centers that can stabilize radicals are somewhat favored. Four quantitative structure-based models, each suited to different nitration conditions, have been developed for predicting site-specific tyrosine nitration. The best model, relevant for both in vitro and in vivo cases, predicts 30 of 35 tyrosine nitrations (positive predictive value) and has a sensitivity of 60/71 (11 false positives).
Affiliation(s)
- Alexander S. Bayden
- Department of Medicinal Chemistry and Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, Virginia, USA
- Vasily A. Yakovlev
- Department of Radiation Oncology, Massey Cancer Center, Virginia Commonwealth University, Richmond, Virginia, USA
- Paul R. Graves
- Department of Radiation Oncology, Massey Cancer Center, Virginia Commonwealth University, Richmond, Virginia, USA
- Ross B. Mikkelsen
- Department of Radiation Oncology, Massey Cancer Center, Virginia Commonwealth University, Richmond, Virginia, USA
- Corresponding authors: R.B. Mikkelsen; G.E. Kellogg
- Glen E. Kellogg
- Department of Medicinal Chemistry and Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, Virginia, USA

10
Zare-Shahabadi V, Abbasitabar F. Application of ant colony optimization in development of models for prediction of anti-HIV-1 activity of HEPT derivatives. J Comput Chem 2010; 31:2354-62. [PMID: 20575016 DOI: 10.1002/jcc.21529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
Quantitative structure-activity relationship models were derived for 107 analogs of 1-[(2-hydroxyethoxy)methyl]-6-(phenylthio)thymine (HEPT), a potent inhibitor of HIV-1 reverse transcriptase. The activities of these compounds were investigated by means of the multiple linear regression (MLR) technique. An ant colony optimization algorithm, called Memorized_ACS, was applied to select relevant descriptors and detect outliers. This algorithm uses an external memory that incorporates knowledge from previous iterations. Initially the memory is empty; it is then filled by running the ACS algorithm several times, storing the elite ant of each ACS run in the memory until the memory is full. Pheromone updating is then performed by all elite ants collected in the memory, which improves both the exploration and the exploitation behavior of the ACS algorithm. The memory is then emptied and filled again by further ACS runs using the updated pheromone trails. This process is repeated for several iterations. At the end, the memory contains several top solutions to the problem. The number of appearances of each descriptor in the external memory is a good criterion for its importance. Finally, prediction is performed by the elitist ant, and interpretation is carried out by considering the importance of each descriptor. The best MLR model has a training error of 0.47 log(1/EC50) units (R² = 0.90) and a prediction error of 0.76 log(1/EC50) units (R² = 0.88).
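The external-memory cycle described above (fill the memory with elite ants, update pheromone from all of them, empty and refill, then rank descriptors by how often they appear) can be sketched as below. This is a simplified, hypothetical illustration rather than the published Memorized_ACS: each ant picks a fixed number `k` of descriptors with probability proportional to pheromone (the full ACS pseudo-random proportional rule and local pheromone update are omitted), fitness is assumed to be the training RMSE of an MLR, outlier detection is left out, and the evaporation rate `rho` and the deposit 1/(1 + RMSE) are assumptions.

```python
import numpy as np


def rmse_of_subset(X, y, subset):
    """Training RMSE of an MLR (with intercept) on the chosen descriptor columns."""
    a = np.column_stack([np.ones(len(y)), X[:, subset]])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    return float(np.sqrt(np.mean((y - a @ coef) ** 2)))


def colony_run(X, y, pheromone, n_ants, k, rng):
    """One colony run; returns its elite ant as (subset, rmse)."""
    prob = pheromone / pheromone.sum()
    best, best_err = None, np.inf
    for _ in range(n_ants):
        # Each ant picks k distinct descriptors, biased towards high pheromone.
        subset = np.sort(rng.choice(X.shape[1], size=k, replace=False, p=prob))
        err = rmse_of_subset(X, y, subset)
        if err < best_err:
            best, best_err = subset, err
    return best, best_err


def memorized_colony(X, y, k, memory_size=5, n_cycles=10, n_ants=20, rho=0.1, seed=0):
    """Hypothetical memory-based colony; returns (elitist subset, descriptor counts)."""
    rng = np.random.default_rng(seed)
    pheromone = np.ones(X.shape[1])
    for _ in range(n_cycles):
        # Start from an empty memory and refill it with one elite ant per run.
        memory = [colony_run(X, y, pheromone, n_ants, k, rng) for _ in range(memory_size)]
        # Evaporate, then let every elite ant in the memory deposit pheromone.
        pheromone *= 1.0 - rho
        for subset, err in memory:
            pheromone[subset] += 1.0 / (1.0 + err)
    # Importance of a descriptor = number of appearances in the final memory.
    counts = np.zeros(X.shape[1], dtype=int)
    for subset, _ in memory:
        counts[subset] += 1
    elitist = min(memory, key=lambda m: m[1])[0]
    return elitist, counts


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 10))
    y = 2.0 * X[:, 1] - X[:, 6] + 0.1 * rng.normal(size=40)
    elitist, counts = memorized_colony(X, y, k=2)
    print("elitist subset:", elitist, "appearance counts:", counts)
```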
Affiliation(s)
- Vali Zare-Shahabadi
- Department of Chemistry, Islamic Azad University-Mahshahr Branch, Mahshahr, Iran