1
Wang X, Cao Z, Su J, Ge X, Zhou Z. Oral barriers to food-derived active peptides and nano-delivery strategies. J Food Sci 2025; 90:e17672. [PMID: 39828408 DOI: 10.1111/1750-3841.17672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2024] [Revised: 12/04/2024] [Accepted: 01/01/2025] [Indexed: 01/22/2025]
Abstract
Food-derived bioactive peptides are a class of peptides obtained from natural proteins. They may exert biological effects on the human body and play a significant role in protecting physiological health and regulating metabolism, for example by lowering blood pressure, lowering cholesterol, acting as antioxidants or antibacterials, and modulating immune activity. However, most natural food-derived functional peptides must overcome a variety of barriers in the body to enter the blood circulation and reach the specific tissues where they exert physiological activity. During this process, the bioavailability of the functional peptides is reduced. Nano-delivery systems offer a feasible way to overcome these obstacles and improve the stability and bioavailability of food-derived active peptides through nanoencapsulation. This work summarizes the applications of food-derived bioactive peptides and the obstacles along the in vivo delivery pathway. It also summarizes the different nano-delivery systems used for bioactive peptides and their applications, which could provide ideas for the oral delivery of food-derived bioactive peptides.
Affiliation(s)
- Xinyu Wang
- Department of Food Science and Technology, College of Light Industry Science and Engineering, Nanjing Forestry University, Nanjing, P. R. China
- Zhaoxin Cao
- Department of Food Science and Technology, College of Light Industry Science and Engineering, Nanjing Forestry University, Nanjing, P. R. China
- Jingyi Su
- Department of Food Science and Technology, College of Light Industry Science and Engineering, Nanjing Forestry University, Nanjing, P. R. China
- Xuemei Ge
- Department of Food Science and Technology, College of Light Industry Science and Engineering, Nanjing Forestry University, Nanjing, P. R. China
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, P. R. China
- Zhiyong Zhou
- College of Medicine and Health Sciences, China Three Gorges University, Yichang, P. R. China
2
Qin D, Liang X, Jiao L, Wang R, Zhao Y, Xue W, Wang J, Liang G. Sequence-Activity Relationship of Angiotensin-Converting Enzyme Inhibitory Peptides Derived from Food Proteins, Based on a New Deep Learning Model. Foods 2024; 13:3550. [PMID: 39593966 PMCID: PMC11592644 DOI: 10.3390/foods13223550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 10/29/2024] [Accepted: 11/05/2024] [Indexed: 11/28/2024] Open
Abstract
Food-derived peptides are usually safe natural drug candidates that can potentially inhibit the angiotensin-converting enzyme (ACE). The wet-lab experiments used to identify ACE inhibitory peptides (ACEiPs) are time-consuming and costly, making it important and urgent to reduce the scope of experimental validation through bioinformatics methods. Here, we construct an ACE inhibitory peptide predictor (ACEiPP) using optimized amino acid descriptors (AADs) and long short-term memory (LSTM) neural networks. Our results show that combined-AAD models exhibit more efficient feature transformation ability than single-AAD models, especially the model trained with the optimal descriptors as the feature inputs, which exhibits the highest predictive ability in the independent test (Acc = 0.9479 and AUC = 0.9876), a significant performance improvement over three existing predictors. The model can effectively characterize the structure-activity relationship of ACEiPs. By combining the model with database mining, we used ACEiPP to screen four ACEiPs with multiple reported functions. We also used ACEiPP to predict peptides from 21,249 food-derived proteins in the Database of Food-derived Bioactive Peptides (DFBP) and construct a library of potential ACEiPs to facilitate the discovery of new anti-ACE peptides.
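The two metrics quoted in this abstract, accuracy (Acc) and area under the ROC curve (AUC), can be computed from predicted scores roughly as in the minimal sketch below. The function names, the 0.5 threshold, and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy values for illustration only: 1 = inhibitory, 0 = non-inhibitory.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
toy_acc = accuracy(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives, which is why the two numbers are usually reported together.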
Affiliation(s)
- Guizhao Liang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400044, China; (D.Q.); (X.L.); (L.J.); (R.W.); (Y.Z.); (W.X.); (J.W.)
3
T. RR, Demerdash ONA, Smith JC. TCR-H: explainable machine learning prediction of T-cell receptor epitope binding on unseen datasets. Front Immunol 2024; 15:1426173. [PMID: 39221256 PMCID: PMC11361934 DOI: 10.3389/fimmu.2024.1426173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 07/29/2024] [Indexed: 09/04/2024] Open
Abstract
Artificial-intelligence and machine-learning (AI/ML) approaches to predicting T-cell receptor (TCR)-epitope specificity achieve high performance metrics on test datasets that include sequences also present in the training set, but fail to generalize to test sets consisting of epitopes and TCRs that are absent from the training set, i.e., are 'unseen' during training of the ML model. We present TCR-H, a supervised support vector machine classification model that uses physicochemical features and is trained on the largest dataset available to date, using only experimentally validated non-binders as negative datapoints. TCR-H exhibits an area under the curve of the receiver-operator characteristic (AUC of ROC) of 0.87 for epitope 'hard splitting' (i.e., on test sets with all epitopes unseen during ML training), 0.92 for TCR hard splitting and 0.89 for 'strict splitting', in which neither the epitopes nor the TCRs in the test set are seen in the training data. Furthermore, we employ the SHAP (Shapley additive explanations) eXplainable AI (XAI) method for post hoc interrogation to interpret the models trained with different hard splits, shedding light on the key physicochemical features driving model predictions. TCR-H thus represents a significant step towards general applicability and explainability of epitope:TCR specificity prediction.
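The 'hard' and 'strict' splits described in this abstract can be illustrated with a small sketch. This is one possible way to build such splits, not the authors' procedure; the function names, the `test_fraction` and `seed` parameters, and the placeholder record identifiers are all assumptions made for illustration.

```python
import random


def epitope_hard_split(records, test_fraction=0.2, seed=0):
    """Split (tcr, epitope, label) records so no test epitope occurs in training."""
    epitopes = sorted({epitope for _, epitope, _ in records})
    rng = random.Random(seed)
    rng.shuffle(epitopes)
    n_test = max(1, round(len(epitopes) * test_fraction))
    test_epitopes = set(epitopes[:n_test])
    train = [r for r in records if r[1] not in test_epitopes]
    test = [r for r in records if r[1] in test_epitopes]
    return train, test


def strict_split(records, test_fraction=0.2, seed=0):
    """Hard-split by epitope, then drop test records whose TCR occurs in training."""
    train, test = epitope_hard_split(records, test_fraction, seed)
    train_tcrs = {tcr for tcr, _, _ in train}
    strict_test = [r for r in test if r[0] not in train_tcrs]
    return train, strict_test


# Placeholder identifiers, not real sequences: (tcr, epitope, binds?).
toy_records = [
    ("TCR_A", "EPI_1", 1), ("TCR_B", "EPI_1", 0), ("TCR_A", "EPI_2", 1),
    ("TCR_C", "EPI_2", 0), ("TCR_D", "EPI_3", 1), ("TCR_E", "EPI_4", 0),
    ("TCR_B", "EPI_5", 1),
]
hard_train, hard_test = epitope_hard_split(toy_records)
strict_train, strict_test = strict_split(toy_records)
```

Splitting on the set of epitopes rather than on individual records is what guarantees that every test epitope is unseen; a random record-level split would leak epitopes into both sides.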
Affiliation(s)
- Rajitha Rajeshwar T.
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, United States
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Omar N. A. Demerdash
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Jeremy C. Smith
- UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, United States
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
4
Huang J, Osthushenrich T, MacNamara A, Mälarstig A, Brocchetti S, Bradberry S, Scarabottolo L, Ferrada E, Sosnin S, Digles D, Superti-Furga G, Ecker GF. ProteoMutaMetrics: machine learning approaches for solute carrier family 6 mutation pathogenicity prediction. RSC Adv 2024; 14:13083-13094. [PMID: 38655474 PMCID: PMC11034476 DOI: 10.1039/d4ra00748d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024] Open
Abstract
The solute carrier transporter family 6 (SLC6) is of key interest for its critical role in the transport of small amino acids or amino acid-like molecules. Its dysfunction is strongly associated with human diseases including schizophrenia, depression, and Parkinson's disease. Linking single point mutations to disease may support insights into the structure-function relationship of these transporters. This work aimed to develop a computational model for predicting the potential pathogenic effect of single point mutations in the SLC6 family. Missense mutation data were retrieved from UniProt, LitVar, and ClinVar, covering multiple protein-coding transcripts. As an encoding approach, amino acid descriptors were used to calculate the average sequence properties for both original and mutated sequences. In addition to the full-sequence calculation, the sequences were cut into twelve domains. The domains are defined according to the transmembrane domains of the SLC6 transporters to analyse the regions' contributions to the pathogenicity prediction. Subsequently, several classification models, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), were built, with hyperparameters optimized through grid search. Model performance was estimated with repeated stratified k-fold cross-validation. The accuracy values of the generated models are in the range of 0.72 to 0.80. Analysis of feature importance indicates that mutations in distinct regions of SLC6 transporters are associated with an increased risk of pathogenicity. When the model was applied to an independent validation set, accuracy dropped to about 0.6 on average, with high precision but low sensitivity.
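The encoding step described in this abstract (average descriptor values for the original and the mutated sequence, over the full sequence and per domain) might look like the sketch below. The abstract does not name the descriptors used, so the Kyte-Doolittle hydropathy scale serves here purely as a stand-in; the function names, the toy sequence, and the two toy domain ranges are likewise assumptions for illustration.

```python
# Kyte-Doolittle hydropathy values, used only as a stand-in descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def apply_missense(sequence, position, new_residue):
    """Return the sequence with the residue at a 1-based position replaced."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position out of range")
    return sequence[:position - 1] + new_residue + sequence[position:]


def mean_descriptor(sequence, table=KYTE_DOOLITTLE):
    """Average descriptor value over all residues of a non-empty sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    return sum(table[residue] for residue in sequence) / len(sequence)


def mutation_features(sequence, position, new_residue, domains):
    """Average descriptor for wild type and mutant, full length and per domain.

    `domains` maps a domain name to a 1-based inclusive (start, end) range.
    """
    mutant = apply_missense(sequence, position, new_residue)
    features = {
        "full_wt": mean_descriptor(sequence),
        "full_mut": mean_descriptor(mutant),
    }
    for name, (start, end) in domains.items():
        features[name + "_wt"] = mean_descriptor(sequence[start - 1:end])
        features[name + "_mut"] = mean_descriptor(mutant[start - 1:end])
    return features


# Toy sequence with two hypothetical domains; the study used twelve.
toy_features = mutation_features(
    "AAAAGGGG", position=2, new_residue="I",
    domains={"d1": (1, 4), "d2": (5, 8)},
)
```

In this toy case only the `d1` features change between wild type and mutant, which is the kind of per-region signal a downstream classifier could use to attribute pathogenicity to specific domains.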
Affiliation(s)
- Jiahui Huang
- University of Vienna, Department of Pharmaceutical Sciences Vienna Austria
- Tanja Osthushenrich
- Bayer AG, Division Pharmaceuticals, Biomedical Data Science II Wuppertal Germany
- Aidan MacNamara
- Bayer AG, Division Pharmaceuticals, Biomedical Data Science II Wuppertal Germany
- Anders Mälarstig
- Emerging Science & Innovation, Pfizer Worldwide Research, Development and Medical Cambridge MA USA
- Evandro Ferrada
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences Vienna Austria
- Sergey Sosnin
- University of Vienna, Department of Pharmaceutical Sciences Vienna Austria
- Daniela Digles
- University of Vienna, Department of Pharmaceutical Sciences Vienna Austria
- Giulio Superti-Furga
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences Vienna Austria
- Gerhard F Ecker
- University of Vienna, Department of Pharmaceutical Sciences Vienna Austria
5
Venanzi NE, Basciu A, Vargiu AV, Kiparissides A, Dalby PA, Dikicioglu D. Machine Learning Integrating Protein Structure, Sequence, and Dynamics to Predict the Enzyme Activity of Bovine Enterokinase Variants. J Chem Inf Model 2024; 64:2681-2694. [PMID: 38386417 PMCID: PMC11005043 DOI: 10.1021/acs.jcim.3c00999] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/03/2023] [Revised: 02/12/2024] [Accepted: 02/13/2024] [Indexed: 02/24/2024]
Abstract
Despite recent advances in computational protein science, the dynamic behavior of proteins, which directly governs their biological activity, cannot be gleaned from sequence information alone. To overcome this challenge, we propose a framework that integrates the peptide sequence, protein structure, and protein dynamics descriptors into machine learning algorithms to enhance their predictive capabilities and achieve improved prediction of the protein variant function. The resulting machine learning pipeline integrates traditional sequence and structure information with molecular dynamics simulation data to predict the effects of multiple point mutations on the fold improvement of the activity of bovine enterokinase variants. This study highlights how the combination of structural and dynamic data can provide predictive insights into protein functionality and address protein engineering challenges in industrial contexts.
Affiliation(s)
- Andrea Basciu
- Department of Physics, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato, Cagliari, Italy
- Attilio Vittorio Vargiu
- Department of Physics, University of Cagliari, Cittadella Universitaria, I-09042 Monserrato, Cagliari, Italy
- Alexandros Kiparissides
- Department of Biochemical Engineering, University College London, Gower Street, WC1E 6BT London, U.K.
- Department of Chemical Engineering, Aristotle University of Thessaloniki, 54 124 Thessaloniki, Greece
- Paul A. Dalby
- Department of Biochemical Engineering, University College London, Gower Street, WC1E 6BT London, U.K.
- Duygu Dikicioglu
- Department of Biochemical Engineering, University College London, Gower Street, WC1E 6BT London, U.K.
6
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure prediction, function prediction, and de novo protein design has significantly accelerated intelligent protein design and led to many noteworthy achievements. This advancement holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is the key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors and of sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. It is hoped that this will help researchers better understand the limitations and application scenarios of these methods and provide a valuable reference for choosing appropriate protein characterization techniques in related research.
Affiliation(s)
- Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
7
Ogawa Y, Saito Y, Yamaguchi H, Katsuyama Y, Ohnishi Y. Engineering the Substrate Specificity of Toluene Degrading Enzyme XylM Using Biosensor XylS and Machine Learning. ACS Synth Biol 2023; 12:572-582. [PMID: 36734676 DOI: 10.1021/acssynbio.2c00577] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 02/04/2023]
Abstract
Enzyme engineering using machine learning has been developed in recent years. However, to obtain a large amount of data on enzyme activities for training data, it is necessary to develop a high-throughput and accurate method for evaluating enzyme activities. Here, we examined whether a biosensor-based enzyme engineering method can be applied to machine learning. As a model experiment, we aimed to modify the substrate specificity of XylM, a rate-determining enzyme in a multistep oxidation reaction catalyzed by XylMABC in Pseudomonas putida. XylMABC naturally converts toluene and xylene to benzoic acid and toluic acid, respectively. We aimed to engineer XylM to improve its conversion efficiency to a non-native substrate, 2,6-xylenol. Wild-type XylMABC slightly converted 2,6-xylenol to 3-methylsalicylic acid, which is the ligand of the transcriptional regulator XylS in P. putida. By locating a fluorescent protein gene under the control of the Pm promoter to which XylS binds, a XylS-producing Escherichia coli strain showed higher fluorescence intensity in a 3-methylsalicylic acid concentration-dependent manner. We evaluated the 3-methylsalicylic acid productivity of XylM variants using the fluorescence intensity of the sensor strain as an indicator. The obtained data provided the training data for machine learning for the directed evolution of XylM. Two cycles of machine learning-assisted directed evolution resulted in the acquisition of XylM-D140E-V144K-F243L-N244S with 15 times higher productivity than wild-type XylM. These results demonstrate that an indirect enzyme activity evaluation method using biosensors is sufficiently quantitative and high-throughput to be used as training data for machine learning. The findings expand the versatility of machine learning in enzyme engineering.
Affiliation(s)
- Yuki Ogawa
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo 169-8555, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8561, Japan
- Hideki Yamaguchi
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8561, Japan
- Yohei Katsuyama
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 113-8657, Japan
- Yasuo Ohnishi
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 113-8657, Japan
8
Deng W, Sha J, Xue F, Jami-Alahmadi Y, Plath K, Wohlschlegel J. High-Field Asymmetric Waveform Ion Mobility Spectrometry Interface Enhances Parallel Reaction Monitoring on an Orbitrap Mass Spectrometer. Anal Chem 2022; 94:15939-15947. [PMID: 36347042 PMCID: PMC9685594 DOI: 10.1021/acs.analchem.2c01287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 10/26/2022] [Indexed: 11/09/2022]
Abstract
High-field asymmetric waveform ion mobility spectrometry (FAIMS) enables gas-phase separations on a chromatographic time scale and has become a useful tool for proteomic applications. Despite its emerging utility, however, the molecular determinants underlying peptide separation by FAIMS have not been systematically investigated. Here, we characterize peptide transmission in a FAIMS device across a broad range of compensation voltages (CVs) and use machine learning to identify charge state and three-dimensional (3D) electrostatic peptide potential as major contributors to peptide intensity at a given CV. We also demonstrate that the machine learning model can be used to predict optimized CV values for peptides, which significantly improves parallel reaction monitoring workflows. Together, these data provide insight into peptide separation by FAIMS and highlight its utility in targeted proteomic applications.
Affiliation(s)
- Weixian Deng
- David Geffen School of Medicine, Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
- Molecular Biology Interdepartmental Graduate Program, University of California Los Angeles, Los Angeles, California 90095, United States
- Jihui Sha
- David Geffen School of Medicine, Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
- Fanglei Xue
- University of Technology Sydney, Ultimo, New South Wales 2007, Australia
- Yasaman Jami-Alahmadi
- David Geffen School of Medicine, Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
- Kathrin Plath
- David Geffen School of Medicine, Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
- James Wohlschlegel
- David Geffen School of Medicine, Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
9
Janairo JIB. Machine Learning Model for Biomimetic Chromatography Peptide Ligands. ACS Appl Bio Mater 2022; 5:5264-5269. [PMID: 36265018 DOI: 10.1021/acsabm.2c00684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Purification is an essential part of the production of antibodies, which are important therapeutic biomolecules. Common methods of antibody purification rely on affinity chromatography (AC), wherein whole proteins are often used as ligands to capture the antibodies to be purified. While AC has been successful in purifying antibodies, it is associated with multiple challenges such as high cost and low stability, among others. A promising alternative is using short peptide sequences in place of whole proteins as the stationary phase for the chromatographic separation of the antibodies. In an effort to accelerate the discovery and development of short peptides for biomimetic chromatography, this study reports the creation of a machine learning classification model that was trained and tested on 480 tetrapeptides. The optimized logistic regression model uses Cruciani properties as the input variables and can categorize peptides into one of two classes based on their binding affinity with immunoglobulin G (IgG). The externally validated model demonstrates satisfactory predictive performance and excellent discrimination, as shown by performance metrics such as AUC = 0.874, Balanced Accuracy = 0.874, F1 = 0.871, Precision = 0.884, and Recall = 0.859. In addition, the classifier has provided valuable insights into important variables that influence the classification, such as electrostatic and hydrophobic interactions. Overall, the classifier can be regarded as a welcome development for biomimetic chromatography, and this is the first study that aims to integrate machine learning into the biomimetic chromatography peptide development process.
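The threshold-based metrics listed in this abstract are all functions of the confusion matrix, and F1 is the harmonic mean of precision and recall. The sketch below shows those relationships; the function names and toy labels are illustrative assumptions, while the values 0.884 and 0.859 in the final line are the precision and recall quoted above.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred):
    """Precision, recall, specificity, F1 and balanced accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1_from(precision, recall),
        # Balanced accuracy averages the recall of both classes.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Toy labels for illustration only: 1 = binder, 0 = non-binder.
toy_metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])

# Consistency check on the reported precision and recall.
reported_f1 = round(f1_from(0.884, 0.859), 3)
```

Applying `f1_from` to the reported precision and recall reproduces the reported F1 of 0.871, so the three quoted figures are mutually consistent.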
Affiliation(s)
- Jose Isagani B Janairo
- Department of Biology, De La Salle University, 2401 Taft Avenue, 0922 Manila, Philippines
10
Lertampaiporn S, Hongsthong A, Wattanapornprom W, Thammarongtham C. Ensemble-AHTPpred: A Robust Ensemble Machine Learning Model Integrated With a New Composite Feature for Identifying Antihypertensive Peptides. Front Genet 2022; 13:883766. [PMID: 35571042 PMCID: PMC9096110 DOI: 10.3389/fgene.2022.883766] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/25/2022] [Accepted: 04/04/2022] [Indexed: 11/13/2022] Open
Abstract
Hypertension, or elevated blood pressure, is a serious medical condition that significantly increases the risks of cardiovascular disease, heart disease, diabetes, stroke, kidney disease, and other health problems that affect people worldwide. Thus, hypertension is one of the major global causes of premature death. For the prevention and treatment of hypertension with no or few side effects, antihypertensive peptides (AHTPs) obtained from natural sources might be useful as nutraceuticals. Therefore, the search for alternative or novel AHTPs in food or natural sources has received much attention, as AHTPs may be functional agents for human health. AHTPs have been observed in diverse organisms, although many of them remain underinvestigated. The identification of peptides with antihypertensive activity in the laboratory is time- and resource-consuming. Alternatively, computational methods based on robust machine learning can identify or screen potential AHTP candidates prior to experimental verification. In this paper, we propose Ensemble-AHTPpred, an ensemble machine learning algorithm composed of a random forest (RF), a support vector machine (SVM), and extreme gradient boosting (XGB), with the aim of integrating diverse heterogeneous algorithms to enhance the robustness of the final predictive model. The selected feature set includes various computed features, such as physicochemical properties, amino acid compositions (AACs), transitions, n-grams, and secondary structure-related information; these features capture more information for analyzing or explaining the characteristics of the predicted peptide. In addition, the tool is integrated with a newly proposed composite feature (generated based on a logistic regression function) that combines various feature aspects to enable improved AHTP characterization. Our tool, Ensemble-AHTPpred, achieved an overall accuracy above 90% on independent test data. Additionally, the approach was applied to novel experimentally validated AHTPs, obtained from recent studies, which did not overlap with the training and test datasets, and the tool could precisely predict these AHTPs.
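This abstract names three base learners (RF, SVM, XGB) but does not state how their outputs are combined. As a minimal sketch of one common option, the code below uses simple majority voting; this combination rule is an assumption, and the three rule functions and feature names are hypothetical stand-ins for trained models, not the tool's actual learners or features.

```python
from collections import Counter


def majority_vote(classifiers, features):
    """Return the class label predicted by most of the base classifiers."""
    votes = [classifier(features) for classifier in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for three trained base learners (not real models).
def rf_stand_in(features):
    return 1 if features["hydrophobic_fraction"] >= 0.5 else 0


def svm_stand_in(features):
    return 1 if features["length"] <= 10 else 0


def xgb_stand_in(features):
    return 1 if features["net_charge"] > 0 else 0


base_learners = [rf_stand_in, svm_stand_in, xgb_stand_in]

# Two of the three stand-ins vote 1 for the first peptide and 0 for the second.
vote_a = majority_vote(
    base_learners, {"hydrophobic_fraction": 0.6, "length": 12, "net_charge": 1}
)
vote_b = majority_vote(
    base_learners, {"hydrophobic_fraction": 0.2, "length": 12, "net_charge": 1}
)
```

With an odd number of binary base learners a tie cannot occur, which is one practical reason three-model ensembles are common; averaging predicted probabilities would be an equally plausible alternative.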
Affiliation(s)
- Supatcha Lertampaiporn
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Apiradee Hongsthong
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Warin Wattanapornprom
- Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- Chinae Thammarongtham
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- *Correspondence: Chinae Thammarongtham,
11
Janairo JIB. A Machine Learning Classification Model for Gold-Binding Peptides. ACS Omega 2022; 7:14069-14073. [PMID: 35559171 PMCID: PMC9089360 DOI: 10.1021/acsomega.2c00640] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/31/2022] [Accepted: 03/31/2022] [Indexed: 06/15/2023]
Abstract
There has been growing interest in using peptides for the controlled synthesis of nanomaterials. Peptides play a crucial role not only in regulating the nanostructure formation process but also in influencing the resulting properties of the nanomaterials. Leveraging machine learning (ML) in the biomimetic workflow is anticipated to accelerate peptide discovery, make the process more resource-efficient, and unravel associations among attributes that may be useful in peptide design. In this study, a binary ML classifier is formulated that was trained and tested on 1720 peptide examples. The support vector machine classifier uses Kidera factors to categorize peptides into one of two groups based on their binding ability. The classifier exhibits satisfactory performance, as demonstrated by various performance metrics. In addition, key variables that bear a huge impact on the model were identified, such as peptide hydrophobicity. As these trends were derived from a large and diverse dataset, the insights drawn from the data are expected to be generalizable and robust. Thus, the presented ML model is an important step toward the rational and predictive peptide design.
12
Shao X, Kong W, Li Y, Zhang S. Quantitative structure-activity relationship modeling reveals the minimal sequence requirement and amino acid preference of sirtuin-1's deacetylation substrates in diabetes mellitus. J Bioinform Comput Biol 2022; 20:2250008. [PMID: 35451939 DOI: 10.1142/s0219720022500081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Sirtuin 1 (SIRT1) is a nicotinamide adenine dinucleotide (NAD+)-dependent deacetylase involved in multiple glucose metabolism pathways and plays an important role in the pathogenesis of diabetes mellitus (DM). The enzyme specifically recognizes its deacetylation substrates' peptide segments containing a central acetyl-lysine residue as well as a number of amino acids flanking the central residue. In this study, we attempted to ascertain the minimal sequence requirement (MSR) around the central acetyl-lysine residue of SIRT1 substrate-recognition sites, as well as the amino acid preference (AAP) at different residues of the MSR window, through a quantitative structure-activity relationship (QSAR) strategy. This would benefit our understanding of SIRT1 substrate specificity at the molecular level and also help to rationally design substrate-mimicking peptidic agents against DM by competitively targeting the SIRT1 active site. In this procedure, a large-scale dataset containing 6801 13-mer acetyl-lysine peptides (and their SIRT1-catalyzed deacetylation activities) was compiled to train 10 QSAR regression models developed by systematic combination of two machine learning methods (PLS and SVM) and five amino acid descriptors (DPPS, T-scale, MolSurf, [Formula: see text]-score, and FASGAI). The two best QSAR models (PLS+FASGAI and SVM+DPPS) were then employed to statistically examine the contribution of residue positions to the deacetylation activity of acetyl-lysine peptide substrates, revealing that the MSR can be represented by 5-mer acetyl-lysine peptides that meet a consensus motif X[Formula: see text]X[Formula: see text]X[Formula: see text](AcK)0X[Formula: see text]. Structural analysis found that the X[Formula: see text] and (AcK)0 residues are tightly packed against the enzyme active site and confer both stability and specificity on the enzyme-substrate complex, whereas the X[Formula: see text], X[Formula: see text] and X[Formula: see text] residues are partially exposed to solvent but can also effectively stabilize the complex system. Subsequently, a systematic deacetylation activity change profile (SDACP) was created based on QSAR modeling, from which the AAP for each residue position of the MSR was depicted. With the profile, we were able to rationally design an SDACP combinatorial library with promising deacetylation activity, from which nine MSR acetyl-lysine peptides as well as two known SIRT1 acetyl-lysine peptide substrates were tested using a SIRT1 deacetylation assay. The designed peptides exhibit comparable or even higher activity than the controls, although the former are considerably shorter than the latter.
Affiliation(s)
- X Shao
- Department of Nephrology, Suzhou Kowloon Hospital, Shanghai Jiao Tong University, School of Medicine, Suzhou 215000, P. R. China
| | - W Kong
- Department of Nephrology, Suzhou Kowloon Hospital, Shanghai Jiao Tong University, School of Medicine, Suzhou 215000, P. R. China
| | - Y Li
- Department of Nephrology, Suzhou Kowloon Hospital, Shanghai Jiao Tong University, School of Medicine, Suzhou 215000, P. R. China
| | - S Zhang
- Department of Nephrology, Suzhou Kowloon Hospital, Shanghai Jiao Tong University, School of Medicine, Suzhou 215000, P. R. China
| |
|
13
|
Yin JY, Han YN, Liu MQ, Piao ZH, Zhang X, Xue YT, Zhang YH. Structure-guided discovery of antioxidant peptides bounded to the Keap1 receptor as hunter for potential dietary antioxidants. Food Chem 2022; 373:130999. [PMID: 34710694 DOI: 10.1016/j.foodchem.2021.130999] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/15/2021] [Revised: 07/17/2021] [Accepted: 08/29/2021] [Indexed: 01/27/2023]
Abstract
Human health can be damaged by free radicals, and antioxidant peptides are excellent radical scavengers. In this study, an antioxidant tripeptide data set based on the 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulfonic acid) (ABTS) assay was created, 9 types of descriptors were integrated, and 4 quantitative structure-activity relationship (QSAR) models were constructed. The optimal model revealed several structural factors influencing the activity of antioxidant tripeptides and the dominant amino acids at each tripeptide position. Ten food-derived tripeptides with higher activity were selected for synthesis and activity determination. Molecular docking results demonstrated that these tripeptides were stably bound to the Keap1 receptor, further elucidating the antioxidant mechanism. Simulated gastrointestinal digestion experiments showed that the model results can guide the selection of proteins with high antioxidant activity. Validation showed the performance of the model to be robust.
Affiliation(s)
- Jia-Yi Yin
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Ya-Ning Han
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Meng-Qi Liu
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Zan-Hao Piao
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Xu Zhang
- Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Yu-Ting Xue
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China
| | - Ying-Hua Zhang
- Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University, Harbin 150030, PR China; Department of Food Science, Northeast Agricultural University, Harbin 150030, PR China.
| |
|
14
|
Meher PK, Dash S, Sahu TK, Satpathy S, Pradhan SK. GIpred: a computational tool for prediction of GIGANTEA proteins using machine learning algorithm. Physiol Mol Biol Plants 2022; 28:1-16. [PMID: 35221569 PMCID: PMC8847649 DOI: 10.1007/s12298-022-01130-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Revised: 12/31/2021] [Accepted: 01/07/2022] [Indexed: 06/14/2023]
Abstract
In plants, the GIGANTEA (GI) protein performs various biological functions, including carbon and sucrose metabolism, cell wall deposition, transpiration and hypocotyl elongation. This suggests that GI is an important class of proteins. So far, resource-intensive experimental methods have mostly been used to identify GI proteins. Thus, in this study we attempted to develop a computational model for fast and accurate prediction of GI proteins. Ten different supervised learning algorithms, i.e., SVM, RF, JRIP, J48, LMT, IBK, NB, PART, BAGG and LGB, were employed for prediction, with amino acid composition (AAC), FASGAI features and physico-chemical (PHYC) properties used as numerical inputs for the learning algorithms. The highest accuracies, i.e., 96.75% AUC-ROC and 86.7% AUC-PR, were observed for SVM coupled with the AAC + PHYC feature combination when evaluated with five-fold cross validation. With leave-one-out cross validation, 97.29% AUC-ROC and 87.89% AUC-PR were achieved, respectively. When the performance of the model was evaluated with an independent dataset of 18 GI sequences, 17 were correctly predicted. We have also performed proteome-wide identification of GI proteins in wheat, followed by functional annotation using Gene Ontology terms. A prediction server "GIpred" is freely accessible at http://cabgrid.res.in:8080/gipred/ for proteome-wide recognition of GI proteins. Supplementary information: The online version contains supplementary material available at 10.1007/s12298-022-01130-6.
Affiliation(s)
- Prabina Kumar Meher
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Division of Statistical Genetics, ICAR-IASRI, New Delhi-12, India
| | - Sagarika Dash
- Orissa University of Agriculture and Technology, Bhubaneswar, Odisha India
| | - Tanmaya Kumar Sahu
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Subhrajit Satpathy
- ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | | |
|
15
|
Sharma A, Kumar R, Varadwaj PK. OBPred: feature-fusion-based deep neural network classifier for odorant-binding protein prediction. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06347-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
|
16
|
Saito Y, Oikawa M, Sato T, Nakazawa H, Ito T, Kameda T, Tsuda K, Umetsu M. Machine-Learning-Guided Library Design Cycle for Directed Evolution of Enzymes: The Effects of Training Data Composition on Sequence Space Exploration. ACS Catal 2021. [DOI: 10.1021/acscatal.1c03753] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/02/2023]
Affiliation(s)
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Misaki Oikawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Takumi Sato
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Hikaru Nakazawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Tomoyuki Ito
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Tomoshi Kameda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
| | - Mitsuo Umetsu
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
| |
|
17
|
Tam C, Kumar A, Zhang KYJ. NbX: Machine Learning-Guided Re-Ranking of Nanobody-Antigen Binding Poses. Pharmaceuticals (Basel) 2021; 14:ph14100968. [PMID: 34681192 PMCID: PMC8537642 DOI: 10.3390/ph14100968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2021] [Revised: 09/17/2021] [Accepted: 09/21/2021] [Indexed: 12/02/2022] Open
Abstract
Modeling the binding pose of an antibody is a prerequisite to structure-based affinity maturation and design. Without a reliable binding pose, the subsequent structural simulation is largely futile. In this study, we have developed a method of machine learning-guided re-ranking of antigen binding poses of nanobodies, single-domain antibodies that have recently drawn much interest in antibody drug development. We performed a large-scale self-docking experiment of nanobody–antigen complexes. By training a decision tree classifier that maps a feature set consisting of energy, contact and interface property descriptors to a measure of the docking quality of the refined poses, we achieved a significant, eightfold improvement in the median ranking of native-like nanobody poses compared with ClusPro and an established deep 3D CNN classifier of native protein–protein interaction. We further interpreted our model by identifying features that made relatively important contributions to the prediction performance. This study demonstrates a useful method for improving our current ability to predict nanobody poses.
Affiliation(s)
- Chunlai Tam
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
| | - Ashutosh Kumar
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
| | - Kam Y. J. Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
| |
|
18
|
|
19
|
Bo W, Chen L, Qin D, Geng S, Li J, Mei H, Li B, Liang G. Application of quantitative structure-activity relationship to food-derived peptides: Methods, situations, challenges and prospects. Trends Food Sci Technol 2021; 114:176-188. [DOI: 10.1016/j.tifs.2021.05.031] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 11/26/2022]
|
20
|
Cao Y, Yu C, Huang S, Wang S, Zuo Y, Yang L. Characterization and Prediction of Presynaptic and Postsynaptic Neurotoxins Based on Reduced Amino Acids and Biological Properties. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200707150512] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
Background: Presynaptic and postsynaptic neurotoxins are two important classes of neurotoxins. Because of their important role in pharmacology and neuroscience, identifying them has become very important in biology.
Method: In this study, a statistical test and the F-score were used to calculate the differences between the two neurotoxin types in amino acids and biological properties. A support vector machine was used to predict presynaptic and postsynaptic neurotoxins using reduced amino acid alphabet types.
Results: With the reduced amino acid alphabet as the input parameters of the support vector machine, the overall accuracy of our classifier increased to 91.07%, which was the highest overall accuracy in this study. Compared with other published methods, our classifier obtained better predictive results.
Conclusion: In summary, we analyzed the differences between the two neurotoxins in amino acids and biological properties, and constructed a classifier that could predict these two neurotoxins using the reduced amino acid alphabet.
Affiliation(s)
- Yiyin Cao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Chunlu Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Shenghui Huang
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
| | - Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| |
|
21
|
Wattanapornprom W, Thammarongtham C, Hongsthong A, Lertampaiporn S. Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization. Life (Basel) 2021; 11:life11040293. [PMID: 33808227 PMCID: PMC8066735 DOI: 10.3390/life11040293] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/29/2021] [Revised: 03/16/2021] [Accepted: 03/25/2021] [Indexed: 12/17/2022] Open
Abstract
The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for protein subcellular localization prediction in plants based on multiple classifiers, to improve prediction results in terms of both accuracy and reliability. The prediction of plant protein subcellular localization is challenging because the underlying problem is not only a multiclass but also a multilabel problem. Generally, plant proteins can be found in 10–14 locations/compartments. The number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than that in other compartments (vacuole, peroxisome, Golgi, and cell wall), so the problem of imbalanced data usually arises. We therefore propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suitable for each type of protein localization to form a total of 479 feature spaces. Then, feature selection methods were used to reduce the dimensions of the features into smaller informative feature subsets. This reduced feature subset was then used to train three different individual models. To combine the three classifiers, we averaged their outputs to return the final probability prediction. Based on the voting probability, the method could predict subcellular localizations for both single- and multilabel locations. Experimental results indicated that the proposed ensemble method could achieve correct classification with an overall accuracy of 84.58% for 11 compartments on the testing dataset.
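As an illustration of the average-voting idea described in this abstract, the following minimal Python sketch averages per-compartment probabilities from three classifiers and then assigns one or more labels. It is not the authors' implementation: the function names, the 0.5 threshold, the fallback to the single most probable compartment, and the example probabilities are all assumptions made for illustration.

```python
from typing import Dict, List


def average_vote(prob_sets: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-label probabilities produced by several classifiers."""
    labels = prob_sets[0].keys()
    return {lab: sum(p[lab] for p in prob_sets) / len(prob_sets) for lab in labels}


def predict_labels(avg: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose averaged probability reaches the threshold.

    The threshold is a hypothetical choice; if no label reaches it, fall back
    to the single most probable label so that each protein gets a prediction.
    """
    chosen = [lab for lab, p in avg.items() if p >= threshold]
    if not chosen:
        chosen = [max(avg, key=avg.get)]
    return sorted(chosen)


# Made-up outputs of three heterogeneous classifiers for one protein.
clf_outputs = [
    {"nucleus": 0.75, "cytoplasm": 0.5, "vacuole": 0.25},
    {"nucleus": 0.5, "cytoplasm": 0.75, "vacuole": 0.0},
    {"nucleus": 1.0, "cytoplasm": 0.25, "vacuole": 0.125},
]
avg = average_vote(clf_outputs)
print(avg)
print(predict_labels(avg))
```

With a lower threshold the same averaged probabilities yield a multilabel prediction, while a stricter threshold collapses to a single compartment, which is one simple way a probability-based vote can cover both cases.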
Affiliation(s)
- Warin Wattanapornprom
- Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok 10140, Thailand
| | - Chinae Thammarongtham
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
| | - Apiradee Hongsthong
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
| | - Supatcha Lertampaiporn
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
| |
|
22
|
Zhou P, Liu Q, Wu T, Miao Q, Shang S, Wang H, Chen Z, Wang S, Wang H. Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling. J Chem Inf Model 2021; 61:1718-1731. [DOI: 10.1021/acs.jcim.0c01370] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Indexed: 02/07/2023]
Affiliation(s)
- Peng Zhou
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Qian Liu
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Ting Wu
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Qingqing Miao
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Shuyong Shang
- College of Chemistry and Life Science, Chengdu Normal University, Chengdu 611130, China
| | - Heyi Wang
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Zheng Chen
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Shaozhou Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Heyan Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| |
|
23
|
|
24
|
|
25
|
Robinson SL, Smith MD, Richman JE, Aukema KG, Wackett LP. Machine learning-based prediction of activity and substrate specificity for OleA enzymes in the thiolase superfamily. Synth Biol (Oxf) 2020. [DOI: 10.1093/synbio/ysaa004] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/07/2023] Open
Abstract
Enzymes in the thiolase superfamily catalyze carbon–carbon bond formation for the biosynthesis of polyhydroxyalkanoate storage molecules, membrane lipids and bioactive secondary metabolites. Natural and engineered thiolases have applications in synthetic biology for the production of high-value compounds, including personal care products and therapeutics. A fundamental understanding of thiolase substrate specificity is lacking, particularly within the OleA protein family. The ability to predict substrates from sequence would advance (meta)genome mining efforts to identify active thiolases for the production of desired metabolites. To gain a deeper understanding of substrate scope within the OleA family, we measured the activity of 73 diverse bacterial thiolases with a library of 15 p-nitrophenyl ester substrates to build a training set of 1095 unique enzyme–substrate pairs. We then used machine learning to predict thiolase substrate specificity from physicochemical and structural features. The area under the receiver operating characteristic curve was 0.89 for random forest classification of enzyme activity, and our regression model had a test set root mean square error of 0.22 (R² = 0.75) to quantitatively predict enzyme activity levels. Substrate aromaticity, oxygen content and molecular connectivity were the strongest predictors of enzyme–substrate pairing. Key amino acid residues A173, I284, V287, T292 and I316 in the Xanthomonas campestris OleA crystal structure lining the substrate binding pockets were important for thiolase substrate specificity and are attractive targets for future protein engineering studies. The predictive framework described here is generalizable and demonstrates how machine learning can be used to quantitatively understand and predict enzyme substrate specificity.
Affiliation(s)
- Serina L Robinson
- Graduate Program in Bioinformatics and Computational Biology, University of Minnesota, 111 S. Broadway, Suite 300, Rochester, MN 55904, USA
- Graduate Program in Microbiology, Immunology, and Cancer Biology, University of Minnesota, 689 23rd Ave SE, Minneapolis, MN 55455, USA
- BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
| | - Megan D Smith
- Graduate Program in Microbiology, Immunology, and Cancer Biology, University of Minnesota, 689 23rd Ave SE, Minneapolis, MN 55455, USA
- BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
| | - Jack E Richman
- BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
| | - Kelly G Aukema
- BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
| | - Lawrence P Wackett
- BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
| |
Collapse
|
26
|
Prediction of peptide binding to MHC using machine learning with sequence and structure-based feature sets. Biochim Biophys Acta Gen Subj 2020; 1864:129535. [DOI: 10.1016/j.bbagen.2020.129535] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/29/2019] [Revised: 01/09/2020] [Accepted: 01/14/2020] [Indexed: 11/18/2022]
|
27
|
Kęska P, Stadnik J. Structure-activity relationships study on biological activity of peptides as dipeptidyl peptidase IV inhibitors by chemometric modeling. Chem Biol Drug Des 2019; 95:291-301. [PMID: 31709757 DOI: 10.1111/cbdd.13643] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/05/2019] [Revised: 11/01/2019] [Accepted: 11/06/2019] [Indexed: 12/17/2022]
Abstract
The aim of this study is to identify the potential descriptors affecting the inhibitory activity of the peptides inhibiting dipeptidyl peptidase IV (DPP-IV). This study provides important information for assessing the biological activity of the new peptide sequences of food origin or making structural modifications to the current inhibitors to improve their performance. For this purpose, the chemometric method describing the relationship between the structure of food peptides and their biological activity (structure-activity relationship [SAR]) was used to theoretically predict the potential of bioactivity of peptides. Data on the physicochemical properties of amino acids in the dipeptides acting as inhibitors of DPP-IV were collected and analyzed for using these properties as descriptors in further analysis. A total of 252 dipeptide sequences with confirmed DPP-IV inhibitory activity available in the BIOPEP-UWM database were included in the analysis, and 16 descriptors defining individual amino acids (such as molecular weight, polarity, hydropathicity, bulkiness, buried residue, and acceptable and normalized frequency of alpha-helix and beta-sheet) were identified. Based on this information, a data matrix was constructed and used in the chemometric analysis (principal component analysis and multiple linear regression). From the SAR model created, a multiple regression equation was derived to predict the biological activity of the dipeptide DPP-IV inhibitors.
Affiliation(s)
- Paulina Kęska
- Department of Animal Raw Materials Technology, Faculty of Food Science and Biotechnology, University of Life Sciences in Lublin, Lublin, Poland
| | - Joanna Stadnik
- Department of Animal Raw Materials Technology, Faculty of Food Science and Biotechnology, University of Life Sciences in Lublin, Lublin, Poland
| |
|
28
|
Rifaioglu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Doğan T. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform 2019; 20:1878-1912. [PMID: 30084866 PMCID: PMC6917215 DOI: 10.1093/bib/bby061] [Citation(s) in RCA: 266] [Impact Index Per Article: 44.3] [Received: 01/25/2018] [Revised: 05/25/2018] [Indexed: 01/16/2023] Open
Abstract
The identification of interactions between drugs/compounds and their targets is crucial for the development of new drugs. In vitro screening experiments (i.e. bioassays) are frequently used for this purpose; however, experimental approaches are insufficient to explore novel drug-target interactions, mainly because of feasibility problems, as they are labour intensive, costly and time consuming. A computational field known as 'virtual screening' (VS) has emerged in the past decades to aid experimental drug discovery studies by statistically estimating unknown bio-interactions between compounds and biological targets. These methods use the physico-chemical and structural properties of compounds and/or target proteins along with the experimentally verified bio-interaction information to generate predictive models. Lately, sophisticated machine learning techniques are applied in VS to elevate the predictive performance. The objective of this study is to examine and discuss the recent applications of machine learning techniques in VS, including deep learning, which became highly popular after giving rise to epochal developments in the fields of computer vision and natural language processing. The past 3 years have witnessed an unprecedented amount of research studies considering the application of deep learning in biomedicine, including computational drug discovery. In this review, we first describe the main instruments of VS methods, including compound and protein features (i.e. representations and descriptors), frequently used libraries and toolkits for VS, bioactivity databases and gold-standard data sets for system training and benchmarking. We subsequently review recent VS studies with a strong emphasis on deep learning applications. Finally, we discuss the present state of the field, including the current challenges and suggest future directions. 
We believe that this survey will provide insight to the researchers working in the field of computational drug discovery in terms of comprehending and developing novel bio-prediction methods.
Affiliation(s)
- Ahmet Sureyya Rifaioglu
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Department of Computer Engineering, İskenderun Technical University, Hatay, Turkey
| | - Heval Atas
- Cancer System Biology Laboratory (CanSyL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Maria Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Cambridge, Hinxton, UK
| | - Rengul Cetin-Atalay
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
| | - Volkan Atalay
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
| | - Tunca Doğan
- Cancer System Biology Laboratory (CanSyL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey and European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Cambridge, Hinxton, UK
| |
|
29
|
Xu B, Chung HY. Quantitative Structure-Activity Relationship Study of Bitter Di-, Tri- and Tetrapeptides Using Integrated Descriptors. Molecules 2019; 24:molecules24152846. [PMID: 31387305 PMCID: PMC6696392 DOI: 10.3390/molecules24152846] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/26/2019] [Revised: 07/23/2019] [Accepted: 08/05/2019] [Indexed: 11/16/2022] Open
Abstract
New quantitative structure–activity relationship (QSAR) models for bitter peptides were built with integrated amino acid descriptors. Datasets contained 48 dipeptides, 52 tripeptides and 23 tetrapeptides with their reported bitter taste thresholds. Independent variables consisted of 14 amino acid descriptor sets. A bootstrapping soft shrinkage approach was utilized for variable selection. The importance of a variable was evaluated by both variable selecting frequency and standardized regression coefficient. Results indicated model qualities for di-, tri- and tetrapeptides with R² and Q² at 0.950 ± 0.002, 0.941 ± 0.001; 0.770 ± 0.006, 0.742 ± 0.004; and 0.972 ± 0.002, 0.956 ± 0.002, respectively. The hydrophobic C-terminal amino acid was the key determinant for bitterness in dipeptides, followed by the contribution of bulky hydrophobic N-terminal amino acids. For tripeptides, hydrophobicity of C-terminal amino acids and the electronic properties of the amino acids at the second position were important. For tetrapeptides, bulky hydrophobic amino acids at N-terminus, hydrophobicity and partial specific volume of amino acids at the second position, and the electronic properties of amino acids of the remaining two positions were critical. In summary, this study not only constructs reliable models for predicting the bitterness in different groups of peptides, but also facilitates better understanding of their structure-bitterness relationships and provides insights for their future studies.
Affiliation(s)
- Biyang Xu
- Food and Nutritional Sciences Programme, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
| | - Hau Yin Chung
- Food and Nutritional Sciences Programme, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China.
| |
|
30
|
Deng B, Long H, Tang T, Ni X, Chen J, Yang G, Zhang F, Cao R, Cao D, Zeng M, Yi L. Quantitative Structure-Activity Relationship Study of Antioxidant Tripeptides Based on Model Population Analysis. Int J Mol Sci 2019; 20:995. [PMID: 30823542 PMCID: PMC6413046 DOI: 10.3390/ijms20040995] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/01/2019] [Revised: 02/13/2019] [Accepted: 02/18/2019] [Indexed: 11/16/2022]
Abstract
Due to their beneficial effects on human health, antioxidant peptides have attracted much attention from researchers. However, the structure-activity relationships of antioxidant peptides have not been fully understood. In this paper, quantitative structure-activity relationship (QSAR) models were built on two datasets, i.e., the ferric thiocyanate (FTC) dataset and the ferric-reducing antioxidant power (FRAP) dataset, containing 214 and 172 unique antioxidant tripeptides, respectively. Sixteen amino acid descriptors were used and model population analysis (MPA) was then applied to improve the QSAR models for better prediction performance. The results showed that, by applying MPA, the cross-validated coefficient of determination (Q²) was increased from 0.6170 to 0.7471 for the FTC dataset and from 0.4878 to 0.6088 for the FRAP dataset. These findings indicate that the integration of different amino acid descriptors provides additional information for model building and that MPA can efficiently extract the information for better prediction performance.
Affiliation(s)
- Baichuan Deng
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Hongrong Long
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Tianyue Tang
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Xiaojun Ni
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Jialuo Chen
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Guangming Yang
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Fan Zhang
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Ruihua Cao
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, China.
| | - Maomao Zeng
- State Key Laboratory of Food Science and Technology, International Joint Laboratory on Food Safety, Jiangnan University, Wuxi 214122, China.
| | - Lunzhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming 650500, China.
| |
|
31
|
|
32
|
Saito Y, Oikawa M, Nakazawa H, Niide T, Kameda T, Tsuda K, Umetsu M. Machine-Learning-Guided Mutagenesis for Directed Evolution of Fluorescent Proteins. ACS Synth Biol 2018; 7:2014-2022. [PMID: 30103599 DOI: 10.1021/acssynbio.8b00155] [Citation(s) in RCA: 92] [Impact Index Per Article: 13.1] [Indexed: 01/21/2023]
Abstract
Molecular evolution based on mutagenesis is widely used in protein engineering. However, optimal proteins are often difficult to obtain due to a large sequence space. Here, we propose a novel approach that combines molecular evolution with machine learning. In this approach, we conduct two rounds of mutagenesis where an initial library of protein variants is used to train a machine-learning model to guide mutagenesis for the second-round library. This enables us to prepare a small library suited for screening experiments with high enrichment of functional proteins. We demonstrated a proof-of-concept of our approach by altering the reference green fluorescent protein (GFP) so that its fluorescence is changed into yellow. We successfully obtained a number of proteins showing yellow fluorescence, 12 of which had longer wavelengths than the reference yellow fluorescent protein (YFP). These results show the potential of our approach as a powerful method for directed evolution of fluorescent proteins.
Affiliation(s)
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| | - Misaki Oikawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Hikaru Nakazawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Teppei Niide
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Tomoshi Kameda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
| | - Mitsuo Umetsu
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
| |
|
33
|
Barley MH, Turner NJ, Goodacre R. Improved Descriptors for the Quantitative Structure-Activity Relationship Modeling of Peptides and Proteins. J Chem Inf Model 2018; 58:234-243. [PMID: 29338232 DOI: 10.1021/acs.jcim.7b00488] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 11/28/2022]
Abstract
The ability to model the activity of a protein using quantitative structure-activity relationships (QSAR) requires descriptors for the 20 naturally coded amino acids. In this work we show that by modifying some established descriptors we were able to model the activity data of 140 mutants of the enzyme epoxide hydrolase with improved accuracy. These new descriptors (referred to as physical descriptors) also gave very good results when tested against a series of four dipeptide data sets. The physical descriptors encode the amino acids using only two orthogonal scales: the first is strongly linked to hydrophilicity/hydrophobicity, and the second, to the volume of the amino acid residue. The use of these new amino acid descriptors should result in simpler and more readily interpretable models for the enzyme activity (and potentially other functions of interest, e.g., secondary and tertiary structure) of peptides and proteins.
Affiliation(s)
- Mark H Barley
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, U.K.
| | - Nicholas J Turner
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, U.K.
| | - Royston Goodacre
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, U.K.
| |
|
34
|
Comprehensive comparison of twenty structural characterization scales applied as QSAM of antimicrobial dodecapeptides derived from Bac2A against P. aeruginosa. J Mol Graph Model 2017; 71:88-95. [DOI: 10.1016/j.jmgm.2016.11.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/25/2015] [Revised: 11/02/2016] [Accepted: 11/06/2016] [Indexed: 02/04/2023]
|
35
|
Qiu T, Qiu J, Feng J, Wu D, Yang Y, Tang K, Cao Z, Zhu R. The recent progress in proteochemometric modelling: focusing on target descriptors, cross-term descriptors and application scope. Brief Bioinform 2016; 18:125-136. [PMID: 26873661 DOI: 10.1093/bib/bbw004] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 10/26/2015] [Revised: 12/09/2015] [Indexed: 12/17/2022]
Abstract
As an extension of the conventional quantitative structure activity relationship models, proteochemometric (PCM) modelling is a computational method that can predict the bioactivity relations between multiple ligands and multiple targets. Traditional PCM modelling includes three essential elements: descriptors (including target descriptors, ligand descriptors and cross-term descriptors), bioactivity data and appropriate learning functions that link the descriptors to the bioactivity data. Since its appearance, PCM modelling has developed rapidly over the past decade by taking advantage of the progress of different descriptors and machine learning techniques, along with the increasing amounts of available bioactivity data. Specifically, the new emerging target descriptors and cross-term descriptors not only significantly increased the performance of PCM modelling but also expanded its application scope from traditional protein-ligand interaction to more abundant interactions, including protein-peptide, protein-DNA and even protein-protein interactions. In this review, target descriptors and cross-term descriptors, as well as the corresponding application scope, are intensively summarized. Additionally, we look forward to seeing PCM modelling extend into new application scopes, such as Target-Catalyst-Ligand systems, with the further development of descriptors, machine learning techniques and increasing amounts of available bioactivity data.
|
36
|
Bosc N, Wroblowski B, Aci-Sèche S, Meyer C, Bonnet P. A Proteometric Analysis of Human Kinome: Insight into Discriminant Conformation-dependent Residues. ACS Chem Biol 2015; 10:2827-40. [PMID: 26411811 DOI: 10.1021/acschembio.5b00555] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Because of the success of imatinib, the first type-II kinase inhibitor approved by the FDA in 2001, sustained efforts have been made by the pharmaceutical industry to discover novel compounds stabilizing the inactive conformation of protein kinases. Of the seven type-II inhibitors that have reached the market, four were released in 2012, suggesting an acceleration of research on this class of compounds. Still, they represent less than a third of the protein kinase inhibitors available to patients today. The identification of key residues involved in the binding of this type of ligand in the kinase active site might ease the design of potent and selective type-II inhibitors. In order to identify those discriminant residues, we have developed a proteometric approach combining residue descriptors of protein kinase sequences and biological activities of various type-II kinase inhibitors. We applied Partial Least Squares (PLS) regression to identify 29 key residues that influence the binding of four type-II inhibitors to most proteins of the kinome. The gatekeeper residue was found to be the most relevant, confirming an essential role in ligand binding as well as in protein kinase conformational changes. Using the newly developed proteometric model, we predicted the propensity of each protein kinase to be inhibited by type-II ligands. The model was further validated using an external data set of protein/ligand activity pairs. Other residues present in the kinase domain, and more specifically in the binding site, have been highlighted by this approach, but their role in biological mechanisms is still unknown.
Affiliation(s)
- Nicolas Bosc
- Institut de Chimie Organique et Analytique (ICOA), UMR CNRS-Université d’Orléans 7311, Université d’Orléans BP 6759, 45067 Orléans Cedex 2, France
| | - Berthold Wroblowski
- Janssen Research & Development, a division of Janssen Pharmaceutica N.V., Turnhoutseweg 30, 2340 Beerse, Belgium
| | - Samia Aci-Sèche
- Institut de Chimie Organique et Analytique (ICOA), UMR CNRS-Université d’Orléans 7311, Université d’Orléans BP 6759, 45067 Orléans Cedex 2, France
| | - Christophe Meyer
- Centre de Recherche Janssen-Cilag, Campus de Maigremont - CS 10615, 27106 Val de Reuil Cedex, France
| | - Pascal Bonnet
- Institut de Chimie Organique et Analytique (ICOA), UMR CNRS-Université d’Orléans 7311, Université d’Orléans BP 6759, 45067 Orléans Cedex 2, France
| |
|
37
|
Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets. J Cheminform 2013; 5:42. [PMID: 24059743 PMCID: PMC4015169 DOI: 10.1186/1758-2946-5-42] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Received: 06/20/2013] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
Background: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 amino acid descriptor sets have been benchmarked with respect to their ability to establish bioactivity models. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI, BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants); in addition, we created and benchmarked three pairs of descriptor combinations. Prediction performance was evaluated in seven structure-activity benchmarks which comprise Angiotensin Converting Enzyme (ACE) dipeptidic inhibitor data, and three proteochemometric data sets, namely (1) GPCR ligands modeled against a GPCR panel, (2) enzyme inhibitors (NNRTIs) with associated bioactivities against a set of HIV enzyme mutants, and (3) enzyme inhibitors (PIs) with associated bioactivities on a large set of HIV enzyme mutants. Results: The amino acid descriptor sets compared here show similar performance (<0.1 log units RMSE difference and <0.1 difference in MCC), while errors for individual proteins were in some cases found to be larger than those resulting from descriptor set differences (>0.3 log units RMSE difference and >0.7 difference in MCC). Combining different descriptor sets generally leads to better modeling performance than utilizing individual sets. The best performers were Z-scales (3) combined with ProtFP (Feature), or Z-scales (3) combined with an average Z-scale value for each target, while ProtFP (PCA8), ST-scales, and ProtFP (Feature) rank last. Conclusions: While amino acid descriptor sets capture different aspects of amino acids, their ability to be used for bioactivity modeling is still, on average, surprisingly similar. Still, combining sets describing complementary information leads to a small but consistent improvement in modeling performance (average MCC 0.01 better, average RMSE 0.01 log units lower). Finally, performance differences exist between the targets compared, underlining that choosing an appropriate descriptor set is of fundamental importance for bioactivity modeling, from both the ligand and the protein side.
|
38
|
van Westen GJ, Swier RF, Wegner JK, Ijzerman AP, van Vlijmen HW, Bender A. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets. J Cheminform 2013; 5:41. [PMID: 24059694 PMCID: PMC3848949 DOI: 10.1186/1758-2946-5-41] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.2] [Received: 06/20/2013] [Accepted: 09/18/2013] [Indexed: 11/10/2022]
Abstract
Background: While a large body of work exists on comparing and benchmarking descriptors of molecular structures, a similar comparison of protein descriptor sets is lacking. Hence, in the current work a total of 13 different protein descriptor sets have been compared with respect to their behavior in perceiving similarities between amino acids. The descriptor sets included in the study are Z-scales (3 variants), VHSE, T-scales, ST-scales, MS-WHIM, FASGAI and BLOSUM, and a novel protein descriptor set termed ProtFP (4 variants). We investigate to what extent descriptor sets show collinear as well as orthogonal behavior via principal component analysis (PCA). Results: In describing amino acid similarities, MS-WHIM, T-scales and ST-scales show related behavior, as do the VHSE, FASGAI, and ProtFP (PCA3) descriptor sets. Conversely, the ProtFP (PCA5), ProtFP (PCA8), Z-scales (Binned), and BLOSUM descriptor sets show behavior that is distinct from one another as well as from both of the clusters above. Generally, the use of more principal components (>3 per amino acid, per descriptor) leads to significant differences in the way amino acids are described, even though the later principal components capture less variation per component of the original input data. Conclusion: In this work a comparison is provided of how similarly (and differently) currently available amino acid descriptor sets behave when converting structure to property space. The results obtained enable molecular modelers to select suitable amino acid descriptor sets for structure-activity analyses, e.g. those showing complementary behavior.
Affiliation(s)
- Gerard Jp van Westen
- Division of Medicinal Chemistry, Leiden / Amsterdam Center for Drug Research, Einsteinweg 55, Leiden 2333, CC, The Netherlands.
| |
|
39
|
Borkar MR, Pissurlenkar RRS, Coutinho EC. HomoSAR: Bridging comparative protein modeling with quantitative structural activity relationship to design new peptides. J Comput Chem 2013; 34:2635-46. [DOI: 10.1002/jcc.23436] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/15/2013] [Revised: 08/17/2013] [Accepted: 08/21/2013] [Indexed: 12/19/2022]
Affiliation(s)
- Mahesh R. Borkar
- Department of Pharmaceutical Chemistry; Bombay College of Pharmacy; Kalina, Santacruz (East) Mumbai 400098 India
| | - Raghuvir R. S. Pissurlenkar
- Department of Pharmaceutical Chemistry; Bombay College of Pharmacy; Kalina, Santacruz (East) Mumbai 400098 India
| | - Evans C. Coutinho
- Department of Pharmaceutical Chemistry; Bombay College of Pharmacy; Kalina, Santacruz (East) Mumbai 400098 India
| |
|
40
|
Wang JH, Liu YL, Ning JH, Yu J, Li XH, Wang FX. Is the structural diversity of tripeptides sufficient for developing functional food additives with satisfactory multiple bioactivities? J Mol Struct 2013. [DOI: 10.1016/j.molstruc.2013.03.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]
|
41
|
Characterization of structure–antioxidant activity relationship of peptides in free radical systems using QSAR models: Key sequence positions and their amino acid properties. J Theor Biol 2013; 318:29-43. [DOI: 10.1016/j.jtbi.2012.10.029] [Citation(s) in RCA: 144] [Impact Index Per Article: 12.0] [Received: 01/25/2012] [Revised: 10/21/2012] [Accepted: 10/22/2012] [Indexed: 11/22/2022]
|
42
|
Quantitative Structure-Activity Relationship Study of Radical Scavenging Peptides Based on ORAC Method by Using Different Sets of Amino Acids Descriptor. Adv Mater Res 2011; 365:169. [DOI: 10.4028/www.scientific.net/amr.365.169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Abstract
Radical scavenging peptides from different hydrolysates, with activity measured by the ORAC method, were used for quantitative structure-activity relationship (QSAR) research. Partial least-squares regression (PLSR) was used to build models with 17 kinds of amino acid descriptors. In order to translate the sequences to the same length, two-terminal position numbering (TTPN) was applied. Two amino acid descriptors, VHSE and VSW, were selected for their excellent performance (R², Q², and RMSEc with the VHSE and VSW descriptors are 0.995, 0.630, 0.318 and 0.966, 0.543, 0.181, respectively). VHSE has definite physicochemical meanings and is easy to understand, while VSW has good predictive ability (R and RMSEp with VHSE and VSW are 0.404, 2.633 and 0.635, 2.298, respectively). It is believed that the amino acid at position No. 2 from the N-terminus (N2) is more important than the others in the sequence, and that most electronic properties are negatively related to activity, while all the steric properties, as well as the hydrophobic properties, are positively related to activity. The suitable amino acids in the sequence are as follows: G, R, K, W, Y, N, E, H, and Q are suitable for the N2 position, which illustrates the importance of acidic amino acids in the peptide sequence for radical scavenging activity.
|
43
|
Li YW, Li B, He J, Qian P. Quantitative structure–activity relationship study of antioxidative peptide by using different sets of amino acids descriptors. J Mol Struct 2011. [DOI: 10.1016/j.molstruc.2011.05.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 10/18/2022]
|
44
|
ST-scale as a novel amino acid descriptor and its application in QSAM of peptides and analogues. Amino Acids 2009; 38:805-16. [PMID: 19373543 DOI: 10.1007/s00726-009-0287-y] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Received: 07/05/2008] [Accepted: 03/25/2009] [Indexed: 10/20/2022]
Abstract
In this study, structural topology scale (ST-scale) was recruited as a novel structural topological descriptor derived from principal component analysis on 827 structural variables of 167 amino acids. By using partial least squares (PLS), we applied ST-scale for the study of quantitative sequence-activity models (QSAMs) on three peptide datasets (58 angiotensin-converting enzyme (ACE) inhibitors, 34 antimicrobial peptides (AMPs) and 89 elastase substrates (ES)). The results of the QSAMs were superior to those of earlier studies, with determination coefficient (r²) and cross-validated coefficient (q²) equal to 0.855, 0.774; 0.79, 0.371 (OSC-PLS: 0.995, 0.848) and 0.846, 0.747, respectively. Therefore, ST-scale descriptors were considered competent to extract information from the 827 structural variables and relate it to bioactivity.
|