1
Karolak A, Urbaniak K, Monastyrskyi A, Duckett DR, Branciamore S, Stewart PA. Structure-independent machine-learning predictions of the CDK12 interactome. Biophys J 2024:S0006-3495(24)00344-8. [PMID: 38762754 DOI: 10.1016/j.bpj.2024.05.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 04/24/2024] [Accepted: 05/15/2024] [Indexed: 05/20/2024] Open
Abstract
Cyclin-dependent kinase 12 (CDK12) is a critical regulatory protein involved in transcription and DNA repair processes. Dysregulation of CDK12 has been implicated in various diseases, including cancer. Understanding the CDK12 interactome is pivotal for elucidating its functional roles and potential therapeutic targets. Traditional methods for interactome prediction often rely on protein structure information, limiting their applicability to CDK12, which is characterized by a partly disordered C-terminal region. In this study, we present a structure-independent machine-learning model that utilizes proteins' sequence and functional data to predict the CDK12 interactome. This approach is motivated by the disordered character of the CDK12 C-terminal region, which hampers a structure-driven search for binding partners. Our approach incorporates multiple data sources, including protein-protein interaction networks, functional annotations, and sequence-based features, to construct a comprehensive CDK12 interactome prediction model. The ability to predict CDK12 interactions without relying on structural information is a significant advancement, as many potential interaction partners may lack crystallographic data. In conclusion, our structure-independent machine-learning model presents a powerful tool for predicting the CDK12 interactome and holds promise in advancing our understanding of CDK12 biology, identifying potential therapeutic targets, and facilitating precision-medicine approaches for CDK12-associated diseases.
Affiliation(s)
- Konstancja Urbaniak
- Department of Computational and Quantitative Medicine, City of Hope, Duarte, California
- Derek R Duckett
- Department of Drug Discovery, Moffitt Cancer Center, Tampa, Florida
- Sergio Branciamore
- Department of Computational and Quantitative Medicine, City of Hope, Duarte, California
- Paul A Stewart
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, Florida
2
Huang J, Osthushenrich T, MacNamara A, Mälarstig A, Brocchetti S, Bradberry S, Scarabottolo L, Ferrada E, Sosnin S, Digles D, Superti-Furga G, Ecker GF. ProteoMutaMetrics: machine learning approaches for solute carrier family 6 mutation pathogenicity prediction. RSC Adv 2024; 14:13083-13094. [PMID: 38655474 PMCID: PMC11034476 DOI: 10.1039/d4ra00748d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024] Open
Abstract
The solute carrier transporter family 6 (SLC6) is of key interest for its critical role in the transport of small amino acids or amino acid-like molecules. Its dysfunction is strongly associated with human diseases including schizophrenia, depression, and Parkinson's disease. Linking single point mutations to disease may support insights into the structure-function relationship of these transporters. This work aimed to develop a computational model for predicting the potential pathogenic effect of single point mutations in the SLC6 family. Missense mutation data was retrieved from UniProt, LitVar, and ClinVar, covering multiple protein-coding transcripts. As the encoding approach, amino acid descriptors were used to calculate the average sequence properties for both original and mutated sequences. In addition to the full-sequence calculation, the sequences were cut into twelve domains. The domains are defined according to the transmembrane domains of the SLC6 transporters to analyse the regions' contributions to the pathogenicity prediction. Subsequently, several classification models were built, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), with hyperparameters optimized through grid search. For estimation of model performance, repeated stratified k-fold cross-validation was used. The accuracy values of the generated models are in the range of 0.72 to 0.80. Analysis of feature importance indicates that mutations in distinct regions of SLC6 transporters are associated with an increased risk for pathogenicity. When the model was applied to an independent validation set, accuracy dropped to about 0.6 on average, with high precision but low sensitivity scores.
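The encoding step described in this abstract, averaging an amino acid descriptor over the original and the mutated sequence, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the Kyte-Doolittle hydropathy scale stands in for the descriptors, which the abstract does not name, and the toy sequence, the function names, and the 1-based position convention are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in descriptor
# (assumption: the abstract does not say which descriptors were used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_descriptor(sequence, scale=KYTE_DOOLITTLE):
    """Average a per-residue descriptor over a whole sequence."""
    return sum(scale[res] for res in sequence) / len(sequence)


def apply_missense(sequence, position, new_residue):
    """Return the sequence with the residue at a 1-based position replaced."""
    index = position - 1
    return sequence[:index] + new_residue + sequence[index + 1:]


def encode_mutation(sequence, position, new_residue):
    """Feature pair: mean descriptor of the original and the mutated sequence."""
    mutated = apply_missense(sequence, position, new_residue)
    return mean_descriptor(sequence), mean_descriptor(mutated)


# Hypothetical toy sequence, not a real SLC6 transporter.
wild_type = "MALGV"
original_mean, mutated_mean = encode_mutation(wild_type, 3, "D")  # L3D
print(round(original_mean, 2), round(mutated_mean, 2))
```

A per-domain variant would apply `mean_descriptor` to slices of the sequence instead of the full string; the domain boundaries would have to come from the transporter annotation and are not given here.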
Affiliation(s)
- Jiahui Huang
- University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria
- Tanja Osthushenrich
- Bayer AG, Division Pharmaceuticals, Biomedical Data Science II, Wuppertal, Germany
- Aidan MacNamara
- Bayer AG, Division Pharmaceuticals, Biomedical Data Science II, Wuppertal, Germany
- Anders Mälarstig
- Emerging Science & Innovation, Pfizer Worldwide Research, Development and Medical, Cambridge, MA, USA
- Evandro Ferrada
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Sergey Sosnin
- University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria
- Daniela Digles
- University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria
- Giulio Superti-Furga
- CeMM, Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Gerhard F Ecker
- University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria
3
Zidi M, Khoffi F, Girault E, Eidenschenk A, Barbet R, Tazibt A, Heim F, Msahli S. Medical textile implants: hybrid fibrous constructions towards improved performances. BIOMED ENG-BIOMED TE 2024; 0:bmt-2023-0335. [PMID: 38462974 DOI: 10.1515/bmt-2023-0335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Accepted: 02/21/2024] [Indexed: 03/12/2024]
Abstract
OBJECTIVES: One main challenge for textile implants is to limit the foreign body reaction (FBR) and in particular the fibrosis development once the device is implanted. Fibrotic tissue in-growth depends on the fiber size, the pore size, and the organization of the fibrous construction. Basically, non-woven fibrous assemblies present a more favorable interface to biological tissues than do woven structures. However, they are mechanically less strong. In order to combine both strength and appropriate topography properties, the design of a hybrid fibrous construct was considered and discussed in this work. METHODS: Two polyethylene terephthalate (PET) weaves (satin and plain) were assembled with a non-woven PET mat, using an ultrasound welding process. RESULTS: The physical and mechanical properties of the construction as well as its ability to interact with the biological environment were then evaluated. In particular, the wettability of the obtained substrate as well as its ability to interact with mesenchymal stem cells (MSC) at 24 h (adhesion) and 72 h (proliferation) in vitro were studied. CONCLUSIONS: The results show that the non-woven layer helps limit cell proliferation in the plain weave construction and, conversely, promotes proliferation in the satin construction.
Affiliation(s)
- Malèke Zidi
- Laboratoire de Génie Textile (LGTex), Ksar-Hellal, Tunisia
- Foued Khoffi
- Laboratoire de Génie Textile (LGTex), Ksar-Hellal, Tunisia
- Elise Girault
- Laboratoire de Physique et Mécanique Textiles (LPMT), ENSISA, Mulhouse, France
- Romain Barbet
- Institut de Recherche en Hématologie et Transplantation (IRHT), Mulhouse, France
- Abdel Tazibt
- CRITT Techniques Jet Fluide et Usinage (TJFU), Bar-Le-Duc, France
- Fréderic Heim
- Laboratoire de Physique et Mécanique Textiles (LPMT), ENSISA, Mulhouse, France
- GEPROMED, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Slah Msahli
- Laboratoire de Génie Textile (LGTex), Ksar-Hellal, Tunisia
4
Wang J, Chen C, Yao G, Ding J, Wang L, Jiang H. Intelligent Protein Design and Molecular Characterization Techniques: A Comprehensive Review. Molecules 2023; 28:7865. [PMID: 38067593 PMCID: PMC10707872 DOI: 10.3390/molecules28237865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/13/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
In recent years, the widespread application of artificial intelligence algorithms in protein structure, function prediction, and de novo protein design has significantly accelerated the process of intelligent protein design and led to many noteworthy achievements. This advancement in protein intelligent design holds great potential to accelerate the development of new drugs, enhance the efficiency of biocatalysts, and even create entirely new biomaterials. Protein characterization is the key to the performance of intelligent protein design. However, there is no consensus on the most suitable characterization method for intelligent protein design tasks. This review describes the methods, characteristics, and representative applications of traditional descriptors, sequence-based and structure-based protein characterization. It discusses their advantages, disadvantages, and scope of application. It is hoped that this could help researchers to better understand the limitations and application scenarios of these methods, and provide valuable references for choosing appropriate protein characterization techniques for related research in the field, so as to better carry out protein research.
Affiliation(s)
- Junjie Ding
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Liangliang Wang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
- Hui Jiang
- State Key Laboratory of NBC Protection for Civilian, Beijing 102205, China; (J.W.); (C.C.); (G.Y.)
5
Du A, Jia W. Virtual screening, identification, and potential antioxidant mechanism of novel bioactive peptides during aging by a short-chain peptidomics, quantitative structure-activity relationship analysis, and molecular docking. Food Res Int 2023; 172:113129. [PMID: 37689894 DOI: 10.1016/j.foodres.2023.113129] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/19/2023] [Revised: 06/08/2023] [Accepted: 06/09/2023] [Indexed: 09/11/2023]
Abstract
Antioxidant peptides have received a great deal of attention. However, only a few studies have been conducted on the antioxidant peptides originating from Baijiu. A total of 1490 features deemed potential short-chain peptides (SCPs, 2 to 4 amino acids) were screened and analyzed by a customized short-chain peptidomics approach in Feng-flavor Baijiu (FFB) during 14 years of aging, with an obvious discrepancy between FFB aged for 3 years and 6 years being observed. Thirty-nine characteristic SCPs in total were identified and accurately quantified by high-throughput parallel reaction monitoring-based synthetic standards, with the contents ranging from 0.16 to 279.33 μg L⁻¹. Combined with the absorption, distribution, metabolism, excretion, and toxicity analysis model, PGRW, WK, SC, and PAW, four novel antioxidant peptides with high ABTS radical scavenging capacity, were obtained using a customized quantitative structure-activity relationship (QSAR) model based on a two terminal position numbering method, with satisfactory coefficient of determination (R²), internal cross-validated R² (Q²), and external R² (R²pre) values of 0.925, 0.808, and 0.665, respectively. Furthermore, molecular docking analysis indicated that these 4 antioxidant peptides could block the Keap1-Nrf2 interaction and promote the accumulation of Nrf2, and the interaction energy between peptide PGRW and Keap1 was higher than that between epigallocatechin gallate and Keap1 based on the CHARMm force field. Overall, this study facilitated the discovery of functional peptides in Baijiu and the understanding of aging mechanisms.
Affiliation(s)
- An Du
- School of Food and Biological Engineering, Shaanxi University of Science & Technology, Xi'an 710021, China
- Wei Jia
- School of Food and Biological Engineering, Shaanxi University of Science & Technology, Xi'an 710021, China; Shaanxi Research Institute of Agricultural Products Processing Technology, Xi'an 710021, China.
6
Codina JR, Mascini M, Dikici E, Deo SK, Daunert S. Accelerating the Screening of Small Peptide Ligands by Combining Peptide-Protein Docking and Machine Learning. Int J Mol Sci 2023; 24:12144. [PMID: 37569520 PMCID: PMC10419121 DOI: 10.3390/ijms241512144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 07/19/2023] [Accepted: 07/28/2023] [Indexed: 08/13/2023] Open
Abstract
This research introduces a novel pipeline that couples machine learning (ML) and molecular docking to accelerate small peptide ligand screening through the prediction of peptide-protein docking. Eight ML algorithms were analyzed for their potential. Notably, Light Gradient Boosting Machine (LightGBM), despite having comparable F1-score and accuracy to its counterparts, showcased superior computational efficiency. LightGBM was used to classify peptide-protein docking performance of the entire tetrapeptide library of 160,000 peptide ligands against four viral envelope proteins. The library was classified into two groups, 'better performers' and 'worse performers'. By training the LightGBM algorithm on just 1% of the tetrapeptide library, we successfully classified the remaining 99% with an accuracy range of 0.81-0.85 and an F1-score between 0.58 and 0.67. Three different molecular docking programs were used to show that the process is not software dependent. With an adjustable probability threshold (from 0.5 to 0.95), the process could be accelerated at least 10-fold and still reach 90-95% concurrence with the method without ML. This study validates the efficiency of machine learning coupled to molecular docking in rapidly identifying top peptides without relying on high-performance computing power, making it an effective tool for screening potential bioactive compounds.
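The library size quoted in this abstract is consistent with the 20 standard amino acids: 20^4 = 160,000 tetrapeptides, so a 1% training sample is 1,600 peptides. Below is a minimal sketch of enumerating such a library and drawing that sample. The random seed and variable names are illustrative assumptions, and the docking and LightGBM steps of the study are not reproduced.

```python
import itertools
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues

# Enumerate every tetrapeptide: 20 ** 4 = 160,000 sequences.
library = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=4)]

# Draw 1% of the library as the training subset (seed chosen arbitrarily).
rng = random.Random(0)
train = rng.sample(library, k=len(library) // 100)

# Everything not sampled is left for the classifier to label.
train_set = set(train)
remaining = [pep for pep in library if pep not in train_set]

print(len(library), len(train), len(remaining))  # 160000 1600 158400
```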
Affiliation(s)
- Josep-Ramon Codina
- Department of Biochemistry and Molecular Biology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA; (J.-R.C.); (E.D.); (S.K.D.)
- Marcello Mascini
- Department of Bioscience and Technology for Food, Agriculture and Environment, University of Teramo, 64100 Teramo, Italy
- Emre Dikici
- Department of Biochemistry and Molecular Biology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA; (J.-R.C.); (E.D.); (S.K.D.)
- Dr. John T. Macdonald Foundation Biomedical Nanotechnology Institute (BioNIUM), University of Miami, Miami, FL 33136, USA
- Sapna K. Deo
- Department of Biochemistry and Molecular Biology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA; (J.-R.C.); (E.D.); (S.K.D.)
- Dr. John T. Macdonald Foundation Biomedical Nanotechnology Institute (BioNIUM), University of Miami, Miami, FL 33136, USA
- Sylvia Daunert
- Department of Biochemistry and Molecular Biology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA; (J.-R.C.); (E.D.); (S.K.D.)
- Dr. John T. Macdonald Foundation Biomedical Nanotechnology Institute (BioNIUM), University of Miami, Miami, FL 33136, USA
- Clinical and Translational Science Institute (CTSI), University of Miami, Miami, FL 33136, USA
7
Cesaro A, Bagheri M, Torres MDT, Wan F, de la Fuente-Nunez C. Deep learning tools to accelerate antibiotic discovery. Expert Opin Drug Discov 2023; 18:1245-1257. [PMID: 37794737 PMCID: PMC10790350 DOI: 10.1080/17460441.2023.2250721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 08/18/2023] [Indexed: 10/06/2023]
Abstract
INTRODUCTION: As machine learning (ML) and artificial intelligence (AI) expand to many segments of our society, they are increasingly being used for drug discovery. Recent deep learning models offer an efficient way to explore high-dimensional data and design compounds with desired properties, including those with antibacterial activity. AREAS COVERED: This review covers key frameworks in antibiotic discovery, highlighting physicochemical features and addressing dataset limitations. The deep learning approaches described here include discriminative models such as convolutional neural networks, recurrent neural networks, graph neural networks, and generative models like neural language models, variational autoencoders, generative adversarial networks, normalizing flow, and diffusion models. As the integration of these approaches in drug discovery continues to evolve, this review aims to provide insights into promising prospects and challenges that lie ahead in harnessing such technologies for the development of antibiotics. EXPERT OPINION: Accurate antimicrobial prediction using deep learning faces challenges such as imbalanced data, limited datasets, experimental validation, target strains, and structure. The integration of deep generative models with bioinformatics, molecular dynamics, and data augmentation holds the potential to overcome these challenges, enhance model performance, and ultimately accelerate antimicrobial discovery.
Affiliation(s)
- Angela Cesaro
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Mojtaba Bagheri
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Marcelo D. T. Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Fangping Wan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
8
Ogawa Y, Saito Y, Yamaguchi H, Katsuyama Y, Ohnishi Y. Engineering the Substrate Specificity of Toluene Degrading Enzyme XylM Using Biosensor XylS and Machine Learning. ACS Synth Biol 2023; 12:572-582. [PMID: 36734676 DOI: 10.1021/acssynbio.2c00577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Enzyme engineering using machine learning has been developed in recent years. However, to obtain a large amount of data on enzyme activities for training data, it is necessary to develop a high-throughput and accurate method for evaluating enzyme activities. Here, we examined whether a biosensor-based enzyme engineering method can be applied to machine learning. As a model experiment, we aimed to modify the substrate specificity of XylM, a rate-determining enzyme in a multistep oxidation reaction catalyzed by XylMABC in Pseudomonas putida. XylMABC naturally converts toluene and xylene to benzoic acid and toluic acid, respectively. We aimed to engineer XylM to improve its conversion efficiency to a non-native substrate, 2,6-xylenol. Wild-type XylMABC slightly converted 2,6-xylenol to 3-methylsalicylic acid, which is the ligand of the transcriptional regulator XylS in P. putida. By locating a fluorescent protein gene under the control of the Pm promoter to which XylS binds, a XylS-producing Escherichia coli strain showed higher fluorescence intensity in a 3-methylsalicylic acid concentration-dependent manner. We evaluated the 3-methylsalicylic acid productivity of XylM variants using the fluorescence intensity of the sensor strain as an indicator. The obtained data provided the training data for machine learning for the directed evolution of XylM. Two cycles of machine learning-assisted directed evolution resulted in the acquisition of XylM-D140E-V144K-F243L-N244S with 15 times higher productivity than wild-type XylM. These results demonstrate that an indirect enzyme activity evaluation method using biosensors is sufficiently quantitative and high-throughput to be used as training data for machine learning. The findings expand the versatility of machine learning in enzyme engineering.
Affiliation(s)
- Yuki Ogawa
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 135-0064, Japan
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo 169-8555, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8561, Japan
- Hideki Yamaguchi
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 277-8561, Japan
- Yohei Katsuyama
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 113-8657, Japan
- Yasuo Ohnishi
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 113-8657, Japan
9
Lin J, Wen L, Zhou Y, Wang S, Ye H, Su J, Li J, Shu J, Huang J, Zhou P. PepQSAR: a comprehensive data source and information platform for peptide quantitative structure-activity relationships. Amino Acids 2023; 55:235-242. [PMID: 36474016 DOI: 10.1007/s00726-022-03219-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/27/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022]
Abstract
Peptide quantitative structure-activity relationships (pQSARs) have been widely applied to the statistical modeling and empirical prediction of peptide activities, properties, and features. In the procedure, the peptide structure is characterized at sequence level using amino acid descriptors (AADs) and then correlated with observations by machine learning methods (MLMs), resulting in a variety of quantitative regression models used to explain the structural factors that govern peptide activities, to generalize peptide properties from known to unknown samples, and to design new peptides with desired features. In this study, we developed a comprehensive platform, termed PepQSAR database, which is a systematic collection and decomposition of various data sources and abundant information regarding the pQSARs, including AADs, MLMs, data sets, peptide sequences, measured activities, model statistics, and literature. The database also provides a comparison function for the various previously built pQSAR models reported by different groups via distinct approaches. The structured and searchable PepQSAR database, freely available at http://i.uestc.edu.cn/PQsarDB, is expected to provide a useful resource and powerful tool for the computational peptidology community.
Affiliation(s)
- Jing Lin
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Li Wen
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Yuwei Zhou
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Shaozhou Wang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Haiyang Ye
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Jun Su
- College of Music, Chengdu Normal University, Chengdu, 611130, China
- Juelin Li
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Jianping Shu
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China
- Jian Huang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China.
- Peng Zhou
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), No. 2006 Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China.
10
Yue ZX, Yan TC, Xu HQ, Liu YH, Hong YF, Chen GX, Xie T, Tao L. A systematic review on the state-of-the-art strategies for protein representation. Comput Biol Med 2023; 152:106440. [PMID: 36543002 DOI: 10.1016/j.compbiomed.2022.106440] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 12/08/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022]
Abstract
The study of drug-target protein interaction is a key step in drug research. In recent years, machine learning techniques have become attractive for research, including drug research, due to their automated nature, predictive power, and expected efficiency. Protein representation is a key step in the study of drug-target protein interaction by machine learning and plays a fundamental role in obtaining accurate results. With the progress of machine learning, protein representation methods have gradually attracted attention and have consequently developed rapidly. Therefore, in this review, we systematically classify current protein representation methods, comprehensively review them, and discuss the latest advances of interest. According to the information extraction methods and information sources, these representation methods are generally divided into structure-based and sequence-based representation methods. Each primary class can be further divided into specific subcategories. The particular representation methods covered include both traditional and the latest approaches. This review contains a comprehensive assessment of the various methods which researchers can use as a reference for their specific protein-related research requirements, including drug research.
Affiliation(s)
- Zi-Xuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian-Ci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Hong-Quan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yu-Hong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yan-Feng Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Gong-Xing Chen
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
| | - Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China.
| |
|
11
|
Deng W, Sha J, Xue F, Jami-Alahmadi Y, Plath K, Wohlschlegel J. High-Field Asymmetric Waveform Ion Mobility Spectrometry Interface Enhances Parallel Reaction Monitoring on an Orbitrap Mass Spectrometer. Anal Chem 2022; 94:15939-15947. [PMID: 36347042 PMCID: PMC9685594 DOI: 10.1021/acs.analchem.2c01287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
High-field asymmetric waveform ion mobility spectrometry (FAIMS) enables gas-phase separations on a chromatographic time scale and has become a useful tool for proteomic applications. Despite its emerging utility, however, the molecular determinants underlying peptide separation by FAIMS have not been systematically investigated. Here, we characterize peptide transmission in a FAIMS device across a broad range of compensation voltages (CVs) and use machine learning to identify charge state and three-dimensional (3D) electrostatic peptide potential as major contributors to peptide intensity at a given CV. We also demonstrate that the machine learning model can be used to predict optimized CV values for peptides, which significantly improves parallel reaction monitoring workflows. Together, these data provide insight into peptide separation by FAIMS and highlight its utility in targeted proteomic applications.
Affiliation(s)
- Weixian Deng
- David Geffen School of Medicine, Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States; Molecular Biology Interdepartmental Graduate Program, University of California Los Angeles, Los Angeles, California 90095, United States
| | - Jihui Sha
- David Geffen School of Medicine, Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
| | - Fanglei Xue
- University of Technology Sydney, Ultimo, New South Wales 2007, Australia
| | - Yasaman Jami-Alahmadi
- David Geffen School of Medicine, Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
| | - Kathrin Plath
- David Geffen School of Medicine, Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
| | - James Wohlschlegel
- David Geffen School of Medicine, Department of Biological Chemistry, University of California Los Angeles, Los Angeles, California 90095, United States
| |
|
12
|
Janairo JIB. Machine Learning Model for Biomimetic Chromatography Peptide Ligands. ACS Appl Bio Mater 2022; 5:5264-5269. [PMID: 36265018 DOI: 10.1021/acsabm.2c00684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Purification is an essential part of the production of antibodies, which are important therapeutic biomolecules. Common methods of antibody purification rely on affinity chromatography (AC), wherein whole proteins are often used as ligands to capture the antibodies to be purified. While AC has been successful in purifying antibodies, it is associated with multiple challenges, such as high cost and low stability. A promising alternative is to use short peptide sequences in place of whole proteins as the stationary phase for the chromatographic separation of antibodies. To accelerate the discovery and development of short peptides for biomimetic chromatography, this study reports the creation of a machine learning classifier that was trained and tested on 480 tetrapeptides. The optimized logistic regression model uses Cruciani properties as the input variables and categorizes peptides into one of two classes based on their binding affinity with immunoglobulin G (IgG). The externally validated model demonstrates satisfactory predictive performance and excellent discrimination, with AUC = 0.874, Balanced Accuracy = 0.874, F1 = 0.871, Precision = 0.884, and Recall = 0.859. The classifier has also provided insights into important variables that influence the classification, such as electrostatic and hydrophobic interactions. Overall, the classifier can be regarded as a welcome development for biomimetic chromatography, and this is the first study that aims to integrate machine learning into the biomimetic chromatography peptide development process.
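As a reading aid for the metrics quoted in this abstract, the following minimal sketch shows how precision, recall, balanced accuracy, and F1 are derived from the four cells of a binary confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the study, and AUC is omitted because it requires ranked scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)            # fraction of predicted binders that are true binders
    recall = tp / (tp + fn)               # fraction of true binders that were found (sensitivity)
    specificity = tn / (tn + fp)          # fraction of true non-binders correctly rejected
    balanced_accuracy = (recall + specificity) / 2
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages sensitivity and specificity, which is why it is often reported alongside F1 when the two classes are of unequal size.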
Affiliation(s)
- Jose Isagani B Janairo
- Department of Biology, De La Salle University, 2401 Taft Avenue, 0922 Manila, Philippines
| |
|
13
|
Du Z, Wang D, Li Y. Comprehensive Evaluation and Comparison of Machine Learning Methods in QSAR Modeling of Antioxidant Tripeptides. ACS Omega 2022; 7:25760-25771. [PMID: 35910147 PMCID: PMC9330208 DOI: 10.1021/acsomega.2c03062] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/16/2022] [Accepted: 06/30/2022] [Indexed: 06/15/2023]
Abstract
Due to their multiple beneficial effects, antioxidant peptides have attracted increasing interest. Currently, the screening and identification of bioactive peptides, including antioxidative peptides, by wet-chemistry methods are time-consuming and rely heavily on advanced instruments and trained personnel. Quantitative structure-activity relationship (QSAR) analysis, as an in silico method, can be more efficient and cost-effective. However, the performance of QSAR models for antioxidant peptides has remained poor because few model development approaches have been attempted. The objective of this study was to compare popular machine learning methods for antioxidant activity modeling and screening of tripeptides and to identify the critical amino acid features that determine antioxidant activity. A total of 533 numerical indices of amino acids were adopted to characterize 130 tripeptides with known antioxidant activity from the published literature, and 7 feature selection strategies plus pairwise correlation were then used to screen the most important indices for antioxidant activity and model building. Fourteen machine learning methods were used to build models based on each of the feature selection strategies. Among the 98 models, non-linear regression methods tended to perform better, and the best model for tripeptide antioxidants, with a test-set R² of 0.847 and a test-set RMSE of 0.627, was obtained by combining random forest for feature selection with tree-based extreme gradient boosting regression for model development. Based on the predicted antioxidant values of 7870 unknown tripeptides, the tripeptides with potentially high antioxidant activity all have a tyrosine, tryptophan, or cysteine residue at the C-terminal position. Furthermore, the predicted antioxidant activity of six synthesized tripeptides was confirmed through experimental determination, and for the first time, the cysteine or tyrosine residue at the C-terminal position was found to be critical to antioxidant activity based on both QSAR models and experimental observations.
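The counts in this abstract can be checked with a short sketch: the 20 standard amino acids give 20³ = 8000 possible tripeptides, so removing the 130 with known activity leaves the 7870 "unknown" ones, and 7 feature selection strategies crossed with 14 learning methods give 98 models. The two helper functions show the usual definitions of RMSE and R²; the activity values passed to them are hypothetical and are not data from the study.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids

# Enumerate every possible tripeptide sequence.
all_tripeptides = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
n_total = len(all_tripeptides)   # 20 ** 3
n_unknown = n_total - 130        # tripeptides without measured activity
n_models = 7 * 14                # feature selection strategies x learning methods


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted activities, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(n_total, n_unknown, n_models)
print(round(r_squared(observed, predicted), 3), round(rmse(observed, predicted), 3))
```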
Affiliation(s)
- Zhenjiao Du
- Department of Grain Science and Industry, Kansas State University, Manhattan, Kansas 66506, United States
| | - Donghai Wang
- Department of Biological and Agricultural Engineering, Kansas State University, Manhattan, Kansas 66506, United States
| | - Yonghui Li
- Department of Grain Science and Industry, Kansas State University, Manhattan, Kansas 66506, United States
| |
|
14
|
Lertampaiporn S, Hongsthong A, Wattanapornprom W, Thammarongtham C. Ensemble-AHTPpred: A Robust Ensemble Machine Learning Model Integrated With a New Composite Feature for Identifying Antihypertensive Peptides. Front Genet 2022; 13:883766. [PMID: 35571042 PMCID: PMC9096110 DOI: 10.3389/fgene.2022.883766] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/25/2022] [Accepted: 04/04/2022] [Indexed: 11/13/2022] Open
Abstract
Hypertension, or elevated blood pressure, is a serious medical condition that significantly increases the risks of cardiovascular disease, heart disease, diabetes, stroke, kidney disease, and other health problems that affect people worldwide. Thus, hypertension is one of the major global causes of premature death. For the prevention and treatment of hypertension with no or few side effects, antihypertensive peptides (AHTPs) obtained from natural sources might be useful as nutraceuticals. Therefore, the search for alternative or novel AHTPs in food or natural sources has received much attention, as AHTPs may be functional agents for human health. AHTPs have been observed in diverse organisms, although many of them remain underinvestigated. The identification of peptides with antihypertensive activity in the laboratory is time- and resource-consuming. Alternatively, computational methods based on robust machine learning can identify or screen potential AHTP candidates prior to experimental verification. In this paper, we propose Ensemble-AHTPpred, an ensemble machine learning algorithm composed of a random forest (RF), a support vector machine (SVM), and extreme gradient boosting (XGB), with the aim of integrating diverse heterogeneous algorithms to enhance the robustness of the final predictive model. The selected feature set includes various computed features, such as physicochemical properties, amino acid compositions (AACs), transitions, n-grams, and secondary structure-related information; these features provide more information for analyzing or explaining the characteristics of the predicted peptide. In addition, the tool is integrated with a newly proposed composite feature (generated based on a logistic regression function) that combines various feature aspects to enable improved AHTP characterization. Our tool, Ensemble-AHTPpred, achieved an overall accuracy above 90% on independent test data. Additionally, the approach was applied to novel experimentally validated AHTPs, obtained from recent studies, which did not overlap with the training and test datasets, and the tool could precisely predict these AHTPs.
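Two of the ingredients named in this abstract, the amino acid composition (AAC) feature and the combination of several classifiers, can be sketched in a few lines. This is an illustrative sketch only: the abstract does not state how the RF, SVM, and XGB outputs are combined, so simple majority voting is used here as an assumption, and the peptide string and the three votes are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def amino_acid_composition(peptide):
    """Return the 20-dimensional AAC vector: the fraction of each residue type in the peptide."""
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


def majority_vote(predictions):
    """Return the label predicted by the largest number of base classifiers.

    Assumption: plain majority voting; the published tool may combine its
    base models differently.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical example: one peptide and the votes of three base classifiers.
features = amino_acid_composition("IPP")
label = majority_vote([1, 1, 0])  # e.g. votes from RF, SVM, and XGB
print(features)
print(label)
```

With an odd number of binary voters a tie cannot occur, which is one practical reason three-member ensembles are a common choice.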
Affiliation(s)
- Supatcha Lertampaiporn
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
| | - Apiradee Hongsthong
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
| | - Warin Wattanapornprom
- Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
| | - Chinae Thammarongtham
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
- *Correspondence: Chinae Thammarongtham,
| |
|
15
|
Janairo JIB. A Machine Learning Classification Model for Gold-Binding Peptides. ACS Omega 2022; 7:14069-14073. [PMID: 35559171 PMCID: PMC9089360 DOI: 10.1021/acsomega.2c00640] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2022] [Accepted: 03/31/2022] [Indexed: 06/15/2023]
Abstract
There has been growing interest in using peptides for the controlled synthesis of nanomaterials. Peptides play a crucial role not only in regulating the nanostructure formation process but also in influencing the resulting properties of the nanomaterials. Leveraging machine learning (ML) in the biomimetic workflow is anticipated to accelerate peptide discovery, make the process more resource-efficient, and unravel associations among attributes that may be useful in peptide design. In this study, a binary ML classifier is formulated that was trained and tested on 1720 peptide examples. The support vector machine classifier uses Kidera factors to categorize peptides into one of two groups based on their binding ability. The classifier exhibits satisfactory performance, as demonstrated by various performance metrics. In addition, key variables that bear a huge impact on the model were identified, such as peptide hydrophobicity. As these trends were derived from a large and diverse dataset, the insights drawn from the data are expected to be generalizable and robust. Thus, the presented ML model is an important step toward the rational and predictive peptide design.
|
16
|
Shao X, Kong W, Li Y, Zhang S. Quantitative structure-activity relationship modeling reveals the minimal sequence requirement and amino acid preference of sirtuin-1's deacetylation substrates in diabetes mellitus. J Bioinform Comput Biol 2022; 20:2250008. [PMID: 35451939 DOI: 10.1142/s0219720022500081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Sirtuin 1 (SIRT1) is a nicotinamide adenine dinucleotide (NAD+)-dependent deacetylase that is involved in multiple glucose metabolism pathways and plays an important role in the pathogenesis of diabetes mellitus (DM). The enzyme specifically recognizes peptide segments of its deacetylation substrates that contain a central acetyl-lysine residue as well as a number of amino acids flanking the central residue. In this study, we attempted to ascertain the minimal sequence requirement (MSR) around the central acetyl-lysine residue of SIRT1 substrate-recognition sites, as well as the amino acid preference (AAP) at different residues of the MSR window, through a quantitative structure-activity relationship (QSAR) strategy. This would benefit our understanding of SIRT1 substrate specificity at the molecular level and would also help in the rational design of substrate-mimicking peptidic agents against DM that competitively target the SIRT1 active site. In this procedure, a large-scale dataset containing 6801 13-mer acetyl-lysine peptides (and their SIRT1-catalyzed deacetylation activities) was compiled to train 10 QSAR regression models developed by systematic combination of two machine learning methods (PLS and SVM) and five amino acid descriptors (DPPS, T-scale, MolSurf, [Formula: see text]-score, and FASGAI). The two best QSAR models (PLS+FASGAI and SVM+DPPS) were then employed to statistically examine the contribution of residue positions to the deacetylation activity of acetyl-lysine peptide substrates, revealing that the MSR can be represented by 5-mer acetyl-lysine peptides that meet a consensus motif X[Formula: see text]X[Formula: see text]X[Formula: see text](AcK)0X[Formula: see text]. Structural analysis found that the X[Formula: see text] and (AcK)0 residues are tightly packed against the enzyme active site and confer both stability and specificity for the enzyme-substrate complex, whereas the X[Formula: see text], X[Formula: see text] and X[Formula: see text] residues are partially exposed to solvent but can also effectively stabilize the complex system. Subsequently, a systematic deacetylation activity change profile (SDACP) was created based on QSAR modeling, from which the AAP for each residue position of the MSR was depicted. With the profile, we were able to rationally design an SDACP combinatorial library with promising deacetylation activity, from which nine MSR acetyl-lysine peptides as well as two known SIRT1 acetyl-lysine peptide substrates were tested by using a SIRT1 deacetylation assay. The designed peptides exhibit comparable or even higher activity than the controls, although the former are considerably shorter than the latter.
Affiliation(s)
- X Shao
- Department of Nephrology, Suzhou Kowloon Hospital, Shanghai Jiao Tong University, School of Medicine, Suzhou 215000, P. R. China
| | - W Kong
- Department of Nephrology, Suzhou Kowloon Hospital, Shanghai Jiao Tong University, School of Medicine, Suzhou 215000, P. R. China
| | - Y Li
- Department of Nephrology, Suzhou Kowloon Hospital, Shanghai Jiao Tong University, School of Medicine, Suzhou 215000, P. R. China
| | - S Zhang
- Department of Nephrology, Suzhou Kowloon Hospital, Shanghai Jiao Tong University, School of Medicine, Suzhou 215000, P. R. China
| |
|
17
|
Du BX, Qin Y, Jiang YF, Xu Y, Yiu SM, Yu H, Shi JY. Compound–protein interaction prediction by deep learning: Databases, descriptors and models. Drug Discov Today 2022; 27:1350-1366. [DOI: 10.1016/j.drudis.2022.02.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 11/19/2021] [Accepted: 02/28/2022] [Indexed: 11/24/2022]
|
18
|
Vanella R, Kovacevic G, Doffini V, Fernández de Santaella J, Nash MA. High-throughput screening, next generation sequencing and machine learning: advanced methods in enzyme engineering. Chem Commun (Camb) 2022; 58:2455-2467. [PMID: 35107442 PMCID: PMC8851469 DOI: 10.1039/d1cc04635g] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 12/29/2022]
Abstract
Enzyme engineering is an important biotechnological process capable of generating tailored biocatalysts for applications in industrial chemical conversion and biopharma. Typical enhancements sought in enzyme engineering and in vitro evolution campaigns include improved folding stability, catalytic activity, and/or substrate specificity. Despite significant progress in recent years in the areas of high-throughput screening and DNA sequencing, our ability to explore the vast space of functional enzyme sequences remains severely limited. Here, we review the currently available suite of modern methods for enzyme engineering, with a focus on novel readout systems based on enzyme cascades, and new approaches to reaction compartmentalization including single-cell hydrogel encapsulation techniques to achieve a genotype–phenotype link. We further summarize systematic scanning mutagenesis approaches and their merger with deep mutational scanning and massively parallel next-generation DNA sequencing technologies to generate mutability landscapes. Finally, we discuss the implementation of machine learning models for computational prediction of enzyme phenotypic fitness from sequence. This broad overview of current state-of-the-art approaches for enzyme engineering and evolution will aid newcomers and experienced researchers alike in identifying the important challenges that should be addressed to move the field forward.
Affiliation(s)
- Rosario Vanella
- Department of Chemistry, University of Basel, 4058 Basel, Switzerland; Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| | - Gordana Kovacevic
- Department of Chemistry, University of Basel, 4058 Basel, Switzerland; Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| | - Vanni Doffini
- Department of Chemistry, University of Basel, 4058 Basel, Switzerland; Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| | - Jaime Fernández de Santaella
- Department of Chemistry, University of Basel, 4058 Basel, Switzerland; Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| | - Michael A Nash
- Department of Chemistry, University of Basel, 4058 Basel, Switzerland; Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| |
|
19
|
Li W, Sun T, Li M, He Y, Li L, Wang L, Wang H, Li J, Wen H, Liu Y, Chen Y, Fan Y, Xin B, Zhang J. GNIFdb: a neoantigen intrinsic feature database for glioma. Database (Oxford) 2022; 2022:6527499. [PMID: 35150127 PMCID: PMC9216533 DOI: 10.1093/database/baac004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/08/2021] [Revised: 01/06/2022] [Accepted: 01/29/2022] [Indexed: 12/24/2022]
Abstract
Neoantigens are mutation-containing immunogenic peptides from tumor cells. Neoantigen intrinsic features are neoantigens' sequence-associated features characterized by different amino acid descriptors and physical-chemical properties, which have a crucial function in prioritizing neoantigens with immunogenic potential and in predicting which patients will have better survival. Different intrinsic features might contribute to varying degrees when evaluating the immunogenic potential of neoantigens. Identification and comparison of intrinsic features among neoantigens are particularly important for developing neoantigen-based personalized immunotherapy. However, there is still no public repository to host the intrinsic features of neoantigens. Therefore, we developed GNIFdb, a glioma neoantigen intrinsic feature database specifically designed for hosting, exploring and visualizing neoantigens and their intrinsic features. The database provides a comprehensive repository of computationally predicted human leukocyte antigen class I (HLA-I) restricted neoantigens and their intrinsic features; a systematic annotation of neoantigens including sequence, neoantigen-associated mutation, gene expression, glioma prognosis, HLA-I subtype and binding affinity between neoantigens and HLA-I; and a genome browser to visualize them in an interactive manner. It represents a valuable resource for the neoantigen research community and is publicly available. Database URL: http://www.oncoimmunobank.cn/index.php.
Affiliation(s)
- Wendong Li
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Ting Sun
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Muyang Li
- Department of Plant Genetics and Breeding, State Key Laboratory of Plant Physiology and Biochemistry & National Maize Improvement Center, China Agricultural University, No.17 Qinghua East Road, Haidian District, Beijing 100193, P. R. China
| | - Yufei He
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Lin Li
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Lu Wang
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Haoyu Wang
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Jing Li
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Hao Wen
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Yong Liu
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Yifan Chen
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Yubo Fan
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| | - Beibei Xin
- Department of Plant Genetics and Breeding, State Key Laboratory of Plant Physiology and Biochemistry & National Maize Improvement Center, China Agricultural University, No.17 Qinghua East Road, Haidian District, Beijing 100193, P. R. China
| | - Jing Zhang
- Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing Advanced Innovation Centre for Biomedical Engineering, School of Engineering Medicine, School of Biological Science and Medical Engineering, Beihang University, No.37 Xueyuan Road, Haidian District, Beijing 100083, P. R. China
| |
|
20
|
Büchler J, Malca SH, Patsch D, Voss M, Turner NJ, Bornscheuer UT, Allemann O, Le Chapelain C, Lumbroso A, Loiseleur O, Buller R. Algorithm-aided engineering of aliphatic halogenase WelO5* for the asymmetric late-stage functionalization of soraphens. Nat Commun 2022; 13:371. [PMID: 35042883 PMCID: PMC8766452 DOI: 10.1038/s41467-022-27999-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 08/10/2021] [Accepted: 12/17/2021] [Indexed: 02/08/2023] Open
Abstract
Late-stage functionalization of natural products offers an elegant route to create novel entities in a relevant biological target space. In this context, enzymes capable of halogenating sp3 carbons with high stereo- and regiocontrol under benign conditions have attracted particular attention. Enabled by a combination of smart library design and machine learning, we engineer the iron/α-ketoglutarate dependent halogenase WelO5* for the late-stage functionalization of the complex and chemically difficult to derivatize macrolides soraphen A and C, potent anti-fungal agents. While the wild type enzyme WelO5* does not accept the macrolide substrates, our engineering strategy leads to active halogenase variants and improves upon their apparent kcat and total turnover number by more than 90-fold and 300-fold, respectively. Notably, our machine-learning guided engineering approach is capable of predicting more active variants and allows us to switch the regio-selectivity of the halogenases facilitating the targeted analysis of the derivatized macrolides’ structure-function activity in biological assays. The late-stage functionalization of unactivated carbon–hydrogen bonds is a difficult but important task, which has been met with promising but limited success through synthetic organic chemistry. Here the authors use machine learning to engineer WelO5* halogenase variants, which led to regioselective chlorination of inert C–H bonds on a representative polyketide that is a non-natural substrate for the enzyme.
Affiliation(s)
- Johannes Büchler
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland; School of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, Manchester, M1 7DN, United Kingdom
| | - Sumire Honda Malca
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland
| | - David Patsch
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland; Institute of Biochemistry, Dept. of Biotechnology & Enzyme Catalysis, Greifswald University, Felix-Hausdorff-Strasse 4, 17487, Greifswald, Germany
| | - Moritz Voss
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland
| | - Nicholas J Turner
- School of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, Manchester, M1 7DN, United Kingdom
| | - Uwe T Bornscheuer
- Institute of Biochemistry, Dept. of Biotechnology & Enzyme Catalysis, Greifswald University, Felix-Hausdorff-Strasse 4, 17487, Greifswald, Germany
| | - Oliver Allemann
- Syngenta Crop Protection AG, Schaffhauserstrasse 101, 4332, Stein, Switzerland; Idorsia Pharmaceuticals Ltd, Hegenheimermattweg 91, 4123, Allschwil, Switzerland
| | | | - Alexandre Lumbroso
- Syngenta Crop Protection AG, Schaffhauserstrasse 101, 4332, Stein, Switzerland
| | - Olivier Loiseleur
- Syngenta Crop Protection AG, Schaffhauserstrasse 101, 4332, Stein, Switzerland.
| | - Rebecca Buller
- Competence Center for Biocatalysis, Institute of Chemistry and Biotechnology, Zurich University of Applied Sciences, Einsiedlerstrasse 31, 8820, Wädenswil, Switzerland.
| |
|
21
|
Sharma A, Kumar R, Varadwaj PK. OBPred: feature-fusion-based deep neural network classifier for odorant-binding protein prediction. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06347-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
|
22
|
Saito Y, Oikawa M, Sato T, Nakazawa H, Ito T, Kameda T, Tsuda K, Umetsu M. Machine-Learning-Guided Library Design Cycle for Directed Evolution of Enzymes: The Effects of Training Data Composition on Sequence Space Exploration. ACS Catal 2021. [DOI: 10.1021/acscatal.1c03753] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023]
Affiliation(s)
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Misaki Oikawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Takumi Sato
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Hikaru Nakazawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Tomoyuki Ito
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Tomoshi Kameda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
| | - Mitsuo Umetsu
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
| |
|
23
|
Tam C, Kumar A, Zhang KYJ. NbX: Machine Learning-Guided Re-Ranking of Nanobody-Antigen Binding Poses. Pharmaceuticals (Basel) 2021; 14:ph14100968. [PMID: 34681192 PMCID: PMC8537642 DOI: 10.3390/ph14100968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2021] [Revised: 09/17/2021] [Accepted: 09/21/2021] [Indexed: 12/02/2022] Open
Abstract
Modeling the binding pose of an antibody is a prerequisite to structure-based affinity maturation and design. Without a reliable binding pose, the subsequent structural simulation is largely futile. In this study, we have developed a method of machine learning-guided re-ranking of antigen binding poses of nanobodies, the single-domain antibodies that have recently drawn much interest in antibody drug development. We performed a large-scale self-docking experiment of nanobody–antigen complexes. By training a decision tree classifier that maps a feature set consisting of energy, contact and interface property descriptors to a measure of the docking quality of the refined poses, an eightfold improvement in the median ranking of native-like nanobody poses was achieved compared with ClusPro and an established deep 3D CNN classifier of native protein–protein interaction. We further interpreted our model by identifying features that made relatively important contributions to the prediction performance. This study demonstrated a useful method for improving our current ability in pose prediction of nanobodies.
Affiliation(s)
- Chunlai Tam
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
| | - Ashutosh Kumar
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
| | - Kam Y. J. Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
| |
|
24
|
Machine Learning for the Cleaner Production of Antioxidant Peptides. Int J Pept Res Ther 2021. [DOI: 10.1007/s10989-021-10232-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
25
|
Bell DR, Chen SH. Toward Guided Mutagenesis: Gaussian Process Regression Predicts MHC Class II Antigen Mutant Binding. J Chem Inf Model 2021; 61:4857-4867. [PMID: 34375111 DOI: 10.1021/acs.jcim.1c00458] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Antigen-specific immunotherapies (ASI) require successful loading and presentation of antigen peptides into the major histocompatibility complex (MHC) binding cleft. One route of ASI design is to mutate native antigens for either stronger or weaker binding interaction to MHC. Exploring all possible mutations is costly both experimentally and computationally. To reduce experimental and computational expense, here we investigate the minimal amount of prior data required to accurately predict the relative binding affinity of point mutations for peptide-MHC class II (pMHCII) binding. Using data from different residue subsets, we interpolate pMHCII mutant binding affinities by Gaussian process (GP) regression of residue volume and hydrophobicity. We apply GP regression to an experimental data set from the Immune Epitope Database, and theoretical data sets from NetMHCIIpan and Free Energy Perturbation calculations. We find that GP regression can predict binding affinities of nine neutral residues from a six-residue subset with an average R2 coefficient of determination value of 0.62 ± 0.04 (±95% CI), average error of 0.09 ± 0.01 kcal/mol (±95% CI), and a receiver operating characteristic (ROC) AUC value of 0.92 for binary classification of enhanced or diminished binding affinity. Similarly, metrics increase to an R2 value of 0.69 ± 0.04, average error of 0.07 ± 0.01 kcal/mol, and an ROC AUC value of 0.94 for predicting seven neutral residues from an eight-residue subset. Our work finds that prediction is most accurate for neutral residues at anchor residue sites without register shift. This work holds relevance to predicting pMHCII binding and accelerating ASI design.
Affiliation(s)
- David R Bell
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, Maryland 21701, United States
| | - Serena H Chen
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States
| |
|
26
|
Yamaguchi H, Saito Y. Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins. Brief Bioinform 2021; 22:6309928. [PMID: 34180966 DOI: 10.1093/bib/bbab234] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/15/2021] [Revised: 05/28/2021] [Accepted: 05/30/2021] [Indexed: 12/14/2022] Open
Abstract
Accurate variant effect prediction has broad impacts on protein engineering. Recent machine learning approaches toward this end are based on representation learning, by which feature vectors are learned and generated from unlabeled sequences. However, it is unclear how to effectively learn evolutionary properties of an engineering target protein from homologous sequences, taking into account the protein's sequence-level structure called domain architecture (DA). Additionally, no optimal protocols are established for incorporating such properties into Transformer, the neural network well-known to perform the best in natural language processing research. This article proposes DA-aware evolutionary fine-tuning, or 'evotuning', protocols for Transformer-based variant effect prediction, considering various combinations of homology search, fine-tuning and sequence vectorization strategies. We exhaustively evaluated our protocols on diverse proteins with different functions and DAs. The results indicated that our protocols achieved significantly better performances than previous DA-unaware ones. The visualizations of attention maps suggested that the structural information was incorporated by evotuning without direct supervision, possibly leading to better prediction accuracy.
Affiliation(s)
- Hideki Yamaguchi
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan
| | - Yutaka Saito
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Shinjuku-ku, Tokyo 169-8555, Japan
| |
|
27
|
Wattanapornprom W, Thammarongtham C, Hongsthong A, Lertampaiporn S. Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization. Life (Basel) 2021; 11:life11040293. [PMID: 33808227 PMCID: PMC8066735 DOI: 10.3390/life11040293] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/29/2021] [Revised: 03/16/2021] [Accepted: 03/25/2021] [Indexed: 12/17/2022] Open
Abstract
The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for protein subcellular localization prediction in plants based on multiple classifiers, to improve prediction results in terms of both accuracy and reliability. The prediction of plant protein subcellular localization is challenging because the underlying problem is not only a multiclass, but also a multilabel problem. Generally, plant proteins can be found in 10–14 locations/compartments. The number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than that in other compartments (vacuole, peroxisome, Golgi, and cell wall), so the problem of imbalanced data usually arises. We therefore propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suitable for each type of protein localization to form a total of 479 feature spaces. Then, feature selection methods were used to reduce the dimensions of the features into smaller informative feature subsets. This reduced feature subset was then used to train three different individual models. We combined the results of these three classifiers by average voting to return the final probability prediction. The method could predict subcellular localizations in both single- and multilabel locations, based on the voting probability. Experimental results indicated that the proposed ensemble method could achieve correct classification with an overall accuracy of 84.58% for 11 compartments, on the basis of the testing dataset.
Affiliation(s)
- Warin Wattanapornprom
- Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok 10140, Thailand
| | - Chinae Thammarongtham
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
| | - Apiradee Hongsthong
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
| | - Supatcha Lertampaiporn
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
| |
|
28
|
Zhou P, Liu Q, Wu T, Miao Q, Shang S, Wang H, Chen Z, Wang S, Wang H. Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling. J Chem Inf Model 2021; 61:1718-1731. [DOI: 10.1021/acs.jcim.0c01370] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Indexed: 02/07/2023]
Affiliation(s)
- Peng Zhou
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Qian Liu
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Ting Wu
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Qingqing Miao
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Shuyong Shang
- College of Chemistry and Life Science, Chengdu Normal University, Chengdu 611130, China
| | - Heyi Wang
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Zheng Chen
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Shaozhou Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| | - Heyan Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
| |
|
29
|
Karlberg M, de Souza JV, Fan L, Kizhedath A, Bronowska AK, Glassey J. QSAR Implementation for HIC Retention Time Prediction of mAbs Using Fab Structure: A Comparison between Structural Representations. Int J Mol Sci 2020; 21:ijms21218037. [PMID: 33126648 PMCID: PMC7663183 DOI: 10.3390/ijms21218037] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/30/2020] [Revised: 10/22/2020] [Accepted: 10/27/2020] [Indexed: 12/19/2022] Open
Abstract
Monoclonal antibodies (mAbs) constitute a rapidly growing biopharmaceutical sector. However, their growth is impeded by high failure rates originating from failed clinical trials and developability issues in process development. There is, therefore, a growing need for better in silico tools to aid in risk assessment of mAb candidates and to promote early-stage screening of potentially problematic ones. In this study, a quantitative structure–activity relationship (QSAR) modelling workflow was designed for the prediction of hydrophobic interaction chromatography (HIC) retention times of mAbs. Three novel descriptor sets derived from primary sequence, homology modelling, and atomistic molecular dynamics (MD) simulations were developed and assessed to determine the level of structural resolution needed to accurately capture the relationship between mAb structures and HIC retention times. The results showed that descriptors derived from 3D structures obtained after MD simulations were the most suitable for HIC retention time prediction, with an R2 of 0.63 on an external test set. It was found that when using homology modelling, the resulting 3D structures became biased towards the structural template used. Performing an MD simulation therefore proved to be a necessary post-processing step for the mAb structures in order to relax the structures and allow them to attain a more natural conformation. Based on the results, the workflow proposed in this paper could potentially aid in risk assessment of mAb candidates in early development.
Affiliation(s)
- Micael Karlberg
- School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
| | - João Victor de Souza
- Chemistry—School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
| | - Lanyu Fan
- School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
- Chemistry—School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
| | - Arathi Kizhedath
- School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
| | - Agnieszka K. Bronowska
- Chemistry—School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
| | - Jarka Glassey
- School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK
| |
|
30
|
|
31
|
|
32
|
Zhang W, Liu J, Shan H, Yin F, Zhong B, Zhang C, Yu X. Machine learning-guided evolution of BMP-2 knuckle Epitope-Derived osteogenic peptides to target BMP receptor II. J Drug Target 2020; 28:802-810. [PMID: 32354236 DOI: 10.1080/1061186x.2020.1757100] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/24/2022]
Affiliation(s)
- Wei Zhang
- Department of Orthopaedics, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Jiazhi Liu
- Department of Orthopaedics, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Haojie Shan
- Department of Orthopaedics, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Fuli Yin
- Department of Orthopaedics, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Biao Zhong
- Department of Orthopaedics, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Chi Zhang
- Department of Orthopaedics, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Xiaowei Yu
- Department of Orthopaedics, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| |
|
33
|
Mahmoodi-Reihani M, Abbasitabar F, Zare-Shahabadi V. In Silico Rational Design and Virtual Screening of Bioactive Peptides Based on QSAR Modeling. ACS OMEGA 2020; 5:5951-5958. [PMID: 32226875 PMCID: PMC7097998 DOI: 10.1021/acsomega.9b04302] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/16/2019] [Accepted: 02/27/2020] [Indexed: 05/15/2023]
Abstract
Predicting the bioactivity of peptides is an important challenge in drug development and peptide research. In this study, numerical descriptive vectors (NDVs) for peptide sequences were calculated based on the physicochemical properties of amino acids (AAs) and principal component analysis (PCA). The resulting NDV had the same length as the peptide sequence, so that each entry of the NDV corresponded to one AA in the sequence. The NDVs were then applied to quantitative structure-activity relationship (QSAR) analysis of angiotensin-converting enzyme (ACE) inhibitor dipeptides, bitter-tasting dipeptides, and nonameric binding peptides of the human leukocyte antigens (HLA-A*0201). Multiple linear regression was used to construct the QSAR models. For each peptide set, a proper subset of physicochemical properties was chosen by the ant colony optimization algorithm. The leave-one-out cross-validation q2(LOO) values were 0.855, 0.936, and 0.642 and the root-mean-square errors (RMSEs) were 0.450, 0.149, and 0.461. Our results revealed that the new numerical descriptive vector can afford extensive characterization of peptide sequences, so that it can be easily employed in peptide QSAR studies. Moreover, the proposed numerical descriptive vectors were able to determine hot spot residues in the peptides under study.
Affiliation(s)
| | - Fatemeh Abbasitabar
- Department of Chemistry, Marvdasht Branch, Islamic Azad University, Marvdasht, Iran
| | - Vahid Zare-Shahabadi
- Department of Chemistry, Mahshahr Branch, Islamic Azad University, Mahshahr, Iran
| |
|
34
|
Ge C, Zhang W, He R, Cai H. Systematic Identification and Comparative Analysis of Human Cartilage-Derived Self-peptides Presented Differently by Ankylosing Spondylitis (AS)-Associated HLA-B*27:05 and Non-AS-associated HLA-B*27:09. Int J Pept Res Ther 2020. [DOI: 10.1007/s10989-019-09857-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
35
|
Nagaoka K, Mei H, Guo Y, Han J, Konno H, Moriwaki H, Soloshonok VA. Michael addition reactions of chiral glycine Schiff base Ni (II)‐complex with 1‐(1‐phenylsulfonyl)benzene. Chirality 2020; 32:885-893. [DOI: 10.1002/chir.23203] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/21/2020] [Accepted: 02/11/2020] [Indexed: 12/11/2022]
Affiliation(s)
- Keita Nagaoka
- School of Chemistry and Chemical Engineering, State of Key Laboratory of Coordination, Nanjing University, Nanjing, China
- Department of Biological Engineering, Graduate School of Science and Engineering, Yamagata University, Yamagata, Japan
| | - Haibo Mei
- School of Chemistry and Chemical Engineering, State of Key Laboratory of Coordination, Nanjing University, Nanjing, China
| | - Yunjie Guo
- School of Chemistry and Chemical Engineering, State of Key Laboratory of Coordination, Nanjing University, Nanjing, China
| | - Jianlin Han
- School of Chemistry and Chemical Engineering, State of Key Laboratory of Coordination, Nanjing University, Nanjing, China
| | - Hiroyuki Konno
- Department of Biological Engineering, Graduate School of Science and Engineering, Yamagata University, Yamagata, Japan
| | | | - Vadim A. Soloshonok
- Department of Organic Chemistry I, Faculty of Chemistry, University of the Basque Country UPV/EHU, San Sebastián, Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
| |
|
36
|
Kritikos N, Tsantili-Kakoulidou A, Loukas YL, Dotsikas Y. Novel Molecular Descriptors for the Liquid- and the Gas-Chromatography Analysis of Amino Acids Analogues Derivatized with n-Propyl Chloroformate. Chromatographia 2019. [DOI: 10.1007/s10337-019-03767-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
37
|
Rifaioglu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Doğan T. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform 2019; 20:1878-1912. [PMID: 30084866 PMCID: PMC6917215 DOI: 10.1093/bib/bby061] [Citation(s) in RCA: 221] [Impact Index Per Article: 44.2] [Received: 01/25/2018] [Revised: 05/25/2018] [Indexed: 01/16/2023] Open
Abstract
The identification of interactions between drugs/compounds and their targets is crucial for the development of new drugs. In vitro screening experiments (i.e. bioassays) are frequently used for this purpose; however, experimental approaches are insufficient to explore novel drug-target interactions, mainly because of feasibility problems, as they are labour intensive, costly and time consuming. A computational field known as 'virtual screening' (VS) has emerged in the past decades to aid experimental drug discovery studies by statistically estimating unknown bio-interactions between compounds and biological targets. These methods use the physico-chemical and structural properties of compounds and/or target proteins along with the experimentally verified bio-interaction information to generate predictive models. Lately, sophisticated machine learning techniques are applied in VS to elevate the predictive performance. The objective of this study is to examine and discuss the recent applications of machine learning techniques in VS, including deep learning, which became highly popular after giving rise to epochal developments in the fields of computer vision and natural language processing. The past 3 years have witnessed an unprecedented amount of research studies considering the application of deep learning in biomedicine, including computational drug discovery. In this review, we first describe the main instruments of VS methods, including compound and protein features (i.e. representations and descriptors), frequently used libraries and toolkits for VS, bioactivity databases and gold-standard data sets for system training and benchmarking. We subsequently review recent VS studies with a strong emphasis on deep learning applications. Finally, we discuss the present state of the field, including the current challenges and suggest future directions. 
We believe that this survey will provide insight to the researchers working in the field of computational drug discovery in terms of comprehending and developing novel bio-prediction methods.
Affiliation(s)
- Ahmet Sureyya Rifaioglu
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
- Department of Computer Engineering, İskenderun Technical University, Hatay, Turkey
| | - Heval Atas
- Cancer System Biology Laboratory (CanSyL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Maria Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Cambridge, Hinxton, UK
| | - Rengul Cetin-Atalay
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
| | - Volkan Atalay
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey
| | - Tunca Doğan
- Cancer System Biology Laboratory (CanSyL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey and European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Cambridge, Hinxton, UK
| |
|
38
|
Xu B, Chung HY. Quantitative Structure-Activity Relationship Study of Bitter Di-, Tri- and Tetrapeptides Using Integrated Descriptors. Molecules 2019; 24:molecules24152846. [PMID: 31387305 PMCID: PMC6696392 DOI: 10.3390/molecules24152846] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/26/2019] [Revised: 07/23/2019] [Accepted: 08/05/2019] [Indexed: 11/16/2022] Open
Abstract
New quantitative structure–activity relationship (QSAR) models for bitter peptides were built with integrated amino acid descriptors. Datasets contained 48 dipeptides, 52 tripeptides and 23 tetrapeptides with their reported bitter taste thresholds. Independent variables consisted of 14 amino acid descriptor sets. A bootstrapping soft shrinkage approach was utilized for variable selection. The importance of a variable was evaluated by both variable selecting frequency and standardized regression coefficient. Results indicated model qualities for di-, tri- and tetrapeptides with R2 and Q2 at 0.950 ± 0.002, 0.941 ± 0.001; 0.770 ± 0.006, 0.742 ± 0.004; and 0.972 ± 0.002, 0.956 ± 0.002, respectively. The hydrophobic C-terminal amino acid was the key determinant for bitterness in dipeptides, followed by the contribution of bulky hydrophobic N-terminal amino acids. For tripeptides, hydrophobicity of C-terminal amino acids and the electronic properties of the amino acids at the second position were important. For tetrapeptides, bulky hydrophobic amino acids at N-terminus, hydrophobicity and partial specific volume of amino acids at the second position, and the electronic properties of amino acids of the remaining two positions were critical. In summary, this study not only constructs reliable models for predicting the bitterness in different groups of peptides, but also facilitates better understanding of their structure-bitterness relationships and provides insights for their future studies.
Affiliation(s)
- Biyang Xu
- Food and Nutritional Sciences Programme, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
| | - Hau Yin Chung
- Food and Nutritional Sciences Programme, School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China.
| |
|
39
|
Kizhedath A, Karlberg M, Glassey J. Cross-Interaction Chromatography-Based QSAR Model for Early-Stage Screening to Facilitate Enhanced Developability of Monoclonal Antibody Therapeutics. Biotechnol J 2019; 14:e1800696. [PMID: 30810283 DOI: 10.1002/biot.201800696] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/26/2018] [Revised: 01/19/2019] [Indexed: 01/13/2023]
Abstract
Monoclonal antibodies (mAbs) constitute a rapidly growing biopharmaceutical sector. However, their growth is impeded by developability issues such as polyspecificity and lack of solubility, which lead to attrition as well as manufacturing failures. In this study a multitool hybrid quantitative structure-activity relationship (QSAR) model development framework is described. This framework uses four novel datasets derived from the primary sequences of IgG1-κ-humanized mAbs with varying degrees of resolution. Unsupervised pattern recognition is first performed on the descriptor sets to visualize any intrinsic property-based clustering, followed by regression of descriptors against cross-interaction chromatography (CIC) retention times. Model optimization is performed via unsupervised variable reduction followed by supervised variable selection. Finally, the models and datasets are benchmarked based on regression model performance metrics such as R², Q², and RMSE. The results show that datasets containing localized descriptors, rather than values averaged over the entire protein, predict CIC retention behavior better, with R² > 0.8 and RMSE < 0.3. Furthermore, the results indicate the physicochemical, electronic, and topological properties of the hypervariable regions of antibodies that contribute most to the CIC retention times. The results of these studies could contribute to early-stage screening and better design of mAbs.
Affiliation(s)
- Arathi Kizhedath
- School of Engineering, Newcastle University, Newcastle upon Tyne, NE17RU, UK
| | - Micael Karlberg
- School of Engineering, Newcastle University, Newcastle upon Tyne, NE17RU, UK
| | - Jarka Glassey
- School of Engineering, Newcastle University, Newcastle upon Tyne, NE17RU, UK
| |
|
40
|
Deng B, Long H, Tang T, Ni X, Chen J, Yang G, Zhang F, Cao R, Cao D, Zeng M, Yi L. Quantitative Structure-Activity Relationship Study of Antioxidant Tripeptides Based on Model Population Analysis. Int J Mol Sci 2019; 20:ijms20040995. [PMID: 30823542 PMCID: PMC6413046 DOI: 10.3390/ijms20040995] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/01/2019] [Revised: 02/13/2019] [Accepted: 02/18/2019] [Indexed: 11/16/2022] Open
Abstract
Due to their beneficial effects on human health, antioxidant peptides have attracted much attention from researchers. However, the structure-activity relationships of antioxidant peptides have not been fully understood. In this paper, quantitative structure-activity relationship (QSAR) models were built on two datasets, i.e., the ferric thiocyanate (FTC) dataset and the ferric-reducing antioxidant power (FRAP) dataset, containing 214 and 172 unique antioxidant tripeptides, respectively. Sixteen amino acid descriptors were used, and model population analysis (MPA) was then applied to improve the QSAR models for better prediction performance. The results showed that, by applying MPA, the cross-validated coefficient of determination (Q²) was increased from 0.6170 to 0.7471 for the FTC dataset and from 0.4878 to 0.6088 for the FRAP dataset. These findings indicate that the integration of different amino acid descriptors provides additional information for model building and that MPA can efficiently extract this information for better prediction performance.
Affiliation(s)
- Baichuan Deng
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Hongrong Long
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Tianyue Tang
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Xiaojun Ni
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Jialuo Chen
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Guangming Yang
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Fan Zhang
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Ruihua Cao
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
| | - Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410083, China.
| | - Maomao Zeng
- State Key Laboratory of Food Science and Technology, International Joint Laboratory on Food Safety, Jiangnan University, Wuxi 214122, China.
| | - Lunzhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming 650500, China.
| |
|
41
|
Saito Y, Oikawa M, Nakazawa H, Niide T, Kameda T, Tsuda K, Umetsu M. Machine-Learning-Guided Mutagenesis for Directed Evolution of Fluorescent Proteins. ACS Synth Biol 2018; 7:2014-2022. [PMID: 30103599 DOI: 10.1021/acssynbio.8b00155] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Indexed: 01/21/2023]
Abstract
Molecular evolution based on mutagenesis is widely used in protein engineering. However, optimal proteins are often difficult to obtain due to a large sequence space. Here, we propose a novel approach that combines molecular evolution with machine learning. In this approach, we conduct two rounds of mutagenesis where an initial library of protein variants is used to train a machine-learning model to guide mutagenesis for the second-round library. This enables us to prepare a small library suited for screening experiments with high enrichment of functional proteins. We demonstrated a proof-of-concept of our approach by altering the reference green fluorescent protein (GFP) so that its fluorescence is changed into yellow. We successfully obtained a number of proteins showing yellow fluorescence, 12 of which had longer wavelengths than the reference yellow fluorescent protein (YFP). These results show the potential of our approach as a powerful method for directed evolution of fluorescent proteins.
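The two-round procedure described in this abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the reduced alphabet, the number of mutated sites, the additive synthetic "assay" and the choice of ridge regression on one-hot features are all assumptions made for the example, and the abstract does not state which model was used.

```python
import itertools
import numpy as np

ALPHABET = "ACDE"   # hypothetical reduced alphabet, kept small for illustration
N_SITES = 3         # hypothetical number of mutated positions

def one_hot(variant):
    """Encode a variant string as a flat one-hot vector (site x residue)."""
    vec = np.zeros(N_SITES * len(ALPHABET))
    for pos, aa in enumerate(variant):
        vec[pos * len(ALPHABET) + ALPHABET.index(aa)] = 1.0
    return vec

def fit_ridge(X, y, lam=1.0):
    """Ridge regression on centered targets; returns (weights, intercept)."""
    b = y.mean()
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ (y - b))
    return w, b

rng = np.random.default_rng(0)
all_variants = ["".join(v) for v in itertools.product(ALPHABET, repeat=N_SITES)]

# Synthetic ground truth standing in for a screening experiment.
true_effects = rng.normal(size=N_SITES * len(ALPHABET))

def assay(variant):
    """Simulated noisy measurement of one variant."""
    return float(one_hot(variant) @ true_effects + rng.normal(scale=0.1))

# Round 1: measure a random initial library and train the model on it.
round1 = [str(v) for v in rng.choice(all_variants, size=20, replace=False)]
y1 = np.array([assay(v) for v in round1])
w, b = fit_ridge(np.array([one_hot(v) for v in round1]), y1)

# Round 2: rank untested variants by predicted value and keep a small library.
tested = set(round1)
untested = [v for v in all_variants if v not in tested]
scores = np.array([one_hot(v) @ w + b for v in untested])
round2 = [untested[i] for i in np.argsort(-scores)[:8]]
print("second-round library:", round2)
```

The point of the sketch is the data flow: the first library is used only for training, and the second library is chosen from variants the model has not yet seen.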
Affiliation(s)
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
| | - Misaki Oikawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Hikaru Nakazawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Teppei Niide
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
| | - Tomoshi Kameda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
| | - Mitsuo Umetsu
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
| |
|
42
|
Karlberg M, von Stosch M, Glassey J. Exploiting mAb structure characteristics for a directed QbD implementation in early process development. Crit Rev Biotechnol 2018. [DOI: 10.1080/07388551.2017.1421899] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/17/2022]
Affiliation(s)
- Micael Karlberg
- School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
| | - Moritz von Stosch
- School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
| | - Jarka Glassey
- School of Chemical Engineering and Advanced Materials, Newcastle University, Newcastle upon Tyne, UK
| |
|
43
|
Kapusta K, Sizochenko N, Karabulut S, Okovytyy S, Voronkov E, Leszczynski J. QSPR modeling of optical rotation of amino acids using specific quantum chemical descriptors. J Mol Model 2018; 24:59. [DOI: 10.1007/s00894-018-3593-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/20/2017] [Accepted: 01/24/2018] [Indexed: 11/28/2022]
|
44
|
Barley MH, Turner NJ, Goodacre R. Improved Descriptors for the Quantitative Structure-Activity Relationship Modeling of Peptides and Proteins. J Chem Inf Model 2018; 58:234-243. [PMID: 29338232 DOI: 10.1021/acs.jcim.7b00488] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
Abstract
The ability to model the activity of a protein using quantitative structure-activity relationships (QSAR) requires descriptors for the 20 naturally coded amino acids. In this work we show that by modifying some established descriptors we were able to model the activity data of 140 mutants of the enzyme epoxide hydrolase with improved accuracy. These new descriptors (referred to as physical descriptors) also gave very good results when tested against a series of four dipeptide data sets. The physical descriptors encode the amino acids using only two orthogonal scales: the first is strongly linked to hydrophilicity/hydrophobicity, and the second, to the volume of the amino acid residue. The use of these new amino acid descriptors should result in simpler and more readily interpretable models for the enzyme activity (and potentially other functions of interest, e.g., secondary and tertiary structure) of peptides and proteins.
Affiliation(s)
- Mark H Barley
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, U.K
| | - Nicholas J Turner
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, U.K
| | - Royston Goodacre
- School of Chemistry, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, U.K
| |
|
45
|
QSAR Study of Angiotensin I-Converting Enzyme Inhibitory Peptides Using SVHEHS Descriptor and OSC-SVM. Int J Pept Res Ther 2018. [DOI: 10.1007/s10989-017-9661-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/18/2022]
|
46
|
Deng B, Ni X, Zhai Z, Tang T, Tan C, Yan Y, Deng J, Yin Y. New Quantitative Structure-Activity Relationship Model for Angiotensin-Converting Enzyme Inhibitory Dipeptides Based on Integrated Descriptors. J Agric Food Chem 2017; 65:9774-9781. [PMID: 28984136 DOI: 10.1021/acs.jafc.7b03367] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 05/27/2023]
Abstract
Angiotensin-converting enzyme (ACE) inhibitory peptides derived from food proteins have been widely reported for hypertension treatment. In this paper, a benchmark data set containing 141 unique ACE inhibitory dipeptides was constructed through database mining, and a quantitative structure-activity relationship (QSAR) study was carried out to predict the half-inhibitory concentration (IC50) of ACE activity. Sixteen descriptors were tested, and the model generated by the G-scale descriptor showed the best predictive performance, with a coefficient of determination (R²) and cross-validated R² (Q²) of 0.6692 and 0.6220, respectively. For most other descriptors, R² ranged from 0.52 to 0.68 and Q² ranged from 0.48 to 0.61. A complex model combining all 16 descriptors was built, and variable selection was performed to further improve the prediction performance. The quality of the model using integrated descriptors (R² 0.7340 ± 0.0038, Q² 0.7151 ± 0.0019) was better than that of the G-scale model. An in-depth study of variable importance showed that the properties most correlated with ACE inhibitory activity were hydrophobicity, steric, and electronic properties, and that C-terminal amino acids contribute more than N-terminal amino acids. Five novel predicted ACE-inhibitory peptides were synthesized, and their IC50 values were validated through in vitro experiments. The results indicated that the constructed model could give a reliable prediction of the ACE-inhibitory activity of peptides, and it may be useful in the design of novel ACE-inhibitory peptides.
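For readers unfamiliar with the two quality measures quoted in this abstract, the following minimal sketch shows how a fitted R² and a leave-one-out cross-validated Q² are commonly computed. It assumes an ordinary least-squares model and synthetic data; the regression method, the cross-validation scheme and the exact Q² definition used in the paper may differ, and all function names here are illustrative.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predictions of a model returned by fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r_squared(X, y):
    """R^2 of the model fitted and evaluated on the same data."""
    residual = y - predict(fit_ols(X, y), X)
    return 1.0 - np.sum(residual ** 2) / np.sum((y - y.mean()) ** 2)

def q_squared_loo(X, y):
    """Q^2 = 1 - PRESS / SS_total, with PRESS from leave-one-out predictions."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                  # hold out sample i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic descriptors and activities (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.3, size=40)

r2 = r_squared(X, y)
q2 = q_squared_loo(X, y)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

Because every held-out prediction comes from a model that never saw that sample, Q² computed this way cannot exceed R² for a least-squares fit, which is consistent with the paired values reported above.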
Affiliation(s)
- Baichuan Deng
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, P.R. China
| | - Xiaojun Ni
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, P.R. China
| | - Zhenya Zhai
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, P.R. China
| | - Tianyue Tang
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, P.R. China
| | - Chengquan Tan
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, P.R. China
| | - Yijing Yan
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, P.R. China
| | - Jinping Deng
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, P.R. China
| | - Yulong Yin
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, Subtropical Institute of Animal Nutrition and Feed, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, P.R. China
- National Engineering Laboratory for Pollution Control and Waste Utilization in Livestock and Poultry Production, Key Laboratory of Agro-Ecological Processes in Subtropical Region, Institute of Subtropical Agriculture, Chinese Academy of Sciences, Changsha 410125, Hunan, P.R. China
| |
|
47
|
He Y, He X. Molecular design and genetic optimization of antimicrobial peptides containing unnatural amino acids against antibiotic-resistant bacterial infections. Biopolymers 2017; 106:746-56. [PMID: 27258330 DOI: 10.1002/bip.22885] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 02/01/2016] [Revised: 04/30/2016] [Accepted: 05/31/2016] [Indexed: 01/25/2023]
Abstract
Antimicrobial peptides (AMPs) have been the focus of intense research towards the finding of a viable alternative to current small-molecule antibiotics, owing to their commonly observed and naturally occurring resistance against pathogens. However, natural peptides have many problems such as low bioavailability and high allergenicity that largely limit the clinical applications of AMPs. In the present study, an integrative protocol that combined chemoinformatics modeling, molecular dynamics simulations, and in vitro susceptibility test was described to design AMPs containing unnatural amino acids (AMP-UAAs). To fulfill this, a large panel of synthetic AMPs with determined activity was collected and used to perform quantitative structure-activity relationship (QSAR) modeling. The obtained QSAR predictors were then employed to direct genetic algorithm (GA)-based optimization of AMP-UAA population, to which a number of commercially available, structurally diverse unnatural amino acids were introduced during the optimization process. Subsequently, several designed AMP-UAAs were confirmed to have high antibacterial potency against two antibiotic-resistant strains, i.e. multidrug-resistant Pseudomonas aeruginosa (MDRPA) and methicillin-resistant Staphylococcus aureus (MRSA), with minimum inhibitory concentration (MIC) < 10 μg/ml. Structural dynamics characterizations revealed that the most potent AMP-UAA peptide is an amphipathic helix that can spontaneously embed into an artificial lipid bilayer and exhibits a strong destructuring tendency associated with the embedding process. © 2016 Wiley Periodicals, Inc. Biopolymers (Pept Sci) 106: 746-756, 2016.
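The idea of letting a predictor direct a genetic algorithm over peptide sequences can be sketched in a few lines. Everything specific below is an assumption made for illustration: `toy_score` is a made-up stand-in for a trained QSAR predictor, `X1` and `X2` are placeholder tokens for unnatural residues rather than real compounds, and the selection, crossover and mutation settings are arbitrary.

```python
import random

# Natural residues plus two placeholder tokens for unnatural amino acids.
ALPHABET = list("AKLRWGS") + ["X1", "X2"]

def toy_score(seq):
    """Made-up fitness: cationic tokens at even positions, hydrophobic at odd."""
    cationic, hydrophobic = {"K", "R", "X1"}, {"L", "W", "X2"}
    return sum(
        1 for i, tok in enumerate(seq)
        if (i % 2 == 0 and tok in cationic) or (i % 2 == 1 and tok in hydrophobic)
    )

def evolve(score, length=10, pop_size=30, generations=40, mut_rate=0.1, seed=0):
    """Simple GA with truncation selection, one-point crossover and elitism."""
    rng = random.Random(seed)
    pop = [[rng.choice(ALPHABET) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        parents = pop[: pop_size // 2]             # keep the better half as parents
        children = [pop[0][:]]                     # elitism: carry over the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.choice(ALPHABET) if rng.random() < mut_rate else tok
                     for tok in child]             # per-position mutation
            children.append(child)
        pop = children
    pop.sort(key=score, reverse=True)
    history.append(score(pop[0]))
    return pop[0], history

best, history = evolve(toy_score)
print("best sequence:", "-".join(best), "score:", toy_score(best))
```

In a real workflow the score function would be the fitted QSAR model, and the top-ranked designs would still need experimental confirmation, as the abstract describes.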
Affiliation(s)
- Yongkang He
- Department of Infectious Diseases, Taixing People's Hospital, Yangzhou University, Taixing, 225400, China.
| | - Xiaofeng He
- Department of Infectious Diseases, Taixing People's Hospital, Yangzhou University, Taixing, 225400, China
| |
|
48
|
Huang RZ, Zhang B, Huang XC, Liang GB, Qin JM, Pan YM, Liao ZX, Wang HS. Synthesis and biological evaluation of terminal functionalized thiourea-containing dipeptides as antitumor agents. RSC Adv 2017. [DOI: 10.1039/c6ra25590f] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022] Open
Abstract
Terminal functionalized dipeptide derivatives containing the thiourea moiety were synthesized and evaluated for antitumor activity. Representative compound I-11 induced apoptosis by the ROS-dependent endoplasmic reticulum pathway in NCI-H460 cells.
Affiliation(s)
- Ri-Zhen Huang
- State Key Laboratory for the Chemistry and Molecular Engineering of Medicinal Resources (Ministry of Education of China), School of Chemistry and Pharmaceutical Sciences of Guangxi Normal University, Guilin 541004, PR China
- Pharmaceutical Research Center and School of Chemistry and Chemical Engineering
| | - Bin Zhang
- State Key Laboratory for the Chemistry and Molecular Engineering of Medicinal Resources (Ministry of Education of China), School of Chemistry and Pharmaceutical Sciences of Guangxi Normal University, Guilin 541004, PR China
| | - Xiao-Chao Huang
- Pharmaceutical Research Center and School of Chemistry and Chemical Engineering, Southeast University, Nanjing 211189, China
| | - Gui-Bin Liang
- State Key Laboratory for the Chemistry and Molecular Engineering of Medicinal Resources (Ministry of Education of China), School of Chemistry and Pharmaceutical Sciences of Guangxi Normal University, Guilin 541004, PR China
| | - Jian-Mei Qin
- State Key Laboratory for the Chemistry and Molecular Engineering of Medicinal Resources (Ministry of Education of China), School of Chemistry and Pharmaceutical Sciences of Guangxi Normal University, Guilin 541004, PR China
| | - Ying-Ming Pan
- State Key Laboratory for the Chemistry and Molecular Engineering of Medicinal Resources (Ministry of Education of China), School of Chemistry and Pharmaceutical Sciences of Guangxi Normal University, Guilin 541004, PR China
| | - Zhi-Xin Liao
- Pharmaceutical Research Center and School of Chemistry and Chemical Engineering, Southeast University, Nanjing 211189, China
| | - Heng-Shan Wang
- State Key Laboratory for the Chemistry and Molecular Engineering of Medicinal Resources (Ministry of Education of China), School of Chemistry and Pharmaceutical Sciences of Guangxi Normal University, Guilin 541004, PR China
| |
|
49
|
Comprehensive comparison of twenty structural characterization scales applied as QSAM of antimicrobial dodecapeptides derived from Bac2A against P. aeruginosa. J Mol Graph Model 2017; 71:88-95. [DOI: 10.1016/j.jmgm.2016.11.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/25/2015] [Revised: 11/02/2016] [Accepted: 11/06/2016] [Indexed: 02/04/2023]
|
50
|
Wang Y, Yang YJ, Chen YN, Zhao HY, Zhang S. Computer-aided design, structural dynamics analysis, and in vitro susceptibility test of antibacterial peptides incorporating unnatural amino acids against microbial infections. Comput Methods Programs Biomed 2016; 134:215-223. [PMID: 27480745 DOI: 10.1016/j.cmpb.2016.06.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 01/29/2016] [Revised: 05/16/2016] [Accepted: 06/30/2016] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND OBJECTIVE: Antibacterial peptides (ABPs) are essential components of host defense against microbial infections present in all domains of life. The AMPs incorporating unnatural amino acids (uABPs) exhibit several advantages over naturally occurring AMPs based on factors such as bioavailability, metabolic stability and overall toxicity. METHODS: Computer-aided modeling and in vitro susceptibility testing were combined to rationally design short uABPs with potent antimicrobial activity. In the procedure, peptide characterization and machine learning modeling were used to develop statistical regression predictors, which were then employed to guide the molecular design and structural optimization of uABPs, to which a number of commercially available unnatural amino acids were introduced. RESULTS: An improved uABP population was obtained, from which several promising candidates were successfully prepared, and their antibacterial potencies against three bacterial strains, Staphylococcus aureus, Pseudomonas aeruginosa and Escherichia coli, were measured using a broth microdilution assay. Consequently, four uABPs with hybrid structure property were determined to have high potency against the tested strains, with a minimum inhibitory concentration (MIC) of <50 µg/ml. CONCLUSIONS: Molecular dynamics (MD) simulations revealed that the designed uABPs are amphipathic helices in solution but would largely unfold when spontaneously embedding into an artificial lipid bilayer that mimics the microbial membrane.
Affiliation(s)
- Yan Wang
- Department of Hematology, The First People's Hospital of Jining, Jining 272011, China
| | - Yong-Jian Yang
- Department of Anesthesiology, The Central Hospital of Jinan, Shandong University, Jinan 250013, China
| | - Ya-Na Chen
- Department of Obstetrics, The Central Hospital of Jinan, Shandong University, Jinan 250013, China
| | - Hong-Yu Zhao
- Department of Hematology, The Central Hospital of Jinan, Shandong University, Jinan 250013, China
| | - Shuai Zhang
- Department of Orthopedics, Qilu Hospital, Shandong University, Jinan 250011, China.
| |
|