1
Guerin N, Childs H, Zhou P, Donald BR. DexDesign: A new OSPREY-based algorithm for designing de novo D-peptide inhibitors. bioRxiv 2024:2024.02.12.579944. [PMID: 38405797 PMCID: PMC10888900 DOI: 10.1101/2024.02.12.579944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
With over 270 unique occurrences in the human genome, peptide-recognizing PDZ domains play a central role in modulating polarization, signaling, and trafficking pathways. Mutations in PDZ domains lead to diseases such as cancer and cystic fibrosis, making PDZ domains attractive targets for therapeutic intervention. D-peptide inhibitors offer unique advantages as therapeutics, including increased metabolic stability and low immunogenicity. Here, we introduce DexDesign, a novel OSPREY-based algorithm for computationally designing de novo D-peptide inhibitors. DexDesign leverages three novel techniques that are broadly applicable to computational protein design: the Minimum Flexible Set, K*-based Mutational Scan, and Inverse Alanine Scan, which enable exponential reductions in the size of the peptide sequence search space. We apply these techniques and DexDesign to generate novel D-peptide inhibitors of two biomedically important PDZ domain targets: CAL and MAST2. We introduce a new framework for analyzing de novo peptides (evaluation along a replication/restitution axis) and apply it to the DexDesign-generated D-peptides. Notably, the peptides we generated are predicted to bind their targets tighter than their targets' endogenous ligands, validating the peptides' potential as lead therapeutic candidates. We provide an implementation of DexDesign in the free and open source computational protein design software OSPREY.
2
Guerin N, Childs H, Zhou P, Donald BR. DexDesign: an OSPREY-based algorithm for designing de novo D-peptide inhibitors. Protein Eng Des Sel 2024; 37:gzae007. [PMID: 38757573 PMCID: PMC11099876 DOI: 10.1093/protein/gzae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 04/17/2024] [Indexed: 05/18/2024]
Abstract
With over 270 unique occurrences in the human genome, peptide-recognizing PDZ domains play a central role in modulating polarization, signaling, and trafficking pathways. Mutations in PDZ domains lead to diseases such as cancer and cystic fibrosis, making PDZ domains attractive targets for therapeutic intervention. D-peptide inhibitors offer unique advantages as therapeutics, including increased metabolic stability and low immunogenicity. Here, we introduce DexDesign, a novel OSPREY-based algorithm for computationally designing de novo D-peptide inhibitors. DexDesign leverages three novel techniques that are broadly applicable to computational protein design: the Minimum Flexible Set, K*-based Mutational Scan, and Inverse Alanine Scan. We apply these techniques and DexDesign to generate novel D-peptide inhibitors of two biomedically important PDZ domain targets: CAL and MAST2. We introduce a framework for analyzing de novo peptides (evaluation along a replication/restitution axis) and apply it to the DexDesign-generated D-peptides. Notably, the peptides we generated are predicted to bind their targets tighter than their targets' endogenous ligands, validating the peptides' potential as lead inhibitors. We also provide an implementation of DexDesign in the free and open source computational protein design software OSPREY.
Affiliation(s)
- Nathan Guerin
- Department of Computer Science, Duke University, 308 Research Drive, Durham, NC 27708, United States
- Henry Childs
- Department of Chemistry, Duke University, 124 Science Drive, Durham, NC 27708, United States
- Pei Zhou
- Department of Biochemistry, Duke University School of Medicine, 307 Research Drive, Durham, NC 27710, United States
- Bruce R Donald
- Department of Computer Science, Duke University, 308 Research Drive, Durham, NC 27708, United States
- Department of Chemistry, Duke University, 124 Science Drive, Durham, NC 27708, United States
- Department of Biochemistry, Duke University School of Medicine, 307 Research Drive, Durham, NC 27710, United States
- Department of Mathematics, Duke University, 120 Science Drive, Durham, NC 27708, United States
3
A Transfer-Learning-Based Deep Convolutional Neural Network for Predicting Leukemia-Related Phosphorylation Sites from Protein Primary Sequences. Int J Mol Sci 2022; 23:ijms23031741. [PMID: 35163663 PMCID: PMC8915183 DOI: 10.3390/ijms23031741] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/14/2022] [Revised: 01/27/2022] [Accepted: 01/29/2022] [Indexed: 12/27/2022]
Abstract
As one of the most important post-translational modifications (PTMs), phosphorylation refers to the binding of a phosphate group to amino acid residues such as Ser (S), Thr (T) and Tyr (Y), resulting in diverse functions at the molecular level. Abnormal phosphorylation has been shown to be closely related to human diseases. To our knowledge, no research has yet been reported on predicting specific disease-associated phosphorylation sites, which is of great significance for a comprehensive understanding of disease mechanisms. In this work, focusing on three types of leukemia, we aim to develop reliable leukemia-related phosphorylation site prediction models by combining a deep convolutional neural network (CNN) with transfer learning. A CNN can automatically discover complex representations of phosphorylation patterns from raw sequences, and hence provides a powerful tool for improving leukemia-related phosphorylation site prediction. On the largest dataset, myelogenous leukemia, the optimal models for S/T/Y phosphorylation sites give AUC values of 0.8784, 0.8328 and 0.7716, respectively. When transferred to the small datasets, the models for T-cell and lymphoid leukemia also show promising performance by sharing the optimal parameters. Compared with five other machine-learning methods, our CNN models show superior performance. Finally, leukemia-related pathogenesis analysis and distribution analysis of phosphorylated proteins, along with K-means clustering analysis and position-specific conservation profiles of the phosphorylation sites, all indicate the strong practical feasibility of our easy-to-use CNN models.
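The AUC values reported above measure how well a model ranks true phosphorylation sites above non-sites. A minimal sketch of that metric, assuming binary labels (1 = phosphorylated site, 0 = non-site) and one real-valued score per candidate site: AUC equals the fraction of positive/negative pairs in which the positive scores higher, with ties counted as one half. The function `roc_auc` is a hypothetical helper for illustration, not code from the paper.

```python
def roc_auc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    scores: model outputs, higher = more likely a phosphorylation site.
    labels: 1 for a positive site, 0 for a negative site.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above the negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The double loop costs O(P x N) comparisons; for large datasets a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` gives the same value more efficiently.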
4
van Gastel J, Leysen H, Boddaert J, Vangenechten L, Luttrell LM, Martin B, Maudsley S. Aging-related modifications to G protein-coupled receptor signaling diversity. Pharmacol Ther 2020; 223:107793. [PMID: 33316288 DOI: 10.1016/j.pharmthera.2020.107793] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/07/2020] [Accepted: 11/26/2020] [Indexed: 02/06/2023]
Abstract
Aging is a highly complex molecular process that affects nearly all tissue systems in humans, and it is the highest risk factor for developing neurodegenerative disorders such as Alzheimer's and Parkinson's disease, cardiovascular disease and type 2 diabetes mellitus. The intense complexity of the aging process creates an incentive to develop more specific drugs that attenuate or even reverse some of the features of premature aging. As our current pharmacopeia is dominated by therapeutics that target members of the G protein-coupled receptor (GPCR) superfamily, it may be prudent to search for effective anti-aging therapeutics in this fertile domain. Since the first demonstration of GPCR-based β-arrestin signaling, it has become clear that an enhanced appreciation of GPCR signaling diversity may facilitate the creation of therapeutics with selective signaling activities. Such 'biased' ligand signaling profiles can be effectively investigated using both standard molecular biological techniques and high-dimensionality data analyses. Through a more nuanced appreciation of the quantitative nature of the multiple dimensions of signaling bias that drugs possess, researchers may be able to further refine the efficacy of GPCR modulators to impact the complex aberrations that constitute the aging process. Identifying novel effector profiles could expand the effective pharmacopeia and assist in the design of precision medicines. This review discusses potential non-G protein effectors, specifically their therapeutic suitability in aging and age-related disorders.
Affiliation(s)
- Jaana van Gastel
- Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium; Faculty of Pharmacy, Biomedical and Veterinary Science, University of Antwerp, Antwerp, Belgium
- Hanne Leysen
- Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium; Faculty of Pharmacy, Biomedical and Veterinary Science, University of Antwerp, Antwerp, Belgium
- Jan Boddaert
- Molecular Pathology Group, Faculty of Medicine and Health Sciences, Laboratory of Cell Biology and Histology, Antwerp, Belgium
- Laura Vangenechten
- Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Louis M Luttrell
- Division of Endocrinology, Diabetes & Medical Genetics, Medical University of South Carolina, USA
- Bronwen Martin
- Faculty of Pharmacy, Biomedical and Veterinary Science, University of Antwerp, Antwerp, Belgium
- Stuart Maudsley
- Receptor Biology Lab, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium; Faculty of Pharmacy, Biomedical and Veterinary Science, University of Antwerp, Antwerp, Belgium.
5
Nakariyakul S. A hybrid gene selection algorithm based on interaction information for microarray-based cancer classification. PLoS One 2019; 14:e0212333. [PMID: 30768654 PMCID: PMC6377117 DOI: 10.1371/journal.pone.0212333] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/20/2018] [Accepted: 01/31/2019] [Indexed: 12/31/2022]
Abstract
We address gene selection and machine learning methods for cancer classification using microarray gene expression data. Because of the high dimensionality of microarray data, traditional gene selection algorithms are filter-based, focusing on intrinsic properties of the data such as distance, dependency, and correlation. These methods are fast but select far too many genes for the classification task. In this work, we present a new hybrid filter-wrapper gene subset selection algorithm that improves on our prior algorithm. Our proposed method uses interaction information to rank candidate genes for addition to a gene subset. It then conditionally adds one gene at a time to the current subset and checks whether the resulting subset significantly improves classification performance. Only significant genes are selected, and the candidate gene list is updated every time a gene is added to the subset, which makes the selection highly dynamic. Experimental results on ten public cancer microarray data sets show that our method consistently outperforms prior gene selection algorithms in classification accuracy while selecting only a small number of genes.
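Interaction information measures how much a third variable changes the information that two others share. A minimal sketch for discrete (for example, discretized expression) data, assuming the convention I(X;Y;Z) = I(X;Y|Z) - I(X;Y); sign conventions differ between authors, and the paper's exact ranking criterion and discretization are not given in the abstract. In a gene-selection setting, X could be a candidate gene, Y the class label, and Z an already-selected gene. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more equal-length discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * math.log2(c / n) for c in Counter(joint).values())


def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_information(x, y, z):
    # Assumed convention: I(X;Y;Z) = I(X;Y|Z) - I(X;Y).
    # Positive values mean Z makes X more informative about Y (synergy).
    return conditional_mutual_information(x, y, z) - mutual_information(x, y)


if __name__ == "__main__":
    gene_a = [0, 0, 1, 1]   # hypothetical discretized candidate gene
    gene_b = [0, 1, 0, 1]   # hypothetical already-selected gene
    label = [a ^ b for a, b in zip(gene_a, gene_b)]  # class = XOR of the two
    print(interaction_information(gene_a, label, gene_b))  # 1.0
```

The XOR example shows why a filter based on pairwise relevance alone can miss useful genes: `gene_a` carries no information about the label by itself, but one full bit once `gene_b` is known.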
Affiliation(s)
- Songyot Nakariyakul
- Department of Electrical and Computer Engineering, Thammasat University, Khlongluang, Pathumthani, Thailand
6
Tang J, Ning J, Liu X, Wu B, Hu R. A Novel Amino Acid Sequence-based Computational Approach to Predicting Cell-penetrating Peptides. Curr Comput Aided Drug Des 2019; 15:206-211. [PMID: 30251610 DOI: 10.2174/1573409914666180925100355] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/29/2018] [Revised: 07/04/2018] [Accepted: 09/05/2018] [Indexed: 11/22/2022]
Abstract
INTRODUCTION: Machine learning is a useful tool for predicting cell-penetrating compounds as drug candidates. MATERIALS AND METHODS: In this study, we developed a novel method for predicting the membrane-penetrating capability of cell-penetrating peptides (CPPs). We used orthogonal encoding to encode the amino acids, treating each amino acid position as one variable. IBM SPSS Modeler software and a dataset of 533 CPPs were then used for model screening. RESULTS: The results indicated that a Support Vector Machine (SVM) model was suitable for predicting membrane-penetrating capability. To improve performance, the three CPPs with the longest lengths were used to predict CPPs. Penetration capability can be predicted with an accuracy of close to 95%. CONCLUSION: All the results indicate that using amino acid position as a variable is a promising method for predicting the membrane-penetrating capability of CPPs.
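Orthogonal encoding, as named above, represents each residue as a 20-dimensional binary vector containing a single 1 and concatenates these vectors position by position, so that every (position, amino acid) combination becomes its own variable. A minimal sketch, assuming the 20 standard residues and zero-padding of shorter peptides to a fixed length; the paper's own padding scheme and SPSS Modeler setup are not described in the abstract, and `orthogonal_encode` is a hypothetical name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def orthogonal_encode(peptide: str, max_len: int) -> list:
    """Orthogonal (one-hot) encoding with 20 binary slots per sequence position.

    Assumption: shorter peptides are right-padded with all-zero positions so
    every peptide maps to a vector of the same length (20 * max_len).
    """
    peptide = peptide.upper()
    if len(peptide) > max_len:
        raise ValueError(f"peptide longer than max_len={max_len}")
    vector = [0] * (len(AMINO_ACIDS) * max_len)
    for pos, aa in enumerate(peptide):
        if aa not in INDEX:
            raise ValueError(f"non-standard residue {aa!r} at position {pos}")
        vector[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1
    return vector


if __name__ == "__main__":
    # Penetratin, a well-known CPP (16 residues), padded to 20 positions.
    features = orthogonal_encode("RQIKIWFQNRRMKWKK", max_len=20)
    print(len(features), sum(features))  # 400 16
```

The resulting fixed-length binary vectors can be fed to any SVM implementation; the choice of `max_len` bounds the longest peptide the model can accept.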
Affiliation(s)
- Jihui Tang
- School of Pharmacy, Anhui Medical University, 81 Meishan Road, Hefei 230032, China
- Jie Ning
- Department of Oncology, The First Affiliated Hospital, Anhui Medical University, Hefei 230022, China
- Xiaoyan Liu
- School of Pharmacy, Anhui Medical University, 81 Meishan Road, Hefei 230032, China
- Baoming Wu
- School of Pharmacy, Anhui Medical University, 81 Meishan Road, Hefei 230032, China
- Rongfeng Hu
- Key Laboratory of Xin'an Medicine, Ministry of Education, Anhui Province Key Laboratory of R&D of Chinese Medicine, Anhui University of Chinese Medicine, Anhui "115" Xin'an Medicine Research & Development Innovation Team, Hefei 230038, China
7
Sarkar D, Jana T, Saha S. LMDIPred: A web-server for prediction of linear peptide sequences binding to SH3, WW and PDZ domains. PLoS One 2018; 13:e0200430. [PMID: 30001346 PMCID: PMC6042728 DOI: 10.1371/journal.pone.0200430] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/10/2017] [Accepted: 06/26/2018] [Indexed: 12/29/2022]
Abstract
Protein-peptide interactions form an important subset of the total protein interaction network in the cell and play key roles in signaling and regulatory networks and in major biological processes such as cellular localization, protein degradation, and immune response. In this work, we describe the LMDIPred web server, an online resource for generalized prediction of linear peptide sequences that may bind to the three most prevalent and well-studied peptide recognition modules (PRMs): SH3, WW and PDZ. We developed support vector machine (SVM)-based prediction models that achieved a maximum Matthews correlation coefficient (MCC) of 0.85 with an accuracy of 94.55% for SH3-binding peptides, an MCC of 0.90 with an accuracy of 95.82% for WW-binding peptides, and an MCC of 0.83 with an accuracy of 92.29% for PDZ-binding peptides. LMDIPred output combines predictions from these SVM models with predictions from Position-Specific Scoring Matrices (PSSMs) and from string-matching methods that use known domain-binding motif instances and regular expressions. All of these methods were evaluated using five-fold cross-validation on both balanced and unbalanced datasets, and were also validated on independent datasets. LMDIPred aims to provide a preliminary bioinformatics platform for sequence-based prediction of probable binding sites for SH3, WW or PDZ domains.
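MCC is reported alongside accuracy because accuracy alone can look high on unbalanced data, and the abstract notes evaluation on both balanced and unbalanced sets. A minimal sketch of both metrics computed from binary confusion-matrix counts; the function names are hypothetical and the counts in the usage line are made up for illustration, not taken from the LMDIPred paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    Returns 0.0 when any row or column of the confusion matrix is empty,
    a common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(accuracy(90, 90, 10, 10), matthews_corrcoef(90, 90, 10, 10))  # 0.9 0.8
```

With identical counts per class the two metrics move together; on a dataset dominated by negatives, a model that predicts "non-binder" everywhere keeps a high accuracy while its MCC drops to 0.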
Affiliation(s)
- Tanmoy Jana
- Bioinformatics Centre, Bose Institute, Kolkata, India
- Sudipto Saha
- Bioinformatics Centre, Bose Institute, Kolkata, India
9
Daqrouq K, Alhmouz R, Balamesh A, Memic A. Application of wavelet transform for PDZ domain classification. PLoS One 2015; 10:e0122873. [PMID: 25860375 PMCID: PMC4393179 DOI: 10.1371/journal.pone.0122873] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/14/2014] [Accepted: 02/24/2015] [Indexed: 11/18/2022]
Abstract
PDZ domains have been identified in an array of signaling proteins that are often unrelated, except for the well-conserved structural PDZ domain they contain. These domains have been linked to many disease processes, including common ones such as avian influenza as well as very rare conditions such as Fraser and Usher syndromes. Historically, based on their interactions and the nature of the bonds they form, PDZ domains have most often been classified into one of three classes (class I, class II, and others, termed class III), a classification that depends directly on their binding partner. In this study, we report three feature extraction approaches, based on rearrangements of bigram and trigram occurrence and existence within the domains' primary amino acid sequences, to assist PDZ domain classification. Feature extraction methods based on the wavelet packet transform (WPT) and Shannon entropy, denoted wavelet entropy (WE), were proposed. Using 115 unique human and mouse PDZ domains, the existence-rearrangement approach yielded a high recognition rate (78.34%), outperforming our occurrence-rearrangement-based method. With the validation technique, the recognition rate was 81.41%. The reported method for classifying PDZ domains from primary sequences proved to be an encouraging approach for obtaining consistent classification results. We anticipate that increasing the database size will further improve feature extraction and classification accuracy.
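The abstract names two sequence-feature families, bigram/trigram "occurrence" and "existence", plus an entropy measure. A minimal sketch, assuming occurrence means how many times each bigram appears and existence means whether it appears at all (our reading, not confirmed by the abstract). The `shannon_entropy` function is the generic formula that wavelet entropy applies to normalized wavelet-packet sub-band energies; the wavelet packet transform itself is omitted because it needs a wavelet library such as PyWavelets, and the "rearrangement" step is not described in enough detail to reproduce. All names are hypothetical.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BIGRAMS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def bigram_occurrence(seq: str) -> list:
    """Count of each of the 400 bigrams in seq (overlapping windows)."""
    counts = dict.fromkeys(BIGRAMS, 0)
    for i in range(len(seq) - 1):
        bigram = seq[i:i + 2].upper()
        if bigram in counts:  # skip bigrams containing non-standard residues
            counts[bigram] += 1
    return [counts[b] for b in BIGRAMS]


def bigram_existence(seq: str) -> list:
    """1 if a bigram appears at least once in seq, else 0."""
    return [1 if c > 0 else 0 for c in bigram_occurrence(seq)]


def shannon_entropy(values) -> float:
    """Shannon entropy (bits) of a non-negative vector normalized to sum to 1."""
    total = sum(values)
    if total == 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


if __name__ == "__main__":
    # "GLGLG" contains the bigrams GL and LG twice each.
    print(shannon_entropy(bigram_occurrence("GLGLG")))  # 1.0
```

Occurrence vectors keep repeat counts while existence vectors discard them, which is one plausible reason the two approaches rank differently in the reported recognition rates.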
Affiliation(s)
- Khaled Daqrouq
- Electrical and Computer Engineering Department, Faculty of Engineering, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Rami Alhmouz
- Electrical and Computer Engineering Department, Faculty of Engineering, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Ahmed Balamesh
- Electrical and Computer Engineering Department, Faculty of Engineering, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Adnan Memic
- Center of Nanotechnology, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
10
Zhu H, Liu Z, Huang Y, Zhang C, Li G, Liu W. Biochemical and structural characterization of MUPP1-PDZ4 domain from Mus musculus. Acta Biochim Biophys Sin (Shanghai) 2015; 47:199-206. [PMID: 25662616 DOI: 10.1093/abbs/gmv002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
Specific protein-protein interactions are important for biological signal transduction. The postsynaptic density-95, discs large, and zonula occludens-1 (PDZ) domain is one of the most abundant protein interaction modules. Multi-PDZ-domain protein 1 (MUPP1), a scaffold protein, contains 13 PDZ domains and plays an important role in cytoskeletal organization, cell polarity, and cell proliferation. Studying the PDZ domains of MUPP1 helps to elucidate the mechanisms and functions of MUPP1. In the present study, the fourth PDZ domain of MUPP1 (MUPP1-PDZ4) from Mus musculus was cloned, expressed, purified, and characterized. The MUPP1-PDZ4 domain was subcloned into a pET vector and expressed in Escherichia coli. Affinity chromatography and size-exclusion chromatography were used to purify the protein. MUPP1-PDZ4 protein was a monomer with a molar mass of 16.4 kDa in solution and had a melting point of 60.3°C. Using the sitting-drop vapor-diffusion method, MUPP1-PDZ4 protein crystals were obtained in a solution (pH 7.0) containing 2% (v/v) polyethylene glycol 400, 0.1 M imidazole, and 24% (w/v) polyethylene glycol monoethyl ether 5000. Finally, the crystal diffracted to 1.6 Å resolution. The crystal structure showed that the MUPP1-PDZ4 domain contains three α-helices and six β-strands in its core. The GLGI motif, L562/A564 on β-strand B, and H605/V608/L612 on α-helix B form a PDZ binding pocket that can bind the C-termini of binding partners. This biochemical and structural information provides insight into how PDZ domains bind their target peptides and a theoretical foundation for understanding the function of MUPP1.
Affiliation(s)
- Haili Zhu
- Shenzhen Key Laboratory for Neuronal Structural Biology, Biomedical Research Institute, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen 518036, China
- Zexu Liu
- Division of Life Science, State Key Laboratory of Molecular Neuroscience, Hong Kong University of Science and Technology, Hong Kong, China
- Yuxin Huang
- Shenzhen Key Laboratory for Neuronal Structural Biology, Biomedical Research Institute, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen 518036, China
- Chao Zhang
- Shenzhen Key Laboratory for Neuronal Structural Biology, Biomedical Research Institute, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen 518036, China
- Gang Li
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China
- Wei Liu
- Shenzhen Key Laboratory for Neuronal Structural Biology, Biomedical Research Institute, Shenzhen Peking University-The Hong Kong University of Science and Technology Medical Center, Shenzhen 518036, China
- Division of Life Science, State Key Laboratory of Molecular Neuroscience, Hong Kong University of Science and Technology, Hong Kong, China
11
Ding H, Feng PM, Chen W, Lin H. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. Mol Biosyst 2015; 10:2229-35. [PMID: 24931825 DOI: 10.1039/c4mb00316k] [Citation(s) in RCA: 106] [Impact Index Per Article: 10.6] [Indexed: 11/21/2022]
Abstract
Bacteriophage virion proteins play extremely important roles in the fate of host bacterial cells. Accurate identification of bacteriophage virion proteins is very important for understanding their functions and clarifying the lysis mechanism of bacterial cells. In this study, a new sequence-based method was developed to identify phage virion proteins. In the new method, the protein sequences were first formulated as g-gap dipeptide compositions. Analysis of variance (ANOVA) with incremental feature selection (IFS) was then used to search for the optimal feature set. In jackknife cross-validation, the optimal feature set of 160 optimized features produced the maximum accuracy of 85.02%. Feature analysis showed that the correlation between two amino acids separated by one gap was more important than other correlations for phage virion protein prediction, and that some of the 1-gap dipeptides were important and contributed most to the prediction. This analysis will provide novel insights into the function of phage virion proteins. On the basis of the proposed method, an online web server, PVPred, was established and can be freely accessed from the website (http://lin.uestc.edu.cn/server/PVPred). We believe that PVPred will become a powerful tool for studying phage virion proteins and for guiding related experimental validations.
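The g-gap dipeptide composition and ANOVA ranking described above can be sketched as follows. Assumptions: a g-gap dipeptide is a residue pair (seq[i], seq[i+g+1]), so g = 1 corresponds to the "1-gap" correlation highlighted in the abstract; each feature is normalized by the number of such pairs in the sequence; and features are ranked by the one-way ANOVA F statistic between classes, which gives the order that incremental feature selection (IFS) would then step through. The function names are hypothetical, and this is not the PVPred implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 residue pairs


def g_gap_composition(seq: str, g: int) -> list:
    """Frequency of each pair (seq[i], seq[i+g+1]); g = 0 is plain dipeptide composition."""
    if g < 0:
        raise ValueError("g must be non-negative")
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    n_pairs = len(seq) - g - 1
    for i in range(max(n_pairs, 0)):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
    return [counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in PAIRS]


def anova_f(groups) -> float:
    """One-way ANOVA F statistic for one feature; groups = per-class value lists."""
    all_values = [v for grp in groups for v in grp]
    n, k = len(all_values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(all_values) / n
    means = [sum(grp) / len(grp) for grp in groups]
    ss_between = sum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for grp, m in zip(groups, means) for v in grp)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features_by_f(samples, labels) -> list:
    """Feature indices sorted from most to least class-discriminative (highest F first)."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(samples[0])):
        groups = [[s[j] for s, lab in zip(samples, labels) if lab == c] for c in classes]
        scores.append(anova_f(groups))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Under this reading, IFS would train and evaluate a classifier on the top 1, 2, 3, ... ranked features and keep the prefix with the best cross-validated accuracy.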
Affiliation(s)
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
12
Ding H, Feng PM, Chen W, Lin H. Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis. Mol Biosyst 2014. [PMID: 24931825 DOI: 10.1039/c4mb00316k] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Abstract
The bacteriophage virion proteins play extremely important roles in the fate of host bacterial cells.
Affiliation(s)
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Peng-Mian Feng
- School of Public Health, Hebei United University, Tangshan 063000, China
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China