1
Gao L, Zhang Y, Ge F, Li S, Guo Y, Song J, Yu DJ. Structure-Directed Pan-Specific T-Cell Receptor-Peptide-Major Histocompatibility Complex Interaction Prediction. J Chem Inf Model 2025; 65:4674-4686. [PMID: 40297927 DOI: 10.1021/acs.jcim.5c00055] [Indexed: 04/30/2025]
Abstract
T-cell receptors (TCRs) play a pivotal role in the adaptive immune system, and understanding their antigen recognition mechanism remains a critical area of research. With the increasing availability of binding and interaction data between TCRs and peptide-major histocompatibility complexes (pMHCs), data-driven computational methods are emerging as powerful tools with significant potential for advancement. In this study, we collected and curated comprehensive sequence and structure data sets of TCRs from human CD8+ T-cells and cognate epitopes presented by MHC class I molecules. We developed two innovative computational frameworks: SG-TPMI, a lightweight, extensible, and structure-guided model for predicting TCR-pMHC binding specificity, and Seq/Struct-TCS, a pair of models (sequence-based and structure-based) for predicting contact sites within TCR-pMHC complexes. Notably, we directly integrated MHC-I alpha helices (or pseudosequences) and structural information on the protein complex into the prediction models. Our comprehensive modeling approach enabled quantitative investigations of TCR-pMHC interaction mechanisms, empowering SG-TPMI and Struct-TCS to achieve performances comparable to those of state-of-the-art methods. Furthermore, our results highlight the necessity of CDR1 and CDR2 loops as well as MHC restriction in pan-specific TCR-pMHC interaction prediction, providing new insights into TCR recognition. In summary, we not only propose SG-TPMI as an effective computational method for predicting TCR-pMHC binary interactions but also introduce the Seq/Struct-TCS design for predicting TCR interacting sites with peptide or MHC alpha helices.
Affiliation(s)
- Letao Gao
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
- Yumeng Zhang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, Nanjing 210003, China
- Shanshan Li
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Yuming Guo
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
2
Xiong S, Cai J, Shi H, Cui F, Zhang Z, Wei L. UMPPI: Unveiling Multilevel Protein-Peptide Interaction Prediction via Language Models. J Chem Inf Model 2025; 65:3789-3799. [PMID: 40077987 DOI: 10.1021/acs.jcim.4c02365] [Indexed: 03/14/2025]
Abstract
Protein-peptide interactions are essential to cellular processes and disease mechanisms. Identifying protein-peptide binding residues is critical for understanding peptide function and advancing drug discovery. However, experimental methods are costly and time-intensive, while existing computational approaches often predict interactions or binding residues separately, lack effective feature integration, or rely heavily on limited high-quality structural data. To address these challenges, we propose UMPPI (Unveiling Multilevel Protein-Peptide Interaction), a multiobjective framework based on the pretrained protein language model ESM2. UMPPI simultaneously predicts binary protein-peptide interactions and binding residues on both peptides and proteins through a multiobjective optimization strategy. By integrating ESM2 to encode sequences and extract latent structural information, UMPPI bridges the gap between sequence-based and structure-based methods. Extensive experiments demonstrated that UMPPI successfully captured binary interactions between peptides and proteins and identified the binding residues on peptides and proteins. UMPPI can serve as a useful tool for protein-peptide interaction prediction and identification of critical binding residues, thereby facilitating the peptide drug discovery process.
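The idea of optimizing a pair-level prediction and residue-level predictions together can be illustrated with a weighted sum of two losses. The sketch below is only an assumption about what such a combined objective might look like; the abstract does not state UMPPI's actual loss, and the function names and the weights `w_pair` and `w_residue` are hypothetical.

```python
import math


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy between a predicted probability and a 0/1 label."""
    # Clamp the probability so that log() never receives 0.
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(pair_prob, pair_label, residue_probs, residue_labels,
                   w_pair=1.0, w_residue=1.0):
    """Weighted sum of a pair-level term and a mean residue-level term.

    The weights are hypothetical illustration parameters, not values
    taken from the paper.
    """
    pair_term = binary_cross_entropy(pair_prob, pair_label)
    residue_term = sum(
        binary_cross_entropy(p, y)
        for p, y in zip(residue_probs, residue_labels)
    ) / len(residue_probs)
    return w_pair * pair_term + w_residue * residue_term
```

Calling `multitask_loss(0.9, 1, [0.8, 0.1, 0.7], [1, 0, 1])` returns a single scalar that penalizes both a wrong interaction call and wrong binding-residue calls, so one gradient step could in principle improve both tasks.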
Affiliation(s)
- Shuwen Xiong
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- Jiajie Cai
- School of Software, Shandong University, Jinan 250101, China
- Hua Shi
- School of Optoelectronic and Communication Engineering, Xiamen University of Technology, Xiamen 361005, China
- Feifei Cui
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Zilong Zhang
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Leyi Wei
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- School of Software, Shandong University, Jinan 250101, China
3
Ünlü A, Ulusoy E, Yiğit MG, Darcan M, Doğan T. Protein language models for predicting drug-target interactions: Novel approaches, emerging methods, and future directions. Curr Opin Struct Biol 2025; 91:103017. [PMID: 39985946 DOI: 10.1016/j.sbi.2025.103017] [Received: 10/23/2024] [Revised: 01/28/2025] [Accepted: 01/29/2025] [Indexed: 02/24/2025]
Abstract
Identifying new drug candidates remains a critical and complex challenge in drug development. Recent advances in deep learning have demonstrated significant potential to accelerate this process, particularly through the use of protein language models (pLMs). These models aim to effectively capture the structural and functional properties of proteins by embedding them in high-dimensional spaces, thereby providing powerful tools for predictive tasks. This review examines the application of pLMs in drug-target interaction (DTI) prediction, addressing both small-molecule and protein-based therapeutics. We explore diverse methodologies, including end-to-end learning models and those that leverage pre-trained foundational pLMs. Furthermore, we highlight the role of heterogeneous data integration, ranging from protein structures to knowledge graphs, in improving the accuracy of DTI predictions. Despite notable progress, challenges persist in accurately identifying DTIs, mainly due to data-related limitations and algorithmic constraints. Future research directions include utilising multimodal learning approaches, incorporating temporal/dynamic interaction data into training, and employing novel deep learning architectures to refine protein representations, gain a deeper understanding of the biological context of molecular interactions, and thus advance the DTI prediction field.
Affiliation(s)
- Atabey Ünlü
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, 06800, Ankara, Türkiye; Dept. of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, 06800, Ankara, Türkiye
- Erva Ulusoy
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, 06800, Ankara, Türkiye; Dept. of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, 06800, Ankara, Türkiye
- Melih Gökay Yiğit
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, 06800, Ankara, Türkiye; Dept. of Computer Engineering, Middle East Technical University, 06800, Ankara, Türkiye
- Melih Darcan
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, 06800, Ankara, Türkiye
- Tunca Doğan
- Biological Data Science Lab, Dept. of Computer Engineering, Hacettepe University, 06800, Ankara, Türkiye; Dept. of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, 06800, Ankara, Türkiye; Dept. of Health Informatics, Institute of Informatics, Hacettepe University, 06800, Ankara, Türkiye
4
Malhotra Y, John J, Yadav D, Sharma D, Vanshika, Rawal K, Mishra V, Chaturvedi N. Advancements in protein structure prediction: A comparative overview of AlphaFold and its derivatives. Comput Biol Med 2025; 188:109842. [PMID: 39970826 DOI: 10.1016/j.compbiomed.2025.109842] [Received: 11/23/2024] [Revised: 02/07/2025] [Accepted: 02/10/2025] [Indexed: 02/21/2025]
Abstract
This review provides a comprehensive analysis of AlphaFold (AF) and its derivatives (AF2 and AF3) in protein structure prediction. These tools have revolutionized structural biology with their highly accurate predictions, driving progress in protein modeling, drug discovery, and the study of protein dynamics. Their exceptional accuracy has redefined our understanding of protein folding and enabled groundbreaking advancements in protein design and disease research; the review also discusses future integration with experimental techniques. In addition, their key achievements, architectures, important case studies, and noteworthy effects in biology and medicine are evaluated. Although AF2 is a relatively recent innovation, it has already been used in many studies that highlight its wide range of applications. Moreover, the limitations of AF2 that led to the introduction of AF3 are also reported; AF3 is a substantial improvement because it provides precise predictions of the structures and interactions of proteins, DNA, RNA, and ligands, thereby aiding understanding at the molecular level. Addressing current challenges and forecasting future developments, this work underscores the lasting significance of AF in reshaping the scientific landscape of protein research.
Affiliation(s)
- Yuktika Malhotra
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Jerry John
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Deepika Yadav
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Deepshikha Sharma
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Vanshika
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Kamal Rawal
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
- Vaibhav Mishra
- Amity Institute of Microbial Technology, Amity University, Uttar Pradesh, 201303, India
- Navaneet Chaturvedi
- Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, 201303, India
5
Wang Y, Han S, Wang Y, Liang Q, Luo W. Artificial Intelligence Technology Assists Enzyme Prediction and Rational Design. J Agric Food Chem 2025; 73:7065-7073. [PMID: 40066931 DOI: 10.1021/acs.jafc.4c13201] [Indexed: 03/27/2025]
Abstract
Since the structure of enzymes determines their function, elucidating the structure of enzymes lays a solid foundation for deciphering their catalytic mechanism and enabling rational design. The development of artificial intelligence (AI) has sparked a technological revolution, infusing new vitality into theoretical studies of enzymology and the advancement of enzyme engineering techniques. This Review outlines the development process and main methods of AI applied in the structural elucidation and functional prediction of enzymes. Furthermore, it emphasizes AI-based rational design of enzymes and provides a detailed exposition of representative AI algorithms and case studies. With the support of AI technology, the comprehension of enzyme structure and function and their relationship will become deeper and more efficient, thereby promoting the widespread application of enzyme engineering in various fields.
Affiliation(s)
- Yuhang Wang
- The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214126, China
- Shuangxin Han
- The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214126, China
- Yi Wang
- Department of Biological and Agricultural Engineering, University of California, Davis, 1 Shields Ave, Davis, California 95616, United States
- Quanfeng Liang
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao 266237, P. R. China
- Wei Luo
- The Key Laboratory of Carbohydrate Chemistry and Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi 214126, China
6
Khan S, Noor S, Awan HH, Iqbal S, AlQahtani SA, Dilshad N, Ahmad N. Deep-ProBind: binding protein prediction with transformer-based deep learning model. BMC Bioinformatics 2025; 26:88. [PMID: 40121399 PMCID: PMC11929993 DOI: 10.1186/s12859-025-06101-8] [Received: 11/03/2024] [Accepted: 03/04/2025] [Indexed: 03/25/2025] [Open access]
Abstract
Binding proteins play a crucial role in biological systems by selectively interacting with specific molecules, such as DNA, RNA, or peptides, to regulate various cellular processes. Their ability to recognize and bind target molecules with high specificity makes them essential for signal transduction, transport, and enzymatic activity. Traditional experimental methods for identifying protein-binding peptides are costly and time-consuming. Current sequence-based approaches often struggle with accuracy, focusing too narrowly on proximal sequence features and ignoring structural data. This study presents Deep-ProBind, a powerful prediction model designed to classify protein binding sites by integrating sequence and structural information. The proposed model employs a transformer and an evolutionary-based attention mechanism, i.e., Bidirectional Encoder Representations from Transformers (BERT) and the Pseudo Position-Specific Scoring Matrix-Discrete Wavelet Transform (PsePSSM-DWT) approach, to encode peptides. The SHapley Additive exPlanations (SHAP) algorithm selects the optimal hybrid features, and a Deep Neural Network (DNN) is then used as the classification algorithm to predict protein-binding peptides. The performance of the proposed model was evaluated in comparison with traditional Machine Learning (ML) algorithms and existing models. Experimental results demonstrate that Deep-ProBind achieved 92.67% accuracy with tenfold cross-validation on benchmark datasets and 93.62% accuracy on independent samples. Deep-ProBind outperforms existing models by 3.57% on training data and 1.52% on independent tests. These results demonstrate Deep-ProBind's reliability and effectiveness, making it a valuable tool for researchers and a potential resource in pharmacological studies, where peptide binding plays a critical role in therapeutic development.
Affiliation(s)
- Salman Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, KPK, Pakistan
- Sumaiya Noor
- Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA
- Hamid Hussain Awan
- Department of Computer Science, Rawalpindi Women University, Rawalpindi, 46300, Punjab, Pakistan
- Shehryar Iqbal
- School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, UK
- Salman A AlQahtani
- New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Naqqash Dilshad
- Department of Computer Science & Engineering, Sejong University, Seoul, 05006, South Korea
- Nijad Ahmad
- Department of Computer Science, Khurasan University, Jalalabad, Afghanistan
7
Tang S, Zhang Y, Tong A, Chatterjee P. Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation. arXiv 2025; arXiv:2503.17361v1. [PMID: 40166737 PMCID: PMC11957225] [Indexed: 04/02/2025]
Abstract
Flow matching in the continuous simplex has emerged as a promising strategy for DNA sequence design, but struggles to scale to higher simplex dimensions required for peptide and protein generation. We introduce Gumbel-Softmax Flow and Score Matching, a generative framework on the simplex based on a novel Gumbel-Softmax interpolant with a time-dependent temperature. Using this interpolant, we introduce Gumbel-Softmax Flow Matching by deriving a parameterized velocity field that transports from smooth categorical distributions to distributions concentrated at a single vertex of the simplex. We alternatively present Gumbel-Softmax Score Matching which learns to regress the gradient of the probability density. Our framework enables high-quality, diverse generation and scales efficiently to higher-dimensional simplices. To enable training-free guidance, we propose Straight-Through Guided Flows (STGFlow), a classifier-based guidance method that leverages straight-through estimators to steer the unconditional velocity field toward optimal vertices of the simplex. STGFlow enables efficient inference-time guidance using classifiers pre-trained on clean sequences, and can be used with any discrete flow method. Together, these components form a robust framework for controllable de novo sequence generation. We demonstrate state-of-the-art performance in conditional DNA promoter design, sequence-only protein generation, and target-binding peptide design for rare disease treatment.
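To make the role of temperature concrete, the sketch below draws a standard Gumbel-Softmax sample on the simplex and pairs it with a temperature that shrinks over time. It illustrates the generic Gumbel-Softmax relaxation only, not the paper's interpolant or velocity field; the exponential `temperature_schedule` and its `tau_max` and `tau_min` defaults are assumptions chosen for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed categorical sample on the probability simplex."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_schedule(t, tau_max=5.0, tau_min=0.05):
    """Hypothetical exponential decay from tau_max (t=0) to tau_min (t=1)."""
    return tau_max * (tau_min / tau_max) ** t
```

With the same noise draw, lowering the temperature can only increase the largest coordinate of the sample, so samples at `t` near 1 sit close to a single vertex of the simplex while samples at `t` near 0 stay smooth. A 20-entry `logits` list corresponds to the amino-acid alphabet, a 4-entry list to DNA.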
Affiliation(s)
- Sophia Tang
- Department of Biomedical Engineering, Duke University
- Management and Technology Program, University of Pennsylvania
- Yinuo Zhang
- Department of Biomedical Engineering, Duke University
- Center of Computational Biology, Duke-NUS Medical School
- Pranam Chatterjee
- Department of Biomedical Engineering, Duke University
- Department of Computer Science, Duke University
- Department of Biostatistics and Bioinformatics, Duke University
8
Shi C, Liu F, Su X, Yang Z, Wang Y, Xie S, Xie S, Sun Q, Chen Y, Sang L, Tan M, Zhu L, Lei K, Li J, Yang J, Gao Z, Yu M, Wang X, Wang J, Chen J, Zhuo W, Fang Z, Liu J, Yan Q, Neculai D, Sun Q, Shao J, Lin W, Liu W, Chen J, Wang L, Liu Y, Li X, Zhou T, Lin A. Comprehensive discovery and functional characterization of the noncanonical proteome. Cell Res 2025; 35:186-204. [PMID: 39794466 PMCID: PMC11909191 DOI: 10.1038/s41422-024-01059-3] [Received: 04/21/2024] [Accepted: 11/14/2024] [Indexed: 01/13/2025] [Open access]
Abstract
The systematic identification and functional characterization of noncanonical translation products, such as novel peptides, will facilitate the understanding of the human genome and provide new insights into cell biology. Here, we constructed a high-coverage peptide sequencing reference library with 11,668,944 open reading frames and employed an ultrafiltration tandem mass spectrometry assay to identify novel peptides. Through these methods, we discovered 8945 previously unannotated peptides from normal gastric tissues, gastric cancer tissues and cell lines, nearly half of which were derived from noncoding RNAs. Moreover, our CRISPR screening revealed that 1161 peptides are involved in tumor cell proliferation. The presence and physiological function of a subset of these peptides, selected based on screening scores, amino acid length, and various indicators, were verified through Flag-knockin and multiple other methods. To further characterize the potential regulatory mechanisms involved, we constructed a framework based on artificial intelligence structure prediction and peptide‒protein interaction network analysis for the top 100 candidates and revealed that these cancer-related peptides have diverse subcellular locations and participate in organelle-specific processes. Further investigation verified the interacting partners of pep1-nc-OLMALINC, pep5-nc-TRHDE-AS1, pep-nc-ZNF436-AS1 and pep2-nc-AC027045.3, and the functions of these peptides in mitochondrial complex assembly, energy metabolism, and cholesterol metabolism, respectively. We showed that pep5-nc-TRHDE-AS1 and pep2-nc-AC027045.3 had substantial impacts on tumor growth in xenograft models. Furthermore, the dysregulation of these four peptides is closely correlated with clinical prognosis. Taken together, our study provides a comprehensive characterization of the noncanonical proteome, and highlights critical roles of these previously unannotated peptides in cancer biology.
Affiliation(s)
- Chengyu Shi
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Fangzhou Liu
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Xinwan Su
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Zuozhen Yang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Ying Wang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Shanshan Xie
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Department of Cell Biology and Program in Molecular Cell Biology, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Department of Gastroenterology, the Second Affiliated Hospital, School of Medicine and Institute of Gastroenterology, Zhejiang University, Hangzhou, Zhejiang, China
- Shaofang Xie
- Key Laboratory of Structural Biology of Zhejiang Province, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, Hangzhou, Zhejiang, China
- Qiang Sun
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Yu Chen
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Lingjie Sang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Manman Tan
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Linyu Zhu
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Kai Lei
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Junhong Li
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Jiecheng Yang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Zerui Gao
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Meng Yu
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Xinyi Wang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Junfeng Wang
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China
- Jing Chen
- Department of Gastrointestinal Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Wei Zhuo
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Department of Cell Biology and Program in Molecular Cell Biology, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Department of Gastroenterology, the Second Affiliated Hospital, School of Medicine and Institute of Gastroenterology, Zhejiang University, Hangzhou, Zhejiang, China
- Zhaoyuan Fang
- Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, Zhejiang, China
- The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Jian Liu
- Zhejiang University-University of Edinburgh Institute, Zhejiang University School of Medicine, Haining, Zhejiang, China
- Hangzhou Cancer Hospital, Hangzhou, Zhejiang, China
- Qingfeng Yan
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Dante Neculai
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Qiming Sun
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Jianzhong Shao
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Weiqiang Lin
- Department of Nephrology, Center for Regeneration and Aging Medicine, The Fourth Affiliated Hospital of School of Medicine and International School of Medicine, International Institutes of Medicine, Zhejiang University, Yiwu, Zhejiang, China
- Wei Liu
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China
- Jian Chen
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China
- Department of Gastrointestinal Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Liangjing Wang
- Department of Gastroenterology, the Second Affiliated Hospital, School of Medicine and Institute of Gastroenterology, Zhejiang University, Hangzhou, Zhejiang, China
- Yang Liu
- Institute of Immunology, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Xu Li
- Key Laboratory of Structural Biology of Zhejiang Province, Westlake Laboratory of Life Sciences and Biomedicine, Westlake University, Hangzhou, Zhejiang, China
| | - Tianhua Zhou
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China.
- Department of Cell Biology and Program in Molecular Cell Biology, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
| | - Aifu Lin
- The Center for RNA Medicine, International Institutes of Medicine, International School of Medicine, The 4th Affiliated Hospital of Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, China.
- Cancer Center, Zhejiang University, Hangzhou, Zhejiang, China.
- Key Laboratory of Cancer Prevention and Intervention, China National Ministry of Education, Hangzhou, Zhejiang, China.
- Future Health Laboratory, Innovation Center of Yangtze River Delta, Zhejiang University, Jiashan, Zhejiang, China.
- Key Laboratory for Cell and Gene Engineering of Zhejiang Province, Hangzhou, Zhejiang, China.
| |
|
9
|
Sequeira A, Rocha M, Lousa D. Machine and deep learning to predict viral fusion peptides. Comput Struct Biotechnol J 2025; 27:692-704. [PMID: 40083606 PMCID: PMC11903910 DOI: 10.1016/j.csbj.2025.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 02/10/2025] [Accepted: 02/17/2025] [Indexed: 03/16/2025]
Abstract
Viral fusion proteins, located on the surface of enveloped viruses like SARS-CoV-2, Influenza, and HIV, play a vital role in fusing the virus envelope with the host cell membrane. Fusion peptides, conserved segments within these proteins, are crucial for the fusion process and are potential targets for therapy. Experimental identification of fusion peptides is time-consuming and costly, which creates the need for bioinformatics tools that can predict the segment within the fusion protein sequence that corresponds to the fusion peptide. Although homology-based methods have been used towards this end, they fail to identify fusion peptides lacking overall sequence similarity to known counterparts. Therefore, alternative methods are needed to discover new putative fusion peptides, namely those based on machine learning (ML). In this study, we explore various ML-based approaches to identify fusion peptides within a fusion protein sequence. We employ token classification methods and sliding window approaches coupled with machine and deep learning models. We evaluate different protein sequence representations, including one-hot encoding, physicochemical features, as well as representations from Natural Language Processing, such as word embeddings and transformers. Through the examination of over 50 combinations of models and features, we achieve promising results, particularly with models based on a state-of-the-art transformer for amino acid token classification. Furthermore, we utilize the best models to predict hypothetical fusion peptides for SARS-CoV-2, and critically analyse annotated peptides from existing research. Overall, our models effectively predict the location of fusion peptides, even in viruses for which limited experimental data is available.
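As a rough illustration of the sliding window approach with one-hot encoding mentioned in this abstract, the following minimal sketch cuts a sequence into fixed-length windows and encodes each window. It is not the authors' pipeline: the window size, the step, the toy sequence, and the function names `one_hot` and `sliding_windows` are all assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(window):
    """Encode a window as a list of 20-element binary rows, one per residue."""
    rows = []
    for aa in window:
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:  # unknown residues are left as all-zero rows
            row[idx] = 1
        rows.append(row)
    return rows


def sliding_windows(sequence, size, step=1):
    """Yield (start, window) pairs for every full-length window."""
    for start in range(0, len(sequence) - size + 1, step):
        yield start, sequence[start:start + size]


# Hypothetical toy sequence, not a real fusion protein.
toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"
encoded = [(start, one_hot(w)) for start, w in sliding_windows(toy_sequence, size=10)]
```

Each encoded window could then be passed to a classifier that decides whether the window overlaps a fusion peptide; that classifier is outside the scope of this sketch.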
Affiliation(s)
- A.M. Sequeira
- Department of Informatics, School of Engineering, University of Minho, Braga, Portugal
| | - M. Rocha
- Department of Informatics, School of Engineering, University of Minho, Braga, Portugal
| | - Diana Lousa
- ITQB NOVA, Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Oeiras, Portugal
| |
|
10
|
Zhai S, Liu T, Lin S, Li D, Liu H, Yao X, Hou T. Artificial intelligence in peptide-based drug design. Drug Discov Today 2025; 30:104300. [PMID: 39842504 DOI: 10.1016/j.drudis.2025.104300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Revised: 01/14/2025] [Accepted: 01/15/2025] [Indexed: 01/24/2025]
Abstract
Protein-protein interactions (PPIs) are fundamental to a variety of biological processes, but targeting them with small molecules is challenging because of their large and complex interaction interfaces. However, peptides have emerged as highly promising modulators of PPIs, because they can bind to protein surfaces with high affinity and specificity. Nonetheless, computational peptide design remains difficult, hindered by the intrinsic flexibility of peptides and the substantial computational resources required. Recent advances in artificial intelligence (AI) are paving new paths for peptide-based drug design. In this review, we explore the advanced deep generative models for designing target-specific peptide binders, highlight key challenges, and offer insights into the future direction of this rapidly evolving field.
Affiliation(s)
- Silong Zhai
- Faculty of Applied Science, Macao Polytechnic University, 999078, Macao; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Tiantao Liu
- Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
| | - Shaolong Lin
- Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
| | - Dan Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
| | - Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, 999078, Macao
| | - Xiaojun Yao
- Faculty of Applied Science, Macao Polytechnic University, 999078, Macao.
| | - Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China.
| |
|
11
|
Bhat S, Palepu K, Hong L, Mao J, Ye T, Iyer R, Zhao L, Chen T, Vincoff S, Watson R, Wang TZ, Srijay D, Kavirayuni VS, Kholina K, Goel S, Vure P, Deshpande AJ, Soderling SH, DeLisa MP, Chatterjee P. De novo design of peptide binders to conformationally diverse targets with contrastive language modeling. Sci Adv 2025; 11:eadr8638. [PMID: 39841846 PMCID: PMC11753435 DOI: 10.1126/sciadv.adr8638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Accepted: 12/20/2024] [Indexed: 01/24/2025]
Abstract
Designing binders to target undruggable proteins presents a formidable challenge in drug discovery. In this work, we provide an algorithmic framework to design short, target-binding linear peptides, requiring only the amino acid sequence of the target protein. To do this, we propose a process to generate naturalistic peptide candidates through Gaussian perturbation of the peptidic latent space of the ESM-2 protein language model and subsequently screen these novel sequences for target-selective interaction activity via a contrastive language-image pretraining (CLIP)-based contrastive learning architecture. By integrating these generative and discriminative steps, we create a Peptide Prioritization via CLIP (PepPrCLIP) pipeline and validate highly ranked, target-specific peptides experimentally, both as inhibitory peptides and as fusions to E3 ubiquitin ligase domains. PepPrCLIP-derived constructs demonstrate functionally potent binding and degradation of conformationally diverse, disease-driving targets in vitro. In total, PepPrCLIP empowers the modulation of previously inaccessible proteins without reliance on stable and ordered tertiary structures.
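The generate-then-screen idea in this abstract can be sketched in a few lines: perturb a latent vector with Gaussian noise, then rank the perturbed candidates by similarity to a target embedding. This is only a toy under stated assumptions. The 4-dimensional vectors, the noise scale `sigma`, and the use of plain cosine similarity are hypothetical stand-ins; the actual pipeline works on ESM-2 latents and a trained CLIP-style model, and it decodes latents back into peptide sequences, none of which is reproduced here.

```python
import math
import random


def perturb(embedding, sigma, rng):
    """Add independent Gaussian noise to every coordinate of a latent vector."""
    return [x + rng.gauss(0.0, sigma) for x in embedding]


def cosine(u, v):
    """Cosine similarity, used here as a stand-in for a learned contrastive score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)                     # fixed seed for reproducibility
seed_peptide = [0.2, -0.5, 0.9, 0.1]       # hypothetical peptide latent
target_protein = [0.3, -0.4, 0.8, 0.0]     # hypothetical target latent

# Generative step: sample candidates around the seed latent.
candidates = [perturb(seed_peptide, sigma=0.1, rng=rng) for _ in range(5)]

# Discriminative step: rank candidates by their score against the target.
ranked = sorted(candidates, key=lambda c: cosine(c, target_protein), reverse=True)
```

In a trained contrastive model, peptide and target embeddings would first pass through separate learned encoders into a shared space before being compared.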
Affiliation(s)
- Suhaas Bhat
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Kalyan Palepu
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Lauren Hong
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Joey Mao
- Department of Cell Biology, Duke University, Durham, NC, USA
| | - Tianzheng Ye
- Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
| | - Rema Iyer
- Cancer Genome and Epigenetics Program, Sanford Burnham Prebys Institute, San Diego, CA, USA
| | - Lin Zhao
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Tianlai Chen
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Sophia Vincoff
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Rio Watson
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Tian Z. Wang
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Divya Srijay
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | | | - Kseniia Kholina
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Shrey Goel
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Pranay Vure
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
| | - Aniruddha J. Deshpande
- Cancer Genome and Epigenetics Program, Sanford Burnham Prebys Institute, San Diego, CA, USA
| | | | - Matthew P. DeLisa
- Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
- Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
- Cornell Institute of Biotechnology, Cornell University, Ithaca, NY, USA
| | - Pranam Chatterjee
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Department of Computer Science, Duke University, Durham, NC, USA
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| |
|
12
|
Guan C, Fernandes FC, Franco OL, de la Fuente-Nunez C. Leveraging large language models for peptide antibiotic design. Cell Rep Phys Sci 2025; 6:102359. [PMID: 39949833 PMCID: PMC11823563 DOI: 10.1016/j.xcrp.2024.102359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2025]
Abstract
Large language models (LLMs) have significantly impacted various domains of our society, including recent applications in complex fields such as biology and chemistry. These models, built on sophisticated neural network architectures and trained on extensive datasets, are powerful tools for designing, optimizing, and generating molecules. This review explores the role of LLMs in discovering and designing antibiotics, focusing on peptide molecules. We highlight advancements in drug design and outline the challenges of applying LLMs in these areas.
Affiliation(s)
- Changge Guan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors contributed equally
| | - Fabiano C. Fernandes
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
- Departamento de Ciência da Computação, Instituto Federal de Brasília, Campus Taguatinga, Brasília, Brazil
- These authors contributed equally
| | - Octavio L. Franco
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
- S-Inova Biotech, Programa de Pós-Graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, Brazil
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
| |
|
13
|
Gagoski D, Rube HT, Rastogi C, Melo LAN, Li X, Voleti R, Shah NH, Bussemaker HJ. Accurate sequence-to-affinity models for SH2 domains from multi-round peptide binding assays coupled with free-energy regression. bioRxiv 2025:2024.12.23.630085. [PMID: 39764007 PMCID: PMC11703206 DOI: 10.1101/2024.12.23.630085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2025]
Abstract
Short linear peptide motifs play important roles in cell signaling. They can act as modification sites for enzymes and as recognition sites for peptide binding domains. SH2 domains bind specifically to tyrosine-phosphorylated proteins, with the affinity of the interaction depending strongly on the flanking sequence. Quantifying this sequence specificity is critical for deciphering phosphotyrosine-dependent signaling networks. In recent years, protein display technologies and deep sequencing have allowed researchers to profile SH2 domain binding across thousands of candidate ligands. Here, we present a concerted experimental and computational strategy that improves the predictive power of SH2 specificity profiling. Through multi-round affinity selection and deep sequencing with large randomized phosphopeptide libraries, we produce suitable data to train an additive binding free energy model that covers the full theoretical ligand sequence space. Our models can be used to predict signaling network connectivity and the impact of missense variants in phosphoproteins on SH2 binding.
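An additive binding free energy model, as described in this abstract, scores a ligand by summing independent per-position contributions and converts the total into a relative affinity through a Boltzmann factor. The sketch below shows that arithmetic only. The two-position table, its energy values, and the function names are hypothetical; the models in the paper are fitted to sequencing data and cover far more positions.

```python
import math

RT = 0.593  # kcal/mol, approximately R*T at 298 K

# Hypothetical per-position penalties (kcal/mol) relative to the best residue.
ENERGY = [
    {"E": 0.0, "A": 0.8},  # first flanking position (illustrative)
    {"I": 0.0, "A": 1.2},  # second flanking position (illustrative)
]


def ddg(ligand):
    """Total free-energy penalty: the sum of independent position terms."""
    return sum(ENERGY[i][aa] for i, aa in enumerate(ligand))


def relative_affinity(ligand):
    """Affinity relative to the optimal ligand, exp(-ddG / RT)."""
    return math.exp(-ddg(ligand) / RT)


for ligand in ("EI", "EA", "AA"):
    print(ligand, round(ddg(ligand), 2), relative_affinity(ligand))
```

Because the terms are additive, a table with one row per position and one entry per residue is enough to score every possible ligand of that length, which is what allows such a model to cover the full sequence space.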
Affiliation(s)
- Dejan Gagoski
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Chemistry, Columbia University, New York, NY, USA
| | - H. Tomas Rube
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Applied Mathematics, University of California-Merced, Merced, CA, USA
| | - Chaitanya Rastogi
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Lucas A. N. Melo
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Xiaoting Li
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Rashmi Voleti
- Department of Chemistry, Columbia University, New York, NY, USA
| | - Neel H. Shah
- Department of Chemistry, Columbia University, New York, NY, USA
| | - Harmen J. Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| |
|
14
|
Sun X, Wu Z, Su J, Li C. GraphPBSP: Protein binding site prediction based on Graph Attention Network and pre-trained model ProstT5. Int J Biol Macromol 2024; 282:136933. [PMID: 39471921 DOI: 10.1016/j.ijbiomac.2024.136933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 10/21/2024] [Accepted: 10/24/2024] [Indexed: 11/01/2024]
Abstract
Protein-protein/peptide interactions play crucial roles in various biological processes, and exploring them attracts wide attention. However, accurately predicting their binding sites remains a challenging task. Here, we develop an effective model, GraphPBSP, based on a Graph Attention Network with a Convolutional Neural Network and Multilayer Perceptron for protein-protein/peptide binding site prediction. It utilizes various feature types derived from protein sequence and structure, including an interface residue pairwise propensity developed by us and sequence embeddings obtained from a new pre-trained model, ProstT5, alongside physicochemical properties and structural features. To the best of our knowledge, this is the first use of ProstT5 sequence embeddings and residue pairwise propensity for protein-protein/peptide binding site prediction. Additionally, we propose a spatial neighbor-based feature statistic method for effectively considering key spatially neighboring information, which significantly improves the model's prediction ability. For model training, a multi-scale objective function is constructed, which enhances the learning capability across samples of the same or different classes. On multiple protein-protein/peptide binding site test sets, GraphPBSP outperforms currently available state-of-the-art methods. Additionally, its performance on protein-DNA/RNA binding site test sets demonstrates good generalization ability. In conclusion, GraphPBSP is a promising method that can offer valuable information for protein engineering and drug design.
Affiliation(s)
- Xiaohan Sun
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Zhixiang Wu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Jingjie Su
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Chunhua Li
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China.
| |
|
15
|
Huang J, Li W, Xiao B, Zhao C, Zheng H, Li Y, Wang J. PepCA: Unveiling protein-peptide interaction sites with a multi-input neural network model. iScience 2024; 27:110850. [PMID: 39391726 PMCID: PMC11465048 DOI: 10.1016/j.isci.2024.110850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 06/13/2024] [Accepted: 08/27/2024] [Indexed: 10/12/2024]
Abstract
The protein-peptide interaction plays a pivotal role in fields such as drug development, yet remains underexplored experimentally and challenging to model computationally. Herein, we introduce PepCA, a sequence-based approach for predicting peptide-binding sites on proteins. A primary obstacle in predicting peptide-protein interactions is the difficulty in acquiring precise protein structures, coupled with the uncertainty of polypeptide configurations. To address this, we first encode protein sequences using the Evolutionary Scale Modeling 2 (ESM-2) pre-trained model to extract latent structural information. Additionally, we have developed a multi-input coattention mechanism to concurrently update the encoding of both peptide and protein residues. PepCA integrates this module within an encoder-decoder structure. This model's high precision in identifying binding sites significantly advances the field of computational biology, offering vital insights for peptide drug development and protein science.
Affiliation(s)
- Junxiong Huang
- iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
- iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
| | - Weikang Li
- iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
- iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
| | - Bin Xiao
- iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
- iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
| | - Chunqing Zhao
- iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
- iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
| | - Hancheng Zheng
- iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
- Shenzhen Digital Life Institute, Shenzhen, Guangdong, China
| | - Yingrui Li
- iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
- Faculty of Health and Medical Sciences, University of Surrey, Guildford, Surrey, UK
- Shenzhen Digital Life Institute, Shenzhen, Guangdong, China
- iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
| | - Jun Wang
- iCarbonX (Zhuhai) Company Limited, Zhuhai, Guangdong, China
- State Key Laboratory of Quality Research in Chinese Medicine, Macau University of Science and Technology, Taipa, Macau, China
- Shenzhen Digital Life Institute, Shenzhen, Guangdong, China
- iCarbonX (Shenzhen) Pharmaceutical Technology Co, Shenzhen, Guangdong, China
| |
|
16
|
Shafiee S, Fathi A, Taherzadeh G. DP-site: A dual deep learning-based method for protein-peptide interaction site prediction. Methods 2024; 229:17-29. [PMID: 38871095 DOI: 10.1016/j.ymeth.2024.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 04/22/2024] [Accepted: 06/01/2024] [Indexed: 06/15/2024]
Abstract
BACKGROUND: Protein-peptide interaction prediction is an important topic for several applications, including understanding biological processes, protein function, and abnormal cellular behaviors, as well as drug discovery and disease treatment. Over the years, studies have shown that experimental methods have improved the identification of this bio-molecular interaction. However, identifying protein-peptide interactions with these methods is laborious, time-consuming, dependent on third-party tools, and costly. METHOD: To address these drawbacks, this study introduces a computational framework called DP-Site. The proposed framework combines a dual pipeline with a combination predictor. Pipeline 1 embeds a deep convolutional neural network for feature extraction and classification. Pipeline 2 includes a deep long short-term memory (LSTM)-based model and a random forest classifier for feature extraction and classification. In this investigation, the evolutionary, structure-based, sequence-based, and physicochemical information of proteins is utilized for identifying protein-peptide interactions at the residue level. RESULTS: The proposed method is evaluated using both ten-fold cross-validation and independent test sets. The robust and consistent results between cross-validation and independent test sets confirm the ability of the proposed method to predict peptide-binding residues in proteins. Moreover, experimental findings demonstrate that DP-Site significantly outperforms other state-of-the-art sequence-based and structure-based methods. The proposed method achieves a balance between a specificity of 0.799 and a sensitivity of 0.770, along with the best F-measure of 0.661 and the highest precision of 0.580 on an independent test set.
CONCLUSIONS: The outcomes of various experiments confirm the proficiency of the proposed method, which outperforms state-of-the-art sequence-based and structure-based methods in terms of the mentioned criteria. DP-Site can be accessed at https://github.com/shafiee95/shima.shafiee.DP-Site.
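The reported metrics can be cross-checked against one another. Assuming the F-measure here is the usual F1 score, that is, the harmonic mean of precision and sensitivity (an assumption, since the abstract does not define it), the short calculation below recomputes it from the two reported values.

```python
def f_measure(precision, recall):
    """F1 score: the harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the independent test set.
precision = 0.580
sensitivity = 0.770

f1 = f_measure(precision, sensitivity)
print(f"F-measure recomputed from precision and sensitivity: {f1:.4f}")
```

The result is roughly 0.66, which agrees with the reported F-measure of 0.661 to within the rounding of the inputs.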
Affiliation(s)
- Shima Shafiee
- Department of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran.
| | - Abdolhossein Fathi
- Department of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran.
| | - Ghazaleh Taherzadeh
- Department of Math, Physics, and Computer Science, Wilkes University, Pennsylvania, USA.
| |
|
17
|
Sun X, Wu Z, Su J, Li C. A deep attention model for wide-genome protein-peptide binding affinity prediction at a sequence level. Int J Biol Macromol 2024; 276:133811. [PMID: 38996881 DOI: 10.1016/j.ijbiomac.2024.133811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 07/09/2024] [Accepted: 07/09/2024] [Indexed: 07/14/2024]
Abstract
Peptides are pivotal in numerous biological activities, engaging in up to 40% of protein-protein interactions in many cellular processes. Due to their exceptional specificity and effectiveness, peptides have emerged as promising candidates for drug design. However, accurately predicting protein-peptide binding affinity remains challenging. To address this problem, we develop a prediction model, PepPAP, based on a convolutional neural network and multi-head attention, which relies solely on sequence features. These features include physicochemical properties, intrinsic disorder, sequence encoding, and especially interface propensity, which is extracted from 16,689 non-redundant protein-peptide complexes. Notably, the regression stratification cross-validation scheme proposed in our previous work helps improve predictions for cases with extreme binding affinity values. On three benchmark test datasets (T100 and series of peptides targeting the PDZ domain and CXCR4), PepPAP shows excellent performance, outperforming existing methods and demonstrating good generalization ability. Furthermore, PepPAP achieves good results in binary interaction prediction, and visualization of the feature space distribution highlights its effectiveness. To the best of our knowledge, PepPAP is the first sequence-based deep attention model for wide-genome protein-peptide binding affinity prediction, and it holds the potential to offer valuable insights for peptide-based drug design.
Affiliation(s)
- Xiaohan Sun
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Zhixiang Wu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Jingjie Su
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
| | - Chunhua Li
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China.
| |
|
18
|
Yadalam PK, Ramadoss R, Anegundi RV. HyperAttention and Linformer-Based β-catenin Sequence Prediction For Bone Formation. Cureus 2024; 16:e68849. [PMID: 39376879 PMCID: PMC11456985 DOI: 10.7759/cureus.68849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2024] [Accepted: 09/07/2024] [Indexed: 10/09/2024]
Abstract
Introduction: Beta (β)-catenin, a pivotal protein in bone development and homeostasis, is implicated in various bone disorders. Peptide-based therapeutics offer a promising approach due to their specificity and potential for reduced side effects. Attention networks, specifically sequence-to-sequence models, are widely used for peptide sequence prediction. Hence, the current study aims to develop a HyperAttention and Linformer-based β-catenin sequence prediction for bone formation. Methods: β-catenin protein sequences were downloaded and quality-checked using UniProt and FASTA sequences using DeepBio (Deep Bio Inc., Seoul, South Korea) for predictive analysis. Data were analyzed for duplicates, outliers, and missing values. The data were then split into training and testing sets, with 80% used for training and 20% for testing, and peptide sequences were encoded and subjected to algorithms. Results: The HyperAttention and Linformer models perform well in sequence prediction, with HyperAttention correctly predicting 87% of instances and Linformer predicting 89%. Both models have high sensitivity and specificity, with Linformer correctly identifying 91% of negative instances and showing slightly better sensitivity. Conclusion: The HyperAttention and Linformer models effectively predict peptide sequences with high specificity and sensitivity. Further optimization and development are needed for optimal application and balance between positive and negative instances.
Affiliation(s)
- Pradeep Kumar Yadalam
- Periodontics, Saveetha Dental College, Saveetha Institute of Medical and Technical Sciences (SIMATS) Deemed University, Chennai, IND
| | - Ramya Ramadoss
- Oral Pathology and Oral Biology, Saveetha Dental College, Saveetha Institute of Medical and Technical Sciences (SIMATS) Deemed University, Chennai, IND
| | - Raghavendra Vamsi Anegundi
- Periodontics, Saveetha Dental College, Saveetha Institute of Medical and Technical Sciences (SIMATS) Deemed University, Chennai, IND
| |
|
19
|
Chen T, Dumas M, Watson R, Vincoff S, Peng C, Zhao L, Hong L, Pertsemlidis S, Shaepers-Cheu M, Wang TZ, Srijay D, Monticello C, Vure P, Pulugurta R, Kholina K, Goel S, DeLisa MP, Truant R, Aguilar HC, Chatterjee P. PepMLM: Target Sequence-Conditioned Generation of Therapeutic Peptide Binders via Span Masked Language Modeling. arXiv 2024:arXiv:2310.03842v3. [PMID: 37873004 PMCID: PMC10593082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Target proteins that lack accessible binding pockets and conformational stability have posed increasing challenges for drug development. Induced proximity strategies, such as PROTACs and molecular glues, have thus gained attention as pharmacological alternatives, but still require small molecule docking at binding pockets for targeted protein degradation. The computational design of protein-based binders presents unique opportunities to access "undruggable" targets, but has often relied on stable 3D structures or structure-influenced latent spaces for effective binder generation. In this work, we introduce PepMLM, a target sequence-conditioned generator of de novo linear peptide binders. By employing a novel span masking strategy that uniquely positions cognate peptide sequences at the C-terminus of target protein sequences, PepMLM fine-tunes the state-of-the-art ESM-2 pLM to fully reconstruct the binder region, achieving low perplexities matching or improving upon validated peptide-protein sequence pairs. After successful in silico benchmarking with AlphaFold-Multimer, outperforming RFDiffusion on structured targets, we experimentally verify PepMLM's efficacy via fusion of model-derived peptides to E3 ubiquitin ligase domains, demonstrating endogenous degradation of emergent viral phosphoproteins and Huntington's disease-driving proteins. In total, PepMLM enables the generative design of candidate binders to any target protein, without the requirement of target structure, empowering downstream therapeutic applications.
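The span masking setup described in this abstract can be pictured as follows: the peptide is appended to the C-terminal end of the target sequence, and every peptide position is replaced by a mask token that the model must reconstruct. The sketch below builds such an input and its labels at the residue level. The `<mask>` string, the function name, and both toy sequences are assumptions for illustration; a real setup would use the language model's own tokenizer and special tokens.

```python
MASK = "<mask>"  # placeholder mask token; the real tokenizer defines its own


def mask_binder_span(target, peptide):
    """Append the peptide after the target and mask the whole peptide span.

    Returns (tokens, labels): labels are None for target positions, which are
    not scored, and the true residue for each masked peptide position.
    """
    tokens = list(target) + [MASK] * len(peptide)
    labels = [None] * len(target) + list(peptide)
    return tokens, labels


# Hypothetical toy sequences, not taken from the paper.
target = "MSTNPKPQRK"
peptide = "WLEY"

tokens, labels = mask_binder_span(target, peptide)
```

At generation time the same layout applies, except that the masked span is filled in by the model rather than compared against known labels.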
Affiliation(s)
- Tianlai Chen
  - Department of Biomedical Engineering, Duke University
- Madeleine Dumas
  - Department of Microbiology and Immunology, College of Veterinary Medicine, Cornell University
  - Department of Microbiology, College of Agriculture and Life Sciences, Cornell University
- Rio Watson
  - Department of Biomedical Engineering, Duke University
- Christina Peng
  - Department of Biochemistry and Biomedical Sciences, McMaster University
- Lin Zhao
  - Department of Biomedical Engineering, Duke University
- Lauren Hong
  - Department of Biomedical Engineering, Duke University
- Mayumi Shaepers-Cheu
  - Department of Microbiology and Immunology, College of Veterinary Medicine, Cornell University
- Tian Zi Wang
  - Department of Biomedical Engineering, Duke University
- Divya Srijay
  - Department of Biomedical Engineering, Duke University
- Connor Monticello
  - Department of Biochemistry and Biomedical Sciences, McMaster University
- Pranay Vure
  - Department of Biomedical Engineering, Duke University
- Shrey Goel
  - Department of Biomedical Engineering, Duke University
- Matthew P. DeLisa
  - Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
  - Robert F. Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY, USA
  - Cornell Institute of Biotechnology, Cornell University, Ithaca, NY, USA
- Ray Truant
  - Department of Biochemistry and Biomedical Sciences, McMaster University
- Hector C. Aguilar
  - Department of Microbiology and Immunology, College of Veterinary Medicine, Cornell University
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University
  - Department of Computer Science, Duke University
  - Department of Biostatistics and Bioinformatics, Duke University
20
Wu J, Wang Y, Cai W, Chen D, Peng X, Dong H, Li J, Liu H, Shi S, Tang S, Li Z, Sui H, Wang Y, Wu C, Zhang Y, Fu X, Yin Y. Ribosomal translation of fluorinated non-canonical amino acids for de novo biologically active fluorinated macrocyclic peptides. Chem Sci 2024:d4sc04061a. [PMID: 39129776 PMCID: PMC11310889 DOI: 10.1039/d4sc04061a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Accepted: 07/25/2024] [Indexed: 08/13/2024] Open
Abstract
Fluorination has emerged as a promising strategy in medicinal chemistry to improve the pharmacological profiles of drug candidates. Similarly, incorporating fluorinated non-canonical amino acids into macrocyclic peptides expands chemical diversity and enhances their pharmacological properties, from improved metabolic stability to enhanced cell permeability and target interactions. However, only a limited number of fluorinated non-canonical amino acids, which are canonical amino acid analogs, have been incorporated into macrocyclic peptides by ribosomes for de novo construction and target-based screening of fluorinated macrocyclic peptides. In this study, we report the ribosomal translation of a series of distinct fluorinated non-canonical amino acids, including mono- to tri-fluorinated variants, as well as fluorinated L-amino acids, D-amino acids, β-amino acids, etc. This enabled the de novo discovery of fluorinated macrocyclic peptides with high affinity for EphA2, and particularly the identification of those exhibiting broad-spectrum activity against Gram-negative bacteria by targeting the BAM complex. This study not only expands the scope of ribosomally translatable fluorinated amino acids but also underscores the versatility of fluorinated macrocyclic peptides as potent therapeutic agents.
Affiliation(s)
- Junjie Wu
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
- Yuchan Wang
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Wenfeng Cai
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
- Danyan Chen
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Xiangda Peng
  - Shanghai Zelixir Biotech Company Ltd Shanghai 200030 China
- Huilei Dong
  - College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 China
- Jinjing Li
  - College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 China
- Hongtan Liu
  - College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 China
- Shuting Shi
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Sen Tang
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Zhifeng Li
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
- Haiyan Sui
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
- Yan Wang
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Chuanliu Wu
  - College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 China
- Youming Zhang
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
- Xinmiao Fu
  - College of Life Sciences, Fujian Normal University Fuzhou 350117 China
- Yizhen Yin
  - State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University Qingdao 266237 China
  - Shandong Research Institute of Industrial Technology Jinan 250101 China
21
Chen T, Zhang Y, Chatterjee P. moPPIt: De Novo Generation of Motif-Specific Binders with Protein Language Models. bioRxiv 2024:2024.07.31.606098. [PMID: 39131360 PMCID: PMC11312608 DOI: 10.1101/2024.07.31.606098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
The ability to precisely target specific motifs on disease-related proteins, whether conserved epitopes on viral proteins, intrinsically disordered regions within transcription factors, or breakpoint junctions in fusion oncoproteins, is essential for modulating their function while minimizing off-target effects. Current methods struggle to achieve this specificity without reliable structural information. In this work, we introduce a motif-specific PPI targeting algorithm, moPPIt, for de novo generation of motif-specific peptide binders from the target protein sequence alone. At the core of moPPIt is BindEvaluator, a transformer-based model that interpolates protein language model embeddings of two proteins via a series of multi-headed self-attention blocks, with a key focus on local motif features. Trained on over 510,000 annotated PPIs, BindEvaluator accurately predicts target binding sites given protein-protein sequence pairs with a test AUC > 0.94, improving to AUC > 0.96 when fine-tuned on peptide-protein pairs. By combining BindEvaluator with our PepMLM peptide generator and genetic algorithm-based optimization, moPPIt generates peptides that bind specifically to user-defined residues on target proteins. We demonstrate moPPIt's efficacy in computationally designing binders to specific motifs, first on targets with known binding peptides and then extending to structured and disordered targets with no known binders. In total, moPPIt serves as a powerful tool for developing highly specific peptide therapeutics without relying on target structure or structure-dependent latent spaces.
Affiliation(s)
- Tong Chen
  - Department of Biomedical Engineering, Duke University
- Yinuo Zhang
  - Department of Biostatistics and Bioinformatics, Duke University
- Pranam Chatterjee
  - Department of Biomedical Engineering, Duke University
  - Department of Biostatistics and Bioinformatics, Duke University
  - Department of Computer Science, Duke University
22
Yuan Q, Tian C, Song Y, Ou P, Zhu M, Zhao H, Yang Y. GPSFun: geometry-aware protein sequence function predictions with language models. Nucleic Acids Res 2024; 52:W248-W255. [PMID: 38738636 PMCID: PMC11223820 DOI: 10.1093/nar/gkae381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 04/22/2024] [Accepted: 04/26/2024] [Indexed: 05/14/2024] Open
Abstract
Knowledge of protein function is essential for elucidating disease mechanisms and discovering new drug targets. However, there is a widening gap between the exponential growth of protein sequences and their limited function annotations. In our prior studies, we have developed a series of methods including GraphPPIS, GraphSite, LMetalSite and SPROF-GO for protein function annotations at residue or protein level. To further enhance their applicability and performance, we now present GPSFun, a versatile web server for Geometry-aware Protein Sequence Function annotations, which equips our previous tools with language models and geometric deep learning. Specifically, GPSFun employs large language models to efficiently predict 3D conformations of the input protein sequences and extract informative sequence embeddings. Subsequently, geometric graph neural networks are utilized to capture the sequence and structure patterns in the protein graphs, facilitating various downstream predictions including protein-ligand binding sites, gene ontologies, subcellular locations and protein solubility. Notably, GPSFun achieves superior performance to state-of-the-art methods across diverse tasks without requiring multiple sequence alignments or experimental protein structures. GPSFun is freely available to all users at https://bio-web1.nscc-gz.cn/app/GPSFun with user-friendly interfaces and rich visualizations.
Affiliation(s)
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Chong Tian
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Yidong Song
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Peihua Ou
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Mingming Zhu
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Huiying Zhao
  - Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
23
Zhu C, Zhang C, Shang T, Zhang C, Zhai S, Cao L, Xu Z, Su Z, Song Y, Su A, Li C, Duan H. GAPS: a geometric attention-based network for peptide binding site identification by the transfer learning approach. Brief Bioinform 2024; 25:bbae297. [PMID: 38990514 PMCID: PMC11238429 DOI: 10.1093/bib/bbae297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 04/28/2024] [Accepted: 06/07/2024] [Indexed: 07/12/2024] Open
Abstract
Protein-peptide interactions (PPepIs) are vital to understanding cellular functions, which can facilitate the design of novel drugs. As an essential component in forming a PPepI, protein-peptide binding sites are the basis for understanding the mechanisms involved in PPepIs. Therefore, accurately identifying protein-peptide binding sites becomes a critical task. The traditional experimental methods for researching these binding sites are labor-intensive and time-consuming, and some computational tools have been invented to supplement them. However, these computational tools have limitations in generality or accuracy due to the need for ligand information, complex feature construction, or their reliance on modeling based on amino acid residues. To deal with the drawbacks of these computational algorithms, we describe a geometric attention-based network for peptide binding site identification (GAPS) in this work. The proposed model utilizes geometric feature engineering to construct atom representations and incorporates multiple attention mechanisms to update relevant biological features. In addition, a transfer learning strategy leverages protein-protein binding site information to enhance the recognition of protein-peptide binding sites, taking into account the common structure and biological bias between proteins and peptides. Consequently, GAPS demonstrates state-of-the-art performance and excellent robustness in this task. Moreover, our model exhibits exceptional performance across several extended experiments, including predicting apo protein-peptide, protein-cyclic peptide, and AlphaFold-predicted protein-peptide binding sites. These results confirm that the GAPS model is a powerful, versatile, stable method suitable for diverse binding site predictions.
Affiliation(s)
- Cheng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Chengyun Zhang
  - AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
- Tianfeng Shang
  - AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
- Chenhao Zhang
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Silong Zhai
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Lujing Cao
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Zhenyu Xu
  - AI Department, Shanghai Highslab Therapeutics. Inc, Zhangheng Road, Pudong New Area, Shanghai 201203, China
- Zhihao Su
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Ying Song
  - College of Pharmaceutical Sciences, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- An Su
  - College of Chemical Engineering, Zhejiang University of Technology, Chaowang Road, Gongshu District, Hangzhou 310014, China
- Chengxi Li
  - College of Chemical and Biological Engineering, Zhejiang University, Yuhangtang Road, Xihu District, Hangzhou 310027, China
- Hongliang Duan
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
24
Yin S, Mi X, Shukla D. Leveraging machine learning models for peptide-protein interaction prediction. RSC Chem Biol 2024; 5:401-417. [PMID: 38725911 PMCID: PMC11078210 DOI: 10.1039/d3cb00208j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 02/07/2024] [Indexed: 05/12/2024] Open
Abstract
Peptides play a pivotal role in a wide range of biological activities by participating in up to 40% of protein-protein interactions in cellular processes. They also demonstrate remarkable specificity and efficacy, making them promising candidates for drug development. However, predicting peptide-protein complexes by traditional computational approaches, such as docking and molecular dynamics simulations, remains a challenge due to high computational cost, the flexible nature of peptides, and limited structural information on peptide-protein complexes. In recent years, the surge of available biological data has given rise to the development of an increasing number of machine learning models for predicting peptide-protein interactions. These models offer efficient solutions to address the challenges associated with traditional computational approaches. Furthermore, they offer enhanced accuracy, robustness, and interpretability in their predictive outcomes. This review presents a comprehensive overview of machine learning and deep learning models that have emerged in recent years for the prediction of peptide-protein interactions.
Affiliation(s)
- Song Yin
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign Urbana 61801 Illinois USA
- Xuenan Mi
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign Urbana IL 61801 USA
- Diwakar Shukla
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign Urbana 61801 Illinois USA
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign Urbana IL 61801 USA
  - Department of Bioengineering, University of Illinois Urbana-Champaign Urbana IL 61801 USA
25
Yuan Q, Tian C, Yang Y. Genome-scale annotation of protein binding sites via language model and geometric deep learning. eLife 2024; 13:RP93695. [PMID: 38630609 PMCID: PMC11023698 DOI: 10.7554/elife.93695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024] Open
Abstract
Revealing protein binding sites with other molecules, such as nucleic acids, peptides, or small ligands, sheds light on disease mechanism elucidation and novel drug design. With the explosive growth of proteins in sequence databases, how to accurately and efficiently identify these binding sites from sequences becomes essential. However, current methods mostly rely on expensive multiple sequence alignments or experimental protein structures, limiting their genome-scale applications. Besides, these methods haven't fully explored the geometry of the protein structures. Here, we propose GPSite, a multi-task network for simultaneously predicting binding residues of DNA, RNA, peptide, protein, ATP, HEM, and metal ions on proteins. GPSite was trained on informative sequence embeddings and predicted structures from protein language models, while comprehensively extracting residual and relational geometric contexts in an end-to-end manner. Experiments demonstrate that GPSite substantially surpasses state-of-the-art sequence-based and structure-based approaches on various benchmark datasets, even when the structures are not well-predicted. The low computational cost of GPSite enables rapid genome-scale binding residue annotations for over 568,000 sequences, providing opportunities to unveil unexplored associations of binding sites with molecular functions, biological processes, and genetic variants. The GPSite webserver and annotation database can be freely accessed at https://bio-web1.nscc-gz.cn/app/GPSite.
Affiliation(s)
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Chong Tian
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
26
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
27
Yin S, Mi X, Shukla D. Leveraging Machine Learning Models for Peptide-Protein Interaction Prediction. arXiv 2024:arXiv:2310.18249v2. [PMID: 37961736 PMCID: PMC10635286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Peptides play a pivotal role in a wide range of biological activities by participating in up to 40% of protein-protein interactions in cellular processes. They also demonstrate remarkable specificity and efficacy, making them promising candidates for drug development. However, predicting peptide-protein complexes by traditional computational approaches, such as docking and molecular dynamics simulations, remains a challenge due to high computational cost, the flexible nature of peptides, and limited structural information on peptide-protein complexes. In recent years, the surge of available biological data has given rise to the development of an increasing number of machine learning models for predicting peptide-protein interactions. These models offer efficient solutions to address the challenges associated with traditional computational approaches. Furthermore, they offer enhanced accuracy, robustness, and interpretability in their predictive outcomes. This review presents a comprehensive overview of machine learning and deep learning models that have emerged in recent years for the prediction of peptide-protein interactions.
Affiliation(s)
- Song Yin
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
  - These authors contributed to the work equally
- Xuenan Mi
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
  - These authors contributed to the work equally
- Diwakar Shukla
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
  - Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
  - Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
28
Vottero P, Olivetti EC, D'Agostino LC, Di Grazia L, Vezzetti E, Aminpour M, Tuszynski JA, Marcolin F. Understanding the contagiousness of Covid-19 strains: A geometric approach. J Mol Graph Model 2024; 126:108670. [PMID: 37984193 DOI: 10.1016/j.jmgm.2023.108670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/22/2023]
Abstract
Protein-protein interaction occurs on surface patches with some degree of complementary geometric and chemical features. Building on this understanding, this study endeavors to characterize the spike protein of the SARS-CoV-2 virus at the morphological and geometrical levels in its Alpha, Delta, and Omicron variants. In particular, the affinity between different SARS-CoV-2 spike proteins and the ACE2 receptor present on the membrane of the human respiratory system cells is investigated. To achieve an adequate degree of geometrical accuracy, the 3D depth maps of the proteins under examination are filtered by developing an ad-hoc convolutional filter with a kernel implemented as a sphere of varying radius, simulating a ball rolling on the surface (similar to the 'rolling ball' filter). This ball ideally models a hypothetical molecule that could interface with the protein and is inspired by the geometric approach to macromolecule-ligand interactions proposed by Kuntz et al. in 1982. The aim is to mitigate the imperfections and to obtain a smoother surface that could be studied from a geometrical perspective for binding purposes. A set of geometric descriptors, borrowed from the 3D face analysis context, is then mapped point-by-point onto protein depth maps. Following a feature extraction phase inspired by Histogram of Oriented Gradients and Local Binary Patterns, the final histogram features are used as input for a Support Vector Machine classifier to automatically classify the proteins according to their surface affinity, where a similarity in shape is observed between ACE2 and the spike protein of the SARS-CoV-2 Omicron variant. Finally, Root Mean Square Error analysis is used to quantify the geometrical affinity between the ACE2 receptor and the respective Receptor Binding Domains of the three SARS-CoV-2 variants, culminating in a geometrical explanation for the higher contagiousness of Omicron relative to the other variants under study.
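The final step of the abstract above, a Root Mean Square Error comparison between surfaces, can be illustrated on two equally sized depth maps. This is a minimal sketch: the abstract does not specify how the maps are aligned or normalized, so the point-by-point comparison and the `rmse` helper below are assumptions made for illustration.

```python
import math


def rmse(depth_a, depth_b):
    """Root mean square error between two 2D depth maps of identical shape."""
    if len(depth_a) != len(depth_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(depth_a, depth_b)
    ):
        raise ValueError("depth maps must have the same shape")
    squared = [
        (a - b) ** 2
        for row_a, row_b in zip(depth_a, depth_b)
        for a, b in zip(row_a, row_b)
    ]
    if not squared:
        raise ValueError("depth maps must not be empty")
    return math.sqrt(sum(squared) / len(squared))


# Toy 2x2 maps: squared differences are 9, 16, 0, 0, so the mean is 6.25.
flat = [[0.0, 0.0], [0.0, 0.0]]
bumpy = [[3.0, 4.0], [0.0, 0.0]]
print(rmse(flat, bumpy))  # 2.5
```

A lower value means the two surfaces are geometrically closer under this point-by-point measure.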
Affiliation(s)
- Paola Vottero
  - Department of Biomedical Engineering, University of Alberta, Edmonton, AB, T6G 2V2, Canada
- Elena Carlotta Olivetti
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Lucia Chiara D'Agostino
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Luca Di Grazia
  - Department of Computer Science, University of Stuttgart, Universitätsstr. 38, 70569, Stuttgart, Germany
- Enrico Vezzetti
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Maral Aminpour
  - Department of Biomedical Engineering, University of Alberta, Edmonton, AB, T6G 2V2, Canada
- Jacek Adam Tuszynski
  - Department of Physics, University of Alberta, Edmonton, AB, T6G 2H7, Canada
  - Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
  - Department of Data Science and Engineering, The Silesian University of Technology, Gliwice, Poland
- Federica Marcolin
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
29
Zhang Z, Verburgt J, Kagaya Y, Christoffer C, Kihara D. Improved Peptide Docking with Privileged Knowledge Distillation using Deep Learning. bioRxiv 2023:2023.12.01.569671. [PMID: 38106114 PMCID: PMC10723353 DOI: 10.1101/2023.12.01.569671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Protein-peptide interactions play a key role in biological processes. Understanding the interactions that occur within a receptor-peptide complex can help in discovering and altering their biological functions. Various computational methods for modeling the structures of receptor-peptide complexes have been developed. Recently, accurate structure prediction enabled by deep learning methods has significantly advanced the field of structural biology. AlphaFold (AF) is among the top-performing structure prediction methods and has highly accurate structure modeling performance on single-chain targets. Shortly after the release of AlphaFold, AlphaFold-Multimer (AFM) was developed in a similar fashion to AF for prediction of protein complex structures. AFM has achieved competitive performance in modeling protein-peptide interactions compared to previous computational methods; however, further improvement is still needed. Here, we present DistPepFold, which improves protein-peptide complex docking using an AFM-based architecture through a privileged knowledge distillation approach. DistPepFold leverages a teacher model that uses native interaction information during training and transfers its knowledge to a student model through a teacher-student distillation process. We evaluated DistPepFold's docking performance on two protein-peptide complex datasets and showed that DistPepFold outperforms AFM. Furthermore, we demonstrate that the student model was able to learn from the teacher model to make structural improvements based on AFM predictions.
Affiliation(s)
- Zicong Zhang
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
- Jacob Verburgt
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
- Yuki Kagaya
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
- Charles Christoffer
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
- Daisuke Kihara
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, 47907, USA
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, 47907, USA
30
Chandra A, Sharma A, Dehzangi I, Tsunoda T, Sattar A. PepCNN deep learning tool for predicting peptide binding residues in proteins using sequence, structural, and language model features. Sci Rep 2023; 13:20882. [PMID: 38016996 PMCID: PMC10684570 DOI: 10.1038/s41598-023-47624-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/07/2023] [Accepted: 11/16/2023] [Indexed: 11/30/2023] Open
Abstract
Protein-peptide interactions play a crucial role in various cellular processes and are implicated in abnormal cellular behaviors leading to diseases such as cancer. Therefore, understanding these interactions is vital for both functional genomics and drug discovery efforts. Despite a significant increase in the availability of protein-peptide complexes, experimental methods for studying these interactions remain laborious, time-consuming, and expensive. Computational methods offer a complementary approach but often fall short in terms of prediction accuracy. To address these challenges, we introduce PepCNN, a deep learning-based prediction model that incorporates structural and sequence-based information from primary protein sequences. By utilizing a combination of half-sphere exposure, position-specific scoring matrices from a multiple-sequence alignment tool, and embeddings from a pre-trained protein language model, PepCNN outperforms state-of-the-art methods in terms of specificity, precision, and AUC. The PepCNN software and datasets are publicly available at https://github.com/abelavit/PepCNN.git.
Affiliation(s)
- Abel Chandra
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
- Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Iman Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ, USA
- Center for Computational and Integrative Biology, Rutgers University, Camden, USA
- Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Abdul Sattar
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
31
Liu Y, Tian B. Protein-DNA binding sites prediction based on pre-trained protein language model and contrastive learning. Brief Bioinform 2023; 25:bbad488. [PMID: 38171929 PMCID: PMC10782905 DOI: 10.1093/bib/bbad488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 09/28/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024]
Abstract
Protein-DNA interaction is critical for life activities such as replication, transcription and splicing. Identifying protein-DNA binding residues is essential for modeling their interaction and downstream studies. However, developing accurate and efficient computational methods for this task remains challenging. Improvements in this area have the potential to drive novel applications in biotechnology and drug design. In this study, we propose a novel approach called Contrastive Learning And Pre-trained Encoder (CLAPE), which combines a pre-trained protein language model and the contrastive learning method to predict DNA binding residues. We trained the CLAPE-DB model on the protein-DNA binding sites dataset and evaluated the model performance and generalization ability through various experiments. The results showed that the area under the ROC curve of the CLAPE-DB model on the two benchmark datasets reached 0.871 and 0.881, respectively, indicating superior performance compared to other existing models. CLAPE-DB showed better generalization ability and was specific to DNA-binding sites. In addition, we trained CLAPE on different protein-ligand binding sites datasets, demonstrating that CLAPE is a general framework for binding sites prediction. To support the scientific community, the benchmark datasets and codes are freely available at https://github.com/YAndrewL/clape.
Affiliation(s)
- Yufan Liu
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
- Boxue Tian
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
32
Song Y, Yuan Q, Zhao H, Yang Y. Accurately identifying nucleic-acid-binding sites through geometric graph learning on language model predicted structures. Brief Bioinform 2023; 24:bbad360. [PMID: 37824738 DOI: 10.1093/bib/bbad360] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 07/09/2023] [Revised: 09/18/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023]
Abstract
The interactions between nucleic acids and proteins are important in diverse biological processes. The high-quality prediction of nucleic-acid-binding sites continues to pose a significant challenge. Presently, the predictive efficacy of sequence-based methods is constrained by their exclusive consideration of sequence context information, whereas structure-based methods are unsuitable for proteins lacking known tertiary structures. Though protein structures predicted by AlphaFold2 could be used, the extensive computing requirement of AlphaFold2 hinders its use for genome-wide applications. Based on the recent breakthrough of ESMFold for fast prediction of protein structures, we have developed GLMSite, which accurately identifies DNA- and RNA-binding sites using geometric graph learning on ESMFold predicted structures. Here, the predicted protein structures are employed to construct a protein structural graph with residues as nodes and spatially neighboring residue pairs as edges. The node representations are further enhanced through the pre-trained language model ProtTrans. The network was trained using a geometric vector perceptron, and the geometric embeddings were subsequently fed into a common network to acquire common binding characteristics. Finally, these characteristics were input into two fully connected layers to predict binding sites with DNA and RNA, respectively. Through comprehensive tests on DNA/RNA benchmark datasets, GLMSite was shown to surpass the latest sequence-based methods and be comparable with structure-based methods. Moreover, the prediction was shown to be useful for inferring nucleic-acid-binding proteins, demonstrating its potential for protein function discovery. The datasets, codes, and trained models are available at https://github.com/biomed-AI/nucleic-acid-binding.
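As an illustration of the residue-graph idea mentioned in this abstract (residues as nodes, spatially neighboring residue pairs as edges), the following is a minimal sketch and not the GLMSite implementation. The function name `build_residue_graph`, the use of C-alpha coordinates, and the 8.0 Å distance cutoff are all assumptions made for the example; the abstract does not state how neighbors are defined.

```python
import math


def build_residue_graph(ca_coords, cutoff=8.0):
    """Return undirected edges (i, j), i < j, between residues whose
    representative atoms lie within `cutoff` angstroms of each other.

    ca_coords: list of (x, y, z) tuples, one per residue (assumed C-alpha).
    cutoff:    hypothetical neighbor threshold, not taken from the source.
    """
    edges = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            # math.dist computes the Euclidean distance between two points.
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


# Toy coordinates: three residues spaced 3.8 angstroms apart, one far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_edges = build_residue_graph(toy_coords)
```

In this toy case the distant fourth residue receives no edges, so a graph network would pass no messages to it from the other three. The all-pairs loop is quadratic in the number of residues; a spatial index would be a more practical choice for large structures.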
Affiliation(s)
- Yidong Song
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Qianmu Yuan
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Huiying Zhao
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yuedong Yang
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
33
Ghoreyshi ZS, George JT. Quantitative approaches for decoding the specificity of the human T cell repertoire. Front Immunol 2023; 14:1228873. [PMID: 37781387 PMCID: PMC10539903 DOI: 10.3389/fimmu.2023.1228873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2023] [Accepted: 08/17/2023] [Indexed: 10/03/2023]
Abstract
T cell receptor (TCR)-peptide-major histocompatibility complex (pMHC) interactions play a vital role in initiating immune responses against pathogens, and the specificity of TCR-pMHC interactions is crucial for developing optimized therapeutic strategies. The advent of high-throughput immunological and structural evaluation of TCR and pMHC has provided an abundance of data for computational approaches that aim to predict favorable TCR-pMHC interactions. Current models are constructed using information on protein sequence, structures, or a combination of both, and utilize a variety of statistical learning-based approaches for identifying the rules governing specificity. This review examines the current theoretical, computational, and deep learning approaches for identifying TCR-pMHC recognition pairs, placing emphasis on each method's mathematical approach, predictive performance, and limitations.
Affiliation(s)
- Zahra S. Ghoreyshi
- Department of Biomedical Engineering, Texas A&M University, College Station, TX, United States
- Jason T. George
- Department of Biomedical Engineering, Texas A&M University, College Station, TX, United States
- Engineering Medicine Program, Texas A&M University, Houston, TX, United States
- Center for Theoretical Biological Physics, Rice University, Houston, TX, United States
34
McFee M, Kim PM. GDockScore: a graph-based protein-protein docking scoring function. Bioinformatics Advances 2023; 3:vbad072. [PMID: 37359726 PMCID: PMC10290236 DOI: 10.1093/bioadv/vbad072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 05/30/2023] [Accepted: 06/10/2023] [Indexed: 06/28/2023]
Abstract
Summary: Protein complexes play vital roles in a variety of biological processes, such as mediating biochemical reactions, the immune response and cell signalling, with 3D structure specifying function. Computational docking methods provide a means to determine the interface between two complexed polypeptide chains without using time-consuming experimental techniques. The docking process requires the optimal solution to be selected with a scoring function. Here, we propose a novel graph-based deep learning model that utilizes mathematical graph representations of proteins to learn a scoring function (GDockScore). GDockScore was pre-trained on docking outputs generated with the Protein Data Bank biounits and the RosettaDock protocol, and then fine-tuned on HADDOCK decoys generated on the ZDOCK Protein Docking Benchmark. GDockScore performs similarly to the Rosetta scoring function on docking decoys generated using the RosettaDock protocol. Furthermore, state-of-the-art performance is achieved on the CAPRI score set, a challenging dataset for developing docking scoring functions. Availability and implementation: The model implementation is available at https://gitlab.com/mcfeemat/gdockscore. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Matthew McFee
- Department of Molecular Genetics, The University of Toronto, Toronto, ON M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, The University of Toronto, Toronto, ON M5S 3E1, Canada
35
Peng X, Lei Y, Feng P, Jia L, Ma J, Zhao D, Zeng J. Characterizing the interaction conformation between T-cell receptors and epitopes with deep learning. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00634-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
36
Wang X, Ding Z, Wang R, Lin X. Deepro-Glu: combination of convolutional neural network and Bi-LSTM models using ProtBert and handcrafted features to identify lysine glutarylation sites. Brief Bioinform 2023; 24:6991122. [PMID: 36653898 DOI: 10.1093/bib/bbac631] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/23/2022] [Revised: 12/11/2022] [Accepted: 12/28/2022] [Indexed: 01/20/2023]
Abstract
Lysine glutarylation (Kglu) is a newly discovered post-translational modification of proteins with important roles in mitochondrial functions, oxidative damage, etc. The established biological experimental methods to identify glutarylation sites are often time-consuming and costly. Therefore, there is an urgent need to develop computational methods for efficient and accurate identification of glutarylation sites. Most of the existing computational methods only utilize handcrafted features to construct the prediction model and do not consider the positive impact of the pre-trained protein language model on the prediction performance. Based on this, we develop an ensemble deep-learning predictor, Deepro-Glu, that combines a convolutional neural network and a bidirectional long short-term memory network, using deep learning features and traditional handcrafted features to predict lysine glutarylation sites. The deep learning features are generated from the pre-trained protein language model called ProtBert, and the handcrafted features consist of sequence-based features, physicochemical property-based features and evolution information-based features. Furthermore, the attention mechanism is used to efficiently integrate the deep learning features and the handcrafted features by learning the appropriate attention weights. 10-fold cross-validation and independent tests demonstrate that Deepro-Glu achieves performance competitive with or superior to that of the state-of-the-art methods. The source codes and data are publicly available at https://github.com/xwanggroup/Deepro-Glu.
Affiliation(s)
- Xiao Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
- Zhaoyuan Ding
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
- Rong Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
- Xi Lin
- Institute of Artificial Intelligence, Xiamen University, No. 4221, Xiang'an South Road, 361000, Xiamen, China
37
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged, however, by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
38
Cable J, Saphire EO, Hayday AC, Wiltshire TD, Mousa JJ, Humphreys DP, Breij ECW, Bruhns P, Broketa M, Furuya G, Hauser BM, Mahévas M, Carfi A, Cantaert T, Kwong PD, Tripathi P, Davis JH, Brewis N, Keyt BA, Fennemann FL, Dussupt V, Sivasubramanian A, Kim PM, Rawi R, Richardson E, Leventhal D, Wolters RM, Geuijen CAW, Sleeman MA, Pengo N, Donnellan FR. Antibodies as drugs-a Keystone Symposia report. Ann N Y Acad Sci 2023; 1519:153-166. [PMID: 36382536 PMCID: PMC10103175 DOI: 10.1111/nyas.14915] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
Therapeutic antibodies have broad indications across diverse disease states, such as oncology, autoimmune diseases, and infectious diseases. New research continues to identify antibodies with therapeutic potential as well as methods to improve upon endogenous antibodies and to design antibodies de novo. On April 27-30, 2022, experts in antibody research across academia and industry met for the Keystone symposium "Antibodies as Drugs" to present the state-of-the-art in antibody therapeutics, repertoires and deep learning, bispecific antibodies, and engineering.
Affiliation(s)
- Erica Ollmann Saphire
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, California, USA; Department of Medicine, University of California San Diego, La Jolla, California, USA
- Adrian C Hayday
- Peter Gorer Department of Immunobiology, King's College London, London, UK; Cancer Research UK Cancer Immunotherapy Accelerator, London, UK; Immunosurveillance Laboratory, The Francis Crick Institute, London, UK
- Jarrod J Mousa
- Department of Infectious Diseases and Center for Vaccines and Immunology, College of Veterinary Medicine, Athens, Georgia, USA; Department of Biochemistry and Molecular Biology, Franklin College of Arts and Sciences, University of Georgia, Athens, Georgia, USA; Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Esther C W Breij
- Translational Research and Precision Medicine, Genmab BV, Utrecht, the Netherlands
- Pierre Bruhns
- Institut Pasteur, Université de Paris, Unit of Antibodies in Therapy and Pathology, Paris, France
- Matteo Broketa
- Institut Pasteur, Université de Paris, Unit of Antibodies in Therapy and Pathology, Paris, France
- Genta Furuya
- Department of Preventive Medicine and Department of Pathology, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
- Blake M Hauser
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts, USA
- Matthieu Mahévas
- Service de Médecine Interne, Centre de Référence des Cytopénies Auto-immunes de l'adulte, Centre Hospitalier Universitaire Henri-Mondor, Assistance Publique-Hôpitaux de Paris, Université Paris-Est Créteil, Créteil, France
- Andrea Carfi
- Moderna Inc., Cambridge, Massachusetts, USA; Department of Pathology, Miller School of Medicine, University of Miami, Miami, Florida, USA
- Tineke Cantaert
- Immunology Unit, Institut Pasteur du Cambodge, The Pasteur Network, Phnom Penh, Cambodia
- Peter D Kwong
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Prabhanshu Tripathi
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Bruce A Keyt
- IGM Biosciences, Inc., Mountain View, California, USA
- Vincent Dussupt
- Emerging Infectious Diseases Branch, U.S. Military HIV Research Program, Walter Reed Army Institute of Research, Silver Spring, Maryland, USA; Henry M. Jackson Foundation for the Advancement of Military Medicine, Bethesda, Maryland, USA
- Philip M Kim
- Department of Molecular Genetics, Donnelly Centre for Cellular and Biomolecular Research, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Reda Rawi
- Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Eve Richardson
- Department of Statistics, University of Oxford, Oxford, UK
- Rachael M Wolters
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
39
Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022]
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across the literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
- Biozentrum, University of Basel, Basel, Switzerland
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
- Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
40
AmiA and AliA peptide ligands are secreted by Klebsiella pneumoniae and inhibit growth of Streptococcus pneumoniae. Sci Rep 2022; 12:22268. [PMID: 36564446 PMCID: PMC9789142 DOI: 10.1038/s41598-022-26838-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Accepted: 12/21/2022] [Indexed: 12/24/2022]
Abstract
Streptococcus pneumoniae colonizes the human nasopharynx, a multi-species microbial niche. Pneumococcal Ami-AliA/AliB oligopeptide permease is an ABC transporter involved in environmental sensing with peptides AKTIKITQTR, FNEMQPIVDRQ, and AIQSEKARKHN identified as ligands of its substrate binding proteins AmiA, AliA, and AliB, respectively. These sequences match ribosomal proteins of multiple bacterial species, including Klebsiella pneumoniae. By mass spectrometry, we identified such peptides in the Klebsiella pneumoniae secretome. AmiA and AliA peptide ligands suppressed pneumococcal growth, but the effect was dependent on peptide length. Growth was suppressed for diverse pneumococci, including antibiotic-resistant strains, but not other bacterial species tested, with the exception of Streptococcus pseudopneumoniae, whose growth was suppressed by the AmiA peptide ligand. By multiple sequence alignments and protein and peptide binding site predictions, for AmiA we have identified the location of an amino acid in the putative binding site whose mutation appears to result in loss of response to the peptide. Our results indicate that pneumococci sense the presence of Klebsiella pneumoniae peptides in the environment.