1. Xiao J, Hu G, Zhou X, Zheng Y, Li J. TIDGN: A Transfer Learning Framework for Predicting Interactions of Intrinsically Disordered Proteins with High Conformational Dynamics. J Chem Inf Model 2025; 65:4866-4877. [PMID: 40360271 DOI: 10.1021/acs.jcim.5c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/15/2025]
Abstract
Interactions between intrinsically disordered proteins (IDPs) are crucial for biological processes, such as intracellular liquid-liquid phase separation (LLPS). Experiments (e.g., NMR) and simulations used to study IDP interactions encounter a variety of difficulties, highlighting the necessity to develop relevant machine learning methods. However, reliable machine learning methods face the challenge resulting from the scarcity of available training data. In this work, we propose a transfer learning-based invariant geometric dynamic graph model, named TIDGN, for predicting IDP interactions. The model consists of a pretraining task module and a downstream task module. The pretraining task module learns the dynamic structural encoding of IDP monomers, which is then used by the downstream task module for interaction site prediction. The IDP monomer structure data set and the IDP interaction event data set are constructed using all-atom molecular dynamics (MD) simulations. The transfer learning strategy effectively enhances the model's performance. Both homotypic interactions and heterotypic interactions between two IDPs are considered in this work. Interestingly, TIDGN performs well for the heterotypic interaction prediction. Additionally, the feature ablation analysis emphasizes the importance of invariant geometric graph features. Taken together, our work demonstrates that the integration of transfer learning and the invariant geometric graph network offers a promising approach for addressing data scarcity challenges of IDP interaction prediction.
Affiliation(s)
- Jing Xiao
  - School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Guorong Hu
  - School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Xiaozhou Zhou
  - School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Yuchuan Zheng
  - School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
- Jingyuan Li
  - School of Physics, Zhejiang University, Hangzhou 310058, P. R. China
2. Lai L, Geng J, Duan H, Chen S, Huang L, Yu J. A New Structure Feature Introduced to Predict Protein-Protein Interaction Sites. J Comput Biol 2025; 32:520-536. [PMID: 40000026 DOI: 10.1089/cmb.2024.0804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2025] Open
Abstract
Interaction between proteins often depends on the sequence features and structure features of proteins. Both kinds of features help machine learning methods predict protein-protein interaction (PPI) sites. In this study, we introduced a new structure feature: a concave-convex feature on the protein surface, computed from the structural data of proteins in the Protein Data Bank. We then constructed a prediction model combining protein sequence features and structure features, named SSPPI_Ensemble (Sequence and Structure geometric feature-based PPI site prediction). Three sequence features, i.e., PSSMs (Position-Specific Scoring Matrices), HMM (Hidden Markov Models) and raw protein sequence, were used. The Dictionary of Secondary Structure in Proteins and the concave-convex feature were used as the structure features. Compared with other prediction methods, our method achieved better performance or showed clear advantages on the same test datasets, confirming that the proposed concave-convex feature is useful in predicting PPI sites.
Affiliation(s)
- Lingwei Lai
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Jing Geng
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Haochen Duan
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Siyuan Chen
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Lvwen Huang
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Jiantao Yu
  - College of Information Engineering, Northwest A&F University, Yangling, China
3. Zhai Z, Xu S, Ma W, Niu N, Qu C, Zong C. LGS-PPIS: A Local-Global Structural Information Aggregation Framework for Predicting Protein-Protein Interaction Sites. Proteins 2025; 93:716-727. [PMID: 39520116 DOI: 10.1002/prot.26763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 10/20/2024] [Accepted: 10/22/2024] [Indexed: 11/16/2024]
Abstract
Exploring protein-protein interaction sites (PPIS) is of significance to elucidating the intrinsic mechanisms of diverse biological processes. On this basis, recent studies have applied deep learning-based technologies to overcome the high cost of wet experiments for PPIS determination. However, the existing methods still suffer from two limitations that remain to be solved. Firstly, the process of feature aggregation in most methods only took into account node features, but ignored the complex edge features of the target residue to its neighbor residues, resulting in insufficient local feature extraction. Secondly, such feature aggregation was limited to aggregating spatially adjacent residues, and could not capture the "remote" residues that played a critical role in determining PPIS, which can be summed up as the lack of global feature at the residue level. To break the above limitations, a local-global structural information aggregation framework, LGS-PPIS, was proposed in this study, including two modules of edge-aware graph convolutional network (EA-GCN) and self-attention integrated with initial residual and identity mapping (SA-RIM), which achieved the aggregation of local and global information for PPIS prediction. Evaluation results of LGS-PPIS showed that the proposed method outperformed state-of-the-art deep learning methods on three widely used PPIS prediction benchmarks. Besides, the results of ablation experiments demonstrated that the local features from spatially adjacent residues and global features from "remote" residues separately captured by EA-GCN and SA-RIM could benefit the model performance. Among them, the former was shown to have a more significant role in the PPIS prediction.
Affiliation(s)
- Zhengli Zhai
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
- Shiya Xu
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
- Wenjian Ma
  - College of Computer Science and Technology, Ocean University of China, Qingdao, China
- Niuwangjie Niu
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
- Chunyu Qu
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
- Chao Zong
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
4. Ambreen S, Umar M, Noor A, Jain H, Ali R. Advanced AI and ML frameworks for transforming drug discovery and optimization: With innovative insights in polypharmacology, drug repurposing, combination therapy and nanomedicine. Eur J Med Chem 2025; 284:117164. [PMID: 39721292 DOI: 10.1016/j.ejmech.2024.117164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 12/28/2024]
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are transforming drug discovery by overcoming traditional challenges like high costs, lengthy timelines, and frequent failures. AI-driven approaches streamline key phases, including target identification, lead optimization, de novo drug design, and drug repurposing. Frameworks such as deep neural networks (DNNs), convolutional neural networks (CNNs), and deep reinforcement learning (DRL) models have shown promise in identifying drug targets, optimizing delivery systems, and accelerating drug repurposing. Generative adversarial networks (GANs) and variational autoencoders (VAEs) aid de novo drug design by creating novel drug-like compounds with desired properties. Case studies, such as DDR1 kinase inhibitors designed using generative models and CDK20 inhibitors developed via structure-based methods, highlight AI's ability to produce highly specific therapeutics. Models like SNF-CVAE and DeepDR further advance drug repurposing by uncovering new therapeutic applications for existing drugs. Advanced ML algorithms enhance precision in predicting drug efficacy, toxicity, and ADME-Tox properties, reducing development costs and improving drug-target interactions. AI also supports polypharmacology by optimizing multi-target drug interactions and enhances combination therapy through predictions of drug synergies and antagonisms. In nanomedicine, AI models like CURATE.AI and the Hartung algorithm optimize personalized treatments by predicting toxicological risks and real-time dosing adjustments with high accuracy. Despite its potential, challenges like data quality, model interpretability, and ethical concerns must be addressed. High-quality datasets, transparent models, and unbiased algorithms are essential for reliable AI applications. As AI continues to evolve, it is poised to revolutionize drug discovery and personalized medicine, advancing therapeutic development and patient care.
Affiliation(s)
- Subiya Ambreen
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Mohammad Umar
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Aaisha Noor
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Himangini Jain
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Ruhi Ali
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
5. Liang J, Tian J, Zhang H, Li H, Chen L. Proteomics: An In-Depth Review on Recent Technical Advances and Their Applications in Biomedicine. Med Res Rev 2025. [PMID: 39789883 DOI: 10.1002/med.22098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 10/11/2024] [Accepted: 12/12/2024] [Indexed: 01/12/2025]
Abstract
Proteins hold pivotal importance since many diseases manifest changes in protein activity. Proteomics techniques provide a comprehensive exploration of protein structure, abundance, and function in biological samples, enabling the holistic characterization of overall changes in organisms. Nowadays, the breadth of emerging methodologies in proteomics is unprecedentedly vast, with constant optimization of technologies in sample processing, data collection, and data analysis, and its scope of application is steadily transitioning from the bench to the clinic. Here, we offer an insightful review of the technical developments in proteomics and its applications in biomedicine over the past 5 years. We focus on its profound contributions in profiling disease spectra, discovering new biomarkers, identifying promising drug targets, deciphering alterations in protein conformation, and unearthing protein-protein interactions. Moreover, we summarize the cutting-edge technologies and potential breakthroughs in the proteomics pipeline and outline the principal challenges in proteomics. Based on these, we aspire to broaden the applicability of proteomics and inspire researchers to enhance our understanding of complex biological systems by utilizing such techniques.
Affiliation(s)
- Jing Liang
  - Wuya College of Innovation, Key Laboratory of Structure-Based Drug Design & Discovery, Ministry of Education, Shenyang Pharmaceutical University, Shenyang, China
- Jundan Tian
  - Wuya College of Innovation, Key Laboratory of Structure-Based Drug Design & Discovery, Ministry of Education, Shenyang Pharmaceutical University, Shenyang, China
- Huadong Zhang
  - College of Pharmacy, Institute of Structural Pharmacology & TCM Chemical Biology, Fujian Key Laboratory of Chinese Materia Medica, Fujian University of Traditional Chinese Medicine, Fuzhou, China
- Hua Li
  - Wuya College of Innovation, Key Laboratory of Structure-Based Drug Design & Discovery, Ministry of Education, Shenyang Pharmaceutical University, Shenyang, China
  - College of Pharmacy, Institute of Structural Pharmacology & TCM Chemical Biology, Fujian Key Laboratory of Chinese Materia Medica, Fujian University of Traditional Chinese Medicine, Fuzhou, China
- Lixia Chen
  - Wuya College of Innovation, Key Laboratory of Structure-Based Drug Design & Discovery, Ministry of Education, Shenyang Pharmaceutical University, Shenyang, China
6. Zhao W, Xu G, Wang L, Cui Z, Zhang T, Yang J. Intra-Inter Graph Representation Learning for Protein-Protein Binding Sites Prediction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1685-1696. [PMID: 38896523 DOI: 10.1109/tcbb.2024.3416341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Graph neural networks have drawn increasing attention and achieved remarkable progress recently due to their potential applications to large amounts of irregular data. Representing a protein as a graph is natural. In this work, we focus on protein-protein binding sites prediction between the ligand and receptor proteins. Previous work simply adopts graph convolution to learn residue representations of ligand and receptor proteins, then concatenates them and feeds the concatenated representation into a fully connected layer to make predictions, losing much of the information contained in complexes and failing to obtain an optimal prediction. In this paper, we present Intra-Inter Graph Representation Learning for protein-protein binding sites prediction (IIGRL). Specifically, for intra-graph learning, we maximize the mutual information between the local node representation and the global graph summary to encourage the node representation to embody the global information of the protein graph. Then we explore fusing the two separate ligand and receptor graphs into a whole graph and learning affinities between their residues/nodes to propagate information to each other, which could effectively capture inter-protein information and further enhance the discrimination of residue pairs. Extensive experiments on multiple benchmarks demonstrate that the proposed IIGRL model outperforms state-of-the-art methods.
7. Zhong J, Zhao H, Zhao Q, Zhou R, Zhang L, Guo F, Wang J. RGCNPPIS: A Residual Graph Convolutional Network for Protein-Protein Interaction Site Prediction. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1676-1684. [PMID: 38843057 DOI: 10.1109/tcbb.2024.3410350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Accurate identification of protein-protein interaction (PPI) sites is crucial for understanding the mechanisms of biological processes, developing PPI networks, and detecting protein functions. Currently, most computational methods primarily concentrate on sequence context features and rarely consider the spatial neighborhood features. To address this limitation, we propose a novel residual graph convolutional network for structure-based PPI site prediction (RGCNPPIS). Specifically, we use a GCN module to extract the global structural features from all spatial neighborhoods, and utilize the GraphSage module to extract local structural features from local spatial neighborhoods. To the best of our knowledge, this is the first work utilizing local structural features for PPI site prediction. We also propose an enhanced residual graph connection to combine the initial node representation, local structural features, and the previous GCN layer's node representation, which enables information transfer between layers and alleviates the over-smoothing problem. Evaluation results demonstrate that RGCNPPIS outperforms state-of-the-art methods on three independent test sets. In addition, the results of ablation experiments and case studies confirm that RGCNPPIS is an effective tool for PPI site prediction.
8. Alam R, Mahbub S, Bayzid MS. Pair-EGRET: enhancing the prediction of protein-protein interaction sites through graph attention networks and protein language models. Bioinformatics 2024; 40:btae588. [PMID: 39360982 PMCID: PMC11495673 DOI: 10.1093/bioinformatics/btae588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 09/03/2024] [Accepted: 10/01/2024] [Indexed: 10/05/2024] Open
Abstract
MOTIVATION: Proteins are responsible for most biological functions, many of which require the interaction of more than one protein molecule. However, accurately predicting protein-protein interaction (PPI) sites (the interfacial residues of a protein that interact with other protein molecules) remains a challenge. The growing demand and cost associated with the reliable identification of PPI sites using conventional experimental methods call for computational tools for automated prediction and understanding of PPIs.
RESULTS: We present Pair-EGRET, an edge-aggregated graph attention network that leverages the features extracted from pretrained transformer-like models to accurately predict PPI sites. Pair-EGRET works on a k-nearest neighbor graph, representing the 3D structure of a protein, and utilizes the cross-attention mechanism for accurate identification of interfacial residues of a pair of proteins. Through an extensive evaluation study using a diverse array of experimental data, evaluation metrics, and case studies on representative protein sequences, we demonstrate that Pair-EGRET can achieve remarkable performance in predicting PPI sites. Moreover, Pair-EGRET can provide interpretable insights from the learned cross-attention matrix.
AVAILABILITY AND IMPLEMENTATION: Pair-EGRET is freely available in open source form at the GitHub Repository https://github.com/1705004/Pair-EGRET.
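The k-nearest neighbor graph mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the Pair-EGRET code base: the function name `knn_graph`, the use of one 3D point per residue, and the toy coordinates are all assumptions made for illustration.

```python
import math

def knn_graph(coords, k):
    """Map each residue index to the indices of its k nearest other residues."""
    neighbors = {}
    for i, ci in enumerate(coords):
        # Euclidean distance from residue i to every other residue j.
        dists = [(math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i]
        dists.sort()
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors

# Hypothetical coordinates, one representative point per residue.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
graph = knn_graph(coords, k=2)
print(graph)
```

Each residue keeps exactly k outgoing edges regardless of local density, which is the main difference from a fixed distance cutoff.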
Affiliation(s)
- Ramisa Alam
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
- Sazan Mahbub
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Md Shamsuzzoha Bayzid
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
9. Nandigrami P, Fiser A. Assessing the functional impact of protein binding site definition. Protein Sci 2024; 33:e5026. [PMID: 38757384 PMCID: PMC11099757 DOI: 10.1002/pro.5026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 05/01/2024] [Accepted: 05/03/2024] [Indexed: 05/18/2024]
Abstract
Many biomedical applications, such as classification of binding specificities or bioengineering, depend on the accurate definition of protein binding interfaces. Depending on the choice of method used, substantially different sets of residues can be classified as belonging to the interface of a protein. A typical approach used to verify these definitions is to mutate residues and measure the impact of these changes on binding. Besides the lack of exhaustive data, this approach also suffers from the fundamental problem that a mutation introduces an unknown amount of alteration into an interface, which potentially alters the binding characteristics of the interface. In this study we explore the impact of alternative binding site definitions on the ability of a protein to recognize its cognate ligand using a pharmacophore approach, which does not affect the interface. The study also shows that methods for protein binding interface predictions should perform above approximately F-score = 0.7 accuracy level to capture the biological function of a protein.
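To make the F-score threshold in this abstract concrete, the following sketch computes the F-score of a predicted interface against a reference interface, both given as sets of residue numbers. The function name and the residue numbers are hypothetical, and the study's own evaluation protocol may differ.

```python
def interface_f_score(predicted, actual):
    """F-score (harmonic mean of precision and recall) over residue sets."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)

# Hypothetical residue numbers for a reference and a predicted interface.
actual = {10, 11, 12, 15, 18}
predicted = {10, 11, 12, 20}
score = interface_f_score(predicted, actual)
print(round(score, 3))
```

Here three of the four predicted residues are correct (precision 0.75) and three of the five reference residues are recovered (recall 0.6), giving an F-score of about 0.667, just under the approximate 0.7 level quoted in the abstract.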
Affiliation(s)
- Prithviraj Nandigrami
  - Departments of Systems and Computational Biology, and Biochemistry, Albert Einstein College of Medicine, Bronx, New York, USA
- Andras Fiser
  - Departments of Systems and Computational Biology, and Biochemistry, Albert Einstein College of Medicine, Bronx, New York, USA
10. Qi X, Zhao Y, Qi Z, Hou S, Chen J. Machine Learning Empowering Drug Discovery: Applications, Opportunities and Challenges. Molecules 2024; 29:903. [PMID: 38398653 PMCID: PMC10892089 DOI: 10.3390/molecules29040903] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/15/2024] [Revised: 02/08/2024] [Accepted: 02/14/2024] [Indexed: 02/25/2024] Open
Abstract
Drug discovery plays a critical role in advancing human health by developing new medications and treatments to combat diseases. How to accelerate the pace and reduce the costs of new drug discovery has long been a key concern for the pharmaceutical industry. Fortunately, by leveraging advanced algorithms, computational power and biological big data, artificial intelligence (AI) technology, especially machine learning (ML), holds the promise of making the hunt for new drugs more efficient. Recently, the Transformer-based models that have achieved revolutionary breakthroughs in natural language processing have sparked a new era of their applications in drug discovery. Herein, we introduce the latest applications of ML in drug discovery, highlight the potential of advanced Transformer-based ML models, and discuss the future prospects and challenges in the field.
Affiliation(s)
- Xin Qi
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
- Yuanchun Zhao
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
- Zhuang Qi
  - School of Software, Shandong University, Jinan 250101, China
- Siyu Hou
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
- Jiajia Chen
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
11. Fu X, Yuan Y, Qiu H, Suo H, Song Y, Li A, Zhang Y, Xiao C, Li Y, Dou L, Zhang Z, Cui F. AGF-PPIS: A protein-protein interaction site predictor based on an attention mechanism and graph convolutional networks. Methods 2024; 222:142-151. [PMID: 38242383 DOI: 10.1016/j.ymeth.2024.01.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/21/2023] [Revised: 01/04/2024] [Accepted: 01/13/2024] [Indexed: 01/21/2024] Open
Abstract
Protein-protein interactions play an important role in various biological processes. Interaction among proteins has a wide range of applications. Therefore, the correct identification of protein-protein interaction sites is crucial. In this paper, we propose a novel predictor for protein-protein interaction sites, AGF-PPIS, where we utilize a multi-head self-attention mechanism (introducing a graph structure), graph convolutional network, and feed-forward neural network. We use the Euclidean distance between protein residues to generate the corresponding protein graph as the input of AGF-PPIS. On the independent test dataset Test_60, AGF-PPIS achieves superior performance over comparative methods in terms of seven different evaluation metrics (ACC, precision, recall, F1-score, MCC, AUROC, AUPRC), which demonstrates the validity and superiority of the proposed AGF-PPIS model. The source codes and the steps for usage of AGF-PPIS are available at https://github.com/fxh1001/AGF-PPIS.
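One common way to turn pairwise Euclidean distances between residues into a protein graph is a distance cutoff, sketched below. The abstract does not state how AGF-PPIS builds its graph in detail, so the cutoff rule, the `cutoff` value, the function name `distance_graph`, and the toy coordinates are assumptions for illustration only.

```python
import math

def distance_graph(coords, cutoff):
    """Symmetric 0/1 adjacency matrix: residues closer than `cutoff` are linked."""
    n = len(coords)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Link residues i and j when their Euclidean distance is within the cutoff.
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency

# Hypothetical residue coordinates and an arbitrary cutoff (same length unit).
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 20.0)]
adjacency = distance_graph(coords, cutoff=10.0)
print(adjacency)
```

The resulting matrix can serve as the adjacency input of a graph convolutional layer; the choice of cutoff controls how dense the graph is.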
Affiliation(s)
- Xiuhao Fu
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Ye Yuan
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Haoye Qiu
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Haodong Suo
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Yingying Song
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Anqi Li
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Yupeng Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Cuilin Xiao
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Yazi Li
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lijun Dou
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH 44106, USA
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
12. Ding H, Li X, Han P, Tian X, Jing F, Wang S, Song T, Fu H, Kang N. MEG-PPIS: a fast protein-protein interaction site prediction method based on multi-scale graph information and equivariant graph neural network. Bioinformatics 2024; 40:btae269. [PMID: 38640481 PMCID: PMC11252844 DOI: 10.1093/bioinformatics/btae269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 03/19/2024] [Accepted: 04/17/2024] [Indexed: 04/21/2024] Open
Abstract
MOTIVATION: Protein-protein interaction sites (PPIS) are crucial for deciphering protein action mechanisms and related medical research, which is the key issue in protein action research. Recent studies have shown that graph neural networks have achieved outstanding performance in predicting PPIS. However, these studies often neglect the modeling of information at different scales in the graph and the symmetry of protein molecules within three-dimensional space.
RESULTS: In response to this gap, this article proposes the MEG-PPIS approach, a PPIS prediction method based on multi-scale graph information and E(n) equivariant graph neural network (EGNN). There are two channels in MEG-PPIS: the original graph and the subgraph obtained by graph pooling. The model can iteratively update the features of the original graph and subgraph through the weight-sharing EGNN. Subsequently, the max-pooling operation aggregates the updated features of the original graph and subgraph. Ultimately, the model feeds node features into the prediction layer to obtain prediction results. Comparative assessments against other methods on benchmark datasets reveal that MEG-PPIS achieves optimal performance across all evaluation metrics and gets the fastest runtime. Furthermore, specific case studies demonstrate that our method can predict more true positive and true negative sites than the current best method, proving that our model achieves better performance in the PPIS prediction task.
AVAILABILITY AND IMPLEMENTATION: The data and code are available at https://github.com/dhz234/MEG-PPIS.git.
Affiliation(s)
- Hongzhen Ding
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Xue Li
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Peifu Han
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Xu Tian
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Fengrui Jing
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Shuang Wang
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Tao Song
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Hanjiao Fu
  - School of Humanities and Law, China University of Petroleum (East China), Qingdao, Shandong 266580, China
- Na Kang
  - The Ninth Department of Health Care Administration, the Second Medical Center, Chinese PLA General Hospital, Beijing, 100853, China
13. Michalik I, Kuder KJ. Machine Learning Methods in Protein-Protein Docking. Methods Mol Biol 2024; 2780:107-126. [PMID: 38987466 DOI: 10.1007/978-1-0716-3985-6_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
An exponential increase in the number of publications that address artificial intelligence (AI) usage in life sciences has been noticed in recent years, while new modeling techniques are constantly being reported. The potential of these methods is vast, from understanding fundamental cellular processes to discovering new drugs and breakthrough therapies. Computational studies of protein-protein interactions, crucial for understanding the operation of biological systems, are no exception in this field. However, despite the rapid development of technology and the progress in developing new approaches, many aspects remain challenging to solve, such as predicting conformational changes in proteins, or more "trivial" issues such as obtaining high-quality data in huge quantities. Therefore, this chapter focuses on a short introduction to various AI approaches to study protein-protein interactions, followed by a description of the most up-to-date algorithms and programs used for this purpose. Yet, given the considerable pace of development in this hot area of computational science, at the time you read this chapter, the development of the algorithms described, or the emergence of new (and better) ones should come as no surprise.
Affiliation(s)
- Ilona Michalik
  - Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland
- Kamil J Kuder
  - Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland

14
Vottero P, Olivetti EC, D'Agostino LC, Di Grazia L, Vezzetti E, Aminpour M, Tuszynski JA, Marcolin F. Understanding the contagiousness of Covid-19 strains: A geometric approach. J Mol Graph Model 2024; 126:108670. [PMID: 37984193 DOI: 10.1016/j.jmgm.2023.108670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/22/2023]
Abstract
Protein-protein interaction occurs on surface patches with some degree of complementary geometric and chemical features. Building on this understanding, this study endeavors to characterize the spike protein of the SARS-CoV-2 virus at the morphological and geometrical levels in its Alpha, Delta, and Omicron variants. In particular, the affinity between different SARS-CoV-2 spike proteins and the ACE2 receptor present on the membrane of the human respiratory system cells is investigated. To achieve an adequate degree of geometrical accuracy, the 3D depth maps of the proteins in exam are filtered by developing an ad-hoc convolutional filter with a kernel implemented as a sphere of varying radius, simulating a ball rolling on the surface (similar to the 'rolling ball' filter). This ball ideally models a hypothetical molecule that could interface with the protein and is inspired by the geometric approach to macromolecule-ligand interactions proposed by Kuntz et al. in 1982. The aim is to mitigate the imperfections and to obtain a smoother surface that could be studied from a geometrical perspective for binding purposes. A set of geometric descriptors, borrowed from the 3D face analysis context is then mapped point-by-point onto protein depth maps. Following a feature extraction phase inspired by Histogram of Oriented Gradients and Local Binary Patterns, the final histogram features are used as input for a Support Vector Machine classifier to automatically classify the proteins according to their surface affinity, where a similarity in shape is observed between ACE2 and the spike protein of the SARS-CoV-2 Omicron variant. Finally, Root Mean Square Error analysis is used to quantify the geometrical affinity between the ACE2 receptor and the respective Receptor Binding Domains of the three SARS-CoV-2 variants, culminating in a geometrical explanation for the higher contagiousness of Omicron relative to the other variants under study.
Affiliation(s)
- Paola Vottero
  - Department of Biomedical Engineering, University of Alberta, Edmonton, AB, T6G 2V2, Canada
- Elena Carlotta Olivetti
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Lucia Chiara D'Agostino
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Luca Di Grazia
  - Department of Computer Science, University of Stuttgart, Universitätsstr. 38, 70569, Stuttgart, Germany
- Enrico Vezzetti
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Maral Aminpour
  - Department of Biomedical Engineering, University of Alberta, Edmonton, AB, T6G 2V2, Canada
- Jacek Adam Tuszynski
  - Department of Physics, University of Alberta, Edmonton, AB, T6G 2H7, Canada; Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy; Department of Data Science and Engineering, The Silesian University of Technology, Gliwice, Poland
- Federica Marcolin
  - Department of Management and Production Engineering, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy

15
Cong H, Liu H, Cao Y, Liang C, Chen Y. Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention. BMC Bioinformatics 2023; 24:456. [PMID: 38053020 DOI: 10.1186/s12859-023-05592-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Accepted: 11/30/2023] [Indexed: 12/07/2023] Open
Abstract
BACKGROUND Protein-protein interactions (PPIs) are crucial in various biological functions and cellular processes. Thus, many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in sequences. Many feature extraction methods rely on the sliding window technique, which simply merges all the features of residues into a vector. The importance of some key residues may be weakened in the feature vector, leading to poor performance. RESULTS We propose a novel sequence-based method for PPI site prediction. The new network model, PPINet, contains multiple feature processing paths. For a residue, PPINet extracts the features of the targeted residue and its context separately. These two types of features are processed by two paths in the network and combined to form a protein representation, where the two types of features are of relatively equal importance. The model ensembling technique is applied to make use of more features. The base models are trained with different features and then ensembled via stacking. In addition, a data balancing strategy is presented, by which our model achieves significant improvement on highly unbalanced data. CONCLUSION The proposed method is evaluated on a fused dataset constructed from Dset_186, Dset_72, and PDBset_164, as well as the public Dset_448 dataset. Compared with current state-of-the-art methods, our method performs better. In the most important metrics, such as AUPRC and recall, it surpasses the second-best method on the latter dataset by 6.9% and 4.7%, respectively. We also demonstrated that the improvement is essentially due to the ensemble model and, especially, the hybrid feature. We share our code for reproducibility and future research at https://github.com/CandiceCong/StackingPPINet .
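The sliding window baseline that this abstract criticizes can be sketched in a few lines. The snippet below is only an illustration of that generic technique, not PPINet itself; the function name `sliding_window_features`, the zero padding at the sequence ends, and the toy feature values are all assumptions made for the example.

```python
from typing import List, Sequence


def sliding_window_features(
    per_residue: Sequence[Sequence[float]], window: int
) -> List[List[float]]:
    """Concatenate each residue's features with those of its neighbours.

    Positions that fall outside the sequence are zero-padded (an assumption
    of this sketch; other padding schemes are possible).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if not per_residue:
        return []
    dim = len(per_residue[0])
    half = window // 2
    pad = [0.0] * dim
    merged = []
    for i in range(len(per_residue)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            inside = 0 <= j < len(per_residue)
            row.extend(per_residue[j] if inside else pad)
        merged.append(row)
    return merged


# Toy one-dimensional features for a 4-residue sequence (made-up values).
toy = [[1.0], [2.0], [3.0], [4.0]]
windows = sliding_window_features(toy, window=3)
print(windows)
```

In the merged vector the target residue occupies only one of `window` equally sized slots, which is the dilution effect the abstract refers to when it says the importance of key residues may be weakened.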
Affiliation(s)
- Hanhan Cong
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Hong Liu
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Yi Cao
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Yuehui Chen
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China

16
Nikam R, Yugandhar K, Gromiha MM. DeepBSRPred: deep learning-based binding site residue prediction for proteins. Amino Acids 2023; 55:1305-1316. [PMID: 36574037 DOI: 10.1007/s00726-022-03228-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 12/15/2022] [Indexed: 12/28/2022]
Abstract
MOTIVATION Protein-protein interactions (PPIs) govern several cellular activities. Amino acid residues located at the interface are known as binding sites, and information about binding sites helps to understand the binding affinities and functions of protein-protein complexes. RESULTS We have developed a deep neural network-based method, DeepBSRPred, for predicting the binding sites using protein sequence information and predicted structures from AlphaFold2. Specific sequence and structure-based features include position-specific scoring matrix (PSSM), solvent accessible surface area, conservation score and amino acid properties, and residue depth, respectively. Our method predicted the binding sites with an average F1 score of 0.73 in a dataset of 1236 proteins. Further, we compared the performance with other existing methods in the literature using four benchmark datasets and our method outperformed those methods. AVAILABILITY AND IMPLEMENTATION The DeepBSRPred web server can be found at https://web.iitm.ac.in/bioinfo2/deepbsrpred/index.html , along with all datasets used in this study. The trained models, the DeepBSRPred standalone source code, and the feature computation pipeline are freely available at https://web.iitm.ac.in/bioinfo2/deepbsrpred/download.html .
Affiliation(s)
- Rahul Nikam
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Kumar Yugandhar
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
  - Department of Computational Biology, Cornell University, New York, NY, USA
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
  - Department of Computer Science, Tokyo Institute of Technology, Yokohama, Japan

17
Mou M, Pan Z, Zhou Z, Zheng L, Zhang H, Shi S, Li F, Sun X, Zhu F. A Transformer-Based Ensemble Framework for the Prediction of Protein-Protein Interaction Sites. Research (Washington, D.C.) 2023; 6:0240. [PMID: 37771850 PMCID: PMC10528219 DOI: 10.34133/research.0240] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 08/02/2023] [Accepted: 09/08/2023] [Indexed: 09/30/2023]
Abstract
The identification of protein-protein interaction (PPI) sites is essential in the research of protein function and the discovery of new drugs. So far, a variety of computational tools based on machine learning have been developed to accelerate the identification of PPI sites. However, existing methods suffer from low predictive accuracy or a limited scope of application. Specifically, some methods learned only global or local sequential features, leading to low predictive accuracy, while others achieved improved performance by extracting residue interactions from structures but were limited in their application scope because of their heavy dependence on precise structure information. There is an urgent need to develop a method that integrates comprehensive information to realize proteome-wide accurate profiling of PPI sites. Herein, a novel ensemble framework for PPI site prediction, EnsemPPIS, is proposed based on transformer and gated convolutional networks. EnsemPPIS can effectively capture not only global and local patterns but also residue interactions. Specifically, EnsemPPIS is unique in (a) extracting residue interactions from protein sequences with a transformer and (b) further integrating global and local sequential features with the ensemble learning strategy. Compared with various existing methods, EnsemPPIS exhibited either superior performance or broader applicability on multiple PPI site prediction tasks. Moreover, pattern analysis based on the interpretability of EnsemPPIS demonstrated that EnsemPPIS was fully capable of learning residue interactions within the local structure of PPI sites using only sequence information. The web server of EnsemPPIS is freely available at http://idrblab.org/ensemppis.
Affiliation(s)
- Minjie Mou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Zhimeng Zhou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Lingyan Zheng
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Hanyu Zhang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Shuiyang Shi
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Fengcheng Li
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, National Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China

18
Wu H, Han J, Zhang S, Xin G, Mou C, Liu J. Spatom: a graph neural network for structure-based protein-protein interaction site prediction. Brief Bioinform 2023; 24:bbad345. [PMID: 37779247 DOI: 10.1093/bib/bbad345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 08/22/2023] [Accepted: 09/13/2023] [Indexed: 10/03/2023] Open
Abstract
Accurate identification of protein-protein interaction (PPI) sites remains a computational challenge. We propose Spatom, a novel framework for PPI site prediction. This framework first defines a weighted digraph for a protein structure to precisely characterize the spatial contacts of residues, then performs a weighted digraph convolution to aggregate both spatial local and global information and finally adds an improved graph attention layer to drive the predicted sites to form more continuous region(s). Spatom was tested on a diverse set of challenging protein-protein complexes and demonstrated the best performance among all the compared methods. Furthermore, when tested on multiple popular proteins in a case study, Spatom clearly identifies the interaction interfaces and captures the majority of hotspots. Spatom is expected to contribute to the understanding of protein interactions and drug designs targeting protein binding.
Affiliation(s)
- Haonan Wu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
  - School of Mathematics, Shandong University, Jinan 250100, China
- Jiyun Han
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Shizhuo Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Gaojia Xin
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Chaozhou Mou
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China
- Juntao Liu
  - School of Mathematics and Statistics, Shandong University, Weihai 264209, China

19
Roche R, Moussad B, Shuvo MH, Bhattacharya D. E(3) equivariant graph neural networks for robust and accurate protein-protein interaction site prediction. PLoS Comput Biol 2023; 19:e1011435. [PMID: 37651442 PMCID: PMC10499216 DOI: 10.1371/journal.pcbi.1011435] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/10/2023] [Revised: 09/13/2023] [Accepted: 08/15/2023] [Indexed: 09/02/2023] Open
Abstract
Artificial intelligence-powered protein structure prediction methods have led to a paradigm-shift in computational structural biology, yet contemporary approaches for predicting the interfacial residues (i.e., sites) of protein-protein interaction (PPI) still rely on experimental structures. Recent studies have demonstrated benefits of employing graph convolution for PPI site prediction, but ignore symmetries naturally occurring in 3-dimensional space and act only on experimental coordinates. Here we present EquiPPIS, an E(3) equivariant graph neural network approach for PPI site prediction. EquiPPIS employs symmetry-aware graph convolutions that transform equivariantly with translation, rotation, and reflection in 3D space, providing richer representations for molecular data compared to invariant convolutions. EquiPPIS substantially outperforms state-of-the-art approaches based on the same experimental input, and exhibits remarkable robustness by attaining better accuracy with predicted structural models from AlphaFold2 than what existing methods can achieve even with experimental structures. Freely available at https://github.com/Bhattacharya-Lab/EquiPPIS, EquiPPIS enables accurate PPI site prediction at scale.
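The symmetry terms used in this abstract can be made concrete with a small numerical check. The sketch below is not EquiPPIS code; it only illustrates, on hypothetical coordinates, the difference between an invariant quantity (pairwise distances, which do not change under rotation, reflection, and translation) and an equivariant one (the centroid, which moves exactly as the points do). The helper names and the choice of a z-axis rotation are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def transform(p: Sequence[float], angle: float, shift: Point, reflect: bool) -> Point:
    """Optionally mirror x, then rotate about the z-axis, then translate."""
    x, y, z = p
    if reflect:
        x = -x
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


def pairwise_distances(points: Sequence[Point]) -> List[float]:
    """All distances between distinct pairs, in a fixed order."""
    return [math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


# Hypothetical residue coordinates, chosen only for illustration.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.5), (0.0, 2.0, 3.0)]
angle, shift = 0.7, (4.0, -1.0, 2.5)
moved = [transform(p, angle, shift, reflect=True) for p in coords]

# Invariance: distances are the same before and after the transform.
distances_invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(coords), pairwise_distances(moved))
)

# Equivariance: transforming then averaging equals averaging then transforming.
centroid_equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(centroid(moved), transform(centroid(coords), angle, shift, True))
)
print(distances_invariant, centroid_equivariant)
```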
Affiliation(s)
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Bernard Moussad
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Md Hossain Shuvo
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia, United States of America

20
Aybey E, Gümüş Ö. SENSDeep: An Ensemble Deep Learning Method for Protein-Protein Interaction Sites Prediction. Interdiscip Sci 2023; 15:55-87. [PMID: 36346583 DOI: 10.1007/s12539-022-00543-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/04/2022] [Revised: 10/15/2022] [Accepted: 10/17/2022] [Indexed: 11/09/2022]
Abstract
PURPOSE The determination of which amino acid in a protein interacts with other proteins is important in understanding the functional mechanism of that protein. Although there are experimental methods to detect protein-protein interaction sites (PPISs), these are costly, time-consuming, and require expertise. Therefore, many computational methods have been proposed to accelerate this type of research, but they are generally insufficient to predict PPISs accurately. There is a need for development in this field. METHODS In this study, we introduce a new PPISs prediction method. This method is a sequence-based Stacking ENSemble Deep (SENSDeep) learning method that has an ensemble learning model including the models of RNN, CNN, GRU sequence to sequence (GRUs2s), GRU sequence to sequence with an attention layer (GRUs2satt) and a multilayer perceptron. Two embedded features, secondary structure, and protein sequence information are added to the training data set in addition to twelve existing features to improve the prediction performance of the method. RESULTS SENSDeep trained on the training data set without two extra features obtains a better performance on some of the independent testing data sets than that of the other methods in the literature, especially on scoring metrics of sensitivity, F1, MCC, and AUPRC, having increments up to 63.5%, 19.3%, 18.5%, 11.4%, respectively. It is shown that the added extra features improve the performance of the method by having almost the same performance with less data as the method trained on the data set without these added features. On the other hand, different sizes of the sliding window are tried on the data sets and an optimal sliding window size for SENSDeep is found. Moreover, SENSDeep has also been compared to structure-based methods. Some of these methods have been found to perform better. Using SENSDeep obtained by training with both training data sets, PPISs prediction examples of various proteins that are not in these training data sets are also presented. Furthermore, execution times for SENSDeep and its submodels are shown. AVAILABILITY AND IMPLEMENTATION https://github.com/enginaybey/SENSDeep.
Affiliation(s)
- Engin Aybey
  - Department of Health Bioinformatics, Ege University, 35100, Bornova, Izmir, Turkey
  - Rectorate, Marmara University, 34722, Kadıköy, Istanbul, Turkey
- Özgür Gümüş
  - Department of Computer Engineering, Ege University, 35100, Bornova, Izmir, Turkey

21
Nandigrami P, Fiser A. Assessing the functional impact of protein binding site definition. bioRxiv 2023:2023.01.26.525812. [PMID: 36747792 PMCID: PMC9900911 DOI: 10.1101/2023.01.26.525812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
Many biomedical applications, such as classification of binding specificities or bioengineering, depend on the accurate definition of protein binding interfaces. Depending on the choice of method used, substantially different sets of residues can be classified as belonging to the interface of a protein. A typical approach used to verify these definitions is to mutate residues and measure the impact of these changes on binding. Besides the lack of exhaustive data this approach generates, it also suffers from the fundamental problem that a mutation introduces an unknown amount of alteration into an interface, which potentially alters the binding characteristics of the interface. In this study we explore the impact of alternative binding site definitions on the ability of a protein to recognize its cognate ligand using a pharmacophore approach, which does not affect the interface. The study also provides guidance on the minimum expected accuracy of interface definition that is required to capture the biological function of a protein.
Affiliation(s)
- Prithviraj Nandigrami
  - Departments of Systems & Computational Biology, and Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY 10461, USA
- Andras Fiser
  - Departments of Systems & Computational Biology, and Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY 10461, USA

22
Li K, Quan L, Jiang Y, Li Y, Zhou Y, Wu T, Lyu Q. ctP2ISP: Protein-Protein Interaction Sites Prediction Using Convolution and Transformer With Data Augmentation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:297-306. [PMID: 35213314 DOI: 10.1109/tcbb.2022.3154413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Protein-protein interactions are the basis of many cellular biological processes, such as cellular organization, signal transduction, and immune response. Identifying protein-protein interaction sites is essential for understanding the mechanisms of various biological processes, disease development, and drug design. However, it remains a challenging task to make accurate predictions, as the small amount of training data and severe imbalanced classification reduce the performance of computational methods. We design a deep learning method named ctP2ISP to improve the prediction of protein-protein interaction sites. ctP2ISP employs Convolution and Transformer to extract information and enhance information perception so that semantic features can be mined to identify protein-protein interaction sites. A weighting loss function with different sample weights is designed to suppress the preference of the model toward multi-category prediction. To efficiently reuse the information in the training set, a preprocessing of data augmentation with an improved sample-oriented sampling strategy is applied. The trained ctP2ISP was evaluated against current state-of-the-art methods on six public datasets. The results show that ctP2ISP outperforms all other competing methods on the balance metrics: F1, MCC, and AUPRC. In particular, our prediction on open tests related to viruses may also be consistent with biological insights. The source code and data can be obtained from https://github.com/lennylv/ctP2ISP.
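A weighted loss of the general kind mentioned in this abstract can be sketched as follows. The abstract does not give the actual weighting scheme used by ctP2ISP, so the inverse-class-frequency weights, the function names, and the toy labels below are assumptions chosen only to show how weighting counteracts class imbalance in a binary interaction-site label.

```python
import math
from typing import Dict, Sequence


def class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency weights: each class contributes equally in total."""
    n = len(labels)
    pos = sum(labels)
    neg = n - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return {1: n / (2 * pos), 0: n / (2 * neg)}


def weighted_bce(
    labels: Sequence[int],
    probs: Sequence[float],
    weights: Dict[int, float],
    eps: float = 1e-12,
) -> float:
    """Mean binary cross-entropy with a per-sample weight taken from its class."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
    return total / len(labels)


# Toy data: 1 interaction site among 10 residues (made-up values).
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.1] * 10  # a model that predicts "non-site" everywhere

weights = class_weights(labels)
weighted = weighted_bce(labels, probs, weights)
unweighted = weighted_bce(labels, probs, {0: 1.0, 1: 1.0})
print(weights, weighted, unweighted)
```

With these toy values the weighted loss is larger than the unweighted one, because the single missed positive is no longer outvoted by the nine easy negatives.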

23
Soleymani F, Paquet E, Viktor H, Michalowski W, Spinello D. Protein-protein interaction prediction with deep learning: A comprehensive review. Comput Struct Biotechnol J 2022; 20:5316-5341. [PMID: 36212542 PMCID: PMC9520216 DOI: 10.1016/j.csbj.2022.08.070] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 06/22/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/15/2022] Open
Abstract
Most proteins perform their biological function by interacting with themselves or other molecules. Thus, one may obtain biological insights into protein functions, disease prevalence, and therapy development by identifying protein-protein interactions (PPI). However, finding the interacting and non-interacting protein pairs through experimental approaches is labour-intensive and time-consuming, owing to the variety of proteins. Hence, protein-protein interaction and protein-ligand binding problems have drawn attention in the fields of bioinformatics and computer-aided drug discovery. Deep learning methods paved the way for scientists to predict the 3-D structure of proteins from genomes, predict the functions and attributes of a protein, and modify and design new proteins to provide desired functions. This review focuses on recent deep learning methods applied to problems including predicting protein functions, protein-protein interaction and their sites, protein-ligand binding, and protein design.
Affiliation(s)
- Farzan Soleymani
  - Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
- Eric Paquet
  - National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada
- Herna Viktor
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, Canada
- Davide Spinello
  - Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada

24
Baranwal M, Magner A, Saldinger J, Turali-Emre ES, Elvati P, Kozarekar S, VanEpps JS, Kotov NA, Violi A, Hero AO. Struct2Graph: a graph attention network for structure based predictions of protein-protein interactions. BMC Bioinformatics 2022; 23:370. [PMID: 36088285 PMCID: PMC9464414 DOI: 10.1186/s12859-022-04910-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 06/09/2022] [Accepted: 08/26/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Development of new methods for analysis of protein-protein interactions (PPIs) at molecular and nanometer scales gives insights into intracellular signaling pathways and will improve understanding of protein functions, as well as other nanoscale structures of biological and abiological origins. Recent advances in computational tools, particularly the ones involving modern deep learning algorithms, have been shown to complement experimental approaches for describing and rationalizing PPIs. However, most of the existing works on PPI predictions use protein-sequence information, and thus have difficulties in accounting for the three-dimensional organization of the protein chains. RESULTS In this study, we address this problem and describe a PPI analysis based on a graph attention network, named Struct2Graph, for identifying PPIs directly from the structural data of folded protein globules. Our method is capable of predicting the PPI with an accuracy of 98.89% on the balanced set consisting of an equal number of positive and negative pairs. On the unbalanced set with the ratio of 1:10 between positive and negative pairs, Struct2Graph achieves a fivefold cross validation average accuracy of 99.42%. Moreover, Struct2Graph can potentially identify residues that likely contribute to the formation of the protein-protein complex. The identification of important residues is tested for two different interaction types: (a) Proteins with multiple ligands competing for the same binding area, (b) Dynamic protein-protein adhesion interaction. Struct2Graph identifies interacting residues with 30% sensitivity, 89% specificity, and 87% accuracy. CONCLUSIONS In this manuscript, we address the problem of prediction of PPIs using a first of its kind, 3D-structure-based graph attention network (code available at https://github.com/baranwa2/Struct2Graph). Furthermore, the novel mutual attention mechanism provides insights into likely interaction sites through its unsupervised knowledge selection process. This study demonstrates that a relatively low-dimensional feature embedding learned from graph structures of individual proteins outperforms other modern machine learning classifiers based on global protein features. In addition, through the analysis of single amino acid variations, the attention mechanism shows preference for disease-causing residue variations over benign polymorphisms, demonstrating that it is not limited to interface residues.
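The residue-level figures quoted in this abstract (sensitivity, specificity, accuracy) are all derived from the same confusion matrix, and the short sketch below shows how they relate. The counts are hypothetical and are not taken from the Struct2Graph paper; they are picked only so that sensitivity and specificity land on round values. The point illustrated is that accuracy is a prevalence-weighted average of the other two, so it sits close to specificity whenever non-interacting residues dominate.

```python
import math
from typing import Dict


def residue_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, and positive fraction from counts."""
    positives = tp + fn
    negatives = fp + tn
    total = positives + negatives
    return {
        "sensitivity": tp / positives,   # fraction of true sites recovered
        "specificity": tn / negatives,   # fraction of non-sites rejected
        "accuracy": (tp + tn) / total,
        "prevalence": positives / total,
    }


# Hypothetical counts for illustration only (10 true sites, 100 non-sites).
m = residue_metrics(tp=3, fn=7, fp=11, tn=89)

# Accuracy equals the prevalence-weighted mean of sensitivity and specificity.
weighted_mean = (
    m["prevalence"] * m["sensitivity"] + (1.0 - m["prevalence"]) * m["specificity"]
)
print(m, weighted_mean)
```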
Collapse
Affiliation(s)
- Mayank Baranwal
- Division of Data and Decision Sciences, Tata Consultancy Services Research, Mumbai, India
- Systems and Control Engineering Group, Indian Institute of Technology, Bombay, India
| | - Abram Magner
- Department of Computer Science, University of Albany, SUNY, Albany, USA
| | - Jacob Saldinger
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
| | | | - Paolo Elvati
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, USA
| | - Shivani Kozarekar
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
| | - J. Scott VanEpps
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, USA
- Department of Emergency Medicine, University of Michigan, Ann Arbor, USA
- Biointerfaces Institute, University of Michigan, Ann Arbor, USA
| | - Nicholas A. Kotov
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, USA
- Biointerfaces Institute, University of Michigan, Ann Arbor, USA
- Department of Materials Science and Engineering, University of Michigan, Ann Arbor, USA
| | - Angela Violi
- Department of Chemical Engineering, University of Michigan, Ann Arbor, USA
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, USA
- Biophysics Program, University of Michigan, Ann Arbor, USA
| | - Alfred O. Hero
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, USA
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, USA
- Department of Statistics, University of Michigan, Ann Arbor, USA
- Program in Applied Interdisciplinary Mathematics, University of Michigan, Ann Arbor, USA
- Program in Bioinformatics, University of Michigan, Ann Arbor, USA
|
25
|
Avery C, Patterson J, Grear T, Frater T, Jacobs DJ. Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:1246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has been an important arsenal in computational biology used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein-ligand binding, including allosteric effects, protein-protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
Affiliation(s)
- Chris Avery
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - John Patterson
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Tyler Grear
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Theodore Frater
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Donald J. Jacobs
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
|
26
|
ProB-Site: Protein Binding Site Prediction Using Local Features. Cells 2022; 11:2117. [PMID: 35805201 PMCID: PMC9266162 DOI: 10.3390/cells11132117] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/19/2022] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 01/16/2023] Open
Abstract
Protein–protein interactions (PPIs) are responsible for various essential biological processes. This information can help develop a new drug against diseases. Various experimental methods have been employed for this purpose; however, their application is limited by their cost and time consumption. Alternatively, computational methods are considered viable means to achieve this crucial task. Various techniques have been explored in the literature using the sequential information of amino acids in a protein sequence, including machine learning and deep learning techniques. The current efficiency of interaction-site prediction still has growth potential. Hence, a deep neural network-based model, ProB-site, is proposed. ProB-site utilizes sequential information of a protein to predict its binding sites. The proposed model uses evolutionary information and predicted structural information extracted from sequential information of proteins, generating three unique feature sets for every amino acid in a protein sequence. Then, these feature sets are fed to their respective sub-CNN architecture to acquire complex features. Finally, the acquired features are concatenated and classified using fully connected layers. This methodology performed better than state-of-the-art techniques because of the selection of the best features and contemplation of local information of each amino acid.
|
27
|
Pozzati G, Kundrotas P, Elofsson A. Scoring of protein–protein docking models utilizing predicted interface residues. Proteins 2022; 90:1493-1505. [PMID: 35246997 PMCID: PMC9314140 DOI: 10.1002/prot.26330] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/27/2021] [Revised: 02/23/2022] [Accepted: 02/28/2022] [Indexed: 11/08/2022]
Abstract
Scoring docking solutions is a difficult task, and many methods have been developed for this purpose. In docking, only a handful of the hundreds of thousands of models generated by docking algorithms are acceptable, causing difficulties when developing scoring functions. Today's best scoring functions can significantly increase the number of top-ranked models but still fail for most targets. Here, we examine the possibility of utilizing predicted interface residues to score docking models generated during the scan stage of a docking algorithm. Many methods have been developed to infer the regions of a protein surface that interact with another protein, but most have not been benchmarked using docking algorithms. This study systematically tests different interface prediction methods for scoring >300,000 low-resolution rigid-body template-free docking decoys. Overall, we find that contact-based interface prediction by BIPSPI is the best method to score docking solutions, with >12% of first-ranked docking models being acceptable. Additional experiments indicated precision as a high-importance metric when estimating interface prediction quality, focusing on docking constraints production. Finally, we discussed several limitations for adopting interface predictions as constraints in a docking protocol.
Affiliation(s)
- Gabriele Pozzati
- Department of Biochemistry and Biophysics and Science for Life Laboratory Stockholm University Solna Sweden
| | - Petras Kundrotas
- Department of Biochemistry and Biophysics and Science for Life Laboratory Stockholm University Solna Sweden
- Center for Bioinformatics and Department of Molecular Biosciences University of Kansas Lawrence Kansas USA
| | - Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory Stockholm University Solna Sweden
|
28
|
Cong X, Zhang X, Liang X, He X, Tang Y, Zheng X, Lu S, Zhang J, Chen T. Delineating the conformational landscape and intrinsic properties of the angiotensin II type 2 receptor using a computational study. Comput Struct Biotechnol J 2022; 20:2268-2279. [PMID: 35615027 PMCID: PMC9117689 DOI: 10.1016/j.csbj.2022.05.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/09/2022] [Revised: 05/04/2022] [Accepted: 05/06/2022] [Indexed: 12/22/2022] Open
Abstract
As a key regulator for the renin-angiotensin system, a class A G protein-coupled receptor (GPCR), AngII type 2 receptor (AT2R), plays a pivotal role in the homeostasis of the cardiovascular system. Compared with other GPCRs, AT2R has a unique antagonist-bound conformation and its mechanism is still an enigma. Here, we applied combined dynamic and evolutional approaches to investigate the conformational space and intrinsic properties of AT2R. With molecular dynamic simulations, Markov State Models, and statistics coupled analysis, we captured the conformational landscape of AT2R and identified its uniquity from both dynamical and evolutional viewpoints. A cryptic pocket was also discovered in the intermediate state during conformation transitions. These findings offer a deeper understanding of the AT2R mechanism at an atomic level and provide hints for the design of novel AT2R modulators.
Affiliation(s)
- Xiaoliang Cong
- Department of Cardiology, Shanghai Changzheng Hospital, the Second Affiliated Hospital of Naval Medical University, Shanghai 200003, China
| | - Xiaogang Zhang
- Department of Cardiology, Shanghai University of Medicine & Health Sciences Affiliated Zhoupu Hospital, Shanghai 201318, China
| | - Xin Liang
- Department of Cardiology, Shanghai Changzheng Hospital, the Second Affiliated Hospital of Naval Medical University, Shanghai 200003, China
| | - Xinheng He
- Medicinal Chemistry and Bioinformatics Centre, Shanghai Jiao Tong University, School of Medicine, Shanghai 200025, China
| | - Yehua Tang
- Department of Cardiology, Shanghai Changzheng Hospital, the Second Affiliated Hospital of Naval Medical University, Shanghai 200003, China
| | - Xing Zheng
- Department of Cardiology, Changhai Hospital, Naval Medical University, Shanghai 200433, China
| | - Shaoyong Lu
- Medicinal Chemistry and Bioinformatics Centre, Shanghai Jiao Tong University, School of Medicine, Shanghai 200025, China
- Corresponding authors.
| | - Jiayou Zhang
- Department of Cardiology, Shanghai Changzheng Hospital, the Second Affiliated Hospital of Naval Medical University, Shanghai 200003, China
- Corresponding authors.
| | - Ting Chen
- Department of Cardiology, Shanghai Changzheng Hospital, the Second Affiliated Hospital of Naval Medical University, Shanghai 200003, China
- Corresponding authors.
|
29
|
Cha M, Emre EST, Xiao X, Kim JY, Bogdan P, VanEpps JS, Violi A, Kotov NA. Unifying structural descriptors for biological and bioinspired nanoscale complexes. Nat Comput Sci 2022; 2:243-252. [PMID: 38177552 DOI: 10.1038/s43588-022-00229-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/23/2021] [Accepted: 03/17/2022] [Indexed: 01/06/2024]
Abstract
Biomimetic nanoparticles are known to serve as nanoscale adjuvants, enzyme mimics and amyloid fibrillation inhibitors. Their further development requires better understanding of their interactions with proteins. The abundant knowledge about protein-protein interactions can serve as a guide for designing protein-nanoparticle assemblies, but the chemical and biological inputs used in computational packages for protein-protein interactions are not applicable to inorganic nanoparticles. Analysing chemical, geometrical and graph-theoretical descriptors for protein complexes, we found that geometrical and graph-theoretical descriptors are uniformly applicable to biological and inorganic nanostructures and can predict interaction sites in protein pairs with accuracy >80% and classification probability ~90%. We extended the machine-learning algorithms trained on protein-protein interactions to inorganic nanoparticles and found a nearly exact match between experimental and predicted interaction sites with proteins. These findings can be extended to other organic and inorganic nanoparticles to predict their assemblies with biomolecules and other chemical structures forming lock-and-key complexes.
Affiliation(s)
- Minjeong Cha
- Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI, USA
- Biointerfaces Institute, University of Michigan, Ann Arbor, MI, USA
| | - Emine Sumeyra Turali Emre
- Biointerfaces Institute, University of Michigan, Ann Arbor, MI, USA
- Department of Chemical Engineering, University of Michigan, Ann Arbor, MI, USA
| | - Xiongye Xiao
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA
| | - Ji-Young Kim
- Biointerfaces Institute, University of Michigan, Ann Arbor, MI, USA
- Department of Chemical Engineering, University of Michigan, Ann Arbor, MI, USA
| | - Paul Bogdan
- Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA
| | - J Scott VanEpps
- Biointerfaces Institute, University of Michigan, Ann Arbor, MI, USA
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA
- Program in Macromolecular Science and Engineering, University of Michigan, Ann Arbor, MI, USA
- Department of Emergency Medicine, University of Michigan, Ann Arbor, MI, USA
- Michigan Center for Integrative Research in Critical Care, University of Michigan, Ann Arbor, MI, USA
| | - Angela Violi
- Department of Chemical Engineering, University of Michigan, Ann Arbor, MI, USA
- Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
- Biophysics Program, University of Michigan, Ann Arbor, MI, USA
| | - Nicholas A Kotov
- Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI, USA.
- Biointerfaces Institute, University of Michigan, Ann Arbor, MI, USA.
- Department of Chemical Engineering, University of Michigan, Ann Arbor, MI, USA.
- Department of Biomedical Engineering, University of Michigan, Ann Arbor, MI, USA.
- Program in Macromolecular Science and Engineering, University of Michigan, Ann Arbor, MI, USA.
|
30
|
Casadio R, Martelli PL, Savojardo C. Machine learning solutions for predicting protein–protein interactions. WIREs Comput Mol Sci 2022. [DOI: 10.1002/wcms.1618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Rita Casadio
- Biocomputing Group University of Bologna Bologna Italy
|
31
|
Mahbub S, Bayzid MS. EGRET: edge aggregated graph attention networks and transfer learning improve protein-protein interaction site prediction. Brief Bioinform 2022; 23:6518045. [PMID: 35106547 DOI: 10.1093/bib/bbab578] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/13/2021] [Revised: 11/25/2021] [Accepted: 12/16/2021] [Indexed: 12/18/2022] Open
Abstract
MOTIVATION Protein-protein interactions (PPIs) are central to most biological processes. However, reliable identification of PPI sites using conventional experimental methods is slow and expensive. Therefore, great efforts are being put into computational methods to identify PPI sites. RESULTS We present Edge Aggregated GRaph Attention NETwork (EGRET), a highly accurate deep learning-based method for PPI site prediction, where we have used an edge aggregated graph attention network to effectively leverage the structural information. We, for the first time, have used transfer learning in PPI site prediction. Our proposed edge aggregated network, together with transfer learning, has achieved notable improvement over the best alternate methods. Furthermore, we systematically investigated EGRET's network behavior to provide insights about the causes of its decisions. AVAILABILITY EGRET is freely available as an open source project at https://github.com/Sazan-Mahbub/EGRET. CONTACT shams_bayzid@cse.buet.ac.bd.
Affiliation(s)
- Sazan Mahbub
- Department of Computer Science University of Maryland, College Park, Maryland 20742, USA
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
|
32
|
Yuan Q, Chen J, Zhao H, Zhou Y, Yang Y. Structure-aware protein-protein interaction site prediction using deep graph convolutional network. Bioinformatics 2021; 38:125-132. [PMID: 34498061 DOI: 10.1093/bioinformatics/btab643] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 03/09/2021] [Revised: 08/03/2021] [Accepted: 09/03/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Protein-protein interactions (PPI) play crucial roles in many biological processes, and identifying PPI sites is an important step for mechanistic understanding of diseases and design of novel drugs. Since experimental approaches for PPI site identification are expensive and time-consuming, many computational methods have been developed as screening tools. However, these methods are mostly based on neighbored features in sequence, and thus limited to capture spatial information. RESULTS We propose a deep graph-based framework deep Graph convolutional network for Protein-Protein-Interacting Site prediction (GraphPPIS) for PPI site prediction, where the PPI site prediction problem was converted into a graph node classification task and solved by deep learning using the initial residual and identity mapping techniques. We showed that a deeper architecture (up to eight layers) allows significant performance improvement over other sequence-based and structure-based methods by more than 12.5% and 10.5% on AUPRC and MCC, respectively. Further analyses indicated that the predicted interacting sites by GraphPPIS are more spatially clustered and closer to the native ones even when false-positive predictions are made. The results highlight the importance of capturing spatially neighboring residues for interacting site prediction. AVAILABILITY AND IMPLEMENTATION The datasets, the pre-computed features, and the source codes along with the pre-trained models of GraphPPIS are available at https://github.com/biomed-AI/GraphPPIS. The GraphPPIS web server is freely available at https://biomed.nscc-gz.cn/apps/GraphPPIS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
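The "initial residual" and "identity mapping" techniques named in this abstract are, to our understanding, the two ingredients of the GCNII layer update, H(l+1) = ReLU(((1 - alpha) P H(l) + alpha H(0)) ((1 - beta_l) I + beta_l W(l))). The NumPy sketch below illustrates one such layer on a toy four-residue chain graph. The graph, feature sizes, random weights, and the values of `alpha` and `lam` are hypothetical and are not taken from GraphPPIS, whose actual implementation is in the repository cited in the abstract.

```python
import math

import numpy as np


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix."""
    a_tilde = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))  # degrees are >= 1 after self-loops
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcnii_layer(h, h0, p, w, alpha, beta):
    """One layer with initial residual (alpha) and identity mapping (beta)."""
    # Initial residual: mix the propagated features with the input features h0.
    support = (1.0 - alpha) * (p @ h) + alpha * h0
    # Identity mapping: multiply by (1 - beta) * I + beta * W.
    out = (1.0 - beta) * support + beta * (support @ w)
    return np.maximum(out, 0.0)  # ReLU


# Toy graph: four residues connected in a chain (hypothetical contact map).
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
p = normalized_adjacency(adj)

rng = np.random.default_rng(0)
h0 = rng.normal(size=(4, 3))          # 4 residues, 3 features each
alpha, lam, n_layers = 0.1, 0.5, 8    # hypothetical hyperparameters
weights = [rng.normal(scale=0.1, size=(3, 3)) for _ in range(n_layers)]

h = h0
for layer, w in enumerate(weights, start=1):
    beta = math.log(lam / layer + 1.0)  # beta shrinks as depth grows
    h = gcnii_layer(h, h0, p, w, alpha, beta)

print(h.shape)
```

Because every layer re-injects `h0` and keeps the weight matrix close to the identity, stacking several layers does not wash out the per-residue input features, which is the usual motivation for these two techniques in deep graph networks.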
Affiliation(s)
- Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Jianwen Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Huiying Zhao
- Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510000, China
| | - Yaoqi Zhou
- Peking University Shenzhen Graduate School, Shenzhen 518055, China.,Shenzhen Bay Laboratory, Shenzhen 518055, China.,Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4215, Australia
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China.,Key Laboratory of Machine Intelligence and Advanced Computing of MOE, Sun Yat-sen University, Guangzhou 510000, China
|
33
|
Wang G, Zhai YJ, Xue ZZ, Xu YY. Improving Protein Subcellular Location Classification by Incorporating Three-Dimensional Structure Information. Biomolecules 2021; 11:1607. [PMID: 34827605 PMCID: PMC8615982 DOI: 10.3390/biom11111607] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/30/2021] [Revised: 10/27/2021] [Accepted: 10/27/2021] [Indexed: 12/12/2022] Open
Abstract
The subcellular locations of proteins are closely related to their functions. In the past few decades, the application of machine learning algorithms to predict protein subcellular locations has been an important topic in proteomics. However, most studies in this field used only amino acid sequences as the data source. Only a few works focused on other protein data types. For example, three-dimensional structures, which contain far more functional protein information than sequences, remain to be explored. In this work, we extracted various handcrafted features to describe the protein structures from physical, chemical, and topological aspects, as well as the learned features obtained by deep neural networks. We then used these features to classify the protein subcellular locations. Our experimental results demonstrated that some of these structural features have a certain effect on the protein location classification, and can help improve the performance of sequence-based location predictors. Our method provides a new view for the analysis of protein spatial distribution, and is anticipated to be used in revealing the relationships between protein structures and functions.
Affiliation(s)
- Ge Wang
- School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China; (G.W.); (Z.-Z.X.)
- Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
| | - Yu-Jia Zhai
- Guangzhou Women and Children’s Medical Center, Department of Pharmacy, Guangzhou Medical University, Guangzhou 510623, China;
| | - Zhen-Zhen Xue
- School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China; (G.W.); (Z.-Z.X.)
- Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Ying-Ying Xu
- School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China; (G.W.); (Z.-Z.X.)
- Guangdong Provincial Key Laboratory of Medical Imaging Processing, Southern Medical University, Guangzhou 510515, China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
|
34
|
Wang P, Zhang G, Yu ZG, Huang G. A Deep Learning and XGBoost-Based Method for Predicting Protein-Protein Interaction Sites. Front Genet 2021; 12:752732. [PMID: 34764983 PMCID: PMC8576272 DOI: 10.3389/fgene.2021.752732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 09/20/2021] [Indexed: 11/29/2022] Open
Abstract
Knowledge about protein-protein interactions is beneficial in understanding cellular mechanisms. Protein-protein interactions are usually determined according to their protein-protein interaction sites. Due to the limitations of current techniques, it is still a challenging task to detect protein-protein interaction sites. In this article, we presented a method based on deep learning and XGBoost (called DeepPPISP-XGB) for predicting protein-protein interaction sites. The deep learning model served as a feature extractor to remove redundant information from protein sequences. The Extreme Gradient Boosting algorithm was used to construct a classifier for predicting protein-protein interaction sites. The DeepPPISP-XGB achieved the following results: area under the receiver operating characteristic curve of 0.681, a recall of 0.624, and area under the precision-recall curve of 0.339, being competitive with the state-of-the-art methods. We also validated the positive role of global features in predicting protein-protein interaction sites.
Affiliation(s)
- Pan Wang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
| | - Guiyang Zhang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
| | - Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
| | - Guohua Huang
- School of Electrical Engineering, Shaoyang University, Shaoyang, China
|
35
|
Jandova Z, Vargiu AV, Bonvin AMJJ. Native or Non-Native Protein-Protein Docking Models? Molecular Dynamics to the Rescue. J Chem Theory Comput 2021; 17:5944-5954. [PMID: 34342983 PMCID: PMC8444332 DOI: 10.1021/acs.jctc.1c00336] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/06/2021] [Indexed: 11/29/2022]
Abstract
Molecular docking excels at creating a plethora of potential models of protein-protein complexes. To correctly distinguish the favorable, native-like models from the remaining ones remains, however, a challenge. We assessed here if a protocol based on molecular dynamics (MD) simulations would allow distinguishing native from non-native models to complement scoring functions used in docking. To this end, the first models for 25 protein-protein complexes were generated using HADDOCK. Next, MD simulations complemented with machine learning were used to discriminate between native and non-native complexes based on a combination of metrics reporting on the stability of the initial models. Native models showed higher stability in almost all measured properties, including the key ones used for scoring in the Critical Assessment of PRedicted Interaction (CAPRI) competition, namely the positional root mean square deviations and fraction of native contacts from the initial docked model. A random forest classifier was trained, reaching a 0.85 accuracy in correctly distinguishing native from non-native complexes. Reasonably modest simulation lengths of the order of 50-100 ns are sufficient to reach this accuracy, which makes this approach applicable in practice.
Affiliation(s)
- Zuzana Jandova
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science-Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
| | - Attilio Vittorio Vargiu
- Physics Department, University of Cagliari, Cittadella Universitaria, S.P. 8 km 0.700, 09042 Monserrato, Italy
| | - Alexandre M. J. J. Bonvin
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science-Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, the Netherlands
|
36
|
Pal A, Pal D, Mitra P. A computational framework for modeling functional protein-protein interactions. Proteins 2021; 89:1353-1364. [PMID: 34076296 DOI: 10.1002/prot.26156] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/01/2020] [Revised: 04/17/2021] [Accepted: 05/19/2021] [Indexed: 11/06/2022]
Abstract
Protein interactions and their assemblies assist in understanding the cellular mechanisms through the knowledge of interactome. Despite recent advances, a vast number of interacting protein complexes is not annotated by three-dimensional structures. Therefore, a computational framework is a suitable alternative to fill the large gap between identified interactions and the interactions with known structures. In this work, we develop an automated computational framework for modeling functionally related protein-complex structures utilizing GO-based semantic similarity technique and co-evolutionary information of the interaction sites. The framework can consider protein sequence and structure information as input and employ both rigid-body docking and template-based modeling exploiting the existing structural templates and sequence homology information from the PDB. Our framework combines geometric as well as physicochemical features for re-ranking the docking decoys. The proposed framework has an 83% success rate when tested on a benchmark dataset while considering Top1 models for template-based modeling and Top10 models for the docking pipeline. We believe that our computational framework can be used for any pair of proteins with higher confidence to identify the functional protein-protein interactions.
Affiliation(s)
- Abantika Pal
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
| | - Debnath Pal
- Department of Computational and Data Sciences, Indian Institute of Science Bangalore, Bangalore, India
| | - Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
|
37
|
Wang B, Mei C, Wang Y, Zhou Y, Cheng MT, Zheng CH, Wang L, Zhang J, Chen P, Xiong Y. Imbalance Data Processing Strategy for Protein Interaction Sites Prediction. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:985-994. [PMID: 31751283 DOI: 10.1109/tcbb.2019.2953908] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 06/10/2023]
Abstract
Protein-protein interactions play essential roles in various biological processes. Identifying protein interaction sites helps researchers understand cellular activity and is therefore useful for drug design. However, experimentally determined interaction sites are far fewer than the total number of residues in interacting proteins or protein complexes. The negative and positive samples are therefore usually imbalanced, which is common but biases the computational prediction of protein interaction sites. In this work, we present three imbalanced-data processing strategies to reconstruct the original dataset, and then extract protein features from the evolutionary conservation of amino acids to build a predictor for identifying protein interaction sites. On a dataset with 10,430 surface residues but only 2,299 interface residues, the imbalanced-data processing strategies clearly reduce the prediction bias and thereby improve the prediction of protein interaction sites. The experimental results show that our prediction models achieve better prediction performance, for example a prediction accuracy of 0.758 or an F-measure of 0.737, demonstrating the effectiveness of our method.
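The abstract does not say which three strategies were used, so the sketch below shows only one common way to rebalance such a dataset: random undersampling of the majority class. It is not presented as the authors' method. It assumes that the 2,299 interface residues are a subset of the 10,430 surface residues; the function name and the seed are hypothetical.

```python
import random


def undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset (random undersampling)."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Keep every minority-class sample and an equally sized random draw
    # (without replacement) from the majority class.
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    return sorted(minority + rng.sample(majority, len(minority)))


# Class sizes from the abstract, under the subset assumption stated above.
n_surface, n_interface = 10430, 2299
labels = [1] * n_interface + [0] * (n_surface - n_interface)

kept = undersample(labels)
n_pos = sum(labels[i] for i in kept)
print(f"before: {n_interface} interface vs {n_surface - n_interface} non-interface")
print(f"after:  {n_pos} interface vs {len(kept) - n_pos} non-interface")
```

Undersampling discards most of the majority class, so it trades data volume for balance. Oversampling the minority class or weighting the loss are common alternatives with the opposite trade-off.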
|
38
|
Akbar R, Robert PA, Pavlović M, Jeliazkov JR, Snapkov I, Slabodkin A, Weber CR, Scheffer L, Miho E, Haff IH, Haug DTT, Lund-Johansen F, Safonova Y, Sandve GK, Greiff V. A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding. Cell Rep 2021; 34:108856. [PMID: 33730590 DOI: 10.1016/j.celrep.2021.108856] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 03/12/2020] [Revised: 11/29/2020] [Accepted: 02/22/2021] [Indexed: 12/16/2022] Open
Abstract
Antibody-antigen binding relies on the specific interaction of amino acids at the paratope-epitope interface. The predictability of antibody-antigen binding is a prerequisite for de novo antibody and (neo-)epitope design. A fundamental premise for the predictability of antibody-antigen binding is the existence of paratope-epitope interaction motifs that are universally shared among antibody-antigen structures. In a dataset of non-redundant antibody-antigen structures, we identify structural interaction motifs, which together compose a commonly shared structure-based vocabulary of paratope-epitope interactions. We show that this vocabulary enables the machine learnability of antibody-antigen binding on the paratope-epitope level using generative machine learning. The vocabulary (1) is compact, less than 10^4 motifs; (2) distinct from non-immune protein-protein interactions; and (3) mediates specific oligo- and polyreactive interactions between paratope-epitope pairs. Our work leverages combined structure- and sequence-based learning to demonstrate that machine-learning-driven predictive paratope and epitope engineering is feasible.
Affiliation(s)
- Rahmad Akbar
- Department of Immunology, University of Oslo, Oslo, Norway.
- Milena Pavlović
- Department of Informatics, University of Oslo, Oslo, Norway; Centre for Bioinformatics, University of Oslo, Norway; K.G. Jebsen Centre for Coeliac Disease Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Igor Snapkov
- Department of Immunology, University of Oslo, Oslo, Norway
- Cédric R Weber
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Lonneke Scheffer
- Department of Informatics, University of Oslo, Oslo, Norway; Centre for Bioinformatics, University of Oslo, Norway
- Enkelejda Miho
- Institute of Medical Engineering and Medical Informatics, School of Life Sciences, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Muttenz, Switzerland
- Yana Safonova
- Computer Science and Engineering Department, University of California, San Diego, La Jolla, CA, USA
- Geir K Sandve
- Department of Informatics, University of Oslo, Oslo, Norway; Centre for Bioinformatics, University of Oslo, Norway; K.G. Jebsen Centre for Coeliac Disease Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Victor Greiff
- Department of Immunology, University of Oslo, Oslo, Norway.
39
Waiho K, Afiqah‐Aleng N, Iryani MTM, Fazhan H. Protein–protein interaction network: an emerging tool for understanding fish disease in aquaculture. Reviews in Aquaculture 2021; 13:156-177. [DOI: 10.1111/raq.12468] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/02/2020] [Accepted: 06/11/2020] [Indexed: 01/03/2025]
Abstract
Protein–protein interactions (PPIs) play integral roles in a wide range of biological processes that regulate the overall growth, development, physiology and disease in living organisms. With the advancement of high‐throughput sequencing technologies, increasing numbers of PPI networks are being predicted and annotated, and these contribute greatly towards the understanding of pathogenesis and the discovery of novel drug targets for the treatment of diseases. The use of this tool is gaining popularity in the identification, understanding and treatment of diseases in humans and plants. Due to the importance of aquaculture in tackling the global food crisis by producing cheap and high‐quality protein source, the maintenance of the overall health status of aquaculture species is essential. With the increasing omics data on aquaculture species, the PPI network is an emerging tool for fish health maintenance. In this review, we first introduce the concept of PPI network, how they are discovered and their general application. Then, the current status of aquaculture and disease in aquaculture are discussed. The different applications of PPI network in aquaculture fish disease management such as biomarker identification, mechanism prediction, understanding of host–pathogen interaction, understanding of pathogen co‐infection interaction, and potential development of vaccines and treatments are subsequently highlighted. It is hoped that this emerging tool – PPI network – would deepen our understanding of the pathogenesis of various diseases and hasten the prevention and treatment processes in aquaculture species.
Affiliation(s)
- Khor Waiho
- Institute of Tropical Aquaculture and Fisheries Universiti Malaysia Terengganu Terengganu Malaysia
- Nor Afiqah‐Aleng
- Institute of Marine Biotechnology Universiti Malaysia Terengganu Terengganu Malaysia
- Mat Taib Mimi Iryani
- Institute of Marine Biotechnology Universiti Malaysia Terengganu Terengganu Malaysia
- Hanafiah Fazhan
- Institute of Tropical Aquaculture and Fisheries Universiti Malaysia Terengganu Terengganu Malaysia
- Guangdong Provincial Key Laboratory of Marine Biotechnology Shantou University Guangdong China
40
Jamasb AR, Day B, Cangea C, Liò P, Blundell TL. Deep Learning for Protein-Protein Interaction Site Prediction. Methods Mol Biol 2021; 2361:263-288. [PMID: 34236667 DOI: 10.1007/978-1-0716-1641-3_16] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/11/2022]
Abstract
Protein-protein interactions (PPIs) are central to cellular functions. Experimental methods for predicting PPIs are well developed but are time and resource expensive and suffer from high false-positive error rates at scale. Computational prediction of PPIs is highly desirable for a mechanistic understanding of cellular processes and offers the potential to identify highly selective drug targets. In this chapter, details of developing a deep learning approach to predicting which residues in a protein are involved in forming a PPI (a task known as PPI site prediction) are outlined. The key decisions to be made in defining a supervised machine learning project in this domain are here highlighted. Alternative training regimes for deep learning models to address shortcomings in existing approaches and provide starting points for further research are discussed. This chapter is written to serve as a companion to developing deep learning approaches to protein-protein interaction site prediction, and an introduction to developing geometric deep learning projects operating on protein structure graphs.
Affiliation(s)
- Arian R Jamasb
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK.,Department of Biochemistry, University of Cambridge, Cambridge, UK
- Ben Day
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Cătălina Cangea
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Pietro Liò
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, UK.
41
Zeng M, Zhang F, Wu FX, Li Y, Wang J, Li M. Protein-protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics 2020; 36:1114-1120. [PMID: 31593229 DOI: 10.1093/bioinformatics/btz699] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 05/03/2019] [Revised: 07/25/2019] [Accepted: 09/04/2019] [Indexed: 12/21/2022] Open
Abstract
Motivation
Protein-protein interactions (PPIs) play important roles in many biological processes. Conventional biological experiments for identifying PPI sites are costly and time-consuming. Thus, many computational approaches have been proposed to predict PPI sites. Existing computational methods usually use local contextual features to predict PPI sites. Actually, global features of protein sequences are critical for PPI site prediction.
Results
A new end-to-end deep learning framework, named DeepPPISP, through combining local contextual and global sequence features, is proposed for PPI site prediction. For local contextual features, we use a sliding window to capture features of neighbors of a target amino acid as in previous studies. For global sequence features, a text convolutional neural network is applied to extract features from the whole protein sequence. Then the local contextual and global sequence features are combined to predict PPI sites. By integrating local contextual and global sequence features, DeepPPISP achieves the state-of-the-art performance, which is better than the other competing methods. In order to investigate if global sequence features are helpful in our deep learning model, we remove or change some components in DeepPPISP. Detailed analyses show that global sequence features play important roles in DeepPPISP.
Availability and implementation
The DeepPPISP web server is available at http://bioinformatics.csu.edu.cn/PPISP/. The source code can be obtained from https://github.com/CSUBioGroup/DeepPPISP.
Supplementary information
Supplementary data are available at Bioinformatics online.
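The sliding-window idea mentioned in the abstract above (capturing the features of a target residue's neighbors) can be sketched as follows. This is a minimal illustration, not the DeepPPISP code: the function name `sliding_windows`, the `half_width` parameter, the zero-padding at the chain ends, and the toy feature values are all assumptions made here.

```python
from typing import List, Sequence


def sliding_windows(
    features: Sequence[Sequence[float]], half_width: int
) -> List[List[float]]:
    """Concatenate each residue's features with those of its neighbours.

    Window positions that fall off either end of the chain are zero-padded,
    so every residue gets a vector of length (2 * half_width + 1) * dim.
    """
    if not features:
        return []
    dim = len(features[0])
    pad = [0.0] * dim
    windows: List[List[float]] = []
    for i in range(len(features)):
        window: List[float] = []
        for j in range(i - half_width, i + half_width + 1):
            row = features[j] if 0 <= j < len(features) else pad
            window.extend(row)
        windows.append(window)
    return windows


# Toy per-residue features (4 residues, 2 values each); values are made up.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
windows = sliding_windows(toy, half_width=1)
print(windows[0])  # padding + residue 0 + residue 1
```

A global branch, such as the text convolutional network the abstract describes, would operate on the whole sequence instead; it is not shown here.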
Affiliation(s)
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Fuhao Zhang
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon SKS7N5A9, Canada
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, People's Republic of China
42
Zhu H, Du X, Yao Y. ConvsPPIS: Identifying Protein-protein Interaction Sites by an Ensemble Convolutional Neural Network with Feature Graph. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191105155713] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/22/2022]
Abstract
Background/Objective:
Protein-protein interactions are essential for most cellular processes; thus, unveiling how proteins interact is a crucial question that can be better understood by recognizing which residues participate in the interaction. Although many computational approaches have been proposed to predict interface residues, their feature perspective and model learning ability are not enough to achieve ideal results. Our objective is therefore to improve predictive performance by considering the feature perspective and a new learning algorithm.
Method:
In this study, we proposed an ensemble deep convolutional neural network, which explores the context and positional context of consecutive residues within a protein sub-sequence. Specifically, unlike the feature view of previous methods, ConvsPPIS uses evolutionary, physicochemical, and structural protein characteristics to construct a separate feature graph for each. After that, three independent deep convolutional neural networks are trained, one on each type of feature graph, to learn the underlying pattern in the sub-sequence. Lastly, we integrated those three deep networks into an ensemble predictor that leverages the complementary information of those features to predict potential interface residues.
Results:
Comparative experiments were conducted through 10-fold cross-validation. The results indicated that ConvsPPIS achieved superior performance on the DBv5-Sel dataset with an accuracy of 88%. Additional experiments on the CAPRI-Alone dataset demonstrated that ConvsPPIS also has better prediction performance.
Conclusion:
The ConvsPPIS method provided a new perspective to capture protein feature expression for identifying protein-protein interaction sites. The results proved the superiority of this method.
Affiliation(s)
- Huaixu Zhu
- School of Computer Science and Technology, Anhui University, Hefei, China
- Xiuquan Du
- School of Computer Science and Technology, Anhui University, Hefei, China
- Yu Yao
- School of Computer Science and Technology, Anhui University, Hefei, China
43
Andreani J, Quignot C, Guerois R. Structural prediction of protein interactions and docking using conservation and coevolution. Wiley Interdisciplinary Reviews: Computational Molecular Science 2020. [DOI: 10.1002/wcms.1470] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Affiliation(s)
- Jessica Andreani
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
- Chloé Quignot
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
- Raphael Guerois
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
44
Xie Z, Deng X, Shu K. Prediction of Protein-Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets. Int J Mol Sci 2020; 21:E467. [PMID: 31940793 PMCID: PMC7013409 DOI: 10.3390/ijms21020467] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 11/24/2019] [Revised: 12/23/2019] [Accepted: 01/08/2020] [Indexed: 12/20/2022] Open
Abstract
Protein-protein interaction (PPI) sites play a key role in the formation of protein complexes, which is the basis of a variety of biological processes. Experimental methods to solve PPI sites are expensive and time-consuming, which has led to the development of different kinds of prediction algorithms. We propose a convolutional neural network for PPI site prediction and use residue binding propensity to improve the positive samples. Our method obtains a remarkable result of the area under the curve (AUC) = 0.912 on the improved data set. In addition, it yields much better results on samples with high binding propensity than on randomly selected samples. This suggests that there are considerable false-positive PPI sites in the positive samples defined by the distance between residue atoms.
Affiliation(s)
- Zengyan Xie
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
- Kunxian Shu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
45
Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat Methods 2019; 17:184-192. [DOI: 10.1038/s41592-019-0666-6] [Citation(s) in RCA: 349] [Impact Index Per Article: 58.2] [Received: 04/11/2019] [Accepted: 10/28/2019] [Indexed: 02/05/2023]
46
Galano-Frutos JJ, García-Cebollada H, Sancho J. Molecular dynamics simulations for genetic interpretation in protein coding regions: where we are, where to go and when. Brief Bioinform 2019; 22:3-19. [PMID: 31813950 DOI: 10.1093/bib/bbz146] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/23/2019] [Revised: 09/22/2019] [Accepted: 10/25/2019] [Indexed: 12/18/2022] Open
Abstract
The increasing ease with which massive genetic information can be obtained from patients or healthy individuals has stimulated the development of interpretive bioinformatics tools as aids in clinical practice. Most such tools analyze evolutionary information and simple physical-chemical properties to predict whether replacement of one amino acid residue with another will be tolerated or cause disease. Those approaches achieve up to 80-85% accuracy as binary classifiers (neutral/pathogenic). As such accuracy is insufficient for medical decision to be based on, and it does not appear to be increasing, more precise methods, such as full-atom molecular dynamics (MD) simulations in explicit solvent, are also discussed. Then, to describe the goal of interpreting human genetic variations at large scale through MD simulations, we restrictively refer to all possible protein variants carrying single-amino-acid substitutions arising from single-nucleotide variations as the human variome. We calculate its size and develop a simple model that allows calculating the simulation time needed to have a 0.99 probability of observing unfolding events of any unstable variant. The knowledge of that time enables performing a binary classification of the variants (stable-potentially neutral/unstable-pathogenic). Our model indicates that the human variome cannot be simulated with present computing capabilities. However, if they continue to increase as per Moore's law, it could be simulated (at 65°C) spending only 3 years in the task if we started in 2031. The simulation of individual protein variomes is achievable in short times starting at present. International coordination seems appropriate to embark upon massive MD simulations of protein variants.
Affiliation(s)
- Juan J Galano-Frutos
- Protein Folding and Molecular Design (ProtMol) group at BIFI, University of Zaragoza
- Javier Sancho
- Protein Folding and Molecular Design (ProtMol) group at BIFI, University of Zaragoza
47
Vajdi A, Zarringhalam K, Haspel N. Patch-DCA: improved protein interface prediction by utilizing structural information and clustering DCA scores. Bioinformatics 2019; 36:1460-1467. [DOI: 10.1093/bioinformatics/btz791] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/03/2019] [Revised: 09/30/2019] [Accepted: 10/15/2019] [Indexed: 01/07/2023] Open
Abstract
Motivation
Over the past decade, there have been impressive advances in determining the 3D structures of protein complexes. However, there are still many complexes with unknown structures, even when the structures of the individual proteins are known. The advent of protein sequence information provides an opportunity to leverage evolutionary information to enhance the accuracy of protein–protein interface prediction. To this end, several statistical and machine learning methods have been proposed. In particular, direct coupling analysis has recently emerged as a promising approach for identification of protein contact maps from sequential information. However, the ability of these methods to detect protein–protein inter-residue contacts remains relatively limited.
Results
In this work, we propose a method to integrate sequential and co-evolution information with structural and functional information to increase the performance of protein–protein interface prediction. Further, we present a post-processing clustering method that improves the average relative F1 score by 70% and 24% and the average relative precision by 80% and 36% in comparison with two state-of-the-art methods, PSICOV and GREMLIN.
Availability and implementation
https://github.com/BioMLBoston/PatchDCA
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Amir Vajdi
- Computer Science Department, University of Massachusetts Boston, Boston, MA, USA
- Department of Informatics and Analytics, Dana-Farber Cancer Institute, Boston, MA, USA
- Nurit Haspel
- Computer Science Department, University of Massachusetts Boston, Boston, MA, USA
48
Gil N, Fajardo EJ, Fiser A. Discovery of receptor-ligand interfaces in the immunoglobulin superfamily. Proteins 2019; 88:135-142. [PMID: 31298437 DOI: 10.1002/prot.25778] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/09/2019] [Revised: 06/21/2019] [Accepted: 07/06/2019] [Indexed: 12/13/2022]
Abstract
Cell-surface-anchored immunoglobulin superfamily (IgSF) proteins are widespread throughout the human proteome, forming crucial components of diverse biological processes including immunity, cell-cell adhesion, and carcinogenesis. IgSF proteins generally function through protein-protein interactions carried out between extracellular, membrane-bound proteins on adjacent cells, known as trans-binding interfaces. These protein-protein interactions constitute a class of pharmaceutical targets important in the treatment of autoimmune diseases, chronic infections, and cancer. A molecular-level understanding of IgSF protein-protein interactions would greatly benefit further drug development. A critical step toward this goal is the reliable identification of IgSF trans-binding interfaces. We propose a novel combination of structure and sequence information to identify trans-binding interfaces in IgSF proteins. We developed a structure-based binding interface prediction approach that can identify broad regions of the protein surface that encompass the binding interfaces and suggests that IgSF proteins possess binding supersites. These interfaces could theoretically be pinpointed using sequence-based conservation analysis, with performance approaching the theoretical upper limit of binding interface prediction accuracy, but achieving this in practice is limited by the current ability to identify an appropriate multiple sequence alignment for conservation analysis. However, an important contribution of combining the two orthogonal methods is that agreement between these approaches can estimate the reliability of the predictions. This approach was benchmarked on the set of 22 IgSF proteins with experimentally solved structures in complex with their ligands. Additionally, we provide structure-based predictions and reliability scores for the 62 IgSF proteins with known structure but yet uncharacterized binding interfaces.
Affiliation(s)
- Nelson Gil
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York.,Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
- Eduardo J Fajardo
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York.,Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
- Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York.,Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
49
Wang X, Yu B, Ma A, Chen C, Liu B, Ma Q. Protein-protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique. Bioinformatics 2019; 35:2395-2402. [PMID: 30520961 PMCID: PMC6612859 DOI: 10.1093/bioinformatics/bty995] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 06/24/2018] [Revised: 11/19/2018] [Accepted: 12/03/2018] [Indexed: 11/14/2022] Open
Abstract
Motivation
The prediction of protein-protein interaction (PPI) sites is a key to mutation design, catalytic reaction and the reconstruction of PPI networks. It is a challenging task considering the significant abundant sequences and the imbalance issue in samples.
Results
A new ensemble learning-based method, Ensemble Learning of synthetic minority oversampling technique (SMOTE) for Unbalancing samples and RF algorithm (EL-SMURF), was proposed for PPI sites prediction in this study. The sequence profile feature and the residue evolution rates were combined for feature extraction of neighboring residues using a sliding window, and the SMOTE was applied to oversample interface residues in the feature space for the imbalance problem. The Multi-dimensional Scaling feature selection method was implemented to reduce feature redundancy and subset selection. Finally, the Random Forest classifiers were applied to build the ensemble learning model, and the optimal feature vectors were inserted into EL-SMURF to predict PPI sites. The performance validation of EL-SMURF on two independent validation datasets showed 77.1% and 77.7% accuracy, which were 6.2-15.7% and 6.1-18.9% higher than the other existing tools, respectively.
Availability and implementation
The source codes and data used in this study are publicly available at http://github.com/QUST-AIBBDRC/EL-SMURF/.
Supplementary information
Supplementary data are available at Bioinformatics online.
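The oversampling step named in the abstract above (SMOTE applied to interface residues in feature space) rests on interpolating between a minority sample and one of its nearest minority neighbors. The sketch below shows only that core idea in plain Python. It is not the EL-SMURF implementation: `smote_like`, `k`, `seed`, and the toy points are hypothetical, and the feature selection and random forest stages are omitted.

```python
import random
from typing import List


def smote_like(
    minority: List[List[float]], n_new: int, k: int = 2, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic minority samples by linear interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: four made-up 2-D feature vectors.
interface_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_like(interface_points, n_new=5, k=2, seed=1)
print(len(synthetic))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region those samples span; in practice an established library implementation would normally be preferred over a hand-written one.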
Affiliation(s)
- Xiaoying Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, China
- School of Mathematics, Shandong University, Jinan, China
- Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, China
- Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, China
- Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, China
- School of Life Sciences, University of Science and Technology of China, Hefei, China
- Anjun Ma
- Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, USA
- Department Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, China
- Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, China
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, China
- Qin Ma
- Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, USA
- Department Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA
50
Ni D, Lu S, Zhang J. Emerging roles of allosteric modulators in the regulation of protein-protein interactions (PPIs): A new paradigm for PPI drug discovery. Med Res Rev 2019; 39:2314-2342. [PMID: 30957264 DOI: 10.1002/med.21585] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 08/04/2018] [Revised: 03/12/2019] [Accepted: 03/24/2019] [Indexed: 12/26/2022]
Abstract
Protein-protein interactions (PPIs) are closely implicated in various types of cellular activities and are thus pivotal to health and disease states. Given their fundamental roles in a wide range of biological processes, the modulation of PPIs has enormous potential in drug discovery. However, owing to the general properties of large, flat, and featureless interfaces of PPIs, previous attempts have demonstrated that the generation of therapeutic agents targeting PPI interfaces is challenging, rendering them almost "undruggable" for decades. To date, rapid progress in chemical and structural biology techniques has promoted the exploitation of allostery as a novel approach in drug discovery. By attaching to allosteric sites that are topologically and spatially distinct from PPI interfaces, allosteric modulators can achieve improved physiochemical properties. Thus, allosteric modulators may represent an alternative strategy to target intractable PPIs and have attracted intense pharmaceutical interest. In this review, we first briefly introduce the characteristics of PPIs and then present different approaches for investigating PPIs, as well as the latest methods for modulating PPIs. Importantly, we comprehensively review the recent progress in the development of allosteric modulators to inhibit or stabilize PPIs. Finally, we conclude with future perspectives on the discovery of allosteric PPI modulators, especially the application of computational methods to aid in allosteric PPI drug discovery.
Affiliation(s)
- Duan Ni
- Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Clinical and Fundamental Research Center, Renji Hospital, Shanghai Jiao-Tong University School of Medicine, Shanghai, China
- Shaoyong Lu
- Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Clinical and Fundamental Research Center, Renji Hospital, Shanghai Jiao-Tong University School of Medicine, Shanghai, China.,Medicinal Bioinformatics Center, Shanghai Jiao-Tong University School of Medicine, Shanghai, China
- Jian Zhang
- Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Clinical and Fundamental Research Center, Renji Hospital, Shanghai Jiao-Tong University School of Medicine, Shanghai, China.,Medicinal Bioinformatics Center, Shanghai Jiao-Tong University School of Medicine, Shanghai, China.,Center for Single-Cell Omics, Shanghai Jiao-Tong University School of Medicine, Shanghai, China