1
Chatterjee A, Ravandi B, Haddadi P, Philip NH, Abdelmessih M, Mowrey WR, Ricchiuto P, Liang Y, Ding W, Mobarec JC, Eliassi-Rad T. Topology-driven negative sampling enhances generalizability in protein-protein interaction prediction. Bioinformatics 2025; 41:btaf148. [PMID: 40193392 PMCID: PMC12080959 DOI: 10.1093/bioinformatics/btaf148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 03/03/2025] [Accepted: 04/04/2025] [Indexed: 04/09/2025] Open
Abstract
MOTIVATION: Unraveling the human interactome to uncover disease-specific patterns and discover drug targets hinges on accurate protein-protein interaction (PPI) predictions. However, challenges persist in machine learning (ML) models due to a scarcity of quality hard negative samples, shortcut learning, and limited generalizability to novel proteins. RESULTS: In this study, we introduce a novel approach for strategic sampling of protein-protein noninteractions (PPNIs) by leveraging higher-order network characteristics that capture the inherent complementarity-driven mechanisms of PPIs. Next, we introduce Unsupervised Pre-training of Node Attributes tuned for PPI (UPNA-PPI), a high-throughput sequence-to-function ML pipeline, integrating unsupervised pre-training in protein representation learning with Topological PPNI (TPPNI) samples, capable of efficiently screening billions of interactions. By using our TPPNI in training the UPNA-PPI model, we improve PPI prediction generalizability and interpretability, particularly in identifying potential binding site locations on amino acid sequences, strengthening the prioritization of screening assays and facilitating the transferability of ML predictions across protein families and homodimers. UPNA-PPI establishes the foundation for a fundamental negative sampling methodology in graph machine learning by integrating insights from network topology. AVAILABILITY AND IMPLEMENTATION: Code and UPNA-PPI predictions are freely available at https://github.com/alxndgb/UPNA-PPI.
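To make the idea of topology-based negative sampling concrete, the sketch below draws candidate non-interacting pairs from a toy PPI network using shortest-path distance. This is an illustrative heuristic only and not the TPPNI procedure, which the abstract does not specify; the function names, the `min_distance` threshold, and the toy edges are all hypothetical. Whether nearby or distant pairs make better hard negatives is a modelling choice the abstract leaves open.

```python
import random
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sample_distant_negatives(edges, n_samples, min_distance=3, seed=0):
    """Sample unordered node pairs that are unreachable or at least min_distance hops apart.

    min_distance is a hypothetical knob chosen for illustration.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    candidates = []
    for i, u in enumerate(nodes):
        dist = shortest_path_lengths(adj, u)
        for v in nodes[i + 1:]:
            d = dist.get(v)  # None means v is in a different component
            if d is None or d >= min_distance:
                candidates.append((u, v))
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_samples, len(candidates)))


# Toy path graph A-B-C-D-E (hypothetical proteins).
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
negatives = sample_distant_negatives(toy_edges, n_samples=2)
```

On the toy path graph, only the pairs (A, D), (A, E), and (B, E) satisfy the distance threshold, so every sampled negative comes from that set.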
Affiliation(s)
- Ayan Chatterjee
  - BioClarity AI, Boston, MA 02130, United States
  - Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States
  - Network Science Institute, Northeastern University, Boston, MA 02115, United States
- Babak Ravandi
  - Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States
  - Network Science Institute, Northeastern University, Boston, MA 02115, United States
  - Department of Physics, Northeastern University, Boston, MA 02115, United States
- Parham Haddadi
  - Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States
- Naomi H Philip
  - Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States
- Mario Abdelmessih
  - Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States
- William R Mowrey
  - Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States
- Piero Ricchiuto
  - Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States
- Yupu Liang
  - Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States
- Wei Ding
  - Bioinformatics and Data Science, Alexion AstraZeneca Rare Disease, Boston, MA 02210, United States
- Juan Carlos Mobarec
  - Protein Structure and Biophysics, Discovery Sciences, R&D, AstraZeneca, Cambridge CB2 0AA, UK
- Tina Eliassi-Rad
  - Network Science Institute, Northeastern University, Boston, MA 02115, United States
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
  - Santa Fe Institute, Santa Fe, NM 87501, United States

2
Kiouri DP, Batsis GC, Chasapis CT. Structure-Based Deep Learning Framework for Modeling Human-Gut Bacterial Protein Interactions. Proteomes 2025; 13:10. [PMID: 39982320 PMCID: PMC11843979 DOI: 10.3390/proteomes13010010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2024] [Revised: 02/09/2025] [Accepted: 02/11/2025] [Indexed: 02/22/2025] Open
Abstract
Background: The interaction network between the human host proteins and the proteins of the gut bacteria is essential for the establishment of human health, and its dysregulation directly contributes to disease development. Despite its great importance, experimental data on protein-protein interactions (PPIs) between these species are sparse due to experimental limitations. Methods: This study presents a deep learning-based framework for predicting PPIs between human and gut bacterial proteins using structural data. The framework leverages graph-based protein representations and variational autoencoders (VAEs) to extract structural embeddings from protein graphs, which are then fused through a Bi-directional Cross-Attention module to predict interactions. The model addresses common challenges in PPI datasets, such as class imbalance, using focal loss to emphasize harder-to-classify samples. Results: The results demonstrated that this framework exhibits robust performance, with high precision and recall across validation and test datasets, underscoring its generalizability. By incorporating proteoforms in the analysis, the model accounts for the structural complexity within proteomes, making predictions biologically relevant. Conclusions: These findings offer a scalable tool for investigating the interactions between the host and the gut microbiota, potentially yielding new treatment targets and diagnostics for disorders linked to the microbiome.
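The abstract above mentions focal loss as its remedy for class imbalance. As a hedged illustration of why this loss emphasizes harder-to-classify samples, here is a minimal pure-Python sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The values gamma = 2 and alpha = 0.25 are commonly used defaults, not settings reported in the abstract, and the function name is hypothetical.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one sample; p is the predicted P(y = 1), y is 0 or 1.

    gamma and alpha are assumed defaults, not values taken from the paper.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)           # guard against log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident, correct prediction
hard = binary_focal_loss(0.30, 1)  # poorly classified positive
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check.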
Affiliation(s)
- Despoina P. Kiouri
  - Institute of Chemical Biology, National Hellenic Research Foundation, 11635 Athens, Greece
  - Laboratory of Organic Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, 15772 Athens, Greece
- Georgios C. Batsis
  - Institute of Chemical Biology, National Hellenic Research Foundation, 11635 Athens, Greece
- Christos T. Chasapis
  - Institute of Chemical Biology, National Hellenic Research Foundation, 11635 Athens, Greece

3
Ambreen S, Umar M, Noor A, Jain H, Ali R. Advanced AI and ML frameworks for transforming drug discovery and optimization: With innovative insights in polypharmacology, drug repurposing, combination therapy and nanomedicine. Eur J Med Chem 2025; 284:117164. [PMID: 39721292 DOI: 10.1016/j.ejmech.2024.117164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 12/28/2024]
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are transforming drug discovery by overcoming traditional challenges like high costs, long timelines, and frequent failures. AI-driven approaches streamline key phases, including target identification, lead optimization, de novo drug design, and drug repurposing. Frameworks such as deep neural networks (DNNs), convolutional neural networks (CNNs), and deep reinforcement learning (DRL) models have shown promise in identifying drug targets, optimizing delivery systems, and accelerating drug repurposing. Generative adversarial networks (GANs) and variational autoencoders (VAEs) aid de novo drug design by creating novel drug-like compounds with desired properties. Case studies, such as DDR1 kinase inhibitors designed using generative models and CDK20 inhibitors developed via structure-based methods, highlight AI's ability to produce highly specific therapeutics. Models like SNF-CVAE and DeepDR further advance drug repurposing by uncovering new therapeutic applications for existing drugs. Advanced ML algorithms enhance precision in predicting drug efficacy, toxicity, and ADME-Tox properties, reducing development costs and improving drug-target interactions. AI also supports polypharmacology by optimizing multi-target drug interactions and enhances combination therapy through predictions of drug synergies and antagonisms. In nanomedicine, AI models like CURATE.AI and the Hartung algorithm optimize personalized treatments by predicting toxicological risks and real-time dosing adjustments with high accuracy. Despite its potential, challenges like data quality, model interpretability, and ethical concerns must be addressed. High-quality datasets, transparent models, and unbiased algorithms are essential for reliable AI applications. As AI continues to evolve, it is poised to revolutionize drug discovery and personalized medicine, advancing therapeutic development and patient care.
Affiliation(s)
- Subiya Ambreen
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Mohammad Umar
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Aaisha Noor
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Himangini Jain
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India
- Ruhi Ali
  - Department of Pharmaceutical Chemistry, Delhi Institute of Pharmaceutical Sciences and Research (DIPSAR), DPSRU, Pushp Vihar, New Delhi, 110017, India

4
Taha K. Protein-protein interaction detection using deep learning: A survey, comparative analysis, and experimental evaluation. Comput Biol Med 2025; 185:109449. [PMID: 39644584 DOI: 10.1016/j.compbiomed.2024.109449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Revised: 11/13/2024] [Accepted: 11/14/2024] [Indexed: 12/09/2024]
Abstract
This survey paper provides a comprehensive analysis of various Deep Learning (DL) techniques and algorithms for detecting protein-protein interactions (PPIs). It examines the scalability, interpretability, accuracy, and efficiency of each technique, offering a detailed empirical and experimental evaluation. Empirically, the techniques are assessed based on four key criteria, while experimentally, they are ranked by specific algorithms and broader methodological categories. Deep Neural Networks (DNNs) demonstrated high accuracy but faced limitations such as overfitting and low interpretability. Convolutional Neural Networks (CNNs) were highly efficient at extracting hierarchical features from biological sequences, while Generative Stochastic Networks (GSNs) excelled in handling uncertainty. Long Short-Term Memory (LSTM) networks effectively captured temporal dependencies within PPI sequences, though they presented scalability challenges. This paper concludes with insights into potential improvements and future directions for advancing DL techniques in PPI identification, highlighting areas where further optimization can enhance performance and applicability.
Affiliation(s)
- Kamal Taha
  - Department of Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates

5
Kiouri DP, Batsis GC, Chasapis CT. Structure-Based Approaches for Protein-Protein Interaction Prediction Using Machine Learning and Deep Learning. Biomolecules 2025; 15:141. [PMID: 39858535 PMCID: PMC11763140 DOI: 10.3390/biom15010141] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/12/2024] [Revised: 01/11/2025] [Accepted: 01/14/2025] [Indexed: 01/27/2025] Open
Abstract
Protein-Protein Interaction (PPI) prediction plays a pivotal role in understanding cellular processes and uncovering molecular mechanisms underlying health and disease. Structure-based PPI prediction has emerged as a robust alternative to sequence-based methods, offering greater biological accuracy by integrating three-dimensional spatial and biochemical features. This work summarizes the recent advances in computational approaches leveraging protein structure information for PPI prediction, focusing on machine learning (ML) and deep learning (DL) techniques. These methods not only improve predictive accuracy but also provide insights into functional sites, such as binding and catalytic residues. However, challenges such as limited high-resolution structural data and the need for effective negative sampling persist. Through the integration of experimental and computational tools, structure-based prediction paves the way for comprehensive proteomic network analysis, holding promise for advancements in drug discovery, biomarker identification, and personalized medicine. Future directions include enhancing scalability and dataset reliability to expand these approaches across diverse proteomes.
Affiliation(s)
- Despoina P. Kiouri
  - Institute of Chemical Biology, National Hellenic Research Foundation, 11635 Athens, Greece
  - Laboratory of Organic Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, 15772 Athens, Greece
- Georgios C. Batsis
  - Institute of Chemical Biology, National Hellenic Research Foundation, 11635 Athens, Greece
- Christos T. Chasapis
  - Institute of Chemical Biology, National Hellenic Research Foundation, 11635 Athens, Greece

6
Yang J, Li Y, Wang G, Chen Z, Wu D. An End-to-End Knowledge Graph Fused Graph Neural Network for Accurate Protein-Protein Interactions Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:2518-2530. [PMID: 39446541 DOI: 10.1109/tcbb.2024.3486216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2024]
Abstract
Protein-protein interactions (PPIs) are essential to understanding cellular mechanisms, signaling networks, disease processes, and drug development, as they represent the physical contacts and functional associations between proteins. Recent advances have witnessed the achievements of artificial intelligence (AI) methods aimed at predicting PPIs. However, these approaches often handle the intricate web of relationships and mechanisms among proteins, drugs, diseases, ribonucleic acid (RNA), and protein structures in a fragmented or superficial manner. This is typically due to the limitations of non-end-to-end learning frameworks, which can lead to sub-optimal feature extraction and fusion, thereby compromising the prediction accuracy. To address these deficiencies, this paper introduces a novel end-to-end learning model, the Knowledge Graph Fused Graph Neural Network (KGF-GNN). This model comprises three integral components: (1) Protein Associated Network (PAN) Construction: We begin by constructing a PAN that extensively captures the diverse relationships and mechanisms linking proteins with drugs, diseases, RNA, and protein structures. (2) Graph Neural Network for Feature Extraction: A Graph Neural Network (GNN) is then employed to distill both topological and semantic features from the PAN, alongside another GNN designed to extract topological features directly from observed PPI networks. (3) Multi-layer Perceptron for Feature Fusion: Finally, a multi-layer perceptron integrates these varied features through end-to-end learning, ensuring that the feature extraction and fusion processes are both comprehensive and optimized for PPI prediction. Extensive experiments conducted on real-world PPI datasets validate the effectiveness of our proposed KGF-GNN approach, which not only achieves high accuracy in predicting PPIs but also significantly surpasses existing state-of-the-art models. 
This work not only enhances our ability to predict PPIs with a higher precision but also contributes to the broader application of AI in Bioinformatics, offering profound implications for biological research and therapeutic development.

7
Hu J, Li Z, Rao B, Thafar MA, Arif M. Improving protein-protein interaction prediction using protein language model and protein network features. Anal Biochem 2024; 693:115550. [PMID: 38679191 DOI: 10.1016/j.ab.2024.115550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/12/2024] [Accepted: 04/25/2024] [Indexed: 05/01/2024]
Abstract
Interactions between proteins are ubiquitous in a wide variety of biological processes. Accurately identifying the protein-protein interaction (PPI) is of significant importance for understanding the mechanisms of protein functions and facilitating drug discovery. Although the wet-lab technological methods are the best way to identify PPI, their major constraints are their time-consuming nature, high cost, and labor-intensiveness. Hence, lots of efforts have been made towards developing computational methods to improve the performance of PPI prediction. In this study, we propose a novel hybrid computational method (called KSGPPI) that aims at improving the prediction performance of PPI via extracting the discriminative information from protein sequences and interaction networks. The KSGPPI model comprises two feature extraction modules. In the first feature extraction module, a large protein language model, ESM-2, is employed to exploit the global complex patterns concealed within protein sequences. Subsequently, feature representations are further extracted through CKSAAP, and a two-dimensional convolutional neural network (CNN) is utilized to capture local information. In the second feature extraction module, the query protein acquires its similar protein from the STRING database via the sequence alignment tool NW-align and then captures the graph embedding feature for the query protein in the protein interaction network of the similar protein using the algorithm of Node2vec. Finally, the features of these two feature extraction modules are efficiently fused; the fused features are then fed into the multilayer perceptron to predict PPI. The results of five-fold cross-validation on the used benchmarked datasets demonstrate that KSGPPI achieves an average prediction accuracy of 88.96 %. Additionally, the average Matthews correlation coefficient value (0.781) of KSGPPI is significantly higher than that of those state-of-the-art PPI prediction methods. 
The standalone package of KSGPPI can be freely downloaded at https://github.com/rickleezhe/KSGPPI.
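The abstract above names CKSAAP (composition of k-spaced amino acid pairs) as one of its feature encodings. The following is a minimal sketch of the general CKSAAP encoding, not the KSGPPI code: for each gap k it counts the 400 ordered residue pairs separated by k positions and normalizes by the number of windows. The choice of `k_max`, the handling of non-standard residues, and the example sequence are assumptions; the abstract does not say how KSGPPI applies CKSAAP to the ESM-2 output.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(sequence, k_max=3):
    """Return a flat list of (k_max + 1) * 400 k-spaced pair frequencies.

    k_max is an illustrative default, not a value taken from the paper.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_windows = len(sequence) - k - 1
        for i in range(max(n_windows, 0)):
            pair = sequence[i] + sequence[i + k + 1]
            if pair in counts:          # skip pairs with non-standard residues
                counts[pair] += 1
        denom = max(n_windows, 1)       # avoid division by zero on short inputs
        features.extend(counts[p] / denom for p in PAIRS)
    return features


# Hypothetical 10-residue sequence; k = 0, 1, 2 gives a 1200-dimensional vector.
vec = cksaap("MKTAYIAKQR", k_max=2)
```

For a sequence made up only of standard residues, the 400 frequencies for each gap sum to one, which makes the encoding length-independent.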
Affiliation(s)
- Jun Hu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Zhe Li
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Bing Rao
  - Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education, Beijing, 100039, China
- Maha A Thafar
  - Computer Science Department, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Muhammad Arif
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar

8
Thareja P, Chhillar RS, Dalal S, Simaiya S, Lilhore UK, Alroobaea R, Alsafyani M, Baqasah AM, Algarni S. Intelligence model on sequence-based prediction of PPI using AISSO deep concept with hyperparameter tuning process. Sci Rep 2024; 14:21797. [PMID: 39294330 PMCID: PMC11410825 DOI: 10.1038/s41598-024-72558-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 09/09/2024] [Indexed: 09/20/2024] Open
Abstract
Protein-protein interaction (PPI) prediction is vital for interpreting biological activities. Even though many diverse sorts of data and machine learning approaches have been employed in PPI prediction, performance still has to be enhanced. As a result, we adopted an Aquila Influenced Shark Smell (AISSO)-based hybrid prediction technique to construct a sequence-dependent PPI prediction model. This model has two stages of operation: feature extraction and prediction. Along with sequence-based and Gene Ontology features, unique features were produced in the feature extraction stage utilizing the improved semantic similarity technique, which may deliver reliable findings. These collected characteristics were then sent to the prediction step, and hybrid neural networks, such as the Improved Recurrent Neural Network and Deep Belief Networks, were used to predict the PPI using modified score level fusion. These neural networks' weight variables were adjusted utilizing a unique optimal methodology called Aquila Influenced Shark Smell (AISSO), and the outcomes showed that the developed model had attained an accuracy of around 88%, which is much better than the traditional methods; the AISSO-based PPI prediction model can thus provide precise and effective predictions.
Affiliation(s)
- Preeti Thareja
  - DCSA, Maharshi Dayanand University, Rohtak, Haryana, India
- Sandeep Dalal
  - DCSA, Maharshi Dayanand University, Rohtak, Haryana, India
- Sarita Simaiya
  - Arba Minch University, Arba Minch, Ethiopia
  - Department of Computer Science and Engineering, Galgotias University, Greater Noida, UP, India
- Umesh Kumar Lilhore
  - Department of Computer Science and Engineering, Galgotias University, Greater Noida, UP, India
- Roobaea Alroobaea
  - Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, 21944, Taif, Saudi Arabia
- Majed Alsafyani
  - Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, 21944, Taif, Saudi Arabia
- Abdullah M Baqasah
  - Department of Information Technology, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944, Saudi Arabia
- Sultan Algarni
  - Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, 21589, Jeddah, Saudi Arabia

9
Zhao S, Cui Z, Zhang G, Gong Y, Su L. MGPPI: multiscale graph neural networks for explainable protein-protein interaction prediction. Front Genet 2024; 15:1440448. [PMID: 39076171 PMCID: PMC11284081 DOI: 10.3389/fgene.2024.1440448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Accepted: 06/24/2024] [Indexed: 07/31/2024] Open
Abstract
Protein-Protein Interactions (PPIs) are involved in various biological processes, which are of significant importance in cancer diagnosis and drug development. Computation-based PPI prediction methods are often preferred due to their low cost and high accuracy. However, existing protein-structure-based methods are insufficient in the extraction of protein structural information. Furthermore, most methods are less interpretable, which hinders their practical application in the biomedical field. In this paper, we propose MGPPI, which is a Multiscale graph convolutional neural network model for PPI prediction. By incorporating a multiscale module into the Graph Neural Network (GNN) and constructing multiple convolutional layers, MGPPI can effectively capture both local and global protein structure information. For model interpretability, we introduce a novel visual explanation method named Gradient Weighted interaction Activation Mapping (Grad-WAM), which can highlight key binding residue sites. We evaluate the performance of MGPPI by comparing it with state-of-the-art methods on various datasets. The results show that MGPPI outperforms other methods significantly and exhibits strong generalization capabilities on the multi-species dataset. As a practical case study, we predicted the binding affinity between the spike (S) protein of SARS-CoV-2 and the human ACE2 receptor protein, and successfully identified key binding sites with known binding functions. Mutations at key binding sites in PPIs can affect cancer patient survival status. Therefore, we further verified that the residue sites highlighted by Grad-WAM separate patient survival groups in several different cancer type datasets. According to our results, some of the highlighted residues can be used as biomarkers in predicting patient survival probability. All these results together demonstrate the high accuracy and practical application value of MGPPI. 
Our method not only addresses the limitations of existing approaches but can also assist researchers in identifying crucial drug targets and help guide personalized cancer treatment.
Affiliation(s)
- Lingtao Su
  - College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China

10
Ovek D, Keskin O, Gursoy A. ProInterVal: Validation of Protein-Protein Interfaces through Learned Interface Representations. J Chem Inf Model 2024; 64:2979-2987. [PMID: 38526504 PMCID: PMC11040718 DOI: 10.1021/acs.jcim.3c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/26/2024]
Abstract
Proteins are vital components of the biological world and serve a multitude of functions. They interact with other molecules through their interfaces and participate in crucial cellular processes. Disruption of these interactions can have negative effects on organisms, highlighting the importance of studying protein-protein interfaces for developing targeted therapies for diseases. Therefore, the development of a reliable method for investigating protein-protein interactions is of paramount importance. In this work, we present an approach for validating protein-protein interfaces using learned interface representations. The approach involves using a graph-based contrastive autoencoder architecture and a transformer to learn representations of protein-protein interaction interfaces from unlabeled data and then validating them through learned representations with a graph neural network. Our method achieves an accuracy of 0.91 for the test set, outperforming existing GNN-based methods. We demonstrate the effectiveness of our approach on a benchmark data set and show that it provides a promising solution for validating protein-protein interfaces.
Affiliation(s)
- Damla Ovek
  - KUIS AI Center, Koç University, Istanbul 34450, Turkey
  - Computer Engineering, Koç University, Istanbul 34450, Turkey
- Ozlem Keskin
  - Chemical and Biological Engineering, Koç University, Istanbul 34450, Turkey
- Attila Gursoy
  - Computer Engineering, Koç University, Istanbul 34450, Turkey

11
Su Z, Griffin B, Emmons S, Wu Y. Prediction of interactions between cell surface proteins by machine learning. Proteins 2024; 92:567-580. [PMID: 38050713 DOI: 10.1002/prot.26648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 11/15/2023] [Accepted: 11/20/2023] [Indexed: 12/06/2023]
Abstract
Cells detect changes in their external environments or communicate with each other through proteins on their surfaces. These cell surface proteins form a complicated network of interactions in order to fulfill their functions. The interactions between cell surface proteins are highly dynamic and, thus, challenging to detect using traditional experimental techniques. Here, we tackle this challenge using a computational framework. The primary focus of the framework is to develop new tools to identify interactions between domains in the immunoglobulin (Ig) fold, which is the most abundant domain family in cell surface proteins. These interactions could be formed between ligands and receptors from different cells or between proteins on the same cell surface. In practice, we collected all structural data on Ig domain interactions and transformed them into an interface fragment pair library. A high-dimensional profile can then be constructed from the library for a given pair of query protein sequences. Multiple machine learning models were used to read this profile so that the probability of interaction between the query proteins could be predicted. We tested our models on an experimentally derived dataset that contains 564 cell surface proteins in humans. The cross-validation results show that we can achieve higher than 70% accuracy in identifying the PPIs within this dataset. We then applied this method to a group of 46 cell surface proteins in Caenorhabditis elegans. We screened every possible interaction between these proteins. Many interactions recognized by our machine learning classifiers have been experimentally confirmed in the literature. In conclusion, our computational platform serves as a useful tool to help identify potential new interactions between cell surface proteins in addition to current state-of-the-art experimental techniques. The tool is freely accessible for use by the scientific community. 
Moreover, the general framework of the machine learning classification can also be extended to study the interactions of proteins in other domain superfamilies.
Affiliation(s)
- Zhaoqian Su
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Brian Griffin
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- Scott Emmons
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- Yinghao Wu
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA

12
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
|
13
|
Martin J. AlphaFold2 Predicts Whether Proteins Interact Amidst Confounding Structural Compatibility. J Chem Inf Model 2024; 64:1473-1480. [PMID: 38373070 DOI: 10.1021/acs.jcim.3c01805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
Predicting whether two proteins physically interact is one of the holy grails of computational biology, galvanized by rapid advancements in deep learning. AlphaFold2, although not developed with this goal, is promising in this respect. Here, I test the prediction capability of AlphaFold2 on a very challenging data set, where proteins are structurally compatible, even when they do not interact. AlphaFold2 achieves high discrimination between interacting and non-interacting proteins, and the cases of misclassifications can either be rescued by revisiting the input sequences or can suggest false positives and negatives in the data set. AlphaFold2 is thus not impaired by the compatibility between protein structures and has the potential to be applied on a large scale.
Affiliation(s)
- Juliette Martin
- Univ Lyon, CNRS, UMR 5086 MMSB, 7 passage du Vercors F-69367, Lyon, France
- Laboratory of Biology and Modeling of the Cell, Ecole Normale Supérieure de Lyon, CNRS UMR 5239, Inserm U1293, University Claude Bernard Lyon 1, 69364, Lyon, France
|
14
|
Dang TH, Vu TA. xCAPT5: protein-protein interaction prediction using deep and wide multi-kernel pooling convolutional neural networks with protein language model. BMC Bioinformatics 2024; 25:106. [PMID: 38461247 PMCID: PMC10924985 DOI: 10.1186/s12859-024-05725-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 02/28/2024] [Indexed: 03/11/2024] Open
Abstract
BACKGROUND Predicting protein-protein interactions (PPIs) from sequence data is a key challenge in computational biology. While various computational methods have been proposed, the utilization of sequence embeddings from protein language models, which contain diverse information, including structural, evolutionary, and functional aspects, has not been fully exploited. Additionally, there is a significant need for a comprehensive neural network capable of efficiently extracting these multifaceted representations. RESULTS Addressing this gap, we propose xCAPT5, a novel hybrid classifier that uniquely leverages the T5-XL-UniRef50 protein large language model for generating rich amino acid embeddings from protein sequences. The core of xCAPT5 is a multi-kernel deep convolutional siamese neural network, which effectively captures intricate interaction features at both micro and macro levels, integrated with the XGBoost algorithm, enhancing PPIs classification performance. By concatenating max and average pooling features in a depth-wise manner, xCAPT5 effectively learns crucial features with low computational cost. CONCLUSION This study represents one of the initial efforts to extract informative amino acid embeddings from a large protein language model using a deep and wide convolutional network. Experimental results show that xCAPT5 outperforms recent state-of-the-art methods in binary PPI prediction, excelling in cross-validation on several benchmark datasets and demonstrating robust generalization across intra-species, cross-species, inter-species, and stringent similarity contexts.
Affiliation(s)
- Thanh Hai Dang
- Faculty of Information Technology, VNU University of Engineering and Technology, 144 Xuan Thuy, Hanoi, 10000, Vietnam
- Tien Anh Vu
- Faculty of Biology, VNU University of Science, 334 Nguyen Trai, Hanoi, 10000, Vietnam
|
15
|
Michalik I, Kuder KJ. Machine Learning Methods in Protein-Protein Docking. Methods Mol Biol 2024; 2780:107-126. [PMID: 38987466 DOI: 10.1007/978-1-0716-3985-6_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
An exponential increase in the number of publications that address artificial intelligence (AI) usage in life sciences has been noticed in recent years, while new modeling techniques are constantly being reported. The potential of these methods is vast, from understanding fundamental cellular processes to discovering new drugs and breakthrough therapies. Computational studies of protein-protein interactions, crucial for understanding the operation of biological systems, are no exception in this field. However, despite the rapid development of technology and the progress in developing new approaches, many aspects remain challenging to solve, such as predicting conformational changes in proteins, or more "trivial" issues such as high-quality data in huge quantities. Therefore, this chapter focuses on a short introduction to various AI approaches to study protein-protein interactions, followed by a description of the most up-to-date algorithms and programs used for this purpose. Yet, given the considerable pace of development in this hot area of computational science, at the time you read this chapter, the development of the algorithms described, or the emergence of new (and better) ones, should come as no surprise.
Affiliation(s)
- Ilona Michalik
- Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland
- Kamil J Kuder
- Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland
|
16
|
Liu T, Gao H, Ren X, Xu G, Liu B, Wu N, Luo H, Wang Y, Tu T, Yao B, Guan F, Teng Y, Huang H, Tian J. Protein-protein interaction and site prediction using transfer learning. Brief Bioinform 2023; 24:bbad376. [PMID: 37870286 DOI: 10.1093/bib/bbad376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/14/2023] [Accepted: 10/02/2023] [Indexed: 10/24/2023] Open
Abstract
The advanced language models have enabled us to recognize protein-protein interactions (PPIs) and interaction sites using protein sequences or structures. Here, we trained the MindSpore ProteinBERT (MP-BERT) model, a Bidirectional Encoder Representation from Transformers, using protein pairs as inputs, making it suitable for identifying PPIs and their respective interaction sites. The pretrained model (MP-BERT) was fine-tuned as MPB-PPI (MP-BERT on PPI) and demonstrated its superiority over the state-of-the-art models on diverse benchmark datasets for predicting PPIs. Moreover, the model's capability to recognize PPIs among various organisms was evaluated on multiple organisms. An amalgamated organism model was designed, exhibiting a high level of generalization across the majority of organisms and attaining an accuracy of 92.65%. The model was also customized to predict interaction site propensity by fine-tuning it with PPI site data as MPB-PPISP. Our method facilitates the prediction of both PPIs and their interaction sites, thereby illustrating the potency of transfer learning in dealing with the protein pair task.
Affiliation(s)
- Tuoyu Liu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Han Gao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Xiaopu Ren
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Guoshun Xu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bo Liu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Ningfeng Wu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Huiying Luo
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Yuan Wang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Tao Tu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Bin Yao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Feifei Guan
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yue Teng
- State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences, Beijing 100071, China
- Huoqing Huang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Jian Tian
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
|
17
|
Xie S, Xie X, Zhao X, Liu F, Wang Y, Ping J, Ji Z. HNSPPI: a hybrid computational model combing network and sequence information for predicting protein-protein interaction. Brief Bioinform 2023; 24:bbad261. [PMID: 37480553 DOI: 10.1093/bib/bbad261] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/05/2023] [Revised: 06/24/2023] [Accepted: 06/26/2023] [Indexed: 07/24/2023] Open
Abstract
Most life activities in organisms are regulated through protein complexes, which are mainly controlled via Protein-Protein Interactions (PPIs). Discovering new interactions between proteins and revealing their biological functions are of great significance for understanding the molecular mechanisms of biological processes and identifying the potential targets in drug discovery. Current experimental methods only capture stable protein interactions, which leads to limited coverage. In addition, high cost and long run times are obvious shortcomings. In recent years, various computational methods have been successfully developed for predicting PPIs based only on protein homology, primary sequences of proteins, or gene ontology information. Computational efficiency and data complexity are still the main bottlenecks for algorithm generalization. In this study, we proposed a novel computational framework, HNSPPI, to predict PPIs. As a hybrid supervised learning model, HNSPPI comprehensively characterizes the intrinsic relationship between two proteins by integrating amino acid sequence information and connection properties of the PPI network. The experimental results show that HNSPPI works very well on six benchmark datasets. Moreover, the comparison analysis showed that our model significantly outperforms five other existing algorithms. Finally, we used the HNSPPI model to explore the SARS-CoV-2-Human interaction system and found several potential regulations. In summary, HNSPPI is a promising model for predicting new protein interactions from known PPI data.
Affiliation(s)
- Shijie Xie
- College of Artificial Intelligence, Nanjing Agricultural University, No. 1 Weigang Rd, Nanjing, Jiangsu 210095, China
- Xiaojun Xie
- College of Artificial Intelligence, Nanjing Agricultural University, No. 1 Weigang Rd, Nanjing, Jiangsu 210095, China
- Xin Zhao
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Fei Liu
- Joint International Research Laboratory of Animal Health and Food Safety of Ministry of Education & Single Molecule Nanometry Laboratory (Sinmolab), Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Yiming Wang
- Key Laboratory of Biological Interactions and Crop Health, Department of Plant Pathology, Nanjing Agricultural University, 210095, Nanjing, China
- Jihui Ping
- MOE International Joint Collaborative Research Laboratory for Animal Health and Food Safety & Jiangsu Engineering Laboratory of Animal Immunology, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhiwei Ji
- College of Artificial Intelligence, Nanjing Agricultural University, No. 1 Weigang Rd, Nanjing, Jiangsu 210095, China
|
18
|
Lee M. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 2023; 28:5169. [PMID: 37446831 DOI: 10.3390/molecules28135169] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 05/30/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 07/15/2023] Open
Abstract
Deep learning, a potent branch of artificial intelligence, is steadily leaving its transformative imprint across multiple disciplines. Within computational biology, it is expediting progress in the understanding of Protein-Protein Interactions (PPIs), key components governing a wide array of biological functionalities. Hence, an in-depth exploration of PPIs is crucial for decoding the intricate biological system dynamics and unveiling potential avenues for therapeutic interventions. As the deployment of deep learning techniques in PPI analysis proliferates at an accelerated pace, there exists an immediate demand for an exhaustive review that encapsulates and critically assesses these novel developments. Addressing this requirement, this review offers a detailed analysis of the literature from 2021 to 2023, highlighting the cutting-edge deep learning methodologies harnessed for PPI analysis. Thus, this review stands as a crucial reference for researchers in the discipline, presenting an overview of the recent studies in the field. This consolidation helps elucidate the dynamic paradigm of PPI analysis, the evolution of deep learning techniques, and their interdependent dynamics. This scrutiny is expected to serve as a vital aid for researchers, both well-established and newcomers, assisting them in maneuvering the rapidly shifting terrain of deep learning applications in PPI analysis.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
|
19
|
Tan M, Pan Q, Wu Q, Li J, Wang J. Aldolase B attenuates clear cell renal cell carcinoma progression by inhibiting CtBP2. Front Med 2023; 17:503-517. [PMID: 36790589 DOI: 10.1007/s11684-022-0947-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Accepted: 06/28/2022] [Indexed: 02/16/2023]
Abstract
Aldolase B (ALDOB), a glycolytic enzyme, is uniformly depleted in clear cell renal cell carcinoma (ccRCC) tissues. We previously showed that ALDOB inhibited proliferation through a mechanism independent of its enzymatic activity in ccRCC, but the mechanism was not unequivocally identified. We showed that the corepressor C-terminal-binding protein 2 (CtBP2) is a novel ALDOB-interacting protein in ccRCC. The CtBP2-to-ALDOB expression ratio in clinical samples was correlated with the expression of CtBP2 target genes and was associated with shorter survival. ALDOB inhibited CtBP2-mediated repression of multiple cell cycle inhibitor, proapoptotic, and epithelial marker genes. Furthermore, ALDOB overexpression decreased the proliferation and migration of ccRCC cells in an ALDOB-CtBP2 interaction-dependent manner. Mechanistically, our findings showed that ALDOB recruited acireductone dioxygenase 1, which catalyzes the synthesis of an endogenous inhibitor of CtBP2, 4-methylthio 2-oxobutyric acid. ALDOB functions as a scaffold to bring acireductone dioxygenase and CtBP2 in close proximity to potentiate acireductone dioxygenase-mediated inhibition of CtBP2, and this scaffolding effect was independent of ALDOB enzymatic activity. Moreover, increased ALDOB expression inhibited tumor growth in a xenograft model and decreased lung metastasis in vivo. Our findings reveal that ALDOB is a negative regulator of CtBP2 and inhibits tumor growth and metastasis in ccRCC.
Affiliation(s)
- Mingyue Tan
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200080, China
- Urology Center, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Qi Pan
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200080, China
- Qi Wu
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200080, China
- Department of Urology, The Sixth Affiliated Hospital of Wenzhou Medical University (The People's Hospital of Lishui), Lishui, 323000, China
- Jianfa Li
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200080, China
- Jun Wang
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200080, China
- Urology Center, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
- Department of Urology, The Sixth Affiliated Hospital of Wenzhou Medical University (The People's Hospital of Lishui), Lishui, 323000, China
|
20
|
Soleymani F, Paquet E, Viktor HL, Michalowski W, Spinello D. ProtInteract: A deep learning framework for predicting protein-protein interactions. Comput Struct Biotechnol J 2023; 21:1324-1348. [PMID: 36817951 PMCID: PMC9929211 DOI: 10.1016/j.csbj.2023.01.028] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/05/2022] [Revised: 01/20/2023] [Accepted: 01/20/2023] [Indexed: 01/26/2023] Open
Abstract
Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions underpin various biological activities such as metabolic cycles, signal transduction, and immune response. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. We therefore developed the ProtInteract framework to predict protein-protein interaction. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure to a lower-dimensional vector while preserving its underlying sequence attributes. This leads to faster training of the second network, a deep convolutional neural network (CNN) that receives encoded proteins and predicts their interaction under three different scenarios. In each scenario, the deep CNN predicts the class of a given encoded protein pair. Each class indicates different ranges of confidence scores corresponding to the probability of whether a predicted interaction occurs or not. The proposed framework features significantly low computational complexity and relatively fast response. The contributions of this work are twofold. First, ProtInteract assimilates the protein's primary structure into a pseudo-time series. Therefore, we leverage the nature of the time series of proteins and their physicochemical properties to encode a protein's amino acid sequence into a lower-dimensional vector space. This approach enables extracting highly informative sequence attributes while reducing computational complexity. Second, the ProtInteract framework utilises this information to identify protein interactions with other proteins based on its amino acid configuration. Our results suggest that the proposed framework performs with high accuracy and efficiency in predicting protein-protein interactions.
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada (corresponding author)
- Herna Lydia Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON K1N 6N5, Canada
- Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
|
21
|
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023] Open
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged however by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
|
22
|
Chen H, Cai Y, Ji C, Selvaraj G, Wei D, Wu H. AdaPPI: identification of novel protein functional modules via adaptive graph convolution networks in a protein-protein interaction network. Brief Bioinform 2023; 24:bbac523. [PMID: 36526282 DOI: 10.1093/bib/bbac523] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/16/2022] [Revised: 10/10/2022] [Accepted: 11/02/2022] [Indexed: 12/23/2022] Open
Abstract
Identifying unknown protein functional modules, such as protein complexes and biological pathways, from protein-protein interaction (PPI) networks, provides biologists with an opportunity to efficiently understand cellular function and organization. Finding complex nonlinear relationships in underlying functional modules may involve a long-chain of PPI and pose great challenges in a PPI network with an unevenly sparse and dense node distribution. To overcome these challenges, we propose AdaPPI, an adaptive convolution graph network in PPI networks to predict protein functional modules. We first suggest an attributed graph node presentation algorithm. It can effectively integrate protein gene ontology attributes and network topology, and adaptively aggregates low- or high-order graph structural information according to the node distribution by considering graph node smoothness. Based on the obtained node representations, core cliques and expansion algorithms are applied to find functional modules in PPI networks. Comprehensive performance evaluations and case studies indicate that the framework significantly outperforms state-of-the-art methods. We also presented potential functional modules based on their confidence.
|
23
|
Murakami Y, Mizuguchi K. Recent developments of sequence-based prediction of protein-protein interactions. Biophys Rev 2022; 14:1393-1411. [PMID: 36589735 PMCID: PMC9789376 DOI: 10.1007/s12551-022-01038-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 12/08/2022] [Indexed: 12/25/2022] Open
Abstract
The identification of protein-protein interactions (PPIs) can lead to a better understanding of cellular functions and biological processes of proteins and contribute to the design of drugs to target disease-causing PPIs. In addition, targeting host-pathogen PPIs is useful for elucidating infection mechanisms. Although several experimental methods have been used to identify PPIs, these methods have yet to draw complete PPI networks. Hence, computational techniques are increasingly required for the prediction of potential PPIs that have never been observed experimentally. Recent high-performance sequence-based methods have contributed to the construction of PPI networks and the elucidation of pathogenetic mechanisms in specific diseases. However, the usefulness of these methods depends on the quality and quantity of training data of PPIs. In this brief review, we introduce currently available PPI databases and recent sequence-based methods for predicting PPIs. Also, we discuss key issues in this field and present future perspectives of the sequence-based PPI predictions.
Affiliation(s)
- Yoichi Murakami
- Tokyo University of Information Sciences, 4-1 Onaridai, Wakaba-Ku, Chiba, 265-8501 Japan
- Kenji Mizuguchi
- Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita-Shi, Osaka, 565-0871 Japan
- National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito Asagi, Ibaraki, Osaka 567-0085 Japan
|
24
|
Neumann D, Roy S, Minhas FUAA, Ben-Hur A. On the choice of negative examples for prediction of host-pathogen protein interactions. Front Bioinform 2022; 2:1083292. [PMID: 36591335 PMCID: PMC9798088 DOI: 10.3389/fbinf.2022.1083292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 11/14/2022] [Indexed: 12/23/2022] Open
Abstract
As practitioners of machine learning in the area of bioinformatics we know that the quality of the results crucially depends on the quality of our labeled data. While there is a tendency to focus on the quality of positive examples, the negative examples are equally as important. In this opinion paper we revisit the problem of choosing negative examples for the task of predicting protein-protein interactions, either among proteins of a given species or for host-pathogen interactions and describe important issues that are prevalent in the current literature. The challenge in creating datasets for this task is the noisy nature of the experimentally derived interactions and the lack of information on non-interacting proteins. A standard approach is to choose random pairs of non-interacting proteins as negative examples. Since the interactomes of all species are only partially known, this leads to a very small percentage of false negatives. This is especially true for host-pathogen interactions. To address this perceived issue, some researchers have chosen to select negative examples as pairs of proteins whose sequence similarity to the positive examples is sufficiently low. This clearly reduces the chance for false negatives, but also makes the problem much easier than it really is, leading to over-optimistic accuracy estimates. We demonstrate the effect of this form of bias using a selection of recent protein interaction prediction methods of varying complexity, and urge researchers to pay attention to the details of generating their datasets for potential biases like this.
Affiliation(s)
- Don Neumann
- Department of Computer Science, Colorado State University, Fort Collins, CO, United States (corresponding author)
- Soumyadip Roy
- Department of Computer Science, Colorado State University, Fort Collins, CO, United States
- Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO, United States (corresponding author)
|
25
|
Soleymani F, Paquet E, Viktor H, Michalowski W, Spinello D. Protein-protein interaction prediction with deep learning: A comprehensive review. Comput Struct Biotechnol J 2022; 20:5316-5341. [PMID: 36212542 PMCID: PMC9520216 DOI: 10.1016/j.csbj.2022.08.070] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 06/22/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/15/2022] Open
Abstract
Most proteins perform their biological function by interacting with themselves or other molecules. Thus, one may obtain biological insights into protein functions, disease prevalence, and therapy development by identifying protein-protein interactions (PPI). However, finding the interacting and non-interacting protein pairs through experimental approaches is labour-intensive and time-consuming, owing to the variety of proteins. Hence, protein-protein interaction and protein-ligand binding problems have drawn attention in the fields of bioinformatics and computer-aided drug discovery. Deep learning methods paved the way for scientists to predict the 3-D structure of proteins from genomes, predict the functions and attributes of a protein, and modify and design new proteins to provide desired functions. This review focuses on recent deep learning methods applied to problems including predicting protein functions, protein-protein interaction and their sites, protein-ligand binding, and protein design.
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
- Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada
- Herna Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, Canada
- Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
|
26
|
Robin V, Bodein A, Scott-Boyer MP, Leclercq M, Périn O, Droit A. Overview of methods for characterization and visualization of a protein-protein interaction network in a multi-omics integration context. Front Mol Biosci 2022; 9:962799. [PMID: 36158572 PMCID: PMC9494275 DOI: 10.3389/fmolb.2022.962799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/16/2022] [Indexed: 11/26/2022] Open
Abstract
At the heart of the cellular machinery through the regulation of cellular functions, protein-protein interactions (PPIs) have a significant role. PPIs can be analyzed with network approaches. Construction of a PPI network requires prediction of the interactions. All PPIs form a network. Different biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies allow solving these different challenges. These approaches have shown encouraging results for the understanding of molecular mechanisms, drug action mechanisms, and identification of target genes. In order to give more importance to an interaction, it is evaluated by different confidence scores. These scores allow the filtration of the network and thus facilitate the representation of the network, essential steps to the identification and understanding of molecular mechanisms. In this review, we will discuss the main computational methods for predicting PPI, including ones confirming an interaction as well as the integration of PPIs into a network, and we will discuss visualization of these complex data.
Affiliation(s)
- Vivian Robin
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Mickaël Leclercq
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
  - Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
  - Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
|
27
|
A Survey on Deep Networks Approaches in Prediction of Sequence-Based Protein–Protein Interactions. SN COMPUTER SCIENCE 2022; 3:298. [PMID: 35611239 PMCID: PMC9119573 DOI: 10.1007/s42979-022-01197-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/09/2021] [Accepted: 05/06/2022] [Indexed: 12/03/2022]
Abstract
The prominence of protein–protein interactions (PPIs) in system biology with diverse biological procedures has become the topic to discuss because it acts as a fundamental part in predicting the protein function of the target protein and drug ability of molecules. Numerous researches have been published to predict PPIs computationally because they provide an alternative solution to laboratory trials and a cost-effective way of predicting the most likely set of interactions at the entire proteome scale. In recent computational methods, deep learning has become a buzzword with numerous scientific researches. This paper presents, for the first time, a comprehensive survey of sequence-based PPI prediction by three popular deep learning architectures i.e. deep neural networks, convolutional neural networks and recurrent neural networks and its variants. The thorough survey discussed herein carefully mined every possible information, can help the researchers to further explore the success in this area.
|
28
|
Wang Y, Wang LL, Wong L, Li Y, Wang L, You ZH. SIPGCN: A Novel Deep Learning Model for Predicting Self-Interacting Proteins from Sequence Information Using Graph Convolutional Networks. Biomedicines 2022; 10:1543. [PMID: 35884848 PMCID: PMC9313220 DOI: 10.3390/biomedicines10071543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022] Open
Abstract
Protein is the basic organic substance that constitutes the cell and is the material condition for the life activity and the guarantee of the biological function activity. Elucidating the interactions and functions of proteins is a central task in exploring the mysteries of life. As an important protein interaction, self-interacting protein (SIP) has a critical role. The fast growth of high-throughput experimental techniques among biomolecules has led to a massive influx of available SIP data. How to conduct scientific research using the massive amount of SIP data has become a new challenge that is being faced in related research fields such as biology and medicine. In this work, we design an SIP prediction method SIPGCN using a deep learning graph convolutional network (GCN) based on protein sequences. First, protein sequences are characterized using a position-specific scoring matrix, which is able to describe the biological evolutionary message, then their hidden features are extracted by the deep learning method GCN, and, finally, the random forest is utilized to predict whether there are interrelationships between proteins. In the cross-validation experiment, SIPGCN achieved 93.65% accuracy and 99.64% specificity in the human data set. SIPGCN achieved 90.69% and 99.08% of these two indicators in the yeast data set, respectively. Compared with other feature models and previous methods, SIPGCN showed excellent results. These outcomes suggest that SIPGCN may be a suitable instrument for predicting SIP and may be a reliable candidate for future wet experiments.
Affiliation(s)
- Ying Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Lin-Lin Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
  - Correspondence: (L.-L.W.); (L.W.)
- Leon Wong
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Yang Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - Correspondence: (L.-L.W.); (L.W.)
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
|