1
Han Y, Zhang SW, Shi MH, Zhang QQ, Li Y, Cui X. Predicting protein-protein interaction with interpretable bilinear attention network. Comput Methods Programs Biomed 2025; 265:108756. [PMID: 40174317 DOI: 10.1016/j.cmpb.2025.108756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2024] [Revised: 03/27/2025] [Accepted: 03/27/2025] [Indexed: 04/04/2025]
Abstract
BACKGROUND AND OBJECTIVE Protein-protein interactions (PPIs) play key roles in myriad biological processes, and studying them helps to understand protein function and disease pathology. Identifying PPIs and their interaction types through wet-lab experimental methods is costly and time-consuming. Therefore, computational methods (e.g., sequence-based deep learning methods) have been proposed to predict PPIs. However, these methods predominantly focus on protein sequence information and neglect protein structure information, even though a protein's structure is closely related to its function. In addition, current PPI prediction methods that do use protein structure information rely on independent encoders to learn the sequence and structure representations separately, without explicitly learning the important local interaction representation of the two proteins, which makes the prediction results hard to interpret.
METHODS Considering that current protein structure prediction methods (e.g., AlphaFold2) can accurately predict protein 3D structures and have made a large number of them available, we present a novel end-to-end framework (called PPI-BAN) that predicts PPIs and their interaction types by integrating protein sequence information and 3D structure information. PPI-BAN uses a one-dimensional convolution operation (Conv1D) to extract protein sequence features, employs the GeomEtry-Aware Relational Graph Neural Network (GearNet) to learn protein 3D structure features, and adopts a deep bilinear attention network (BAN) to learn the joint features between a protein sequence and its 3D structure. The sequence features, structure features and joint features are concatenated and fed into a fully connected network that predicts PPIs and their interaction types.
RESULTS Experimental results show that PPI-BAN achieves the best overall performance against other state-of-the-art methods.
CONCLUSIONS PPI-BAN can effectively predict PPIs and their interaction types, and identify the significant interaction sites by computing attention weight maps and mapping them to specific amino acid residues.
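As a rough illustration of the bilinear attention step described in METHODS, the sketch below computes an attention map between per-residue sequence features and per-node structure features and pools them into one joint vector. It is a simplified single-head form and not the PPI-BAN implementation: the function names, the projection matrices `U` and `V`, the vector `q`, and the toy dimensions are all assumptions made for illustration.

```python
import math
import random


def project(W, x):
    """Return W^T x for a d x k matrix W (list of rows) and a length-d vector x."""
    return [sum(W[i][c] * x[i] for i in range(len(x))) for c in range(len(W[0]))]


def bilinear_attention(H_seq, H_str, U, V, q):
    """Simplified single-head bilinear attention (hypothetical names).

    H_seq: n x d sequence features; H_str: m x d structure features.
    Returns the n x m attention map (entries sum to 1) and a length-k joint vector.
    """
    k = len(q)
    # Project both modalities into a shared k-dimensional space, then apply ReLU.
    P = [[max(0.0, a) for a in project(U, h)] for h in H_seq]  # n x k
    Q = [[max(0.0, a) for a in project(V, g)] for g in H_str]  # m x k
    # Bilinear score for every (residue, structure node) pair.
    logits = [[sum(q[c] * p[c] * r[c] for c in range(k)) for r in Q] for p in P]
    # Softmax over the whole map, shifted by the maximum for numerical stability.
    top = max(max(row) for row in logits)
    exps = [[math.exp(v - top) for v in row] for row in logits]
    Z = sum(sum(row) for row in exps)
    A = [[e / Z for e in row] for row in exps]
    # Attention-weighted bilinear pooling into one joint feature vector.
    joint = [
        sum(A[i][j] * P[i][c] * Q[j][c] for i in range(len(P)) for j in range(len(Q)))
        for c in range(k)
    ]
    return A, joint


# Toy inputs: random features stand in for Conv1D and GearNet outputs.
rng = random.Random(0)
n, m, d, k = 5, 4, 3, 2
H_seq = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
H_str = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(m)]
U = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
q = [rng.uniform(-1, 1) for _ in range(k)]

A, joint = bilinear_attention(H_seq, H_str, U, V, q)
best = max(((i, j) for i in range(n) for j in range(m)), key=lambda ij: A[ij[0]][ij[1]])
print("most attended (residue, structure node) pair:", best)
```

Reading off the largest entries of `A` is the kind of step that would let attention weights be mapped back to specific residues, as the CONCLUSIONS describe.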
Affiliation(s)
- Yong Han
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China; Henan Judicial Police Vocational College, Zhengzhou, 450046, China
- Shao-Wu Zhang
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Ming-Hui Shi
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Qing-Qing Zhang
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Yi Li
  - Henan Judicial Police Vocational College, Zhengzhou, 450046, China
- Xiaodong Cui
  - School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China
2
Nogueira-Rodríguez A, Glez-Peña D, Vieira CP, Vieira J, López-Fernández H. Towards a more accurate and reliable evaluation of machine learning protein-protein interaction prediction model performance in the presence of unavoidable dataset biases. J Integr Bioinform 2025:jib-2024-0054. [PMID: 40165676 DOI: 10.1515/jib-2024-0054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2024] [Accepted: 02/26/2025] [Indexed: 04/02/2025] Open
Abstract
The characterization of protein-protein interactions (PPIs) is fundamental to understanding cellular functions. Although machine learning methods for this task have historically reported prediction accuracies of up to 95%, including methods that use only raw protein sequences, it has been pointed out that these figures could be overestimated because of the use of random splits and of metrics that do not account for potential biases in the datasets. Here, we propose a per-protein utility metric, pp_MCC, that is able to show a drop in performance in both random and unseen-protein split scenarios. We tested ML models based on sequence embeddings. The pp_MCC metric reveals reduced performance even in a random split, reaching levels similar to those shown by the raw MCC metric computed over an unseen-protein split, and drops even further when pp_MCC is used in an unseen-protein split scenario. Thus, the metric gives a more realistic performance estimate while still allowing random splits to be used, which could be of interest for more protein-centric studies. Given the low adjusted performance obtained, there seems to be room for improvement when using only primary sequence information, suggesting the need to include complementary protein data, accompanied by the use of the pp_MCC metric.
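The abstract does not give the formula for pp_MCC. One plausible reading, shown here purely as an assumption, is to compute the Matthews correlation coefficient separately over the test pairs that involve each protein and then average across proteins. The function names and the convention of returning 0 when the MCC denominator is zero are illustrative choices, not the authors' definition.

```python
import math
from collections import defaultdict


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: an undefined MCC (zero denominator) counts as 0.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def per_protein_mcc(pairs, y_true, y_pred):
    """Average of per-protein MCC values (hypothetical reading of pp_MCC)."""
    by_protein = defaultdict(list)
    for (a, b), t, p in zip(pairs, y_true, y_pred):
        by_protein[a].append((t, p))
        if b != a:  # do not count a self-interaction twice
            by_protein[b].append((t, p))
    scores = {
        prot: mcc([t for t, _ in obs], [p for _, p in obs])
        for prot, obs in by_protein.items()
    }
    return sum(scores.values()) / len(scores), scores


pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]
mean_score, scores = per_protein_mcc(pairs, y_true, y_pred)
print("global MCC:", mcc(y_true, y_pred))
print("per-protein mean:", mean_score)
```

In this toy case the per-protein average is lower than the global MCC because proteins with few or one-sided observations contribute low scores, which is the kind of effect a per-protein metric is meant to expose.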
Affiliation(s)
- Alba Nogueira-Rodríguez
  - Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
  - SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, 36213 Vigo, Spain
- Daniel Glez-Peña
  - SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, 36213 Vigo, Spain
- Cristina P Vieira
  - Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
  - Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
- Jorge Vieira
  - Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
  - Instituto de Biologia Molecular e Celular (IBMC), Rua Alfredo Allen, 208, 4200-135 Porto, Portugal
- Hugo López-Fernández
  - SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, 36213 Vigo, Spain
3
Shukla D, Martin J, Morcos F, Potoyan DA. Thermal Adaptation of Cytosolic Malate Dehydrogenase Revealed by Deep Learning and Coevolutionary Analysis. J Chem Theory Comput 2025; 21:3277-3287. [PMID: 40079215 PMCID: PMC11948321 DOI: 10.1021/acs.jctc.4c01774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2024] [Revised: 03/06/2025] [Accepted: 03/07/2025] [Indexed: 03/14/2025]
Abstract
Protein evolution has shaped enzymes that maintain stability and function across diverse thermal environments. While sequence variation, thermal stability and conformational dynamics are known to influence an enzyme's thermal adaptation, how these factors collectively govern stability and function across diverse temperatures remains unresolved. Cytosolic malate dehydrogenase (cMDH), a citric acid cycle enzyme, is an ideal model for studying these mechanisms due to its temperature-sensitive flexibility and broad presence in species from diverse thermal environments. In this study, we employ techniques inspired by deep learning and statistical mechanics to uncover how sequence variation and conformational dynamics shape patterns of cMDH's thermal adaptation. By integrating coevolutionary models with variational autoencoders (VAE), we generate a latent generative landscape (LGL) of the cMDH sequence space, enabling us to explore mutational pathways and predict fitness using direct coupling analysis (DCA). Structure predictions via AlphaFold and molecular dynamics simulations further illuminate how variations in hydrophobic interactions and conformational flexibility contribute to the thermal stability of warm- and cold-adapted cMDH orthologs. Notably, we identify the ratio of hydrophobic contacts between two regions as a predictive order parameter for thermal stability features, providing a quantitative metric for understanding cMDH dynamics across temperatures. The integrative computational framework employed in this study provides mechanistic insights into protein adaptation at both sequence and structural levels, offering unique perspectives on the evolution of thermal stability and creating avenues for the rational design of proteins with optimized thermal properties.
Affiliation(s)
- Divyanshu Shukla
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa 50011, United States
- Jonathan Martin
  - Department of Biological Sciences, UT Dallas, Richardson, TX 75080, United States
- Faruck Morcos
  - Department of Biological Sciences, UT Dallas, Richardson, TX 75080, United States
  - Departments of Bioengineering and Physics, UT Dallas, Richardson, TX 75080, United States
  - Center for Systems Biology, UT Dallas, Richardson, TX 75080, United States
- Davit A. Potoyan
  - Department of Chemistry, Iowa State University, Ames, Iowa 50011, United States
  - Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011, United States
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, Iowa 50011, United States
4
Göktepe YE. Protein-protein interaction prediction using enhanced features with spaced conjoint triad and amino acid pairwise distance. PeerJ Comput Sci 2025; 11:e2748. [PMID: 40134873 PMCID: PMC11935777 DOI: 10.7717/peerj-cs.2748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Accepted: 02/14/2025] [Indexed: 03/27/2025]
Abstract
Protein-protein interactions (PPIs) are pivotal in cellular processes, influencing a wide range of functions, from metabolism to immune responses. Despite advances in experimental techniques for PPI detection, their inherent limitations, such as high false-positive rates and significant resource demands, necessitate the development of computational approaches. This study presents a novel computational model named MFPIC (Multi-Feature Protein Interaction Classifier) for predicting PPIs, integrating enhanced sequence-based features, including a novel spaced conjoint triad (SCT) and amino acid pairwise distance (AAPD), with existing methods such as position-specific scoring matrices (PSSM) and AAindex-based features. The SCT captures complex sequence motifs by considering non-adjacent amino acid interactions, while AAPD provides critical spatial information about amino acid residues within protein sequences. The proposed model was evaluated on three benchmark datasets (Saccharomyces cerevisiae, Helicobacter pylori, and human proteins), demonstrating superior performance in comparison to state-of-the-art models. The results underscore the efficacy of integrating diverse and complementary features, with the model achieving 95.90%, 99.33%, and 90.95% accuracy on the Saccharomyces cerevisiae, Helicobacter pylori, and human datasets, respectively. This approach not only enhances our understanding of PPI mechanisms but also offers valuable insights for the development of targeted therapeutic strategies.
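To make the conjoint triad idea concrete, the sketch below encodes a sequence with the classic seven-class amino acid grouping and counts triads of classes, giving a 343-dimensional vector. The `gap` parameter is a hypothetical way to look at non-adjacent residues; the abstract does not specify how the paper's SCT defines its spacing, and the simple frequency normalization used here is likewise an assumption.

```python
from itertools import product

# Classic seven-class grouping used by the conjoint triad descriptor.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}


def conjoint_triad(seq, gap=0):
    """Return a 343-dim triad frequency vector.

    gap=0 gives triads of adjacent residues; gap=g skips g residues between
    the members of each triad (a hypothetical spaced variant).
    """
    counts = {triad: 0 for triad in product(range(7), repeat=3)}
    step = gap + 1
    for i in range(len(seq) - 2 * step):
        residues = (seq[i], seq[i + step], seq[i + 2 * step])
        # Skip triads containing non-standard residues such as X.
        if all(r in AA_TO_CLASS for r in residues):
            counts[tuple(AA_TO_CLASS[r] for r in residues)] += 1
    total = sum(counts.values())
    # Insertion order follows product(), so index = 49*a + 7*b + c.
    return [c / total if total else 0.0 for c in counts.values()]


adjacent = conjoint_triad("MKVLAAGIDE")
spaced = conjoint_triad("MKVLAAGIDE", gap=1)
print(len(adjacent), sum(1 for v in adjacent if v > 0))
print(len(spaced), sum(1 for v in spaced if v > 0))
```

Concatenating the vectors for several `gap` values is one way such features could be combined with PSSM or AAindex descriptors before classification.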
5
Murmu S, Chaurasia H, Rao AR, Rai A, Jaiswal S, Bharadwaj A, Yadav R, Archak S. PlantPathoPPI: An Ensemble-based Machine Learning Architecture for Prediction of Protein-Protein Interactions between Plants and Pathogens. J Mol Biol 2025:169093. [PMID: 40133779 DOI: 10.1016/j.jmb.2025.169093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2024] [Revised: 03/11/2025] [Accepted: 03/12/2025] [Indexed: 03/27/2025]
Abstract
This study aimed to develop a machine learning-based tool for predicting protein-protein interactions (PPIs) in plant-pathogen systems, addressing the challenges of experimental PPI identification. Identifying PPIs in plant-pathogen interactions is crucial for understanding the molecular mechanisms underlying plant defense and pathogen virulence. However, experimental methods are time-consuming and labor-intensive, prompting the use of computational techniques to complement traditional approaches. A robust ensemble model was developed using multiple sequence encodings and diverse learning algorithms such as random forest, support vector machine, and artificial neural network. The features used included auto-covariance, conjoint triad, and local descriptor schemes, which were selected based on their performance. The top three performing models were combined into an ensemble model, improving prediction accuracy to approximately 97%. The PlantPathoPPI tool, developed through this approach, was compared with existing tools using an independent test dataset, showing promising potential for PPI prediction in plant-pathogen interactions. To facilitate broad accessibility, a web-based prediction server was developed, available at https://plantpathoppi.onrender.com/, alongside a Python package on https://pypi.org/project/plantpathoppi-ml/. This research contributes significantly to the field by offering an efficient tool for predicting PPIs in plant-pathogen systems, providing valuable insights into plant diseases and supporting hypothesis-driven research.
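The abstract does not say how the top three models are combined. A common option, shown here only as an assumption, is soft voting: average the interaction probabilities from each model and threshold the result. The function name, weights, and example probabilities below are hypothetical and are not taken from the PlantPathoPPI package.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Combine per-model probabilities by (weighted) averaging.

    prob_lists: one list of positive-class probabilities per model,
    all of the same length. Returns (averaged probabilities, 0/1 labels).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    avg = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in avg]
    return avg, labels


# Made-up outputs from three hypothetical base models on three protein pairs.
rf_probs = [0.9, 0.4, 0.2]
svm_probs = [0.8, 0.6, 0.1]
ann_probs = [0.7, 0.3, 0.6]

avg, labels = soft_vote([rf_probs, svm_probs, ann_probs])
print(labels)
```

Weights could be set from each model's validation performance, although equal weights are a reasonable default when the base models perform similarly.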
Affiliation(s)
- Sneha Murmu
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India; ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
- Himanshushekhar Chaurasia
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India; ICAR-Indian Agricultural Research Institute, New Delhi 110012, India; ICAR-Central Institute for Research on Cotton Technology, Mumbai 400019, India
- A R Rao
  - Indian Council of Agricultural Research, New Delhi 110001, India
- Anil Rai
  - Indian Council of Agricultural Research, New Delhi 110001, India
- Sarika Jaiswal
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
- Anshu Bharadwaj
  - ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India
- Rajbir Yadav
  - ICAR-Indian Agricultural Research Institute, New Delhi 110012, India
- Sunil Archak
  - ICAR-National Bureau of Plant Genetic Resources, New Delhi 110012, India
6
Zhou Y, Lin H, Xie L, Huang Y, Wu L, Li SZ, Chen W. Effectiveness and Efficiency: Label-Aware Hierarchical Subgraph Learning for Protein-Protein Interaction. J Mol Biol 2025; 437:168737. [PMID: 39102976 DOI: 10.1016/j.jmb.2024.168737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2024] [Revised: 07/26/2024] [Accepted: 07/31/2024] [Indexed: 08/07/2024]
Abstract
The study of protein-protein interactions (PPIs) holds immense significance in understanding various biological activities, as well as in drug discovery and disease diagnosis. Deep learning methods, including graph neural networks (GNNs), have been widely employed for PPI prediction, but they often suffer a decline in performance in real-world settings. According to our analysis, the topological shortcut is one of the key problems degrading performance. When PPIs are modeled as a graph with proteins as nodes and interactions as edge types, the prevailing models tend to learn the pattern of node degrees rather than intrinsic sequence-structure profiles, a problem termed the topological shortcut. In addition, the rapid growth of PPI data leads to intensive computational costs and strains computing devices, making these models infeasible in practice. To address these problems, we propose a label-aware hierarchical subgraph learning method (laruGL-PPI) that can effectively infer PPIs while being interpretable. Specifically, we introduce edge-based subgraph sampling to alleviate the problems of topological shortcuts and high computing costs. In addition, the inner-outer connections of PPIs are modeled as a hierarchical graph, together with the dependencies between interaction types, which are captured by a label graph. Extensive experiments across PPI datasets of various scales demonstrate that laruGL-PPI surpasses the most advanced PPI prediction techniques currently available, particularly when tested on unseen proteins. Our model can also recognize crucial sites of proteins, such as surface sites for binding and active sites for catalysis.
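A minimal sketch of what edge-based subgraph sampling can look like is given below: a batch of edges is drawn uniformly at random and the subgraph induced by their endpoints is returned. This is a generic illustration, not the laruGL-PPI sampler; the function name, the `(u, v, interaction_type)` edge format, and the toy graph are assumptions.

```python
import random


def sample_edge_subgraph(edges, n_edges, seed=None):
    """Sample edges uniformly and return the subgraph induced by their endpoints.

    edges: list of (u, v, interaction_type) tuples.
    Returns (nodes, sampled_edges, induced_edges).
    """
    rng = random.Random(seed)
    sampled = rng.sample(edges, min(n_edges, len(edges)))
    nodes = {u for u, _, _ in sampled} | {v for _, v, _ in sampled}
    # Keep every original edge whose two endpoints were both selected.
    induced = [(u, v, t) for (u, v, t) in edges if u in nodes and v in nodes]
    return nodes, sampled, induced


# Toy PPI graph; P1 is a high-degree hub.
toy_edges = [
    ("P1", "P2", "binding"),
    ("P1", "P3", "activation"),
    ("P1", "P4", "binding"),
    ("P1", "P5", "inhibition"),
    ("P2", "P3", "binding"),
    ("P4", "P5", "catalysis"),
]

nodes, sampled, induced = sample_edge_subgraph(toy_edges, n_edges=2, seed=0)
print(sorted(nodes))
print(induced)
```

Training on small sampled subgraphs keeps memory bounded as the PPI graph grows, and because each mini-batch sees only part of a node's neighborhood, its full degree is less directly available to the model.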
Affiliation(s)
- Yuanqing Zhou
  - Department of Food Science and Nutrition, College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China; AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China
- Haitao Lin
  - AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China
- Lianghua Xie
  - Department of Food Science and Nutrition, College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
- Yufei Huang
  - AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China
- Lirong Wu
  - AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China
- Stan Z Li
  - AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou 310024, China
- Wei Chen
  - Department of Food Science and Nutrition, College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou 310058, China
7
Zhou S, Luo J, Tang M, Li C, Li Y, He W. Predicting protein-protein interactions in microbes associated with cardiovascular diseases using deep denoising autoencoders and evolutionary information. Front Pharmacol 2025; 16:1565860. [PMID: 40135232 PMCID: PMC11932980 DOI: 10.3389/fphar.2025.1565860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2025] [Accepted: 02/17/2025] [Indexed: 03/27/2025] Open
Abstract
INTRODUCTION Protein-protein interactions (PPIs) are critical for understanding the molecular mechanisms underlying various biological processes, particularly in microbes associated with cardiovascular disease. Traditional experimental methods for detecting PPIs are often time-consuming and costly, leading to an urgent need for reliable computational approaches.
METHODS In this study, we present a novel model, the deep denoising autoencoder for protein-protein interaction (DAEPPI), which leverages the denoising autoencoder and the CatBoost algorithm to predict PPIs from the evolutionary information of protein sequences.
RESULTS Our extensive experiments demonstrate the effectiveness of the DAEPPI model, achieving average prediction accuracies of 97.85% and 98.49% on yeast and human datasets, respectively. Comparative analyses with existing effective methods further validate the robustness and reliability of our model in predicting PPIs.
DISCUSSION Additionally, we explore the application of DAEPPI in the context of cardiovascular disease, showcasing its potential to uncover significant interactions that could contribute to the understanding of disease mechanisms. Our findings indicate that DAEPPI is a powerful tool for advancing research in proteomics and could play a pivotal role in the identification of novel therapeutic targets in cardiovascular disease.
Affiliation(s)
- Senyu Zhou
  - Cardiovascular Department, The Fourth Hospital of Changsha (Integrated Traditional Chinese and Western Medicine Hospital of Changsha, Changsha Hospital of Hunan Normal University), Changsha, China
- Jian Luo
  - Cardiovascular Department, The Fourth Hospital of Changsha (Integrated Traditional Chinese and Western Medicine Hospital of Changsha, Changsha Hospital of Hunan Normal University), Changsha, China
- Mei Tang
  - Cardiovascular Department, The Fourth Hospital of Changsha (Integrated Traditional Chinese and Western Medicine Hospital of Changsha, Changsha Hospital of Hunan Normal University), Changsha, China
- Chaojun Li
  - Cardiovascular Department, The Fourth Hospital of Changsha (Integrated Traditional Chinese and Western Medicine Hospital of Changsha, Changsha Hospital of Hunan Normal University), Changsha, China
- Yang Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
- Wenhua He
  - Cardiovascular Department, The Fourth Hospital of Changsha (Integrated Traditional Chinese and Western Medicine Hospital of Changsha, Changsha Hospital of Hunan Normal University), Changsha, China
8
Ye A, Zhang JY, Xu Q, Guo HX, Liao Z, Cui H, Zhang D, Guo FB. Carmna: classification and regression models for nitrogenase activity based on a pretrained large protein language model. Brief Bioinform 2025; 26:bbaf197. [PMID: 40273431 PMCID: PMC12021265 DOI: 10.1093/bib/bbaf197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2024] [Revised: 01/11/2025] [Accepted: 04/07/2025] [Indexed: 04/26/2025] Open
Abstract
Nitrogen-fixing microorganisms play a critical role in the global nitrogen cycle by converting atmospheric nitrogen into ammonia through the action of nitrogenase (EC 1.18.6.1). In this study, we employed six machine learning algorithms to build classification and regression models for nitrogenase activity (Carmna). Carmna used the pretrained large-scale model ProtT5 for feature extraction from nitrogenase sequences and incorporated additional features, such as gene expression and codon preference, for model training. The optimal classification model, based on XGBoost, achieved an average area under the receiver operating characteristic curve of 0.9365 and an F1 score of 0.85 in five-fold cross-validation. For regression, the best-performing model was a stacking approach based on support vector regression, with an average R² of 0.5572 and a mean absolute error of 0.3351. Further interpretability analysis of the optimal regression model revealed that not only the proportion and codon preferences of standard amino acids, but also the expression levels and spatial distance of nitrogenase genes were associated with nitrogenase activity. We also obtained the minimum nitrogen-fixing nif cluster. This study deepens our understanding of the complex mechanisms regulating nitrogenase activity and contributes to the development of efficient bio-fertilizers.
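For readers less familiar with the two regression metrics quoted above, the sketch below shows how the coefficient of determination and the mean absolute error are computed from true and predicted values. The numbers are made-up toy values, not data from the study, and the function names are illustrative.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy activity values and predictions (hypothetical).
activity = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print("R^2:", r_squared(activity, predicted))
print("MAE:", mean_absolute_error(activity, predicted))
```

An R² near 0.56 therefore means the model explains a little over half of the variance in the measured activity, while the MAE is expressed in the same units as the activity values themselves.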
Affiliation(s)
- Anqiang Ye
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
- Ji-Yun Zhang
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
- Qian Xu
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
- Hai-Xia Guo
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
- Zhen Liao
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
- Hongtu Cui
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
- Dongdong Zhang
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
- Feng-Biao Guo
  - Department of Respiratory and Critical Care Medicine, Zhongnan Hospital of Wuhan University, School of Pharmaceutical Sciences, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, Wuhan University, 185 Donghu Road, Wuchang District, Wuhan 430071, China
9
Bian Q, Shen Z, Gao J, Shen L, Lu Y, Zhang Q, Chen R, Xu D, Liu T, Che J, Lu Y, Dong X. PPI-CoAttNet: A Web Server for Protein-Protein Interaction Tasks Using a Coattention Model. J Chem Inf Model 2025; 65:461-471. [PMID: 39761551 DOI: 10.1021/acs.jcim.4c01365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2025]
Abstract
Predicting protein-protein interactions (PPIs) is crucial for advancing drug discovery. Despite the proposal of numerous advanced computational methods, these approaches often suffer from poor usability for biologists and lack generalization. In this study, we designed a deep learning model based on a coattention mechanism that was capable of both PPI and site prediction and used this model as the foundation for PPI-CoAttNet, a user-friendly, multifunctional web server for PPI prediction. This platform provides comprehensive services for online PPI model training, PPI and site prediction, and prediction of interactions with proteins associated with highly prevalent cancers. In our Homo sapiens test set for PPI prediction, PPI-CoAttNet achieved an AUC of 0.9841 and an F1 score of 0.9440, outperforming most state-of-the-art models. Additionally, these results are generated in real time, delivering outcomes within minutes. We also evaluated PPI-CoAttNet for downstream tasks, including novel E3 ligase scoring, demonstrating outstanding accuracy. We believe that this tool will empower researchers, especially those without computational expertise, to leverage AI for accelerating drug development.
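The AUC and F1 figures reported above are standard binary classification metrics. As a small illustration with made-up scores (not the PPI-CoAttNet test set), AUC can be computed as the fraction of positive-negative pairs that the model ranks correctly, and F1 from the counts of true positives, false positives, and false negatives at a chosen threshold. The function names and the 0.5 threshold are assumptions.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, scores, threshold=0.5):
    """F1 = 2*TP / (2*TP + FP + FN) after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


# Toy labels and predicted interaction scores (hypothetical).
labels_true = [1, 1, 0, 0]
pred_scores = [0.9, 0.4, 0.6, 0.1]

print("AUC:", roc_auc(labels_true, pred_scores))
print("F1:", f1_score(labels_true, pred_scores))
```

AUC is threshold-free while F1 depends on the cutoff, which is why the two numbers are usually reported together.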
Affiliation(s)
- Qingyu Bian
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
- Zheyuan Shen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
- Jian Gao
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
- Liteng Shen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
- Yang Lu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
- Qingnan Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Roufen Chen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
- Donghang Xu
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- Tao Liu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
- Jinxin Che
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
- Yan Lu
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- Xiaowu Dong
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou 310058, China
10
Nguyen MH, Tran ND, Le NQK. Big Data and Artificial Intelligence in Drug Discovery for Gastric Cancer: Current Applications and Future Perspectives. Curr Med Chem 2025; 32:1968-1986. [PMID: 37711014 DOI: 10.2174/0929867331666230913105829] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/06/2023] [Revised: 07/04/2023] [Accepted: 08/04/2023] [Indexed: 09/16/2023]
Abstract
Gastric cancer (GC) represents a significant global health burden, ranking as the fifth most common malignancy and the fourth leading cause of cancer-related death worldwide. Despite recent advancements in GC treatment, the five-year survival rate for advanced-stage GC patients remains low. Consequently, there is an urgent need to identify novel drug targets and develop effective therapies. However, traditional drug discovery approaches are associated with high costs, time-consuming processes, and a high failure rate, posing challenges in meeting this critical need. In recent years, there has been a rapid increase in the utilization of artificial intelligence (AI) algorithms and big data in drug discovery, particularly in cancer research. AI has the potential to improve the drug discovery process by analyzing vast and complex datasets from multiple sources, enabling the prediction of compound efficacy and toxicity, as well as the optimization of drug candidates. This review provides an overview of the latest AI algorithms and big data employed in drug discovery for GC. Additionally, we examine the various applications of AI in this field, with a specific focus on therapeutic discovery. Moreover, we discuss the challenges, limitations, and prospects of emerging AI methods, which hold significant promise for advancing GC research in the future.
Affiliation(s)
- Mai Hanh Nguyen
  - International Ph.D. Program in Cell Therapy and Regenerative Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
  - AIBioMed Research Group, Taipei Medical University, Taipei 110, Taiwan
  - Pathology and Forensic Medicine Department, 103 Military Hospital, Hanoi, Vietnam
- Ngoc Dung Tran
  - Pathology and Forensic Medicine Department, 103 Military Hospital, Hanoi, Vietnam
- Nguyen Quoc Khanh Le
  - AIBioMed Research Group, Taipei Medical University, Taipei 110, Taiwan
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
11
Yin H, Duo H, Li S, Qin D, Xie L, Xiao Y, Sun J, Tao J, Zhang X, Li Y, Zou Y, Yang Q, Yang X, Hao Y, Li B. Unlocking biological insights from differentially expressed genes: Concepts, methods, and future perspectives. J Adv Res 2024:S2090-1232(24)00560-5. [PMID: 39647635 DOI: 10.1016/j.jare.2024.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2024] [Revised: 10/12/2024] [Accepted: 12/03/2024] [Indexed: 12/10/2024] Open
Abstract
BACKGROUND Identifying differentially expressed genes (DEGs) is a core task of transcriptome analysis, as DEGs can reveal the molecular mechanisms underlying biological processes. However, interpreting the biological significance of large DEG lists is challenging. Currently, gene ontology, pathway enrichment and protein-protein interaction analysis are common strategies employed by biologists. Additionally, emerging analytical strategies/approaches (such as network module analysis, knowledge graph, drug repurposing, cell marker discovery, trajectory analysis, and cell communication analysis) have been proposed. Despite these advances, comprehensive guidelines for systematically and thoroughly mining the biological information within DEGs remain lacking. AIM OF REVIEW This review aims to provide an overview of essential concepts and methodologies for the biological interpretation of DEGs, enhancing the contextual understanding. It also addresses the current limitations and future perspectives of these approaches, highlighting their broad applications in deciphering the molecular mechanism of complex diseases and phenotypes. To assist users in extracting insights from extensive datasets, especially various DEG lists, we developed DEGMiner (https://www.ciblab.net/DEGMiner/), which integrates over 300 easily accessible databases and tools. KEY SCIENTIFIC CONCEPTS OF REVIEW This review offers strong support and guidance for exploring DEGs, and also will accelerate the discovery of hidden biological insights within genomes.
Affiliation(s)
- Huachun Yin
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China; Department of Neurosurgery, Xinqiao Hospital, The Army Medical University, Chongqing 400037, PR China; Department of Neurobiology, Chongqing Key Laboratory of Neurobiology, The Army Medical University, Chongqing 400038, PR China
- Hongrui Duo
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China
- Song Li
  - Department of Neurosurgery, Xinqiao Hospital, The Army Medical University, Chongqing 400037, PR China
- Dan Qin
  - Department of Biology, College of Science, Northeastern University, Boston, MA 02115, USA
- Lingling Xie
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China
- Yingxue Xiao
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China
- Jing Sun
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China
- Jingxin Tao
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China
- Xiaoxi Zhang
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China
- Yinghong Li
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, PR China
- Yue Zou
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China
- Qingxia Yang
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, PR China
- Xian Yang
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China
- Youjin Hao
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China.
- Bo Li
  - College of Life Sciences, Chongqing Normal University, Chongqing 401331, PR China.
12
Wang L, Li R, Guan X, Yan S. Prediction of protein interactions between pine and pine wood nematode using deep learning and multi-dimensional feature fusion. Frontiers in Plant Science 2024; 15:1489116. [PMID: 39687321 PMCID: PMC11646721 DOI: 10.3389/fpls.2024.1489116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2024] [Accepted: 11/12/2024] [Indexed: 12/18/2024]
Abstract
Pine Wilt Disease (PWD) is a devastating forest disease that has a serious impact on ecological balance. Since the identification of plant-pathogen protein interactions (PPIs) is a critical step in understanding the pathogenic system of pine wilt disease, this study proposes a Multi-feature Fusion Graph Attention Convolution (MFGAC-PPI) for predicting plant-pathogen PPIs based on deep learning. Compared with methods based on single-feature information, MFGAC-PPI obtains more 3D characterization information by utilizing AlphaFold and combining protein sequence features to extract multi-dimensional features via Transform with improved GCN. The performance of MFGAC-PPI was compared with the current representative methods of sequence-based, structure-based and hybrid characterization, demonstrating its superiority across all metrics. The experiments showed that learning multi-dimensional feature information effectively improved the ability of MFGAC-PPI in plant and pathogen PPI prediction tasks. Meanwhile, a pine wilt disease PPI network consisting of 2,688 interacting protein pairs was constructed based on MFGAC-PPI, which made it possible to systematically discover new disease resistance genes in pine trees and promoted the understanding of plant-pathogen interactions.
Affiliation(s)
- Liuyan Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, Heilongjiang, China
- Rongguang Li
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, Heilongjiang, China
- Xuemei Guan
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, Heilongjiang, China
- Shanchun Yan
  - Key Laboratory of Sustainable Forest Ecosystem Management, School of Forestry, Northeast Forestry University, Harbin, Heilongjiang, China
13
Zhou L, Song J, Li Z, Hu Y, Guo W. THGB: predicting ligand-receptor interactions by combining tree boosting and histogram-based gradient boosting. Sci Rep 2024; 14:29604. [PMID: 39609487 PMCID: PMC11604971 DOI: 10.1038/s41598-024-78954-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Accepted: 11/05/2024] [Indexed: 11/30/2024] Open
Abstract
Ligand-receptor interaction (LRI) prediction has great significance in biological and medical research and facilitates the inference and analysis of cell-to-cell communication. However, wet experiments for new LRI discovery are costly and time-consuming. Here, we propose a computational model called THGB to uncover new LRIs. THGB first extracts feature information of Ligand-Receptor (LR) pairs using iFeature. Next, it adopts a tree boosting model to obtain representative LR features. Finally, it devises the histogram-based gradient boosting model to capture high-quality LRIs. To assess the performance of THGB, we compared it with three new LRI prediction models (i.e., CellEnBoost, CellGiQ, and CellComNet) and one classical protein-protein interaction inference model, PIPR. The results demonstrated that THGB achieved the best overall predictions in terms of six evaluation indicators (i.e., precision, recall, accuracy, F1-score, AUC, and AUPR). To measure the effect of LR feature selection on the prediction, THGB was compared with four feature selection methods (i.e., PCA, NMF, LLE, and TSVD). The results showed that the tree boosting model was more appropriate to select representative LR features and improve LRI prediction. We also conducted an ablation study and found that THGB with feature selection outperformed THGB without feature selection. We hope that THGB is a useful tool to find new LRIs and further infer cell-to-cell communication.
Affiliation(s)
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Jiao Song
  - School of Computer Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Zejun Li
  - School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, 421002, Hunan, China.
- Yingxi Hu
  - School of Science, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
- Wenyan Guo
  - College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, 412007, Hunan, China
14
Luo X, Chi ASY, Lin AH, Ong TJ, Wong L, Rahman CR. Benchmarking recent computational tools for DNA-binding protein identification. Brief Bioinform 2024; 26:bbae634. [PMID: 39657630 PMCID: PMC11630855 DOI: 10.1093/bib/bbae634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Revised: 10/29/2024] [Accepted: 11/20/2024] [Indexed: 12/12/2024] Open
Abstract
Identification of DNA-binding proteins (DBPs) is a crucial task in genome annotation, as it aids in understanding gene regulation, DNA replication, transcriptional control, and various cellular processes. In this paper, we conduct an unbiased benchmarking of 11 state-of-the-art computational tools as well as traditional tools such as ScanProsite, BLAST, and HMMER for identifying DBPs. We highlight the data leakage issue in conventional datasets leading to inflated performance. We introduce new evaluation datasets to support further development. Through a comprehensive evaluation pipeline, we identify potential limitations in models, feature extraction techniques, and training methods, and recommend solutions regarding these issues. We show that combining the predictions of the two best computational tools with BLAST-based prediction significantly enhances DBP identification capability. We provide this consensus method as user-friendly software. The datasets and software are available at https://github.com/Rafeed-bot/DNA_BP_Benchmarking.
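The abstract does not state how the predictions of the two best tools are combined with the BLAST-based prediction. One simple possibility, shown here purely as an illustration, is a majority vote over three binary calls. The function name `consensus_dbp` and the toy labels are hypothetical and are not taken from the authors' software.

```python
def consensus_dbp(tool_a: bool, tool_b: bool, blast_hit: bool) -> bool:
    """Call a protein DNA-binding if at least two of the three sources agree.

    The majority-vote rule is an assumption made for illustration; the
    benchmarked consensus method may combine its inputs differently.
    """
    return (int(tool_a) + int(tool_b) + int(blast_hit)) >= 2


# Made-up predictions per protein: (tool A, tool B, BLAST-based).
calls = {
    "protein_1": (True, True, False),
    "protein_2": (False, True, False),
    "protein_3": (False, True, True),
}
consensus = {name: consensus_dbp(*votes) for name, votes in calls.items()}
print(consensus)
```

A vote of this kind only helps when the three sources make partly independent errors, which is one reason a sequence-similarity search is a plausible complement to learned predictors.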
Affiliation(s)
- Xizi Luo
  - School of Computing, National University of Singapore, Singapore 119077, Singapore
- Amadeus Song Yi Chi
  - School of Computing, National University of Singapore, Singapore 119077, Singapore
- Andre Huikai Lin
  - School of Computing, National University of Singapore, Singapore 119077, Singapore
- Tze Jet Ong
  - School of Computing, National University of Singapore, Singapore 119077, Singapore
- Limsoon Wong
  - School of Computing, National University of Singapore, Singapore 119077, Singapore
15
Akid H, Chennen K, Frey G, Thompson J, Ben Ayed M, Lachiche N. Graph-based machine learning model for weight prediction in protein-protein networks. BMC Bioinformatics 2024; 25:349. [PMID: 39511478 PMCID: PMC11546293 DOI: 10.1186/s12859-024-05973-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2024] [Accepted: 10/31/2024] [Indexed: 11/15/2024] Open
Abstract
Proteins interact with each other in complex ways to perform significant biological functions. These interactions, known as protein-protein interactions (PPIs), can be depicted as a graph where proteins are nodes and their interactions are edges. The development of high-throughput experimental technologies allows for the generation of numerous data which permits increasing the sophistication of PPI models. However, despite significant progress, current PPI networks remain incomplete. Discovering missing interactions through experimental techniques can be costly, time-consuming, and challenging. Therefore, computational approaches have emerged as valuable tools for predicting missing interactions. In PPI networks, a graph is usually used to model the interactions between proteins. An edge between two proteins indicates a known interaction, while the absence of an edge means the interaction is not known or missed. However, this binary representation overlooks the reliability of known interactions when predicting new ones. To address this challenge, we propose a novel approach for link prediction in weighted protein-protein networks, where interaction weights denote confidence scores. By leveraging data from the yeast Saccharomyces cerevisiae obtained from the STRING database, we introduce a new model that combines similarity-based algorithms and aggregated confidence score weights for accurate link prediction purposes. Our model significantly improves prediction accuracy, surpassing traditional approaches in terms of Mean Absolute Error, Mean Relative Absolute Error, and Root Mean Square Error. Our proposed approach holds the potential for improved accuracy in predicting PPIs, which is crucial for better understanding the underlying biological processes.
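To make the idea of similarity-based link prediction on a confidence-weighted network concrete, here is a minimal sketch. It is not the authors' model: the scoring rule (a weighted common-neighbours average), the function names, and the toy graph are all assumptions chosen for illustration. The two error measures correspond to the Mean Absolute Error and Root Mean Square Error named in the abstract.

```python
import math


def weighted_common_neighbors(graph, u, v):
    """Predict a weight for the pair (u, v) from shared neighbours.

    For each shared neighbour z, take the mean confidence of edges u-z and
    v-z, then average over all shared neighbours. Returns 0.0 if u and v
    share no neighbour. (Illustrative rule, not the published model.)
    """
    shared = set(graph.get(u, {})) & set(graph.get(v, {}))
    if not shared:
        return 0.0
    return sum((graph[u][z] + graph[v][z]) / 2 for z in shared) / len(shared)


def mae(pred, true):
    """Mean Absolute Error between predicted and reference weights."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root Mean Square Error between predicted and reference weights."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


# Toy undirected network; edge weights stand in for confidence scores in [0, 1].
toy = {
    "A": {"C": 0.9, "D": 0.5},
    "B": {"C": 0.7, "D": 0.3},
    "C": {"A": 0.9, "B": 0.7},
    "D": {"A": 0.5, "B": 0.3},
}
score = weighted_common_neighbors(toy, "A", "B")
print(round(score, 3))
```

In an evaluation, known weighted edges would be held out, re-predicted with a rule like this, and compared against their true confidence scores using `mae` and `rmse`.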
Affiliation(s)
- Hajer Akid
  - ICube, University of Strasbourg, 67412, Illkirch Cedex, France.
- Kirsley Chennen
  - ICube, University of Strasbourg, 67412, Illkirch Cedex, France
- Gabriel Frey
  - ICube, University of Strasbourg, 67412, Illkirch Cedex, France
- Julie Thompson
  - ICube, University of Strasbourg, 67412, Illkirch Cedex, France
16
Gong F, Cao D, Sun X, Li Z, Qu C, Fan Y, Cao Z, Zhao K, Zhao K, Qiu D, Li Z, Ren R, Ma X, Zhang X, Yin D. Homologous mapping yielded a comprehensive predicted protein-protein interaction network for peanut (Arachis hypogaea L.). BMC Plant Biology 2024; 24:873. [PMID: 39304811 DOI: 10.1186/s12870-024-05580-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 09/09/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND Protein-protein interactions are the primary means through which proteins carry out their functions. These interactions thus have crucial roles in life activities. The wide availability of fully sequenced animal and plant genomes has facilitated establishment of relatively complete global protein interaction networks for some model species. The genomes of cultivated and wild peanut (Arachis hypogaea L.) have also been sequenced, but the functions of most of the encoded proteins remain unclear. RESULTS We here used homologous mapping of validated protein interaction data from model species to generate complete peanut protein interaction networks for A. hypogaea cv. 'Tifrunner' (282,619 pairs), A. hypogaea cv. 'Shitouqi' (256,441 pairs), A. monticola (440,470 pairs), A. duranensis (136,363 pairs), and A. ipaensis (172,813 pairs). A detailed analysis was conducted for a putative disease-resistance subnetwork in the Tifrunner network to identify candidate genes and validate functional interactions. The network suggested that DX2UEH and its interacting partners may participate in peanut resistance to bacterial wilt; this was preliminarily validated with overexpression experiments in peanut. CONCLUSION Our results provide valuable new information for future analyses of gene and protein functions and regulatory networks in peanut.
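Homologous mapping, as described in this abstract, transfers validated interactions from a model species to the proteins of a target species that are homologous to the interacting partners. The sketch below illustrates the general idea only; the identifiers, the `orthologs` lookup, and the choice to discard self-pairs are hypothetical simplifications, not details from the paper.

```python
from itertools import product


def map_interactions(template_ppis, orthologs):
    """Transfer interactions from a model species to a target species.

    template_ppis: iterable of (protein_a, protein_b) pairs validated in the
                   model species.
    orthologs:     dict mapping a model-species protein to a list of
                   homologous target-species proteins.
    Returns a set of unordered target-species pairs (stored as sorted tuples).
    """
    predicted = set()
    for a, b in template_ppis:
        # Every homolog of a is paired with every homolog of b.
        for x, y in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if x != y:  # simplification: self-pairs are not reported
                predicted.add(tuple(sorted((x, y))))
    return predicted


# Made-up identifiers for a model species (M*) and a target species (T*).
template = [("M1", "M2"), ("M2", "M3")]
orthologs = {"M1": ["T01"], "M2": ["T02", "T03"]}  # M3 has no known homolog
network = map_interactions(template, orthologs)
print(sorted(network))
```

Because one model-species protein can have several homologs in a polyploid target genome, a single template interaction can expand into many predicted pairs, which is consistent with the large network sizes reported above.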
Affiliation(s)
- Fangping Gong
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Di Cao
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Xiaojian Sun
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Zhuo Li
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Chengxin Qu
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Yi Fan
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Zenghui Cao
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Kai Zhao
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Kunkun Zhao
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Ding Qiu
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Zhongfeng Li
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Rui Ren
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Xingli Ma
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Xingguo Zhang
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
- Dongmei Yin
  - College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China.
17
Zhang F, Chang S, Wang B, Zhang X. DSSGNN-PPI: A Protein-Protein Interactions prediction model based on Double Structure and Sequence graph neural networks. Comput Biol Med 2024; 177:108669. [PMID: 38833802 DOI: 10.1016/j.compbiomed.2024.108669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Revised: 04/04/2024] [Accepted: 05/26/2024] [Indexed: 06/06/2024]
Abstract
The process of experimentally confirming complex interaction networks among proteins is time-consuming and laborious. This study aims to address Protein-Protein Interactions (PPIs) prediction based on graph neural networks (GNN). A novel multilevel prediction model for PPIs named DSSGNN-PPI (Double Structure and Sequence GNN for PPIs) is designed. Initially, a distance graph between amino acid residues is constructed. Subsequently, the distance graph is fed into an underlying graph attention network module. This enables us to efficiently learn vector representations that encode the three-dimensional structure of proteins and simultaneously aggregate key local patterns and overall topological information to obtain graph embeddings that adequately represent local and global structural features. In addition, the embedding representations that reflect sequence properties are obtained. The two features are fused to construct high-level protein complex networks, which are fed into the designed gated graph attention network to extract complex topological patterns. By combining heterogeneous multi-source information from downstream structure graph and upstream sequence models, the understanding of PPIs is comprehensively enhanced. A series of evaluation results validate the remarkable effectiveness of the DSSGNN-PPI framework in enhancing the prediction of multi-type interactions among proteins. The multilevel representation learning and information fusion strategies provide a new effective solution paradigm for structural biology problems. The source code for DSSGNN-PPI has been hosted on GitHub and is available at https://github.com/cstudy1/DSSGNN-PPI.
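As a rough illustration of the first step, a residue distance graph can be built by connecting residues whose representative atoms lie within a distance cutoff. The sketch below is a generic construction, not the DSSGNN-PPI code: the use of one coordinate per residue, the 8.0 Å cutoff, and the function name `distance_graph` are assumptions.

```python
import math


def distance_graph(coords, cutoff=8.0):
    """Build an undirected residue graph from 3D coordinates.

    coords: list of (x, y, z) tuples, one per residue (for example, the
            C-alpha position), in angstroms.
    cutoff: residues closer than or equal to this distance are connected.
            The default of 8.0 is an assumed, commonly used value.
    Returns a list of (i, j, distance) edges with i < j.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                edges.append((i, j, d))
    return edges


# Four made-up residues placed along the x axis.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = distance_graph(coords)
print([(i, j) for i, j, _ in edges])
```

The resulting edge list (optionally with distances as edge features) is the kind of input a graph attention network would consume; the pairwise loop is quadratic in the number of residues, which is acceptable for single proteins.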
Affiliation(s)
- Fan Zhang
  - Huaihe Hospital of Henan University, Kaifeng 475004, China; School of Computer and Information Engineering, Henan University, Kaifeng 475004, China.
- Sheng Chang
  - School of Computer and Information Engineering, Henan University, Kaifeng 475004, China.
- Binjie Wang
  - Huaihe Hospital of Henan University, Kaifeng 475004, China.
- Xinhong Zhang
  - School of Software, Henan University, Kaifeng, 475004, China.
18
Pancino N, Gallegati C, Romagnoli F, Bongini P, Bianchini M. Protein-Protein Interfaces: A Graph Neural Network Approach. Int J Mol Sci 2024; 25:5870. [PMID: 38892057 PMCID: PMC11173158 DOI: 10.3390/ijms25115870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 05/15/2024] [Accepted: 05/24/2024] [Indexed: 06/21/2024] Open
Abstract
Protein-protein interactions (PPIs) are fundamental processes governing cellular functions, crucial for understanding biological systems at the molecular level. Compared to experimental methods for PPI prediction and site identification, computational deep learning approaches represent an affordable and efficient solution to tackle these problems. Since protein structure can be summarized as a graph, graph neural networks (GNNs) represent the ideal deep learning architecture for the task. In this work, PPI prediction is modeled as a node-focused binary classification task using a GNN to determine whether a generic residue is part of the interface. Biological data were obtained from the Protein Data Bank in Europe (PDBe), leveraging the Protein Interfaces, Surfaces, and Assemblies (PISA) service. To gain a deeper understanding of how proteins interact, the data obtained from PISA were assembled into three datasets: Whole, Interface, and Chain, consisting of data on the whole protein, couples of interacting chains, and single chains, respectively. These three datasets correspond to three different nuances of the problem: identifying interfaces between protein complexes, between chains of the same protein, and interface regions in general. The results indicate that GNNs are capable of solving each of the three tasks with very good performance levels.
Affiliation(s)
- Niccolò Pancino
  - Department of Information Engineering and Mathematics, University of Siena, Via Roma, 56, 53100 Siena, Italy; (C.G.); (P.B.); (M.B.)
19
Tran HN, Nguyen PXQ, Guo F, Wang J. Prediction of Protein-Protein Interactions Based on Integrating Deep Learning and Feature Fusion. Int J Mol Sci 2024; 25:5820. [PMID: 38892007 PMCID: PMC11172432 DOI: 10.3390/ijms25115820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 04/27/2024] [Accepted: 04/29/2024] [Indexed: 06/21/2024] Open
Abstract
Understanding protein-protein interactions (PPIs) helps to identify protein functions and develop other important applications such as drug preparation and protein-disease relationship identification. Deep-learning-based approaches are being intensely researched for PPI determination to reduce the cost and time of previous testing methods. In this work, we integrate deep learning with feature fusion, harnessing the strengths of both approaches, handcrafted features, and protein sequence embedding. The accuracies of the proposed model using five-fold cross-validation on Yeast core and Human datasets are 96.34% and 99.30%, respectively. In the task of predicting interactions in important PPI networks, our model correctly predicted all interactions in one-core, Wnt-related, and cancer-specific networks. The experimental results on cross-species datasets, including Caenorhabditis elegans, Helicobacter pylori, Homo sapiens, Mus musculus, and Escherichia coli, also show that our feature fusion method helps increase the generalization capability of the PPI prediction model.
Affiliation(s)
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China (F.G.)
20
Ma W, Bi X, Jiang H, Zhang S, Wei Z. CollaPPI: A Collaborative Learning Framework for Predicting Protein-Protein Interactions. IEEE J Biomed Health Inform 2024; 28:3167-3177. [PMID: 38466584 DOI: 10.1109/jbhi.2024.3375621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
Exploring protein-protein interaction (PPI) is of paramount importance for elucidating the intrinsic mechanism of various biological processes. Nevertheless, experimental determination of PPI can be both time-consuming and expensive, motivating the exploration of data-driven deep learning technologies as a viable, efficient, and accurate alternative. Nonetheless, most current deep learning-based methods regarded a pair of proteins to be predicted for possible interaction as two separate entities when extracting PPI features, thus neglecting the knowledge sharing among the collaborative protein and the target protein. Aiming at the above issue, a collaborative learning framework CollaPPI was proposed in this study, where two kinds of collaboration, i.e., protein-level collaboration and task-level collaboration, were incorporated to achieve not only the knowledge-sharing between a pair of proteins, but also the complementation of such shared knowledge between biological domains closely related to PPI (i.e., protein function, and subcellular location). Evaluation results demonstrated that CollaPPI obtained superior performance compared to state-of-the-art methods on two PPI benchmarks. Besides, evaluation results of CollaPPI on the additional PPI type prediction task further proved its excellent generalization ability.
21
Mischley V, Maier J, Chen J, Karanicolas J. PPIscreenML: Structure-based screening for protein-protein interactions using AlphaFold. bioRxiv: the preprint server for biology 2024:2024.03.16.585347. [PMID: 38559274 PMCID: PMC10979958 DOI: 10.1101/2024.03.16.585347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Protein-protein interactions underlie nearly all cellular processes. With the advent of protein structure prediction methods such as AlphaFold2 (AF2), models of specific protein pairs can be built extremely accurately in most cases. However, determining the relevance of a given protein pair remains an open question. It is presently unclear how best to use structure-based tools to infer whether a pair of candidate proteins indeed interact with one another: ideally, one might even use such information to screen amongst candidate pairings to build up protein interaction networks. Whereas methods for evaluating quality of modeled protein complexes have been co-opted for determining which pairings interact (e.g., pDockQ and iPTM), there have been no rigorously benchmarked methods for this task. Here we introduce PPIscreenML, a classification model trained to distinguish AF2 models of interacting protein pairs from AF2 models of compelling decoy pairings. We find that PPIscreenML out-performs methods such as pDockQ and iPTM for this task, and further that PPIscreenML exhibits impressive performance when identifying which ligand/receptor pairings engage one another across the structurally conserved tumor necrosis factor superfamily (TNFSF). Analysis of benchmark results using complexes not seen in PPIscreenML development strongly suggests that the model generalizes beyond training data, making it broadly applicable for identifying new protein complexes based on structural models built with AF2.
Affiliation(s)
- Victoria Mischley
  - Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia PA 19111
  - Molecular Cell Biology and Genetics, Drexel University, Philadelphia PA 19102
- John Karanicolas
  - Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia PA 19111
  - Moulder Center for Drug Discovery Research, Temple University School of Pharmacy, Philadelphia PA 19140
22
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024] Open
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
23
Xian L, Wang Y. Advances in Computational Methods for Protein–Protein Interaction Prediction. Electronics 2024; 13:1059. [DOI: 10.3390/electronics13061059] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/03/2025]
Abstract
Protein–protein interactions (PPIs) are pivotal in various physiological processes inside biological entities. Accurate identification of PPIs holds paramount significance for comprehending biological processes, deciphering disease mechanisms, and advancing medical research. Given the costly and labor-intensive nature of experimental approaches, a multitude of computational methods have been devised to enable swift and large-scale PPI prediction. This review offers a thorough examination of recent strides in computational methodologies for PPI prediction, with a particular focus on the utilization of deep learning techniques within this domain. Alongside a systematic classification and discussion of relevant databases, feature extraction strategies, and prominent computational approaches, we conclude with a thorough analysis of current challenges and prospects for the future of this field.
Affiliation(s)
- Lei Xian
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
- Yansu Wang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
24
Qi X, Zhao Y, Qi Z, Hou S, Chen J. Machine Learning Empowering Drug Discovery: Applications, Opportunities and Challenges. Molecules 2024; 29:903. [PMID: 38398653 PMCID: PMC10892089 DOI: 10.3390/molecules29040903] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/15/2024] [Revised: 02/08/2024] [Accepted: 02/14/2024] [Indexed: 02/25/2024]
Abstract
Drug discovery plays a critical role in advancing human health by developing new medications and treatments to combat diseases. How to accelerate the pace and reduce the costs of new drug discovery has long been a key concern for the pharmaceutical industry. Fortunately, by leveraging advanced algorithms, computational power and biological big data, artificial intelligence (AI) technology, especially machine learning (ML), holds the promise of making the hunt for new drugs more efficient. Recently, the Transformer-based models that have achieved revolutionary breakthroughs in natural language processing have sparked a new era of their applications in drug discovery. Herein, we introduce the latest applications of ML in drug discovery, highlight the potential of advanced Transformer-based ML models, and discuss the future prospects and challenges in the field.
Affiliation(s)
- Xin Qi
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
- Yuanchun Zhao
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
- Zhuang Qi
  - School of Software, Shandong University, Jinan 250101, China
- Siyu Hou
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
- Jiajia Chen
  - School of Chemistry and Life Sciences, Suzhou University of Science and Technology, Suzhou 215011, China
25
Zhao M, Lei C, Zhou K, Huang Y, Fu C, Yang S, Zhang Z. POOE: predicting oomycete effectors based on a pre-trained large protein language model. mSystems 2024; 9:e0100423. [PMID: 38078741 PMCID: PMC10804963 DOI: 10.1128/msystems.01004-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 10/23/2023] [Indexed: 01/24/2024]
Abstract
Oomycetes are fungus-like eukaryotic microorganisms which can cause catastrophic diseases in many plants. Successful infection of oomycetes depends highly on their effector proteins that are secreted into plant cells to subvert plant immunity. Thus, systematic identification of effectors from the oomycete proteomes remains an initial but crucial step in understanding plant-pathogen relationships. However, the number of experimentally identified oomycete effectors is still limited. Currently, only a few bioinformatics predictors exist to detect potential effectors, and their prediction performance needs to be improved. Here, we used the sequence embeddings from a pre-trained large protein language model (ProtTrans) as input and developed a support vector machine-based method called POOE for predicting oomycete effectors. POOE could achieve a highly accurate performance with an area under the precision-recall curve of 0.804 (area under the receiver operating characteristic curve = 0.893, accuracy = 0.874, precision = 0.777, recall = 0.684, and specificity = 0.936) in the fivefold cross-validation, considerably outperforming various combinations of popular machine learning algorithms and other commonly used sequence encoding schemes. A similar prediction performance was also observed in the independent test. Compared with the existing oomycete effector prediction methods, POOE provided very competitive and promising performance, suggesting that ProtTrans effectively captures rich protein semantic information and dramatically improves the prediction task. We anticipate that POOE can accelerate the identification of oomycete effectors and provide new hints to systematically understand the functional roles of effectors in plant-pathogen interactions. The web server of POOE is freely accessible at http://zzdlab.com/pooe/index.php. 
The corresponding source code and data sets are also available at https://github.com/zzdlabzm/POOE.
IMPORTANCE In this work, we use the sequence representations from a pre-trained large protein language model (ProtTrans) as input and develop a Support Vector Machine-based method called POOE for predicting oomycete effectors. POOE could achieve a highly accurate performance in the independent test set, considerably outperforming existing oomycete effector prediction methods. We expect that this new bioinformatics tool will accelerate the identification of oomycete effectors and further guide the experimental efforts to interrogate the functional roles of effectors in plant-pathogen interaction.
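The metrics quoted in this abstract (accuracy, precision, recall, specificity) all derive from the four confusion-matrix counts of a binary classifier. The following is a minimal Python sketch of those standard definitions, assuming a binary effector/non-effector labelling. The function name `binary_metrics` and the counts in the usage line are hypothetical and are not taken from the POOE paper or its code.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return common binary-classification metrics from confusion counts."""

    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),  # also called sensitivity
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical counts, for illustration only (not results from the paper).
metrics = binary_metrics(tp=8, fp=2, tn=85, fn=5)
```

With imbalanced classes, as is typical for effector prediction, precision and recall tend to be more informative than accuracy, which may be why the abstract leads with the area under the precision-recall curve.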
Affiliation(s)
- Miao Zhao
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Chenping Lei
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Kewei Zhou
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Yan Huang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Chen Fu
  - School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing, China
- Shiping Yang
  - State Key Laboratory of Plant Environmental Resilience, College of Biological Sciences, China Agricultural University, Beijing, China
- Ziding Zhang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
26
Bernett J, Blumenthal DB, List M. Cracking the black box of deep sequence-based protein-protein interaction prediction. Brief Bioinform 2024; 25:bbae076. [PMID: 38446741 PMCID: PMC10939362 DOI: 10.1093/bib/bbae076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 01/09/2024] [Indexed: 03/08/2024]
Abstract
Identifying protein-protein interactions (PPIs) is crucial for deciphering biological pathways. Numerous prediction methods have been developed as cheap alternatives to biological experiments, reporting surprisingly high accuracy estimates. We systematically investigated how much reproducible deep learning models depend on data leakage, sequence similarities and node degree information, and compared them with basic machine learning models. We found that overlaps between training and test sets resulting from random splitting lead to strongly overestimated performances. In this setting, models learn solely from sequence similarities and node degrees. When data leakage is avoided by minimizing sequence similarities between training and test set, performances become random. Moreover, baseline models directly leveraging sequence similarity and network topology show good performances at a fraction of the computational cost. Thus, we advocate that any improvements should be reported relative to baseline methods in the future. Our findings suggest that predicting PPIs remains an unsolved task for proteins showing little sequence similarity to previously studied proteins, highlighting that further experimental research into the 'dark' protein interactome and better computational methods are needed.
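The leakage described in this abstract arises when the same proteins appear in both training and test pairs after a random split. The following is a minimal Python sketch of a protein-disjoint split, assuming pairs are given as tuples of protein identifiers. The function name `protein_disjoint_split` is hypothetical, and this is not the authors' procedure: it only removes identical-protein overlap and does not reduce sequence similarity between the two sets, which the abstract also identifies as a source of leakage.

```python
import random


def protein_disjoint_split(pairs, test_fraction=0.2, seed=0):
    """Split protein pairs so that no protein occurs in both train and test."""
    # Collect and shuffle the distinct protein identifiers reproducibly.
    proteins = sorted({protein for pair in pairs for protein in pair})
    rng = random.Random(seed)
    rng.shuffle(proteins)

    n_test = max(1, int(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])

    train, test = [], []
    for a, b in pairs:
        if a in test_proteins and b in test_proteins:
            test.append((a, b))
        elif a not in test_proteins and b not in test_proteins:
            train.append((a, b))
        # Pairs mixing a train protein and a test protein are dropped,
        # since keeping them in either set would leak a protein across sets.
    return train, test
```

Dropping the mixed pairs shrinks the data set, which is the price of a strict split. A stricter variant would first cluster proteins by sequence similarity and assign whole clusters to one side.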
Affiliation(s)
- Judith Bernett
  - Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof Forum 3, 85354, Freising, Germany
- David B Blumenthal
  - Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Werner-von-Siemens-Str. 61, 91052, Erlangen, Germany
- Markus List
  - Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof Forum 3, 85354, Freising, Germany
27
Yang Z, Zhang Z, Li J, Chen W, Liu C. CRISPRlnc: a machine learning method for lncRNA-specific single-guide RNA design of CRISPR/Cas9 system. Brief Bioinform 2024; 25:bbae066. [PMID: 38426328 PMCID: PMC10905519 DOI: 10.1093/bib/bbae066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 01/22/2024] [Accepted: 02/03/2024] [Indexed: 03/02/2024]
Abstract
CRISPR/Cas9 is a promising RNA-guided genome editing technology, which consists of a Cas9 nuclease and a single-guide RNA (sgRNA). So far, a number of sgRNA prediction tools have been developed. However, they were usually designed for protein-coding genes without considering that long non-coding RNA (lncRNA) genes may have different characteristics. In this study, we first evaluated the performance of a series of known sgRNA-designing tools on both coding and non-coding datasets. We also analyzed the factors underlying their varied performance on sgRNA specificity for lncRNA, including nucleic acid sequence, genome location and editing mechanism preference. Furthermore, we introduce a support vector machine-based machine learning algorithm named CRISPRlnc, which aims to model both CRISPR knock-out (CRISPRko) and CRISPR inhibition (CRISPRi) mechanisms to predict the on-target activity of targets. CRISPRlnc combines paired-sgRNA design and off-target analysis to achieve one-stop design of CRISPR/Cas9 sgRNAs for non-coding genes. Performance comparison on multiple datasets showed that CRISPRlnc was far superior to existing methods for both CRISPRko and CRISPRi mechanisms in lncRNA-specific sgRNA design. To maximize the availability of CRISPRlnc, we developed a web server (http://predict.crisprlnc.cc) and made it available for download on GitHub.
Affiliation(s)
- Zitian Yang
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Zexin Zhang
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
- Jing Li
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
- Wen Chen
  - Hunan Provincial Key Laboratory of Vascular Biology and Translational Medicine, School of Medicine, Hunan University of Chinese Medicine, Changsha 410208, China
- Changning Liu
  - CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Yunnan Key Laboratory of Crop Wild Relatives Omics, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Kunming 650223, China
28
Wu J, Liu B, Zhang J, Wang Z, Li J. DL-PPI: a method on prediction of sequenced protein-protein interaction based on deep learning. BMC Bioinformatics 2023; 24:473. [PMID: 38097937 PMCID: PMC10722729 DOI: 10.1186/s12859-023-05594-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/08/2023] [Accepted: 12/01/2023] [Indexed: 12/17/2023]
Abstract
PURPOSE Sequenced Protein-Protein Interaction (PPI) prediction represents a pivotal area of study in biology, playing a crucial role in elucidating the mechanistic underpinnings of diseases and facilitating the design of novel therapeutic interventions. Conventional methods for extracting features through experimental processes have proven to be both costly and exceedingly complex. In light of these challenges, the scientific community has turned to computational approaches, particularly those grounded in deep learning methodologies. Despite the progress achieved by current deep learning technologies, their effectiveness diminishes when applied to larger, unfamiliar datasets. RESULTS In this study, we introduce a novel deep learning framework, termed DL-PPI, for predicting PPIs based on sequence data. The proposed framework comprises two key components aimed at improving the accuracy of feature extraction from individual protein sequences and capturing relationships between proteins in unfamiliar datasets. 1. Protein Node Feature Extraction Module: To enhance the accuracy of feature extraction from individual protein sequences and facilitate the understanding of relationships between proteins in unknown datasets, we devised a novel protein node feature extraction module utilizing the Inception method. This module efficiently captures relevant patterns and representations within protein sequences, enabling more informative feature extraction. 2. Feature-Relational Reasoning Network (FRN): In the Global Feature Extraction module of our model, we developed a novel FRN that leverages Graph Neural Networks to determine interactions between pairs of input proteins. The FRN effectively captures the underlying relational information between proteins, contributing to improved PPI predictions. The DL-PPI framework demonstrates state-of-the-art performance in the realm of sequence-based PPI prediction.
Affiliation(s)
- Jiahui Wu
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Bo Liu
  - School of Mathematical and Computational Sciences, Massey University, Auckland, 0745, New Zealand
- Jidong Zhang
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Zhihan Wang
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
- Jianqiang Li
  - Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
29
Zandi F, Mansouri P, Goodarzi M. Global protein-protein interaction networks in yeast saccharomyces cerevisiae and helicobacter pylori. Talanta 2023; 265:124836. [PMID: 37393709 DOI: 10.1016/j.talanta.2023.124836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 06/04/2023] [Accepted: 06/17/2023] [Indexed: 07/04/2023]
Abstract
Understanding many biological processes relies heavily on accurately predicting protein-protein interactions (PPIs). In this study, we propose a novel method for predicting PPIs that is based on LogitBoost with a binary bat feature selection algorithm. Our approach involves the extraction of an initial feature vector by combining pseudo amino acid composition (PseAAC), pseudo-position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Subsequently, a binary bat algorithm is applied to eliminate redundant features, and the resulting optimal features are fed into the LogitBoost classifier for the identification of PPIs. To evaluate the proposed method, we test it on two databases, Saccharomyces cerevisiae and Helicobacter pylori, using 10-fold cross-validation, and achieve accuracies of 94.39% and 97.89%, respectively. Our results showcase the significant potential of our pipeline in accurately predicting protein-protein interactions (PPIs), thereby offering a valuable resource to the scientific research community.
Affiliation(s)
- Farzad Zandi
  - Faculty of Sciences, Islamic Azad University, Arak Branch, Arak, Markazi, Iran
- Mohammad Goodarzi
  - Department of Immunology, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
30
Beltrán JF, Belén LH, Farias JG, Zamorano M, Lefin N, Miranda J, Parraguez-Contreras F. VirusHound-I: prediction of viral proteins involved in the evasion of host adaptive immune response using the random forest algorithm and generative adversarial network for data augmentation. Brief Bioinform 2023; 25:bbad434. [PMID: 38033292 PMCID: PMC10753651 DOI: 10.1093/bib/bbad434] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2023] [Revised: 10/18/2023] [Accepted: 11/05/2023] [Indexed: 12/02/2023]
Abstract
Throughout evolution, pathogenic viruses have developed different strategies to evade the response of the adaptive immune system. To carry out successful replication, some pathogenic viruses encode different proteins that manipulate the molecular mechanisms of host cells. Currently, there are different bioinformatics tools for virus research; however, none of them focus on predicting viral proteins that evade the adaptive immune system. In this work, we have developed a novel tool based on machine and deep learning for predicting this type of viral protein named VirusHound-I. This tool is based on a model developed with the multilayer perceptron algorithm using the dipeptide composition molecular descriptor. In this study, we have also demonstrated the robustness of our strategy for data augmentation of the positive dataset based on generative adversarial networks. During the 10-fold cross-validation step in the training dataset, the predictive model showed 0.947 accuracy, 0.994 precision, 0.943 F1 score, 0.995 specificity, 0.896 sensitivity, 0.894 kappa, 0.898 Matthews correlation coefficient and 0.989 AUC. On the other hand, during the testing step, the model showed 0.964 accuracy, 1.0 precision, 0.967 F1 score, 1.0 specificity, 0.936 sensitivity, 0.929 kappa, 0.931 Matthews correlation coefficient and 1.0 AUC. Taking this model into account, we have developed a tool called VirusHound-I that makes it possible to predict viral proteins that evade the host's adaptive immune system. We believe that VirusHound-I can be very useful in accelerating studies on the molecular mechanisms of evasion of pathogenic viruses, as well as in the discovery of therapeutic targets.
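The dipeptide composition descriptor named in this abstract is commonly defined as the relative frequency of each of the 400 ordered amino-acid pairs in a sequence. The following is a minimal Python sketch under that standard definition. The function name `dipeptide_composition` is hypothetical, skipping pairs that contain non-standard residues is an assumption, and this is not the VirusHound-I implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so feature vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide relative frequencies."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 over the sequence.
    for i in range(len(seq) - 1):
        dipeptide = seq[i:i + 2]
        if dipeptide in counts:  # assumption: ignore non-standard residues
            counts[dipeptide] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dipeptide] / total for dipeptide in DIPEPTIDES]


features = dipeptide_composition("ACAC")
```

The resulting fixed-length vector can be passed to any standard classifier, whatever the length of the input protein.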
Affiliation(s)
- Jorge F Beltrán
  - Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Jorge G Farias
  - Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Mauricio Zamorano
  - Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Nicolás Lefin
  - Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Javiera Miranda
  - Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Fernanda Parraguez-Contreras
  - Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
31
Kewalramani N, Emili A, Crovella M. State-of-the-art computational methods to predict protein-protein interactions with high accuracy and coverage. Proteomics 2023; 23:e2200292. [PMID: 37401192 DOI: 10.1002/pmic.202200292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 05/24/2023] [Accepted: 06/09/2023] [Indexed: 07/05/2023]
Abstract
Prediction of protein-protein interactions (PPIs) commonly involves a significant computational component. Rapid recent advances in the power of computational methods for protein interaction prediction motivate a review of the state-of-the-art. We review the major approaches, organized according to the primary source of data utilized: protein sequence, protein structure, and protein co-abundance. The advent of deep learning (DL) has brought with it significant advances in interaction prediction, and we show how DL is used for each source data type. We review the literature taxonomically, present example case studies in each category, and conclude with observations about the strengths and weaknesses of machine learning methods in the context of the principal sources of data for protein interaction prediction.
Affiliation(s)
- Neal Kewalramani
  - Program in Bioinformatics, Boston University, Boston, Massachusetts, USA
- Andrew Emili
  - OHSU Knight Cancer Institute, Portland, Oregon, USA
- Mark Crovella
  - Department of Computer Science and Program in Bioinformatics, Boston University, Boston, Massachusetts, USA
32
Halsana AA, Chakroborty T, Halder AK, Basu S. DensePPI: A Novel Image-Based Deep Learning Method for Prediction of Protein-Protein Interactions. IEEE Trans Nanobioscience 2023; 22:904-911. [PMID: 37028059 DOI: 10.1109/tnb.2023.3251192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2023]
Abstract
Protein-protein interactions (PPI) are crucial for understanding the behaviour of living organisms and identifying disease associations. This paper proposes DensePPI, a novel deep convolution strategy applied to the 2D image map generated from the interacting protein pairs for PPI prediction. A colour encoding scheme has been introduced to embed the bigram interaction possibilities of amino acids into RGB colour space to enhance the learning and prediction task. The DensePPI model is trained on 5.5 million sub-images of size 128×128 generated from nearly 36,000 interacting and 36,000 non-interacting benchmark protein pairs. The performance is evaluated on independent datasets from five different organisms: Caenorhabditis elegans, Escherichia coli, Helicobacter pylori, Homo sapiens and Mus musculus. The proposed model achieves an average prediction accuracy score of 99.95% on these datasets, considering inter-species and intra-species interactions. DensePPI is compared with state-of-the-art methods and outperforms them on different evaluation metrics. The improved performance of DensePPI indicates the efficiency of the image-based encoding strategy of sequence information with the deep learning architecture in PPI prediction. The enhanced performance on diverse test sets shows that DensePPI is significant for both intra-species and cross-species interaction prediction. The dataset, supplementary file, and the developed models are available at https://github.com/Aanzil/DensePPI for academic use only.
33
Jha K, Saha S, Karmakar S. Prediction of Protein-Protein Interactions Using Vision Transformer and Language Model. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3215-3225. [PMID: 37027644 DOI: 10.1109/tcbb.2023.3248797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
The knowledge of protein-protein interaction (PPI) helps us to understand proteins' functions, the causes and growth of several diseases, and can aid in designing new drugs. The majority of existing PPI research has relied mainly on sequence-based approaches. With the availability of multi-omics datasets (sequence, 3D structure) and advancements in deep learning techniques, it is feasible to develop a deep multi-modal framework that fuses the features learned from different sources of information to predict PPI. In this work, we propose a multi-modal approach utilizing protein sequence and 3D structure. To extract features from the 3D structure of proteins, we use a pre-trained vision transformer model that has been fine-tuned on the structural representation of proteins. The protein sequence is encoded into a feature vector using a pre-trained language model. The feature vectors extracted from the two modalities are fused and then fed to the neural network classifier to predict the protein interactions. To showcase the effectiveness of the proposed methodology, we conduct experiments on two popular PPI datasets, namely, the human dataset and the S. cerevisiae dataset. Our approach outperforms the existing methodologies to predict PPI, including multi-modal approaches. We also evaluate the contributions of each modality by designing uni-modal baselines. We perform experiments with three modalities as well, having gene ontology as the third modality.
34
Zhang T, Jia J, Chen C, Zhang Y, Yu B. BiGRUD-SA: Protein S-sulfenylation sites prediction based on BiGRU and self-attention. Comput Biol Med 2023; 163:107145. [PMID: 37336062 DOI: 10.1016/j.compbiomed.2023.107145] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/13/2023] [Revised: 05/18/2023] [Accepted: 06/06/2023] [Indexed: 06/21/2023]
Abstract
S-sulfenylation is a vital post-translational modification (PTM) of proteins, which is an intermediate in other redox reactions and has implications for signal transduction and protein function regulation. However, there are many restrictions on the experimental identification of S-sulfenylation sites. Therefore, predicting S-sulfenylation sites by computational methods is fundamental to studying protein function and related biological mechanisms. In this paper, we propose a method named BiGRUD-SA based on a bi-directional gated recurrent unit (BiGRU) and a self-attention mechanism to predict protein S-sulfenylation sites. We first use AAC, BLOSUM62, AAindex, EAAC and GAAC to extract features, and fuse them to obtain the original feature space. Next, we use the SMOTE-Tomek method to handle data imbalance. Then, we input the processed data to the BiGRU and use the self-attention mechanism for further feature extraction. Finally, we input the resulting data to a deep neural network (DNN) to identify S-sulfenylation sites. The accuracies on the training set and the independent test set are 96.66% and 95.91% respectively, which indicates that our method is conducive to identifying S-sulfenylation sites. Furthermore, we use a data set of S-sulfenylation sites in Arabidopsis thaliana to verify the generalization ability of the BiGRUD-SA method, and obtain better prediction results.
Affiliation(s)
- Tingting Zhang
  - College of Computer Science and Technology, Shandong University, Qingdao, 266237, China; College of Information Science and Technology, School of Data Science, Qingdao University of Science and Technology, Qingdao, 266061, China
- Jihua Jia
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China
- Cheng Chen
  - College of Computer Science and Technology, Shandong University, Qingdao, 266237, China
- Yaqun Zhang
  - College of Mathematics and Big Data, Dezhou University, Dezhou, 253023, China
- Bin Yu
  - College of Information Science and Technology, School of Data Science, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Data Science, University of Science and Technology of China, Hefei, 230027, China
35
Luo X, Wang L, Hu P, Hu L. Predicting Protein-Protein Interactions Using Sequence and Network Information via Variational Graph Autoencoder. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3182-3194. [PMID: 37155405 DOI: 10.1109/tcbb.2023.3273567] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/10/2023]
Abstract
Protein-protein interactions (PPIs) play a critical role in proteomics, and a variety of computational algorithms have been developed to predict PPIs. Though effective, their performance is constrained by the high false-positive and false-negative rates observed in PPI data. To overcome this problem, a novel PPI prediction algorithm, namely PASNVGA, is proposed in this work by combining the sequence and network information of proteins via a variational graph autoencoder. To do so, PASNVGA first applies different strategies to extract the features of proteins from their sequence and network information, and obtains a more compact form of these features using principal component analysis. In addition, PASNVGA designs a scoring function to measure the higher-order connectivity between proteins so as to obtain a higher-order adjacency matrix. With all these features and adjacency matrices, PASNVGA trains a variational graph autoencoder model to further learn the integrated embeddings of proteins. The prediction task is then completed by using a simple feedforward neural network. Extensive experiments have been conducted on five PPI datasets collected from different species. Compared with several state-of-the-art algorithms, PASNVGA has been demonstrated as a promising PPI prediction algorithm.
36
Zhang F, Zhang Y, Zhu X, Chen X, Lu F, Zhang X. DeepSG2PPI: A Protein-Protein Interaction Prediction Method Based on Deep Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2907-2919. [PMID: 37079417 DOI: 10.1109/tcbb.2023.3268661] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/03/2023]
Abstract
Protein-protein interaction (PPI) plays an important role in almost all life activities. Many protein interaction sites have been confirmed by biological experiments, but these PPI site identification methods are time-consuming and expensive. In this study, a deep learning-based PPI prediction method, named DeepSG2PPI, is developed. First, the protein sequence information is retrieved and the local context information of each amino acid residue is calculated. A two-dimensional convolutional neural network (2D-CNN) model is employed to extract features from a two-channel coding structure, in which an attention mechanism is embedded to assign higher weights to key features. Second, the global statistical information of each amino acid residue and the relationship graph between the protein and GO (Gene Ontology) function annotation are built, and the graph embedding vector is constructed to represent the biological features of the protein. Finally, a 2D-CNN model and two 1D-CNN models are combined for PPI prediction. The comparison analysis with existing algorithms shows that the DeepSG2PPI method has better performance. It provides more accurate and effective PPI site prediction, which will be helpful in reducing the cost and failure rate of biological experiments.
|
37
|
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 85] [Impact Index Per Article: 42.5] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, because big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
|
38
|
Hong X, Lv J, Li Z, Xiong Y, Zhang J, Chen HF. Sequence-based machine learning method for predicting the effects of phosphorylation on protein-protein interactions. Int J Biol Macromol 2023; 243:125233. [PMID: 37290543 DOI: 10.1016/j.ijbiomac.2023.125233] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/07/2023] [Revised: 06/02/2023] [Accepted: 06/03/2023] [Indexed: 06/10/2023]
Abstract
Protein phosphorylation, catalyzed by kinases, is an important biochemical process that plays an essential role in multiple cell signaling pathways. Meanwhile, protein-protein interactions (PPIs) constitute the signaling pathways. Abnormal phosphorylation status of a protein can regulate protein functions through PPIs and evoke severe diseases, such as cancer and Alzheimer's disease. Because experimental evidence is limited and experimentally identifying novel evidence of phosphorylation regulation of PPIs is costly, it is necessary to develop a high-accuracy and user-friendly artificial intelligence method to predict the effect of phosphorylation on PPIs. Here, we proposed a novel sequence-based machine learning method named PhosPPI, which achieved better identification performance (accuracy and AUC) than the competing predictive methods Betts, HawkDock and FoldX. PhosPPI is now freely available as a web server (https://phosppi.sjtu.edu.cn/). This tool can help users identify functional phosphorylation sites affecting PPIs and explore phosphorylation-associated disease mechanisms and drug development.
Affiliation(s)
- Xiaokun Hong
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Jiyang Lv
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhengxin Li
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Jian Zhang
  - Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao-Tong University School of Medicine (SJTU-SM), Shanghai 200025, China.
- Hai-Feng Chen
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China.
|
39
|
Zheng J, Yang X, Huang Y, Yang S, Wuchty S, Zhang Z. Deep learning-assisted prediction of protein-protein interactions in Arabidopsis thaliana. The Plant Journal: for Cell and Molecular Biology 2023; 114:984-994. [PMID: 36919205 DOI: 10.1111/tpj.16188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Revised: 02/20/2023] [Accepted: 03/09/2023] [Indexed: 05/27/2023]
Abstract
Currently, the experimentally identified interactome of Arabidopsis (Arabidopsis thaliana) is still far from complete, suggesting that computational prediction methods can complement experimental techniques. Motivated by the prosperity and success of deep learning algorithms and natural language processing techniques, we introduce an integrative deep learning framework, DeepAraPPI, allowing us to predict protein-protein interactions (PPIs) of Arabidopsis utilizing sequence, domain and Gene Ontology (GO) information. Our current DeepAraPPI comprises: (i) a word2vec encoding-based Siamese recurrent convolutional neural network (RCNN) model; (ii) a Domain2vec encoding-based multiple-layer perceptron (MLP) model; and (iii) a GO2vec encoding-based MLP model. Finally, DeepAraPPI combines the prediction results of the three individual predictors through a logistic regression model. Compiling high-quality positive and negative training and test samples by applying strict filtering strategies, DeepAraPPI shows superior performance compared with existing state-of-the-art Arabidopsis PPI prediction methods. DeepAraPPI also provides better cross-species predictive ability in rice (Oryza sativa) than traditional machine learning methods, although the overall performance in cross-species prediction remains to be improved. DeepAraPPI is freely accessible at http://zzdlab.com/deeparappi/. In the meantime, we have also made the source code and data sets of DeepAraPPI available at https://github.com/zjy1125/DeepAraPPI.
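The final combination step described in this abstract (a logistic regression over the outputs of three base predictors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the DeepAraPPI implementation: the function names `fit_stacker` and `predict_proba`, the toy scores, the learning rate and the epoch count are all hypothetical, and the base predictors are replaced by hand-written numbers.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stacker(scores, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on base-model scores with batch gradient descent.

    scores: list of rows, one score per base predictor (hypothetical inputs).
    labels: list of 0/1 interaction labels.
    """
    n_samples = len(scores)
    n_features = len(scores[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(scores, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log-loss with respect to the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict_proba(weights, bias, x):
    """Combined interaction probability for one row of base-model scores."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy data (assumed): columns stand in for sequence, domain and GO predictor scores.
train_scores = [
    [0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.7, 0.9, 0.8], [0.6, 0.7, 0.9],
    [0.2, 0.3, 0.1], [0.1, 0.4, 0.2], [0.3, 0.2, 0.3], [0.4, 0.1, 0.2],
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_stacker(train_scores, train_labels)
p_pos = predict_proba(weights, bias, [0.85, 0.75, 0.80])
p_neg = predict_proba(weights, bias, [0.15, 0.25, 0.20])
```

In practice the meta-classifier is usually fitted on held-out predictions of the base models, so that it does not simply learn to trust whichever base model overfits the training pairs most.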
Affiliation(s)
- Jingyan Zheng
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xiaodi Yang
  - Department of Hematology, Peking University First Hospital, Beijing, 100034, China
- Yan Huang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Shiping Yang
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Stefan Wuchty
  - Department of Computer Science, University of Miami, Miami, FL, 33146, USA
  - Department of Biology, University of Miami, Miami, FL, 33146, USA
  - Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, 33136, USA
  - Institute of Data Science and Computing, University of Miami, Miami, FL, 33146, USA
- Ziding Zhang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
|
40
|
Ziegler C, Martin J, Sinner C, Morcos F. Latent generative landscapes as maps of functional diversity in protein sequence space. Nat Commun 2023; 14:2222. [PMID: 37076519 PMCID: PMC10113739 DOI: 10.1038/s41467-023-37958-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/19/2022] [Accepted: 04/05/2023] [Indexed: 04/21/2023] Open
Abstract
Variational autoencoders are unsupervised learning models with generative capabilities. When applied to protein data, they classify sequences by phylogeny and generate de novo sequences that preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we utilize direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings and functional and fitness properties of several systems, including globins, β-lactamases, ion channels, and transcription factors. We provide support on how the landscape helps us understand the effects of sequence variability observed in experimental data and provides insights on directed and natural protein evolution. We propose that combining the generative properties and functional predictive power of variational autoencoders and coevolutionary analysis could be beneficial in applications for protein engineering and design.
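For orientation, the Potts Hamiltonian used in direct coupling analysis is commonly written in the form below. The notation is the conventional one and is an assumption here, since the abstract does not define symbols: s = (s_1, ..., s_L) is an aligned sequence of length L, h_i are site-specific fields, J_ij are pairwise couplings, and Z is the partition function. Lower H corresponds to higher model probability.

```latex
H(\mathbf{s}) = -\sum_{i=1}^{L} h_i(s_i) \;-\; \sum_{1 \le i < j \le L} J_{ij}(s_i, s_j),
\qquad
P(\mathbf{s}) = \frac{e^{-H(\mathbf{s})}}{Z}
```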
Affiliation(s)
- Cheyenne Ziegler
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Jonathan Martin
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Claude Sinner
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA.
  - Department of Bioengineering, University of Texas at Dallas, Richardson, TX, 75080, USA.
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA.
|
41
|
Huang Y, Wuchty S, Zhou Y, Zhang Z. SGPPI: structure-aware prediction of protein-protein interactions in rigorous conditions with graph convolutional network. Brief Bioinform 2023; 24:6995378. [PMID: 36682013 DOI: 10.1093/bib/bbad020] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/09/2022] [Revised: 11/17/2022] [Accepted: 01/05/2023] [Indexed: 01/23/2023] Open
Abstract
While deep learning (DL)-based models have emerged as powerful approaches to predict protein-protein interactions (PPIs), the reliance on explicit similarity measures (e.g. sequence similarity and network neighborhood) to known interacting proteins makes these methods ineffective in dealing with novel proteins. The advent of AlphaFold2 presents a significant opportunity and also a challenge to predict PPIs in a straightforward way based on monomer structures while controlling bias from protein sequences. In this work, we established Structure and Graph-based Predictions of Protein Interactions (SGPPI), a structure-based DL framework for predicting PPIs using a graph convolutional network. In particular, SGPPI focused on protein patches on the protein-protein binding interfaces and extracted structural, geometric and evolutionary features from the residue contact map to predict PPIs. We demonstrated that our model outperforms traditional machine learning methods and state-of-the-art DL-based methods on non-representation-bias benchmark datasets. Moreover, our model trained on a human dataset can be reliably transferred to predict yeast PPIs, indicating that SGPPI can capture converging structural features of protein interactions across various species. The implementation of SGPPI is available at https://github.com/emerson106/SGPPI.
Affiliation(s)
- Yan Huang
  - State Key Laboratory of Livestock and Poultry Biotechnology Breeding, College of Biological Sciences, China Agricultural University, Beijing 100193, China
  - Department of Biomedical Informatics, Ministry of Education Key Laboratory of Molecular Cardiovascular Sciences, Center for Non-Coding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China
- Stefan Wuchty
  - Department of Computer Science, University of Miami, Coral Gables, FL 33146, USA
  - Department of Biology, University of Miami, Coral Gables, FL 33146, USA
  - Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136, USA
  - Institute of Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Yuan Zhou
  - Department of Biomedical Informatics, Ministry of Education Key Laboratory of Molecular Cardiovascular Sciences, Center for Non-Coding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China
- Ziding Zhang
  - State Key Laboratory of Livestock and Poultry Biotechnology Breeding, College of Biological Sciences, China Agricultural University, Beijing 100193, China
|
42
|
Kibar G, Vingron M. Prediction of protein-protein interactions using sequences of intrinsically disordered regions. Proteins 2023. [PMID: 36908253 DOI: 10.1002/prot.26486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 02/15/2023] [Accepted: 03/07/2023] [Indexed: 03/14/2023]
Abstract
Protein-protein interactions (PPIs) play a crucial role in numerous molecular processes. Despite many efforts, mechanisms governing molecular recognition between interacting proteins remain poorly understood and it is particularly challenging to predict from sequence whether two proteins can interact. Here we present a new method to tackle this challenge using intrinsically disordered regions (IDRs). IDRs are protein segments that are functional despite lacking a single invariant three-dimensional structure. The prevalence of IDRs in eukaryotic proteins suggests that IDRs are critical for interactions. To test this hypothesis, we predicted PPIs using IDR sequences in candidate proteins in humans. Moreover, we divide the PPI prediction problem into two specific subproblems and adapt appropriate training and test strategies based on problem type. Our findings underline the importance of defining clearly the problem type and show that sequences encoding IDRs can aid in predicting specific features of the protein interaction network of intrinsically disordered proteins. Our findings further suggest that accounting for IDRs in future analyses should accelerate efforts to elucidate the eukaryotic PPI network.
Affiliation(s)
- Gözde Kibar
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195, Berlin, Germany
- Martin Vingron
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195, Berlin, Germany
|
43
|
Rehman AU, Khurshid B, Ali Y, Rasheed S, Wadood A, Ng HL, Chen HF, Wei Z, Luo R, Zhang J. Computational approaches for the design of modulators targeting protein-protein interactions. Expert Opin Drug Discov 2023; 18:315-333. [PMID: 36715303 PMCID: PMC10149343 DOI: 10.1080/17460441.2023.2171396] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 06/06/2022] [Accepted: 01/18/2023] [Indexed: 01/31/2023]
Abstract
BACKGROUND Protein-protein interactions (PPIs) are intriguing targets for designing novel small-molecule inhibitors. The role of PPIs in various infectious and neurodegenerative disorders makes them potential therapeutic targets. Although PPIs have been portrayed as undruggable targets because of their flat surfaces, disorder, and lack of grooves, recent progress in computational biology has led researchers to reconsider PPIs in drug discovery. AREAS COVERED In this review, we introduce in-silico methods used to identify PPI interfaces and present an in-depth overview of various computational methodologies that are successfully applied to annotate PPIs. We also discuss several successful case studies that use computational tools to understand the modulation of PPIs and their key roles in various physiological processes. EXPERT OPINION Computational methods face challenges due to the inherent flexibility of proteins, which makes them expensive and results in the use of rigid models. This problem becomes more significant in PPIs due to their flexible and flat interfaces. Computational methods like molecular dynamics (MD) simulation and machine learning can integrate the chemical structure data into biochemical and can be used for target identification and modulation. These computational methodologies have been crucial in understanding the structure of PPIs, designing PPI modulators, discovering new drug targets, and predicting treatment outcomes.
Affiliation(s)
- Ashfaq Ur Rehman
  - Departments of Molecular Biology and Biochemistry, Chemical and Biomolecular Engineering, Materials Science and Engineering, and Biomedical Engineering, Graduate Program in Chemical and Materials Physics, University of California Irvine, Irvine, California, USA
  - Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Medicinal Bioinformatics Center, Shanghai Jiao-Tong University School of Medicine, Shanghai, Zhejiang, China
- Beenish Khurshid
  - Department of Biochemistry, Abdul Wali Khan University Mardan, Pakistan
- Yasir Ali
  - National Center for Bioinformatics, Quaid-e-Azam University, Islamabad, Pakistan
- Salman Rasheed
  - National Center for Bioinformatics, Quaid-e-Azam University, Islamabad, Pakistan
- Abdul Wadood
  - Department of Biochemistry, Abdul Wali Khan University Mardan, Pakistan
- Ho-Leung Ng
  - Department of Biochemistry and Molecular Biophysics, Kansas State University, Manhattan, Kansas, USA
- Hai-Feng Chen
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, Zhejiang, China
- Zhiqiang Wei
  - Medicinal Chemistry and Bioinformatics Center, Ocean University of China, Qingdao, Shandong, China
- Ray Luo
  - Departments of Molecular Biology and Biochemistry, Chemical and Biomolecular Engineering, Materials Science and Engineering, and Biomedical Engineering, Graduate Program in Chemical and Materials Physics, University of California Irvine, Irvine, California, USA
- Jian Zhang
  - Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Medicinal Bioinformatics Center, Shanghai Jiao-Tong University School of Medicine, Shanghai, Zhejiang, China
  - School of Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, Henan, China
|
44
|
Kang Y, Elofsson A, Jiang Y, Huang W, Yu M, Li Z. AFTGAN: prediction of multi-type PPI based on attention free transformer and graph attention network. Bioinformatics 2023; 39:7000335. [PMID: 36692145 PMCID: PMC9897180 DOI: 10.1093/bioinformatics/btad052] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/12/2022] [Revised: 01/01/2023] [Accepted: 01/24/2023] [Indexed: 01/25/2023] Open
Abstract
MOTIVATION Protein-protein interaction (PPI) networks and transcriptional regulatory networks are critical in regulating cells and their signaling. A thorough understanding of PPIs can provide more insights into cellular physiology in normal and disease states. Although numerous methods have been proposed to predict PPIs, predicting interactions between unknown proteins remains challenging. In this study, a novel neural network named AFTGAN was constructed to predict multi-type PPIs. For the input features, ESM-1b embeddings, which contain rich biological information about proteins, were added as protein sequence features alongside amino acid co-occurrence similarity and one-hot encoding. For the network framework, an ensemble network was constructed from a transformer encoder containing an AFT module (which weights vital protein sequence feature information) and a graph attention network (which extracts the relational features of protein pairs). RESULTS The experimental results showed that the Micro-F1 of AFTGAN under three partitioning schemes (BFS, DFS and the random mode) was 0.685, 0.711 and 0.867 on the SHS27K dataset and 0.745, 0.819 and 0.920 on the SHS148K dataset, respectively, all higher than that of other popular methods. In addition, the experimental comparisons confirmed the performance superiority of the proposed model for predicting PPIs of unknown proteins on the STRING dataset. AVAILABILITY AND IMPLEMENTATION The source code is publicly available at https://github.com/1075793472/AFTGAN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
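The Micro-F1 metric reported in this abstract pools true positives, false positives and false negatives over all interaction types before computing F1. The sketch below shows that calculation for multi-label predictions. It is an illustration only: the function name `micro_f1` and the toy label matrices are hypothetical and are not taken from the AFTGAN code base.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 for multi-label 0/1 indicator matrices.

    Rows are protein pairs, columns are interaction types. Counts are pooled
    over every (pair, type) cell before precision and recall are combined.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score
    return 2 * tp / denom if denom else 0.0


# Toy example (assumed): 3 protein pairs, 3 interaction types.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
score = micro_f1(y_true, y_pred)  # tp=4, fp=1, fn=1, so 8 / 10 = 0.8
```

Because counts are pooled, frequent interaction types dominate Micro-F1; a macro average would instead weight each type equally.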
Affiliation(s)
- Yanlei Kang
  - Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, Zhejiang 313000, China
- Arne Elofsson
  - Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Solna 17121, Sweden
- Yunliang Jiang
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua, Zhejiang 321004, China
- Weihong Huang
  - College of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
- Minzhe Yu
  - College of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
- Zhong Li
  - Zhejiang Province Key Laboratory of Smart Management & Application of Modern Agricultural Resources, School of Information Engineering, Huzhou University, Huzhou, Zhejiang 313000, China
  - Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Solna 17121, Sweden
  - College of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
|
45
|
Albu AI, Bocicor MI, Czibula G. MM-StackEns: A new deep multimodal stacked generalization approach for protein-protein interaction prediction. Comput Biol Med 2023; 153:106526. [PMID: 36623437 DOI: 10.1016/j.compbiomed.2022.106526] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/27/2022] [Revised: 12/13/2022] [Accepted: 12/31/2022] [Indexed: 01/05/2023]
Abstract
Accurate in-silico identification of protein-protein interactions (PPIs) is a long-standing problem in biology, with important implications in protein function prediction and drug design. Current computational approaches predominantly use a single data modality for describing protein pairs, which may not fully capture the characteristics relevant for identifying PPIs. Another limitation of existing methods is their poor generalization to proteins outside the training graph. In this paper, we aim to address these shortcomings by proposing a new ensemble approach for PPI prediction, which learns information from two modalities, corresponding to pairs of sequences and to the graph formed by the training proteins and their interactions. Our approach uses a siamese neural network to process sequence information, while graph attention networks are employed for the network view. For capturing the relationships between the proteins in a pair, we design a new feature fusion module, based on computing the distance between the distributions corresponding to the two proteins. The prediction is made using a stacked generalization procedure, in which the final classifier is represented by a Logistic Regression model trained on the scores predicted by the sequence and graph models. Additionally, we show that protein sequence embeddings obtained using pretrained language models can significantly improve the generalization of PPI methods. The experimental results demonstrate the good performance of our approach, which surpasses all the related work on two Yeast data sets, while outperforming the majority of literature approaches on two Human data sets and on independent multi-species data sets.
Affiliation(s)
- Alexandra-Ioana Albu
  - Department of Computer Science, Babeş-Bolyai University, 1 Mihail Kogalniceanu Street, Cluj-Napoca, 400084, Romania.
- Maria-Iuliana Bocicor
  - Department of Computer Science, Babeş-Bolyai University, 1 Mihail Kogalniceanu Street, Cluj-Napoca, 400084, Romania.
- Gabriela Czibula
  - Department of Computer Science, Babeş-Bolyai University, 1 Mihail Kogalniceanu Street, Cluj-Napoca, 400084, Romania.
|
46
|
Soleymani F, Paquet E, Viktor HL, Michalowski W, Spinello D. ProtInteract: A deep learning framework for predicting protein-protein interactions. Comput Struct Biotechnol J 2023; 21:1324-1348. [PMID: 36817951 PMCID: PMC9929211 DOI: 10.1016/j.csbj.2023.01.028] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/05/2022] [Revised: 01/20/2023] [Accepted: 01/20/2023] [Indexed: 01/26/2023] Open
Abstract
Proteins mainly perform their functions by interacting with other proteins. Protein-protein interactions underpin various biological activities such as metabolic cycles, signal transduction, and immune response. However, due to the sheer number of proteins, experimental methods for finding interacting and non-interacting protein pairs are time-consuming and costly. We therefore developed the ProtInteract framework to predict protein-protein interaction. ProtInteract comprises two components: first, a novel autoencoder architecture that encodes each protein's primary structure to a lower-dimensional vector while preserving its underlying sequence attributes. This leads to faster training of the second network, a deep convolutional neural network (CNN) that receives encoded proteins and predicts their interaction under three different scenarios. In each scenario, the deep CNN predicts the class of a given encoded protein pair. Each class indicates different ranges of confidence scores corresponding to the probability of whether a predicted interaction occurs or not. The proposed framework features significantly low computational complexity and relatively fast response. The contributions of this work are twofold. First, ProtInteract assimilates the protein's primary structure into a pseudo-time series. Therefore, we leverage the nature of the time series of proteins and their physicochemical properties to encode a protein's amino acid sequence into a lower-dimensional vector space. This approach enables extracting highly informative sequence attributes while reducing computational complexity. Second, the ProtInteract framework utilises this information to identify protein interactions with other proteins based on its amino acid configuration. Our results suggest that the proposed framework performs with high accuracy and efficiency in predicting protein-protein interactions.
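The idea of treating a protein's primary structure as a pseudo-time series can be illustrated by mapping each residue to a physicochemical value and reading the result in sequence order. The sketch below uses the Kyte-Doolittle hydropathy scale purely as an example property. That choice, the function name `to_series` and the example peptide are assumptions for illustration; the abstract does not say which properties ProtInteract uses, and the autoencoder stage is not shown.

```python
# Kyte-Doolittle hydropathy index per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_series(sequence, scale=KYTE_DOOLITTLE, unknown=0.0):
    """Map a protein sequence to a list of property values, one per residue.

    Residues missing from the scale (e.g. 'X') are given the `unknown` value.
    """
    return [scale.get(residue, unknown) for residue in sequence.upper()]


# Hypothetical 8-residue peptide; the position index plays the role of time.
signal = to_series("MKTAYIAK")
```

A series like this could then be fed to any sequence encoder; several property scales can be stacked to obtain a multichannel signal.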
Affiliation(s)
- Farzan Soleymani
  - Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Eric Paquet
  - National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada (corresponding author)
- Herna Lydia Viktor
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON K1N 6N5, Canada
- Davide Spinello
  - Department of Mechanical Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada
|
47
|
Rogers JR, Nikolényi G, AlQuraishi M. Growing ecosystem of deep learning methods for modeling protein-protein interactions. Protein Eng Des Sel 2023; 36:gzad023. [PMID: 38102755 DOI: 10.1093/protein/gzad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 12/17/2023]
Abstract
Numerous cellular functions rely on protein-protein interactions. Efforts to comprehensively characterize them remain challenged however by the diversity of molecular recognition mechanisms employed within the proteome. Deep learning has emerged as a promising approach for tackling this problem by exploiting both experimental data and basic biophysical knowledge about protein interactions. Here, we review the growing ecosystem of deep learning methods for modeling protein interactions, highlighting the diversity of these biophysically informed models and their respective trade-offs. We discuss recent successes in using representation learning to capture complex features pertinent to predicting protein interactions and interaction sites, geometric deep learning to reason over protein structures and predict complex structures, and generative modeling to design de novo protein assemblies. We also outline some of the outstanding challenges and promising new directions. Opportunities abound to discover novel interactions, elucidate their physical mechanisms, and engineer binders to modulate their functions using deep learning and, ultimately, unravel how protein interactions orchestrate complex cellular behaviors.
Affiliation(s)
- Julia R Rogers
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Gergő Nikolényi
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
|
48
|
Li X, Han P, Chen W, Gao C, Wang S, Song T, Niu M, Rodriguez-Patón A. MARPPI: boosting prediction of protein-protein interactions with multi-scale architecture residual network. Brief Bioinform 2023; 24:6887309. [PMID: 36502435 DOI: 10.1093/bib/bbac524] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 07/04/2022] [Revised: 09/29/2022] [Accepted: 11/04/2022] [Indexed: 12/14/2022]
Abstract
Protein-protein interactions (PPIs) are a major component of the cellular biochemical reaction network. Rich sequence information and machine learning techniques reduce the dependence of PPI exploration on wet experiments, which are costly and time-consuming. This paper proposes a PPI prediction model, multi-scale architecture residual network for PPIs (MARPPI), based on a dual-channel, multi-feature design. The multi-feature component leverages Res2vec to obtain the association information between residues, and uses pseudo amino acid composition, autocorrelation descriptors and multivariate mutual information to capture amino acid composition and order information, physicochemical properties and information entropy, respectively. The dual-channel component uses a ResNet improved with a multi-scale architecture to extract protein sequence features and reduce protein feature loss. Compared with other advanced methods, MARPPI achieves 96.03%, 99.01% and 91.80% accuracy on the intraspecific datasets of Saccharomyces cerevisiae, Human and Helicobacter pylori, respectively. The accuracy on the two interspecific datasets of Human-Bacillus anthracis and Human-Yersinia pestis is 97.29% and 95.30%, respectively. In addition, results on disease-specific datasets (neurodegenerative and metabolic disorders) demonstrate the ability to detect hidden interactions. Evaluations on independent datasets and PPI networks further suggest that MARPPI can be used to predict cross-species interactions. Together, these results show that MARPPI can be regarded as a concise, efficient and accurate tool for PPI datasets.
Affiliation(s)
- Xue Li
  - School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Peifu Han
  - School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Wenqi Chen
  - School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Changnan Gao
  - School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Shuang Wang
  - School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Tao Song
  - School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Muyuan Niu
  - School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Alfonso Rodriguez-Patón
  - School of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
|
49
|
DeepCF-PPI: improved prediction of protein-protein interactions by combining learned and handcrafted features based on attention mechanisms. APPL INTELL 2023. [DOI: 10.1007/s10489-022-04387-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
|
50
|
Yan Y, Huang T. The Interactome of Protein, DNA, and RNA. Methods Mol Biol 2023; 2695:89-110. [PMID: 37450113 DOI: 10.1007/978-1-0716-3346-5_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2023]
Abstract
Proteins participate in many processes of the organism and are very important for maintaining its health. However, proteins cannot function independently in the body. They must interact with proteins, DNA, RNA, and other substances to perform biological functions and maintain the body's health. At present, there are many experimental methods and software tools that can detect and predict the interactions between proteins and other substances. There are also many databases that record these interactions. This article describes protein-protein, protein-DNA, and protein-RNA interactions in detail by introducing some commonly used experimental methods, the software tools produced with the accumulation of experimental data and the rapid development of machine learning, and the related databases that record the relationships between proteins and other substances. Through this analysis and summary, we hope to make it more convenient for researchers to conduct further research on protein interactions.
Affiliation(s)
- Yuyao Yan
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Tao Huang
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
|