1
Christoffer C, Kagaya Y, Verburgt J, Terashi G, Shin WH, Jain A, Sarkar D, Aderinwale T, Maddhuri Venkata Subramaniya SR, Wang X, Zhang Z, Zhang Y, Kihara D. Integrative Protein Assembly With LZerD and Deep Learning in CAPRI 47-55. Proteins 2025. [PMID: 40095385 DOI: 10.1002/prot.26818] [Received: 08/26/2024] [Accepted: 02/18/2025] [Indexed: 03/19/2025]
Abstract
We report the performance of the protein complex prediction approaches of our group and their results in CAPRI Rounds 47-55, excluding the joint CASP Rounds 50 and 54, as well as the special COVID-19 Round 51. Our approaches integrated classical pipelines developed in our group as well as more recently developed deep learning pipelines. In the cases of human group prediction, we surveyed the literature to find information to integrate into the modeling, such as assayed interface residues. In addition to any literature information, generated complex models were selected by a rank aggregation of statistical scoring functions, by generative model confidence, or by expert inspection. In these CAPRI rounds, our human group successfully modeled eight interfaces and achieved the top quality level among the submissions for all of them, including two where no other group did. We note that components of our modeling pipelines have become increasingly unified within deep learning approaches. Finally, we discuss several case studies that illustrate successful and unsuccessful modeling using our approaches.
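The abstract mentions selecting models by a rank aggregation of scoring functions but does not say which aggregation rule was used. As a minimal sketch only, the following assumes a simple rank-sum rule in which lower scores are better; the scoring-function names (`score_a`, `score_b`, `score_c`) and model names are hypothetical placeholders, not the group's actual pipeline.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each model to its 1-based rank under one scoring function.

    Assumption: a lower score means a better model. Ties are broken by name.
    """
    ordered = sorted(scores, key=lambda m: (scores[m], m))
    return {model: i + 1 for i, model in enumerate(ordered)}


def rank_sum(score_tables: Dict[str, Dict[str, float]]) -> List[str]:
    """Order models by the sum of their ranks across all scoring functions."""
    totals: Dict[str, int] = {}
    for scores in score_tables.values():
        for model, rank in rank_positions(scores).items():
            totals[model] = totals.get(model, 0) + rank
    return sorted(totals, key=lambda m: (totals[m], m))


# Hypothetical scores for three candidate complex models.
score_tables = {
    "score_a": {"m1": -10.0, "m2": -12.5, "m3": -3.0},
    "score_b": {"m1": 0.40, "m2": 0.10, "m3": 0.90},
    "score_c": {"m1": -1.0, "m2": -0.5, "m3": -0.2},
}
ranking = rank_sum(score_tables)
print(ranking)
```

Working on ranks rather than raw values lets scoring functions with different units and scales contribute equally to the consensus.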
Affiliation(s)
- Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Rosen Center for Advanced Computing, Purdue University, West Lafayette, Indiana, USA
- Yuki Kagaya
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Woong-Hee Shin
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- College of Medicine, Korea University, Seoul, South Korea
- Anika Jain
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Daipayan Sarkar
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Zicong Zhang
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Yuanyuan Zhang
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Purdue University Institute for Cancer Research, Purdue University, West Lafayette, Indiana, USA
2
Grassmann G, Di Rienzo L, Ruocco G, Miotto M, Milanetti E. Compact Assessment of Molecular Surface Complementarities Enhances Neural Network-Aided Prediction of Key Binding Residues. J Chem Inf Model 2025; 65:2695-2709. [PMID: 39982412 PMCID: PMC11898074 DOI: 10.1021/acs.jcim.4c02286] [Received: 12/06/2024] [Revised: 02/09/2025] [Accepted: 02/13/2025] [Indexed: 02/22/2025]
Abstract
Predicting interactions between proteins is fundamental for understanding the mechanisms underlying cellular processes, since protein-protein complexes are crucial in physiological conditions but also in many diseases, for example by seeding aggregate formation. Despite the many advancements made so far, the performance of docking protocols depends heavily on their capability to identify binding regions. This makes the development of low-cost and computationally efficient methods in this field important. We present an integrated novel protocol mainly based on compact modeling of protein surface patches via sets of orthogonal polynomials to identify regions of high shape/electrostatic complementarity. By incorporating both hydrophilic and hydrophobic contributions, we define new binding matrices, which serve as effective inputs for training a neural network. In this work, we propose a new Neural Network (NN)-based architecture, Core Interacting Residues Network (CIRNet), which achieves a performance in terms of Area Under the Receiver Operating Characteristic Curve (ROC AUC) of approximately 0.87 in identifying pairs of core interacting residues on a balanced data set. In a blind search for core interacting residues, CIRNet distinguishes them from random decoys with an ROC AUC of 0.72. We test this protocol to enhance docking algorithms by filtering the proposed poses, addressing one of the still open problems in computational biology. Notably, when applied to the top ten models from three widely used docking servers, CIRNet improves docking outcomes, significantly reducing the average RMSD between the selected poses and the native state. Compared to another state-of-the-art tool for rescaling docking poses, CIRNet more efficiently identified the worst poses generated by the three docking servers under consideration and achieved superior rescaling performance in two cases.
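For readers unfamiliar with the ROC AUC figures quoted above, the metric can be read as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition; the labels and scores are made-up illustrative values, not data from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative balanced set: 3 interacting pairs (1) and 3 decoys (0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for the reported 0.87 and 0.72.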
Affiliation(s)
- Greta Grassmann
- Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, P.Le A. Moro 5, Rome 00185, Italy
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
- Lorenzo Di Rienzo
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
- Giancarlo Ruocco
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
- Department of Physics, Sapienza University, Piazzale Aldo Moro 5, Rome 00185, Italy
- Mattia Miotto
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
- Edoardo Milanetti
- Center for Life Nano & Neuro Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, Rome 00161, Italy
- Department of Physics, Sapienza University, Piazzale Aldo Moro 5, Rome 00185, Italy
3
Han B, Zhang Y, Li L, Gong X, Xia K. TopoQA: a topological deep learning-based approach for protein complex structure interface quality assessment. Brief Bioinform 2025; 26:bbaf083. [PMID: 40062613 PMCID: PMC11891663 DOI: 10.1093/bib/bbaf083] [Received: 11/19/2024] [Revised: 01/11/2025] [Accepted: 02/17/2025] [Indexed: 05/13/2025] Open
Abstract
Even with the significant advances of AlphaFold-Multimer (AF-Multimer) and AlphaFold3 (AF3) in protein complex structure prediction, their accuracy is still not comparable with that of monomer structure prediction. Efficient and effective models for quality assessment (QA), also called estimation of model accuracy, which can evaluate the quality of predicted protein complexes without knowing their native structures, are of key importance for protein structure generation and model selection. In this paper, we leverage persistent homology (PH) to capture the atomic-level topological information around residues and design a topological deep learning-based QA method, TopoQA, to assess the accuracy of protein complex interfaces. We integrate PH from topological data analysis into graph neural networks (GNNs) to characterize complex higher-order structures that GNNs might overlook, enhancing the learning of the relationship between the topological structure of complex interfaces and quality scores. Our TopoQA model is extensively validated on the two most widely used benchmark datasets, Docking Benchmark5.5 AF2 (DBM55-AF2) and Heterodimer-AF2 (HAF2), along with our newly constructed ABAG-AF3 dataset to facilitate comparisons with AF3. For all three datasets, TopoQA outperforms AF-Multimer-based AF2Rank and shows an advantage over AF3 in nearly half of the targets. In particular, on the DBM55-AF2 dataset, TopoQA obtains a ranking loss 73.6% lower than that of AF-Multimer-based AF2Rank. Further, beyond AF-Multimer and AF3, we have also extensively compared TopoQA with nearly all state-of-the-art models known to us, and found that it achieves the highest Top 10 Hit-rate on the DBM55-AF2 dataset and the lowest ranking loss on the HAF2 dataset. Ablation experiments show that our topological features significantly improve the model's performance. At the same time, our method also provides a new paradigm for protein structure representation learning.
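The abstract reports ranking loss and Top 10 Hit-rate without defining them. The sketch below uses definitions that are common in complex quality assessment and are assumed here, not taken from the paper: ranking loss is the true quality of the best available model minus the true quality of the model ranked first by the predictor, and a target counts as a top-N hit if any of its N highest-ranked models reaches an acceptable true quality (the 0.23 threshold is the usual DockQ "acceptable" cutoff, also an assumption). All numbers are illustrative.

```python
from typing import Sequence


def ranking_loss(predicted: Sequence[float], true_quality: Sequence[float]) -> float:
    """Best true quality in the pool minus the true quality of the top-ranked model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true_quality) - true_quality[top]


def top_n_hit(predicted: Sequence[float], true_quality: Sequence[float],
              n: int = 10, threshold: float = 0.23) -> bool:
    """True if any of the n highest-scored models has true quality >= threshold."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return any(true_quality[i] >= threshold for i in order[:n])


# Illustrative single target with three decoys.
predicted = [0.2, 0.9, 0.5]      # QA method's predicted scores
true_quality = [0.80, 0.60, 0.10]  # e.g. DockQ against the native structure

loss = ranking_loss(predicted, true_quality)
hit = top_n_hit(predicted, true_quality, n=1)
print(round(loss, 2), hit)
```

Averaging `ranking_loss` over targets and counting `top_n_hit` across targets gives dataset-level figures of the kind quoted in the abstract.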
Affiliation(s)
- Bingqing Han
- Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Yipeng Zhang
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Longlong Li
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- School of Mathematics, Shandong University, Jinan 250100, China
- Data Science Institute, Shandong University, Jinan 250100, China
- Xinqi Gong
- Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
4
Hoegen Dijkhof LR, Rönkkö TKE, von Vegesack HC, Lenzing J, Hauser AS. Deep learning in GPCR drug discovery: benchmarking the path to accurate peptide binding. Brief Bioinform 2025; 26:bbaf186. [PMID: 40285358 PMCID: PMC12031724 DOI: 10.1093/bib/bbaf186] [Received: 01/17/2025] [Revised: 03/12/2025] [Accepted: 04/01/2025] [Indexed: 04/29/2025] Open
Abstract
Deep learning (DL) methods have drastically advanced structure-based drug discovery by directly predicting protein structures from sequences. Recently, these methods have become increasingly accurate in predicting complexes formed by multiple protein chains. We evaluated these advancements to predict and accurately model the largest receptor family and its cognate peptide hormones. We benchmarked DL tools, including AlphaFold 2.3 (AF2), AlphaFold 3 (AF3), Chai-1, NeuralPLexer, RoseTTAFold-AllAtom, Peptriever, ESMFold, and D-SCRIPT, to predict interactions between G protein-coupled receptors (GPCRs) and their endogenous peptide ligands. Our results showed that structure-aware models outperformed language models in peptide binding classification, with the top-performing model achieving an area under the curve of 0.86 on a benchmark set of 124 ligands and 1240 decoys. Rescoring predicted structures on local interactions further improved the principal ligand discovery among decoy peptides, whereas DL-based approaches did not. We explored a competitive tournament approach for modeling multiple peptides simultaneously on a single GPCR, which accelerates the performance but reduces true-positive recovery. When evaluating the binding poses of 67 recent complexes, AF2 reproduced the correct binding modes in nearly all cases (94%), surpassing those of both AF3 and Chai-1. Confidence scores correlate with structural binding mode accuracy, which provides a guide for interpreting interface predictions. These results demonstrated that DL models can reliably rediscover peptide binders, aid peptide drug discovery, and guide the selection of optimal tools for GPCR-targeted therapies. To this end, we provided a practical guide for selecting the best models for specific applications and an independent benchmarking set for future model evaluation.
Affiliation(s)
- Luuk R Hoegen Dijkhof
- Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100 Ø, Copenhagen, Denmark
- Center for Pharmaceutical Data Science, University of Copenhagen, Denmark
- Teemu K E Rönkkö
- Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100 Ø, Copenhagen, Denmark
- Center for Pharmaceutical Data Science, University of Copenhagen, Denmark
- Hans C von Vegesack
- Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100 Ø, Copenhagen, Denmark
- Center for Pharmaceutical Data Science, University of Copenhagen, Denmark
- Jacob Lenzing
- Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100 Ø, Copenhagen, Denmark
- Center for Pharmaceutical Data Science, University of Copenhagen, Denmark
- Alexander S Hauser
- Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100 Ø, Copenhagen, Denmark
- Center for Pharmaceutical Data Science, University of Copenhagen, Denmark
5
Shirali A, Stebliankin V, Karki U, Shi J, Chapagain P, Narasimhan G. A comprehensive survey of scoring functions for protein docking models. BMC Bioinformatics 2025; 26:25. [PMID: 39844036 PMCID: PMC11755896 DOI: 10.1186/s12859-024-05991-4] [Received: 07/04/2024] [Accepted: 11/18/2024] [Indexed: 01/24/2025] Open
Abstract
BACKGROUND: While protein-protein docking is fundamental to our understanding of how proteins interact, scoring protein-protein complex conformations is a critical component of successful docking programs. Without accurate and efficient scoring functions to differentiate between native and non-native binding complexes, the accuracy of current docking tools cannot be guaranteed. Although many innovative scoring functions have been proposed, a good scoring function for docking remains elusive. Deep learning models offer alternatives to using explicit empirical or mathematical functions for scoring protein-protein complexes. RESULTS: In this study, we perform a comprehensive survey of the state-of-the-art scoring functions by considering the most popular and highly performant approaches, both classical and deep learning-based, for scoring protein-protein complexes. The methods were also compared based on their runtime as it directly impacts their use in large-scale docking applications. CONCLUSIONS: We evaluate the strengths and weaknesses of classical and deep learning-based approaches across seven public and popular datasets to aid researchers in understanding the progress made in this field.
Affiliation(s)
- Azam Shirali
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Vitalii Stebliankin
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Ukesh Karki
- Department of Physics, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Jimeng Shi
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Prem Chapagain
- Department of Physics, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, 33199, USA
- Giri Narasimhan
- Bioinformatics Research Group (BioRG), Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th 10 St, Miami, 33199, USA
- Biomolecular Sciences Institute, Florida International University, 11200 SW 8th St, Miami, 33199, USA
6
Genc AG, McGuffin LJ. Beyond AlphaFold2: The Impact of AI for the Further Improvement of Protein Structure Prediction. Methods Mol Biol 2025; 2867:121-139. [PMID: 39576578 DOI: 10.1007/978-1-0716-4196-5_7] [Indexed: 11/24/2024]
Abstract
Protein structure prediction is fundamental to molecular biology and has numerous applications in areas such as drug discovery and protein engineering. Machine learning techniques have greatly advanced protein 3D modeling in recent years, particularly with the development of AlphaFold2 (AF2), which can analyze sequences of amino acids and predict 3D structures with near experimental accuracy. Since the release of AF2, numerous studies have been conducted, either using AF2 directly for large-scale modeling or building upon the software for other use cases. Many reviews have been published discussing the impact of AF2 in the field of protein bioinformatics, particularly in relation to neural networks, which have highlighted what AF2 can and cannot do. It is evident that AF2 and similar approaches are open to further development and several new approaches have emerged, in addition to older refinement approaches, for improving the quality of predictions. Here we provide a brief overview, aimed at the general biologist, of how machine learning techniques have been used for improvement of 3D models of proteins following AF2, and we highlight the impacts of these approaches. In the most recent experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP15), the most successful groups all developed their own tools for protein structure modeling that were based at least in some part on AF2. This improvement involved employing techniques such as generative modeling, changing parameters such as dropout to generate more AF2 structures, and data-driven approaches including using alternative templates and MSAs.
Affiliation(s)
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK
7
Wang J, Mao J, Li C, Xiang H, Wang X, Wang S, Wang Z, Chen Y, Li Y, No KT, Song T, Zeng X. Interface-aware molecular generative framework for protein-protein interaction modulators. J Cheminform 2024; 16:142. [PMID: 39707457 DOI: 10.1186/s13321-024-00930-0] [Received: 03/15/2024] [Accepted: 11/11/2024] [Indexed: 12/23/2024] Open
Abstract
Protein-protein interactions (PPIs) play a crucial role in numerous biochemical and biological processes. Although several structure-based molecular generative models have been developed, PPI interfaces and compounds targeting PPIs exhibit distinct physicochemical properties compared to traditional binding pockets and small-molecule drugs. As a result, generating compounds that effectively target PPIs, particularly by considering PPI complexes or interface hotspot residues, remains a significant challenge. In this work, we constructed a comprehensive dataset of PPI interfaces with active and inactive compound pairs. Based on this, we propose a novel molecular generative framework tailored to PPI interfaces, named GENiPPI. Our evaluation demonstrates that GENiPPI captures the implicit relationships between the PPI interfaces and the active molecules, and can generate novel compounds that target these interfaces. Moreover, GENiPPI can generate structurally diverse novel compounds with limited PPI interface modulators. To the best of our knowledge, this is the first exploration of a structure-based molecular generative model focused on PPI interfaces, which could facilitate the design of PPI modulators. The PPI interface-based molecular generative model enriches the existing landscape of structure-based (pocket/interface) molecular generative models. SCIENTIFIC CONTRIBUTION: This study introduces GENiPPI, a protein-protein interaction (PPI) interface-aware molecular generative framework. The framework first employs Graph Attention Networks to capture atomic-level interaction features at the protein complex interface. Subsequently, Convolutional Neural Networks extract compound representations in voxel and electron density spaces. These features are integrated into a Conditional Wasserstein Generative Adversarial Network, which trains the model to generate compound representations targeting PPI interfaces. GENiPPI effectively captures the relationship between PPI interfaces and active/inactive compounds. Furthermore, in few-shot molecular generation, GENiPPI successfully generates compounds comparable to known disruptors. GENiPPI provides an efficient tool for structure-based design of PPI modulators.
Affiliation(s)
- Jianmin Wang
- Department of Integrative Biotechnology, Yonsei University, Incheon, 21983, Republic of Korea
- Jiashun Mao
- Department of Integrative Biotechnology, Yonsei University, Incheon, 21983, Republic of Korea
- Chunyan Li
- School of Informatics, Yunnan Normal University, Kunming, China
- Hongxin Xiang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xun Wang
- School of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, Shandong, China
- High Performance Computer Research Center, University of Chinese Academy of Sciences, Beijing, 100190, China
- Shuang Wang
- School of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, Shandong, China
- Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Yangyang Chen
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Yuquan Li
- College of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, China
- Kyoung Tai No
- Department of Integrative Biotechnology, Yonsei University, Incheon, 21983, Republic of Korea
- Tao Song
- School of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, Shandong, China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
8
Zhou Z, Yin Y, Han H, Jia Y, Koh JH, Kong AWK, Mu Y. ProAffinity-GNN: A Novel Approach to Structure-Based Protein-Protein Binding Affinity Prediction via a Curated Data Set and Graph Neural Networks. J Chem Inf Model 2024; 64:8796-8808. [PMID: 39558674 DOI: 10.1021/acs.jcim.4c01850] [Indexed: 11/20/2024]
Abstract
Protein-protein interactions (PPIs) are crucial for understanding biological processes and disease mechanisms, contributing significantly to advances in protein engineering and drug discovery. The accurate determination of binding affinities, essential for decoding PPIs, faces challenges due to the substantial time and financial costs involved in experimental and theoretical methods. This situation underscores the urgent need for more effective and precise methodologies for predicting binding affinity. Despite the abundance of research on PPI modeling, the field of quantitative binding affinity prediction remains underexplored, mainly due to a lack of comprehensive data. This study seeks to address these needs by manually curating pairwise interaction labels on available 3D structures of protein complexes with experimentally determined binding affinities, creating the largest data set for structure-based pairwise protein interaction with binding affinity to date. Subsequently, we introduce ProAffinity-GNN, a novel deep learning framework using a protein language model and a graph neural network (GNN) to improve the accuracy of structure-based prediction of protein-protein binding affinities. The evaluation results across several benchmark test sets and an additional case study demonstrate that ProAffinity-GNN not only outperforms existing models in terms of accuracy but also shows strong generalization capabilities.
Affiliation(s)
- Zhiyuan Zhou
- School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Yueming Yin
- Institute for Digital Molecular Analytics and Science (IDMxS), Nanyang Technological University, 636921, Singapore
- Hao Han
- School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Yiping Jia
- School of Pharmacy, Shanghai Jiao Tong University, 200240, Shanghai, China
- Jun Hong Koh
- School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Adams Wai-Kin Kong
- College of Computing and Data Science, Nanyang Technological University, 639798, Singapore
- Yuguang Mu
- School of Biological Sciences, Nanyang Technological University, 637551, Singapore
9
Sharo C, Zhang J, Zhai T, Bao J, Garcia-Epelboim A, Mamourian E, Shen L, Huang Z. Repurposing FDA-Approved Drugs Against Potential Drug Targets Involved in Brain Inflammation Contributing to Alzheimer's Disease. TARGETS (BASEL) 2024; 2:446-469. [PMID: 39897171 PMCID: PMC11786951 DOI: 10.3390/targets2040025] [Indexed: 02/04/2025]
Abstract
Alzheimer's disease is a neurodegenerative disease that continues to have a rising number of cases. While extensive research has been conducted in the last few decades, only a few drugs have been approved by the FDA for treatment, and even fewer aim to be curative rather than manage symptoms. There remains an urgent need for understanding disease pathogenesis, as well as identifying new targets for further drug discovery. Alzheimer's disease (AD) is known to stem from a build-up of amyloid beta (Aβ) plaques as well as tangles of tau proteins. Furthermore, inflammation in the brain is known to arise from the degeneration of tissue and the build-up of insoluble material. Therefore, there is a potential link between the pathology of AD and inflammation in the brain, especially as the disease progresses to later stages where neuronal death and degeneration levels are higher. Proteins that are relevant to both brain inflammation and AD thus make ideal potential targets for therapeutics; however, the proteins need to be evaluated to determine which targets would be ideal for potential drug therapeutic treatments, or 'druggable'. Druggability analysis was conducted using two structure-based methods (i.e., Drug-Like Density analysis and SiteMap), as well as a sequence-based approach, SPIDER. The most druggable targets were then evaluated using single-nuclei sequencing data for their clinical relevance to inflammation in AD. For each of the top five targets, small molecule docking was used to evaluate which FDA-approved drugs were able to bind with the chosen proteins. The top targets included DRD2 (inhibits adenylyl cyclase activity), C9 (binds with C5B8 to form the membrane attack complex), C4b (binds with C2a to form C3 convertase), C5AR1 (GPCR that binds C5a), and GABA-A-R (GPCR involved in inhibiting neurotransmission). Each target had multiple potential inhibitors from the FDA-approved drug list with decent binding affinities. Among these inhibitors, two drugs were found as top inhibitors for more than one protein target. They are C15H14N2O2 and v316 (Paracetamol), used to treat pain/inflammation originally for cataracts and relieve headaches/fever, respectively. These results provide the groundwork for further experimental investigation or clinical trials.
Affiliation(s)
- Catherine Sharo
- Department of Chemical and Biological Engineering, Villanova University, Villanova, PA 19085, USA
- Jiayu Zhang
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Tianhua Zhai
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jingxuan Bao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Andrés Garcia-Epelboim
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Elizabeth Mamourian
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Zuyi Huang
- Department of Chemical and Biological Engineering, Villanova University, Villanova, PA 19085, USA
10
McFee M, Kim J, Kim PM. EuDockScore: Euclidean graph neural networks for scoring protein-protein interfaces. Bioinformatics 2024; 40:btae636. [PMID: 39441796 PMCID: PMC11543620 DOI: 10.1093/bioinformatics/btae636] [Received: 06/07/2024] [Revised: 10/16/2024] [Accepted: 10/21/2024] [Indexed: 10/25/2024] Open
Abstract
MOTIVATION: Protein-protein interactions are essential for a variety of biological phenomena including mediating biochemical reactions, cell signaling, and the immune response. Proteins seek to form interfaces which reduce overall system energy. Although determination of single polypeptide chain protein structures has been revolutionized by deep learning techniques, complex prediction has still not been perfected. Additionally, experimentally determining structures is incredibly expensive in both resources and time. An alternative is the technique of computational docking, which takes the solved individual structures of proteins to produce candidate interfaces (decoys). Decoys are then scored using mathematical functions that assess the quality of the system, known as scoring functions. Beyond docking, scoring functions are a critical component of assessing structures produced by many protein generative models. Scoring models are also used as a final filtering step in many generative deep learning models, including those that generate antibody binders and those which perform docking. RESULTS: In this work, we present improved scoring functions for protein-protein interactions which utilize cutting-edge Euclidean graph neural network architectures to assess protein-protein interfaces. These Euclidean docking score models are known as EuDockScore and EuDockScore-Ab, with the latter being specific to antibody-antigen docks. Finally, we provide EuDockScore-AFM, a model trained on antibody-antigen outputs from AlphaFold-Multimer (AFM), which proves useful in reranking large numbers of AFM outputs. AVAILABILITY AND IMPLEMENTATION: The code for these models is available at https://gitlab.com/mcfeemat/eudockscore.
Affiliation(s)
- Matthew McFee
- Department of Molecular Genetics, The University of Toronto, Toronto, ON M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, The University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Jisun Kim
- Donnelly Centre for Cellular and Biomolecular Research, The University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Philip M Kim
- Department of Molecular Genetics, The University of Toronto, Toronto, ON M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, The University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Computer Science, The University of Toronto, Toronto, ON M5S 2E4, Canada
| |
|
11
|
Saharkhiz S, Mostafavi M, Birashk A, Karimian S, Khalilollah S, Jaferian S, Yazdani Y, Alipourfard I, Huh YS, Farani MR, Akhavan-Sigari R. The State-of-the-Art Overview to Application of Deep Learning in Accurate Protein Design and Structure Prediction. Top Curr Chem (Cham) 2024; 382:23. [PMID: 38965117 PMCID: PMC11224075 DOI: 10.1007/s41061-024-00469-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 06/09/2024] [Indexed: 07/06/2024]
Abstract
In recent years, there has been a notable increase in the scientific community's interest in rational protein design. The prospect of designing an amino acid sequence that can reliably fold into a desired three-dimensional structure and exhibit the intended function is captivating. However, a major challenge in this endeavor lies in accurately predicting the resulting protein structure. The exponential growth of protein databases has fueled the advancement of the field, while newly developed algorithms have pushed the boundaries of what was previously achievable in structure prediction. In particular, using deep learning methods instead of brute force approaches has emerged as a faster and more accurate strategy. These deep-learning techniques leverage the vast amount of data available in protein databases to extract meaningful patterns and predict protein structures with improved precision. In this article, we explore the recent developments in the field of protein structure prediction. We delve into the newly developed methods that leverage deep learning approaches, highlighting their significance and potential for advancing our understanding of protein design.
Affiliation(s)
- Saber Saharkhiz
- Division of Neuroscience, Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
| | - Mehrnaz Mostafavi
- Faculty of Allied Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Amin Birashk
- Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA
| | - Shiva Karimian
- Electrical and Computer Research Center, Sanandaj Azad University, Sanandaj, Iran
| | - Shayan Khalilollah
- Department of Neurosurgery, Faculty of Medicine, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
| | - Sohrab Jaferian
- Goergen Institute for Data Science, University of Rochester, Rochester, NY, USA
| | - Yalda Yazdani
- Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran.
| | - Iraj Alipourfard
- Institute of Physical Chemistry, Polish Academy of Sciences, Marcina Kasprzaka 44/52, 01-224, Warsaw, Poland.
| | - Yun Suk Huh
- Department of Biological Engineering, Inha University, Incheon, Republic of Korea
| |
|
12
|
Zhao H, Petrey D, Murray D, Honig B. ZEPPI: Proteome-scale sequence-based evaluation of protein-protein interaction models. Proc Natl Acad Sci U S A 2024; 121:e2400260121. [PMID: 38743624 PMCID: PMC11127014 DOI: 10.1073/pnas.2400260121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024] Open
Abstract
We introduce ZEPPI (Z-score Evaluation of Protein-Protein Interfaces), a framework to evaluate structural models of a complex based on sequence coevolution and conservation involving residues in protein-protein interfaces. The ZEPPI score is calculated by comparing metrics for an interface to those obtained from randomly chosen residues. Since contacting residues are defined by the structural model, this obviates the need to account for indirect interactions. Further, although ZEPPI relies on species-paired multiple sequence alignments, its focus on interfacial residues allows it to leverage quite shallow alignments. ZEPPI can be implemented on a proteome-wide scale and is applied here to millions of structural models of dimeric complexes in the Escherichia coli and human interactomes found in the PrePPI database. PrePPI's scoring function is based primarily on the evaluation of protein-protein interfaces, and ZEPPI adds a new feature to this analysis through the incorporation of evolutionary information. ZEPPI performance is evaluated through applications to experimentally determined complexes and to decoys from the CASP-CAPRI experiment. As we discuss, the standard CAPRI scores used to evaluate docking models are based on model quality and not on the ability to give yes/no answers as to whether two proteins interact. ZEPPI is able to detect weak signals from PPI models that the CAPRI scores define as incorrect and, similarly, to identify potential PPIs defined as low confidence by the current PrePPI scoring function. A number of examples that illustrate how the combination of PrePPI and ZEPPI can yield functional hypotheses are provided.
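The comparison described in this abstract, an interface metric measured against the same metric on randomly chosen residues, can be illustrated with a minimal sketch. This is not the ZEPPI implementation: the function name `interface_z_score`, the use of a single per-residue value (for example a conservation score), and the sampling scheme are assumptions made purely for illustration.

```python
import random
import statistics


def interface_z_score(per_residue_metric, interface_idx, n_samples=1000, seed=0):
    """Z-score of the mean metric over interface residues against
    random residue sets of the same size (hypothetical helper)."""
    rng = random.Random(seed)
    k = len(interface_idx)
    observed = statistics.fmean(per_residue_metric[i] for i in interface_idx)

    # Null distribution: mean metric over k residues drawn without replacement.
    all_idx = range(len(per_residue_metric))
    null = [
        statistics.fmean(per_residue_metric[i] for i in rng.sample(all_idx, k))
        for _ in range(n_samples)
    ]
    mu = statistics.fmean(null)
    sigma = statistics.pstdev(null)
    # A flat metric gives no spread in the null; report 0 rather than divide by zero.
    return (observed - mu) / sigma if sigma > 0 else 0.0


# Toy input (made up): 50 weakly conserved residues, then 10 highly conserved
# residues that the structural model places in the interface.
conservation = [0.1] * 50 + [0.9] * 10
interface = list(range(50, 60))
z = interface_z_score(conservation, interface)
print(f"interface Z-score: {z:.2f}")
```

Because the interface residues are taken from the structural model, a high Z-score in this sketch only says that the modeled interface is enriched in the chosen metric relative to random residue sets; how that relates to the published score depends on details not given in the abstract.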
Affiliation(s)
- Haiqing Zhao
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY10032
| | - Donald Petrey
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY10032
| | - Diana Murray
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY10032
| | - Barry Honig
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY10032
- Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY10032
- Department of Medicine, Columbia University, New York, NY10032
- Zuckerman Institute, Columbia University, New York, NY10027
| |
|
13
|
Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| | - Nolan Park
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
| |
|
14
|
Farheen F, Broyles BK, Zhang Y, Ibtehaz N, Erkine AM, Kihara D. Predicting transcriptional activation domain function using Graph Neural Networks. bioRxiv 2024:2024.05.08.593266. [PMID: 38766093 PMCID: PMC11100744 DOI: 10.1101/2024.05.08.593266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Analysis of factors that lead to the functionality of transcriptional activation domains remains a crucial and yet challenging task owing to the significant diversity in their sequences and their intrinsically disordered nature. Almost all existing methods that have aimed to predict activation domains have involved traditional machine learning approaches, such as logistic regression, that are unable to capture complex patterns in data or plain convolutional neural networks and have been limited in exploration of structural features. However, there is a tremendous potential in the inspection of the structural properties of activation domains, and an opportunity to investigate complex relationships between features of residues in the sequence. To address these, we have utilized the power of graph neural networks which can represent structural data in the form of nodes and edges, allowing nodes to exchange information among themselves. We have experimented with two kinds of graph formulations, one involving residues as nodes and the other assigning atoms to be the nodes. A logistic regression model was also developed to analyze feature importance. For all the models, several feature combinations were experimented with. The residue-level GNN model with amino acid type, residue position, acidic/basic/aromatic property and secondary structure feature combination gave the best performing model with accuracy, F1 score and AUROC of 97.9%, 71% and 97.1% respectively which outperformed other existing methods in the literature when applied on the dataset we used. Among the other structure-based features that were analyzed, the amphipathic property of helices also proved to be an important feature for classification. Logistic regression results showed that the most dominant feature that makes a sequence functional is the frequency of different types of amino acids in the sequence. 
Our results have consistently shown that functional sequences have more acidic and aromatic residues, whereas basic residues are seen more in non-functional sequences.
Affiliation(s)
- Farhanaz Farheen
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Bradley K. Broyles
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| | - Yuanyuan Zhang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Nabil Ibtehaz
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Alexandre M. Erkine
- College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| |
|
15
|
Ovek D, Keskin O, Gursoy A. ProInterVal: Validation of Protein-Protein Interfaces through Learned Interface Representations. J Chem Inf Model 2024; 64:2979-2987. [PMID: 38526504 PMCID: PMC11040718 DOI: 10.1021/acs.jcim.3c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/26/2024]
Abstract
Proteins are vital components of the biological world and serve a multitude of functions. They interact with other molecules through their interfaces and participate in crucial cellular processes. Disruption of these interactions can have negative effects on organisms, highlighting the importance of studying protein-protein interfaces for developing targeted therapies for diseases. Therefore, the development of a reliable method for investigating protein-protein interactions is of paramount importance. In this work, we present an approach for validating protein-protein interfaces using learned interface representations. The approach involves using a graph-based contrastive autoencoder architecture and a transformer to learn representations of protein-protein interaction interfaces from unlabeled data and then validating them through learned representations with a graph neural network. Our method achieves an accuracy of 0.91 for the test set, outperforming existing GNN-based methods. We demonstrate the effectiveness of our approach on a benchmark data set and show that it provides a promising solution for validating protein-protein interfaces.
Affiliation(s)
- Damla Ovek
- KUIS AI Center, Koç University, Istanbul 34450, Turkey
- Computer Engineering, Koç University, Istanbul 34450, Turkey
| | - Ozlem Keskin
- Chemical and Biological Engineering, Koç University, Istanbul 34450, Turkey
| | - Attila Gursoy
- Computer Engineering, Koç University, Istanbul 34450, Turkey
| |
|
16
|
Mslati H, Gentile F, Pandey M, Ban F, Cherkasov A. PROTACable Is an Integrative Computational Pipeline of 3-D Modeling and Deep Learning To Automate the De Novo Design of PROTACs. J Chem Inf Model 2024; 64:3034-3046. [PMID: 38504115 DOI: 10.1021/acs.jcim.3c01878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
Proteolysis-targeting chimeras (PROTACs) that engage two biological targets at once are a promising technology in degrading clinically relevant protein targets. Since factors that influence the biological activities of PROTACs are more complex than those of a small molecule drug, we explored a combination of computational chemistry and deep learning strategies to forecast PROTAC activity and enable automated design. A new method named PROTACable was developed for the de novo design of PROTACs, which includes a robust 3-D modeling workflow to model PROTAC ternary complexes using a library of E3 ligase and linker and an SE(3)-equivariant graph transformer network to predict the activity of newly designed PROTACs. PROTACable is available at https://github.com/giaguaro/PROTACable/.
Affiliation(s)
- Hazem Mslati
- Vancouver Prostate Centre, The University of British Columbia, Vancouver, British Columbia V6H 3Z6, Canada
| | - Francesco Gentile
- Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada
- Ottawa Institute of Systems Biology, Ottawa, Ontario K1N 6N5, Canada
| | - Mohit Pandey
- Vancouver Prostate Centre, The University of British Columbia, Vancouver, British Columbia V6H 3Z6, Canada
| | - Fuqiang Ban
- Vancouver Prostate Centre, The University of British Columbia, Vancouver, British Columbia V6H 3Z6, Canada
| | - Artem Cherkasov
- Vancouver Prostate Centre, The University of British Columbia, Vancouver, British Columbia V6H 3Z6, Canada
| |
|
17
|
Xu X, Bonvin AMJJ. DeepRank-GNN-esm: a graph neural network for scoring protein-protein models using protein language model. Bioinformatics Advances 2024; 4:vbad191. [PMID: 38213822 PMCID: PMC10782804 DOI: 10.1093/bioadv/vbad191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 12/19/2023] [Indexed: 01/13/2024]
Abstract
Motivation: Protein-protein interactions (PPIs) play critical roles in numerous cellular processes. By modelling the 3D structures of the corresponding protein complexes, valuable insights can be obtained, providing, e.g., starting points for drug and protein design. One challenge in the modelling process, however, is the identification of near-native models from the large pool of generated models. To this end, we have previously developed DeepRank-GNN, a graph neural network that integrates structural and sequence information to enable effective pattern learning at PPI interfaces. Its main features are related to the Position Specific Scoring Matrices (PSSMs), which are computationally expensive to generate, significantly limiting the algorithm's usability. Results: We introduce here DeepRank-GNN-esm, which includes protein language model embeddings from the ESM-2 model as additional features. We show that the ESM-2 embeddings can replace the PSSM features with no loss of performance, or even with better performance, on two PPI-related tasks: scoring docking poses and detecting crystal artifacts. This new DeepRank version thus bypasses the need to generate PSSMs, greatly improving the usability of the software and opening new application opportunities for systems for which PSSM profiles cannot be obtained or are irrelevant (e.g. antibody-antigen complexes). Availability and implementation: DeepRank-GNN-esm is freely available from https://github.com/DeepRank/DeepRank-GNN-esm.
Affiliation(s)
- Xiaotong Xu
- Department of Chemistry, Faculty of Science, Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Utrecht 3584 CS, The Netherlands
| | - Alexandre M J J Bonvin
- Department of Chemistry, Faculty of Science, Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Utrecht 3584 CS, The Netherlands
| |
|
18
|
Zhang Y, Wang X, Zhang Z, Huang Y, Kihara D. Assessment of Protein-Protein Docking Models Using Deep Learning. Methods Mol Biol 2024; 2780:149-162. [PMID: 38987469 DOI: 10.1007/978-1-0716-3985-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Protein-protein interactions are involved in almost all processes in a living cell and determine the biological functions of proteins. To obtain mechanistic understandings of protein-protein interactions, the tertiary structures of protein complexes have been determined by biophysical experimental methods, such as X-ray crystallography and cryogenic electron microscopy. However, as experimental methods are costly in resources, many computational methods have been developed that model protein complex structures. One of the difficulties in computational protein complex modeling (protein docking) is to select the most accurate models among many models that are usually generated by a docking method. This article reviews advances in protein docking model assessment methods, focusing on recent developments that apply deep learning to several network architectures.
Affiliation(s)
- Yuanyuan Zhang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Zicong Zhang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Yunhan Huang
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA.
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
| |
|
19
|
Michalik I, Kuder KJ. Machine Learning Methods in Protein-Protein Docking. Methods Mol Biol 2024; 2780:107-126. [PMID: 38987466 DOI: 10.1007/978-1-0716-3985-6_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
An exponential increase in the number of publications that address artificial intelligence (AI) usage in life sciences has been noticed in recent years, while new modeling techniques are constantly being reported. The potential of these methods is vast, from understanding fundamental cellular processes to discovering new drugs and breakthrough therapies. Computational studies of protein-protein interactions, crucial for understanding the operation of biological systems, are no exception in this field. However, despite the rapid development of technology and the progress in developing new approaches, many aspects remain challenging to solve, such as predicting conformational changes in proteins, or more "trivial" issues such as obtaining high-quality data in huge quantities. Therefore, this chapter focuses on a short introduction to various AI approaches to study protein-protein interactions, followed by a description of the most up-to-date algorithms and programs used for this purpose. Yet, given the considerable pace of development in this hot area of computational science, by the time you read this chapter, further development of the algorithms described, or the emergence of new (and better) ones, should come as no surprise.
Affiliation(s)
- Ilona Michalik
- Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland
| | - Kamil J Kuder
- Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland.
| |
|
20
|
Zięba A, Matosiuk D. Sampling and Scoring in Protein-Protein Docking. Methods Mol Biol 2024; 2780:15-26. [PMID: 38987461 DOI: 10.1007/978-1-0716-3985-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Protein-protein docking is considered one of the most important techniques supporting experimental proteomics. Recent developments in the field of computer science helped to improve this computational technique so that it better handles the complexity of protein nature. Sampling algorithms are responsible for the generation of numerous protein-protein ensembles. Unfortunately, a primary docking output comprises a set of both near-native poses and decoys. Application of the efficient scoring function helps to differentiate poses with the most favorable properties from those that are very unlikely to represent a natural state of the complex. This chapter explains the importance of sampling and scoring in the process of protein-protein docking. Moreover, it summarizes advances in the field.
Affiliation(s)
- Agata Zięba
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modeling Laboratory, Faculty of Pharmacy, Medical University of Lublin, Lublin, Poland.
| | - Dariusz Matosiuk
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modeling Laboratory, Faculty of Pharmacy, Medical University of Lublin, Lublin, Poland
| |
|
21
|
Roy RS, Liu J, Giri N, Guo Z, Cheng J. Combining pairwise structural similarity and deep learning interface contact prediction to estimate protein complex model accuracy in CASP15. Proteins 2023; 91:1889-1902. [PMID: 37357816 PMCID: PMC10749984 DOI: 10.1002/prot.26542] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/11/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/27/2023]
Abstract
Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models is proven useful for estimating the quality of protein tertiary structural models, but it has been rarely applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address the gap, we developed a hybrid method (MULTICOM_qa) combining a pairwise similarity score (PSS) and an interface contact probability score (ICPS) based on the deep learning inter-chain contact prediction for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and performed very well in estimating the global structure accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good/bad models) for EMA are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA.
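The combination of a multi-model score with a single-model score, and the ranking loss used to evaluate it, can be sketched as follows. This is only an illustration under stated assumptions, not the MULTICOM_qa code: the equal-weight average in `combine`, the function names, and the toy numbers are all hypothetical, and the abstract does not say how PSS and ICPS are actually merged.

```python
from statistics import fmean


def pairwise_similarity_scores(sim):
    """PSS of model i: mean similarity to every other model in the pool.
    `sim` is a symmetric matrix of pairwise similarities (needs >= 2 models)."""
    n = len(sim)
    return [fmean(sim[i][j] for j in range(n) if j != i) for i in range(n)]


def combine(pss, icps, weight=0.5):
    """Weighted average of the two scores (the weighting is an assumption)."""
    return [weight * p + (1 - weight) * c for p, c in zip(pss, icps)]


def ranking_loss(predicted, true_quality):
    """True quality of the best model minus that of the top-ranked model."""
    top = max(range(len(predicted)), key=predicted.__getitem__)
    return max(true_quality) - true_quality[top]


# Toy pool (made up): models 0 and 1 are poor but resemble each other,
# model 2 is good but is an outlier in the pool.
sim = [
    [1.0, 0.9, 0.3],
    [0.9, 1.0, 0.3],
    [0.3, 0.3, 1.0],
]
icps = [0.20, 0.25, 0.90]          # hypothetical interface contact scores
true_quality = [0.30, 0.35, 0.80]  # hypothetical true model quality

pss = pairwise_similarity_scores(sim)
combined = combine(pss, icps)
print("loss with PSS only:", ranking_loss(pss, true_quality))
print("loss with PSS+ICPS:", ranking_loss(combined, true_quality))
```

The toy pool reproduces the failure case named in the abstract: when many models are poor and similar to each other, the pairwise score alone favors them, while the single-model score can pull the good outlier to the top.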
Affiliation(s)
- Raj S. Roy
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Nabin Giri
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Zhiye Guo
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
| |
|
22
|
Olechnovič K, Venclovas Č. VoroIF-GNN: Voronoi tessellation-derived protein-protein interface assessment using a graph neural network. Proteins 2023; 91:1879-1888. [PMID: 37482904 DOI: 10.1002/prot.26554] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/21/2023] [Revised: 06/19/2023] [Accepted: 07/01/2023] [Indexed: 07/25/2023]
Abstract
We present VoroIF-GNN (Voronoi InterFace Graph Neural Network), a novel method for assessing inter-subunit interfaces in a structural model of a protein-protein complex, relying solely on the input structure without any additional information. Given a multimeric protein structural model, we derive interface contacts from the Voronoi tessellation of atomic balls, construct a graph of those contacts, and predict the accuracy of every contact using an attention-based GNN. The contact-level predictions are then summarized to produce whole interface-level scores. VoroIF-GNN was blindly tested for its ability to estimate the accuracy of protein complexes during CASP15 and showed strong performance in selecting the best multimeric model out of many. The method implementation is freely available at https://kliment-olechnovic.github.io/voronota/expansion_js/.
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| |
|
23
|
Schweke H, Xu Q, Tauriello G, Pantolini L, Schwede T, Cazals F, Lhéritier A, Fernandez-Recio J, Rodríguez-Lumbreras LÁ, Schueler-Furman O, Varga JK, Jiménez-García B, Réau MF, Bonvin A, Savojardo C, Martelli PL, Casadio R, Tubiana J, Wolfson H, Oliva R, Barradas-Bautista D, Ricciardelli T, Cavallo L, Venclovas Č, Olechnovič K, Guerois R, Andreani J, Martin J, Wang X, Kihara D, Marchand A, Correia B, Zou X, Dey S, Dunbrack R, Levy E, Wodak S. Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study. Proteomics 2023; 23:e2200323. [PMID: 37365936 PMCID: PMC10937251 DOI: 10.1002/pmic.202200323] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/04/2023] [Revised: 05/11/2023] [Accepted: 05/11/2023] [Indexed: 06/28/2023]
Abstract
Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
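A consensus of several scoring functions and its evaluation by ROC AUC can be sketched as below. The abstract does not specify how the consensus was built, so the rank-sum scheme here is an assumption, as are the function names and the toy data; it is meant only to show how combining scores from different groups can separate the two classes better than either score alone.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count as half), which equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ranks(scores):
    """Integer rank of each item, 0 = lowest score (ties broken by position)."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    out = [0] * len(scores)
    for r, i in enumerate(order):
        out[i] = r
    return out


def consensus(score_lists):
    """Rank-sum consensus: ranking first puts scores on a common scale."""
    return [sum(col) for col in zip(*(ranks(s) for s in score_lists))]


# Toy benchmark (made up): 1 = physiological, 0 = non-physiological.
labels = [1, 1, 1, 0, 0, 0]
group_a = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3]  # hypothetical score from one group
group_b = [5, 1, 4, 2, 3, 0]              # hypothetical score from another

auc_a = roc_auc(group_a, labels)
auc_b = roc_auc(group_b, labels)
auc_consensus = roc_auc(consensus([group_a, group_b]), labels)
print(f"AUC a={auc_a:.3f}, b={auc_b:.3f}, consensus={auc_consensus:.3f}")
```

In this toy case each score misranks a different physiological dimer, so the rank sum recovers part of what either score misses; real scoring functions are correlated, and the gain would depend on how complementary they are.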
Affiliation(s)
- Julia K. Varga
- Hebrew University of Jerusalem Institute for Medical Research Israel-Canada
| | - Jérôme Tubiana
- Tel Aviv University Blavatnik School of Computer Science
| | - Haim Wolfson
- Tel Aviv University Blavatnik School of Computer Science
| | - Xiaoqin Zou
- Dalton Cardiovascular Research Center, Institute for Data Science and Informatics, University of Missouri
| |
|
24
|
Chen Z, Liu N, Huang Y, Min X, Zeng X, Ge S, Zhang J, Xia N. PointDE: Protein Docking Evaluation Using 3D Point Cloud Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3128-3138. [PMID: 37220029 DOI: 10.1109/tcbb.2023.3279019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Protein-protein interactions (PPIs) play essential roles in many vital movements and the determination of protein complex structure is helpful to discover the mechanism of PPI. Protein-protein docking is being developed to model the structure of the protein. However, selecting the near-native decoys generated by protein-protein docking remains a challenge. Here, we propose a docking evaluation method using a 3D point cloud neural network, named PointDE. PointDE transforms a protein structure into a point cloud. Using the state-of-the-art point cloud network architecture and a novel grouping mechanism, PointDE can capture the geometries of the point cloud and learn the interaction information from the protein interface. On public datasets, PointDE surpasses the state-of-the-art method using deep learning. To further explore the ability of our method in different types of protein structures, we developed a new dataset generated by high-quality antibody-antigen complexes. The result in this antibody-antigen dataset shows the strong performance of PointDE, which will be helpful for the understanding of PPI mechanisms.
|
25
|
Zeng X, Bai G, Sun C, Ma B. Recent Progress in Antibody Epitope Prediction. Antibodies (Basel) 2023; 12:52. [PMID: 37606436 PMCID: PMC10443277 DOI: 10.3390/antib12030052] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/04/2023] [Revised: 07/31/2023] [Accepted: 08/03/2023] [Indexed: 08/23/2023] Open
Abstract
Recent progress in epitope prediction has shown promising results in the development of vaccines and therapeutics against various diseases. However, the overall accuracy and success rate need to be improved greatly to gain practical application significance, especially conformational epitope prediction. In this review, we examined the general features of antibody-antigen recognition, highlighting the conformation selection mechanism in flexible antibody-antigen binding. We recently highlighted the success and warning signs of antibody epitope predictions, including linear and conformation epitope predictions. While deep learning-based models gradually outperform traditional feature-based machine learning, sequence and structure features still provide insight into antibody-antigen recognition problems.
Affiliation(s)
- Xincheng Zeng
- Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- Ganggang Bai
- Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- Chuance Sun
- Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- Buyong Ma
- Engineering Research Center of Cell & Therapeutic Antibody (MOE), School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- Shanghai Digiwiser Biological, Inc., Shanghai 200131, China
|
26
|
Chen X, Morehead A, Liu J, Cheng J. A gated graph transformer for protein complex structure quality assessment and its performance in CASP15. Bioinformatics 2023; 39:i308-i317. [PMID: 37387159 PMCID: PMC10311325 DOI: 10.1093/bioinformatics/btad203] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Proteins interact to form complexes to carry out essential biological functions. Computational methods such as AlphaFold-multimer have been developed to predict the quaternary structures of protein complexes. An important yet largely unsolved challenge in protein complex structure prediction is to accurately estimate the quality of predicted protein complex structures without any knowledge of the corresponding native structures. Such estimations can then be used to select high-quality predicted complex structures to facilitate biomedical research such as protein function analysis and drug discovery. RESULTS In this work, we introduce a new gated neighborhood-modulating graph transformer to predict the quality of 3D protein complex structures. It incorporates node and edge gates within a graph transformer framework to control information flow during graph message passing. We trained, evaluated and tested the method (called DProQA) on newly-curated protein complex datasets before the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) and then blindly tested it in the 2022 CASP15 experiment. The method was ranked 3rd among the single-model quality assessment methods in CASP15 in terms of the ranking loss of TM-score on 36 complex targets. The rigorous internal and external experiments demonstrate that DProQA is effective in ranking protein complex structures. AVAILABILITY AND IMPLEMENTATION The source code, data, and pre-trained models are available at https://github.com/jianlin-cheng/DProQA.
Affiliation(s)
- Xiao Chen
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Alex Morehead
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65201, United States
|
27
|
McFee M, Kim PM. GDockScore: a graph-based protein-protein docking scoring function. Bioinformatics Advances 2023; 3:vbad072. [PMID: 37359726 PMCID: PMC10290236 DOI: 10.1093/bioadv/vbad072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 05/30/2023] [Accepted: 06/10/2023] [Indexed: 06/28/2023]
Abstract
Summary Protein complexes play vital roles in a variety of biological processes, such as mediating biochemical reactions, the immune response and cell signalling, with 3D structure specifying function. Computational docking methods provide a means to determine the interface between two complexed polypeptide chains without using time-consuming experimental techniques. The docking process requires the optimal solution to be selected with a scoring function. Here, we propose a novel graph-based deep learning model that utilizes mathematical graph representations of proteins to learn a scoring function (GDockScore). GDockScore was pre-trained on docking outputs generated with the Protein Data Bank biounits and the RosettaDock protocol, and then fine-tuned on HADDOCK decoys generated on the ZDOCK Protein Docking Benchmark. GDockScore performs similarly to the Rosetta scoring function on docking decoys generated using the RosettaDock protocol. Furthermore, state-of-the-art is achieved on the CAPRI score set, a challenging dataset for developing docking scoring functions. Availability and implementation The model implementation is available at https://gitlab.com/mcfeemat/gdockscore. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Matthew McFee
- Department of Molecular Genetics, The University of Toronto, Toronto, ON M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, The University of Toronto, Toronto, ON M5S 3E1, Canada
|
28
|
Shuvo MH, Karim M, Roche R, Bhattacharya D. PIQLE: protein-protein interface quality estimation by deep graph learning of multimeric interaction geometries. Bioinformatics Advances 2023; 3:vbad070. [PMID: 37351310 PMCID: PMC10281963 DOI: 10.1093/bioadv/vbad070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Revised: 05/17/2023] [Accepted: 06/01/2023] [Indexed: 06/24/2023]
Abstract
Motivation Accurate modeling of protein-protein interaction interface is essential for high-quality protein complex structure prediction. Existing approaches for estimating the quality of a predicted protein complex structural model utilize only the physicochemical properties or energetic contributions of the interacting atoms, ignoring evolutionary information or inter-atomic multimeric geometries, including interaction distance and orientations. Results Here, we present PIQLE, a deep graph learning method for protein-protein interface quality estimation. PIQLE leverages multimeric interaction geometries and evolutionary information along with sequence- and structure-derived features to estimate the quality of individual interactions between the interfacial residues using a multi-head graph attention network and then probabilistically combines the estimated quality for scoring the overall interface. Experimental results show that PIQLE consistently outperforms existing state-of-the-art methods including DProQA, TRScore, GNN-DOVE and DOVE on multiple independent test datasets across a wide range of evaluation metrics. Our ablation study and comparison with the self-assessment module of AlphaFold-Multimer repurposed for protein complex scoring reveal that the performance gains are connected to the effectiveness of the multi-head graph attention network in leveraging multimeric interaction geometries and evolutionary information along with other sequence- and structure-derived features adopted in PIQLE. Availability and implementation An open-source software implementation of PIQLE is freely available at https://github.com/Bhattacharya-Lab/PIQLE. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Mohimenul Karim
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
|
29
|
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical Assessment of Methods for Predicting the 3D Structure of Proteins and Protein Complexes. Annu Rev Biophys 2023; 52:183-206. [PMID: 36626764 PMCID: PMC10885158 DOI: 10.1146/annurev-biophys-102622-084607] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Indexed: 01/11/2023]
Abstract
Advances in a scientific discipline are often measured by small, incremental steps. In this review, we report on two intertwined disciplines in the protein structure prediction field, modeling of single chains and modeling of complexes, that have over decades emulated this pattern, as monitored by the community-wide blind prediction experiments CASP and CAPRI. However, over the past few years, dramatic advances were observed for the accurate prediction of single protein chains, driven by a surge of deep learning methodologies entering the prediction field. We review the main scientific developments that enabled these recent breakthroughs and feature the important role of blind prediction experiments in building up and nurturing the structure prediction field. We discuss how the new wave of artificial intelligence-based methods is impacting the fields of computational and experimental structural biology and highlight areas in which deep learning methods are likely to lead to future developments, provided that major challenges are overcome.
Affiliation(s)
- Shoshana J Wodak
- VIB-VUB Center for Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
- Marc F Lensink
- Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom
|
30
|
Masters MR, Mahmoud AH, Wei Y, Lill MA. Deep Learning Model for Efficient Protein-Ligand Docking with Implicit Side-Chain Flexibility. J Chem Inf Model 2023; 63:1695-1707. [PMID: 36916514 DOI: 10.1021/acs.jcim.2c01436] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 03/16/2023]
Abstract
Protein-ligand docking is an essential tool in structure-based drug design with applications ranging from virtual high-throughput screening to pose prediction for lead optimization. Most docking programs for pose prediction are optimized for redocking to an existing cocrystallized protein structure, ignoring protein flexibility. In real-world drug design applications, however, protein flexibility is an essential feature of the ligand-binding process. Flexible protein-ligand docking still remains a significant challenge to computational drug design. To target this challenge, we present a deep learning (DL) model for flexible protein-ligand docking based on the prediction of an intermolecular Euclidean distance matrix (EDM), making the typical use of iterative search algorithms obsolete. The model was trained on a large-scale data set of protein-ligand complexes and evaluated on independent test sets. Our model generates high quality poses for a diverse set of protein and ligand structures and outperforms comparable docking methods.
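To make the prediction target named in this abstract concrete, the following minimal sketch computes an intermolecular Euclidean distance matrix from two sets of 3D coordinates. It only illustrates what an EDM is; it is not the authors' model, and the function name and the toy coordinates are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def intermolecular_edm(protein_xyz, ligand_xyz):
    """Return a matrix whose entry [i][j] is the distance between
    protein atom i and ligand atom j."""
    return [[dist(p, l) for l in ligand_xyz] for p in protein_xyz]


# Hypothetical coordinates (in angstroms) for two protein atoms and one ligand atom.
protein_xyz = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ligand_xyz = [(0.0, 4.0, 0.0)]
edm = intermolecular_edm(protein_xyz, ligand_xyz)
print(edm)
```

The matrix has one row per protein atom and one column per ligand atom, so its shape depends only on the two atom counts and not on how the complex is oriented in space.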
Affiliation(s)
- Matthew R Masters
- Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland
- Amr H Mahmoud
- Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland
- Yao Wei
- Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland
- Markus A Lill
- Department of Pharmaceutical Sciences, University of Basel, Klingelbergstrasse 50, 4056 Basel, Switzerland
|
31
|
Roy RS, Liu J, Giri N, Guo Z, Cheng J. Combining pairwise structural similarity and deep learning interface contact prediction to estimate protein complex model accuracy in CASP15. bioRxiv 2023:2023.03.08.531814. [PMID: 36945536 PMCID: PMC10028888 DOI: 10.1101/2023.03.08.531814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Estimating the accuracy of quaternary structural models of protein complexes and assemblies (EMA) is important for predicting quaternary structures and applying them to studying protein function and interaction. The pairwise similarity between structural models is proven useful for estimating the quality of protein tertiary structural models, but it has been rarely applied to predicting the quality of quaternary structural models. Moreover, the pairwise similarity approach often fails when many structural models are of low quality and similar to each other. To address the gap, we developed a hybrid method (MULTICOM_qa) combining a pairwise similarity score (PSS) and an interface contact probability score (ICPS) based on the deep learning inter-chain contact prediction for estimating protein complex model accuracy. It blindly participated in the 15th Critical Assessment of Techniques for Protein Structure Prediction (CASP15) in 2022 and ranked first out of 24 predictors in estimating the global accuracy of assembly models. The average per-target correlation coefficient between the model quality scores predicted by MULTICOM_qa and the true quality scores of the models of CASP15 assembly targets is 0.66. The average per-target ranking loss in using the predicted quality scores to rank the models is 0.14. It was able to select good models for most targets. Moreover, several key factors (i.e., target difficulty, model sampling difficulty, skewness of model quality, and similarity between good/bad models) for EMA are identified and analyzed. The results demonstrate that combining the multi-model method (PSS) with the complementary single-model method (ICPS) is a promising approach to EMA. The source code of MULTICOM_qa is available at https://github.com/BioinfoMachineLearning/MULTICOM_qa.
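The two per-target metrics quoted in this abstract can be sketched as follows, assuming the usual definitions: ranking loss is the true quality of the best model in the pool minus the true quality of the model ranked first by the predicted scores, and the correlation is Pearson's r between predicted and true scores. The function names and the toy scores are hypothetical and are not taken from MULTICOM_qa.

```python
from math import sqrt


def ranking_loss(predicted, true):
    """True quality of the best model minus true quality of the top-predicted model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical scores for four models of a single target.
predicted = [0.62, 0.80, 0.55, 0.71]
true = [0.70, 0.75, 0.40, 0.82]
loss = ranking_loss(predicted, true)  # best true score 0.82, selected model has 0.75
corr = pearson(predicted, true)
print(round(loss, 2), round(corr, 2))
```

Averaging `loss` and `corr` over all targets would give figures comparable in kind to those reported in the abstract; a ranking loss of 0 means the top-ranked model was also the best one available.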
Affiliation(s)
- Raj S. Roy
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Nabin Giri
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
|
32
|
Rui H, Ashton KS, Min J, Wang C, Potts PR. Protein-protein interfaces in molecular glue-induced ternary complexes: classification, characterization, and prediction. RSC Chem Biol 2023; 4:192-215. [PMID: 36908699 PMCID: PMC9994104 DOI: 10.1039/d2cb00207h] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 09/27/2022] [Accepted: 01/02/2023] [Indexed: 01/04/2023] Open
Abstract
Molecular glues are a class of small molecules that stabilize the interactions between proteins. Naturally occurring molecular glues are present in many areas of biology where they serve as central regulators of signaling pathways. Importantly, several clinical compounds act as molecular glue degraders that stabilize interactions between E3 ubiquitin ligases and target proteins, leading to their degradation. Molecular glues hold promise as a new generation of therapeutic agents, including those molecular glue degraders that can redirect the protein degradation machinery in a precise way. However, rational discovery of molecular glues is difficult in part due to the lack of understanding of the protein-protein interactions they stabilize. In this review, we summarize the structures of known molecular glue-induced ternary complexes and the interface properties. Detailed analysis shows different mechanisms of ternary structure formation. Additionally, we also review computational approaches for predicting protein-protein interfaces and highlight the promises and challenges. This information will ultimately help inform future approaches for rational molecular glue discovery.
Affiliation(s)
- Huan Rui
- Center for Research Acceleration by Digital Innovation, Amgen Research Thousand Oaks CA 91320 USA
- Kate S Ashton
- Medicinal Chemistry, Amgen Research Thousand Oaks CA 91320 USA
- Jaeki Min
- Induced Proximity Platform, Amgen Research Thousand Oaks CA 91320 USA
- Connie Wang
- Digital, Technology & Innovation, Amgen Thousand Oaks CA 91320 USA
|
33
|
Shuvo MH, Karim M, Roche R, Bhattacharya D. PIQLE: protein-protein interface quality estimation by deep graph learning of multimeric interaction geometries. bioRxiv 2023:2023.02.14.528528. [PMID: 36824789 PMCID: PMC9949034 DOI: 10.1101/2023.02.14.528528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Abstract
Accurate modeling of protein-protein interaction interface is essential for high-quality protein complex structure prediction. Existing approaches for estimating the quality of a predicted protein complex structural model utilize only the physicochemical properties or energetic contributions of the interacting atoms, ignoring evolutionary information or inter-atomic multimeric geometries, including interaction distance and orientations. Here we present PIQLE, a deep graph learning method for protein-protein interface quality estimation. PIQLE leverages multimeric interaction geometries and evolutionary information along with sequence- and structure-derived features to estimate the quality of the individual interactions between the interfacial residues using a multihead graph attention network and then probabilistically combines the estimated quality of the interfacial residues for scoring the overall interface. Experimental results show that PIQLE consistently outperforms existing state-of-the-art methods on multiple independent test datasets across a wide range of evaluation metrics. Our ablation study reveals that the performance gains are connected to the effectiveness of the multihead graph attention network in leveraging multimeric interaction geometries and evolutionary information along with other sequence- and structure-derived features adopted in PIQLE. An open-source software implementation of PIQLE, licensed under the GNU General Public License v3, is freely available at https://github.com/Bhattacharya-Lab/PIQLE.
Affiliation(s)
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Mohimenul Karim
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Debswapna Bhattacharya
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
|
34
|
Yao M, Yu H, Bian H. Defending against adversarial attacks on graph neural networks via similarity property. AI Commun 2023. [DOI: 10.3233/aic-220120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Graph Neural Networks (GNNs) are powerful tools in graph application areas. However, recent studies indicate that GNNs are vulnerable to adversarial attacks, which can lead GNNs to easily make wrong predictions for downstream tasks. A number of works aim to solve this problem, but which criteria to follow when cleaning the perturbed graph is still a challenge. In this paper, we propose GSP-GNN, a general framework to defend against massive poisoning attacks that can perturb graphs. The vital principle of GSP-GNN is to explore the similarity property to mitigate negative effects on graphs. Specifically, this method prunes adversarial edges by the similarity of node features and graph structure to eliminate adversarial perturbations. In order to stabilize and enhance the GNN training process, previous layer information is adopted in case a large number of edges are pruned in one layer. Extensive experiments on three real-world graphs demonstrate that GSP-GNN achieves significantly better performance compared with the representative baselines and has favorable generalization ability simultaneously.
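The general idea of similarity-based edge pruning can be sketched as below. This is not GSP-GNN itself: it uses only cosine similarity of node features with a fixed threshold, both of which are assumptions chosen for illustration, and it leaves out the graph-structure similarity and the reuse of previous-layer information that the abstract mentions. All names and numbers are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prune_edges(edges, features, threshold=0.5):
    """Keep only edges whose endpoint features are at least `threshold` similar."""
    return [(i, j) for i, j in edges if cosine(features[i], features[j]) >= threshold]


# Hypothetical graph: nodes 0 and 1 have similar features, node 2 does not.
features = {0: [1.0, 0.0, 1.0], 1: [1.0, 0.1, 0.9], 2: [0.0, 1.0, 0.0]}
edges = [(0, 1), (1, 2), (0, 2)]
kept = prune_edges(edges, features)
print(kept)
```

Edges joining dissimilar nodes are treated as suspicious and dropped before message passing; in practice the threshold trades off removing adversarial edges against discarding genuine ones.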
Affiliation(s)
- Minghong Yao
- College of Mathematics and System Sciences, Xinjiang University, Urumqi 830046, China
- Haizheng Yu
- College of Mathematics and System Sciences, Xinjiang University, Urumqi 830046, China
- Hong Bian
- School of Mathematical Sciences, Xinjiang Normal University, Urumqi 830017, China
|
35
|
Réau M, Renaud N, Xue LC, Bonvin AMJJ. DeepRank-GNN: a graph neural network framework to learn patterns in protein-protein interfaces. Bioinformatics 2022; 39:6845451. [PMID: 36420989 PMCID: PMC9805592 DOI: 10.1093/bioinformatics/btac759] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 11/25/2021] [Revised: 10/19/2022] [Accepted: 11/23/2022] [Indexed: 11/25/2022] Open
Abstract
MOTIVATION Gaining structural insights into the protein-protein interactome is essential to understand biological phenomena and extract knowledge for rational drug design or protein engineering. We have previously developed DeepRank, a deep-learning framework to facilitate pattern learning from protein-protein interfaces using convolutional neural network (CNN) approaches. However, CNN is not rotation invariant and data augmentation is required to desensitize the network to the input data orientation which dramatically impairs the computation performance. Representing protein-protein complexes as atomic- or residue-scale rotation invariant graphs instead enables using graph neural networks (GNN) approaches, bypassing those limitations. RESULTS We have developed DeepRank-GNN, a framework that converts protein-protein interfaces from PDB 3D coordinates files into graphs that are further provided to a pre-defined or user-defined GNN architecture to learn problem-specific interaction patterns. DeepRank-GNN is designed to be highly modularizable, easily customized and is wrapped into a user-friendly python3 package. Here, we showcase DeepRank-GNN's performance on two applications using a dedicated graph interaction neural network: (i) the scoring of docking poses and (ii) the discriminating of biological and crystal interfaces. In addition to the highly competitive performance obtained in those tasks as compared to state-of-the-art methods, we show a significant improvement in speed and storage requirement using DeepRank-GNN as compared to DeepRank. AVAILABILITY AND IMPLEMENTATION DeepRank-GNN is freely available from https://github.com/DeepRank/DeepRank-GNN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Li C Xue
- Center for Molecular and Biomolecular Informatics, Radboudumc, Nijmegen 6525 GA, The Netherlands
|
36
|
Avery C, Patterson J, Grear T, Frater T, Jacobs DJ. Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:1246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has been an important arsenal in computational biology used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein-ligand binding, including allosteric effects, protein-protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
Affiliation(s)
- Chris Avery
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- John Patterson
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Tyler Grear
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Theodore Frater
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Donald J. Jacobs
- Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
|
37
|
Mohseni Behbahani Y, Crouzet S, Laine E, Carbone A. Deep Local Analysis evaluates protein docking conformations with locally oriented cubes. Bioinformatics 2022; 38:4505-4512. [PMID: 35962985 PMCID: PMC9525006 DOI: 10.1093/bioinformatics/btac551] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/05/2022] [Revised: 07/04/2022] [Accepted: 08/08/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION With the recent advances in protein 3D structure prediction, protein interactions are becoming more central than ever before. Here, we address the problem of determining how proteins interact with one another. More specifically, we investigate the possibility of discriminating near-native protein complex conformations from incorrect ones by exploiting local environments around interfacial residues. RESULTS Deep Local Analysis (DLA)-Ranker is a deep learning framework applying 3D convolutions to a set of locally oriented cubes representing the protein interface. It explicitly considers the local geometry of the interfacial residues along with their neighboring atoms and the regions of the interface with different solvent accessibility. We assessed its performance on three docking benchmarks made of half a million acceptable and incorrect conformations. We show that DLA-Ranker successfully identifies near-native conformations from ensembles generated by molecular docking. It surpasses or competes with other deep learning-based scoring functions. We also showcase its usefulness to discover alternative interfaces. AVAILABILITY AND IMPLEMENTATION http://gitlab.lcqb.upmc.fr/dla-ranker/DLA-Ranker.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yasser Mohseni Behbahani
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris 75005, France
- Simon Crouzet
- Sorbonne Université, CNRS, IBPS, Laboratory of Computational and Quantitative Biology (LCQB), UMR 7238, Paris 75005, France
|
38
|
Pozzati G, Kundrotas P, Elofsson A. Scoring of protein–protein docking models utilizing predicted interface residues. Proteins 2022; 90:1493-1505. [PMID: 35246997 PMCID: PMC9314140 DOI: 10.1002/prot.26330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2021] [Revised: 02/23/2022] [Accepted: 02/28/2022] [Indexed: 11/08/2022]
Abstract
Scoring docking solutions is a difficult task, and many methods have been developed for this purpose. In docking, only a handful of the hundreds of thousands of models generated by docking algorithms are acceptable, causing difficulties when developing scoring functions. Today's best scoring functions can significantly increase the number of top‐ranked models but still fail for most targets. Here, we examine the possibility of utilizing predicted interface residues to score docking models generated during the scan stage of a docking algorithm. Many methods have been developed to infer the regions of a protein surface that interact with another protein, but most have not been benchmarked using docking algorithms. This study systematically tests different interface prediction methods for scoring >300,000 low‐resolution rigid‐body template‐free docking decoys. Overall, we find that contact‐based interface prediction by BIPSPI is the best method to score docking solutions, with >12% of first ranked docking models being acceptable. Additional experiments indicated precision as a high‐importance metric when estimating interface prediction quality, focusing on docking constraints production. Finally, we discussed several limitations for adopting interface predictions as constraints in a docking protocol.
Affiliation(s)
- Gabriele Pozzati
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
| | - Petras Kundrotas
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
- Center for Bioinformatics and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
| | - Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
| |
|
39
|
Jiang H, Wang J, Cong W, Huang Y, Ramezani M, Sarma A, Dokholyan NV, Mahdavi M, Kandemir MT. Predicting Protein-Ligand Docking Structure with Graph Neural Network. J Chem Inf Model 2022; 62:2923-2932. [PMID: 35699430 PMCID: PMC10279412 DOI: 10.1021/acs.jcim.2c00127] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 01/02/2023]
Abstract
Modern-day drug discovery is extremely expensive and time-consuming. Although computational approaches help accelerate and decrease the cost of drug discovery, existing computational software packages for docking-based drug discovery suffer from both low accuracy and high latency. A few recent machine learning-based approaches have been proposed for virtual screening by improving the ability to evaluate protein-ligand binding affinity, but such methods rely heavily on conventional docking software to sample docking poses, which results in excessive execution latencies. Here, we propose and evaluate a novel graph neural network (GNN)-based framework, MedusaGraph, which includes both pose-prediction (sampling) and pose-selection (scoring) models. Unlike previous machine learning-centric studies, MedusaGraph generates the docking poses directly and achieves a 10- to 100-fold speedup compared to state-of-the-art approaches, while having slightly better docking accuracy.
Affiliation(s)
- Huaipan Jiang
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
| | - Jian Wang
- Departments of Pharmacology and Biochemistry and Molecular Biology, Pennsylvania State College of Medicine, Hershey, Pennsylvania 17033, United States
| | - Weilin Cong
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
| | - Yihe Huang
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
| | - Morteza Ramezani
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
| | - Anup Sarma
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
| | - Nikolay V Dokholyan
- Departments of Pharmacology and Biochemistry and Molecular Biology, Pennsylvania State College of Medicine, Hershey, Pennsylvania 17033, United States
- Departments of Chemistry and Biomedical Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
| | - Mehrdad Mahdavi
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
| | - Mahmut T Kandemir
- Department of Computer Science and Engineering, Pennsylvania State University, State College, Pennsylvania 16802, United States
| |
|
40
|
Fasoulis R, Paliouras G, Kavraki LE. Graph representation learning for structural proteomics. Emerg Top Life Sci 2021; 5:789-802. [PMID: 34665257 PMCID: PMC8786289 DOI: 10.1042/etls20210225] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/14/2021] [Revised: 09/02/2021] [Accepted: 09/13/2021] [Indexed: 12/13/2022]
Abstract
The field of structural proteomics, which is focused on studying the structure-function relationship of proteins and protein complexes, is experiencing rapid growth. Since the early 2000s, structural databases such as the Protein Data Bank have been storing increasing amounts of protein structural data, and modeled structures have become increasingly available. This, combined with the recent advances in graph-based machine-learning models, enables the use of protein structural data in predictive models, with the goal of creating tools that will advance our understanding of protein function. Similar to the use of graph learning tools on molecular graphs, which is currently undergoing rapid development, there is also an increasing trend in using graph learning approaches on protein structures. In this short review paper, we survey studies that use graph learning techniques on proteins, and examine their successes and shortcomings, while also discussing future directions.
Affiliation(s)
- Romanos Fasoulis
- Department of Computer Science, Rice University, Houston, TX, U.S.A.
| | - Georgios Paliouras
- Institute of Informatics and Telecommunications, NCSR Demokritos, Athens, Greece
| | - Lydia E. Kavraki
- Department of Computer Science, Rice University, Houston, TX, U.S.A.
| |
|
41
|
Han Y, He F, Chen Y, Qin W, Yu H, Xu D. Quality Assessment of Protein Docking Models Based on Graph Neural Network. Frontiers in Bioinformatics 2021; 1:693211. [PMID: 36303780 PMCID: PMC9581034 DOI: 10.3389/fbinf.2021.693211] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/10/2021] [Accepted: 08/02/2021] [Indexed: 11/24/2022] [Open Access]
Abstract
Protein docking provides a structural basis for the design of drugs and vaccines. Among the processes of protein docking, quality assessment (QA) is utilized to pick near-native models from numerous protein docking candidate conformations, and it directly determines the final docking results. Although extensive efforts have been made to improve QA accuracy, it is still the bottleneck of current protein docking systems. In this paper, we presented a Deep Graph Attention Neural Network (DGANN) to evaluate and rank protein docking candidate models. DGANN learns inter-residue physicochemical properties and structural fitness across the two protein monomers in a docking model and outputs the probability that the model is near-native. On the ZDOCK decoy benchmark, our DGANN outperformed the ranking provided by ZDOCK in terms of ranking good models into the top selections. Furthermore, we conducted comparative experiments on an independent testing dataset, and the results also demonstrated the superiority and generalizability of our proposed method.
Affiliation(s)
- Ye Han
- School of Information Technology, Jilin Agricultural University, Changchun, China
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Fei He
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- School of Information Science and Technology, Northeast Normal University, Changchun, China
| | - Yongbing Chen
- School of Information Science and Technology, Northeast Normal University, Changchun, China
| | - Wenyuan Qin
- School of Information Science and Technology, Northeast Normal University, Changchun, China
| | - Helong Yu
- School of Information Technology, Jilin Agricultural University, Changchun, China
- *Correspondence: Helong Yu; Dong Xu
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- *Correspondence: Helong Yu; Dong Xu
| |
|