1
Yan K, He W, Pang M, Lu X, Chen Z, Piao L, Zhang H, Wang Y, Chang S, Kong R. E3Docker: a docking server for potential E3 binder discovery. Nucleic Acids Res 2025:gkaf391. [PMID: 40337923 DOI: 10.1093/nar/gkaf391] [Received: 03/13/2025] [Revised: 04/23/2025] [Accepted: 04/27/2025] [Indexed: 05/09/2025] Open
Abstract
Targeted protein degradation (TPD) has emerged as a promising therapeutic strategy for modulating protein levels in cells. Proteolysis-targeting chimeras and molecular glues facilitate the formation of a complex between the protein of interest (POI) and a specific E3 ligase, leading to POI ubiquitination and subsequent degradation by the proteasome. With over 600 E3s encoded in the human genome, there is great potential to find novel E3 binders and recruit new E3 ligases for TPD-related drug discovery. Here we introduce E3Docker, an online computational tool for E3 binder discovery. A total of 1075 Homo sapiens E3 ligases are collected from databases and the literature, and 4474 three-dimensional structures of these E3 ligases, in either apo or complex forms, are integrated into the web server. The druggable pockets for each E3 ligase are defined by experimentally bound ligands from the PDB or predicted using DeepPocket. CoDock-Ligand is employed as the docking engine for estimating potential E3 binders. With a user-friendly interface, E3Docker generates binding poses and affinity scores for compounds against more than 1000 E3 ligases and may benefit novel E3 binder discovery. The E3Docker server and tutorials are freely available at https://e3docker.schanglab.org.cn/.
Affiliation(s)
- Kejia Yan
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Wangqiu He
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Mingwei Pang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Xufeng Lu
- Primary Biotechnology Co., Ltd, Changzhou 213125, China
- Zhou Chen
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Lianhua Piao
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Han Zhang
- Institute of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
- Yu Wang
- Institute of Traditional Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
- Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
2
Lv SQ, Zeng X, Su GP, Du WF, Li Y, Wen ML. Improving Identification of Drug-Target Binding Sites Based on Structures of Targets Using Residual Graph Transformer Network. Biomolecules 2025; 15:221. [PMID: 40001524 PMCID: PMC11853427 DOI: 10.3390/biom15020221] [Received: 12/11/2024] [Revised: 01/28/2025] [Accepted: 01/28/2025] [Indexed: 02/27/2025] Open
Abstract
Improving identification of drug-target binding sites can significantly aid in drug screening and design, thereby accelerating the drug development process. However, due to challenges such as insufficient fusion of multimodal information from targets and imbalanced datasets, enhancing the performance of drug-target binding site prediction models remains exceptionally difficult. Leveraging structures of targets, we proposed a novel deep learning framework, RGTsite, which employed a Residual Graph Transformer Network to improve the identification of drug-target binding sites. First, a residual 1D convolutional neural network (1D-CNN) and the pre-trained model ProtT5 were employed to extract the local and global sequence features from the target, respectively. These features were then combined with the physicochemical properties of amino acid residues to serve as the vertex features in the graph. Next, the edge features were incorporated, and the residual graph transformer network (GTN) was applied to extract more comprehensive vertex features. Finally, a fully connected network was used to classify whether each vertex was a binding site. Experimental results showed that RGTsite outperformed the existing state-of-the-art methods in key evaluation metrics, such as F1-score (F1) and Matthews Correlation Coefficient (MCC), across multiple benchmark datasets. Additionally, we conducted an interpretability analysis of RGTsite using real-world cases, and the results confirmed that RGTsite can effectively identify drug-target binding sites in practical applications.
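The two metrics named in this abstract, F1 and MCC, are commonly preferred over plain accuracy when binding residues are a small minority of all residues. The following is a minimal sketch of their standard definitions from confusion-matrix counts; the function name and the example counts are hypothetical and are not taken from the RGTsite paper.

```python
import math


def f1_and_mcc(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (F1, MCC) for a binary per-residue binding-site classifier."""
    # F1 is the harmonic mean of precision and recall.
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    # MCC uses all four cells of the confusion matrix.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc


# Hypothetical counts: 1000 residues, of which 50 are true binding residues.
tp, fp, tn, fn = 30, 20, 930, 20
accuracy = (tp + tn) / (tp + fp + tn + fn)
f1, mcc = f1_and_mcc(tp, fp, tn, fn)
print(f"accuracy={accuracy:.3f} F1={f1:.3f} MCC={mcc:.3f}")
```

With these illustrative counts the accuracy is high because non-binding residues dominate, while F1 and MCC stay noticeably lower, which is why imbalanced benchmarks tend to report the latter two.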
Affiliation(s)
- Shuang-Qing Lv
- Faculty of Surveying and Information Engineering, West Yunnan University of Applied Sciences, Dali 671000, China
- Xin Zeng
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Guang-Peng Su
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Wen-Feng Du
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Yi Li
- College of Mathematics and Computer Science, Dali University, Dali 671003, China
- Meng-Liang Wen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, Kunming 650000, China
3
Vural O, Jololian L. Machine learning approaches for predicting protein-ligand binding sites from sequence data. Frontiers in Bioinformatics 2025; 5:1520382. [PMID: 39963299 PMCID: PMC11830693 DOI: 10.3389/fbinf.2025.1520382] [Received: 10/31/2024] [Accepted: 01/10/2025] [Indexed: 02/20/2025] Open
Abstract
Proteins, composed of amino acids, are crucial for a wide range of biological functions. Proteins have various interaction sites, one of which is the protein-ligand binding site, essential for molecular interactions and biochemical reactions. These sites enable proteins to bind with other molecules, facilitating key biological functions. Accurate prediction of these binding sites is pivotal in computational drug discovery, helping to identify therapeutic targets and facilitate treatment development. Machine learning has made significant contributions to this field by improving the prediction of protein-ligand interactions. This paper reviews studies that use machine learning to predict protein-ligand binding sites from sequence data, focusing on recent advancements. The review examines various embedding methods and machine learning architectures, addressing current challenges and the ongoing debates in the field. Additionally, research gaps in the existing literature are highlighted, and potential future directions for advancing the field are discussed. This study provides a thorough overview of sequence-based approaches for predicting protein-ligand binding sites, offering insights into the current state of research and future possibilities.
Affiliation(s)
- Orhun Vural
- Department of Electrical and Computer Engineering, The University of Alabama at Birmingham, Birmingham, AL, United States
4
Dosajh A, Agrawal P, Chatterjee P, Priyakumar UD. Modern machine learning methods for protein property prediction. Curr Opin Struct Biol 2025; 90:102990. [PMID: 39881454 DOI: 10.1016/j.sbi.2025.102990] [Received: 09/24/2024] [Revised: 12/06/2024] [Accepted: 01/04/2025] [Indexed: 01/31/2025]
Abstract
Recent progress and development of artificial intelligence and machine learning (AI/ML) techniques have enabled addressing complex biomolecular problems. AI/ML models learn the underlying distribution of data they are trained on and when exposed to new inputs, they make predictions based on patterns and relationships previously observed in the training set. Further, generative artificial intelligence (GenAI) can be used to accurately generate protein structure or sequence from specific selected properties. This review specifically focuses on the applications of AI/ML in predicting important functional properties of proteins, and the potential prospects of reverse-engineering in depicting the sequence and structure, from available protein-property information.
Affiliation(s)
- Arjun Dosajh
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India
- Prakul Agrawal
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India
- Prathit Chatterjee
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India
- U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, Telangana, India
5
Shen Z, Chen R, Gao J, Chi X, Zhang Q, Bian Q, Zhou B, Che J, Dai H, Dong X. EvaluationMaster: A GUI Tool for Structure-Based Virtual Screening Evaluation Analysis and Decision-Making Support. J Chem Inf Model 2025; 65:7-14. [PMID: 39692527 DOI: 10.1021/acs.jcim.4c01818] [Indexed: 12/19/2024]
Abstract
Structure-based virtual screening (SBVS) plays an indispensable role in the early phases of drug discovery, utilizing computational docking techniques to predict interactions between molecules and biological targets. During the SBVS process, selecting appropriate target structures and screening algorithms is crucial, as these choices significantly shape the outcomes. Typically, such selections require researchers to be proficient with multiple algorithms and familiar with evaluation and analysis processes, which complicates their tasks. The lack of graphical user interfaces (GUIs) for these algorithms complicates matters further. To address these challenges, we introduced EvaluationMaster, the first GUI tool designed specifically to streamline and standardize the evaluation and decision-making processes in SBVS. It supports the evaluation of four docking algorithms under multiple target structures and offers a comprehensive platform that manages the entire workflow, including the downloading of molecules, construction of decoy datasets, prediction of protein pockets, batch docking, and extensive data analysis. By automating complex evaluation tasks and providing clear visualizations of analysis results, EvaluationMaster significantly reduces the learning curve for researchers and boosts the efficiency of evaluations, potentially improving SBVS hit rates and accelerating the discovery and development of new therapeutic agents.
Affiliation(s)
- Zheyuan Shen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Roufen Chen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Jian Gao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Xinglong Chi
- Key Laboratory of Neuropsychiatric Drug Research of Zhejiang Province, School of Pharmacy, Hangzhou Medical College, Hangzhou, Zhejiang 310058, China
- Qingnan Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Qingyu Bian
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Binbin Zhou
- Department of Computer Science and Computing, Zhejiang University City College, Hangzhou, Zhejiang 310058, China
- Jinxin Che
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Hangzhou Institute of Innovative Medicine, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Haibin Dai
- Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
- Xiaowu Dong
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Department of Pharmacy, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310058, China
6
Mozaffari S, Moen A, Ng CY, Nicolaes GA, Wichapong K. Structural bioinformatics for rational drug design. Res Pract Thromb Haemost 2025; 9:102691. [PMID: 40027444 PMCID: PMC11869865 DOI: 10.1016/j.rpth.2025.102691] [Received: 10/24/2024] [Revised: 12/13/2024] [Accepted: 12/18/2024] [Indexed: 03/05/2025] Open
Abstract
A State of the Art lecture titled "structural bioinformatics technologies for rational drug design: from in silico to in vivo" was presented at the International Society on Thrombosis and Haemostasis (ISTH) Congress in 2024. Drug discovery remains a resource-intensive and complex endeavor, which usually takes over a decade and costs billions to bring a new therapeutic agent to market. However, the landscape of drug discovery has been transformed by the recent advancements in bioinformatics and cheminformatics. Key techniques, including structure- and ligand-based virtual screening, molecular dynamics simulations, and artificial intelligence-driven models are allowing researchers to explore vast chemical spaces, investigate molecular interactions, predict binding affinity, and optimize drug candidates with unprecedented accuracy and efficiency. These computational methods complement experimental techniques by accelerating the identification of viable drug candidates and refining lead compounds. Artificial intelligence models, alongside traditional physics-based simulations, now play an important role in predicting key properties such as binding affinity and toxicity, contributing to more informed decision-making, particularly early in the drug discovery process. Despite these advancements, challenges remain in terms of accuracy, interpretability, and the needed computational power. This review explores the state of the art in computational drug discovery, examining the latest methods and technologies, their transformative impact on the drug development pipeline, and the future directions needed to overcome remaining limitations. Finally, we summarize relevant data and highlight cases where various computational approaches were successfully applied to develop novel inhibitors, as presented during the ISTH 2024 Congress.
Affiliation(s)
- Soroush Mozaffari
- Department of Biochemistry, Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, the Netherlands
- Agnethe Moen
- Department of Biochemistry, Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, the Netherlands
- Che Yee Ng
- Hillmark B.V., Maastricht, the Netherlands
- Gerry A.F. Nicolaes
- Department of Biochemistry, Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, the Netherlands
- Hillmark B.V., Maastricht, the Netherlands
- Kanin Wichapong
- Department of Biochemistry, Cardiovascular Research Institute Maastricht (CARIM), Maastricht University, Maastricht, the Netherlands
- Hillmark B.V., Maastricht, the Netherlands
7
Sharo C, Zhang J, Zhai T, Bao J, Garcia-Epelboim A, Mamourian E, Shen L, Huang Z. Repurposing FDA-Approved Drugs Against Potential Drug Targets Involved in Brain Inflammation Contributing to Alzheimer's Disease. Targets (Basel) 2024; 2:446-469. [PMID: 39897171 PMCID: PMC11786951 DOI: 10.3390/targets2040025] [Indexed: 02/04/2025]
Abstract
Alzheimer's disease is a neurodegenerative disease that continues to have a rising number of cases. While extensive research has been conducted in the last few decades, only a few drugs have been approved by the FDA for treatment, and even fewer aim to be curative rather than manage symptoms. There remains an urgent need for understanding disease pathogenesis, as well as identifying new targets for further drug discovery. Alzheimer's disease (AD) is known to stem from a build-up of amyloid beta (Aβ) plaques as well as tangles of tau proteins. Furthermore, inflammation in the brain is known to arise from the degeneration of tissue and the build-up of insoluble material. Therefore, there is a potential link between the pathology of AD and inflammation in the brain, especially as the disease progresses to later stages where neuronal death and degeneration levels are higher. Proteins that are relevant to both brain inflammation and AD thus make ideal potential targets for therapeutics; however, the proteins need to be evaluated to determine which targets would be ideal for potential drug therapeutic treatments, or 'druggable'. Druggability analysis was conducted using two structure-based methods (i.e., Drug-Like Density analysis and SiteMap), as well as a sequence-based approach, SPIDER. The most druggable targets were then evaluated using single-nuclei sequencing data for their clinical relevance to inflammation in AD. For each of the top five targets, small molecule docking was used to evaluate which FDA-approved drugs were able to bind with the chosen proteins. The top targets included DRD2 (inhibits adenylyl cyclase activity), C9 (binds with C5B8 to form the membrane attack complex), C4b (binds with C2a to form C3 convertase), C5AR1 (GPCR that binds C5a), and GABA-A-R (GPCR involved in inhibiting neurotransmission). Each target had multiple potential inhibitors from the FDA-approved drug list with decent binding affinities. Among these inhibitors, two drugs were found as top inhibitors for more than one protein target. They are C15H14N2O2 and v316 (Paracetamol), used to treat pain/inflammation originally for cataracts and relieve headaches/fever, respectively. These results provide the groundwork for further experimental investigation or clinical trials.
Affiliation(s)
- Catherine Sharo
- Department of Chemical and Biological Engineering, Villanova University, Villanova, PA 19085, USA
- Jiayu Zhang
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Tianhua Zhai
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jingxuan Bao
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Andrés Garcia-Epelboim
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Elizabeth Mamourian
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Zuyi Huang
- Department of Chemical and Biological Engineering, Villanova University, Villanova, PA 19085, USA
8
Zhang J, Basu S, Zhang F, Kurgan L. MERIT: Accurate Prediction of Multi Ligand-binding Residues with Hybrid Deep Transformer Network, Evolutionary Couplings and Transfer Learning. J Mol Biol 2024:168872. [PMID: 40133785 DOI: 10.1016/j.jmb.2024.168872] [Received: 09/13/2024] [Revised: 10/30/2024] [Accepted: 11/15/2024] [Indexed: 03/27/2025]
Abstract
Multi-ligand binding residues (MLBRs) are amino acids in protein sequences that interact with multiple different ligands, including proteins, peptides, nucleic acids, and a variety of small molecules. MLBRs are implicated in a number of cellular functions and are targeted in the context of multiple human diseases. There are many sequence-based predictors of residues that interact with specific ligand types, and they can be used collectively to identify MLBRs. However, there are no methods that directly predict MLBRs. To this end, we conceptualize, design, evaluate and release MERIT (Multi-binding rEsidues pRedIcTor). This tool relies on a custom-crafted deep neural network that implements a number of innovative features, such as a multi-layered/step architecture with transformer modules that we train using a custom-designed loss function, computation of evolutionary couplings, and application of transfer learning. These innovations boost predictive performance, which we demonstrate using an ablation analysis. In particular, they reduce the number of cross-predictions, defined as residues that interact with a single ligand type but are incorrectly predicted as MLBRs. We compare MERIT against a representative selection of current and popular ligand-specific predictors, meta-predictors that combine their results to identify MLBRs, and a baseline regression-based predictor. These tests reveal that MERIT provides accurate predictions and statistically outperforms these alternatives. Moreover, using two test datasets, one with MLBRs and another with only single-ligand binding residues, we show that MERIT consistently produces relatively low false positive rates, including low rates of cross-predictions. The web server and datasets from this study are freely available at http://biomine.cs.vcu.edu/servers/MERIT/.
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Fuhao Zhang
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
9
Utgés JS, Barton GJ. Comparative evaluation of methods for the prediction of protein-ligand binding sites. J Cheminform 2024; 16:126. [PMID: 39529176 PMCID: PMC11552181 DOI: 10.1186/s13321-024-00923-z] [Received: 08/02/2024] [Accepted: 10/28/2024] [Indexed: 11/16/2024] Open
Abstract
The accurate identification of protein-ligand binding sites is of critical importance in understanding and modulating protein function. Accordingly, ligand binding site prediction has remained a research focus for over three decades with over 50 methods developed and a change of paradigm from geometry-based to machine learning. In this work, we collate 13 ligand binding site predictors, spanning 30 years, focusing on the latest machine learning-based methods such as VN-EGNN, IF-SitePred, GrASP, PUResNet, and DeepPocket and compare them to the established P2Rank, PRANK and fpocket and earlier methods like PocketFinder, Ligsite and Surfnet. We benchmark the methods against the human subset of our new curated reference dataset, LIGYSIS. LIGYSIS is a comprehensive protein-ligand complex dataset comprising 30,000 proteins with bound ligands which aggregates biologically relevant unique protein-ligand interfaces across biological units of multiple structures from the same protein. LIGYSIS is an improvement for testing methods over earlier datasets like sc-PDB, PDBbind, binding MOAD, COACH420 and HOLO4K which either include 1:1 protein-ligand complexes or consider asymmetric units. Re-scoring of fpocket predictions by PRANK and DeepPocket displays the highest recall (60%) whilst IF-SitePred presents the lowest recall (39%). We demonstrate the detrimental effect that redundant prediction of binding sites has on performance as well as the beneficial impact of stronger pocket scoring schemes, with improvements up to 14% in recall (IF-SitePred) and 30% in precision (Surfnet). Finally, we propose top-N+2 recall as the universal benchmark metric for ligand binding site prediction and urge authors to share not only the source code of their methods, but also of their benchmark.
Scientific contributions
This study conducts the largest benchmark of ligand binding site prediction methods to date, comparing 13 original methods and 15 variants using 10 informative metrics. The LIGYSIS dataset is introduced, which aggregates biologically relevant protein-ligand interfaces across multiple structures of the same protein. The study highlights the detrimental effect of redundant binding site prediction and demonstrates significant improvement in recall and precision through stronger scoring schemes. Finally, top-N+2 recall is proposed as a universal benchmark metric for ligand binding site prediction, with a recommendation for open-source sharing of both methods and benchmarks.
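The abstract names top-N+2 recall without spelling it out. A common reading in the pocket-prediction literature is: for a protein with N observed sites, keep only the N+2 best-ranked predicted pockets and count an observed site as recalled if at least one kept prediction lies close enough to it. The sketch below follows that reading. The centre-to-centre distance criterion, the `max_dist` parameter, and all names and coordinates are assumptions made for illustration, not the benchmark's own code.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def top_n_plus_2_recall(
    proteins: Sequence[tuple[Sequence[Point], Sequence[Point]]],
    max_dist: float,
) -> float:
    """Fraction of observed sites recovered by the top N+2 predictions.

    Each protein is (observed_site_centres, predicted_centres), with the
    predictions already ranked best first. `max_dist` is the assumed
    centre-to-centre distance cutoff in angstroms.
    """
    recalled = total = 0
    for observed, ranked in proteins:
        kept = ranked[: len(observed) + 2]  # N observed sites -> keep N+2
        for site in observed:
            total += 1
            if any(math.dist(site, pred) <= max_dist for pred in kept):
                recalled += 1
    return recalled / total if total else 0.0


# Hypothetical data: protein A has 1 site, protein B has 2 sites.
protein_a = ([(0.0, 0.0, 0.0)],
             [(50.0, 0.0, 0.0), (40.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
protein_b = ([(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
             [(0.0, 1.0, 0.0), (100.0, 0.0, 0.0)])
recall = top_n_plus_2_recall([protein_a, protein_b], max_dist=4.0)
print(f"top-N+2 recall = {recall:.3f}")
```

Allowing two extra predictions beyond N gives a method some slack for ranking a correct pocket slightly too low, while still penalising methods that only succeed by emitting many redundant pockets.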
Affiliation(s)
- Javier S Utgés
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
- Geoffrey J Barton
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
10
Vural O, Jololian L, Pan L. DeepLigType: Predicting Ligand Types of Protein-Ligand Binding Sites Using a Deep Learning Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; PP:116-123. [PMID: 39509302 DOI: 10.1109/tcbb.2024.3493820] [Indexed: 11/15/2024]
Abstract
The analysis of protein-ligand binding sites plays a crucial role in the initial stages of drug discovery. Accurately predicting the ligand types that are likely to bind to protein-ligand binding sites enables more informed decision making in drug design. Our study, DeepLigType, determines protein-ligand binding sites using Fpocket and then predicts the ligand type of these pockets with a deep learning model, the Convolutional Block Attention Module (CBAM) with ResNet. CBAM-ResNet has been trained to accurately predict five distinct ligand types. We classified protein-ligand binding sites into five categories according to the type of response that ligands cause when they bind to their target proteins: antagonist, agonist, activator, inhibitor, and others. We created a novel dataset, referred to as LigType5, from the widely recognized PDBbind and scPDB datasets for training and testing our model. While the literature mostly focuses on the specificity and characteristic analysis of protein binding sites by experimental (laboratory-based) methods, we propose a computational method with the DeepLigType architecture. DeepLigType demonstrated an accuracy of 74.30% and an AUC of 0.83 in ligand type prediction on a novel test dataset using the CBAM-ResNet deep learning model.
11
Liu F, Zhou H, Li X, Zhou L, Yu C, Zhang H, Bu D, Liang X. GPCR-BSD: a database of binding sites of human G-protein coupled receptors under diverse states. BMC Bioinformatics 2024; 25:343. [PMID: 39497074 PMCID: PMC11533411 DOI: 10.1186/s12859-024-05962-9] [Received: 07/16/2024] [Accepted: 10/16/2024] [Indexed: 11/06/2024] Open
Abstract
G-protein coupled receptors (GPCRs), the largest family of membrane proteins in the human body, are involved in a great variety of biological processes and thus have become highly valuable drug targets. By binding with ligands (e.g., drugs), GPCRs switch between active and inactive conformational states, thereby performing functions such as signal transmission. The changes in binding pockets under different states are important for a better understanding of drug-target interactions. Therefore it is critical, as well as a practical need, to obtain binding sites in human GPCR structures. We report a database (called GPCR-BSD) that collects 127,990 predicted binding sites of 803 GPCRs under active and inactive states (thus 1,606 structures in total). The binding sites were identified from the predicted GPCR structures by executing three geometry-based pocket prediction methods: fpocket, CavityPlus and GHECOM. The server provides query, visualization, and comparison of the predicted binding sites for both predicted GPCR structures and experimentally determined structures recorded in the PDB. We evaluated the identified pockets of 132 experimentally determined human GPCR structures in terms of pocket residue coverage, pocket center distance and redocking accuracy. The evaluation showed that the fpocket and CavityPlus methods performed better and successfully predicted orthosteric binding sites in over 60% of the 132 experimentally determined structures. The GPCR Binding Site database is freely accessible at https://gpcrbs.bigdata.jcmsc.cn. This study not only provides a systematic evaluation of the commonly used fpocket and CavityPlus methods for the first time but also meets the need for binding site information in GPCR studies.
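Two of the evaluation criteria mentioned in this abstract, pocket residue coverage and pocket center distance, can be illustrated with a short sketch. The abstract does not define either quantity, so the definitions used here are assumptions: coverage is taken as the fraction of reference pocket residues that also appear in the predicted pocket, and center distance as the Euclidean distance between the centroids of two coordinate sets. All names and values are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def residue_coverage(reference: set[int], predicted: set[int]) -> float:
    """Assumed definition: share of reference pocket residues also predicted."""
    if not reference:
        return 0.0
    return len(reference & predicted) / len(reference)


def centroid(coords: Sequence[Point]) -> Point:
    """Arithmetic mean of a non-empty list of 3D coordinates."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def center_distance(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Assumed definition: distance between the two pocket centroids."""
    return math.dist(centroid(coords_a), centroid(coords_b))


# Hypothetical residue numbers and atom coordinates (angstroms).
coverage = residue_coverage({10, 11, 12, 13}, {11, 12, 13, 14, 15})
distance = center_distance(
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 3.0, 0.0), (1.0, 5.0, 0.0)],
)
print(f"coverage={coverage:.2f} center distance={distance:.1f}")
```

The two measures complement each other: coverage rewards overlap in residue composition, whereas center distance penalises a prediction that overlaps but is displaced from the reference site.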
Collapse
Affiliation(s)
- Fan Liu
- University of Chinese Academy of Sciences, Beijing, 101408, China
- Key Laboratory of Phytochemistry and Natural Medicines, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
| | - Han Zhou
- University of Chinese Academy of Sciences, Beijing, 101408, China
- Key Laboratory of Phytochemistry and Natural Medicines, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
- Jiangxi Provincial Key Laboratory for Pharmacodynamic Material Basis of Traditional Chinese Medicine, Ganjiang Chinese Medicine Innovation Center, Nanchang, 330000, Jiangxi, China
| | - Xiaonong Li
- Jiangxi Provincial Key Laboratory for Pharmacodynamic Material Basis of Traditional Chinese Medicine, Ganjiang Chinese Medicine Innovation Center, Nanchang, 330000, Jiangxi, China
| | - Liangliang Zhou
- Jiangxi Provincial Key Laboratory for Pharmacodynamic Material Basis of Traditional Chinese Medicine, Ganjiang Chinese Medicine Innovation Center, Nanchang, 330000, Jiangxi, China
| | - Chungong Yu
- University of Chinese Academy of Sciences, Beijing, 101408, China
- SKLP, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
| | - Haicang Zhang
- University of Chinese Academy of Sciences, Beijing, 101408, China
- SKLP, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
| | - Dongbo Bu
- University of Chinese Academy of Sciences, Beijing, 101408, China.
- SKLP, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
- Central China Institute of Artificial Intelligence, Zhengzhou, 450046, Henan, China.
| | - Xinmiao Liang
- University of Chinese Academy of Sciences, Beijing, 101408, China.
- Key Laboratory of Phytochemistry and Natural Medicines, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China.
- Jiangxi Provincial Key Laboratory for Pharmacodynamic Material Basis of Traditional Chinese Medicine, Ganjiang Chinese Medicine Innovation Center, Nanchang, 330000, Jiangxi, China.
| |
|
12
|
Dong S, Fan C, Wang M, Patil S, Li J, Huang L, Chen Y, Guo H, Liu Y, Pan M, Ma L, Chen F. Development of a carbohydrate-binding protein prediction algorithm using structural features of stacking aromatic rings. Int J Biol Macromol 2024; 281:136553. [PMID: 39401628 DOI: 10.1016/j.ijbiomac.2024.136553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Revised: 10/03/2024] [Accepted: 10/11/2024] [Indexed: 10/20/2024]
Abstract
Carbohydrate-protein interactions play fundamental roles in numerous aspects of biological activities, and the search for new carbohydrate (CHO)-binding proteins (CBPs) has long been a research focus. In this study, through the analysis of CBP structures, we identified significant enrichment of aromatic residues in CHO-binding regions. We further summarized the structural features of these aromatic rings within the CHO-stacking region, namely "exposing" and "proximity" features, and developed a screening algorithm that can identify CHO-stacking Trp (tryptophan) residues based on these two features. Our Trp screening algorithm can achieve high accuracy in both CBP (specificity score 0.93) and CBS (carbohydrate binding site, precision score 0.77) prediction using experimentally determined protein structures. We also applied our screening algorithm to AlphaFold pan-species predicted models and observed significant enrichment of carbohydrate-related functions in predicted CBP candidates across different species. Moreover, through carbohydrate arrays, we experimentally verified the CHO-binding ability of four candidate proteins, which further confirms the robustness of the algorithm. This study provides another perspective on proteome-wide CBP and CBS prediction. Our results not only help to reveal the structural mechanism of CHO-binding, but also provide a pan-species CBP dataset for future CHO-protein interaction exploration.
Affiliation(s)
- Shaowei Dong
- Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China; Department of Obstetrics and Gynecology, Department of Pediatrics, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, Guangdong-Hong Kong-Macao Greater Bay Area Higher Education Joint Laboratory of Maternal-Fetal Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.
| | - Chuiqin Fan
- Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
| | - Manna Wang
- Department of Obstetrics and Gynecology, Department of Pediatrics, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, Guangdong-Hong Kong-Macao Greater Bay Area Higher Education Joint Laboratory of Maternal-Fetal Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
| | - Sandip Patil
- Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
| | - Jun Li
- Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
| | - Liangping Huang
- Department of Obstetrics and Gynecology, Department of Pediatrics, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, Guangdong-Hong Kong-Macao Greater Bay Area Higher Education Joint Laboratory of Maternal-Fetal Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
| | - Yuanguo Chen
- Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
| | - Huijie Guo
- Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
| | - Yanbing Liu
- Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China
| | - Mengwen Pan
- Department of Obstetrics and Gynecology, Department of Pediatrics, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, Guangdong-Hong Kong-Macao Greater Bay Area Higher Education Joint Laboratory of Maternal-Fetal Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.
| | - Lian Ma
- Department of Hematology and Oncology, Shenzhen Children's Hospital, Shenzhen 518038, China.
| | - Fuyi Chen
- Department of Obstetrics and Gynecology, Department of Pediatrics, Guangdong Provincial Key Laboratory of Major Obstetric Diseases, Guangdong Provincial Clinical Research Center for Obstetrics and Gynecology, Guangdong-Hong Kong-Macao Greater Bay Area Higher Education Joint Laboratory of Maternal-Fetal Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China.
| |
|
13
|
Ugurlu SY, McDonald D, He S. MEF-AlloSite: an accurate and robust Multimodel Ensemble Feature selection for the Allosteric Site identification model. J Cheminform 2024; 16:116. [PMID: 39444016 PMCID: PMC11515501 DOI: 10.1186/s13321-024-00882-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Accepted: 07/09/2024] [Indexed: 10/25/2024] Open
Abstract
A crucial mechanism for controlling the actions of proteins is allostery. Allosteric modulators have the potential to provide many benefits compared to orthosteric ligands, such as increased selectivity and saturability of their effect. The identification of new allosteric sites presents prospects for the creation of innovative medications and enhances our comprehension of fundamental biological mechanisms. Allosteric sites are increasingly found in different protein families through various techniques, such as machine learning applications, which opens up possibilities for creating completely novel medications with a diverse variety of chemical structures. Machine learning methods, such as PASSer, exhibit limited efficacy in accurately finding allosteric binding sites when relying solely on 3D structural information. SCIENTIFIC CONTRIBUTION: Prior to conducting feature selection for allosteric binding site identification, integration of supporting amino-acid-based information with 3D structural knowledge is advantageous. This approach can enhance performance by ensuring accuracy and robustness. Therefore, we have developed an accurate and robust model called Multimodel Ensemble Feature Selection for Allosteric Site Identification (MEF-AlloSite) after collecting 9460 relevant and diverse features from the literature to characterise pockets. The model employs an accurate and robust multimodel feature selection technique for the small training set size of only 90 proteins to improve predictive performance. This state-of-the-art technique increased the performance in allosteric binding site identification by selecting promising features from the 9460 features. Also, analysing the relationship between the selected features and allosteric binding sites improved the understanding of complex allostery in proteins.
MEF-AlloSite and state-of-the-art allosteric site identification methods such as PASSer2.0 and PASSerRank have been tested on three test cases 51 times, each with a different split of the training set. Student's t-test and Cohen's d have been used to evaluate the distributions of average precision and ROC AUC scores. On the three test cases, most of the p-values (< 0.05) and the majority of Cohen's d values (> 0.5) showed that the 1-6% higher mean average precision and ROC AUC of MEF-AlloSite relative to state-of-the-art allosteric site identification methods is statistically significant.
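The statistical comparison described above can be sketched as follows. This is a minimal illustration, assuming the pooled-standard-deviation form of Cohen's d and the equal-variance two-sample Student's t statistic; the abstract does not specify which variants the authors used, and the function names are hypothetical. Converting the t statistic to a p-value requires the t distribution, which the standard library does not provide (a statistics package such as SciPy would typically be used for that step).

```python
import math
import statistics

def pooled_std(a, b):
    """Pooled sample standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))

def cohens_d(a, b):
    """Effect size: difference of means in units of the pooled standard deviation."""
    return (statistics.mean(a) - statistics.mean(b)) / pooled_std(a, b)

def student_t(a, b):
    """Two-sample Student's t statistic under the equal-variance assumption."""
    na, nb = len(a), len(b)
    standard_error = pooled_std(a, b) * math.sqrt(1 / na + 1 / nb)
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

# Hypothetical score lists standing in for repeated runs of two methods.
method_a_scores = [8, 10, 12]
method_b_scores = [3, 5, 7]
effect_size = cohens_d(method_a_scores, method_b_scores)
t_value = student_t(method_a_scores, method_b_scores)
```

In the setting of the abstract, each list would hold one score (average precision or ROC AUC) per training-set split.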
Affiliation(s)
- Sadettin Y Ugurlu
- School of Computer Science, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
| | | | - Shan He
- School of Computer Science, University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK.
- AIA Insights Ltd, Birmingham, UK.
| |
|
14
|
Wang X, Xu K, Fu H, Chen Q, Zhao B, Zhao X, Zhou J. Enhancing substrate specificity of microbial transglutaminase for precise nanobody labeling. Synth Syst Biotechnol 2024; 10:185-193. [PMID: 39552758 PMCID: PMC11564792 DOI: 10.1016/j.synbio.2024.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Revised: 10/11/2024] [Accepted: 10/14/2024] [Indexed: 11/19/2024] Open
Abstract
Streptomyces mobaraensis transglutaminase (smTG) can be used for site-specific labeling of proteins with chemical groups. Here, we explored the use of modified smTG for the biosynthesis of nanobody-fluorophore conjugates (NFC). smTG catalyzes the conjugation of acyl donors containing glutamine with lysine-containing acceptors, which can lead to non-specific cross-linking. To achieve precise site-specific labeling, we employed molecular docking and virtual mutagenesis to redesign the enzyme's substrate specificity towards the peptide GGGGQR, a non-preferred acyl donor for smTG. Starting with a thermostable and highly active smTG variant (TGm2), we identified that single mutations G250H and Y278E significantly enhanced activity against GGGGQR, increasing it by 41% and 1.13-fold, respectively. Notably, the Y278E mutation dramatically shifted the enzyme's substrate preference, with the activity ratio against GGGGQR versus the standard substrate CBZ-Gln-Gly rising from 0.05 to 0.93. In case studies, we used nanobodies 1C12 and 7D12 as labeling targets, catalyzing their conjugation with a synthetic fluorophore via smTG variants. Nanobodies fused with GGGGQR were successfully site-specifically labeled by TGm2-Y278E, in contrast to non-specific labeling observed with other variants. These results suggest that engineering smTG for site-specific labeling is a promising approach for the biosynthesis of antibody-drug conjugates.
Affiliation(s)
- Xinglong Wang
- School of Food Science and Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
| | - Kangjie Xu
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
| | - Haoran Fu
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
| | - Qiming Chen
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
| | - Beichen Zhao
- Department of Chemical and Materials Engineering, The University of Auckland, Auckland, 1010, New Zealand
| | - Xinyi Zhao
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
| | - Jingwen Zhou
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu, 214122, China
- Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi, 214122, China
| |
|
15
|
Lee D, Hwang W, Byun J, Shin B. Turbocharging protein binding site prediction with geometric attention, inter-resolution transfer learning, and homology-based augmentation. BMC Bioinformatics 2024; 25:306. [PMID: 39304807 DOI: 10.1186/s12859-024-05923-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 09/05/2024] [Indexed: 09/22/2024] Open
Abstract
BACKGROUND Locating small molecule binding sites in target proteins, at the resolution of either pocket or residue, is critical in many drug-discovery scenarios. Since it is not always easy to find such binding sites using conventional methods, different deep learning methods to predict binding sites from protein structures have been developed in recent years. The existing deep learning based methods have several limitations, including (1) the inefficiency of the CNN-only architecture, (2) loss of information due to excessive post-processing, and (3) the under-utilization of available data sources. METHODS We present a new model architecture and training method that resolves the aforementioned problems. First, by layering geometric self-attention units on top of residue-level 3D CNN outputs, our model overcomes the problems of CNN-only architectures. Second, by configuring the fundamental units of computation as residues and pockets instead of voxels, our method reduces the information loss from post-processing. Lastly, by employing inter-resolution transfer learning and homology-based augmentation, our method maximizes the utilization of available data sources to a significant extent. RESULTS The proposed method significantly outperformed all state-of-the-art baselines at both resolutions, pocket and residue. An ablation study demonstrated the indispensability of our proposed architecture, as well as transfer learning and homology-based augmentation, for achieving optimal performance. We further scrutinized our model's performance through a case study involving human serum albumin, which demonstrated our model's superior capability in identifying multiple binding sites of the protein, outperforming the existing methods. CONCLUSIONS We believe that our contribution to the literature is twofold.
Firstly, we introduce a novel computational method for binding site prediction with practical applications, substantiated by its strong performance across diverse benchmarks and case studies. Secondly, the innovative aspects of our method (specifically, the design of the model architecture, inter-resolution transfer learning, and homology-based augmentation) would serve as useful components for future work.
Affiliation(s)
| | | | | | - Bonggun Shin
- Deargen, Seoul, Republic of Korea.
- SK Life Science, Inc., Paramus, NJ, USA.
| |
|
16
|
Mai TT, Lam TP, Pham LHD, Nguyen KH, Nguyen QT, Le MT, Thai KM. Toward Unveiling Putative Binding Sites of Interleukin-33: Insights from Mixed-Solvent Molecular Dynamics Simulations of the Interleukin-1 Family. J Phys Chem B 2024; 128:8362-8375. [PMID: 39178050 DOI: 10.1021/acs.jpcb.4c03057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2024]
Abstract
The interleukin (IL)-1 family is a major proinflammatory cytokine family, ranging from the well-studied IL-1s to the most recently discovered IL-33. As a new focus, IL-33 has attracted extensive research for its crucial immunoregulatory roles, leading to the development of notable monoclonal antibodies as clinical candidates. Small molecules that disrupt the IL-33/ST2 interaction remain highly desired, but efforts to develop them encounter challenges due to the shallow and featureless interfaces. Information from related cytokines has shown that traditional binding site identification methods still struggle to map cryptic sites, necessitating dynamic approaches to uncover druggable pockets on IL-33. Here, we employed mixed-solvent molecular dynamics (MixMD) simulations with diverse-property probes to map the hotspots of IL-33 and identify potential binding sites. The protocol was first validated using the known binding sites of two IL-1 family members and then applied to the structure of IL-33. Our simulations revealed several binding sites and proposed side-chain rearrangements essential for the binding of a known inhibitor, aligning well with experimental NMR findings. Further microsecond-time scale simulations of this IL-33-protein complex unveiled distinct binding modes with varying occurrences. These results could facilitate future efforts in developing ligands to target challenging flexible pockets of IL-33 and IL-1 family cytokines in general.
Affiliation(s)
- Tan Thanh Mai
- Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam
| | - Thua-Phong Lam
- Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam
- Department of Cell and Molecular Biology, Uppsala University, Uppsala 75124, Sweden
| | - Long-Hung Dinh Pham
- Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam
- Department of Chemistry, Imperial College London, London W12 0BZ, United Kingdom
| | - Kim-Hung Nguyen
- Department of Biochemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam
| | - Quoc-Thai Nguyen
- Department of Biochemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam
| | - Minh-Tri Le
- Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam
- University of Health Sciences, Vietnam National University Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam
- Research Center for Discovery and Development of Healthcare Products, Vietnam National University Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam
| | - Khac-Minh Thai
- Department of Medicinal Chemistry, Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City 700000, Vietnam
| |
|
17
|
Moyano-Gómez P, Lehtonen JV, Pentikäinen OT, Postila PA. Building shape-focused pharmacophore models for effective docking screening. J Cheminform 2024; 16:97. [PMID: 39123240 PMCID: PMC11312248 DOI: 10.1186/s13321-024-00857-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 05/12/2024] [Indexed: 08/12/2024] Open
Abstract
The performance of molecular docking can be improved by comparing the shape similarity of the flexibly sampled poses against the target proteins' inverted binding cavities. The effectiveness of these pseudo-ligands or negative image-based models in docking rescoring is boosted further by performing enrichment-driven optimization. Here, we introduce a novel shape-focused pharmacophore modeling algorithm O-LAP that generates a new class of cavity-filling models by clumping together overlapping atomic content via pairwise distance graph clustering. Top-ranked poses of flexibly docked active ligands were used as the modeling input and multiple alternative clustering settings were benchmark-tested thoroughly with five demanding drug targets using random training/test divisions. In docking rescoring, the O-LAP modeling typically improved massively on the default docking enrichment; furthermore, the results indicate that the clustered models work well in rigid docking. The C++/Qt5-based algorithm O-LAP is released under the GNU General Public License v3.0 via GitHub (https://github.com/jvlehtonen/overlap-toolkit). SCIENTIFIC CONTRIBUTION: This study introduces O-LAP, a C++/Qt5-based graph clustering software for generating a new type of shape-focused pharmacophore models. In the O-LAP modeling, the target protein cavity is filled with flexibly docked active ligands, the overlapping ligand atoms are clustered, and the shape/electrostatic potential of the resulting model is compared against the flexibly sampled molecular docking poses. The O-LAP modeling is shown to ensure high enrichment in both docking rescoring and rigid docking based on comprehensive benchmark-testing.
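The idea of clumping overlapping atoms through a pairwise distance graph can be illustrated with a generic sketch. This is not the O-LAP implementation (which is written in C++/Qt5); it is a hypothetical single-linkage example in which atoms closer than an assumed cutoff are joined by an edge, connected components form clusters, and each cluster is represented by its centroid. The function names and the cutoff value are illustrative assumptions.

```python
import math

def cluster_atoms(coords, cutoff):
    """Group atom indices into connected components of the distance graph.

    Two atoms share an edge when their distance is <= cutoff (assumed rule).
    """
    parent = list(range(len(coords)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(coords)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def cluster_centroids(coords, clusters):
    """Replace each cluster of overlapping atoms by its mean position."""
    return [
        tuple(sum(coords[i][axis] for i in cluster) / len(cluster) for axis in range(3))
        for cluster in clusters
    ]
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a handful of docked ligands; a spatial grid or k-d tree would be the usual choice for larger inputs.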
Affiliation(s)
- Paola Moyano-Gómez
- MedChem.fi, Institute of Biomedicine, Integrative Physiology and Pharmacology, University of Turku, 20014, Turku, Finland
- InFLAMES Research Flagship, University of Turku, 20014, Turku, Finland
| | - Jukka V Lehtonen
- Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, 20500, Turku, Finland
- InFLAMES Research Flagship, Åbo Akademi University, 20500, Turku, Finland
| | - Olli T Pentikäinen
- MedChem.fi, Institute of Biomedicine, Integrative Physiology and Pharmacology, University of Turku, 20014, Turku, Finland
- InFLAMES Research Flagship, University of Turku, 20014, Turku, Finland
- Aurlide Ltd, Lemminkäisenkatu 14A, 20520, Turku, Finland
| | - Pekka A Postila
- MedChem.fi, Institute of Biomedicine, Integrative Physiology and Pharmacology, University of Turku, 20014, Turku, Finland.
- InFLAMES Research Flagship, University of Turku, 20014, Turku, Finland.
- Aurlide Ltd, Lemminkäisenkatu 14A, 20520, Turku, Finland.
| |
|
18
|
Ishitani R, Takemoto M, Tomii K. Protein ligand binding site prediction using graph transformer neural network. PLoS One 2024; 19:e0308425. [PMID: 39106255 DOI: 10.1371/journal.pone.0308425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 07/23/2024] [Indexed: 08/09/2024] Open
Abstract
Ligand binding site prediction is a crucial initial step in structure-based drug discovery. Although several methods have been proposed previously, including those using geometry-based and machine learning techniques, their accuracy is still considered insufficient. In this study, we introduce an approach that leverages a graph transformer neural network to rank the results of a geometry-based pocket detection method. We also created a training dataset larger than the conventionally used sc-PDB and investigated the correlation between the dataset size and prediction performance. Our findings indicate that utilizing a graph transformer-based method alongside a larger training dataset could enhance the performance of ligand binding site prediction.
Affiliation(s)
- Ryuichiro Ishitani
- Division of Computational Drug Discovery and Design, Medical Research Institute, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Preferred Networks, Inc., Chiyoda-ku, Tokyo, Japan
| | - Mizuki Takemoto
- Division of Computational Drug Discovery and Design, Medical Research Institute, Tokyo Medical and Dental University, Bunkyo-ku, Tokyo, Japan
| | - Kentaro Tomii
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo, Japan
| |
|
19
|
Wang X, Xu K, Zeng X, Linghu K, Zhao B, Yu S, Wang K, Yu S, Zhao X, Zeng W, Wang K, Zhou J. Machine learning-assisted substrate binding pocket engineering based on structural information. Brief Bioinform 2024; 25:bbae381. [PMID: 39101501 PMCID: PMC11299021 DOI: 10.1093/bib/bbae381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 05/25/2024] [Accepted: 07/23/2024] [Indexed: 08/06/2024] Open
Abstract
Engineering enzyme-substrate binding pockets is the most efficient approach for modifying catalytic activity, but is limited if the substrate binding sites are indistinct. Here, we developed a 3D convolutional neural network for predicting protein-ligand binding sites. The network integrates DenseNet, UNet, and self-attention for extracting features and recovering sample size. We attempted to enlarge the dataset by data augmentation, and the model achieved success rates of 48.4%, 35.5%, and 43.6% at a precision of ≥50%, and of 52%, 47.6%, and 58.1% when the distance between the predicted and real centers is ≤4 Å, on the SC6K, COACH420, and BU48 validation datasets. The substrate binding sites of Klebsiella variicola acid phosphatase (KvAP) and Bacillus anthracis proline 4-hydroxylase (BaP4H) were predicted using DUnet, which showed highly competitive performance: 53.8% and 56% of the predicted binding sites critically affected the catalysis of KvAP and BaP4H. Virtual saturation mutagenesis was applied based on the predicted binding sites of KvAP, and the top-ranked 10 single mutations that contributed to stronger enzyme-substrate binding varied when the predicted sites were different. The advantage of DUnet in predicting key residues responsible for enzyme activity further promoted the success rate of virtual mutagenesis. This study highlights the significance of correctly predicting key binding sites for enzyme engineering.
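The ≤4 Å center-distance criterion mentioned in this abstract is commonly turned into a dataset-level success rate. The sketch below assumes one predicted center per protein, paired by position with the real center, and counts a prediction as a success when the two centers lie within the threshold; the function name and the pairing scheme are illustrative assumptions rather than the authors' evaluation code.

```python
import math

def center_success_rate(predicted_centers, real_centers, threshold=4.0):
    """Fraction of proteins whose predicted pocket center lies within
    `threshold` (same units as the coordinates, e.g. angstroms) of the real one."""
    if len(predicted_centers) != len(real_centers):
        raise ValueError("need one predicted center per real center")
    if not real_centers:
        raise ValueError("empty dataset")
    hits = sum(
        1
        for predicted, real in zip(predicted_centers, real_centers)
        if math.dist(predicted, real) <= threshold
    )
    return hits / len(real_centers)
```

Whether a distance exactly equal to the threshold counts as a success is a convention; this sketch treats it as a success, matching the "≤" in the abstract.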
Affiliation(s)
- Xinglong Wang
- School of Food Science and Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Kangjie Xu
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Xuan Zeng
- Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Kai Linghu
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Beichen Zhao
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Shangyang Yu
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Kun Wang
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Shuyao Yu
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Xinyi Zhao
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Weizhu Zeng
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Kai Wang
- Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| | - Jingwen Zhou
- Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
| |
|
20
|
Shen A, Yuan M, Ma Y, Du J, Wang M. PGBind: pocket-guided explicit attention learning for protein-ligand docking. Brief Bioinform 2024; 25:bbae455. [PMID: 39293803 PMCID: PMC11410380 DOI: 10.1093/bib/bbae455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 08/07/2024] [Accepted: 08/31/2024] [Indexed: 09/20/2024] Open
Abstract
As more and more protein structures are discovered, blind protein-ligand docking will play an important role in drug discovery because it can predict protein-ligand complex conformation without pocket information on the target proteins. Recently, deep learning-based methods have made significant advancements in blind protein-ligand docking, but their protein features are suboptimal because they do not fully consider the difference between potential pocket regions and non-pocket regions in protein feature extraction. In this work, we propose a pocket-guided strategy for guiding the ligand to dock to potential docking regions on a protein. To this end, we design a plug-and-play module to enhance the protein features, which can be directly incorporated into existing deep learning-based blind docking methods. The proposed module first estimates potential pocket regions on the target protein and then leverages a pocket-guided attention mechanism to enhance the protein features. Experiments were conducted by integrating our method with EquiBind and FABind; the results show that the blind-docking performance of both is significantly improved, and that integration with FABind achieves new state-of-the-art performance.
Affiliation(s)
- Ao Shen
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, Shanghai 200032, China
| | - Mingzhi Yuan
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, Shanghai 200032, China
| | - Yingfan Ma
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, Shanghai 200032, China
| | - Jie Du
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, Shanghai 200032, China
| | - Manning Wang
- Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong’an Road, Shanghai 200032, China
- Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong’an Road, Shanghai 200032, China
| |
|
21
|
Zhou R, Fan J, Li S, Zeng W, Chen Y, Zheng X, Chen H, Liao J. LVPocket: integrated 3D global-local information to protein binding pockets prediction with transfer learning of protein structure classification. J Cheminform 2024; 16:79. [PMID: 38972994 PMCID: PMC11229186 DOI: 10.1186/s13321-024-00871-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 06/12/2024] [Indexed: 07/09/2024] Open
Abstract
BACKGROUND: Previous deep learning methods for predicting protein binding pockets mainly employed 3D convolution, yet an abundance of convolution operations may lead the model to excessively prioritize local information, thus overlooking global information. Moreover, it is essential to account for the influence of diverse protein folding structural classes, because proteins in different structural classes exhibit varying biological functions, whereas those within the same structural class share similar functional attributes. RESULTS: We proposed LVPocket, a novel method that synergistically captures both local and global information of protein structure through the integration of Transformer encoders, which help the model achieve better performance in binding pocket prediction. We then tailored prediction models for four distinct structural classes of proteins using transfer learning. The four models were fine-tuned from the baseline LVPocket model, which was trained on the sc-PDB dataset. LVPocket exhibits superior performance on three independent datasets compared to current state-of-the-art methods. Additionally, the fine-tuned models outperform the baseline model. SCIENTIFIC CONTRIBUTION: We present a novel model structure for predicting protein binding pockets that addresses the problem of relying on extensive convolutional computation while neglecting global information about protein structures. Furthermore, we tackle the impact of different protein folding structures on binding pocket prediction tasks through the application of transfer learning methods.
Affiliation(s)
- Ruifeng Zhou
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Jing Fan
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Sishu Li
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Wenjie Zeng
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Yilun Chen
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Xiaoshan Zheng
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
| | - Hongyang Chen
- Research Center for Graph Computing, Zhejiang Lab, Hangzhou, 311121, Zhejiang, People's Republic of China.
| | - Jun Liao
- School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China.
- Zhejiang Lab, Hangzhou, 311121, Zhejiang, People's Republic of China.
| |
|
22
|
He X, Zhao L, Tian Y, Li R, Chu Q, Gu Z, Zheng M, Wang Y, Li S, Jiang H, Jiang Y, Wen L, Wang D, Cheng X. Highly accurate carbohydrate-binding site prediction with DeepGlycanSite. Nat Commun 2024; 15:5163. [PMID: 38886381 PMCID: PMC11183243 DOI: 10.1038/s41467-024-49516-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 06/10/2024] [Indexed: 06/20/2024] Open
Abstract
As the most abundant organic substances in nature, carbohydrates are essential for life. Understanding how carbohydrates regulate proteins in the physiological and pathological processes presents opportunities to address crucial biological problems and develop new therapeutics. However, the diversity and complexity of carbohydrates pose a challenge in experimentally identifying the sites where carbohydrates bind to and act on proteins. Here, we introduce a deep learning model, DeepGlycanSite, capable of accurately predicting carbohydrate-binding sites on a given protein structure. Incorporating geometric and evolutionary features of proteins into a deep equivariant graph neural network with the transformer architecture, DeepGlycanSite remarkably outperforms previous state-of-the-art methods and effectively predicts binding sites for diverse carbohydrates. Integrating with a mutagenesis study, DeepGlycanSite reveals the guanosine-5'-diphosphate-sugar-recognition site of an important G-protein coupled receptor. These findings demonstrate DeepGlycanSite is invaluable for carbohydrate-binding site prediction and could provide insights into molecular mechanisms underlying carbohydrate-regulation of therapeutically important proteins.
Affiliation(s)
- Xinheng He
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Lifen Zhao
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Yinping Tian
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
| | - Rui Li
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, China Pharmaceutical University, Nanjing, China
| | - Qinyu Chu
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
| | - Zhiyong Gu
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
| | - Mingyue Zheng
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
| | - Yusong Wang
- National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China
| | - Shaoning Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
| | - Hualiang Jiang
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China
- Lingang Laboratory, Shanghai, China
| | - Yi Jiang
- Lingang Laboratory, Shanghai, China
| | - Liuqing Wen
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.
- University of Chinese Academy of Sciences, Beijing, China.
| | | | - Xi Cheng
- State Key Laboratory of Drug Research and State Key Laboratory of Chemical Biology, Carbohydrate-Based Drug Research Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China.
- University of Chinese Academy of Sciences, Beijing, China.
- School of Pharmaceutical Science and Technology, Hangzhou Institute of Advanced Study, Hangzhou, China.
| |
|
23
|
Xia Y, Pan X, Shen HB. A comprehensive survey on protein-ligand binding site prediction. Curr Opin Struct Biol 2024; 86:102793. [PMID: 38447285 DOI: 10.1016/j.sbi.2024.102793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 02/18/2024] [Accepted: 02/18/2024] [Indexed: 03/08/2024]
Abstract
Protein-ligand binding site prediction is critical for protein function annotation and drug discovery. Biological experiments are time-consuming and require significant equipment, materials, and labor resources, so developing accurate and efficient computational methods for protein-ligand interaction prediction is essential. Here, we summarize the key challenges associated with ligand binding site (LBS) prediction and introduce recently published methods in terms of their input features, computational algorithms, and ligand types. Furthermore, we investigate the specificity of allosteric site identification as a particular LBS type. Finally, we discuss the prospective directions for machine learning-based LBS prediction in the near future.
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| |
|
24
|
Takahashi M, Chong HB, Zhang S, Yang TY, Lazarov MJ, Harry S, Maynard M, Hilbert B, White RD, Murrey HE, Tsou CC, Vordermark K, Assaad J, Gohar M, Dürr BR, Richter M, Patel H, Kryukov G, Brooijmans N, Alghali ASO, Rubio K, Villanueva A, Zhang J, Ge M, Makram F, Griesshaber H, Harrison D, Koglin AS, Ojeda S, Karakyriakou B, Healy A, Popoola G, Rachmin I, Khandelwal N, Neil JR, Tien PC, Chen N, Hosp T, van den Ouweland S, Hara T, Bussema L, Dong R, Shi L, Rasmussen MQ, Domingues AC, Lawless A, Fang J, Yoda S, Nguyen LP, Reeves SM, Wakefield FN, Acker A, Clark SE, Dubash T, Kastanos J, Oh E, Fisher DE, Maheswaran S, Haber DA, Boland GM, Sade-Feldman M, Jenkins RW, Hata AN, Bardeesy NM, Suvà ML, Martin BR, Liau BB, Ott CJ, Rivera MN, Lawrence MS, Bar-Peled L. DrugMap: A quantitative pan-cancer analysis of cysteine ligandability. Cell 2024; 187:2536-2556.e30. [PMID: 38653237 PMCID: PMC11143475 DOI: 10.1016/j.cell.2024.03.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 01/15/2024] [Accepted: 03/19/2024] [Indexed: 04/25/2024]
Abstract
Cysteine-focused chemical proteomic platforms have accelerated the clinical development of covalent inhibitors for a wide range of targets in cancer. However, how different oncogenic contexts influence cysteine targeting remains unknown. To address this question, we have developed "DrugMap," an atlas of cysteine ligandability compiled across 416 cancer cell lines. We unexpectedly find that cysteine ligandability varies across cancer cell lines, and we attribute this to differences in cellular redox states, protein conformational changes, and genetic mutations. Leveraging these findings, we identify actionable cysteines in NF-κB1 and SOX10 and develop corresponding covalent ligands that block the activity of these transcription factors. We demonstrate that the NF-κB1 probe blocks DNA binding, whereas the SOX10 ligand increases SOX10-SOX10 interactions and disrupts melanoma transcriptional signaling. Our findings reveal heterogeneity in cysteine ligandability across cancers, pinpoint cell-intrinsic features driving cysteine targeting, and illustrate the use of covalent probes to disrupt oncogenic transcription-factor activity.
Affiliation(s)
- Mariko Takahashi
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA.
| | - Harrison B Chong
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Siwen Zhang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Tzu-Yi Yang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Matthew J Lazarov
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Stefan Harry
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
| | | | | | | | | | | | - Kira Vordermark
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Jonathan Assaad
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Magdy Gohar
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Benedikt R Dürr
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Marianne Richter
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Himani Patel
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | | | | | | | - Karla Rubio
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Antonio Villanueva
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Junbing Zhang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Maolin Ge
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Farah Makram
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Hanna Griesshaber
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Drew Harrison
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Ann-Sophie Koglin
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Samuel Ojeda
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Barbara Karakyriakou
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Alexander Healy
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - George Popoola
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Inbal Rachmin
- Cutaneous Biology Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Neha Khandelwal
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | | | - Pei-Chieh Tien
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Nicholas Chen
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA
| | - Tobias Hosp
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Sanne van den Ouweland
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Toshiro Hara
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lillian Bussema
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Rui Dong
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Lei Shi
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Martin Q Rasmussen
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Ana Carolina Domingues
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Aleigha Lawless
- Department of Surgery, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Jacy Fang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Satoshi Yoda
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Linh Phuong Nguyen
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Sarah Marie Reeves
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Farrah Nicole Wakefield
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Adam Acker
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Sarah Elizabeth Clark
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Taronish Dubash
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - John Kastanos
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
| | - Eugene Oh
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - David E Fisher
- Cutaneous Biology Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Shyamala Maheswaran
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Daniel A Haber
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
| | - Genevieve M Boland
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Surgery, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Surgery, Harvard Medical School, Boston, MA 02114, USA
| | - Moshe Sade-Feldman
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Russell W Jenkins
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Aaron N Hata
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Nabeel M Bardeesy
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Mario L Suvà
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA
| | | | - Brian B Liau
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
| | - Christopher J Ott
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
| | - Miguel N Rivera
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA
| | - Michael S Lawrence
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA.
| | - Liron Bar-Peled
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA.
| |
|
25
|
Yang K, Xie Z, Li Z, Qian X, Sun N, He T, Xu Z, Jiang J, Mei Q, Wang J, Qu S, Xu X, Chen C, Ju B. MolProphet: A One-Stop, General Purpose, and AI-Based Platform for the Early Stages of Drug Discovery. J Chem Inf Model 2024; 64:2941-2947. [PMID: 38563534 PMCID: PMC11040716 DOI: 10.1021/acs.jcim.3c01979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Revised: 03/20/2024] [Accepted: 03/21/2024] [Indexed: 04/04/2024]
Abstract
Artificial intelligence (AI) is an effective tool to accelerate drug discovery and cut costs in discovery processes. Many successful AI applications are reported in the early stages of small molecule drug discovery. However, most of those applications require a deep understanding of software and hardware and focus on a single field, so data normalization and transfer between applications remain a challenge for ordinary users, which limits the application of AI in drug discovery. Here, based on a series of robust models, we built a one-stop, general-purpose, AI-based drug discovery platform, MolProphet, that provides complete functionality for the early stages of small molecule drug discovery, including AI-based target pocket prediction, hit discovery and lead optimization, and compound targeting, as well as abundant analysis tools to check the results. MolProphet is an accessible and user-friendly web-based platform designed according to practices in the drug discovery industry. Molecules screened, generated, or optimized by MolProphet are purchasable and synthesizable at low cost with good drug-likeness. More than 400 users from industry and academia have used MolProphet in their work. We hope this platform can provide a powerful solution to assist researchers in drug design and related research areas. It is available for everyone at https://www.molprophet.com/.
Affiliation(s)
- Keda Yang
- Key Laboratory of Artificial Organs and Computational Medicine in Zhejiang Province, Shulan International Medical College, Zhejiang Shuren University, Hangzhou 310015, P. R. China
| | - Zewen Xie
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| | - Zhen Li
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| | - Xiaoliang Qian
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| | - Nannan Sun
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| | - Tao He
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| | - Zuodong Xu
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| | - Jing Jiang
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| | - Qi Mei
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| | - Jie Wang
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| | - Shugang Qu
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| | - Xiaoling Xu
- Key Laboratory of Artificial Organs and Computational Medicine in Zhejiang Province, Shulan International Medical College, Zhejiang Shuren University, Hangzhou 310015, P. R. China
| | - Chaoxiang Chen
- Key Laboratory of Artificial Organs and Computational Medicine in Zhejiang Province, Shulan International Medical College, Zhejiang Shuren University, Hangzhou 310015, P. R. China
| | - Bin Ju
- Hangzhou SanOmics Information Technology Co., Ltd., Hangzhou 310015, P. R. China
| |
|
26
|
Smith Z, Strobel M, Vani BP, Tiwary P. Graph Attention Site Prediction (GrASP): Identifying Druggable Binding Sites Using Graph Neural Networks with Attention. J Chem Inf Model 2024; 64:2637-2644. [PMID: 38453912 PMCID: PMC11182664 DOI: 10.1021/acs.jcim.3c01698] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 03/09/2024]
Abstract
Identifying and discovering druggable protein binding sites is an important early step in computer-aided drug discovery, but it remains a difficult task where most campaigns rely on a priori knowledge of binding sites from experiments. Here, we present a binding site prediction method called Graph Attention Site Prediction (GrASP) and re-evaluate assumptions in nearly every step in the site prediction workflow from data set preparation to model evaluation. GrASP is able to achieve state-of-the-art performance at recovering binding sites in PDB structures while maintaining a high degree of precision which will minimize wasted computation in downstream tasks such as docking and free energy perturbation.
Affiliation(s)
- Zachary Smith
- Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
- Biophysics Program, University of Maryland, College Park 20742, USA
| | - Michael Strobel
- Department of Computer Science, University of Maryland, College Park 20742, USA
| | - Bodhi P. Vani
- Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
| | - Pratyush Tiwary
- Institute for Physical Science and Technology, University of Maryland, College Park 20742, USA
- Department of Chemistry and Biochemistry, University of Maryland, College Park 20742, USA
| |
|
27
|
Kumar N, Srivastava R. Deep learning in structural bioinformatics: current applications and future perspectives. Brief Bioinform 2024; 25:bbae042. [PMID: 38701422 PMCID: PMC11066934 DOI: 10.1093/bib/bbae042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Revised: 01/05/2024] [Accepted: 01/18/2024] [Indexed: 05/05/2024] Open
Abstract
In this review article, we explore the transformative impact of deep learning (DL) on structural bioinformatics, emphasizing its pivotal role in a scientific revolution driven by extensive data, accessible toolkits and robust computing resources. As big data continue to advance, DL is poised to become an integral component in healthcare and biology, revolutionizing analytical processes. Our comprehensive review provides detailed insights into DL, featuring specific demonstrations of its notable applications in bioinformatics. We address challenges tailored for DL, spotlight recent successes in structural bioinformatics and present a clear exposition of DL, from basic shallow neural networks to advanced models such as convolutional, recurrent, artificial and transformer neural networks. This paper discusses the emerging use of DL for understanding biomolecular structures, anticipating ongoing developments and applications in the realm of structural bioinformatics.
Affiliation(s)
- Niranjan Kumar
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
| | - Rakesh Srivastava
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
| |
|
28
|
Carbery A, Buttenschoen M, Skyner R, von Delft F, Deane CM. Learnt representations of proteins can be used for accurate prediction of small molecule binding sites on experimentally determined and predicted protein structures. J Cheminform 2024; 16:32. [PMID: 38486231 PMCID: PMC10941399 DOI: 10.1186/s13321-024-00821-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 03/01/2024] [Indexed: 03/17/2024] Open
Abstract
Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient to understand the performance on novel protein targets where experimental structures are not available. An alternative option is to provide computationally predicted protein structures, but this is not commonly tested. However, due to the training data used, computationally-predicted protein structures tend to be extremely accurate, and are often biased toward a holo conformation. In this study we describe and benchmark IF-SitePred, a protein-ligand binding site prediction method which is based on the labelling of ESM-IF1 protein language model embeddings combined with point cloud annotation and clustering. We show that not only is IF-SitePred competitive with state-of-the-art methods when predicting binding sites on experimental structures, but it performs better on proxies for novel proteins where low accuracy has been simulated by molecular dynamics. Finally, IF-SitePred outperforms other methods if ensembles of predicted protein structures are generated.
Affiliation(s)
- Anna Carbery
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
  - Diamond Light Source, Harwell Science and Innovation Campus, Didcot, OX11 0DE, UK
- Martin Buttenschoen
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK
- Rachael Skyner
  - OMass Therapeutics, Building 4000, Chancellor Court, John Smith Drive, ARC Oxford, OX4 2GX, UK
- Frank von Delft
  - Diamond Light Source, Harwell Science and Innovation Campus, Didcot, OX11 0DE, UK
  - Centre for Medicines Discovery, University of Oxford, Oxford, OX3 7DQ, UK
  - Research Complex at Harwell, Harwell Science and Innovation Campus, Didcot, OX11 0FA, United Kingdom
  - Department of Biochemistry, University of Johannesburg, Johannesburg, 2006, South Africa
- Charlotte M Deane
  - Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, OX1 3LB, UK.
29
Korlepara DB, C S V, Srivastava R, Pal PK, Raza SH, Kumar V, Pandit S, Nair AG, Pandey S, Sharma S, Jeurkar S, Thakran K, Jaglan R, Verma S, Ramachandran I, Chatterjee P, Nayar D, Priyakumar UD. PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications. Sci Data 2024; 11:180. [PMID: 38336857 PMCID: PMC10858175 DOI: 10.1038/s41597-023-02872-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 12/21/2023] [Indexed: 02/12/2024] Open
Abstract
Computing binding affinities is of great importance in drug discovery pipeline and its prediction using advanced machine learning methods still remains a major challenge as the existing datasets and models do not consider the dynamic features of protein-ligand interactions. To this end, we have developed PLAS-20k dataset, an extension of previously developed PLAS-5k, with 97,500 independent simulations on a total of 19,500 different protein-ligand complexes. Our results show good correlation with the available experimental values, performing better than docking scores. This holds true even for a subset of ligands that follows Lipinski's rule, and for diverse clusters of complex structures, thereby highlighting the importance of PLAS-20k dataset in developing new ML models. Along with this, our dataset is also beneficial in classifying strong and weak binders compared to docking. Further, OnionNet model has been retrained on PLAS-20k dataset and is provided as a baseline for the prediction of binding affinities. We believe that large-scale MD-based datasets along with trajectories will form new synergy, paving the way for accelerating drug discovery.
Affiliation(s)
- Divya B Korlepara
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
  - Division of Physics, School of Advanced Sciences, Vellore Institute of Technology, Chennai, 600127, India
- Vasavi C S
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
  - Department of Artificial Intelligence, School of Artificial Intelligence, Amrita Vishwa Vidyapeetham, Bengaluru, 560035, India
- Rakesh Srivastava
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Pradeep Kumar Pal
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Saalim H Raza
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Vishal Kumar
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Shivam Pandit
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Aathira G Nair
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Sanjana Pandey
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Shubham Sharma
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Shruti Jeurkar
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Kavita Thakran
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Reena Jaglan
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Shivangi Verma
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Indhu Ramachandran
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Prathit Chatterjee
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Divya Nayar
  - Department of Materials Science and Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India.
- U Deva Priyakumar
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India.
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India.
30
Popov P, Kalinin R, Buslaev P, Kozlovskii I, Zaretckii M, Karlov D, Gabibov A, Stepanov A. Unraveling viral drug targets: a deep learning-based approach for the identification of potential binding sites. Brief Bioinform 2023; 25:bbad459. [PMID: 38113077 PMCID: PMC10783863 DOI: 10.1093/bib/bbad459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 11/10/2023] [Accepted: 11/22/2023] [Indexed: 12/21/2023] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has spurred a wide range of approaches to control and combat the disease. However, selecting an effective antiviral drug target remains a time-consuming challenge. Computational methods offer a promising solution by efficiently reducing the number of candidates. In this study, we propose a structure- and deep learning-based approach that identifies vulnerable regions in viral proteins corresponding to drug binding sites. Our approach takes into account the protein dynamics, accessibility and mutability of the binding site and the putative mechanism of action of the drug. We applied this technique to validate drug targeting toward severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike glycoprotein S. Our findings reveal a conformation- and oligomer-specific glycan-free binding site proximal to the receptor binding domain. This site comprises topologically important amino acid residues. Molecular dynamics simulations of Spike in complex with candidate drug molecules bound to the potential binding sites indicate an equilibrium shifted toward the inactive conformation compared with drug-free simulations. Small molecules targeting this binding site have the potential to prevent the closed-to-open conformational transition of Spike, thereby allosterically inhibiting its interaction with human angiotensin-converting enzyme 2 receptor. Using a pseudotyped virus-based assay with a SARS-CoV-2 neutralizing antibody, we identified a set of hit compounds that exhibited inhibition at micromolar concentrations.
Affiliation(s)
- Petr Popov
  - Tetra-d, Rheinweg 9, Schaffhausen, 8200, Switzerland
  - School of Science, Constructor University Bremen gGmbH, 28759, Bremen, Germany
- Roman Kalinin
  - M.M. Shemyakin and Yu.A. Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow, 117997, Russia
- Pavel Buslaev
  - Nanoscience Center and Department of Chemistry, University of Jyväskylä, 40014, Jyväskylä, Finland
- Igor Kozlovskii
  - Tetra-d, Rheinweg 9, Schaffhausen, 8200, Switzerland
  - School of Science, Constructor University Bremen gGmbH, 28759, Bremen, Germany
- Mark Zaretckii
  - Tetra-d, Rheinweg 9, Schaffhausen, 8200, Switzerland
  - School of Science, Constructor University Bremen gGmbH, 28759, Bremen, Germany
- Dmitry Karlov
  - School of Pharmacy, Medical Biology Centre, Queen’s University Belfast, Street, Belfast, BT9 7BL Northern Ireland, U.K
- Alexander Gabibov
  - M.M. Shemyakin and Yu.A. Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow, 117997, Russia
- Alexey Stepanov
  - Department of Chemistry, The Scripps Research Institute, 10550 North Torrey Pines Road MB-10, La Jolla, 92037, CA, USA
31
Xia S, Chen E, Zhang Y. Integrated Molecular Modeling and Machine Learning for Drug Design. J Chem Theory Comput 2023; 19:7478-7495. [PMID: 37883810 PMCID: PMC10653122 DOI: 10.1021/acs.jctc.3c00814] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/26/2023] [Revised: 10/10/2023] [Accepted: 10/11/2023] [Indexed: 10/28/2023]
Abstract
Modern therapeutic development often involves several stages that are interconnected, and multiple iterations are usually required to bring a new drug to the market. Computational approaches have increasingly become an indispensable part of helping reduce the time and cost of the research and development of new drugs. In this Perspective, we summarize our recent efforts on integrating molecular modeling and machine learning to develop computational tools for modulator design, including a pocket-guided rational design approach based on AlphaSpace to target protein-protein interactions, delta machine learning scoring functions for protein-ligand docking as well as virtual screening, and state-of-the-art deep learning models to predict calculated and experimental molecular properties based on molecular mechanics optimized geometries. Meanwhile, we discuss remaining challenges and promising directions for further development and use a retrospective example of FDA approved kinase inhibitor Erlotinib to demonstrate the use of these newly developed computational tools.
Affiliation(s)
- Song Xia
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Eric Chen
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
32
Takahashi M, Chong HB, Zhang S, Lazarov MJ, Harry S, Maynard M, White R, Murrey HE, Hilbert B, Neil JR, Gohar M, Ge M, Zhang J, Durr BR, Kryukov G, Tsou CC, Brooijmans N, Alghali ASO, Rubio K, Vilanueva A, Harrison D, Koglin AS, Ojeda S, Karakyriakou B, Healy A, Assaad J, Makram F, Rachman I, Khandelwal N, Tien PC, Popoola G, Chen N, Vordermark K, Richter M, Patel H, Yang TY, Griesshaber H, Hosp T, van den Ouweland S, Hara T, Bussema L, Dong R, Shi L, Rasmussen MQ, Domingues AC, Lawless A, Fang J, Yoda S, Nguyen LP, Reeves SM, Wakefield FN, Acker A, Clark SE, Dubash T, Fisher DE, Maheswaran S, Haber DA, Boland G, Sade-Feldman M, Jenkins R, Hata A, Bardeesy N, Suva ML, Martin B, Liau B, Ott C, Rivera MN, Lawrence MS, Bar-Peled L. DrugMap: A quantitative pan-cancer analysis of cysteine ligandability. bioRxiv 2023:2023.10.20.563287. [PMID: 37961514 PMCID: PMC10634688 DOI: 10.1101/2023.10.20.563287] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/15/2023]
Abstract
Cysteine-focused chemical proteomic platforms have accelerated the clinical development of covalent inhibitors of a wide range of targets in cancer. However, how different oncogenic contexts influence cysteine targeting remains unknown. To address this question, we have developed DrugMap, an atlas of cysteine ligandability compiled across 416 cancer cell lines. We unexpectedly find that cysteine ligandability varies across cancer cell lines, and we attribute this to differences in cellular redox states, protein conformational changes, and genetic mutations. Leveraging these findings, we identify actionable cysteines in NFκB1 and SOX10 and develop corresponding covalent ligands that block the activity of these transcription factors. We demonstrate that the NFκB1 probe blocks DNA binding, whereas the SOX10 ligand increases SOX10-SOX10 interactions and disrupts melanoma transcriptional signaling. Our findings reveal heterogeneity in cysteine ligandability across cancers, pinpoint cell-intrinsic features driving cysteine targeting, and illustrate the use of covalent probes to disrupt oncogenic transcription factor activity.
33
Liu Y, Li P, Tu S, Xu L. RefinePocket: An Attention-Enhanced and Mask-Guided Deep Learning Approach for Protein Binding Site Prediction. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3314-3321. [PMID: 37040253 DOI: 10.1109/tcbb.2023.3265640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Protein binding site prediction is an important prerequisite task of drug discovery and design. While binding sites are very small, irregular and varied in shape, making the prediction very challenging. Standard 3D U-Net has been adopted to predict binding sites but got stuck with unsatisfactory prediction results, incomplete, out-of-bounds, or even failed. The reason is that this scheme is less capable of extracting the chemical interactions of the entire region and hardly takes into account the difficulty of segmenting complex shapes. In this paper, we propose a refined U-Net architecture, called RefinePocket, consisting of an attention-enhanced encoder and a mask-guided decoder. During encoding, taking binding site proposal as input, we employ Dual Attention Block (DAB) hierarchically to capture rich global information, exploring residue relationship and chemical correlations in spatial and channel dimensions respectively. Then, based on the enhanced representation extracted by the encoder, we devise Refine Block (RB) in the decoder to enable self-guided refinement of uncertain regions gradually, resulting in more precise segmentation. Experiments show that DAB and RB complement and promote each other, making RefinePocket has an average improvement of 10.02% on DCC and 4.26% on DVO compared with the state-of-the-art method on four test sets.
34
Lei C, Lu Z, Wang M, Li M. StackCPA: A stacking model for compound-protein binding affinity prediction based on pocket multi-scale features. Comput Biol Med 2023; 164:107131. [PMID: 37494820 DOI: 10.1016/j.compbiomed.2023.107131] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/11/2023] [Revised: 05/10/2023] [Accepted: 06/01/2023] [Indexed: 07/28/2023]
Abstract
Accurately predicting compound-protein binding affinity is a crucial task in drug discovery. Computational models offer the advantages of short time, low cost and safety compared to traditional drug development. Pocket is the key binding region of the protein, which provides invaluable information for drug repositioning and drug design. In this study, we propose an ensemble learning model, called StackCPA, to predict the compound-protein binding affinity. The model integrates multi-scale features of protein pocket and compound through a transfer learning strategy. The protein pocket is described in a fine-grained way by atomic level, residue level and subdomain level. The proposed model StackCPA is evaluated on three binding affinity benchmark datasets. The experiment results show that StackCPA achieves the best performance on all the three datasets in comparison with other state-of-the-art deep learning models. The ablation study shows that the protein pocket can provide sufficient information for affinity prediction and its multi-scale features enable the model to further improve the prediction performance. In addition, the case study for epidermal growth factor receptor erbB1 (EGFR) indicates that StackCPA could serve as an effective tool for drug repurposing. Source codes and data of StackCPA are available at https://github.com/CSUBioGroup/StackCPA.
Affiliation(s)
- Chuqi Lei
  - School of Computer Science and Engineering, Central South University, 410083, Changsha, PR China
- Zhangli Lu
  - School of Computer Science and Engineering, Central South University, 410083, Changsha, PR China
- Meng Wang
  - School of Computer Science and Engineering, Central South University, 410083, Changsha, PR China
- Min Li
  - School of Computer Science and Engineering, Central South University, 410083, Changsha, PR China.
35
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
36
Xia Y, Pan X, Shen HB. LigBind: identifying binding residues for over 1000 ligands with relation-aware graph neural networks. J Mol Biol 2023; 435:168091. [PMID: 37054909 DOI: 10.1016/j.jmb.2023.168091] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/17/2022] [Revised: 03/22/2023] [Accepted: 04/05/2023] [Indexed: 04/15/2023]
Abstract
Identifying the interactions between proteins and ligands is significant for drug discovery and design. Considering the diverse binding patterns of ligands, the ligand-specific methods are trained per ligand to predict binding residues. However, most of the existing ligand-specific methods ignore shared binding preferences among various ligands and generally only cover a limited number of ligands with a sufficient number of known binding proteins. In this study, we propose a relation-aware framework LigBind with graph-level pre-training to enhance the ligand-specific binding residue predictions for 1159 ligands, which can effectively cover the ligands with a few known binding proteins. LigBind first pre-trains a graph neural network-based feature extractor for ligand-residue pairs and relation-aware classifiers for similar ligands. Then, LigBind is fine-tuned with ligand-specific binding data, where a domain adaptive neural network is designed to automatically leverage the diversity and similarity of various ligand-binding patterns for accurate binding residue prediction. We construct ligand-specific benchmark datasets of 1159 ligands and 16 unseen ligands, which are used to evaluate the effectiveness of LigBind. The results demonstrate the LigBind's efficacy on the large-scale ligand-specific benchmark datasets, and generalizes well to unseen ligands. LigBind also enables accurate identification of the ligand-binding residues in the main protease, papain-like protease and the RNA-dependent RNA polymerase of SARS-CoV-2. The webserver and source codes of LigBind are available at http://www.csbio.sjtu.edu.cn/bioinf/LigBind/ and https://github.com/YYingXia/LigBind/ for academic use.
Affiliation(s)
- Ying Xia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
37
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 10/19/2022] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
  - Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
- Marko Cigler
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Georg E. Winter
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
38
Petrovski ŽH, Hribar-Lee B, Bosnić Z. CAT-Site: Predicting Protein Binding Sites Using a Convolutional Neural Network. Pharmaceutics 2022; 15:pharmaceutics15010119. [PMID: 36678749 PMCID: PMC9862895 DOI: 10.3390/pharmaceutics15010119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/01/2022] [Revised: 12/18/2022] [Accepted: 12/22/2022] [Indexed: 01/01/2023] Open
Abstract
Identifying binding sites on the protein surface is an important part of computer-assisted drug design processes. Reliable prediction of binding sites not only assists with docking algorithms, but it can also explain the possible side-effects of a potential drug as well as its efficiency. In this work, we propose a novel workflow for predicting possible binding sites of a ligand on a protein surface. We use proteins from the PDBbind and sc-PDB databases, from which we combine available ligand information for similar proteins using all the possible ligands rather than only a special sub-selection to generalize the work of existing research. After performing protein clustering and merging of ligands of similar proteins, we use a three-dimensional convolutional neural network that takes into account the spatial structure of a protein. Lastly, we combine ligandability predictions for points on protein surfaces into joint binding sites. Analysis of our model's performance shows that its achieved sensitivity is 0.829, specificity is 0.98, and F1 score is 0.517, and that for 54% of larger and pharmacologically relevant binding sites, the distance between their real and predicted centers amounts to less than 4 Å.
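The sensitivity and F1 score quoted in this abstract jointly fix the precision, since F1 is the harmonic mean of precision and recall (sensitivity). The following is a minimal Python sketch of that arithmetic only; the function name `implied_precision` is hypothetical and not taken from the paper, and the two inputs are the values quoted above.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for the precision P."""
    denom = 2.0 * recall - f1
    if denom <= 0:
        # F1 can never exceed 2 * recall, so this input would be inconsistent.
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denom


# Values quoted in the abstract: F1 = 0.517, sensitivity (recall) = 0.829.
precision = implied_precision(f1=0.517, recall=0.829)
print(f"implied precision: {precision:.3f}")
```

Under these two reported values the implied precision is roughly 0.38. One plausible reading, not stated in the abstract, is that non-binding surface points greatly outnumber binding ones, so even a specificity of 0.98 leaves a sizeable number of false positives.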
Affiliation(s)
- Žan Hafner Petrovski
  - University of Ljubljana, Faculty of Computer and Information Science, SI-1000 Ljubljana, Slovenia
- Barbara Hribar-Lee
  - University of Ljubljana, Faculty of Chemistry and Chemical Technology, SI-1000 Ljubljana, Slovenia
  - Correspondence: (B.-H.L.); (Z.B.)
- Zoran Bosnić
  - University of Ljubljana, Faculty of Computer and Information Science, SI-1000 Ljubljana, Slovenia
  - Correspondence: (B.-H.L.); (Z.B.)
39
Xia Y, Xia C, Pan X, Shen H. BindWeb: A web server for ligand binding residue and pocket prediction from protein structures. Protein Sci 2022; 31:e4462. [PMID: 36190332 PMCID: PMC9667820 DOI: 10.1002/pro.4462] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/29/2022] [Revised: 09/27/2022] [Accepted: 09/28/2022] [Indexed: 12/13/2022]
Abstract
Knowledge of protein-ligand interactions is beneficial for biological process analysis and drug design. Given the complexity of the interactions and the inadequacy of experimental data, accurate ligand binding residue and pocket prediction remains challenging. In this study, we introduce an easy-to-use web server BindWeb for ligand-specific and ligand-general binding residue and pocket prediction from protein structures. BindWeb integrates a graph neural network GraphBind with a hybrid convolutional neural network and bidirectional long short-term memory network DELIA to identify binding residues. Furthermore, BindWeb clusters the predicted binding residues to binding pockets with mean shift clustering. The experimental results and case study demonstrate that BindWeb benefits from the complementarity of two base methods. BindWeb is freely available for academic use at http://www.csbio.sjtu.edu.cn/bioinf/BindWeb/.
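The clustering step mentioned in this abstract (grouping predicted binding residues into pockets with mean shift) can be illustrated with a small flat-kernel mean shift in plain Python. This is only a sketch under stated assumptions: the bandwidth, the merge rule, and the toy coordinates are hypothetical, and nothing here reproduces BindWeb's actual implementation or parameters.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-4):
    """Flat-kernel mean shift; returns (labels, centres) for 3D points."""
    modes = []
    for start in points:
        x = start
        for _ in range(max_iter):
            # Average all points that lie within the bandwidth of x.
            near = [q for q in points if math.dist(x, q) <= bandwidth]
            new_x = tuple(sum(axis) / len(near) for axis in zip(*near))
            moved = math.dist(x, new_x)
            x = new_x
            if moved < tol:
                break
        modes.append(x)

    # Merge converged modes that ended up close together (assumed rule).
    centres, labels = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels, centres


# Hypothetical coordinates (in angstroms) of predicted binding residues.
residues = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
            (10, 10, 10), (11, 10, 10), (10, 11, 10)]
labels, centres = mean_shift(residues, bandwidth=3.0)
print(labels, len(centres))
```

With these toy points the two well-separated groups end up as two pockets. Mean shift does not need the number of pockets in advance, which may be one reason it suits this task; the bandwidth then controls how finely residues are split.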
Affiliation(s)
- Ying Xia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Chunqiu Xia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
40
Zhang J, Li H, Zhao X, Wu Q, Huang SY. Holo Protein Conformation Generation from Apo Structures by Ligand Binding Site Refinement. J Chem Inf Model 2022; 62:5806-5820. [PMID: 36342197 DOI: 10.1021/acs.jcim.2c00895] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
An important part in structure-based drug design is the selection of an appropriate protein structure. It has been revealed that a holo protein structure that contains a well-defined binding site is a much better choice than an apo structure in structure-based drug discovery. Therefore, it is valuable to obtain a holo-like protein conformation from apo structures in the case where no holo structure is available. Meeting the need, we present a robust approach to generate reliable holo-like structures from apo structures by ligand binding site refinement with restraints derived from holo templates with low homology. Our method was tested on a test set of 32 proteins from the DUD-E data set and compared with other approaches. It was shown that our method successfully refined the apo structures toward the corresponding holo conformations for 23 of 32 proteins, reducing the average all-heavy-atom RMSD of binding site residues by 0.48 Å. In addition, when evaluated against all the holo structures in the protein data bank, our method can improve the binding site RMSD for 14 of 19 cases that experience significant conformational changes. Furthermore, our refined structures also demonstrate their advantages over the apo structures in ligand binding mode predictions by both rigid docking and flexible docking and in virtual screening on the database of active and decoy ligands from the DUD-E. These results indicate that our method is effective in recovering holo-like conformations and will be valuable in structure-based drug discovery.
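The binding site RMSD used as the figure of merit in this abstract can be sketched in a few lines of Python. The sketch assumes the two structures are already superimposed and that atoms are listed in the same order; the coordinates are hypothetical toy values, not data from the paper, and the function name `rmsd` is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical heavy-atom positions (angstroms) for a two-atom binding site.
holo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
apo = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
refined = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]

# A positive difference means refinement moved the site toward the holo form.
improvement = rmsd(holo, apo) - rmsd(holo, refined)
print(f"RMSD reduction: {improvement:.2f} A")
```

In practice the superposition step (for example a least-squares fit on the binding site or on the whole protein) strongly affects the value, so the alignment choice should be reported alongside any RMSD.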
Affiliation(s)
- Jinze Zhang
  - School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, P. R. China
- Hao Li
  - School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, P. R. China
- Xuejun Zhao
  - School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, P. R. China
- Qilong Wu
  - School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, P. R. China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, P. R. China
41
Wei GW, Soares TA, Wahab H, Zhu F. Computational Chemistry in Asia. J Chem Inf Model 2022; 62:5035-5037. [DOI: 10.1021/acs.jcim.2c01050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
42
|
Zhou Y, Li M, Shen T, Yang T, Shi G, Wei Y, Chen C, Wang D, Wang Y, Zhang T. Celastrol Targets Cullin-Associated and Neddylation-Dissociated 1 to Prevent Fibroblast-Myofibroblast Transformation against Pulmonary Fibrosis. ACS Chem Biol 2022; 17:2734-2743. [PMID: 36076154 DOI: 10.1021/acschembio.2c00099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/19/2023]
Abstract
Celastrol (CEL), a pentacyclic triterpene compound, has a demonstrated antipulmonary fibrosis effect. However, its direct targets for antipulmonary fibrosis remain unknown. In this study, we designed and synthesized a series of celastrol-based probes to identify the direct targets in human pulmonary fibroblasts using an activity-based protein profiling strategy. Among the many captured targets, we identified a key protein, cullin-associated and neddylation-dissociated 1 (CAND1), which is involved in fibroblast-myofibroblast transformation (FMT). More importantly, we found that the inhibitory effect of celastrol on FMT depends on CAND1: celastrol improves the interaction between CAND1 and Cullin1 and thereby promotes the activity of Skp1/Cullin1/F-box ubiquitin ligases. In silico studies and cysteine mutation experiments further demonstrated that Cys264 of CAND1 is the site of celastrol conjugation. This reveals a new mechanism of celastrol against pulmonary fibrosis and may provide a novel therapeutic option for antipulmonary fibrosis.
Affiliation(s)
- Yu Zhou
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
| | - Manru Li
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
| | - Tao Shen
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
| | - Tianming Yang
- Tianjin Key Laboratory of Molecular Design and Drug Discovery, Tianjin Institute of Pharmaceutical Research, Tianjin 300301, China; State Key Laboratory of Drug Delivery Technology and Pharmacokinetics, Tianjin Institute of Pharmaceutical Research, Tianjin 300301, China
| | - Gaona Shi
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
| | - Yazi Wei
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
| | - Chengjuan Chen
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
| | - Dongmei Wang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
| | - Yanan Wang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
| | - Tiantai Zhang
- State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
| |
|
43
|
Eguida M, Rognan D. Estimating the Similarity between Protein Pockets. Int J Mol Sci 2022; 23:12462. [PMID: 36293316 PMCID: PMC9604425 DOI: 10.3390/ijms232012462] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/06/2022] [Revised: 10/15/2022] [Accepted: 10/16/2022] [Indexed: 10/28/2023] Open
Abstract
With the exponential increase in publicly available protein structures, the comparison of protein binding sites naturally emerged as a scientific topic to explain observations or generate hypotheses for ligand design, notably to predict ligand selectivity for on- and off-targets, explain polypharmacology, and design target-focused libraries. The current review summarizes the state-of-the-art computational methods applied to pocket detection and comparison as well as structural druggability estimates. The major strengths and weaknesses of current pocket descriptors, alignment methods, and similarity search algorithms are presented. Lastly, an exhaustive survey of both retrospective and prospective applications in diverse medicinal chemistry scenarios illustrates the capabilities of the existing methods and the hurdles that still need to be overcome for more accurate predictions.
Affiliation(s)
| | - Didier Rognan
- Laboratoire d’Innovation Thérapeutique, UMR7200 CNRS-Université de Strasbourg, 67400 Illkirch, France
| |
|
44
|
Radoux CJ, Vianello F, McGreig J, Desai N, Bradley AR. The druggable genome: Twenty years later. Front Bioinform 2022; 2:958378. [PMID: 36304325 PMCID: PMC9580872 DOI: 10.3389/fbinf.2022.958378] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/31/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022] Open
Abstract
The concept of the druggable genome has been with us for 20 years. During this time, researchers have developed several methods and resources to help assess a target's druggability. In parallel, evidence for target-disease associations has been collated at scale by Open Targets. More recently, the Protein Data Bank in Europe (PDBe) have built a knowledge base matching per-residue annotations with available protein structure. While each resource is useful in isolation, we believe there is enormous potential in bringing all relevant data into a single knowledge graph, from gene-level to protein residue. Automation is vital for the processing and assessment of all available structures. We have developed scalable, automated workflows that provide hotspot-based druggability assessments for all available structures across large numbers of targets. Ultimately, we will run our method at a proteome scale, an ambition made more realistic by the arrival of AlphaFold 2. Bringing together annotations from the residue up to the gene level and building connections within the graph to represent pathways or protein-protein interactions will create complexity that mirrors the biological systems they represent. Such complexity is difficult for the human mind to utilise effectively, particularly at scale. We believe that graph-based AI methods will be able to expertly navigate such a knowledge graph, selecting the targets of the future.
|
45
|
Wang C, Chen Y, Zhang Y, Li K, Lin M, Pan F, Wu W, Zhang J. A reinforcement learning approach for protein-ligand binding pose prediction. BMC Bioinformatics 2022; 23:368. [PMID: 36076158 PMCID: PMC9454149 DOI: 10.1186/s12859-022-04912-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/04/2022] [Accepted: 08/25/2022] [Indexed: 11/10/2022] Open
Abstract
Protein-ligand docking is an indispensable tool for computational prediction of protein functions and screening drug candidates. Despite significant progress over the past two decades, it remains a challenging problem because the energetics between proteins and ligands are still poorly understood and a vast conformational space has to be searched to find a satisfactory solution. In this project, we developed a novel reinforcement learning (RL) approach, the asynchronous advantage actor-critic model (A3C), to address the protein-ligand docking problem. The overall framework consists of two models. During the search process, the agent takes an action selected by the actor model based on the current location. The critic model then evaluates this action and predicts the distance between the current location and the true binding site. Experimental results showed that in both single- and multi-atom cases, our model improves binding site prediction substantially compared to a naïve model. For the single-atom ligand, copper ion (Cu²⁺), the model-predicted binding sites have a median root-mean-square deviation (RMSD) of 2.39 Å from the true binding sites when starting from random locations. For the multi-atom ligand, sulfate ion (SO₄²⁻), the predicted binding sites have a median RMSD of 3.82 Å from the true binding sites. The ligand-specific models built in this study can be used in solvent mapping studies, and the RL framework can be readily scaled up to larger and more diverse sets of ligands.
Affiliation(s)
- Chenran Wang
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Yang Chen
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Yuan Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Keqiao Li
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Menghan Lin
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Feng Pan
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Wei Wu
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| | - Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, 32306-4330, USA
| |
|
46
|
Aguti R, Gardini E, Bertazzo M, Decherchi S, Cavalli A. Probabilistic Pocket Druggability Prediction via One-Class Learning. Front Pharmacol 2022; 13:870479. [PMID: 35847005 PMCID: PMC9278401 DOI: 10.3389/fphar.2022.870479] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/06/2022] [Accepted: 03/24/2022] [Indexed: 12/31/2022] Open
Abstract
The choice of target pocket is a key step in a drug discovery campaign. This step can be supported by in silico druggability prediction. In the literature, druggability prediction is often approached as a two-class classification task that distinguishes between druggable and non-druggable (or less druggable) pockets (or voxels). Apart from obvious cases, however, the non-druggable class is conceptually ambiguous. This is because any pocket (or target) is only non-druggable until a drug is found for it. It is therefore more appropriate to adopt a one-class approach, which uses only unambiguous information, namely, druggable pockets. Here, we propose using the import vector domain description (IVDD) algorithm to support this task. IVDD is a one-class probabilistic kernel machine that we previously introduced. To feed the algorithm, we use customized DrugPred descriptors computed via NanoShaper. Our results demonstrate the feasibility and effectiveness of the approach. In particular, we can remove or mitigate biases chiefly due to the labeling.
Affiliation(s)
- Riccardo Aguti
- Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Genoa, Italy; Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Erika Gardini
- Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Genoa, Italy; Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Martina Bertazzo
- Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Genoa, Italy
| | - Sergio Decherchi
- Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Genoa, Italy
| | - Andrea Cavalli
- Computational and Chemical Biology, Fondazione Istituto Italiano di Tecnologia, Genoa, Italy; Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| |
|
47
|
Ning S, Wang H, Zeng C, Zhao Y. Prediction of allosteric druggable pockets of cyclin-dependent kinases. Brief Bioinform 2022; 23:6643454. [PMID: 35830869 DOI: 10.1093/bib/bbac290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 06/07/2022] [Accepted: 06/24/2022] [Indexed: 11/13/2022] Open
Abstract
Cyclin-dependent kinase (Cdk) proteins play crucial roles in cell cycle progression and are thus attractive drug targets for therapy against aberrant cell cycle processes such as cancer. Since most of the available Cdk inhibitors target the highly conserved catalytic ATP pocket and their lack of specificity often leads to side effects, it is imperative to identify and characterize less conserved non-catalytic pockets capable of interfering with the kinase activity allosterically. However, systematic analysis of these allosteric druggable pockets is still in its infancy. Here, we summarize the existing Cdk pockets and their selectivity. Then, we outline a network-based pocket prediction approach (NetPocket) and illustrate its utility for systematically identifying the allosteric druggable pockets with case studies. Finally, we discuss potential future directions and their challenges.
Affiliation(s)
- Shangbo Ning
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, 430079, China
| | - Huiwen Wang
- School of Physics and Engineering, Henan University of Science and Technology, Luoyang 471023, China
| | - Chen Zeng
- Department of Physics, The George Washington University, Washington, DC 20052, USA
| | - Yunjie Zhao
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, 430079, China
| |
|
48
|
Meli R, Morris GM, Biggin PC. Scoring Functions for Protein-Ligand Binding Affinity Prediction using Structure-Based Deep Learning: A Review. Front Bioinform 2022; 2:885983. [PMID: 36187180 PMCID: PMC7613667 DOI: 10.3389/fbinf.2022.885983] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 02/28/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023] Open
Abstract
The rapid and accurate in silico prediction of protein-ligand binding free energies or binding affinities has the potential to transform drug discovery. In recent years, there has been a rapid growth of interest in deep learning methods for the prediction of protein-ligand binding affinities based on the structural information of protein-ligand complexes. These structure-based scoring functions often obtain better results than classical scoring functions when applied within their applicability domain. Here we review structure-based scoring functions for binding affinity prediction based on deep learning, focussing on different types of architectures, featurization strategies, data sets, methods for training and evaluation, and the role of explainable artificial intelligence in building useful models for real drug-discovery applications.
Affiliation(s)
- Rocco Meli
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
| | - Garrett M. Morris
- Department of Statistics, University of Oxford, Oxford, United Kingdom
| | - Philip C. Biggin
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
| |
|
49
|
Chelur VR, Priyakumar UD. BiRDS - Binding Residue Detection from Protein Sequences Using Deep ResNets. J Chem Inf Model 2022; 62:1809-1818. [PMID: 35414182 DOI: 10.1021/acs.jcim.1c00972] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Protein-drug interactions play important roles in many biological processes and therapeutics. Predicting the binding sites of a protein helps to discover such interactions. New drugs can be designed to optimize these interactions, improving protein function. The tertiary structure of a protein determines the binding sites available to the drug molecule, but the determination of the 3D structure is slow and expensive. Conversely, the determination of the amino acid sequence is swift and economical. Although quick and accurate prediction of the binding site using just the sequence is challenging, the application of Deep Learning, which has been hugely successful in several biochemical tasks, makes it feasible. BiRDS is a Residual Neural Network that predicts the protein's most active binding site using sequence information. SC-PDB, an annotated database of druggable binding sites, is used for training the network. Multiple Sequence Alignments of the proteins in the database are generated using DeepMSA, and features such as Position-Specific Scoring Matrix, Secondary Structure, and Relative Solvent Accessibility are extracted. During training, a weighted binary cross-entropy loss function is used to counter the substantial imbalance between the two classes of binding and nonbinding residues. A novel test set, SC6K, is introduced to compare binding-site prediction methods. BiRDS achieves an AUROC score of 0.87, and the centers of 25% of its predicted binding sites lie within 4 Å of the center of the actual binding site.
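The weighted binary cross-entropy mentioned in the abstract above can be sketched as follows. The abstract does not give the weighting scheme, so this example assumes a single multiplier on the positive (binding) class; the names `weighted_bce` and `pos_weight` are illustrative and are not taken from BiRDS.

```python
import math


def weighted_bce(y_true, y_prob, pos_weight, eps=1e-12):
    """Mean binary cross-entropy with an extra weight on the positive class.

    y_true     -- iterable of 0/1 labels (1 = binding residue)
    y_prob     -- iterable of predicted probabilities for label 1
    pos_weight -- assumed multiplier applied to the loss of positive residues
    """
    labels = list(y_true)
    probs = list(y_prob)
    if len(labels) != len(probs) or not labels:
        raise ValueError("inputs must be non-empty and equal in length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


if __name__ == "__main__":
    # Hypothetical protein fragment: 1 binding residue among 4.
    labels = [1, 0, 0, 0]
    probs = [0.3, 0.1, 0.2, 0.1]
    print(weighted_bce(labels, probs, pos_weight=1.0))
    print(weighted_bce(labels, probs, pos_weight=3.0))
```

With `pos_weight` above 1, a missed binding residue costs more than a false alarm on a nonbinding one, which is one common way to keep the rare class from being ignored.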
Affiliation(s)
- Vineeth R Chelur
- Center for Computational Natural Sciences & Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences & Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
| |
|
50
|
McGreig JE, Uri H, Antczak M, Sternberg MJE, Michaelis M, Wass MN. 3DLigandSite: structure-based prediction of protein-ligand binding sites. Nucleic Acids Res 2022; 50:W13-W20. [PMID: 35412635 PMCID: PMC9252821 DOI: 10.1093/nar/gkac250] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 02/09/2022] [Revised: 03/13/2022] [Accepted: 04/03/2022] [Indexed: 01/13/2023] Open
Abstract
3DLigandSite is a web tool for the prediction of ligand-binding sites in proteins. Here, we report a significant update since the first release of 3DLigandSite in 2010. The overall methodology remains the same, with candidate binding sites in proteins inferred using known binding sites in related protein structures as templates. However, the initial structural modelling step now uses the newly available structures from the AlphaFold database or alternatively Phyre2 when AlphaFold structures are not available. Further, a sequence-based search using HHSearch has been introduced to identify template structures with bound ligands that are used to infer the ligand-binding residues in the query protein. Finally, we introduced a machine learning element as the final prediction step, which improves the accuracy of predictions and provides a confidence score for each residue predicted to be part of a binding site. Validation of 3DLigandSite on a set of 6416 binding sites obtained 92% recall at 75% precision for non-metal binding sites and 52% recall at 75% precision for metal binding sites. 3DLigandSite is available at https://www.wass-michaelislab.org/3dligandsite. Users submit either a protein sequence or structure. Results are displayed in multiple formats including an interactive Mol* molecular visualization of the protein and the predicted binding sites.
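The recall and precision figures in the abstract above are standard classification metrics. The sketch below shows how they could be computed for one predicted binding site from sets of residue numbers; the function name `precision_recall` and the residue numbers are hypothetical, and the abstract does not state how 3DLigandSite aggregates these values across sites.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted binding residues against a reference.

    predicted -- iterable of residue numbers predicted to bind the ligand
    actual    -- iterable of residue numbers observed to bind the ligand
    Returns (precision, recall); a metric is 0.0 when its denominator is empty.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical site: 3 of 4 predicted residues are correct,
    # and 3 of the 4 true binding residues are recovered.
    print(precision_recall({10, 11, 12, 40}, {10, 11, 12, 13}))  # (0.75, 0.75)
```

Reporting recall at a fixed precision, as the abstract does, corresponds to choosing the confidence threshold at which precision reaches that level and reading off the recall there.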
Affiliation(s)
- Jake E McGreig
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
| | - Hannah Uri
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
| | - Magdalena Antczak
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Martin Michaelis
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
| | - Mark N Wass
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
| |
|