1
Liu W, Li X, Hang B, Wang P. EnGCI: enhancing GPCR-compound interaction prediction via large molecular models and KAN network. BMC Biol 2025; 23:136. [PMID: 40375308 DOI: 10.1186/s12915-025-02238-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2024] [Accepted: 05/06/2025] [Indexed: 05/18/2025] Open
Abstract
BACKGROUND: Identifying GPCR-compound interactions (GCI) plays a significant role in drug discovery and chemogenomics. Machine learning, particularly deep learning, has become increasingly influential in this domain. Large molecular models, due to their ability to capture detailed structural and functional information, have shown promise in enhancing the predictive accuracy of downstream tasks. Consequently, exploring the performance of these models in GCI prediction, as well as evaluating their effectiveness when integrated with other deep learning models, has emerged as a compelling research area. This paper aims to investigate these challenges.
RESULTS: This study introduces EnGCI, a novel model comprising two distinct modules. The MSBM integrates a graph isomorphism network (GIN) and a convolutional neural network (CNN) to extract features from GPCRs and compounds, respectively. These features are then processed by a Kolmogorov-Arnold network (KAN) for decision-making. The LMMBM utilizes two large-scale pre-trained models to extract features from compounds and GPCRs, and subsequently, KAN is again employed for decision-making. Each module leverages different sources of multimodal information, and their fusion enhances the overall accuracy of GPCR-compound interaction (GCI) prediction. Evaluating the EnGCI model on a rigorously curated GCI dataset, we achieved an AUC of approximately 0.89, significantly outperforming current state-of-the-art benchmark models.
CONCLUSIONS: The EnGCI model integrates two complementary modules: one that learns molecular features from scratch for the GPCR-compound interaction (GCI) prediction task, and another that extracts molecular features using pre-trained large molecular models. After further processing and integration, these multimodal information sources enable a more profound exploration and understanding of the complex interaction relationships between GPCRs and compounds. The EnGCI model offers a robust and efficient framework that enhances GCI predictive capabilities and has the potential to significantly contribute to GPCR drug discovery.
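For readers unfamiliar with the AUC figure quoted in this abstract, the sketch below shows how the area under the ROC curve can be computed for any binary interaction classifier. It is a generic illustration and not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example
    (ties count as one half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = interacting pair, 0 = non-interacting pair.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic pairwise loop is chosen for readability; a rank-based formulation gives the same value more efficiently on large datasets.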
Affiliation(s)
- Weihao Liu
  - Computer School, Hubei University of Arts and Science, Longzhong Road, Xiangyang, 441053, Hubei, China
- Xiaoli Li
  - Computer School, Hubei University of Arts and Science, Longzhong Road, Xiangyang, 441053, Hubei, China
- Bo Hang
  - Computer School, Hubei University of Arts and Science, Longzhong Road, Xiangyang, 441053, Hubei, China
- Pu Wang
  - Computer School, Hubei University of Arts and Science, Longzhong Road, Xiangyang, 441053, Hubei, China
2
Conflitti P, Lyman E, Sansom MSP, Hildebrand PW, Gutiérrez-de-Terán H, Carloni P, Ansell TB, Yuan S, Barth P, Robinson AS, Tate CG, Gloriam D, Grzesiek S, Eddy MT, Prosser S, Limongelli V. Functional dynamics of G protein-coupled receptors reveal new routes for drug discovery. Nat Rev Drug Discov 2025; 24:251-275. [PMID: 39747671 PMCID: PMC11968245 DOI: 10.1038/s41573-024-01083-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/25/2024] [Indexed: 01/04/2025]
Abstract
G protein-coupled receptors (GPCRs) are the largest human membrane protein family that transduce extracellular signals into cellular responses. They are major pharmacological targets, with approximately 26% of marketed drugs targeting GPCRs, primarily at their orthosteric binding site. Despite their prominence, predicting the pharmacological effects of novel GPCR-targeting drugs remains challenging due to the complex functional dynamics of these receptors. Recent advances in X-ray crystallography, cryo-electron microscopy, spectroscopic techniques and molecular simulations have enhanced our understanding of receptor conformational dynamics and ligand interactions with GPCRs. These developments have revealed novel ligand-binding modes, mechanisms of action and druggable pockets. In this Review, we highlight such aspects for recently discovered small-molecule drugs and drug candidates targeting GPCRs, focusing on three categories: allosteric modulators, biased ligands, and bivalent and bitopic compounds. Although studies so far have largely been retrospective, integrating structural data on ligand-induced receptor functional dynamics into the drug discovery pipeline has the potential to guide the identification of drug candidates with specific abilities to modulate GPCR interactions with intracellular effector proteins such as G proteins and β-arrestins, enabling more tailored selectivity and efficacy profiles.
Affiliation(s)
- Paolo Conflitti
  - Euler Institute, Faculty of Biomedical Sciences, Università della Svizzera italiana (USI), Lugano, Switzerland
- Edward Lyman
  - Department of Physics and Astronomy, University of Delaware, Newark, DE, USA
  - Department of Chemistry and Biochemistry, University of Delaware, Newark, DE, USA
- Mark S P Sansom
  - Department of Biochemistry, University of Oxford, Oxford, UK
- Peter W Hildebrand
  - Institute of Medical Physics and Biophysics, Faculty of Medicine, Leipzig University, Leipzig, Germany
- Hugo Gutiérrez-de-Terán
  - Department of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Uppsala, Sweden
- Paolo Carloni
  - INM-9/IAS-5 Computational Biomedicine, Forschungszentrum Jülich, Jülich, Germany
  - Department of Physics, RWTH Aachen University, Aachen, Germany
- T Bertie Ansell
  - Department of Biochemistry, University of Oxford, Oxford, UK
  - Department of Biology, Stanford University, Stanford, CA, USA
- Shuguang Yuan
  - Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Patrick Barth
  - Interfaculty Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Ludwig Institute for Cancer Research Lausanne, Lausanne, Switzerland
- Anne S Robinson
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- David Gloriam
  - Department of Drug Design and Pharmacology, University of Copenhagen, Universitetsparken 2, Copenhagen, Denmark
- Stephan Grzesiek
  - Focal Area Structural Biology and Biophysics, Biozentrum, University of Basel, Basel, Switzerland
- Matthew T Eddy
  - Department of Chemistry, College of Liberal Arts and Sciences, University of Florida, Gainesville, FL, USA
- Scott Prosser
  - Department of Chemistry, University of Toronto, Mississauga, Ontario, Canada
- Vittorio Limongelli
  - Euler Institute, Faculty of Biomedical Sciences, Università della Svizzera italiana (USI), Lugano, Switzerland
3
Ekambaram S, Arakelov G, Dokholyan NV. The Evolving Landscape of Protein Allostery: From Computational and Experimental Perspectives. J Mol Biol 2025:169060. [PMID: 40043838 DOI: 10.1016/j.jmb.2025.169060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2025] [Revised: 02/26/2025] [Accepted: 02/26/2025] [Indexed: 03/16/2025]
Abstract
Protein allostery is a fundamental biological regulatory mechanism that allows communication between distant locations within a protein, modifying its function in response to signals. Experimental techniques, such as NMR spectroscopy and cryo-electron microscopy (cryo-EM), are critical validation tools for computational predictions and provide valuable insights into dynamic conformational changes. Combining these approaches has greatly improved our understanding of classical conformational allostery and complex dynamic coupling mechanisms. Recent advances in machine learning and enhanced sampling methods have broadened the scope of allostery research, identifying cryptic allosteric sites and directing new drug discovery approaches. Despite progress, bridging static structural data with dynamic functional states remains challenging. This review underscores the importance of combining experimental and computational approaches to comprehensively understand protein allostery and its diverse applications in biology and medicine.
Affiliation(s)
- Srinivasan Ekambaram
  - Department of Neuroscience and Experimental Therapeutics, Penn State College of Medicine, Hershey, PA 17033, USA
- Grigor Arakelov
  - Department of Neuroscience and Experimental Therapeutics, Penn State College of Medicine, Hershey, PA 17033, USA
  - Institute of Molecular Biology of the National Academy of Sciences of the Republic of Armenia, Yerevan 0014, Armenia
- Nikolay V Dokholyan
  - Department of Neuroscience and Experimental Therapeutics, Penn State College of Medicine, Hershey, PA 17033, USA
  - Department of Biochemistry & Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA
  - Department of Chemistry, Penn State University, University Park, PA 16802, USA
  - Department of Biomedical Engineering, Penn State University, University Park, PA 16802, USA
4
Badrinarayanan S, Guntuboina C, Mollaei P, Barati Farimani A. Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties. J Chem Inf Model 2025; 65:83-91. [PMID: 39700492 PMCID: PMC11733943 DOI: 10.1021/acs.jcim.4c01443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2024] [Revised: 12/03/2024] [Accepted: 12/04/2024] [Indexed: 12/21/2024]
Abstract
Peptides are crucial in biological processes and therapeutic applications. Given their importance, advancing our ability to predict peptide properties is essential. In this study, we introduce Multi-Peptide, an innovative approach that combines transformer-based language models with graph neural networks (GNNs) to predict peptide properties. We integrate PeptideBERT, a transformer model specifically designed for peptide property prediction, with a GNN encoder to capture both sequence-based and structural features. By employing a contrastive loss framework, Multi-Peptide aligns embeddings from both modalities into a shared latent space, thereby enhancing the transformer model's predictive accuracy. Evaluations on hemolysis and nonfouling data sets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 88.057% accuracy in hemolysis prediction. This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
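One common way to align embeddings from two modalities in a shared latent space is a symmetric InfoNCE-style contrastive loss. The abstract does not state the exact loss used, so the following is only a minimal sketch under that assumption; the function names, the temperature value and the toy embeddings are hypothetical and are not taken from Multi-Peptide.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax_at(row, index):
    """Numerically stable log-softmax of row, evaluated at one index."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return row[index] - log_z


def contrastive_loss(seq_emb, graph_emb, temperature=0.1):
    """Symmetric InfoNCE loss: item i of one modality should match item i
    of the other modality and mismatch every other item in the batch."""
    n = len(seq_emb)
    # Pairwise similarity matrix, sharpened by the (assumed) temperature.
    logits = [[cosine(s, g) / temperature for g in graph_emb] for s in seq_emb]
    # Sequence -> graph direction (rows of the matrix).
    loss_s2g = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Graph -> sequence direction (columns of the matrix).
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_g2s = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return 0.5 * (loss_s2g + loss_g2s)


# Hypothetical toy batch of two peptides with 2-D embeddings per modality.
seq_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(seq_batch, [[1.0, 0.0], [0.0, 1.0]])
misaligned = contrastive_loss(seq_batch, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when matching peptides share the same direction in both modalities and large when the pairing is swapped, which is the behaviour a contrastive objective relies on.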
Affiliation(s)
- Srivathsan Badrinarayanan
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Chakradhar Guntuboina
  - Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Parisa Mollaei
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
- Amir Barati Farimani
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States
5
Caniceiro AB, Orzeł U, Rosário-Ferreira N, Filipek S, Moreira IS. Leveraging Artificial Intelligence in GPCR Activation Studies: Computational Prediction Methods as Key Drivers of Knowledge. Methods Mol Biol 2025; 2870:183-220. [PMID: 39543036 DOI: 10.1007/978-1-0716-4213-9_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
G protein-coupled receptors (GPCRs) are key molecules involved in cellular signaling and are attractive targets for pharmacological intervention. This chapter is designed to explore the range of algorithms used to predict GPCRs' activation states, while also examining the pharmaceutical implications of these predictions. Our primary objective is to show how artificial intelligence (AI) is key in GPCR research to reveal the intricate dynamics of activation and inactivation processes, shedding light on the complex regulatory mechanisms of this vital protein family. We describe several computational strategies that leverage diverse structural data from the Protein Data Bank, molecular dynamic simulations, or ligand-based methods to predict the activation states of GPCRs. We demonstrate how the integration of AI into GPCR research not only enhances our understanding of their dynamic properties but also presents immense potential for driving pharmaceutical research and development, offering promising new avenues in the search for newer, better therapeutic agents.
Affiliation(s)
- Ana B Caniceiro
  - Department of Life Sciences, University of Coimbra, Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Urszula Orzeł
  - Department of Life Sciences, University of Coimbra, Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Nícia Rosário-Ferreira
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
- Sławomir Filipek
  - Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Irina S Moreira
  - Department of Life Sciences, University of Coimbra, Coimbra, Portugal
  - CNC-UC - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
  - CIBB - Centre for Innovative Biomedicine and Biotechnology, University of Coimbra, Coimbra, Portugal
6
Mollaei P, Sadasivam D, Guntuboina C, Barati Farimani A. IDP-Bert: Predicting Properties of Intrinsically Disordered Proteins Using Large Language Models. J Phys Chem B 2024; 128:12030-12037. [PMID: 39586094 DOI: 10.1021/acs.jpcb.4c02507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2024]
Abstract
Intrinsically disordered proteins (IDPs) constitute a large and structureless class of proteins with significant functions. The existence of IDPs challenges the conventional notion that the biological functions of proteins rely on their three-dimensional structures. Despite lacking well-defined spatial arrangements, they exhibit diverse biological functions, influencing cellular processes and shedding light on disease mechanisms. However, it is expensive to run experiments or simulations to characterize this class of proteins. Consequently, we designed an ML model that relies solely on amino acid sequences. In this study, we introduce the IDP-Bert model, a deep-learning architecture leveraging Transformers and Protein Language Models to map sequences directly to IDP properties. Our experiments demonstrate accurate predictions of IDP properties, including Radius of Gyration, end-to-end Decorrelation Time, and Heat Capacity.
Affiliation(s)
- Parisa Mollaei
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Danush Sadasivam
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Chakradhar Guntuboina
  - Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
7
Lopez-Balastegui M, Stepniewski TM, Kogut-Günthel MM, Di Pizio A, Rosenkilde MM, Mao J, Selent J. Relevance of G protein-coupled receptor (GPCR) dynamics for receptor activation, signalling bias and allosteric modulation. Br J Pharmacol 2024. [PMID: 38978399 DOI: 10.1111/bph.16495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 04/22/2024] [Accepted: 05/23/2024] [Indexed: 07/10/2024] Open
Abstract
G protein-coupled receptors (GPCRs) are one of the major drug targets. In recent years, computational drug design for GPCRs has mainly focused on static structures obtained through X-ray crystallography, cryogenic electron microscopy (cryo-EM) or in silico modelling as a starting point for virtual screening campaigns. However, GPCRs are highly flexible entities with the ability to adopt different conformational states that elicit different physiological responses. Including this knowledge in the drug discovery pipeline can help to tailor novel conformation-specific drugs with an improved therapeutic profile. In this review, we outline our current knowledge about GPCR dynamics that is relevant for receptor activation, signalling bias and allosteric modulation. Ultimately, we highlight new technological implementations such as time-resolved X-ray crystallography and cryo-EM as well as computational algorithms that can contribute to a more comprehensive understanding of receptor dynamics and its relevance for GPCR functionality.
Affiliation(s)
- Marta Lopez-Balastegui
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute & Pompeu Fabra University, Barcelona, Spain
- Tomasz Maciej Stepniewski
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute & Pompeu Fabra University, Barcelona, Spain
  - InterAx Biotech AG, Villigen, Switzerland
- Antonella Di Pizio
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, Freising, Germany
  - Chair for Chemoinformatics and Protein Modelling, Department of Molecular Life Science, School of Science, Technical University of Munich, Freising, Germany
- Mette Marie Rosenkilde
  - Department of Biomedical Sciences, Faculty of Health and Medical Sciences University of Copenhagen, København N, Denmark
- Jiafei Mao
  - Huairou Research Center, Institute of Chemistry, Chinese Academy of Sciences, Beijing, China
- Jana Selent
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute & Pompeu Fabra University, Barcelona, Spain
8
Nguyen ATN, Nguyen DTN, Koh HY, Toskov J, MacLean W, Xu A, Zhang D, Webb GI, May LT, Halls ML. The application of artificial intelligence to accelerate G protein-coupled receptor drug discovery. Br J Pharmacol 2024; 181:2371-2384. [PMID: 37161878 DOI: 10.1111/bph.16140] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 09/26/2022] [Revised: 04/14/2023] [Accepted: 04/27/2023] [Indexed: 05/11/2023] Open
Abstract
The application of artificial intelligence (AI) approaches to drug discovery for G protein-coupled receptors (GPCRs) is a rapidly expanding area. Artificial intelligence can be used at multiple stages during the drug discovery process, from aiding our understanding of the fundamental actions of GPCRs to the discovery of new ligand-GPCR interactions or the prediction of clinical responses. Here, we provide an overview of the concepts behind artificial intelligence, including the subfields of machine learning and deep learning. We summarise the published applications of artificial intelligence to different stages of the GPCR drug discovery process. Finally, we reflect on the benefits and limitations of artificial intelligence and share our vision for the exciting potential for further development of applications to aid GPCR drug discovery. In addition to making the drug discovery process "faster, smarter and cheaper," we anticipate that the application of artificial intelligence will create exciting new opportunities for GPCR drug discovery. LINKED ARTICLES: This article is part of a themed issue Therapeutic Targeting of G Protein-Coupled Receptors: hot topics from the Australasian Society of Clinical and Experimental Pharmacologists and Toxicologists 2021 Virtual Annual Scientific Meeting. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v181.14/issuetoc.
Affiliation(s)
- Anh T N Nguyen
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Diep T N Nguyen
  - Department of Information Technology, Faculty of Engineering and Technology, Vietnam National University, Cau Giay, Hanoi, Vietnam
- Huan Yee Koh
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Jason Toskov
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- William MacLean
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Andrew Xu
  - Monash DeepNeuron, Monash University, Clayton, Victoria, Australia
- Daokun Zhang
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Geoffrey I Webb
  - Monash Data Futures Institute and Department of Data Science and Artificial Intelligence, Monash University, Clayton, Victoria, Australia
- Lauren T May
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Michelle L Halls
  - Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
9
Bhattacharjee A, Kar S, Ojha PK. Unveiling G-protein coupled receptor kinase-5 inhibitors for chronic degenerative diseases: Multilayered prioritization employing explainable machine learning-driven multi-class QSAR, ligand-based pharmacophore and free energy-inspired molecular simulation. Int J Biol Macromol 2024; 269:131784. [PMID: 38697440 DOI: 10.1016/j.ijbiomac.2024.131784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 04/02/2024] [Accepted: 04/21/2024] [Indexed: 05/05/2024]
Abstract
GRK5 holds a pivotal role in cellular signaling pathways, with its overexpression in cardiomyocytes, neuronal cells, and tumor cells strongly associated with various chronic degenerative diseases, which highlights the urgent need for potential inhibitors. In this study, multiclass classification-based QSAR models were developed using diverse machine learning algorithms. These models were built from curated compounds with experimentally derived GRK5 inhibitory activity. Additionally, a pharmacophore model was constructed using active compounds from the dataset. Among the models, the SVM-based approach proved most effective and was initially used to screen DrugBank compounds within the applicability domain. Compounds showing significant GRK5 inhibitory potential underwent evaluation for key pharmacophoric features. Prospective compounds were subjected to molecular docking to assess binding affinity towards GRK5's key active site amino acid residues. Stability at the binding site was analyzed through 200 ns molecular dynamics simulations. MM-GBSA analysis quantified individual free energy components contributing to the total binding energy with respect to binding site residues. Metadynamics analysis, including PCA, FEL, and PDF, provided crucial insights into conformational changes of both apo and holo forms of GRK5 at defined energy states. The study identifies DB02844 (S-Adenosyl-1,8-Diamino-3-Thiooctane) and DB13155 (Esculin) as promising GRK5 inhibitors, warranting further in vitro and in vivo validation studies.
Affiliation(s)
- Arnab Bhattacharjee
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Supratik Kar
  - Chemometrics and Molecular Modeling Laboratory, Department of Chemistry and Physics, Kean University, 1000 Morris Avenue, Union, NJ, 07083, USA
- Probir Kumar Ojha
  - Drug Discovery and Development Laboratory (DDD Lab), Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
10
Kim S, Mollaei P, Antony A, Magar R, Barati Farimani A. GPCR-BERT: Interpreting Sequential Design of G Protein-Coupled Receptors Using Protein Language Models. J Chem Inf Model 2024; 64:1134-1144. [PMID: 38340054 PMCID: PMC10900288 DOI: 10.1021/acs.jcim.3c01706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Revised: 01/29/2024] [Accepted: 01/29/2024] [Indexed: 02/12/2024]
Abstract
With the rise of transformers and large language models (LLMs) in chemistry and biology, new avenues for the design and understanding of therapeutics have been opened up to the scientific community. Protein sequences can be modeled as language and can take advantage of recent advances in LLMs, specifically with the abundance of our access to the protein sequence data sets. In this letter, we developed the GPCR-BERT model for understanding the sequential design of G protein-coupled receptors (GPCRs). GPCRs are the target of over one-third of Food and Drug Administration-approved pharmaceuticals. However, there is a lack of comprehensive understanding regarding the relationship among amino acid sequence, ligand selectivity, and conformational motifs (such as NPxxY, CWxP, and E/DRY). By utilizing the pretrained protein model (Prot-Bert) and fine-tuning with prediction tasks of variations in the motifs, we were able to shed light on several relationships between residues in the binding pocket and some of the conserved motifs. To achieve this, we took advantage of attention weights and hidden states of the model that are interpreted to extract the extent of contributions of amino acids in dictating the type of masked ones. The fine-tuned models demonstrated high accuracy in predicting hidden residues within the motifs. In addition, the analysis of embedding was performed over 3D structures to elucidate the higher-order interactions within the conformations of the receptors.
Affiliation(s)
- Seongwon Kim
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Parisa Mollaei
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Akshay Antony
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Rishikesh Magar
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
  - Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
  - Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
11
Velloso JPL, Kovacs AS, Pires DEV, Ascher DB. AI-driven GPCR analysis, engineering, and targeting. Curr Opin Pharmacol 2024; 74:102427. [PMID: 38219398 DOI: 10.1016/j.coph.2023.102427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/16/2024]
Abstract
This article investigates the role of recent advances in Artificial Intelligence (AI) to revolutionise the study of G protein-coupled receptors (GPCRs). AI has been applied to many areas of GPCR research, including the application of machine learning (ML) in GPCR classification, prediction of GPCR activation levels, modelling GPCR 3D structures and interactions, understanding G-protein selectivity, aiding elucidation of GPCRs structures, and drug design. Despite progress, challenges in predicting GPCR structures and addressing the complex nature of GPCRs remain, providing avenues for future research and development.
Affiliation(s)
- João P L Velloso
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
| | - Aaron S Kovacs
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
| | - Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia.
| | - David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia; Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia.
|
12
|
Huang WC, Lin WT, Hung MS, Lee JC, Tung CW. Decrypting orphan GPCR drug discovery via multitask learning. J Cheminform 2024; 16:10. [PMID: 38263092 PMCID: PMC10804799 DOI: 10.1186/s13321-024-00806-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024] Open
Abstract
Computational drug discovery for the G protein-coupled receptor (GPCR) superfamily is often limited by the availability of protein three-dimensional (3D) structures and of chemicals with experimentally measured bioactivities. Orphan GPCRs without known ligands further complicate the process. To enable drug discovery for human orphan GPCRs, multitask models were proposed for predicting the half maximal effective concentrations (EC50) of chemical-GPCR pairs. Protein multiple sequence alignment features were used to encode the protein information, and physicochemical properties and fingerprints were used to encode the chemical information. The protein features enabled the transfer of knowledge from data-rich GPCRs to orphan receptors, with transferability based on the similarity of protein features. The final model was trained using both agonist and antagonist data from 200 GPCRs and showed an excellent mean squared error (MSE) of 0.24 on the validation dataset. An independent test using an orphan dataset consisting of 16 receptors, each associated with fewer than 8 bioactivities, showed a reasonably good MSE of 1.51, which can be further improved to 0.53 by considering the transferability based on protein features. The informative features were identified and mapped to the corresponding 3D structures to gain insights into the mechanism of GPCR-ligand interactions across the GPCR family. The proposed method provides a novel perspective on learning ligand bioactivity within the diverse human GPCR superfamily and can potentially accelerate the discovery of therapeutic agents for orphan GPCRs.
Affiliation(s)
- Wei-Cheng Huang
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan
- Wei-Ting Lin
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan
- Ming-Shiu Hung
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan
- Jinq-Chyi Lee
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan
- Chun-Wei Tung
- Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 35053, Taiwan.
|
13
|
Buyanov I, Popov P. Characterizing conformational states in GPCR structures using machine learning. Sci Rep 2024; 14:1098. [PMID: 38212515 PMCID: PMC10784458 DOI: 10.1038/s41598-023-47698-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 11/17/2023] [Indexed: 01/13/2024] Open
Abstract
G protein-coupled receptors (GPCRs) play a pivotal role in signal transduction and represent attractive targets for drug development. Recent advances in structural biology have provided insights into GPCR conformational states, which are critical for understanding their signaling pathways and facilitating structure-based drug discovery. In this study, we introduce a machine learning approach for conformational state annotation of GPCRs. We represent GPCR conformations as high-dimensional feature vectors, incorporating information about amino acid residue pairs involved in the activation pathway. Using a dataset of GPCR conformations in inactive and active states obtained through molecular dynamics simulations, we trained machine learning models to distinguish between inactive-like and active-like conformations. The developed model provides interpretable predictions and can be used for the large-scale analysis of molecular dynamics trajectories of GPCRs.
Affiliation(s)
- Ilya Buyanov
- iMolecule, Skolkovo Institute of Science and Technology, Moscow, 121205, Russia
- Petr Popov
- iMolecule, Skolkovo Institute of Science and Technology, Moscow, 121205, Russia.
|
14
|
Mollaei P, Barati Farimani A. Unveiling Switching Function of Amino Acids in Proteins Using a Machine Learning Approach. J Chem Theory Comput 2023; 19:8472-8480. [PMID: 37933128 PMCID: PMC10688191 DOI: 10.1021/acs.jctc.3c00665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Revised: 10/13/2023] [Accepted: 10/16/2023] [Indexed: 11/08/2023]
Abstract
Dynamics of individual amino acids play key roles in the overall properties of proteins. However, the knowledge of protein structural features at the residue level is limited due to the current resolutions of experimental and computational techniques. To address this issue, we designed a novel machine-learning (ML) framework that uses Molecular Dynamics (MD) trajectories to identify the major conformational states of individual amino acids, classify amino acids switching between two distinct modes, and evaluate their degree of dynamic stability. The Random Forest model achieved 96.94% classification accuracy in identifying switch residues within proteins. Additionally, our framework distinguishes between the stable switch (SS) residues, which remain stable in one angular state and jump once to another state during protein dynamics, and unstable switch (US) residues, which constantly fluctuate between the two angular states. This study also illustrates the correlation between the dynamics of SS residues and the protein's global properties.
Affiliation(s)
- Parisa Mollaei
- Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
- Department of Mechanical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, United States
- Department of Biomedical Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, United States
- Machine Learning Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, United States
|
15
|
Guntuboina C, Das A, Mollaei P, Kim S, Barati Farimani A. PeptideBERT: A Language Model Based on Transformers for Peptide Property Prediction. J Phys Chem Lett 2023; 14:10427-10434. [PMID: 37956397 PMCID: PMC10683064 DOI: 10.1021/acs.jpclett.3c02398] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/27/2023] [Revised: 11/04/2023] [Accepted: 11/07/2023] [Indexed: 11/15/2023]
Abstract
Recent advances in language models have given the protein modeling community a powerful tool that uses transformers to represent protein sequences as text. This breakthrough enables sequence-to-property prediction for peptides without relying on explicit structural data. Inspired by recent progress in the field of large language models, we present PeptideBERT, a protein language model specifically tailored for predicting essential peptide properties such as hemolysis, solubility, and nonfouling. PeptideBERT utilizes the ProtBERT pretrained transformer model with 12 attention heads and 12 hidden layers. Through fine-tuning the pretrained model for the three downstream tasks, our model is state of the art (SOTA) in predicting hemolysis, which is crucial for determining a peptide's potential to induce red blood cell lysis, as well as in predicting nonfouling properties. Leveraging primarily shorter sequences and a data set with negative samples predominantly associated with insoluble peptides, our model showcases remarkable performance.
Affiliation(s)
- Chakradhar Guntuboina
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Adrita Das
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Parisa Mollaei
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Seongwon Kim
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
|
16
|
Mollaei P, Barati Farimani A. Activity Map and Transition Pathways of G Protein-Coupled Receptor Revealed by Machine Learning. J Chem Inf Model 2023; 63:2296-2304. [PMID: 37036101 PMCID: PMC10131220 DOI: 10.1021/acs.jcim.3c00032] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/10/2023] [Indexed: 04/11/2023]
Abstract
Approximately one-third of all U.S. Food and Drug Administration approved drugs target G protein-coupled receptors (GPCRs). However, more knowledge of protein structure-activity correlation is required to improve the efficacy of drugs targeting GPCRs. In this study, we developed a machine learning model to predict the activation state and activity level of the receptors with high prediction accuracy. Furthermore, we applied this model to thousands of molecular dynamics trajectories to correlate residue-level conformational changes of a GPCR with its activity level. Finally, the most probable transition pathway between activation states of a receptor can be identified using the state-activity information. In addition, with this model, we can quantify the contribution of each amino acid to the activation process. Using this method, we can design drugs that mainly target the principal amino acids driving the transition between activation states of GPCRs. Our method is generalizable to all GPCR classes and provides mechanistic insight into the activation mechanism of the receptors.
Affiliation(s)
- Parisa Mollaei
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Amir Barati Farimani
- Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
|
17
|
Jundi D, Coutanceau JP, Bullier E, Imarraine S, Fajloun Z, Hong E. Expression of olfactory receptor genes in non-olfactory tissues in the developing and adult zebrafish. Sci Rep 2023; 13:4651. [PMID: 36944644 PMCID: PMC10030859 DOI: 10.1038/s41598-023-30895-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 03/02/2023] [Indexed: 03/23/2023] Open
Abstract
Since the discovery of olfactory receptor (OR) genes, their expression in non-olfactory tissues has been reported in rodents and humans. For example, mouse OR23 (mOR23) is expressed in sperm and muscle cells and has been proposed to play a role in chemotaxis and muscle migration, respectively. In addition, mouse mesencephalic dopaminergic neurons express various ORs, which respond to corresponding ligands. As the OR genes comprise the largest multigene family of G protein-coupled receptors in vertebrates (over 400 genes in humans and 1000 in rodents), it has been difficult to categorize the extent of their diverse expression in non-olfactory tissues, making it challenging to ascertain their function. The zebrafish genome contains significantly fewer OR genes, at around 140 genes, and their expression pattern can be easily analyzed by carrying out a whole mount in situ hybridization (ISH) assay in larvae. In this study, we found that 31 out of 36 OR genes, including or104-2, or108-1, or111-1, or125-4, or128-1, or128-5, or133-4, or133-7, and or137-3, are expressed in various tissues, including the trunk, pharynx, pancreas and brain in the larvae. In addition, some OR genes are expressed in distinct brain regions such as the hypothalamus and the habenula in a dynamic temporal pattern between larvae, juvenile and adult zebrafish. We further confirmed that OR genes are expressed in non-olfactory tissues by RT-PCR in larvae and adults. These results indicate tight regulation of OR gene expression in the brain in a spatial and temporal manner and that the expression of OR genes in non-olfactory tissues is conserved in vertebrates. This study provides a framework to start investigating the function of ORs in the zebrafish brain.
Affiliation(s)
- Dania Jundi
- INSERM, CNRS, Neurosciences Paris Seine-Institut de Biologie Paris Seine (NPS-IBPS), Sorbonne Université, 75005, Paris, France
- Laboratory of Applied Biotechnology (LBA3B), Azm Center for Research in Biotechnology and Its Applications, EDST, Lebanese University, Tripoli, 1300, Lebanon
- Jean-Pierre Coutanceau
- INSERM, CNRS, Neurosciences Paris Seine-Institut de Biologie Paris Seine (NPS-IBPS), Sorbonne Université, 75005, Paris, France
- Erika Bullier
- INSERM, CNRS, Neurosciences Paris Seine-Institut de Biologie Paris Seine (NPS-IBPS), Sorbonne Université, 75005, Paris, France
- Soumaiya Imarraine
- INSERM, CNRS, Neurosciences Paris Seine-Institut de Biologie Paris Seine (NPS-IBPS), Sorbonne Université, 75005, Paris, France
- CNRS, Laboratoire Jean Perrin-Institut de Biologie Paris Seine (LJP-IBPS), Sorbonne Université, 75005, Paris, France
- Ziad Fajloun
- Laboratory of Applied Biotechnology (LBA3B), Azm Center for Research in Biotechnology and Its Applications, EDST, Lebanese University, Tripoli, 1300, Lebanon
- Department of Biology, Faculty of Sciences 3, Campus Michel Slayman, Lebanese University, Tripoli, 1352, Lebanon
- Elim Hong
- INSERM, CNRS, Neurosciences Paris Seine-Institut de Biologie Paris Seine (NPS-IBPS), Sorbonne Université, 75005, Paris, France.
|
18
|
Gutiérrez-Mondragón MA, König C, Vellido A. Layer-Wise Relevance Analysis for Motif Recognition in the Activation Pathway of the β2-Adrenergic GPCR Receptor. Int J Mol Sci 2023; 24:ijms24021155. [PMID: 36674669 PMCID: PMC9865744 DOI: 10.3390/ijms24021155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 12/22/2022] [Accepted: 12/30/2022] [Indexed: 01/11/2023] Open
Abstract
G-protein-coupled receptors (GPCRs) are cell membrane proteins of relevance as therapeutic targets, and are associated with the development of treatments for illnesses such as diabetes, Alzheimer's, or even cancer. Therefore, comprehending the underlying mechanisms of the receptors' functional properties is of particular interest in pharmacoproteomics and in disease therapy at large. Their interaction with ligands elicits multiple molecular rearrangements all along their structure, inducing activation pathways that distinctly influence the cell response. In this work, we studied GPCR signaling pathways from molecular dynamics simulations, as they provide rich information about the dynamic nature of the receptors. We focused on studying the molecular properties of the receptors using deep-learning-based methods. In particular, we designed and trained a one-dimensional convolutional neural network and illustrated its use in classifying the conformational states (active, intermediate, or inactive) of the β2-adrenergic receptor when bound to the full agonist BI-167107. Through a novel explainability-oriented investigation of the prediction results, we were able to identify and assess the contribution of individual motifs (residues) influencing a particular activation pathway. Consequently, we contribute a methodology that assists in the elucidation of the underlying mechanisms of receptor activation-deactivation.
Affiliation(s)
- Mario A. Gutiérrez-Mondragón
- Computer Science Department, Universitat Politècnica de Catalunya—UPC BarcelonaTech, 08034 Barcelona, Spain
- Intelligent Data Science and Artificial Intelligence (IDEAI-UPC) Research Center, Universitat Politècnica de Catalunya—UPC BarcelonaTech, 08034 Barcelona, Spain
- Caroline König
- Computer Science Department, Universitat Politècnica de Catalunya—UPC BarcelonaTech, 08034 Barcelona, Spain
- Intelligent Data Science and Artificial Intelligence (IDEAI-UPC) Research Center, Universitat Politècnica de Catalunya—UPC BarcelonaTech, 08034 Barcelona, Spain
- Alfredo Vellido
- Computer Science Department, Universitat Politècnica de Catalunya—UPC BarcelonaTech, 08034 Barcelona, Spain
- Intelligent Data Science and Artificial Intelligence (IDEAI-UPC) Research Center, Universitat Politècnica de Catalunya—UPC BarcelonaTech, 08034 Barcelona, Spain
|