1
Wozniak S, Janson G, Feig M. Accurate Predictions of Molecular Properties of Proteins via Graph Neural Networks and Transfer Learning. J Chem Theory Comput 2025. [PMID: 40270304 DOI: 10.1021/acs.jctc.4c01682] [Indexed: 04/25/2025]
Abstract
Machine learning has emerged as a promising approach for predicting molecular properties of proteins, as it addresses limitations of experimental and traditional computational methods. Here, we introduce GSnet, a graph neural network (GNN) trained to predict physicochemical and geometric properties including solvation free energies, diffusion constants, and hydrodynamic radii, based on three-dimensional protein structures. By leveraging transfer learning, pretrained GSnet embeddings were adapted to predict solvent-accessible surface area (SASA) and residue-specific pKa values, achieving high accuracy and generalizability. Notably, GSnet outperformed existing protein embeddings for SASA prediction, and a locally charge-aware variant, aLCnet, approached the accuracy of simulation-based and empirical methods for pKa prediction. Our GNN framework demonstrated robustness across diverse data sets, including intrinsically disordered peptides, and scalability for high-throughput applications. These results highlight the potential of GNN-based embeddings and transfer learning to advance protein structure analysis, providing a foundation for integrating predictive models into proteome-wide studies and structural biology pipelines.
Affiliation(s)
- Spencer Wozniak
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
2
Xu S, Onoda A. Accurate and Rapid Prediction of Protein pKa: Protein Language Models Reveal the Sequence-pKa Relationship. J Chem Theory Comput 2025; 21:3752-3764. [PMID: 40138263 DOI: 10.1021/acs.jctc.4c01288] [Indexed: 03/29/2025]
Abstract
Protein pKa prediction is a key challenge in computational biology. In this study, we present pKALM, a novel deep learning-based method for high-throughput protein pKa prediction. pKALM uses a protein language model (PLM) to capture the complex sequence-structure relationships of proteins. While traditionally considered a structure-based problem, our results show that a PLM pretrained on large-scale protein sequence databases can effectively learn this relationship and achieve state-of-the-art performance. pKALM accurately predicts the pKa values of six residues (Asp, Glu, His, Lys, Cys, and Tyr) and two termini with high precision and efficiency. It performs well at predicting both exposed and buried residues, which often deviate from standard pKa values measured in the solvent. We demonstrate a novel finding that predicted protein isoelectric points (pI) can be used to improve the accuracy of pKa prediction. High-throughput pKa prediction of the human proteome using pKALM achieves a speed of 4,965 pKa predictions per second, which is several orders of magnitude faster than existing state-of-the-art methods. The case studies illustrate the efficacy of pKALM in estimating pKa values and the constraints of the method. pKALM will thus be a valuable tool for researchers in the fields of biochemistry, biophysics, and drug design.
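The abstract links pKa values to the isoelectric point (pI). As background for that link, the following is a minimal sketch of how a pI can be derived from a set of pKa values. It assumes independent titration sites that each follow the Henderson-Hasselbalch equation, and it is not pKALM's method. The function names `net_charge` and `isoelectric_point` are illustrative only.

```python
def net_charge(ph, acid_pkas, base_pkas):
    """Net charge at a given pH, assuming independent titration sites."""
    # Acidic groups (e.g. Asp, Glu, Cys, Tyr, C-terminus) carry -1 when deprotonated.
    negative = sum(1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in acid_pkas)
    # Basic groups (e.g. His, Lys, N-terminus) carry +1 when protonated.
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in base_pkas)
    return positive - negative


def isoelectric_point(acid_pkas, base_pkas, lo=0.0, hi=14.0, tol=1e-6):
    """Find the pH of zero net charge by bisection on the interval [lo, hi]."""
    # The net charge decreases monotonically with pH, so bisection converges
    # whenever the zero crossing lies inside the search interval.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(mid, acid_pkas, base_pkas) > 0.0:
            lo = mid  # still positively charged: the pI is at higher pH
        else:
            hi = mid  # neutral or negatively charged: the pI is at lower pH
    return 0.5 * (lo + hi)
```

For one acidic site with pKa 4 and one basic site with pKa 10, the two charge terms cancel at pH 7, so `isoelectric_point([4.0], [10.0])` converges to approximately 7.0.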
Affiliation(s)
- Shijie Xu
  - Graduate School of Environmental Science, Hokkaido University, Sapporo 060-0810, Japan
- Akira Onoda
  - Graduate School of Environmental Science, Hokkaido University, Sapporo 060-0810, Japan
  - Faculty of Environmental Earth Science, Hokkaido University, Sapporo 060-0810, Japan
3
Shen M, Kortzak D, Ambrozak S, Bhatnagar S, Buchanan I, Liu R, Shen J. KaMLs for Predicting Protein pKa Values and Ionization States: Are Trees All You Need? J Chem Theory Comput 2025; 21:1446-1458. [PMID: 39882632 DOI: 10.1021/acs.jctc.4c01602] [Indexed: 01/31/2025]
Abstract
Despite its importance in understanding biology and computer-aided drug discovery, the accurate prediction of protein ionization states remains a formidable challenge. Physics-based approaches struggle to capture the small, competing contributions in the complex protein environment, while machine learning (ML) is hampered by the scarcity of experimental data. Here, we report the development of pKa ML (KaML) models based on decision trees and graph attention networks (GAT), exploiting physicochemical understanding and a new experimental pKa database (PKAD-3) enriched with highly shifted pKa's. KaML-CBtree significantly outperforms the current state of the art in predicting pKa values and ionization states across all six titratable amino acids, notably achieving accurate predictions for deprotonated cysteines and lysines, a blind spot in previous models. The superior performance of KaMLs is achieved in part through several innovations, including the separate treatment of acid and base, data augmentation using AlphaFold structures, and model pretraining on a theoretical pKa database. We also introduce the classification of protonation states as a metric for evaluating pKa prediction models. A meta-feature analysis suggests a possible reason for the lightweight tree model to outperform the more complex deep learning GAT. We release an end-to-end pKa predictor based on KaML-CBtree and the new PKAD-3 database, which facilitates a variety of applications and provides the foundation for further advances in protein electrostatic research.
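The abstract proposes protonation-state classification as an evaluation metric. The sketch below shows one way such a metric could be computed from predicted and experimental pKa values. It assumes a reference pH of 7.0 and a simple "pKa above pH means protonated" rule; the paper's exact definition is not given here, and all function names are hypothetical.

```python
def protonated_fraction(pka, ph):
    """Henderson-Hasselbalch fraction of a site that is protonated at this pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def protonation_state(pka, ph=7.0):
    """Label a site by its majority state (assumed rule: pKa > pH is protonated)."""
    return "protonated" if pka > ph else "deprotonated"


def classification_accuracy(predicted_pkas, experimental_pkas, ph=7.0):
    """Share of sites whose predicted state matches the experimental state."""
    if len(predicted_pkas) != len(experimental_pkas) or not predicted_pkas:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(
        protonation_state(pred, ph) == protonation_state(expt, ph)
        for pred, expt in zip(predicted_pkas, experimental_pkas)
    )
    return matches / len(predicted_pkas)
```

A metric of this kind rewards a model for placing a pKa on the correct side of the reference pH, which matters for highly shifted residues even when the absolute pKa error is moderate.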
Affiliation(s)
- Mingzhe Shen
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
- Daniel Kortzak
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
- Simon Ambrozak
  - Department of Computer Science, University of Maryland College Park, College Park, Maryland 20742, United States
- Shubham Bhatnagar
  - Department of Computer Science, University of Maryland College Park, College Park, Maryland 20742, United States
- Ian Buchanan
  - Stuyvesant High School, New York, New York 10282, United States
- Ruibin Liu
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
- Jana Shen
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, United States
4
Hogues H, Wei W, Sulea T. Improved Structure-Based Histidine pKa Prediction for pH-Responsive Protein Design. J Chem Inf Model 2025; 65:1560-1569. [PMID: 39826152 PMCID: PMC11815838 DOI: 10.1021/acs.jcim.4c01957] [Received: 10/25/2024] [Revised: 12/23/2024] [Accepted: 01/09/2025] [Indexed: 01/22/2025]
Abstract
The near neutral pKa of histidine is commonly exploited to engineer pH-sensitive biomolecules. For example, histidine mutations introduced in the complementarity-determining region (CDR) of therapeutic antibodies can enhance selectivity for antigens in the acidic microenvironment of solid tumors or increase dissociation rates in the acidic early endosomes of cells. While solvent-exposed histidines typically have a pKa near 6.5, interacting histidines can experience pKa shifts of up to 4 pH units in either direction, making histidine one of the most variable titratable residues. To assist in selecting potential histidine mutation sites, pKa prediction software should achieve an accuracy significantly better than the current standard of around 1.0 pH unit. However, the limited availability of experimental histidine pKa measurements hinders the use of AI-based methods. This study evaluates histidine pKa predictions using Amber force field electrostatics combined with a continuum solvent model, previously calibrated in the solvated interaction energy (SIE) function for binding affinity predictions. By incorporating limited rotameric sampling, proton optimization, and an empirical correction for buried side-chains, the method achieves a mean unsigned error of 0.4 pH units across a diverse set of 91 histidines from 38 distinct protein structures obtained from the PKAD database. This approach should improve the in-silico design of pH-responsive mutations. The method is implemented in the software program JustHISpKa (https://mm.nrc-cnrc.gc.ca/software/JustHISpKa).
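The accuracy figure quoted above is a mean unsigned error (MUE), that is, the average absolute difference between predicted and experimental pKa values. A minimal sketch of that calculation follows. The function name is illustrative and the sample values in the usage note are made up for demonstration, not taken from the paper.

```python
def mean_unsigned_error(predicted, experimental):
    """Average absolute deviation between paired predictions and measurements."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum the per-site absolute errors, then divide by the number of sites.
    total = sum(abs(p - e) for p, e in zip(predicted, experimental))
    return total / len(predicted)
```

For the made-up inputs `[6.0, 7.5, 5.0]` and `[6.5, 7.0, 5.2]`, the absolute errors are 0.5, 0.5, and 0.2, giving an MUE of about 0.4 pH units.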
Affiliation(s)
- Hervé Hogues
  - Human Health Therapeutics Research Centre, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
- Wanlei Wei
  - Human Health Therapeutics Research Centre, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
- Traian Sulea
  - Human Health Therapeutics Research Centre, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
5
Bondarchuk T, Zhuravel E, Shyshlyk O, Debelyy MO, Pokholenko O, Vaskiv D, Pogribna A, Kuznietsova M, Hrynyshyn Y, Nedialko O, Brovarets V, Zozulya SA. The molecular features of non-peptidic nucleophilic substrates and acceptor proteins determine the efficiency of sortagging. RSC Chem Biol 2025; 6:295-306. [PMID: 39802631 PMCID: PMC11721432 DOI: 10.1039/d4cb00246f] [Received: 10/09/2024] [Accepted: 12/19/2024] [Indexed: 01/16/2025]
Abstract
Sortase A-mediated ligation (SML) or "sortagging" has become a popular technology to selectively introduce structurally diverse protein modifications. Despite the great progress in the optimization of the reaction conditions and design of miscellaneous C- or N-terminal protein modification strategies, the reported yields of conjugates are highly variable. In this study, we have systematically investigated C-terminal protein sortagging efficiency using a combination of several rationally selected and modified acceptor proteins and a panel of incoming surrogate non-peptidic amine nucleophile substrates varying in the structural features of their amino linker parts and cargo molecules. Our data suggest that the sortagging efficiency is modulated by the combination of molecular features of the incoming nucleophilic substrate, including the ionization properties of the reactive amino group, structural recognition of the nucleophilic amino linker by the enzyme, as well as the molecular nature of the attached payload moiety. Previous reports have confirmed that the steric accessibility of the C-terminal SrtA recognition site in the acceptor protein is also the critical determinant of sortase reaction efficiency. We suggest a computational procedure for simplifying a priori predictions of sortagging outcomes through the structural assessment of the acceptor protein and introduction of a peptide linker, if deemed necessary.
Affiliation(s)
- Tetiana Bondarchuk
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
  - Department of Structural and Functional Proteomics, Institute of Molecular Biology and Genetics 150 Zabolotnogo Street Kyiv 03680 Ukraine
- Elena Zhuravel
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
- Oleh Shyshlyk
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
  - V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry 1 Academician Kukhar Street Kyiv 02094 Ukraine
- Mykhaylo O Debelyy
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
- Oleksandr Pokholenko
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
  - Taras Shevchenko National University of Kyiv, Department of Chemistry 64 Volodymyrska Street Kyiv 01033 Ukraine
- Diana Vaskiv
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
- Alla Pogribna
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
  - Department of Cell Signal Systems, Institute of Molecular Biology and Genetics 150 Zabolotnogo Street Kyiv 03680 Ukraine
- Mariana Kuznietsova
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
- Yevhenii Hrynyshyn
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
- Oleksandr Nedialko
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
  - V. N. Karazin Kharkiv National University, 4 Svobody Square Kharkiv 61022 Ukraine
- Volodymyr Brovarets
  - V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry 1 Academician Kukhar Street Kyiv 02094 Ukraine
- Sergey A Zozulya
  - Enamine Ltd 78 Winston Churchill Street Kyiv 02094 Ukraine +380 67 656-4026 https://www.enamine.net
6
Shen M, Kortzak D, Ambrozak S, Bhatnagar S, Buchanan I, Liu R, Shen J. KaMLs for Predicting Protein pKa Values and Ionization States: Are Trees All You Need? bioRxiv 2025:2024.11.09.622800. [PMID: 39605739 PMCID: PMC11601431 DOI: 10.1101/2024.11.09.622800] [Indexed: 11/29/2024]
Abstract
Despite its importance in understanding biology and computer-aided drug discovery, the accurate prediction of protein ionization states remains a formidable challenge. Physics-based approaches struggle to capture the small, competing contributions in the complex protein environment, while machine learning (ML) is hampered by scarcity of experimental data. Here we report the development of pKa ML (KaML) models based on decision trees and graph attention networks (GAT), exploiting physicochemical understanding and a new experiment pKa database (PKAD-3) enriched with highly shifted pKa's. KaML-CBtree significantly outperforms the current state of the art in predicting pKa values and ionization states across all six titratable amino acids, notably achieving accurate predictions for deprotonated cysteines and lysines - a blind spot in previous models. The superior performance of KaMLs is achieved in part through several innovations, including separate treatment of acid and base, data augmentation using AlphaFold structures, and model pretraining on a theoretical pKa database. We also introduce the classification of protonation states as a metric for evaluating pKa prediction models. A meta-feature analysis suggests a possible reason for the lightweight tree model to outperform the more complex deep learning GAT. We release an end-to-end pKa predictor based on KaML-CBtree and the new PKAD-3 database, which facilitates a variety of applications and provides the foundation for further advances in protein electrostatics research.
Affiliation(s)
- Mingzhe Shen
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, U.S.A
- Daniel Kortzak
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, U.S.A
- Simon Ambrozak
  - Department of Computer Science, University of Maryland College Park, College Park, MD 20742, U.S.A
- Shubham Bhatnagar
  - Department of Computer Science, University of Maryland College Park, College Park, MD 20742, U.S.A
- Ian Buchanan
  - Stuyvesant High School, New York, NY 10282, U.S.A
- Ruibin Liu
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, U.S.A
- Jana Shen
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201, U.S.A
7
Wozniak S, Janson G, Feig M. Accurate Predictions of Molecular Properties of Proteins via Graph Neural Networks and Transfer Learning. bioRxiv 2024:2024.12.10.627714. [PMID: 39713395 PMCID: PMC11661272 DOI: 10.1101/2024.12.10.627714] [Indexed: 12/24/2024]
Abstract
Machine learning has emerged as a promising approach for predicting molecular properties of proteins, as it addresses limitations of experimental and traditional computational methods. Here, we introduce GSnet, a graph neural network (GNN) trained to predict physicochemical and geometric properties including solvation free energies, diffusion constants, and hydrodynamic radii, based on three-dimensional protein structures. By leveraging transfer learning, pre-trained GSnet embeddings were adapted to predict solvent-accessible surface area (SASA) and residue-specific pKa values, achieving high accuracy and generalizability. Notably, GSnet outperformed existing protein embeddings for SASA prediction, and a locally charge-aware variant, aLCnet, approached the accuracy of simulation-based and empirical methods for pKa prediction. Our GNN framework demonstrated robustness across diverse datasets, including intrinsically disordered peptides, and scalability for high-throughput applications. These results highlight the potential of GNN-based embeddings and transfer learning to advance protein structure analysis, providing a foundation for integrating predictive models into proteome-wide studies and structural biology pipelines.
Affiliation(s)
- Spencer Wozniak
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Giacomo Janson
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Michael Feig
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
8
Hong R, Alagbe BD, Mattei A, Sheikh AY, Tuckerman ME. Enhanced and Efficient Predictions of Dynamic Ionization through Constant-pH Adiabatic Free Energy Dynamics. J Chem Theory Comput 2024; 20:10010-10021. [PMID: 39513519 PMCID: PMC11603612 DOI: 10.1021/acs.jctc.4c00704] [Received: 05/31/2024] [Revised: 10/14/2024] [Accepted: 10/16/2024] [Indexed: 11/15/2024]
Abstract
Dynamic or structurally induced ionization is a critical aspect of many physical, chemical, and biological processes. Molecular dynamics (MD) based simulation approaches, specifically constant pH MD methods, have been developed to simulate ionization states of molecules or proteins under experimentally or physiologically relevant conditions. While such approaches are now widely utilized to predict ionization sites of macromolecules or to study physical or biological phenomena, they are often computationally expensive and require long simulation times to converge. In this article, we introduce an efficient technique for performing constant pH MD simulations within the framework of the adiabatic free energy dynamics (AFED) approach. We call the new approach pH-AFED. We show that pH-AFED provides highly accurate predictions of protein residue pKa values, with an MUE of 0.5 pKa units when coupled with driven adiabatic free energy dynamics (d-AFED), while reducing the required simulation times by more than an order of magnitude. In addition, pH-AFED can be easily integrated into most constant pH MD codes or implementations and flexibly adapted to work in conjunction with enhanced sampling algorithms that target collective variables. We demonstrate that our approaches, with both pH-AFED standalone as well as pH-AFED combined with collective variable based enhanced sampling, provide promising predictive accuracy, with an MUE of 0.6 and 0.5 pKa units respectively, on a diverse range of proteins and enzymes, ranging up to 186 residues and 21 titratable sites. Lastly, we demonstrate how this approach can be utilized to understand the in vivo performance of engineered antibodies for immunotherapy.
Affiliation(s)
- Richard S. Hong
  - AbbVie Inc., Molecular Profiling and Drug Delivery, Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
  - Department of Chemistry, New York University, New York City, New York 10003, United States
- Busayo D. Alagbe
  - AbbVie Inc., Molecular Profiling and Drug Delivery, Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
- Alessandra Mattei
  - AbbVie Inc., Molecular Profiling and Drug Delivery, Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
- Ahmad Y. Sheikh
  - AbbVie Inc., Molecular Profiling and Drug Delivery, Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
- Mark E. Tuckerman
  - Department of Chemistry, New York University, New York City, New York 10003, United States
  - Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, 3663 Zhongshan Road North, Shanghai 200062, China
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
9
Reis PBPS, Clevert DA, Machuqueiro M. PypKa server: online pKa predictions and biomolecular structure preparation with precomputed data from PDB and AlphaFold DB. Nucleic Acids Res 2024; 52:W294-W298. [PMID: 38619040 PMCID: PMC11223823 DOI: 10.1093/nar/gkae255] [Received: 02/01/2024] [Revised: 03/14/2024] [Accepted: 03/28/2024] [Indexed: 04/16/2024]
Abstract
When preparing biomolecular structures for molecular dynamics simulations, pKa calculations are required to provide at least a representative protonation state at a given pH value. Neglecting this step and adopting the reference protonation states of the amino acid residues in water often leads to wrong electrostatics and nonphysical simulations. Fortunately, several methods have been developed to prepare structures considering the protonation preference of residues in their specific environments (pKa values), and some are even available for online usage. In this work, we present the PypKa server, which allows users to run physics-based, as well as ML-accelerated methods suitable for larger systems, to obtain pKa values, isoelectric points, titration curves, and structures with representative pH-dependent protonation states compatible with commonly used force fields (AMBER, CHARMM, GROMOS). The user may upload a custom structure or submit an identifier code from PDB or UniProtKB. The results for over 200k structures taken from the Protein Data Bank and the AlphaFold DB have been precomputed, and their data can be retrieved without extra calculations. All this information can also be obtained from an application programming interface (API) facilitating its usage and integration into existing pipelines as well as other web services. The web server is available at pypka.org.
Affiliation(s)
- Pedro B P S Reis
  - BioISI – Instituto de Biossistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
  - Machine Learning Research, Bayer AG, Müllerstraße 178, 13353 Berlin, Germany
- Djork-Arné Clevert
  - Machine Learning Research, Bayer AG, Müllerstraße 178, 13353 Berlin, Germany
  - Machine Learning Research, Pfizer, Berlin, Germany
- Miguel Machuqueiro
  - BioISI – Instituto de Biossistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
10
Liu S, Yang Q, Zhang L, Luo S. Accurate Protein pKa Prediction with Physical Organic Chemistry Guided 3D Protein Representation. J Chem Inf Model 2024; 64:4410-4418. [PMID: 38780156 DOI: 10.1021/acs.jcim.4c00354] [Indexed: 05/25/2024]
Abstract
Protein pKa is a fundamental physicochemical parameter that dictates protein structure and function. However, accurately determining protein site-pKa values remains a substantial challenge, both experimentally and theoretically. In this study, we introduce a physical organic approach, leveraging a protein structural and physical-organic-parameter-based representation (P-SPOC), to develop a rapid and intuitive model for protein pKa prediction. Our P-SPOC model achieves state-of-the-art predictive accuracy, with a mean absolute error (MAE) of 0.33 pKa units. Furthermore, we have incorporated advanced protein structure prediction models, like AlphaFold2, to approximate structures for proteins lacking three-dimensional representations, which enhances the applicability of our model in the context of structure-undetermined protein research. To promote broader accessibility within the research community, an online prediction interface was also established at isyn.luoszgroup.com.
Affiliation(s)
- Siyuan Liu
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Qi Yang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Long Zhang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Sanzhong Luo
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
11
Cai Z, Peng H, Sun S, He J, Luo F, Huang Y. DeepKa Web Server: High-Throughput Protein pKa Prediction. J Chem Inf Model 2024; 64:2933-2940. [PMID: 38530291 DOI: 10.1021/acs.jcim.3c02013] [Indexed: 03/27/2024]
Abstract
DeepKa is a deep-learning-based protein pKa predictor proposed in our previous work. In this study, a web server was developed that enables online protein pKa prediction driven by DeepKa. The web server provides a user-friendly interface where submitting a job requires only a single step: entering a valid PDB code or uploading a PDB format file. Two case studies are included to explain how pKa's calculated by the web server can be used. Finally, by combining the web server with the post-processing described in the case studies, this work suggests a quick workflow for investigating pH-dependent relationships between protein structure and function. The web server of DeepKa is freely available at http://www.computbiophys.com/DeepKa/main.
Affiliation(s)
- Zhitao Cai
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Hao Peng
  - National Pilot School of Software, Yunnan University, Kunming 650504, China
- Shuo Sun
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Jiahao He
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Fangfang Luo
  - College of Computer Engineering, Jimei University, Xiamen 361021, China
- Yandong Huang
  - College of Computer Engineering, Jimei University, Xiamen 361021, China