1
|
Hameduh T, Miller AD, Heger Z, Haddad Y. The proteomic code: Novel amino acid residue pairing models "encode" protein folding and protein-protein interactions. Comput Biol Med 2025; 190:110033. [PMID: 40112562 DOI: 10.1016/j.compbiomed.2025.110033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2024] [Revised: 03/11/2025] [Accepted: 03/13/2025] [Indexed: 03/22/2025]
Abstract
Recent advances in protein 3D structure prediction using deep learning have focused on the importance of amino acid residue-residue connections (i.e., pairwise atomic contacts) for accuracy at the expense of mechanistic interpretability. Therefore, we decided to perform a series of analyses based on an alternative framework of residue-residue connections making primary use of the TOP2018 dataset. This framework of residue-residue connections is derived from amino acid residue pairing models both historic and new, all based on genetic principles complemented by relevant biophysical principles. Of these pairing models, three new models (named the GU, Transmuted and Shift pairing models) exhibit the highest observed-over-expected ratios and highest correlations in statistical analyses with various intra- and inter-chain datasets, in comparison to the remaining models. In addition, these new pairing models are universally frequent across different connection ranges, secondary structure connections, and protein sizes. Accordingly, following further statistical and other analyses described herein, we have come to a major conclusion that all three pairing models together could represent the basis of a universal proteomic code (second genetic code) sufficient, in and of itself, to "encode" for both protein folding mechanisms and protein-protein interactions.
Collapse
Affiliation(s)
- Tareq Hameduh
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic
| | - Andrew D Miller
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic; Veterinary Research Institute, Hudcova 296/70, CZ-621 00, Brno, Czech Republic; KP Therapeutics (Europe) s.r.o., Purkyňova 649/127, CZ-612 00, Brno, Czech Republic
| | - Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic
| | - Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic.
| |
Collapse
|
2
|
Park S, Myung S, Baek M. Advancing protein structure prediction beyond AlphaFold2. Curr Opin Struct Biol 2025; 90:102985. [PMID: 39862760 DOI: 10.1016/j.sbi.2025.102985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2024] [Revised: 12/31/2024] [Accepted: 01/02/2025] [Indexed: 01/27/2025]
Abstract
Accurate prediction of protein structures is essential for understanding their biological functions. The release of AlphaFold2 in 2021 marked a significant breakthrough, delivering unprecedented accuracy. However, challenges remain, particularly for proteins with limited evolutionary data or complex molecular interactions. This review explores efforts to enhance AlphaFold2's performance through advanced sequence search techniques and alternative approaches, including protein language models and frameworks that integrate diverse biomolecular interactions. We propose that future progress will depend on developing models grounded in fundamental physicochemical principles, offering more accurate and comprehensive predictions across a wider spectrum of biological systems.
Collapse
Affiliation(s)
- Sanggeun Park
- Department of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
| | - Sojung Myung
- Department of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea. https://twitter.com/sj_myung27
| | - Minkyung Baek
- Department of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea.
| |
Collapse
|
3
|
Raouraoua N, Lensink MF, Brysbaert G. Massive Sampling Strategy for Antibody-Antigen Targets in CAPRI Round 55 With MassiveFold. Proteins 2025. [PMID: 39868877 DOI: 10.1002/prot.26802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2024] [Revised: 12/24/2024] [Accepted: 01/12/2025] [Indexed: 01/28/2025]
Abstract
Massive sampling with AlphaFold2 improves protein-protein complex predictions. This has been shown during the last CASP15-CAPRI blind prediction round by the AFsample tool. However, more difficult targets including antibody-antigen binding remain challenging. CAPRI Round 55 consisted of three antibody-antigen targets and one heterotrimer. We used our AlphaFold2-based MassiveFold, running 6 prediction pools, each with their own set of parameters, to produce in total more than 6000 predictions per target. We show here that massive sampling categorically produces acceptable to high quality predictions, however it is clear that the AlphaFold2 confidence score cannot be used to identify the best models in the set. We also show that, contrary to what was done before for CASP15-CAPRI with AFsample, increasing the sampling without activating the dropout provides the best models for most of the targets of Round 55.
Collapse
Affiliation(s)
- Nessim Raouraoua
- Univ. Lille, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Marc F Lensink
- Univ. Lille, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| | - Guillaume Brysbaert
- Univ. Lille, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
| |
Collapse
|
4
|
Lee JW, Won JH, Jeon S, Choo Y, Yeon Y, Oh JS, Kim M, Kim S, Joung I, Jang C, Lee SJ, Kim TH, Jin KH, Song G, Kim ES, Yoo J, Paek E, Noh YK, Joo K. DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function. Bioinformatics 2023; 39:btad712. [PMID: 37995286 PMCID: PMC10699847 DOI: 10.1093/bioinformatics/btad712] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023] Open
Abstract
MOTIVATION Predicting protein structures with high accuracy is a critical challenge for the broad community of life sciences and industry. Despite progress made by deep neural networks like AlphaFold2, there is a need for further improvements in the quality of detailed structures, such as side-chains, along with protein backbone structures. RESULTS Building upon the successes of AlphaFold2, the modifications we made include changing the losses of side-chain torsion angles and frame aligned point error, adding loss functions for side chain confidence and secondary structure prediction, and replacing template feature generation with a new alignment method based on conditional random fields. We also performed re-optimization by conformational space annealing using a molecular mechanics energy function which integrates the potential energies obtained from distogram and side-chain prediction. In the CASP15 blind test for single protein and domain modeling (109 domains), DeepFold ranked fourth among 132 groups with improvements in the details of the structure in terms of backbone, side-chain, and Molprobity. In terms of protein backbone accuracy, DeepFold achieved a median GDT-TS score of 88.64 compared with 85.88 of AlphaFold2. For TBM-easy/hard targets, DeepFold ranked at the top based on Z-scores for GDT-TS. This shows its practical value to the structural biology community, which demands highly accurate structures. In addition, a thorough analysis of 55 domains from 39 targets with publicly available structures indicates that DeepFold shows superior side-chain accuracy and Molprobity scores among the top-performing groups. AVAILABILITY AND IMPLEMENTATION DeepFold tools are open-source software available at https://github.com/newtonjoo/deepfold.
Collapse
Affiliation(s)
- Jae-Won Lee
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
| | - Jong-Hyun Won
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
| | - Seonggwang Jeon
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
| | - Yujin Choo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Department of Artificial intelligence, Hanyang University, Seoul 04763, Korea
| | - Yubin Yeon
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
| | - Jin-Seon Oh
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Department of Artificial intelligence, Hanyang University, Seoul 04763, Korea
| | - Minsoo Kim
- Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
| | - SeonHwa Kim
- School of Electrical Engineering, Korea University, Seoul 02841, Korea
| | | | - Cheongjae Jang
- Artificial Intelligence Institute, Hanyang University, Seoul 04763, Korea
| | - Sung Jong Lee
- Basic Science Research Institute, Changwon National University, Changwon 51140, Korea
| | - Tae Hyun Kim
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
| | - Kyong Hwan Jin
- School of Electrical Engineering, Korea University, Seoul 02841, Korea
| | - Giltae Song
- School of Computer Science and Engineering, Pusan National University, Busan 46241, Korea
| | - Eun-Sol Kim
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
| | - Jejoong Yoo
- Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
| | - Eunok Paek
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
| | - Yung-Kyun Noh
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
| | - Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
| |
Collapse
|
5
|
Huang GJ, Parry TK, McLaughlin WA. Assessment of the Performances of the Protein Modeling Techniques Participating in CASP15 Using a Structure-Based Functional Site Prediction Approach: ResiRole. Bioengineering (Basel) 2023; 10:1377. [PMID: 38135968 PMCID: PMC10740689 DOI: 10.3390/bioengineering10121377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2023] [Revised: 11/27/2023] [Accepted: 11/28/2023] [Indexed: 12/24/2023] Open
Abstract
BACKGROUND Model quality assessments via computational methods which entail comparisons of the modeled structures to the experimentally determined structures are essential in the field of protein structure prediction. The assessments provide means to benchmark the accuracies of the modeling techniques and to aid with their development. We previously described the ResiRole method to gauge model quality principally based on the preservation of the structural characteristics described in SeqFEATURE functional site prediction models. METHODS We apply ResiRole to benchmark modeling group performances in the Critical Assessment of Structure Prediction experiment, round 15. To gauge model quality, a normalized Predicted Functional site Similarity Score (PFSS) was calculated as the average of one minus the absolute values of the differences of the functional site prediction probabilities, as found for the experimental structures versus those found at the corresponding sites in the structure models. RESULTS The average PFSS per modeling group (gPFSS) correlates with standard quality metrics, and can effectively be used to rank the accuracies of the groups. For the free modeling (FM) category, correlation coefficients of the Local Distance Difference Test (LDDT) and Global Distance Test-Total Score (GDT-TS) metrics with gPFSS were 0.98239 and 0.87691, respectively. An example finding for a specific group is that the gPFSS for EMBER3D was higher than expected based on the predictive relationship between gPFSS and LDDT. We infer the result is due to the use of constraints imprinted by function that are a part of the EMBER3D methodology. Also, we find functional site predictions that may guide further functional characterizations of the respective proteins. CONCLUSION The gPFSS metric provides an effective means to assess and rank the performances of the structure prediction techniques according to their abilities to accurately recount the structural features at predicted functional sites.
Collapse
Affiliation(s)
| | | | - William A. McLaughlin
- Department of Medical Education, Geisinger Commonwealth School of Medicine, 525 Pine Street, Scranton, PA 18509, USA (T.K.P.)
| |
Collapse
|