1
Gao Y, Wang H, Zhou J, Yang Y. An easy-to-use three-dimensional protein-structure-prediction online platform "DPL3D" based on deep learning algorithms. Curr Res Struct Biol 2025; 9:100163. [PMID: 39867105 PMCID: PMC11761317 DOI: 10.1016/j.crstbi.2024.100163] [Received: 04/06/2024] [Revised: 11/20/2024] [Accepted: 12/30/2024] [Indexed: 01/28/2025]
Abstract
A change in the three-dimensional (3D) structure of a protein can affect its own function or its interactions with other proteins, which may lead to disease. Gene mutations, especially missense mutations, are the main cause of changes in protein structure. Because protein crystal structure data are lacking, the structures of about three-quarters of human mutant proteins cannot be predicted, or cannot be predicted accurately, and the pathogenicity of missense mutations can only be evaluated indirectly through evolutionary conservation. Recently, many computational methods have been developed that predict protein 3D structures with accuracy comparable to experiment. This progress allows clinicians to make further use of structural-biology information. We therefore developed a user-friendly platform, DPL3D (http://nsbio.tech:3000), which predicts and visualizes the 3D structures of mutant proteins. Protein crystal structures and other protein information were downloaded, together with prediction software including AlphaFold 2, RoseTTAFold, RoseTTAFold All-Atom, and trRosettaX-Single. We implemented a query module covering 210,180 molecular structures, including 52,248 human proteins. Two-dimensional (2D) and 3D visualizations of predicted protein structures can be generated with LiteMol, either automatically or manually and interactively. This platform will allow users to retrieve large-scale structural information easily and quickly for biological discovery.
Affiliation(s)
- Yunlong Gao
- NewInsyght Biotech (Guangdong) Co., Ltd. DongGuan 523000, China
- He Wang
- NewInsyght Biotech (Guangdong) Co., Ltd. DongGuan 523000, China
- Jiapeng Zhou
- College of Life Sciences, Hunan Normal University, Changsha, 410000, China
- Yan Yang
- The College of Health Humanities, Jinzhou Medical University, Jinzhou, 121001, China

2
Hameduh T, Miller AD, Heger Z, Haddad Y. The proteomic code: Novel amino acid residue pairing models "encode" protein folding and protein-protein interactions. Comput Biol Med 2025; 190:110033. [PMID: 40112562 DOI: 10.1016/j.compbiomed.2025.110033] [Received: 08/05/2024] [Revised: 03/11/2025] [Accepted: 03/13/2025] [Indexed: 03/22/2025]
Abstract
Recent advances in protein 3D structure prediction using deep learning have focused on the importance of amino acid residue-residue connections (i.e., pairwise atomic contacts) for accuracy at the expense of mechanistic interpretability. Therefore, we decided to perform a series of analyses based on an alternative framework of residue-residue connections making primary use of the TOP2018 dataset. This framework of residue-residue connections is derived from amino acid residue pairing models both historic and new, all based on genetic principles complemented by relevant biophysical principles. Of these pairing models, three new models (named the GU, Transmuted and Shift pairing models) exhibit the highest observed-over-expected ratios and highest correlations in statistical analyses with various intra- and inter-chain datasets, in comparison to the remaining models. In addition, these new pairing models are universally frequent across different connection ranges, secondary structure connections, and protein sizes. Accordingly, following further statistical and other analyses described herein, we have come to a major conclusion that all three pairing models together could represent the basis of a universal proteomic code (second genetic code) sufficient, in and of itself, to "encode" for both protein folding mechanisms and protein-protein interactions.
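The abstract ranks pairing models by their observed-over-expected (O/E) ratios. As a generic illustration only (not the authors' pipeline, and not their GU, Transmuted, or Shift definitions, which the abstract does not spell out), an O/E ratio for a set of residue-type pairs can be sketched as follows. The sketch compares how often contacting residues fall into that set with how often they would if residue types paired at random, given the observed amino acid composition. The function name, the toy contacts, and the one-pair "pairing model" below are hypothetical.

```python
from collections import Counter
from itertools import product

def observed_over_expected(contacts, pairing):
    """Observed-over-expected ratio for a set of residue-type pairs.

    contacts: (aa1, aa2) residue types of residues observed in contact.
    pairing:  set of frozensets, each an unordered residue-type pair in a
              hypothetical pairing model.
    Expected counts assume residue types pair at random, given the amino
    acid composition of the contacting residues themselves.
    """
    n = len(contacts)
    # Composition over both partners of every contact.
    comp = Counter(aa for pair in contacts for aa in pair)
    total = sum(comp.values())
    freq = {aa: count / total for aa, count in comp.items()}

    observed = sum(1 for a, b in contacts if frozenset((a, b)) in pairing)

    # Iterating over ordered pairs gives 2*p_a*p_b for a != b and p_a**2 for
    # a == b, i.e. the probability of each unordered pair under random pairing.
    p_expected = sum(freq[a] * freq[b]
                     for a, b in product(freq, repeat=2)
                     if frozenset((a, b)) in pairing)
    expected = n * p_expected
    return observed / expected if expected > 0 else float("nan")

# Toy data (hypothetical): four contacts and a one-pair "pairing model".
contacts = [("A", "L"), ("A", "L"), ("G", "G"), ("K", "E")]
print(observed_over_expected(contacts, {frozenset(("A", "L"))}))  # prints 4.0
```

A ratio above 1 means the pairs in the set occur in contacts more often than composition alone would predict. The study additionally stratifies by connection range, secondary structure, and protein size, which this sketch omits.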
Affiliation(s)
- Tareq Hameduh
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic
- Andrew D Miller
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic; Veterinary Research Institute, Hudcova 296/70, CZ-621 00, Brno, Czech Republic; KP Therapeutics (Europe) s.r.o., Purkyňova 649/127, CZ-612 00, Brno, Czech Republic
- Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic
- Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Zemědělská 1665/1, CZ-613 00, Brno, Czech Republic; MendelFOLD s.r.o., Zezulova 174/3, CZ-613 00, Brno, Czech Republic

3
Feldman J, Skolnick J. AF3Complex Yields Improved Structural Predictions of Protein Complexes. bioRxiv: The Preprint Server for Biology 2025:2025.02.27.640585. [PMID: 40093092 PMCID: PMC11908126 DOI: 10.1101/2025.02.27.640585] [Indexed: 03/19/2025]
Abstract
Motivation: Accurate structures of protein complexes are essential for understanding biological pathway function. A previous study showed how downstream modifications to AlphaFold 2 could yield AF2Complex, a model better suited for protein complexes. Here, we introduce AF3Complex, a model built on AlphaFold 3 that is equipped with the same improvements as AF2Complex, along with a novel method for excluding ligands. Results: Benchmarking AF3Complex and AlphaFold 3 on a large dataset of protein complexes showed that AF3Complex outperforms AlphaFold 3 to a significant degree. Moreover, evaluation of the structures generated by AF3Complex on datasets of protein-peptide complexes and antibody-antigen complexes established that AF3Complex can produce high-fidelity structures for these challenging complex types. Additionally, when deployed to predict the structures of the two antibody-antigen and seven protein-protein complexes used in the recent CASP16 competition, AF3Complex yielded structures that would have placed it among the top models in the competition. Availability: The AF3Complex code is freely available at https://github.com/Jfeldman34/AF3Complex.git. Contact: skolnick@gatech.edu.
Affiliation(s)
- Jonathan Feldman
- Center for the Study of Systems Biology/School of Biological Sciences, Georgia Institute of Technology, 950 Atlantic Drive, 30332, Georgia
- School of Computer Science, Georgia Institute of Technology, 266 Ferst Dr, Atlanta, 30332, Georgia
- Jeffrey Skolnick
- School of Computer Science, Georgia Institute of Technology, 266 Ferst Dr, Atlanta, 30332, Georgia

4
Kim G, Lee S, Levy Karin E, Kim H, Moriwaki Y, Ovchinnikov S, Steinegger M, Mirdita M. Easy and accurate protein structure prediction using ColabFold. Nat Protoc 2025; 20:620-642. [PMID: 39402428 DOI: 10.1038/s41596-024-01060-5] [Received: 11/21/2023] [Accepted: 08/07/2024] [Indexed: 03/12/2025]
Abstract
Since its public release in 2021, AlphaFold2 (AF2) has made it common practice to investigate biological questions using predicted protein structures of single monomers or full complexes. ColabFold-AF2 is an open-source Jupyter Notebook in Google Colaboratory, as well as a command-line tool, that makes AF2 easy to use while exposing its advanced options. ColabFold-AF2 shortens experimental turnaround times through its optimized use of AF2's models. In this protocol, we guide the reader through ColabFold best practices using three scenarios: (i) monomer prediction, (ii) complex prediction and (iii) conformation sampling. The first two scenarios cover classic static structure prediction and are demonstrated on the human glycosylphosphatidylinositol transamidase protein. The third scenario demonstrates an alternative use of the AF2 models by predicting two conformations of the human alanine serine transporter 2. Users without computational expertise can run the protocol via Google Colaboratory, and advanced users can run it in a command-line environment. Using Google Colaboratory, each procedure takes <2 h to run. The data and code for this protocol are available at https://protocol.colabfold.com.
Affiliation(s)
- Gyuri Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Sewon Lee
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Hyunbin Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Yoshitaka Moriwaki
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo, Japan
- Department of Computational Drug Discovery and Design, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Martin Steinegger
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul, South Korea

5
Park S, Myung S, Baek M. Advancing protein structure prediction beyond AlphaFold2. Curr Opin Struct Biol 2025; 90:102985. [PMID: 39862760 DOI: 10.1016/j.sbi.2025.102985] [Received: 10/13/2024] [Revised: 12/31/2024] [Accepted: 01/02/2025] [Indexed: 01/27/2025]
Abstract
Accurate prediction of protein structures is essential for understanding their biological functions. The release of AlphaFold2 in 2021 marked a significant breakthrough, delivering unprecedented accuracy. However, challenges remain, particularly for proteins with limited evolutionary data or complex molecular interactions. This review explores efforts to enhance AlphaFold2's performance through advanced sequence search techniques and alternative approaches, including protein language models and frameworks that integrate diverse biomolecular interactions. We propose that future progress will depend on developing models grounded in fundamental physicochemical principles, offering more accurate and comprehensive predictions across a wider spectrum of biological systems.
Affiliation(s)
- Sanggeun Park
- Department of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
- Sojung Myung
- Department of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea. https://twitter.com/sj_myung27
- Minkyung Baek
- Department of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea

6
Wang H, Sun M, Xie L, Liu D, Zhang G. Physical-aware model accuracy estimation for protein complex using deep learning method. Comput Struct Biotechnol J 2025; 27:478-487. [PMID: 39916698 PMCID: PMC11799971 DOI: 10.1016/j.csbj.2025.01.017] [Received: 10/30/2024] [Revised: 01/18/2025] [Accepted: 01/21/2025] [Indexed: 02/09/2025]
Abstract
With the breakthrough of AlphaFold2 on monomers, the research focus of structure prediction has shifted to protein complexes, driving the continued development of new methods for multimer structure prediction. It is therefore crucial to estimate quality scores for multimer models accurately, independently of the prediction method used. In this work, we propose a physical-aware deep learning method, DeepUMQA-PA, to evaluate the residue-wise quality of protein complex models. Given an input protein complex model, residue-based contact area and orientation features are first constructed using Voronoi tessellation, representing potential physical interactions and hydrophobic properties. Then, the relationship between local residues and the overall complex topology, as well as inter-residue evolutionary information, is characterized by geometry-based features, protein language model embeddings, and knowledge-based statistical potential features. Finally, these features are fed into a fused network architecture employing an equivariant graph neural network and a ResNet to estimate residue-wise model accuracy. Experimental results on the CASP15 test set demonstrate that our method outperforms the state-of-the-art method DeepUMQA3 by 3.69% and 3.49% in Pearson and Spearman correlation, respectively. Notably, our method achieved improvements of 16.8% and 15.5% in Pearson and Spearman correlation, respectively, for the evaluation of nanobody-antigen complexes. In addition, DeepUMQA-PA achieved better MAE scores than the AlphaFold-Multimer and AlphaFold3 self-assessment methods on 43% and 50% of the targets, respectively. All these results suggest that physical-aware information based on the area and orientation of atom-atom and atom-solvent contacts has the potential to capture sequence-structure-quality relationships of proteins, especially for flexible proteins. The DeepUMQA-PA server is freely available at http://zhanglab-bioinf.com/DeepUMQA-PA/.
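DeepUMQA-PA is judged by how closely its predicted per-residue quality scores track reference per-residue scores, reported as Pearson and Spearman correlation and mean absolute error (MAE). The minimal sketch below computes those three metrics in plain Python, assuming the reference per-residue scores (for example, a per-residue similarity score of the model against the experimental structure) have already been computed. It is not the authors' evaluation code, and the score values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation; inputs must be equal-length and non-constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def mae(x, y):
    """Mean absolute error between predicted and reference scores."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

# Hypothetical per-residue scores for one complex model (0-1 scale).
predicted = [0.92, 0.85, 0.40, 0.75, 0.20]
reference = [0.88, 0.80, 0.55, 0.70, 0.15]
print(pearson(predicted, reference), spearman(predicted, reference), mae(predicted, reference))
```

Spearman correlation rewards getting the ranking of residues right even when the absolute values are off, while MAE penalizes absolute deviations; reporting both, as the abstract does, covers both aspects of a quality estimator.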
Affiliation(s)
- Haodong Wang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Meng Sun
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Lei Xie
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

7
Zhang C, Wang Q, Li Y, Teng A, Hu G, Wuyun Q, Zheng W. The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction. Biomolecules 2024; 14:1531. [PMID: 39766238 PMCID: PMC11673352 DOI: 10.3390/biom14121531] [Received: 09/26/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 01/11/2025]
Abstract
Multiple sequence alignment (MSA) has evolved into a fundamental tool in the biological sciences, playing a pivotal role in predicting molecular structures and functions. With broad applications in protein and nucleic acid modeling, MSAs continue to underpin advancements across a range of disciplines. MSAs are not only foundational for traditional sequence comparison techniques but also increasingly important in the context of artificial intelligence (AI)-driven advancements. Recent breakthroughs in AI, particularly in protein and nucleic acid structure prediction, rely heavily on the accuracy and efficiency of MSAs to enhance remote homology detection and guide spatial restraints. This review traces the historical evolution of MSA, highlighting its significance in molecular structure and function prediction. We cover the methodologies used for protein monomers, protein complexes, and RNA, while also exploring emerging AI-based alternatives, such as protein language models, as complementary or replacement approaches to traditional MSAs in application tasks. By discussing the strengths, limitations, and applications of these methods, this review aims to provide researchers with valuable insights into MSA's evolving role, equipping them to make informed decisions in structural prediction research.
Affiliation(s)
- Chenyue Zhang
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Qinxin Wang
- Suzhou New & High-Tech Innovation Service Center, Suzhou 215011, China
- Yiyang Li
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Anqi Teng
- Bioscience and Biomedical Engineering Thrust, Systems Hub, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Wei Zheng
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA

8
German GJ, DeGiulio JV, Ramsey J, Kropinski AM, Misra R. The TolC and Lipopolysaccharide-Specific Escherichia coli Bacteriophage TLS-the Tlsvirus Archetype Virus. Phage (New Rochelle) 2024; 5:173-183. [PMID: 39372356 PMCID: PMC11447400 DOI: 10.1089/phage.2023.0041] [Indexed: 10/08/2024]
Abstract
Introduction: TLS is a virulent bacteriophage of Escherichia coli that uses TolC and lipopolysaccharide as its cell-surface receptors. Methods: The genome was reannotated using the latest online resources and compared with other T1-like phages. Results: The TLS genome consists of 49,902 base pairs and encodes 86 coding sequences that show considerable sequence similarity to the T1 phage genome. It also contains 18 intergenic 21-base repeats, each located upstream of a predicted start codon and oriented in the direction of transcription. The data revealed that DNA packaging occurs through the pac site-mediated headful mechanism. Conclusions: Based on sequence analysis of its genome, TLS belongs to the Drexlerviridae family and represents the type member of the Tlsvirus genus.
Affiliation(s)
- Gregory J. German
- St. Joseph’s Health Centre, Unity Health Toronto, Toronto, Canada
- Department of Laboratory Medicine & Pathobiology, University of Toronto, Toronto, Canada
- Jolene Ramsey
- Texas A&M University, Biology Department, College Station, TX USA
- Andrew M. Kropinski
- Department of Pathobiology, Ontario Veterinary College, University of Guelph, Guelph, Canada
- Rajeev Misra
- School of Life Sciences, Arizona State University, Tempe, Arizona, USA

9
Cheon H, Kim JH, Kim JS, Park JB. Valorization of single-carbon chemicals by using carboligases as key enzymes. Curr Opin Biotechnol 2024; 85:103047. [PMID: 38128199 DOI: 10.1016/j.copbio.2023.103047] [Received: 08/31/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 12/23/2023]
Abstract
Single-carbon (C1) biorefinery plays a key role in consuming global greenhouse gases and in a circular carbon economy. Accordingly, we have focused on the valorization of C1 compounds (e.g., methanol, formaldehyde, and formate) into multicarbon products, including bioplastic monomers, glycolate, and ethylene glycol. For instance, methanol derived from the oxidation of CH4 can be converted into glycolate, ethylene glycol, or erythrulose via formaldehyde and glycolaldehyde, using C1 and/or C2 carboligases as essential enzymes. Escherichia coli was engineered to convert formate, produced from CO via CO2 or directly from CO2, into glycolate. This review addresses recent progress in the design of biotransformation pathways, in enzyme discovery and engineering, and in whole-cell biocatalyst engineering for C1 biorefinery.
Affiliation(s)
- Huijin Cheon
- Department of Food Science and Biotechnology, Ewha Womans University, Seoul 03760, Republic of Korea
- Jun-Hong Kim
- Department of Chemistry, Chonnam National University, Gwangju 61186, Republic of Korea
- Jeong-Sun Kim
- Department of Chemistry, Chonnam National University, Gwangju 61186, Republic of Korea
- Jin-Byung Park
- Department of Food Science and Biotechnology, Ewha Womans University, Seoul 03760, Republic of Korea

10
Peng CX, Liang F, Xia YH, Zhao KL, Hou MH, Zhang GJ. Recent Advances and Challenges in Protein Structure Prediction. J Chem Inf Model 2024; 64:76-95. [PMID: 38109487 DOI: 10.1021/acs.jcim.3c01324] [Indexed: 12/20/2023]
Abstract
Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict the three-dimensional structures of numerous unknown proteins with accuracy comparable to that of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function, as well as for accelerating drug discovery and other applications in biology and medicine. Despite these remarkable achievements, challenges and limitations remain. In this Review, we discuss recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be made.
Affiliation(s)
- Chun-Xiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Fang Liang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yu-Hao Xia
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kai-Long Zhao
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ming-Hua Hou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

11
Lee JW, Won JH, Jeon S, Choo Y, Yeon Y, Oh JS, Kim M, Kim S, Joung I, Jang C, Lee SJ, Kim TH, Jin KH, Song G, Kim ES, Yoo J, Paek E, Noh YK, Joo K. DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function. Bioinformatics 2023; 39:btad712. [PMID: 37995286 PMCID: PMC10699847 DOI: 10.1093/bioinformatics/btad712] [Received: 06/14/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023]
Abstract
Motivation: Predicting protein structures with high accuracy is a critical challenge for the broad life-sciences community and for industry. Despite the progress made by deep neural networks such as AlphaFold2, further improvements are needed in the quality of structural details, such as side chains, as well as protein backbones. Results: Building on the successes of AlphaFold2, our modifications include changing the side-chain torsion angle and frame aligned point error losses, adding loss functions for side-chain confidence and secondary structure prediction, and replacing template feature generation with a new alignment method based on conditional random fields. We also performed re-optimization by conformational space annealing using a molecular mechanics energy function that integrates the potential energies obtained from distogram and side-chain prediction. In the CASP15 blind test for single protein and domain modeling (109 domains), DeepFold ranked fourth among 132 groups, with improvements in structural detail in terms of backbone, side-chain, and MolProbity scores. For protein backbone accuracy, DeepFold achieved a median GDT-TS score of 88.64, compared with 85.88 for AlphaFold2. For TBM-easy/hard targets, DeepFold ranked at the top based on Z-scores for GDT-TS. This shows its practical value to the structural biology community, which demands highly accurate structures. In addition, a thorough analysis of 55 domains from 39 targets with publicly available structures indicates that DeepFold shows superior side-chain accuracy and MolProbity scores among the top-performing groups. Availability and implementation: DeepFold tools are open-source software available at https://github.com/newtonjoo/deepfold.
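GDT-TS, the backbone metric quoted above, averages the percentage of Cα atoms lying within 1, 2, 4, and 8 Å of their reference positions. The sketch below assumes the model has already been superimposed on the reference structure and residue-aligned, and it reuses that single superposition for all four cutoffs. The standard LGA implementation instead searches for the superposition that maximizes the count at each cutoff, so this simplified version will tend to underestimate the score. The function names and coordinates are hypothetical.

```python
import math

def ca_distances(model_ca, reference_ca):
    """Per-residue distances (in Angstrom) between matched C-alpha coordinates.

    Assumes both lists are already superimposed and residue-aligned.
    """
    if len(model_ca) != len(reference_ca):
        raise ValueError("model and reference must have the same number of residues")
    return [math.dist(m, r) for m, r in zip(model_ca, reference_ca)]

def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS from one fixed superposition (0-100 scale)."""
    n = len(distances)
    # Fraction of residues within (<=) each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    # Average over the four cutoffs and express as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical, already-superimposed C-alpha coordinates for three residues.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 3.0, 0.0)]
ref = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 0.0)]
print(gdt_ts(ca_distances(model, ref)))
```

Because each cutoff contributes equally, GDT-TS credits a model both for residues placed very precisely (1 Å) and for residues that are only roughly in the right region (8 Å), which is why it is a common headline backbone score in CASP.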
Affiliation(s)
- Jae-Won Lee
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jong-Hyun Won
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Seonggwang Jeon
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Yujin Choo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
- Yubin Yeon
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jin-Seon Oh
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
- Minsoo Kim
- Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- SeonHwa Kim
- School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Cheongjae Jang
- Artificial Intelligence Institute, Hanyang University, Seoul 04763, Korea
- Sung Jong Lee
- Basic Science Research Institute, Changwon National University, Changwon 51140, Korea
- Tae Hyun Kim
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Kyong Hwan Jin
- School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Giltae Song
- School of Computer Science and Engineering, Pusan National University, Busan 46241, Korea
- Eun-Sol Kim
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Jejoong Yoo
- Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- Eunok Paek
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Yung-Kyun Noh
- Department of Computer Science, Hanyang University, Seoul 04763, Korea
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea

12
Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of methods of protein structure prediction (CASP)-Round XV. Proteins 2023; 91:1539-1549. [PMID: 37920879 PMCID: PMC10843301 DOI: 10.1002/prot.26617] [Received: 10/05/2023] [Accepted: 10/06/2023] [Indexed: 11/04/2023]
Abstract
Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementations and combinations with other methods. Second, the standard AlphaFold2 protocol with default parameters produces the highest-quality result for only about two-thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures.
Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning-derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also for the first time, CASP included the computation of protein-ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear that methods will continue to advance.
Affiliation(s)
- Torsten Schwede
- University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Maya Topf
- Centre for Structural Systems Biology, Leibniz-Institut für Experimentelle Virologie and Universitätsklinikum Hamburg-Eppendorf (UKE), Hamburg, Germany
- John Moult
- Institute for Bioscience and Biotechnology Research, Rockville, MD, USA, and Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA