1
Cui X, Xia Y, Hou M, Zhao X, Wang S, Zhang G. M-DeepAssembly: enhanced DeepAssembly based on multi-objective multi-domain protein conformation sampling. BMC Bioinformatics 2025; 26:120. [PMID: 40325375 PMCID: PMC12054043 DOI: 10.1186/s12859-025-06131-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 04/03/2025] [Indexed: 05/07/2025]
Abstract
BACKGROUND: Association and cooperation among structural domains play an important role in protein function and drug design. Through the collaborative efforts of the community, deep learning has brought remarkable advances in highly accurate single-domain protein structure prediction, but predicting multi-domain protein structures remains challenging when the evolutionary signal for a given domain pair is weak or the protein structure is large.
RESULTS: To alleviate these challenges, we propose M-DeepAssembly, a protocol for multi-domain protein structure prediction based on a multi-objective protein conformation sampling algorithm. First, inter-domain interactions and full-length sequence distance features are extracted through DeepAssembly and AlphaFold2, respectively. Second, subject to these features, we construct a multi-objective energy model and design a sampling algorithm that explores and exploits conformational space to generate ensembles. Finally, the output protein structure is selected from the ensembles using our in-house model quality assessment algorithm. On a test set of 164 multi-domain proteins, the average TM-score of M-DeepAssembly is 15.4% and 2.0% higher than that of AlphaFold2 and DeepAssembly, respectively. Notably, the ensembles contain models with higher accuracy, corresponding to improvements of 20.3% and 6.4% relative to the two baseline methods, although these models were not selected. Furthermore, compared with the AlphaFold2 predictions for CASP15 multi-domain targets, M-DeepAssembly shows some performance advantages.
CONCLUSIONS: M-DeepAssembly provides a distinctive multi-domain protein assembly algorithm that can, to some extent, alleviate the current challenges of weak evolutionary signals and large structures by forming diverse ensembles with a multi-objective protein conformation sampling algorithm. The proposed method contributes to exploring the functions of multi-domain proteins, in particular by providing new insights into targets with multiple conformational states.
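The abstract does not spell out how the multi-objective energy model is used to build ensembles. As a rough illustration only, one common way to handle several objectives at once is to keep the non-dominated (Pareto-optimal) candidates. The sketch below assumes two hypothetical energy terms per conformation, with lower values being better; the function names and numbers are illustrative and are not taken from M-DeepAssembly.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (lower energy is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of conformations that no other conformation dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (inter-domain energy, full-length energy) pairs, one per conformation.
conformations = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
front = pareto_front(conformations)
print(front)
```

Keeping the whole front rather than a single weighted-sum minimum is what lets an ensemble retain diverse candidates, which a separate quality assessment step can then rank.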
Affiliation(s)
- Xinyue Cui
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yuhao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Minghua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xuanfeng Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Suhui Wang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
2
Xia Y, Pu Y, Wang S, Zhuang J, Liu D, Hou M, Zhang G. DeepAssembly2: A Web Server for Protein Complex Structure Assembly Based on Domain-Domain Interactions. J Mol Biol 2025:169128. [PMID: 40188941 DOI: 10.1016/j.jmb.2025.169128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2024] [Revised: 03/14/2025] [Accepted: 04/01/2025] [Indexed: 04/18/2025]
Abstract
Proteins often perform biological functions by forming complexes, so accurately predicting the structure of protein complexes is crucial to understanding their functions and to facilitating drug discovery. Protein monomer structure prediction has made a breakthrough in recent years, but accurate prediction of complex structures remains a challenge. In this work, we present DeepAssembly2, a web server for automatically assembling protein complex structures based on domain-domain interactions. First, features are constructed from the input complex sequence and monomeric structures. These features are then used to predict inter-chain residue distances through a deep learning model. Finally, the complex structure is assembled under the guidance of the predicted inter-chain residue distances. Compared with the previous version, DeepAssembly2 is trained on a newly constructed inter-chain domain-domain interaction dataset, and several important features have been added, such as Interface Residue Propensity and Ultrafast Shape Recognition. In addition, we introduce inter-chain residue distances from the AlphaFold-Multimer model to further improve accuracy, and we integrate our recently developed model quality assessment method to select the output models. The performance of DeepAssembly2 is significantly improved compared with the previous version, and it is expected to provide new insights and an effective tool for applications such as drug development and vaccine design. The DeepAssembly2 web server is freely available at https://zhanglab-bioinf.com/DeepAssembly/.
Affiliation(s)
- Yuhao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yilin Pu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Suhui Wang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jianan Zhuang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Minghua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
3
Wang H, Sun M, Xie L, Liu D, Zhang G. Physical-aware model accuracy estimation for protein complex using deep learning method. Comput Struct Biotechnol J 2025; 27:478-487. [PMID: 39916698 PMCID: PMC11799971 DOI: 10.1016/j.csbj.2025.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 01/18/2025] [Accepted: 01/21/2025] [Indexed: 02/09/2025]
Abstract
With the breakthrough of AlphaFold2 on monomers, the research focus of structure prediction has shifted to protein complexes, driving the continued development of new methods for multimer structure prediction. It is therefore crucial to estimate quality scores for multimer models accurately, independent of the prediction method used. In this work, we propose a physical-aware deep learning method, DeepUMQA-PA, to evaluate the residue-wise quality of protein complex models. Given an input protein complex model, residue-based contact area and orientation features are first constructed using Voronoi tessellation, representing potential physical interactions and hydrophobic properties. Then, the relationship between local residues and the overall complex topology, as well as inter-residue evolutionary information, is characterized by geometry-based features, protein language model embeddings, and knowledge-based statistical potential features. Finally, these features are fed into a fused network architecture combining an equivariant graph neural network and a ResNet to estimate residue-wise model accuracy. Experimental results on the CASP15 test set show that our method outperforms the state-of-the-art method DeepUMQA3 by 3.69% and 3.49% on Pearson and Spearman correlation, respectively. Notably, our method achieved improvements of 16.8% and 15.5% in Pearson and Spearman correlation, respectively, for the evaluation of nanobody-antigen complexes. In addition, DeepUMQA-PA achieved better MAE scores than the AlphaFold-Multimer and AlphaFold3 self-assessment methods on 43% and 50% of the targets, respectively. These results suggest that physical-aware information based on the area and orientation of atom-atom and atom-solvent contacts has the potential to capture sequence-structure-quality relationships of proteins, especially for flexible proteins. The DeepUMQA-PA server is freely available at http://zhanglab-bioinf.com/DeepUMQA-PA/.
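The three metrics named in the abstract (Pearson, Spearman, and MAE) compare predicted residue-wise quality scores against reference scores. The following is a minimal standard-library sketch of how such metrics are typically computed; the per-residue values are made up for illustration, and this is not the evaluation code used in the paper.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def mae(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean absolute error between predictions and reference values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical per-residue scores for one model (illustrative numbers only).
predicted = [0.9, 0.8, 0.4, 0.6, 0.2]
reference = [0.85, 0.7, 0.5, 0.65, 0.1]
print(pearson(predicted, reference), spearman(predicted, reference), mae(predicted, reference))
```

Pearson rewards predictions that track the reference values linearly, Spearman only asks that residues be ranked in the same order, and MAE measures how far the absolute values are off, so a method can do well on one and poorly on another.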
Affiliation(s)
- Haodong Wang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Meng Sun
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Lei Xie
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
4
Liang F, Sun M, Xie L, Zhao X, Liu D, Zhao K, Zhang G. Recent advances and challenges in protein complex model accuracy estimation. Comput Struct Biotechnol J 2024; 23:1824-1832. [PMID: 38707538 PMCID: PMC11066466 DOI: 10.1016/j.csbj.2024.04.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/07/2024]
Abstract
Estimation of model accuracy (EMA) plays a crucial role in protein structure prediction, aiming to evaluate the quality of predicted protein structure models accurately and objectively. This process is not only key to screening candidate models that are close to the real structure, but also provides guidance for further optimization of protein structures. With the significant advances made by AlphaFold2 in monomer structure prediction, the problem of single-domain protein structure prediction has been largely solved. Correspondingly, the importance of assessing the quality of single-domain protein models has decreased, and the research focus has shifted to EMA for protein complexes. In this review, our goal is to provide a comprehensive overview of the reference and statistical metrics, the representative methods, and the current challenges in four distinct facets of complex EMA: Topology Global Score, Interface Total Score, Interface Residue-Wise Score, and Tertiary Residue-Wise Score.
Affiliation(s)
- Lei Xie
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xuanfeng Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Dong Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
5
Yin Y, Ren H, Wu H, Lu Z. Triclosan Dioxygenase: A Novel Two-component Rieske Nonheme Iron Ring-hydroxylating Dioxygenase Initiates Triclosan Degradation. Environ Sci Technol 2024; 58:13833-13844. [PMID: 39012163 DOI: 10.1021/acs.est.4c02845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
The emerging contaminant triclosan (TCS) is widely distributed in both surface water and wastewater and poses a threat to aquatic organisms and human health because it resists degradation. The dioxygenase enzyme TcsAB has been speculated to perform the initial degradation of TCS, but its precise catalytic mechanism remains unclear. In this study, the function of TcsAB was elucidated using multiple biochemical and molecular biology methods. Escherichia coli BL21(DE3) heterologously expressing tcsAB from Sphingomonas sp. RD1 converted TCS to 2,4-dichlorophenol. TcsAB belongs to the group IA family of two-component Rieske nonheme iron ring-hydroxylating dioxygenases. The highest amino acid identity between TcsA and the large subunits of other dioxygenases in the same family was only 35.50%, indicating that TcsAB is a novel dioxygenase. Mutagenesis of residues near the substrate binding pocket decreased the TCS-degrading activity and narrowed the substrate spectrum, except in the TcsA F343A mutant. A meta-analysis of 1492 samples from wastewater treatment systems worldwide revealed that tcsA genes are widely distributed. This study is the first to report that the TCS-specific dioxygenase TcsAB is responsible for the initial degradation of TCS. Studying the microbial degradation mechanism of TCS is crucial for removing this pollutant from the environment.
Affiliation(s)
- Yiran Yin
  - MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
  - Cancer Center, Zhejiang University, Hangzhou 310058, China
- Hao Ren
  - MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
  - Cancer Center, Zhejiang University, Hangzhou 310058, China
- Hao Wu
  - MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
  - Cancer Center, Zhejiang University, Hangzhou 310058, China
- Zhenmei Lu
  - MOE Laboratory of Biosystem Homeostasis and Protection, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
  - Cancer Center, Zhejiang University, Hangzhou 310058, China
6
Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024]
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
- Nolan Park
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
7
Zhang Z, Cai Y, Zhang B, Zheng W, Freddolino L, Zhang G, Zhou X. DEMO-EM2: assembling protein complex structures from cryo-EM maps through intertwined chain and domain fitting. Brief Bioinform 2024; 25:bbae113. [PMID: 38517699 PMCID: PMC10959074 DOI: 10.1093/bib/bbae113] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/13/2023] [Revised: 02/10/2024] [Accepted: 02/25/2024] [Indexed: 03/24/2024]
Abstract
The breakthrough in cryo-electron microscopy (cryo-EM) technology has led to an increasing number of density maps of biological macromolecules. However, constructing accurate protein complex atomic structures from cryo-EM maps remains a challenge. In this study, we extend our previously developed DEMO-EM to present DEMO-EM2, an automated method for constructing protein complex models from cryo-EM maps through an iterative assembly procedure intertwining chain- and domain-level matching and fitting for predicted chain models. The method was carefully evaluated on 27 cryo-electron tomography (cryo-ET) maps and 16 single-particle EM maps, where DEMO-EM2 models achieved an average TM-score of 0.92, outperforming those of state-of-the-art methods. The results demonstrate an efficient method that enables the rapid and reliable solution of challenging cryo-EM structure modeling problems.
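For readers unfamiliar with the TM-score reported above, the sketch below evaluates the standard TM-score sum for one fixed superposition, given the distances between aligned residue pairs. The full metric takes the maximum of this sum over superpositions, which is not implemented here, and the fixed scale of 0.5 used for very short chains is an assumption of this sketch. The function names are illustrative and not taken from DEMO-EM2.

```python
from typing import Sequence


def tm_d0(l_target: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.

    For short chains the formula is not meaningful, so this sketch falls
    back to a fixed 0.5 (an assumption made here for robustness).
    """
    if l_target <= 21:
        return 0.5
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances: Sequence[float], l_target: int) -> float:
    """TM-score sum for a fixed superposition.

    distances: distances between aligned residue pairs (same units as d0).
    l_target: length of the reference (target) structure used for normalization.
    """
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


# A perfectly superposed 100-residue model scores 1.0; residues that sit
# exactly d0 away from their reference positions each contribute 0.5.
print(tm_score([0.0] * 100, 100))
print(tm_score([tm_d0(100)] * 100, 100))
```

Because each residue's contribution is bounded by 1 and normalized by the target length, the score lies in (0, 1] and is less sensitive to a few badly placed residues than RMSD, which is one reason it is widely used for reporting average model accuracy.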
Affiliation(s)
- Ziying Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yaxian Cai
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Biao Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Lydia Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China