1
Cai Y, Zhang Z, Xu X, Xu L, Chen Y, Zhang G, Zhou X. Fitting Atomic Structures into Cryo-EM Maps by Coupling Deep Learning-Enhanced Map Processing with Global-Local Optimization. J Chem Inf Model 2025; 65:3800-3811. [PMID: 40152222 DOI: 10.1021/acs.jcim.5c00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2025]
Abstract
With the breakthroughs in protein structure prediction technology, constructing atomic structures from cryo-electron microscopy (cryo-EM) density maps through structural fitting has become increasingly critical. However, the accuracy of the constructed models heavily relies on the precision of the structure-to-map fitting. In this study, we introduce DEMO-EMfit, a progressive method that integrates deep learning-based backbone map extraction with a global-local structural pose search to fit atomic structures into density maps. DEMO-EMfit was extensively evaluated on a benchmark data set comprising both cryo-electron tomography (cryo-ET) and cryo-EM maps of protein and nucleic acid complexes. The results demonstrate that DEMO-EMfit outperforms state-of-the-art approaches, offering an efficient and accurate tool for fitting atomic structures into density maps.
Affiliation(s)
- Yaxian Cai: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ziying Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiangyu Xu: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Liang Xu: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yu Chen: School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Guijun Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
2
Li C, Sun G, Deng L, Qiao L, Yang G. A population state evaluation-based improvement framework for differential evolution. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
3
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 210] [Impact Index Per Article: 70.0] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
4
Ye P, Tian B, Lv Y, Li Q, Wang FY. On Iterative Proportional Updating: Limitations and Improvements for General Population Synthesis. IEEE Transactions on Cybernetics 2022; 52:1726-1735. [PMID: 32479409 DOI: 10.1109/tcyb.2020.2991427] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Population synthesis is the foundation of agent-based social simulation. Current approaches mostly consider the basic population and households rather than other social organizations. This article starts with a theoretical analysis of the iterative proportional updating (IPU) algorithm, a representative method in this field, and then extends it to consider more social organization types. For the first time, the IPU method is proved unable to converge to an optimal population distribution that simultaneously satisfies the constraints at the individual and household levels. It is then reformulated as a bilevel optimization, which can solve this problem and include more than one type of social organization. Numerical simulations, as well as population synthesis using actual Chinese nationwide census data, support our theoretical conclusions and indicate that the proposed bilevel optimization can both synthesize more social organization types and produce more accurate results.
5
Peng CX, Zhou XG, Zhang GJ. De novo Protein Structure Prediction by Coupling Contact With Distance Profile. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:395-406. [PMID: 32750861 DOI: 10.1109/tcbb.2020.3000758] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/11/2023]
Abstract
De novo protein structure prediction is a challenging problem that requires both an accurate energy function and an efficient conformation sampling method. In this study, a de novo structure prediction method, named CoDiFold, is proposed. In CoDiFold, contacts and distance profiles are combined into the Rosetta low-resolution energy function to improve the accuracy of the energy function. As a result, the correlation between energy and root mean square deviation (RMSD) is improved. In addition, a population-based multi-mutation strategy is designed to balance the exploration and exploitation of conformation space sampling. On a test set of 43 proteins, the average RMSD of the models generated by the proposed protocol is 49.24 and 45.21 percent lower than those of the Rosetta and QUARK de novo protocols, respectively. The results also demonstrate that the structures predicted by the proposed CoDiFold are comparable to those of state-of-the-art methods for the 10 FM targets of CASP13. The source code and executable versions are freely available at http://github.com/iobio-zjut/CoDiFold.
6
Hou M, Peng C, Zhou X, Zhang B, Zhang G. Multi contact-based folding method for de novo protein structure prediction. Brief Bioinform 2021; 23:6445108. [PMID: 34849573 DOI: 10.1093/bib/bbab463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2021] [Revised: 09/21/2021] [Accepted: 10/10/2021] [Indexed: 11/12/2022] Open
Abstract
Meta contact, which combines different contact maps into one to improve contact prediction accuracy and reduce the noise from a single contact map, is a widely used method. However, protein structure prediction using meta contact cannot fully exploit the information carried by the original contact maps. In this work, a multi contact-based folding method under the evolutionary algorithm framework, MultiCFold, is proposed. In MultiCFold, the full information of different contact maps is used directly by populations to guide protein structure folding. In addition, noncontact is considered an effective supplement to contact information and can further assist protein folding. MultiCFold is tested on a set of 120 nonredundant proteins, and the average TM-score and average RMSD reach 0.617 and 5.815 Å, respectively. Compared with the meta contact-based method, MetaCFold, the average TM-score and average RMSD improve by 6.62% and 8.82%, respectively. In particular, the introduction of noncontact information increases the average TM-score by 6.30%. Furthermore, MultiCFold is compared with four state-of-the-art methods of CASP13 on the 24 FM targets, and the results show that MultiCFold is significantly better than the other methods after the full-atom relax procedure.
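As background for the two metrics reported in this abstract, here is a minimal Python sketch of how RMSD and TM-score are commonly computed for residue pairs that are already aligned and superimposed. It is not taken from MultiCFold. The function names are hypothetical, the distance scale `d0` follows the standard TM-score definition, and the 0.5 Å floor for short chains is an assumption of this sketch. The full TM-score also maximizes over superpositions, which is omitted here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def tm_score_from_distances(distances, target_length):
    """TM-score for a fixed superposition, given per-residue distances in angstroms.

    distances: distances between aligned residue pairs.
    target_length: number of residues in the reference (native) structure.
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.0
    d0 = max(d0, 0.5)  # assumption: floor d0 at 0.5 for short chains
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A TM-score of at least 0.5 is the threshold several abstracts in this list use for a correctly folded model.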
Affiliation(s)
- Minghua Hou: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Chunxiang Peng: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Biao Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
7
Tan Z, Li K. Differential evolution with mixed mutation strategy based on deep reinforcement learning. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107678] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 10/20/2022]
8
9
TPDE: A tri-population differential evolution based on zonal-constraint stepped division mechanism and multiple adaptive guided mutation strategies. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.06.035] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
10
Abstract
In the field of Differential Evolution (DE), a number of measures have been used to enhance the algorithm. However, most of these measures need revision before they fit an ensemble of different combinations of DE operators, that is, an ensemble DE algorithm. Meanwhile, although an ensemble DE algorithm may perform better than each of its constituent algorithms, its performance can still be improved further with the help of revised measures. In this paper, we implement such measures in the Ensemble of Differential Evolution Variants (EDEV). First, we extend the collecting range of the optional external archive of JADE, one of the constituent algorithms in EDEV. Then, we revise and implement the Event-Triggered Impulsive (ETI) control. Finally, we apply Linear Population Size Reduction (LPSR). The result is the Improved Ensemble of Differential Evolution Variants (IEDEV). Our experiments involve good performers in the CEC competitions on real parameter single objective optimization among population-based metaheuristics, state-of-the-art DE algorithms, and up-to-date DE algorithms. The experiments show that IEDEV is very competitive.
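Linear Population Size Reduction is usually described, in the L-SHADE literature, as a linear schedule that shrinks the population from an initial size to a minimum size as function evaluations are consumed. The following is a minimal sketch assuming that formulation. The function and parameter names are hypothetical, and the abstract does not say how IEDEV configures the schedule.

```python
def lpsr_population_size(nfe, max_nfe, n_init, n_min):
    """Target population size after `nfe` of `max_nfe` function evaluations.

    The size decreases linearly from n_init (at nfe = 0) to n_min (at nfe = max_nfe).
    """
    if max_nfe <= 0:
        raise ValueError("max_nfe must be positive")
    # Clamp progress to [0, 1] so out-of-range counters cannot overshoot the schedule.
    progress = min(max(nfe / max_nfe, 0.0), 1.0)
    return round(n_init + (n_min - n_init) * progress)
```

In a typical DE loop, the worst-ranked individuals would be removed whenever the current population exceeds the size returned here.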
11
Wang L, Liu J, Xia Y, Xu J, Zhou X, Zhang G. Distance-guided protein folding based on generalized descent direction. Brief Bioinform 2021; 22:6341661. [PMID: 34355233 DOI: 10.1093/bib/bbab296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/17/2021] [Revised: 06/30/2021] [Accepted: 07/12/2021] [Indexed: 12/25/2022] Open
Abstract
Advances in the prediction of inter-residue distances for a protein sequence have increased the accuracy with which the correct folds of proteins can be predicted from distance information. Here, we propose a distance-guided protein folding algorithm based on generalized descent direction, named GDDfold, which achieves effective structural perturbation and potential minimization in two stages. In the global stage, a random-based direction is designed using evolutionary knowledge, which guides the conformation population to cross potential barriers and explore the conformational space rapidly over a large range. In the local stage, the locally rugged potential landscape is explored with the aid of a conjugate-based direction integrated into a specific search strategy, which improves the exploitation ability. GDDfold is tested on 347 proteins of a benchmark set, 24 template-free modeling (FM) targets of CASP13 and 20 FM targets of CASP14. Results show that GDDfold correctly folds [template modeling (TM) score ≥ 0.5] 316 out of 347 proteins, of which 65 have TM scores greater than 0.8, and significantly outperforms Rosetta-dist (distance-assisted fragment assembly method) and L-BFGSfold (distance geometry optimization method). On CASP FM targets, GDDfold is comparable with five state-of-the-art full-version methods, namely, Quark, RaptorX, Rosetta, MULTICOM and trRosetta, in the CASP13 and CASP14 server groups.
Affiliation(s)
- Liujing Wang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yuhao Xia: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jiakang Xu: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiaogen Zhou: Department of Computational Medicine and Bioinformatics, University of Michigan, Michigan, USA
- Guijun Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
12
Zhang SX, Chan WS, Tang KS, Zheng SY. Adaptive strategy in differential evolution via explicit exploitation and exploration controls. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107494] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
13
Xia YH, Peng CX, Zhou XG, Zhang GJ. A Sequential Niche Multimodal Conformational Sampling Algorithm for Protein Structure Prediction. Bioinformatics 2021; 37:4357-4365. [PMID: 34245242 DOI: 10.1093/bioinformatics/btab500] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/22/2020] [Revised: 06/23/2021] [Accepted: 07/05/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Massive local minima on the protein energy landscape often cause traditional conformational sampling algorithms to be easily trapped in local basin regions, because they find it difficult to overcome high-energy barriers. Also, the lowest energy conformation may not correspond to the native structure due to the inaccuracy of energy models. This study investigates whether these two problems can be alleviated by a sequential niche technique without loss of accuracy. RESULTS: A sequential niche multimodal conformational sampling algorithm for protein structure prediction (SNfold) is proposed in this study. In SNfold, a derating function is designed based on the knowledge learned from the previous sampling and used to construct a series of sampling-guided energy functions. These functions then help the sampling algorithm overcome high-energy barriers and avoid the re-sampling of the explored regions. In inaccurate protein energy models, the high-energy conformation that may correspond to the native structure can be sampled with successively updated sampling-guided energy functions. The proposed SNfold is tested on 300 benchmark proteins, 24 CASP13 and 19 CASP14 FM targets. Results show that SNfold correctly folds (TM-score ≥ 0.5) 231 out of 300 proteins. In particular, compared with Rosetta restrained by distance (Rosetta-dist), SNfold achieves higher average TM-score and improves the sampling efficiency by more than 100 times. On several CASP FM targets, SNfold also shows good performance compared with four state-of-the-art servers in CASP. As a plug-in conformational sampling algorithm, SNfold can be extended to other protein structure prediction methods. AVAILABILITY: The source code and executable versions are freely available at https://github.com/iobio-zjut/SNfold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yu-Hao Xia: College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Chun-Xiang Peng: College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Xiao-Gen Zhou: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109-2218, USA
- Gui-Jun Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
14
Zhao KL, Liu J, Zhou XG, Su JZ, Zhang Y, Zhang GJ. MMpred: a distance-assisted multimodal conformation sampling for de novo protein structure prediction. Bioinformatics 2021; 37:4350-4356. [PMID: 34185079 DOI: 10.1093/bioinformatics/btab484] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/01/2021] [Revised: 06/22/2021] [Accepted: 06/28/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: The mathematically optimal solution in computational protein folding simulations does not always correspond to the native structure, due to the imperfection of the energy force fields. There is therefore a need to search for more diverse suboptimal solutions in order to identify the states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo protein structure folding simulations. RESULTS: A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo protein structure prediction. The protocol consists of three stages. In the first modal exploration stage, a structural similarity evaluation model DMscore is designed to control the diversity of conformations, generating a population of diverse structures in different low-energy basins. In the second modal maintaining stage, an adaptive clustering algorithm MNDcluster is proposed to divide the populations and merge the modal by adjusting the annealing temperature to locate the promising basins. In the last stage of modal exploitation, a greedy search strategy is used to accelerate the convergence of the modal. Distance constraint information is used to construct the conformation scoring model to guide sampling. MMpred is tested on 320 non-redundant proteins, where MMpred obtains models with TM-score ≥ 0.5 on 268 cases, which is 20.3% higher than that of Rosetta guided with the same distance constraints. In addition, on 320 benchmark proteins, the average TM-score of the enhanced version of MMpred (E-MMpred) is 0.732 on the best model, which is comparable to trRosetta (0.730). AVAILABILITY: The source code and executable are freely available at https://github.com/iobio-zjut/MMpred. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kai-Long Zhao: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou: Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, MI 48109-2218, USA
- Jian-Zhong Su: School of Biomedical Engineering, School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325011, Zhejiang, China
- Yang Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw, Ann Arbor, MI 48109-2218, USA
- Gui-Jun Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
15
Wang X, Li C, Zhu J, Meng Q. L-SHADE-E: Ensemble of two differential evolution algorithms originating from L-SHADE. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.11.055] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 10/22/2022]
16
Zhang GJ, Xie TY, Zhou XG, Wang LJ, Hu J. Protein Structure Prediction Using Population-Based Algorithm Guided by Information Entropy. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:697-707. [PMID: 31180869 DOI: 10.1109/tcbb.2019.2921958] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/09/2023]
Abstract
Ab initio protein structure prediction is one of the most challenging problems in computational biology. Multistage algorithms are widely used in ab initio protein structure prediction, and it is important to consider that their computational costs differ between proteins. In this study, a population-based algorithm guided by information entropy (PAIE), which includes exploration and exploitation stages, is proposed for protein structure prediction. In PAIE, an entropy-based stage switch strategy is designed to switch from the exploration stage to the exploitation stage. Torsion angle statistical information is also deduced from the first stage and employed to enhance the exploitation in the second stage. Results indicate an improvement in the performance of protein structure prediction on a benchmark of 30 proteins and 17 other free modeling targets in CASP.
17
Hu J, Rao L, Zhu YH, Zhang GJ, Yu DJ. TargetDBP+: Enhancing the Performance of Identifying DNA-Binding Proteins via Weighted Convolutional Features. J Chem Inf Model 2021; 61:505-515. [PMID: 33410688 DOI: 10.1021/acs.jcim.0c00735] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Protein-DNA interactions exist ubiquitously and play important roles in the life cycles of living cells. The accurate identification of DNA-binding proteins (DBPs) is one of the key steps to understand the mechanisms of protein-DNA interactions. Although many DBP identification methods have been proposed, the current performance is still unsatisfactory. In this study, a new method, called TargetDBP+, is developed to further enhance the performance of identifying DBPs. In TargetDBP+, five convolutional features are first extracted from five feature sources, i.e., amino acid one-hot matrix (AAOHM), position-specific scoring matrix (PSSM), predicted secondary structure probability matrix (PSSPM), predicted solvent accessibility probability matrix (PSAPM), and predicted probabilities of DNA-binding sites (PPDBSs); second, the five features are weightedly and serially combined using the weights of all of the elements learned by the differential evolution algorithm; and finally, the DBP identification model of TargetDBP+ is trained using the support vector machine (SVM) algorithm. To evaluate the developed TargetDBP+ and compare it with other existing methods, a new gold-standard benchmark data set, called UniSwiss, is constructed, which consists of 4881 DBPs and 4881 non-DBPs extracted from the UniprotKB/Swiss-Prot database. Experimental results demonstrate that TargetDBP+ can obtain an accuracy of 85.83% and precision of 88.45% covering 82.41% of all DBP data on the independent validation subset of UniSwiss, with the MCC value (0.718) being significantly higher than those of other state-of-the-art control methods. The web server of TargetDBP+ is accessible at http://csbio.njust.edu.cn/bioinf/targetdbpplus/; the UniSwiss data set and stand-alone program of TargetDBP+ are accessible at https://github.com/jun-csbio/TargetDBPplus.
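The accuracy, precision, coverage (recall) and MCC figures quoted in this abstract all derive from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not code from TargetDBP+. The function name and the example counts in the usage note are hypothetical.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and Matthews correlation coefficient (MCC).

    tp, fp, tn, fn: counts of true positives, false positives,
    true negatives and false negatives.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # By convention, MCC is reported as 0 when any marginal sum is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

For example, `classification_metrics(8, 2, 7, 3)` gives an accuracy of 0.75 and a precision of 0.8. Unlike accuracy, MCC uses all four cells of the matrix, which is why it is often reported alongside accuracy for classifiers such as DBP predictors.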
Affiliation(s)
- Jun Hu: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, P. R. China; Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou 363000, P. R. China
- Liang Rao: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, P. R. China
- Yi-Heng Zhu: School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Gui-Jun Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, P. R. China
- Dong-Jun Yu: School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
18
Zhang GJ, Wang XQ, Ma LF, Wang LJ, Hu J, Zhou XG. Two-Stage Distance Feature-based Optimization Algorithm for De novo Protein Structure Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2119-2130. [PMID: 31107659 DOI: 10.1109/tcbb.2019.2917452] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
De novo protein structure prediction can be treated as a conformational space optimization problem under the guidance of an energy function. However, it is challenging to design an accurate energy function that ensures that low-energy conformations are close to native structures. Fortunately, recent studies have shown that the accuracy of de novo protein structure prediction can be significantly improved by integrating residue-residue distance information. In this paper, a two-stage distance feature-based optimization algorithm (TDFO) for de novo protein structure prediction is proposed within the framework of an evolutionary algorithm. In TDFO, a similarity model is first designed using feature information extracted from distance profiles by the bisecting K-means algorithm. A similarity model-based selection strategy is then developed to guide the conformation search and thus improve the quality of the predicted models. Moreover, global and local mutation strategies are designed, and a state estimation strategy is proposed to strike a trade-off between the exploration and exploitation of the search space. Experimental results on 35 benchmark proteins show that the proposed TDFO can improve prediction accuracy for a large portion of the test proteins.
19
Liu J, Zhou XG, Zhang Y, Zhang GJ. CGLFold: a contact-assisted de novo protein structure prediction using global exploration and loop perturbation sampling algorithm. Bioinformatics 2020; 36:2443-2450. [PMID: 31860059 DOI: 10.1093/bioinformatics/btz943] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/03/2019] [Revised: 12/10/2019] [Accepted: 12/18/2019] [Indexed: 12/27/2022] Open
Abstract
MOTIVATION: Regions that connect secondary structure elements in a protein are known as loops, in which a slight change can dramatically affect the entire topology. This study investigates whether the accuracy of protein structure prediction can be improved using a loop-specific sampling strategy. RESULTS: A novel de novo protein structure prediction method that combines global exploration and loop perturbation is proposed in this study. In the global exploration phase, fragment recombination and assembly are used to explore the massive conformational space and generate native-like topology. In the loop perturbation phase, a loop-specific local perturbation model is designed to improve the accuracy of the conformation and is solved by a differential evolution algorithm. These two phases enable cooperation between global exploration and local exploitation. The filtered contact information is used to construct the conformation selection model for guiding the sampling. The proposed CGLFold is tested on 145 benchmark proteins, 14 free modeling (FM) targets of CASP13 and 29 FM targets of CASP12. The experimental results show that the loop-specific local perturbation can increase the structure diversity and success rate of conformational update and gradually improve conformation accuracy. CGLFold obtains models with a template modeling score ≥ 0.5 on 95 standard test proteins, 7 FM targets of CASP13 and 9 FM targets of CASP12. AVAILABILITY AND IMPLEMENTATION: The source code and executable versions are freely available at https://github.com/iobio-zjut/CGLFold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Liu: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Yang Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Gui-Jun Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
20
Hu J, Zhou XG, Zhu YH, Yu DJ, Zhang GJ. TargetDBP: Accurate DNA-Binding Protein Prediction Via Sequence-Based Multi-View Feature Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1419-1429. [PMID: 30668479 DOI: 10.1109/tcbb.2019.2893634] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 06/09/2023]
Abstract
Accurately identifying DNA-binding proteins (DBPs) from protein sequence information is an important but challenging task for protein function annotation. In this paper, we establish a novel computational method, named TargetDBP, for accurately targeting DBPs from primary sequences. In TargetDBP, four single-view features, i.e., AAC (Amino Acid Composition), PsePSSM (Pseudo Position-Specific Scoring Matrix), PsePRSA (Pseudo Predicted Relative Solvent Accessibility), and PsePPDBS (Pseudo Predicted Probabilities of DNA-Binding Sites), are first extracted as four base features. Second, a differential evolution algorithm is employed to learn the weights of the four base features. Using the learned weights, we combine these base features into a weighted super feature. A subset of the super feature is then selected using the feature selection algorithm SVM-RFE+CBR (Support Vector Machine Recursive Feature Elimination with Correlation Bias Reduction). Finally, the prediction model is learned using a support vector machine on the selected feature subset. We also construct a new gold-standard, non-redundant benchmark dataset from the PDB database to evaluate the proposed TargetDBP and compare it with other existing predictors. On this new dataset, TargetDBP achieves higher performance than other state-of-the-art predictors. The TargetDBP web server and datasets are freely available at http://csbio.njust.edu.cn/bioinf/targetdbp/ for academic use.
|
21
|
Zhou XG, Peng CX, Liu J, Zhang Y, Zhang GJ. Underestimation-Assisted Global-Local Cooperative Differential Evolution and the Application to Protein Structure Prediction. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION : A PUBLICATION OF THE IEEE NEURAL NETWORKS COUNCIL 2020; 24:536-550. [PMID: 33603321 PMCID: PMC7885903 DOI: 10.1109/tevc.2019.2938531] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 05/29/2023]
Abstract
Various mutation strategies show distinct advantages in differential evolution (DE). The cooperation of multiple strategies in the evolutionary process may be effective. This paper presents an underestimation-assisted global and local cooperative DE to enhance both effectiveness and efficiency. In the proposed algorithm, two phases, namely, the global exploration and the local exploitation, are performed in each generation. In the global phase, a set of trial vectors is produced for each target individual by employing multiple strategies with strong exploration capability. Afterward, an adaptive underestimation model with a self-adapted slope control parameter is proposed to evaluate these trial vectors, the best of which is selected as the candidate. In the local phase, better-based strategies, guided by individuals that are better than the target individual, are designed. For each individual accepted in the global phase, multiple trial vectors are generated by using these strategies and filtered by the underestimation value. The cooperation between the global and local phases includes two aspects. First, both of them concentrate on generating better individuals for the next generation. Second, the global phase aims to locate promising regions quickly while the local phase serves as a local search for enhancing convergence. Moreover, a simple mechanism is designed to determine the parameter of DE adaptively during the search. Finally, the proposed approach is applied to protein 3D structure prediction. Experimental studies on classical benchmark functions, CEC test sets, and the protein structure prediction problem show that the proposed approach is superior to its competitors.
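For orientation, the baseline that such methods build on can be sketched as follows. This is plain DE/rand/1/bin with greedy selection, not the underestimation-assisted global-local algorithm of the paper; the function names, the sphere test function, and all parameter values (`pop_size`, `f`, `cr`, `generations`, `seed`) are illustrative assumptions.

```python
import random
from typing import Callable


def sphere(x: list[float]) -> float:
    """Classical benchmark function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def differential_evolution(objective: Callable[[list[float]], float],
                           dim: int,
                           bounds: tuple[float, float],
                           pop_size: int = 20,
                           f: float = 0.5,
                           cr: float = 0.9,
                           generations: int = 200,
                           seed: int = 0) -> tuple[list[float], float]:
    """Baseline DE/rand/1/bin; returns the best vector and its objective value."""
    rng = random.Random(seed)
    low, high = bounds
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    fitness = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    value = min(max(value, low), high)  # clamp to the bounds
                else:
                    value = pop[i][j]  # binomial crossover keeps the target gene
                trial.append(value)
            # Greedy selection: the trial replaces the target if it is no worse.
            # The replacement happens in place, so later targets in the same
            # generation may already draw on the updated individual.
            trial_fitness = objective(trial)
            if trial_fitness <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


best_x, best_f = differential_evolution(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_x, best_f)
```

In the abstract's terms, the single mutation line is where multiple global and local strategies would generate several trial vectors, and an inexpensive underestimation model would filter them before the true objective is evaluated.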
Affiliation(s)
- Xiao-Gen Zhou: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China, and also with the Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Chun-Xiang Peng: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yang Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA, and also with the Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Gui-Jun Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
|
22
|
Zhang GJ, Ma LF, Wang XQ, Zhou XG. Secondary Structure and Contact Guided Differential Evolution for Protein Structure Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1068-1081. [PMID: 30295627 DOI: 10.1109/tcbb.2018.2873691] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 06/08/2023]
Abstract
Ab initio protein tertiary structure prediction is one of the long-standing problems in structural bioinformatics. With the help of predicted residue-residue contacts and secondary structure, the accuracy of ab initio structure prediction can be enhanced. In this study, an improved differential evolution algorithm that uses secondary structure and residue-residue contact information, referred to as SCDE, is proposed for protein structure prediction. In SCDE, two score models based on secondary structure and contact information are proposed, and two selection strategies, namely, a secondary structure-based selection strategy and a contact-based selection strategy, are designed to guide the conformational space search. A probability distribution function is designed to balance these two selection strategies. Experimental results on a benchmark dataset with 28 proteins and four free modeling targets in CASP12 demonstrate that the proposed SCDE is effective and efficient.
|
23
|
Tang L, Wang X, Dong Z. Adaptive Multiobjective Differential Evolution With Reference Axis Vicinity Mechanism. IEEE TRANSACTIONS ON CYBERNETICS 2019; 49:3571-3585. [PMID: 30004897 DOI: 10.1109/tcyb.2018.2849343] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Owing to its simple but effective search framework, differential evolution (DE) has been successfully applied to multiobjective optimization problems. However, most previous research on multiobjective DE (MODE) has focused on designing control strategies for parameters and mutation operators for a given population at each generation, and has ignored the possibility that the given population is poorly distributed in the objective space. Therefore, this paper proposes a new variant of MODE in which a reference axis vicinity mechanism (RAVM) is developed to restore a good distribution of the given population and maintain its convergence before the evolution (i.e., mutation, crossover, and selection) starts at each generation. Besides the RAVM, a hybrid control strategy for parameters and mutation operators is also presented to accelerate convergence by integrating both randomness and guiding information derived from solutions generated during the search process. Computational results on four series of benchmark problems illustrate that the proposed MODE with the RAVM and hybrid control strategy is competitive with or even superior to some state-of-the-art multiobjective evolutionary algorithms in the literature.
|
24
|
|
25
|
Li ZW, Sun K, Hao XH, Hu J, Ma LF, Zhou XG, Zhang GJ. Loop Enhanced Conformational Resampling Method for Protein Structure Prediction. IEEE Trans Nanobioscience 2019; 18:567-577. [PMID: 31180866 DOI: 10.1109/tnb.2019.2922101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Protein structure prediction has been a long-standing problem for decades. In particular, loop regions remain an obstacle to forming an accurate protein tertiary structure because of their flexibility. In this study, a Rama torsion angle and secondary structure feature-guided differential evolution algorithm, named RSDE, is proposed to predict three-dimensional structures with an emphasis on the loop regions. In RSDE, the structure of the loop region is improved by two operators: a loop-based crossover operator, which interchanges the configuration of a randomly selected loop region between individuals, and a loop-based mutation operator, which incorporates torsion angle features into conformational sampling. A stochastic ranking selection strategy is designed to select conformations with low energy and near-native structure. Moreover, a conformational resampling method, which uses previously learned knowledge to guide subsequent sampling, is proposed to improve the sampling efficiency. Experiments on a total of 28 test proteins reveal that the proposed RSDE is effective and can obtain native-like models.
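The loop-based crossover described in this abstract can be sketched in a few lines. The sketch assumes a conformation is stored as a list of (phi, psi) torsion pairs, one per residue, and that loop regions are known as half-open index ranges; this representation and the name `loop_crossover` are hypothetical, and the actual RSDE operator may differ.

```python
import random

# Assumed representation: one (phi, psi) pair in degrees per residue.
Torsions = list[tuple[float, float]]


def loop_crossover(parent_a: Torsions,
                   parent_b: Torsions,
                   loops: list[tuple[int, int]],
                   rng: random.Random) -> tuple[Torsions, Torsions]:
    """Swap the torsion angles of one randomly chosen loop region.

    `loops` holds half-open (start, end) residue index ranges; in practice
    they might come from a secondary structure prediction.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must describe the same sequence length")
    start, end = rng.choice(loops)
    child_a = parent_a[:start] + parent_b[start:end] + parent_a[end:]
    child_b = parent_b[:start] + parent_a[start:end] + parent_b[end:]
    return child_a, child_b


# Toy example: two made-up five-residue conformations with one loop at 1..2.
conf_a = [(-60.0, -45.0)] * 5
conf_b = [(-120.0, 130.0)] * 5
child_a, child_b = loop_crossover(conf_a, conf_b, [(1, 3)], random.Random(0))
print(child_a)
print(child_b)
```

Because only the loop segment is exchanged, the helix and strand regions of each parent are left untouched, which matches the abstract's focus on improving the flexible loop regions.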
|