1
Lu Q. Molecular structure recognition by blob detection. RSC Adv 2021; 11:35879-35886. [PMID: 35492772 PMCID: PMC9043223 DOI: 10.1039/d1ra05752a] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/28/2021] [Accepted: 10/31/2021] [Indexed: 11/23/2022]
Abstract
Molecular structure recognition is fundamental in computational chemistry. The most common approach is to calculate the root mean square deviation (RMSD) between two sets of molecular coordinates. However, this method does not perform well for large molecules. In this work, a new method is proposed for structure comparison. Blob detection is used for recognizing structural features. Fragmentation of molecules is proposed as the pre-treatment. Mapping between blobs and atoms is developed as the post-treatment. A set of key parameters important for blob detection is determined. The dissimilarity is quantified by calculating the Euclidean metric of the blob vectors. The overall algorithm is found to distinguish structural dissimilarity accurately. The method has the potential to be combined with other pattern recognition techniques for new chemistry discoveries.
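The two distance measures named in this abstract can be illustrated with a minimal Python sketch. It assumes the two coordinate sets are already superimposed and listed in matching atom order, and it treats a "blob vector" as a plain list of numbers; the abstract does not say how blob vectors are built, so the function names and inputs here are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized, pre-aligned lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and the same size")
    # Sum of squared atom-to-atom distances, averaged over atoms.
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def euclidean_dissimilarity(vec_a, vec_b):
    """Euclidean distance between two feature vectors of equal length
    (stand-ins for the blob vectors; their real contents are not given)."""
    return math.dist(vec_a, vec_b)
```

For example, `euclidean_dissimilarity([1, 2, 3], [4, 6, 3])` evaluates to `5.0`. Note that RMSD as written performs no superposition step, which a practical comparison would normally need first.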
Affiliation(s)
- Qing Lu
- Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences, Beijing 100190, China
2
Schaeffer RD, Kinch L, Kryshtafovych A, Grishin NV. Assessment of domain interactions in the fourteenth round of the Critical Assessment of Structure Prediction (CASP14). Proteins 2021; 89:1700-1710. [PMID: 34455641 PMCID: PMC8616818 DOI: 10.1002/prot.26225] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/30/2021] [Revised: 08/07/2021] [Accepted: 08/24/2021] [Indexed: 12/29/2022]
Abstract
The high accuracy of some CASP14 models at the domain level prompted a more detailed evaluation of structure predictions on whole targets. For the first time in critical assessment of structure prediction (CASP), we evaluated accuracy of difficult domain assembly in models submitted for multidomain targets where the community predicted individual evaluation units (EUs) with greater accuracy than full-length targets. Ten proteins with domain interactions that did not show evidence of conformational change and were not involved in significant oligomeric contacts were chosen as targets for the domain interaction assessment. Groups were ranked using complementary interaction scores (F1, QS score, and Jaccard coefficient), and their predictions were evaluated for their ability to correctly model inter-domain interfaces and overall protein folds. Target performance was broadly grouped into two clusters. The first consisted primarily of targets containing two EUs wherein predictors more broadly predicted domain positioning and interfacial contacts correctly. The other consisted of complex two-EU and three-EU targets where few predictors performed well. The highest ranked predictor, AlphaFold2, produced high-accuracy models on eight out of 10 targets. Their interdomain scores on three of these targets were significantly higher than all other groups and were responsible for their overall outperformance in the category. We further highlight the performance of AlphaFold2 and the next best group, BAKER-experimental on several interesting targets.
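Two of the interaction scores named above, F1 and the Jaccard coefficient, can be sketched in Python over sets of inter-domain contacts. This is only an illustration: the abstract does not state how contacts were defined or how the QS score was computed, so the contact representation (pairs of residue numbers) and the function name are assumptions, and QS score is left out.

```python
def interface_scores(model_contacts, target_contacts):
    """Return (F1, Jaccard) for predicted vs. reference contact sets.

    Each contact is a hashable item, e.g. a (residue_i, residue_j) tuple.
    """
    model, target = set(model_contacts), set(target_contacts)
    shared = len(model & target)

    # Precision: fraction of predicted contacts that are correct.
    precision = shared / len(model) if model else 0.0
    # Recall: fraction of reference contacts that were recovered.
    recall = shared / len(target) if target else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Jaccard: shared contacts over all contacts seen in either set.
    union = len(model | target)
    jaccard = shared / union if union else 0.0
    return f1, jaccard
```

With a model predicting contacts `{(1, 50), (2, 51), (3, 52)}` against a reference `{(1, 50), (2, 51), (4, 60), (5, 61)}`, two contacts are shared, giving F1 = 4/7 and Jaccard = 2/5.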
Affiliation(s)
- R Dustin Schaeffer
- Department of Biophysics, UT Southwestern Medical Center, Dallas, Texas, USA
- Lisa Kinch
- Howard Hughes Medical Institute, UT Southwestern Medical Center, Dallas, Texas, USA
- Andriy Kryshtafovych
- Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, USA
- Nick V Grishin
- Department of Biophysics, UT Southwestern Medical Center, Dallas, Texas, USA; Howard Hughes Medical Institute, UT Southwestern Medical Center, Dallas, Texas, USA
3
Kinch LN, Schaeffer RD, Kryshtafovych A, Grishin NV. Target classification in the 14th round of the critical assessment of protein structure prediction (CASP14). Proteins 2021; 89:1618-1632. [PMID: 34350630 DOI: 10.1002/prot.26202] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 04/28/2021] [Revised: 06/21/2021] [Accepted: 07/11/2021] [Indexed: 12/14/2022]
Abstract
An evolutionary-based definition and classification of target evaluation units (EUs) is presented for the 14th round of the critical assessment of structure prediction (CASP14). CASP14 targets included 84 experimental models submitted by various structural groups (designated T1024-T1101). Targets were split into EUs based on the domain organization of available templates and performance of server groups. Several targets required splitting (19 out of 25 multidomain targets) due in part to observed conformation changes. All in all, 96 CASP14 EUs were defined and assigned to tertiary structure assessment categories (Topology-based FM or High Accuracy-based TBM-easy and TBM-hard) considering their evolutionary relationship to existing ECOD fold space: 24 family level, 50 distant homologs (H-group), 12 analogs (X-group), and 10 new folds. Principal component analysis and heatmap visualization of sequence and structure similarity to known templates as well as performance of servers highlighted trends in CASP14 target difficulty. The assigned evolutionary levels (i.e., H-groups) and assessment classes (i.e., FM) displayed overlapping clusters of EUs. Many viral targets diverged considerably from their template homologs and thus were more difficult for prediction than other homology-related targets. On the other hand, some targets did not have sequence-identifiable templates, but were predicted better than expected due to relatively simple arrangements of secondary structural elements. An apparent improvement in overall server performance in CASP14 further complicated traditional classification, which ultimately assigned EUs into high-accuracy modeling (27 TBM-easy and 31 TBM-hard), topology (23 FM), or both (15 FM/TBM).
Affiliation(s)
- Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- R Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, USA; Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, USA
4
The breakthrough in protein structure prediction. Biochem J 2021; 478:1885-1890. [PMID: 34029366 PMCID: PMC8166336 DOI: 10.1042/bcj20200963] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 04/09/2021] [Revised: 04/24/2021] [Accepted: 05/04/2021] [Indexed: 11/17/2022]
Abstract
Proteins are the essential agents of all living systems. Even though they are synthesized as linear chains of amino acids, they must assume specific three-dimensional structures in order to manifest their biological activity. These structures are fully specified in their amino acid sequences - and therefore in the nucleotide sequences of their genes. However, the relationship between sequence and structure, known as the protein folding problem, has remained elusive for half a century, despite sustained efforts. To measure progress on this problem, a series of doubly blind, biennial experiments called CASP (critical assessment of structure prediction) were established in 1994. We were part of the assessment team for the most recent CASP experiment, CASP14, where we witnessed an astonishing breakthrough by DeepMind, the leading artificial intelligence laboratory of Alphabet Inc. The models filed by DeepMind's structure prediction team using the program AlphaFold2 were often essentially indistinguishable from experimental structures, leading to a consensus in the community that the structure prediction problem for single protein chains has been solved. Here, we will review the path to CASP14, outline the method employed by AlphaFold2 to the extent revealed, and discuss the implications of this breakthrough for the life sciences.
5
Toth JM, DePietro PJ, Haas J, McLaughlin WA. ResiRole: residue-level functional site predictions to gauge the accuracies of protein structure prediction techniques. Bioinformatics 2021; 37:351-359. [PMID: 32780798 PMCID: PMC8058773 DOI: 10.1093/bioinformatics/btaa712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2019] [Revised: 07/31/2020] [Accepted: 08/05/2020] [Indexed: 11/25/2022]
Abstract
Motivation: Methods to assess the quality of protein structure models are needed for user applications. To aid with the selection of structure models and further inform the development of structure prediction techniques, we describe the ResiRole method for the assessment of the quality of structure models. Results: Structure prediction techniques are ranked according to the results of round-robin, head-to-head comparisons using difference scores. Each difference score was defined as the absolute value of the cumulative probability for a functional site prediction made with the FEATURE program for the reference structure minus that for the structure model. Overall, the difference scores correlate well with other model quality metrics; and based on benchmarking studies with NaïveBLAST, they are found to detect additional local structural similarities between the structure models and reference structures. Availability and implementation: Automated analyses of models addressed in CAMEO are available via the ResiRole server, URL http://protein.som.geisinger.edu/ResiRole/. Interactive analyses with user-provided models and reference structures are also enabled. Code is available at github.com/wamclaughlin/ResiRole. Supplementary information: Supplementary data are available at Bioinformatics online.
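The difference score defined in this abstract reduces to a one-line calculation once the two cumulative probabilities are known. The Python sketch below is a hedged illustration only: it does not call the FEATURE program, the probabilities are supplied by the caller, and the function names and the averaging helper are assumptions rather than part of ResiRole.

```python
def difference_score(p_reference, p_model):
    """Absolute difference between the cumulative probability of a functional
    site prediction on the reference structure and on the structure model."""
    return abs(p_reference - p_model)


def mean_difference_score(site_pairs):
    """Average difference score over (p_reference, p_model) pairs.

    Averaging is an illustrative choice; the abstract does not say how
    per-site scores are aggregated before techniques are ranked.
    """
    pairs = list(site_pairs)
    if not pairs:
        raise ValueError("at least one functional site is required")
    return sum(difference_score(r, m) for r, m in pairs) / len(pairs)
```

A lower score means the model preserves the functional site prediction of the reference more closely, so a score of 0 indicates identical predicted probabilities.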
Affiliation(s)
- Joshua M Toth
- Department of Medical Education, Geisinger Commonwealth School of Medicine, Scranton, PA 18510, USA
- Paul J DePietro
- Department of Medical Education, Geisinger Commonwealth School of Medicine, Scranton, PA 18510, USA
- Juergen Haas
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, CH-4056 Basel, Switzerland
- William A McLaughlin
- Department of Medical Education, Geisinger Commonwealth School of Medicine, Scranton, PA 18510, USA
6
Røgen P. Quantifying steric hindrance and topological obstruction to protein structure superposition. Algorithms Mol Biol 2021; 16:1. [PMID: 33639968 PMCID: PMC7913338 DOI: 10.1186/s13015-020-00180-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2018] [Accepted: 12/17/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND In computational structural biology, structure comparison is fundamental for our understanding of proteins. Structure comparison is, e.g., algorithmically the starting point for computational studies of structural evolution and it guides our efforts to predict protein structures from their amino acid sequences. Most methods for structural alignment of protein structures optimize the distances between aligned and superimposed residue pairs, i.e., the distances traveled by the aligned and superimposed residues during linear interpolation. Considering such a linear interpolation, these methods do not differentiate if there is room for the interpolation, if it causes steric clashes, or more severely, if it changes the topology of the compared protein backbone curves. RESULTS To distinguish such cases, we analyze the linear interpolation between two aligned and superimposed backbones. We quantify the amount of steric clashes and find all self-intersections in a linear backbone interpolation. To determine if the self-intersections alter the protein's backbone curve significantly or not, we present a path-finding algorithm that checks if there exists a self-avoiding path in a neighborhood of the linear interpolation. A new path is constructed by altering the linear interpolation using a novel interpretation of Reidemeister moves from knot theory working on three-dimensional curves rather than on knot diagrams. Either the algorithm finds a self-avoiding path or it returns a smallest set of essential self-intersections. Each of these indicates a significant difference between the folds of the aligned protein structures. As expected, we find at least one essential self-intersection separating most unknotted structures from a knotted structure, and we find even larger motions in proteins connected by obstruction free linear interpolations. 
We also find examples of homologous proteins that are differently threaded, and we find many distinct folds connected by longer but simple deformations. TM-align is one of the most restrictive alignment programs. With standard parameters, it only aligns residues superimposed within 5 Ångström distance. We find 42165 topological obstructions between aligned parts in 142068 TM-alignments. Thus, this restrictive alignment procedure still allows topological dissimilarity of the aligned parts. CONCLUSIONS Based on the data we conclude that our program ProteinAlignmentObstruction provides significant additional information to alignment scores based solely on distances between aligned and superimposed residue pairs.
7
Runthala A. Probabilistic divergence of a template-based modelling methodology from the ideal protocol. J Mol Model 2021; 27:25. [PMID: 33411019 DOI: 10.1007/s00894-020-04640-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2020] [Accepted: 12/09/2020] [Indexed: 12/27/2022]
Abstract
Protein structural information is essential for the detailed mapping of a functional protein network. For a higher modelling accuracy and quicker implementation, template-based algorithms have been extensively deployed and redefined. The methods only assess the predicted structure against its native state/template and do not estimate the accuracy for each modelling step. A divergence measure is therefore postulated to estimate the modelling accuracy against its theoretical optimal benchmark. By freezing the domain boundaries, the divergence measures are predicted for the most crucial steps of a modelling algorithm. To precisely refine the score using weighting constants, big data analysis could further be deployed.
Affiliation(s)
- Ashish Runthala
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522502, India
8
Gao W, Mahajan SP, Sulam J, Gray JJ. Deep Learning in Protein Structural Modeling and Design. Patterns (New York, N.Y.) 2020; 1:100142. [PMID: 33336200 PMCID: PMC7733882 DOI: 10.1016/j.patter.2020.100142] [Citation(s) in RCA: 81] [Impact Index Per Article: 20.3] [Indexed: 12/13/2022]
Abstract
Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the "sequence → structure → function" paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.
Affiliation(s)
- Wenhao Gao
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeremias Sulam
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
9
AlQuraishi M. AlphaFold at CASP13. Bioinformatics 2020; 35:4862-4865. [PMID: 31116374 DOI: 10.1093/bioinformatics/btz422] [Citation(s) in RCA: 154] [Impact Index Per Article: 38.5] [Received: 02/15/2019] [Revised: 03/26/2019] [Accepted: 05/15/2019] [Indexed: 11/13/2022]
Abstract
SUMMARY Computational prediction of protein structure from sequence is broadly viewed as a foundational problem of biochemistry and one of the most difficult challenges in bioinformatics. Once every two years the Critical Assessment of protein Structure Prediction (CASP) experiments are held to assess the state of the art in the field in a blind fashion, by presenting predictor groups with protein sequences whose structures have been solved but have not yet been made publicly available. The first CASP was organized in 1994, and the latest, CASP13, took place last December, when for the first time the industrial laboratory DeepMind entered the competition. DeepMind's entry, AlphaFold, placed first in the Free Modeling (FM) category, which assesses methods on their ability to predict novel protein folds (the Zhang group placed first in the Template-Based Modeling (TBM) category, which assess methods on predicting proteins whose folds are related to ones already in the Protein Data Bank.) DeepMind's success generated significant public interest. Their approach builds on two ideas developed in the academic community during the preceding decade: (i) the use of co-evolutionary analysis to map residue co-variation in protein sequence to physical contact in protein structure, and (ii) the application of deep neural networks to robustly identify patterns in protein sequence and co-evolutionary couplings and convert them into contact maps. In this Letter, we contextualize the significance of DeepMind's entry within the broader history of CASP, relate AlphaFold's methodological advances to prior work, and speculate on the future of this important problem.
Affiliation(s)
- Mohammed AlQuraishi
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA; Lab of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
10
Lee GR, Won J, Heo L, Seok C. GalaxyRefine2: simultaneous refinement of inaccurate local regions and overall protein structure. Nucleic Acids Res 2020; 47:W451-W455. [PMID: 31001635 PMCID: PMC6602442 DOI: 10.1093/nar/gkz288] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 01/31/2019] [Revised: 04/01/2019] [Accepted: 04/11/2019] [Indexed: 11/12/2022]
Abstract
The 3D structure of a protein can be predicted from its amino acid sequence with high accuracy for a large fraction of cases because of the availability of large quantities of experimental data and the advance of computational algorithms. Recently, deep learning methods exploiting the coevolution information obtained by comparing related protein sequences have been successfully used to generate highly accurate model structures even in the absence of template structure information. However, structures predicted based on either template structures or related sequences require further improvement in regions for which information is missing. Refining a predicted protein structure with insufficient information on certain regions is critical because these regions may be connected to functional specificity that is not conserved among related proteins. The GalaxyRefine2 web server, freely available via http://galaxy.seoklab.org/refine2, is an upgraded version of the GalaxyRefine protein structure refinement server and reflects recent developments successfully tested through CASP blind prediction experiments. This method adopts an iterative optimization approach involving various structure move sets to refine both local and global structures. The estimation of local error and hybridization of available homolog structures are also employed for effective conformation search.
Affiliation(s)
- Gyu Rie Lee
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
- Jonghun Won
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
- Lim Heo
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
11
Eguchi RR, Huang PS. Multi-scale structural analysis of proteins by deep semantic segmentation. Bioinformatics 2020; 36:1740-1749. [PMID: 31424530 PMCID: PMC7075530 DOI: 10.1093/bioinformatics/btz650] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/03/2018] [Revised: 07/29/2019] [Accepted: 08/18/2019] [Indexed: 11/12/2022]
Abstract
MOTIVATION Recent advances in computational methods have facilitated large-scale sampling of protein structures, leading to breakthroughs in protein structural prediction and enabling de novo protein design. Establishing methods to identify candidate structures that can lead to native folds or designable structures remains a challenge, since few existing metrics capture high-level structural features such as architectures, folds and conformity to conserved structural motifs. Convolutional Neural Networks (CNNs) have been successfully used in semantic segmentation-a subfield of image classification in which a class label is predicted for every pixel. Here, we apply semantic segmentation to protein structures as a novel strategy for fold identification and structure quality assessment. RESULTS We train a CNN that assigns each residue in a multi-domain protein to one of 38 architecture classes designated by the CATH database. Our model achieves a high per-residue accuracy of 90.8% on the test set (95.0% average per-class accuracy; 87.8% average per-structure accuracy). We demonstrate that individual class probabilities can be used as a metric that indicates the degree to which a randomly generated structure assumes a specific fold, as well as a metric that highlights non-conformative regions of a protein belonging to a known class. These capabilities yield a powerful tool for guiding structural sampling for both structural prediction and design. AVAILABILITY AND IMPLEMENTATION The trained classifier network, parser network, and entropy calculation scripts are available for download at https://git.io/fp6bd, with detailed usage instructions provided at the download page. A step-by-step tutorial for setup is provided at https://goo.gl/e8GB2S. All Rosetta commands, RosettaRemodel blueprints, and predictions for all datasets used in the study are available in the Supplementary Information. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Raphael R Eguchi
- Department of Biochemistry, School of Medicine, Stanford University, Shriram Center for Bioengineering and Chemical Engineering, 443 via Ortega, Room 036, Stanford, CA 94305, USA
- Po-Ssu Huang
- Department of Bioengineering, Schools of Engineering and Medicine, Stanford University, Shriram Center for Bioengineering and Chemical Engineering, 443 via Ortega, Room 036, Stanford, CA 94305, USA
12
Olechnovič K, Monastyrskyy B, Kryshtafovych A, Venclovas Č. Comparative analysis of methods for evaluation of protein models against native structures. Bioinformatics 2019; 35:937-944. [PMID: 30169622 DOI: 10.1093/bioinformatics/bty760] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/30/2018] [Revised: 08/04/2018] [Accepted: 08/28/2018] [Indexed: 12/17/2022]
Abstract
MOTIVATION Measuring discrepancies between protein models and native structures is at the heart of development of protein structure prediction methods and comparison of their performance. A number of different evaluation methods have been developed; however, their comprehensive and unbiased comparison has not been performed. RESULTS We carried out a comparative analysis of several popular model assessment methods (RMSD, TM-score, GDT, QCS, CAD-score, LDDT, SphereGrinder and RPF) to reveal their relative strengths and weaknesses. The analysis, performed on a large and diverse model set derived in the course of three latest community-wide CASP experiments (CASP10-12), had two major directions. First, we looked at general differences between the scores by analyzing distribution, correspondence and correlation of their values as well as differences in selecting best models. Second, we examined the score differences taking into account various structural properties of models (stereochemistry, hydrogen bonds, packing of domains and chain fragments, missing residues, protein length and secondary structure). Our results provide a solid basis for an informed selection of the most appropriate score or combination of scores depending on the task at hand. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, Lithuania
13
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13). Proteins 2019; 87:1141-1148. [PMID: 31602685 PMCID: PMC7079254 DOI: 10.1002/prot.25834] [Citation(s) in RCA: 169] [Impact Index Per Article: 33.8] [Received: 05/03/2019] [Revised: 09/25/2019] [Accepted: 09/27/2019] [Indexed: 12/17/2022]
Abstract
We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.
Affiliation(s)
- David T. Jones
- The Francis Crick Institute, London, UK
- University College London, London, UK
14
Akhter N, Chennupati G, Kabir KL, Djidjev H, Shehu A. Unsupervised and Supervised Learning over the Energy Landscape for Protein Decoy Selection. Biomolecules 2019; 9:E607. [PMID: 31615116 PMCID: PMC6843838 DOI: 10.3390/biom9100607] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/05/2019] [Revised: 10/03/2019] [Accepted: 10/04/2019] [Indexed: 11/17/2022]
Abstract
The energy landscape that organizes microstates of a molecular system and governs the underlying molecular dynamics exposes the relationship between molecular form/structure, changes to form, and biological activity or function in the cell. However, several challenges stand in the way of leveraging energy landscapes for relating structure and structural dynamics to function. Energy landscapes are high-dimensional, multi-modal, and often overly-rugged. Deep wells or basins in them do not always correspond to stable structural states but are instead the result of inherent inaccuracies in semi-empirical molecular energy functions. Due to these challenges, energetics is typically ignored in computational approaches addressing long-standing central questions in computational biology, such as protein decoy selection. In the latter, the goal is to determine over a possibly large number of computationally-generated three-dimensional structures of a protein those structures that are biologically-active/native. In recent work, we have recast our attention on the protein energy landscape and its role in helping us to advance decoy selection. Here, we summarize some of our successes so far in this direction via unsupervised learning. More importantly, we further advance the argument that the energy landscape holds valuable information to aid and advance the state of protein decoy selection via novel machine learning methodologies that leverage supervised learning. Our focus in this article is on decoy selection for the purpose of a rigorous, quantitative evaluation of how leveraging protein energy landscapes advances an important problem in protein modeling. However, the ideas and concepts presented here are generally useful to make discoveries in studies aiming to relate molecular structure and structural dynamics to function.
Affiliation(s)
- Nasrin Akhter
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
| | - Gopinath Chennupati
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| | - Kazi Lutful Kabir
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
| | - Hristo Djidjev
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Center for Adaptive Human-Machine Partnership, George Mason University, Fairfax, VA 22030, USA.
- Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA.
- School of Systems Biology, George Mason University, Fairfax, VA 22030, USA.
| |
|
15
|
Liu Y, Ye Q, Wang L, Peng J. Learning structural motif representations for efficient protein structure search. Bioinformatics 2019; 34:i773-i780. [PMID: 30423083 PMCID: PMC6129266 DOI: 10.1093/bioinformatics/bty585] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022] Open
Abstract
Motivation: Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, the computational cost of performing pairwise structural alignment against all structures in PDB is prohibitively expensive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and only computing similarity between vectors. As a notable example, FragBag represents each protein by a ‘bag of fragments’, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Despite being efficient, the accuracy of FragBag is unsatisfactory because its backbone fragment library may not be optimally constructed and long-range interacting patterns are omitted. Results: Here we present a new approach to learning effective structural motif representations using deep learning. We develop DeepFold, a deep convolutional neural network model to extract structural motif features of a protein structure. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs. Availability and implementation: https://github.com/largelymfs/DeepFold
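The alignment-free idea described in this abstract (represent each protein as a fragment-frequency vector, then compare vectors only) can be illustrated with a minimal sketch. It is not FragBag's or DeepFold's implementation: the function names, the toy fragment library, and the use of internal distance matrices in place of superposition-based fragment matching are all assumptions made for illustration.

```python
import numpy as np

def pairwise(coords):
    """Internal distance matrix; invariant to rotation and translation."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def bag_of_fragments(backbone, library, frag_len):
    """Frequency vector: share of contiguous backbone windows assigned to each library fragment.

    Assumption: a window is assigned to the library fragment whose internal
    distance matrix is closest (a stand-in for fragment RMSD matching).
    """
    lib_dists = [pairwise(np.asarray(f, dtype=float)) for f in library]
    counts = np.zeros(len(library))
    for start in range(len(backbone) - frag_len + 1):
        window = pairwise(backbone[start:start + frag_len])
        scores = [np.linalg.norm(window - d) for d in lib_dists]
        counts[int(np.argmin(scores))] += 1
    total = counts.sum()
    return counts / total if total else counts

def cosine_similarity(u, v):
    """Similarity between two fragment-frequency vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Hypothetical toy data: a straight 6-point backbone and a 2-fragment library.
straight = np.array([[i * 3.8, 0.0, 0.0] for i in range(6)])
toy_library = [straight[:3], np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0]])]
vector = bag_of_fragments(straight, toy_library, frag_len=3)
```

Because only fixed-length vectors are compared, searching a database costs one vector comparison per entry rather than one structural alignment per entry, which is the trade-off the abstract describes.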
Affiliation(s)
- Yang Liu
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Qing Ye
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Liwei Wang
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| |
|
16
|
Guzenko D, Lafita A, Monastyrskyy B, Kryshtafovych A, Duarte JM. Assessment of protein assembly prediction in CASP13. Proteins 2019; 87:1190-1199. [PMID: 31374138 DOI: 10.1002/prot.25795] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 05/02/2019] [Revised: 07/11/2019] [Accepted: 07/27/2019] [Indexed: 01/08/2023]
Abstract
We present the assembly category assessment in the 13th edition of the CASP community-wide experiment. For the second time, protein assemblies constitute an independent assessment category. Compared to the last edition we see a clear uptake in participation, more oligomeric targets released, and consistent, albeit modest, improvement in prediction quality. Looking at the tertiary structure predictions, we observe that ignoring the oligomeric state of the targets hinders modeling success. We also note that some contact prediction groups successfully predicted homomeric interfacial contacts, though it appears that these predictions were not used for assembly modeling. Homology modeling with sizeable human intervention appears to form the basis of the assembly prediction techniques in this round of CASP. Future developments should see more integrated approaches where subunits are modeled in the context of the assemblies they form.
Affiliation(s)
- Dmytro Guzenko
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California
| | - Aleix Lafita
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
| | - Bohdan Monastyrskyy
- Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, USA
| | - Andriy Kryshtafovych
- Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, USA
| | - Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, California
| |
|
17
|
Croll TI, Sammito MD, Kryshtafovych A, Read RJ. Evaluation of template-based modeling in CASP13. Proteins 2019; 87:1113-1127. [PMID: 31407380 PMCID: PMC6851432 DOI: 10.1002/prot.25800] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 05/10/2019] [Revised: 07/29/2019] [Accepted: 08/08/2019] [Indexed: 12/12/2022]
Abstract
Performance in the template‐based modeling (TBM) category of CASP13 is assessed here, using a variety of metrics. Performance of the predictor groups that participated is ranked using the primary ranking score that was developed by the assessors for CASP12. This reveals that the best results are obtained by groups that include contact predictions or inter‐residue distance predictions derived from deep multiple sequence alignments. In cases where there is a good homolog in the wwPDB (TBM‐easy category), the best results are obtained by modifying a template. However, for cases with poorer homologs (TBM‐hard), very good results can be obtained without using an explicit template, by deep learning algorithms trained on the wwPDB. Alternative metrics are introduced, to allow testing of aspects of structural models that are not addressed by traditional CASP metrics. These include comparisons to the main‐chain and side‐chain torsion angles of the target, and the utility of models for solving crystal structures by the molecular replacement method. The alternative metrics are poorly correlated with the traditional metrics, and it is proposed that modeling has reached a sufficient level of maturity that the best models should be expected to satisfy this wider range of criteria.
Affiliation(s)
- Tristan I Croll
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
| | - Massimo D Sammito
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
| | | | - Randy J Read
- Department of Haematology, University of Cambridge, Cambridge Institute for Medical Research, Cambridge, UK
| |
|
18
|
Kinch LN, Kryshtafovych A, Monastyrskyy B, Grishin NV. CASP13 target classification into tertiary structure prediction categories. Proteins 2019; 87:1021-1036. [PMID: 31294862 DOI: 10.1002/prot.25775] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 04/15/2019] [Revised: 06/24/2019] [Accepted: 07/06/2019] [Indexed: 12/30/2022]
Abstract
Protein target structures for the Critical Assessment of Structure Prediction round 13 (CASP13) were split into evaluation units (EUs) based on their structural domains, the domain organization of available templates, and the performance of servers on whole targets compared to split target domains. Eighty targets were split into 112 EUs. The EUs were classified into categories suitable for assessment of high accuracy modeling (or template-based modeling [TBM]) and topology (or free modeling [FM]) based on target difficulty. Assignment into assessment categories considered the following criteria: (a) the evolutionary relationship of target domains to existing fold space as defined by the Evolutionary Classification of Protein Domains (ECOD) database; (b) the clustering of target domains using eight objective sequence, structure, and performance measures; and (c) the placement of target domains in a scatter plot of target difficulty against server performance used in the previous CASP. Generally, target domains with good server predictions had close template homologs and were classified as TBM. Alternately, targets with poor server predictions represent a mixture of fast evolving homologs, structure analogs, and new folds, and were classified as FM or FM/TBM overlap.
Affiliation(s)
- Lisa N Kinch
- Departments of Biophysics and Biochemistry, Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas
| | | | | | - Nick V Grishin
- Departments of Biophysics and Biochemistry, Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas
| |
|
19
|
Halder AK, Dutta P, Kundu M, Basu S, Nasipuri M. Review of computational methods for virus-host protein interaction prediction: a case study on novel Ebola-human interactions. Brief Funct Genomics 2019; 17:381-391. [PMID: 29028879 PMCID: PMC7109800 DOI: 10.1093/bfgp/elx026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/15/2022] Open
Abstract
Identification of potential virus–host interactions is vital for controlling highly infectious virus-caused diseases, and may contribute toward the development of new drugs to treat viral infections. Recently, database records of clinically and experimentally validated interactions between a small set of human proteins and Ebola virus (EBOV) have been published. Using the information on the known human interaction partners of EBOV, our main objective is to identify a set of proteins that may interact with EBOV proteins. Here, we first review the state-of-the-art computational methods used for prediction of novel virus–host interactions for infectious diseases, followed by a case study on EBOV–human interactions. The assessment result shows that the predicted human host proteins are highly similar to known human interaction partners of EBOV in the context of structure and semantics and are responsible for similar biochemical activities, pathways and host–pathogen relationships.
Affiliation(s)
- Anup Kumar Halder
- Department of Computer Science and Engineering, Jadavpur University, India
| | - Pritha Dutta
- Department of Computer Science and Engineering, Jadavpur University, India
| | - Mahantapas Kundu
- Department of Computer Science and Engineering, Jadavpur University, India
| | - Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, India
| | - Mita Nasipuri
- Department of Computer Science and Engineering, Jadavpur University, India
| |
|
20
|
Zaman AB, Shehu A. Balancing multiple objectives in conformation sampling to control decoy diversity in template-free protein structure prediction. BMC Bioinformatics 2019; 20:211. [PMID: 31023237 PMCID: PMC6485169 DOI: 10.1186/s12859-019-2794-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/07/2019] [Accepted: 04/04/2019] [Indexed: 12/05/2022] Open
Abstract
Background: Computational approaches for the determination of biologically-active/native three-dimensional structures of proteins with novel sequences have to handle several challenges. The (conformation) space of possible three-dimensional spatial arrangements of the chain of amino acids that constitute a protein molecule is vast and high-dimensional. Exploration of the conformation spaces is performed in a sampling-based manner and is biased by the internal energy that sums atomic interactions. Even state-of-the-art energy functions that quantify such interactions are inherently inaccurate and associate with protein conformation spaces overly rugged energy surfaces riddled with artifact local minima. The response to these challenges in template-free protein structure prediction is to generate large numbers of low-energy conformations (also referred to as decoys) as a way of increasing the likelihood of having a diverse decoy dataset that covers a sufficient number of local minima possibly housing near-native conformations. Results: In this paper we pursue a complementary approach and propose to directly control the diversity of generated decoys. Inspired by hard optimization problems in high-dimensional and non-linear variable spaces, we propose that conformation sampling for decoy generation is more naturally framed as a multi-objective optimization problem. We demonstrate that mechanisms inherent to evolutionary search techniques facilitate such framing and allow balancing multiple objectives in protein conformation sampling. We showcase here an operationalization of this idea via a novel evolutionary algorithm that has high exploration capability and is also able to access lower-energy regions of the energy landscape of a given protein with similar or better proximity to the known native structure than several state-of-the-art decoy generation algorithms.
Conclusions: The presented results constitute a promising research direction in improving decoy generation for template-free protein structure prediction with regard to balancing of multiple conflicting objectives under an optimization framework. Future work will consider additional optimization objectives and variants of improvement and selection operators to apportion a fixed computational budget. Of particular interest are directions of research that attenuate dependence on protein energy models.
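Framing decoy generation as multi-objective optimization rests on comparing candidates by Pareto dominance rather than by a single score. The sketch below shows that comparison only; it is a generic illustration, not the evolutionary algorithm of the paper. The two objectives in the toy data (an energy and a second score, both to be minimized) and all names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical decoys scored as (energy, second objective), lower is better.
decoys = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(decoys)
```

Here `(3.0, 3.0)` is dropped because `(2.0, 2.0)` beats it on both objectives, while the other three are kept: each is best on some trade-off, which is how a selection operator of this kind can preserve diversity among low-energy decoys.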
Affiliation(s)
- Ahmed Bin Zaman
- Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA.,Department of Bioengineering, George Mason University, Fairfax, 22030, VA, USA.,School of Systems Biology, George Mason University, Manassas, 20110, VA, USA
| |
|
21
|
Han X, Li L, Lu Y. Selecting Near-Native Protein Structures from Predicted Decoy Sets Using Ordered Graphlet Degree Similarity. Genes (Basel) 2019; 10:genes10020132. [PMID: 30754721 PMCID: PMC6410076 DOI: 10.3390/genes10020132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2018] [Revised: 02/03/2019] [Accepted: 02/04/2019] [Indexed: 11/18/2022] Open
Abstract
Effective prediction of protein tertiary structure from sequence is an important and challenging problem in computational structural biology. Ab initio protein structure prediction is based on amino acid sequence alone and thus has a wide application area. With the ab initio method, a large number of candidate protein structures, called a decoy set, can be predicted; however, it is difficult to select a good near-native structure from the predicted decoy set. In this work we propose a new method for selecting the near-native structure from the decoy set based on both contact map overlap (CMO) and graphlets. By generalizing graphlets to ordered graphs, and using dynamic programming to select the optimal alignment with an introduced gap penalty, a GR_score is defined for calculating the similarity between the three-dimensional (3D) decoy structures. The proposed method was applied to all 54 single-domain targets in CASP11 and all 43 targets in CASP10, and ensemble clustering was used to cluster the protein decoy structures based on the computed CR_scores. The most popular centroid structure was selected as the near-native structure. The experiments showed that compared to the SPICKER method, which is used in I-TASSER, the proposed method can usually select better near-native structures in terms of the similarity between the selected structure and the true native structure.
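The contact maps that underlie contact map overlap can be sketched in a few lines. This is a simplified illustration and not the GR_score of the paper: it assumes one coordinate per residue, an 8 Å distance cutoff, and a fixed one-to-one residue correspondence between the two decoys, whereas CMO in general searches over alignments. The function names are hypothetical.

```python
import numpy as np

def contact_map(coords, cutoff=8.0):
    """Boolean matrix marking residue pairs closer than `cutoff` (self-pairs excluded)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contacts = dist < cutoff
    np.fill_diagonal(contacts, False)
    return contacts

def shared_contacts(coords_a, coords_b, cutoff=8.0):
    """Number of residue pairs in contact in both structures.

    Assumption: residue i in A corresponds to residue i in B.
    """
    both = np.logical_and(contact_map(coords_a, cutoff), contact_map(coords_b, cutoff))
    return int(both.sum()) // 2  # the matrix is symmetric, so each pair is counted twice

# Hypothetical three-residue decoys, coordinates in angstroms.
decoy_a = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
decoy_b = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
```

Since decoys of the same target share a sequence, the identity correspondence assumed here is natural for decoy-versus-decoy comparison.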
Affiliation(s)
- Xu Han
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China.
| | - Li Li
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China.
| | - Yonggang Lu
- School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China.
| |
|
22
|
Experimental accuracy in protein structure refinement via molecular dynamics simulations. Proc Natl Acad Sci U S A 2018; 115:13276-13281. [PMID: 30530696 DOI: 10.1073/pnas.1811364115] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 12/22/2022] Open
Abstract
Refinement is the last step in protein structure prediction pipelines to convert approximate homology models to experimental accuracy. Protocols based on molecular dynamics (MD) simulations have shown promise, but current methods are limited to moderate levels of consistent refinement. To explore the energy landscape between homology models and native structures and analyze the challenges of MD-based refinement, eight test cases were studied via extensive simulations followed by Markov state modeling. In all cases, native states were found very close to the experimental structures and at the lowest free energies, but refinement was hindered by a rough energy landscape. Transitions from the homology model to the native states require the crossing of significant kinetic barriers on at least microsecond time scales. A significant energetic driving force toward the native state was lacking until its immediate vicinity, and there was significant sampling of off-pathway states competing for productive refinement. The role of recent force field improvements is discussed and transition paths are analyzed in detail to inform which key transitions have to be overcome to achieve successful refinement.
|
23
|
Robertson JC, Perez A, Dill KA. MELD × MD Folds Nonthreadables, Giving Native Structures and Populations. J Chem Theory Comput 2018; 14:6734-6740. [PMID: 30407805 DOI: 10.1021/acs.jctc.8b00886] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
A current challenge is to compute the native structures of proteins from their amino acid sequences. A main approach of bioinformatics is threading, in which a protein to be predicted is computationally threaded onto protein fragments of similar sequence having an already known structure. However, ∼15% of proteins cannot be folded in this way; this has been called the glass ceiling, and the proteins are called nonthreadables. For these, physical molecular dynamics (MD) modeling is promising because it does not require templates. We find that MD, when used with an accelerator called MELD, can fold many nonthreadables. For 41 nonthreadable proteins with fewer than 125 residues, MELD-accelerated MD (MELD × MD) folds 20 of them to better than 4 Å error. In 10 cases, MELD × MD succeeds even when the force field does not properly encode the native state. In 11 cases, MELD × MD foretells its own success; seeing large Boltzmann populations in the simulations predicts it has converged to the correct native state. MELD × MD acceleration can be applied to a broad physical protein modeling range.
Affiliation(s)
- James C Robertson
- Laufer Center for Physical and Quantitative Biology , Stony Brook University , Stony Brook , New York 11794 , United States
| | - Alberto Perez
- Laufer Center for Physical and Quantitative Biology , Stony Brook University , Stony Brook , New York 11794 , United States
| | - Ken A Dill
- Laufer Center for Physical and Quantitative Biology , Stony Brook University , Stony Brook , New York 11794 , United States.,Department of Chemistry , Stony Brook University , Stony Brook , New York 11794 , United States.,Department of Physics and Astronomy , Stony Brook University , Stony Brook , New York 11794 , United States
| |
|
24
|
Chen M, Lin X, Lu W, Schafer NP, Onuchic JN, Wolynes PG. Template-Guided Protein Structure Prediction and Refinement Using Optimized Folding Landscape Force Fields. J Chem Theory Comput 2018; 14:6102-6116. [PMID: 30240202 DOI: 10.1021/acs.jctc.8b00683] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 02/01/2023]
Abstract
When good structural templates can be identified, template-based modeling is the most reliable way to predict the tertiary structure of proteins. In this study, we combine template-based modeling with a realistic coarse-grained force field, AWSEM, that has been optimized using the principles of energy landscape theory. The Associative memory, Water mediated, Structure and Energy Model (AWSEM) is a coarse-grained force field having both transferable tertiary interactions and knowledge-based local-in-sequence interaction terms. We incorporate template information into AWSEM by introducing soft collective biases to the template structures, resulting in a model that we call AWSEM-Template. Structure prediction tests on eight targets, four of which are in the low sequence identity "twilight zone" of homology modeling, show that AWSEM-Template can achieve high-resolution structure prediction. Our results also confirm that using a combination of AWSEM and a template-guided potential leads to more accurate prediction of protein structures than simply using a template-guided potential alone. Free energy profile analyses demonstrate that the soft collective biases to the template effectively increase funneling toward native-like structures while still allowing significant flexibility so as to allow for correction of discrepancies between the target structure and the template. A further stage of refinement using all-atom molecular dynamics augmented with soft collective biases to the structures predicted by AWSEM-Template leads to a further improvement of both backbone and side-chain accuracy by maintaining sufficient flexibility but at the same time discouraging unproductive unfolding events often seen in unrestrained all-atom refinement simulations. The all-atom refinement simulations also reduce patches of frustration of the initial predictions. 
Some of the backbones found among the structures produced during the initial coarse-grained prediction step already have CE-RMSD values of less than 3 Å with 90% or more of the residues aligned to the experimentally solved structure for all targets. All-atom structures generated during the following all-atom refinement simulations, which started from coarse-grained structures that were chosen without reference to any knowledge about the native structure, have CE-RMSD values of less than 2.5 Å with 90% or more of the residues aligned for 6 out of 8 targets. Clustering low energy structures generated during the initial coarse-grained annealing picks out reliably structures that are within 1 Å of the best sampled structures in 5 out of 8 cases. After the all-atom refinement, structures that are within 1 Å of the best sampled structures can be selected using a simple algorithm based on energetic features alone in 7 out of 8 cases.
Affiliation(s)
- Mingchen Chen
- Center for Theoretical Biological Physics, Rice University , Houston , Texas 77030 , United States.,Department of Bioengineering , Rice University , Houston , Texas 77005 , United States
| | - Xingcheng Lin
- Center for Theoretical Biological Physics, Rice University , Houston , Texas 77030 , United States.,Department of Physics and Astronomy , Rice University , Houston , Texas 77005 , United States
| | - Wei Lu
- Center for Theoretical Biological Physics, Rice University , Houston , Texas 77030 , United States.,Department of Physics and Astronomy , Rice University , Houston , Texas 77005 , United States
| | - Nicholas P Schafer
- Center for Theoretical Biological Physics, Rice University , Houston , Texas 77030 , United States.,Department of Chemistry , Rice University , Houston , Texas 77005 , United States
| | - José N Onuchic
- Center for Theoretical Biological Physics, Rice University , Houston , Texas 77030 , United States.,Department of Physics and Astronomy , Rice University , Houston , Texas 77005 , United States.,Department of Chemistry , Rice University , Houston , Texas 77005 , United States.,Department of Biosciences , Rice University , Houston , Texas 77005 , United States
| | - Peter G Wolynes
- Center for Theoretical Biological Physics, Rice University , Houston , Texas 77030 , United States.,Department of Chemistry , Rice University , Houston , Texas 77005 , United States.,Department of Biosciences , Rice University , Houston , Texas 77005 , United States
| |
|
25
|
Ma T, Zang T, Wang Q, Ma J. Refining protein structures using enhanced sampling techniques with restraints derived from an ensemble-based model. Protein Sci 2018; 27:1842-1849. [PMID: 30098055 DOI: 10.1002/pro.3486] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/30/2018] [Revised: 07/05/2018] [Accepted: 07/18/2018] [Indexed: 12/12/2022]
Abstract
This paper reports a method for high-accuracy protein structural refinement, which is a direct extension of the method in our recent publication (Zang, J Chem Phys 2018; 149:072319). It combines a parallel continuous simulated tempering (PCST) method with a temperature-dependent restraint and a blind model selection scheme. In this work, the single-reference-based restraint of the previous work was changed to an ensemble-based model (EBM), in which the non-bonded Lennard-Jones term for each contacting atomic pair in the previous restraining potential was replaced by a multi-Gaussian function whose parameters are derived from an ensemble of structures such as the ones from various CASP participating groups. The purpose of EBM is to take advantage of partial "correctness" distributed among members of the structural ensemble. In total, 18 targets from the refinement category of CASP10, CASP11 and CASP12 were refined. In the Top-1 group, 11 out of 18 targets had better models (greater GDT_TS scores) than the CASPR participants. In the Top-5 group, nine out of 18 were better. Our results show that the PCST-EBM method can considerably improve low-accuracy structures.
Affiliation(s)
- Tianqi Ma
- Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas, 77005
| | - Tianwu Zang
- Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas, 77005
| | - Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, 77030
| | - Jianpeng Ma
- Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas, 77005.,Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, 77030
| |
|
26
|
Zang T, Ma T, Wang Q, Ma J. Improving low-accuracy protein structures using enhanced sampling techniques. J Chem Phys 2018; 149:072319. [PMID: 30134714 PMCID: PMC5995690 DOI: 10.1063/1.5027243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/28/2018] [Accepted: 05/23/2018] [Indexed: 11/14/2022] Open
Abstract
In this paper, we report results of using enhanced sampling and blind selection techniques for high-accuracy protein structural refinement. By combining a parallel continuous simulated tempering (PCST) method, previously developed by Zang et al. [J. Chem. Phys. 141, 044113 (2014)], and the structure based model (SBM) as restraints, we refined 23 targets (18 from the refinement category of CASP10 and 5 from that of CASP12). We also designed a novel model selection method to blindly select high-quality models from very long simulation trajectories. The combined use of PCST-SBM with the blind selection method yielded final models that are better than initial models. For the Top-1 group, 7 out of 23 targets had better models (greater global distance test total scores) than the critical assessment of structure prediction participants. For the Top-5 group, 10 out of 23 were better. Our results justify the crucial position of enhanced sampling in protein structure prediction and refinement and demonstrate that a considerable improvement of low-accuracy structures is achievable with current force fields.
Affiliation(s)
- Tianwu Zang
- Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005, USA
| | - Tianqi Ma
- Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005, USA
| | - Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, USA
| | - Jianpeng Ma
- Author to whom correspondence should be addressed. Telephone: 713-798-8187. Fax: 713-796-9438
| |
|
27
|
Terashi G, Kihara D. De novo main-chain modeling with MAINMAST in 2015/2016 EM Model Challenge. J Struct Biol 2018; 204:351-359. [PMID: 30075190 PMCID: PMC6179447 DOI: 10.1016/j.jsb.2018.07.013] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/27/2018] [Revised: 07/13/2018] [Accepted: 07/19/2018] [Indexed: 11/15/2022]
Abstract
Protein tertiary structure modeling is a critical step for the interpretation of three-dimensional (3D) electron microscopy density. Our group participated in the 2015/2016 EM Model Challenge using the MAINMAST software for de novo main-chain modeling. The software generates local dense points using the mean shift algorithm, and connects them into Cα models by calculating the minimum spanning tree and the longest path. Subsequently, full-atom structure models are generated, which are subject to structural refinement. Here, we summarize the qualities of our submitted models and examine successful and unsuccessful models, including 3D models we did not submit to the Challenge. Our protocol using the MAINMAST software was sometimes able to build correct conformations with 3.4–5.1 Å RMSD. Unsuccessful models had failures in chain tracing; however, their Cα positions and some local structures were built quite correctly. To evaluate the quality of the models, the MAINMAST software provides a confidence score for each Cα position from the consensus of the top 100 scoring models.
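The two graph steps named in this abstract, a minimum spanning tree over points followed by the longest path in that tree, can be sketched as below. This is a generic illustration under stated assumptions and not MAINMAST's code: the input points stand in for local dense points, edge weights are plain Euclidean distances, and the function names are hypothetical.

```python
import numpy as np

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns an adjacency list."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    best = dist[0].copy()            # cheapest known link from each node to the tree
    parent = np.zeros(n, dtype=int)  # tree node providing that cheapest link
    adj = {i: [] for i in range(n)}
    for _ in range(n - 1):
        j = int(np.argmin(np.where(visited, np.inf, best)))
        p = int(parent[j])
        adj[j].append((p, float(dist[j, p])))
        adj[p].append((j, float(dist[j, p])))
        visited[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return adj

def farthest(adj, start):
    """Depth-first walk returning the node farthest from `start` and the path to it."""
    prev = {start: None}
    best_node, best_len = start, 0.0
    stack = [(start, 0.0)]
    while stack:
        node, length = stack.pop()
        if length > best_len:
            best_node, best_len = node, length
        for nxt, weight in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                stack.append((nxt, length + weight))
    path, node = [], best_node
    while node is not None:
        path.append(node)
        node = prev[node]
    return best_node, path[::-1]

def longest_path(adj):
    """Longest path in a tree: farthest node from any start, then farthest from that node."""
    end, _ = farthest(adj, 0)
    _, path = farthest(adj, end)
    return path

# Hypothetical points: a straight chain 0-1-2-3 with a short side branch at node 4.
pts = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [1, 0.5, 0]], dtype=float)
trace = longest_path(minimum_spanning_tree(pts))
```

In the toy input the side branch is left out of the trace, which mirrors the role of the longest path as a main-chain candidate.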
Affiliation(s)
- Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
| | - Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA; Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA.
| |
|
28
|
Large-scale computational drug repositioning to find treatments for rare diseases. NPJ Syst Biol Appl 2018; 4:13. [PMID: 29560273 PMCID: PMC5847522 DOI: 10.1038/s41540-018-0050-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 11/02/2017] [Revised: 01/22/2018] [Accepted: 02/03/2018] [Indexed: 11/08/2022] Open
Abstract
Rare, or orphan, diseases are conditions afflicting a small subset of people in a population. Although these disorders collectively pose significant health care problems, drug companies require government incentives to develop drugs for rare diseases due to extremely limited individual markets. Computer-aided drug repositioning, i.e., finding new indications for existing drugs, is a cheaper and faster alternative to traditional drug discovery, offering a promising avenue for orphan drug research. Structure-based matching of drug-binding pockets is among the most promising computational techniques to inform drug repositioning. In order to find new targets for known drugs, ultimately leading to drug repositioning, we recently developed eMatchSite, a new computer program to compare drug-binding sites. In this study, eMatchSite is combined with virtual screening to systematically explore opportunities to reposition known drugs to proteins associated with rare diseases. The effectiveness of this integrated approach is demonstrated for a kinase inhibitor, which is a confirmed candidate for repositioning to synapsin Ia. The resulting dataset comprises 31,142 putative drug-target complexes linked to 980 orphan diseases. The modeling accuracy is evaluated against the structural data recently released for tyrosine-protein kinase HCK. To illustrate how potential therapeutics for rare diseases can be identified, we discuss the possibility of repurposing a steroidal aromatase inhibitor to treat Niemann-Pick disease type C. Overall, the exhaustive exploration of the drug repositioning space exposes new opportunities to combat orphan diseases with existing drugs. DrugBank/Orphanet repositioning data are freely available to the research community at https://osf.io/qdjup/.
|
29
|
Brylinski M, Naderi M, Govindaraj RG, Lemoine J. eRepo-ORP: Exploring the Opportunity Space to Combat Orphan Diseases with Existing Drugs. J Mol Biol 2017; 430:2266-2273. [PMID: 29237557 DOI: 10.1016/j.jmb.2017.12.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/01/2017] [Revised: 11/15/2017] [Accepted: 12/05/2017] [Indexed: 01/29/2023]
Abstract
About 7000 rare, or orphan, diseases affect more than 350 million people worldwide. Although these conditions collectively pose significant health care problems, drug companies seldom develop drugs for orphan diseases due to extremely limited individual markets. Consequently, developing new treatments for often life-threatening orphan diseases is primarily contingent on financial incentives from governments, special research grants, and private philanthropy. Computer-aided drug repositioning is a cheaper and faster alternative to traditional drug discovery, offering a promising avenue for orphan drug research. Here, we present eRepo-ORP, a comprehensive resource constructed by a large-scale repositioning of existing drugs to orphan diseases with a collection of structural bioinformatics tools, including eThread, eFindSite, and eMatchSite. Specifically, a systematic exploration of 320,856 possible links between known drugs in DrugBank and orphan proteins obtained from Orphanet reveals as many as 18,145 candidates for repurposing. In order to illustrate how potential therapeutics for rare diseases can be identified with eRepo-ORP, we discuss the repositioning of a kinase inhibitor for Ras-associated autoimmune leukoproliferative disease. The eRepo-ORP data set is available through the Open Science Framework at https://osf.io/qdjup/.
Affiliation(s)
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA; Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, USA.
| | - Misagh Naderi
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | | | - Jeffrey Lemoine
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA; Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
| |
|
30
|
Buchan DWA, Jones DT. EigenTHREADER: analogous protein fold recognition by efficient contact map threading. Bioinformatics 2017; 33:2684-2690. [PMID: 28419258 PMCID: PMC5860056 DOI: 10.1093/bioinformatics/btx217] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 10/18/2016] [Revised: 01/18/2017] [Accepted: 04/12/2017] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Protein fold recognition is often trivial when appropriate, evolutionarily related structural templates can be identified, and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of intra-residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve the principal eigenvectors; these can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al-Eigen approach of Di Lena et al. (2010), but with improvements made to both speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available. RESULTS: EigenTHREADER outperforms well-established fold recognition methods such as pGenTHREADER and HHSearch in terms of True Positive Rate in the difficult task of analogous fold recognition. This should allow template-based modelling to be extended to many new protein families that were previously intractable to homology-based fold recognition methods. AVAILABILITY AND IMPLEMENTATION: All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/. CONTACT: d.t.jones@ucl.ac.uk.
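The comparison idea in this abstract (reduce each contact map to a principal eigenvector, then align the eigenvectors by dynamic programming) can be sketched roughly as follows. This is not EigenTHREADER's code: it uses only the single leading eigenvector, obtained by power iteration, and the match score (0.5 minus the absolute difference) and gap penalty are arbitrary illustrative assumptions. Self-contacts are placed on the diagonal so that power iteration converges on these small toy maps.

```python
import math

def contact_map(n, pairs):
    """Symmetric 0/1 contact map with self-contacts on the diagonal."""
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in pairs:
        m[i][j] = m[j][i] = 1.0
    return m

def principal_eigenvector(matrix, iterations=200):
    """Leading eigenvector of a symmetric non-negative matrix via power iteration."""
    n = len(matrix)
    vec = [1.0] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def alignment_score(a, b, gap=-0.2):
    """Needleman-Wunsch global alignment score of two eigenvector component lists."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            match = dp[i - 1][j - 1] + (0.5 - abs(a[i - 1] - b[j - 1]))
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]

# Two hypothetical 6-residue maps: an open chain and a closed ring.
chain = contact_map(6, [(i, i + 1) for i in range(5)])
ring = contact_map(6, [(i, i + 1) for i in range(5)] + [(0, 5)])
v_chain = principal_eigenvector(chain)
v_ring = principal_eigenvector(ring)

self_score = alignment_score(v_chain, v_chain)
cross_score = alignment_score(v_chain, v_ring)
print(self_score > cross_score)  # a map matches itself better than a different map
```

Aligning one-dimensional eigenvector profiles instead of full two-dimensional maps is what lets a standard sequence-alignment style recurrence be reused for the library search.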
Affiliation(s)
- Daniel W A Buchan
- Department of Computer Science, University College London, Gower Street, London, UK
| | - David T Jones
- Department of Computer Science, University College London, Gower Street, London, UK
| |
|
31
|
Faraggi E, Dunker AK, Sussman JL, Kloczkowski A. Comparing NMR and X-ray protein structure: Lindemann-like parameters and NMR disorder. J Biomol Struct Dyn 2017; 36:2331-2341. [PMID: 28714803 DOI: 10.1080/07391102.2017.1352539] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/22/2023]
Abstract
Disordered protein chains and segments are fast becoming a major pathway for our understanding of biological function, especially in more evolved species. However, the standard definition of disordered residues, the inability to constrain them in X-ray derived structures, is not easily applied to NMR derived structures. We carry out a statistical comparison between proteins whose structure was resolved using NMR and using X-ray protocols. We start by establishing a connection between these two protocols for obtaining protein structure. We find a close statistical correspondence between NMR and X-ray structures if fluctuations inherent to the NMR protocol are taken into account. Intuitively, this tends to lend support to the validity of both NMR and X-ray protocols in deriving biomolecular models that correspond to in vivo conditions. We then establish Lindemann-like parameters for NMR derived structures and examine what order/disorder cutoffs for these parameters are most consistent with X-ray data and how consistent they are. Finally, we find a critical value of [Formula: see text] for the best correspondence between X-ray and NMR derived order/disorder assignment, judged by maximizing the Matthews correlation, and a critical value [Formula: see text] if a balance between false positive and false negative prediction is sought. We examine a few non-conforming cases, and examine the origin of the structure derived in X-ray. This study could help in assigning meaningful disorder from NMR experiments.
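Choosing a cutoff by maximizing the Matthews correlation, as this abstract describes, can be sketched as below. The Matthews correlation coefficient formula is standard; the per-residue values, the labels, and the candidate cutoffs are made-up toy data, and nothing here reproduces the paper's actual parameters or critical values.

```python
import math

def matthews_correlation(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_cutoff(values, labels, cutoffs):
    """Return (cutoff, mcc) maximizing MCC when value > cutoff predicts disorder."""
    best = None
    for c in cutoffs:
        tp = sum(1 for v, d in zip(values, labels) if v > c and d)
        fp = sum(1 for v, d in zip(values, labels) if v > c and not d)
        fn = sum(1 for v, d in zip(values, labels) if v <= c and d)
        tn = sum(1 for v, d in zip(values, labels) if v <= c and not d)
        score = matthews_correlation(tp, tn, fp, fn)
        if best is None or score > best[1]:
            best = (c, score)
    return best

# Hypothetical per-residue fluctuation parameters and reference disorder labels.
values = [0.1, 0.2, 0.3, 0.8, 0.9, 1.2]
labels = [False, False, False, True, True, True]
result = best_cutoff(values, labels, cutoffs=[0.15, 0.5, 1.0])
print(result)
```

A cutoff that instead balances false positives against false negatives, the second criterion mentioned in the abstract, would replace the MCC objective with the absolute difference between `fp` and `fn`.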
Affiliation(s)
- Eshel Faraggi
- a Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, 46202 IN, USA; b Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, 43205 OH, USA; c Research and Information Systems, LLC, Carmel, 46032 IN, USA
| | - A Keith Dunker
- a Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, 46202 IN, USA; d Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, 46202 IN, USA
| | - Joel L Sussman
- e Department of Structural Biology, Weizmann Institute of Science, Rehovot, 76100 Israel
| | - Andrzej Kloczkowski
- f Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, 43215 OH, USA; g Department of Pediatrics, The Ohio State University, Columbus, 43215 OH, USA; h Kavli Institute for Theoretical Physics China, Chinese Academy of Sciences, Beijing, 100190 China
| |
|
32
|
Maheshwari S, Brylinski M. Across-proteome modeling of dimer structures for the bottom-up assembly of protein-protein interaction networks. BMC Bioinformatics 2017; 18:257. [PMID: 28499419 PMCID: PMC5427563 DOI: 10.1186/s12859-017-1675-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/11/2017] [Accepted: 05/03/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Deciphering complete networks of interactions between proteins is key to comprehending cellular regulatory mechanisms. A significant effort has been devoted to expanding the coverage of the proteome-wide interaction space at the molecular level. Although a growing body of research shows that protein docking can, in principle, be used to predict biologically relevant interactions, the accuracy of the across-proteome identification of interacting partners and the selection of near-native complex structures still need to be improved. RESULTS: In this study, we developed a new method to discover and model protein interactions employing an exhaustive all-to-all docking strategy. This approach integrates molecular modeling, structural bioinformatics, machine learning, and functional annotation filters in order to provide interaction data for the bottom-up assembly of protein interaction networks. Encouragingly, the success rates for dimer modeling are 57.5% and 48.7% when experimental and computer-generated monomer structures are employed, respectively. Further, our protocol correctly identifies 81% of protein-protein interactions at the expense of only a 19% false positive rate. As a proof of concept, 61,913 protein-protein interactions were confidently predicted and modeled for the proteome of E. coli. Finally, we validated our method against the human immune disease pathway. CONCLUSIONS: Protein docking supported by evolutionary restraints and machine learning can be used to reliably identify and model biologically relevant protein assemblies at the proteome scale. Moreover, the accuracy of the identification of protein-protein interactions is improved by considering only those protein pairs co-localized in the same cellular compartment and involved in the same biological process. The modeling protocol described in this communication can be applied to detect protein-protein interactions in other organisms and pathways as well as to construct dimer structures and estimate the confidence of protein interactions experimentally identified with high-throughput techniques.
Affiliation(s)
- Surabhi Maheshwari
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| | - Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA; Center for Computation & Technology, Louisiana State University, Baton Rouge, LA, USA.
| |
|
33
|
Feig M. Computational protein structure refinement: Almost there, yet still so far to go. Wiley Interdiscip Rev Comput Mol Sci 2017; 7:e1307. [PMID: 30613211 PMCID: PMC6319934 DOI: 10.1002/wcms.1307] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Indexed: 12/21/2022]
Abstract
Protein structures are essential in modern biology, yet experimental methods are far from being able to catch up with the rapid increase in available genomic data. Computational protein structure prediction methods aim to fill the gap, while the role of protein structure refinement is to take approximate initial template-based models and bring them closer to the true native structure. Current methods for computational structure refinement rely on molecular dynamics simulations, related sampling methods, or iterative structure optimization protocols. The best methods are able to achieve moderate degrees of refinement, but consistent refinement that can reach near-experimental accuracy remains elusive. Key issues revolve around the accuracy of the energy function, the inability to reliably rank multiple models, and the use of restraints that keep sampling close to the native state but also limit the degree of possible refinement. A different aspect is the question of what exactly the target of high-resolution refinement should be, as experimental structures are affected by experimental conditions and different biological questions require varying levels of accuracy. While improvement of the global protein structure is a difficult problem, high-resolution refinement methods that improve local structural quality, such as favorable stereochemistry and the avoidance of atomic clashes, are much more successful.
Affiliation(s)
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, 603 Wilson Rd., Room 218 BCH, East Lansing, MI, USA; 517-432-7439
| |
|
34
|
Critical Features of Fragment Libraries for Protein Structure Prediction. PLoS One 2017; 12:e0170131. [PMID: 28085928 PMCID: PMC5235372 DOI: 10.1371/journal.pone.0170131] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 08/29/2016] [Accepted: 12/29/2016] [Indexed: 11/19/2022] Open
Abstract
The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, the effect of different fragment sizes, and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can serve as critical guidelines for the use of fragment libraries in protein structure prediction.
|
35
|
Mozolewska MA, Krupa P, Zaborowski B, Liwo A, Lee J, Joo K, Czaplewski C. Use of Restraints from Consensus Fragments of Multiple Server Models To Enhance Protein-Structure Prediction Capability of the UNRES Force Field. J Chem Inf Model 2016; 56:2263-2279. [DOI: 10.1021/acs.jcim.6b00189] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022]
Affiliation(s)
| | - Paweł Krupa
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
| | | | - Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Center for In Silico Protein Structure and School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Jooyoung Lee
- Center for In Silico Protein Structure and School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
| |
|
36
|
Basu S, Wallner B. DockQ: A Quality Measure for Protein-Protein Docking Models. PLoS One 2016; 11:e0161879. [PMID: 27560519 PMCID: PMC4999177 DOI: 10.1371/journal.pone.0161879] [Citation(s) in RCA: 144] [Impact Index Per Article: 18.0] [Received: 02/23/2016] [Accepted: 08/12/2016] [Indexed: 01/26/2023] Open
Abstract
The state of the art in assessing the structural quality of docking models is currently based on three related yet independent quality measures: Fnat, LRMS, and iRMS, as proposed and standardized by CAPRI. These quality measures quantify different aspects of the quality of a particular docking model and need to be viewed together to reveal the true quality, e.g. a model with relatively poor LRMS (>10 Å) might still qualify as 'acceptable' with a decent Fnat (>0.50) and iRMS (<3.0 Å). This is also the reason why the so-called CAPRI criteria for assessing the quality of docking models are defined by applying various ad-hoc cutoffs on these measures to classify a docking model into four classes: Incorrect, Acceptable, Medium, or High quality. This classification has been useful in CAPRI, but since models are grouped in only four bins it is also rather limiting, making it difficult to rank models, correlate with scoring functions, or use it as a target function in machine learning algorithms. Here, we present DockQ, a continuous protein-protein docking model quality measure derived by combining Fnat, LRMS, and iRMS into a single score in the range [0, 1] that can be used to assess the quality of protein docking models. By using DockQ on CAPRI models it is possible to almost completely reproduce the original CAPRI classification into Incorrect, Acceptable, Medium, and High quality. An average PPV of 94% at 90% recall demonstrates that there is no need to apply predefined ad-hoc cutoffs to classify docking models. Since DockQ recapitulates the CAPRI classification almost perfectly, it can be viewed as a higher-resolution version of the CAPRI classification, making it possible to estimate model quality in a more quantitative way using Z-scores or the sum of top-ranked models, which has been so valuable for the CASP community. The possibility to directly correlate a quality measure to a scoring function has been crucial for the development of scoring functions for protein structure prediction, and DockQ should be useful in a similar development in the protein docking field. DockQ is available at http://github.com/bjornwallner/DockQ/.
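How three measures can be folded into one score in [0, 1] is easier to see with a small sketch. The abstract does not state the formula; the form below (Fnat averaged with two inverse-square scaled RMS terms) and the scaling constants 1.5 Å for iRMS and 8.5 Å for LRMS follow the published DockQ definition as best recalled here, so treat them as assumptions and consult the DockQ repository for the reference implementation. Computing Fnat, iRMS, and LRMS from coordinates is not shown.

```python
def scaled_rms(rms, d):
    """Map an RMS deviation in Å onto (0, 1]; equals 1 at 0 Å and 0.5 at rms == d."""
    return 1.0 / (1.0 + (rms / d) ** 2)

def dockq(fnat, irms, lrms, d_irms=1.5, d_lrms=8.5):
    """Average of Fnat and the two scaled RMS terms (scaling constants assumed)."""
    return (fnat + scaled_rms(irms, d_irms) + scaled_rms(lrms, d_lrms)) / 3.0

# A perfect model, and the borderline case mentioned in the abstract
# (Fnat 0.50, iRMS 3.0 Å, LRMS 10 Å).
perfect = dockq(fnat=1.0, irms=0.0, lrms=0.0)
borderline = dockq(fnat=0.5, irms=3.0, lrms=10.0)
print(perfect, round(borderline, 3))
```

Because every term is bounded between 0 and 1, the average is too, which is what makes the score usable directly as a regression target or for ranking.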
Affiliation(s)
- Sankar Basu
- Bioinformatics Division, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
| | - Björn Wallner
- Bioinformatics Division, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden
- Swedish e-Science Research Center, Linköping University, Linköping, Sweden
| |
|
37
|
Abstract
A method for the local refinement of protein structures that targets improvements in local stereochemistry while preserving the overall fold is presented. The method uses force field-based minimization and sampling via molecular dynamics simulations with a modified force field to bring bonds, angles, and torsion angles into an acceptable range for high-resolution protein structures. The method is implemented in the locPREFMD web server and was tested on computational models submitted to CASP11. Using MolProbity scores as the main assessment criterion, the locPREFMD method significantly improves the stereochemical quality of given input models close to the quality expected for experimental structures while maintaining the Cα coordinates of the initial model.
Affiliation(s)
- Michael Feig
- Department of Biochemistry and Molecular Biology and Department of Chemistry, Michigan State University, 603 Wilson Road, Room BCH 218, East Lansing, Michigan 48824, United States
| |
|
38
|
Krupa P, Mozolewska MA, Wiśniewska M, Yin Y, He Y, Sieradzan AK, Ganzynkowicz R, Lipska AG, Karczyńska A, Ślusarz M, Ślusarz R, Giełdoń A, Czaplewski C, Jagieła D, Zaborowski B, Scheraga HA, Liwo A. Performance of protein-structure predictions with the physics-based UNRES force field in CASP11. Bioinformatics 2016; 32:3270-3278. [PMID: 27378298 DOI: 10.1093/bioinformatics/btw404] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 02/22/2016] [Accepted: 06/20/2016] [Indexed: 12/20/2022] Open
Abstract
Participating as the Cornell-Gdansk group, we have used our physics-based coarse-grained UNited RESidue (UNRES) force field to predict protein structure in the 11th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP11). Our methodology involved extensive multiplexed replica exchange simulations of the target proteins with a recently improved UNRES force field to provide better reproductions of the local structures of polypeptide chains. All simulations were started from fully extended polypeptide chains, and no external information was included in the simulation process except for weak restraints on secondary structure to enable us to finish each prediction within the allowed 3-week time window. Because of the simplified UNRES representation of polypeptide chains, the use of enhanced sampling methods, code optimization and parallelization, and sufficient computational resources, we were able to treat, for the first time, all 55 human prediction targets with sizes from 44 to 595 amino acid residues, the average size being 251 residues. Complete structures of six single-domain proteins were predicted accurately, with the highest accuracy being attained for T0769, for which the Cα RMSD was 3.8 Å for 97 residues of the experimental structure. Correct structures were also predicted for 13 domains of multi-domain proteins with accuracy comparable to that of the best template-based modeling methods. With further improvements of the UNRES force field that are now underway, our physics-based coarse-grained approach to protein-structure prediction will eventually reach global prediction capacity and, consequently, reliability in simulating protein structure and dynamics that are important in biochemical processes. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://www.unres.pl/. CONTACT: has5@cornell.edu.
Affiliation(s)
- Paweł Krupa
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland; Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
| | - Magdalena A Mozolewska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland; Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
| | - Marta Wiśniewska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland; Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
| | - Yanping Yin
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
| | - Yi He
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
| | - Adam K Sieradzan
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland; Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
| | - Robert Ganzynkowicz
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Agnieszka G Lipska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland; Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
| | - Agnieszka Karczyńska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Magdalena Ślusarz
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Rafał Ślusarz
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Artur Giełdoń
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Dawid Jagieła
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | | | - Harold A Scheraga
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
| | - Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| |
|
39
|
Li W, Schaeffer RD, Otwinowski Z, Grishin NV. Estimation of Uncertainties in the Global Distance Test (GDT_TS) for CASP Models. PLoS One 2016; 11:e0154786. [PMID: 27149620 PMCID: PMC4858170 DOI: 10.1371/journal.pone.0154786] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/04/2016] [Accepted: 04/19/2016] [Indexed: 11/19/2022] Open
Abstract
The Critical Assessment of techniques for protein Structure Prediction (or CASP) is a community-wide blind test experiment to reveal the best accomplishments of structure modeling. Assessors have been using the Global Distance Test (GDT_TS) measure to quantify prediction performance since CASP3 in 1998. However, identifying significant score differences between close models is difficult because of the lack of uncertainty estimations for this measure. Here, we utilized the atomic fluctuations caused by structure flexibility to estimate the uncertainty of GDT_TS scores. Structures determined by nuclear magnetic resonance are deposited as ensembles of alternative conformers that reflect the structural flexibility, whereas standard X-ray refinement produces the static structure averaged over time and space for the dynamic ensembles. To recapitulate the structurally heterogeneous ensemble in the crystal lattice, we performed time-averaged refinement for X-ray datasets to generate structural ensembles for our GDT_TS uncertainty analysis. Using those generated ensembles, our study demonstrates that the time-averaged refinements produced structure ensembles with better agreement with the experimental datasets than the averaged X-ray structures with B-factors. The uncertainty of the GDT_TS scores, quantified by their standard deviations (SDs), increases for scores lower than 50 and 70, with maximum SDs of 0.3 and 1.23 for X-ray and NMR structures, respectively. We also applied our procedure to the high-accuracy version of the GDT-based score and produced similar results with slightly higher SDs. To facilitate score comparisons by the community, we developed a user-friendly web server that produces structure ensembles for NMR and X-ray structures and is accessible at http://prodata.swmed.edu/SEnCS. Our work helps to identify the significance of GDT_TS score differences, as well as to provide structure ensembles for estimating SDs of any scores.
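The quantity whose uncertainty is estimated here can be illustrated with a simplified sketch: score a model against each conformer of an ensemble and report the mean and standard deviation. The simplification is significant and is an assumption of this sketch: a real GDT_TS searches over superpositions to maximize the number of residues under each cutoff, whereas this version takes per-residue Cα distances from one fixed superposition as given. The distances are toy values, and the function names are hypothetical.

```python
import statistics

def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of residues within 1, 2, 4 and 8 Å (fixed superposition)."""
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

def gdt_ts_spread(ensemble):
    """Mean and sample standard deviation of the score across ensemble conformers."""
    scores = [gdt_ts(distances) for distances in ensemble]
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical Cα distances (Å) of one model to two conformers of a reference ensemble.
ensemble = [
    [0.5, 1.5, 3.0, 9.0],
    [0.5, 0.5, 0.5, 0.5],
]
single = gdt_ts(ensemble[0])
mean_score, sd_score = gdt_ts_spread(ensemble)
print(single, mean_score, round(sd_score, 2))
```

A difference between two models' scores that is smaller than the ensemble standard deviation would then not be treated as significant, which is the use case the abstract describes.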
Affiliation(s)
- Wenlin Li
- Department of Biochemistry and Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390–9050, United States of America
| | - R. Dustin Schaeffer
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390–9050, United States of America
| | - Zbyszek Otwinowski
- Department of Biochemistry and Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390–9050, United States of America
| | - Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, 75390–9050, United States of America
- Department of Biochemistry and Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390–9050, United States of America
| |
|
40
|
Uziela K, Wallner B. ProQ2: estimation of model accuracy implemented in Rosetta. Bioinformatics 2016; 32:1411-3. [PMID: 26733453 PMCID: PMC4848402 DOI: 10.1093/bioinformatics/btv767] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 09/24/2015] [Accepted: 12/23/2015] [Indexed: 11/24/2022] Open
Abstract
Motivation: Model quality assessment programs are used to predict the quality of modeled protein structures. They can be divided into two groups depending on the information they use: ensemble methods, which use the consensus of many alternative models, and single-model methods, which use only one model to make their predictions. The consensus methods excel in achieving high correlations between prediction and true quality measures. However, they frequently fail to pick out the best possible model, and they cannot be used to generate and score new structures. Single-model methods, on the other hand, do not have these inherent shortcomings and can be used both to sample new structures and to improve existing consensus methods. Results: Here, we present an implementation of the ProQ2 program to estimate both local and global model accuracy as part of the Rosetta modeling suite. The current implementation not only makes it possible to run large batch runs locally, but also opens up a whole new arena for conformational sampling using machine-learned scoring functions and for incorporating model accuracy estimation into various existing modeling schemes. ProQ2 participated in CASP11, and results from CASP11 are used to benchmark the current implementation. Based on results from CASP11 and CAMEO-QE, a continuous benchmark of quality estimation methods, it is clear that ProQ2 is the single-model method that performs best in both local and global model accuracy. Availability and implementation: https://github.com/bjornwallner/ProQ_scripts Contact: bjornw@ifm.liu.se Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Karolis Uziela
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Björn Wallner
- Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, SE-581 83, Linköping, Sweden and Swedish e-Science Research Center, Linköping, Sweden
|
41
|
Heinze S, Putnam DK, Fischer AW, Kohlmann T, Weiner BE, Meiler J. CASP10-BCL::Fold efficiently samples topologies of large proteins. Proteins 2015; 83:547-63. [PMID: 25581562 DOI: 10.1002/prot.24733] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/13/2014] [Revised: 10/15/2014] [Accepted: 11/03/2014] [Indexed: 12/26/2022]
Abstract
During CASP10 in summer 2012, we tested BCL::Fold for prediction of free modeling (FM) and template-based modeling (TBM) targets. BCL::Fold assembles the tertiary structure of a protein from predicted secondary structure elements (SSEs), omitting more flexible loop regions early on. This approach enables the sampling of conformational space for larger proteins with more complex topologies. In preparation for CASP11, we analyzed the quality of CASP10 models throughout the prediction pipeline to understand BCL::Fold's ability to sample the native topology and to identify native-like models by scoring and/or clustering approaches, as well as our ability to add loop regions and side chains to initial SSE-only models. The standout observation is that BCL::Fold sampled topologies de novo with a GDT_TS score > 33% for 12 of 18 test cases and with a topology score > 0.8 for 11 of 18 test cases. Despite the sampling success of BCL::Fold, significant challenges still exist in the clustering and loop generation stages of the pipeline. The clustering approach employed for model selection often failed to identify the most native-like assembly of SSEs for further refinement and submission. It was also observed that for some β-strand proteins model refinement failed because β-strands were not properly aligned to form hydrogen bonds, which removed otherwise accurate models from the pool. Further, BCL::Fold frequently samples non-natural topologies that require loop regions to pass through the center of the protein.
Affiliation(s)
- Sten Heinze
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, 37240
|
42
|
Poleksic A. A polynomial time algorithm for computing the area under a GDT curve. Algorithms Mol Biol 2015; 10:27. [PMID: 26504491 PMCID: PMC4620747 DOI: 10.1186/s13015-015-0058-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2015] [Accepted: 10/09/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Progress in the field of protein three-dimensional structure prediction depends on the development of new and improved algorithms for measuring the quality of protein models. Perhaps the best descriptor of the quality of a protein model is the GDT function that maps each distance cutoff θ to the number of atoms in the protein model that can be fit under the distance θ from the corresponding atoms in the experimentally determined structure. It has long been known that the area under the graph of this function (GDT_A) can serve as a reliable, single numerical measure of the model quality. Unfortunately, while the well-known GDT_TS metric provides a crude approximation of GDT_A, no algorithm currently exists that is capable of computing accurate estimates of GDT_A. METHODS We prove that GDT_A is well defined and that it can be approximated by Riemann sums, using available methods for computing accurate (near-optimal) GDT function values. RESULTS In contrast to the GDT_TS metric, GDT_A is neither insensitive to large changes nor oversensitive to small changes in the model's coordinates. Moreover, the problem of computing GDT_A is tractable. More specifically, GDT_A can be computed in cubic asymptotic time in the size of the protein model. CONCLUSIONS This paper presents the first algorithm capable of computing near-optimal estimates of the area under the GDT function for a protein model. We believe that the techniques implemented in our algorithm will pave the way for the development of more practical and reliable procedures for estimating 3D model quality.
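The idea of approximating the area under the GDT function by Riemann sums can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a single fixed superposition (so the per-atom distances are simply given), whereas the paper relies on near-optimal GDT function values at each cutoff. The names `gdt_fraction` and `gdt_area`, the upper limit `theta_max`, the number of steps, and the normalization to the range [0, 1] are all assumptions made for the example.

```python
import numpy as np

def gdt_fraction(dist, theta):
    """Fraction of atoms whose distance to the reference is at most theta."""
    return float((dist <= theta).mean())

def gdt_area(dist, theta_max=10.0, n_steps=1000):
    """Right-endpoint Riemann sum of the GDT function on [0, theta_max].

    Assumption: `dist` holds per-atom distances under one fixed
    superposition. The result is divided by theta_max so that a perfect
    model scores close to 1.0.
    """
    dist = np.asarray(dist, dtype=float)
    width = theta_max / n_steps
    # Right endpoints of the n_steps equal-width subintervals.
    thetas = width * np.arange(1, n_steps + 1)
    values = np.array([gdt_fraction(dist, t) for t in thetas])
    return float(values.sum() * width / theta_max)
```

Unlike a four-cutoff average, this sum changes gradually as any atom's distance changes anywhere within the integration range, which is the behaviour the abstract attributes to GDT_A.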
|
43
|
Terashi G, Takeda-Shitaka M. CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area. PLoS One 2015; 10:e0141440. [PMID: 26502070 PMCID: PMC4621035 DOI: 10.1371/journal.pone.0141440] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/19/2015] [Accepted: 10/08/2015] [Indexed: 12/26/2022] Open
Abstract
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. 
Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.
Affiliation(s)
- Genki Terashi
- School of Pharmacy, Kitasato University, Tokyo, Japan
|
44
|
Szelag M, Czerwoniec A, Wesoly J, Bluyssen HAR. Identification of STAT1 and STAT3 specific inhibitors using comparative virtual screening and docking validation. PLoS One 2015; 10:e0116688. [PMID: 25710482 PMCID: PMC4339377 DOI: 10.1371/journal.pone.0116688] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 09/19/2014] [Accepted: 12/15/2014] [Indexed: 12/31/2022] Open
Abstract
Signal transducers and activators of transcription (STATs) facilitate the action of cytokines, growth factors and pathogens. STAT activation is mediated by a highly conserved SH2 domain, which interacts with phosphotyrosine motifs for specific STAT-receptor contacts and STAT dimerization. The active dimers induce gene transcription in the nucleus by binding to a specific DNA-response element in the promoter of target genes. Abnormal activation of STAT signaling pathways is implicated in many human diseases, such as cancer, inflammation and auto-immunity. Searches for STAT-targeting compounds, exploring the phosphotyrosine (pTyr)-SH2 interaction site, yielded many small molecules for STAT3 but few for other STATs. However, many of these inhibitors do not seem to be STAT3-specific, which calls into question the present modeling and selection strategies for SH2 domain-based STAT inhibitors. We generated new 3D structure models for all human (h)STATs and developed a comparative in silico docking strategy to obtain further insight into the STAT-SH2 cross-binding specificity of a selection of previously identified STAT3 inhibitors. Indeed, by primarily targeting the highly conserved pTyr-SH2 binding pocket, the majority of these compounds exhibited similar binding affinity and tendency scores for all STATs. By comparative screening of a natural product library, we provided initial proof that STAT1-specific as well as STAT3-specific inhibitors can be identified, introducing the ‘STAT-comparative binding affinity value’ and ‘ligand binding pose variation’ as selection criteria. In silico screening of a multi-million clean leads (CL) compound library for binding to all STATs likewise identified potential specific inhibitors for STAT1 and STAT3 after docking validation. Based on comparative virtual screening and docking validation, we developed a novel STAT inhibitor screening tool that allows identification of specific STAT1 and STAT3 inhibitory compounds. 
This could increase our understanding of the functional role of these STATs in different diseases and help meet the clinical need for more druggable STAT inhibitors with high specificity, potency and excellent bioavailability.
Affiliation(s)
- Malgorzata Szelag
- Department of Human Molecular Genetics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614 Poznan, Poland
- Anna Czerwoniec
- Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614 Poznan, Poland
- Joanna Wesoly
- Laboratory of High Throughput Technologies, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614 Poznan, Poland
- Hans A. R. Bluyssen
- Department of Human Molecular Genetics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614 Poznan, Poland
|
45
|
Tong J, Pei J, Otwinowski Z, Grishin NV. Refinement by shifting secondary structure elements improves sequence alignments. Proteins 2015; 83:411-27. [PMID: 25546158 DOI: 10.1002/prot.24746] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/29/2014] [Revised: 11/25/2014] [Accepted: 12/10/2014] [Indexed: 01/09/2023]
Abstract
Constructing a model of a query protein based on its alignment to a homolog with an experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template-defined secondary structures. Our method, SFESA, is based on a novel scoring function that combines the profile-based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to an overall increase in alignment accuracy. Evaluation on several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa.
Affiliation(s)
- Jing Tong
- Department of Biophysics, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, 75390; Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, 75390
|
46
|
Adaptive firefly algorithm: parameter analysis and its application. PLoS One 2014; 9:e112634. [PMID: 25397812 PMCID: PMC4232507 DOI: 10.1371/journal.pone.0112634] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 08/12/2014] [Accepted: 10/09/2014] [Indexed: 12/02/2022] Open
Abstract
As a nature-inspired search algorithm, the firefly algorithm (FA) has several control parameters, which may have great effects on its performance. In this study, we investigate the parameter selection and adaptation strategies in a modified firefly algorithm, the adaptive firefly algorithm (AdaFa). There are three strategies in AdaFa: (1) a distance-based light absorption coefficient; (2) a gray coefficient that enables fireflies to share difference information from attractive ones efficiently; and (3) five different dynamic strategies for the randomization parameter. Promising selections of parameters in the strategies are analyzed to guarantee the efficient performance of AdaFa. AdaFa is validated over widely used benchmark functions, and the numerical experiments and statistical tests yield useful conclusions on the strategies and the parameter selections affecting the performance of AdaFa. When applied to a real-world problem, protein tertiary structure prediction, the results demonstrated that the improved variants can rebuild the tertiary structure with an average root mean square deviation of less than 0.4 Å and 1.5 Å from the native constraints under noise-free conditions and with 10% Gaussian white noise, respectively.
|
47
|
Snyder DA, Grullon J, Huang YJ, Tejero R, Montelione GT. The expanded FindCore method for identification of a core atom set for assessment of protein structure prediction. Proteins 2014; 82 Suppl 2:219-30. [PMID: 24327305 DOI: 10.1002/prot.24490] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/10/2013] [Revised: 11/14/2013] [Accepted: 11/19/2013] [Indexed: 11/09/2022]
Abstract
Maximizing the scientific impact of NMR-based structure determination requires robust and statistically sound methods for assessing the precision of NMR-derived structures. In particular, a method to define a core atom set for calculating superimpositions and validating structure predictions is critical to the use of NMR-derived structures as targets in the CASP competition. FindCore (Snyder and Montelione, Proteins 2005;59:673-686) is a superimposition independent method for identifying a core atom set and partitioning that set into domains. However, as FindCore optimizes superimposition by sensitively excluding not-well-defined atoms, the FindCore core may not comprise all atoms suitable for use in certain applications of NMR structures, including the CASP assessment process. Adapting the FindCore approach to assess predicted models against experimental NMR structures in CASP10 required modification of the FindCore method. This paper describes conventions and a standard protocol to calculate an "Expanded FindCore" atom set suitable for validation and application in biological and biophysical contexts. A key application of the Expanded FindCore method is to identify a core set of atoms in the experimental NMR structure for which it makes sense to validate predicted protein structure models. We demonstrate the application of this Expanded FindCore method in characterizing well-defined regions of 18 NMR-derived CASP10 target structures. The Expanded FindCore protocol defines "expanded core atom sets" that match an expert's intuition of which parts of the structure are sufficiently well defined to use in assessing CASP model predictions. We also illustrate the impact of this analysis on the CASP GDT assessment scores.
Affiliation(s)
- David A Snyder
- Department of Chemistry, William Paterson University, Wayne, New Jersey, 07470
|
48
|
Skolnick J, Gao M, Zhou H. On the role of physics and evolution in dictating protein structure and function. Isr J Chem 2014; 54:1176-1188. [PMID: 25484448 PMCID: PMC4255337 DOI: 10.1002/ijch.201400013] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
Abstract
How many of the structural and functional properties of proteins are inherent? Computer simulations provide a powerful tool to address this question. A series of studies on QS, quasi-spherical, compact polypeptides which lack any secondary structure; ART, artificial, proteins comprised of compact homopolypeptides with protein-like secondary structure; and PDB, native, single domain proteins shows that essentially all native global folds, pockets and protein-protein interfaces are in the ART library. This suggests that many protein properties are inherent and that evolution is involved in fine-tuning. The completeness of the space of ligand binding pockets and protein-protein interfaces suggests that promiscuous interactions are intrinsic to proteins and that the capacity to perform the biochemistry of life at low level does not require evolution. If so, this has profound consequences for the origin of life.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
- Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
|
49
|
Deng X, Cheng J. Enhancing HMM-based protein profile-profile alignment with structural features and evolutionary coupling information. BMC Bioinformatics 2014; 15:252. [PMID: 25062980 PMCID: PMC4133609 DOI: 10.1186/1471-2105-15-252] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 01/07/2014] [Accepted: 07/17/2014] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis. RESULTS In this work, we integrate predicted solvent accessibility, torsion angles and evolutionary residue coupling information with the pairwise Hidden Markov Model (HMM) based profile alignment method to improve profile-profile alignments. The evaluation results demonstrate that adding predicted relative solvent accessibility and torsion angle information improves the accuracy of profile-profile alignments. The evolutionary residue coupling information is helpful in some cases, but its contribution to the improvement is not consistent. CONCLUSION Incorporating the new structural information such as predicted solvent accessibility and torsion angles into the profile-profile alignment is a useful way to improve pairwise profile-profile alignment methods.
Affiliation(s)
- Xin Deng
- LexisNexis | Risk Solutions | Healthcare, Orlando, FL 32811 USA
- Jianlin Cheng
- Computer Science Department, Informatics Institute, C. Bond Life Science Center, University of Missouri-Columbia, Columbia, MO 65211 USA
|
50
|
Olechnovič K, Venclovas C. The CAD-score web server: contact area-based comparison of structures and interfaces of proteins, nucleic acids and their complexes. Nucleic Acids Res 2014; 42:W259-63. [PMID: 24838571 PMCID: PMC4086110 DOI: 10.1093/nar/gku294] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 01/31/2023] Open
Abstract
The Contact Area Difference score (CAD-score) web server provides a universal framework to compute and analyze discrepancies between different 3D structures of the same biological macromolecule or complex. The server accepts both single-subunit and multi-subunit structures and can handle all the major types of macromolecules (proteins, RNA, DNA and their complexes). It can perform numerical comparison of both structures and interfaces. In addition to entire structures and interfaces, the server can assess user-defined subsets. The CAD-score server performs both global and local numerical evaluations of structural differences between structures or interfaces. The results can be explored interactively using sortable tables of global scores, profiles of local errors, superimposed contact maps and 3D structure visualization. The web server could be used for tasks such as comparison of models with the native (reference) structure, comparison of X-ray structures of the same macromolecule obtained in different states (e.g. with and without a bound ligand), analysis of nuclear magnetic resonance (NMR) structural ensemble or structures obtained in the course of molecular dynamics simulation. The web server is freely accessible at: http://www.ibt.lt/bioinformatics/cad-score.
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Vilnius University, Graičiūno 8, Vilnius LT-02241, Lithuania; Faculty of Mathematics and Informatics, Vilnius University, Naugarduko 24, Vilnius LT-03225, Lithuania
- Ceslovas Venclovas
- Institute of Biotechnology, Vilnius University, Graičiūno 8, Vilnius LT-02241, Lithuania
|