1
Hu Y, Huang B, Zang CZ, Xu JJ. Detection of circular permutations by Protein Language Models. Comput Struct Biotechnol J 2024; 27:214-220. [PMID: 39866668 PMCID: PMC11757225 DOI: 10.1016/j.csbj.2024.12.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Revised: 12/23/2024] [Accepted: 12/26/2024] [Indexed: 01/28/2025] Open
Abstract
Protein circular permutations are crucial for understanding protein evolution and functionality. Traditional detection methods face challenges: sequence-based approaches struggle with detecting distant homologs, while structure-based approaches are limited by the need for structure generation and often treat proteins as rigid bodies. Protein Language Model-based alignment tools have shown advantages in utilizing sequence information to overcome the challenges of detecting distant homologs without requiring structural input. However, many current Protein Language Model-based alignment methods, which rely on sequence alignment algorithms like the Smith-Waterman algorithm, face significant difficulties when dealing with circular permutation (CP) due to their dependency on linear sequence order. This sequence order dependency makes them unsuitable for accurately detecting CP. Our approach, named plmCP, combines classical genetic principles with modern alignment techniques leveraging Protein Language Models to address these limitations. By integrating genetic knowledge, the plmCP method avoids the sequence order dependency, allowing for effective detection of circular permutations and contributing significantly to protein research and engineering by embracing structural flexibility.
Affiliation(s)
- Yue Hu
  - School of Bioengineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong 250300, China
  - Kyiv College, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong 250300, China
- Bin Huang
  - School of Life Sciences, Yunnan Normal University, Kunming, Yunnan 650500, China
- Chun Zi Zang
  - Kyiv College, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong 250300, China
- Jia Jie Xu
  - School of Bioengineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong 250300, China
2
Benchmarking Methods of Protein Structure Alignment. J Mol Evol 2020; 88:575-597. [PMID: 32725409 DOI: 10.1007/s00239-020-09960-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/27/2020] [Accepted: 07/10/2020] [Indexed: 10/23/2022]
Abstract
The function of a protein is primarily determined by its structure and amino acid sequence. Many biological questions of interest rely on being able to accurately determine the group of structures to which domains of a protein belong; this can be done through alignment and comparison of protein structures. Dozens of different methods for Protein Structure Alignment (PSA) have been proposed that use a wide range of techniques. The aim of this study is to determine the ability of PSA methods to identify pairs of protein domains known to share differing levels of structural similarity, and to assess their utility for clustering domains from several different folds into known groups. We present the results of a comprehensive investigation into eighteen PSA methods, to our knowledge the largest piece of independent research on this topic. Overall, SP-AlignNS (non-sequential) was found to be the best method for classification, and among the best performing methods for clustering. Methods (where possible) were split into the algorithm used to find the optimal alignment and the score used to assess similarity. This allowed us to largely separate the algorithm from the score it maximizes and thus, to assess their effectiveness independently of each other. Surprisingly, we found that some hybrids of mismatched scores and algorithms performed better than either of the native methods at classification and, in some cases, clustering as well. It is hoped that this investigation and the accompanying discussion will be useful for researchers selecting or designing methods to align protein structures.
3
Guo Z, Chen BY. Conformational Sampling Reveals Amino Acids with a Steric Influence on Specificity. J Comput Biol 2015; 22:861-75. [PMID: 26335806 DOI: 10.1089/cmb.2015.0117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Abstract
Flexible representations of protein structures can enable structure comparison algorithms to find remotely homologous proteins, even when they have been crystallized in different conformations. By compensating for large spatial variations, these representations can enable these algorithms to better detect remote similarities in the space of protein structures. Subtle variations in protein structures can also have a substantial impact on structure comparison. For example, the motion of a single side chain into a binding cavity can make the cavity appear totally dissimilar to identical binding sites, even though, in reality, the presence of the side chain does not affect binding. To address the impact of subtle conformational variations, this article describes FAVA (Flexible Aggregate Volumetric Analysis), an algorithm that enables comparisons of ligand binding sites while compensating for subtle, localized flexibility. FAVA integrates hundreds of conformational samples, sourced from any molecular simulation software that provides all-atom detail, to characterize the geometry of ligand binding sites as they frequently appear. This representation enables rare conformations, as defined by the user, to be excluded from the structural comparison. In our results, on three families of serine proteases and three families of enolases, we show that despite substantial binding site variations, FAVA is able to correctly classify families with different binding preferences. We also demonstrate that FAVA can examine the motion of individual amino acids to identify those that influence ligand binding specificity. Together, these capabilities demonstrate that comparison errors associated with small conformational variations, which can substantially alter the geometry of ligand binding sites and other local features, can be mitigated by an analysis of many conformational samples.
Affiliation(s)
- Ziyi Guo
  - Department of Computer Science and Engineering, Lehigh University, Fairfax, Virginia
- Brian Yuan Chen
  - Department of Computer Science and Engineering, Lehigh University, Fairfax, Virginia
4
Adjeroh D, Jiang Y, Jiang BH, Lin J. Network analysis of circular permutations in multidomain proteins reveals functional linkages for uncharacterized proteins. Cancer Inform 2015; 13:109-24. [PMID: 25741177 PMCID: PMC4338801 DOI: 10.4137/cin.s14059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/08/2014] [Revised: 09/23/2014] [Accepted: 09/24/2014] [Indexed: 01/19/2023] Open
Abstract
Various studies have implicated different multidomain proteins in cancer. However, there has been little or no detailed study on the role of circular multidomain proteins in the general problem of cancer or on specific cancer types. This work represents an initial attempt at investigating the potential for predicting linkages between known cancer-associated proteins and uncharacterized or hypothetical multidomain proteins, based primarily on circular permutation (CP) relationships. First, we propose an efficient algorithm for rapid identification of both exact and approximate CPs in multidomain proteins. Using the circular relations identified, we construct networks between multidomain proteins, based on which we perform functional annotation of multidomain proteins. We then extend the method to construct subnetworks for selected cancer subtypes, and perform prediction of potential linkages between uncharacterized multidomain proteins and the selected cancer types. We include practical results showing the performance of the proposed methods.
Affiliation(s)
- Donald Adjeroh
  - Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
- Yue Jiang
  - Faculty of Software, Fujian Normal University, Fuzhou, Fujian, China
- Bing-Hua Jiang
  - Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Jie Lin
  - Faculty of Software, Fujian Normal University, Fuzhou, Fujian, China
5
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022] Open
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully understood. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor in correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implications for protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
  - Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
6
Wang HW, Chu CH, Wang WC, Pai TW. A local average distance descriptor for flexible protein structure comparison. BMC Bioinformatics 2014; 15:95. [PMID: 24694083 PMCID: PMC3992163 DOI: 10.1186/1471-2105-15-95] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/21/2013] [Accepted: 03/22/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Protein structures are flexible and often show conformational changes upon binding to other molecules to exert biological functions. As protein structures correlate with characteristic functions, structure comparison allows classification and prediction of proteins of undefined functions. However, most comparison methods treat proteins as rigid bodies and cannot effectively retrieve similarities between proteins with large conformational changes. RESULTS: In this paper, we propose a novel descriptor, local average distance (LAD), based on either the geodesic distances (GDs) or Euclidean distances (EDs), for pairwise flexible protein structure comparison. The proposed method was compared with 7 structural alignment methods and 7 shape descriptors on two datasets comprising hinge bending motions from the MolMovDB, and the results show that our method outperformed all other methods at retrieving similar structures in terms of precision-recall curve, retrieval success rate, R-precision, mean average precision and F1-measure. CONCLUSIONS: Both ED- and GD-based LAD descriptors are effective for searching deformed structures and overcome the problems of self-connection caused by a large bending motion. We have also demonstrated that the ED-based LAD is more robust than the GD-based descriptor. The proposed algorithm provides an alternative approach for searching structure databases, discovering previously unknown conformational relationships, and reorganizing protein structure classification.
Affiliation(s)
- Tun-Wen Pai
  - Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan.
7
Abstract
Motivation: Structural alignment methods are widely used to generate gold standard alignments for improving multiple sequence alignments and transferring functional annotations, as well as for assigning structural distances between proteins. However, the correctness of the alignments generated by these methods is difficult to assess objectively since little is known about the exact evolutionary history of most proteins. Since homology is an equivalence relation, an upper bound on alignment quality can be found by assessing the consistency of alignments. Measuring the consistency of current methods of structure alignment and determining the causes of inconsistencies can, therefore, provide information on the quality of current methods and suggest possibilities for further improvement. Results: We analyze the self-consistency of seven widely-used structural alignment methods (SAP, TM-align, Fr-TM-align, MAMMOTH, DALI, CE and FATCAT) on a diverse, non-redundant set of 1863 domains from the SCOP database and demonstrate that even for relatively similar proteins the degree of inconsistency of the alignments on a residue level is high (30%). We further show that levels of consistency vary substantially between methods, with two methods (SAP and Fr-TM-align) producing more consistent alignments than the rest. Inconsistency is found to be higher near gaps and for proteins of low structural complexity, as well as for helices. The ability of the methods to identify good structural alignments is also assessed using geometric measures, for which FATCAT (flexible mode) is found to be the best performer despite being highly inconsistent. We conclude that there is substantial scope for improving the consistency of structural alignment methods. Contact: msadows@nimr.mrc.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
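The consistency idea in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each pairwise alignment is stored as a dictionary mapping residue indices of one protein to residue indices of the other; the function name `transitive_consistency` and this representation are hypothetical and are not taken from the paper. If alignments reflect an equivalence relation, composing A→B with B→C should agree with the direct A→C alignment.

```python
def transitive_consistency(ab, bc, ac):
    """Fraction of residues of A for which A->B->C agrees with A->C.

    ab, bc, ac: dicts mapping residue index -> residue index
    (hypothetical representation of pairwise alignments).
    Only residues aligned in all three alignments are compared.
    """
    # Residues of A that are aligned to B and to C, and whose
    # partner in B is itself aligned to C.
    comparable = [i for i in ab if i in ac and ab[i] in bc]
    if not comparable:
        return 0.0
    agree = sum(1 for i in comparable if bc[ab[i]] == ac[i])
    return agree / len(comparable)


if __name__ == "__main__":
    ab = {0: 0, 1: 1, 2: 2}
    bc = {0: 1, 1: 2, 2: 3}
    ac = {0: 1, 1: 2, 2: 4}  # residue 2 disagrees with the composed path
    print(transitive_consistency(ab, bc, ac))
```

The study's own consistency measure may differ in how gaps and unaligned residues are treated; this sketch simply ignores residues that cannot be followed through all three alignments.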
Affiliation(s)
- M I Sadowski
  - Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, UK
8
Venkateswaran JG, Song B, Kahveci T, Jermaine C. TRIAL: a tool for finding distant structural similarities. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:819-831. [PMID: 21393655 DOI: 10.1109/tcbb.2009.28] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/30/2023]
Abstract
Finding structural similarities in distantly related proteins can reveal functional relationships that cannot be identified using sequence comparison. Given two proteins A and B and threshold ε Å, we develop an algorithm, TRiplet-based Iterative ALignment (TRIAL), for computing the transformation of B that maximizes the number of aligned residues such that the root mean square deviation (RMSD) of the alignment is at most ε Å. Our algorithm is designed with the specific goal of effectively handling proteins with low similarity in primary structure, where existing algorithms perform particularly poorly. Experiments show that our method outperforms existing methods. TRIAL alignment brings the secondary structures of distantly related proteins to similar orientations. It also finds a larger number of secondary structure matches at lower RMSD values and increased overall alignment lengths. Its classification accuracy is up to 63 percent better than other methods, including CE and DALI. TRIAL successfully aligns 83 percent of the residues from the smaller protein in reasonable time while other methods align only 29 to 65 percent of the residues for the same set of proteins.
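The RMSD constraint mentioned in this abstract can be made concrete with a minimal sketch. It assumes the two structures have already been superposed and that aligned residues are given as equal-length lists of (x, y, z) coordinates in Å; the search for the transformation itself, which is the core of TRIAL, is not shown. The names `rmsd` and `within_threshold` are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired 3D coordinates.

    Assumes the structures are already superposed; no rotation or
    translation is applied here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def within_threshold(coords_a, coords_b, epsilon):
    """True if the aligned residues satisfy RMSD <= epsilon (in Å)."""
    return rmsd(coords_a, coords_b) <= epsilon


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(a, b), within_threshold(a, b, 2.0))
```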
9
Chu CH, Lo WC, Wang HW, Hsu YC, Hwang JK, Lyu PC, Pai TW, Tang CY. Detection and alignment of 3D domain swapping proteins using angle-distance image-based secondary structural matching techniques. PLoS One 2010; 5:e13361. [PMID: 20976204 PMCID: PMC2955075 DOI: 10.1371/journal.pone.0013361] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 07/10/2010] [Accepted: 09/13/2010] [Indexed: 11/18/2022] Open
Abstract
This work presents a novel detection method for three-dimensional domain swapping (DS), a mechanism for forming protein quaternary structures that can be visualized as if monomers had “opened” their “closed” structures and exchanged the opened portion to form intertwined oligomers. Since the first report of DS in the mid 1990s, an increasing number of identified cases has led to the postulation that DS might occur in a protein with an unconstrained terminus under appropriate conditions. DS may play important roles in the molecular evolution and functional regulation of proteins and the formation of depositions in Alzheimer's and prion diseases. Moreover, it is promising for designing auto-assembling biomaterials. Despite the increasing interest in DS, related bioinformatics methods are rarely available. Owing to a dramatic conformational difference between the monomeric/closed and oligomeric/open forms, conventional structural comparison methods are inadequate for detecting DS. Hence, there is also a lack of comprehensive datasets for studying DS. Based on angle-distance (A-D) image transformations of secondary structural elements (SSEs), specific patterns within A-D images can be recognized and classified for structural similarities. In this work, a matching algorithm to extract corresponding SSE pairs from A-D images and a novel DS score have been designed and demonstrated to be applicable to the detection of DS relationships. The Matthews correlation coefficient (MCC) and sensitivity of the proposed DS-detecting method were higher than 0.81 even when the sequence identities of the proteins examined were lower than 10%. On average, the alignment percentage and root-mean-square distance (RMSD) computed by the proposed method were 90% and 1.8Å for a set of 1,211 DS-related pairs of proteins. The performances of structural alignments remain high and stable for DS-related homologs with less than 10% sequence identities. 
In addition, the quality of its hinge loop determination is comparable to that of manual inspection. This method has been implemented as a web-based tool, which requires two protein structures as the input and then the type and/or existence of DS relationships between the input structures are determined according to the A-D image-based structural alignments and the DS score. The proposed method is expected to trigger large-scale studies of this interesting structural phenomenon and facilitate related applications.
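For readers unfamiliar with the two evaluation measures quoted in this abstract, the following sketch computes the Matthews correlation coefficient (MCC) and sensitivity from the counts of a binary confusion matrix. These are the standard textbook definitions; the function names and the convention of returning 0.0 when a denominator is zero are choices made for this example and are not taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    adopted here as an assumption).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def sensitivity(tp, fn):
    """Fraction of true positives among all actual positives."""
    if tp + fn == 0:
        return 0.0
    return tp / (tp + fn)


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    print(mcc(tp=50, tn=50, fp=0, fn=0), sensitivity(tp=50, fn=0))
```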
Affiliation(s)
- Chia-Han Chu
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
- Wei-Cheng Lo
  - Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan, Republic of China
- Hsin-Wei Wang
  - Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan, Republic of China
- Yen-Chu Hsu
  - Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan, Republic of China
- Jenn-Kang Hwang
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan, Republic of China
- Ping-Chiang Lyu
  - Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
- Tun-Wen Pai
  - Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan, Republic of China
  - * E-mail: (T-WP); (CYT)
- Chuan Yi Tang
  - Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
  - Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan, Republic of China
  - * E-mail: (T-WP); (CYT)
10
Schmidt-Goenner T, Guerler A, Kolbeck B, Knapp EW. Circular permuted proteins in the universe of protein folds. Proteins 2009; 78:1618-30. [DOI: 10.1002/prot.22678] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
11
Abstract
Circular permutation (CP) in a protein can be considered as if its sequence were circularized followed by the creation of termini at a new location. Since the first observation of CP in 1979, a substantial number of studies have concluded that circular permutants (CPs) usually retain native structures and functions, sometimes with increased stability or functional diversity. Although this interesting property has made CP useful in much protein engineering and folding research, large-scale collections of CP-related information were not available until this study. Here we describe CPDB, the first CP DataBase. The organizational principle of CPDB is a hierarchical categorization in which pairs of circular permutants are grouped into CP clusters, which are further grouped into folds and in turn classes. Additions to CPDB include a useful set of tools and resources for the identification, characterization, comparison and visualization of CP. In addition, several viable CP site prediction methods are implemented and assessed in CPDB. This database can be useful in protein folding and evolution studies, the discovery of novel protein structural and functional relationships, and facilitating the production of new CPs with unique biotechnical or industrial interests. The CPDB database can be accessed at http://sarst.life.nthu.edu.tw/cpdb
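The definition of CP given in this abstract (circularize the sequence, then open it at a new position) corresponds, in the idealized case of identical residues, to one sequence being a rotation of the other. The sketch below checks this with the standard string-doubling trick. It only handles exact permutations; real CP detection, including the tools in CPDB, must tolerate substitutions, insertions and deletions, which this example does not attempt. The function names and the toy sequences are illustrative.

```python
def is_circular_permutation(a, b):
    """True if sequence b is an exact rotation of sequence a."""
    # Every rotation of a appears as a substring of a + a.
    return len(a) == len(b) and b in a + a


def cut_site(a, b):
    """0-based index in a where b begins, or -1 if b is not a rotation of a."""
    if len(a) != len(b):
        return -1
    return (a + a).find(b)


if __name__ == "__main__":
    original = "MKTAYIAK"   # toy sequence, not a real protein
    permuted = "YIAKMKTA"   # new termini created after residue 4
    print(is_circular_permutation(original, permuted), cut_site(original, permuted))
```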
Affiliation(s)
- Wei-Cheng Lo
  - Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 30013, Taiwan
12
Xie L, Bourne PE. Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proc Natl Acad Sci U S A 2008; 105:5441-6. [PMID: 18385384 PMCID: PMC2291117 DOI: 10.1073/pnas.0704422105] [Citation(s) in RCA: 181] [Impact Index Per Article: 10.6] [Received: 05/11/2007] [Indexed: 11/18/2022] Open
Abstract
Here, a scalable, accurate, reliable, and robust protein functional site comparison algorithm is presented. The key components of the algorithm consist of a reduced representation of the protein structure and a sequence order-independent profile-profile alignment (SOIPPA). We show that SOIPPA is able to detect distant evolutionary relationships in cases where both a global sequence and structure relationship remain obscure. Results suggest evolutionary relationships across several previously evolutionarily distinct protein structure superfamilies. SOIPPA, along with an increased coverage of protein fold space afforded by the structural genomics initiative, can be used to further test the notion that fold space is continuous rather than discrete.
Affiliation(s)
- Lei Xie
  - *San Diego Supercomputer Center and
- Philip E. Bourne
  - *San Diego Supercomputer Center and
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093
13
Lo WC, Lyu PC. CPSARST: an efficient circular permutation search tool applied to the detection of novel protein structural relationships. Genome Biol 2008; 9:R11. [PMID: 18201387 PMCID: PMC2395249 DOI: 10.1186/gb-2008-9-1-r11] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 09/11/2007] [Revised: 11/19/2007] [Accepted: 01/18/2008] [Indexed: 12/04/2022] Open
Abstract
Circular permutation of a protein can be visualized as if the original amino- and carboxyl termini were linked and new ones created elsewhere. It has been well documented that circular permutants usually retain native structures and biological functions. Here we report CPSARST (Circular Permutation Search Aided by Ramachandran Sequential Transformation), an efficient database search tool. In this post-genomics era, when the amount of protein structural data is increasing exponentially, it provides a new way to rapidly detect novel relationships among proteins.
Affiliation(s)
- Wei-Cheng Lo
  - Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 30013, Taiwan
14
Barthel D, Hirst JD, Błażewicz J, Burke EK, Krasnogor N. ProCKSI: a decision support system for Protein (structure) Comparison, Knowledge, Similarity and Information. BMC Bioinformatics 2007; 8:416. [PMID: 17963510 PMCID: PMC2222653 DOI: 10.1186/1471-2105-8-416] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Received: 06/08/2007] [Accepted: 10/26/2007] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy to use interface that allows the comparison of multiple proteins simultaneously. It employs the Universal Similarity Metric (USM), the Maximum Contact Map Overlap (MaxCMO) of protein structures and other external methods such as the DaliLite and the TM-align methods, the Combinatorial Extension (CE) of the optimal path, and the FAST Align and Search Tool (FAST). Additionally, ProCKSI allows the user to upload a user-defined similarity matrix supplementing the methods mentioned, and computes a similarity consensus in order to provide a rich, integrated, multicriteria view of large datasets of protein structures. RESULTS We present ProCKSI's architecture and workflow describing its intuitive user interface, and show its potential on three distinct test-cases. In the first case, ProCKSI is used to evaluate the results of a previous CASP competition, assessing the similarity of proposed models for given targets where the structures could have a large deviation from one another. To perform this type of comparison reliably, we introduce a new consensus method. The second study deals with the verification of a classification scheme for protein kinases, originally derived by sequence comparison by Hanks and Hunter, but here we use a consensus similarity measure based on structures. In the third experiment using the Rost and Sander dataset (RS126), we investigate how a combination of different sets of similarity measures influences the quality and performance of ProCKSI's new consensus measure. ProCKSI performs well with all three datasets, showing its potential for complex, simultaneous multi-method assessment of structural similarity in large protein datasets. 
Furthermore, combining different similarity measures is usually more robust than relying on one single, unique measure. CONCLUSION: Based on a diverse set of similarity measures, ProCKSI computes a consensus similarity profile for the entire protein set. All results can be clustered, visualised, analysed and easily compared with each other through a simple and intuitive interface. ProCKSI is publicly available at http://www.procksi.net for academic and non-commercial use.
Affiliation(s)
- Daniel Barthel
  - ASAP, School of Computer Science and IT, University of Nottingham, Nottingham, NG8 1BB, UK
- Jonathan D Hirst
  - School of Chemistry, University of Nottingham, Nottingham, NG7 2RD, UK
- Jacek Błażewicz
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
  - The Institute of Computing Science, 60-965 Poznan, Poland
- Edmund K Burke
  - ASAP, School of Computer Science and IT, University of Nottingham, Nottingham, NG8 1BB, UK
- Natalio Krasnogor
  - ASAP, School of Computer Science and IT, University of Nottingham, Nottingham, NG8 1BB, UK
15
Taylor WR. Evolutionary transitions in protein fold space. Curr Opin Struct Biol 2007; 17:354-61. [PMID: 17580115 DOI: 10.1016/j.sbi.2007.06.002] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 02/09/2007] [Revised: 04/11/2007] [Accepted: 06/06/2007] [Indexed: 10/23/2022]
Abstract
With the number of known protein folds potentially approaching completion, the problems associated with their systematic classification are evaluated. It is argued that it will be difficult, if not impossible, to find a general metric based on pairwise comparison that will provide a satisfactory classification. It is suggested that some progress may be made through comparison against a library of idealised 'template' folds, but a proper solution can only be attained if this includes a model of the underlying evolutionary processes. These processes are considered with examples of some unexpected relationships among folds, including circular permutations. The problem is finally set in the wider context of the genetic environment, introducing complications relating to introns, gene fixation and population size.
Affiliation(s)
- William R Taylor
  - Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK.
16
Weekes D, Miller MD, Krishna SS, McMullan D, McPhillips TM, Acosta C, Canaves JM, Elsliger MA, Floyd R, Grzechnik SK, Jaroszewski L, Klock HE, Koesema E, Kovarik JS, Kreusch A, Morse AT, Quijano K, Spraggon G, van den Bedem H, Wolf G, Hodgson KO, Wooley J, Deacon AM, Godzik A, Lesley SA, Wilson IA. Crystal structure of a transcription regulator (TM1602) from Thermotoga maritima at 2.3 A resolution. Proteins 2007; 67:247-52. [PMID: 17256761 DOI: 10.1002/prot.21221] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
17
Connectivity independent protein-structure alignment: a hierarchical approach. BMC Bioinformatics 2006; 7:510. [PMID: 17118190 PMCID: PMC1683948 DOI: 10.1186/1471-2105-7-510] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Received: 06/06/2006] [Accepted: 11/21/2006] [Indexed: 11/13/2022] Open
Abstract
Background Protein-structure alignment is a fundamental tool to study protein function, evolution and model building. In the last decade several methods for structure alignment were introduced, but most of them ignore that structurally similar proteins can share the same spatial arrangement of secondary structure elements (SSE) but differ in the underlying polypeptide chain connectivity (non-sequential SSE connectivity). Results We perform protein-structure alignment using a two-level hierarchical approach implemented in the program GANGSTA. On the first level, pair contacts and relative orientations between SSEs (i.e. α-helices and β-strands) are maximized with a genetic algorithm (GA). On the second level residue pair contacts from the best SSE alignments are optimized. We have tested the method on visually optimized structure alignments of protein pairs (pairwise mode) and for database scans. For a given protein structure, our method is able to detect significant structural similarity of functionally important folds with non-sequential SSE connectivity. The performance for structure alignments with strictly sequential SSE connectivity is comparable to that of other structure alignment methods. Conclusion As demonstrated for several applications, GANGSTA finds meaningful protein-structure alignments independent of the SSE connectivity. GANGSTA is able to detect structural similarity of protein folds that are assigned to different superfamilies but nevertheless possess similar structures and perform related functions, even if these proteins differ in SSE connectivity.