1
Breiteneder H, Kraft D. The History and Science of the Major Birch Pollen Allergen Bet v 1. Biomolecules 2023; 13:1151. [PMID: 37509186 PMCID: PMC10377203 DOI: 10.3390/biom13071151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 07/14/2023] [Accepted: 07/17/2023] [Indexed: 07/30/2023]
Abstract
The term allergy was coined in 1906 by the Austrian scientist and pediatrician Clemens Freiherr von Pirquet. In 1976, Dietrich Kraft became the head of the Allergy and Immunology Research Group at the Department of General and Experimental Pathology of the University of Vienna. In 1983, Kraft proposed to replace natural extracts used in allergy diagnostic tests and vaccines with recombinant allergen molecules and persuaded Michael Breitenbach to contribute his expertise in molecular cloning as one of the mentors of this project. Thus, the foundation for the Vienna School of Molecular Allergology was laid. With the recruitment of Heimo Breiteneder as a young molecular biology researcher, the work began in earnest, resulting in the publication of the cloning of the first plant allergen, Bet v 1, in 1989. Bet v 1 has become the subject of a very large number of basic scientific as well as clinical studies. Bet v 1 is also the founding member of the large Bet v 1-like superfamily of proteins, whose members, based on the ancient conserved Bet v 1 fold, are present in all three domains of life, i.e., archaea, bacteria and eukaryotes. This suggests that the Bet v 1 fold most likely already existed in the last universal common ancestor. The biological function of this protein was probably related to lipid binding. However, functional diversity within the Bet v 1-like superfamily was established during evolution. The superfamily comprises 25 families, one of which is the Bet v 1 family, which in turn is composed of 11 subfamilies. One of these, the PR-10-like subfamily of proteins, contains almost all of the Bet v 1 homologous allergens from pollen and plant foods. Structural and functional comparisons of Bet v 1 and its non-allergenic homologs of the superfamily will pave the way for a deeper understanding of the allergic sensitization process.
Affiliation(s)
- Heimo Breiteneder
  - Division of Medical Biotechnology, Department of Pathophysiology and Allergy Research, Center of Pathophysiology, Infectiology and Immunology, Medical University of Vienna, 1090 Vienna, Austria
- Dietrich Kraft
  - Division of Medical Biotechnology, Department of Pathophysiology and Allergy Research, Center of Pathophysiology, Infectiology and Immunology, Medical University of Vienna, 1090 Vienna, Austria
2
Kondra S, Sarkar T, Raghavan V, Xu W. Development of a TSR-Based Method for Protein 3-D Structural Comparison With Its Applications to Protein Classification and Motif Discovery. Front Chem 2021; 8:602291. [PMID: 33520934 PMCID: PMC7838567 DOI: 10.3389/fchem.2020.602291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2020] [Accepted: 12/14/2020] [Indexed: 11/24/2022]
Abstract
Development of protein 3-D structural comparison methods is important in understanding protein functions. At the same time, developing such a method is very challenging. In the last 40 years, ever since the development of the first automated structural method, ~200 papers have been published using different representations of structures. The existing methods can be divided into five categories: sequence-, distance-, secondary structure-, geometry-, and network-based structural comparisons. Each is distinctive but also has limitations. We have developed a novel method where the 3-D structure of a protein is modeled using the concept of Triangular Spatial Relationship (TSR), where triangles are constructed with the Cα atoms of a protein as vertices. Every triangle is represented using an integer, which we denote as a “key.” A key is computed using the length, angle, and vertex labels based on a rule-based formula, which ensures assignment of the same key to identical TSRs across proteins. A structure is thereby represented by a vector of integers. Our method is able to accurately quantify similarity of structure or substructure by matching numbers of identical keys between two proteins. The uniqueness of our method includes: (i) a unique way to represent structures to avoid performing structural superimposition; (ii) use of triangles to represent substructures, as the triangle is the simplest primitive to capture shape; (iii) complex structure comparison is achieved by matching integers corresponding to multiple TSRs. Every substructure of one protein is compared to every other substructure in a different protein. The method is used in the studies of proteases and kinases because they play essential roles in cell signaling, and a majority of these constitute drug targets. The new motifs or substructures we identified specifically for proteases and kinases provide a deeper insight into their structural relations. Furthermore, the method provides a unique way to study protein conformational changes. In addition, the results from CATH and SCOP data sets clearly demonstrate that our method can distinguish alpha helices from beta pleated sheets and vice versa. Our method has the potential to be developed into a powerful tool for efficient structure-BLAST search and comparison, just as BLAST is for sequence search and alignment.
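The key idea in the abstract above can be illustrated with a minimal Python sketch. The abstract does not give the rule-based formula, so everything specific here is an assumption made for illustration: the label table `LABEL`, the bin widths, the choice of longest edge and largest interior angle as the geometric features, and the bit-packing in `tsr_key` are hypothetical and are not the published TSR formula. The sketch only shows how a triangle of Cα atoms can be reduced to an integer that is unchanged by rotation, translation, and vertex order, and how two structures can then be compared by counting identical keys without superimposition.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical label table: one small integer per amino-acid type.
AMINO_ACIDS = ("ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
               "LEU LYS MET PHE PRO SER THR TRP TYR VAL").split()
LABEL = {name: idx for idx, name in enumerate(AMINO_ACIDS, start=1)}


def tsr_key(residues, length_bin=2.0, angle_bin=15.0):
    """Pack one triangle of (residue_name, (x, y, z)) triples into an integer.

    Assumed rule, for illustration only: sorted vertex labels, binned longest
    edge, and binned largest interior angle. The three points must be distinct.
    """
    l1, l2, l3 = sorted(LABEL[name] for name, _ in residues)
    (_, a), (_, b), (_, c) = residues
    short, mid, longest = sorted(
        (math.dist(a, b), math.dist(b, c), math.dist(a, c)))
    # Law of cosines: the largest interior angle faces the longest edge.
    cos_theta = (short ** 2 + mid ** 2 - longest ** 2) / (2.0 * short * mid)
    theta = math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
    d_bin = min(int(longest / length_bin), 63)   # capped at 6 bits
    t_bin = min(int(theta / angle_bin), 15)      # capped at 4 bits
    return (((l1 * 32 + l2) * 32 + l3) * 64 + d_bin) * 16 + t_bin


def key_vector(protein):
    """Count the keys of every C-alpha triangle in a list of residues."""
    return Counter(tsr_key(tri) for tri in combinations(protein, 3))


def shared_keys(protein_a, protein_b):
    """Number of identical keys (multiset intersection) between two proteins."""
    return sum((key_vector(protein_a) & key_vector(protein_b)).values())
```

Enumerating every triangle costs on the order of n³ keys for n residues, so a practical implementation would presumably restrict or index the triangles; the sketch makes no attempt at that.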
Affiliation(s)
- Sarika Kondra
  - The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Titli Sarkar
  - The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Vijay Raghavan
  - The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Wu Xu
  - Department of Chemistry, University of Louisiana at Lafayette, Lafayette, LA, United States
3
Wen Z, He J, Huang SY. Topology-independent and global protein structure alignment through an FFT-based algorithm. Bioinformatics 2020; 36:478-486. [PMID: 31384919 DOI: 10.1093/bioinformatics/btz609] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/18/2019] [Revised: 07/22/2019] [Accepted: 08/02/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION: Protein structure alignment is one of the fundamental problems in computational structure biology. A variety of algorithms have been developed to address this important issue in the past decade. However, due to their heuristic nature, current structure alignment methods may suffer from suboptimal alignment and/or over-fragmentation and thus lead to a biologically wrong alignment in some cases. To overcome these limitations, we have developed an accurate topology-independent and global structure alignment method through an FFT-based exhaustive search algorithm, which is referred to as FTAlign. RESULTS: Our FTAlign algorithm was extensively tested on six commonly used datasets and compared with seven state-of-the-art structure alignment approaches, TMalign, DeepAlign, Kpax, 3DCOMB, MICAN, SPalignNS and CLICK. It was shown that FTAlign outperformed the other methods in reproducing manually curated alignments and obtained high success rates of 96.7% and 90.0% on two gold-standard benchmarks, MALIDUP and MALISAM, respectively. Moreover, FTAlign also achieved the overall best performance in terms of biologically meaningful structure overlap (SO) and TMscore on both the sequential alignment test sets, including MALIDUP, MALISAM and 64 difficult cases from HOMSTRAD, and the non-sequential sets, including MALIDUP-NS, MALISAM-NS and 199 topology-different cases, where FTAlign showed an especially clear advantage for non-sequential alignment. Despite its global search feature, FTAlign is also computationally efficient and can normally complete a pairwise alignment within one second. AVAILABILITY AND IMPLEMENTATION: http://huanglab.phys.hust.edu.cn/ftalign/.
Affiliation(s)
- Zeyu Wen
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Jiahua He
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
4
Joung I, Kim JY, Joo K, Lee J. Non-sequential protein structure alignment by conformational space annealing and local refinement. PLoS One 2019; 14:e0210177. [PMID: 30699145 PMCID: PMC6353097 DOI: 10.1371/journal.pone.0210177] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/27/2018] [Accepted: 12/18/2018] [Indexed: 11/18/2022]
Abstract
Protein structure alignment is an important tool for studying evolutionary biology and protein modeling. Tools that intensively search for the globally optimal non-sequential alignment are rare. We propose ALIGN-CSA, which improves scores such as DALI-score, SP-score, SO-score and TM-score on a benchmark set of 286 cases. We benchmarked existing popular alignment scoring functions, using ALIGN-CSA to effectively eliminate the dependence on the search algorithm. For the benchmarking, we set the minimum block size to 4 to prevent highly fragmented alignments, in which the biological relevance of small alignment blocks is hard to interpret. Under this condition, ALIGN-CSA searched for globally optimal alignments using the four scoring functions listed above, and TM-score was found to be the most effective in generating alignments with longer match lengths and smaller RMSD values. However, DALI-score is the most effective in generating alignments similar to the manually curated reference alignments, which implies that DALI-score is a more biologically relevant score. Because ALIGN-CSA places high demands on computational resources, we also propose a relatively fast local refinement method, which can control the minimum block size and whether to allow reverse alignment. ALIGN-CSA can be used to obtain much improved alignments at the cost of more extensive computation. For faster alignment, we propose a refinement protocol that improves the score of a given alignment obtained by various external tools. All programs are available from http://lee.kias.re.kr.
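Two of the quantities named in this abstract, TM-score and RMSD, can be computed from the distances between aligned residue pairs once the structures are superposed. The sketch below uses the standard TM-score definition with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8; the function names, the 0.5 Å floor on d0 for very short chains, and the assumption that the per-pair distances are already available are choices made for this illustration and are not taken from ALIGN-CSA.

```python
import math


def tm_d0(target_length):
    """Length-dependent distance scale d0 of the TM-score, in angstroms."""
    if target_length <= 15:
        return 0.5  # assumed floor; the cube-root formula is not usable here
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score from the distances of aligned residue pairs after superposition.

    Each aligned pair adds a term between 0 and 1, and the sum is normalized
    by the target length rather than by the number of aligned pairs.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances):
    """Root-mean-square deviation over the aligned pairs only."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```

Because every aligned pair contributes a positive term and the normalization is by target length, extending an alignment never lowers the TM-score, whereas RMSD alone can be made small simply by aligning fewer residues.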
Affiliation(s)
- InSuk Joung
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, Korea
- Jong Yun Kim
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, Korea
- Keehyoung Joo
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, Korea
- Jooyoung Lee
  - Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, Korea
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, Korea
5
Morales-Cordovilla JA, Sanchez V, Ratajczak M. Protein alignment based on higher order conditional random fields for template-based modeling. PLoS One 2018; 13:e0197912. [PMID: 29856860 PMCID: PMC5983487 DOI: 10.1371/journal.pone.0197912] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/20/2017] [Accepted: 05/10/2018] [Indexed: 11/19/2022]
Abstract
The query-template alignment of proteins is one of the most critical steps of template-based modeling methods used to predict the 3D structure of a query protein. This alignment can be interpreted as a temporal classification or structured prediction task and first order Conditional Random Fields have been proposed for protein alignment and proven to be rather successful. Some other popular structured prediction problems, such as speech or image classification, have gained from the use of higher order Conditional Random Fields due to the well known higher order correlations that exist between their labels and features. In this paper, we propose and describe the use of higher order Conditional Random Fields for query-template protein alignment. The experiments carried out on different public datasets validate our proposal, especially on distantly-related protein pairs which are the most difficult to align.
Affiliation(s)
- Victoria Sanchez
  - Dept. of Teoría de la Señal Telemática y Comunicaciones, Universidad de Granada, Granada, Spain
- Martin Ratajczak
  - Graz University of Technology, Signal Processing and Speech Communication Laboratory, Graz, Austria
6
Cao H, Lu Y. Using Variable-Length Aligned Fragment Pairs and an Improved Transition Function for Flexible Protein Structure Alignment. J Comput Biol 2017; 24:2-12. [DOI: 10.1089/cmb.2016.0135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Affiliation(s)
- Hu Cao
  - School of Information Science and Engineering, Lanzhou University, Gansu 730000, Lanzhou, China
- Yonggang Lu
  - School of Information Science and Engineering, Lanzhou University, Gansu 730000, Lanzhou, China
7
Chatzou M, Magis C, Chang JM, Kemena C, Bussotti G, Erb I, Notredame C. Multiple sequence alignment modeling: methods and applications. Brief Bioinform 2015; 17:1009-1023. [PMID: 26615024 DOI: 10.1093/bib/bbv099] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 09/10/2015] [Revised: 10/16/2015] [Indexed: 12/20/2022]
Abstract
This review provides an overview of the development of multiple sequence alignment (MSA) methods and their main applications. It focuses on progress made over the past decade. The first three sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method development. The last part of the review gives an overview of available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.
8
Bawono P, van der Velde A, Abeln S, Heringa J. Quantifying the displacement of mismatches in multiple sequence alignment benchmarks. PLoS One 2015; 10:e0127431. [PMID: 25993129 PMCID: PMC4438059 DOI: 10.1371/journal.pone.0127431] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/20/2014] [Accepted: 04/14/2015] [Indexed: 11/18/2022]
Abstract
Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of the alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference and corresponding query alignment. Both the SP and CS scores treat mismatches between a query and reference alignment as equally bad, and do not take the separation into account between two amino acids in the query alignment, that should have been matched according to the reference alignment. This is significant since the magnitude of alignment shifts is often of relevance in biological analyses, including homology modeling and MSA refinement/manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the alignment quality difference between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating SPdist score is also available upon request.
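The contrast this abstract draws can be made concrete with a small Python sketch restricted to pairwise alignments given as two gapped rows. `sp_score` follows the usual sum-of-pairs idea (the fraction of reference residue pairs reproduced by the query alignment). `mean_shift` is a simplified, hypothetical stand-in for a displacement-aware measure: it averages how many positions away the query places each mismatched partner. It is not the published SPdist formula, which the abstract does not spell out, and all function names are assumptions of this sketch.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Map residue index in sequence A to its aligned residue index in B."""
    pairs, i, j = {}, 0, 0
    for char_a, char_b in zip(row_a, row_b):
        if char_a != gap and char_b != gap:
            pairs[i] = j
        if char_a != gap:
            i += 1
        if char_b != gap:
            j += 1
    return pairs


def sp_score(reference, query):
    """Fraction of reference residue pairs that the query also aligns."""
    ref_pairs = aligned_pairs(*reference)
    query_pairs = aligned_pairs(*query)
    if not ref_pairs:
        return 0.0
    hits = sum(1 for i, j in ref_pairs.items() if query_pairs.get(i) == j)
    return hits / len(ref_pairs)


def mean_shift(reference, query):
    """Average displacement |j_ref - j_query| over mismatched residue pairs.

    Simplified illustration only: residues that the query aligns to a gap
    are skipped rather than penalized.
    """
    ref_pairs = aligned_pairs(*reference)
    query_pairs = aligned_pairs(*query)
    shifts = [abs(j - query_pairs[i]) for i, j in ref_pairs.items()
              if i in query_pairs and query_pairs[i] != j]
    return sum(shifts) / len(shifts) if shifts else 0.0
```

For the reference `("ACDE", "ACDE")`, the query alignments `("ACDE-", "-ACDE")` and `("ACDE---", "---ACDE")` both receive an SP score of 0, yet the first is off by one position and the second by three, which is the kind of difference a displacement-aware score is meant to expose.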
Affiliation(s)
- Punto Bawono
  - Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Arjan van der Velde
  - Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Sanne Abeln
  - Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa
  - Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
  - Amsterdam Institute for Molecules Medicines and Systems (AIMMS), VU University Amsterdam, Amsterdam, The Netherlands
9
Konagurthu AS, Kasarapu P, Allison L, Collier JH, Lesk AM. On sufficient statistics of least-squares superposition of vector sets. J Comput Biol 2015; 22:487-97. [PMID: 25695500 DOI: 10.1089/cmb.2014.0154] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
The problem of superposition of two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant time) computation of superpositions (and sufficient statistics) of vector sets that are composed from its constituent vector sets under addition or deletion operation, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of the methods that commonly superpose vector sets under addition or deletion operations, where previously these operations were carried out ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.
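The additivity described in this abstract can be sketched in plain Python. The particular summary used below (pair count, coordinate sums, sum of squared norms, and the 3×3 sum of outer products) is one set of quantities from which the least-squares residual can be recovered; whether it matches the statistics derived in the paper is an assumption. The optimal rotation is handled with Horn's quaternion formulation, and the largest eigenvalue of the 4×4 matrix is found by shifted power iteration purely to keep the example dependency-free; neither choice is claimed to be what the authors' C++ library does, and the class and method names are hypothetical.

```python
import math


class SuperpositionStats:
    """Additive summary of corresponding 3-D vector pairs (x_i, y_i)."""

    def __init__(self, n=0, sx=None, sy=None, sq=0.0, cross=None):
        self.n = n
        self.sx = sx or [0.0] * 3          # sum of x_i
        self.sy = sy or [0.0] * 3          # sum of y_i
        self.sq = sq                       # sum of |x_i|^2 + |y_i|^2
        self.cross = cross or [[0.0] * 3 for _ in range(3)]  # sum of x_i y_i^T

    @classmethod
    def from_pairs(cls, xs, ys):
        stats = cls()
        for x, y in zip(xs, ys):
            stats.n += 1
            for a in range(3):
                stats.sx[a] += x[a]
                stats.sy[a] += y[a]
                stats.sq += x[a] * x[a] + y[a] * y[a]
                for b in range(3):
                    stats.cross[a][b] += x[a] * y[b]
        return stats

    def __add__(self, other):
        """Merge two summaries in constant time, without revisiting vectors."""
        return SuperpositionStats(
            self.n + other.n,
            [p + q for p, q in zip(self.sx, other.sx)],
            [p + q for p, q in zip(self.sy, other.sy)],
            self.sq + other.sq,
            [[p + q for p, q in zip(r1, r2)]
             for r1, r2 in zip(self.cross, other.cross)])

    def rmsd(self, iterations=500):
        """Minimum RMSD over rotations and translations, from the summary only."""
        if self.n == 0:
            raise ValueError("no vector pairs")
        n = self.n
        # Remove the centroids: centered cross-covariance and squared norms.
        s = [[self.cross[a][b] - self.sx[a] * self.sy[b] / n for b in range(3)]
             for a in range(3)]
        g = (self.sq - sum(v * v for v in self.sx) / n
             - sum(v * v for v in self.sy) / n)
        (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = s
        # Horn's symmetric 4x4 matrix; its largest eigenvalue is the best
        # achievable sum of dot products between rotated x and y.
        m = [[xx + yy + zz, yz - zy, zx - xz, xy - yx],
             [yz - zy, xx - yy - zz, xy + yx, zx + xz],
             [zx - xz, xy + yx, -xx + yy - zz, yz + zy],
             [xy - yx, zx + xz, yz + zy, -xx - yy + zz]]
        shift = math.sqrt(sum(v * v for row in m for v in row))
        if shift == 0.0:
            return math.sqrt(max(g, 0.0) / n)
        v = [1.0, 0.5, 0.25, 0.125]
        for _ in range(iterations):  # power iteration on m + shift * I
            w = [sum(m[r][c] * v[c] for c in range(4)) + shift * v[r]
                 for r in range(4)]
            norm = math.sqrt(sum(t * t for t in w))
            v = [t / norm for t in w]
        lam = sum(v[r] * sum(m[r][c] * v[c] for c in range(4)) for r in range(4))
        return math.sqrt(max(g - 2.0 * lam, 0.0) / n)
```

Joining two already summarized fragments is then `stats_a + stats_b` followed by `.rmsd()`, and the cost of that step does not depend on how many residues the fragments contain. Deleting a fragment would work the same way with subtraction in place of addition.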
Affiliation(s)
- Arun S Konagurthu
  - Clayton School of Computer Science and Information Technology, Faculty of Information Technology, Monash University, Clayton, Australia
- Parthan Kasarapu
  - Clayton School of Computer Science and Information Technology, Faculty of Information Technology, Monash University, Clayton, Australia
- Lloyd Allison
  - Clayton School of Computer Science and Information Technology, Faculty of Information Technology, Monash University, Clayton, Australia
- James H Collier
  - Clayton School of Computer Science and Information Technology, Faculty of Information Technology, Monash University, Clayton, Australia
- Arthur M Lesk
  - The Huck Institute of Genomics, Proteomics and Bioinformatics, Pennsylvania State University, University Park, Pennsylvania
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania
10
Tong J, Pei J, Otwinowski Z, Grishin NV. Refinement by shifting secondary structure elements improves sequence alignments. Proteins 2015; 83:411-27. [PMID: 25546158 DOI: 10.1002/prot.24746] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/29/2014] [Revised: 11/25/2014] [Accepted: 12/10/2014] [Indexed: 01/09/2023]
Abstract
Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template-defined secondary structures. Our method SFESA is based on a novel scoring function that combines the profile-based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to overall increase in alignment accuracy. Evaluation of several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa.
Affiliation(s)
- Jing Tong
  - Department of Biophysics, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, 75390
  - Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, 75390
11
Gniewek P, Kolinski A, Kloczkowski A, Gront D. BioShell-Threading: versatile Monte Carlo package for protein 3D threading. BMC Bioinformatics 2014; 15:22. [PMID: 24444459 PMCID: PMC3937128 DOI: 10.1186/1471-2105-15-22] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 11/25/2012] [Accepted: 11/18/2013] [Indexed: 11/26/2022]
Abstract
Background: The comparative modeling approach to protein structure prediction inherently relies on a template structure. Before building a model, such a template protein has to be found and aligned with the query sequence. Any error made at this stage may dramatically affect the quality of the result. There is a need, therefore, to develop accurate and sensitive alignment protocols. Results: BioShell threading software is a versatile tool for aligning protein structures, protein sequences or sequence profiles and query sequences to template structures. The software is also capable of sub-optimal alignment generation. It can be executed as an application from the UNIX command line, or as a set of Java classes called from a script or a Java application. The implemented Monte Carlo search engine greatly facilitates the development and benchmarking of new alignment scoring schemes even when the functions exhibit non-deterministic polynomial-time complexity. Conclusions: Numerical experiments indicate that the new threading application offers template detection abilities and provides much better alignments than other methods. The package along with documentation and examples is available at: http://bioshell.pl/threading3d.
Affiliation(s)
- Dominik Gront
  - Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
12
Konagurthu AS, Kasarapu P, Allison L, Collier JH, Lesk AM. On Sufficient Statistics of Least-Squares Superposition of Vector Sets. LECTURE NOTES IN COMPUTER SCIENCE 2014. [DOI: 10.1007/978-3-319-05269-4_11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
13
Abstract
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures; however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
Affiliation(s)
- Abel Rodriguez
  - University of California, Santa Cruz and Duke University
14
Ma J, Wang S. Algorithms, Applications, and Challenges of Protein Structure Alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 94:121-75. [DOI: 10.1016/b978-0-12-800168-4.00005-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 12/29/2022]
15
Protein structure alignment beyond spatial proximity. Sci Rep 2013; 3:1448. [PMID: 23486213 PMCID: PMC3596798 DOI: 10.1038/srep01448] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.0] [Received: 11/15/2012] [Accepted: 02/25/2013] [Indexed: 11/08/2022]
Abstract
Protein structure alignment is a fundamental problem in computational structure biology. Many programs have been developed for automatic protein structure alignment, but most of them align two protein structures purely based upon geometric similarity without considering evolutionary and functional relationship. As such, these programs may generate structure alignments which are not very biologically meaningful from the evolutionary perspective. This paper presents a novel method DeepAlign for automatic pairwise protein structure alignment. DeepAlign aligns two protein structures using not only spatial proximity of equivalent residues (after rigid-body superposition), but also evolutionary relationship and hydrogen-bonding similarity. Experimental results show that DeepAlign can generate structure alignments much more consistent with manually-curated alignments than other automatic tools especially when proteins under consideration are remote homologs. These results imply that in addition to geometric similarity, evolutionary information and hydrogen-bonding similarity are essential to aligning two protein structures.
16
Abstract
MOTIVATION: To recognize remote relationships between RNA molecules, one must be able to align structures without regard to sequence similarity. We have implemented a method that is swift [O(n²)], sensitive and tolerant of large gaps and insertions. Molecules are broken into overlapping fragments, which are characterized by their memberships in a probabilistic classification based on local geometry and H-bonding descriptors. This leads to a probabilistic similarity measure that is used in a conventional dynamic programming method. RESULTS: Examples are given of database searching, of the detection of structural similarities that would not be found using sequence-based methods, and of comparisons with a previously published approach. AVAILABILITY AND IMPLEMENTATION: Source code (C and Perl) and binaries for Linux are freely available at www.zbh.uni-hamburg.de/fries.
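The pipeline outlined in this abstract (fragments described by class-membership probabilities, a similarity derived from them, and quadratic-time dynamic programming) can be sketched as follows. The details are assumptions made for illustration: the similarity is taken to be the dot product of two membership vectors, the alignment is a Smith-Waterman style local alignment with a linear gap penalty, and the `gap` and `offset` values are arbitrary. None of this is claimed to reproduce the published method's scoring or gap model.

```python
def similarity(p, q):
    """Assumed similarity: probability that two fragments share a class."""
    return sum(a * b for a, b in zip(p, q))


def local_alignment_score(frags_a, frags_b, gap=0.2, offset=0.5):
    """Best local alignment score between two lists of membership vectors.

    Runs in O(n * m) time and O(m) memory. `offset` is subtracted from each
    similarity so that unrelated fragment pairs score below zero.
    """
    best = 0.0
    prev = [0.0] * (len(frags_b) + 1)
    for frag_a in frags_a:
        cur = [0.0] * (len(frags_b) + 1)
        for j, frag_b in enumerate(frags_b, start=1):
            match = prev[j - 1] + similarity(frag_a, frag_b) - offset
            cur[j] = max(0.0, match, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

With one-hot membership vectors, four identical fragments in a row score 4 × (1 − 0.5) = 2.0, and inserting one unrelated fragment in the middle of the second molecule costs a single gap penalty rather than breaking the alignment.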
Affiliation(s)
- Tim Wiegels
  - Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, D-20146 Hamburg, Germany
17
Abstract
Motivation: Alignment errors are still the main bottleneck for current template-based protein modeling (TM) methods, including protein threading and homology modeling, especially when the sequence identity between two proteins under consideration is low (<30%). Results: We present a novel protein threading method, CNFpred, which achieves much more accurate sequence–template alignment by employing a probabilistic graphical model called a Conditional Neural Field (CNF), which aligns one protein sequence to its remote template using a non-linear scoring function. This scoring function accounts for correlation among a variety of protein sequence and structure features, makes use of information in the neighborhood of two residues to be aligned, and is thus much more sensitive than the widely used linear or profile-based scoring function. To train this CNF threading model, we employ a novel quality-sensitive method, instead of the standard maximum-likelihood method, to maximize directly the expected quality of the training set. Experimental results show that CNFpred generates significantly better alignments than the best profile-based and threading methods on several public (but small) benchmarks as well as our own large dataset. CNFpred outperforms others regardless of the lengths or classes of proteins, and works particularly well for proteins with sparse sequence profiles due to the effective utilization of structure information. Our methodology can also be adapted to protein sequence alignment. Contact: j3xu@ttic.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianzhu Ma
- Toyota Technological Institute at Chicago, IL 60637, USA
18
|
Ritchie DW, Ghoorah AW, Mavridis L, Venkatraman V. Fast protein structure alignment using Gaussian overlap scoring of backbone peptide fragment similarity. Bioinformatics 2012; 28:3274-81. [DOI: 10.1093/bioinformatics/bts618] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022] Open
19
|
Shah SB, Sahinidis NV. SAS-Pro: simultaneous residue assignment and structure superposition for protein structure alignment. PLoS One 2012; 7:e37493. [PMID: 22662161 PMCID: PMC3360771 DOI: 10.1371/journal.pone.0037493] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/29/2011] [Accepted: 04/24/2012] [Indexed: 11/19/2022] Open
Abstract
Protein structure alignment is the problem of determining an assignment between the amino-acid residues of two given proteins in a way that maximizes a measure of similarity between the two superimposed protein structures. By identifying geometric similarities, structure alignment algorithms provide critical insights into protein functional similarities. Existing structure alignment tools adopt a two-stage approach to structure alignment by decoupling and iterating between the assignment evaluation and structure superposition problems. We introduce a novel approach, SAS-Pro, which addresses the assignment evaluation and structure superposition simultaneously by formulating the alignment problem as a single bilevel optimization problem. The new formulation does not require the sequentiality constraints, thus generalizing the scope of the alignment methodology to include non-sequential protein alignments. We employ derivative-free optimization methodologies for searching for the global optimum of the highly nonlinear and non-differentiable RMSD function encountered in the proposed model. Alignments obtained with SAS-Pro have better RMSD values and larger lengths than those obtained from other alignment tools. For non-sequential alignment problems, SAS-Pro leads to alignments with high degree of similarity with known reference alignments. The source code of SAS-Pro is available for download at http://eudoxus.cheme.cmu.edu/saspro/SAS-Pro.html.
Affiliation(s)
- Shweta B. Shah
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Nikolaos V. Sahinidis
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
20
|
Abstract
Motivation: Structural alignment methods are widely used to generate gold standard alignments for improving multiple sequence alignments and transferring functional annotations, as well as for assigning structural distances between proteins. However, the correctness of the alignments generated by these methods is difficult to assess objectively since little is known about the exact evolutionary history of most proteins. Since homology is an equivalence relation, an upper bound on alignment quality can be found by assessing the consistency of alignments. Measuring the consistency of current methods of structure alignment and determining the causes of inconsistencies can, therefore, provide information on the quality of current methods and suggest possibilities for further improvement. Results: We analyze the self-consistency of seven widely-used structural alignment methods (SAP, TM-align, Fr-TM-align, MAMMOTH, DALI, CE and FATCAT) on a diverse, non-redundant set of 1863 domains from the SCOP database and demonstrate that even for relatively similar proteins the degree of inconsistency of the alignments on a residue level is high (30%). We further show that levels of consistency vary substantially between methods, with two methods (SAP and Fr-TM-align) producing more consistent alignments than the rest. Inconsistency is found to be higher near gaps and for proteins of low structural complexity, as well as for helices. The ability of the methods to identify good structural alignments is also assessed using geometric measures, for which FATCAT (flexible mode) is found to be the best performer despite being highly inconsistent. We conclude that there is substantial scope for improving the consistency of structural alignment methods. Contact: msadows@nimr.mrc.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
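The idea that homology is an equivalence relation implies transitivity: if residue a in protein A is aligned to b in B, and b to c in C, then a direct A-to-C alignment should also pair a with c. The sketch below shows one simple way such a check could be written. The dictionary representation of alignments and the function name `inconsistency` are assumptions for illustration, and the measure is a simplification, not necessarily the one used in the paper.

```python
def inconsistency(ab, bc, ac):
    """Fraction of residues where the path A->B->C disagrees with A->C.

    Each alignment is a dict mapping a residue index in the first protein
    to the aligned residue index in the second protein. Only residues
    that are aligned in all three alignments are counted.
    """
    checked = 0
    conflicts = 0
    for a, b in ab.items():
        if b in bc and a in ac:
            checked += 1
            if bc[b] != ac[a]:
                conflicts += 1
    return conflicts / checked if checked else 0.0


# Hypothetical pairwise alignments among three proteins A, B and C.
ab = {0: 0, 1: 1, 2: 2, 3: 3}
bc = {0: 0, 1: 1, 2: 3, 3: 4}
ac = {0: 0, 1: 1, 2: 2, 3: 4}
print(inconsistency(ab, bc, ac))  # residue 2 of A is inconsistent
```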
Affiliation(s)
- M I Sadowski
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, UK
21
|
Dukka BKC, Tomita E, Suzuki J, Horimoto K, Akutsu T. Protein threading with profiles and distance constraints using clique based algorithms. J Bioinform Comput Biol 2006; 4:19-42. [PMID: 16568540 DOI: 10.1142/s0219720006001680] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/15/2005] [Revised: 07/31/2005] [Accepted: 07/31/2005] [Indexed: 11/18/2022]
Abstract
With the advent of experimental technologies like chemical cross-linking, it has become possible to obtain distances between specific residues of a newly sequenced protein. These types of experiments usually are less time consuming than X-ray crystallography or NMR. Consequently, it is highly desired to develop a method that incorporates this distance information to improve the performance of protein threading methods. However, protein threading with profiles in which constraints on distances between residues are given is known to be NP-hard. By using the notion of a maximum edge-weight clique finding algorithm, we introduce a more efficient method called FTHREAD for profile threading with distance constraints that is 18 times faster than its predecessor CLIQUETHREAD. Moreover, we also present a novel practical algorithm NTHREAD for profile threading with Non-strict constraints. The overall performance of FTHREAD on a data set shows that although our algorithm uses a simple threading function, our algorithm performs equally well as some of the existing methods. Particularly, when there are some unsatisfied constraints, NTHREAD (Non-strict constraints threading algorithm) performs better than threading with FTHREAD (Strict constraints threading algorithm). We have also analyzed the effects of using a number of distance constraints. This algorithm helps the enhancement of alignment quality between the query sequence and template structure, once the corresponding template structure is determined for the target sequence.
Affiliation(s)
- Bahadur K C Dukka
- Graduate School of Informatics & Bioinformatics Center, Kyoto University, Kyoto 611-0001, Japan.
22
|
Hu Y, Dong X, Wu A, Cao Y, Tian L, Jiang T. Incorporation of local structural preference potential improves fold recognition. PLoS One 2011; 6:e17215. [PMID: 21365008 PMCID: PMC3041821 DOI: 10.1371/journal.pone.0017215] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 11/01/2010] [Accepted: 01/25/2011] [Indexed: 11/19/2022] Open
Abstract
Fold recognition, or threading, is a popular protein structure modeling approach that uses known structure templates to build structures for proteins of unknown structure. The key to the success of fold recognition methods lies in the proper integration of sequence, physicochemical and structural information. Here we introduce another type of information, local structural preference potentials of 3-residue and 9-residue fragments, for fold recognition. By combining the two local structural preference potentials with the widely used sequence profile, secondary structure information and hydrophobic score, we have developed a new threading method called FR-t5 (fold recognition by use of 5 terms). In benchmark tests, we have found that the consideration of local structural preference potentials in FR-t5 not only greatly enhances the alignment accuracy and recognition sensitivity, but also significantly improves the quality of prediction models.
Affiliation(s)
- Yun Hu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Xiaoxi Dong
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Aiping Wu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Yang Cao
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Liqing Tian
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Taijiao Jiang
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
23
|
Abstract
Despite its apparent simplicity, the problem of quantifying the differences between two structures of the same protein or complex is nontrivial and continues to evolve. In this chapter, we describe several methods routinely used to compare computational models to experimental answers in several modeling assessments. The two major classes of measures, positional distance-based and contact-based, are presented, compared, and analyzed. The most popular measure of the first class, the global RMSD, is shown to be the least representative of the degree of structural similarity because it is dominated by the largest error. Several distance-dependent algorithms designed to attenuate the drawbacks of RMSD are described. Measures of the second class, contact-based, are shown to be more robust and relevant. We also illustrate the importance of using combined measures, utility-based measures, and the role of the distributions derived from the pairs of experimental structures in interpreting the results.
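The remark that global RMSD is dominated by the largest error can be made concrete with a small numerical sketch. It assumes the two coordinate sets are already superposed and uses made-up coordinates; the function name `rmsd` is illustrative only and the optimal superposition step is deliberately left out.

```python
import math


def rmsd(xs, ys):
    """Root-mean-square deviation between two equal-length lists of 3D points.

    Assumes the structures are already superposed; no fitting is done here.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(xs, ys))
    return math.sqrt(total / len(xs))


# Reference: ten atoms spaced along the x axis (made-up coordinates).
reference = [(float(i), 0.0, 0.0) for i in range(10)]

# Model 1: nine atoms are exact, one atom is displaced by 10 units.
one_outlier = list(reference)
one_outlier[9] = (9.0, 10.0, 0.0)

# Model 2: every atom is displaced by 1 unit.
all_shifted = [(x, y + 1.0, z) for x, y, z in reference]

print(rmsd(reference, one_outlier))  # sqrt(100 / 10), roughly 3.16
print(rmsd(reference, all_shifted))  # 1.0
```

Under these assumptions, the model with nine perfect atoms and a single large error scores worse than the model in which every atom is off by one unit, which is the behavior the abstract criticizes.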
Affiliation(s)
- Irina Kufareva
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, USA
24
|
Abstract
Motivation: The challenge of template-based modeling lies in the recognition of correct templates and generation of accurate sequence-template alignments. Homologous information has proved to be very powerful in detecting remote homologs, as demonstrated by the state-of-the-art profile-based method HHpred. However, HHpred does not fare well when proteins under consideration are low-homology. A protein is low-homology if we cannot obtain a sufficient amount of homologous information for it from existing protein sequence databases. Results: We present a profile-entropy dependent scoring function for low-homology protein threading. This method will model correlation among various protein features and determine their relative importance according to the amount of homologous information available. When proteins under consideration are low-homology, our method will rely more on structure information; otherwise, it will rely more on homologous information. Experimental results indicate that our threading method greatly outperforms the best profile-based method HHpred and all the top CASP8 servers on low-homology proteins. Tested on the CASP8 hard targets, our threading method is also better than all the top CASP8 servers but slightly worse than Zhang-Server. This is significant considering that Zhang-Server and other top CASP8 servers use a combination of multiple structure-prediction techniques including consensus method, multiple-template modeling, template-free modeling and model refinement while our method is a classical single-template-based threading method without any post-threading refinement. Contact: jinboxu@gmail.com
Affiliation(s)
- Jian Peng
- Toyota Technological Institute at Chicago, IL 60637, USA
25
|
Abstract
We developed and tested RAPTOR++ in CASP8 for protein structure prediction. RAPTOR++ contains four modules: threading, model quality assessment, multiple protein alignment, and template-free modeling. RAPTOR++ first threads a target protein to all the templates using three methods and then predicts the quality of the 3D model implied by each alignment using a model quality assessment method. Based upon the predicted quality, RAPTOR++ employs different strategies as follows. If multiple alignments have good quality, RAPTOR++ builds a multiple protein alignment between the target and top templates and then generates a 3D model using MODELLER. If all the alignments have very low quality, RAPTOR++ uses template-free modeling. Otherwise, RAPTOR++ submits a threading-generated 3D model with the best quality. RAPTOR++ was not ready for the first 1/3 targets and was under development during the whole CASP8 season. The template-based and template-free modeling modules in RAPTOR++ are not closely integrated. We are using our template-free modeling technique to refine template-based models.
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, Illinois 60637, USA.
26
|
Chi PH, Pang B, Korkin D, Shyu CR. Efficient SCOP-fold classification and retrieval using index-based protein substructure alignments. Bioinformatics 2009; 25:2559-65. [PMID: 19667079 DOI: 10.1093/bioinformatics/btp474] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION To investigate structure-function relationships, life sciences researchers usually retrieve and classify proteins with similar substructures into the same fold. A manually constructed database, SCOP, is believed to be highly accurate; however, it is labor intensive. Another known method, DALI, is also precise but computationally expensive. We have developed an efficient algorithm, namely, index-based protein substructure alignment (IPSA), for protein-fold classification. IPSA constructs a two-layer indexing tree to quickly retrieve similar substructures in proteins and suggests possible folds by aligning these substructures. RESULTS Compared with known algorithms, such as DALI, CE, MultiProt and MAMMOTH, on a sample dataset of non-redundant proteins from SCOP v1.73, IPSA exhibits an efficiency improvement of 53.10, 16.87, 3.60 and 1.64 times speedup, respectively. Evaluated on three different datasets of non-redundant proteins from SCOP, average accuracy of IPSA is approximately equal to DALI and better than CE, MAMMOTH, MultiProt and SSM. With reliable accuracy and efficiency, this work will benefit the study of high-throughput protein structure-function relationships. AVAILABILITY IPSA is publicly accessible at http://ProteinDBS.rnet.missouri.edu/IPSA.php
Affiliation(s)
- Pin-Hao Chi
- Medical and Biological Digital Library Research Lab, Informatics Institute, University of Missouri, Columbia, MO 65211, USA
27
|
Hasegawa H, Holm L. Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol 2009; 19:341-8. [PMID: 19481444 DOI: 10.1016/j.sbi.2009.04.003] [Citation(s) in RCA: 303] [Impact Index Per Article: 20.2] [Received: 04/02/2009] [Accepted: 04/16/2009] [Indexed: 11/30/2022]
Abstract
Structure comparison opens a window into the distant past of protein evolution, which has been unreachable by sequence comparison alone. With 55,000 entries in the Protein Data Bank and about 500 new structures added each week, automated processing, comparison, and classification are necessary. A variety of methods use different representations, scoring functions, and optimization algorithms, and they generate contradictory results even for moderately distant structures. Sequence mutations, insertions, and deletions are accommodated by plastic deformations of the common core, retaining the precise geometry of the active site, and peripheral regions may refold completely. Therefore structure comparison methods that allow for flexibility and plasticity generate the most biologically meaningful alignments. Active research directions include both the search for fold invariant features and the modeling of structural transitions in evolution. Advances have been made in algorithmic robustness, multiple alignment, and speeding up database searches.
Affiliation(s)
- Hitomi Hasegawa
- Institute of Biotechnology, University of Helsinki, P.O. Box 56 (Viikinkaari 5), 00014 University of Helsinki, Finland
28
|
Abstract
Protein structures often show similarities to one another that would not be seen at the sequence level. Given the coordinates of a protein chain, the SALAMI server at www.zbh.uni-hamburg.de/salami will search the Protein Data Bank and return a set of similar structures without using sequence information. The results page lists the related proteins, details of the sequence and structure similarity and implied sequence alignments. Via a simple structure viewer, one can view superpositions of query and library structures and finally download superimposed coordinates. The alignment method is very tolerant of large gaps and insertions, and tends to produce slightly longer alignments than other similar programs.
Affiliation(s)
- Thomas Margraf
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany.
29
|
Peng J, Xu J. Boosting Protein Threading Accuracy. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY : ... ANNUAL INTERNATIONAL CONFERENCE, RECOMB ... : PROCEEDINGS. RECOMB (CONFERENCE : 2005- ) 2009; 5541:31-45. [PMID: 22506254 DOI: 10.1007/978-3-642-02008-7_3] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Indexed: 12/12/2022]
Abstract
Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods use a scoring function linearly combining sequence and structure features to measure the quality of a sequence-template alignment so that a dynamic programming algorithm can be used to optimize the scoring function. However, a linear scoring function cannot fully exploit interdependency among features and thus limits alignment accuracy. This paper presents a nonlinear scoring function for protein threading, which not only can model interactions among different protein features, but also can be efficiently optimized using a dynamic programming algorithm. We achieve this by modeling the threading problem using a probabilistic graphical model, Conditional Random Fields (CRF), and training the model using the gradient tree boosting algorithm. The resultant model is a nonlinear scoring function consisting of a collection of regression trees. Each regression tree models a type of nonlinear relationship among sequence and structure features. Experimental results indicate that this new threading model can effectively leverage weak biological signals and improve both alignment accuracy and fold recognition rate greatly.
30
|
Abstract
MOTIVATION The recent discovery of tiny RNA molecules such as microRNAs and small interfering RNA is transforming the view of RNA as a simple information transfer molecule. Similar to proteins, the native three-dimensional structure of RNA determines its biological activity. Therefore, classifying the current structural space is paramount for functionally annotating RNA molecules. The increasing number of RNA structures deposited in the PDB requires more accurate, automatic and benchmarked methods for RNA structure comparison. In this article, we introduce a new algorithm for RNA structure alignment based on a unit-vector approach. The algorithm has been implemented in the SARA program, which produces pairwise RNA structure alignments and their statistical significance. RESULTS The SARA program has been implemented to be of general applicability even when no secondary structure can be calculated from the RNA structures. A benchmark against the ARTS program using a set of 1275 non-redundant pairwise structure alignments results in approximately 6% extra alignments with at least 50% structurally superposed nucleotides and base pairs. A first attempt to perform RNA automatic functional annotation based on structure alignments indicates that SARA can correctly assign the deepest SCOR classification to >60% of the query structures. AVAILABILITY The SARA program is freely available through a World Wide Web server http://sgu.bioinfo.cipf.es/services/SARA/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Emidio Capriotti
- Bioinformatics and Genomics Department, Structural Genomics Unit, Centro de Investigación Príncipe Felipe, Valencia, Spain
31
|
Radauer C, Lackner P, Breiteneder H. The Bet v 1 fold: an ancient, versatile scaffold for binding of large, hydrophobic ligands. BMC Evol Biol 2008; 8:286. [PMID: 18922149 PMCID: PMC2577659 DOI: 10.1186/1471-2148-8-286] [Citation(s) in RCA: 196] [Impact Index Per Article: 12.3] [Received: 06/05/2008] [Accepted: 10/15/2008] [Indexed: 11/10/2022] Open
Abstract
Background The major birch pollen allergen, Bet v 1, is a member of the ubiquitous PR-10 family of plant pathogenesis-related proteins. In recent years, a number of diverse plant proteins with low sequence similarity to Bet v 1 was identified. In addition, determination of the Bet v 1 structure revealed the existence of a large superfamily of structurally related proteins. In this study, we aimed to identify and classify all Bet v 1-related structures from the Protein Data Bank and all Bet v 1-related sequences from the Uniprot database. Results Structural comparisons of representative members of already known protein families structurally related to Bet v 1 with all entries of the Protein Data Bank yielded 47 structures with non-identical sequences. They were classified into eleven families, five of which were newly identified and not included in the Structural Classification of Proteins database release 1.71. The taxonomic distribution of these families extracted from the Pfam protein family database showed that members of the polyketide cyclase family and the activator of Hsp90 ATPase homologue 1 family were distributed among all three superkingdoms, while members of some bacterial families were confined to a small number of species. Comparison of ligand binding activities of Bet v 1-like superfamily members revealed that their functions were related to binding and metabolism of large, hydrophobic compounds such as lipids, hormones, and antibiotics. Phylogenetic relationships within the Bet v 1 family, defined as the group of proteins with significant sequence similarity to Bet v 1, were determined by aligning 264 Bet v 1-related sequences. A distance-based phylogenetic tree yielded a classification into 11 subfamilies, nine exclusively containing plant sequences and two subfamilies of bacterial proteins. 
Plant sequences included the pathogenesis-related proteins 10, the major latex proteins/ripening-related proteins subfamily, and polyketide cyclase-like sequences. Conclusion The ubiquitous distribution of Bet v 1-related proteins among all superkingdoms suggests that a Bet v 1-like protein was already present in the last universal common ancestor. During evolution, this protein diversified into numerous families with low sequence similarity but with a common fold that succeeded as a versatile scaffold for binding of bulky ligands.
Affiliation(s)
- Christian Radauer
- Department of Pathophysiology, Medical University of Vienna, Währinger Gürtel 18-20, 1090 Vienna, Austria.
32
|
Mosca R, Brannetti B, Schneider TR. Alignment of protein structures in the presence of domain motions. BMC Bioinformatics 2008; 9:352. [PMID: 18727838 PMCID: PMC2535786 DOI: 10.1186/1471-2105-9-352] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 03/12/2008] [Accepted: 08/27/2008] [Indexed: 11/22/2022] Open
Abstract
Background Structural alignment is an important step in protein comparison. Well-established methods exist for solving this problem under the assumption that the structures under comparison are considered as rigid bodies. However, proteins are flexible entities often undergoing movements that alter the positions of domains or subdomains with respect to each other. Such movements can impede the identification of structural equivalences when rigid aligners are used. Results We introduce a new method called RAPIDO (Rapid Alignment of Proteins in terms of Domains) for the three-dimensional alignment of protein structures in the presence of conformational changes. The flexible aligner is coupled to a genetic algorithm for the identification of structurally conserved regions. RAPIDO is capable of aligning protein structures in the presence of large conformational changes. Structurally conserved regions are reliably detected even if they are discontinuous in sequence but continuous in space and can be used for superpositions revealing subtle differences. Conclusion RAPIDO is more sensitive than other flexible aligners when applied to cases of closely homologous proteins undergoing large conformational changes. When applied to a set of kinase structures it is able to detect similarities that are missed by other alignment algorithms. The algorithm is sufficiently fast to be applied to the comparison of large sets of protein structures.
Affiliation(s)
- Roberto Mosca
- IFOM, FIRC Institute for Molecular Oncology Foundation, Via Adamello 16, 20139, Milan, Italy.
33
|
Wang S, Zheng WM. CLePAPS: fast pair alignment of protein structures based on conformational letters. J Bioinform Comput Biol 2008; 6:347-66. [PMID: 18464327 DOI: 10.1142/s0219720008003461] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 10/20/2007] [Revised: 11/22/2007] [Accepted: 12/05/2007] [Indexed: 11/18/2022]
Abstract
Fast, efficient, and reliable algorithms for pairwise alignment of protein structures are in ever-increasing demand for analyzing the rapidly growing data on protein structures. CLePAPS is a tool developed for this purpose. It distinguishes itself from other existing algorithms by the use of conformational letters, which are discretized states of 3D segmental structural states. A letter corresponds to a cluster of combinations of the three angles formed by Cα pseudobonds of four contiguous residues. A substitution matrix called CLESUM is available to measure the similarity between any two such letters. CLePAPS regards an aligned fragment pair (AFP) as an ungapped string pair with a high sum of pairwise CLESUM scores. Using CLESUM scores as the similarity measure, CLePAPS searches for AFPs by simple string comparison. The transformation which best superimposes a highly similar AFP can be used to superimpose the structure pairs under comparison. A highly scored AFP which is consistent with several other AFPs determines an initial alignment. CLePAPS then joins consistent AFPs guided by their similarity scores to extend the alignment by several "zoom-in" iteration steps. A follow-up refinement produces the final alignment. CLePAPS does not implement dynamic programming. The utility of CLePAPS is tested on various protein structure pairs.
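The three angles formed by the pseudobonds of four contiguous alpha carbons can be read as two bending angles and one torsion angle. The sketch below computes them for one four-residue window under that reading. The function names and the toy coordinates are hypothetical, and the clustering of angle triples into conformational letters and the CLESUM matrix are not reproduced here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bend(p, q, r):
    """Bending angle at q, in degrees, between pseudobonds q-p and q-r."""
    u, v = _sub(p, q), _sub(r, q)
    cos_angle = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def torsion(p0, p1, p2, p3):
    """Torsion angle, in degrees, about the p1-p2 pseudobond."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, _cross(b1, b2))
    x = _dot(_cross(b0, b1), _cross(b1, b2))
    return math.degrees(math.atan2(y, x))


def window_angles(ca):
    """Return (bend at residue 2, bend at residue 3, torsion) for 4 points."""
    p0, p1, p2, p3 = ca
    return bend(p0, p1, p2), bend(p1, p2, p3), torsion(p0, p1, p2, p3)


# Toy coordinates for four consecutive alpha carbons (not real protein data).
window = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(window_angles(window))
```

Sliding such a window along a chain would give one angle triple per position, which is the kind of input a discretization into letters could start from.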
Affiliation(s)
- Sheng Wang
- Institute of Theoretical Physics, Academia Sinica, Beijing 100080, China
34
|
Ahola V, Aittokallio T, Vihinen M, Uusipaikka E. Model-based prediction of sequence alignment quality. Bioinformatics 2008; 24:2165-71. [DOI: 10.1093/bioinformatics/btn414] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
35
|
Isomaltose production by modification of the fructose-binding site on the basis of the predicted structure of sucrose isomerase from "Protaminobacter rubrum". Appl Environ Microbiol 2008; 74:5183-94. [PMID: 18552181 DOI: 10.1128/aem.00181-08] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/20/2022] Open
Abstract
"Protaminobacter rubrum" sucrose isomerase (SI) catalyzes the isomerization of sucrose to isomaltulose and trehalulose. SI catalyzes the hydrolysis of the glycosidic bond with retention of the anomeric configuration via a mechanism that involves a covalent glycosyl enzyme intermediate. It possesses a ³²⁵RLDRD³²⁹ motif, which is highly conserved and plays an important role in fructose binding. The predicted three-dimensional active-site structure of SI was superimposed on and compared with those of other α-glucosidases in family 13. We identified two Arg residues that may play important roles in SI-substrate binding with weak ionic strength. Mutations at Arg325 and Arg328 in the fructose-binding site reduced isomaltulose production and slightly increased trehalulose production. In addition, the perturbed interactions between the mutated residues and fructose at the fructose-binding site seemed to have altered the binding affinity of the site, where glucose could now bind and be utilized as a second substrate for isomaltose production. From eight mutant enzymes designed based on structural analysis, the R325Q mutant enzyme exhibiting high relative activity for isomaltose production was selected. We recorded 40.0% relative activity at 15% (wt/vol) additive glucose with no temperature shift; the maximum isomaltose concentration and production yield were 57.9 g liter⁻¹ and 0.55 g of isomaltose/g of sucrose, respectively. Furthermore, isomaltose production increased with temperature but decreased at a temperature of >35 °C. Maximum isomaltose production (75.7 g liter⁻¹) was recorded at 35 °C, and its yield for the consumed sucrose was 0.61 g g⁻¹ with the addition of 15% (wt/vol) glucose. The relative activity for isomaltose production increased progressively with temperature and reached 45.9% under the same conditions.
36
|
Liu ZP, Wu LY, Wang Y, Zhang XS, Chen L. Bridging protein local structures and protein functions. Amino Acids 2008; 35:627-50. [PMID: 18421562 PMCID: PMC7088341 DOI: 10.1007/s00726-008-0088-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/21/2008] [Accepted: 03/10/2008] [Indexed: 12/11/2022]
Abstract
One of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions. In this review, we summarize the repertoire of methods currently being applied and report recent progress in the field of in silico annotation of protein function based on the accumulation of vast amounts of sequence and structure data. In particular, we emphasize the newly developed structure-based methods, which are able to identify local structural motifs and reveal their relationship with protein functions. These methods include computational tools to identify the structural motifs and reveal the strong relationship between these pre-computed local structures and protein functions. We also discuss remaining problems and possible directions for this exciting and challenging area.
Affiliation(s)
- Zhi-Ping Liu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100080, Beijing, China
|
37
|
Schenk G, Margraf T, Torda AE. Protein sequence and structure alignments within one framework. Algorithms Mol Biol 2008; 3:4. [PMID: 18380904 PMCID: PMC2390564 DOI: 10.1186/1748-7188-3-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 01/29/2008] [Accepted: 04/01/2008] [Indexed: 11/19/2022] Open
Abstract
Background: Protein structure alignments are usually based on very different techniques to sequence alignments. We propose a method which treats sequence, structure and even combined sequence + structure in a single framework. Using a probabilistic approach, we calculate a similarity measure which can be applied to fragments containing only protein sequence, structure or both simultaneously. Results: Proof-of-concept results are given for the different problems. For sequence alignments, the methodology is no better than conventional methods. For structure alignments, the techniques are very fast, reliable and tolerant of a range of alignment parameters. Combined sequence and structure alignments may provide a more reliable alignment for pairs of proteins where pure structural alignments can be misled by repetitive elements or apparent symmetries. Conclusion: The probabilistic framework has an elegance in principle, merging sequence and structure descriptors into a single framework. It has a practical use in fast structural alignments and a potential use in finding those examples where sequence and structural similarities apparently disagree.
|
38
|
Abstract
This article describes the general quality of models of three dimensional structure submitted to CASP7 and analyzes progress since the previous experiment, primarily using measures that were used in earlier analyses. Overall improvement in model accuracy compared to CASP6 is modest, but there are two developments of note: server performance has moved closer to that of humans, and there has been a significant improvement in the fraction of targets for which the best model is superior to that obtainable using knowledge of a single best template structure.
|
39
|
Zemla AT, Zhou CLE. Structural Re-Alignment in an Immunogenic Surface Region of Ricin A Chain. Bioinform Biol Insights 2008; 2:5-13. [PMID: 19812763 PMCID: PMC2735970 DOI: 10.4137/bbi.s437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
We compared structure alignments generated by several protein structure comparison programs to determine whether existing methods would satisfactorily align residues at a highly conserved position within an immunogenic loop in ribosome inactivating proteins (RIPs). Using default settings, structure alignments generated by several programs (CE, DaliLite, FATCAT, LGA, MAMMOTH, MATRAS, SHEBA, SSM) failed to align the respective conserved residues, although LGA reported correct residue-residue (R-R) correspondences when the beta-carbon (Cb) position was used as the point of reference in the alignment calculations. Further tests using variable points of reference indicated that points distal from the beta carbon along a vector connecting the alpha and beta carbons yielded rigid structural alignments in which residues known to be highly conserved in RIPs were reported as corresponding residues in structural comparisons between ricin A chain, abrin-A, and other RIPs. Results suggest that approaches to structure alignment employing alternate point representations corresponding to side chain position may yield structure alignments that are more consistent with observed conservation of functional surface residues than do standard alignment programs, which apply uniform criteria for alignment (i.e. alpha carbon (Ca) as point of reference) along the entirety of the peptide chain. We present the results of tests that suggest the utility of allowing user-specified points of reference in generating alternate structural alignments, and we present a web server for automatically generating such alignments: http://as2ts.llnl.gov/AS2TS/LGA/lga_pdblist_plots.html.
Affiliation(s)
- Adam T. Zemla
- Computational Biology for Countermeasures Group, Lawrence Livermore National Laboratory, Livermore, CA, U.S.A. 94550
- Carol L. Ecale Zhou
- Computational Biology for Countermeasures Group, Lawrence Livermore National Laboratory, Livermore, CA, U.S.A. 94550
|
40
|
Affiliation(s)
- Cédric Notredame
- Information Génomique et Structurale, CNRS UPR2589, Institute for Structural Biology and Microbiology, Parc Scientifique de Luminy, Marseille, France.
|
41
|
Prlić A, Down TA, Kulesha E, Finn RD, Kähäri A, Hubbard TJP. Integrating sequence and structural biology with DAS. BMC Bioinformatics 2007; 8:333. [PMID: 17850653 PMCID: PMC2031907 DOI: 10.1186/1471-2105-8-333] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Received: 05/30/2007] [Accepted: 09/12/2007] [Indexed: 11/16/2022] Open
Abstract
Background: The Distributed Annotation System (DAS) is a network protocol for exchanging biological data. It is frequently used to share annotations of genomes and protein sequences. Results: Here we present several extensions to the current DAS 1.5 protocol. These provide new commands to share alignments and three-dimensional molecular structure data, add the possibility of registering and discovering DAS servers, and provide a convention for supplying different types of data plots. We present examples of web sites and applications that use the new extensions. We operate a public registry of DAS sources, which now includes entries for more than 250 distinct sources. Conclusion: Our DAS extensions are essential for the management of the growing number of services and exchange of diverse biological data sets. In addition, the extensions allow new types of applications to be developed and scientific questions to be addressed. The registry of DAS sources is available at
Affiliation(s)
- Andreas Prlić
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Thomas A Down
- Wellcome Trust/Cancer Research UK Gurdon Institute, Cambridge University, Cambridge, UK
- Eugene Kulesha
- European Bioinformatics Institute, Hinxton, Cambridge, UK
- Robert D Finn
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Andreas Kähäri
- European Bioinformatics Institute, Hinxton, Cambridge, UK
- Tim JP Hubbard
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
|
42
|
Pham Y, Li L, Kim A, Erdogan O, Weinreb V, Butterfoss GL, Kuhlman B, Carter CW. A minimal TrpRS catalytic domain supports sense/antisense ancestry of class I and II aminoacyl-tRNA synthetases. Mol Cell 2007; 25:851-62. [PMID: 17386262 DOI: 10.1016/j.molcel.2007.02.010] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Received: 09/02/2006] [Revised: 01/03/2007] [Accepted: 02/05/2007] [Indexed: 10/23/2022]
Abstract
The emergence of polypeptide catalysts for amino acid activation, the slowest step in protein synthesis, poses a significant puzzle associated with the origin of biology. This problem is compounded as the 20 contemporary aminoacyl-tRNA synthetases belong to two quite distinct families. We describe here the use of protein design to show experimentally that a minimal class I aminoacyl-tRNA synthetase active site might have functioned in the distant past. We deleted the anticodon binding domain from tryptophanyl-tRNA synthetase and fused the discontinuous segments comprising its active site. The resulting 130 residue minimal catalytic domain activates tryptophan. This residual catalytic activity constitutes the first experimental evidence that the conserved class I signature sequences, HIGH and KMSKS, might have arisen in-frame, opposite motifs 2 and 1 from class II, as complementary sense and antisense strands of the same ancestral gene.
Affiliation(s)
- Yen Pham
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
43
|
Cardozo T, Kimura T, Philpott S, Weiser B, Burger H, Zolla-Pazner S. Structural basis for coreceptor selectivity by the HIV type 1 V3 loop. AIDS Res Hum Retroviruses 2007; 23:415-26. [PMID: 17411375 DOI: 10.1089/aid.2006.0130] [Citation(s) in RCA: 134] [Impact Index Per Article: 7.9] [Indexed: 11/13/2022] Open
Abstract
The third variable region (V3) of the HIV-1 surface glycoprotein, gp120, plays a central role in the interaction of the virus envelope with the cell surface chemokine receptors, triggering membrane fusion and virus entry into human lymphocytes and macrophages. The CXCR4 and CCR5 chemokine receptors are used by "X4-tropic" and "R5-tropic" viruses, respectively. Recently, the crown of the V3 loop was shown to bear a close structural homology to the beta2-beta3 loop in the CXC and CC chemokines, the natural ligands of CXCR4 and CCR5, respectively. This homology can serve as the foundation for 3D molecular modeling of the V3 loops from primary isolates whose coreceptor usage was experimentally defined. The modeling revealed a charged "patch" on the surface of V3 that correlates with coreceptor usage. This V3 surface patch is positively charged in X4-tropic viruses and negatively charged or neutral in R5-tropic viruses, and is formed by two amino acids, at position 11 and at position 24 or 25; amino acids 11 and 24 or 11 and 25 contact each other in 3D space. Residues at positions 11 and 25 were known previously to influence coreceptor usage, and the charge of the residues at these two positions is often used to predict viral tropism. However, we found that the predictive value of using the charge of residues 11, 24, and 25 to identify X4 or R5 tropism was improved over using only the charge of residues 11 and 25. Thus, the data suggest a new "11/24/25 rule": a positively charged amino acid at position 11, 24, or 25 defines X4; otherwise R5. This rule gave an overall predictive value of 94% for 217 viruses whose tropism had been determined experimentally as either X4 or R5. The results have additional implications for the design of HIV therapeutics, vaccines, and strategies for monitoring disease progression.
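The "11/24/25 rule" stated in this abstract can be sketched as a short function. This is only an illustration under stated assumptions, not the authors' implementation: the function name `predict_tropism` is hypothetical, positions are taken as 1-based indices into an ungapped V3 loop sequence, and "positively charged" is assumed to mean lysine (K) or arginine (R).

```python
# Illustrative sketch of the "11/24/25 rule" from the abstract above.
# Assumptions (not from the source): K and R are the positively charged
# residues, and positions index an ungapped V3 sequence starting at 1.
POSITIVE_RESIDUES = frozenset("KR")
RULE_POSITIONS = (11, 24, 25)


def predict_tropism(v3_sequence: str) -> str:
    """Return "X4" if position 11, 24, or 25 is positively charged, else "R5"."""
    seq = v3_sequence.upper()
    for position in RULE_POSITIONS:
        # Skip positions beyond the end of a short or truncated sequence.
        if position <= len(seq) and seq[position - 1] in POSITIVE_RESIDUES:
            return "X4"
    return "R5"


if __name__ == "__main__":
    # Synthetic sequences for demonstration only, not real isolates.
    neutral = "A" * 35
    charged_at_11 = "A" * 10 + "R" + "A" * 24
    print(predict_tropism(neutral))
    print(predict_tropism(charged_at_11))
```

In practice, V3 position numbering depends on aligning each isolate to a reference sequence, a step this sketch leaves out.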
Affiliation(s)
- Timothy Cardozo
- Department of Pharmacology, New York University School of Medicine, New York, NY 10016, USA
|
44
|
Baussand J, Deremble C, Carbone A. Periodic distributions of hydrophobic amino acids allows the definition of fundamental building blocks to align distantly related proteins. Proteins 2007; 67:695-708. [PMID: 17299747 DOI: 10.1002/prot.21319] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
Several studies on large and small families of proteins proved in a general manner that hydrophobic amino acids are globally conserved even if they are subjected to high rate substitution. Statistical analysis of amino acids evolution within blocks of hydrophobic amino acids detected in sequences suggests their usage as a basic structural pattern to align pairs of proteins of less than 25% sequence identity, with no need of knowing their 3D structure. The authors present a new global alignment method and an automatic tool for Proteins with HYdrophobic Blocks ALignment (PHYBAL) based on the combinatorics of overlapping hydrophobic blocks. Two substitution matrices modeling a different selective pressure inside and outside hydrophobic blocks are constructed, the Inside Hydrophobic Blocks Matrix and the Outside Hydrophobic Blocks Matrix, and a 4D space of gap values is explored. PHYBAL performance is evaluated against Needleman and Wunsch algorithm run with Blosum 30, Blosum 45, Blosum 62, Gonnet, HSDM, PAM250, Johnson and Remote Homo matrices. PHYBAL behavior is analyzed on eight randomly selected pairs of proteins of >30% sequence identity that cover a large spectrum of structural properties. It is also validated on two large datasets, the 127 pairs of the Domingues dataset with >30% sequence identity, and 181 pairs issued from BAliBASE 2.0 and ranked by percentage of identity from 7 to 25%. Results confirm the importance of considering substitution matrices modeling hydrophobic contexts and a 4D space of gap values in aligning distantly related proteins. Two new notions of local and global stability are defined to assess the robustness of an alignment algorithm and the accuracy of PHYBAL. A new notion, the SAD-coefficient, to assess the difficulty of structural alignment is also introduced. PHYBAL has been compared with Hydrophobic Cluster Analysis and HMMSUM methods.
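The Needleman and Wunsch baseline named in this abstract can be sketched as follows. This is a minimal illustration, not PHYBAL and not the benchmark setup used by the authors: it assumes a simple match/mismatch score in place of the Blosum, Gonnet, or PAM matrices listed above, uses a single linear gap penalty, and the function name `needleman_wunsch_score` is hypothetical.

```python
# Minimal Needleman-Wunsch global alignment score with a linear gap penalty.
# Assumption: match/mismatch scoring stands in for a real substitution matrix.
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of a against the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "ACGT"))
    print(needleman_wunsch_score("AC", "A"))
```

Keeping only two rows of the dynamic programming table is enough when the score alone is needed; recovering the alignment itself would require the full table or a traceback structure.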
Affiliation(s)
- J Baussand
- Génomique Analytique, INSERM UMRS511, Université Pierre et Marie Curie-Paris 6, 91, Bd de l'Hôpital, 75013 Paris, France
|
45
|
Leslin CM, Abyzov A, Ilyin VA. TOPOFIT-DB, a database of protein structural alignments based on the TOPOFIT method. Nucleic Acids Res 2006; 35:D317-21. [PMID: 17065464 PMCID: PMC1635338 DOI: 10.1093/nar/gkl809] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022] Open
Abstract
TOPOFIT-DB (T-DB) is a public web-based database of protein structural alignments based on the TOPOFIT method, providing a comprehensive resource for comparative analysis of protein structure families. The TOPOFIT method is based on the discovery of a saturation point on the alignment curve (topomax point) which presents an ability to objectively identify a border between common and variable parts in a protein structural family, providing additional insight into protein comparison and functional annotation. TOPOFIT also effectively detects non-sequential relations between protein structures. T-DB provides users with the convenient ability to retrieve and analyze structural neighbors for a protein; do one-to-all calculation of a user provided structure against the entire current PDB release with T-Server, and pair-wise comparison using the TOPOFIT method through the T-Pair web page. All outputs are reported in various web-based tables and graphics, with automated viewing of the structure-sequence alignments in the Friend software package for complete, detailed analysis. T-DB presents researchers with the opportunity for comprehensive studies of the variability in proteins and is publicly available at .
Affiliation(s)
- Valentin A. Ilyin
- To whom correspondence should be addressed. Tel: +1 617 373 7048; Fax: +1 617 373 3724;
|
46
|
Dryla A, Hoffmann B, Gelbmann D, Giefing C, Hanner M, Meinke A, Anderson AS, Koppensteiner W, Konrat R, von Gabain A, Nagy E. High-affinity binding of the staphylococcal HarA protein to haptoglobin and hemoglobin involves a domain with an antiparallel eight-stranded beta-barrel fold. J Bacteriol 2006; 189:254-64. [PMID: 17041047 PMCID: PMC1797202 DOI: 10.1128/jb.01366-06] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Indexed: 02/01/2023] Open
Abstract
Iron scavenging from the host is essential for the growth of pathogenic bacteria. In this study, we further characterized two staphylococcal cell wall proteins previously shown to bind hemoproteins. HarA and IsdB harbor homologous ligand binding domains, the so called NEAT domain (for "near transporter") present in several surface proteins of gram-positive pathogens. Surface plasmon resonance measurements using glutathione S-transferase (GST)-tagged HarAD1, one of the ligand binding domains of HarA, and GST-tagged full-length IsdB proteins confirmed high-affinity binding to hemoglobin and haptoglobin-hemoglobin complexes with equilibrium dissociation constants (K(D)) of 5 to 50 nM. Haptoglobin binding could be detected only with HarA and was in the low micromolar range. In order to determine the fold of this evolutionarily conserved ligand binding domain, the untagged HarAD1 protein was subjected to nuclear magnetic resonance spectroscopy, which revealed an eight-stranded, purely antiparallel beta-barrel with the strand order (-beta1 -beta2 -beta3 -beta6 -beta5 -beta4 -beta7 -beta8), forming two Greek key motifs. Based on structural-homology searches, the topology of the HarAD1 domain resembles that of the immunoglobulin (Ig) fold family, whose members are involved in protein-protein interactions, but with distinct structural features. Therefore, we consider that the HarAD1/NEAT domain fold is a novel variant of the Ig fold that has not yet been observed in other proteins.
|
47
|
Palmer B, Danzer JF, Hambly K, Debe DA. StructSorter: a method for continuously updating a comprehensive protein structure alignment database. J Chem Inf Model 2006; 46:1871-6. [PMID: 16859318 DOI: 10.1021/ci0601012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
Abstract
Advances in protein crystallography and homology modeling techniques are producing vast amounts of high resolution protein structure data at ever increasing rates. As such, the ability to quickly and easily extract structural similarities is a key tool in discovering important functional relationships. We report on an approach for creating and maintaining a database of pairwise structure alignments for a comprehensive database comprising the PDB and homology models for the human and select pathogen genomes. Our approach consists of a novel, multistage method for determining pairwise structural similarity coupled with an efficient clustering protocol that approximates a full NxN assessment in a fraction of the time. Since biologists are commonly interested in recently released structures, and the homology models built from them, an automatically updating database of structural alignments has great value. Our approach yields a querying system that allows scientists to retrieve databank-wide protein structure similarities as easily as retrieving protein sequence similarities via BLAST or PSI-BLAST. Basic, noncommercial access to the database can be requested at https://tip.eidogen-sertanty.com/.
Affiliation(s)
- Brian Palmer
- Eidogen-Sertanty, Inc., 9381 Judicial Drive, Suite 200, San Diego, California 92121, USA.
|
48
|
Shih ESC, Gan RCR, Hwang MJ. OPAAS: a web server for optimal, permuted, and other alternative alignments of protein structures. Nucleic Acids Res 2006; 34:W95-8. [PMID: 16845117 PMCID: PMC1538888 DOI: 10.1093/nar/gkl264] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022] Open
Abstract
The large number of experimentally determined protein 3D structures is a rich resource for studying protein function and evolution, and protein structure comparison (PSC) is a key method for such studies. When comparing two protein structures, almost all currently available PSC servers report a single and sequential (i.e. topological) alignment, whereas the existence of good alternative alignments, including those involving permutations (i.e. non-sequential or non-topological alignments), is well known. We have recently developed a novel PSC method that can detect alternative alignments of statistical significance (alignment similarity P-value <10^-5), including structural permutations at all levels of complexity. OPAAS, the server of this PSC method freely accessible at our website (), provides an easy-to-read hierarchical layout of output to display detailed information on all of the significant alternative alignments detected. Because these alternative alignments can offer a more complete picture on the structural, evolutionary and functional relationship between two proteins, OPAAS can be used in structural bioinformatics research to gain additional insight that is not readily provided by existing PSC servers.
Affiliation(s)
- Ming-Jing Hwang
- To whom correspondence should be addressed. Tel: +886 2 2789 9033; Fax: +886 2 2788 7641;
|
49
|
Kryshtafovych A, Venclovas C, Fidelis K, Moult J. Progress over the first decade of CASP experiments. Proteins 2006; 61 Suppl 7:225-236. [PMID: 16187365 DOI: 10.1002/prot.20740] [Citation(s) in RCA: 136] [Impact Index Per Article: 7.6] [Indexed: 11/07/2022]
Abstract
CASP has now completed a decade of monitoring the state of the art in protein structure prediction. The quality of structure models produced in the latest experiment, CASP6, has been compared with that in earlier CASPs. Significant although modest progress has again been made in the fold recognition regime, and cumulatively, progress in this area is impressive. Models of previously unknown folds again appear to have modestly improved, and several mixed alpha/beta structures have been modeled in a topologically correct manner. Progress remains hard to detect in high sequence identity comparative modeling, but server performance in this area has moved forward.
Affiliation(s)
- Andriy Kryshtafovych
- Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California, USA
|
50
|
Abstract
A novel protein structure alignment technique has been developed reducing much of the secondary and tertiary structure to a sequential representation greatly accelerating many structural computations, including alignment. Constructed from incidence relations in the Delaunay tetrahedralization, alignments of the sequential representation describe structural similarities that cannot be expressed with rigid-body superposition and complement existing techniques minimizing root-mean-squared distance through superposition. Restricting to the largest substructure superimposable by a single rigid-body transformation determines an alignment suitable for root-mean-squared distance comparisons and visualization. Restricted alignments of a test set of histones and histone-like proteins determined superpositions nearly identical to those produced by the established structure alignment routines of DaliLite and ProSup. Alignment of three, increasingly complex proteins: ferredoxin, cytidine deaminase, and carbamoyl phosphate synthetase, to themselves, demonstrated previously identified regions of self-similarity. All-against-all similarity index comparisons performed on a test set of 45 class I and class II aminoacyl-tRNA synthetases closely reproduced the results of established distance matrix methods while requiring 1/16 the time. Principal component analysis of pairwise tetrahedral decomposition similarity of 2300 molecular dynamics snapshots of tryptophanyl-tRNA synthetase revealed discrete microstates within the trajectory consistent with experimental results. The method produces results with sufficient efficiency for large-scale multiple structure alignment and is well suited to genomic and evolutionary investigations where no geometric model of similarity is known a priori.
Affiliation(s)
- Jeffrey Roach
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599, USA.
|