1. Staritzbichler R, Sarti E, Yaklich E, Aleksandrova A, Stamm M, Khafizov K, Forrest LR. Refining pairwise sequence alignments of membrane proteins by the incorporation of anchors. PLoS One 2021; 16:e0239881. [PMID: 33930031 PMCID: PMC8087094 DOI: 10.1371/journal.pone.0239881] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/11/2020] [Accepted: 04/15/2021] [Indexed: 01/08/2023] [Open access]
Abstract
The alignment of primary sequences is a fundamental step in the analysis of protein structure, function, and evolution, and in the generation of homology-based models. Integral membrane proteins pose a significant challenge for such sequence alignment approaches, because their evolutionary relationships can be very remote, and because a high content of hydrophobic amino acids reduces their complexity. Frequently, biochemical or biophysical data is available that informs the optimum alignment, for example, indicating specific positions that share common functional or structural roles. Currently, if those positions are not correctly matched by a standard pairwise sequence alignment procedure, the incorporation of such information into the alignment is typically addressed in an ad hoc manner, with manual adjustments. However, such modifications are problematic because they reduce the robustness and reproducibility of the aligned regions either side of the newly matched positions. Previous studies have introduced restraints as a means to impose the matching of positions during sequence alignments, originally in the context of genome assembly. Here we introduce position restraints, or "anchors" as a feature in our alignment tool AlignMe, providing an aid to pairwise global sequence alignment of alpha-helical membrane proteins. Applying this approach to realistic scenarios involving distantly-related and low complexity sequences, we illustrate how the addition of anchors can be used to modify alignments, while still maintaining the reproducibility and rigor of the rest of the alignment. Anchored alignments can be generated using the online version of AlignMe available at www.bioinfo.mpg.de/AlignMe/.
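The idea of an anchor in a global pairwise alignment can be sketched as follows. This is a minimal illustration only, assuming a plain Needleman-Wunsch recursion with a linear gap penalty and a large additive bonus on the anchored cell; it is not AlignMe's implementation, and the function names, scoring values, and `anchor_bonus` parameter are hypothetical.

```python
from typing import Iterable, Set, Tuple


def anchored_global_align(
    a: str,
    b: str,
    anchors: Iterable[Tuple[int, int]] = (),
    match: float = 1.0,
    mismatch: float = -1.0,
    gap: float = -2.0,
    anchor_bonus: float = 100.0,  # hypothetical strength; large enough to dominate
) -> Tuple[str, str]:
    """Global alignment in which each anchor (i, j), 0-based, rewards pairing a[i] with b[j]."""
    bonus = {(i, j): anchor_bonus for i, j in anchors}
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    trace = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], trace[i][0] = i * gap, "U"
    for j in range(1, m + 1):
        score[0][j], trace[0][j] = j * gap, "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            pair += bonus.get((i - 1, j - 1), 0.0)  # the anchor only changes this one cell
            diag = score[i - 1][j - 1] + pair
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            trace[i][j] = "D" if best == diag else ("U" if best == up else "L")
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = trace[i][j]
        if step == "D":
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif step == "U":
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def aligned_pairs(al_a: str, al_b: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) residue index pairs matched in an alignment."""
    pairs, i, j = set(), 0, 0
    for x, y in zip(al_a, al_b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        i += x != "-"
        j += y != "-"
    return pairs
```

Because the bonus touches a single cell of the scoring matrix, the rest of the alignment is still produced by the same deterministic recursion, which is the reproducibility argument the abstract makes against manual adjustment.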
Affiliation(s)
- René Staritzbichler
  - ProteinFormatics Group, Institute of Biophysics and Medical Physics, University of Leipzig, Leipzig, Germany
- Edoardo Sarti
  - Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States of America
  - Laboratoire de Biologie Computationnelle et Quantitative, Institut de Biologie Paris Seine, Sorbonne Université, Paris, France
- Emily Yaklich
  - Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States of America
- Antoniya Aleksandrova
  - Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States of America
- Marcus Stamm
  - Max Planck Institute of Biophysics, Frankfurt am Main, Germany
- Kamil Khafizov
  - Moscow Institute of Physics and Technology, National Research University, Moscow, Russia
- Lucy R. Forrest
  - Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, United States of America
2. Fidler DR, Murphy SE, Courtis K, Antonoudiou P, El-Tohamy R, Ient J, Levine TP. Using HHsearch to tackle proteins of unknown function: A pilot study with PH domains. Traffic 2016; 17:1214-1226. [PMID: 27601190 PMCID: PMC5091641 DOI: 10.1111/tra.12432] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 04/27/2015] [Revised: 08/30/2016] [Accepted: 08/30/2016] [Indexed: 01/08/2023]
Abstract
Advances in membrane cell biology are hampered by the relatively high proportion of proteins with no known function. Such proteins are largely or entirely devoid of structurally significant domain annotations. Structural bioinformaticians have developed profile‐profile tools such as HHsearch (online version called HHpred), which can detect remote homologies that are missed by tools used to annotate databases. Here we have applied HHsearch to study a single structural fold in a single model organism as proof of principle. In the entire clan of protein domains sharing the pleckstrin homology domain fold in yeast, systematic application of HHsearch accurately identified known PH‐like domains. It also predicted 16 new domains in 13 yeast proteins, many of which are implicated in intracellular traffic. One of these was Vps13p, where we confirmed the functional importance of the predicted PH‐like domain. Even though such predictions require considerable work to be corroborated, they are useful first steps. HHsearch should be applied more widely, particularly across entire proteomes of model organisms, to significantly improve database annotations.
Affiliation(s)
- David R Fidler
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Sarah E Murphy
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Katherine Courtis
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Rana El-Tohamy
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Jonathan Ient
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Timothy P Levine
  - Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
3. Lhota J, Hauptman R, Hart T, Ng C, Xie L. A new method to improve network topological similarity search: applied to fold recognition. Bioinformatics 2015; 31:2106-14. [PMID: 25717198 DOI: 10.1093/bioinformatics/btv125] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/08/2014] [Accepted: 02/21/2015] [Indexed: 11/14/2022] [Open access]
Abstract
MOTIVATION Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of the similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may not be able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework, Enrichment of Network Topological Similarity (ENTS), to improve the performance of large-scale similarity searches in bioinformatics. RESULTS We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. AVAILABILITY AND IMPLEMENTATION Source code is freely available upon request. CONTACT: lxie@iscb.org.
Affiliation(s)
- John Lhota
  - Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Ruth Hauptman
  - Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Thomas Hart
  - Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Clara Ng
  - Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Lei Xie
  - Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
4. Trötschel C, Follmann M, Nettekoven JA, Mohrbach T, Forrest LR, Burkovski A, Marin K, Krämer R. Methionine uptake in Corynebacterium glutamicum by MetQNI and by MetPS, a novel methionine and alanine importer of the NSS neurotransmitter transporter family. Biochemistry 2008; 47:12698-709. [PMID: 18991398 DOI: 10.1021/bi801206t] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/28/2022]
Abstract
The soil bacterium Corynebacterium glutamicum is a model organism in amino acid biotechnology. Here we present the identification of two different L-methionine uptake systems including the first characterization of a bacterial secondary methionine carrier. The primary carrier MetQNI is a high affinity ABC-type transporter specific for L-methionine. Its expression is under the control of the transcription factor McbR, the global regulator of sulfur metabolism in C. glutamicum. Besides MetQNI, a novel secondary methionine uptake system of the NSS (neurotransmitter:sodium symporter) family was identified and named MetP. The MetP system is characterized by a lower affinity for methionine and uses Na+ ions for energetic coupling. It is also the main alanine transporter in C. glutamicum and is expressed constitutively. These observations are consistent with models of methionine, alanine, and leucine bound to MetP, derived from the X-ray crystal structure of the LeuT transporter from Aquifex aeolicus. Complementation studies show that MetP consists of two components, a large subunit with 12 predicted transmembrane segments and, surprisingly, an additional subunit with one predicted transmembrane segment only. Thus, this new member of the NSS transporter family adds a novel feature to this class of carriers, namely, the functional dependence on an additional small subunit.
Affiliation(s)
- Christian Trötschel
  - Institute of Biochemistry, University of Koln, 50674 Koln, Germany, and Max Planck Institute of Biophysics, Max-von-Laue-Strasse 3, 60438 Frankfurt, Germany
5. Deng X, Cheng J. Enhancing HMM-based protein profile-profile alignment with structural features and evolutionary coupling information. BMC Bioinformatics 2014; 15:252. [PMID: 25062980 PMCID: PMC4133609 DOI: 10.1186/1471-2105-15-252] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 01/07/2014] [Accepted: 07/17/2014] [Indexed: 11/25/2022] [Open access]
Abstract
BACKGROUND Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis. RESULTS In this work, we integrate predicted solvent accessibility, torsion angles and evolutionary residue coupling information with the pairwise Hidden Markov Model (HMM) based profile alignment method to improve profile-profile alignments. The evaluation results demonstrate that adding predicted relative solvent accessibility and torsion angle information improves the accuracy of profile-profile alignments. The evolutionary residue coupling information is helpful in some cases, but its contribution to the improvement is not consistent. CONCLUSION Incorporating the new structural information such as predicted solvent accessibility and torsion angles into the profile-profile alignment is a useful way to improve pairwise profile-profile alignment methods.
Affiliation(s)
- Xin Deng
  - LexisNexis | Risk Solutions | Healthcare, Orlando, FL 32811 USA
- Jianlin Cheng
  - Computer Science Department, Informatics Institute, C. Bond Life Science Center, University of Missouri-Columbia, Columbia, MO 65211 USA
6. Identification of an ideal-like fingerprint for a protein fold using overlapped conserved residues based approach. Sci Rep 2014; 4:5643. [PMID: 25008052 PMCID: PMC4090624 DOI: 10.1038/srep05643] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/03/2014] [Accepted: 06/19/2014] [Indexed: 02/04/2023] [Open access]
Abstract
Design of an efficient fingerprint that detects homologous proteins at distant sequence identity has been a great challenge. This paper proposes a strategy to extract an ideal-like fingerprint with high specificity and sensitivity from a group of sequences related to a fold. The approach is devised based on the assumptions that the critical residues for a protein fold may be conserved in three aspects, i.e. sequence, structure, and intramolecular interaction, and embedded in secondary structures. We hypothesized that the residues satisfying such conditions simultaneously may work as an efficient fingerprint. This idea was tested on protein folds of various classes, such as beta-strand rich, alpha + beta proteins and alpha/beta proteins with discrete sequence similarities. The fingerprint for each fold was generated by selecting the overlapped conserved residues (OCR) from the conserved residues obtained using three independent alignment methods, i.e. multiple sequence alignment, structure-based alignment, and alignment based on the interstrand hydrogen-bonds. The OCR fingerprints showed more than 90% detection efficiency for all the folds tested and were identified to be almost the minimal fingerprints composed of only critical residues. This study is expected to provide an important conceptual improvement in the identification or design of ideal fingerprints for a protein fold.
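The selection of overlapped conserved residues described in this abstract amounts to intersecting three sets of conserved positions. The sketch below illustrates that step only; the conservation criterion (most common residue present in at least a given fraction of sequences), the function names, and the example positions are assumptions, since the abstract does not state how conservation was measured.

```python
from collections import Counter
from typing import Iterable, List, Set


def conserved_columns(alignment: List[str], min_fraction: float = 0.8) -> Set[int]:
    """Columns whose most common non-gap residue occurs in >= min_fraction of sequences.

    The threshold is a hypothetical choice made for illustration.
    """
    if not alignment:
        return set()
    conserved = set()
    for col in range(len(alignment[0])):
        residues = [row[col] for row in alignment if row[col] != "-"]
        if not residues:
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(alignment) >= min_fraction:
            conserved.add(col)
    return conserved


def overlapped_conserved_residues(
    sequence_based: Iterable[int],
    structure_based: Iterable[int],
    hbond_based: Iterable[int],
) -> List[int]:
    """Positions conserved under all three alignment methods, in ascending order."""
    return sorted(set(sequence_based) & set(structure_based) & set(hbond_based))
```

All three position sets must refer to the same residue numbering (for example, that of a common reference sequence) for the intersection to be meaningful.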
7. Stamm M, Staritzbichler R, Khafizov K, Forrest LR. AlignMe--a membrane protein sequence alignment web server. Nucleic Acids Res 2014; 42:W246-51. [PMID: 24753425 PMCID: PMC4086118 DOI: 10.1093/nar/gku291] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Indexed: 12/25/2022] [Open access]
Abstract
We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe
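The first step of the HP mode, converting a multiple sequence alignment into a family-averaged hydropathy profile, can be illustrated as below. This is a rough sketch under stated assumptions: the Kyte-Doolittle scale is used as an example hydrophobicity scale, gaps and unknown residues are skipped, and no window smoothing is applied. The function name is hypothetical, and the server's actual scale and averaging options are not specified in the abstract.

```python
from typing import List, Optional

# Kyte-Doolittle hydropathy values, used here as an example scale (an assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def family_averaged_profile(msa: List[str]) -> List[Optional[float]]:
    """Mean hydropathy per alignment column, ignoring gaps and unknown residues.

    Columns with no scorable residue yield None.
    """
    if not msa:
        return []
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile: List[Optional[float]] = []
    for col in range(width):
        values = [
            KYTE_DOOLITTLE[row[col].upper()]
            for row in msa
            if row[col].upper() in KYTE_DOOLITTLE
        ]
        profile.append(sum(values) / len(values) if values else None)
    return profile
```

Two such profiles, one per input alignment, would then be aligned against each other; that alignment step is not shown here.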
Affiliation(s)
- Marcus Stamm
  - Computational Structural Biology Group, Max Planck Institute of Biophysics, Frankfurt am Main 60438, Germany
- René Staritzbichler
  - Computational Structural Biology Group, Max Planck Institute of Biophysics, Frankfurt am Main 60438, Germany
- Kamil Khafizov
  - Computational Structural Biology Group, Max Planck Institute of Biophysics, Frankfurt am Main 60438, Germany
- Lucy R Forrest
  - Computational Structural Biology Group, Max Planck Institute of Biophysics, Frankfurt am Main 60438, Germany
8. Stamm M, Staritzbichler R, Khafizov K, Forrest LR. Alignment of helical membrane protein sequences using AlignMe. PLoS One 2013; 8:e57731. [PMID: 23469223 PMCID: PMC3587630 DOI: 10.1371/journal.pone.0057731] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 10/08/2012] [Accepted: 01/24/2013] [Indexed: 12/20/2022] [Open access]
Abstract
Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T), in addition to evolutionary information, and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely-related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set.
Affiliation(s)
- Marcus Stamm
  - Computational Structural Biology Group, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
9. Kuziemko A, Honig B, Petrey D. Using structure to explore the sequence alignment space of remote homologs. PLoS Comput Biol 2011; 7:e1002175. [PMID: 21998567 PMCID: PMC3188491 DOI: 10.1371/journal.pcbi.1002175] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 02/01/2011] [Accepted: 07/14/2011] [Indexed: 11/18/2022] [Open access]
Abstract
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is “optimal” in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are “suboptimal” in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for “modelability”, we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
It has been suggested that, for nearly every protein sequence, there is already a protein with a similar structure in current protein structure databases. However, with poor or undetectable sequence relationships, it is expected that accurate alignments and models cannot be generated. Here we show that this is not the case, and that whenever a structural relationship exists, there are usually local sequence relationships that can be used to generate an accurate alignment, no matter what the global sequence identity. However, this requires an alternative to the traditional dynamic programming algorithm and the consideration of a small ensemble of alignments. We present an algorithm, S4, and demonstrate that it is capable of generating accurate alignments in nearly all cases where a structural relationship exists between two proteins. Our results thus constitute an important advance in the full exploitation of the information in structural databases. That is, the expectation of an accurate alignment suggests that a meaningful model can be generated for nearly every sequence for which a suitable template exists.
Affiliation(s)
- Andrew Kuziemko
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Barry Honig
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Donald Petrey
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
10. Krishnadev O, Srinivasan N. AlignHUSH: alignment of HMMs using structure and hydrophobicity information. BMC Bioinformatics 2011; 12:275. [PMID: 21729312 PMCID: PMC3228556 DOI: 10.1186/1471-2105-12-275] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/10/2010] [Accepted: 07/05/2011] [Indexed: 11/10/2022] [Open access]
Abstract
Background Sensitive remote homology detection and accurate alignments, especially in the midnight zone of sequence similarity, are needed for better function annotation and structural modeling of proteins. An algorithm for HMM-HMM alignment, AlignHUSH, has been developed that is capable of recognizing distantly related domain families. The method uses structural information, in the form of predicted secondary structure probabilities, and hydrophobicity of amino acids to align HMMs of two sets of aligned sequences. The effect of using adjoining column(s) information has also been investigated and is found to increase the sensitivity of HMM-HMM alignments and remote homology detection. Results We have assessed the performance of AlignHUSH using known evolutionary relationships available in SCOP. AlignHUSH performs better than the best HMM-HMM alignment methods and is observed to be even more sensitive at higher error rates. Accuracy of the alignments obtained using AlignHUSH has been assessed using the structure-based alignments available in BAliBASE. The alignment length and the alignment quality are found to be appropriate for homology modeling and function annotation. The alignment accuracy is found to be comparable to existing methods for profile-profile alignments. Conclusions A new method to align HMMs has been developed and is shown to have better sensitivity at error rates of 10% and above when compared to other available programs. The proposed method could effectively aid in obtaining clues to the functions of proteins of as yet unknown function. A web server incorporating the AlignHUSH method is available at http://crick.mbu.iisc.ernet.in/~alignhush/
Affiliation(s)
- Oruganty Krishnadev
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
11. Xu HS, Ren WK, Liu XH, Li XQ. Aligning protein sequence and analysing substitution pattern using a class-specific matrix. J Biosci 2010; 35:295-314. [PMID: 20689185 DOI: 10.1007/s12038-010-0033-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Aligning protein sequences using a score matrix has become a routine but valuable method in modern biological research. However, alignment in the 'twilight zone' remains an open issue. It is feasible and necessary to construct a new score matrix as more protein structures are resolved. Three structural class-specific score matrices (all-alpha, all-beta and alpha/beta) were constructed based on the structure alignment of low identity proteins of the corresponding structural classes. The class-specific score matrices were significantly better than a structure-derived matrix (HSDM) and three other generalized matrices (BLOSUM30, BLOSUM60 and Gonnet250) in alignment performance tests. The optimized gap penalties presented here also promote alignment performance. The results indicate that different protein classes have distinct amino acid substitution patterns, and an amino acid score matrix should be constructed based on different structural classes. The class-specific score matrices could also be used in profile construction to improve homology detection.
Affiliation(s)
- Hai Song Xu
  - College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100124, China
12. Khafizov K, Staritzbichler R, Stamm M, Forrest LR. A Study of the Evolution of Inverted-Topology Repeats from LeuT-Fold Transporters Using AlignMe. Biochemistry 2010; 49:10702-13. [PMID: 21073167 DOI: 10.1021/bi101256x] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.2] [Indexed: 11/29/2022]
Affiliation(s)
- Kamil Khafizov
  - Computational Structural Biology Group, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
- René Staritzbichler
  - Computational Structural Biology Group, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
- Marcus Stamm
  - Computational Structural Biology Group, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
- Lucy R. Forrest
  - Computational Structural Biology Group, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
13. Norel R, Petrey D, Honig B. PUDGE: a flexible, interactive server for protein structure prediction. Nucleic Acids Res 2010; 38:W550-4. [PMID: 20525783 PMCID: PMC2896183 DOI: 10.1093/nar/gkq475] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] [Open access]
Abstract
The construction of a homology model for a protein can involve a number of decisions requiring the integration of different sources of information and the application of different modeling tools depending on the particular problem. Functional information can be especially important in guiding the modeling process, but such information is not generally integrated into modeling pipelines. Pudge is a flexible, interactive protein structure prediction server, which is designed with these issues in mind. By dividing the modeling into five stages (template selection, alignment, model building, model refinement and model evaluation) and providing various tools to visualize, analyze and compare the results at each stage, we enable a flexible modeling strategy that can be tailored to the needs of a given problem. Pudge is freely available at http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PUDGE.
Affiliation(s)
- Raquel Norel
  - Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA
14. Zhu J, Cheng L, Fang Q, Zhou ZH, Honig B. Building and refining protein models within cryo-electron microscopy density maps based on homology modeling and multiscale structure refinement. J Mol Biol 2010; 397:835-51. [PMID: 20109465 DOI: 10.1016/j.jmb.2010.01.041] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 09/21/2009] [Revised: 01/04/2010] [Accepted: 01/20/2010] [Indexed: 11/16/2022]
Abstract
Automatic modeling methods using cryoelectron microscopy (cryoEM) density maps as constraints are promising approaches to building atomic models of individual proteins or protein domains. However, their application to large macromolecular assemblies has not been possible largely due to computational limitations inherent to such unsupervised methods. Here we describe a new method, EM-IMO (electron microscopy-iterative modular optimization), for building, modifying and refining local structures of protein models using cryoEM maps as a constraint. As a supervised refinement method, EM-IMO allows users to specify parameters derived from inspections so as to guide, and as a consequence, significantly speed up the refinement. An EM-IMO-based refinement protocol is first benchmarked on a data set of 50 homology models using simulated density maps. A multiscale refinement strategy that combines EM-IMO-based and molecular dynamics-based refinement is then applied to build backbone models for the seven conformers of the five capsid proteins in our near-atomic-resolution cryoEM map of the grass carp reovirus virion, a member of the Aquareovirus genus of the Reoviridae family. The refined models allow us to reconstruct a backbone model of the entire grass carp reovirus capsid and provide valuable functional insights that are described in the accompanying publication [Cheng, L., Zhu, J., Hui, W. H., Zhang, X., Honig, B., Fang, Q. & Zhou, Z. H. (2010). Backbone model of an aquareovirus virion by cryo-electron microscopy and bioinformatics. J. Mol. Biol. (this issue). doi:10.1016/j.jmb.2009.12.027.]. Our study demonstrates that the integrated use of homology modeling and a multiscale refinement protocol that combines supervised and automated structure refinement offers a practical strategy for building atomic models based on medium- to high-resolution cryoEM density maps.
Affiliation(s)
- Jiang Zhu
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
|
15
|
Schushan M, Xiang M, Bogomiakov P, Padan E, Rao R, Ben-Tal N. Model-guided mutagenesis drives functional studies of human NHA2, implicated in hypertension. J Mol Biol 2010; 396:1181-96. [PMID: 20053353 DOI: 10.1016/j.jmb.2009.12.055] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 10/26/2009] [Revised: 12/22/2009] [Accepted: 12/27/2009] [Indexed: 11/18/2022]
Abstract
Human NHA2 is a poorly characterized Na+/H+ antiporter recently implicated in essential hypertension. We used a range of computational tools and evolutionary conservation analysis to build and validate a three-dimensional model of NHA2 based on the crystal structure of a distantly related bacterial transporter, NhaA. The model guided mutagenic evaluation of transport function, ion selectivity, and pH dependence of NHA2 by phenotype screening in yeast. We describe a cluster of essential, highly conserved titratable residues located in an assembly region made of two discontinuous helices of inverted topology, each interrupted by an extended chain. Whereas in NhaA, oppositely charged residues compensate for partial dipoles generated within this assembly, in NHA2, polar but uncharged residues suffice. Our findings led to a model for transport mechanism that was compared to the well-known electroneutral NHE1 and electrogenic NhaA subtypes. This study establishes NHA2 as a prototype for the poorly understood, yet ubiquitous, CPA2 antiporter family recently recognized in plants and metazoans and illustrates a structure-driven approach to derive functional information on a newly discovered transporter.
Affiliation(s)
- Maya Schushan
- Department of Biochemistry, The George S Wise Faculty of Life Sciences, Tel-Aviv University, Ramat-Aviv, 69978 Tel-Aviv, Israel
|
16
|
Abstract
Background: Mutations in leucine-rich repeat kinase 2 (LRRK2) are the most common genetic cause of Parkinson disease (PD). LRRK2 contains an “enzymatic core” composed of GTPase and kinase domains that is flanked by leucine-rich repeat (LRR) and WD40 protein-protein interaction domains. While kinase activity and GTP-binding have both been implicated in LRRK2 neurotoxicity, the potential role of other LRRK2 domains has not been as extensively explored. Principal Findings: We demonstrate that LRRK2 normally exists in a dimeric complex, and that removing the WD40 domain prevents complex formation and autophosphorylation. Moreover, loss of the WD40 domain completely blocks the neurotoxicity of multiple LRRK2 PD mutations. Conclusion: These findings suggest that LRRK2 dimerization and autophosphorylation may be required for the neurotoxicity of LRRK2 PD mutations and highlight a potential role for the WD40 domain in the mechanism of LRRK2-mediated cell death.
|
17
|
Mooney C, Pollastri G. Beyond the Twilight Zone: Automated prediction of structural properties of proteins by recursive neural networks and remote homology information. Proteins 2009; 77:181-90. [DOI: 10.1002/prot.22429] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 11/06/2022]
|
18
|
Zhu J, Fan H, Periole X, Honig B, Mark AE. Refining homology models by combining replica-exchange molecular dynamics and statistical potentials. Proteins 2008; 72:1171-88. [PMID: 18338384 PMCID: PMC2761145 DOI: 10.1002/prot.22005] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Indexed: 12/25/2022]
Abstract
A protocol is presented for the global refinement of homology models of proteins. It combines the advantages of temperature-based replica-exchange molecular dynamics (REMD) for conformational sampling and the use of statistical potentials for model selection. The protocol was tested using 21 models. Of these, 14 were models of 10 small proteins for which high-resolution crystal structures were available; the remainder were targets of the recent CASPR exercise. It was found that REMD in combination with currently available force fields could sample near-native conformational states starting from high-quality homology models. Conformations in which the backbone RMSD of secondary structure elements (SSE-RMSD) was lower than the starting value by 0.5-1.0 Å were found for 15 out of the 21 cases (average 0.82 Å). Furthermore, when a simple scoring function consisting of two statistical potentials was used to rank the structures, one or more structures with SSE-RMSD at least 0.2 Å lower than the starting value were found among the five best-ranked structures in 11 out of the 21 cases. The average improvement in SSE-RMSD for the best models was 0.42 Å. However, none of the scoring functions tested identified the structures with the lowest SSE-RMSD as the best models, although all identified the native conformation as the one with lowest energy. This suggests that, while the proposed protocol proved effective for the refinement of high-quality models of small proteins, scoring functions remain one of the major limiting factors in structure refinement. This and other aspects by which the methodology could be further improved are discussed.
Affiliation(s)
- Jiang Zhu
- Howard Hughes Medical Institute and Columbia University, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, USA
- Hao Fan
- Groningen Biomolecular Sciences and Biotechnology Institute (GBB), Department of Biophysical Chemistry, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Xavier Periole
- Groningen Biomolecular Sciences and Biotechnology Institute (GBB), Department of Biophysical Chemistry, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Barry Honig
- Howard Hughes Medical Institute and Columbia University, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, NY 10032, USA
- Alan E. Mark
- Groningen Biomolecular Sciences and Biotechnology Institute (GBB), Department of Biophysical Chemistry, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- School of Molecular and Microbial Sciences, and the Institute for Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
|
19
|
Kundrotas PJ, Lensink MF, Alexov E. Homology-based modeling of 3D structures of protein–protein complexes using alignments of modified sequence profiles. Int J Biol Macromol 2008; 43:198-208. [DOI: 10.1016/j.ijbiomac.2008.05.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 04/13/2008] [Revised: 05/09/2008] [Accepted: 05/12/2008] [Indexed: 11/25/2022]
|
20
|
Targeted deletion in the beta20-beta21 loop of HIV envelope glycoprotein gp120 exposes the CD4 binding site for antibody binding. Virology 2008; 377:330-8. [PMID: 18519142 DOI: 10.1016/j.virol.2008.03.040] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 02/08/2008] [Revised: 03/03/2008] [Accepted: 03/28/2008] [Indexed: 11/20/2022]
Abstract
Different isolates of HIV-1 are known to vary in antibody binding and sensitivity to neutralization. In response to selective pressure, the virus may conceal important neutralizing determinants, such as the CD4 binding site on gp120, through steric hindrance or conformational masking. The 3D structure of gp120 shows five loop structures that surround the CD4 binding site (CD4BS) and may restrict antibody access to the site. We have generated gp120 mutants lacking each of these loops and characterized them with a panel of monoclonal antibodies, including b12 and F105. A targeted deletion in the beta20-beta21 loop resulted in gp120 with enhanced binding of both monoclonals. Enhancement of b12 binding suggests reduced steric hindrance, since the antibody is relatively insensitive to conformation. Enhanced binding of F105, which depends strongly on the protein conformation, suggests that the mutation may allow gp120 to move more freely into the liganded form. The same viral strategies that limit antibody binding may also inhibit antibody induction. Modified forms of gp120, in which the CD4 binding site is more exposed and accessible to antibodies, could provide novel immunogens for eliciting antibodies to this broadly shared neutralizing determinant.
|
21
|
Posy S, Shapiro L, Honig B. Sequence and structural determinants of strand swapping in cadherin domains: do all cadherins bind through the same adhesive interface? J Mol Biol 2008; 378:954-68. [PMID: 18395225 PMCID: PMC2435303 DOI: 10.1016/j.jmb.2008.02.063] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Received: 11/14/2007] [Revised: 02/06/2008] [Accepted: 02/27/2008] [Indexed: 11/19/2022]
Abstract
Cadherins are cell surface adhesion proteins important for tissue development and integrity. Type I and type II, or classical, cadherins form adhesive dimers via an interface formed through the exchange, or "swapping", of the N-terminal beta-strands from their membrane-distal EC1 domains. Here, we ask which sequence and structural features in EC1 domains are responsible for beta-strand swapping and whether members of other cadherin families form similar strand-swapped binding interfaces. We created a comprehensive database of multiple alignments of each type of cadherin domain. We used the known three-dimensional structures of classical cadherins to identify conserved positions in multiple sequence alignments that appear to be crucial determinants of the cadherin domain structure. We identified features that are unique to EC1 domains. On the basis of our analysis, we conclude that all cadherin domains have very similar overall folds but, with the exception of classical and desmosomal cadherin EC1 domains, most of them do not appear to bind through a strand-swapping mechanism. Thus, non-classical cadherins that function in adhesion are likely to use different protein-protein interaction interfaces. Our results have implications for the evolution of molecular mechanisms of cadherin-mediated adhesion in vertebrates.
Affiliation(s)
- Shoshana Posy
- Howard Hughes Medical Institute, Columbia University, New York, NY 10032, USA
|
22
|
Bennett-Lovsey RM, Herbert AD, Sternberg MJE, Kelley LA. Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre. Proteins 2008; 70:611-25. [PMID: 17876813 DOI: 10.1002/prot.21688] [Citation(s) in RCA: 340] [Impact Index Per Article: 20.0] [Indexed: 11/10/2022]
Abstract
Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. 
However, there is a paucity of information as to why the ensemble methods are superior, and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.
Affiliation(s)
- Riccardo M Bennett-Lovsey
- Structural Bioinformatics Group, Division of Molecular Biosciences, Imperial College London, London SW7 2AY, United Kingdom
|
23
|
Tai K, Fowler P, Mokrab Y, Stansfeld P, Sansom MSP. Molecular modeling and simulation studies of ion channel structures, dynamics and mechanisms. Methods Cell Biol 2008; 90:233-65. [PMID: 19195554 DOI: 10.1016/s0091-679x(08)00812-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 02/02/2023]
Abstract
Ion channels are integral membrane proteins that enable selected ions to flow passively across membranes. Channel proteins have been the focus of computational approaches to relate their three-dimensional (3D) structure to their physiological function. We describe a number of computational tools to model ion channels. Homology modeling may be used to construct structural models of channels based on available X-ray structures. Electrostatics calculations enable an approximate evaluation of the energy profile of an ion passing through a channel. Molecular dynamics simulations and free-energy calculations provide information on the thermodynamics and kinetics of channel function.
Affiliation(s)
- Kaihsu Tai
- Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
24
|
|
25
|
Landau M, Herz K, Padan E, Ben-Tal N. Model Structure of the Na+/H+ Exchanger 1 (NHE1). J Biol Chem 2007; 282:37854-63. [DOI: 10.1074/jbc.m705460200] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.9] [Indexed: 11/06/2022] Open
|
26
|
Improving pairwise sequence alignment between distantly related proteins. Methods Mol Biol 2007. [PMID: 17993679 DOI: 10.1007/978-1-59745-514-5_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Sequence alignment between remotely related proteins has been one of the more difficult problems in structural biology. Improvements have been achieved by incorporating information that enhances the diversity of the substitution matrices. NdPASA is a web-based server that optimizes sequence alignments between proteins sharing low percentages of sequence identity. The program integrates structure information of the template sequence into a global alignment algorithm by employing amino acids' neighbor-dependent propensities for secondary structure as unique parameters for alignment. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. The server is designed to aid homologous protein structure modeling. It is most effective when the structure of the template sequence is known. NdPASA can be accessed online at www.fenglab.org/bioserver.html.
|
27
|
Kundrotas P, Georgieva P, Shosheva A, Christova P, Alexov E. Assessing the quality of the homology-modeled 3D structures from electrostatic standpoint: test on bacterial nucleoside monophosphate kinase families. J Bioinform Comput Biol 2007; 5:693-715. [PMID: 17688312 DOI: 10.1142/s0219720007002709] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/02/2006] [Accepted: 02/06/2007] [Indexed: 11/18/2022]
Abstract
In this study, we address the issue of performing meaningful pK(a) calculations using homology-modeled three-dimensional (3D) structures and analyze the possibility of using the calculated pK(a) values to detect structural defects in the models. For this purpose, the 3D structure of each member of five large protein families of bacterial nucleoside monophosphate kinases (NMPK) was modeled by means of a homology-based approach. Further, we performed pK(a) calculations for each model and for the template X-ray structures. Each bacterial NMPK family used in the study comprised on average 100 members, providing a pool of sequences and 3D models large enough for reliable statistical analysis. It was shown that pK(a) values of titratable groups, which are highly conserved within a family, tend to be conserved among the models too. We demonstrated that homology-modeled structures with sequence identity larger than 35% and gap percentile smaller than 10% can be used for meaningful pK(a) calculations. In addition, it was found that some highly conserved titratable groups either exhibit large pK(a) fluctuations among the models or have pK(a) values shifted by several pH units with respect to the pK(a) calculated for the X-ray structure. We demonstrated that such cases usually indicate structural errors associated with the model. Thus, we argue that pK(a) calculations can be used for assessing the quality of the 3D models by monitoring fluctuations of the pK(a) values for highly conserved titratable residues within large sets of homologous proteins.
Affiliation(s)
- Petras Kundrotas
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634, USA
|
28
|
Liu S, Zhang C, Liang S, Zhou Y. Fold recognition by concurrent use of solvent accessibility and residue depth. Proteins 2007; 68:636-45. [PMID: 17510969 DOI: 10.1002/prot.21459] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Indexed: 11/11/2022]
Abstract
Recognizing the structural similarity without significant sequence identity (called fold recognition) is the key for bridging the gap between the number of known protein sequences and the number of structures solved. Previously, we developed a fold-recognition method called SP(3) which combines sequence-derived sequence profiles, secondary-structure profiles and residue-depth dependent, structure-derived sequence profiles. The use of residue-depth-dependent profiles makes SP(3) one of the best automatic predictors in CASP 6. Because residue depth (RD) and solvent accessible surface area (solvent accessibility) are complementary in describing the exposure of a residue to solvent, we test whether or not incorporation of solvent-accessibility profiles into SP(3) could further increase the accuracy of fold recognition. The resulting method, called SP(4), was tested in SALIGN benchmark for alignment accuracy and Lindahl, LiveBench 8 and CASP7 blind prediction for fold recognition sensitivity and model-structure accuracy. For remote homologs, SP(4) is found to consistently improve over SP(3) in the accuracy of sequence alignment and predicted structural models as well as in the sensitivity of fold recognition. Our result suggests that RD and solvent accessibility can be used concurrently for improving the accuracy and sensitivity of fold recognition. The SP(4) server and its local usage package are available on http://sparks.informatics.iupui.edu/SP4.
Affiliation(s)
- Song Liu
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
|
29
|
Mirkovic N, Li Z, Parnassa A, Murray D. Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization. Proteins 2007; 66:766-77. [PMID: 17154423 DOI: 10.1002/prot.21191] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
The technological breakthroughs in structural genomics were designed to facilitate the solution of a sufficient number of structures, so that as many protein sequences as possible can be structurally characterized with the aid of comparative modeling. The leverage of a solved structure is the number and quality of the models that can be produced using the structure as a template for modeling and may be viewed as the "currency" with which the success of a structural genomics endeavor can be measured. Moreover, the models obtained in this way should be valuable to all biologists. To this end, at the Northeast Structural Genomics Consortium (NESG), a modular computational pipeline for automated high-throughput leverage analysis was devised and used to assess the leverage of the 186 unique NESG structures solved during the first phase of the Protein Structure Initiative (January 2000 to July 2005). Here, the results of this analysis are presented. The number of sequences in the nonredundant protein sequence database covered by quality models produced by the pipeline is approximately 39,000, so that the average leverage is approximately 210 models per structure. Interestingly, only 7900 of these models fulfill the stringent modeling criterion of being at least 30% sequence-identical to the corresponding NESG structures. This study shows how high-throughput modeling increases the efficiency of structure determination efforts by providing enhanced coverage of protein structure space. In addition, the approach is useful in refining the boundaries of structural domains within larger protein sequences, subclassifying sequence diverse protein families, and defining structure-based strategies specific to a particular family.
Affiliation(s)
- Nebojsa Mirkovic
- Department of Microbiology and Immunology, Weill Medical College of Cornell University, New York, New York 10021, USA
|
30
|
Punta M, Forrest LR, Bigelow H, Kernytsky A, Liu J, Rost B. Membrane protein prediction methods. Methods 2007; 41:460-74. [PMID: 17367718 PMCID: PMC1934899 DOI: 10.1016/j.ymeth.2006.07.026] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Received: 03/22/2006] [Accepted: 07/05/2006] [Indexed: 10/23/2022] Open
Abstract
We survey computational approaches that tackle membrane protein structure and function prediction. While describing the main ideas that have led to the development of the most relevant and novel methods, we also discuss pitfalls, provide practical hints and highlight the challenges that remain. The methods covered include: sequence alignment, motif search, functional residue identification, transmembrane segment and protein topology predictions, homology and ab initio modeling. In general, predictions of functional and structural features of membrane proteins are improving, although progress is hampered by the limited amount of high-resolution experimental information available. While predictions of transmembrane segments and protein topology rank among the most accurate methods in computational biology, more attention and effort will be required in the future to ameliorate database search, homology and ab initio modeling.
Affiliation(s)
- Marco Punta
- Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Ave., New York, NY 10032, USA
|
31
|
Shah AR, Oehmen CS, Harper J, Webb-Robertson BJM. Integrating subcellular location for improving machine learning models of remote homology detection in eukaryotic organisms. Comput Biol Chem 2007; 31:138-42. [PMID: 17416337 DOI: 10.1016/j.compbiolchem.2007.02.012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/15/2007] [Accepted: 02/20/2007] [Indexed: 11/30/2022]
Abstract
A significant challenge in homology detection is to identify sequences that share a common evolutionary ancestor, despite significant primary sequence divergence. Remote homologs will often have less than 30% sequence identity, yet still retain common structural and functional properties. We demonstrate a novel method for identifying remote homologs using a support vector machine (SVM) classifier trained by fusing sequence similarity scores and subcellular location prediction. SVMs have been shown to perform well in a variety of applications where binary classification of data is the goal. At the same time, data fusion methods have been shown to be highly effective in enhancing discriminative power of data. Combining these two approaches in the application SVM-SimLoc resulted in identification of significantly more remote homologs (p-value<0.006) than using either sequence similarity or subcellular location independently.
Affiliation(s)
- Anuj R Shah
- Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA.
|
32
|
Zhu J, Xie L, Honig B. Structural refinement of protein segments containing secondary structure elements: Local sampling, knowledge-based potentials, and clustering. Proteins 2006; 65:463-79. [PMID: 16927337 DOI: 10.1002/prot.21085] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]
Abstract
In this article, we present an iterative, modular optimization (IMO) protocol for the local structure refinement of protein segments containing secondary structure elements (SSEs). The protocol is based on three modules: a torsion-space local sampling algorithm, a knowledge-based potential, and a conformational clustering algorithm. Alternative methods are tested for each module in the protocol. For each segment, random initial conformations were constructed by perturbing the native dihedral angles of loops (and SSEs) of the segment to be refined while keeping the protein body fixed. Two refinement procedures based on molecular mechanics force fields - using either energy minimization or molecular dynamics - were also tested but were found to be less successful than the IMO protocol. We found that DFIRE is a particularly effective knowledge-based potential and that clustering algorithms that are biased by the DFIRE energies improve the overall results. Results were further improved by adding an energy minimization step to the conformations generated with the IMO procedure, suggesting that hybrid strategies that combine both knowledge-based and physical effective energy functions may prove to be particularly effective in future applications.
Affiliation(s)
- Jiang Zhu
- Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, New York 10032, USA
|
33
|
Kosloff M, Han GW, Krishna SS, Schwarzenbacher R, Fasnacht M, Elsliger MA, Abdubek P, Agarwalla S, Ambing E, Astakhova T, Axelrod HL, Canaves JM, Carlton D, Chiu HJ, Clayton T, DiDonato M, Duan L, Feuerhelm J, Grittini C, Grzechnik SK, Hale J, Hampton E, Haugen J, Jaroszewski L, Jin KK, Johnson H, Klock HE, Knuth MW, Koesema E, Kreusch A, Kuhn P, Levin I, McMullan D, Miller MD, Morse AT, Moy K, Nigoghossian E, Okach L, Oommachen S, Page R, Paulsen J, Quijano K, Reyes R, Rife CL, Sims E, Spraggon G, Sridhar V, Stevens RC, van den Bedem H, Velasquez J, White A, Wolf G, Xu Q, Hodgson KO, Wooley J, Deacon AM, Godzik A, Lesley SA, Wilson IA. Comparative structural analysis of a novel glutathione S-transferase (ATU5508) from Agrobacterium tumefaciens at 2.0 Å resolution. Proteins 2006; 65:527-37. [PMID: 16988933 DOI: 10.1002/prot.21130] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
Abstract
Glutathione S-transferases (GSTs) comprise a diverse superfamily of enzymes found in organisms from all kingdoms of life. GSTs are involved in diverse processes, notably small-molecule biosynthesis or detoxification, and are frequently also used in protein engineering studies or as biotechnology tools. Here, we report the high-resolution X-ray structure of Atu5508 from the pathogenic soil bacterium Agrobacterium tumefaciens (atGST1). Through use of comparative sequence and structural analysis of the GST superfamily, we identified local sequence and structural signatures, which allowed us to distinguish between different GST classes. This approach enables GST classification based on structure, without requiring additional biochemical or immunological data. Consequently, analysis of the atGST1 crystal structure suggests a new GST class, distinct from previously characterized GSTs, which would make it an attractive target for further biochemical studies.
Affiliation(s)
- Mickey Kosloff
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York
|
34
|
Scheeff ED, Bourne PE. Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction. BMC Bioinformatics 2006; 7:410. [PMID: 16970830 PMCID: PMC1622756 DOI: 10.1186/1471-2105-7-410] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 04/18/2006] [Accepted: 09/14/2006] [Indexed: 11/30/2022] Open
Abstract
Background: One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear. Results: We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield equivalent improvements to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models. Conclusion: When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used.
Affiliation(s)
- Eric D Scheeff
- San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0537, USA
- Present address: Razavi-Newman Center for Bioinformatics, The Salk Institute for Biological Studies, 10010 North Torrey Pines Rd., La Jolla, CA 92037, USA
- Philip E Bourne
- San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0537, USA
- Department of Pharmacology, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
|
35
|
Johnston RJ, Copeland JW, Fasnacht M, Etchberger JF, Liu J, Honig B, Hobert O. An unusual Zn-finger/FH2 domain protein controls a left/right asymmetric neuronal fate decision in C. elegans. Development 2006; 133:3317-28. [PMID: 16887832 DOI: 10.1242/dev.02494] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.4] [Indexed: 11/20/2022]
Abstract
Gene regulatory networks that control the terminally differentiated state of a cell are, by and large, only superficially understood. In a mutant screen aimed at identifying regulators of gene batteries that define the differentiated state of two left/right asymmetric C. elegans gustatory neurons, ASEL and ASER, we have isolated a mutant, fozi-1, with a novel mixed-fate phenotype, characterized by de-repression of ASEL fate in ASER. fozi-1 codes for a protein that functions in the nucleus of ASER to inhibit the expression of the LIM homeobox gene lim-6, neuropeptide-encoding genes and putative chemoreceptors of the GCY gene family. The FOZI-1 protein displays a highly unusual domain architecture, that combines two functionally essential C2H2 zinc-finger domains, which are probably involved in transcriptional regulation, with a formin homology 2 (FH2) domain, normally found only in cytosolic regulators of the actin cytoskeleton. We demonstrate that the FH2 domain of FOZI-1 has lost its actin polymerization function but maintains its phylogenetically ancient ability to homodimerize. fozi-1 genetically interacts with several transcription factors and micro RNAs in the context of specific regulatory network motifs. These network motifs endow the system with properties that provide insights into how cells adopt their stable terminally differentiated states.
Affiliation(s)
- Robert J Johnston
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center, 701 W. 168th Street, New York, NY 10032, USA
|
36
|
Kundrotas PJ, Alexov E. Predicting 3D structures of transient protein-protein complexes by homology. Biochimica et Biophysica Acta - Proteins and Proteomics 2006; 1764:1498-511. [PMID: 16963323 DOI: 10.1016/j.bbapap.2006.08.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 05/11/2006] [Revised: 07/27/2006] [Accepted: 08/03/2006] [Indexed: 11/26/2022]
Abstract
The paper reports a homology-based approach for predicting the 3D structures of full-length hetero protein complexes. We have created a database of templates that includes structures of hetero protein-protein complexes as well as domain-domain structures (), which allowed us to expand the template pool up to 418 two-chain entries (at 40% sequence identity). Two protocols were tested: a protocol based on position-specific BLAST search (Protocol-I) and a protocol based on structural similarity of monomers (Protocol-II). All possible combinations of two monomers (350,284 pairs) in the ProtCom database were subjected to both protocols to predict whether they form complexes. The predictions were benchmarked against the ProtCom database, resulting in false-to-true positive ratios of approximately 5:1 and approximately 7:1 and recovery of 19% and 86% for protocols I and II, respectively. From 350,284 trials, Protocol-I made only approximately 500 wrong predictions, resulting in 0.5% error. In addition, though it was shown that artificially created domain-domain structures can in principle be good templates for modeling full-length protein complexes, more sensitive methods are needed to detect homology relations. The quality of the models was assessed using two different criteria: interfacial residues and overall RMSD. It was found that there is no correlation between these two measures. In many cases the interface residues were predicted correctly but the overall RMSD was over 6 Å, and vice versa.
Affiliation(s)
- Petras J Kundrotas
- Computational Biophysics and Bioinformatics, Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
|
37
|
Ohlson T, Aggarwal V, Elofsson A, MacCallum RM. Improved alignment quality by combining evolutionary information, predicted secondary structure and self-organizing maps. BMC Bioinformatics 2006; 7:357. [PMID: 16869963 PMCID: PMC1562450 DOI: 10.1186/1471-2105-7-357] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 04/04/2006] [Accepted: 07/25/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein sequence alignment is one of the basic tools in bioinformatics. Correct alignments are required for a range of tasks including the derivation of phylogenetic trees and protein structure prediction. Numerous studies have shown that the incorporation of predicted secondary structure information into alignment algorithms improves their performance. Secondary structure predictors have to be trained on a set of somewhat arbitrarily defined states (e.g. helix, strand, coil), and it has been shown that the choice of these states has some effect on alignment quality. However, it is not unlikely that prediction of other structural features also could provide an improvement. In this study we use an unsupervised clustering method, the self-organizing map, to assign sequence profile windows to "structural states" and assess their use in sequence alignment. RESULTS The addition of self-organizing map locations as inputs to a profile-profile scoring function improves the alignment quality of distantly related proteins slightly. The improvement is slightly smaller than that gained from the inclusion of predicted secondary structure. However, the information seems to be complementary as the two prediction schemes can be combined to improve the alignment quality by a further small but significant amount. CONCLUSION It has been observed in many studies that predicted secondary structure significantly improves the alignments. Here we have shown that the addition of self-organizing map locations can further improve the alignments as the self-organizing map locations seem to contain some information that is not captured by the predicted secondary structure.
Affiliation(s)
- Tomas Ohlson
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Varun Aggarwal
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Arne Elofsson
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden
- Robert M MacCallum
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
- Division of Cell and Molecular Biology, Imperial College London, London, UK
|
38
|
Zhou H, Zhou Y. Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins 2006; 58:321-8. [PMID: 15523666 PMCID: PMC1408319 DOI: 10.1002/prot.20308] [Citation(s) in RCA: 178] [Impact Index Per Article: 9.4] [Indexed: 11/09/2022]
Abstract
Recognizing structural similarity without significant sequence identity has proved to be a challenging task. Sequence-based and structure-based methods as well as their combinations have been developed. Here, we propose a fold-recognition method that incorporates structural information without the need for sequence-to-structure threading. This is accomplished by generating sequence profiles from protein structural fragments. The structure-derived sequence profiles allow a simple integration with evolution-derived sequence profiles and secondary-structural information for an optimized alignment by efficient dynamic programming. The resulting method (called SP(3)) is found to make a statistically significant improvement in both sensitivity of fold recognition and accuracy of alignment over the method based on evolution-derived sequence profiles alone (SP) and the method based on evolution-derived sequence profile and secondary structure profile (SP(2)). SP(3) was tested in the SALIGN benchmark for alignment accuracy and the Lindahl, PROSPECTOR 3.0, and LiveBench 8.0 benchmarks for remote-homology detection and model accuracy. SP(3) is found to be the most sensitive and accurate single-method server in all benchmarks tested where other methods are available for comparison (although its results are statistically indistinguishable from the next best in some cases, and the comparison is subject to the limitation of the time-dependent sequence and/or structural libraries used by different methods). In LiveBench 8.0, its accuracy rivals some of the consensus methods such as ShotGun-INBGU, Pmodeller3, Pcons4, and ROBETTA. The SP(3) fold-recognition server is available at http://theory.med.buffalo.edu.
Affiliation(s)
- Yaoqi Zhou
- *Correspondence to: Dr. Yaoqi Zhou, Howard Hughes Medical Institute, Center for Single Molecule Biophysics and Department of Physiology & Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214. E-mail:
|
39
|
Abstract
Homology modeling plays a central role in determining protein structure in the structural genomics project. The importance of homology modeling has been steadily increasing because of the large gap that exists between the overwhelming number of available protein sequences and experimentally solved protein structures, and also, more importantly, because of the increasing reliability and accuracy of the method. In fact, a protein sequence with over 30% identity to a known structure can often be predicted with an accuracy equivalent to a low-resolution X-ray structure. The recent advances in homology modeling, especially in detecting distant homologues, aligning sequences with template structures, modeling of loops and side chains, as well as detecting errors in a model, have contributed to reliable prediction of protein structure, which was not possible even several years ago. The ongoing efforts in solving protein structures, which can be time-consuming and often difficult, will continue to spur the development of a host of new computational methods that can fill in the gap and further contribute to understanding the relationship between protein structure and function.
Affiliation(s)
- Zhexin Xiang
- Center for Molecular Modeling, Center for Information Technology, National Institutes of Health, Building 12A Room 2051, 12 South Drive, Bethesda, Maryland 20892-5624, USA.
|
40
|
Tomii K, Hirokawa T, Motono C. Protein structure prediction using a variety of profile libraries and 3D verification. Proteins 2006; 61 Suppl 7:114-121. [PMID: 16187352 DOI: 10.1002/prot.20727] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
Abstract
This study is intended to construct a useful method for fold recognition, regardless of whether the proteins to be compared are evolutionarily related. We developed several descendants of our profile-profile comparison method to make use of known structural information for protein structure prediction. Our prediction strategy in CASP6 is simple. For every CASP6 target, we derived target-template alignments from several different versions of profile-profile comparisons. We then constructed and exhaustively evaluated 3D models based on those alignments. Subsequently, we selected proper model(s) among them. We specifically addressed the validation of our simple approach for protein structure prediction through CASP6 because the fold recognition results of CASP5 revealed areas of improvement in the selection of good models. Consequently, we applied a more stringent method for 3D model evaluation this time. All generated models were evaluated based on a structural quality score calculated by both Verify3D and Prosa2003 programs. It turns out that the prediction results of our human group were supported by the results of three servers. The pipeline that we constructed for our human group prediction and human intervention were also greatly effective in improving prediction models, but the efficacy of our scheme for 3D model evaluation was obscure.
Affiliation(s)
- Kentaro Tomii
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan.
|
41
|
Abstract
Two single-method servers, SPARKS 2 and SP3, participated in automatic-server predictions in CASP6. The overall results for all as well as detailed performance in comparative modeling targets are presented. It is shown that both SPARKS 2 and SP3 are able to recognize their corresponding best templates for all easy comparative modeling targets. The alignment accuracy, however, is not always the best among all the servers. Possible factors are discussed. SPARKS 2 and SP3 fold recognition servers, as well as their executables, are freely available for all academic users on http://theory.med.buffalo.edu.
Affiliation(s)
- Hongyi Zhou
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York, Buffalo, New York 14214, USA
|
42
|
Wang J, Feng JA. NdPASA: a novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities. Proteins 2006; 58:628-37. [PMID: 15616964 DOI: 10.1002/prot.20359] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
Sequence alignment has become one of the essential bioinformatics tools in biomedical research. Existing sequence alignment methods can produce reliable alignments for homologous proteins sharing a high percentage of sequence identity. The performance of these methods deteriorates sharply for the sequence pairs sharing less than 25% sequence identity. We report here a new method, NdPASA, for pairwise sequence alignment. This method employs neighbor-dependent propensities of amino acids as a unique parameter for alignment. The values of neighbor-dependent propensity measure the preference of an amino acid pair adopting a particular secondary structure conformation. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. Using superpositions of homologous proteins derived from the PSI-BLAST analysis and the Structural Classification of Proteins (SCOP) classification of a nonredundant Protein Data Bank (PDB) database as a gold standard, we show that NdPASA has improved pairwise alignment. Statistical analyses of the performance of NdPASA indicate that the introduction of sequence patterns of secondary structure derived from neighbor-dependent sequence analysis clearly improves alignment performance for sequence pairs sharing less than 20% sequence identity. For sequence pairs sharing 13-21% sequence identity, NdPASA improves the accuracy of alignment over the conventional global alignment (GA) algorithm using the BLOSUM62 by an average of 8.6%. NdPASA is most effective for aligning query sequences with template sequences whose structure is known. NdPASA can be accessed online at http://astro.temple.edu/feng/Servers/BioinformaticServers.htm.
Affiliation(s)
- Junwen Wang
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, USA
|
43
|
Dunbrack RL. Sequence comparison and protein structure prediction. Curr Opin Struct Biol 2006; 16:374-84. [PMID: 16713709 DOI: 10.1016/j.sbi.2006.05.006] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.3] [Received: 02/23/2006] [Revised: 03/22/2006] [Accepted: 05/08/2006] [Indexed: 10/24/2022]
Abstract
Sequence comparison is a major step in the prediction of protein structure from existing templates in the Protein Data Bank. The identification of potentially remote homologues to be used as templates for modeling target sequences of unknown structure and their accurate alignment remain challenges, despite many years of study. The most recent advances have been in combining as many sources of information as possible, including amino acid variation in the form of profiles or hidden Markov models for both the target and template families, known and predicted secondary structures of the template and target, respectively, the combination of structure alignment for distant homologues and sequence alignment for close homologues to build better profiles, and the anchoring of certain regions of the alignment based on existing biological data. Newer technologies have been applied to the problem, including the use of support vector machines to tackle the fold classification problem for a target sequence and the alignment of hidden Markov models. Finally, using the consensus of many fold recognition methods, whether based on profile-profile alignments, threading or other approaches, continues to be one of the most successful strategies for both recognition and alignment of remote homologues. Although there is still room for improvement in identification and alignment methods, additional progress may come from model building and refinement methods that can compensate for large structural changes between remotely related targets and templates, as well as for regions of misalignment.
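The idea of anchoring positions in an alignment can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the method of any tool cited in this list: it runs a plain Needleman-Wunsch global alignment with a linear gap penalty and adds a large bonus to the score of each anchored residue pair, which makes the optimal path pass through those pairs. The function name `anchored_global_align`, the scoring values, and the `bonus` parameter are all hypothetical choices made for illustration.

```python
def anchored_global_align(a, b, anchors=(), match=1, mismatch=-1, gap=-2, bonus=1000):
    """Toy Needleman-Wunsch alignment; anchors are 0-based (i, j) pairs to force together."""
    anchor_set = set(anchors)

    def pair_score(i, j):
        # Substitution score for a[i] vs b[j], plus the anchor bonus if applicable.
        s = match if a[i] == b[j] else mismatch
        return s + bonus if (i, j) in anchor_set else s

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i - 1, j - 1),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i - 1, j - 1):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


print(anchored_global_align("ACGT", "AGT"))                    # unanchored
print(anchored_global_align("ACGT", "AGT", anchors=[(1, 1)]))  # force C to pair with G
```

With these toy scores the unanchored call places the gap opposite C, while anchoring position 1 of each sequence moves the gap so that C and G are paired and the rest of the alignment is still chosen by the same scoring scheme.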
Affiliation(s)
- Roland L Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111, USA.
|
44
|
Forrest LR, Tang CL, Honig B. On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins. Biophys J 2006; 91:508-17. [PMID: 16648166 PMCID: PMC1483079 DOI: 10.1529/biophysj.106.082313] [Citation(s) in RCA: 183] [Impact Index Per Article: 9.6] [Indexed: 01/29/2023] Open
Abstract
In this study, we investigate the extent to which techniques for homology modeling that were developed for water-soluble proteins are appropriate for membrane proteins as well. To this end we present an assessment of current strategies for homology modeling of membrane proteins and introduce a benchmark data set of homologous membrane protein structures, called HOMEP. First, we use HOMEP to reveal the relationship between sequence identity and structural similarity in membrane proteins. This analysis indicates that homology modeling is at least as applicable to membrane proteins as it is to water-soluble proteins and that acceptable models (with Cα-RMSD values to the native of 2 Å or less in the transmembrane regions) may be obtained for template sequence identities of 30% or higher if an accurate alignment of the sequences is used. Second, we show that secondary-structure prediction algorithms that were developed for water-soluble proteins perform approximately as well for membrane proteins. Third, we provide a comparison of a set of commonly used sequence alignment algorithms as applied to membrane proteins. We find that high-accuracy alignments of membrane protein sequences can be obtained using state-of-the-art profile-to-profile methods that were developed for water-soluble proteins. Improvements are observed when weights derived from the secondary structure of the query and the template are used in the scoring of the alignment, a result which relies on the accuracy of the secondary-structure prediction of the query sequence. The most accurate alignments were obtained using template profiles constructed with the aid of structural alignments. In contrast, a simple sequence-to-sequence alignment algorithm, using a membrane protein-specific substitution matrix, shows no improvement in alignment accuracy. We suggest that profile-to-profile alignment methods should be adopted to maximize the accuracy of homology models of membrane proteins.
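For readers unfamiliar with the RMSD criterion quoted above, here is a minimal sketch of how a Cα root-mean-square deviation is computed. It assumes the model and native coordinates are already optimally superposed and paired residue by residue; a real evaluation would first perform a least-squares superposition, which is omitted here. The function name `ca_rmsd` and the example coordinates are hypothetical.

```python
import math


def ca_rmsd(model, native):
    """RMSD in the input units between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes the two structures are already superposed; no fitting is done here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xm - xn) ** 2 + (ym - yn) ** 2 + (zm - zn) ** 2
        for (xm, ym, zm), (xn, yn, zn) in zip(model, native)
    )
    return math.sqrt(squared / len(model))


# Two-residue toy example: one atom coincides, the other is displaced by 2 units.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(ca_rmsd(model, native))
```

Under the threshold mentioned in the abstract, a value of 2 Å or less over the transmembrane Cα atoms would count as an acceptable model.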
Affiliation(s)
- Lucy R Forrest
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York 10032, USA
|
45
|
Abstract
Is highly approximate knowledge of a protein's backbone structure sufficient to successfully identify its family, superfamily, and tertiary fold? To explore this question, backbone dihedral angles were extracted from the known three-dimensional structure of 2,439 proteins and mapped into 36 labeled, 60° × 60° bins, called mesostates. Using this coarse-grained mapping, protein conformation can be approximated by a linear sequence of mesostates. These linear strings can then be aligned and assessed by conventional sequence-comparison methods. We report that the mesostate sequence is sufficient to recognize a protein's family, superfamily, and fold with good fidelity.
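The coarse-graining described here can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: it divides the (phi, psi) plane into a 6 × 6 grid of 60° bins starting at -180°, and assigns each bin one of 36 single-character labels. The grid origin, the label alphabet (A-Z followed by 0-9), and the function names are hypothetical, since the abstract does not specify them.

```python
import string

BIN_WIDTH = 60.0                                   # degrees per bin along each axis
BINS_PER_AXIS = 6                                  # 360 / 60
LABELS = string.ascii_uppercase + string.digits    # 36 hypothetical one-character labels


def mesostate(phi, psi):
    """Map one (phi, psi) pair in degrees to a single-character bin label."""
    # Shift into [0, 360) so that -180 falls in bin 0 and +180 wraps to bin 0.
    phi_bin = int(((phi + 180.0) % 360.0) // BIN_WIDTH)
    psi_bin = int(((psi + 180.0) % 360.0) // BIN_WIDTH)
    return LABELS[phi_bin * BINS_PER_AXIS + psi_bin]


def mesostate_string(dihedrals):
    """Convert a sequence of (phi, psi) pairs into a linear string of labels."""
    return "".join(mesostate(phi, psi) for phi, psi in dihedrals)


# Three residues with helix-like angles followed by two with strand-like angles.
backbone = [(-60, -45), (-62, -41), (-58, -47), (-120, 130), (-125, 135)]
print(mesostate_string(backbone))
```

Strings produced this way can be passed to any ordinary sequence-comparison routine, provided the substitution scores are defined over the 36-letter label alphabet rather than over amino acids.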
Affiliation(s)
- Haipeng Gong
- Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218-2608, USA
|
46
|
Cheng J, Baldi P. A machine learning information retrieval approach to protein fold recognition. Bioinformatics 2006; 22:1456-63. [PMID: 16547073 DOI: 10.1093/bioinformatics/btl102] [Citation(s) in RCA: 136] [Impact Index Per Article: 7.2] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far these methods have been used primarily for protein and fold classification, rather than addressing the retrieval problem of fold recognition: finding a proper template for a given query protein. RESULTS: Here we present a two-stage machine learning, information retrieval approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e. in the same fold or not) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable and effective. Compared with 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is approximately 85, 56, and 27% at the family, superfamily and fold levels, respectively. Using the 5 top-ranked templates, the sensitivity increases to 90, 70, and 48%.
Affiliation(s)
- Jianlin Cheng
- Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California Irvine, CA, USA
|
47
|
Casbon JA, Saqi MAS. On single and multiple models of protein families for the detection of remote sequence relationships. BMC Bioinformatics 2006; 7:48. [PMID: 16448555 PMCID: PMC1397874 DOI: 10.1186/1471-2105-7-48] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/22/2005] [Accepted: 01/31/2006] [Indexed: 11/23/2022] Open
Abstract
Background: The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily from direct comparison of the two sequences. Methods which compare sequence profiles have been shown to improve the detection of these remote sequence relationships. However, the best method for building a profile of a known set of sequences has not been established. Here we examine how the type of profile built affects its performance, both in detecting remote homologs and in the resulting alignment accuracy. In particular, we consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known cases of the superfamily, or to use multiple sequence-based profiles each representing an individual member of the superfamily. Results: Using profile-profile methods for remote homolog detection we benchmark the performance of single structure-based superfamily models and multiple domain models. On average, over all superfamilies, using a truncated receiver operator characteristic (ROC5) we find that multiple domain models outperform single superfamily models, except at low error rates where the two models behave in a similar way. However, there is a wide range of performance depending on the superfamily. For 12% of all superfamilies the ROC5 value for superfamily models is greater than 0.2 above the domain models and for 10% of superfamilies the domain models show a similar improvement in performance over the superfamily models. Conclusion: Using a sensitive profile-profile method we have investigated the performance of single structure-based models and multiple sequence models (domain models) in detecting remote superfamily members. We find that overall, multiple models perform better in recognition, although single structure-based models display better alignment accuracy.
Affiliation(s)
- James A Casbon
- Bioinformatics Group, Institute of Cell and Molecular Science, The Genome Centre, Queen Mary's School of Medicine and Dentistry, Charterhouse Square, London, EC1M 6BQ, UK
- Mansoor AS Saqi
- Bioinformatics Group, Institute of Cell and Molecular Science, The Genome Centre, Queen Mary's School of Medicine and Dentistry, Charterhouse Square, London, EC1M 6BQ, UK
|
48
|
Murray PS, Li Z, Wang J, Tang CL, Honig B, Murray D. Retroviral matrix domains share electrostatic homology: models for membrane binding function throughout the viral life cycle. Structure 2006; 13:1521-31. [PMID: 16216583 DOI: 10.1016/j.str.2005.07.010] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.4] [Received: 07/01/2005] [Revised: 07/01/2005] [Accepted: 07/09/2005] [Indexed: 11/25/2022]
Abstract
The matrix domain (MA) of Gag polyproteins performs multiple functions throughout the retroviral life cycle. MA structures have an electropositive surface patch that is implicated in membrane association. Here, we use computational methods to demonstrate that electrostatic control of membrane binding is a central characteristic of all retroviruses. We are able to explain a wide range of experimental observations and provide a level of quantitative and molecular detail that has been inaccessible to experiment. We further predict that MA may exist in a variety of oligomerization states and propose mechanistic models for the effects of phosphoinositides and phosphorylation. The calculations provide a conceptual model for how non-myristoylated and myristoylated MAs behave similarly in assembly and disassembly. Hence, they provide a unified quantitative picture of the structural and energetic origins of the entire range of MA function and thus enhance, extend, and integrate previous observations on individual stages of the process.
Affiliation(s)
- Paul S Murray
- Department of Microbiology and Immunology and The Institute for Computational Biomedicine, Weill Medical College of Cornell, New York, New York 10021, USA
|
49
|
Abstract
In recent years, there has been significant progress in the ability to predict the three-dimensional structure of proteins from their amino acid sequence. Progress has been due to new methods to extract the growing amount of information in sequence and structure databases and improved computational descriptions of protein energetics. This review summarizes recent advances in these areas and describes a number of novel biological applications made possible by structure prediction. Despite remaining challenges, protein structure prediction is becoming an extremely useful tool in understanding phenomena in modern molecular and cell biology.
Affiliation(s)
- Donald Petrey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York 10032, USA
|
50
|
Shatsky M, Nussinov R, Wolfson HJ. Optimization of multiple-sequence alignment based on multiple-structure alignment. Proteins 2005; 62:209-17. [PMID: 16294339 DOI: 10.1002/prot.20665] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Indexed: 11/11/2022]
Abstract
Routinely used multiple-sequence alignment methods use only sequence information. Consequently, they may produce inaccurate alignments. Multiple-structure alignment methods, on the other hand, optimize structural alignment by ignoring sequence information. Here, we present an optimization method that unifies sequence and structure information. The alignment score is based on standard amino acid substitution probabilities combined with newly computed three-dimensional structure alignment probabilities. The advantage of our alignment scheme is in its ability to produce more accurate multiple alignments. We demonstrate the usefulness of the method in three applications: 1) computing more accurate multiple-sequence alignments, 2) analyzing protein conformational changes, and 3) computation of amino acid structure-sequence conservation with application to protein-protein docking prediction. The method is available at http://bioinfo3d.cs.tau.ac.il/staccato/.
Affiliation(s)
- Maxim Shatsky
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel.
|