1
Sequence Pattern for Supersecondary Structure of Sandwich-Like Proteins. Methods Mol Biol 2019. [PMID: 30945226 DOI: 10.1007/978-1-4939-9161-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
The goal is to define sequence characteristics of beta-sandwich proteins that are unique to the beta-sandwich supersecondary structure (SSS). Finding the conserved residues that are critical for protein structure can often be accomplished with homology methods, but these methods are not always adequate, as residues with a similar structural role do not always occupy the same position as determined by sequence alignment. In this paper, we show how to identify residues that play the same structural role in different proteins of the same SSS, even when these residue positions cannot be aligned with sequence alignment methods. The SSS characteristics are (a) a set of positions in each strand that are involved in the formation of a hydrophobic core, residue content, and correlations of residues at these key positions, (b) the maximum allowable number of "low-frequency" residues for each strand, (c) the minimum allowed number of "high-frequency" residues for each loop, and (d) the minimum and maximum lengths of each loop. These sequence characteristics are referred to as the "sequence pattern" for their respective SSS. The high specificity and sensitivity for a particular SSS are confirmed by applying this pattern to all protein structures in the SCOP data bank. We present here the pattern for one of the most common SSS of beta-sandwich proteins.
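The four characteristics (a) to (d) can be read as a rule set that a segmented sequence either satisfies or does not. The following is a minimal sketch of that idea only, not the authors' pattern: the class names, the residue sets, the numeric limits, and the example segments are all hypothetical, and the residue correlations mentioned in (a) are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrandRule:
    # Offset within the strand -> residues accepted at that core position (criterion a).
    key_positions: dict
    # Upper bound on "low-frequency" residues in the strand (criterion b).
    max_low_freq: int


@dataclass(frozen=True)
class LoopRule:
    min_len: int        # shortest allowed loop (criterion d)
    max_len: int        # longest allowed loop (criterion d)
    min_high_freq: int  # lower bound on "high-frequency" residues (criterion c)


def strand_ok(segment, rule, low_freq):
    """Check key core positions and the low-frequency residue limit for one strand."""
    for offset, allowed in rule.key_positions.items():
        if offset >= len(segment) or segment[offset] not in allowed:
            return False
    return sum(res in low_freq for res in segment) <= rule.max_low_freq


def loop_ok(segment, rule, high_freq):
    """Check the length bounds and the high-frequency residue minimum for one loop."""
    if not rule.min_len <= len(segment) <= rule.max_len:
        return False
    return sum(res in high_freq for res in segment) >= rule.min_high_freq


def matches_pattern(segments, rules, low_freq, high_freq):
    """Return True if every strand and loop segment satisfies its rule."""
    if len(segments) != len(rules):
        return False
    for segment, rule in zip(segments, rules):
        if isinstance(rule, StrandRule):
            if not strand_ok(segment, rule, low_freq):
                return False
        elif not loop_ok(segment, rule, high_freq):
            return False
    return True


# Hypothetical residue classes and rules, chosen only to exercise the checks.
HYDROPHOBIC = set("VILFMWYA")
LOW_FREQ_IN_STRANDS = set("PGDN")
HIGH_FREQ_IN_LOOPS = set("GDNSP")

EXAMPLE_RULES = [
    StrandRule({0: HYDROPHOBIC, 2: HYDROPHOBIC}, max_low_freq=1),
    LoopRule(min_len=2, max_len=5, min_high_freq=1),
    StrandRule({1: HYDROPHOBIC, 3: HYDROPHOBIC}, max_low_freq=1),
]

GOOD = ["VKVTL", "GDS", "TVEIY"]      # strand, loop, strand
BAD = ["VKVTL", "GDSGDS", "TVEIY"]    # loop exceeds max_len

print(matches_pattern(GOOD, EXAMPLE_RULES, LOW_FREQ_IN_STRANDS, HIGH_FREQ_IN_LOOPS))
print(matches_pattern(BAD, EXAMPLE_RULES, LOW_FREQ_IN_STRANDS, HIGH_FREQ_IN_LOOPS))
```

Because the rules are attached to structural roles (strand offsets and loops) rather than to alignment columns, two sequences can satisfy the same pattern without being alignable position by position, which is the point the abstract makes.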
2
Fotoohifiroozabadi S, Mohamad MS, Deris S. NAHAL-Flex: A Numerical and Alphabetical Hinge Detection Algorithm for Flexible Protein Structure Alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:934-943. [PMID: 28534783 DOI: 10.1109/tcbb.2017.2705080] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Flexible proteins are proteins whose structures undergo conformational changes. Protein flexibility analysis is critical for classifying and understanding protein functionality. For that analysis, the hinge areas where proteins show flexibility must be detected. To detect the location of the hinges, previous methods have utilized the three-dimensional (3D) structure of proteins, which is computationally expensive. To reduce the computational complexity, this study proposes a novel text-based method using structural alphabets (SAs) for detecting the hinge position, called NAHAL-Flex. Protein structures were encoded to a particular type of SA called the protein folding shape code (PFSC), which remains unaffected by location, scale, and rotation. The flexible regions of the proteins are the only places in which letter sequences can be distorted. With this knowledge, it is possible to find the longest alignment path of two letter sequences using a dynamic programming (DP) algorithm. Then, the proposed method looks for regions where the alphabet sequence is distorted to find the most probable hinge positions. In order to reduce the number of hinge positions, a genetic algorithm (GA) was utilized to find the best candidate hinge points. To evaluate the method's effectiveness, four different flexible and rigid protein databases, including two small datasets and two large datasets, were utilized. For the small datasets, the NAHAL-Flex method was comparable to state-of-the-art structural flexible alignment methods. The results for the large datasets show that NAHAL-Flex outperforms some well-known alignment methods, e.g., DaliLite, Matt, DeepAlign, and TM-align; NAHAL-Flex was faster and its results were more accurate than the other methods.
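The DP step described above, finding the longest alignment path between two letter sequences and then looking for distorted stretches, can be illustrated with a textbook longest-common-subsequence alignment. This is a sketch under assumptions, not the NAHAL-Flex implementation: the function names are hypothetical, the example strings are arbitrary letters rather than real PFSC codes, and the genetic-algorithm filtering of candidate hinges is not shown.

```python
def lcs_alignment(a, b):
    """Return matched index pairs (i, j) along one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table to recover one optimal path.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def unmatched_runs(pairs, length):
    """Return inclusive (start, end) index runs of the first sequence left off the path."""
    matched = {i for i, _ in pairs}
    runs, start = [], None
    for i in range(length):
        if i not in matched:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, length - 1))
    return runs


# Arbitrary letter strings standing in for two structural-alphabet encodings.
seq_a = "AAAABBAAAA"
seq_b = "AAAACCAAAA"
path = lcs_alignment(seq_a, seq_b)
print(len(path), unmatched_runs(path, len(seq_a)))
```

In this toy case the two strings agree everywhere except positions 4 and 5, so that run is the only candidate distorted region; a hinge detector in the spirit of the abstract would then rank such runs rather than accept them all.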
3
Khadka B, Adeolu M, Blankenship RE, Gupta RS. Novel insights into the origin and diversification of photosynthesis based on analyses of conserved indels in the core reaction center proteins. Photosynthesis Research 2017; 131:159-171. [PMID: 27638319 DOI: 10.1007/s11120-016-0307-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 07/14/2016] [Accepted: 09/07/2016] [Indexed: 06/06/2023]
Abstract
The evolution and diversification of different types of photosynthetic reaction centers (RCs) remains an important unresolved problem. We report here novel sequence features of the core proteins from Type I RCs (RC-I) and Type II RCs (RC-II) whose analyses provide important insights into the evolution of the RCs. The sequence alignments of the RC-I core proteins contain two conserved inserts or deletions (indels), a 3 amino acid (aa) indel that is uniquely found in all RC-I homologs from Cyanobacteria (both PsaA and PsaB) and a 1 aa indel that is specifically shared by the Chlorobi and Acidobacteria homologs. Ancestral sequence reconstruction provides evidence that the RC-I core protein from Heliobacteriaceae (PshA), lacking these indels, is most closely related to the ancestral RC-I protein. Thus, the identified 3 aa and 1 aa indels in the RC-I protein sequences must have been deletions, which occurred, respectively, in an ancestor of the modern Cyanobacteria containing a homodimeric form of RC-I and in a common ancestor of the RC-I core protein from Chlorobi and Acidobacteria. We also report a conserved 1 aa indel in the RC-II protein sequences that is commonly shared by all homologs from Cyanobacteria but not found in the homologs from Chloroflexi, Proteobacteria and Gemmatimonadetes. Ancestral sequence reconstruction provides evidence that the RC-II subunits lacking this indel are more similar to the ancestral RC-II protein. The results of flexible structural alignments of the indel-containing region of the RC-II protein with the homologous region in the RC-I core protein, which shares structural similarity with the RC-II homologs, support the view that the 1 aa indel present in the RC-II homologs from Cyanobacteria is a deletion, which was not present in the ancestral form of the RC-II protein. 
Our analyses of the conserved indels found in the RC-I and RC-II proteins, thus, support the view that the earliest photosynthetic lineages with living descendants likely contained only a single RC (RC-I or RC-II), and the presence of both RC-I and RC-II in a linked state, as found in the modern Cyanobacteria, is a derivation from these earlier phototrophs.
Affiliation(s)
- Bijendra Khadka
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, L8N 3Z5, Canada
- Mobolaji Adeolu
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, L8N 3Z5, Canada
- Robert E Blankenship
- Department of Biology and Department of Chemistry, Washington University in St. Louis, St. Louis, MO, 63130, USA
- Radhey S Gupta
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, L8N 3Z5, Canada
4
Ritchie DW. Calculating and scoring high quality multiple flexible protein structure alignments. Bioinformatics 2016; 32:2650-8. [PMID: 27187202 DOI: 10.1093/bioinformatics/btw300] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/21/2015] [Accepted: 05/07/2016] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Calculating multiple protein structure alignments (MSAs) is important for understanding functional and evolutionary relationships between protein families, and for modeling protein structures by homology. While incorporating backbone flexibility promises to circumvent many of the limitations of rigid MSA algorithms, very few flexible MSA algorithms exist today. This article describes several novel improvements to the Kpax algorithm which allow high quality flexible MSAs to be calculated. This article also introduces a new Gaussian-based MSA quality measure called 'M-score', which circumvents the pitfalls of RMSD-based quality measures.
RESULTS: As well as calculating flexible MSAs, the new version of Kpax can also score MSAs from other aligners and from previously aligned reference datasets. Results are presented for a large-scale evaluation of the Homstrad, SABmark and SISY benchmark sets using Kpax and Matt as examples of state-of-the-art flexible aligners and 3DCOMB as an example of a state-of-the-art rigid aligner. These results demonstrate the utility of the M-score as a measure of MSA quality and show that high quality MSAs may be achieved when structural flexibility is properly taken into account.
AVAILABILITY AND IMPLEMENTATION: Kpax 5.0 may be downloaded for academic use at http://kpax.loria.fr/
CONTACT: dave.ritchie@inria.fr
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
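The abstract does not give the M-score formula, so the snippet below does not reproduce it. It only sketches, under stated assumptions, why a Gaussian-weighted measure can behave better than RMSD: a single badly placed residue pair dominates RMSD but contributes almost nothing to a Gaussian average. The function names, the `sigma` value, and the distance lists are hypothetical.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation over aligned residue-pair distances (in angstroms)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gaussian_score(distances, sigma=2.0):
    """Mean Gaussian weight of the pair distances; 1.0 means every pair coincides.

    sigma is an assumed width parameter, not a value taken from Kpax.
    """
    return sum(math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in distances) / len(distances)


# Ten well-superposed pairs versus the same alignment with one outlier pair.
tight = [0.5] * 10
with_outlier = [0.5] * 9 + [20.0]

print(rmsd(tight), rmsd(with_outlier))
print(gaussian_score(tight), gaussian_score(with_outlier))
```

With these made-up numbers the RMSD rises from 0.5 to more than 6, while the Gaussian score falls only from roughly 0.97 to roughly 0.87, reflecting that nine of the ten pairs are still well aligned.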
5
Terashi G, Takeda-Shitaka M. CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area. PLoS One 2015; 10:e0141440. [PMID: 26502070 PMCID: PMC4621035 DOI: 10.1371/journal.pone.0141440] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/19/2015] [Accepted: 10/08/2015] [Indexed: 12/26/2022] Open
Abstract
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. 
Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.
Affiliation(s)
- Genki Terashi
- School of Pharmacy, Kitasato University, Tokyo, Japan
6
Edwards H, Deane CM. Structural Bridges through Fold Space. PLoS Comput Biol 2015; 11:e1004466. [PMID: 26372166 PMCID: PMC4570669 DOI: 10.1371/journal.pcbi.1004466] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/20/2015] [Accepted: 07/12/2015] [Indexed: 12/05/2022] Open
Abstract
Several protein structure classification schemes exist that partition the protein universe into structural units called folds. Yet these schemes do not discuss how these units sit relative to each other in a global structure space. In this paper we construct networks that describe such global relationships between folds in the form of structural bridges. We generate these networks using four different structural alignment methods across multiple score thresholds. The networks constructed using the different methods remain a similar distance apart regardless of the probability threshold defining a structural bridge. This suggests that at least some structural bridges are method specific and that any attempt to build a picture of structural space should not be reliant on a single structural superposition method. Despite these differences all representations agree on an organisation of fold space into five principal community structures: all-α, all-β sandwiches, all-β barrels, α/β and α + β. We project estimated fold ages onto the networks and find that not only are the pairings of unconnected folds associated with higher age differences than bridged folds, but this difference increases with the number of networks displaying an edge. We also examine different centrality measures for folds within the networks and how these relate to fold age. While these measures interpret the central core of fold space in varied ways they all identify the disposition of ancestral folds to fall within this core and that of the more recently evolved structures to provide the peripheral landscape. These findings suggest that evolutionary information is encoded along these structural bridges. Finally, we identify four highly central pivotal folds representing dominant topological features which act as key attractors within our landscapes. Folds are considered to be the structural units which make up the protein universe. 
Structural classification schemes focus on the assignment and organisation of protein domains into folds. However, they do not suggest how different folds might relate to one another in a global way. We introduce the concept of bridges through fold space: significant similarities between these units. We consider four alignment methods and a dynamic approach to placing these bridges. A greater consensus between these methods cannot be achieved by simply increasing the stringency with which edges are assigned. Instead, we emphasise the importance of considering consensus maps and only report results where there is agreement across all networks. It is possible that a study of the bridges may reveal evolutionary relationships. Based on a phylogenetic analysis of structures, we find that bridges consistently fall between folds which evolved at similar times. Moreover, the landscapes all consist of a core of older folds, with younger structures more often seen at the periphery. Finally we identify four pivotal folds in the landscapes. They contain topological motifs which unite disparate regions of fold space.
Affiliation(s)
- Hannah Edwards
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Charlotte M. Deane
- Department of Statistics, University of Oxford, Oxford, United Kingdom
7
Stamm M, Forrest LR. Structure alignment of membrane proteins: Accuracy of available tools and a consensus strategy. Proteins 2015; 83:1720-32. [PMID: 26178143 DOI: 10.1002/prot.24857] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 02/25/2015] [Revised: 05/07/2015] [Accepted: 06/07/2015] [Indexed: 12/31/2022]
Abstract
Protein structure alignment methods are used for the detection of evolutionary and functionally related positions in proteins. A wide array of different methods are available, but the choice of the best method is often not apparent to the user. Several studies have assessed the alignment accuracy and consistency of structure alignment methods, but none of these explicitly considered membrane proteins, which are important targets for drug development and have distinct structural features. Here, we compared 13 widely used pairwise structural alignment methods on a test set of homologous membrane protein structures (called HOMEP3). Each pair of structures was aligned and the corresponding sequence alignment was used to construct homology models. The model accuracy compared to the known structures was assessed using scoring functions not incorporated in the tested structural alignment methods. The analysis shows that fragment-based approaches such as FR-TM-align are the most useful for aligning structures of membrane proteins. Moreover, fragment-based approaches are more suitable for comparison of protein structures that have undergone large conformational changes. Nevertheless, no method was clearly superior to all other methods. Additionally, all methods lack a measure to rate the reliability of a position within a structure alignment. To solve both of these problems, we propose a consensus-type approach, combining alignments from four different methods, namely FR-TM-align, DaliLite, MATT, and FATCAT. Agreement between the methods is used to assign confidence values to each position of the alignment. Overall, we conclude that there remains scope for the improvement of structural alignment methods for membrane proteins.
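The consensus idea, using agreement between methods to assign a confidence value to each aligned position, can be sketched as a simple vote count. The abstract does not state the exact scoring rule, so the fraction-of-methods measure below is an assumption, as are the function name and the toy alignments (each a mapping from residue index in structure A to residue index in structure B).

```python
from collections import Counter


def consensus_confidence(alignments):
    """Map each aligned residue pair (i, j) to the fraction of methods that report it."""
    counts = Counter()
    for alignment in alignments:
        # Each method votes once for every residue pair it aligns.
        counts.update(alignment.items())
    return {pair: count / len(alignments) for pair, count in counts.items()}


# Toy outputs standing in for four aligners; the indices are invented for illustration.
method_1 = {1: 1, 2: 2, 3: 3}
method_2 = {1: 1, 2: 2, 3: 4}
method_3 = {1: 1, 2: 3, 3: 4}
method_4 = {1: 1, 2: 2}

confidence = consensus_confidence([method_1, method_2, method_3, method_4])
for pair in sorted(confidence):
    print(pair, confidence[pair])
```

Pairs reported by all four methods receive 1.0, while pairs supported by a single method receive 0.25, so downstream modeling could keep only positions above a chosen threshold.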
Affiliation(s)
- Marcus Stamm
- Computational Structural Biology Group, Max Planck Institute of Biophysics, Frankfurt Am Main, Germany
- Lucy R Forrest
- Computational Structural Biology Group, Max Planck Institute of Biophysics, Frankfurt Am Main, Germany
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland
8
Ma J, Wang S. Algorithms, Applications, and Challenges of Protein Structure Alignment. Advances in Protein Chemistry and Structural Biology 2014; 94:121-75. [DOI: 10.1016/b978-0-12-800168-4.00005-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 12/29/2022]
9
Consequences of domain insertion on sequence-structure divergence in a superfold. Proc Natl Acad Sci U S A 2013; 110:E3381-7. [PMID: 23959887 DOI: 10.1073/pnas.1305519110] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022] Open
Abstract
Although the universe of protein structures is vast, these innumerable structures can be categorized into a finite number of folds. New functions commonly evolve by elaboration of existing scaffolds, for example, via domain insertions. Thus, understanding structural diversity of a protein fold evolving via domain insertions is a fundamental challenge. The haloalkanoic dehalogenase superfamily serves as an excellent model system wherein a variable cap domain accessorizes the ubiquitous Rossmann-fold core domain. Here, we determine the impact of the cap-domain insertion on the sequence and structure divergence of the core domain. Through quantitative analysis on a unique dataset of 154 core-domain-only and cap-domain-only structures, basic principles of their evolution have been uncovered. The relationship between sequence and structure divergence of the core domain is shown to be monotonic and independent of the corresponding type of domain insert, reflecting the robustness of the Rossmann fold to mutation. However, core domains with the same cap type share greater similarity at the sequence and structure levels, suggesting interplay between the cap and core domains. Notably, results reveal that the variance in structure maps to α-helices flanking the central β-sheet and not to the domain-domain interface. Collectively, these results hint at intramolecular coevolution where the fold diverges differentially in the context of an accessory domain, a feature that might also apply to other multidomain superfamilies.
10
Topham CM, Rouquier M, Tarrat N, André I. Adaptive Smith-Waterman residue match seeding for protein structural alignment. Proteins 2013; 81:1823-39. [DOI: 10.1002/prot.24327] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/23/2013] [Revised: 04/22/2013] [Accepted: 05/15/2013] [Indexed: 12/30/2022]
Affiliation(s)
- Christopher M. Topham
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
- Mickaël Rouquier
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
- Nathalie Tarrat
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
- Isabelle André
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France