1
Kister AE. Beta Sandwich-Like Folds: Sequences, Contacts, Classification of Invariant Substructures and Beta Sandwich Protein Grammar. Methods Mol Biol 2025; 2870:51-62. [PMID: 39543030 DOI: 10.1007/978-1-0716-4213-9_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
This chapter addresses the following fundamental question: Do sequences of protein domains with sandwich architecture have common sequence characteristics even though they belong to different superfamilies and folds? The analysis was carried out in two stages: (1) determination of domain substructures shared by all sandwich proteins and (2) detection of common sequence characteristics within the substructures. Analysis of supersecondary structures in domains of proteins revealed two types of four-strand substructures that are common to sandwich proteins. At least one of these common substructures was found in proteins of 42 sandwich-like folds (per structural classification in the CATH database). A comparison of sequence fragments and residue-residue contacts constituting common substructures revealed specific distributions of hydrophobic residues in these chains. The shared sequences and structural characteristics can be conceptualized as the "grammatical rules of beta protein linguistics." Understanding the structural and sequence commonalities of sandwich proteins may prove useful for rational protein design.
2
Corominas-Murtra B, Seoane LF, Solé R. Zipf's Law, unbounded complexity and open-ended evolution. J R Soc Interface 2018; 15:20180395. [PMID: 30958235 PMCID: PMC6303796 DOI: 10.1098/rsif.2018.0395] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/29/2018] [Accepted: 11/19/2018] [Indexed: 11/12/2022] Open
Abstract
A major problem for evolutionary theory is understanding the so-called open-ended nature of evolutionary change, from its definition to its origins. Open-ended evolution (OEE) refers to the unbounded increase in complexity that seems to characterize evolution on multiple scales. This property seems to be a characteristic feature of biological and technological evolution and is strongly tied to the generative potential associated with combinatorics, which allows the system to grow and expand its available state space. Interestingly, many complex systems presumably displaying OEE, from language to proteins, share a common statistical property: the presence of Zipf's Law. Given an inventory of basic items (such as words or protein domains) required to build more complex structures (sentences or proteins), Zipf's Law tells us that most of these elements are rare whereas a few of them are extremely common. Using algorithmic information theory, in this paper we provide a fundamental definition for open-endedness, which can be understood as postulates. Its statistical counterpart, based on standard Shannon information theory, has the structure of a variational problem which is shown to lead to Zipf's Law as the expected consequence of an evolutionary process displaying OEE. We further explore the problem of information conservation through an OEE process and we conclude that statistical information (standard Shannon information) is not conserved, resulting in the paradoxical situation in which the increase of information content has the effect of erasing itself. We prove that this paradox is solved if we consider non-statistical forms of information. This last result implies that standard information theory may not be a suitable theoretical framework to explore the persistence and increase of the information content in OEE systems.
Affiliation(s)
- Luís F. Seoane
- Department of Physics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- UPF-PRBB, ICREA-Complex Systems Lab, Dr Aiguader 88, 08003 Barcelona, Spain
- Institute Evolutionary Biology, UPF-CSIC, Pg Maritim Barceloneta 37, 08003 Barcelona, Spain
- Ricard Solé
- UPF-PRBB, ICREA-Complex Systems Lab, Dr Aiguader 88, 08003 Barcelona, Spain
- Institute Evolutionary Biology, UPF-CSIC, Pg Maritim Barceloneta 37, 08003 Barcelona, Spain
- Santa Fe Institute, 1399 Hyde Park Road, 87501 Santa Fe, NM, USA
3
Choi JH, Lee H, Choi HR, Cho M. Graph Theory and Ion and Molecular Aggregation in Aqueous Solutions. Annu Rev Phys Chem 2018; 69:125-149. [DOI: 10.1146/annurev-physchem-050317-020915] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 11/09/2022]
Affiliation(s)
- Jun-Ho Choi
- Center for Molecular Spectroscopy and Dynamics, Institute for Basic Science, Seoul 02841, Republic of Korea
- Department of Chemistry, Korea University, Seoul 02841, Republic of Korea
- Current affiliation: Department of Chemistry, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea
- Hochan Lee
- Center for Molecular Spectroscopy and Dynamics, Institute for Basic Science, Seoul 02841, Republic of Korea
- Department of Chemistry, Korea University, Seoul 02841, Republic of Korea
- Hyung Ran Choi
- Center for Molecular Spectroscopy and Dynamics, Institute for Basic Science, Seoul 02841, Republic of Korea
- Department of Chemistry, Korea University, Seoul 02841, Republic of Korea
- Minhaeng Cho
- Center for Molecular Spectroscopy and Dynamics, Institute for Basic Science, Seoul 02841, Republic of Korea
- Department of Chemistry, Korea University, Seoul 02841, Republic of Korea
4
Chaotropes trigger conformational rearrangements differently in Concanavalin A. J Chem Sci 2017. [DOI: 10.1007/s12039-017-1333-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
5
O'Rourke KF, Gorman SD, Boehr DD. Biophysical and computational methods to analyze amino acid interaction networks in proteins. Comput Struct Biotechnol J 2016; 14:245-51. [PMID: 27441044 PMCID: PMC4939391 DOI: 10.1016/j.csbj.2016.06.002] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 03/30/2016] [Revised: 06/04/2016] [Accepted: 06/13/2016] [Indexed: 12/20/2022] Open
Abstract
Globular proteins are held together by interacting networks of amino acid residues. A number of different structural and computational methods have been developed to interrogate these amino acid networks. In this review, we describe some of these methods, including analyses of X-ray crystallographic data and structures, computer simulations, NMR data, and covariation among protein sequences, and indicate the critical insights that such methods provide into protein function. This information can be leveraged towards the design of new allosteric drugs, and the engineering of new protein function and protein regulation strategies.
Affiliation(s)
- Kathleen F O'Rourke
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
- Scott D Gorman
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
- David D Boehr
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
6
Choi JH, Cho M. Ion aggregation in high salt solutions. IV. Graph-theoretical analyses of ion aggregate structure and water hydrogen bonding network. J Chem Phys 2015; 143:104110. [DOI: 10.1063/1.4930608] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Affiliation(s)
- Jun-Ho Choi
- Department of Chemistry, Korea University, Seoul 136-713, South Korea
- Minhaeng Cho
- Department of Chemistry, Korea University, Seoul 136-713, South Korea
- Center for Molecular Spectroscopy and Dynamics, Institute for Basic Science, Korea University, Seoul 136-713, South Korea
7
Abstract
Modularity is known as one of the most important features of a protein's robust and efficient design. The architecture and topology of proteins play a vital role by providing the robust scaffolds needed to support an organism's growth and survival under constant evolutionary pressure. These complex biomolecules can be represented by several layers of modular architecture, but it is pivotal to understand and explore the smallest biologically relevant structural component. In the present study, we have developed a component-based method that uses a protein's secondary structures and their arrangements (i.e. patterns) to investigate its structural space. Our results on all-alpha proteins show that the known structural space is highly populated with a limited set of structural patterns. We have also noticed that these frequently observed structural patterns are present as modules or "building blocks" in large proteins (i.e. those with higher secondary structure content). From structural descriptor analysis, observed patterns are found to be within similar deviation; however, frequent patterns are found to occur distinctly in diverse functions, e.g. in enzymatic classes and reactions. In this study, we introduce a simple approach to explore protein structural space using combinatorial- and graph-based geometry methods, which can be used to describe modularity in protein structures. Moreover, the analysis indicates that protein function seems to be the driving force that shapes the known structure space.
Affiliation(s)
- Taushif Khan
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Indira Ghosh
- School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
8
Choi JH, Cho M. Ion aggregation in high salt solutions. II. Spectral graph analysis of water hydrogen-bonding network and ion aggregate structures. J Chem Phys 2014; 141:154502. [DOI: 10.1063/1.4897638] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Indexed: 01/21/2023] Open
Affiliation(s)
- Jun-Ho Choi
- Department of Chemistry, Korea University, Seoul 136-713, South Korea
- Minhaeng Cho
- Department of Chemistry, Korea University, Seoul 136-713, South Korea
9
Probabilistic grammatical model for helix-helix contact site classification. Algorithms Mol Biol 2013; 8:31. [PMID: 24350601 PMCID: PMC3892132 DOI: 10.1186/1748-7188-8-31] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/09/2013] [Accepted: 11/28/2013] [Indexed: 11/25/2022] Open
Abstract
Background: Hidden Markov Models power many state-of-the-art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not convey directly information on medium- and long-range residue-residue interactions. This requires an expressive power of at least context-free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited.
Results: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to classification of transmembrane helix-helix pairs configurations. The core of the model consists of a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence based descriptors of four classes of transmembrane helix-helix contact site configurations. The highest performance of the classifiers reached AUCROC of 0.70. The analysis of grammar parse trees revealed the ability of representing structural features of helix-helix contact sites.
Conclusions: We demonstrated that our probabilistic context-free framework for analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this is achieved without necessarily requiring modeling long range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human-readable. Thus they could provide biologically meaningful information for molecular biologists.
10
Motomura K, Fujita T, Tsutsumi M, Kikuzato S, Nakamura M, Otaki JM. Word decoding of protein amino acid sequences with availability analysis: a linguistic approach. PLoS One 2012; 7:e50039. [PMID: 23185527 PMCID: PMC3503725 DOI: 10.1371/journal.pone.0050039] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 07/02/2012] [Accepted: 10/15/2012] [Indexed: 11/19/2022] Open
Abstract
The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least a part of the sequence information can be extracted by treating amino acid sequences of proteins as a collection of English words, based on a working hypothesis that amino acid sequences of proteins are composed of short constituent amino acid sequences (SCSs) or "words". We first confirmed that the English language highly likely follows Zipf's law, a special case of power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and "compressed" English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. A distribution pattern of SCSs in proteins is similar among species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., "key words") and vice versa. In fact, the highest availability peak within a given protein sequence often directly corresponds to a sequence motif. The amino acid composition of high-availability sites within motifs is different from that of entire motifs and all protein sequences, suggesting the possible functional importance of specific SCSs and their compositional amino acids within motifs. We anticipate that our availability-based word decoding approach is complementary to sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.
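The rank-frequency analysis described in this abstract can be illustrated with a minimal sketch. It assumes fixed-length overlapping k-mers as a stand-in for the paper's short constituent sequences (SCSs), which are defined differently in the original work; the function names `kmer_counts`, `rank_frequency`, and `loglog_slope` are hypothetical and not taken from the paper.

```python
from collections import Counter
import math


def kmer_counts(sequences, k=3):
    """Count overlapping k-residue 'words' across all sequences.

    Assumption: fixed-length k-mers stand in for the paper's SCSs.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def rank_frequency(counts):
    """Return (rank, frequency) pairs, most frequent word first."""
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is what an ideal Zipf distribution would give.
    Needs at least two distinct ranks.
    """
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

Calling `loglog_slope(rank_frequency(kmer_counts(seqs, k=3)))` on a list of amino acid strings gives a single exponent estimate; a plain least-squares fit over all ranks is a simplification, since the abstract reports that the tails of the distribution deviate from the linear range.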
Affiliation(s)
- Kenta Motomura
- The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Department of Information Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Tomohiro Fujita
- The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Motosuke Tsutsumi
- The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Satsuki Kikuzato
- The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Morikazu Nakamura
- Department of Information Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Joji M. Otaki
- The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa, Japan
11
Searls DB. A primer in macromolecular linguistics. Biopolymers 2012; 99:203-17. [PMID: 23034580 DOI: 10.1002/bip.22101] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 04/26/2012] [Accepted: 05/25/2012] [Indexed: 01/01/2023]
Abstract
Polymeric macromolecules, when viewed abstractly as strings of symbols, can be treated in terms of formal language theory, providing a mathematical foundation for characterizing such strings both as collections and in terms of their individual structures. In addition this approach offers a framework for analysis of macromolecules by tools and conventions widely used in computational linguistics. This article introduces the ways that linguistics can be and has been applied to molecular biology, covering the relevant formal language theory at a relatively nontechnical level. Analogies between macromolecules and human natural language are used to provide intuitive insights into the relevance of grammars, parsing, and analysis of language complexity to biology.
12
Subramani A, Wei Y, Floudas CA. ASTRO-FOLD 2.0: an Enhanced Framework for Protein Structure Prediction. AIChE J 2012; 58:1619-1637. [PMID: 23049093 DOI: 10.1002/aic.12669] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
The three-dimensional (3-D) structure prediction of proteins, given their amino acid sequence, is addressed using the first principles-based approach ASTRO-FOLD 2.0. The key features presented are: (1) Secondary structure prediction using a novel optimization-based consensus approach, (2) β-sheet topology prediction using mixed-integer linear optimization (MILP), (3) Residue-to-residue contact prediction using a high-resolution distance-dependent force field and MILP formulation, (4) Tight dihedral angle and distance bound generation for loop residues using dihedral angle clustering and non-linear optimization (NLP), (5) 3-D structure prediction using deterministic global optimization, stochastic conformational space annealing, and the full-atomistic ECEPP/3 potential, (6) Near-native structure selection using a traveling salesman problem-based clustering approach, ICON, and (7) Improved bound generation using chemical shifts of subsets of heavy atoms, generated by SPARTA and CS23D. Computational results of ASTRO-FOLD 2.0 on 47 blind targets of the recently concluded CASP9 experiment are presented.
Affiliation(s)
- A Subramani
- Dept. of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544
13
Subramani A, Floudas CA. β-sheet topology prediction with high precision and recall for β and mixed α/β proteins. PLoS One 2012; 7:e32461. [PMID: 22427840 PMCID: PMC3302896 DOI: 10.1371/journal.pone.0032461] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 11/16/2011] [Accepted: 01/26/2012] [Indexed: 11/19/2022] Open
Abstract
The prediction of the correct β-sheet topology for pure β and mixed α/β proteins is a critical intermediate step toward three-dimensional protein structure prediction. The predicted β-sheet topology provides distance constraints between sequentially separated residues, which reduces the three-dimensional search space for a protein structure prediction algorithm. Here, we present a novel mixed-integer linear optimization based framework for the prediction of β-sheet topology in β and mixed α/β proteins. The objective is to maximize the total strand-to-strand contact potential of the protein. A large number of physical constraints are applied to provide biologically meaningful topology results. The formulation permits the creation of a rank-ordered list of preferred β-sheet arrangements. Finally, the generated topologies are re-ranked using a fully atomistic approach involving torsion angle dynamics and clustering. For a large, non-redundant data set of 2102 β and mixed α/β proteins with at least 3 strands taken from the PDB, the proposed approach provides the top 5 solutions with average precision and recall greater than 78%. Consistent results are obtained in the β-sheet topology prediction for blind targets provided during the CASP8 and CASP9 experiments, as well as for actual and predicted secondary structures. The β-sheet topology prediction algorithm, BeST, is available to the scientific community at http://selene.princeton.edu/BeST/.
Affiliation(s)
- Christodoulos A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey, United States of America
14
Abstract
The quantitative underpinning of the information content of biosequences represents an elusive goal and yet also an obvious prerequisite to the quantitative modeling and study of biological function and evolution. Several past studies have addressed the question of what distinguishes biosequences from random strings, the latter being clearly unpalatable to the living cell. Such studies typically analyze the organization of biosequences in terms of their constituent characters or substrings and have, in particular, consistently exposed a tenacious lack of compressibility on behalf of biosequences. This article attempts, perhaps for the first time, an assessment of the structure and randomness of polypeptides in terms of newly introduced parameters that relate to the vocabulary of their (suitably constrained) subsequences rather than their substrings. It is shown that such parameters grasp structural/functional information, and are related to each other under a specific set of rules that span biochemically diverse polypeptides. Measures on subsequences separate few amino acid strings from their random permutations, but show that the random permutations of most polypeptides amass along specific linear loci.
Affiliation(s)
- Alberto Apostolico
- College of Computing, Georgia Institute of Technology, Atlanta, GA 30318, USA.
15
Data Compression Concepts and Algorithms and their Applications to Bioinformatics. Entropy 2009; 12:34. [PMID: 20157640 DOI: 10.3390/e12010034] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 01/24/2023]
Abstract
Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms and study viral populations. Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in biological sequences.
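As a small illustration of the entropy notion mentioned in this abstract, the following sketch computes the zeroth-order Shannon entropy of a sequence from its symbol frequencies. It is an assumption-level example rather than a method taken from the review, and the function name `shannon_entropy` is hypothetical.

```python
from collections import Counter
import math


def shannon_entropy(seq):
    """Zeroth-order Shannon entropy of a sequence, in bits per symbol.

    Symbols are treated as independent draws from the empirical
    distribution; context between neighbouring symbols is ignored.
    """
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A uniform four-letter DNA string such as `"ACGT"` reaches the 2 bits per symbol maximum for that alphabet, while a homopolymer such as `"AAAA"` gives 0. Values below the alphabet maximum indicate compositional bias that a compressor could in principle exploit.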
16
May P, Kreuchwig A, Steinke T, Koch I. PTGL: a database for secondary structure-based protein topologies. Nucleic Acids Res 2009; 38:D326-30. [PMID: 19906706 PMCID: PMC2808981 DOI: 10.1093/nar/gkp980] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022] Open
Abstract
With the growing amount of experimental data, the number of known protein structures also increases continuously. Classification of protein structures helps to understand relationships between protein structure and function. The main classification methods based on secondary structures are SCOP, CATH and TOPS, which all classify under different aspects, and therefore can lead to different results. We developed a mathematically unique representation of protein structure topologies at a higher abstraction level, providing new aspects of classification and enabling a fast search through the data. Protein Topology Graph Library (PTGL; http://ptgl.zib.de) aims at providing a database on protein secondary structure topologies, including search facilities, visualization as intuitive topology diagrams as well as in the 3D structure, and additional information. Secondary structure-based protein topologies are represented uniquely as undirected labeled graphs in four different ways, allowing for exploration under different aspects. The linear notations, and the 2D and 3D diagrams of each notation, facilitate a deeper understanding of protein topologies. Several search functions for topologies and sub-topologies, a BLAST search option, and links to SCOP, CATH and PDBsum support individual and large-scale investigation of protein structures. Currently, PTGL comprises topologies of 54 859 protein structures. Main structural patterns for common structural motifs like TIM-barrel or Jelly Roll are pre-implemented, and can easily be searched.
Affiliation(s)
- Patrick May
- Max Planck Institute for Molecular Plant Physiology, Bioinformatics, Am Muehlenberg 1, 14476 Potsdam-Golm, Germany.
17
Martin CH, Nielsen DR, Solomon KV, Prather KLJ. Synthetic metabolism: engineering biology at the protein and pathway scales. Chem Biol 2009; 16:277-86. [PMID: 19318209 DOI: 10.1016/j.chembiol.2009.01.010] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Received: 11/24/2008] [Revised: 01/21/2009] [Accepted: 01/22/2009] [Indexed: 11/25/2022]
Abstract
Biocatalysis has become a powerful tool for the synthesis of high-value compounds, particularly so in the case of highly functionalized and/or stereoactive products. Nature has supplied thousands of enzymes and assembled them into numerous metabolic pathways. Although these native pathways can be used to produce natural bioproducts, there are many valuable and useful compounds that have no known natural biochemical route. Consequently, there is a need for both unnatural metabolic pathways and novel enzymatic activities upon which these pathways can be built. Here, we review the theoretical and experimental strategies for engineering synthetic metabolic pathways at the protein and pathway scales, and highlight the challenges that this subfield of synthetic biology currently faces.
Affiliation(s)
- Collin H Martin
- Department of Chemical Engineering, Synthetic Biology Engineering Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
18
Kifer I, Nussinov R, Wolfson HJ. Constructing templates for protein structure prediction by simulation of protein folding pathways. Proteins 2009; 73:380-94. [PMID: 18433063 DOI: 10.1002/prot.22073] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Abstract
How a one-dimensional protein sequence folds into a specific 3D structure remains a difficult challenge in structural biology. Many computational methods have been developed in an attempt to predict the tertiary structure of the protein; most of these employ approaches that are based on the accumulated knowledge of solved protein structures. Here we introduce a novel and fully automated approach for predicting the 3D structure of a protein that is based on the well accepted notion that protein folding is a hierarchical process. Our algorithm follows the hierarchical model by employing two stages: the first aims to find a match between the sequences of short independently-folding structural entities and parts of the target sequence and assigns the respective structures. The second assembles these local structural parts into a complete 3D structure, allowing for long-range interactions between them. We present the results of applying our method to a subset of the targets from CASP6 and CASP7. Our results indicate that for targets with a significant sequence similarity to known structures we are often able to provide predictions that are better than those achieved by two leading servers, and that the most significant improvements in comparison with these methods occur in regions of a gapped structural alignment between the native structure and the closest available structural template. We conclude that in addition to performing well for targets with known homologous structures, our method shows great promise for addressing the more general category of comparative modeling targets, which is our next goal.
Affiliation(s)
- Ilona Kifer
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
19
Jeong J, Berman P, Przytycka TM. Improving strand pairing prediction through exploring folding cooperativity. IEEE/ACM Trans Comput Biol Bioinform 2008; 5:484-491. [PMID: 18989036 PMCID: PMC2597093 DOI: 10.1109/tcbb.2008.88] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/27/2023]
Abstract
The topology of beta-sheets is defined by the pattern of hydrogen-bonded strand pairing. Therefore, predicting hydrogen-bonded strand partners is a fundamental step towards predicting beta-sheet topology. At the same time, finding the correct partners is very difficult due to the long-range interactions involved in strand pairing. Additionally, the patterns of amino acids involved in beta-sheet formation are very general and therefore difficult to use for computational recognition of specific contacts between strands. In this work, we report a new strand pairing algorithm. To address the above-mentioned difficulties, our algorithm attempts to mimic elements of the folding process. Namely, in addition to ensuring that the predicted hydrogen-bonded strand pairs satisfy basic global consistency constraints, it takes into account hypothetical folding pathways. Consistent with this view, introducing hydrogen bonds between a pair of strands changes the probabilities of forming hydrogen bonds between other pairs of strands. We demonstrate that this approach provides an improvement over previously proposed algorithms. We also compare the performance of this method to that of a global optimization algorithm that poses the problem as an integer linear programming optimization problem and solves it using the ILOG CPLEX package.
Affiliation(s)
- Jieun Jeong
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA
20
Goldstein RA. The structure of protein evolution and the evolution of protein structure. Curr Opin Struct Biol 2008; 18:170-7. [DOI: 10.1016/j.sbi.2008.01.006] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.5] [Received: 10/25/2007] [Revised: 12/20/2007] [Accepted: 01/09/2008] [Indexed: 11/29/2022]
|
21
|
Gimona M. Protein Linguistics and the Modular Code of the Cytoskeleton. Biosemiotics 2008:189-206. [DOI: 10.1007/978-1-4020-6340-4_8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]
|
22
|
Abstract
An important puzzle in structural biology is the question of how proteins are able to fold so quickly into their unique native structures. There is much evidence that protein folding is hierarchic. In that case, folding routes are not linear, but have a tree structure. Trees are commonly used to represent the grammatical structure of natural language sentences, and chart parsing algorithms efficiently search the space of all possible trees for a given input string. Here we show that one such method, the CKY algorithm, can be useful both for providing novel insight into the physical protein folding process, and for computational protein structure prediction. As proof of concept, we apply this algorithm to the HP lattice model of proteins. Our algorithm identifies all direct folding route trees to the native state and allows us to construct a simple model of the folding process. Despite its simplicity, our model provides an account for the fact that folding rates depend only on the topology of the native state but not on sequence composition.
Affiliation(s)
- Julia Hockenmaier
- Institute for Research in Cognitive Science and Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104-6228, USA.
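The chart-parsing idea named in the abstract above can be illustrated with a minimal CKY recognizer. This is only a sketch under stated assumptions: the grammar (`LEXICAL`, `BINARY`), the nonterminal names and the function `cky_recognize` are hypothetical and carry no physical meaning; they are not the authors' HP lattice grammar or their implementation. The sketch shows only the dynamic-programming chart that CKY fills over all contiguous spans of an input string.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form over the HP alphabet.
# It is an illustration only and has no physical interpretation.
LEXICAL = {"H": {"X"}, "P": {"Y"}}             # terminal -> nonterminals
BINARY = {("X", "Y"): {"S"}, ("S", "S"): {"S"}}  # (B, C) -> {A : A -> B C}


def cky_recognize(tokens, lexical, binary, start="S"):
    """Return True if `tokens` can be derived from `start` under the grammar."""
    n = len(tokens)
    if n == 0:
        return False
    # chart[i][j] holds every nonterminal that can span tokens[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        chart[i][i + 1] = set(lexical.get(tok, ()))
    # Fill longer spans from shorter ones, trying every split point k.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary.get((b, c), set())
    return start in chart[0][n]


print(cky_recognize(list("HPHP"), LEXICAL, BINARY))  # True
print(cky_recognize(list("HHP"), LEXICAL, BINARY))   # False
```

A recognizer only reports whether some tree exists. Enumerating the trees themselves, as the abstract describes for folding routes, would additionally require storing backpointers for each chart entry.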
|
23
|
Viksna J, Gilbert D. Assessment of the probabilities for evolutionary structural changes in protein folds. Bioinformatics 2007; 23:832-41. [PMID: 17282999 DOI: 10.1093/bioinformatics/btm022] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION: The evolution of protein sequences can be described by a stepwise process, where each step involves changes of a few amino acids. In a similar manner, the evolution of protein folds can be at least partially described by an analogous process, where each step involves comparatively simple changes affecting few secondary structure elements. A number of such evolution steps, justified by biologically confirmed examples, have previously been proposed by other researchers. However, unlike the situation with sequences, as far as we know there have been no attempts to estimate the comparative probabilities for different kinds of such structural changes. RESULTS: We have tried to assess the comparative probabilities for a number of known structural changes, and to relate the probabilities of such changes with the distance between protein sequences. We have formalized these structural changes using a topological representation of structures (TOPS), and have developed an algorithm for measuring structural distances that involve few evolutionary steps. The probabilities of structural changes then were estimated on the basis of all-against-all comparisons of the sequence and structure of protein domains from the CATH-95 representative set. The results obtained are reasonably consistent for a number of different data subsets and permit the identification of several 'most popular' types of evolutionary changes in protein structure. The results also suggest that alterations in protein structure are more likely to occur when the sequence similarity is >10% (the average similarity being approximately 6% for the data sets employed in this study), and that the distribution of probabilities of structural changes is fairly uniform within the interval of 15-50% sequence similarity. AVAILABILITY: The algorithms have been implemented on the Windows operating system in C++ and using the Borland Visual Component Library. The source code is available on request from the first author. The data sets used for this study (representative sets of protein domains, matrices of sequence similarities and structural distances) are available on http://bioinf.mii.lu.lv/epsrc_project/struct_ev.html.
Affiliation(s)
- Juris Viksna
- Institute of Mathematics and Computer Science, University of Latvia, Rainis boulevard 29, Riga LV-1459, Latvia.
|
24
|
Brinda K, Surolia A, Vishveshwara S. Insights into the quaternary association of proteins through structure graphs: a case study of lectins. Biochem J 2006; 391:1-15. [PMID: 16173917 PMCID: PMC1237133 DOI: 10.1042/bj20050434] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.5] [Indexed: 11/17/2022]
Abstract
The unique three-dimensional structure of both monomeric and oligomeric proteins is encoded in their sequence. The biological functions of proteins are dependent on their tertiary and quaternary structures, and hence it is important to understand the determinants of quaternary association in proteins. Although a large number of investigations have been carried out in this direction, the underlying principles of protein oligomerization are yet to be completely understood. Recently, new insights into this problem have been gained from the analysis of structure graphs of proteins belonging to the legume lectin family. The legume lectins are an interesting family of proteins with very similar tertiary structures but varied quaternary structures. Hence they have become a very good model with which to analyse the role of primary structures in determining the modes of quaternary association. The present review summarizes the results of a legume lectin study as well as those obtained from a similar analysis carried out here on the animal lectins, namely galectins, pentraxins, calnexin, calreticulin and rhesus rotavirus Vp4 sialic-acid-binding domain. The lectin structure graphs have been used to obtain clusters of non-covalently interacting amino acid residues at the intersubunit interfaces. The present study, performed along with traditional sequence alignment methods, has provided the signature sequence motifs for different kinds of quaternary association seen in lectins. Furthermore, the network representation of the lectin oligomers has enabled us to detect the residues which make extensive interactions ('hubs') across the oligomeric interfaces that can be targeted for interface-destabilizing mutations. The present review also provides an overview of the methodology involved in representing oligomeric protein structures as connected networks of amino acid residues. Further, it illustrates the potential of such a representation in elucidating the structural determinants of protein-protein association in general and will be of significance to protein chemists and structural biologists.
Affiliation(s)
- K. V. Brinda
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India 560012
- Avadhesha Surolia
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India 560012
- Correspondence can be addressed to either of these authors.
- Saraswathi Vishveshwara
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India 560012
- Correspondence can be addressed to either of these authors.
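The notion of interface 'hubs' in the abstract above can be sketched as a simple degree count on a residue contact graph. This is a minimal illustration under stated assumptions: the function `interface_hubs`, the `min_partners` threshold and the example contact list are hypothetical, and the review's actual hub criterion (which may weigh interaction strength rather than raw partner counts) is not reproduced here.

```python
from collections import defaultdict

# Hypothetical non-covalent contacts between residues, each given as
# (chain id, residue number). Invented data for illustration only.
CONTACTS = [
    (("A", 10), ("B", 5)),
    (("A", 10), ("B", 6)),
    (("A", 10), ("B", 7)),
    (("A", 11), ("A", 12)),  # same chain: not an interface contact
    (("A", 12), ("B", 7)),
]


def interface_hubs(contacts, min_partners=3):
    """Return residues with at least `min_partners` contacts across chains.

    `min_partners` is an assumed cutoff, not a value taken from the source.
    """
    partners = defaultdict(set)
    for a, b in contacts:
        if a[0] != b[0]:  # keep only contacts that cross a subunit interface
            partners[a].add(b)
            partners[b].add(a)
    return sorted(r for r, p in partners.items() if len(p) >= min_partners)


print(interface_hubs(CONTACTS))  # [('A', 10)]
```

Lowering `min_partners` to 2 in this toy example would also report `('B', 7)`, which shows how sensitive a purely degree-based hub list is to the chosen cutoff.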
|
25
|
Altun G, Zhong W, Pan Y, Tai PC, Harrison RW. A new seed selection algorithm that maximizes local structural similarity in proteins. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2006; 2006:5822-5825. [PMID: 17946336 DOI: 10.1109/iembs.2006.259338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
The PHI-BLAST algorithm for protein sequence alignment takes a query sequence and searches a protein database for a small seed or region of high similarity and extends this alignment to produce the total alignment for sequences. Clearly, the success of this approach depends on the quality of the seeds. We propose an algorithm that maximizes the likelihood of seeds sharing the same local structure in both the query and known sequences. This was tested on the 2290 protein sequences in the PISCES database. Our new algorithm results in an effective a priori estimate of seed structural quality.
|
26
|
Abstract
The correspondence between biology and linguistics at the level of sequence and lexical inventories, and of structure and syntax, has fuelled attempts to describe genome structure by the rules of formal linguistics. But how can we define protein linguistic rules? And how could compositional semantics improve our understanding of protein organization and functional plasticity?
Affiliation(s)
- Mario Gimona
- Consorzio Mario Negri Sud, Marie Curie Unit of Actin Cytoskeleton Regulation, Department of Cell Biology and Oncology, Via Nazionale 8A, 66030 Santa Maria Imbaro, Italy.
|
27
|
Rosselló F, Valiente G. Graph Transformation in Molecular Biology. Formal Methods in Software and Systems Modeling 2005. [DOI: 10.1007/978-3-540-31847-7_7] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 02/05/2023]
|
28
|
Abstract
Linguistic metaphors have been woven into the fabric of molecular biology since its inception. The determination of the human genome sequence has brought these metaphors to the forefront of the popular imagination, with the natural extension of the notion of DNA as language to that of the genome as the 'book of life'. But do these analogies go deeper and, if so, can the methods developed for analysing languages be applied to molecular biology? In fact, many techniques used in bioinformatics, even if developed independently, may be seen to be grounded in linguistics. Further interweaving of these fields will be instrumental in extending our understanding of the language of life.
Affiliation(s)
- David B Searls
- Bioinformatics Division, Genetics Research, GlaxoSmithKline Pharmaceuticals, King of Prussia, PA 19406, USA.
|
29
|
Abstract
A conceptual framework for understanding the protein folding problem has remained elusive in spite of many significant advances. We show that geometrical constraints imposed by chain connectivity, compactness, and the avoidance of steric clashes can be encompassed in a natural way using a three-body potential and lead to a selection in structure space, independent of chemical details. Strikingly, secondary motifs such as hairpins, sheets, and helices, which are the building blocks of protein folds, emerge as the chosen structures for segments of the protein backbone based just on elementary geometrical considerations.
Affiliation(s)
- Jayanth R Banavar
- Department of Physics, 104 Davey Laboratory, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.
|