1
Sánchez IE, Galpern EA, Garibaldi MM, Ferreiro DU. Molecular Information Theory Meets Protein Folding. J Phys Chem B 2022; 126:8655-8668. [PMID: 36282961 DOI: 10.1021/acs.jpcb.2c04532] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/11/2023]
Abstract
We propose an application of molecular information theory to analyze the folding of single domain proteins. We analyze results from various areas of protein science, such as sequence-based potentials, reduced amino acid alphabets, backbone configurational entropy, secondary structure content, residue burial layers, and mutational studies of protein stability changes. We found that the average information contained in the sequences of evolved proteins is very close to the average information needed to specify a fold, ∼2.2 ± 0.3 bits/(site·operation). The effective alphabet size in evolved proteins equals the effective number of conformations of a residue in the compact unfolded state, at around 5. We calculated an energy-to-information conversion efficiency upon folding of around 50%, lower than the theoretical limit of 70%, but much higher than human-built macroscopic machines. We propose a simple mapping between molecular information theory and energy landscape theory and explore the connections between sequence evolution, configurational entropy, and the energetics of protein folding.
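As a rough illustration of the quantities named in this abstract, the sketch below computes a Shannon entropy in bits and the corresponding "effective alphabet size" under one common convention (two raised to the entropy). The function names are hypothetical, and the abstract does not state that the authors use exactly this definition; for orientation only, a uniform choice among 5 states corresponds to log2(5) ≈ 2.32 bits.

```python
import math


def shannon_entropy_bits(weights):
    """Shannon entropy (in bits) of a distribution given as non-negative weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


def effective_alphabet_size(weights):
    """Effective number of states, assumed here to be 2**H (an illustrative convention)."""
    return 2.0 ** shannon_entropy_bits(weights)


if __name__ == "__main__":
    uniform_five = [1, 1, 1, 1, 1]             # five equally likely states
    skewed_twenty = [0.9] + [0.1 / 19] * 19    # 20 states, one dominant
    print(shannon_entropy_bits(uniform_five))
    print(effective_alphabet_size(skewed_twenty))
```

A strongly skewed distribution over 20 letters yields an effective size well below 20, which is the sense in which a 20-letter sequence alphabet can behave like a much smaller one.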
Affiliation(s)
- Ignacio E Sánchez
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Ezequiel A Galpern
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Martín M Garibaldi
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Diego U Ferreiro
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
2
Konjevoda P, Štambuk N. Relational model of the standard genetic code. Biosystems 2021; 210:104529. [PMID: 34464669 DOI: 10.1016/j.biosystems.2021.104529] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/03/2021] [Revised: 08/26/2021] [Accepted: 08/27/2021] [Indexed: 11/28/2022]
Abstract
The genetic code is a set of rules that establishes mapping between triplets in messenger RNA and amino acids in proteins. The most common way to display these rules is the Standard Genetic Code (SGC) table. This paper takes an alternative approach, based on the relational data model by Edgar F. Codd (Commun. ACM, 13:377-387, 1970). The relational model (RM) proposes a distributed storage of data into a collection of tables (called relations), that can be connected by shared communality. Basic elements of the table are rows (called records or tuples), and columns (called fields or attributes). The SGC table, according to the relational data model, represents the so called unnormalized form of a table. Using normalization rules it is possible to subdivide the SGC table into four tables. The rows and columns of single tables are defined by the first and second base and individual tables by the third codon base. The result of this model is an approach to managing genetic code data, represented in terms of tuples and grouped into relations, with table structure and language consistent with first-order (predicate) logic. The RM explains that the final step in the development of the SGC was the adoption of coding function by the third base, which makes an informational/functional unit with the first base, despite the different physical location in a triplet. This enabled the synthesis of specific proteins without ambiguity, in accordance with the concept of ambiguity reduction and five phases of the general model on the origin of biological codes by Marcello Barbieri (BioSystems 181:11-19, 2019).
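The splitting described in this abstract can be pictured with a small sketch: the 64-codon table is divided into four relations, one per third base, each keyed by the (first base, second base) pair. This is only an illustration of that decomposition, not the authors' normalization procedure; the names `CODON_TABLE`, `split_by_third_base`, and `rejoin` are hypothetical, and the one-letter amino acid string follows the usual UCAG ordering of the standard code, with `*` for stop codons.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# The single "unnormalized" table: codon -> amino acid (or '*' for stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def split_by_third_base(table):
    """Return four relations, one per third base, keyed by (first, second) base."""
    relations = {b3: {} for b3 in BASES}
    for codon, aa in table.items():
        b1, b2, b3 = codon
        relations[b3][(b1, b2)] = aa
    return relations


def rejoin(relations):
    """Rebuild the full codon table from the four per-third-base relations."""
    return {
        b1 + b2 + b3: aa
        for b3, rel in relations.items()
        for (b1, b2), aa in rel.items()
    }


RELATIONS = split_by_third_base(CODON_TABLE)

if __name__ == "__main__":
    for b3 in BASES:
        print(b3, len(RELATIONS[b3]), "rows")
```

Because the (first, second, third) triple is a key of the full table, rejoining the four relations recovers the original table without loss.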
Affiliation(s)
- Paško Konjevoda
- Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
- Nikola Štambuk
- Center for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000 Zagreb, Croatia.
3
Plasma methionine metabolic profile is associated with longevity in mammals. Commun Biol 2021; 4:725. [PMID: 34117367 PMCID: PMC8196171 DOI: 10.1038/s42003-021-02254-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/14/2020] [Accepted: 05/20/2021] [Indexed: 01/28/2023] Open
Abstract
Methionine metabolism arises as a key target to elucidate the molecular adaptations underlying animal longevity due to the negative association between longevity and methionine content. The present study follows a comparative approach to analyse plasma methionine metabolic profile using a LC-MS/MS platform from 11 mammalian species with a longevity ranging from 3.5 to 120 years. Our findings demonstrate the existence of a species-specific plasma profile for methionine metabolism associated with longevity characterised by: i) reduced methionine, cystathionine and choline; ii) increased non-polar amino acids; iii) reduced succinate and malate; and iv) increased carnitine. Our results support the existence of plasma longevity features that might respond to an optimised energetic metabolism and intracellular structures found in long-lived species. Mota-Martorell and colleagues use a comparative metabolomics approach to examine plasma metabolite levels associated with methionine metabolism in 11 mammalian species. They identify species specific plasma profiles indicative of a link between lifetime longevity and methionine metabolism.
4
Hilburg SL, Ruan Z, Xu T, Alexander-Katz A. Behavior of Protein-Inspired Synthetic Random Heteropolymers. Macromolecules 2020. [DOI: 10.1021/acs.macromol.0c01886] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022]
Affiliation(s)
- Shayna L. Hilburg
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Zhiyuan Ruan
- Department of Materials Science & Engineering, University of California Berkeley, Berkeley, California 94720, United States
- Ting Xu
- Department of Materials Science & Engineering, University of California Berkeley, Berkeley, California 94720, United States
- Department of Chemistry, University of California Berkeley, Berkeley, California 94720, United States
- Tsinghua−Berkeley Shenzhen Institute, University of California Berkeley, Berkeley, California 94720, United States
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Alfredo Alexander-Katz
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
5
Kimura M, Akanuma S. Reconstruction and Characterization of Thermally Stable and Catalytically Active Proteins Comprising an Alphabet of ~ 13 Amino Acids. J Mol Evol 2020; 88:372-381. [PMID: 32201904 DOI: 10.1007/s00239-020-09938-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 12/08/2019] [Accepted: 03/11/2020] [Indexed: 10/24/2022]
Abstract
While extant organisms synthesize proteins using approximately 20 kinds of genetically coded amino acids, the earliest protein synthesis system is likely to have been much simpler, utilizing a reduced set of amino acids. However, which types of building blocks were involved in primordial protein synthesis remains unclear. Herein, we reconstructed three convergent sequences of an ancestral nucleoside diphosphate kinase, each comprising a 10 amino acid "alphabet," and found that two of these variants folded into soluble and stable tertiary structures. Therefore, an alphabet consisting of 10 amino acids contains sufficient information for creating stable proteins. Furthermore, re-incorporation of a few more amino acid types into the active site of the 10 amino acid variants improved the catalytic activity, although the specific activity was not as high as that of extant proteins. Collectively, our results provide experimental support for the idea that robust protein scaffolds can be built with a subset of the current 20 amino acids that might have existed abundantly in the prebiotic environment, while the other amino acids, especially those with functional sidechains, evolved to contribute to efficient enzyme catalysis.
Affiliation(s)
- Madoka Kimura
- Faculty of Human Sciences, Waseda University, 2-579-15 Mikajima, Tokorozawa, Saitama, 359-1192, Japan
- Satoshi Akanuma
- Faculty of Human Sciences, Waseda University, 2-579-15 Mikajima, Tokorozawa, Saitama, 359-1192, Japan.
6
Newton MS, Morrone DJ, Lee KH, Seelig B. Genetic Code Evolution Investigated through the Synthesis and Characterisation of Proteins from Reduced-Alphabet Libraries. Chembiochem 2019; 20:846-856. [PMID: 30511381 DOI: 10.1002/cbic.201800668] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/01/2018] [Indexed: 11/08/2022]
Abstract
The universal genetic code of 20 amino acids is the product of evolution. It is believed that earlier versions of the code had fewer residues. Many theories for the order in which amino acids were integrated into the code have been proposed, considering factors ranging from prebiotic chemistry to codon capture. Several meta-analyses combined these theories to yield a feasible consensus chronology of the genetic code's evolution, but there is a dearth of experimental data to test the hypothesised order. We used combinatorial chemistry to synthesise libraries of random polypeptides that were based on different subsets of the 20 standard amino acids, thus representing different stages of a plausible history of the alphabet. Four libraries were comprised of the five, nine, and 16 most ancient amino acids, and all 20 extant residues for a direct side-by-side comparison. We characterised numerous variants from each library for their solubility and propensity to form secondary, tertiary or quaternary structures. Proteins from the two most ancient libraries were more likely to be soluble than those from the extant library. Several individual protein variants exhibited inducible protein folding and other traits typical of intrinsically disordered proteins. From these libraries, we can infer how primordial protein structure and function might have evolved with the genetic code.
Affiliation(s)
- Matilda S Newton
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA; BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, 140 Gortner Laboratory, St. Paul, MN, 55108-6106, USA
- Dana J Morrone
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA; BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, 140 Gortner Laboratory, St. Paul, MN, 55108-6106, USA
- Kun-Hwa Lee
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA; BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, 140 Gortner Laboratory, St. Paul, MN, 55108-6106, USA
- Burckhard Seelig
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA; BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, 140 Gortner Laboratory, St. Paul, MN, 55108-6106, USA
7
Vitas M, Dobovišek A. In the Beginning was a Mutualism - On the Origin of Translation. ORIGINS LIFE EVOL B 2018; 48:223-243. [PMID: 29713988 DOI: 10.1007/s11084-018-9557-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 12/10/2017] [Accepted: 04/23/2018] [Indexed: 12/28/2022]
Abstract
The origin of translation is critical for understanding the evolution of life, including the origins of life. The canonical genetic code is one of the most dominant aspects of life on this planet, while the origin of heredity is one of the key evolutionary transitions in living world. Why the translation apparatus evolved is one of the enduring mysteries of molecular biology. Assuming the hypothesis, that during the emergence of life evolution had to first involve autocatalytic systems which only subsequently acquired the capacity of genetic heredity, we propose and discuss possible mechanisms, basic aspects of the emergence and subsequent molecular evolution of translation and ribosomes, as well as enzymes as we know them today. It is possible, in this sense, to view the ribosome as a digital-to-analogue information converter. The proposed mechanism is based on the abilities and tendencies of short RNA and polypeptides to fold and to catalyse biochemical reactions. The proposed mechanism is in concordance with the hypothesis of a possible chemical co-evolution of RNA and proteins in the origin of the genetic code or even more generally at the early evolution of life on Earth. The possible abundance and availability of monomers at prebiotic conditions are considered in the mechanism. The hypothesis that early polypeptides were folding on the RNA scaffold is also considered and mutualism in molecular evolutionary development of RNA and peptides is favoured.
Affiliation(s)
- Marko Vitas
- Laze pri Borovnici 38, Borovnica, Slovenia.
- Andrej Dobovišek
- Faculty of Natural Sciences and Mathematics, University of Maribor, Koroška cesta 160, 2000, Maribor, Slovenia
8
Granold M, Hajieva P, Toşa MI, Irimie FD, Moosmann B. Modern diversification of the amino acid repertoire driven by oxygen. Proc Natl Acad Sci U S A 2018; 115:41-46. [PMID: 29259120 PMCID: PMC5776824 DOI: 10.1073/pnas.1717100115] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Indexed: 12/22/2022] Open
Abstract
All extant life employs the same 20 amino acids for protein biosynthesis. Studies on the number of amino acids necessary to produce a foldable and catalytically active polypeptide have shown that a basis set of 7-13 amino acids is sufficient to build major structural elements of modern proteins. Hence, the reasons for the evolutionary selection of the current 20 amino acids out of a much larger available pool have remained elusive. Here, we have analyzed the quantum chemistry of all proteinogenic and various prebiotic amino acids. We find that the energetic HOMO-LUMO gap, a correlate of chemical reactivity, becomes incrementally closer in modern amino acids, reaching the level of specialized redox cofactors in the late amino acids tryptophan and selenocysteine. We show that the arising prediction of a higher reactivity of the more recently added amino acids is correct as regards various free radicals, particularly oxygen-derived peroxyl radicals. Moreover, we demonstrate an immediate survival benefit conferred by the enhanced redox reactivity of the modern amino acids tyrosine and tryptophan in oxidatively stressed cells. Our data indicate that in demanding building blocks with more versatile redox chemistry, biospheric molecular oxygen triggered the selective fixation of the last amino acids in the genetic code. Thus, functional rather than structural amino acid properties were decisive during the finalization of the universal genetic code.
Affiliation(s)
- Matthias Granold
- Evolutionary Biochemistry and Redox Medicine, Institute for Pathobiochemistry, University Medical Center of the Johannes Gutenberg University, 55128 Mainz, Germany
- Parvana Hajieva
- Cellular Adaptation Group, Institute for Pathobiochemistry, University Medical Center of the Johannes Gutenberg University, 55128 Mainz, Germany
- Monica Ioana Toşa
- Group of Biocatalysis and Biotransformations, Faculty of Chemistry and Chemical Engineering, Babeş-Bolyai University, Cluj-Napoca 400028, Romania
- Florin-Dan Irimie
- Group of Biocatalysis and Biotransformations, Faculty of Chemistry and Chemical Engineering, Babeş-Bolyai University, Cluj-Napoca 400028, Romania
- Bernd Moosmann
- Evolutionary Biochemistry and Redox Medicine, Institute for Pathobiochemistry, University Medical Center of the Johannes Gutenberg University, 55128 Mainz, Germany
9
Berezovsky IN, Guarnera E, Zheng Z. Basic units of protein structure, folding, and function. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2016; 128:85-99. [PMID: 27697476 DOI: 10.1016/j.pbiomolbio.2016.09.009] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 07/29/2016] [Revised: 09/05/2016] [Accepted: 09/26/2016] [Indexed: 10/20/2022]
Abstract
Study of the hierarchy of domain structure with alternative sets of domains and analysis of discontinuous domains, consisting of remote segments of the polypeptide chain, raised a question about the minimal structural unit of the protein domain. The hypothesis on the decisive role of the polypeptide backbone in determining the elementary units of globular proteins has led to the discovery of closed loops. It is reviewed here how closed loops form the loop-n-lock structure of proteins, providing the foundation for stability and designability of protein folds/domains and underlying their co-translational folding. Simplified protein sequences are considered here with the aim to explore the basic principles that presumably dominated the folding and stability of proteins in the early stages of structural evolution. Elementary functional loops (EFLs), closed loops with one or few catalytic residues, are, in turn, units of the protein function. They are apparent descendants of the prebiotic ring-like peptides, which gave rise to the first functional folds/domains being fused in the beginning of the evolution of protein structure. It is also shown how evolutionary relations between protein functional superfamilies and folds delineated with the help of EFLs can contribute to establishing the rules for design of desired enzymatic functions. Generalized descriptors of the elementary functions are proposed to be used as basic units in the future computational design.
Affiliation(s)
- Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore; Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore.
- Enrico Guarnera
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Zejun Zheng
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
10
Solis AD. Amino acid alphabet reduction preserves fold information contained in contact interactions in proteins. Proteins 2015; 83:2198-216. [DOI: 10.1002/prot.24936] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 05/06/2015] [Revised: 09/04/2015] [Accepted: 09/04/2015] [Indexed: 12/14/2022]
Affiliation(s)
- Armando D. Solis
- Biological Sciences Department, New York City College of Technology; the City University of New York (CUNY); Brooklyn New York 11201
11
Influent Fractionation for Modeling Continuous Anaerobic Digestion Processes. ADVANCES IN BIOCHEMICAL ENGINEERING/BIOTECHNOLOGY 2015; 151:137-69. [DOI: 10.1007/978-3-319-21993-6_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 02/18/2023]
12
Huang JT, Wang T, Huang SR, Li X. Prediction of protein folding rates from simplified secondary structure alphabet. J Theor Biol 2015; 383:1-6. [PMID: 26247139 DOI: 10.1016/j.jtbi.2015.07.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/30/2014] [Revised: 06/20/2015] [Accepted: 07/23/2015] [Indexed: 10/23/2022]
Abstract
Protein folding is a very complicated and highly cooperative dynamic process. However, the folding kinetics is likely to depend more on a few key structural features. Here we find that secondary structures can determine folding rates of only large, multi-state folding proteins and fail to predict those for small, two-state proteins. The importance of secondary structures for protein folding is ordered as: extended β strand > α helix > bend > turn > undefined secondary structure > 3₁₀ helix > isolated β strand > π helix. Only the first three secondary structures, extended β strand, α helix and bend, can achieve a good correlation with folding rates. This suggests that the rate-limiting step of protein folding would depend upon the formation of regular secondary structures and the buckling of chain. The reduced secondary structure alphabet provides a simplified description for the machine learning applications in protein design.
Affiliation(s)
- Jitao T Huang
- Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin 300071, China.
- Titi Wang
- Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin 300071, China
- Shanran R Huang
- Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin 300071, China
- Xin Li
- Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin 300071, China
13
Huang JT, Wang T, Huang SR, Li X. Reduced alphabet for protein folding prediction. Proteins 2015; 83:631-9. [PMID: 25641420 DOI: 10.1002/prot.24762] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/15/2014] [Revised: 11/07/2014] [Accepted: 12/21/2014] [Indexed: 01/17/2023]
Abstract
What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding protein folding mechanism and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures constitute a 28-letter alphabet to determine folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by the full 28-letter alphabet. Many other reduced alphabets are not significantly correlated to folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design.
Affiliation(s)
- Jitao T Huang
- Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin, 300071, People's Republic of China
14
Ferrada E. The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets. PLoS Comput Biol 2014; 10:e1003946. [PMID: 25473967 PMCID: PMC4256021 DOI: 10.1371/journal.pcbi.1003946] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/07/2014] [Accepted: 09/26/2014] [Indexed: 11/19/2022] Open
Abstract
The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet.
Affiliation(s)
- Evandro Ferrada
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
15
Rouch DA. Evolution of the first genetic cells and the universal genetic code: a hypothesis based on macromolecular coevolution of RNA and proteins. J Theor Biol 2014; 357:220-44. [PMID: 24931677 DOI: 10.1016/j.jtbi.2014.06.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/22/2013] [Revised: 06/02/2014] [Accepted: 06/03/2014] [Indexed: 11/19/2022]
Abstract
A qualitative hypothesis based on coevolution of protein and nucleic acid macromolecules was developed to explain the evolution of the first genetic cells, from the likely organic chemical-rich environment of early earth, through to the Last Universal Common Ancestor (LUCA). The evolution of the first genetic cell was divided into three phases, proto-genetic cells I, II and III, and the transition to each milestone is described, based on development of chemical cross-catalysis, bio-cross-catalysis, and the universal genetic code, respectively. Selection of macromolecular properties of both peptides and nucleic acids, in response to environmental factors, was likely to be a key aspect of early evolution. The development of hereditable nucleic acids with various key functions; translation, transcription and replication, is described. These functions are envisaged to have coevolved with protein enzymes, from simple organic precursors. Genetically heritable nucleotides may have developed after the local earth environment had cooled below 63 °C. Around this temperature G-C bases would have been preferentially utilized for nucleotide synthesis. Under these conditions RNA type nucleotides were then likely selected from a range of different types of nucleotide backbones through template-based synthesis. Initial development of the genetic coding system was simplified by the availability of proto-messenger RNA sequences that contained only G and C bases, and the need to encode only four amino acids. The step-wise addition of further amino acids to the code was predicted to parallel the growing metabolic complexity of the proto-genetic cell. On completion of this evolutionary process the proto-genetic cell is envisaged to have become the LUCA, the last common ancestor of bacteria, eukaryote and archaea domains. 
Key issues addressed by the model include: (a) the transition from non-hereditable random sequences of peptides and nucleic acids to specific proteins coded by hereditable nucleotide sequences, (b) the origin of homochiral amino acids and sugars, and (c) the mutation limits on the sizes of early nucleic acid genomes. The first genome was limited to a size of about 200 base pairs.
Affiliation(s)
- Duncan A Rouch
- Biotechnology and Environmental Biology, RMIT University, PO Box 71, Bundoora, Melbourne, Vic 3083, Australia.
16
Truong HH, Kim BL, Schafer NP, Wolynes PG. Funneling and frustration in the energy landscapes of some designed and simplified proteins. J Chem Phys 2013; 139:121908. [PMID: 24089720 PMCID: PMC3732306 DOI: 10.1063/1.4813504] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 04/15/2013] [Accepted: 06/26/2013] [Indexed: 11/15/2022] Open
Abstract
We explore the similarities and differences between the energy landscapes of proteins that have been selected by nature and those of some proteins designed by humans. Natural proteins have evolved to function as well as fold, and this is a source of energetic frustration. The sequence of Top7, on the other hand, was designed with architecture alone in mind using only native state stability as the optimization criterion. Its topology had not previously been observed in nature. Experimental studies show that the folding kinetics of Top7 is more complex than the kinetics of folding of otherwise comparable naturally occurring proteins. In this paper, we use structure prediction tools, frustration analysis, and free energy profiles to illustrate the folding landscapes of Top7 and two other proteins designed by Takada. We use both perfectly funneled (structure-based) and predictive (transferable) models to gain insight into the role of topological versus energetic frustration in these systems and show how they differ from those found for natural proteins. We also study how robust the folding of these designs would be to the simplification of the sequences using fewer amino acid types. Simplification using a five amino acid type code results in comparable quality of structure prediction to the full sequence in some cases, while the two-letter simplification scheme dramatically reduces the quality of structure prediction.
Affiliation(s)
- Ha H Truong
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
|
17
|
Stephenson JD, Freeland SJ. Unearthing the root of amino acid similarity. J Mol Evol 2013; 77:159-69. [PMID: 23743923 PMCID: PMC6763418 DOI: 10.1007/s00239-013-9565-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 03/22/2013] [Accepted: 05/08/2013] [Indexed: 12/31/2022]
Abstract
Similarities and differences between amino acids define the rates at which they substitute for one another within protein sequences and the patterns by which these sequences form protein structures. However, there exist many ways to measure similarity, whether one considers the molecular attributes of individual amino acids, the roles that they play within proteins, or some nuanced contribution of each. One popular approach to representing these relationships is to divide the 20 amino acids of the standard genetic code into groups, thereby forming a simplified amino acid alphabet. Here, we develop a method to compare or combine different simplified alphabets, and apply it to 34 simplified alphabets from the scientific literature. We use this method to show that while different suggestions vary and agree in non-intuitive ways, they combine to reveal a consensus view of amino acid similarity that is clearly rooted in physico-chemistry.
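The idea of a simplified alphabet described in this abstract can be sketched as a mapping from each of the 20 amino acids to a group symbol. The five-group scheme, the group symbols, and the function names below are assumptions made for illustration; they are not one of the 34 alphabets compared in the paper.

```python
# Illustrative five-group alphabet. The grouping is an assumption made for
# this sketch, not an alphabet taken from the cited paper.
GROUPS = {
    "h": "AVLIMC",  # small or aliphatic hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "c": "DEKR",    # charged
    "g": "GP",      # conformationally special
}


def build_mapping(groups):
    """Invert {symbol: members} into {amino_acid: symbol}, rejecting overlaps."""
    mapping = {}
    for symbol, members in groups.items():
        for aa in members:
            if aa in mapping:
                raise ValueError(f"{aa} assigned to more than one group")
            mapping[aa] = symbol
    return mapping


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(mapping[aa] for aa in seq.upper())


MAPPING = build_mapping(GROUPS)
print(reduce_sequence("MKVLAGD", MAPPING))
```

Any other grouping can be swapped in by editing `GROUPS`, as long as every standard amino acid belongs to exactly one group.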
Affiliation(s)
- James D Stephenson
- NASA Astrobiology Institute, University of Hawaii, Honolulu, HI, 96822, USA
|
18
|
Longo LM, Lee J, Blaber M. Simplified protein design biased for prebiotic amino acids yields a foldable, halophilic protein. Proc Natl Acad Sci U S A 2013; 110:2135-9. [PMID: 23341608 PMCID: PMC3568330 DOI: 10.1073/pnas.1219530110] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]
Abstract
A compendium of different types of abiotic chemical syntheses identifies a consensus set of 10 "prebiotic" α-amino acids. Before the emergence of biosynthetic pathways, this set is the most plausible resource for protein formation (i.e., proteogenesis) within the overall process of abiogenesis. An essential unsolved question regarding this prebiotic set is whether it defines a "foldable set"--that is, does it contain sufficient chemical information to permit cooperatively folding polypeptides? If so, what (if any) characteristic properties might such polypeptides exhibit? To investigate these questions, two "primitive" versions of an extant protein fold (the β-trefoil) were produced by top-down symmetric deconstruction, resulting in a reduced alphabet size of 12 or 13 amino acids and a percentage of prebiotic amino acids approaching 80%. These proteins show a substantial acidification of pI and require high salt concentrations for cooperative folding. The results suggest that the prebiotic amino acids do comprise a foldable set within the halophile environment.
Affiliation(s)
- Liam M. Longo
- Department of Biomedical Sciences, Florida State University, Tallahassee, FL 32306-4300
- Michael Blaber
- Department of Biomedical Sciences, Florida State University, Tallahassee, FL 32306-4300
|
19
|
Longo LM, Blaber M. Protein design at the interface of the pre-biotic and biotic worlds. Arch Biochem Biophys 2012; 526:16-21. [DOI: 10.1016/j.abb.2012.06.009] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 05/31/2012] [Accepted: 06/23/2012] [Indexed: 12/01/2022]
|
20
|
Narasimhan SL, Rajarajan AK, Vardharaj L. HP-sequence design for lattice proteins: An exact enumeration study on diamond as well as square lattice. J Chem Phys 2012; 137:115102. [DOI: 10.1063/1.4752479] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
|
21
|
Liu X, Zhao YP. Substitution matrices of residue triplets derived from protein blocks. J Comput Biol 2011; 17:1679-87. [PMID: 21128854 DOI: 10.1089/cmb.2008.0035] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
In protein sequence alignment, residue similarity is usually evaluated by substitution matrix, which scores all possible exchanges of one amino acid with another. Several matrices are widely used in sequence alignment, including PAM matrices derived from homologous sequence and BLOSUM matrices derived from aligned segments of BLOCKS. However, most matrices have not addressed the high-order residue-residue interactions that are vital to the bio-properties of protein. With consideration for the inherent correlation in residue triplet, we present a new scoring scheme for sequence alignment. Protein sequence is treated as overlapping and successive 3-residue segments. Two edge residues of a triplet are clustered into hydrophobic or polar categories, respectively. Protein sequence is then rewritten into triplet sequence with 2 x 20 x 2 = 80 alphabets. Using a traditional approach, we construct a new scoring scheme named TLESUM(hp) (TripLEt SUbstitution Matrices with hydrophobic and polar information) for pairwise substitution of triplets, which characterizes the similarity of residue triplets. The applications of this matrix led to marked improvements in multiple sequence alignment and in searching structurally alike residue segments. The reason for the occurrence of the "twilight zone," i.e., structure explosion of low identity sequences, is also discussed.
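The triplet encoding described in this abstract (overlapping 3-residue windows, with the two edge residues reduced to a hydrophobic or polar class) can be sketched as follows. Which residues count as hydrophobic is an assumption here, as are all function and variable names; the paper's own classification may differ.

```python
# Assumed hydrophobic set for illustration only; the cited paper may
# classify residues differently.
HYDROPHOBIC = set("AVLIMFWCY")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def edge_class(aa):
    """Collapse an edge residue to 'h' (hydrophobic) or 'p' (polar)."""
    return "h" if aa in HYDROPHOBIC else "p"


def to_triplets(seq):
    """Rewrite a sequence as overlapping (edge class, centre residue, edge class) symbols."""
    seq = seq.upper()
    return [
        (edge_class(seq[i]), seq[i + 1], edge_class(seq[i + 2]))
        for i in range(len(seq) - 2)
    ]


# The full symbol set: 2 edge classes x 20 centre residues x 2 edge classes.
ALPHABET = [(left, mid, right) for left in "hp" for mid in AMINO_ACIDS for right in "hp"]

print(len(ALPHABET))         # size of the triplet alphabet
print(to_triplets("MKVLA"))  # three overlapping triplets
```

A sequence of length N yields N - 2 triplet symbols, so a substitution matrix over this alphabet has 80 x 80 entries.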
Affiliation(s)
- Xin Liu
- State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing, China
|
22
|
Wang J, Cao Z, Yu J. Protein Structures-based Neighborhood Analysis vs Preferential Interactions Between the Special Pairs of Amino acids? J Biomol Struct Dyn 2011; 28:629-32; discussion 669-674. [DOI: 10.1080/073911011010524968] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]
|
23
|
Structural characterization of a beta-turn mimic within a protein-protein interface. Proc Natl Acad Sci U S A 2010; 107:18336-41. [PMID: 20937907 DOI: 10.1073/pnas.1004187107] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 11/18/2022]
Abstract
β-Turns are secondary structure elements not only exposed on protein surfaces, but also frequently found to be buried in protein-protein interfaces. Protein engineering so far considered mainly the backbone-constraining properties of synthetic β-turn mimics as parts of surface-exposed loops. A β-turn mimic, Hot═Tap, that is available in gram amounts, provides two hydroxyl groups that enhance its turn-inducing properties besides being able to form side-chain-like interactions. NMR studies on cyclic hexapeptides harboring the Hot═Tap dipeptide proved its strong β-turn-inducing capability. Crystallographic analyses of the trimeric fibritin-foldon/Hot═Tap hybrid reveal at atomic resolution how Hot═Tap replaces a βI'-turn by a βII'-type structure. Furthermore, Hot═Tap adapts to the complex protein environment by participating in several direct and water-bridged interactions across the foldon trimer interface. As building blocks, β-turn mimics capable of both backbone and side-chain mimicry may simplify the design of synthetic proteins.
|
24
|
Liu X, Zhao YP. A scheme for multiple sequence alignment optimization--an improvement based on family representative mechanics features. J Theor Biol 2009; 261:593-7. [PMID: 19733185 DOI: 10.1016/j.jtbi.2009.08.028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/22/2009] [Revised: 08/26/2009] [Accepted: 08/26/2009] [Indexed: 10/20/2022]
Abstract
As a basic tool of modern biology, sequence alignment can provide us useful information in fold, function, and active site of protein. For many cases, the increased quality of sequence alignment means a better performance. The motivation of present work is to increase ability of the existing scoring scheme/algorithm by considering residue-residue correlations better. Based on a coarse-grained approach, the hydrophobic force between each pair of residues is written out from protein sequence. It results in the construction of an intramolecular hydrophobic force network that describes the whole residue-residue interactions of each protein molecule, and characterizes protein's biological properties in the hydrophobic aspect. A former work has suggested that such network can characterize the top weighted feature regarding hydrophobicity. Moreover, for each homologous protein of a family, the corresponding network shares some common and representative family characters that eventually govern the conservation of biological properties during protein evolution. In present work, we score such family representative characters of a protein by the deviation of its intramolecular hydrophobic force network from that of background. Such score can assist the existing scoring schemes/algorithms, and boost up the ability of multiple sequences alignment, e.g. achieving a prominent increase (approximately 50%) in searching the structurally alike residue segments at a low identity level. As the theoretical basis is different, the present scheme can assist most existing algorithms, and improve their efficiency remarkably.
Affiliation(s)
- Xin Liu
- The State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, No. 15 Beisihuanxi Road, Beijing 100190, China.
|
25
|
Peterson EL, Kondev J, Theriot JA, Phillips R. Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment. Bioinformatics 2009; 25:1356-62. [PMID: 19351620 DOI: 10.1093/bioinformatics/btp164] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 11/15/2022]
Abstract
MOTIVATION Many proteins with vastly dissimilar sequences are found to share a common fold, as evidenced in the wealth of structures now available in the Protein Data Bank. One idea that has found success in various applications is the concept of a reduced amino acid alphabet, wherein similar amino acids are clustered together. Given the structural similarity exhibited by many apparently dissimilar sequences, we undertook this study looking for improvements in fold recognition by comparing protein sequences written in a reduced alphabet. RESULTS We tested over 150 of the amino acid clustering schemes proposed in the literature with all-versus-all pairwise sequence alignments of sequences in the Distance mAtrix aLIgnment database. We combined several metrics from information retrieval popular in the literature: mean precision, area under the Receiver Operating Characteristic curve and recall at a fixed error rate and found that, in contrast to previous work, reduced alphabets in many cases outperform full alphabets. We find that reduced alphabets can perform at a level comparable to full alphabets in correct pairwise alignment of sequences and can show increased sensitivity to pairs of sequences with structural similarity but low-sequence identity. Based on these results, we hypothesize that reduced alphabets may also show performance gains with more sophisticated methods such as profile and pattern searches. AVAILABILITY A table of results as well as the substitution matrices and residue groupings from this study can be downloaded from (http://www.rpgroup.caltech.edu/publications/supplements/alphabets).
Affiliation(s)
- Eric L Peterson
- Department of Physics, California Institute of Technology, Pasadena, CA 91125, USA
|
26
|
Clemente JC, Ikeo K, Valiente G, Gojobori T. Optimized ancestral state reconstruction using Sankoff parsimony. BMC Bioinformatics 2009; 10:51. [PMID: 19200389 PMCID: PMC2677398 DOI: 10.1186/1471-2105-10-51] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 11/27/2008] [Accepted: 02/07/2009] [Indexed: 11/24/2022]
Abstract
BACKGROUND Parsimony methods are widely used in molecular evolution to estimate the most plausible phylogeny for a set of characters. Sankoff parsimony determines the minimum number of changes required in a given phylogeny when a cost is associated to transitions between character states. Although optimizations exist to reduce the computations in the number of taxa, the original algorithm takes time O(n^2) in the number of states, making it impractical for large values of n. RESULTS In this study we introduce an optimization of Sankoff parsimony for the reconstruction of ancestral states when ultrametric or additive cost matrices are used. We analyzed its performance for randomly generated matrices, Jukes-Cantor and Kimura's two-parameter models of DNA evolution, and in the reconstruction of elongation factor-1alpha and ancestral metabolic states of a group of eukaryotes, showing that in all cases the execution time is significantly less than with the original implementation. CONCLUSION The algorithms here presented provide a fast computation of Sankoff parsimony for a given phylogeny. Problems where the number of states is large, such as reconstruction of ancestral metabolism, are particularly adequate for this optimization. Since we are reducing the computations required to calculate the parsimony cost of a single tree, our method can be combined with optimizations in the number of taxa that aim at finding the most parsimonious tree.
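To make the quadratic cost in the number of states concrete, here is a minimal sketch of the basic (unoptimized) Sankoff recursion for a single character on a fixed tree. It is not the optimization introduced in the paper; the nested-tuple tree representation, the unit cost matrix, and the names `sankoff`, `STATES`, and `UNIT_COST` are assumptions for illustration.

```python
import math


def sankoff(tree, states, cost):
    """Return, for each state, the minimal cost of the subtree if its root has that state.

    `tree` is either a leaf (a string naming the observed state) or a tuple
    of subtrees. `cost[s][t]` is the cost of a transition from s to t.
    """
    if isinstance(tree, str):  # leaf: only the observed state is allowed
        return [0 if s == tree else math.inf for s in states]
    totals = [0] * len(states)
    for child in tree:
        child_costs = sankoff(child, states, cost)
        # For every parent state, scan every child state: this double loop
        # is the O(n^2) step in the number of states n.
        for i, s in enumerate(states):
            totals[i] += min(
                cost[s][t] + child_costs[j] for j, t in enumerate(states)
            )
    return totals


STATES = "ACGT"
# Unit cost for any change, zero for no change (an assumed toy cost matrix).
UNIT_COST = {s: {t: 0 if s == t else 1 for t in STATES} for s in STATES}

tree = (("A", "C"), ("A", "A"))
root_costs = sankoff(tree, STATES, UNIT_COST)
print(min(root_costs))  # parsimony cost of the tree for this character
```

With n states, every edge of the tree costs n x n comparisons in this form, which is why large state spaces such as metabolic states motivate a faster scheme.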
Affiliation(s)
- José C Clemente
- Center for Information Biology and DNA Databank of Japan, National Institute of Genetics, Yata 1111, Mishima, Japan
- Kazuho Ikeo
- Center for Information Biology and DNA Databank of Japan, National Institute of Genetics, Yata 1111, Mishima, Japan
- Takashi Gojobori
- Center for Information Biology and DNA Databank of Japan, National Institute of Genetics, Yata 1111, Mishima, Japan
|
27
|
Abstract
Both supervised and unsupervised neural networks have been applied to the prediction of protein structure and function. Here, we focus on feedforward neural networks and describe how these learning machines can be applied to protein prediction. We discuss how to select an appropriate data set, how to choose and encode protein features into the neural network input, and how to assess the predictor's performance.
Affiliation(s)
- Marco Punta
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
|
28
|
Rahaman H, Khan KA, Hassan I, Wahid M, Singh SB, Singh TP, Moosavi-Movahedi AA, Ahmad F. Sequence and stability of the goat cytochrome c. Biophys Chem 2008; 138:23-8. [DOI: 10.1016/j.bpc.2008.08.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 07/14/2008] [Revised: 08/20/2008] [Accepted: 08/22/2008] [Indexed: 10/21/2022]
|
29
|
Abstract
Identification and classification of G-protein coupled receptors (GPCRs) using protein sequences is an important computational challenge, given that experimental screening of thousands of ligands is an expensive proposition. There are two distinct but complementary approaches to GPCR classification: machine learning and sequence motif analysis. Machine learning methodologies typically suffer from problems of class imbalance and lack of multi-class classification. Many sequence motif methods, meanwhile, are too dependent on the similarity of the primary sequence alignments. It is desirable to have a motif discovery and application methodology that is not strongly dependent on primary sequence similarity. It should also overcome limitations of machine learning. We propose and evaluate the effectiveness of a simple methodology that uses a reduced protein functional alphabet representation, where similar functional residues have similar symbols. Regular expression motifs can then be obtained by ClustalW based multiple sequence alignment, using an identity matrix. Since evolutionary matrices like BLOSUM and PAM are not used, this method can be useful for any set of sequences that do not necessarily share a common ancestry. Reduced alphabet motifs can accurately classify known GPCR proteins and the results are comparable to PRINTS and PROSITE. For well-known GPCR proteins from SWISSPROT, there were no false negatives and only a few false positives. This methodology covers most currently known classes of GPCRs, even if there are very few representative sequences. It also predicts more than one class for certain sequences, thus overcoming the limitation of machine learning methods. We also annotated 695 orphan receptors, of which 121 were identified as belonging to Family A. A simple JavaScript based web interface has been developed to predict GPCR families and subfamilies (www.insilico-consulting.com/gpcrmotif.html).
Affiliation(s)
- Rajeev Gangal
- Insilico Consulting, 402, Citi Centre, 39/2, Erandwane, Karve Road, Pune, Maharashtra, India
|
30
|
Hu J, Yan C. HMM_RA: an improved method for alpha-helical transmembrane protein topology prediction. Bioinform Biol Insights 2008; 2:67-74. [PMID: 19812766 PMCID: PMC2735969 DOI: 10.4137/bbi.s358] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
alpha-helical transmembrane (TM) proteins play important and diverse functional roles in cells. The ability to predict the topology of these proteins is important for identifying functional sites and inferring function of membrane proteins. This paper presents a Hidden Markov Model (referred to as HMM_RA) that can predict the topology of alpha-helical transmembrane proteins with improved performance. HMM_RA adopts the same structure as the HMMTOP method, which has five modules: inside loop, inside helix tail, membrane helix, outside helix tail and outside loop. Each module consists of one or multiple states. HMM_RA allows using reduced alphabets to encode protein sequences. Thus, each state of HMM_RA is associated with n emission probabilities, where n is the size of the reduced alphabet set. Direct comparisons using two standard data sets show that HMM_RA consistently outperforms HMMTOP and TMHMM in topology prediction. Specifically, on a high-quality data set of 83 proteins, HMM_RA outperforms HMMTOP by up to 7.6% in topology accuracy and 6.4% in alpha-helices location accuracy. On the same data set, HMM_RA outperforms TMHMM by up to 6.4% in topology accuracy and 2.9% in location accuracy. Comparison also shows that HMM_RA achieves comparable performance as Phobius, a recently published method.
Affiliation(s)
- Jing Hu
- Department of Computer Science, Utah State University, Logan, UT 84322 U.S.A
- Changhui Yan
- Department of Computer Science, Utah State University, Logan, UT 84322 U.S.A
|
31
|
Buchete NV, Straub JE, Thirumalai D. Dissecting contact potentials for proteins: relative contributions of individual amino acids. Proteins 2008; 70:119-30. [PMID: 17640067 DOI: 10.1002/prot.21538] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
Abstract
Knowledge-based contact potentials are routinely used in fold recognition, binding of peptides to proteins, structure prediction, and coarse-grained models to probe protein folding kinetics. The dominant physical forces embodied in the contact potentials are revealed by eigenvalue analysis of the matrices, whose elements describe the strengths of interaction between amino acid side chains. We propose a general method to rank quantitatively the importance of various inter-residue interactions represented in the currently popular pair contact potentials. Eigenvalue analysis and correlation diagrams are used to rank the inter-residue pair interactions with respect to the magnitude of their relative contributions to the contact potentials. The amino acid ranking is shown to be consistent with a mean field approximation that is used to reconstruct the original contact potentials from the most relevant amino acids for several contact potentials. By providing a general, relative ranking score for amino acids, this method permits a detailed, quantitative comparison of various contact interaction schemes. For most contact potentials, between 7 and 9 amino acids of varying chemical character are needed to accurately reconstruct the full matrix. By correlating the identified important amino acid residues in contact potentials and analysis of about 7800 structural domains in the CATH database we predict that it is important to model accurately interactions between small hydrophobic residues. In addition, only potentials that take interactions involving the protein backbone into account can predict dense packing in protein structures.
Affiliation(s)
- N-V Buchete
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0520, USA.
|
32
|
Constraining protein sequence space: four amino acid alphabets are sufficient to recapitulate lambda repressor multimerization. J Mol Biol 2007; 374:399-410. [PMID: 17931656 DOI: 10.1016/j.jmb.2007.09.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2007] [Revised: 08/11/2007] [Accepted: 09/06/2007] [Indexed: 11/21/2022]
Abstract
Nucleic acid polymers selected from random sequence space constitute an enormous array of catalytic, diagnostic and therapeutic molecules. Despite the fact that proteins are robust polymers with far greater chemical and physical diversity, success in unlocking protein sequence space remains elusive. We have devised a combinatorial strategy for accessing nucleic acid sequence space corresponding to proteins comprising selected amino acid alphabets. Using the SynthOMIC approach (synthesis of ORFs by multimerizing in-frame codons), representative libraries comprising four amino acid alphabets were fused in-frame to the lambda repressor DNA-binding domain to provide an in vivo selection for self-interacting proteins that re-constitute lambda repressor function. The frequency of self-interactors as a function of amino acid composition ranged over five orders of magnitude, from approximately 6% of clones in a library comprising the amino acid residues LARE to approximately 0.6 in 10^6 in the MASH library. Sequence motifs were evident by inspection in many cases, and individual clones from each library presented substantial sequence identity with translated proteins by BLAST analysis. We posit that the SynthOMIC approach represents a powerful strategy for creating combinatorial libraries of open reading frames that distils protein sequence space on the basis of three inherent properties: it supports the use of selected amino acid alphabets, eliminates redundant sequences and locally constrains amino acids.
|
33
|
Luthra A, Jha AN, Ananthasuresh GK, Vishveswara S. A method for computing the inter-residue interaction potentials for reduced amino acid alphabet. J Biosci 2007; 32:883-9. [PMID: 17914230 DOI: 10.1007/s12038-007-0088-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/22/2022]
Abstract
Inter-residue potentials are extensively used in the design and evaluation of protein structures. However, dealing with all (20 x 20) interactions becomes computationally difficult in extensive investigations. Hence, it is desirable to reduce the alphabet of 20 amino acids to a smaller number. Currently, several methods of reducing the residue types exist; however, a critical assessment of these methods is not available. Towards this goal, here we review and evaluate different methods by comparing with the complete (20 x 20) matrix of Miyazawa-Jernigan potential, including a method of grouping adopted by us, based on multidimensional scaling (MDS). The second goal of this paper is the computation of inter-residue interaction energies for the reduced amino acid alphabet, which has not been explicitly addressed in the literature until now. By using a least squares technique, we present a systematic method of obtaining the interaction energy values for any type of grouping scheme that reduces the amino acid alphabet. This can be valuable in designing the protein structures.
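One simple way to read the least-squares step: if every residue pair is assigned the energy of its group pair, the group-pair energy that minimizes the summed squared error against the full matrix is the mean of the corresponding block. The sketch below assumes this unweighted formulation, which may differ from the paper's; the 4-residue matrix contains made-up numbers, not Miyazawa-Jernigan values, and all names are hypothetical.

```python
def reduced_energies(matrix, residues, group_of):
    """Least-squares group-pair energies for a block-constant approximation.

    Minimizing sum_ij (matrix[i][j] - E[g(i), g(j)])**2 over E gives, for
    each pair of groups, the mean of the matching block of `matrix`.
    """
    sums, counts = {}, {}
    for i, a in enumerate(residues):
        for j, b in enumerate(residues):
            key = (group_of[a], group_of[b])
            sums[key] = sums.get(key, 0.0) + matrix[i][j]
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy symmetric contact matrix with invented values (illustration only).
RESIDUES = "LIKE"
TOY_MATRIX = [
    [-7.0, -6.0, -3.0, -3.0],  # L
    [-6.0, -7.0, -3.0, -4.0],  # I
    [-3.0, -3.0, -1.0, -2.0],  # K
    [-3.0, -4.0, -2.0, -1.0],  # E
]
# Assumed two-letter grouping: H = hydrophobic, P = polar.
GROUP_OF = {"L": "H", "I": "H", "K": "P", "E": "P"}

energies = reduced_energies(TOY_MATRIX, RESIDUES, GROUP_OF)
print(energies[("H", "H")], energies[("H", "P")], energies[("P", "P")])
```

The same function applies to a full 20 x 20 matrix and any grouping; a weighted fit (for example by contact frequency) would replace the plain mean with a weighted one.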
Affiliation(s)
- Abhinav Luthra
- Department of Biotechnology, Indian Institute of Technology-Guwahati, Guwahati 781 039, India
|
34
|
Li J, Wang W. Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids. Sci China C Life Sci 2007; 50:392-402. [PMID: 17609897 DOI: 10.1007/s11427-007-0023-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 05/23/2006] [Accepted: 09/19/2006] [Indexed: 10/23/2022]
Abstract
Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, sequence alignment is often not accurate if sequence identities between to-be-aligned sequences are less than 30%. This is because, for these sequences, different residues may play similar structural roles and are incorrectly aligned when using a substitution matrix consisting of 20 types of residues. Based on the similarity of physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets, the complexity of protein sequences is reduced while the key information encoded in the sequences is retained. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, by using a database of aligned protein structures (DAPS), a new clustering method based on the substitution scores is proposed for the grouping of residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to recognition of protein structurally conserved/similar regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet with N around 9.
Affiliation(s)
- Jing Li
- National Laboratory of Solid State Microstructure and Department of Physics, Nanjing University, Nanjing, 210093, China
|
35
|
Etchebest C, Benros C, Bornot A, Camproux AC, de Brevern AG. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. Eur Biophys J 2007; 36:1059-69. [PMID: 17565494 DOI: 10.1007/s00249-007-0188-5] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Received: 02/13/2007] [Revised: 05/05/2007] [Accepted: 05/07/2007] [Indexed: 10/23/2022]
Abstract
The protein sequence world is considerably larger than the structure world. In consequence, numerous non-related sequences may adopt similar 3D folds and different kinds of amino acids may thus be found in similar 3D structures. By grouping together the 20 amino acids into a smaller number of representative residues with similar features, sequence world simplification may be achieved. This clustering hence defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet composed of 16 representative structural motifs (5 residues in length) called Protein Blocks (PBs). This alphabet makes it possible to translate the 3D structure into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs) (e.g., Cysteine, Histidine, Threonine and Serine for the alpha-helix Ncap). Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or the hydrophobic aromatic residue Trp with Phe and Tyr. We highlight interesting alternative associations. This shows the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance adaptation to environment) while largely preserving the 3D fold.
Affiliation(s)
- C Etchebest
- Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM UMR-S 726, Université Denis DIDEROT, Paris 7, case 7113, 2, place Jussieu, 75251, Paris, France
|
36
|
Abstract
Human tissue-specific genes were reported to be longer than housekeeping genes (both in coding and intronic parts). Competing neutralist and adaptationist models were proposed to explain this observation. Here I show that in the human genome the longest genes are those with an intermediate expression pattern. From the standpoint of information theory, the regulation of such genes should be the most complex. In the genome-wide context, they are found here to have a higher informational load on all available levels: from participation in protein interaction networks, pathways and modules reflected in Gene Ontology categories through transcription factor regulatory sets and protein functional domains to amino acid tuples (words) in encoded proteins and nucleotide tuples in introns and promoter regions. Thus, the intermediately expressed genes have a higher functional and regulatory complexity that is reflected in their greater length (which is consistent with the 'genome design' model). The dichotomy of housekeeping versus tissue-specific entities is more pronounced on the modular level than on the molecular level. There are far fewer intermediate-specific modules (modules overrepresented in the intermediately expressed genes) than housekeeping or tissue-specific modules (normalized to gene number). The dichotomy of housekeeping versus tissue-specific genes and modules in multicellular organisms is probably caused by the burden of regulatory complexity acting on the intermediately expressed genes.
|
37
|
Melo F, Marti-Renom MA. Accuracy of sequence alignment and fold assessment using reduced amino acid alphabets. Proteins 2006; 63:986-95. [PMID: 16506243 DOI: 10.1002/prot.20881] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets, and their performance was assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using the standard alphabet of 20 amino acids as the reference frame. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some definitions of a few residue types are able to encode most of the relevant sequence/structure information that is present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may help increase the accuracy of current substitution matrices and statistical potentials for the prediction of protein structure of remote homologs.
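As a rough illustration of what "grouping the 20 amino acids into a smaller number of representative residues" means in practice, the following Python sketch maps a sequence onto a reduced alphabet. The five-class grouping and the names `REDUCED_GROUPS` and `reduce_sequence` are assumptions made for this example only; they are not one of the alphabets evaluated in the abstract above.

```python
# Hypothetical 5-class grouping, chosen only for illustration.
# Each key is the representative letter; the value lists its members.
REDUCED_GROUPS = {
    "A": "AVLIMC",  # aliphatic / small hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KRDE",    # charged
    "G": "GP",      # conformationally special
}

# Invert the grouping into a per-residue lookup table.
LETTER_TO_CLASS = {
    aa: rep for rep, members in REDUCED_GROUPS.items() for aa in members
}


def reduce_sequence(seq: str) -> str:
    """Rewrite a one-letter amino acid sequence in the reduced alphabet.

    Raises KeyError for any symbol outside the 20 standard residues.
    """
    return "".join(LETTER_TO_CLASS[aa] for aa in seq.upper())


if __name__ == "__main__":
    print(reduce_sequence("MKTAYIAKQR"))
```

A substitution matrix or statistical potential for a reduced alphabet would then be indexed by the representative letters instead of the 20 standard residues.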
Affiliation(s)
- Francisco Melo
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile.
|
38
|
Abstract
Although one standard amino-acid 'alphabet' is used by most organisms on Earth, the evolutionary cause(s) and significance of this alphabet remain elusive. Fresh insights into the origin of the alphabet are now emerging from disciplines as diverse as astrobiology, biochemical engineering and bioinformatics.
Affiliation(s)
- Yi Lu
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
- Stephen Freeland
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
|
39
|
Shell MS, Debenedetti PG, Panagiotopoulos AZ. Computational characterization of the sequence landscape in simple protein alphabets. Proteins 2005; 62:232-43. [PMID: 16284961 DOI: 10.1002/prot.20714] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
We characterize the "sequence landscapes" in several simple, heteropolymer models of proteins by examining their mutation properties. Using an efficient flat-histogram Monte Carlo search method, our approach involves determining the distribution in energy of all sequences of a given length when threaded through a common backbone. These calculations are performed for a number of Protein Data Bank structures using two variants of the 20-letter contact potential developed by Miyazawa and Jernigan [Miyazawa S, Jernigan WL. Macromolecules 1985;18:534], and the 2-monomer HP model of Lau and Dill [Lau KF, Dill KA. Macromolecules 1989;22:3986]. Our results indicate significant differences among the energy functions in terms of the "smoothness" of their landscapes. In particular, one of the Miyazawa-Jernigan contact potentials reveals unusual cooperative behavior among its species' interactions, resulting in what is essentially a set of phase transitions in sequence space. Our calculations suggest that model-specific features can have a profound effect on protein design algorithms, and our methods offer a number of ways by which sequence landscapes can be quantified.
Affiliation(s)
- M Scott Shell
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544, USA.
|
40
|
Guharoy M, Chakrabarti P. Conservation and relative importance of residues across protein-protein interfaces. Proc Natl Acad Sci U S A 2005; 102:15447-52. [PMID: 16221766 PMCID: PMC1266102 DOI: 10.1073/pnas.0505425102] [Citation(s) in RCA: 193] [Impact Index Per Article: 9.7] [Received: 06/28/2005] [Accepted: 08/23/2005] [Indexed: 11/18/2022]
Abstract
A core region surrounded by a rim characterizes biological interfaces. We ascertain the importance of the core by showing that the sequence entropies of the residues comprising the core are smaller than those of the rim. Such a distinction is not seen in the 2-fold-related, nonphysiological interfaces formed in crystal lattices of monomeric proteins, thereby providing a procedure for characterizing the oligomeric state from crystal structures of protein molecules. This method is better than those that rely on comparing the sequence entropies of the interface and the rest of the protein surface, especially in cases where the surface harbors additional binding sites. To a good approximation, there is a correlation between the accessible surface area lost because of complexation and ΔΔG values obtained through alanine-scanning mutagenesis (26-38 cal per Å² of the surface buried) for residues located in the core, a relationship that is not discernible for rim residues. If, however, a residue participates in hydrogen bonding across the interface, the extent of stabilization is 52 cal/mol per 1 Å² of the nonpolar surface area buried by the residue. Compared with an amino acid classification used earlier, an environment-based grouping of residues yields better discrimination in sequence entropy between the core and the rim.
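The core-versus-rim comparison above rests on a per-position sequence entropy. The sketch below computes the Shannon entropy of one alignment column, assuming base-2 logarithms and no special treatment of gaps; the abstract does not state the exact definition or residue grouping the authors used, so `column_entropy` and the toy columns are illustrative only.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Toy columns (hypothetical data): a fully conserved position
    # and a position with four equally frequent residues.
    print(column_entropy("LLLLLLLL"))
    print(column_entropy("LIVALIVA"))
```

Under this definition a fully conserved position has zero entropy, so lower average entropy for core positions than for rim positions would indicate stronger conservation of the core. Applying a residue grouping before counting, as in the environment-based classification mentioned above, would amount to relabeling each column before calling the function.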
Affiliation(s)
- Mainak Guharoy
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
|
41
|
Silva IR, Dos Reis LM, Caliri A. Topology-dependent protein folding rates analyzed by a stereochemical model. J Chem Phys 2005; 123:154906. [PMID: 16252971 DOI: 10.1063/1.2052607] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
It is an experimental fact that gross topological parameters of the native structure of small proteins presenting two-state kinetics, such as the relative contact order χ, correlate with the logarithm of their respective folding rate constant κ_f. However, reported results show specific cases for which the (χ, log κ_f) dependence does not follow the overall trend of the entire collection of experimental data. Therefore, an interesting point to be clarified is to what extent the native topology alone can explain these exceptional data. In this work, the structural determinants of the folding kinetics are investigated by means of a 27-mer lattice model, in which each native structure is represented by a compact self-avoiding (CSA) configuration. The hydrophobic effect and steric constraints are taken as basic ingredients of the folding mechanism, and each CSA configuration is characterized according to its composition of specific patterns (resembling basic structural elements such as loops, sheets, and helices). Our results suggest that (i) folding rate constants are largely influenced by topological details of the native structure, such as configurational pattern types and their combinations, and (ii) global parameters, such as the relative contact order, may not be effective in detecting them. Distinct pattern types and their combinations are determinants of what we call here the "content of secondary-type" structure (σ) of the native structure: high σ implies a large κ_f. Most CSA configurations present a mix of distinct structural patterns, which determines the linear dependence of log κ_f on χ: structures that do not present a proper χ-dependent balance of patterns have folding kinetics that deviate from the presumed linear correlation between χ and log κ_f.
The basic physical mechanism relating σ and κ_f involves the concept of cooperativity: if the native structure is composed of patterns producing a spatial order rich in effective short-range contacts, a properly designed sequence undergoes a fast folding process. On the other hand, the presence of some structural patterns, such as long loops, may substantially reduce the folding performance. This fact is illustrated through native structures having a very similar topology but a distinct folding rate κ_f, and by analyzing structures having the same χ but different σ.
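For readers unfamiliar with the relative contact order, the sketch below computes it for a chain on a cubic lattice using the commonly cited definition: the mean sequence separation of contacting residues divided by the chain length. This is an assumption; the abstract does not spell out the exact variant or the contact criterion used for the 27-mer model, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, int, int]


def lattice_contacts(coords: Sequence[Coord]) -> List[Tuple[int, int]]:
    """Return non-bonded residue pairs that sit on adjacent lattice sites."""
    contacts = []
    for i in range(len(coords)):
        # On a cubic lattice the closest possible non-bonded contact is i, i+3.
        for j in range(i + 3, len(coords)):
            manhattan = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if manhattan == 1:
                contacts.append((i, j))
    return contacts


def relative_contact_order(coords: Sequence[Coord]) -> float:
    """Mean sequence separation of contacts, normalized by chain length."""
    contacts = lattice_contacts(coords)
    if not contacts:
        return 0.0
    total_separation = sum(j - i for i, j in contacts)
    return total_separation / (len(coords) * len(contacts))


if __name__ == "__main__":
    # Toy 4-mer folded into a square: the only contact is between the two ends.
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    print(lattice_contacts(square), relative_contact_order(square))
```

Two configurations can share the same value of this global parameter while differing in which local patterns produce the contacts, which is the distinction the abstract draws between χ and σ.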
Affiliation(s)
- Inês R Silva
- Departamento de Física e Matemática Faculdade de Filosofia Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, 14040-901 Ribeirão Preto, SP, Brazil.
|
42
|
Saha RP, Bahadur RP, Chakrabarti P. Interresidue Contacts in Proteins and Protein−Protein Interfaces and Their Use in Characterizing the Homodimeric Interface. J Proteome Res 2005; 4:1600-9. [PMID: 16212412 DOI: 10.1021/pr050118k] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
The environment of amino acid residues in protein tertiary structures and in three types of interfaces formed by protein-protein association (in complexes, homodimers, and crystal lattices of monomeric proteins) has been analyzed in terms of the propensity values of the 20 amino acid residues to be in contact with a given residue. On the basis of the similarity of their environments, the twenty residues can be divided into nine classes, which may correspond to a reduced amino acid alphabet. There is no appreciable change in the environment in going from the tertiary structure to the interface, with the residues participating in crystal contacts showing the maximum deviation. Contacts between identical residues are very prominent in homodimers and crystal dimers and arise from the 2-fold-related association of residues lining the axis of rotation. These two types of interfaces, representing specific and nonspecific associations, are characterized by the types of residues that partake in "self-contacts", most notably Leu in the former and Glu in the latter. The relative preference of residues to be involved in "self-contacts" can be used to develop a scoring function to identify homodimeric proteins from crystal structures. Thirty-four percent of such residues are fully conserved among homologous proteins in the homodimer dataset, as opposed to only 20% in crystal dimers. The results point to Leu being the stickiest of all amino acid residues, hence its widespread use in motifs such as leucine zippers.
Affiliation(s)
- Rudra Prasad Saha
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme 7M, Calcutta 700-054, India
|
43
|
Doi N, Kakukawa K, Oishi Y, Yanagawa H. High solubility of random-sequence proteins consisting of five kinds of primitive amino acids. Protein Eng Des Sel 2005; 18:279-84. [PMID: 15928003 DOI: 10.1093/protein/gzi034] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022]
Abstract
Searching for functional proteins among random-sequence libraries is a major challenge of protein engineering; the difficulties include the poor solubility of many random-sequence proteins. A library in which most of the polypeptides are soluble and stable would therefore be of great benefit. Although modern proteins consist of 20 amino acids, it has been suggested that early proteins evolved from a reduced alphabet. Here, we have constructed a library of random-sequence proteins consisting of only five amino acids, Ala, Gly, Val, Asp and Glu, which are believed to have been the most abundant in the prebiotic environment. Expression and characterization of arbitrarily chosen proteins in the library indicated that five-alphabet random-sequence proteins have higher solubility than do 20-alphabet random-sequence proteins with a similar level of hydrophobicity. The results support the reduced-alphabet hypothesis of the primordial genetic code and should also be helpful in constructing optimized protein libraries for evolutionary protein engineering.
Affiliation(s)
- Nobuhide Doi
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
|
44
|
Li X, Liang J. Geometric cooperativity and anticooperativity of three-body interactions in native proteins. Proteins 2005; 60:46-65. [PMID: 15849756 DOI: 10.1002/prot.20438] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
Characterizing multibody interactions of hydrophobic, polar, and ionizable residues in proteins is important for understanding the stability of protein structures. We introduce a geometric model for quantifying 3-body interactions in native proteins. With this model, empirical propensity values for many types of 3-body interactions can be reliably estimated from a database of native protein structures, despite the overwhelming presence of pairwise contacts. In addition, we define a nonadditive coefficient that characterizes the cooperativity and anticooperativity of residue interactions in native proteins by measuring the deviation of 3-body interactions from three independent pairwise interactions. It compares the 3-body propensity value with what would be expected if only pairwise interactions were considered, and highlights the distinction between the propensity and the cooperativity of a 3-body interaction. Based on the geometric model, and on what can be inferred from statistical analysis of such a model, we find that hydrophobic interactions and hydrogen-bonding interactions make nonadditive contributions to protein stability, but the nonadditive nature depends on whether such interactions are located in the protein interior or on the protein surface. When located in the interior, many hydrophobic interactions, such as those involving alkyl residues, are anticooperative. Salt-bridge and regular hydrogen-bonding interactions, such as those involving ionizable residues and polar residues, are cooperative. When located on the protein surface, these salt-bridge and regular hydrogen-bonding interactions are anticooperative, and hydrophobic interactions involving alkyl residues become cooperative. We show with examples that incorporating 3-body interactions improves discrimination of native protein structures from decoy conformations. In addition, analysis of cooperative 3-body interactions may reveal spatial motifs that can suggest specific protein functions.
Affiliation(s)
- Xiang Li
- Department of Bioengineering, SEO, MC-063, University of Illinois at Chicago, Chicago, Illinois 60607-7052, USA
|
45
|
Khatun J, Khare SD, Dokholyan NV. Can Contact Potentials Reliably Predict Stability of Proteins? J Mol Biol 2004; 336:1223-38. [PMID: 15037081 DOI: 10.1016/j.jmb.2004.01.002] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.4] [Received: 10/21/2003] [Revised: 01/08/2004] [Accepted: 01/08/2004] [Indexed: 11/17/2022]
Abstract
The simplest approximation of the interaction potential between amino acid residues in proteins is the contact potential, which defines the effective free energy of a protein conformation by the set of amino acid contacts formed in this conformation. Finding a contact potential capable of predicting free energies of protein states across a variety of protein families will aid protein folding and engineering in silico on a computationally tractable time-scale. We test the ability of contact potentials to accurately and transferably (across various protein families) predict stability changes of proteins upon mutation. We develop a new methodology to determine the contact potentials in proteins from experimental measurements of changes in protein thermodynamic stability (ΔΔG) upon mutation. We apply our methodology to derive sets of contact interaction parameters for a hierarchy of interaction models including solvation and multi-body contact parameters. We use statistical tests to assess how well our models reproduce experimental measurements. We evaluate the maximum accuracy of predictions obtained by using contact potentials and the correlation between parameters derived from different datasets of experimental ΔΔG values. We argue that it is impossible to reach experimental accuracy and derive fully transferable contact parameters using contact models of potentials. However, contact parameters may yield reliable predictions of ΔΔG for datasets of mutations confined to the same amino acid positions in the sequence of a single protein.
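The idea of a contact potential can be made concrete with a small Python sketch: the energy of a conformation is the sum of pairwise parameters over its contacts, and a crude stability change for a point mutation is the energy difference at a fixed contact map. The parameter values, the three-residue toy chain, and the neglect of the unfolded state, solvation, and multi-body terms are all assumptions for illustration; this is not the fitting methodology described in the abstract.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def pair_key(a: str, b: str) -> Pair:
    """Order-independent key, so that (A, L) and (L, A) share one parameter."""
    return (a, b) if a <= b else (b, a)


def contact_energy(seq: str,
                   contacts: Iterable[Tuple[int, int]],
                   eps: Dict[Pair, float]) -> float:
    """Sum the pairwise contact parameters over all contacts in a conformation."""
    return sum(eps[pair_key(seq[i], seq[j])] for i, j in contacts)


def mutation_energy_change(seq: str, position: int, new_residue: str,
                           contacts: Iterable[Tuple[int, int]],
                           eps: Dict[Pair, float]) -> float:
    """Energy difference (mutant minus wild type) with the contact map held fixed."""
    contacts = list(contacts)
    mutant = seq[:position] + new_residue + seq[position + 1:]
    return contact_energy(mutant, contacts, eps) - contact_energy(seq, contacts, eps)


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary energy units.
    eps = {("A", "A"): -0.2, ("A", "L"): -0.5, ("L", "L"): -1.0}
    contacts = [(0, 2)]  # toy conformation with a single contact
    print(contact_energy("LAL", contacts, eps))
    print(mutation_energy_change("LAL", 2, "A", contacts, eps))
```

In this toy case replacing the contacting Leu with Ala raises the energy by 0.5 units, mimicking a destabilizing mutation. Models of the kind tested in the abstract would add further terms and fit the parameters to measured ΔΔG values.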
Affiliation(s)
- Jainab Khatun
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|