1
Nelson MG, Talavera D. Identification of coevolving positions by ancestral reconstruction. Commun Biol 2025; 8:329. [PMID: 40021815 PMCID: PMC11871020 DOI: 10.1038/s42003-025-07676-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 02/05/2025] [Indexed: 03/03/2025]
Abstract
Coevolution within proteins occurs when changes in one position affect the selective pressure in another position to preserve the protein structure or function. The identification of coevolving positions within proteins remains contentious, with most methods disregarding the phylogenetic information. Here, we present a time-efficient approach for detecting coevolving pairs, which is almost perfect in terms of precision and specificity. It is based on maximum parsimony-based ancestral reconstruction followed by the identification of pairs with a depletion of separate changes relative to their number of concurrent changes. Our analysis of a previously characterised biological dataset shows that the coevolving pairs that we identified tend to be close in the protein sequence and structure, slightly less solvent exposed and have a higher mutation rate. We also show how the ancestral reconstruction can be used to detect favourable and unfavourable amino acid combinations. Altogether, we demonstrate how this approach is essential for identifying pairs of positions with weak covariation patterns.
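To make the idea of "concurrent" versus "separate" changes concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the ancestral states have already been reconstructed (for example by maximum parsimony) and simply counts, for one pair of positions, the tree branches on which both positions change and those on which only one changes. The function name `count_changes`, the toy tree, and the toy sequences are all hypothetical.

```python
from typing import Dict, List, Tuple


def count_changes(
    branches: List[Tuple[str, str]],
    states: Dict[str, str],
    i: int,
    j: int,
) -> Tuple[int, int]:
    """Return (concurrent, separate) change counts for positions i and j.

    branches: (parent, child) node-name pairs describing the tree.
    states:   sequence at every node, including reconstructed ancestors
              (assumed to be given; no reconstruction is done here).
    """
    concurrent = 0
    separate = 0
    for parent, child in branches:
        changed_i = states[parent][i] != states[child][i]
        changed_j = states[parent][j] != states[child][j]
        if changed_i and changed_j:
            concurrent += 1  # both positions change on the same branch
        elif changed_i or changed_j:
            separate += 1  # only one of the two positions changes
    return concurrent, separate


# Hypothetical toy tree: root -> (n1, C), n1 -> (A, B).
branches = [("root", "n1"), ("root", "C"), ("n1", "A"), ("n1", "B")]
states = {"root": "AKL", "n1": "GEL", "A": "GEL", "B": "GEV", "C": "AKL"}

pair_01 = count_changes(branches, states, 0, 1)
pair_02 = count_changes(branches, states, 0, 2)
print(pair_01, pair_02)
```

In this toy case positions 0 and 1 only ever change on the same branch, whereas positions 0 and 2 only change on different branches. How a depletion of separate changes is scored or tested for significance is not specified in the abstract and is left out of the sketch.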
Affiliation(s)
- Michael G Nelson
  - Division of Cardiovascular Sciences, School of Medical Sciences, The University of Manchester, Oxford Road, Manchester, UK
- David Talavera
  - Division of Cardiovascular Sciences, School of Medical Sciences, The University of Manchester, Oxford Road, Manchester, UK
2
Swint-Kruse L, Martin TA, Wu T, Dougherty LL, Fenton AW. Identification of positions in human aldolase A that are neutral for apparent K_M. Arch Biochem Biophys 2024; 761:110183. [PMID: 39461494 PMCID: PMC11908651 DOI: 10.1016/j.abb.2024.110183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 10/16/2024] [Accepted: 10/20/2024] [Indexed: 10/29/2024]
Abstract
According to evolutionary theory, many naturally-occurring amino acid substitutions are expected to be neutral or near-neutral, with little effect on protein structure or function. Accordingly, most changes observed in human exomes are also expected to be neutral. As such, accurate algorithms for identifying medically-relevant changes must discriminate rare, non-neutral substitutions against a background of neutral substitutions. However, due to historical biases in biochemical experiments, the data available to train and validate prediction algorithms mostly contains non-neutral substitutions, with few examples of neutral substitutions. Thus, available training sets have the opposite composition of the desired test sets. Towards improving a dataset of these critical negative controls, we have concentrated on identifying neutral positions - those positions for which most of the possible 19 amino acid substitutions have little effect on protein structure or function. Here, we used a strategy based on multiple sequence alignments to identify putative neutral positions in human aldolase A, followed by biochemical assays for 147 aldolase substitutions. Results showed that most variants had little effect on either the apparent Michaelis constant for substrate fructose-1,6-bisphosphate or its apparent cooperativity. Thus, these data are useful for training and validating prediction algorithms. In addition, we created a database of these and other biochemically characterized aldolase variants along with aldolase sequences and characteristics derived from sequence and structure analyses. This database is publicly available at https://github.com/liskinsk/Aldolase-variant-and-sequence-database.
Affiliation(s)
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd, MSN 3030, Kansas City, KS, 66160, USA
- Tyler A Martin
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd, MSN 3030, Kansas City, KS, 66160, USA
- Tiffany Wu
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd, MSN 3030, Kansas City, KS, 66160, USA
- Larissa L Dougherty
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd, MSN 3030, Kansas City, KS, 66160, USA
- Aron W Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd, MSN 3030, Kansas City, KS, 66160, USA
3
Swint-Kruse L, Fenton AW. Rheostats, toggles, and neutrals, Oh my! A new framework for understanding how amino acid changes modulate protein function. J Biol Chem 2024; 300:105736. [PMID: 38336297 PMCID: PMC10914490 DOI: 10.1016/j.jbc.2024.105736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/09/2024] [Accepted: 01/25/2024] [Indexed: 02/12/2024]
Abstract
Advances in personalized medicine and protein engineering require accurately predicting outcomes of amino acid substitutions. Many algorithms correctly predict that evolutionarily-conserved positions show "toggle" substitution phenotypes, which is defined when a few substitutions at that position retain function. In contrast, predictions often fail for substitutions at the less-studied "rheostat" positions, which are defined when different amino acid substitutions at a position sample at least half of the possible functional range. This review describes efforts to understand the impact and significance of rheostat positions: (1) They have been observed in globular soluble, integral membrane, and intrinsically disordered proteins; within single proteins, their prevalence can be up to 40%. (2) Substitutions at rheostat positions can have biological consequences and ∼10% of substitutions gain function. (3) Although both rheostat and "neutral" (defined when all substitutions exhibit wild-type function) positions are nonconserved, the two classes have different evolutionary signatures. (4) Some rheostat positions have pleiotropic effects on function, simultaneously modulating multiple parameters (e.g., altering both affinity and allosteric coupling). (5) In structural studies, substitutions at rheostat positions appear to cause only local perturbations; the overall conformations appear unchanged. (6) Measured functional changes show promising correlations with predicted changes in protein dynamics; the emergent properties of predicted, dynamically coupled amino acid networks might explain some of the complex functional outcomes observed when substituting rheostat positions. Overall, rheostat positions provide unique opportunities for using single substitutions to tune protein function. Future studies of these positions will yield important insights into the protein sequence/function relationship.
Affiliation(s)
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Aron W Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
4
Kim D, Noh MH, Park M, Kim I, Ahn H, Ye DY, Jung GY, Kim S. Enzyme activity engineering based on sequence co-evolution analysis. Metab Eng 2022; 74:49-60. [PMID: 36113751 DOI: 10.1016/j.ymben.2022.09.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/17/2022] [Revised: 08/31/2022] [Accepted: 09/05/2022] [Indexed: 11/17/2022]
Abstract
The utility of engineering enzyme activity is expanding with the development of biotechnology. Conventional methods have limited applicability because they require high-throughput screening or three-dimensional structures to select target residues for activity control. An alternative approach exploits sequence evolution under natural selection: during evolution, a repertoire of mutations was selected to fine-tune enzyme activities in response to varying environments. Here, we devised a strategy called sequence co-evolutionary analysis to control the efficiency of enzyme reactions (SCANEER), which scans the evolution of protein sequences and directs a mutation strategy to improve enzyme activity. We hypothesized that amino acid pairs for various enzyme activities were encoded in the evolutionary history of protein sequences, whereas loss-of-function mutations were avoided because they are depleted during evolution. SCANEER successfully predicted the enzyme activities of beta-lactamase and aminoglycoside 3'-phosphotransferase. SCANEER was further experimentally validated to control the activities of three different enzymes of great interest in chemical production: cis-aconitate decarboxylase, α-ketoglutaric semialdehyde dehydrogenase, and inositol oxygenase. Activity-enhancing mutations that improve substrate-binding affinity or turnover rate were found at sites distal from known active sites or ligand-binding pockets. We provide SCANEER to control desired enzyme activity through a user-friendly webserver.
Affiliation(s)
- Donghyo Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, South Korea
- Myung Hyun Noh
  - Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, South Korea
- Minhyuk Park
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, South Korea
- Inhae Kim
  - ImmunoBiome Inc., Pohang, South Korea
- Hyunsoo Ahn
  - Graduate School of Artificial Intelligence, Pohang University of Science and Technology, Pohang, South Korea
- Dae-Yeol Ye
  - Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, South Korea
- Gyoo Yeol Jung
  - Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, South Korea
  - Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul, South Korea
  - School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, Pohang, South Korea
- Sanguk Kim
  - Department of Life Sciences, Pohang University of Science and Technology, Pohang, South Korea
  - Graduate School of Artificial Intelligence, Pohang University of Science and Technology, Pohang, South Korea
  - Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul, South Korea
  - School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, Pohang, South Korea
5
Lee MS, Tuohy PJ, Kim CY, Lichauco K, Parrish HL, Van Doorslaer K, Kuhns MS. Enhancing and inhibitory motifs regulate CD4 activity. eLife 2022; 11:e79508. [PMID: 35861317 PMCID: PMC9333989 DOI: 10.7554/elife.79508] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/15/2022] [Accepted: 07/20/2022] [Indexed: 11/15/2022]
Abstract
CD4+ T cells use T cell receptor (TCR)-CD3 complexes, and CD4, to respond to peptide antigens within MHCII molecules (pMHCII). We report here that, through ~435 million years of evolution in jawed vertebrates, purifying selection has shaped motifs in the extracellular, transmembrane, and intracellular domains of eutherian CD4 that enhance pMHCII responses, and covary with residues in an intracellular motif that inhibits responses. Importantly, while CD4 interactions with the Src kinase, Lck, are viewed as key to pMHCII responses, our data indicate that CD4-Lck interactions derive their importance from the counterbalancing activity of the inhibitory motif, as well as motifs that direct CD4-Lck pairs to specific membrane compartments. These results have implications for the evolution and function of complex transmembrane receptors and for biomimetic engineering.
Affiliation(s)
- Mark S Lee
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
- Peter J Tuohy
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
- Caleb Y Kim
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
- Katrina Lichauco
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
- Heather L Parrish
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
- Koenraad Van Doorslaer
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
  - School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, United States
  - Cancer Biology Graduate Interdisciplinary Program and Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, United States
  - The BIO-5 Institute, The University of Arizona, Tucson, United States
  - The University of Arizona Cancer Center, Tucson, United States
- Michael S Kuhns
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
  - Cancer Biology Graduate Interdisciplinary Program and Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, United States
  - The BIO-5 Institute, The University of Arizona, Tucson, United States
  - The University of Arizona Cancer Center, Tucson, United States
  - The Arizona Center on Aging, The University of Arizona College of Medicine, Tucson, United States
6
Swint-Kruse L, Martin TA, Page BM, Wu T, Gerhart PM, Dougherty LL, Tang Q, Parente DJ, Mosier BR, Bantis LE, Fenton AW. Rheostat functional outcomes occur when substitutions are introduced at nonconserved positions that diverge with speciation. Protein Sci 2021; 30:1833-1853. [PMID: 34076313 DOI: 10.1002/pro.4136] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/23/2021] [Revised: 05/25/2021] [Accepted: 05/28/2021] [Indexed: 12/14/2022]
Abstract
When amino acids vary during evolution, the outcome can be functionally neutral or biologically important. We previously found that substituting a subset of nonconserved positions, "rheostat" positions, can have surprising effects on protein function. Since changes at rheostat positions can facilitate functional evolution or cause disease, more examples are needed to understand their unique biophysical characteristics. Here, we explored whether "phylogenetic" patterns of change in multiple sequence alignments (such as positions with subfamily-specific conservation) predict the locations of functional rheostat positions. To that end, we experimentally tested eight phylogenetic positions in human liver pyruvate kinase (hLPYK), using 10-15 substitutions per position and biochemical assays that yielded five functional parameters. Five positions were strongly rheostatic and three were non-neutral. To test the corollary that positions with low phylogenetic scores were not rheostat positions, we combined these phylogenetic positions with previously identified hLPYK rheostat, "toggle" (most substitutions abolished function), and "neutral" (all substitutions were like wild-type) positions. Despite representing 428 variants, this set of 33 positions was poorly statistically powered. Thus, we turned to the in vivo phenotypic dataset for E. coli lactose repressor protein (LacI), which comprised 12-13 substitutions at 329 positions and could be used to identify rheostat, toggle, and neutral positions. Combined hLPYK and LacI results show that positions with strong phylogenetic patterns of change are more likely to exhibit rheostat substitution outcomes than neutral or toggle outcomes. Furthermore, phylogenetic patterns were more successful at identifying rheostat positions than were co-evolutionary or eigenvector centrality measures of evolutionary change.
Affiliation(s)
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Tyler A Martin
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Braelyn M Page
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Tiffany Wu
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Paige M Gerhart
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Larissa L Dougherty
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
  - Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, USA
- Qingling Tang
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Daniel J Parente
  - Department of Family Medicine and Community Health, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Brian R Mosier
  - Department of Biostatistics and Data Science, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Leonidas E Bantis
  - Department of Biostatistics and Data Science, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Aron W Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
7
Martin TA, Wu T, Tang Q, Dougherty LL, Parente DJ, Swint-Kruse L, Fenton AW. Identification of biochemically neutral positions in liver pyruvate kinase. Proteins 2020; 88:1340-1350. [PMID: 32449829 DOI: 10.1002/prot.25953] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/18/2019] [Revised: 03/10/2020] [Accepted: 05/16/2020] [Indexed: 01/08/2023]
Abstract
Understanding how each residue position contributes to protein function has been a long-standing goal in protein science. Substitution studies have historically focused on conserved protein positions. However, substitutions of nonconserved positions can also modify function. Indeed, we recently identified nonconserved positions that have large substitution effects in human liver pyruvate kinase (hLPYK), including altered allosteric coupling. To facilitate a comparison of which characteristics determine when a nonconserved position does vs does not contribute to function, the goal of the current work was to identify neutral positions in hLPYK. However, existing hLPYK data showed that three features commonly associated with neutral positions (high sequence entropy, high surface exposure, and alanine scanning) lacked the sensitivity needed to guide experimental studies. We used multiple evolutionary patterns identified in a sequence alignment of the PYK family to identify which positions were least patterned, reasoning that these were most likely to be neutral. Nine positions were tested with a total of 117 amino acid substitutions. Although exploring all potential functions is not feasible for any protein, five parameters associated with substrate/effector affinities and allosteric coupling were measured for hLPYK variants. For each position, the aggregate functional outcomes of all variants were used to quantify a "neutrality" score. Three positions showed perfect neutral scores for all five parameters. Furthermore, the nine positions showed larger neutral scores than 17 positions located near allosteric binding sites. Thus, our strategy successfully enriched the dataset for positions with neutral and modest substitutions.
Affiliation(s)
- Tyler A Martin
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Tiffany Wu
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Qingling Tang
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Larissa L Dougherty
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Daniel J Parente
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
  - Department of Family and Community Medicine, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
- Aron W Fenton
  - Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
8
Fang C, Jia Y, Hu L, Lu Y, Wang H. IMPContact: An Interhelical Residue Contact Prediction Method. Biomed Res Int 2020; 2020:4569037. [PMID: 32309431 PMCID: PMC7140131 DOI: 10.1155/2020/4569037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2020] [Accepted: 03/09/2020] [Indexed: 11/17/2022]
Abstract
As an important category of proteins, alpha-helix transmembrane proteins (αTMPs) play an important role in various biological activities. Because few αTMP structures have been solved, predicting the residue contacts among the transmembrane segments of an αTMP can reveal the basis of the protein fold, which can in turn help uncover protein functions. A few efforts have been devoted to predicting interhelical residue contacts using machine learning methods based on prior knowledge of transmembrane protein structure. However, improving the prediction accuracy remains a challenge, and deep learning offers an opportunity to use the structural knowledge in a different way. For this purpose, we proposed a novel αTMP residue-residue contact prediction method, IMPContact, in which a convolutional neural network (CNN) was applied to recognize interhelical contacts in a TMP using its specific structural features. Four sequence-based TMP-specific features were selected to describe a pair of residues, namely, evolutionary covariation, predicted topology structure, residue relative position, and evolutionary conservation. An up-to-date dataset was used to train and test IMPContact; our method achieved better performance than peer methods. In the case studies, interhelical residue contacts (IHRCs) in the regular transmembrane helices were better predicted than those in the irregular ones.
Affiliation(s)
- Chao Fang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Yajie Jia
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
  - Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
- Lihong Hu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Yinghua Lu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
  - Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
- Han Wang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
  - Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
  - Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
9
Swint-Kruse L. Using Evolution to Guide Protein Engineering: The Devil IS in the Details. Biophys J 2016; 111:10-8. [PMID: 27410729 DOI: 10.1016/j.bpj.2016.05.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 11/10/2015] [Revised: 04/18/2016] [Accepted: 05/20/2016] [Indexed: 10/21/2022]
Abstract
For decades, protein engineers have endeavored to reengineer existing proteins for novel applications. Overall, protein folds and gross functions can be readily transferred from one protein to another by transplanting large blocks of sequence (i.e., domain recombination). However, predictably fine-tuning function (e.g., by adjusting ligand affinity, specificity, catalysis, and/or allosteric regulation) remains a challenge. One approach has been to use the sequences of protein families to identify amino acid positions that change during the evolution of functional variation. The rationale is that these nonconserved positions could be mutated to predictably fine-tune function. Evolutionary approaches to protein design have had some success, but the engineered proteins seldom replicate the functional performances of natural proteins. This Biophysical Perspective reviews several complexities that have been revealed by evolutionary and experimental studies of protein function. These include 1) challenges in defining computational and biological thresholds that define important amino acids; 2) the co-occurrence of many different patterns of amino acid changes in evolutionary data; 3) difficulties in mapping the patterns of amino acid changes to discrete functional parameters; 4) the nonconventional mutational outcomes that occur for a particular group of functionally important, nonconserved positions; 5) epistasis (nonadditivity) among multiple mutations; and 6) the fact that a large fraction of a protein's amino acids contribute to its overall function. To overcome these challenges, new goals are identified for future studies.
Affiliation(s)
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas
10
Woldring DR, Holec PV, Hackel BJ. ScaffoldSeq: Software for characterization of directed evolution populations. Proteins 2016; 84:869-74. [PMID: 27018773 DOI: 10.1002/prot.25040] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/04/2015] [Revised: 03/08/2016] [Accepted: 03/18/2016] [Indexed: 12/21/2022]
Abstract
ScaffoldSeq is software designed for the numerous applications (including directed evolution analysis) in which a user generates a population of DNA sequences encoding partially diverse proteins with related functions and would like to characterize the single-site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel.
Affiliation(s)
- Daniel R Woldring
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota, 55455
- Patrick V Holec
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota, 55455
- Benjamin J Hackel
  - Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota, 55455
11
Wagner JR, Lee CT, Durrant JD, Malmstrom RD, Feher VA, Amaro RE. Emerging Computational Methods for the Rational Discovery of Allosteric Drugs. Chem Rev 2016; 116:6370-90. [PMID: 27074285 PMCID: PMC4901368 DOI: 10.1021/acs.chemrev.5b00631] [Citation(s) in RCA: 176] [Impact Index Per Article: 19.6] [Indexed: 12/17/2022]
Abstract
Allosteric drug development holds promise for delivering medicines that are more selective and less toxic than those that target orthosteric sites. To date, the discovery of allosteric binding sites and lead compounds has been mostly serendipitous, achieved through high-throughput screening. Over the past decade, structural data has become more readily available for larger protein systems and more membrane protein classes (e.g., GPCRs and ion channels), which are common allosteric drug targets. In parallel, improved simulation methods now provide better atomistic understanding of the protein dynamics and cooperative motions that are critical to allosteric mechanisms. As a result of these advances, the field of predictive allosteric drug development is now on the cusp of a new era of rational structure-based computational methods. Here, we review algorithms that predict allosteric sites based on sequence data and molecular dynamics simulations, describe tools that assess the druggability of these pockets, and discuss how Markov state models and topology analyses provide insight into the relationship between protein dynamics and allosteric drug binding. In each section, we first provide an overview of the various method classes before describing relevant algorithms and software packages.
Collapse
Affiliation(s)
- Jeffrey R Wagner
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Christopher T Lee
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Jacob D Durrant
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Robert D Malmstrom
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Victoria A Feher
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Rommie E Amaro
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| |
|
12
|
Parente DJ, Ray JCJ, Swint-Kruse L. Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores. Proteins 2015; 83:2293-306. [PMID: 26503808 DOI: 10.1002/prot.24948] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/02/2015] [Revised: 09/21/2015] [Accepted: 10/14/2015] [Indexed: 12/21/2022]
Abstract
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: For a given sequence alignment, alternative algorithms usually identify different top pairwise scores. We reconciled results from five commonly used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions.
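As a rough illustration of the eigenvector centrality idea in the abstract above, the following Python sketch ranks positions in a small weighted network by power iteration. The score matrix is made-up toy data, the function name `eigenvector_centrality` is hypothetical, and the entropy and noise subtraction described by the authors is not reproduced, so this should be read as a sketch of the general technique rather than the authors' implementation.

```python
from math import sqrt


def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Approximate the dominant eigenvector of a symmetric, non-negative
    weight matrix (list of lists) by power iteration.

    Assumption: each step uses vec + W @ vec (a shift by the identity),
    which keeps the same eigenvectors but avoids oscillation on
    bipartite networks.
    """
    n = len(weights)
    vec = [1.0 / sqrt(n)] * n  # uniform unit-length starting vector
    for _ in range(iterations):
        nxt = [
            vec[i] + sum(weights[i][j] * vec[j] for j in range(n))
            for i in range(n)
        ]
        norm = sqrt(sum(x * x for x in nxt))
        nxt = [x / norm for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, vec)) < tol:
            return nxt
        vec = nxt
    return vec


# Toy coevolution scores (hypothetical): position 0 has moderate scores
# with every other position; positions 1 and 2 share one strong score.
scores = [
    [0.0, 0.6, 0.6, 0.6, 0.6, 0.6],
    [0.6, 0.0, 0.9, 0.0, 0.0, 0.0],
    [0.6, 0.9, 0.0, 0.0, 0.0, 0.0],
    [0.6, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.6, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.6, 0.0, 0.0, 0.0, 0.0, 0.0],
]

centrality = eigenvector_centrality(scores)
ranking = sorted(range(len(centrality)), key=lambda i: -centrality[i])
print(ranking[0], [round(c, 3) for c in centrality])
```

In this toy network, position 0 has only moderate scores, but with every other position, and it ends up with the largest centrality, ahead of positions 1 and 2, which share the single strongest pairwise score.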
Affiliation(s)
- Daniel J Parente
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
| | - J Christian J Ray
- Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, 66047
| | - Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
| |
|
13
|
Brender JR, Zhang Y. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles. PLoS Comput Biol 2015; 11:e1004494. [PMID: 26506533 PMCID: PMC4624718 DOI: 10.1371/journal.pcbi.1004494] [Citation(s) in RCA: 101] [Impact Index Per Article: 10.1] [Received: 03/25/2015] [Accepted: 08/06/2015] [Indexed: 11/18/2022] Open
Abstract
The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the difference between the interface profile scores of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or outperforms in most cases, the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies.
Few proteins carry out their tasks in isolation. Instead, proteins combine with each other in complicated ways that can be affected by either the natural genetic variation that occurs among people or by disease-causing mutations such as those that occur in cancer or in genetic disorders. To understand how these mutations affect our health, it is necessary to understand how mutations can affect the strength of the interactions that bind proteins together. This is a difficult task to do in a laboratory on a large scale and scientists are increasingly turning to computational methods to predict these effects in advance. We show that by looking at the multiple alignments of similar protein-protein complex structures at the interface regions, new constraints based on the evolution of the three-dimensional structures of proteins can be made to predict which mutations are compatible with two proteins interacting and which are not.
Affiliation(s)
- Jeffrey R. Brender
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
| |
|
14
|
Abstract
Mutations in the GBA1 gene are associated with increased risk of Parkinson's disease, and the protein produced by the gene, glucocerebrosidase, interacts with α-synuclein, the protein at the center of the disease etiology. One possibility is that the mutations disrupt a beneficial interaction between the proteins, and a beneficial interaction would imply that the proteins have coevolved. To explore this possibility, a correlated mutation analysis has been performed for all 72 vertebrate species where complete sequences of α-synuclein and glucocerebrosidase are known. The most highly correlated pair of residue variations is α-synuclein A53T and glucocerebrosidase G115E. Intriguingly, the A53T mutation is a Parkinson's disease risk factor in humans, suggesting the pathology associated with this mutation and interaction with glucocerebrosidase might be connected. Correlations with β-synuclein are also evaluated. To assess the impact of lowered species number on accuracy, intra- and inter-chain correlations are also calculated for hemoglobin, using mutual information Z-value and direct coupling analyses.
Affiliation(s)
- James M. Gruschus
- Laboratory of Structural Biophysics, NHLBI, NIH, Bethesda, Maryland, United States of America
| |
|
15
|
Abstract
Recent developments in the analysis of amino acid covariation are leading to breakthroughs in protein structure prediction, protein design, and prediction of the interactome. It is assumed that observed patterns of covariation are caused by molecular coevolution, where substitutions at one site affect the evolutionary forces acting at neighboring sites. Our theoretical and empirical results cast doubt on this assumption. We demonstrate that the strongest coevolutionary signal is a decrease in evolutionary rate and that unfeasibly long times are required to produce coordinated substitutions. We find that covarying substitutions are mostly found on different branches of the phylogenetic tree, indicating that they are independent events that may or may not be attributable to coevolution. These observations undermine the hypothesis that molecular coevolution is the primary cause of the covariation signal. In contrast, we find that the pairs of residues with the strongest covariation signal tend to have low evolutionary rates, and that it is this low rate that gives rise to the covariation signal. Slowly evolving residue pairs are disproportionately located in the protein’s core, which explains covariation methods’ ability to detect pairs of residues that are close in three dimensions. These observations lead us to propose the “coevolution paradox”: The strength of coevolution required to cause coordinated changes means the evolutionary rate is so low that such changes are highly unlikely to occur. As modern covariation methods may lead to breakthroughs in structural genomics, it is critical to recognize their biases and limitations.
Affiliation(s)
- David Talavera
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
| | - Simon C Lovell
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
| | - Simon Whelan
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- Evolutionary Biology Centre, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
| |
|
16
|
Parente DJ, Swint-Kruse L. Multiple co-evolutionary networks are supported by the common tertiary scaffold of the LacI/GalR proteins. PLoS One 2013; 8:e84398. [PMID: 24391951 PMCID: PMC3877293 DOI: 10.1371/journal.pone.0084398] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 09/19/2013] [Accepted: 11/15/2013] [Indexed: 11/18/2022] Open
Abstract
Protein families might evolve paralogous functions on their common tertiary scaffold in two ways. First, the locations of functionally-important sites might be "hard-wired" into the structure, with novel functions evolved by altering the amino acid (e.g. Ala vs Ser) at these positions. Alternatively, the tertiary scaffold might be adaptable, accommodating a unique set of functionally important sites for each paralogous function. To discriminate between these possibilities, we compared the set of functionally important sites in the six largest paralogous subfamilies of the LacI/GalR transcription repressor family. LacI/GalR paralogs share a common tertiary structure, but have low sequence identity (≤ 30%), and regulate a variety of metabolic processes. Functionally important positions were identified by conservation and co-evolutionary sequence analyses. Results showed that conserved positions use a mixture of the "hard-wired" and "accommodating" scaffold frameworks, but that the co-evolution networks were highly dissimilar between any pair of subfamilies. Therefore, the tertiary structure can accommodate multiple networks of functionally important positions. This possibility should be included when designing and interpreting sequence analyses of other protein families. Software implementing conservation and co-evolution analyses is available at https://sourceforge.net/projects/coevolutils/.
Affiliation(s)
- Daniel J. Parente
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
| | - Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
| |
|
17
|
Meinhardt S, Manley MW, Parente DJ, Swint-Kruse L. Rheostats and toggle switches for modulating protein function. PLoS One 2013; 8:e83502. [PMID: 24386217 PMCID: PMC3875437 DOI: 10.1371/journal.pone.0083502] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 08/26/2013] [Accepted: 11/03/2013] [Indexed: 01/08/2023] Open
Abstract
The millions of protein sequences generated by genomics are expected to transform protein engineering and personalized medicine. To achieve these goals, tools for predicting outcomes of amino acid changes must be improved. Currently, advances are hampered by insufficient experimental data about nonconserved amino acid positions. Since the property “nonconserved” is identified using a sequence alignment, we designed experiments to recapitulate that context: Mutagenesis and functional characterization were carried out in 15 LacI/GalR homologs (rows) at 12 nonconserved positions (columns). Multiple substitutions were made at each position, to reveal how various amino acids of a nonconserved column were tolerated in each protein row. Results showed that amino acid preferences of nonconserved positions were highly context-dependent, had few correlations with physico-chemical similarities, and were not predictable from their occurrence in natural LacI/GalR sequences. Further, unlike the “toggle switch” behaviors of conserved positions, substitutions at nonconserved positions could be rank-ordered to show a “rheostatic”, progressive effect on function that spanned several orders of magnitude. Comparisons to various sequence analyses suggested that conserved and strongly co-evolving positions act as functional toggles, whereas other important, nonconserved positions serve as rheostats for modifying protein function. Both the presence of rheostat positions and the sequence analysis strategy appear to be generalizable to other protein families and should be considered when engineering protein modifications or predicting the impact of protein polymorphisms.
Affiliation(s)
- Sarah Meinhardt
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
| | - Michael W. Manley
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
| | - Daniel J. Parente
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
| | - Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
| |
|
18
|
Abstract
Co-evolution is a fundamental component of the theory of evolution and is essential for understanding the relationships between species in complex ecological networks. A wide range of co-evolution-inspired computational methods has been designed to predict molecular interactions, but it is only recently that important advances have been made. Breakthroughs in the handling of phylogenetic information and in disentangling indirect relationships have resulted in an improved capacity to predict interactions between proteins and contacts between different protein residues. Here, we review the main co-evolution-based computational approaches, their theoretical basis, potential applications and foreseeable developments.
Affiliation(s)
- David de Juan
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| |
|
19
|
Ashenberg O, Laub MT. Using analyses of amino acid coevolution to understand protein structure and function. Methods Enzymol 2013; 523:191-212. [PMID: 23422431 DOI: 10.1016/b978-0-12-394292-0.00009-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 12/24/2022]
Abstract
Determining which residues of a protein contribute to a specific function is a difficult problem. Analyses of amino acid covariation within a protein family can serve as a useful guide by identifying residues that are functionally coupled. Covariation analyses have been successfully used on several different protein families to identify residues that work together to promote folding, enable protein-protein interactions, or contribute to an enzymatic activity. Covariation is a statistical signal that can be measured in a multiple sequence alignment of homologous proteins. As sequence databases have expanded dramatically, covariation analyses have become easier and more powerful. In this chapter, we describe how functional covariation arises during the evolution of proteins and how this signal can be distinguished from various background signals. We discuss the basic methodology for performing amino acid covariation analysis, using bacterial two-component signal transduction proteins as an example. We provide practical suggestions for each step of the process including assembly of protein sequences, construction of a multiple sequence alignment, measurement of covariation, and analysis of results.
Affiliation(s)
- Orr Ashenberg
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
20
|
Voordeckers K, Brown CA, Vanneste K, van der Zande E, Voet A, Maere S, Verstrepen KJ. Reconstruction of ancestral metabolic enzymes reveals molecular mechanisms underlying evolutionary innovation through gene duplication. PLoS Biol 2012; 10:e1001446. [PMID: 23239941 PMCID: PMC3519909 DOI: 10.1371/journal.pbio.1001446] [Citation(s) in RCA: 147] [Impact Index Per Article: 11.3] [Received: 02/23/2012] [Accepted: 10/30/2012] [Indexed: 11/24/2022] Open
Abstract
Gene duplications are believed to facilitate evolutionary innovation. However, the mechanisms shaping the fate of duplicated genes remain heavily debated because the molecular processes and evolutionary forces involved are difficult to reconstruct. Here, we study a large family of fungal glucosidase genes that underwent several duplication events. We reconstruct all key ancestral enzymes and show that the very first preduplication enzyme was primarily active on maltose-like substrates, with trace activity for isomaltose-like sugars. Structural analysis and activity measurements on resurrected and present-day enzymes suggest that both activities cannot be fully optimized in a single enzyme. However, gene duplications repeatedly spawned daughter genes in which mutations optimized either isomaltase or maltase activity. Interestingly, similar shifts in enzyme activity were reached multiple times via different evolutionary routes. Together, our results provide a detailed picture of the molecular mechanisms that drove divergence of these duplicated enzymes and show that whereas the classic models of dosage, sub-, and neofunctionalization are helpful to conceptualize the implications of gene duplication, the three mechanisms co-occur and intertwine.
Affiliation(s)
- Karin Voordeckers
- VIB Laboratory for Systems Biology, Leuven, Belgium
- CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
| | - Chris A. Brown
- VIB Laboratory for Systems Biology, Leuven, Belgium
- CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
- Fathom Information Design, Boston, Massachusetts, United States of America
- Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - Kevin Vanneste
- VIB Department of Plant Systems Biology, Gent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent, Belgium
| | - Elisa van der Zande
- VIB Laboratory for Systems Biology, Leuven, Belgium
- CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
| | - Arnout Voet
- Laboratory for Molecular en Structural Biology, KU Leuven, Leuven, Belgium
| | - Steven Maere
- VIB Department of Plant Systems Biology, Gent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent, Belgium
| | - Kevin J. Verstrepen
- VIB Laboratory for Systems Biology, Leuven, Belgium
- CMPG Laboratory for Genetics and Genomics, KU Leuven, Leuven, Belgium
| |
|
21
|
Li X, Zhang Z, Song J. Computational enzyme design approaches with significant biological outcomes: progress and challenges. Comput Struct Biotechnol J 2012; 2:e201209007. [PMID: 24688648 PMCID: PMC3962085 DOI: 10.5936/csbj.201209007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 07/24/2012] [Revised: 09/27/2012] [Accepted: 10/04/2012] [Indexed: 11/29/2022] Open
Abstract
Enzymes are powerful biocatalysts; however, there is still a large gap between the number of enzyme-based practical applications and that of naturally occurring enzymes. Multiple experimental approaches have been applied to generate nearly all possible mutations of target enzymes, allowing the identification of desirable variants with improved properties to meet practical needs. Meanwhile, an increasing number of computational methods have been developed to assist in the modification of enzymes during the past few decades. With the development of bioinformatic algorithms, computational approaches are now able to provide more precise guidance for enzyme engineering and make it more efficient and less laborious. In this review, we summarize the recent advances in method development with significant biological outcomes to provide important insights into successful computational protein designs. We also discuss the limitations and challenges of existing methods and the future directions that should improve them.
Affiliation(s)
- Xiaoman Li
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, Tianjin 300308, China
| | - Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
| | - Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, Tianjin 300308, China
- Department of Biochemistry and Molecular Biology and ARC Centre of Excellence in Structural and Functional Microbial Genomics, Monash University, Melbourne, VIC 3800, Australia
| |
|
22
|
Accurate simulation and detection of coevolution signals in multiple sequence alignments. PLoS One 2012; 7:e47108. [PMID: 23091608 PMCID: PMC3473043 DOI: 10.1371/journal.pone.0047108] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 04/24/2012] [Accepted: 09/10/2012] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND While the conserved positions of a multiple sequence alignment (MSA) are clearly of interest, non-conserved positions can also be important because, for example, destabilizing effects at one position can be compensated by stabilizing effects at another position. Different methods have been developed to recognize the evolutionary relationship between amino acid sites, and to disentangle functional/structural dependencies from historical/phylogenetic ones. METHODOLOGY/PRINCIPAL FINDINGS We have used two complementary approaches to test the efficacy of these methods. In the first approach, we have used a new program, MSAvolve, for the in silico evolution of MSAs, which records a detailed history of all covarying positions, and builds a global coevolution matrix as the accumulated sum of individual matrices for the positions forced to co-vary, the recombinant coevolution, and the stochastic coevolution. We have simulated over 1600 MSAs for 8 protein families, which reflect sequences of different sizes and proteins with widely different functions. The calculated coevolution matrices were compared with the coevolution matrices obtained for the same evolved MSAs with different coevolution detection methods. In a second approach we have evaluated the capacity of the different methods to predict close contacts in the representative X-ray structures of an additional 150 protein families using only experimental MSAs. CONCLUSIONS/SIGNIFICANCE Methods based on the identification of global correlations between pairs were found to be generally superior to methods based only on local correlations in their capacity to identify coevolving residues using either simulated or experimental MSAs. However, the significant variability in the performance of different methods with different proteins suggests that the simulation of MSAs that replicate the statistical properties of the experimental MSA can be a valuable tool to identify the coevolution detection method that is most effective in each case.
|
23
|
Gomes M, Hamer R, Reinert G, Deane CM. Mutual information and variants for protein domain-domain contact prediction. BMC Res Notes 2012; 5:472. [PMID: 23244412 PMCID: PMC3532072 DOI: 10.1186/1756-0500-5-472] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/15/2012] [Accepted: 08/10/2012] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Predicting protein contacts solely based on sequence information remains a challenging problem, despite the huge amount of sequence data at our disposal. Mutual Information (MI), an information theory measure, has been extensively employed and modified to identify residues within a protein (intra-protein) that are in contact. More recently MI and its variants have also been used in the prediction of contacts between proteins (inter-protein). METHODS Here we assess the predictive power of MI and variants for domain-domain contact prediction. We test original MI and these variants, which are called MIp, MIc and ZNMI, on 40 domain-domain test cases containing 10,753 sequences. We also propose and evaluate two new versions of MI that consider triangles of residues and the physicochemical properties of the amino acids, respectively. RESULTS We found that all versions of MI are skewed towards predicting surface residues. Since domain-domain contacts are on the surface of each domain, we considered only surface residues when attempting to predict contacts. Our analysis shows that MIc is the best current MI domain-domain contact predictor. At 20% recall MIc achieved a precision of 44.9% when only surface residues were considered. Our triangle and reduced alphabet variants of MI highlight the delicate trade-off between signal and noise in the use of MI for domain-domain contact prediction. We also examine a specific "successful" case study and demonstrate that here, when considering surface residues, even the most accurate domain-domain contact predictor, MIc, performs no better than random. CONCLUSIONS All tested variants of MI are skewed towards predicting surface residues. When considering surface residues only, we find MIc to be the best current MI domain-domain contact predictor. Its performance, however, is not as good as a non-MI based contact predictor, i-Patch. Additionally, the intra-protein contact prediction capabilities of MIc outperform its domain-domain contact prediction abilities.
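For readers unfamiliar with the measure, the plain (uncorrected) mutual information between two alignment columns can be sketched in Python as below. The function name `column_mutual_information` and the toy columns are hypothetical, and the MIp, MIc and ZNMI corrections evaluated in the abstract above are not implemented; this only illustrates the basic quantity those variants start from.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Each column is a string (or sequence) of residues, one per aligned
    sequence, in the same sequence order. Frequencies are plain counts;
    no gap handling, pseudocounts or background correction is applied.
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a = Counter(col_a)                # residue counts in column A
    freq_b = Counter(col_b)                # residue counts in column B
    freq_ab = Counter(zip(col_a, col_b))   # joint residue-pair counts
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy columns (hypothetical): the first pair always changes together,
# the second pair varies independently.
covarying = column_mutual_information("AAVV", "LLII")
independent = column_mutual_information("AAVV", "LILI")
print(covarying, independent)
```

Perfectly covarying columns give the maximum value for their entropy, whereas statistically independent columns give zero; real alignments fall in between and are affected by phylogeny and sampling noise, which is what the corrected variants try to address.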
Affiliation(s)
- Mireille Gomes
- Department of Statistics, University of Oxford, Oxford, UK
| |
|
24
|
Lee Y, Mick J, Furdui C, Beamer LJ. A coevolutionary residue network at the site of a functionally important conformational change in a phosphohexomutase enzyme family. PLoS One 2012; 7:e38114. [PMID: 22685552 PMCID: PMC3369874 DOI: 10.1371/journal.pone.0038114] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 02/07/2012] [Accepted: 05/01/2012] [Indexed: 11/26/2022] Open
Abstract
Coevolution analyses identify residues that co-vary with each other during evolution, revealing sequence relationships unobservable from traditional multiple sequence alignments. Here we describe a coevolutionary analysis of phosphomannomutase/phosphoglucomutase (PMM/PGM), a widespread and diverse enzyme family involved in carbohydrate biosynthesis. Mutual information and graph theory were utilized to identify a network of highly connected residues with high significance. An examination of the most tightly connected regions of the coevolutionary network reveals that most of the involved residues are localized near an interdomain interface of this enzyme, known to be the site of a functionally important conformational change. The roles of four interface residues found in this network were examined via site-directed mutagenesis and kinetic characterization. For three of these residues, mutation to alanine reduces enzyme specificity to ∼10% or less of wild-type, while the other has ∼45% activity of wild-type enzyme. An additional mutant of an interface residue that is not densely connected in the coevolutionary network was also characterized, and shows no change in activity relative to wild-type enzyme. The results of these studies are interpreted in the context of structural and functional data on PMM/PGM. Together, they demonstrate that a network of coevolving residues links the highly conserved active site with the interdomain conformational change necessary for the multi-step catalytic reaction. This work adds to our understanding of the functional roles of coevolving residue networks, and has implications for the definition of catalytically important residues.
Affiliation(s)
- Yingying Lee
- Department of Chemistry, University of Missouri, Columbia, Missouri, United States of America
- Jacob Mick
- Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
- Cristina Furdui
- Department of Internal Medicine, Wake Forest University Health Sciences, Winston-Salem, North Carolina, United States of America
- Lesa J. Beamer
- Department of Chemistry, University of Missouri, Columbia, Missouri, United States of America
- Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
|
25
|
Livesay DR, Kreth KE, Fodor AA. A critical evaluation of correlated mutation algorithms and coevolution within allosteric mechanisms. Methods Mol Biol 2012; 796:385-398. [PMID: 22052502 DOI: 10.1007/978-1-61779-334-9_21] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 05/31/2023]
Abstract
The notion of using the evolutionary history encoded within multiple sequence alignments to predict allosteric mechanisms is appealing. In this approach, correlated mutations are expected to reflect coordinated changes that maintain intramolecular coupling between residue pairs. Despite much early fanfare, the general suitability of correlated mutations to predict allosteric couplings has not yet been established. Progress along these lines has been hindered by several algorithmic limitations, including phylogenetic artifacts within alignments that mask true covariance and the computational intractability of considering more than two correlated residues at a time. Recent progress in algorithm development, however, has been substantial, with a new generation of correlated mutation algorithms making fundamental progress toward solving these difficult problems. Despite these encouraging results, there remains little evidence to suggest that the evolutionary constraints acting on allosteric couplings are sufficient to be recovered from multiple sequence alignments. In this review, we argue that due to the exquisite sensitivity of protein dynamics, and hence that of allosteric mechanisms, the latter vary widely within protein families. If it turns out to be generally true that even very similar homologs display a wide divergence of allosteric mechanisms, then even a perfect correlated mutation algorithm could not be reliably used as a general mechanism for discovery of allosteric pathways.
Affiliation(s)
- Dennis R Livesay
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
|
26
|
Pond SLK, Murrell B, Poon AFY. Evolution of viral genomes: interplay between selection, recombination, and other forces. Methods Mol Biol 2012; 856:239-72. [PMID: 22399462 DOI: 10.1007/978-1-61779-585-5_10] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/28/2022]
Abstract
RNA viruses evolve very rapidly, often recombine, and are subject to strong host (immune response) and anthropogenic (antiretroviral drugs) selective forces. Given their compact and extensively sequenced genomes, comparative analysis of RNA viral data can provide important insights into the molecular mechanisms of adaptation, pathogenicity, immune evasion, and drug resistance. In this chapter, we present an example-based overview of recent advances in evolutionary models and statistical approaches that enable screening viral alignments for evidence of adaptive change in the presence of recombination, detecting bursts of directional adaptive evolution associated with phenotypic changes, and detecting coevolving sites in viral genes.
|
27
|
Jeon J, Nam HJ, Choi YS, Yang JS, Hwang J, Kim S. Molecular evolution of protein conformational changes revealed by a network of evolutionarily coupled residues. Mol Biol Evol 2011; 28:2675-85. [PMID: 21470969 DOI: 10.1093/molbev/msr094] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 12/25/2022] Open
Abstract
An improved understanding of protein conformational changes has broad implications for elucidating the mechanisms of various biological processes and for the design of protein engineering experiments. Understanding rearrangements of residue interactions is a key component in the challenge of describing structural transitions. Evolutionary properties of protein sequences and structures are extensively studied; however, evolution of protein motions, especially with respect to interaction rearrangements, has yet to be explored. Here, we investigated the relationship between sequence evolution and protein conformational changes and discovered that structural transitions are encoded in amino acid sequences as coevolving residue pairs. Furthermore, we found that highly coevolving residues are clustered in the flexible regions of proteins and facilitate structural transitions by forming and disrupting their interactions cooperatively. Our results provide insight into the evolution of protein conformational changes and help to identify residues important for structural transitions.
Affiliation(s)
- Jouhyun Jeon
- Division of Molecular and Life Science, Pohang University of Science and Technology, Pohang, Korea
|
28
|
Ackerman SH, Gatti DL. The contribution of coevolving residues to the stability of KDO8P synthase. PLoS One 2011; 6:e17459. [PMID: 21408011 PMCID: PMC3052366 DOI: 10.1371/journal.pone.0017459] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 10/06/2010] [Accepted: 02/03/2011] [Indexed: 12/03/2022] Open
Abstract
Background: The evolutionary tree of 3-deoxy-D-manno-octulosonate 8-phosphate (KDO8P) synthase (KDO8PS), a bacterial enzyme that catalyzes a key step in the biosynthesis of bacterial endotoxin, is evenly divided between metal and non-metal forms, both having similar structures but diverging to various degrees in amino acid sequence. Mutagenesis, crystallographic and computational studies have established that only a few residues determine whether or not KDO8PS requires a metal for function. The remaining divergence in the amino acid sequence of KDO8PSs is apparently unrelated to the underlying catalytic mechanism.
Methodology/Principal Findings: The multiple alignment of all known KDO8PS sequences reveals that several residue pairs coevolved, an indication of their possible linkage to a structural constraint. In this study we investigated by computational means the contribution of coevolving residues to the stability of KDO8PS. We found that about 1/4 of all strongly coevolving pairs probably originated from cycles of mutation (decreasing stability) and suppression (restoring it), while the remaining pairs are best explained by a succession of neutral or nearly neutral covarions.
Conclusions/Significance: Both sequence conservation and coevolution are involved in the preservation of the core structure of KDO8PS, but the contribution of coevolving residues is, in proportion, smaller. This is because small stability gains or losses associated with selection of certain residues in some regions of the stability landscape of KDO8PS are easily offset by a large number of possible changes in other regions. While this effect increases the tolerance of KDO8PS to deleterious mutations, it also decreases the probability that specific pairs of residues could have a strong contribution to the thermodynamic stability of the protein.
Affiliation(s)
- Sharon H. Ackerman
- Department of Biochemistry and Molecular Biology, Wayne State University School of Medicine, Detroit, Michigan, United States of America
- Domenico L. Gatti
- Department of Biochemistry and Molecular Biology, Wayne State University School of Medicine, Detroit, Michigan, United States of America
- Cardiovascular Research Institute, Wayne State University School of Medicine, Detroit, Michigan, United States of America
|