1
Nelson MG, Talavera D. Identification of coevolving positions by ancestral reconstruction. Commun Biol 2025; 8:329. [PMID: 40021815 PMCID: PMC11871020 DOI: 10.1038/s42003-025-07676-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 02/05/2025] [Indexed: 03/03/2025]
Abstract
Coevolution within proteins occurs when changes in one position affect the selective pressure in another position to preserve the protein structure or function. The identification of coevolving positions within proteins remains contentious, with most methods disregarding the phylogenetic information. Here, we present a time-efficient approach for detecting coevolving pairs, which is almost perfect in terms of precision and specificity. It is based on maximum parsimony-based ancestral reconstruction followed by the identification of pairs with a depletion of separate changes relative to their number of concurrent changes. Our analysis of a previously characterised biological dataset shows that the coevolving pairs that we identified tend to be close in the protein sequence and structure, slightly less solvent exposed and have a higher mutation rate. We also show how the ancestral reconstruction can be used to detect favourable and unfavourable amino acid combinations. Altogether, we demonstrate how this approach is essential for identifying pairs of positions with weak covariation patterns.
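The counting step described in this abstract can be illustrated with a minimal sketch. It assumes that ancestral sequences have already been reconstructed, so that every branch of the tree is available as a (parent, child) pair of aligned sequences. The function name `count_changes` and the toy branches are hypothetical, and the abstract does not state the statistic the authors apply to these counts, so none is attempted here.

```python
from typing import List, Tuple

# One tree branch: (reconstructed parent sequence, child sequence), same length.
Branch = Tuple[str, str]


def count_changes(branches: List[Branch], i: int, j: int) -> Tuple[int, int]:
    """Tally branches where positions i and j change together or separately.

    Returns (concurrent, separate): `concurrent` counts branches on which both
    positions change, `separate` counts branches on which exactly one changes.
    """
    concurrent = 0
    separate = 0
    for parent, child in branches:
        changed_i = parent[i] != child[i]
        changed_j = parent[j] != child[j]
        if changed_i and changed_j:
            concurrent += 1
        elif changed_i or changed_j:
            separate += 1
    return concurrent, separate


# Toy data (illustrative only): four branches over a three-residue alignment.
branches = [("AKL", "GKV"), ("AKL", "AKL"), ("GKV", "GRV"), ("GKV", "AKL")]
print(count_changes(branches, 0, 2))  # (2, 0): always change together
print(count_changes(branches, 0, 1))  # (0, 3): only change separately
```

In this toy case positions 0 and 2 show the pattern the abstract describes, with concurrent changes and no separate ones, whereas positions 0 and 1 never change on the same branch.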
Affiliation(s)
- Michael G Nelson
  - Division of Cardiovascular Sciences, School of Medical Sciences, The University of Manchester, Oxford Road, Manchester, UK
- David Talavera
  - Division of Cardiovascular Sciences, School of Medical Sciences, The University of Manchester, Oxford Road, Manchester, UK
2
Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022; 50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]
Abstract
The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
Affiliation(s)
- Emily N. Kennedy
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Clay A. Foster
  - Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Sarah A. Barr
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Robert B. Bourret
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
3
Lee MS, Tuohy PJ, Kim CY, Lichauco K, Parrish HL, Van Doorslaer K, Kuhns MS. Enhancing and inhibitory motifs regulate CD4 activity. eLife 2022; 11:e79508. [PMID: 35861317 PMCID: PMC9333989 DOI: 10.7554/elife.79508] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/15/2022] [Accepted: 07/20/2022] [Indexed: 11/15/2022]
Abstract
CD4+ T cells use T cell receptor (TCR)-CD3 complexes, and CD4, to respond to peptide antigens within MHCII molecules (pMHCII). We report here that, through ~435 million years of evolution in jawed vertebrates, purifying selection has shaped motifs in the extracellular, transmembrane, and intracellular domains of eutherian CD4 that enhance pMHCII responses, and covary with residues in an intracellular motif that inhibits responses. Importantly, while CD4 interactions with the Src kinase, Lck, are viewed as key to pMHCII responses, our data indicate that CD4-Lck interactions derive their importance from the counterbalancing activity of the inhibitory motif, as well as motifs that direct CD4-Lck pairs to specific membrane compartments. These results have implications for the evolution and function of complex transmembrane receptors and for biomimetic engineering.
Affiliation(s)
- Mark S Lee
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
- Peter J Tuohy
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
- Caleb Y Kim
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
- Katrina Lichauco
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
- Heather L Parrish
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
- Koenraad Van Doorslaer
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
  - School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, United States
  - Cancer Biology Graduate Interdisciplinary Program and Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, United States
  - The BIO-5 Institute, The University of Arizona, Tucson, United States
  - The University of Arizona Cancer Center, Tucson, United States
- Michael S Kuhns
  - Department of Immunobiology, The University of Arizona College of Medicine, Tucson, United States
  - Cancer Biology Graduate Interdisciplinary Program and Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, United States
  - The BIO-5 Institute, The University of Arizona, Tucson, United States
  - The University of Arizona Cancer Center, Tucson, United States
  - The Arizona Center on Aging, The University of Arizona College of Medicine, Tucson, United States
4
Shi J, Shen Q, Cho JH, Hwang W. Entropy Hotspots for the Binding of Intrinsically Disordered Ligands to a Receptor Domain. Biophys J 2020; 118:2502-2512. [PMID: 32311315 DOI: 10.1016/j.bpj.2020.03.026] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/08/2019] [Revised: 02/28/2020] [Accepted: 03/23/2020] [Indexed: 11/18/2022]
Abstract
Proline-rich motifs (PRMs) are widely used for mediating protein-protein interactions with weak binding affinities. Because they are intrinsically disordered when unbound, conformational entropy plays a significant role for the binding. However, residue-level differences of the entropic contribution in the binding of different ligands remain not well understood. We use all-atom molecular dynamics simulation and the maximal information spanning tree formalism to analyze conformational entropy associated with the binding of two PRMs, one from the Abl kinase and the other from the nonstructural protein 1 of the 1918 Spanish influenza A virus, to the N-terminal SH3 (nSH3) domain of the CrkII protein. Side chains of the stably folded nSH3 experience more entropy change upon ligand binding than the backbone, whereas PRMs involve comparable but heterogeneous entropy changes among the backbone and side chains. In nSH3, two conserved nonpolar residues forming contacts with the PRM experience the largest side-chain entropy loss. In contrast, the C-terminal charged residues of PRMs that form polar contacts with nSH3 experience the greatest side-chain entropy loss, although their "fuzzy" nature is attributable to the backbone that remains relatively flexible. Thus, residues that form high-occupancy contacts between nSH3 and PRM do not reciprocally contribute to entropy loss. Furthermore, certain surface residues of nSH3 distal to the interface with PRMs gain entropy, indicating a nonlocal effect of ligand binding. Comparing between the PRMs from cAbl and nonstructural protein 1, the latter involves a larger side-chain entropy loss and forms more contacts with nSH3. Consistent with experiments, this indicates stronger binding of the viral ligand at the expense of losing the flexibility of side chains, whereas the backbone experiences less entropy loss. 
The entropy "hotspots" as identified in this study will be important for tuning the binding affinity of various ligands to a receptor.
Affiliation(s)
- Jie Shi
  - Department of Biomedical Engineering, Texas A&M University, College Station, Texas
- Qingliang Shen
  - Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas
- Jae-Hyun Cho
  - Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas
- Wonmuk Hwang
  - Department of Biomedical Engineering, Texas A&M University, College Station, Texas
  - Department of Materials Science and Engineering, Texas A&M University, College Station, Texas
  - Department of Physics and Astronomy, Texas A&M University, College Station, Texas
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul, South Korea
5
Georgoulis A, Louka M, Mylonas S, Stavros P, Nounesis G, Vorgias CE. Consensus protein engineering on the thermostable histone-like bacterial protein HUs significantly improves stability and DNA binding affinity. Extremophiles 2020; 24:293-306. [PMID: 31980943 DOI: 10.1007/s00792-020-01154-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/12/2019] [Accepted: 01/06/2020] [Indexed: 11/28/2022]
Abstract
The consensus-based protein engineering strategy has been applied to various proteins and can lead to the design of proteins with enhanced biological performance. Histone-like HUs comprise a protein family with sequence variety within a highly conserved 3D-fold. HU function includes compacting and regulating bacterial DNA in a wide range of biological conditions. To explore the possible impact of consensus-based design on the thermodynamic stability of HU proteins, the approach was applied using a dataset of sequences derived from a group of 40 mesostable, thermostable, and hyperthermostable HUs. The consensus-derived HU protein was named HUBest, since it is expected to perform best. The synthetic HU gene was overexpressed in E. coli and the recombinant protein was purified. Subsequently, HUBest was characterized concerning its correct folding and thermodynamic stability, as well as its ability to interact with plasmid DNA. A substantial increase in HUBest stability at high temperatures is observed. HUBest has significantly improved biological performance at ambient temperature, presenting very low Kd values for binding plasmid DNA as indicated by the Gibbs energy profile of HUBest. This Kd may be associated with conformational changes leading to decreased thermodynamic stability and, therefore, higher flexibility at ambient temperature.
Affiliation(s)
- Anastasios Georgoulis
  - Department of Biochemistry and Molecular Biology, National and Kapodistrian University of Athens, 157 01, Zografou, Greece
- Maria Louka
  - Department of Biochemistry and Molecular Biology, National and Kapodistrian University of Athens, 157 01, Zografou, Greece
- Stratos Mylonas
  - Department of Biochemistry and Molecular Biology, National and Kapodistrian University of Athens, 157 01, Zografou, Greece
- Philemon Stavros
  - Biomolecular Physics Laboratory, INRASTES, National Centre for Scientific Research "Demokritos", 153 10, Agia Paraskevi, Greece
- George Nounesis
  - Biomolecular Physics Laboratory, INRASTES, National Centre for Scientific Research "Demokritos", 153 10, Agia Paraskevi, Greece
- Constantinos E Vorgias
  - Department of Biochemistry and Molecular Biology, National and Kapodistrian University of Athens, 157 01, Zografou, Greece
6
Pilla SP, R B, Bahadur RP. Dissecting protein-protein interactions in proteasome assembly: Implication to its self-assembly. J Mol Recognit 2019; 32:e2784. [DOI: 10.1002/jmr.2784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/09/2018] [Revised: 03/07/2019] [Accepted: 03/19/2019] [Indexed: 01/18/2023]
Affiliation(s)
- Smita P. Pilla
  - Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Babu R
  - Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Ranjit P. Bahadur
  - Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
7
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Background: Protein inter-residue contact prediction plays an important role in the field of protein structure and function research. As a low-dimensional representation of protein tertiary structure, protein inter-residue contacts could greatly help de novo protein structure prediction methods to reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contact prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contact prediction methods. Results: Protein inter-residue contact prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show their typical application. Then, we present a comprehensive review of the three main categories of protein inter-residue contact prediction: correlated mutations methods, machine-learning methods and fusion methods. We also analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss the performance of these methods in detail. Conclusion: Correlated mutations methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods could take advantage of both the machine-learning and correlated mutations methods. Employing a more effective fusion strategy could help to further improve the performance of fusion methods.
Affiliation(s)
- Xiaoyang Jing
  - School of Computer Science, Fudan University, Shanghai, China
- Qimin Dong
  - Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
- Ruqian Lu
  - School of Computer Science, Fudan University, Shanghai, China
- Qiwen Dong
  - Faculty of Education, East China Normal University, Shanghai, China
8
Allosteric Modulation of Binding Specificity by Alternative Packing of Protein Cores. J Mol Biol 2018; 431:336-350. [PMID: 30471255 DOI: 10.1016/j.jmb.2018.11.018] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 09/22/2018] [Revised: 11/04/2018] [Accepted: 11/14/2018] [Indexed: 11/21/2022]
Abstract
Hydrophobic cores are often viewed as tightly packed and rigid, but they do show some plasticity and could thus be attractive targets for protein design. Here we explored the role of different functional pressures on the core packing and ligand recognition of the SH3 domain from human Fyn tyrosine kinase. We randomized the hydrophobic core and used phage display to select variants that bound to each of three distinct ligands. The three evolved groups showed remarkable differences in core composition, illustrating the effect of different selective pressures on the core. Changes in the core did not significantly alter protein stability, but were linked closely to changes in binding affinity and specificity. Structural analysis and molecular dynamics simulations revealed the structural basis for altered specificity. The evolved domains had significantly reduced core volumes, which in turn induced increased backbone flexibility. These motions were propagated from the core to the binding surface and induced significant conformational changes. These results show that alternative core packing and consequent allosteric modulation of binding interfaces could be used to engineer proteins with novel functions.
9
Mechanical variations in proteins with large-scale motions highlight the formation of structural locks. J Struct Biol 2018; 203:195-204. [DOI: 10.1016/j.jsb.2018.05.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/05/2018] [Revised: 05/18/2018] [Accepted: 05/22/2018] [Indexed: 12/18/2022]
10
Zeng J, Guareschi R, Damre M, Cao R, Kless A, Neumaier B, Bauer A, Giorgetti A, Carloni P, Rossetti G. Structural Prediction of the Dimeric Form of the Mammalian Translocator Membrane Protein TSPO: A Key Target for Brain Diagnostics. Int J Mol Sci 2018; 19:E2588. [PMID: 30200318 PMCID: PMC6165245 DOI: 10.3390/ijms19092588] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 07/25/2018] [Revised: 08/21/2018] [Accepted: 08/28/2018] [Indexed: 11/17/2022]
Abstract
Positron emission tomography (PET) radioligands targeting the human translocator membrane protein (TSPO) are broadly used for the investigations of neuroinflammatory conditions associated with neurological disorders. Structural information on the mammalian protein homodimers-the suggested functional state of the protein-is limited to a solid-state nuclear magnetic resonance (NMR) study and to a model based on the previously-deposited solution NMR structure of the monomeric mouse protein. Computational studies performed here suggest that the NMR-solved structure in the presence of detergents is not prone to dimer formation and is furthermore unstable in its native membrane environment. We, therefore, propose a new model of the functionally-relevant dimeric form of the mouse protein, based on a prokaryotic homologue. The model, fully consistent with solid-state NMR data, is very different from the previous predictions. Hence, it provides, for the first time, structural insights into this pharmaceutically-important target which are fully consistent with experimental data.
Affiliation(s)
- Juan Zeng
  - Institute for Advanced Simulations (IAS)-5/Institute for Neuroscience and Medicine (INM)-9, Forschungszentrum Jülich, 52428 Jülich, Germany
  - Laboratory of Computational Chemistry and Drug Design, Laboratory of Chemical Genomics, Peking University Shenzhen Graduate School, 518055 Shenzhen, China
- Riccardo Guareschi
  - Institute for Advanced Simulations (IAS)-5/Institute for Neuroscience and Medicine (INM)-9, Forschungszentrum Jülich, 52428 Jülich, Germany
- Mangesh Damre
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
  - Neurobiology, International School for Advanced Studies (SISSA), 34136 Trieste, Italy
- Ruyin Cao
  - Institute for Advanced Simulations (IAS)-5/Institute for Neuroscience and Medicine (INM)-9, Forschungszentrum Jülich, 52428 Jülich, Germany
- Achim Kless
  - Grünenthal Innovation, Translational Science & Intelligence, Grünenthal GmbH, 52078 Aachen, Germany
- Bernd Neumaier
  - Institute for Neuroscience and Medicine (INM)-5, Forschungszentrum Jülich, 52428 Jülich, Germany
- Andreas Bauer
  - Institute for Neuroscience and Medicine (INM)-2, Forschungszentrum Jülich, 52428 Jülich, Germany
- Alejandro Giorgetti
  - Institute for Advanced Simulations (IAS)-5/Institute for Neuroscience and Medicine (INM)-9, Forschungszentrum Jülich, 52428 Jülich, Germany
  - Department of Biotechnology, University of Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Paolo Carloni
  - Institute for Advanced Simulations (IAS)-5/Institute for Neuroscience and Medicine (INM)-9, Forschungszentrum Jülich, 52428 Jülich, Germany
  - RWTH Aachen University, Department of Physics, 52078 Aachen, Germany
- Giulia Rossetti
  - Institute for Advanced Simulations (IAS)-5/Institute for Neuroscience and Medicine (INM)-9, Forschungszentrum Jülich, 52428 Jülich, Germany
  - Jülich Supercomputing Center (JSC), Forschungszentrum Jülich, 52428 Jülich, Germany
  - University Hospital Aachen, RWTH Aachen University, 52078 Aachen, Germany
11
Jing X, Dong Q, Lu R. RRCRank: a fusion method using rank strategy for residue-residue contact prediction. BMC Bioinformatics 2017; 18:390. [PMID: 28865433 PMCID: PMC5581475 DOI: 10.1186/s12859-017-1811-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/04/2017] [Accepted: 08/28/2017] [Indexed: 11/10/2022]
Abstract
Background In structural biology area, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that the predicted residue-residue contacts could effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, related researchers have developed various methods to predict residue-residue contacts, especially, significant performance has been achieved by using fusion methods in recent years. In this work, a novel fusion method based on rank strategy has been proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses the learning-to-rank algorithm to predict contact probability of each residue pair. Results First, we perform two benchmark tests for the proposed fusion method (RRCRank) on CASP11 dataset and CASP12 dataset respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that the RRCRank could achieve comparable prediction precisions and is better than three methods in most assessment metrics. 
Conclusions The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.
Affiliation(s)
- Xiaoyang Jing
  - School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
- Qiwen Dong
  - School of Data Science and Engineering, East China Normal University, Shanghai, 200062, People's Republic of China
- Ruqian Lu
  - School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
12
EpiSweep: Computationally Driven Reengineering of Therapeutic Proteins to Reduce Immunogenicity While Maintaining Function. Methods Mol Biol 2017; 1529:375-398. [PMID: 27914063 DOI: 10.1007/978-1-4939-6637-0_20] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 02/06/2023]
Abstract
Therapeutic proteins are yielding ever more advanced and efficacious new drugs, but the biological origins of these highly effective therapeutics render them subject to immune surveillance within the patient's body. When recognized by the immune system as a foreign agent, protein drugs elicit a coordinated response that can manifest a range of clinical complications including rapid drug clearance, loss of functionality and efficacy, delayed infusion-like allergic reactions, more serious anaphylactic shock, and even induced auto-immunity. It is thus often necessary to deimmunize an exogenous protein in order to enable its clinical application; critically, the deimmunization process must also maintain the desired therapeutic activity. To meet the growing need for effective, efficient, and broadly applicable protein deimmunization technologies, we have developed the EpiSweep suite of protein design algorithms. EpiSweep seamlessly integrates computational prediction of immunogenic T cell epitopes with sequence- or structure-based assessment of the impacts of mutations on protein stability and function, in order to select combinations of mutations that make Pareto optimal trade-offs between the competing goals of low immunogenicity and high-level function. The methods are applicable both to the design of individual functionally deimmunized variants as well as the design of combinatorial libraries enriched in functionally deimmunized variants. After validating EpiSweep in a series of retrospective case studies providing comparisons to conventional approaches to T cell epitope deletion, we have experimentally demonstrated it to be highly effective in prospective application to deimmunization of a number of different therapeutic candidates. 
We conclude that our broadly applicable computational protein design algorithms guide the engineer towards the most promising deimmunized therapeutic candidates, and thereby have the potential to accelerate development of new protein drugs by shortening time frames and improving hit rates.
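The notion of a Pareto optimal trade-off mentioned in this abstract can be illustrated with a small sketch. It is not the EpiSweep algorithm: it assumes each candidate variant has already been given two hypothetical scores, an immunogenicity score and a function penalty, both to be minimised, and it simply filters out dominated candidates. All names and numbers are illustrative.

```python
from typing import List, Tuple

# (variant name, immunogenicity score, function penalty); lower is better for both.
Design = Tuple[str, float, float]


def pareto_front(designs: List[Design]) -> List[str]:
    """Return the names of designs that no other design dominates.

    A design is dominated when another design is at least as good on both
    objectives and strictly better on at least one.
    """
    front = []
    for name, immuno, penalty in designs:
        dominated = any(
            (i2 <= immuno and p2 <= penalty) and (i2 < immuno or p2 < penalty)
            for _, i2, p2 in designs
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical scores: v3 is worse than v2 on both objectives.
designs = [("v1", 1.0, 5.0), ("v2", 2.0, 2.0), ("v3", 3.0, 3.0), ("v4", 5.0, 1.0)]
print(pareto_front(designs))  # ['v1', 'v2', 'v4']
```

The quadratic scan is adequate for a handful of candidates. A design method working over combinatorial mutation spaces would need a more scalable search, which is outside the scope of this sketch.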
13
Baker FN, Porollo A. CoeViz: a web-based tool for coevolution analysis of protein residues. BMC Bioinformatics 2016; 17:119. [PMID: 26956673 PMCID: PMC4782369 DOI: 10.1186/s12859-016-0975-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 10/23/2015] [Accepted: 03/01/2016] [Indexed: 11/30/2022]
Abstract
Background Proteins generally perform their function in a folded state. Residues forming an active site, whether it is a catalytic center or interaction interface, are frequently distant in a protein sequence. Hence, traditional sequence-based prediction methods focusing on a single residue (or a short window of residues) at a time may have difficulties in identifying and clustering the residues constituting a functional site, especially when a protein has multiple functions. Evolutionary information encoded in multiple sequence alignments is known to greatly improve sequence-based predictions. Identification of coevolving residues further advances the protein structure and function annotation by revealing cooperative pairs and higher order groupings of residues. Results We present a new web-based tool (CoeViz) that provides a versatile analysis and visualization of pairwise coevolution of amino acid residues. The tool computes three covariance metrics: mutual information, chi-square statistic, Pearson correlation, and one conservation metric: joint Shannon entropy. Implemented adjustments of covariance scores include phylogeny correction, corrections for sequence dissimilarity and alignment gaps, and the average product correction. Visualization of residue relationships is enhanced by hierarchical cluster trees, heat maps, circular diagrams, and the residue highlighting in protein sequence and 3D structure. Unlike other existing tools, CoeViz is not limited to analyzing conserved domains or protein families and can process long, unstructured and multi-domain proteins thousands of residues long. Two examples are provided to illustrate the use of the tool for identification of residues (1) involved in enzymatic function, (2) forming short linear functional motifs, and (3) constituting a structural domain. 
Conclusions CoeViz represents a practical resource for a quick sequence-based protein annotation for molecular biologists, e.g., for identifying putative functional clusters of residues and structural domains. CoeViz also can serve computational biologists as a resource of coevolution matrices, e.g., for developing machine learning-based prediction models. The presented tool is integrated in the POLYVIEW-2D server (http://polyview.cchmc.org/) and available from resulting pages of POLYVIEW-2D.
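Two of the quantities named in this abstract, mutual information and the average product correction (APC), can be sketched in a few lines. This is a plain textbook formulation and not CoeViz's implementation: it ignores gaps, sequence weighting and phylogeny correction, and the function names and toy alignment are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def apc_corrected_mi(alignment: List[str]) -> Dict[Tuple[int, int], float]:
    """MI for every column pair with the average product correction:
    MI(i, j) - mean_MI(i) * mean_MI(j) / overall_mean_MI.
    """
    ncol = len(alignment[0])
    cols = [[seq[k] for seq in alignment] for k in range(ncol)]
    mi = {
        (i, j): mutual_information(cols[i], cols[j])
        for i, j in combinations(range(ncol), 2)
    }
    # Mean MI of each column with every other column.
    col_mean = [
        sum(v for pair, v in mi.items() if k in pair) / (ncol - 1)
        for k in range(ncol)
    ]
    overall = sum(mi.values()) / len(mi)
    if overall == 0:
        return dict.fromkeys(mi, 0.0)
    return {
        (i, j): v - col_mean[i] * col_mean[j] / overall
        for (i, j), v in mi.items()
    }


# Toy alignment: columns 0 and 1 covary perfectly, column 2 is independent.
alignment = ["ACA", "ACC", "GTA", "GTC"]
scores = apc_corrected_mi(alignment)
print(scores[(0, 1)], scores[(0, 2)])  # 0.25 0.0
```

In the toy alignment the raw mutual information between columns 0 and 1 is 1 bit. The correction subtracts the background expected from each column's average score, leaving the covarying pair as the only one with a positive corrected value.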
Collapse
Affiliation(s)
- Frazier N Baker
- Department of Electrical Engineering and Computing Systems, University of Cincinnati, 2901 Woodside Drive, Cincinnati, OH, 45221, USA. .,Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA.
| | - Aleksey Porollo
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA.
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA.
| |
|
14
|
Identification of residues in ABCG2 affecting protein trafficking and drug transport, using co-evolutionary analysis of ABCG sequences. Biosci Rep 2015; 35:BSR20150150. [PMID: 26294421 PMCID: PMC4613716 DOI: 10.1042/bsr20150150] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 06/08/2015] [Accepted: 07/17/2015] [Indexed: 12/31/2022] Open
Abstract
ABCG2 is an ABC (ATP-binding cassette) transporter with a physiological role in urate transport in the kidney and is also implicated in multi-drug efflux from a number of organs in the body. The trafficking of the protein and the mechanism by which it recognizes and transports diverse drugs are important areas of research. In the current study, we have made a series of single amino acid mutations in ABCG2 on the basis of sequence analysis. Mutant isoforms were characterized for cell surface expression and function. One mutant (I573A) showed disrupted glycosylation and reduced trafficking kinetics. In contrast with many ABC transporter folding mutations which appear to be 'rescued' by chemical chaperones or low temperature incubation, the I573A mutation was not enriched at the cell surface by either treatment, with the majority of the protein being retained in the endoplasmic reticulum (ER). Two other mutations (P485A and M549A) showed distinct effects on transport of ABCG2 substrates reinforcing the role of TM helix 3 in drug recognition and transport and indicating the presence of intracellular coupling regions in ABCG2.
|
15
|
Abstract
Recent developments in the analysis of amino acid covariation are leading to breakthroughs in protein structure prediction, protein design, and prediction of the interactome. It is assumed that observed patterns of covariation are caused by molecular coevolution, where substitutions at one site affect the evolutionary forces acting at neighboring sites. Our theoretical and empirical results cast doubt on this assumption. We demonstrate that the strongest coevolutionary signal is a decrease in evolutionary rate and that unfeasibly long times are required to produce coordinated substitutions. We find that covarying substitutions are mostly found on different branches of the phylogenetic tree, indicating that they are independent events that may or may not be attributable to coevolution. These observations undermine the hypothesis that molecular coevolution is the primary cause of the covariation signal. In contrast, we find that the pairs of residues with the strongest covariation signal tend to have low evolutionary rates, and that it is this low rate that gives rise to the covariation signal. Slowly evolving residue pairs are disproportionately located in the protein’s core, which explains covariation methods’ ability to detect pairs of residues that are close in three dimensions. These observations lead us to propose the “coevolution paradox”: The strength of coevolution required to cause coordinated changes means the evolutionary rate is so low that such changes are highly unlikely to occur. As modern covariation methods may lead to breakthroughs in structural genomics, it is critical to recognize their biases and limitations.
Affiliation(s)
- David Talavera
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
| | - Simon C Lovell
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
| | - Simon Whelan
- Faculty of Life Sciences, University of Manchester, Manchester, United Kingdom
- Evolutionary Biology Centre, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
| |
|
16
|
Janda JO, Popal A, Bauer J, Busch M, Klocke M, Spitzer W, Keller J, Merkl R. H2rs: deducing evolutionary and functionally important residue positions by means of an entropy and similarity based analysis of multiple sequence alignments. BMC Bioinformatics 2014; 15:118. [PMID: 24766829 PMCID: PMC4021312 DOI: 10.1186/1471-2105-15-118] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/13/2014] [Accepted: 04/17/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The identification of functionally important residue positions is an important task of computational biology. Methods of correlation analysis allow for the identification of pairs of residue positions, whose occupancy is mutually dependent due to constraints imposed by protein structure or function. A common measure assessing these dependencies is the mutual information, which is based on Shannon's information theory that utilizes probabilities only. Consequently, such approaches do not consider the similarity of residue pairs, which may degrade the algorithm's performance. One typical algorithm is H2r, which characterizes each individual residue position k by the conn(k)-value, which is the number of significantly correlated pairs it belongs to. RESULTS To improve specificity of H2r, we developed a revised algorithm, named H2rs, which is based on the von Neumann entropy (vNE). To compute the corresponding mutual information, a matrix A is required, which assesses the similarity of residue pairs. We determined A by deducing substitution frequencies from contacting residue pairs observed in the homologs of 35 809 proteins, whose structure is known. In analogy to H2r, the enhanced algorithm computes a normalized conn(k)-value. Within the framework of H2rs, only statistically significant vNE values were considered. To decide on significance, the algorithm calculates a p-value by performing a randomization test for each individual pair of residue positions. The analysis of a large in silico testbed demonstrated that specificity and precision were higher for H2rs than for H2r and two other methods of correlation analysis. The gain in prediction quality is further confirmed by a detailed assessment of five well-studied enzymes. The outcome of H2rs and of a method that predicts contacting residue positions (PSICOV) overlapped only marginally. H2rs can be downloaded from http://www-bioinf.uni-regensburg.de. 
CONCLUSIONS Considering substitution frequencies for residue pairs by means of the von Neumann entropy and a p-value improved the success rate in identifying important residue positions. The integration of proven statistical concepts and normalization allows for an easier comparison of results obtained with different proteins. Comparing the outcome of the local method H2rs and of the global method PSICOV indicates that such methods supplement each other and have different scopes of application.
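The two ideas described in this abstract, a per-pair randomization test and the conn(k) count of significant pairs per position, can be sketched as follows. This is a hedged illustration only: H2rs scores pairs with a von Neumann entropy that uses a residue-pair similarity matrix, whereas the sketch below swaps in plain Shannon mutual information, reports an unnormalized conn(k), and applies no multiple-testing correction. The function names, the significance threshold `alpha`, and the number of permutations are assumptions, not values from the paper.

```python
import math
import random
from collections import Counter


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(col_i, col_j):
    """Stand-in pair score; H2rs uses a von Neumann entropy instead."""
    return entropy(col_i) + entropy(col_j) - entropy(list(zip(col_i, col_j)))


def permutation_p_value(col_i, col_j, n_perm=1000, seed=0):
    """Fraction of shuffles of col_j scoring at least as high as observed.

    The +1 terms keep the estimate away from an impossible p-value of zero.
    """
    rng = random.Random(seed)
    observed = mutual_information(col_i, col_j)
    shuffled = list(col_j)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mutual_information(col_i, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def conn(columns, alpha=0.05, n_perm=1000, seed=0):
    """Unnormalized conn(k): number of significant pairs each column belongs to."""
    counts = [0] * len(columns)
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if permutation_p_value(columns[i], columns[j], n_perm, seed) < alpha:
                counts[i] += 1
                counts[j] += 1
    return counts


# Toy columns: the first two covary perfectly, the third is conserved.
toy_columns = [list("AAAAAAGGGGGG"), list("KKKKKKRRRRRR"), list("LLLLLLLLLLLL")]
toy_conn = conn(toy_columns, n_perm=200)
```

A conserved column can never be significant under this test, because every shuffle reproduces the observed score and the p-value is 1.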
Affiliation(s)
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040 Regensburg, Germany.
| |
|
17
|
Nemoto W, Saito A, Oikawa H. Recent advances in functional region prediction by using structural and evolutionary information - Remaining problems and future extensions. Comput Struct Biotechnol J 2013; 8:e201308007. [PMID: 24688747 PMCID: PMC3962155 DOI: 10.5936/csbj.201308007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/27/2013] [Revised: 11/12/2013] [Accepted: 11/13/2013] [Indexed: 11/22/2022] Open
Abstract
Structural genomics projects have solved many new structures with unknown functions. One strategy to investigate the function of a structure is to computationally find the functionally important residues or regions on it. Therefore, the development of functional region prediction methods has become an important research subject. An effective approach is to use a method employing structural and evolutionary information, such as the evolutionary trace (ET) method. ET ranks the residues of a protein structure by calculating the scores for relative evolutionary importance, and locates functionally important sites by identifying spatial clusters of highly ranked residues. After ET was developed, numerous ET-like methods were subsequently reported, and many of them are in practical use, although they require certain conditions. In this mini review, we first introduce the remaining problems and the recent improvements in the methods using structural and evolutionary information. We then summarize the recent developments of the methods. Finally, we conclude by describing possible extensions of the evolution- and structure-based methods.
Affiliation(s)
- Wataru Nemoto
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
| | - Akira Saito
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
| | - Hayato Oikawa
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
| |
|
18
|
Bordner AJ, Mittelmann HD. A new formulation of protein evolutionary models that account for structural constraints. Mol Biol Evol 2013; 31:736-49. [PMID: 24307688 DOI: 10.1093/molbev/mst240] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 01/23/2023] Open
Abstract
Despite the importance of a thermodynamically stable structure with a conserved fold for protein function, almost all evolutionary models neglect site-site correlations that arise from physical interactions between neighboring amino acid sites. This is mainly due to the difficulty in formulating a computationally tractable model since rate matrices can no longer be used. Here, we introduce a general framework, based on factor graphs, for constructing probabilistic models of protein evolution with site interdependence. Conveniently, efficient approximate inference algorithms, such as Belief Propagation, can be used to calculate likelihoods for these models. We fit an amino acid substitution model of this type that accounts for both solvent accessibility and site-site correlations. Comparisons of the new model with rate matrix models and alternative structure-dependent models demonstrate that it better fits the sequence data. We also examine evolution within a family of homohexameric enzymes and find that site-site correlations between most contacting subunits contribute to a higher likelihood. In addition, we show that the new substitution model has a similar mathematical form to the one introduced in Rodrigue et al. (Rodrigue N, Lartillot N, Bryant D, Philippe H. 2005. Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene 347:207-217), although with different parameter interpretations and values. We also perform a statistical analysis of the effects of amino acids at neighboring sites on substitution probabilities and find a significant perturbation of most probabilities, further supporting the significant role of site-site interactions in protein evolution and motivating the development of new evolutionary models similar to the one described here. Finally, we discuss possible extensions and applications of the new substitution model.
|
19
|
Ollikainen N, Kortemme T. Computational protein design quantifies structural constraints on amino acid covariation. PLoS Comput Biol 2013; 9:e1003313. [PMID: 24244128 PMCID: PMC3828131 DOI: 10.1371/journal.pcbi.1003313] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 06/20/2013] [Accepted: 09/20/2013] [Indexed: 02/02/2023] Open
Abstract
Amino acid covariation, where the identities of amino acids at different sequence positions are correlated, is a hallmark of naturally occurring proteins. This covariation can arise from multiple factors, including selective pressures for maintaining protein structure, requirements imposed by a specific function, or from phylogenetic sampling bias. Here we employed flexible backbone computational protein design to quantify the extent to which protein structure has constrained amino acid covariation for 40 diverse protein domains. We find significant similarities between the amino acid covariation in alignments of natural protein sequences and sequences optimized for their structures by computational protein design methods. These results indicate that the structural constraints imposed by protein architecture play a dominant role in shaping amino acid covariation and that computational protein design methods can capture these effects. We also find that the similarity between natural and designed covariation is sensitive to the magnitude and mechanism of backbone flexibility used in computational protein design. Our results thus highlight the necessity of including backbone flexibility to correctly model precise details of correlated amino acid changes and give insights into the pressures underlying these correlations. Proteins generally fold into specific three-dimensional structures to perform their cellular functions, and the presence of misfolded proteins is often deleterious for cellular and organismal fitness. For these reasons, maintenance of protein structure is thought to be one of the major fitness pressures acting on proteins. Consequently, the sequences of today's naturally occurring proteins contain signatures reflecting the constraints imposed by protein structure. Here we test the ability of computational protein design methods to recapitulate and explain these signatures. 
We focus on the physical basis of evolutionary pressures that act on interactions between amino acids in folded proteins, which are critical in determining protein structure and function. Such pressures can be observed from the appearance of amino acid covariation, where the amino acids at certain positions in protein sequences are correlated with each other. We find similar patterns of amino acid covariation in natural sequences and sequences optimized for their structures using computational protein design, demonstrating the importance of structural constraints in protein molecular evolution and providing insights into the structural mechanisms leading to covariation. In addition, these results characterize the ability of computational methods to model the precise details of correlated amino acid changes, which is critical for engineering new proteins with useful functions beyond those seen in nature.
Affiliation(s)
- Noah Ollikainen
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, California, United States of America
| | - Tanja Kortemme
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, California, United States of America
- California Institute for Quantitative Biosciences (QB3), University of California San Francisco, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Science, University of California San Francisco, San Francisco, California, United States of America
| |
|
20
|
Seeliger D. Development of scoring functions for antibody sequence assessment and optimization. PLoS One 2013; 8:e76909. [PMID: 24204701 PMCID: PMC3804498 DOI: 10.1371/journal.pone.0076909] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 05/15/2013] [Accepted: 08/26/2013] [Indexed: 12/27/2022] Open
Abstract
Antibody development is still associated with substantial risks and difficulties as single mutations can radically change molecule properties like thermodynamic stability, solubility or viscosity. Since antibody generation methodologies cannot select and optimize for molecule properties which are important for biotechnological applications, careful sequence analysis and optimization is necessary to develop antibodies that fulfil the ambitious requirements of future drugs. While efforts to grasp the physical principles of undesired molecule properties from the very bottom are becoming increasingly powerful, the wealth of publicly available antibody sequences provides an alternative way to develop early assessment strategies for antibodies using a statistical approach, which is the objective of this paper. Here, publicly available sequences were used to develop heuristic potentials for the framework regions of heavy and light chains of antibodies of human and murine origin. The potentials take into account position-dependent probabilities of individual amino acids but also conditional probabilities which are inevitable for sequence assessment and optimization. It is shown that the potentials derived from human sequences clearly distinguish between human sequences and sequences from mice and, hence, can be used as a measure of humanness which compares a given sequence with the phenotypic pool of human sequences instead of comparing sequence identities to germline genes. Following this line, it is demonstrated that, using the developed potentials, humanization of an antibody can be described as a simple mathematical optimization problem and that the in silico generated framework variants closely resemble native sequences in terms of predicted immunogenicity.
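The position-dependent part of such a statistical potential can be sketched in a few lines of Python. This is an illustrative assumption about the general shape of a log-probability score, not the potentials developed in the paper: it covers only single-position probabilities (the conditional, pairwise term is omitted), and the alphabet constant, pseudocount, function names, and toy sequences are all hypothetical. Sequences are assumed to be pre-aligned, equal in length, and restricted to the 20 standard amino acids.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def position_log_probs(aligned_seqs, pseudocount=1.0):
    """Per-position log-probability tables estimated from aligned sequences.

    A pseudocount keeps unseen residues from receiving log(0).
    """
    length = len(aligned_seqs[0])
    total = len(aligned_seqs) + pseudocount * len(ALPHABET)
    tables = []
    for k in range(length):
        counts = Counter(seq[k] for seq in aligned_seqs)
        tables.append({aa: math.log((counts[aa] + pseudocount) / total)
                       for aa in ALPHABET})
    return tables


def sequence_score(seq, tables):
    """Sum of per-position log-probabilities; higher means more typical."""
    return sum(tables[k][aa] for k, aa in enumerate(seq))


# Toy reference pool (hypothetical fragments, not real framework regions).
reference_pool = ["EVQLV", "EVQLV", "QVQLV", "EVKLV"]
tables = position_log_probs(reference_pool)
typical = sequence_score("EVQLV", tables)
atypical = sequence_score("DIVMT", tables)
```

Under this scheme, a sequence that matches the reference pool at most positions scores higher than one that does not, which is the sense in which such a score could serve as a similarity measure to a pool of sequences.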
Affiliation(s)
- Daniel Seeliger
- Department of Lead Identification and Optimization Support, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach/Riss, Germany
| |
|
21
|
The Global Sequence Signature algorithm unveils a structural network surrounding heavy chain CDR3 loop in Camelidae variable domains. Biochim Biophys Acta Gen Subj 2013; 1830:3373-81. [DOI: 10.1016/j.bbagen.2013.02.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/15/2013] [Revised: 02/13/2013] [Accepted: 02/15/2013] [Indexed: 11/16/2022]
|
22
|
Proctor EA, Kota P, Demarest SJ, Caravella JA, Dokholyan NV. Highly covarying residues have a functional role in antibody constant domains. Proteins 2013; 81:884-95. [PMID: 23280585 DOI: 10.1002/prot.24247] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/07/2012] [Revised: 12/05/2012] [Accepted: 12/14/2012] [Indexed: 01/25/2023]
Abstract
The ability to generate and design antibodies recognizing specific targets has revolutionized the pharmaceutical industry and medical imaging. Engineering antibody therapeutics in some cases requires modifying their constant domains to enable new and altered interactions. Engineering novel specificities into antibody constant domains has proved challenging due to the complexity of inter-domain interactions. Covarying networks of residues that tend to cluster on the protein surface and near binding sites have been identified in some proteins. However, the underlying role these networks play in the protein resulting in their conservation remains unclear in most cases. Resolving their role is crucial, because residues in these networks are not viable design targets if their role is to maintain the fold of the protein. Conversely, these networks of residues are ideal candidates for manipulating specificity if they are primarily involved in binding, such as the myriad interdomain interactions maintained within antibodies. Here, we identify networks of evolutionarily-related residues in C-class antibody domains by evaluating covariation, a measure of propensity with which residue pairs vary dependently during evolution. We computationally test whether mutation of residues in these networks affects stability of the folded antibody domain, determining their viability as design candidates. We find that members of covarying networks cluster at domain-domain interfaces, and that mutations to these residues are diverse and frequent during evolution, precluding their importance to domain stability. These results indicate that networks of covarying residues exist in antibody domains for functional reasons unrelated to thermodynamic stability, making them ideal targets for antibody design.
Affiliation(s)
- Elizabeth A Proctor
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, North Carolina 27599-7260, USA
|
23
|
Ashenberg O, Laub MT. Using analyses of amino acid coevolution to understand protein structure and function. Methods Enzymol 2013; 523:191-212. [PMID: 23422431 DOI: 10.1016/b978-0-12-394292-0.00009-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 12/24/2022]
Abstract
Determining which residues of a protein contribute to a specific function is a difficult problem. Analyses of amino acid covariation within a protein family can serve as a useful guide by identifying residues that are functionally coupled. Covariation analyses have been successfully used on several different protein families to identify residues that work together to promote folding, enable protein-protein interactions, or contribute to an enzymatic activity. Covariation is a statistical signal that can be measured in a multiple sequence alignment of homologous proteins. As sequence databases have expanded dramatically, covariation analyses have become easier and more powerful. In this chapter, we describe how functional covariation arises during the evolution of proteins and how this signal can be distinguished from various background signals. We discuss the basic methodology for performing amino acid covariation analysis, using bacterial two-component signal transduction proteins as an example. We provide practical suggestions for each step of the process including assembly of protein sequences, construction of a multiple sequence alignment, measurement of covariation, and analysis of results.
Affiliation(s)
- Orr Ashenberg
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
24
|
FunSAV: predicting the functional effect of single amino acid variants using a two-stage random forest model. PLoS One 2012; 7:e43847. [PMID: 22937107 PMCID: PMC3427247 DOI: 10.1371/journal.pone.0043847] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 04/13/2012] [Accepted: 07/26/2012] [Indexed: 11/26/2022] Open
Abstract
Single amino acid variants (SAVs) are the most abundant form of known genetic variations associated with human disease. Successful prediction of the functional impact of SAVs from sequences can thus lead to an improved understanding of the underlying mechanisms of why a SAV may be associated with certain diseases. In this work, we constructed a high-quality structural dataset that contained 679 high-quality protein structures with 2,048 SAVs by collecting the human genetic variant data from multiple resources and dividing them into two categories, i.e., disease-associated and neutral variants. We built a two-stage random forest (RF) model, termed FunSAV, to predict the functional effect of SAVs by combining sequence, structure and residue-contact network features with other additional features that were not explored in previous studies. Importantly, a two-step feature selection procedure was proposed to select the most important and informative features that contribute to the prediction of disease association of SAVs. In cross-validation experiments on the benchmark dataset, FunSAV achieved a good prediction performance with the area under the curve (AUC) of 0.882, which is competitive with and in some cases better than other existing tools including SIFT, SNAP, Polyphen2, PANTHER, nsSNPAnalyzer and PhD-SNP. The source code of FunSAV and the datasets can be downloaded at http://sunflower.kuicr.kyoto-u.ac.jp/sjn/FunSAV.
|
25
|
Kalinina OV, Oberwinkler H, Glass B, Kräusslich HG, Russell RB, Briggs JAG. Computational identification of novel amino-acid interactions in HIV Gag via correlated evolution. PLoS One 2012; 7:e42468. [PMID: 22879995 PMCID: PMC3411748 DOI: 10.1371/journal.pone.0042468] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/04/2012] [Accepted: 07/09/2012] [Indexed: 12/31/2022] Open
Abstract
Pairs of amino acid positions that evolve in a correlated manner are proposed to play important roles in protein structure or function. Methods to detect them might fare better with families for which sequences of thousands of closely related homologs are available than families with only a few distant relatives. We applied co-evolution analysis to thousands of sequences of HIV Gag, finding that the most significantly co-evolving positions are proximal in the quaternary structures of the viral capsid. A reduction in infectivity caused by mutating one member of a significant pair could be rescued by a compensatory mutation of the other.
Affiliation(s)
- Olga V. Kalinina
- CellNetworks, Bioquant, University of Heidelberg, Heidelberg, Germany
| | - Heike Oberwinkler
- Department of Infectious Diseases, Virology, Universitätsklinikum Heidelberg, Heidelberg, Germany
| | - Bärbel Glass
- Department of Infectious Diseases, Virology, Universitätsklinikum Heidelberg, Heidelberg, Germany
| | - Hans-Georg Kräusslich
- CellNetworks, Bioquant, University of Heidelberg, Heidelberg, Germany
- Department of Infectious Diseases, Virology, Universitätsklinikum Heidelberg, Heidelberg, Germany
| | - Robert B. Russell
- CellNetworks, Bioquant, University of Heidelberg, Heidelberg, Germany
| | - John A. G. Briggs
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| |
|
26
|
Dietrich S, Borst N, Schlee S, Schneider D, Janda JO, Sterner R, Merkl R. Experimental assessment of the importance of amino acid positions identified by an entropy-based correlation analysis of multiple-sequence alignments. Biochemistry 2012; 51:5633-41. [PMID: 22737967 DOI: 10.1021/bi300747r] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Abstract
The analysis of a multiple-sequence alignment (MSA) with correlation methods identifies pairs of residue positions whose occupation with amino acids changes in a concerted manner. It is plausible to assume that positions that are part of many such correlation pairs are important for protein function or stability. We have used the algorithm H2r to identify positions k in the MSAs of the enzymes anthranilate phosphoribosyl transferase (AnPRT) and indole-3-glycerol phosphate synthase (IGPS) that show a high conn(k) value, i.e., a large number of significant correlations in which k is involved. The importance of the identified residues was experimentally validated by performing mutagenesis studies with sAnPRT and sIGPS from the archaeon Sulfolobus solfataricus. For sAnPRT, five H2r mutant proteins were generated by replacing nonconserved residues with alanine or the prevalent residue of the MSA. As a control, five residues with conn(k) values of zero were chosen randomly and replaced with alanine. The catalytic activities and conformational stabilities of the H2r and control mutant proteins were analyzed by steady-state enzyme kinetics and thermal unfolding studies. Compared to wild-type sAnPRT, the catalytic efficiencies (k(cat)/K(M)) were largely unaltered. In contrast, the apparent thermal unfolding temperature (T(M)(app)) was lowered in most proteins. Remarkably, the strongest observed destabilization (ΔT(M)(app) = 14 °C) was caused by the V284A exchange, which pertains to the position with the highest correlation signal [conn(k) = 11]. For sIGPS, six H2r mutant and four control proteins with alanine exchanges were generated and characterized. The k(cat)/K(M) values of four H2r mutant proteins were reduced between 13- and 120-fold, and their T(M)(app) values were decreased by up to 5 °C. For the sIGPS control proteins, the observed activity and stability decreases were much less severe. 
Our findings demonstrate that positions with high conn(k) values have an increased probability of being important for enzyme function or stability.
Affiliation(s)
- Susanne Dietrich
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Universitätsstrasse 31, D-93053 Regensburg, Germany
|
27
|
Mendoza JL, Schmidt A, Li Q, Nuvaga E, Barrett T, Bridges RJ, Feranchak AP, Brautigam CA, Thomas PJ. Requirements for efficient correction of ΔF508 CFTR revealed by analyses of evolved sequences. Cell 2012; 148:164-74. [PMID: 22265409 DOI: 10.1016/j.cell.2011.11.023] [Citation(s) in RCA: 224] [Impact Index Per Article: 17.2] [Received: 06/06/2011] [Revised: 10/20/2011] [Accepted: 11/03/2011] [Indexed: 12/14/2022]
Abstract
Misfolding of ΔF508 cystic fibrosis (CF) transmembrane conductance regulator (CFTR) underlies pathology in most CF patients. F508 resides in the first nucleotide-binding domain (NBD1) of CFTR near a predicted interface with the fourth intracellular loop (ICL4). Efforts to identify small molecules that restore function by correcting the folding defect have revealed an apparent efficacy ceiling. To understand the mechanistic basis of this obstacle, positions statistically coupled to 508, in evolved sequences, were identified and assessed for their impact on both NBD1 and CFTR folding. The results indicate that both NBD1 folding and interaction with ICL4 are altered by the ΔF508 mutation and that correction of either individual process is only partially effective. By contrast, combination of mutations that counteract both defects restores ΔF508 maturation and function to wild-type levels. These results provide a mechanistic rationale for the limited efficacy of extant corrector compounds and suggest approaches for identifying compounds that correct both defective steps.
Affiliation(s)
- Juan L Mendoza
- Molecular Biophysics Program, and Department of Physiology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9040, USA
|
28
|
Abstract
Coordinated variation among positions in amino acid sequence alignments can reveal genetic dependencies at noncontiguous positions, but methods to assess these interactions are incompletely developed. Previously, we found genome-wide networks of covarying residue positions in the hepatitis C virus genome (R. Aurora, M. J. Donlin, N. A. Cannon, and J. E. Tavis, J. Clin. Invest. 119:225-236, 2009). Here, we asked whether such networks are present in a diverse set of viruses and, if so, what they may imply about viral biology. Viral sequences were obtained for 16 viruses in 13 species from 9 families. The entire viral coding potential for each virus was aligned, all possible amino acid covariances were identified using the observed-minus-expected-squared algorithm at a false-discovery rate of ≤1%, and networks of covariances were assessed using standard methods. Covariances that spanned the viral coding potential were common in all viruses. In all cases, the covariances formed a single network that contained essentially all of the covariances. The hepatitis C virus networks had hub-and-spoke topologies, but all other networks had random topologies with an unusually large number of highly connected nodes. These results indicate that genome-wide networks of genetic associations and the coordinated evolution they imply are very common in viral genomes, that the networks rarely have the hub-and-spoke topology that dominates other biological networks, and that network topologies can vary substantially even within a given viral group. Five examples with hepatitis B virus and poliovirus are presented to illustrate how covariance network analysis can lead to inferences about viral biology.
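For readers unfamiliar with the observed-minus-expected-squared (OMES) score mentioned in this abstract, the sketch below shows one commonly cited formulation: for each residue pair (a, b) at two columns, the observed pair count is compared with the count expected if the columns were independent, and the squared differences are summed and divided by the number of gap-free sequences. The exact variant, gap handling, and false-discovery-rate procedure used in the study are not given in the abstract, so the function name, gap symbol, and normalization here are assumptions.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol


def omes(col_i, col_j):
    """Observed-minus-expected-squared covariance score for two columns.

    Sequences with a gap at either position are skipped.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != GAP and b != GAP]
    n_valid = len(pairs)
    if n_valid == 0:
        return 0.0
    count_i = Counter(a for a, _ in pairs)
    count_j = Counter(b for _, b in pairs)
    observed = Counter(pairs)
    total = 0.0
    for a in count_i:
        for b in count_j:
            # Expected pair count under independence of the two columns.
            expected = count_i[a] * count_j[b] / n_valid
            total += (observed[(a, b)] - expected) ** 2
    return total / n_valid


covarying = omes("AAGG", "KKRR")      # perfectly coupled columns
independent = omes("AAGG", "EDED")    # every residue pairing equally common
with_gap = omes("AAGG-", "KKRRK")     # gapped sequence is ignored
```

In practice each score would still need a significance estimate, for example from shuffled columns, before any false-discovery-rate threshold such as the 1% level mentioned in the abstract could be applied.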
|
29
|
Wang LY. Covariation analysis of local amino acid sequences in recurrent protein local structures. J Bioinform Comput Biol 2011; 3:1391-409. [PMID: 16374913 DOI: 10.1142/s0219720005001648] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/04/2004] [Revised: 07/10/2005] [Accepted: 09/07/2005] [Indexed: 11/18/2022]
Abstract
Local structural information is supposed to be frequently encoded in local amino acid sequences. Previous research only indicated that some local structure positions have specific residue preferences in some particular local structures. However, correlated pairwise replacements for interacting residues in recurrent local structural motifs from unrelated proteins have not been studied systematically. We introduced a new method fusing statistical covariation analysis and local structure-based alignment. Systematic analysis of structure-based multiple alignments of recurrent local structures from unrelated proteins in a representative subset of the Protein Data Bank indicates that covarying residue pairs with statistical significance exist in local structural motifs, in particular β-turns and helix caps. These residue pairs are mostly linked through polar functional groups with direct or indirect hydrogen bonding. Hydrophobic interaction is also a major factor in constraining pairwise amino acid residue replacement in recurrent local structures. We also found correlated residue pairs that are not clearly linked with through-space interactions. The physical constraints underlying these covariations are less clear. Overall, covarying residue pairs with statistical significance exist in local structures from unrelated proteins. The existence of sequence covariations in local structural motifs from unrelated proteins indicates that many relics of local relations are still retained in the tertiary structures after protein folding. It supports the notion that some local structural information is encoded in local sequences and the local structural codes could play important roles in determining native state protein folding topology.
Affiliation(s)
- Lu-Yong Wang
- Integrated Data Systems Department, Siemens Corporate Research and Center for Computational Biology & Bioinformatics, Columbia University, 755, College Road East, Princeton, New Jersey 08540, USA.
|
30
|
Dutheil JY. Detecting coevolving positions in a molecule: why and how to account for phylogeny. Brief Bioinform 2011; 13:228-43. [PMID: 21949241 DOI: 10.1093/bib/bbr048] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022] Open
Abstract
Positions in a molecule that share a common constraint do not evolve independently, and therefore leave a signature in the patterns of homologous sequences. Exhibiting such positions with a coevolution pattern from a sequence alignment has great potential for predicting functional and structural properties of molecules through comparative analysis. This task is complicated by the existence of additional correlation sources, leading to false predictions. The nature of the data is a major source of noise correlation: sequences are taken from individuals with different degrees of relatedness, and who therefore are intrinsically correlated. This has led to several method developments in different fields that are potentially confusing for non-expert users interested in these methodologies. It also explains why coevolution detection methods are largely unemployed despite the importance of the biological questions they address. In this article, I focus on the role of shared ancestry for understanding molecular coevolution patterns. I review and classify existing coevolution detection methods according to their ability to handle shared ancestry. Using a ribosomal RNA benchmark data set, for which detailed knowledge of the structure and coevolution patterns is available, I demonstrate and explain why taking the underlying evolutionary history of sequences into account is the only way to extract the full coevolution signal in the data. I also evaluate, using rigorous statistical procedures, the best approaches to do so, and discuss several important biological aspects to consider when performing coevolution analyses.
Affiliation(s)
- Julien Y Dutheil
- Institut des Sciences de l'Evolution - Montpellier (I.S.E.-M.) Unité Mixte de Recherche UMII - CNRS (UMR 5554) Université de Montpellier II - CC 065 34095 Montpellier Cedex 05.
|
31
|
Yip KY, Utz L, Sitwell S, Hu X, Sidhu SS, Turk BE, Gerstein M, Kim PM. Identification of specificity determining residues in peptide recognition domains using an information theoretic approach applied to large-scale binding maps. BMC Biol 2011; 9:53. [PMID: 21835011 PMCID: PMC3224579 DOI: 10.1186/1741-7007-9-53] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 07/26/2011] [Accepted: 08/11/2011] [Indexed: 01/06/2023] Open
Abstract
Background: Peptide Recognition Domains (PRDs) are commonly found in signaling proteins. They mediate protein-protein interactions by recognizing and binding short motifs in their ligands. Although a great deal is known about PRDs and their interactions, prediction of PRD specificities remains largely an unsolved problem. Results: We present a novel approach to identifying these Specificity Determining Residues (SDRs). Our algorithm generalizes earlier information theoretic approaches to coevolution analysis, to become applicable to this problem. It leverages the growing wealth of binding data between PRDs and large numbers of random peptides, and searches for PRD residues that exhibit strong evolutionary covariation with some positions of the statistical profiles of bound peptides. The calculations involve only information from sequences, and thus can be applied to PRDs without crystal structures. We applied the approach to PDZ, SH3 and kinase domains, and evaluated the results using both residue proximity in co-crystal structures and verified binding specificity maps from mutagenesis studies. Discussion: Our predictions were found to be strongly correlated with the physical proximity of residues, demonstrating the ability of our approach to detect physical interactions of the binding partners. Some high-scoring pairs were further confirmed to affect binding specificity using previous experimental results. Combining the covariation results also allowed us to predict binding profiles with higher reliability than two other methods that do not explicitly take residue covariation into account. Conclusions: The general applicability of our approach to the three different domain families demonstrated in this paper suggests its potential in predicting binding targets and assisting the exploration of binding mechanisms.
Affiliation(s)
- Kevin Y Yip
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
|
32
|
Use of mutual information arrays to predict coevolving sites in the full length HIV gp120 protein for subtypes B and C. Virol Sin 2011; 26:95-104. [PMID: 21468932 DOI: 10.1007/s12250-011-3188-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/31/2011] [Accepted: 02/22/2011] [Indexed: 10/18/2022] Open
Abstract
It is well established that different sites within a protein evolve at different rates according to their role within the protein; identification of these correlated mutations can aid in tasks such as ab initio protein structure prediction, structure-function analysis or sequence alignment. Mutual information is a standard measure for coevolution between two sites, but its application is limited by the signal-to-noise ratio. In this work we report a preliminary study to investigate whether larger sequence sets could circumvent this problem by calculating mutual information arrays for two sets of drug-naïve sequences from the HIV gp120 protein for the B and C subtypes. Our results suggest that while the larger sequence sets can improve the signal-to-noise ratio, the gain is offset by the high mutation rate of HIV, which makes it more difficult to achieve consistent alignments. Nevertheless, we were able to predict a number of coevolving sites that were supported by previous experimental studies, as well as a region close to the C terminus of the protein that was highly variable in the C subtype but highly conserved in the B subtype.
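As an illustration of the measure this abstract relies on, the mutual information between two alignment columns can be computed directly from residue frequencies. The sketch below is a minimal example under stated assumptions: the function name `mutual_information` is hypothetical, gaps are treated as ordinary symbols, and none of the corrections that address the signal-to-noise problem mentioned above are applied.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a and col_b are equal-length strings, one character per sequence.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    if n == 0:
        return 0.0

    freq_a = Counter(col_a)               # marginal counts at site A
    freq_b = Counter(col_b)               # marginal counts at site B
    freq_ab = Counter(zip(col_a, col_b))  # joint counts of residue pairs

    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        # Only observed pairs contribute, so p_ab is never zero here.
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi
```

Computing this for every pair of columns in an alignment gives the kind of mutual information array the abstract describes; perfectly coupled two-state columns yield 1 bit and independent columns yield 0.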
|
33
|
Castaño A, Ruiz L, Elena SF, Hernández C. Population differentiation and selective constraints in Pelargonium line pattern virus. Virus Res 2011; 155:274-82. [DOI: 10.1016/j.virusres.2010.10.022] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/01/2010] [Revised: 09/23/2010] [Accepted: 10/16/2010] [Indexed: 12/23/2022]
|
34
|
Kowarsch A, Fuchs A, Frishman D, Pagel P. Correlated mutations: a hallmark of phenotypic amino acid substitutions. PLoS Comput Biol 2010; 6. [PMID: 20862353 PMCID: PMC2940720 DOI: 10.1371/journal.pcbi.1000923] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 09/25/2009] [Accepted: 08/09/2010] [Indexed: 11/18/2022] Open
Abstract
Point mutations resulting in the substitution of a single amino acid can cause severe functional consequences, but can also be completely harmless. Understanding what determines the phenotypical impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance and thus vulnerability in case of mutation. In this work, we put forward the hypothesis that in addition to conservation, co-evolution of residues in a protein influences the likelihood of a residue to be functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of point mutations causing disease in humans with respect to correlated mutations. We included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins in our analysis. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data are in agreement with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Access to our analysis results is provided at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/.
Point mutations (i.e., changes of a single sequence element) can have a severe impact on protein function. Many diseases are caused by such minute defects. On the other hand, the majority of such mutations do not lead to noticeable effects. Although previous research has revealed important aspects that influence or predict the chance of a mutation to cause disease, much remains to be learned before we fully understand this complex problem. In our work, we use the observation that sometimes certain positions in a protein mutate in an apparently correlated fashion and analyze this correlation with respect to mutation vulnerability. Our results show that positions exhibiting evolutionary correlation are significantly more likely to be vulnerable to mutation than average positions. On one hand, our data further support the concept that correlated positions are associated not only with protein contacts but also with functional sites and/or disease positions (as introduced by others). On the other hand, this could be useful to further improve the understanding and prediction of the consequences of mutations. Our work is the first to attempt a large-scale quantitation of this relationship.
Affiliation(s)
- Andreas Kowarsch
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
- Angelika Fuchs
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Dmitrij Frishman
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
- Philipp Pagel
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
|
35
|
Dai L, Yang Y, Kim HR, Zhou Y. Improving computational protein design by using structure-derived sequence profile. Proteins 2010; 78:2338-48. [PMID: 20544969 PMCID: PMC3058783 DOI: 10.1002/prot.22746] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]
Abstract
Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful computational designs in the last decade resulted from improved understanding of hydrophobic and polar interactions between side chains of amino acid residues in stabilizing protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for such coupling by using a sequence profile derived from the sequences of five-residue fragments in a fragment library that are structurally matched to the five-residue segments contained in a target structure. We further introduced a term to reduce low-complexity regions of designed sequences. These two terms, together with optimized reference states for amino acid residues, were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, yields a 12% increase (from 34% to 46%) in the fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences. Meanwhile, it reduces by 8% (from 22% to 14%) the number of designed sequences that are not homologous to any known protein sequences according to PSI-BLAST. More importantly, the sequences designed by RosettaDesign-SR have 2-3% more polar residues at the surface and core regions of proteins, and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than those designed by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures due to more specific polar interactions.
Affiliation(s)
- Liang Dai
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202
- Yuedong Yang
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202
- Hyung Rae Kim
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202
- Yaoqi Zhou
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202
|
36
|
Xu Y, Tillier ERM. Regional covariation and its application for predicting protein contact patches. Proteins 2010; 78:548-58. [PMID: 19768681 DOI: 10.1002/prot.22576] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
Abstract
Correlated mutation analysis (CMA) is an effective approach for predicting functional and structural residue interactions from multiple sequence alignments (MSAs) of proteins. As nearby residues may also play a role in a given functional interaction, we were interested in seeing whether covarying sites were clustered, and whether this could be used to enhance the predictive power of CMA. A large-scale search for coevolving regions within protein domains revealed that if two sites in an MSA covary, then neighboring sites in the alignment also typically covary, resulting in clusters of covarying residues. The program PatchD (http://www.uhnres.utoronto.ca/labs/tillier/) was developed to measure the covariation between disconnected sequence clusters to reveal patch covariation. Patches that exhibit strong covariation identify multiple residues that are generally nearby in the protein structure, suggesting that the detection of covarying patches can be used in conjunction with traditional CMA approaches to reveal functional interaction partners.
Affiliation(s)
- Yongbai Xu
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
|
37
|
Kerr ID, Jones PM, George AM. Multidrug efflux pumps: the structures of prokaryotic ATP-binding cassette transporter efflux pumps and implications for our understanding of eukaryotic P-glycoproteins and homologues. FEBS J 2009; 277:550-63. [PMID: 19961540 DOI: 10.1111/j.1742-4658.2009.07486.x] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 01/04/2023]
Abstract
One of the Holy Grails of ATP-binding cassette transporter research is a structural understanding of drug binding and transport in a eukaryotic multidrug resistance pump. These transporters are front-line mediators of drug resistance in cancers and represent an important therapeutic target in future chemotherapy. Although there has been intensive biochemical research into the human multidrug pumps, their 3D structure at atomic resolution remains unknown. The recent determination of the structure of a mouse P-glycoprotein at subatomic resolution is complemented by structures for a number of prokaryotic homologues. These structures have provided advances into our knowledge of the ATP-binding cassette exporter structure and mechanism, and have provided the template data for a number of homology modelling studies designed to reconcile biochemical data on these clinically important proteins.
Affiliation(s)
- Ian D Kerr
- School of Biomedical Sciences, University of Nottingham, Nottingham, UK.
|
38
|
Protein sectors: evolutionary units of three-dimensional structure. Cell 2009; 138:774-86. [PMID: 19703402 DOI: 10.1016/j.cell.2009.07.038] [Citation(s) in RCA: 526] [Impact Index Per Article: 32.9] [Received: 03/09/2009] [Revised: 07/03/2009] [Accepted: 07/30/2009] [Indexed: 11/23/2022]
Abstract
Proteins display a hierarchy of structural features at primary, secondary, tertiary, and higher-order levels, an organization that guides our current understanding of their biological properties and evolutionary origins. Here, we reveal a structural organization distinct from this traditional hierarchy by statistical analysis of correlated evolution between amino acids. Applied to the S1A serine proteases, the analysis indicates a decomposition of the protein into three quasi-independent groups of correlated amino acids that we term "protein sectors." Each sector is physically connected in the tertiary structure, has a distinct functional role, and constitutes an independent mode of sequence divergence in the protein family. Functionally relevant sectors are evident in other protein families as well, suggesting that they may be general features of proteins. We propose that sectors represent a structural organization of proteins that reflects their evolutionary histories.
|
39
|
Wang N, Smith WF, Miller BR, Aivazian D, Lugovskoy AA, Reff ME, Glaser SM, Croner LJ, Demarest SJ. Conserved amino acid networks involved in antibody variable domain interactions. Proteins 2009; 76:99-114. [PMID: 19089973 DOI: 10.1002/prot.22319] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 12/16/2022]
Abstract
Engineered antibodies are a large and growing class of protein therapeutics comprising both marketed products and many molecules in clinical trials in various disease indications. We investigated naturally conserved networks of amino acids that support antibody V(H) and V(L) function, with the goal of generating information to assist in the engineering of robust antibody or antibody-like therapeutics. We generated a large and diverse sequence alignment of V-class Ig-folds, of which V(H) and V(L) domains are family members. To identify conserved amino acid networks, covariations between residues at all possible position pairs were quantified as correlation coefficients (phi-values). We provide rosters of the key conserved amino acid pairs in antibody V(H) and V(L) domains, for reference and use by the antibody research community. The majority of the most strongly conserved amino acid pairs in V(H) and V(L) are at or adjacent to the V(H)-V(L) interface suggesting that the ability to heterodimerize is a constraining feature of antibody evolution. For the V(H) domain, but not the V(L) domain, residue pairs at the variable-constant domain interface (V(H)-C(H)1 interface) are also strongly conserved. The same network of conserved V(H) positions involved in interactions with both the V(L) and C(H)1 domains is found in camelid V(HH) domains, which have evolved to lack interactions with V(L) and C(H)1 domains in their mature structures; however, the amino acids at these positions are different, reflecting their different function. Overall, the data describe naturally occurring amino acid networks in antibody Fv regions that can be referenced when designing antibodies or antibody-like fragments with the goal of improving their biophysical properties.
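The phi-value mentioned in this abstract is the correlation coefficient of a 2x2 contingency table. The sketch below shows how such a value could be computed for one residue pair at two alignment positions. How the authors binarised the alignment is not stated in the abstract, so the presence/absence encoding, the function name `phi_value`, and the NaN convention for undefined cases are all assumptions of this example.

```python
from math import nan, sqrt


def phi_value(col_i, col_j, res_i, res_j):
    """Phi coefficient for 'res_i at position i' versus 'res_j at position j'.

    col_i and col_j are equal-length strings, one character per sequence.
    Returns NaN when either residue is present in all or none of the
    sequences, because the coefficient is undefined in that case.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")

    # Cells of the 2x2 contingency table.
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(col_i, col_j):
        has_i = a == res_i
        has_j = b == res_j
        if has_i and has_j:
            n11 += 1
        elif has_i:
            n10 += 1
        elif has_j:
            n01 += 1
        else:
            n00 += 1

    # Product of the four marginal totals.
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return nan
    return (n11 * n00 - n10 * n01) / sqrt(denom)
```

Values near +1 indicate that the two residues almost always occur together, values near -1 that they are mutually exclusive, and values near 0 that they are uncorrelated.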
Affiliation(s)
- Norman Wang
- Biogen Idec, San Diego, California 92122, USA
|
40
|
Stollar EJ, Garcia B, Chong PA, Rath A, Lin H, Forman-Kay JD, Davidson AR. Structural, functional, and bioinformatic studies demonstrate the crucial role of an extended peptide binding site for the SH3 domain of yeast Abp1p. J Biol Chem 2009; 284:26918-27. [PMID: 19590096 DOI: 10.1074/jbc.m109.028431] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/06/2022] Open
Abstract
SH3 domains, which are among the most frequently occurring protein interaction modules in nature, bind to peptide targets ranging in length from 7 to more than 25 residues. Although the bulk of studies on the peptide binding properties of SH3 domains have focused on interactions with relatively short peptides (less than 10 residues), a number of domains have been recently shown to require much longer sequences for optimal binding affinity. To gain greater insight into the binding mechanism and biological importance of interactions between an SH3 domain and extended peptide sequences, we have investigated interactions of the yeast Abp1p SH3 domain (AbpSH3) with several physiologically relevant 17-residue target peptide sequences. To obtain a molecular model for AbpSH3 interactions, we solved the structure of the AbpSH3 bound to a target peptide from the yeast actin patch kinase, Ark1p. Peptide target complexes from binding partners Scp1p and Sjl2p were also characterized, revealing that the AbpSH3 uses a common extended interface for interaction with these peptides, despite K(d) values for these peptides ranging from 0.3 to 6 μM. Mutagenesis studies demonstrated that residues across the whole 17-residue binding site are important both for maximal in vitro binding affinity and for in vivo function. Sequence conservation analysis revealed that both the AbpSH3 and its extended target sequences are highly conserved across diverse fungal species as well as higher eukaryotes. Our data imply that the AbpSH3 must bind extended target sites to function efficiently inside the cell.
Affiliation(s)
- Elliott J Stollar
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
|
41
|
Abstract
Covariation between sites can arise due to a common evolutionary history. At the same time, the structure and function of proteins play a significant role in the evolvability of different sites that are not directly connected with the common ancestry. The nature of the forces that cause residues to coevolve is still not thoroughly understood; it is especially unclear how coevolutionary processes are related to functional diversification within protein families. We analyzed both functional and structural factors that might cause covariation of specificity determinants and showed that they more often participate in coevolutionary relationships with each other and other sites compared with functional sites and those sites that are not under strong functional constraints. We also found that protein sites with a higher number of coevolutionary connections with other sites tend to evolve more slowly. Our results indicate that in some cases coevolutionary connections exist between specificity sites that are located far apart in space but are under similar functional constraints. Such correlated changes and compensations can be realized through stepwise coevolutionary processes, which in turn can shed light on the mechanisms of functional diversification.
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
|
42
|
Noivirt-Brik O, Unger R, Horovitz A. Analysing the origin of long-range interactions in proteins using lattice models. BMC Struct Biol 2009; 9:4. [PMID: 19178726 PMCID: PMC2670300 DOI: 10.1186/1472-6807-9-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 11/18/2008] [Accepted: 01/29/2009] [Indexed: 11/10/2022]
Abstract
Background: Long-range communication is very common in proteins but the physical basis of this phenomenon remains unclear. In order to gain insight into this problem, we decided to explore whether long-range interactions exist in lattice models of proteins. Lattice models of proteins have proven to capture some of the basic properties of real proteins and, thus, can be used for elucidating general principles of protein stability and folding. Results: Using a computational version of double-mutant cycle analysis, we show that long-range interactions emerge in lattice models even though they are not an input feature of them. The coupling energy of both short- and long-range pairwise interactions is found to become more positive (destabilizing) in a linear fashion with increasing 'contact-frequency', an entropic term that corresponds to the fraction of states in the conformational ensemble of the sequence in which the pair of residues is in contact. A mathematical derivation of the linear dependence of the coupling energy on 'contact-frequency' is provided. Conclusion: Our work shows how 'contact-frequency' should be taken into account in attempts to stabilize proteins by introducing (or stabilizing) contacts in the native state and/or through 'negative design' of non-native contacts.
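The double-mutant cycle analysis referred to in this abstract reduces to a simple difference of differences between four stability values. The sketch below assumes folding free energies are available for the wild type, the two single mutants and the double mutant. The function name `coupling_energy` and the sign convention are assumptions of this example and may differ from the convention used in the paper.

```python
def coupling_energy(g_wt, g_mut_a, g_mut_b, g_double):
    """Pairwise coupling energy from a double-mutant cycle (hypothetical helper).

    All arguments are free energies in the same units, for example kcal/mol.
    The result is zero when the two mutations act additively, which means
    the two residues do not interact under this measure.
    """
    # Effect of mutating site B in the background where site A is already mutated.
    effect_b_given_a = g_double - g_mut_a
    # Effect of mutating site B in the wild-type background.
    effect_b_alone = g_mut_b - g_wt
    return effect_b_given_a - effect_b_alone


# Additive example: each mutation costs the same with or without the other.
additive = coupling_energy(-10.0, -8.0, -7.0, -5.0)
# Non-additive example: the double mutant is more stable than additivity predicts.
coupled = coupling_energy(-10.0, -8.0, -7.0, -6.0)
```

The same difference is obtained if the cycle is traversed around site A instead of site B, which is why a single coupling energy characterises the pair.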
Affiliation(s)
- Orly Noivirt-Brik
- Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel.
|
43
|
Zhang F, Zarrine-Afsar A, Al-Abdul-Wahid MS, Prosser RS, Davidson AR, Woolley GA. Structure-Based Approach to the Photocontrol of Protein Folding. J Am Chem Soc 2009; 131:2283-9. [PMID: 19170498 DOI: 10.1021/ja807938v] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.0] [Indexed: 11/29/2022]
Affiliation(s)
- Fuzhong Zhang
- Department of Chemistry, University of Toronto, 80 Saint George Street, Toronto M5S 3H6 Canada, Department of Biochemistry, University of Toronto, 1 King’s College Circle, Toronto, M5S 1A8, Canada, and Department of Chemistry, University of Toronto at Mississauga, 3359 Mississauga Road North, Mississauga, Ontario, L5L 1C6, Canada
- Arash Zarrine-Afsar
- Department of Chemistry, University of Toronto, 80 Saint George Street, Toronto M5S 3H6 Canada, Department of Biochemistry, University of Toronto, 1 King’s College Circle, Toronto, M5S 1A8, Canada, and Department of Chemistry, University of Toronto at Mississauga, 3359 Mississauga Road North, Mississauga, Ontario, L5L 1C6, Canada
- M. Sameer Al-Abdul-Wahid
- Department of Chemistry, University of Toronto, 80 Saint George Street, Toronto M5S 3H6 Canada, Department of Biochemistry, University of Toronto, 1 King’s College Circle, Toronto, M5S 1A8, Canada, and Department of Chemistry, University of Toronto at Mississauga, 3359 Mississauga Road North, Mississauga, Ontario, L5L 1C6, Canada
- R. Scott Prosser
- Department of Chemistry, University of Toronto, 80 Saint George Street, Toronto M5S 3H6 Canada, Department of Biochemistry, University of Toronto, 1 King’s College Circle, Toronto, M5S 1A8, Canada, and Department of Chemistry, University of Toronto at Mississauga, 3359 Mississauga Road North, Mississauga, Ontario, L5L 1C6, Canada
- Alan R. Davidson
- Department of Chemistry, University of Toronto, 80 Saint George Street, Toronto M5S 3H6 Canada, Department of Biochemistry, University of Toronto, 1 King’s College Circle, Toronto, M5S 1A8, Canada, and Department of Chemistry, University of Toronto at Mississauga, 3359 Mississauga Road North, Mississauga, Ontario, L5L 1C6, Canada
- G. Andrew Woolley
- Department of Chemistry, University of Toronto, 80 Saint George Street, Toronto M5S 3H6 Canada, Department of Biochemistry, University of Toronto, 1 King’s College Circle, Toronto, M5S 1A8, Canada, and Department of Chemistry, University of Toronto at Mississauga, 3359 Mississauga Road North, Mississauga, Ontario, L5L 1C6, Canada
|
44
|
Aurora R, Donlin MJ, Cannon NA, Tavis JE. Genome-wide hepatitis C virus amino acid covariance networks can predict response to antiviral therapy in humans. J Clin Invest 2008; 119:225-36. [PMID: 19104147 DOI: 10.1172/jci37085] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Received: 08/08/2008] [Accepted: 10/22/2008] [Indexed: 12/17/2022]
Abstract
Hepatitis C virus (HCV) is a common RNA virus that causes hepatitis and liver cancer. Infection is treated with IFN-alpha and ribavirin, but this expensive and physically demanding therapy fails in half of patients. The genomic sequences of independent HCV isolates differ by approximately 10%, but the effects of this variation on the response to therapy are unknown. To address this question, we analyzed amino acid covariance within the full viral coding region of pretherapy HCV sequences from 94 participants in the Viral Resistance to Antiviral Therapy of Chronic Hepatitis C (Virahep-C) clinical study. Covarying positions were common and linked together into networks that differed by response to therapy. There were 3-fold more hydrophobic amino acid pairs in HCV from nonresponding patients, and these hydrophobic interactions were predicted to contribute to failure of therapy by stabilizing viral protein complexes. Using our analysis to detect patterns within the networks, we could predict the outcome of therapy with greater than 95% coverage and 100% accuracy, raising the possibility of a prognostic test to reduce therapeutic failures. Furthermore, the hub positions in the networks are attractive antiviral targets because of their genetic linkage with many other positions that we predict would suppress evolution of resistant variants. Finally, covariance network analysis could be applicable to any virus with sufficient genetic variation, including most human RNA viruses.
Affiliation(s)
- Rajeev Aurora
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, MO 63104, USA.
|
45
|
Kim SJ, Dumont C, Gruebele M. Simulation-based fitting of protein-protein interaction potentials to SAXS experiments. Biophys J 2008; 94:4924-31. [PMID: 18326645 PMCID: PMC2397344 DOI: 10.1529/biophysj.107.123240] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 10/08/2007] [Accepted: 01/30/2008] [Indexed: 11/18/2022]
Abstract
We present a new method for computing interaction potentials of solvated proteins directly from small-angle x-ray scattering data. An ensemble of proteins is modeled by Monte Carlo or molecular dynamics simulation. The global x-ray scattering of the whole model ensemble is then computed at each snapshot of the simulation, and averaged to obtain the x-ray scattering intensity. Finally, the interaction potential parameters are adjusted by an optimization algorithm, and the procedure is iterated until the best agreement between simulation and experiment is obtained. This new approach obviates the need for approximations that must be made in simplified analytical models. We apply the method to lambda repressor fragment 6-85 and fyn-SH3. With the increased availability of fast computer clusters, Monte Carlo and molecular dynamics analysis using residue-level or even atomistic potentials may soon become feasible.
Affiliation(s)
- Seung Joong Kim
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
|
46
|
Merkl R, Zwick M. H2r: identification of evolutionary important residues by means of an entropy based analysis of multiple sequence alignments. BMC Bioinformatics 2008; 9:151. [PMID: 18366663 PMCID: PMC2323388 DOI: 10.1186/1471-2105-9-151] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 08/07/2007] [Accepted: 03/18/2008] [Indexed: 11/15/2022]
Abstract
Background: A multiple sequence alignment (MSA) generated for a protein can be used to characterise residues by means of a statistical analysis of single columns. In addition to the examination of individual positions, the investigation of co-variation of amino acid frequencies offers insights into the function and evolution of the protein and its residues. Results: We introduce conn(k), a novel parameter for the characterisation of individual residues. For each residue k, conn(k) is the number of most extreme signals of co-evolution in which k is involved. These signals were deduced from a normalised mutual information (MI) value U(k, l) computed for all pairs of residues k, l. We demonstrate that conn(k) is a more robust indicator than an individual MI value for the prediction of residues most plausibly important for the evolution of a protein. This proposition was inferred by means of statistical methods. It was further confirmed by the analysis of several proteins. A server, which computes conn(k)-values, is available at . Conclusion: The algorithm H2r, which analyses MSAs and computes conn(k)-values, characterises a specific class of residues. In contrast to strictly conserved ones, these residues possess some flexibility in the composition of side chains. However, their allocation is sensibly balanced with several other positions, as indicated by conn(k).
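The scoring idea in this abstract can be illustrated with a minimal Python sketch. It is not the H2r implementation. The abstract does not give the formula for U(k, l), so the sketch assumes the symmetric-uncertainty normalisation 2·(H(k) + H(l) − H(k, l)) / (H(k) + H(l)). The function names, the `top_n` cut-off that stands in for the "most extreme signals", and the toy alignment are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def normalised_mi(col_k, col_l):
    """Normalised MI between two alignment columns, in [0, 1].

    Assumption: symmetric-uncertainty normalisation; the abstract only
    says that U(k, l) is a normalised mutual information value.
    """
    h_k, h_l = entropy(col_k), entropy(col_l)
    if h_k + h_l == 0:
        return 0.0  # both columns fully conserved: no covariation signal
    h_kl = entropy(list(zip(col_k, col_l)))  # joint entropy of the pair
    return 2.0 * (h_k + h_l - h_kl) / (h_k + h_l)


def conn(msa, top_n):
    """For each column, count its appearances among the top_n scoring pairs.

    top_n is a hypothetical stand-in for the 'most extreme signals'.
    """
    columns = list(zip(*msa))  # transpose rows (sequences) into columns
    scores = {
        (k, l): normalised_mi(columns[k], columns[l])
        for k, l in combinations(range(len(columns)), 2)
    }
    top_pairs = sorted(scores, key=scores.get, reverse=True)[:top_n]
    counts = Counter()
    for k, l in top_pairs:
        counts[k] += 1
        counts[l] += 1
    return [counts[i] for i in range(len(columns))]


# Toy alignment: columns 0-2 covary perfectly, column 3 varies independently.
toy_msa = ["ACDA", "ACDC", "GTEA", "GTEC"]
print(conn(toy_msa, top_n=3))  # prints [2, 2, 2, 0]
```

In the toy alignment the three mutually covarying columns each take part in two of the three top-scoring pairs, whereas the independently varying column takes part in none. A count like this is less sensitive to a single noisy pair score than an individual MI value.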
Affiliation(s)
- Rainer Merkl
- Institut für Biophysik und Physikalische Biochemie, Universität Regensburg, D-93040 Regensburg, Germany.
|
47
|
Rubinstein R, Fiser A. Predicting disulfide bond connectivity in proteins by correlated mutations analysis. Bioinformatics 2008; 24:498-504. [PMID: 18203772 DOI: 10.1093/bioinformatics/btm637] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 12/21/2022]
Abstract
MOTIVATION: Prediction of disulfide bond connectivity facilitates structural and functional annotation of proteins. Previous studies suggest that cysteines of a disulfide bond mutate in a correlated manner. RESULTS: We developed a method that analyzes correlated mutation patterns in multiple sequence alignments in order to predict disulfide bond connectivity. Proteins with known experimental structures and varying numbers of disulfide bonds, and that spanned various evolutionary distances, were aligned. We observed frequent variation of disulfide bond connectivity within members of the same protein families, and in 99% of the cases, cysteine pairs forming non-conserved disulfide bonds mutated in concert. Our data support the notion that substitution of a cysteine in a disulfide bond prompts the substitution of its cysteine partner and that oxidized cysteines appear in pairs. The method we developed predicts disulfide bond connectivity patterns with accuracies of 73, 69 and 61% for proteins with two, three and four disulfide bonds, respectively.
Affiliation(s)
- Rotem Rubinstein
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA.
|
48
|
Fuchs A, Martin-Galiano AJ, Kalman M, Fleishman S, Ben-Tal N, Frishman D. Co-evolving residues in membrane proteins. Bioinformatics 2007; 23:3312-9. [DOI: 10.1093/bioinformatics/btm515] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022]
|
49
|
Dunn S, Wahl L, Gloor G. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics 2007; 24:333-40. [DOI: 10.1093/bioinformatics/btm604] [Citation(s) in RCA: 363] [Impact Index Per Article: 20.2] [Indexed: 11/13/2022]
|
50
|
Yip KY, Patel P, Kim PM, Engelman DM, McDermott D, Gerstein M. An integrated system for studying residue coevolution in proteins. Bioinformatics 2007; 24:290-2. [PMID: 18056067 DOI: 10.1093/bioinformatics/btm584] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Indexed: 11/14/2022]
Abstract
Residue coevolution has recently emerged as an important concept, especially in the context of protein structures. While a multitude of different functions for quantifying it have been proposed, not much is known about their relative strengths and weaknesses. Also, subtle algorithmic details have discouraged implementing and comparing them. We addressed this issue by developing an integrated online system that enables comparative analyses with a comprehensive set of commonly used scoring functions, including Statistical Coupling Analysis (SCA), Explicit Likelihood of Subset Variation (ELSC), mutual information and correlation-based methods. A set of data preprocessing options is provided for improving the sensitivity and specificity of coevolution signal detection, including sequence weighting, residue grouping and the filtering of sequences, sites and site pairs. A total of more than 100 scoring variations are available. The system also provides facilities for studying the relationship between coevolution scores and inter-residue distances from a crystal structure if provided, which may help in understanding protein structures. AVAILABILITY: The system is available at http://coevolution.gersteinlab.org. The source code and JavaDoc API can also be downloaded from the web site.
Affiliation(s)
- Kevin Y Yip
- Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA
|