1
Oliveira LS, Reyes A, Dutilh BE, Gruber A. Rational Design of Profile HMMs for Sensitive and Specific Sequence Detection with Case Studies Applied to Viruses, Bacteriophages, and Casposons. Viruses 2023; 15:519. [PMID: 36851733 PMCID: PMC9966878 DOI: 10.3390/v15020519] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/29/2022] [Revised: 02/01/2023] [Accepted: 02/09/2023] [Indexed: 02/15/2023]
Abstract
Profile hidden Markov models (HMMs) are a powerful way of modeling biological sequence diversity and constitute a very sensitive approach to detecting divergent sequences. Here, we report the development of protocols for the rational design of profile HMMs. These methods were implemented on TABAJARA, a program that can be used to either detect all biological sequences of a group or discriminate specific groups of sequences. By calculating position-specific information scores along a multiple sequence alignment, TABAJARA automatically identifies the most informative sequence motifs and uses them to construct profile HMMs. As a proof-of-principle, we applied TABAJARA to generate profile HMMs for the detection and classification of two viral groups presenting different evolutionary rates: bacteriophages of the Microviridae family and viruses of the Flavivirus genus. We obtained conserved models for the generic detection of any Microviridae or Flavivirus sequence, and profile HMMs that can specifically discriminate Microviridae subfamilies or Flavivirus species. In another application, we constructed Cas1 endonuclease-derived profile HMMs that can discriminate CRISPRs and casposons, two evolutionarily related transposable elements. We believe that the protocols described here, and implemented on TABAJARA, constitute a generic toolbox for generating profile HMMs for the highly sensitive and specific detection of sequence classes.
Affiliation(s)
- Liliane S. Oliveira
  - Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
- Alejandro Reyes
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO 63108, USA
- Bas E. Dutilh
  - Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich-Schiller-University Jena, 07743 Jena, Germany
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
  - European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Arthur Gruber
  - Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
  - European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
2
Verma S, Chakraborti S, Singh OP, Pande V, Dixit R, Pandey AV, Pandey KC. Recognition of fold- and function-specific sites in the ligand-binding domain of the thyroid hormone receptor-like family. Front Endocrinol (Lausanne) 2022; 13:981090. [PMID: 36246927 PMCID: PMC9559826 DOI: 10.3389/fendo.2022.981090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 09/12/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND The thyroid hormone receptor-like (THR-like) family is the largest transcription factor family belonging to the nuclear receptor superfamily, which directly binds to DNA, regulates gene expression, and thereby controls various metabolic processes in a ligand-dependent manner. The THR-like family contains receptors THRs, RARs, VDR, PPARs, RORs, Rev-erbs, CAR, PXR, LXRs, and others. THR-like receptors are involved in many aspects of human health, including development, metabolism and homeostasis. Therefore, the family is considered an important therapeutic target for various diseases such as osteoporosis, rickets, and diabetes. METHODS In this study, we have performed an extensive sequence and structure analysis of the ligand-binding domain (LBD) of the THR-like family spanning multiple taxa. We have used different computational tools (information-theoretic measures; relative entropy) to predict the key residues responsible for fold and functional specificity in the LBD of the THR-like family. The MSA of THR-like LBDs was further used as input in conservation studies and phylogenetic clustering studies. RESULTS Phylogenetic analysis of the LBD of THR-like proteins resulted in the clustering of eight subfamilies based on their sequence homology. The conservation analysis by relative entropy (RE) revealed that structurally important residues are conserved throughout the LBDs in the THR-like family. The multi-harmony conservation analysis further predicted specificity-determining residues in LBDs of THR-like subfamilies. Finally, fold and functional specificity-determining residues (residues critical for ligand, DBD and coregulator binding) were mapped on the three-dimensional structure of the thyroid hormone receptor protein. We then compiled a list of natural mutations in THR-like LBDs and mapped them along with fold and function-specific mutations. Some of the mutations were found to be linked to severe diseases such as hypothyroidism, rickets, obesity, lipodystrophy, and epilepsy. CONCLUSION Our study identifies fold and function-specific residues in THR-like LBDs. We believe that this study will be useful in exploring the role of these residues in the binding of different drugs and ligands, and in protein-protein interactions among partner proteins. This study might therefore be helpful in the rational design of either ligands or receptors.
Affiliation(s)
- Sonia Verma
  - Parasite-Host Biology Group, ICMR-National Institute of Malaria Research, New Delhi, India
  - Pediatric Endocrinology, Diabetology, and Metabolism, University Children’s Hospital, Bern, Switzerland
  - Translational Hormone Research Cluster, Department of Biomedical Research, University of Bern, Bern, Switzerland
- Om P. Singh
  - Parasite-Host Biology Group, ICMR-National Institute of Malaria Research, New Delhi, India
- Veena Pande
  - Kumaun University, Nainital, Uttrakhand, India
- Rajnikant Dixit
  - Parasite-Host Biology Group, ICMR-National Institute of Malaria Research, New Delhi, India
- Amit V. Pandey
  - Pediatric Endocrinology, Diabetology, and Metabolism, University Children’s Hospital, Bern, Switzerland
  - Translational Hormone Research Cluster, Department of Biomedical Research, University of Bern, Bern, Switzerland
- Kailash C. Pandey
  - Parasite-Host Biology Group, ICMR-National Institute of Malaria Research, New Delhi, India
  - Academy of Scientific and Innovative Research, Ghaziabad, Uttar Pradesh, India
3
Sirota FL, Maurer-Stroh S, Li Z, Eisenhaber F, Eisenhaber B. Functional Classification of Super-Large Families of Enzymes Based on Substrate Binding Pocket Residues for Biocatalysis and Enzyme Engineering Applications. Front Bioeng Biotechnol 2021; 9:701120. [PMID: 34409021 PMCID: PMC8366029 DOI: 10.3389/fbioe.2021.701120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2021] [Accepted: 07/12/2021] [Indexed: 11/13/2022]
Abstract
Large enzyme families such as the groups of zinc-dependent alcohol dehydrogenases (ADHs), long chain alcohol oxidases (AOxs) or amine dehydrogenases (AmDHs) with, sometimes, more than one million sequences in the non-redundant protein database and hundreds of experimentally characterized enzymes are excellent cases for protein engineering efforts aimed at refining and modifying substrate specificity. Yet the downside of this wealth of information is that it becomes technically difficult to rationally select optimal sequence targets as well as sequence positions for mutagenesis studies. In all three cases, we approach the problem by starting with a group of experimentally well-studied family members (including those with available 3D structures) and creating a structure-guided multiple sequence alignment and a modified phylogenetic tree (aka binding site tree) based just on a selection of potential substrate binding residue positions derived from experimental information (not from the full-length sequence alignment). The remaining, mostly uncharacterized enzyme sequences can then be mapped onto this tree; as a trend, sequence grouping in the tree branches follows substrate specificity. We show that this information can be used in the target selection for protein engineering work to narrow down to single suitable sequences and just a few relevant candidate positions for directed evolution towards activity for desired organic compound substrates. We also demonstrate how to find the closest thermophile example in the dataset if the engineering is aimed at achieving the most robust enzymes.
Affiliation(s)
- Fernanda L Sirota
  - Bioinformatics Institute (BII), Agency for Science Technology and Research (ASTAR), Singapore, Singapore
- Sebastian Maurer-Stroh
  - Bioinformatics Institute (BII), Agency for Science Technology and Research (ASTAR), Singapore, Singapore
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Zhi Li
  - Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore, Singapore
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science Technology and Research (ASTAR), Singapore, Singapore
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (ASTAR), Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science Technology and Research (ASTAR), Singapore, Singapore
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (ASTAR), Singapore, Singapore
4
Karakulak T, Rifaioglu AS, Rodrigues JPGLM, Karaca E. Predicting the Specificity-Determining Positions of Receptor Tyrosine Kinase Axl. Front Mol Biosci 2021; 8:658906. [PMID: 34195226 PMCID: PMC8236827 DOI: 10.3389/fmolb.2021.658906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 11/22/2022]
Abstract
Owing to its clinical significance, modulation of functionally relevant amino acids in protein-protein complexes has attracted a great deal of attention. To this end, many approaches have been proposed to predict the partner-selecting amino acid positions in evolutionarily close complexes. These approaches can be grouped into sequence-based machine learning and structure-based energy-driven methods. In this work, we assessed these methods’ ability to map the specificity-determining positions of Axl, a receptor tyrosine kinase involved in cancer progression and immune system diseases. For sequence-based predictions, we used SDPpred, Multi-RELIEF, and Sequence Harmony. For structure-based predictions, we utilized HADDOCK refinement and molecular dynamics simulations. As a result, we observed that (i) sequence-based methods overpredict partner-selecting residues of Axl and that (ii) combining Multi-RELIEF with HADDOCK-based predictions provides the key Axl residues, covered by the extensive molecular dynamics simulations. Expanding on these results, we propose that a sequence-structure-based approach is necessary to determine specificity-determining positions of Axl, which can guide the development of therapeutic molecules to combat Axl misregulation.
Affiliation(s)
- Tülay Karakulak
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ahmet Sureyya Rifaioglu
  - Department of Electrical - Electronics Engineering, İskenderun Technical University, Hatay, Turkey
- João P G L M Rodrigues
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, United States
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
5
Prescher M, Bonus M, Stindt J, Keitel-Anselmino V, Smits SHJ, Gohlke H, Schmitt L. Evidence for a credit-card-swipe mechanism in the human PC floppase ABCB4. Structure 2021; 29:1144-1155.e5. [PMID: 34107287 DOI: 10.1016/j.str.2021.05.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/08/2021] [Revised: 04/27/2021] [Accepted: 05/17/2021] [Indexed: 10/21/2022]
Abstract
ABCB4 is described as an ATP-binding cassette (ABC) transporter that primarily transports lipids of the phosphatidylcholine (PC) family but is also capable of translocating a subset of typical multidrug-resistance-associated drugs. The high degree of amino acid identity of 76% for ABCB4 and ABCB1, which is a prototype multidrug-resistance-mediating protein, results in ABCB4's second subset of substrates, which overlap with ABCB1's substrates. This often leads to incomplete annotations of ABCB4, in which it was described as exclusively PC-lipid specific. When the hydrophilic amino acids from ABCB4 are changed to the analogous but hydrophobic ones from ABCB1, the stimulation of ATPase activity by 1,2-dioleoyl-sn-glycero-3-phosphocholine, as a prime example of PC lipids, is strongly diminished, whereas the modulation capability of ABCB1 substrates remains unchanged. This indicates two distinct and autonomous substrate binding sites in ABCB4.
Affiliation(s)
- Martin Prescher
  - Institute of Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Michele Bonus
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jan Stindt
  - Clinic for Gastroenterology, Hepatology and Infectious Diseases University Hospital Düsseldorf, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Verena Keitel-Anselmino
  - Clinic for Gastroenterology, Hepatology and Infectious Diseases University Hospital Düsseldorf, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Sander H J Smits
  - Institute of Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Center for Structural Studies, Heinrich Heine University Düsseldorf, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Holger Gohlke
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Lutz Schmitt
  - Institute of Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
6
Hou Q, Stringer B, Waury K, Capel H, Haydarlou R, Xue F, Abeln S, Heringa J, Feenstra KA. SeRenDIP-CE: Sequence-based Interface Prediction for Conformational Epitopes. Bioinformatics 2021; 37:3421-3427. [PMID: 33974039 PMCID: PMC8136078 DOI: 10.1093/bioinformatics/btab321] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/19/2020] [Revised: 03/26/2021] [Accepted: 04/26/2021] [Indexed: 11/21/2022]
Abstract
Motivation Antibodies play an important role in clinical research and biotechnology, with their specificity determined by the interaction with the antigen’s epitope region, as a special type of protein–protein interaction (PPI) interface. The ubiquitous availability of sequence data allows us to predict epitopes from sequence in order to focus time-consuming wet-lab experiments toward the most promising epitope regions. Here, we extend our previously developed sequence-based predictors for homodimer and heterodimer PPI interfaces to predict epitope residues that have the potential to bind an antibody. Results We collected and curated a high-quality epitope dataset from the SAbDab database. Our generic PPI heterodimer predictor obtained an AUC-ROC of 0.666 when evaluated on the epitope test set. We then trained a random forest model specifically on the epitope dataset, reaching AUC 0.694. Further training on the combined heterodimer and epitope datasets improves our final predictor to AUC 0.703 on the epitope test set. This is better than the best state-of-the-art sequence-based epitope predictor BepiPred-2.0. On one solved antibody–antigen structure of the COVID19 virus spike receptor binding domain, our predictor reaches AUC 0.778. We added the SeRenDIP-CE Conformational Epitope predictors to our webserver, which is simple to use and only requires a single antigen sequence as input; this will help make the method immediately applicable in a wide range of biomedical and biomolecular research. Availability and implementation Webserver, source code and datasets at www.ibi.vu.nl/programs/serendipwww/. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qingzhen Hou
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250002, P. R. China
  - National institute of health data science of China, Shandong University, Shandong 250002, P. R. China
- Bas Stringer
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Katharina Waury
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Henriette Capel
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Reza Haydarlou
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250002, P. R. China
  - National institute of health data science of China, Shandong University, Shandong 250002, P. R. China
- Sanne Abeln
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Jaap Heringa
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
  - AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam
- K Anton Feenstra
  - IBIVU - Center for Integrative Bioinformatics, Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
  - AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam
7
Rational Design of Profile Hidden Markov Models for Viral Classification and Discovery. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
8
Deep Analysis of Residue Constraints (DARC): identifying determinants of protein functional specificity. Sci Rep 2020; 10:1691. [PMID: 32015389 PMCID: PMC6997377 DOI: 10.1038/s41598-019-55118-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/26/2019] [Accepted: 11/23/2019] [Indexed: 01/03/2023]
Abstract
Protein functional constraints are manifest as superfamily and functional-subgroup conserved residues, and as pairwise correlations. Deep Analysis of Residue Constraints (DARC) aids the visualization of these constraints, characterizes how they correlate with each other and with structure, and estimates statistical significance. This can identify determinants of protein functional specificity, as we illustrate for bacterial DNA clamp loader ATPases. These load ring-shaped sliding clamps onto DNA to keep polymerase attached during replication and contain one δ, three γ, and one δ’ AAA+ subunits semi-circularly arranged in the order δ-γ1-γ2-γ3-δ’. Only γ is active, though both γ and δ’ functionally influence an adjacent γ subunit. DARC identifies, as functionally-congruent features linking allosterically the ATP, DNA, and clamp binding sites: residues distinctive of γ and of γ/δ’ that mutually interact in trans, centered on the catalytic base; several γ/δ’-residues and six γ/δ’-covariant residue pairs within the DNA binding N-termini of helices α2 and α3; and γ/δ’-residues associated with the α2 C-terminus and the clamp-binding loop. Most notable is a trans-acting γ/δ’ hydroxyl group that 99% of other AAA+ proteins lack. Mutation of this hydroxyl to a methyl group impedes clamp binding and opening, DNA binding, and ATP hydrolysis—implying a remarkably clamp-loader-specific function.
9
van der Ree MH, Jansen L, Welkers MRA, Reesink HW, Feenstra KA, Kootstra NA. Deep sequencing identifies hepatitis B virus core protein signatures in chronic hepatitis B patients. Antiviral Res 2018; 158:213-225. [PMID: 30121196 DOI: 10.1016/j.antiviral.2018.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2018] [Revised: 08/10/2018] [Accepted: 08/13/2018] [Indexed: 11/16/2022]
Abstract
BACKGROUND We aimed to identify HBc amino acid differences between subgroups of chronic hepatitis B (CHB) patients. METHODS Deep sequencing of HBc was performed in samples of 89 CHB patients (42 HBeAg positive, 47 HBeAg negative). Amino acid types were compared using Sequence Harmony to identify subgroup-specific sites between HBeAg-positive and -negative patients, and between patients with combined response and non-response to peginterferon/adefovir combination therapy. RESULTS We identified 54 positions in HBc where the frequency of the observed amino acids was significantly different between HBeAg-positive and -negative patients. In HBeAg-negative patients, 22 positions in HBc were identified which differed between patients with treatment response and those with non-response. The fraction of non-consensus sequence at selected positions was significantly higher in HBeAg-negative patients, and was negatively correlated with HBV DNA and HBsAg levels. CONCLUSIONS Sequence Harmony identified a number of amino acid changes associated with HBeAg status and response to peginterferon/adefovir combination therapy.
Affiliation(s)
- Meike H van der Ree
  - Department of Gastroenterology and Hepatology, Academic Medical Center, Amsterdam, The Netherlands
  - Department of Experimental Immunology, Academic Medical Center, Amsterdam, The Netherlands
- Louis Jansen
  - Department of Gastroenterology and Hepatology, Academic Medical Center, Amsterdam, The Netherlands
  - Department of Experimental Immunology, Academic Medical Center, Amsterdam, The Netherlands
- Matthijs R A Welkers
  - Department of Medical Microbiology, Academic Medical Center, Amsterdam, The Netherlands
- Hendrik W Reesink
  - Department of Gastroenterology and Hepatology, Academic Medical Center, Amsterdam, The Netherlands
  - Department of Experimental Immunology, Academic Medical Center, Amsterdam, The Netherlands
- K Anton Feenstra
  - Center for Integrative Bioinformatics VU (IBIVU), Department of Computer Science, Amsterdam Institute for Molecules, Medicine and Systems (AIMMS), VU University Amsterdam, The Netherlands
- Neeltje A Kootstra
  - Department of Experimental Immunology, Academic Medical Center, Amsterdam, The Netherlands
10
Kalaivani R, Reema R, Srinivasan N. Recognition of sites of functional specialisation in all known eukaryotic protein kinase families. PLoS Comput Biol 2018; 14:e1005975. [PMID: 29438395 PMCID: PMC5826538 DOI: 10.1371/journal.pcbi.1005975] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/05/2017] [Revised: 02/26/2018] [Accepted: 01/13/2018] [Indexed: 11/25/2022]
Abstract
The conserved function of protein phosphorylation, catalysed by members of the protein kinase superfamily, is regulated in different ways in different kinase families. Further, differences in activating triggers, cellular localisation, domain architecture and substrate specificity between kinase families are also well known. While the transfer of γ-phosphate from ATP to the hydroxyl group of Ser/Thr/Tyr is mediated by a conserved Asp, the characteristic functional and regulatory sites are specialized at the level of families or sub-families. Such family-specific sites of functional specialization are unknown for most families of kinases. In this work, we systematically identify the family-specific residue features by comparing the extent of conservation of physicochemical properties, Shannon entropy and statistical probability of residue distributions between families of kinases. An integrated discriminatory score, which combines these three features, is developed to demarcate the functionally specialized sites in a kinase family from other sites. We achieved an area under the ROC curve of 0.992 for the discrimination of kinase families. Our approach was extensively tested on the well-studied families CDK and MAPK, wherein specific protein interaction sites and substrate recognition sites were successfully detected (p-value < 0.05). We also find that the known family-specific oncogenic driver mutation sites were scored high by our method. The method was applied to all known kinases encompassing 107 families from diverse eukaryotic organisms, leading to a comprehensive list of family-specific functional sites. Apart from other uses, our method facilitates identification of specific protein interaction sites and drug target sites in a kinase family.
Affiliation(s)
- Raju Kalaivani
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Raju Reema
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
11
Neuwald AF, Aravind L, Altschul SF. Inferring joint sequence-structural determinants of protein functional specificity. eLife 2018; 7. [PMID: 29336305 PMCID: PMC5770160 DOI: 10.7554/elife.29880] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/23/2017] [Accepted: 12/22/2017] [Indexed: 01/05/2023]
Abstract
Residues responsible for allostery, cooperativity, and other subtle but functionally important interactions remain difficult to detect. To aid such detection, we employ statistical inference based on the assumption that residues distinguishing a protein subgroup from evolutionarily divergent subgroups often constitute an interacting functional network. We identify such networks with the aid of two measures of statistical significance. One measure aids identification of divergent subgroups based on distinguishing residue patterns. For each subgroup, a second measure identifies structural interactions involving pattern residues. Such interactions are derived either from atomic coordinates or from Direct Coupling Analysis scores, used as surrogates for structural distances. Applying this approach to N-acetyltransferases, P-loop GTPases, RNA helicases, synaptojanin-superfamily phosphatases and nucleases, and thymine/uracil DNA glycosylases yielded results congruent with biochemical understanding of these proteins, and also revealed striking sequence-structural features overlooked by other methods. These and similar analyses can aid the design of drugs targeting allosteric sites.
Affiliation(s)
- Andrew F Neuwald
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, United States
  - Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, Baltimore, United States
- L Aravind
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
- Stephen F Altschul
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
12
Sánchez-Gracia A, Guirao-Rico S, Hinojosa-Alvarez S, Rozas J. Computational prediction of the phenotypic effects of genetic variants: basic concepts and some application examples in Drosophila nervous system genes. J Neurogenet 2017; 31:307-319. [DOI: 10.1080/01677063.2017.1398241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Alejandro Sánchez-Gracia
  - Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Sara Guirao-Rico
  - Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Bellaterra, Spain
- Silvia Hinojosa-Alvarez
  - Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Julio Rozas
  - Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
13
Indrischek H, Prohaska SJ, Gurevich VV, Gurevich EV, Stadler PF. Uncovering missing pieces: duplication and deletion history of arrestins in deuterostomes. BMC Evol Biol 2017; 17:163. [PMID: 28683816 PMCID: PMC5501109 DOI: 10.1186/s12862-017-1001-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 11/14/2016] [Accepted: 06/19/2017] [Indexed: 12/13/2022]
Abstract
BACKGROUND The cytosolic arrestin proteins mediate desensitization of activated G protein-coupled receptors (GPCRs) via competition with G proteins for the active phosphorylated receptors. Arrestins in the active conformation, including the receptor-bound form, are also transducers of signaling. Therefore, this protein family is an attractive therapeutic target. The signaling outcome is believed to be a result of structural and sequence-dependent interactions of arrestins with GPCRs and other protein partners. Here we elucidated the detailed evolution of arrestins in deuterostomes. RESULTS The identity and number of arrestin paralogs were determined by searching deuterostome genomes and gene expression data. In contrast to standard gene prediction methods, our strategy first detects exons situated on different scaffolds and then solves the problem of assigning them to the correct gene. This increases both the completeness and the accuracy of the annotation in comparison to conventional database search strategies applied by the community. The employed strategy enabled us to map in detail the duplication and deletion history of arrestin paralogs, including tandem duplications, pseudogenizations and the formation of retrogenes. The two rounds of whole genome duplications in the vertebrate stem lineage gave rise to four arrestin paralogs. Surprisingly, visual arrestin ARR3 was lost in the mammalian clades Afrotheria and Xenarthra. Duplications in specific clades, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing. CONCLUSION The current study traces the functional evolution of deuterostome arrestins in unprecedented detail. Based on a precise re-annotation of the exon-intron structure at nucleotide resolution, we infer the gain and loss of paralogs and patterns of conservation, co-variation and selection.
Affiliation(s)
- Henrike Indrischek
  - Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
  - Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
  - Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Sonja J Prohaska
  - Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
  - Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Vsevolod V Gurevich
  - Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
- Eugenia V Gurevich
  - Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
  - Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany
  - Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany
  - Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090, Austria
  - Center for non-coding RNA in Technology and Health, Grønegårdsvej 3, Frederiksberg C, DK-1870, Denmark
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
|
14
|
Medvedev KE, Kolchanov NA, Afonnikov DA. Identification of residues of the archaeal RNA-binding Nip7 proteins specific to environmental conditions. J Bioinform Comput Biol 2017; 15:1650036. [DOI: 10.1142/s0219720016500360] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Understanding the biological and molecular mechanisms that allow cells to survive under extreme temperatures and pressures will help to answer fundamental questions about the origin of life and to design biotechnologically important enzymes with new properties. Here, we analyze amino acid sequences of the Nip7 proteins from 35 archaeal species to identify positions containing mutations specific to the hydrostatic pressure and temperature of the organism's habitat. The number of such positions related to pressure change is much lower than the number related to temperature change. The results suggest that adaptation of the Nip7 protein to temperature changes causes more pronounced modifications in sequence and structure than adaptation to pressure changes. Structural analysis of residues at these positions demonstrated their involvement in salt-bridge formation, which may reflect the importance of protein structure stabilization by salt bridges under extreme environmental conditions.
Affiliation(s)
- Kirill E. Medvedev
  - Department of Biophysics, University of Texas Southwestern, Medical Center, Dallas, Texas 75390, USA
  - Institute of Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
- Nikolay A. Kolchanov
  - Institute of Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
  - NRC Kurchatov Institute, Akademika Kurchatova pl., 1, Moscow 123182, Russia
  - Novosibirsk State University, Pirogova str. 2, Novosibirsk 630090, Russia
- Dmitry A. Afonnikov
  - Institute of Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
  - Novosibirsk State University, Pirogova str. 2, Novosibirsk 630090, Russia
|
15
|
Neuwald AF, Altschul SF. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations. PLoS Comput Biol 2016; 12:e1005294. [PMID: 28002465 PMCID: PMC5225019 DOI: 10.1371/journal.pcbi.1005294] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/27/2016] [Revised: 01/10/2017] [Accepted: 12/08/2016] [Indexed: 11/25/2022] Open
Abstract
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes’ theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
Protein sequence data, when gathered in great quantity, contain important but implicit biological information manifest as statistical correlations. Here we describe an approach to access this information by comprehensively modeling and characterizing the distribution of sequences belonging to a major protein superfamily.
This approach takes as input a large set of unaligned sequences belonging to the superfamily. By applying the minimum description length principle, it seeks the statistical model that best explains the sequences while avoiding over-fitting the data. It concurrently aligns the sequences and, to model evolutionary divergence, partitions them into subgroups that are hierarchically-arranged based upon correlated residue patterns. Auxiliary routines create PyMOL scripts to visualize the locations of correlated residues within available structures. Because these correlations likely arise from structural and biochemical constraints, they can help elucidate protein properties important for functional specificity. Comparing and contrasting sequence and structural features in this way may therefore suggest, in the light of published studies, plausible biological hypotheses for experimental investigation. We illustrate this approach with N-acetyltransferases.
Affiliation(s)
- Andrew F. Neuwald
  - Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, BioPark II, Room 617, Baltimore, MD, United States of America
- Stephen F. Altschul
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America
|
16
|
Sloutsky R, Naegle KM. High-Resolution Identification of Specificity Determining Positions in the LacI Protein Family Using Ensembles of Sub-Sampled Alignments. PLoS One 2016; 11:e0162579. [PMID: 27681038 PMCID: PMC5040260 DOI: 10.1371/journal.pone.0162579] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/22/2016] [Accepted: 08/08/2016] [Indexed: 01/24/2023] Open
Abstract
Since the advent of large-scale genomic sequencing, and the consequent availability of large numbers of homologous protein sequences, there has been burgeoning development of methods for extracting functional information from multiple sequence alignments (MSAs). One type of analysis seeks to identify specificity determining positions (SDPs) based on the assumption that such positions are highly conserved within groups of sequences sharing functional specificity, but conserved to different amino acids in different specificity groups. This unsupervised approach to utilizing evolutionary information may elucidate mechanisms of specificity in protein-protein interactions, catalytic activity of enzymes, sensitivity to allosteric regulation, and other types of protein functionality. We present an analysis of SDPs in the LacI family of transcriptional regulators in which we 1) relax the constraint that all specificity groups must contribute to SDP signal, and 2) use a novel approach to robust treatment of sequence alignment uncertainty based on sub-sampling. We find that the vast majority of SDP signal occurs at positions with a conservation pattern that significantly complicates detection by previously described methods. This pattern, which we term “partial SDP”, consists of the commonly accepted SDP conservation pattern among a subset of specificity groups and strong degeneracy among the rest. An upshot of this fact is that the SDP complement of every specificity group appears to be unique. Additionally, sub-sampling gives us the ability to assign a confidence interval to the SDP score, as well as increase fidelity, as compared to analysis of a single, comprehensive alignment—the current standard in multiple sequence alignment methodologies.
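The classic SDP pattern described in this abstract (conserved within each specificity group, but conserved to different residues between groups) can be illustrated with a minimal Python sketch. This is not the authors' method: it ignores the "partial SDP" pattern and the sub-sampling of alignments, and the function names, the toy alignment and the `max_entropy` threshold are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(residues):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def strict_sdp_columns(alignment, groups, max_entropy=0.0):
    """Return column indices that are conserved within every group
    (entropy <= max_entropy) and whose consensus residue differs
    between all groups. Hypothetical helper, not a published algorithm."""
    n_cols = len(next(iter(alignment.values())))
    hits = []
    for col in range(n_cols):
        consensus = []
        conserved_everywhere = True
        for members in groups.values():
            residues = [alignment[name][col] for name in members]
            if column_entropy(residues) > max_entropy:
                conserved_everywhere = False
                break
            consensus.append(Counter(residues).most_common(1)[0][0])
        if conserved_everywhere and len(set(consensus)) == len(consensus):
            hits.append(col)
    return hits


# Toy alignment (invented sequences): column 1 is K in group A and R in group B.
toy_alignment = {"a1": "MKLA", "a2": "MKLG", "b1": "MRLA", "b2": "MRLS"}
toy_groups = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
print(strict_sdp_columns(toy_alignment, toy_groups))
```

In the toy data, columns 0 and 2 are conserved but identical across groups, and column 3 is variable within each group, so only column 1 matches the strict pattern. Relaxing the requirement that every group contributes, as the abstract describes, would need a different scoring rule.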
Affiliation(s)
- Roman Sloutsky
  - Biomedical Engineering Department, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
  - Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
- Kristen M. Naegle
  - Biomedical Engineering Department, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
  - Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
|
17
|
Karasev DA, Veselovsky AV, Oparina NY, Filimonov DA, Sobolev BN. Prediction of amino acid positions specific for functional groups in a protein family based on local sequence similarity. J Mol Recognit 2015; 29:159-69. [DOI: 10.1002/jmr.2515] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/31/2015] [Revised: 09/28/2015] [Accepted: 09/30/2015] [Indexed: 01/24/2023]
Affiliation(s)
- Dmitry A. Karasev
  - Russian National Research Medical University; Moscow Russia
  - Laboratory of Structure-Function Based Drug Design; Institute of Biomedical Chemistry (IBMC); Moscow Russia
- Alexander V. Veselovsky
  - Laboratory of Structure Bioinformatics; Institute of Biomedical Chemistry (IBMC); Moscow Russia
- Nina Yu. Oparina
  - Department of Medical Biochemistry and Microbiology; Uppsala University; Uppsala Sweden
  - Engelhardt Institute of Molecular Biology; Moscow Russia
- Dmitry A. Filimonov
  - Laboratory of Structure Bioinformatics; Institute of Biomedical Chemistry (IBMC); Moscow Russia
- Boris N. Sobolev
  - Laboratory of Structure-Function Based Drug Design; Institute of Biomedical Chemistry (IBMC); Moscow Russia
|
18
|
Hou Q, Dutilh BE, Huynen MA, Heringa J, Feenstra KA. Sequence specificity between interacting and non-interacting homologs identifies interface residues – a homodimer and monomer use case. BMC Bioinformatics 2015; 16:325. [PMID: 26449222 PMCID: PMC4599308 DOI: 10.1186/s12859-015-0758-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 03/30/2015] [Accepted: 09/30/2015] [Indexed: 11/17/2022] Open
Abstract
Background: Protein families participating in protein-protein interactions may contain sub-families that have different binding characteristics, ranging from tight binding to showing no interaction at all. Composition differences at the sequence level in these sub-families are often decisive to their differential functional interaction. Methods to predict interface sites from protein sequences typically exploit conservation as a signal. Here, instead, we provide proof of concept that the sequence specificity between interacting versus non-interacting groups can be exploited to recognise interaction sites. Results: We collected homodimeric and monomeric proteins and formed homologous groups, each having an interacting (homodimer) subgroup and a non-interacting (monomer) subgroup. We then compiled multiple sequence alignments of the proteins in the homologous groups and identified compositional differences between the homodimeric and monomeric subgroups for each of the alignment positions. Our results show that this specificity signal distinguishes interface and other surface residues with 40.9% recall and up to 25.1% precision. Conclusions: To the best of our knowledge, this is the first large-scale study that exploits sequence specificity between interacting and non-interacting homologs to predict interaction sites from sequence information only. The performance obtained indicates that this signal contains valuable information to identify protein-protein interaction sites.
Affiliation(s)
- Qingzhen Hou
  - Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- Bas E Dutilh
  - Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
  - Department of Marine Biology, Institute of Biology, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Martijn A Huynen
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
- Jaap Heringa
  - Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- K Anton Feenstra
  - Center for Integrative Bioinformatics VU (IBIVU), Vrije University Amsterdam, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
|
19
|
Basis for substrate recognition and distinction by matrix metalloproteinases. Proc Natl Acad Sci U S A 2014; 111:E4148-55. [PMID: 25246591 DOI: 10.1073/pnas.1406134111] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Indexed: 01/12/2023] Open
Abstract
Genomic sequencing and structural genomics produced a vast amount of sequence and structural data, creating an opportunity for structure-function analysis in silico [Radivojac P, et al. (2013) Nat Methods 10(3):221-227]. Unfortunately, only a few large experimental datasets exist to serve as benchmarks for function-related predictions. Furthermore, currently there are no reliable means to predict the extent of functional similarity among proteins. Here, we quantify structure-function relationships among three phylogenetic branches of the matrix metalloproteinase (MMP) family by comparing their cleavage efficiencies toward an extended set of phage peptide substrates that were selected from ∼ 64 million peptide sequences (i.e., a large unbiased representation of substrate space). The observed second-order rate constants [k(obs)] across the substrate space provide a distance measure of functional similarity among the MMPs. These functional distances directly correlate with MMP phylogenetic distance. There is also a remarkable and near-perfect correlation between the MMP substrate preference and sequence identity of 50-57 discontinuous residues surrounding the catalytic groove. We conclude that these residues represent the specificity-determining positions (SDPs) that allowed for the expansion of MMP proteolytic function during evolution. A transmutation of only a few selected SDPs proximal to the bound substrate peptide, and contributing the most to selectivity among the MMPs, is sufficient to enact a global change in the substrate preference of one MMP to that of another, indicating the potential for the rational and focused redesign of cleavage specificity in MMPs.
|
20
|
Chakraborty A, Chakrabarti S. A survey on prediction of specificity-determining sites in proteins. Brief Bioinform 2014; 16:71-88. [DOI: 10.1093/bib/bbt092] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 11/13/2022] Open
|
21
|
Gijsbers EF, Feenstra KA, van Nuenen AC, Navis M, Heringa J, Schuitemaker H, Kootstra NA. HIV-1 replication fitness of HLA-B*57/58:01 CTL escape variants is restored by the accumulation of compensatory mutations in gag. PLoS One 2013; 8:e81235. [PMID: 24339913 PMCID: PMC3855271 DOI: 10.1371/journal.pone.0081235] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 04/29/2013] [Accepted: 10/10/2013] [Indexed: 11/30/2022] Open
Abstract
Expression of HLA-B*57 and the closely related HLA-B*58:01 are associated with prolonged survival after HIV-1 infection. However, large differences in disease course are observed among HLA-B*57/58:01 patients. Escape mutations in CTL epitopes restricted by these HLA alleles come at a fitness cost and particularly the T242N mutation in the TW10 CTL epitope in Gag has been demonstrated to decrease the viral replication capacity. Additional mutations within or flanking this CTL epitope can partially restore replication fitness of CTL escape variants. Five HLA-B*57/58:01 progressors and 5 HLA-B*57/58:01 long-term nonprogressors (LTNPs) were followed longitudinally and we studied which compensatory mutations were involved in the restoration of the viral fitness of variants that escaped from HLA-B*57/58:01-restricted CTL pressure. The Sequence Harmony algorithm was used to detect homology in amino acid composition by comparing longitudinal Gag sequences obtained from HIV-1 patients positive and negative for HLA-B*57/58:01 and from HLA-B*57/58:01 progressors and LTNPs. Although virus isolates from HLA-B*57/58:01 individuals contained multiple CTL escape mutations, these escape mutations were not associated with disease progression. In sequences from HLA-B*57/58:01 progressors, 5 additional mutations in Gag were observed: S126N, L215T, H219Q, M228I and N252H. The combination of these mutations restored the replication fitness of CTL escape HIV-1 variants. Furthermore, we observed a positive correlation between the number of escape and compensatory mutations in Gag and the replication fitness of biological HIV-1 variants isolated from HLA-B*57/58:01 patients, suggesting that the replication fitness of HLA-B*57/58:01 escape variants is restored by accumulation of compensatory mutations.
Affiliation(s)
- Esther F. Gijsbers
  - Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- K. Anton Feenstra
  - Centre for Integrative Bioinformatics (IBIVU) and Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), VU University, Amsterdam, The Netherlands
  - Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
- Ad C. van Nuenen
  - Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Marjon Navis
  - Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa
  - Centre for Integrative Bioinformatics (IBIVU) and Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), VU University, Amsterdam, The Netherlands
  - Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
- Hanneke Schuitemaker
  - Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Neeltje A. Kootstra
  - Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
|
22
|
Joshi A, Lee RTC, Mohl J, Sedano M, Khong WX, Ng OT, Maurer-Stroh S, Garg H. Genetic signatures of HIV-1 envelope-mediated bystander apoptosis. J Biol Chem 2013; 289:2497-514. [PMID: 24265318 DOI: 10.1074/jbc.m113.514018] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022] Open
Abstract
The envelope (Env) glycoprotein of HIV is an important determinant of viral pathogenesis. Several lines of evidence support the role of HIV-1 Env in inducing bystander apoptosis that may be a contributing factor in CD4(+) T cell loss. However, most of the studies testing this phenomenon have been conducted with laboratory-adapted HIV-1 isolates. This raises the question of whether primary Envs derived from HIV-infected patients are capable of inducing bystander apoptosis and whether specific Env signatures are associated with this phenomenon. We developed a high throughput assay to determine the bystander apoptosis inducing activity of a panel of primary Envs. We tested 38 different Envs for bystander apoptosis, virion infectivity, neutralizing antibody sensitivity, and putative N-linked glycosylation sites along with a comprehensive sequence analysis to determine if specific sequence signatures within the viral Env are associated with bystander apoptosis. Our studies show that primary Envs vary considerably in their bystander apoptosis-inducing potential, a phenomenon that correlates inversely with putative N-linked glycosylation sites and positively with virion infectivity. By use of a novel phylogenetic analysis that avoids subtype bias coupled with structural considerations, we found specific residues like Arg-476 and Asn-425 that were associated with differences in bystander apoptosis induction. A specific role of these residues was also confirmed experimentally. These data demonstrate for the first time the potential of primary R5 Envs to mediate bystander apoptosis in CD4(+) T cells. Furthermore, we identify specific genetic signatures within the Env that may be associated with the bystander apoptosis-inducing phenotype.
Affiliation(s)
- Anjali Joshi
  - Center of Excellence for Infectious Diseases, Department of Biomedical Sciences, Texas Tech University Health Sciences Center, El Paso, Texas 79905
|
23
|
van den Kerkhof TLGM, Feenstra KA, Euler Z, van Gils MJ, Rijsdijk LWE, Boeser-Nunnink BD, Heringa J, Schuitemaker H, Sanders RW. HIV-1 envelope glycoprotein signatures that correlate with the development of cross-reactive neutralizing activity. Retrovirology 2013; 10:102. [PMID: 24059682 PMCID: PMC3849187 DOI: 10.1186/1742-4690-10-102] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 05/13/2013] [Accepted: 09/12/2013] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: Current HIV-1 envelope glycoprotein (Env) vaccines are unable to induce cross-reactive neutralizing antibodies. However, such antibodies are elicited in 10-30% of HIV-1 infected individuals, but it is unknown why these antibodies are induced in some individuals and not in others. We hypothesized that the Envs of early HIV-1 variants in individuals who develop cross-reactive neutralizing activity (CrNA) might have unique characteristics that support the induction of CrNA. RESULTS: We retrospectively generated and analyzed env sequences of early HIV-1 clonal variants from 31 individuals with diverse levels of CrNA 2-4 years post-seroconversion. These sequences revealed a number of Env signatures that coincided with CrNA development. These included a statistically shorter variable region 1 and a lower probability of glycosylation as implied by a high ratio of NXS versus NXT glycosylation motifs. Furthermore, lower probability of glycosylation at position 332, which is involved in the epitopes of many broadly reactive neutralizing antibodies, was associated with the induction of CrNA. Finally, Sequence Harmony identified a number of amino acid changes associated with the development of CrNA. These residues mapped to various Env subdomains, but in particular to the first and fourth variable region as well as the underlying α2 helix of the third constant region. CONCLUSIONS: These findings imply that the development of CrNA might depend on specific characteristics of early Env. Env signatures that correlate with the induction of CrNA might be relevant for the design of effective HIV-1 vaccines.
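The NXS versus NXT ratio mentioned in this abstract can be illustrated with a short Python sketch that counts potential N-linked glycosylation sequons (N, then any residue except proline, then S or T) in a protein sequence. It is an illustration only, not the authors' pipeline; the function name and the example sequence are invented for this purpose.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNTS") are all found.
# Pattern: N, any residue except P, then S or T (captured).
SEQUON = re.compile(r"(?=N[^P]([ST]))")


def count_sequons(sequence):
    """Count potential N-linked glycosylation motifs, split into NXS and NXT."""
    counts = {"NXS": 0, "NXT": 0}
    for match in SEQUON.finditer(sequence.upper()):
        counts["NX" + match.group(1)] += 1
    return counts


# Invented example sequence, not taken from any Env isolate.
counts = count_sequons("MNGSANPTNNTS")
print(counts)
if counts["NXT"]:
    print("NXS/NXT ratio:", counts["NXS"] / counts["NXT"])
```

The sketch only finds sequence motifs; whether a given sequon is actually glycosylated depends on context that a pattern match cannot capture.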
Affiliation(s)
- Tom L G M van den Kerkhof
  - Department of Experimental Immunology and Landsteiner Laboratory, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- K Anton Feenstra
  - Center for Integrative Bioinformatics VU (IBIVU) and Amsterdam Institute for Molecules, Medicine and Systems (AIMMS), VU University Amsterdam, 1081 HV Amsterdam, the Netherlands
  - Netherlands Bioinformatics Center (NBIC), 6525 GA Nijmegen, the Netherlands
- Zelda Euler
  - Department of Experimental Immunology and Landsteiner Laboratory, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Marit J van Gils
  - Department of Experimental Immunology and Landsteiner Laboratory, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
  - Department of Medical Microbiology, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Linda W E Rijsdijk
  - Center for Integrative Bioinformatics VU (IBIVU) and Amsterdam Institute for Molecules, Medicine and Systems (AIMMS), VU University Amsterdam, 1081 HV Amsterdam, the Netherlands
- Brigitte D Boeser-Nunnink
  - Department of Experimental Immunology and Landsteiner Laboratory, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Jaap Heringa
  - Center for Integrative Bioinformatics VU (IBIVU) and Amsterdam Institute for Molecules, Medicine and Systems (AIMMS), VU University Amsterdam, 1081 HV Amsterdam, the Netherlands
  - Netherlands Bioinformatics Center (NBIC), 6525 GA Nijmegen, the Netherlands
  - Department of Medical Microbiology, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
- Hanneke Schuitemaker
  - Department of Experimental Immunology and Landsteiner Laboratory, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
  - Crucell Holland BV, 2333 CN Leiden, the Netherlands
- Rogier W Sanders
  - Department of Medical Microbiology, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, the Netherlands
  - Department of Microbiology and Immunology, Weill Medical College, Cornell University, New York, NY 10065 USA
|
24
|
Suplatov D, Shalaeva D, Kirilin E, Arzhanik V, Švedas V. Bioinformatic analysis of protein families for identification of variable amino acid residues responsible for functional diversity. J Biomol Struct Dyn 2013; 32:75-87. [DOI: 10.1080/07391102.2012.750249] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 10/27/2022]
|
25
|
Membrane-integral pyrophosphatase subfamily capable of translocating both Na+ and H+. Proc Natl Acad Sci U S A 2013; 110:1255-60. [PMID: 23297210 DOI: 10.1073/pnas.1217816110] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 11/18/2022] Open
Abstract
One of the strategies used by organisms to adapt to life under conditions of short energy supply is to use the by-product pyrophosphate to support cation gradients in membranes. Transport reactions are catalyzed by membrane-integral pyrophosphatases (PPases), which are classified into two homologous subfamilies: H(+)-transporting (found in prokaryotes, protists, and plants) and Na(+)-transporting (found in prokaryotes). Transport activities have been believed to require specific machinery for each ion, in accordance with the prevailing paradigm in membrane transport. However, experiments using a fluorescent pH probe and (22)Na(+) measurements in the current study revealed that five bacterial PPases expressed in Escherichia coli have the ability to simultaneously translocate H(+) and Na(+) into inverted membrane vesicles under physiological conditions. Consistent with data from phylogenetic analyses, our results support the existence of a third, dual-specificity bacterial Na(+),H(+)-PPase subfamily, which apparently evolved from Na(+)-PPases. Interestingly, genes for Na(+),H(+)-PPase have been found in the major microbes colonizing the human gastrointestinal tract. The Na(+),H(+)-PPases require Na(+) for hydrolytic and transport activities and are further activated by K(+). Based on ionophore effects, we conclude that the Na(+) and H(+) transport reactions are electrogenic and do not result from secondary antiport effects. Sequence comparisons further disclosed four Na(+),H(+)-PPase signature residues located outside the ion conductance channel identified earlier in PPases using X-ray crystallography. Our results collectively support the emerging paradigm that both Na(+) and H(+) can be transported via the same mechanism, with switching between Na(+) and H(+) specificities requiring only subtle changes in the transporter structure.
|
26
|
Baussand J, Kleinjung J. Specific Conformational States of Ras GTPase upon Effector Binding. J Chem Theory Comput 2012; 9:738-749. [PMID: 23316125 PMCID: PMC3541755 DOI: 10.1021/ct3007265] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 08/17/2012] [Indexed: 12/31/2022]
Abstract
To uncover the structural and dynamical determinants involved in the highly specific binding of Ras GTPase to its effectors, the conformational states of Ras in uncomplexed form and complexed to the downstream effectors Byr2, PI3Kγ, PLCε, and RalGDS were investigated using molecular dynamics and cross-comparison of the trajectories. The subtle changes in the dynamics and conformations of Ras upon effector binding require an analysis that targets local changes independent of global motions. Using a structural alphabet, a computational procedure is proposed to quantify local conformational changes. Positions detected by this approach were characterized as either specific for a particular effector, specific for an effector domain type, or as effector unspecific. A set of nine structurally connected residues (Ras residues 5–8, 32–35, 39–42, 55–59, 73–78, and 161–165), which link the effector binding site to the distant C-terminus, changed dynamics upon effector binding, indicating a potential effector-unspecific signaling route within the Ras structure. Additional conformational changes were detected along the N-terminus of the central β-sheet. Besides the Ras residues at the effector interface (e.g., D33, E37, D38, and Y40), which adopt effector-specific local conformations, the binding signal propagates from the interface to distant hot-spot residues, in particular to Y5 and D57. The results of this study reveal possible conformational mechanisms for the stabilization of the active state of Ras upon downstream effector binding and for the structural determinants responsible for effector specificity.
Affiliation(s)
- Julie Baussand
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom
|
27
|
Teppa E, Wilkins AD, Nielsen M, Buslje CM. Disentangling evolutionary signals: conservation, specificity determining positions and coevolution. Implication for catalytic residue prediction. BMC Bioinformatics 2012; 13:235. [PMID: 22978315 PMCID: PMC3515339 DOI: 10.1186/1471-2105-13-235] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 05/08/2012] [Accepted: 09/05/2012] [Indexed: 11/11/2022] Open
Abstract
Background A large panel of methods exists that aim to identify residues with critical impact on protein function based on evolutionary signals, sequence and structure information. However, it is not clear to what extent these different methods overlap, and if any of the methods have higher predictive potential compared to others when it comes to, in particular, the identification of catalytic residues (CR) in proteins. Using a large set of enzymatic protein families and measures based on different evolutionary signals, we sought to break up the different components of the information content within a multiple sequence alignment to investigate their predictive potential and degree of overlap. Results Our results demonstrate that the different methods included in the benchmark in general can be divided into three groups with a limited mutual overlap. One group containing real-value Evolutionary Trace (rvET) methods and conservation, another containing mutual information (MI) methods, and the last containing methods designed explicitly for the identification of specificity determining positions (SDPs): integer-value Evolutionary Trace (ivET), SDPfox, and XDET. In terms of prediction of CR, we find using a proximity score integrating structural information (as the sum of the scores of residues located within a given distance of the residue in question) that only the methods from the first two groups displayed a reliable performance. Next, we investigated to what degree proximity scores for conservation, rvET and cumulative MI (cMI) provide complementary information capable of improving the performance for CR identification. We found that integrating conservation with proximity scores for rvET and cMI achieved the highest performance. The proximity conservation score contained no complementary information when integrated with proximity rvET. 
Moreover, the signal from rvET provided only a limited gain in predictive performance when integrated with mutual information and conservation proximity scores. Combined, these observations demonstrate that the rvET and cMI scores add complementary information to the prediction system. Conclusions This work contributes to the understanding of the different signals of evolution and also shows that it is possible to improve the detection of catalytic residues by integrating structural and higher order sequence evolutionary information with sequence conservation.
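The proximity score described in this abstract (the sum of the scores of residues located within a given distance of the residue in question) can be illustrated with a minimal Python sketch. The function name, the 3D coordinate input, the default cutoff value, and the choice to count the residue itself (distance zero) are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def proximity_scores(coords, scores, cutoff=7.5):
    """Sum, for each residue, the scores of all residues within `cutoff`.

    coords: list of (x, y, z) tuples, one per residue (hypothetical input).
    scores: per-residue scores (e.g. conservation), same order as coords.
    cutoff: distance threshold; the default here is an arbitrary placeholder.
    Assumption: the residue itself is included in its own sum.
    """
    result = []
    for centre in coords:
        total = 0.0
        for other, score in zip(coords, scores):
            # math.dist gives the Euclidean distance (Python 3.8+)
            if math.dist(centre, other) <= cutoff:
                total += score
        result.append(total)
    return result
```

With three residues on a line at x = 0, 3, and 10 and a cutoff of 5, the first two residues share each other's scores while the third keeps only its own.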
Affiliation(s)
- Elin Teppa
- Fundación Instituto Leloir, Avda. Patricias Argentinas 435, CABA, C1405BWE, Argentina
|
28
|
Neuwald AF, Lanczycki CJ, Marchler-Bauer A. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures. BMC Bioinformatics 2012; 13:144. [PMID: 22726767 PMCID: PMC3599474 DOI: 10.1186/1471-2105-13-144] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 02/06/2012] [Accepted: 06/09/2012] [Indexed: 11/17/2022] Open
Abstract
Background The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Results Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related—a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. Conclusions This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. 
At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups.
Affiliation(s)
- Andrew F Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, BioPark II, Room 617, 801 West Baltimore St, Baltimore, MD 21201, USA.
|
29
|
Chakraborty A, Mandloi S, Lanczycki CJ, Panchenko AR, Chakrabarti S. SPEER-SERVER: a web server for prediction of protein specificity determining sites. Nucleic Acids Res 2012; 40:W242-8. [PMID: 22689646 PMCID: PMC3394334 DOI: 10.1093/nar/gks559] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022] Open
Abstract
Sites that show specific conservation patterns within subsets of proteins in a protein family are likely to be involved in the development of functional specificity. These sites, generally termed specificity determining sites (SDS), might play a crucial role in binding to a specific substrate or proteins. Identification of SDS through experimental techniques is a slow, difficult and tedious job. Hence, it is very important to develop efficient computational methods that can more expediently identify SDS. Herein, we present Specificity prediction using amino acids’ Properties, Entropy and Evolution Rate (SPEER)-SERVER, a web server that predicts SDS by analyzing quantitative measures of the conservation patterns of protein sites based on their physico-chemical properties and the heterogeneity of evolutionary changes between and within the protein subfamilies. This web server provides an improved representation of results, adds useful input and output options and integrates a wide range of analysis and data visualization tools when compared with the original standalone version of the SPEER algorithm. Extensive benchmarking finds that SPEER-SERVER exhibits sensitivity and precision performance that, on average, meets or exceeds that of other currently available methods. SPEER-SERVER is available at http://www.hpppi.iicb.res.in/ss/.
Affiliation(s)
- Abhijit Chakraborty
- Structural Biology and Bioinformatics Division, Council for Scientific and Industrial Research (CSIR)-Indian Institute of Chemical Biology (IICB), Kolkata, West Bengal 700032, India
|
30
|
Determinants, discriminants, conserved residues--a heuristic approach to detection of functional divergence in protein families. PLoS One 2011; 6:e24382. [PMID: 21931701 PMCID: PMC3171465 DOI: 10.1371/journal.pone.0024382] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 04/15/2011] [Accepted: 08/08/2011] [Indexed: 11/19/2022] Open
Abstract
In this work, belonging to the field of comparative analysis of protein sequences, we focus on detection of functional specialization on the residue level. As the input, we take a set of sequences divided into groups of orthologues, each group known to be responsible for a different function. This provides two independent pieces of information: within-group conservation and overlap in amino acid type across groups. We build our discussion around the set of scoring functions that keep the two separated and the signal easy to trace back to its source. We propose a heuristic description of functional divergence that includes residue type exchangeability, both in the conservation and in the overlap measure, and does not make any assumptions on the rate of evolution in the groups other than the one under consideration. Residue types acceptable at a certain position within an orthologous group are described as a distribution which evolves in time, starting from a single ancestral type, and is subject to constraints that can be inferred only indirectly. To estimate the strength of the constraints, we compare the observed degrees of conservation and overlap with those expected in the hypothetical case of a freely evolving distribution. Our description matches the experiment well, but we also conclude that any attempt to capture the evolutionary behavior of specificity determining residues in terms of a scalar function will be tentative, because no single model can cover the variety of evolutionary behavior such residues exhibit. Especially, models expecting the same type of evolutionary behavior across functionally divergent groups tend to miss a portion of information otherwise retrievable by the conservation and overlap measures they use.
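The two signals named in this abstract, conservation within a group and overlap in residue types across groups, can be kept separate in a very small Python sketch. The scoring functions below (frequency of the most common residue, and Jaccard overlap of residue sets) are simplified stand-ins chosen for illustration; they ignore the residue exchangeability and evolutionary modelling the authors describe, and all names are hypothetical.

```python
from collections import Counter


def column(group, pos):
    """Residues found at alignment position `pos` in one group of sequences."""
    return [seq[pos] for seq in group]


def conservation(group, pos):
    """Fraction of sequences in the group carrying the most common residue."""
    counts = Counter(column(group, pos))
    return counts.most_common(1)[0][1] / len(group)


def overlap(group_a, group_b, pos):
    """Jaccard overlap of the residue types seen in the two groups."""
    a = set(column(group_a, pos))
    b = set(column(group_b, pos))
    return len(a & b) / len(a | b)
```

Under these simplified measures, a position that is fully conserved within each group but shows zero overlap between groups would be a candidate specificity determinant, whereas full conservation with full overlap points to a residue conserved across the whole family.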
|
31
|
Gaston D, Susko E, Roger AJ. A phylogenetic mixture model for the identification of functionally divergent protein residues. Bioinformatics 2011; 27:2655-63. [PMID: 21840876 DOI: 10.1093/bioinformatics/btr470] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples. RESULTS We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions. AVAILABILITY http://rogerlab.biochem.dal.ca/Software CONTACT andrew.roger@dal.ca SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniel Gaston
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Canada, B3H 1X5
|
32
|
Yip KY, Utz L, Sitwell S, Hu X, Sidhu SS, Turk BE, Gerstein M, Kim PM. Identification of specificity determining residues in peptide recognition domains using an information theoretic approach applied to large-scale binding maps. BMC Biol 2011; 9:53. [PMID: 21835011 PMCID: PMC3224579 DOI: 10.1186/1741-7007-9-53] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 07/26/2011] [Accepted: 08/11/2011] [Indexed: 01/06/2023] Open
Abstract
Background Peptide Recognition Domains (PRDs) are commonly found in signaling proteins. They mediate protein-protein interactions by recognizing and binding short motifs in their ligands. Although a great deal is known about PRDs and their interactions, prediction of PRD specificities remains largely an unsolved problem. Results We present a novel approach to identifying these Specificity Determining Residues (SDRs). Our algorithm generalizes earlier information theoretic approaches to coevolution analysis, to become applicable to this problem. It leverages the growing wealth of binding data between PRDs and large numbers of random peptides, and searches for PRD residues that exhibit strong evolutionary covariation with some positions of the statistical profiles of bound peptides. The calculations involve only information from sequences, and thus can be applied to PRDs without crystal structures. We applied the approach to PDZ, SH3 and kinase domains, and evaluated the results using both residue proximity in co-crystal structures and verified binding specificity maps from mutagenesis studies. Discussion Our predictions were found to be strongly correlated with the physical proximity of residues, demonstrating the ability of our approach to detect physical interactions of the binding partners. Some high-scoring pairs were further confirmed to affect binding specificity using previous experimental results. Combining the covariation results also allowed us to predict binding profiles with higher reliability than two other methods that do not explicitly take residue covariation into account. Conclusions The general applicability of our approach to the three different domain families demonstrated in this paper suggests its potential in predicting binding targets and assisting the exploration of binding mechanisms.
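Information-theoretic covariation measures of the kind this abstract generalizes are commonly built on mutual information between two aligned positions. The Python sketch below computes plain mutual information (in bits) between two columns of symbols. It is a generic textbook formulation given for orientation only, not the authors' algorithm, which relates domain residues to statistical profiles of bound peptides; the function name and inputs are assumptions.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information in bits between two equally long symbol columns."""
    n = len(col_x)
    px = Counter(col_x)                 # marginal counts for column X
    py = Counter(col_y)                 # marginal counts for column Y
    pxy = Counter(zip(col_x, col_y))    # joint counts over paired symbols
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        p_indep = (px[x] / n) * (py[y] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi
```

Two columns that vary in lockstep, such as `"AACC"` and `"GGTT"`, carry one bit of mutual information, while columns that vary independently, such as `"AACC"` and `"GTGT"`, carry none.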
Affiliation(s)
- Kevin Y Yip
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
|
33
|
Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms. Stat Appl Genet Mol Biol 2011; 10:Article 36. [PMID: 22331370 DOI: 10.2202/1544-6115.1666] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
Certain residues have no known function yet are co-conserved across distantly related protein families and diverse organisms, suggesting that they perform critical roles associated with as-yet-unidentified molecular properties and mechanisms. This raises the question of how to obtain additional clues regarding these mysterious biochemical phenomena with a view to formulating experimentally testable hypotheses. One approach is to access the implicit biochemical information encoded within the vast amount of genomic sequence data now becoming available. Here, a new Gibbs sampling strategy is formulated and implemented that can partition hundreds of thousands of sequences within a major protein class into multiple, functionally-divergent categories based on those pattern residues that best discriminate between categories. The sampler precisely defines the partition and pattern for each category by explicitly modeling unrelated, non-functional and related-yet-divergent proteins that would otherwise obscure the analysis. To aid biological interpretation, auxiliary routines can characterize pattern residues within available crystal structures and identify those structures most likely to shed light on the roles of pattern residues. This approach can be used to define and annotate automatically subgroup-specific conserved domain profiles based on statistically-rigorous empirical criteria rather than on the subjective and labor-intensive process of manual curation. Incorporating such profiles into domain database search sites (such as the NCBI BLAST site) will provide biologists with previously inaccessible molecular information useful for hypothesis generation and experimental design. Analyses of P-loop GTPases and of AAA+ ATPases illustrate the sampler's ability to obtain such information.
|
34
|
Luoto HH, Belogurov GA, Baykov AA, Lahti R, Malinen AM. Na+-translocating membrane pyrophosphatases are widespread in the microbial world and evolutionarily precede H+-translocating pyrophosphatases. J Biol Chem 2011; 286:21633-42. [PMID: 21527638 DOI: 10.1074/jbc.m111.244483] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 11/06/2022] Open
Abstract
Membrane pyrophosphatases (PPases), divided into K(+)-dependent and K(+)-independent subfamilies, were believed to pump H(+) across cell membranes until a recent demonstration that some K(+)-dependent PPases function as Na(+) pumps. Here, we have expressed seven evolutionarily important putative PPases in Escherichia coli and estimated their hydrolytic, Na(+) transport, and H(+) transport activities as well as their K(+) and Na(+) requirements in inner membrane vesicles. Four of these enzymes (from Anaerostipes caccae, Chlorobium limicola, Clostridium tetani, and Desulfuromonas acetoxidans) were identified as K(+)-dependent Na(+) transporters. Phylogenetic analysis led to the identification of a monophyletic clade comprising characterized and predicted Na(+)-transporting PPases (Na(+)-PPases) within the K(+)-dependent subfamily. H(+)-transporting PPases (H(+)-PPases) are more heterogeneous and form at least three independent clades in both subfamilies. These results suggest that rather than being a curious rarity, Na(+)-PPases predominantly constitute the K(+)-dependent subfamily. Furthermore, Na(+)-PPases possibly preceded H(+)-PPases in evolution, and transition from Na(+) to H(+) transport may have occurred in several independent enzyme lineages. Site-directed mutagenesis studies facilitated the identification of a specific Glu residue that appears to be central in the transport mechanism. This residue is located in the cytoplasm-membrane interface of transmembrane helix 6 in Na(+)-PPases but shifted to within the membrane or helix 5 in H(+)-PPases. These results contribute to the prediction of the transport specificity and K(+) dependence for a particular membrane PPase sequence based on its position in the phylogenetic tree, identity of residues in the K(+) dependence signature, and position of the membrane-located Glu residue.
Affiliation(s)
- Heidi H Luoto
- Department of Biochemistry and Food Chemistry, University of Turku, FIN-20014 Turku, Finland
|
35
|
Martin-Galiano AJ, Oliva MA, Sanz L, Bhattacharyya A, Serna M, Yebenes H, Valpuesta JM, Andreu JM. Bacterial tubulin distinct loop sequences and primitive assembly properties support its origin from a eukaryotic tubulin ancestor. J Biol Chem 2011; 286:19789-803. [PMID: 21467045 DOI: 10.1074/jbc.m111.230094] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/06/2022] Open
Abstract
The structure of the unique bacterial tubulin BtubA/B from Prosthecobacter is very similar to eukaryotic αβ-tubulin but, strikingly, BtubA/B fold without eukaryotic chaperones. Our sequence comparisons indicate that BtubA and BtubB do not really correspond to either α- or β-tubulin but have mosaic sequences with intertwining features from both. Their nucleotide-binding loops are more conserved, and their more divergent sequences correspond to discrete surface zones of tubulin involved in microtubule assembly and binding to eukaryotic cytosolic chaperonin, which is absent from the Prosthecobacter dejongeii draft genome. BtubA/B cooperatively assembles over a wider range of conditions than αβ-tubulin, forming pairs of protofilaments that coalesce into bundles instead of microtubules, and it lacks the ability to differentially interact with divalent cations and bind typical tubulin drugs. Assembled BtubA/B contain close to one bound GTP and GDP. Both BtubA and BtubB subunits hydrolyze GTP, leading to disassembly. The mutant BtubA/B-S144G in the tubulin signature motif GGG(T/S)G(S/T)G has strongly inhibited GTPase, but BtubA-T147G/B does not, suggesting that BtubB is a more active GTPase, like β-tubulin. BtubA/B chimera bearing the β-tubulin loops M, H1-S2, and S9-S10 in BtubB fold, assemble, and have reduced GTPase activity. However, introduction of the α-tubulin loop S9-S10 with its unique eight-residue insertion impaired folding. From the sequence analyses, its primitive assembly features, and the properties of the chimeras, we propose that BtubA/B were acquired shortly after duplication of a spontaneously folding α- and β-tubulin ancestor, possibly by horizontal gene transfer from a primitive eukaryotic cell, followed by divergent evolution.
Affiliation(s)
- Antonio J Martin-Galiano
- Centro de Investigaciones Biológicas, Consejo Superior de Investigaciones Científicas, Madrid, Spain
|
36
|
Mazin PV, Gelfand MS, Mironov AA, Rakhmaninova AB, Rubinov AR, Russell RB, Kalinina OV. An automated stochastic approach to the identification of the protein specificity determinants and functional subfamilies. Algorithms Mol Biol 2010; 5:29. [PMID: 20633297 PMCID: PMC2914642 DOI: 10.1186/1748-7188-5-29] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 11/30/2009] [Accepted: 07/15/2010] [Indexed: 11/30/2022] Open
Abstract
Background Recent progress in sequencing and 3D structure determination techniques stimulated development of approaches aimed at more precise annotation of proteins, that is, prediction of exact specificity to a ligand or, more broadly, to a binding partner of any kind. Results We present a method, SDPclust, for identification of protein functional subfamilies coupled with prediction of specificity-determining positions (SDPs). SDPclust predicts specificity in a phylogeny-independent stochastic manner, which allows for the correct identification of the specificity for proteins that are separated on a phylogenetic tree, but still bind the same ligand. SDPclust is implemented as a Web-server http://bioinf.fbb.msu.ru/SDPfoxWeb/ and a stand-alone Java application available from the website. Conclusions SDPclust performs a simultaneous identification of specificity determinants and specificity groups in a statistically robust and phylogeny-independent manner.
|
37
|
Brandt BW, Feenstra KA, Heringa J. Multi-Harmony: detecting functional specificity from sequence alignment. Nucleic Acids Res 2010; 38:W35-40. [PMID: 20525785 PMCID: PMC2896201 DOI: 10.1093/nar/gkq415] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022] Open
Abstract
Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help find targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww.
Affiliation(s)
- Bernd W Brandt
- Centre for Integrative Bioinformatics, VU University Amsterdam, De Boelelaan 1081A, 1081HV Amsterdam, The Netherlands
|
38
|
Georgi B, Schultz J, Schliep A. Partially-supervised protein subclass discovery with simultaneous annotation of functional residues. BMC Struct Biol 2009; 9:68. [PMID: 19857261 PMCID: PMC2777906 DOI: 10.1186/1472-6807-9-68] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 04/03/2009] [Accepted: 10/26/2009] [Indexed: 03/20/2023]
Abstract
BACKGROUND The study of functional subfamilies of protein domain families and the identification of the residues which determine substrate specificity is an important question in the analysis of protein domains. One way to address this question is the use of clustering methods for protein sequence data and approaches to predict functional residues based on such clusterings. The locations of putative functional residues in known protein structures provide insights into how different substrate specificities are reflected on the protein structure level. RESULTS We have developed an extension of the context-specific independence mixture model clustering framework which allows for the integration of experimental data. As these are usually known only for a few proteins, our algorithm implements a partially-supervised learning approach. We discover domain subfamilies and predict functional residues for four protein domain families: phosphatases, pyridoxal dependent decarboxylases, WW and SH3 domains to demonstrate the usefulness of our approach. CONCLUSION The partially-supervised clustering revealed biologically meaningful subfamilies even for highly heterogeneous domains and the predicted functional residues provide insights into the basis of the different substrate specificities.
Affiliation(s)
- Benjamin Georgi
- Max Planck Institute for Molecular Genetics, Dept. of Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin, Germany.
|
39
|
Goldstein P, Zucko J, Vujaklija D, Krisko A, Hranueli D, Long PF, Etchebest C, Basrak B, Cullum J. Clustering of protein domains for functional and evolutionary studies. BMC Bioinformatics 2009; 10:335. [PMID: 19832975 PMCID: PMC2770074 DOI: 10.1186/1471-2105-10-335] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/14/2009] [Accepted: 10/15/2009] [Indexed: 11/16/2022] Open
Abstract
Background The number of protein family members defined by DNA sequencing is usually much larger than those characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering. Results An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families. Conclusion The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
Affiliation(s)
- Pavle Goldstein
- Department of Genetics, University of Kaiserslautern, Postfach 3049, 67653 Kaiserslautern, Germany
|
40
|
Chakrabarti S, Panchenko AR. Ensemble approach to predict specificity determinants: benchmarking and validation. BMC Bioinformatics 2009; 10:207. [PMID: 19573245 PMCID: PMC2716344 DOI: 10.1186/1471-2105-10-207] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 02/02/2009] [Accepted: 07/02/2009] [Indexed: 11/29/2022]
Abstract
Background: It is extremely important and challenging to identify the sites that are responsible for functional specification or diversification in protein families. In this study, a rigorous comparative benchmarking protocol was employed to provide a reliable evaluation of methods that predict specificity determining sites. Subsequently, the three best-performing methods were applied to identify new potential specificity determining sites through an ensemble approach and common agreement of their prediction results. Results: It was shown that the analysis of structural characteristics of predicted specificity determining sites might provide the means to validate their prediction accuracy. For example, we found that for smaller distances it holds true that the more reliable the prediction method is, the closer predicted specificity determining sites are to each other and to the ligand. Conclusion: We observed certain similarities of structural features between predicted and actual subsites, which might point to their functional relevance. We speculate that the majority of the identified potential specificity determining sites might be indirectly involved in specific interactions and could be ideal targets for mutagenesis experiments.
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.
|
41
|
SDPhound, a Mutual Information-Based Method to Investigate Specificity-Determining Positions. Algorithms 2009. [DOI: 10.3390/a2020764] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/17/2023]
|
42
|
Abstract
Covariation between sites can arise due to a common evolutionary history. At the same time, the structure and function of proteins play a significant role in the evolvability of different sites that are not directly connected with the common ancestry. The nature of the forces that cause residues to coevolve is still not thoroughly understood; it is especially unclear how coevolutionary processes are related to functional diversification within protein families. We analyzed both functional and structural factors that might cause covariation of specificity determinants and showed that they more often participate in coevolutionary relationships with each other and other sites compared with functional sites and those sites that are not under strong functional constraints. We also found that protein sites with a higher number of coevolutionary connections with other sites have a tendency to evolve more slowly. Our results indicate that in some cases coevolutionary connections exist between specificity sites that are located far away in space but are under similar functional constraints. Such correlated changes and compensations can be realized through stepwise coevolutionary processes, which in turn can shed light on the mechanisms of functional diversification.
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
|
43
|
Donald JE, Shakhnovich EI. SDR: a database of predicted specificity-determining residues in proteins. Nucleic Acids Res 2008; 37:D191-4. [PMID: 18927118 PMCID: PMC2686543 DOI: 10.1093/nar/gkn716] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 01/17/2023]
Abstract
The specificity-determining residue database (SDR database) presents residue positions where mutations are predicted to have changed protein function in large protein families. Because the database pre-calculates predictions on existing protein sequence alignments, users can quickly find the predictions by selecting the appropriate protein family or searching by protein sequence. Predictions can be used to guide mutagenesis or to gain a better understanding of specificity changes in a protein family. The database is available on the web at http://paradox.harvard.edu/sdr.
Affiliation(s)
- Jason E Donald
- Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA, USA.
|
44
|
Sankararaman S, Sjölander K. INTREPID--INformation-theoretic TREe traversal for Protein functional site IDentification. Bioinformatics 2008; 24:2445-52. [PMID: 18776193 PMCID: PMC2572704 DOI: 10.1093/bioinformatics/btn474] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Indexed: 11/13/2022]
Abstract
Motivation: Identification of functionally important residues in proteins plays a significant role in biological discovery. Here, we present INTREPID—an information–theoretic approach for functional site identification that exploits the information in large diverse multiple sequence alignments (MSAs). INTREPID uses a traversal of the phylogeny in combination with a positional conservation score, based on Jensen–Shannon divergence, to rank positions in an MSA. While knowledge of protein 3D structure can significantly improve the accuracy of functional site identification, since structural information is not available for a majority of proteins, INTREPID relies solely on sequence information. We evaluated INTREPID on two tasks: predicting catalytic residues and predicting specificity determinants. Results: In catalytic residue prediction, INTREPID provides significant improvements over Evolutionary Trace, ConSurf as well as over a baseline global conservation method on a set of 100 manually curated enzymes from the Catalytic Site Atlas. In particular, INTREPID is able to better predict catalytic positions that are not globally conserved and hence, attains improved sensitivity at high values of specificity. We also investigated the performance of INTREPID as a function of the evolutionary divergence of the protein family. We found that INTREPID is better able to exploit the diversity in such families and that accuracy improves when homologs with very low sequence identity are included in an alignment. In specificity determinant prediction, when subtype information is known, INTREPID-SPEC, a variant of INTREPID, attains accuracies that are competitive with other approaches for this task. Availability: INTREPID is available for 16919 families in the PhyloFacts resource (http://phylogenomics.berkeley.edu/phylofacts). 
Contact: sriram_s@cs.berkeley.edu. Supplementary information: Relevant online supplementary material is available at http://phylogenomics.berkeley.edu/INTREPID.
Affiliation(s)
- Sriram Sankararaman
- Department of Electrical Engineering & Computer Science and Department of Bioengineering, University of California, Berkeley, USA.
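The INTREPID abstract above mentions a positional conservation score based on Jensen–Shannon divergence. As a rough illustration of that kind of score only, the sketch below computes the divergence between the residue distribution of one alignment column and a background distribution. It is not the INTREPID implementation: the function names, the uniform background, the pseudocount value, and the choice to ignore gap characters are all assumptions made for this example, and the tree traversal described in the abstract is not shown.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1e-6):
    """Residue frequencies for one MSA column (gaps and unknowns are ignored).

    The pseudocount is an illustrative assumption that keeps every
    probability strictly positive.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts.get(a, 0) + pseudocount) / total for a in AMINO_ACIDS]


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (between 0 and 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def column_score(column, background=None):
    """Score a column against a background (uniform here, by assumption)."""
    if background is None:
        background = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    return js_divergence(column_distribution(column), background)


if __name__ == "__main__":
    # A fully conserved column scores higher than a maximally mixed one.
    print(column_score("AAAAAAAA"))
    print(column_score("ACDEFGHIKLMNPQRSTVWY"))
```

Under these assumptions, a column that diverges strongly from the background (for example, a fully conserved one) receives a high score, while a column whose composition matches the background scores close to zero.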
|
45
|
Redfern OC, Dessailly B, Orengo CA. Exploring the structure and function paradigm. Curr Opin Struct Biol 2008; 18:394-402. [PMID: 18554899 PMCID: PMC2561214 DOI: 10.1016/j.sbi.2008.05.007] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.6] [Received: 03/18/2008] [Revised: 04/16/2008] [Accepted: 05/07/2008] [Indexed: 11/29/2022]
Abstract
Advances in protein structure determination, led by the structural genomics initiatives, have increased the proportion of novel folds deposited in the Protein Data Bank. However, these structures are often not accompanied by functional annotations with experimental confirmation. In this review, we reassess the meaning of structural novelty and examine its relevance to the complexity of the structure-function paradigm. Recent advances in the prediction of protein function from structure are discussed, as well as new sequence-based methods for partitioning large, diverse superfamilies into biologically meaningful clusters. Obtaining structural data for these functionally coherent groups of proteins will allow us to better understand the relationship between structure and function.
Affiliation(s)
- Oliver C Redfern
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
|
46
|
Niv MY, Skrabanek L, Roberts RJ, Scheraga HA, Weinstein H. Identification of GATC- and CCGG-recognizing Type II REases and their putative specificity-determining positions using Scan2S--a novel motif scan algorithm with optional secondary structure constraints. Proteins 2008; 71:631-40. [PMID: 17972284 PMCID: PMC2465807 DOI: 10.1002/prot.21777] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/30/2023]
Abstract
Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered efforts at specificity engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.
Affiliation(s)
- Masha Y Niv
- Department of Physiology and Biophysics, Weill Medical College of Cornell University, 1300 York Ave., New York, New York 10021, USA.
|
47
|
Capra JA, Singh M. Characterization and prediction of residues determining protein functional specificity. Bioinformatics 2008; 24:1473-80. [PMID: 18450811 PMCID: PMC2718669 DOI: 10.1093/bioinformatics/btn214] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.6] [Indexed: 11/13/2022]
Abstract
Motivation: Within a homologous protein family, proteins may be grouped into subtypes that share specific functions that are not common to the entire family. Often, the amino acids present in a small number of sequence positions determine each protein's particular functional specificity. Knowledge of these specificity determining positions (SDPs) aids in protein function prediction, drug design and experimental analysis. A number of sequence-based computational methods have been introduced for identifying SDPs; however, their further development and evaluation have been hindered by the limited number of known experimentally determined SDPs. Results: We combine several bioinformatics resources to automate a process, typically undertaken manually, to build a dataset of SDPs. The resulting large dataset, which consists of SDPs in enzymes, enables us to characterize SDPs in terms of their physicochemical and evolutionary properties. It also facilitates the large-scale evaluation of sequence-based SDP prediction methods. We present a simple sequence-based SDP prediction method, GroupSim, and show that, surprisingly, it is competitive with a representative set of current methods. We also describe ConsWin, a heuristic that considers sequence conservation of neighboring amino acids, and demonstrate that it improves the performance of all methods tested on our large dataset of enzyme SDPs. Availability: Datasets and GroupSim code are available online at http://compbio.cs.princeton.edu/specificity/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- John A Capra
- Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
|
48
|
Identification and evolution of fungal mitochondrial tyrosyl-tRNA synthetases with group I intron splicing activity. Proc Natl Acad Sci U S A 2008; 105:6010-5. [PMID: 18413600 DOI: 10.1073/pnas.0801722105] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
The bifunctional Neurospora crassa mitochondrial tyrosyl-tRNA synthetase (CYT-18 protein) both aminoacylates mitochondrial tRNA(Tyr) and acts as a structure-stabilizing splicing cofactor for group I introns. Previous studies showed that CYT-18 has distinct tRNA(Tyr) and group I intron-binding sites, with the latter formed by three small "insertions" in the nucleotide-binding fold and other structural adaptations compared with nonsplicing bacterial tyrosyl-tRNA synthetases. Here, analysis of genomic sequences shows that mitochondrial tyrosyl-tRNA synthetases with structural adaptations similar to CYT-18's are uniquely characteristic of fungi belonging to the subphylum Pezizomycotina, and biochemical assays confirm group I intron splicing activity for the enzymes from several of these organisms, including Aspergillus nidulans and the human pathogens Coccidioides posadasii and Histoplasma capsulatum. By combining multiple sequence alignments with a previously determined cocrystal structure of a CYT-18/group I intron RNA complex, we identify conserved features of the Pezizomycotina enzymes related to group I intron and tRNA interactions. Our results suggest that mitochondrial tyrosyl-tRNA synthetases with group I intron splicing activity evolved during or after the divergence of the fungal subphyla Pezizomycotina and Saccharomycotina by a mechanism involving the concerted differentiation of preexisting protein loop regions. The unique group I intron splicing activity of these fungal enzymes may provide a new target for antifungal drugs.
|
49
|
|
50
|
Lee D, Redfern O, Orengo C. Predicting protein function from sequence and structure. Nat Rev Mol Cell Biol 2007; 8:995-1005. [PMID: 18037900 DOI: 10.1038/nrm2281] [Citation(s) in RCA: 371] [Impact Index Per Article: 20.6] [Indexed: 11/09/2022]
|