1
Oliveira LS, Reyes A, Dutilh BE, Gruber A. Rational Design of Profile HMMs for Sensitive and Specific Sequence Detection with Case Studies Applied to Viruses, Bacteriophages, and Casposons. Viruses 2023; 15:519. [PMID: 36851733 PMCID: PMC9966878 DOI: 10.3390/v15020519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 02/01/2023] [Accepted: 02/09/2023] [Indexed: 02/15/2023] Open
Abstract
Profile hidden Markov models (HMMs) are a powerful way of modeling biological sequence diversity and constitute a very sensitive approach to detecting divergent sequences. Here, we report the development of protocols for the rational design of profile HMMs. These methods were implemented on TABAJARA, a program that can be used to either detect all biological sequences of a group or discriminate specific groups of sequences. By calculating position-specific information scores along a multiple sequence alignment, TABAJARA automatically identifies the most informative sequence motifs and uses them to construct profile HMMs. As a proof-of-principle, we applied TABAJARA to generate profile HMMs for the detection and classification of two viral groups presenting different evolutionary rates: bacteriophages of the Microviridae family and viruses of the Flavivirus genus. We obtained conserved models for the generic detection of any Microviridae or Flavivirus sequence, and profile HMMs that can specifically discriminate Microviridae subfamilies or Flavivirus species. In another application, we constructed Cas1 endonuclease-derived profile HMMs that can discriminate CRISPRs and casposons, two evolutionarily related transposable elements. We believe that the protocols described here, and implemented on TABAJARA, constitute a generic toolbox for generating profile HMMs for the highly sensitive and specific detection of sequence classes.
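The idea of scoring each alignment column by its information content can be illustrated with a minimal sketch. This is not TABAJARA's actual scoring scheme, which the abstract does not specify; it assumes a plain Shannon-entropy measure over a 20-letter amino acid alphabet, and the function names, window size, and threshold are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=20):
    """Per-column information content in bits: log2(alphabet_size) - entropy.

    Gap characters ('-') are ignored; a gap-only column scores 0.
    This is a generic measure, assumed here for illustration only.
    """
    max_bits = math.log2(alphabet_size)
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(max_bits - entropy)
    return scores


def informative_windows(scores, window=3, threshold=3.5):
    """Return half-open (start, end) windows whose mean score meets the threshold."""
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append((start, start + window))
    return hits


# Toy alignment (hypothetical sequences, equal length, '-' marks a gap).
toy_alignment = ["MKVLA", "MKVIA", "MKVLG", "MRV-A"]
scores = column_information(toy_alignment)
conserved = informative_windows(scores, window=1, threshold=4.0)
```

In this toy case only the fully conserved columns pass the threshold. In a real workflow, the selected alignment regions would then be passed to a profile HMM builder.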
Affiliation(s)
- Liliane S. Oliveira
  - Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
- Alejandro Reyes
  - Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia
  - The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO 63108, USA
- Bas E. Dutilh
  - Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich-Schiller-University Jena, 07743 Jena, Germany
  - Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
  - European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Arthur Gruber
  - Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
  - European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
2
Sirota FL, Maurer-Stroh S, Li Z, Eisenhaber F, Eisenhaber B. Functional Classification of Super-Large Families of Enzymes Based on Substrate Binding Pocket Residues for Biocatalysis and Enzyme Engineering Applications. Front Bioeng Biotechnol 2021; 9:701120. [PMID: 34409021 PMCID: PMC8366029 DOI: 10.3389/fbioe.2021.701120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2021] [Accepted: 07/12/2021] [Indexed: 11/13/2022] Open
Abstract
Large enzyme families such as the groups of zinc-dependent alcohol dehydrogenases (ADHs), long chain alcohol oxidases (AOxs) or amine dehydrogenases (AmDHs) with, sometimes, more than one million sequences in the non-redundant protein database and hundreds of experimentally characterized enzymes are excellent cases for protein engineering efforts aimed at refining and modifying substrate specificity. Yet the downside of this wealth of information is that it becomes technically difficult to rationally select optimal sequence targets as well as sequence positions for mutagenesis studies. In all three cases, we approach the problem by starting with a group of experimentally well studied family members (including those with available 3D structures) and creating a structure-guided multiple sequence alignment and a modified phylogenetic tree (aka binding site tree) based just on a selection of potential substrate binding residue positions derived from experimental information (not from the full-length sequence alignment). Hereupon, the remaining, mostly uncharacterized enzyme sequences can be mapped; as a trend, sequence grouping in the tree branches follows substrate specificity. We show that this information can be used in the target selection for protein engineering work to narrow down to single suitable sequences and just a few relevant candidate positions for directed evolution towards activity for desired organic compound substrates. We also demonstrate how to find the closest thermophile example in the dataset if the engineering is aimed at achieving the most robust enzymes.
Affiliation(s)
- Fernanda L Sirota
  - Bioinformatics Institute (BII), Agency for Science Technology and Research (ASTAR), Singapore, Singapore
- Sebastian Maurer-Stroh
  - Bioinformatics Institute (BII), Agency for Science Technology and Research (ASTAR), Singapore, Singapore
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Zhi Li
  - Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore, Singapore
- Frank Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science Technology and Research (ASTAR), Singapore, Singapore
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (ASTAR), Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Birgit Eisenhaber
  - Bioinformatics Institute (BII), Agency for Science Technology and Research (ASTAR), Singapore, Singapore
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (ASTAR), Singapore, Singapore
3
Karakulak T, Rifaioglu AS, Rodrigues JPGLM, Karaca E. Predicting the Specificity-Determining Positions of Receptor Tyrosine Kinase Axl. Front Mol Biosci 2021; 8:658906. [PMID: 34195226 PMCID: PMC8236827 DOI: 10.3389/fmolb.2021.658906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 11/22/2022] Open
Abstract
Owing to its clinical significance, modulation of functionally relevant amino acids in protein-protein complexes has attracted a great deal of attention. To this end, many approaches have been proposed to predict the partner-selecting amino acid positions in evolutionarily close complexes. These approaches can be grouped into sequence-based machine learning and structure-based energy-driven methods. In this work, we assessed these methods’ ability to map the specificity-determining positions of Axl, a receptor tyrosine kinase involved in cancer progression and immune system diseases. For sequence-based predictions, we used SDPpred, Multi-RELIEF, and Sequence Harmony. For structure-based predictions, we utilized HADDOCK refinement and molecular dynamics simulations. As a result, we observed that (i) sequence-based methods overpredict partner-selecting residues of Axl and that (ii) combining Multi-RELIEF with HADDOCK-based predictions provides the key Axl residues, covered by the extensive molecular dynamics simulations. Expanding on these results, we propose that a sequence-structure-based approach is necessary to determine specificity-determining positions of Axl, which can guide the development of therapeutic molecules to combat Axl misregulation.
Affiliation(s)
- Tülay Karakulak
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ahmet Sureyya Rifaioglu
  - Department of Electrical-Electronics Engineering, İskenderun Technical University, Hatay, Turkey
- João P G L M Rodrigues
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, United States
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
4
Prescher M, Bonus M, Stindt J, Keitel-Anselmino V, Smits SHJ, Gohlke H, Schmitt L. Evidence for a credit-card-swipe mechanism in the human PC floppase ABCB4. Structure 2021; 29:1144-1155.e5. [PMID: 34107287 DOI: 10.1016/j.str.2021.05.013] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/08/2021] [Revised: 04/27/2021] [Accepted: 05/17/2021] [Indexed: 10/21/2022]
Abstract
ABCB4 is described as an ATP-binding cassette (ABC) transporter that primarily transports lipids of the phosphatidylcholine (PC) family but is also capable of translocating a subset of typical multidrug-resistance-associated drugs. The high degree of amino acid identity of 76% for ABCB4 and ABCB1, which is a prototype multidrug-resistance-mediating protein, results in ABCB4's second subset of substrates, which overlap with ABCB1's substrates. This often leads to incomplete annotations of ABCB4, in which it was described as exclusively PC-lipid specific. When the hydrophilic amino acids from ABCB4 are changed to the analogous but hydrophobic ones from ABCB1, the stimulation of ATPase activity by 1,2-dioleoyl-sn-glycero-3-phosphocholine, as a prime example of PC lipids, is strongly diminished, whereas the modulation capability of ABCB1 substrates remains unchanged. This indicates two distinct and autonomous substrate binding sites in ABCB4.
Affiliation(s)
- Martin Prescher
  - Institute of Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Michele Bonus
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jan Stindt
  - Clinic for Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Verena Keitel-Anselmino
  - Clinic for Gastroenterology, Hepatology and Infectious Diseases, University Hospital Düsseldorf, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Sander H J Smits
  - Institute of Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Center for Structural Studies, Heinrich Heine University Düsseldorf, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Holger Gohlke
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Lutz Schmitt
  - Institute of Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
5
Rational Design of Profile Hidden Markov Models for Viral Classification and Discovery. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] Open
6
Recurrent sequence evolution after independent gene duplication. BMC Evol Biol 2020; 20:98. [PMID: 32770961 PMCID: PMC7414715 DOI: 10.1186/s12862-020-01660-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/17/2020] [Accepted: 07/17/2020] [Indexed: 11/10/2022] Open
Abstract
Background: Convergent and parallel evolution provide unique insights into the mechanisms of natural selection. Some of the most striking convergent and parallel (collectively recurrent) amino acid substitutions in proteins are adaptive, but there are also many that are selectively neutral. Accordingly, genome-wide assessment has shown that recurrent sequence evolution in orthologs is chiefly explained by nearly neutral evolution. For paralogs, more frequent functional change is expected because additional copies are generally not retained if they do not acquire their own niche. Yet, it is unknown to what extent recurrent sequence differentiation is discernible after independent gene duplications in different eukaryotic taxa. Results: We develop a framework that detects patterns of recurrent sequence evolution in duplicated genes. This is used to analyze the genomes of 90 diverse eukaryotes. We find a remarkable number of families with a potentially predictable functional differentiation following gene duplication. In some protein families, more than ten independent duplications show a similar sequence-level differentiation between paralogs. Based on further analysis, the sequence divergence is found to be generally asymmetric. Moreover, about 6% of the recurrent sequence evolution between paralog pairs can be attributed to recurrent differentiation of subcellular localization. Finally, we reveal the specific recurrent patterns for the gene families Hint1/Hint2, Sco1/Sco2 and vma11/vma3. Conclusions: The presented methodology provides a means to study the biochemical underpinning of functional differentiation between paralogs. For instance, two abundantly repeated substitutions are identified between independently derived Sco1 and Sco2 paralogs. Such identified substitutions allow direct experimental testing of the biological role of these residues for the repeated functional differentiation. We also uncover a diverse set of families with recurrent sequence evolution and reveal trends in the functional and evolutionary trajectories of this hitherto understudied phenomenon.
7
Gazara RK, Moharana KC, Bellieny-Rabelo D, Venancio TM. Expansion and diversification of the gibberellin receptor GIBBERELLIN INSENSITIVE DWARF1 (GID1) family in land plants. Plant Molecular Biology 2018; 97:435-449. [PMID: 29956113 DOI: 10.1007/s11103-018-0750-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/18/2018] [Accepted: 06/14/2018] [Indexed: 05/13/2023]
Abstract
Here we uncover the major evolutionary events shaping the evolution of the GID1 family of gibberellin receptors in land plants at the sequence, structure and gene expression levels. Gibberellic acid (gibberellin, GA) controls key developmental processes in the life cycle of land plants. By interacting with the GIBBERELLIN INSENSITIVE DWARF1 (GID1) receptor, GA regulates the expression of a wide range of genes through different pathways. Here we report the systematic identification and classification of GID1s in 54 plant genomes, ranging from bryophytes and lycophytes to several monocots and eudicots. We investigated the evolutionary relationship of GID1s using a comparative genomics framework and found strong support for a previously proposed phylogenetic classification of this family in land plants. We identified lineage-specific expansions of particular subfamilies (i.e. GID1ac and GID1b) in different eudicot lineages (e.g. GID1b in legumes). Further, we found both shared and divergent structural features between GID1ac and GID1b subgroups in eudicots that provide mechanistic insights into their functions. Gene expression data from several species show that at least one GID1 gene is expressed in every sampled tissue, with a strong bias of GID1b expression towards underground tissues and dry legume seeds (which typically have low GA levels). Taken together, our results indicate that GID1ac retained canonical GA signaling roles, whereas GID1b specialized in conditions of low GA concentrations. We propose that this functional specialization occurred initially at the gene expression level and was later fine-tuned by mutations that conferred greater GA affinity to GID1b, including a Phe residue in the GA-binding pocket. Finally, we discuss the importance of our findings to understand the diversification of GA perception mechanisms in land plants.
Affiliation(s)
- Rajesh K Gazara
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Av. Alberto Lamego 2000/P5/217, Parque Califórnia, Campos dos Goytacazes, RJ, CEP: 28013-602, Brazil
- Kanhu C Moharana
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Av. Alberto Lamego 2000/P5/217, Parque Califórnia, Campos dos Goytacazes, RJ, CEP: 28013-602, Brazil
- Daniel Bellieny-Rabelo
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Av. Alberto Lamego 2000/P5/217, Parque Califórnia, Campos dos Goytacazes, RJ, CEP: 28013-602, Brazil
  - Department of Microbiology and Plant Pathology, University of Pretoria, Lunnon Road, Pretoria, 0028, South Africa
- Thiago M Venancio
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Av. Alberto Lamego 2000/P5/217, Parque Califórnia, Campos dos Goytacazes, RJ, CEP: 28013-602, Brazil
8
Maayan Y, Pandaranayaka EPJ, Srivastava DA, Lapidot M, Levin I, Dombrovsky A, Harel A. Using genomic analysis to identify tomato Tm-2 resistance-breaking mutations and their underlying evolutionary path in a new and emerging tobamovirus. Arch Virol 2018; 163:1863-1875. [PMID: 29582165 DOI: 10.1007/s00705-018-3819-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 11/10/2017] [Accepted: 03/05/2018] [Indexed: 12/20/2022]
Abstract
In September 2014, a new tobamovirus was discovered in Israel that was able to break Tm-2-mediated resistance in tomato that had lasted 55 years. The virus was isolated, and sequencing of its genome showed it to be tomato brown rugose fruit virus (ToBRFV), a new tobamovirus recently identified in Jordan. Previous studies on mutant viruses that cause resistance breaking, including Tm-2-mediated resistance, demonstrated that this phenotype had resulted from only a few mutations. Identification of important residues in resistance breakers is hindered by significant background variation, with 9-15% variability in the genomic sequences of known isolates. To understand the evolutionary path leading to the emergence of this resistance breaker, we performed a comprehensive phylogenetic analysis and genomic comparison of different tobamoviruses, followed by molecular modeling of the viral helicase. The phylogenetic location of the resistance-breaking genes was found to be among host-shifting clades, and this, together with the observation of a relatively low mutation rate, suggests that a host shift contributed to the emergence of this new virus. Our comparative genomic analysis identified twelve potential resistance-breaking mutations in the viral movement protein (MP), the primary target of the related Tm-2 resistance, and nine in its replicase. Finally, molecular modeling of the helicase enabled the identification of three additional potential resistance-breaking mutations.
Affiliation(s)
- Yonatan Maayan
  - Department of Vegetable and Field Crop Research, Institute of Plant Sciences, Agricultural Research Organization, Volcani Center, 68 HaMaccabim Road, P.O. Box 15159, 7505101, Rishon LeZion, Israel
- Eswari P J Pandaranayaka
  - Department of Vegetable and Field Crop Research, Institute of Plant Sciences, Agricultural Research Organization, Volcani Center, 68 HaMaccabim Road, P.O. Box 15159, 7505101, Rishon LeZion, Israel
- Dhruv Aditya Srivastava
  - Department of Vegetable and Field Crop Research, Institute of Plant Sciences, Agricultural Research Organization, Volcani Center, 68 HaMaccabim Road, P.O. Box 15159, 7505101, Rishon LeZion, Israel
- Moshe Lapidot
  - Department of Vegetable and Field Crop Research, Institute of Plant Sciences, Agricultural Research Organization, Volcani Center, 68 HaMaccabim Road, P.O. Box 15159, 7505101, Rishon LeZion, Israel
- Ilan Levin
  - Department of Vegetable and Field Crop Research, Institute of Plant Sciences, Agricultural Research Organization, Volcani Center, 68 HaMaccabim Road, P.O. Box 15159, 7505101, Rishon LeZion, Israel
- Aviv Dombrovsky
  - Department of Plant Pathology and Weed Research, Institute of Plant Protection, Agricultural Research Organization, Volcani Center, 68 HaMaccabim Road, P.O. Box 15159, 7505101, Rishon LeZion, Israel
- Arye Harel
  - Department of Vegetable and Field Crop Research, Institute of Plant Sciences, Agricultural Research Organization, Volcani Center, 68 HaMaccabim Road, P.O. Box 15159, 7505101, Rishon LeZion, Israel
9
Indrischek H, Prohaska SJ, Gurevich VV, Gurevich EV, Stadler PF. Uncovering missing pieces: duplication and deletion history of arrestins in deuterostomes. BMC Evol Biol 2017; 17:163. [PMID: 28683816 PMCID: PMC5501109 DOI: 10.1186/s12862-017-1001-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 11/14/2016] [Accepted: 06/19/2017] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: The cytosolic arrestin proteins mediate desensitization of activated G protein-coupled receptors (GPCRs) via competition with G proteins for the active phosphorylated receptors. Arrestins in active, including receptor-bound, conformation are also transducers of signaling. Therefore, this protein family is an attractive therapeutic target. The signaling outcome is believed to be a result of structural and sequence-dependent interactions of arrestins with GPCRs and other protein partners. Here we elucidated the detailed evolution of arrestins in deuterostomes. RESULTS: Identity and number of arrestin paralogs were determined by searching deuterostome genomes and gene expression data. In contrast to standard gene prediction methods, our strategy first detects exons situated on different scaffolds and then solves the problem of assigning them to the correct gene. This increases both the completeness and the accuracy of the annotation in comparison to conventional database search strategies applied by the community. The employed strategy enabled us to map in detail the duplication and deletion history of arrestin paralogs including tandem duplications, pseudogenizations and the formation of retrogenes. The two rounds of whole genome duplications in the vertebrate stem lineage gave rise to four arrestin paralogs. Surprisingly, visual arrestin ARR3 was lost in the mammalian clades Afrotheria and Xenarthra. Duplications in specific clades, on the other hand, must have given rise to new paralogs that show signatures of diversification in functional elements important for receptor binding and phosphate sensing. CONCLUSION: The current study traces the functional evolution of deuterostome arrestins in unprecedented detail. Based on a precise re-annotation of the exon-intron structure at nucleotide resolution, we infer the gain and loss of paralogs and patterns of conservation, co-variation and selection.
Affiliation(s)
- Henrike Indrischek
  - Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
  - Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
  - Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Sonja J Prohaska
  - Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
  - Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
- Vsevolod V Gurevich
  - Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
- Eugenia V Gurevich
  - Department of Pharmacology, Vanderbilt University, 2200 Pierce Ave, Nashville, TN 37232, USA
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
  - Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, D-04107, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany
  - Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany
  - Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090, Austria
  - Center for non-coding RNA in Technology and Health, Grønegårdsvej 3, Frederiksberg C, DK-1870, Denmark
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
10
Abstract
BACKGROUND: Variable domains of camelid heavy-chain antibodies, commonly named nanobodies, have high biotechnological potential. In view of their broad range of applications in research, diagnostics and therapy, engineering their stability is of particular interest. One important aspect is the improvement of thermostability, because it can have immediate effects on conformational stability, protease resistance and aggregation propensity of the protein. METHODS: We analyzed the sequences and thermostabilities of 78 purified nanobody binders. From this data, potentially stabilizing amino acid variations were identified and studied experimentally. RESULTS: Some mutations improved the stability of nanobodies by up to 6.1°C, with an average of 2.3°C across eight modified nanobodies. The stabilizing mechanism involves an improvement of both conformational stability and aggregation behavior, explaining the variable degree of stabilization in individual molecules. In some instances, variations predicted to be stabilizing actually led to thermal destabilization of the proteins. The reasons for this contradiction between prediction and experiment were investigated. CONCLUSIONS: The results reveal a mutational strategy to improve the biophysical behavior of nanobody binders and indicate a species-specificity of nanobody architecture. GENERAL SIGNIFICANCE: This study illustrates the potential and limitations of engineering nanobody thermostability by merging sequence information with stability data, an aspect that is becoming increasingly important with the recent development of high-throughput biophysical methods.
11
Palma A, Tinti M, Paoluzi S, Santonico E, Brandt BW, Hooft van Huijsduijnen R, Masch A, Heringa J, Schutkowski M, Castagnoli L, Cesareni G. Both Intrinsic Substrate Preference and Network Context Contribute to Substrate Selection of Classical Tyrosine Phosphatases. J Biol Chem 2017; 292:4942-4952. [PMID: 28159843 DOI: 10.1074/jbc.m116.757518] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/07/2016] [Revised: 01/31/2017] [Indexed: 01/19/2023] Open
Abstract
Reversible tyrosine phosphorylation is a widespread post-translational modification mechanism underlying cell physiology. Thus, understanding the mechanisms responsible for substrate selection by kinases and phosphatases is central to our ability to model signal transduction at a system level. Classical protein-tyrosine phosphatases can exhibit substrate specificity in vivo by combining intrinsic enzymatic specificity with the network of protein-protein interactions, which positions the enzymes in close proximity to their substrates. Here we use a high throughput approach, based on high density phosphopeptide chips, to determine the in vitro substrate preference of 16 members of the protein-tyrosine phosphatase family. This approach helped identify one residue in the substrate binding pocket of the phosphatase domain that confers specificity for phosphopeptides in a specific sequence context. We also present a Bayesian model that combines intrinsic enzymatic specificity and interaction information in the context of the human protein interaction network to infer new phosphatase substrates at the proteome level.
Affiliation(s)
- Anita Palma
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Michele Tinti
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Serena Paoluzi
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Elena Santonico
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Bernd Willem Brandt
  - Centre for Integrative Bioinformatics, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
- Antonia Masch
  - Institut für Biochemie & Biotechnologie, Martin-Luther-Universität Halle-Wittenberg, 06108 Halle, Germany
- Jaap Heringa
  - Centre for Integrative Bioinformatics, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
- Mike Schutkowski
  - Institut für Biochemie & Biotechnologie, Martin-Luther-Universität Halle-Wittenberg, 06108 Halle, Germany
- Luisa Castagnoli
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
- Gianni Cesareni
  - Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy
12
Yan R, Friemel M, Aloisi C, Huynen M, Taylor IA, Leimkühler S, Pastore A. The Eukaryotic-Specific ISD11 Is a Complex-Orphan Protein with Ability to Bind the Prokaryotic IscS. PLoS One 2016; 11:e0157895. [PMID: 27427956 PMCID: PMC4948766 DOI: 10.1371/journal.pone.0157895] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/01/2016] [Accepted: 06/07/2016] [Indexed: 11/18/2022] Open
Abstract
The eukaryotic protein Isd11 is a chaperone that binds and stabilizes the central component of the essential metabolic pathway responsible for formation of iron-sulfur clusters in mitochondria, the desulfurase Nfs1. Little is known about the exact role of Isd11. Here, we show that human Isd11 (ISD11) is a helical protein which exists in solution as an equilibrium between monomeric, dimeric and tetrameric species in the absence of human Nfs1 (NFS1). We also show that, surprisingly, recombinant ISD11 expressed in E. coli co-purifies with the bacterial orthologue of NFS1, IscS. Binding is weak but specific, suggesting that, despite the absence of Isd11 sequences in bacteria, there is enough conservation between the two desulfurases to retain a similar mode of interaction. This knowledge may inform us on the conservation of the mode of binding of Isd11 to the desulfurase. We used evolutionary evidence to suggest Isd11 residues involved in the interaction.
Collapse
Affiliation(s)
- Robert Yan
- Maurice Wohl Institute, King’s College London, 5 Cutcombe Rd, SE5, London, United Kingdom
- Martin Friemel
- Department of Molecular Enzymology, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- Claudia Aloisi
- Maurice Wohl Institute, King’s College London, 5 Cutcombe Rd, SE5, London, United Kingdom
- Martijn Huynen
- CMBI 260, Radboud University Medical Centre, PO Box 9101, 6500 HB, Nijmegen, The Netherlands
- Ian A. Taylor
- The Francis Crick Institute, Mill Hill Laboratory, The Ridgeway, London, NW7 1AA, United Kingdom
- Silke Leimkühler
- Department of Molecular Enzymology, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- Annalisa Pastore
- Maurice Wohl Institute, King’s College London, 5 Cutcombe Rd, SE5, London, United Kingdom
|
13
|
Mróz TL, Havey MJ, Bartoszewski G. Cucumber Possesses a Single Terminal Alternative Oxidase Gene That is Upregulated by Cold Stress and in the Mosaic (MSC) Mitochondrial Mutants. Plant Molecular Biology Reporter 2015; 33:1893-1906. [PMID: 26752808 PMCID: PMC4695503 DOI: 10.1007/s11105-015-0883-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 05/09/2023]
Abstract
Alternative oxidase (AOX) is a mitochondrial terminal oxidase which is responsible for an alternative route of electron transport in the respiratory chain. This nuclear-encoded enzyme is involved in a major path of survival under adverse conditions by transfer of electrons from ubiquinol instead of the main cytochrome pathway. AOX protects against unexpected inhibition of the cytochrome c oxidase pathway and plays an important role in stress tolerance. Two AOX subfamilies (AOX1 and AOX2) exist in higher plants and are usually encoded by small gene families. In this study, genome-wide searches and cloning were completed to identify and characterize AOX genes in cucumber (Cucumis sativus L.). Our results revealed that cucumber possesses no AOX1 gene(s) and only a single AOX2 gene located on chromosome 4. Expression studies showed that AOX2 in wild-type cucumber is constitutively expressed at low levels and is upregulated by cold stress. AOX2 transcripts and protein were detected in leaves and flowers of wild-type plants, with higher levels in the three independently derived mosaic (MSC) mitochondrial mutants. Because cucumber possesses a single AOX gene and its expression increases under cold stress and in the MSC mutants, this plant is a unique and intriguing model to study AOX expression and regulation particularly in the context of mitochondria-to-nucleus retrograde signaling.
Affiliation(s)
- Tomasz L. Mróz
- Department of Plant Genetics, Breeding and Biotechnology, Faculty of Horticulture, Biotechnology and Landscape Architecture, Warsaw University of Life Sciences, ul. Nowoursynowska 159, 02-776 Warsaw, Poland
- Michael J. Havey
- Agricultural Research Service, U.S. Department of Agriculture, Vegetable Crops Unit, Department of Horticulture, University of Wisconsin, 1575 Linden Dr., Madison, WI 53706 USA
- Grzegorz Bartoszewski
- Department of Plant Genetics, Breeding and Biotechnology, Faculty of Horticulture, Biotechnology and Landscape Architecture, Warsaw University of Life Sciences, ul. Nowoursynowska 159, 02-776 Warsaw, Poland
|
14
|
Costa JH, McDonald AE, Arnholdt-Schmitt B, Fernandes de Melo D. A classification scheme for alternative oxidases reveals the taxonomic distribution and evolutionary history of the enzyme in angiosperms. Mitochondrion 2014; 19 Pt B:172-83. [DOI: 10.1016/j.mito.2014.04.007] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 12/15/2013] [Revised: 03/23/2014] [Accepted: 04/11/2014] [Indexed: 10/25/2022]
|
15
|
Gijsbers EF, Feenstra KA, van Nuenen AC, Navis M, Heringa J, Schuitemaker H, Kootstra NA. HIV-1 replication fitness of HLA-B*57/58:01 CTL escape variants is restored by the accumulation of compensatory mutations in gag. PLoS One 2013; 8:e81235. [PMID: 24339913 PMCID: PMC3855271 DOI: 10.1371/journal.pone.0081235] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 04/29/2013] [Accepted: 10/10/2013] [Indexed: 11/30/2022] Open
Abstract
Expression of HLA-B*57 and the closely related HLA-B*58:01 are associated with prolonged survival after HIV-1 infection. However, large differences in disease course are observed among HLA-B*57/58:01 patients. Escape mutations in CTL epitopes restricted by these HLA alleles come at a fitness cost and particularly the T242N mutation in the TW10 CTL epitope in Gag has been demonstrated to decrease the viral replication capacity. Additional mutations within or flanking this CTL epitope can partially restore replication fitness of CTL escape variants. Five HLA-B*57/58:01 progressors and 5 HLA-B*57/58:01 long-term nonprogressors (LTNPs) were followed longitudinally and we studied which compensatory mutations were involved in the restoration of the viral fitness of variants that escaped from HLA-B*57/58:01-restricted CTL pressure. The Sequence Harmony algorithm was used to detect homology in amino acid composition by comparing longitudinal Gag sequences obtained from HIV-1 patients positive and negative for HLA-B*57/58:01 and from HLA-B*57/58:01 progressors and LTNPs. Although virus isolates from HLA-B*57/58:01 individuals contained multiple CTL escape mutations, these escape mutations were not associated with disease progression. In sequences from HLA-B*57/58:01 progressors, 5 additional mutations in Gag were observed: S126N, L215T, H219Q, M228I and N252H. The combination of these mutations restored the replication fitness of CTL escape HIV-1 variants. Furthermore, we observed a positive correlation between the number of escape and compensatory mutations in Gag and the replication fitness of biological HIV-1 variants isolated from HLA-B*57/58:01 patients, suggesting that the replication fitness of HLA-B*57/58:01 escape variants is restored by accumulation of compensatory mutations.
Affiliation(s)
- Esther F. Gijsbers
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- K. Anton Feenstra
- Centre for Integrative Bioinformatics (IBIVU) and Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), VU University, Amsterdam, The Netherlands
- Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
- Ad C. van Nuenen
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Marjon Navis
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa
- Centre for Integrative Bioinformatics (IBIVU) and Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), VU University, Amsterdam, The Netherlands
- Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
- Hanneke Schuitemaker
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Neeltje A. Kootstra
- Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
|
16
|
Neuwald AF, Lanczycki CJ, Marchler-Bauer A. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures. BMC Bioinformatics 2012; 13:144. [PMID: 22726767 PMCID: PMC3599474 DOI: 10.1186/1471-2105-13-144] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 02/06/2012] [Accepted: 06/09/2012] [Indexed: 11/17/2022] Open
Abstract
Background The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Results Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related—a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. Conclusions This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. 
At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups.
Affiliation(s)
- Andrew F Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, BioPark II, Room 617, 801 West Baltimore St, Baltimore, MD 21201, USA.
|
17
|
Chakraborty A, Mandloi S, Lanczycki CJ, Panchenko AR, Chakrabarti S. SPEER-SERVER: a web server for prediction of protein specificity determining sites. Nucleic Acids Res 2012; 40:W242-8. [PMID: 22689646 PMCID: PMC3394334 DOI: 10.1093/nar/gks559] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022] Open
Abstract
Sites that show specific conservation patterns within subsets of proteins in a protein family are likely to be involved in the development of functional specificity. These sites, generally termed specificity determining sites (SDS), might play a crucial role in binding to a specific substrate or proteins. Identification of SDS through experimental techniques is a slow, difficult and tedious job. Hence, it is very important to develop efficient computational methods that can more expediently identify SDS. Herein, we present Specificity prediction using amino acids’ Properties, Entropy and Evolution Rate (SPEER)-SERVER, a web server that predicts SDS by analyzing quantitative measures of the conservation patterns of protein sites based on their physico-chemical properties and the heterogeneity of evolutionary changes between and within the protein subfamilies. This web server provides an improved representation of results, adds useful input and output options and integrates a wide range of analysis and data visualization tools when compared with the original standalone version of the SPEER algorithm. Extensive benchmarking finds that SPEER-SERVER exhibits sensitivity and precision performance that, on average, meets or exceeds that of other currently available methods. SPEER-SERVER is available at http://www.hpppi.iicb.res.in/ss/.
Affiliation(s)
- Abhijit Chakraborty
- Structural Biology and Bioinformatics Division, Council for Scientific and Industrial Research (CSIR)-Indian Institute of Chemical Biology (IICB), Kolkata, West Bengal 700032, India
|
18
|
Huynen MA, Duarte I, Chrzanowska-Lightowlers ZMA, Nabuurs SB. Structure based hypothesis of a mitochondrial ribosome rescue mechanism. Biol Direct 2012; 7:14. [PMID: 22569235 PMCID: PMC3418547 DOI: 10.1186/1745-6150-7-14] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 01/04/2012] [Accepted: 03/27/2012] [Indexed: 11/29/2022] Open
Abstract
Background mtRF1 is a vertebrate mitochondrial protein with an unknown function that arose from a duplication of the mitochondrial release factor mtRF1a. To elucidate the function of mtRF1, we determined the positions that are conserved among mtRF1 sequences but that are different in their mtRF1a paralogs. We subsequently modeled the 3D structure of mtRF1a and mtRF1 bound to the ribosome, highlighting the structural implications of these differences to derive a hypothesis for the function of mtRF1. Results Our model predicts, in agreement with the experimental data, that the 3D structure of mtRF1a allows it to recognize the stop codons UAA and UAG in the A-site of the ribosome. In contrast, we show that mtRF1 likely can only bind the ribosome when the A-site is devoid of mRNA. Furthermore, while mtRF1a will adopt its catalytic conformation, in which it functions as a peptidyl-tRNA hydrolase in the ribosome, only upon binding of a stop codon in the A-site, mtRF1 appears specifically adapted to assume this extended, peptidyl-tRNA hydrolyzing conformation in the absence of mRNA in the A-site. Conclusions We predict that mtRF1 specifically recognizes ribosomes with an empty A-site and is able to function as a peptidyl-tRNA hydrolase in those situations. Stalled ribosomes with empty A-sites that still contain a tRNA bound to a peptide chain can result from the translation of truncated, stop-codon less mRNAs. We hypothesize that mtRF1 recycles such stalled ribosomes, performing a function that is analogous to that of tmRNA in bacteria. Reviewers This article was reviewed by Dr. Eugene Koonin, Prof. Knud H. Nierhaus (nominated by Dr. Sarah Teichmann) and Dr. Shamil Sunyaev.
Affiliation(s)
- Martijn A Huynen
- Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, P.O. Box 9101, 6400 HB Nijmegen, The Netherlands.
|
19
|
Gaston D, Susko E, Roger AJ. A phylogenetic mixture model for the identification of functionally divergent protein residues. Bioinformatics 2011; 27:2655-63. [PMID: 21840876 DOI: 10.1093/bioinformatics/btr470] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples. RESULTS We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions. AVAILABILITY http://rogerlab.biochem.dal.ca/Software CONTACT andrew.roger@dal.ca SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniel Gaston
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Canada, B3H 1X5
|
20
|
Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms. Stat Appl Genet Mol Biol 2011; 10:Article 36. [PMID: 22331370 DOI: 10.2202/1544-6115.1666] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]
Abstract
Certain residues have no known function yet are co-conserved across distantly related protein families and diverse organisms, suggesting that they perform critical roles associated with as-yet-unidentified molecular properties and mechanisms. This raises the question of how to obtain additional clues regarding these mysterious biochemical phenomena with a view to formulating experimentally testable hypotheses. One approach is to access the implicit biochemical information encoded within the vast amount of genomic sequence data now becoming available. Here, a new Gibbs sampling strategy is formulated and implemented that can partition hundreds of thousands of sequences within a major protein class into multiple, functionally-divergent categories based on those pattern residues that best discriminate between categories. The sampler precisely defines the partition and pattern for each category by explicitly modeling unrelated, non-functional and related-yet-divergent proteins that would otherwise obscure the analysis. To aid biological interpretation, auxiliary routines can characterize pattern residues within available crystal structures and identify those structures most likely to shed light on the roles of pattern residues. This approach can be used to define and annotate automatically subgroup-specific conserved domain profiles based on statistically-rigorous empirical criteria rather than on the subjective and labor-intensive process of manual curation. Incorporating such profiles into domain database search sites (such as the NCBI BLAST site) will provide biologists with previously inaccessible molecular information useful for hypothesis generation and experimental design. Analyses of P-loop GTPases and of AAA+ ATPases illustrate the sampler's ability to obtain such information.
|
21
|
Wang L, Mavisakalyan V, Tillier ERM, Clark GW, Savchenko AV, Yakunin AF, Master ER. Mining bacterial genomes for novel arylesterase activity. Microb Biotechnol 2011; 3:677-90. [PMID: 21255363 PMCID: PMC3815341 DOI: 10.1111/j.1751-7915.2010.00185.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/27/2022] Open
Abstract
One hundred and seventy-one genes encoding potential esterases from 11 bacterial genomes were cloned and overexpressed in Escherichia coli; 74 of the clones produced soluble proteins. All 74 soluble proteins were purified and screened for esterase activity; 36 proteins showed carboxyl esterase activity on short-chain esters, 17 demonstrated arylesterase activity, while 38 proteins did not exhibit any activity towards the test substrates. Esterases from Rhodopseudomonas palustris (RpEST-1, RpEST-2 and RpEST-3), Pseudomonas putida (PpEST-1, PpEST-2 and PpEST-3), Pseudomonas aeruginosa (PaEST-1) and Streptomyces avermitilis (SavEST-1) were selected for detailed biochemical characterization. All of the enzymes showed optimal activity at neutral or alkaline pH, and the half-life of each enzyme at 50°C ranged from < 5 min to over 5 h. PpEST-3, RpEST-1 and RpEST-2 demonstrated the highest specific activity with pNP-esters; these enzymes were also among the most stable at 50°C and in the presence of detergents, polar and non-polar organic solvents, and imidazolium ionic liquids. Accordingly, these enzymes are particularly interesting targets for subsequent application trials. Finally, biochemical and bioinformatic analyses were compared to reveal sequence features that could be correlated to enzymes with arylesterase activity, facilitating subsequent searches for new esterases in microbial genome sequences.
Affiliation(s)
- Lijun Wang
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College Street, Toronto, ON, M5S 3E5, Canada
|
22
|
Mazin PV, Gelfand MS, Mironov AA, Rakhmaninova AB, Rubinov AR, Russell RB, Kalinina OV. An automated stochastic approach to the identification of the protein specificity determinants and functional subfamilies. Algorithms Mol Biol 2010; 5:29. [PMID: 20633297 PMCID: PMC2914642 DOI: 10.1186/1748-7188-5-29] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Received: 11/30/2009] [Accepted: 07/15/2010] [Indexed: 11/30/2022] Open
Abstract
Background Recent progress in sequencing and 3D structure determination techniques has stimulated the development of approaches aimed at more precise annotation of proteins, that is, prediction of exact specificity to a ligand or, more broadly, to a binding partner of any kind. Results We present a method, SDPclust, for identification of protein functional subfamilies coupled with prediction of specificity-determining positions (SDPs). SDPclust predicts specificity in a phylogeny-independent stochastic manner, which allows for the correct identification of the specificity for proteins that are separated on a phylogenetic tree, but still bind the same ligand. SDPclust is implemented as a web server (http://bioinf.fbb.msu.ru/SDPfoxWeb/) and a stand-alone Java application available from the website. Conclusions SDPclust performs a simultaneous identification of specificity determinants and specificity groups in a statistically robust and phylogeny-independent manner.
|
23
|
Brandt BW, Feenstra KA, Heringa J. Multi-Harmony: detecting functional specificity from sequence alignment. Nucleic Acids Res 2010; 38:W35-40. [PMID: 20525785 PMCID: PMC2896201 DOI: 10.1093/nar/gkq415] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 11/12/2022] Open
Abstract
Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help find targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww.
Affiliation(s)
- Bernd W Brandt
- Centre for Integrative Bioinformatics, VU University Amsterdam, De Boelelaan 1081A, 1081HV Amsterdam, The Netherlands
|
24
|
Costa JH, Cardoso HG, Campos MD, Zavattieri A, Frederico AM, Fernandes de Melo D, Arnholdt-Schmitt B. Daucus carota L.--an old model for cell reprogramming gains new importance through a novel expansion pattern of alternative oxidase (AOX) genes. Plant Physiology and Biochemistry 2009; 47:753-9. [PMID: 19372042 DOI: 10.1016/j.plaphy.2009.03.011] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/17/2008] [Revised: 03/19/2009] [Accepted: 03/24/2009] [Indexed: 05/13/2023]
Abstract
The paper highlights Daucus carota L. as an ideal model to complement plant stress research on Arabidopsis thaliana L. Alternative oxidase (AOX) has recently been discussed as a functional marker candidate for cell reprogramming upon stress. Carrot is the most studied species for cell reprogramming, and our current research reveals that it is the only one that has expanded both AOX sub-family genes. We point to recently published, but not yet discussed, results on conserved differences in the vicinity of the most active functional site of AOX1 and AOX2, which indicate the importance of studying AOX sequence polymorphism, structure and functionality. Thus, stress-inducible experimental systems of D. carota are especially appropriate to bring research on stress tolerance a significant step forward.
Affiliation(s)
- J H Costa
- Department of Biochemistry and Molecular Biology, Federal University of Ceará, PO Box 6029, 60455-900, Fortaleza, Ceará, Brazil
|
25
|
Kalinina OV, Gelfand MS, Russell RB. Combining specificity determining and conserved residues improves functional site prediction. BMC Bioinformatics 2009; 10:174. [PMID: 19508719 PMCID: PMC2709924 DOI: 10.1186/1471-2105-10-174] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 01/23/2009] [Accepted: 06/09/2009] [Indexed: 11/16/2022] Open
Abstract
Background Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities. Results Here we present a novel method for functional site prediction based on identification of conserved positions, as well as those responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs) as those that are occupied by conserved residues within sub-groups of proteins in a family having a common specificity, but that differ between groups, and are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins containing known three-dimensional structures, but lacking clear functional annotations, and discuss several illustrative examples. Conclusion The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.
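The SDP definition in this abstract (conserved within each specificity group, different between groups) can be illustrated with a minimal sketch. This is not the SDPsite statistic, which the abstract does not describe; the function name `candidate_sdp_columns`, the `min_within` threshold, and the toy alignment are all hypothetical and only show the idea on pre-grouped, pre-aligned sequences.

```python
from collections import Counter

def candidate_sdp_columns(groups, min_within=1.0):
    """Return alignment columns that are conserved within every group
    but whose consensus residue differs between at least two groups.

    groups: dict mapping a group name to a list of aligned sequences.
    min_within: assumed threshold, the minimum fraction of a group that
                must share the consensus residue at a column.
    """
    lengths = {len(s) for seqs in groups.values() for s in seqs}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()

    hits = []
    for col in range(n_cols):
        consensus = []
        for seqs in groups.values():
            residue, count = Counter(s[col] for s in seqs).most_common(1)[0]
            # Skip the column if this group is gapped or not conserved enough.
            if residue == "-" or count / len(seqs) < min_within:
                consensus = None
                break
            consensus.append(residue)
        if consensus is not None and len(set(consensus)) > 1:
            hits.append(col)
    return hits

# Toy alignment: column 1 is K in group A and R in group B.
toy = {
    "A": ["MKLDE", "MKLDE", "MKLDQ"],
    "B": ["MRLDE", "MRLDA", "MRLDE"],
}
print(candidate_sdp_columns(toy))  # [1]
```

Column 0 is conserved across the whole family and is therefore not a candidate, which mirrors the abstract's distinction between conserved positions and specificity-determining ones.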
|
26
|
Sankararaman S, Kolaczkowski B, Sjölander K. INTREPID: a web server for prediction of functionally important residues by evolutionary analysis. Nucleic Acids Res 2009; 37:W390-5. [PMID: 19443452 PMCID: PMC2703888 DOI: 10.1093/nar/gkp339] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 11/22/2022] Open
Abstract
We present the INTREPID web server for predicting functionally important residues in proteins. INTREPID has been shown to boost the recall and precision of catalytic residue prediction over other sequence-based methods and can be used to identify other types of functional residues. The web server takes an input protein sequence, gathers homologs, constructs a multiple sequence alignment and phylogenetic tree and finally runs the INTREPID method to assign a score to each position. Residues predicted to be functionally important are displayed on homologous 3D structures (where available), highlighting spatial patterns of conservation at various significance thresholds. The INTREPID web server is available at http://phylogenomics.berkeley.edu/intrepid.
|
27
|
Brandt BW, Heringa J. webPRC: the Profile Comparer for alignment-based searching of public domain databases. Nucleic Acids Res 2009; 37:W48-52. [PMID: 19420063 PMCID: PMC2703954 DOI: 10.1093/nar/gkp279] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022] Open
Abstract
Profile–profile methods are well suited to detect remote evolutionary relationships between protein families. Profile Comparer (PRC) is an existing stand-alone program for scoring and aligning hidden Markov models (HMMs), which are based on multiple sequence alignments. Since PRC compares profile HMMs instead of sequences, it can be used to find distant homologues. For this purpose, PRC is used by, for example, the CATH and Pfam-domain databases. As PRC is a profile comparer, it only reports profile HMM alignments and does not produce multiple sequence alignments. We have developed the webPRC server, which makes it straightforward to search for distant homologues or similar alignments in a number of domain databases. In addition, it provides the results both as multiple sequence alignments and as aligned HMMs. Furthermore, the user can view the domain annotation, evaluate the PRC hits with the Jalview multiple alignment editor and generate logos from the aligned HMMs or the aligned multiple alignments. Thus, this server assists in detecting distant homologues with PRC as well as in evaluating and using the results. The webPRC interface is available at http://www.ibi.vu.nl/programs/prcwww/.
Affiliation(s)
- Bernd W Brandt
- Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, The Netherlands.
|
28
|
Donald JE, Shakhnovich EI. SDR: a database of predicted specificity-determining residues in proteins. Nucleic Acids Res 2008; 37:D191-4. [PMID: 18927118 PMCID: PMC2686543 DOI: 10.1093/nar/gkn716] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/17/2023] Open
Abstract
The specificity-determining residue database (SDR database) presents residue positions where mutations are predicted to have changed protein function in large protein families. Because the database pre-calculates predictions on existing protein sequence alignments, users can quickly find the predictions by selecting the appropriate protein family or searching by protein sequence. Predictions can be used to guide mutagenesis or to gain a better understanding of specificity changes in a protein family. The database is available on the web at http://paradox.harvard.edu/sdr.
Affiliation(s)
- Jason E Donald
- Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA, USA.
|
29
|
Ye K, Feenstra KA, Heringa J, Ijzerman AP, Marchiori E. Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a Machine-Learning approach for feature weighting. Bioinformatics 2007; 24:18-25. [PMID: 18024975 DOI: 10.1093/bioinformatics/btm537] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Identification of residues that account for protein function specificity is crucial, not only for understanding the nature of functional specificity, but also for protein engineering experiments aimed at switching the specificity of an enzyme, regulator or transporter. Available algorithms generally use multiple sequence alignments to identify residue positions conserved within subfamilies but divergent in between. However, many biological examples show a much subtler picture than simple intra-group conservation versus inter-group divergence. RESULTS: We present multi-RELIEF, a novel approach for identifying specificity residues that is based on RELIEF, a state-of-the-art Machine-Learning technique for feature weighting. It estimates the expected 'local' functional specificity of residues from an alignment divided in multiple classes. Optionally, 3D structure information is exploited by increasing the weight of residues that have high-weight neighbors. Using ROC curves over a large body of experimental reference data, we show that (a) multi-RELIEF identifies specificity residues for the seven test sets used, (b) incorporating structural information improves prediction for specificity of interaction with small molecules and (c) comparison of multi-RELIEF with four other state-of-the-art algorithms indicates its robustness and best overall performance. AVAILABILITY: A web-server implementation of multi-RELIEF is available at www.ibi.vu.nl/programs/multirelief. Matlab source code of the algorithm and data sets are available on request for academic use.
Affiliation(s)
- Kai Ye
- Division of Medical Chemistry, LACDR, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands
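The RELIEF-style feature weighting mentioned in the multi-RELIEF abstract above can be illustrated with a minimal sketch. This is plain RELIEF applied to alignment columns, not the authors' multi-RELIEF procedure (which, per the abstract, handles multiple classes and optional 3D information). The function names `hamming` and `relief_weights`, the toy alignment, and the class labels are all assumptions made for illustration. A column gains weight when a sequence differs from its nearest neighbor in another class (nearest miss) and loses weight when it differs from its nearest neighbor in the same class (nearest hit).

```python
def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def relief_weights(seqs, labels):
    """Plain RELIEF weights for the columns of a gap-free toy alignment.

    Illustrative only: hypothetical helper, not the multi-RELIEF implementation.
    """
    n_cols = len(seqs[0])
    weights = [0.0] * n_cols
    for i, (seq, lab) in enumerate(zip(seqs, labels)):
        # Candidate hits share the class (excluding the sequence itself);
        # candidate misses belong to any other class.
        hits = [s for k, (s, l) in enumerate(zip(seqs, labels)) if l == lab and k != i]
        misses = [s for s, l in zip(seqs, labels) if l != lab]
        if not hits or not misses:
            continue  # a sequence without a hit or a miss contributes nothing
        hit = min(hits, key=lambda s: hamming(seq, s))
        miss = min(misses, key=lambda s: hamming(seq, s))
        for j in range(n_cols):
            # Reward between-class differences, penalize within-class differences.
            weights[j] += (seq[j] != miss[j]) - (seq[j] != hit[j])
    return [w / len(seqs) for w in weights]


# Toy alignment: columns 1 and 3 separate the two classes,
# column 0 is fully conserved, column 2 varies within each class.
toy_seqs = ["ACDK", "ACEK", "AGDR", "AGER"]
toy_labels = ["x", "x", "y", "y"]
print(relief_weights(toy_seqs, toy_labels))
```

In this toy case the class-separating columns receive the highest weights, the conserved column scores zero, and the column that varies only within classes is penalized, which mirrors the "conserved within subfamilies but divergent in between" pattern the abstract describes.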
|