1701
|
Application of nonnegative matrix factorization to improve profile-profile alignment features for fold recognition and remote homolog detection. BMC Bioinformatics 2008; 9:298. [PMID: 18590572 PMCID: PMC2459191 DOI: 10.1186/1471-2105-9-298] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 01/08/2008] [Accepted: 07/01/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Nonnegative matrix factorization (NMF) is a feature extraction method that has the property of intuitive part-based representation of the original features. This unique ability makes NMF a potentially promising method for biological sequence analysis. Here, we apply NMF to fold recognition and remote homolog detection problems. Recent studies have shown that combining support vector machines (SVM) with profile-profile alignments improves performance of fold recognition and remote homolog detection remarkably. However, it is not clear which parts of sequences are essential for the performance improvement. RESULTS The performance of fold recognition and remote homolog detection using NMF features is compared to that of the unmodified profile-profile alignment (PPA) features by estimating Receiver Operating Characteristic (ROC) scores. The overall performance is noticeably improved. For fold recognition at the fold level, SVM with NMF features recognizes 30% of homolog proteins at > 0.99 ROC scores, while the original PPA features, HHsearch, and PSI-BLAST recognize almost none. For detecting remote homologs that are related at the superfamily level, NMF features also achieve higher performance than the original PPA features. At > 0.90 ROC50 scores, 25% of proteins with NMF features correctly detect remotely related proteins, whereas with the original PPA features only 1% of proteins detect remote homologs. In addition, we investigate the effect of the number of positive training examples and the number of basis vectors on performance improvement. We also analyze the ability of NMF to extract essential features by comparing NMF basis vectors with functionally important sites and structurally conserved regions of proteins. The results show that NMF basis vectors have significant overlap with functional sites from PROSITE and with structurally conserved regions from the multiple structural alignments generated by MUSTANG. 
The correlation between NMF basis vectors and biologically essential parts of proteins supports our conjecture that NMF basis vectors can explicitly represent important sites of proteins. CONCLUSION The present work demonstrates that applying NMF to profile-profile alignments can reveal essential features of proteins and that these features significantly improve the performance of fold recognition and remote homolog detection.
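The factorization named in this abstract can be illustrated with a minimal sketch. It assumes the textbook Lee-Seung multiplicative updates for the squared-error objective, which is not necessarily the variant the authors used, and the function names (`nmf`, `frobenius_error`) and the toy matrix are hypothetical. A nonnegative feature matrix V is approximated as W·H, where the rows of H play the role of basis vectors and W holds the nonnegative weights.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W*H (the reconstruction error)."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (n x m) into W (n x rank) and H (rank x m).

    Assumption: standard multiplicative updates; eps only guards
    against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy "feature matrix": rows are samples, columns are alignment-derived features.
toy_v = [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 1.0], [0.0, 6.0, 2.0]]
toy_w, toy_h = nmf(toy_v, rank=2)
```

Because every update multiplies nonnegative quantities, W and H stay nonnegative, which is what gives the part-based, additive interpretation mentioned in the abstract.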
|
1702
|
Ozyurt AS, Selby TL. Computational active site analysis of molecular pathways to improve functional classification of enzymes. Proteins 2008; 72:184-96. [DOI: 10.1002/prot.21907] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
|
1703
|
Kosinski J, Plotz G, Guarné A, Bujnicki JM, Friedhoff P. The PMS2 subunit of human MutLalpha contains a metal ion binding domain of the iron-dependent repressor protein family. J Mol Biol 2008; 382:610-27. [PMID: 18619468 DOI: 10.1016/j.jmb.2008.06.056] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.2] [Received: 03/19/2008] [Revised: 06/13/2008] [Accepted: 06/23/2008] [Indexed: 12/22/2022]
Abstract
DNA mismatch repair (MMR) is responsible for correcting replication errors. MutLalpha, one of the main players in MMR, has been recently shown to harbor an endonuclease/metal-binding activity, which is important for its function in vivo. This endonuclease activity has been confined to the C-terminal domain of the hPMS2 subunit of the MutLalpha heterodimer. In this work, we identify a striking sequence-structure similarity of hPMS2 to the metal-binding/dimerization domain of the iron-dependent repressor protein family and present a structural model of the metal-binding domain of MutLalpha. According to our model, this domain of MutLalpha comprises at least three highly conserved sequence motifs, which are also present in most MutL homologs from bacteria that do not rely on the endonuclease activity of MutH for strand discrimination. Furthermore, based on our structural model, we predict that MutLalpha is a zinc ion binding protein and confirm this prediction by way of biochemical analysis of zinc ion binding using the full-length and C-terminal domain of MutLalpha. Finally, we demonstrate that the conserved residues of the metal ion binding domain are crucial for MMR activity of MutLalpha in vitro.
Affiliation(s)
- Jan Kosinski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland
|
1704
|
Sanchez-Pulido L, Devos D, Sung ZR, Calonje M. RAWUL: a new ubiquitin-like domain in PRC1 ring finger proteins that unveils putative plant and worm PRC1 orthologs. BMC Genomics 2008; 9:308. [PMID: 18588675 PMCID: PMC2447854 DOI: 10.1186/1471-2164-9-308] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.6] [Received: 03/14/2008] [Accepted: 06/27/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Polycomb group (PcG) proteins are a set of chromatin-modifying proteins that play a key role in epigenetic gene regulation. The PcG proteins form large multiprotein complexes with different activities. The two best-characterized PcG complexes are the PcG repressive complex 1 (PRC1) and 2 (PRC2), which respectively possess histone 2A lysine 119 E3 ubiquitin ligase and histone 3 lysine 27 methyltransferase activities. While PRC2-like complexes are conserved throughout the eukaryotic kingdoms, PRC1-like complexes have only been described in Drosophila and vertebrates. Since both complexes are required for the gene silencing mechanism in Drosophila and vertebrates, how PRC1 function is realized in organisms that apparently lack PRC1, such as plants, is so far unknown. In vertebrates, PRC1 includes three proteins, Ring1B, Ring1A, and Bmi-1, that form an E3 ubiquitin ligase complex. These PRC1 proteins have an N-terminally located Ring finger domain associated with a poorly characterized conserved C-terminal region. RESULTS We obtained statistically significant evidence of sequence similarity between the C-terminal region of the PRC1 Ring finger proteins and the ubiquitin (Ubq)-like family proteins, thus defining a new Ubq-like domain, the RAWUL domain. In addition, our analysis revealed the existence of plant and worm proteins that display the conserved combination of a Ring finger domain at the N-terminus and a RAWUL domain at the C-terminus. CONCLUSION Analysis of the conserved domain architecture among PRC1 Ring finger proteins revealed the existence of long sought PRC1 protein orthologs in these organisms, suggesting the functional conservation of PRC1 throughout higher eukaryotes.
Affiliation(s)
- Luis Sanchez-Pulido
- Centro Nacional de Biotecnología (CNB-CSIC). Cantoblanco, E-28049 Madrid, Spain.
|
1705
|
Hahn P, Böse J, Edler S, Lengeling A. Genomic structure and expression of Jmjd6 and evolutionary analysis in the context of related JmjC domain containing proteins. BMC Genomics 2008; 9:293. [PMID: 18564434 PMCID: PMC2453528 DOI: 10.1186/1471-2164-9-293] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 06/22/2007] [Accepted: 06/18/2008] [Indexed: 12/24/2022] Open
Abstract
Background The jumonji C (JmjC) domain containing gene 6 (Jmjd6, previously known as phosphatidylserine receptor) has misleadingly been annotated to encode a transmembrane receptor for the engulfment of apoptotic cells. Given the importance of JmjC domain containing proteins in controlling a wide range of diverse biological functions, we undertook a comparative genomic analysis to gain further insights into Jmjd6 gene organisation, evolution, and protein function. Results We describe here a semiautomated computational pipeline to identify and annotate JmjC domain containing proteins. Using a sequence segment N-terminal of the Jmjd6 JmjC domain as query for a reciprocal BLAST search, we identified homologous sequences in 62 species across all major phyla. Retrieved Jmjd6 sequences were used to phylogenetically analyse corresponding loci and their genomic neighbourhood. This analysis led to the identification and characterisation of a bi-directional transcriptional unit comprising the Jmjd6 and 1110005A03Rik genes and to the recognition of a new, previously overlooked Jmjd6 exon in mammals. Using expression studies, two novel Jmjd6 splice variants were identified and validated in vivo. Analysis of the Jmjd6 neighbouring gene 1110005A03Rik revealed an incidental deletion of this gene in two out of three earlier reported Jmjd6 knockout mice, which might affect previously described conflicting phenotypes. To determine potentially important residues for Jmjd6 function, a structural model of the Jmjd6 protein was calculated based on sequence conservation. This approach identified a conserved double-stranded β-helix (DSBH) fold and a HxDxnH facial triad as structural motifs. Moreover, our systematic annotation in nine species identified 313 DSBH fold-containing proteins that split into 25 highly conserved subgroups. Conclusion We give further evidence that Jmjd6 most likely has a function as a nonheme-Fe(II)-2-oxoglutarate-dependent dioxygenase as previously suggested. 
Further, we provide novel insights into the evolution of Jmjd6 and other related members of the superfamily of JmjC domain containing proteins. Finally, we discuss possibilities of the involvement of Jmjd6 and 1110005A03Rik in an antagonistic biochemical pathway.
Affiliation(s)
- Phillip Hahn
- Research Group Infection Genetics, Department of Experimental Mouse Genetics, Helmholtz Centre for Infection Research, D-31824 Braunschweig, Germany.
|
1706
|
Zhang W, Du Y, Khudyakov I, Fan Q, Gao H, Ning D, Wolk CP, Xu X. A gene cluster that regulates both heterocyst differentiation and pattern formation in Anabaena sp. strain PCC 7120. Mol Microbiol 2008; 66:1429-43. [PMID: 18045384 DOI: 10.1111/j.1365-2958.2007.05997.x] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
Abstract
Wild-type Anabaena sp. strain PCC 7120, a filamentous nitrogen-fixing cyanobacterium, produces single heterocysts at semi-regular intervals. asr0100 (patU5) and alr0101 (patU3) are homologous to the 5' and 3' portions of patU of Nostoc punctiforme. alr0099 (hetZ) overlaps the 5' end of patU5. hetZ, patU5 and patU3 were all upregulated, or expressed specifically, in proheterocysts and heterocysts. Mutants of hetZ showed delayed or no heterocyst differentiation. In contrast, a patU3 mutation produced a multiple contiguous heterocyst (Mch) phenotype and restored the formation of otherwise lost intercalary heterocysts in a patA background. Decreasing the expression of patU3 greatly increased the frequency of heterocysts in a mini-patS strain. Two promoter regions and two principal, corresponding transcripts were detected in the hetZ-patU5-patU3 region. Transcription of hetZ was upregulated in a hetZ mutant and downregulated in a patU3 mutant. When mutants hetZ::C.K2 and hetZ::Tn5-1087b were nitrogen-deprived, P(hetC)-gfp was very weakly expressed, and in hetZ::Tn5-1087b, P(hetR)-gfp was relatively strongly expressed in cells that had neither a regular pattern nor altered morphology. We conclude that the hetZ-patU5-patU3 cluster plays an important role in co-ordination of heterocyst differentiation and pattern formation. The presence of homologous clusters in filamentous genera without heterocysts is suggestive of a more general role.
Affiliation(s)
- Wei Zhang
- The State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei 430072, China
|
1707
|
Abstract
Background Profile Hidden Markov Model (HMM) is a powerful statistical model to represent a family of DNA, RNA, and protein sequences. Profile HMM has been widely used in bioinformatics research such as sequence alignment, gene structure prediction, motif identification, protein structure prediction, and biological database search. However, few comprehensive, visual editing tools for profile HMM are publicly available. Results We develop a visual editor for profile Hidden Markov Models (HMMEditor). HMMEditor can visualize the profile HMM architecture, transition probabilities, and emission probabilities. Moreover, it provides functions to edit and save HMM and parameters. Furthermore, HMMEditor allows users to align a sequence against the profile HMM and to visualize the corresponding Viterbi path. Conclusion HMMEditor provides a set of unique functions to visualize and edit a profile HMM. It is a useful tool for biological sequence analysis and modeling. Both HMMEditor software and web service are freely available.
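The Viterbi path mentioned in this abstract can be sketched in a few lines. The example below is a simplification: it uses a plain two-state HMM rather than a full profile HMM with match, insert, and delete states, and the state names, probabilities, and the function name `viterbi` are all hypothetical illustrations, not part of HMMEditor.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log_probability, best_state_path) for a plain HMM.

    Works in log space to avoid numerical underflow on long sequences.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s] = log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model: an AT-rich state and a GC-rich state.
toy_states = ("AT", "GC")
toy_start = {"AT": 0.5, "GC": 0.5}
toy_trans = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
toy_emit = {
    "AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
toy_logp, toy_path = viterbi("AATTGGCC", toy_states, toy_start, toy_trans, toy_emit)
# toy_path is ["AT", "AT", "AT", "AT", "GC", "GC", "GC", "GC"]
```

A profile HMM applies the same dynamic program over its position-specific match, insert, and delete states; only the state graph is larger.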
Affiliation(s)
- Jianyong Dai
- School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA.
|
1708
|
Cradle-loop barrels and the concept of metafolds in protein classification by natural descent. Curr Opin Struct Biol 2008; 18:358-65. [DOI: 10.1016/j.sbi.2008.02.006] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Received: 12/25/2007] [Accepted: 02/14/2008] [Indexed: 11/19/2022]
|
1709
|
Zhang Y. Progress and challenges in protein structure prediction. Curr Opin Struct Biol 2008; 18:342-8. [PMID: 18436442 PMCID: PMC2680823 DOI: 10.1016/j.sbi.2008.02.004] [Citation(s) in RCA: 311] [Impact Index Per Article: 18.3] [Received: 12/04/2007] [Accepted: 02/14/2008] [Indexed: 10/22/2022]
Abstract
Depending on whether similar structures are found in the PDB library, protein structure prediction can be categorized into template-based modeling and free modeling. Although threading is an efficient tool to detect structural analogs, the advancements in methodology development have come to a steady state. Encouraging progress is observed in structure refinement, which aims at drawing template structures closer to the native; this has been mainly driven by the use of multiple structure templates and the development of hybrid knowledge-based and physics-based force fields. For free modeling, exciting examples have been witnessed in folding small proteins to atomic resolution. However, predicting structures for proteins larger than 150 residues still remains a challenge, with bottlenecks from both force field and conformational search.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Biosciences, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, United States.
|
1710
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics 2008; Chapter 5:Unit-5.6. [PMID: 18428767 DOI: 10.1002/0471250953.bi0506s15] [Citation(s) in RCA: 1819] [Impact Index Per Article: 107.0] [Indexed: 12/28/2022]
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software are also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California
- Ben Webb
- University of California at San Francisco, San Francisco, California
- M S Madhusudhan
- University of California at San Francisco, San Francisco, California
- David Eramian
- University of California at San Francisco, San Francisco, California
- Min-Yi Shen
- University of California at San Francisco, San Francisco, California
- Ursula Pieper
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
|
1711
|
Abstract
Gene content has been shown to contain a strong phylogenetic signal, yet its usage for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss and until now required completely sequenced genomes. Here, we introduce an approach that allows the phylogenetic signal in gene content to be applied to any set of sequences, using signature genes for phylogenetic classification. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an unspecified sample can be used to characterize its taxonomic composition. We identify 8,362 signature genes specific for 112 prokaryotic taxa. We show that these signature genes can be used to address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses provide an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomics data. Cross-validation experiments leaving out up to 30% of the species show that ∼92% of the signature genes correctly place the species in a related clade. Analyses of metagenomics data sets with the signature gene approach are in good agreement with the previously reported species distributions based on phylogenetic analysis of marker genes. Summarizing, signature genes can complement traditional sequence-based methods in addressing taxonomic questions.
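The idea of classifying a sample by the signature genes it contains can be sketched as follows. This is only an illustration under a simplifying assumption that is not taken from the abstract: a gene family counts as a signature of a taxon when it occurs in genomes of exactly one taxon in the reference set. The function names, genome identifiers, and gene labels are hypothetical.

```python
from collections import Counter


def signature_genes(genomes, taxon_of):
    """Map each gene family to its taxon if it occurs in exactly one taxon.

    genomes  -- {genome_id: set of gene-family identifiers}
    taxon_of -- {genome_id: taxon label}
    Assumption: "signature" means exclusive to a single taxon in this reference set.
    """
    taxa_with_gene = {}
    for genome, genes in genomes.items():
        for gene in genes:
            taxa_with_gene.setdefault(gene, set()).add(taxon_of[genome])
    return {gene: next(iter(taxa))
            for gene, taxa in taxa_with_gene.items() if len(taxa) == 1}


def taxonomic_profile(sample_genes, signatures):
    """Count, per taxon, how many of its signature genes appear in a sample."""
    return Counter(signatures[gene] for gene in sample_genes if gene in signatures)


# Hypothetical reference set: two genomes in taxon T1, one in taxon T2.
toy_genomes = {"g1": {"a", "b", "x"}, "g2": {"a", "c", "x"}, "g3": {"d", "x"}}
toy_taxa = {"g1": "T1", "g2": "T1", "g3": "T2"}
toy_signatures = signature_genes(toy_genomes, toy_taxa)
# Gene "x" occurs in both taxa, so it is not a signature of either.
toy_profile = taxonomic_profile({"a", "d", "x", "z"}, toy_signatures)
```

Because the profile only needs gene presence, the same counting applies to an incomplete genome or a metagenomic sample, which is the use case the abstract highlights.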
Affiliation(s)
- Bas E Dutilh
- Center for Molecular and Biomolecular Informatics/Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
|
1712
|
A novel trehalose synthesizing pathway in the hyperthermophilic Crenarchaeon Thermoproteus tenax: the unidirectional TreT pathway. Arch Microbiol 2008; 190:355-69. [PMID: 18483808 DOI: 10.1007/s00203-008-0377-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 01/07/2008] [Revised: 04/11/2008] [Accepted: 04/20/2008] [Indexed: 10/22/2022]
Abstract
In the genome of the hyperthermophilic archaeon Thermoproteus tenax a gene (treS/P) encoding a protein with similarity to annotated trehalose phosphorylase (TreP), trehalose synthase (TreS) and more recently characterized trehalose glycosyltransferring synthase (TreT) was identified. The treS/P gene as well as an upstream located ORF of unknown function (orfY) were cloned, heterologously expressed in E. coli and purified. The enzymatic characterization of the putative TreS/P revealed TreT activity. However, contrary to the previously characterized reversible TreT from Thermococcus litoralis and Pyrococcus horikoshii, the T. tenax enzyme is unidirectional and catalyzes only the formation of trehalose from UDP (ADP)-glucose and glucose. The T. tenax enzyme differs from the reversible TreT of T. litoralis by its preference for UDP-glucose as co-substrate. Phylogenetic and comparative gene context analyses reveal a conserved organization of the unidirectional TreT and OrfY gene cluster that is present in many Archaea and a few Bacteria. In contrast, the reversible TreT pathway seems to be restricted to only a few archaeal (e.g. Thermococcales) and bacterial (Thermotogales) members. Here we present a new pathway exclusively involved in trehalose synthesis--the unidirectional TreT pathway--and discuss its physiological role as well as its phylogenetic distribution.
|
1713
|
Anantharaman V, Aravind L. Analysis of DBC1 and its homologs suggests a potential mechanism for regulation of sirtuin domain deacetylases by NAD metabolites. Cell Cycle 2008; 7:1467-72. [PMID: 18418069 PMCID: PMC2423810 DOI: 10.4161/cc.7.10.5883] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Indexed: 11/19/2022] Open
Abstract
Deleted in Breast Cancer-1 (DBC1) and its paralog CARP-1 are large multi-domain proteins, with a nuclear or perinuclear localization, and a role in promoting apoptosis upon processing by caspases. Recent studies on human DBC1 show that it is a specific inhibitor of the sirtuin-type deacetylase, Sirt1, which deacetylates histones and p53. Using sensitive sequence profile searches and HMM-HMM comparisons, we show that the central conserved globular domain present in DBC1 and its homologs from diverse eukaryotes is a catalytically inactive version of the Nudix hydrolase (MutT) domain. Given that Nudix domains are known to bind nucleoside diphosphate sugars and NAD, we predict that this domain in DBC1 and its homologs binds NAD metabolites such as ADP-ribose. Hence, we propose that DBC1 and its homologs are likely to regulate the activity of SIRT1 or related deacetylases by sensing the soluble products or substrates of the NAD-dependent deacetylation reaction. The complex domain architectures of the members of the DBC1 family, which include fusions to the RNA-binding S1-like domain, the DNA-binding SAP domain and EF-hand domains, suggest that they are likely to function as integrators of distinct regulatory signals including chromatin protein modification, soluble compounds in NAD metabolism, apoptotic stimuli and RNA recognition.
Affiliation(s)
- Vivek Anantharaman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- L. Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
1714
|
Orlowski J, Bujnicki JM. Structural and evolutionary classification of Type II restriction enzymes based on theoretical and experimental analyses. Nucleic Acids Res 2008; 36:3552-69. [PMID: 18456708 PMCID: PMC2441816 DOI: 10.1093/nar/gkn175] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.5] [Indexed: 01/05/2023] Open
Abstract
For a very long time, Type II restriction enzymes (REases) have been a paradigm of ORFans: proteins with no detectable similarity to each other and to any other protein in the database, despite common cellular and biochemical function. Crystallographic analyses published until January 2008 provided high-resolution structures for only 28 of 1637 Type II REase sequences available in the Restriction Enzyme database (REBASE). Among these structures, all but two possess catalytic domains with the common PD-(D/E)XK nuclease fold. Two structures are unrelated to the others: R.BfiI exhibits the phospholipase D (PLD) fold, while R.PabI has a new fold termed 'half-pipe'. Thus far, bioinformatic studies supported by site-directed mutagenesis have extended the number of tentatively assigned REase folds to five (now including also GIY-YIG and HNH folds identified earlier in homing endonucleases) and provided structural predictions for dozens of REase sequences without experimentally solved structures. Here, we present a comprehensive study of all Type II REase sequences available in REBASE together with their homologs detectable in the nonredundant and environmental samples databases at the NCBI. We present the summary and critical evaluation of structural assignments and predictions reported earlier, new classification of all REase sequences into families, domain architecture analysis and new predictions of three-dimensional folds. Among 289 experimentally characterized (not putative) Type II REases, whose apparently full-length sequences are available in REBASE, we assign 199 (69%) to contain the PD-(D/E)XK domain. The HNH domain is the second most common, with 24 (8%) members. When putative REases are taken into account, the fraction of PD-(D/E)XK and HNH folds changes to 48% and 30%, respectively. Fifty-six characterized (and 521 predicted) REases remain unassigned to any of the five REase folds identified so far, and may exhibit new architectures. 
These enzymes are proposed as the most interesting targets for structure determination by high-resolution experimental methods. Our analysis provides the first comprehensive map of sequence-structure relationships among Type II REases and will help to focus the efforts of structural and functional genomics of this large and biotechnologically important class of enzymes.
Affiliation(s)
- Jerzy Orlowski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
|
1715
|
Abstract
beta-Propellers are toroidal folds, in which repeated, four-stranded beta-meanders are arranged in a circular and slightly tilted fashion, like the blades of a propeller. They are found in all domains of life, with a strong preponderance among eukaryotes. Propellers show considerable sequence diversity and are classified into six separate structural groups by the SCOP and CATH databases. Despite this diversity, they often show similarities across groups, not only in structure but also in sequence, raising the possibility of a common origin. In agreement with this hypothesis, most propellers group together in a cluster map of all-beta folds generated by sequence similarity, because of numerous pairwise matches, many of which are individually nonsignificant. In total, 45 of 60 propellers in the SCOP25 database, covering four SCOP folds, are clustered in this group, and analysis with sensitive sequence comparison methods shows that they are similar at a level indicative of homology. Two mechanisms appear to contribute to the evolution of beta-propellers: amplification from single blades and subsequent functional differentiation. The observation of propellers with nearly identical blades in genomic sequences shows that these mechanisms are still operating today.
Affiliation(s)
- Indronil Chaudhuri
- Department for Protein Evolution, Max Planck Institute for Developmental Biology, 72076 Tuebingen, Germany
|
1716
|
Niv MY, Skrabanek L, Roberts RJ, Scheraga HA, Weinstein H. Identification of GATC- and CCGG-recognizing Type II REases and their putative specificity-determining positions using Scan2S--a novel motif scan algorithm with optional secondary structure constraints. Proteins 2008; 71:631-40. [PMID: 17972284 PMCID: PMC2465807 DOI: 10.1002/prot.21777] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/30/2023]
Abstract
Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered the efforts of specificity-engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.
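A regular-expression motif scan with an optional secondary-structure constraint can be sketched with Python's standard `re` module. This is not the Scan2S algorithm itself, whose implementation the abstract does not describe; the function name `scan_motif`, the motif pattern, the toy sequence, and the three-letter structure alphabet (H helix, E strand, C coil) are all assumptions made for illustration.

```python
import re


def scan_motif(sequence, secondary_structure, motif, ss_constraint=None):
    """Return (start, end, matched_text) for each non-overlapping motif hit.

    sequence            -- amino-acid string
    secondary_structure -- same-length string over H (helix), E (strand), C (coil)
    motif               -- regular expression applied to the sequence
    ss_constraint       -- optional regular expression that the secondary-structure
                           substring under the hit must match in full
    """
    if len(sequence) != len(secondary_structure):
        raise ValueError("sequence and secondary structure must have equal length")
    ss_pattern = re.compile(ss_constraint) if ss_constraint else None
    hits = []
    for match in re.finditer(motif, sequence):
        start, end = match.span()
        # Keep the hit only if its structure segment satisfies the constraint.
        if ss_pattern is None or ss_pattern.fullmatch(secondary_structure[start:end]):
            hits.append((start, end, match.group()))
    return hits


# Hypothetical PD-(D/E)XK-like pattern with a short variable spacer.
toy_motif = r"PD.{3,5}[DE].K"
toy_seq = "MKTPDAAAAEVKLLPDGGGGDIKR"
toy_ss = "CCCEEECCCEEECCHHHHHHHHHC"

all_hits = scan_motif(toy_seq, toy_ss, toy_motif)
# → [(3, 12, "PDAAAAEVK"), (14, 23, "PDGGGGDIK")]
strand_hits = scan_motif(toy_seq, toy_ss, toy_motif, ss_constraint=r"EE.*")
# → [(3, 12, "PDAAAAEVK")]  (the second hit lies in a helix and is rejected)
```

Checking the structure string only under sequence hits keeps the sequence pattern and the structural constraint independent, so the constraint can be dropped when no structure prediction is available.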
Affiliation(s)
- Masha Y Niv
- Department of Physiology and Biophysics, Weill Medical College of Cornell University, 1300 York Ave., New York, New York 10021, USA.
|
1717
|
Roovers M, Kaminska KH, Tkaczuk KL, Gigot D, Droogmans L, Bujnicki JM. The YqfN protein of Bacillus subtilis is the tRNA: m1A22 methyltransferase (TrmK). Nucleic Acids Res 2008; 36:3252-62. [PMID: 18420655 PMCID: PMC2425500 DOI: 10.1093/nar/gkn169] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Abstract
N1-methylation of adenosine to m1A occurs in several different positions in tRNAs from various organisms. A methyl group at position N1 prevents Watson–Crick-type base pairing by adenosine and is therefore important for regulation of structure and stability of tRNA molecules. Thus far, only one family of genes encoding enzymes responsible for m1A methylation at position 58 has been identified, while other m1A methyltransferases (MTases) remain elusive. Here, we show that Bacillus subtilis open reading frame yqfN is necessary and sufficient for N1-adenosine methylation at position 22 of bacterial tRNA. Thus, we propose to rename YqfN as TrmK, according to the traditional nomenclature for bacterial tRNA MTases, or TrMet(m1A22), according to the nomenclature from the MODOMICS database of RNA modification enzymes. tRNAs purified from a ΔtrmK strain are a good substrate in vitro for the recombinant TrmK protein, which is sufficient for m1A methylation at position 22, as are tRNAs from Escherichia coli, which natively lacks m1A22. TrmK is conserved in Gram-positive bacteria and present in some Gram-negative bacteria, but its orthologs are apparently absent from archaea and eukaryota. Protein structure prediction indicates that the active site of TrmK does not resemble the active site of the m1A58 MTase TrmI, suggesting that these two enzymatic activities evolved independently.
Affiliation(s)
- Martine Roovers
- Institut de Recherches Microbiologiques Jean-Marie Wiame, B-1070 Bruxelles, Belgium
|
1718
|
Abstract
The linear biosynthetic pathway leading from alpha-ketoisovalerate to pantothenate (vitamin B5) and on to CoA comprises eight steps in the Bacteria and Eukaryota. Genes for up to six steps of this pathway can be identified by sequence homology in individual archaeal genomes. However, there are no archaeal homologs to known isoforms of pantothenate synthetase (PS) or pantothenate kinase. Using comparative genomics, we previously identified two conserved archaeal protein families as the best candidates for the missing steps. Here we report the characterization of the predicted PS gene from Methanosarcina mazei, which encodes a hypothetical protein (MM2281) with no obvious homologs outside its own family. When expressed in Escherichia coli, MM2281 partially complemented an auxotrophic mutant without PS activity. Purified recombinant MM2281 showed no PS activity on its own, but the enzyme enabled substantial synthesis of [14C]4'-phosphopantothenate from [14C]beta-alanine, pantoate and ATP when coupled with E. coli pantothenate kinase. ADP, but not AMP, was detected as a coproduct of the coupled reaction. MM2281 also transferred the 14C-label from [14C]beta-alanine to pantothenate in the presence of pantoate and ADP, presumably through isotope exchange. No exchange took place when pantoate was removed or ADP replaced with AMP. Our results indicate that MM2281 represents a novel type of PS that forms ADP and is strongly inhibited by its product pantothenate. These properties differ substantially from those of bacterial PS, and may explain why PS genes, in contrast to other pantothenate biosynthetic genes, were not exchanged horizontally between the Bacteria and Archaea.
Affiliation(s)
- Silvia Ronconi
- Lehrstuhl für Genetik, Technische Universität München, Freising, Germany
|
1719
|
Abstract
Motivation: Trimeric autotransporter adhesins (TAAs), such as Yersinia YadA, Neisseria NadA, Moraxella UspAs, Haemophilus Hia and Bartonella BadA, are important pathogenicity factors of proteobacteria. Their high sequence diversity and distinct mosaic-like structure lead to difficulties in the annotation of their sequences. These stem from the large number of short repeats, the presence of compositionally unusual coiled-coils, fuzzy domain boundaries and regions of seemingly low sequence complexity. Results: We have developed a workflow, named daTAA, for the accurate domain annotation of TAAs. Its core consists of manually curated alignments and of knowledge-based rules that enhance assignments made by sequence similarity. Compared to general domain annotation servers such as PFAM, daTAA captures more domains and provides more sensitive domain detection, as well as integrated and detailed coiled-coil assignments. Availability: The daTAA server is freely accessible at http://toolkit.tuebingen.mpg.de/dataa Contact: andrei.lupas@tuebingen.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pawel Szczesny
- Department of Protein Evolution, Max-Planck Institute for Developmental Biology, Spemannstr 35, 72076 Tuebingen, Germany
|
1720
|
Tcheremenskaia O, Giuliani A, Tomasi M. PROFALIGN algorithm identifies the regions containing folding determinants by scoring pairs of hydrophobic profiles of remotely related proteins. J Comput Biol 2008; 15:445-55. [PMID: 18386966 DOI: 10.1089/cmb.2007.0100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Profile comparison methods have been shown to be very powerful in creating accurate alignments of protein sequences, especially in the case of remotely related proteins (RRP). These methods take advantage of the observation that hydrophobic profiles are more conserved than the corresponding amino acid sequences. Here, we present the PROFALIGN algorithm, which allows one to perform a detailed comparative analysis, at both local and global levels, of two protein sequence profiles. The user can either choose among four different hydrophobic scales (Miyazawa-Jernigan, Eisenberg, Engelman-Steitz, and Kyte-Doolittle) or can add a personal scale. The interface is designed for a wide range of users, including those who are not involved in protein research. It allows one to vary the alignment parameters (such as gap penalties, embedding, and profile smoothness). Secondary structure propensity is added as an optional alignment filter. Similar segments of two proteins are singled out on the basis of score. We have tested the algorithm with different Src homology 3 (SH3) domain fragments sharing low sequence homology but very similar three-dimensional (3D) structures. By using the Miyazawa-Jernigan hydrophobic scale, PROFALIGN was able to detect the strong correlation between the regions that are known to be crucial for SH3 transition state topology. PROFALIGN seems able to identify most of the mutual alignment of structures on the basis of their hydrophobic profiles, delimiting the regions containing the key determinants of folding. Therefore, the present methodology may be useful for the detection of the most structurally relevant positions inside remotely related proteins.
Affiliation(s)
- Olga Tcheremenskaia
- Department of Cell Biology and Neurosciences, Istituto Superiore di Sanità, Rome, Italy.
|
1721
|
Cheng H, Kim BH, Grishin NV. Discrimination between distant homologs and structural analogs: lessons from manually constructed, reliable data sets. J Mol Biol 2008; 377:1265-78. [PMID: 18313074 PMCID: PMC4494761 DOI: 10.1016/j.jmb.2007.12.076] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 11/07/2007] [Accepted: 12/20/2007] [Indexed: 10/22/2022]
Abstract
A natural way to study protein sequence, structure, and function is to put them in the context of evolution. Homologs inherit similarities from their common ancestor, while analogs converge to similar structures due to a limited number of energetically favorable ways to pack secondary structural elements. Using novel strategies, we previously assembled two reliable databases of homologs and analogs. In this study, we compare these two data sets and develop a support vector machine (SVM)-based classifier to discriminate between homologs and analogs. The classifier uses a number of well-known similarity scores. We observe that although both structure scores and sequence scores contribute to SVM performance, profile sequence scores computed based on structural alignments are the best discriminators between remote homologs and structural analogs. We apply our classifier to a representative set from the expert-constructed database, Structural Classification of Proteins (SCOP). The SVM classifier recovers 76% of the remote homologs defined as domains in the same SCOP superfamily but from different families. More importantly, we also detect and discuss interesting homologous relationships between SCOP domains from different superfamilies, folds, and even classes.
Affiliation(s)
- Hua Cheng
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9050, USA.
|
1722
|
Wu S, Zhang Y. A comprehensive assessment of sequence-based and template-based methods for protein contact prediction. Bioinformatics 2008; 24:924-31. [PMID: 18296462 PMCID: PMC2648832 DOI: 10.1093/bioinformatics/btn069] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.1] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Pair-wise residue-residue contacts in proteins can be predicted from both threading templates and sequence-based machine learning. However, most structure modeling approaches only use the template-based contact predictions in guiding the simulations; this is partly because the sequence-based contact predictions are usually considered to be less accurate than those from threading. With the rapid progress in sequence databases and machine-learning techniques, it is necessary to have a detailed and comprehensive assessment of the contact-prediction methods in different template conditions. RESULTS We develop two methods for protein-contact predictions: SVM-SEQ is a sequence-based machine learning approach which trains a variety of sequence-derived features on contact maps; SVM-LOMETS collects consensus contact predictions from multiple threading templates. We test both methods on the same set of 554 proteins which are categorized into 'Easy', 'Medium', 'Hard' and 'Very Hard' targets based on the evolutionary and structural distance between templates and targets. For the Easy and Medium targets, SVM-LOMETS obviously outperforms SVM-SEQ; but for the Hard and Very Hard targets, the accuracy of the SVM-SEQ predictions is higher than that of SVM-LOMETS by 12-25%. If we combine the SVM-SEQ and SVM-LOMETS predictions together, the total number of correctly predicted contacts in the Hard proteins will increase by more than 60% (or 70% for the long-range contact with a sequence separation ≥ 24), compared with SVM-LOMETS alone. The advantage of SVM-SEQ is also shown in the CASP7 free modeling targets where the SVM-SEQ is around four times more accurate than SVM-LOMETS in the long-range contact prediction. These data demonstrate that the state-of-the-art sequence-based contact prediction has reached a level which may be helpful in assisting tertiary structure modeling for the targets which do not have close structure templates. The maximum yield should be obtained by the combination of both sequence- and template-based predictions.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
|
1723
|
Cheng H, Kim BH, Grishin NV. MALIDUP: a database of manually constructed structure alignments for duplicated domain pairs. Proteins 2008; 70:1162-6. [PMID: 17932926 DOI: 10.1002/prot.21783] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/11/2022]
Abstract
We describe MALIDUP (manual alignments of duplicated domains), a database of 241 pairwise structure alignments for homologous domains originated by internal duplication within the same polypeptide chain. Since duplicated domains within a protein frequently diverge in function and thus in sequence, this would be the first database of structurally similar homologs that is not strongly biased by sequence or functional similarity. Our manual alignments in most cases agree with the automatic structural alignments generated by several commonly used programs. This carefully constructed database could be used in studies on protein evolution and as a reference for testing structure alignment programs. The database is available at http://prodata.swmed.edu/malidup.
Affiliation(s)
- Hua Cheng
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
|
1724
|
Cheng J. A multi-template combination algorithm for protein comparative modeling. BMC STRUCTURAL BIOLOGY 2008; 8:18. [PMID: 18366648 PMCID: PMC2311309 DOI: 10.1186/1472-6807-8-18] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.4] [Received: 01/08/2008] [Accepted: 03/17/2008] [Indexed: 11/26/2022]
Abstract
BACKGROUND Multiple protein templates are commonly used in manual protein structure prediction. However, few automated algorithms of selecting and combining multiple templates are available. RESULTS Here we develop an effective multi-template combination algorithm for protein comparative modeling. The algorithm selects templates according to the similarity significance of the alignments between template and target proteins. It combines the whole template-target alignments whose similarity significance score is close to that of the top template-target alignment within a threshold, whereas it only takes alignment fragments from a less similar template-target alignment that align with a sizable uncovered region of the target. We compare the algorithm with the traditional method of using a single top template on the 45 comparative modeling targets (i.e. easy template-based modeling targets) used in the seventh edition of Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The multi-template combination algorithm improves the GDT-TS scores of predicted models by 6.8% on average. The statistical analysis shows that the improvement is significant (p-value < 10^-4). Compared with the ideal approach that always uses the best template, the multi-template approach yields only slightly better performance. During the CASP7 experiment, the preliminary implementation of the multi-template combination algorithm (FOLDpro) was ranked second among 67 servers in the category of high-accuracy structure prediction in terms of GDT-TS measure. CONCLUSION We have developed a novel multi-template algorithm to improve protein comparative modeling.
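The selection rule described in this abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function name `select_templates`, the tuple layout, the parameters `score_window` and `min_uncovered`, and the convention that a higher score means a more significant alignment are all assumptions made for illustration. "Sizable uncovered region" is simplified here to a count of newly covered target positions.

```python
def select_templates(alignments, score_window, min_uncovered):
    """Pick whole alignments near the top score, and fragments from the rest.

    alignments: list of (template_id, score, start, end) tuples, where
    [start, end) is the range of target positions the alignment covers and
    a higher score is assumed to mean a more significant alignment.
    """
    ranked = sorted(alignments, key=lambda a: a[1], reverse=True)
    if not ranked:
        return []
    top_score = ranked[0][1]
    covered = set()   # target positions already covered by chosen templates
    chosen = []
    for template_id, score, start, end in ranked:
        positions = set(range(start, end))
        if top_score - score <= score_window:
            # Score is close to the top alignment: keep the whole alignment.
            chosen.append((template_id, "whole", sorted(positions)))
            covered |= positions
        else:
            # Less similar template: keep only positions not yet covered,
            # and only if there are enough of them.
            new_positions = positions - covered
            if len(new_positions) >= min_uncovered:
                chosen.append((template_id, "fragment", sorted(new_positions)))
                covered |= new_positions
    return chosen


# Toy input with made-up scores and ranges.
toy = [("T1", 100.0, 0, 50), ("T2", 95.0, 5, 60),
       ("T3", 40.0, 40, 90), ("T4", 35.0, 10, 45)]
result = select_templates(toy, score_window=10.0, min_uncovered=20)
```

In the toy input, T1 and T2 are taken whole, T3 contributes only target positions 60 to 89, and T4 is dropped because it covers nothing new.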
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO 65211-2060, USA.
|
1725
|
Poleksic A, Fienup M. Optimizing the size of the sequence profiles to increase the accuracy of protein sequence alignments generated by profile-profile algorithms. Bioinformatics 2008; 24:1145-53. [PMID: 18337259 DOI: 10.1093/bioinformatics/btn097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Profile-based protein homology detection algorithms are valuable tools in genome annotation and protein classification. By utilizing information present in the sequences of homologous proteins, profile-based methods are often able to detect extremely weak relationships between protein sequences, as evidenced by the large-scale benchmarking experiments such as CASP and LiveBench. RESULTS We study the relationship between the sensitivity of a profile-profile method and the size of the sequence profile, which is defined as the average number of different residue types observed at the profile's positions. We also demonstrate that improvements in the sensitivity of a profile-profile method can be made by incorporating a profile-dependent scoring scheme, such as position-specific background frequencies. The techniques presented in this article are implemented in an alignment algorithm UNI-FOLD. When tested against other well-established methods for fold recognition, UNI-FOLD shows increased sensitivity and specificity in detecting remote relationships between protein sequences. AVAILABILITY UNI-FOLD web server can be accessed at http://blackhawk.cs.uni.edu
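The profile-size definition given in this abstract (the average number of different residue types observed at the profile's positions) can be made concrete with a small sketch. It assumes the profile is derived from a multiple sequence alignment given as equal-length strings and that gap characters are ignored; the function name `profile_size` and the gap handling are illustrative assumptions, not details taken from UNI-FOLD.

```python
def profile_size(alignment, gap_chars="-."):
    """Average number of distinct residue types per alignment column.

    alignment: list of equal-length aligned sequences (strings).
    Gap characters are skipped (an assumption of this sketch).
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one non-empty row")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    total = 0
    for col in range(length):
        residues = {row[col].upper() for row in alignment
                    if row[col] not in gap_chars}
        total += len(residues)
    return total / length


# Toy alignment: columns hold {A, G}, {C}, {D}, {E, F} -> (2 + 1 + 1 + 2) / 4
size = profile_size(["ACDE", "ACDF", "GC-E"])
```

For the toy alignment the value is 1.5; a single-sequence alignment always gives 1.0, the smallest possible profile size under this definition.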
Affiliation(s)
- Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, Cedar Falls, IA 50614, USA.
|
1726
|
Bud23 methylates G1575 of 18S rRNA and is required for efficient nuclear export of pre-40S subunits. Mol Cell Biol 2008; 28:3151-61. [PMID: 18332120 DOI: 10.1128/mcb.01674-07] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.1] [Indexed: 11/20/2022] Open
Abstract
BUD23 was identified from a bioinformatics analysis of Saccharomyces cerevisiae genes involved in ribosome biogenesis. Deletion of BUD23 leads to severely impaired growth, reduced levels of the small (40S) ribosomal subunit, and a block in processing 20S rRNA to 18S rRNA, a late step in 40S maturation. Bud23 belongs to the S-adenosylmethionine-dependent Rossmann-fold methyltransferase superfamily and is related to small-molecule methyltransferases. Nevertheless, we considered that Bud23 methylates rRNA. Methylation of G1575 is the only mapped modification for which the methylase has not been assigned. Here, we show that this modification is lost in bud23 mutants. The nuclear accumulation of the small-subunit reporters Rps2-green fluorescent protein (GFP) and Rps3-GFP, as well as the rRNA processing intermediate, the 5' internal transcribed spacer 1, indicate that bud23 mutants are defective for small-subunit export. Mutations in Bud23 that inactivated its methyltransferase activity complemented a bud23Δ mutant. In addition, mutant ribosomes in which G1575 was changed to adenosine supported growth comparable to that of cells with wild-type ribosomes. Thus, Bud23 protein, but not its methyltransferase activity, is important for biogenesis and export of the 40S subunit in yeast.
|
1727
|
Carrière C, Mornon JP, Venien-Bryan C, Boisset N, Callebaut I. Calcineurin B-like domains in the large regulatory α/β subunits of phosphorylase kinase. Proteins 2008; 71:1597-606. [DOI: 10.1002/prot.22006] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
|
1728
|
Sadreyev RI, Grishin NV. Accurate statistical model of comparison between multiple sequence alignments. Nucleic Acids Res 2008; 36:2240-8. [PMID: 18285364 PMCID: PMC2367703 DOI: 10.1093/nar/gkn065] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 12/04/2022] Open
Abstract
Comparison of multiple protein sequence alignments (MSA) reveals unexpected evolutionary relations between protein families and leads to exciting predictions of spatial structure and function. The power of MSA comparison critically depends on the quality of statistical model used to rank the similarities found in a database search, so that biologically relevant relationships are discriminated from spurious connections. Here, we develop an accurate statistical description of MSA comparison that does not originate from conventional models of single sequence comparison and captures essential features of protein families. As a final result, we compute E-values for the similarity between any two MSA using a mathematical function that depends on MSA lengths and sequence diversity. To develop these estimates of statistical significance, we first establish a procedure for generating realistic alignment decoys that reproduce natural patterns of sequence conservation dictated by protein secondary structure. Second, since similarity scores between these alignments do not follow the classic Gumbel extreme value distribution, we propose a novel distribution that yields statistically perfect agreement with the data. Third, we apply this random model to database searches and show that it surpasses conventional models in the accuracy of detecting remote protein similarities.
|
1729
|
Obarska-Kosinska A, Taylor JEN, Callow P, Orlowski J, Bujnicki JM, Kneale GG. HsdR subunit of the type I restriction-modification enzyme EcoR124I: biophysical characterisation and structural modelling. J Mol Biol 2008; 376:438-452. [PMID: 18164032 PMCID: PMC2878639 DOI: 10.1016/j.jmb.2007.11.024] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 10/10/2007] [Revised: 11/08/2007] [Accepted: 11/09/2007] [Indexed: 01/19/2023]
Abstract
Type I restriction-modification (RM) systems are large, multifunctional enzymes composed of three different subunits. HsdS and HsdM form a complex in which HsdS recognizes the target DNA sequence, and HsdM carries out methylation of adenosine residues. The HsdR subunit, when associated with the HsdS-HsdM complex, translocates DNA in an ATP-dependent process and cleaves unmethylated DNA at a distance of several thousand base-pairs from the recognition site. The molecular mechanism by which these enzymes translocate the DNA is not fully understood, in part because of the absence of crystal structures. To date, crystal structures have been determined for the individual HsdS and HsdM subunits and models have been built for the HsdM-HsdS complex with the DNA. However, no structure is available for the HsdR subunit. In this work, the gene coding for the HsdR subunit of EcoR124I was re-sequenced, which showed that there was an error in the published sequence. This changed the position of the stop codon and altered the last 17 amino acid residues of the protein sequence. An improved purification procedure was developed to enable HsdR to be purified efficiently for biophysical and structural analysis. Analytical ultracentrifugation shows that HsdR is monomeric in solution, and the frictional ratio of 1.21 indicates that the subunit is globular and fairly compact. Small angle neutron-scattering of the HsdR subunit indicates a radius of gyration of 3.4 nm and a maximum dimension of 10 nm. We constructed a model of the HsdR using protein fold-recognition and homology modelling to model individual domains, and small-angle neutron scattering data as restraints to combine them into a single molecule. The model reveals an ellipsoidal shape of the enzymatic core comprising the N-terminal and central domains, and suggests conformational heterogeneity of the C-terminal region implicated in binding of HsdR to the HsdS-HsdM complex.
Affiliation(s)
- Agnieszka Obarska-Kosinska
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland
- James E N Taylor
- Biophysics Laboratories, Institute of Biomedical and Biomolecular Sciences, University of Portsmouth, PO1 2DT, UK
- Philip Callow
- EPSAM and ISTM Research Institutes, Keele University, Staffordshire ST5 5BG, UK; ILL-EMBL Deuteration Laboratory, Partnership for Structural Biology, Institut Laue Langevin, 38042 Grenoble Cedex 9, Grenoble, France
- Jerzy Orlowski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland
- Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland.
- G Geoff Kneale
- Biophysics Laboratories, Institute of Biomedical and Biomolecular Sciences, University of Portsmouth, PO1 2DT, UK.
|
1730
|
Agüero-Chapín G, González-Díaz H, de la Riva G, Rodríguez E, Sánchez-Rodríguez A, Podda G, Vazquez-Padrón RI. MMM-QSAR Recognition of Ribonucleases without Alignment: Comparison with an HMM Model and Isolation from Schizosaccharomyces pombe, Prediction, and Experimental Assay of a New Sequence. J Chem Inf Model 2008; 48:434-48. [DOI: 10.1021/ci7003225] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Affiliation(s)
- Guillermín Agüero-Chapín, Humberto González-Díaz, Gustavo de la Riva, Edrey Rodríguez, Aminael Sánchez-Rodríguez, Gianni Podda, Roberto I. Vazquez-Padrón
- Dipartimento Farmaco Chimico Tecnologico, Universitá Degli Studi di Cagliari, Cagliari, 09124, Italy, CAP, Faculty of Chemistry and Pharmacy, IBP, and CBQ, UCLV, Santa Clara 54830, Cuba, Unit for Bioinformatics & Connectivity Analysis (UBICA), Institute of Industrial Pharmacy and Department of Organic Chemistry, Faculty of Pharmacy, USC, Santiago de Compostela 15782, Spain, CINVESTAV-LANGEBIO, Irapuato, Guanajuato 36821, México, Caribbean Vitroplants, Santo Domingo 1464, Dominican Republic, and Vascular
|
1731
|
Shah AR, Oehmen CS, Webb-Robertson BJ. SVM-HUSTLE--an iterative semi-supervised machine learning approach for pairwise protein remote homology detection. Bioinformatics 2008; 24:783-90. [DOI: 10.1093/bioinformatics/btn028] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022] Open
|
1732
|
Biegert A, Söding J. De novo identification of highly diverged protein repeats by probabilistic consistency. Bioinformatics 2008; 24:807-14. [DOI: 10.1093/bioinformatics/btn039] [Citation(s) in RCA: 123] [Impact Index Per Article: 7.2] [Indexed: 11/12/2022] Open
|
1733
|
Abstract
We developed and tested the I-TASSER protein structure prediction algorithm in the CASP7 experiment, where targets are first threaded through the PDB library and continuous fragments in the threading alignments are exploited to assemble the global structure. The final models are obtained from the progressive refinements started from the last round structure clusters. A majority of the targets in the template-based modeling (TBM) category have the templates drawn closer to the native structure by more than 1 Å within the aligned regions. For the free-modeling (FM) targets, I-TASSER builds correct topology for 7/19 cases with sequence up to 155 residues long. For the first time, the automated server prediction generates models as good as the human-expert does in all the categories, which shows the robustness of the method and the potential of the application to genome-wide structure prediction. Despite the success, the accuracy of I-TASSER modeling is still dominated by the similarity of the template and target structures with a strong correlation coefficient (approximately 0.9) between the root-mean-squared deviation (RMSD) to native of the templates and the final models. Especially, there is no high-resolution model below 2 Å for the FM targets. These problems highlight the issues that need to be addressed in the next generation of atomic-level I-TASSER development especially for the FM target modeling.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics, Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas 66047, USA.
|
1734
|
Sánchez-Flores A, Pérez-Rueda E, Segovia L. Protein homology detection and fold inference through multiple alignment entropy profiles. Proteins 2008; 70:248-56. [PMID: 17671981 DOI: 10.1002/prot.21506] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
Homology detection and protein structure prediction are central themes in bioinformatics. Establishment of relationship between protein sequences or prediction of their structure by sequence comparison methods finds limitations when there is low sequence similarity. Recent works demonstrate that the use of profiles improves homology detection and protein structure prediction. Profiles can be inferred from protein multiple alignments using different approaches. The "Conservatism-of-Conservatism" is an effective profile analysis method to identify structural features between proteins having the same fold but no detectable sequence similarity. The information obtained from protein multiple alignments varies according to the amino acid classification employed to calculate the profile. In this work, we calculated entropy profiles from PSI-BLAST-derived multiple alignments and used different amino acid classifications summarizing almost 500 different attributes. These entropy profiles were converted into pseudocodes which were compared using the FASTA program with an ad-hoc matrix. We tested the performance of our method to identify relationships between proteins with similar fold using a nonredundant subset of sequences having less than 40% of identity. We then compared our results using Coverage Versus Error per query curves, to those obtained by methods like PSI-BLAST, COMPASS and HHSEARCH. Our method, named HIP (Homology Identification with Profiles) presented higher accuracy detecting relationships between proteins with the same fold. The use of different amino acid classifications reflecting a large number of amino acid attributes, improved the recognition of distantly related folds. We propose the use of pseudocodes representing profile information as a fast and powerful tool for homology detection, fold assignment and analysis of evolutionary information enclosed in protein profiles.
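The idea of an entropy profile computed under an amino acid classification can be sketched as follows. This only illustrates per-column Shannon entropy over residue classes; the four-class grouping below is a hypothetical example, not one of the classifications used in HIP, and the conversion of entropy values into pseudocodes for FASTA comparison is not shown.

```python
import math
from collections import Counter

# Hypothetical 4-class grouping of the 20 amino acids (illustration only).
CLASSES = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGPH",
    "positive": "KR",
    "negative": "DE",
}
RESIDUE_TO_CLASS = {aa: name for name, aas in CLASSES.items() for aa in aas}


def entropy_profile(alignment, mapping=RESIDUE_TO_CLASS):
    """Shannon entropy (bits) of residue classes in each alignment column.

    alignment: list of equal-length aligned sequences (upper-case strings).
    Characters missing from `mapping` (gaps, unknown residues) are skipped.
    """
    length = len(alignment[0])
    profile = []
    for col in range(length):
        counts = Counter(mapping[row[col]] for row in alignment
                         if row[col] in mapping)
        n = sum(counts.values())
        if n == 0:
            profile.append(0.0)  # column with gaps only
            continue
        # H = sum over classes of p * log2(1 / p), with p = c / n
        profile.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return profile


# Toy alignment: the first two columns each hold a single class,
# the third mixes two negative residues with one polar residue.
profile = entropy_profile(["AKD", "VKE", "LRS"])
```

Columns whose residues all fall in one class get entropy 0 even when the residues differ, which is the sense in which the choice of classification changes the information a profile carries.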
Affiliation(s)
- Alejandro Sánchez-Flores
- Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México
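The entropy profiles mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a toy three-class amino acid grouping (`CLASSES` is hypothetical and is not one of the classifications used by the authors) and computes the Shannon entropy of each alignment column after mapping residues to classes. The conversion to pseudocodes and the FASTA comparison step are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical reduced alphabet, chosen only for illustration.
CLASSES = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "charged": "DEKRH",
}
AA_TO_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}


def column_entropy(column):
    """Shannon entropy (bits) of one column; gaps and unknown symbols are ignored."""
    labels = [AA_TO_CLASS[aa] for aa in column if aa in AA_TO_CLASS]
    if not labels:
        return 0.0
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(alignment):
    """Per-column entropies for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


# Toy alignment: columns 1, 2 and 4 are class-conserved, column 3 is mixed.
alignment = ["AKD-", "VKE-", "LRDS", "IRNS"]
profile = entropy_profile(alignment)
```

Low values mark columns whose residues fall into a single class; swapping in a different `CLASSES` mapping changes the profile, which is the effect the abstract attributes to the choice of amino acid classification.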
|
1735
|
Thoms S, Debelyy MO, Nau K, Meyer HE, Erdmann R. Lpx1p is a peroxisomal lipase required for normal peroxisome morphology. FEBS J 2008; 275:504-14. [DOI: 10.1111/j.1742-4658.2007.06217.x] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Indexed: 11/27/2022]
|
1736
|
Godlewska R, Pawlowski M, Dzwonek A, Mikula M, Ostrowski J, Drela N, Jagusztyn-Krynicka EK. Tip-alpha (hp0596 gene product) is a highly immunogenic Helicobacter pylori protein involved in colonization of mouse gastric mucosa. Curr Microbiol 2008; 56:279-86. [PMID: 18172719 DOI: 10.1007/s00284-007-9083-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 07/13/2007] [Accepted: 11/06/2007] [Indexed: 02/06/2023]
Abstract
A product of the Helicobacter pylori hp0596 gene (Tip-alpha) is a highly immunogenic homodimeric protein, unique to this bacterium. Cell fractionation experiments indicate that Tip-alpha is anchored to the inner membrane. In contrast, the three-dimensional model of the protein suggests that Tip-alpha is soluble or, at least, largely exposed to the solvent. hp0596 gene knockout resulted in a significant decrease in the level of H. pylori colonization as measured by real-time PCR assay. In addition, the Tip-alpha recombinant protein was found to stimulate macrophages to produce IL-1alpha and TNF-alpha. Both results imply that Tip-alpha is rather loosely connected to the inner membrane and potentially released during infection.
Affiliation(s)
- Renata Godlewska
- Department of Bacterial Genetics, Institute of Microbiology, University of Warsaw, ul. Miecznikowa 1, 02-096 Warsaw, Poland.
|
1737
|
Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer ELL, Bateman A. The Pfam protein families database. Nucleic Acids Res 2008; 36:D281-8. [PMID: 18039703 PMCID: PMC2238907 DOI: 10.1093/nar/gkm960] [Citation(s) in RCA: 1709] [Impact Index Per Article: 100.5] [Received: 09/15/2007] [Revised: 10/10/2007] [Accepted: 10/16/2007] [Indexed: 12/14/2022] Open
Abstract
Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/), as well as from mirror sites in France (http://pfam.jouy.inra.fr/) and South Korea (http://pfam.ccbb.re.kr/).
Affiliation(s)
- Robert D. Finn, John Tate, Jaina Mistry, Penny C. Coggill, Stephen John Sammut, Hans-Rudolf Hotz, Goran Ceric, Kristoffer Forslund, Sean R. Eddy, Erik L. L. Sonnhammer, Alex Bateman
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambridgeshire, CB10 1SA, UK, Howard Hughes Medical Institute Janelia Farm Research Campus, 19700 Helix Drive, Ashburn, VA 20147, USA and Stockholm Bioinformatics Center, Albanova, Stockholm University, SE-10691 Stockholm, Sweden
|
1738
|
Cheng J, Tegge A, Baldi P. Machine Learning Methods for Protein Structure Prediction. IEEE Rev Biomed Eng 2008; 1:41-9. [DOI: 10.1109/rbme.2008.2008239] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.6] [Indexed: 11/08/2022]
|
1739
|
Abstract
Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the accuracy or scalability of multiple and pairwise alignment tools, or in expanding the scope of tasks handled by an alignment program. In this chapter, we review state-of-the-art protein sequence alignment and provide practical advice for users of alignment tools.
Affiliation(s)
- Chuong B Do
- Computer Science Department, Stanford University, Stanford, CA, USA
|
1740
|
Schwarzenbacher R, Godzik A, Jaroszewski L. The JCSG MR pipeline: optimized alignments, multiple models and parallel searches. Acta Crystallogr D Biol Crystallogr 2008; 64:133-40. [PMID: 18094477 PMCID: PMC2394805 DOI: 10.1107/s0907444907050111] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 02/26/2007] [Accepted: 10/12/2007] [Indexed: 12/05/2022]
Abstract
The success rate of molecular replacement (MR) falls considerably when search models share less than 35% sequence identity with their templates, but can be improved significantly by using fold-recognition methods combined with exhaustive MR searches. Models based on alignments calculated with fold-recognition algorithms are more accurate than models based on conventional alignment methods such as FASTA or BLAST, which are still widely used for MR. In addition, by designing MR pipelines that integrate phasing and automated refinement and allow parallel processing of such calculations, one can effectively increase the success rate of MR. Here, updated results from the JCSG MR pipeline are presented, which to date has solved 33 MR structures with less than 35% sequence identity to the closest homologue of known structure. By using difficult MR problems as examples, it is demonstrated that successful MR phasing is possible even in cases where the similarity between the model and the template can only be detected with fold-recognition algorithms. In the first step, several search models are built based on all homologues found in the PDB by fold-recognition algorithms. The models resulting from this process are used in parallel MR searches with different combinations of input parameters of the MR phasing algorithm. The putative solutions are subjected to rigid-body and restrained crystallographic refinement and ranked based on the final values of free R factor, figure of merit and deviations from ideal geometry. Finally, crystal packing and electron-density maps are checked to identify the correct solution. If this procedure does not yield a solution with interpretable electron-density maps, then even more alternative models are prepared. The structurally variable regions of a protein family are identified based on alignments of sequences and known structures from that family and appropriate trimmings of the models are proposed. 
All combinations of these trimmings are applied to the search models and the resulting set of models is used in the MR pipeline. It is estimated that with the improvements in model building and exhaustive parallel searches with existing phasing algorithms, MR can be successful for more than 50% of recognizable homologues of known structures below the threshold of 35% sequence identity. This implies that about one-third of the proteins in a typical bacterial proteome are potential MR targets.
Affiliation(s)
- Adam Godzik
- Joint Center for Structural Genomics, Bioinformatics Core, The Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92093, USA
- Lukasz Jaroszewski
- Joint Center for Structural Genomics, Bioinformatics Core, The Burnham Institute for Medical Research, 10901 North Torrey Pines Road, La Jolla, CA 92093, USA
|
1741
|
Abstract
Modern molecular biology approaches often result in the accumulation of abundant biological sequence data. Ideally, the function of individual proteins predicted using such data would be determined experimentally. However, if a gene of interest has no predictable function or if the amount of data is too large to experimentally assess individual genes, bioinformatics techniques may provide additional information to allow the inference of function. This chapter proposes a pipeline of freely available Web-based tools to analyze protein-coding DNA sequences of unknown function. Accumulated information obtained during each step of the pipeline is used to build a testable hypothesis of function. The basis and use of sequence similarity methods of homologue detection are described, with emphasis on BLAST and PSI-BLAST. Annotation of gene function through protein domain detection using SMART and Pfam, and the potential for comparison to whole genome data are discussed.
|
1742
|
Kosinski J, Kubareva E, Bujnicki JM. A model of restriction endonuclease MvaI in complex with DNA: a template for interpretation of experimental data and a guide for specificity engineering. Proteins 2007; 68:324-36. [PMID: 17407166 DOI: 10.1002/prot.21460] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023]
Abstract
R.MvaI is a Type II restriction enzyme (REase), which specifically recognizes the pentanucleotide DNA sequence 5'-CCWGG-3' (W indicates A or T). It belongs to a family of enzymes, which recognize related sequences, including 5'-CCSGG-3' (S indicates G or C) in the case of R.BcnI, or 5'-CCNGG-3' (where N indicates any nucleoside) in the case of R.ScrFI. REases from this family hydrolyze the phosphodiester bond in the DNA between the 2nd and 3rd base in both strands, thereby generating a double strand break with 5'-protruding single nucleotides. So far, no crystal structures of REases with similar cleavage patterns have been solved. Characterization of sequence-structure-function relationships in this family would facilitate understanding of evolution of sequence specificity among REases and could aid in engineering of enzymes with new specificities. However, sequences of R.MvaI or its homologs show no significant similarity to any proteins with known structures, thus precluding straightforward comparative modeling. We used a fold recognition approach to identify a remote relationship between R.MvaI and the structure of DNA repair enzyme MutH, which belongs to the PD-(D/E)XK superfamily together with many other REases. We constructed a homology model of R.MvaI and used it to predict functionally important amino acid residues and the mode of interaction with the DNA. In particular, we predict that only one active site of R.MvaI interacts with the DNA target at a time, and the cleavage of both strands (5'-CCAGG-3' and 5'-CCTGG-3') is achieved by two independent catalytic events. The model is in good agreement with the available experimental data and will serve as a template for further analyses of R.MvaI, R.BcnI, R.ScrFI and other related enzymes.
Affiliation(s)
- Jan Kosinski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland.
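The degenerate recognition sequences quoted in the abstract above (5'-CCWGG-3', 5'-CCSGG-3', 5'-CCNGG-3') can be matched against a DNA string with a short sketch. The `IUPAC` table covers only the codes named in the abstract, and `find_sites` is a hypothetical helper written for illustration; it searches one strand only and says nothing about cleavage.

```python
import re

# IUPAC nucleotide codes used in the abstract: W = A or T, S = G or C, N = any base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "N": "[ACGT]",
}


def find_sites(sequence, site):
    """Return 0-based start positions of every (possibly overlapping) match of site."""
    body = "".join(IUPAC[base] for base in site)
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + body + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


dna = "AACCAGGTTCCGGGCCTGG"
mvai_sites = find_sites(dna, "CCWGG")   # R.MvaI
bcni_sites = find_sites(dna, "CCSGG")   # R.BcnI
scrfi_sites = find_sites(dna, "CCNGG")  # R.ScrFI
```

On this toy sequence the CCNGG matches are the union of the CCWGG and CCSGG matches, which mirrors the relationship among the three specificities described in the abstract.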
|
1743
|
Sundaram S, Rathinasabapathi B, Ma LQ, Rosen BP. An arsenate-activated glutaredoxin from the arsenic hyperaccumulator fern Pteris vittata L. regulates intracellular arsenite. J Biol Chem 2007; 283:6095-101. [PMID: 18156657 DOI: 10.1074/jbc.m704149200] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Indexed: 11/06/2022] Open
Abstract
To elucidate the mechanisms of arsenic resistance in the arsenic hyperaccumulator fern Pteris vittata L., a cDNA for a glutaredoxin (Grx) Pv5-6 was isolated from a frond expression cDNA library based on the ability of the cDNA to increase arsenic resistance in Escherichia coli. The deduced amino acid sequence of Pv5-6 showed high homology with an Arabidopsis chloroplastic Grx and contained two CXXS putative catalytic motifs. Purified recombinant Pv5-6 exhibited glutaredoxin activity that was increased 1.6-fold by 10 mM arsenate. Site-specific mutation of Cys(67) to Ala(67) resulted in the loss of both GRX activity and arsenic resistance. PvGrx5 was expressed in E. coli mutants in which the arsenic resistance genes of the ars operon were deleted (strain AW3110), a deletion of the gene for the ArsC arsenate reductase (strain WC3110), and a strain in which the ars operon was deleted and the gene for the GlpF aquaglyceroporin was disrupted (strain OSBR1). Expression of PvGrx5 increased arsenic tolerance in strains AW3110 and WC3110, but not in OSBR1, suggesting that PvGrx5 had a role in cellular arsenic resistance independent of the ars operon genes but dependent on GlpF. AW3110 cells expressing PvGrx5 had significantly lower levels of arsenite when compared with vector controls when cultured in medium containing 2.5 mM arsenate. Our results are consistent with PvGrx5 having a role in regulating intracellular arsenite levels, by either directly or indirectly modulating the aquaglyceroporin. To our knowledge, PvGrx5 is the first plant Grx implicated in arsenic metabolism.
Affiliation(s)
- Sabarinath Sundaram
- Plant Molecular and Cellular Biology Program, Horticultural Sciences Department, University of Florida, Gainesville, FL 32611-0690, USA
|
1744
|
Sadowski MI, Jones DT. Benchmarking template selection and model quality assessment for high-resolution comparative modeling. Proteins 2007; 69:476-85. [PMID: 17623860 DOI: 10.1002/prot.21531] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 01/01/2023]
Abstract
Comparative modeling is presently the most accurate method of protein structure prediction. Previous experiments have shown the selection of the correct template to be of paramount importance to the quality of the final model. We have derived a set of 732 targets for which a choice of ten or more templates exist with 30-80% sequence identity and used this set to compare a number of possible methods for template selection: BLAST, PSI-BLAST, profile-profile alignment, HHpred HMM-HMM comparison, global sequence alignment, and the use of a model quality assessment program (MQAP). In addition, we have investigated the question of whether any structurally defined subset of the sequence could be used to predict template quality better than overall sequence similarity. We find that template selection by BLAST is sufficient in 75% of cases but that there are examples in which improvement (global RMSD 0.5 Å or more) could be made. No significant improvement is found for any of the more sophisticated sequence-based methods of template selection at high sequence identities. A subset of 118 targets extending to the lowest levels of sequence similarity was examined and the HHpred and MQAP methods were found to improve ranking when available templates had 35-40% maximum sequence identity. Structurally defined subsets in general are found to be less discriminative than overall sequence similarity, with the coil residue subset performing equivalently to sequence similarity. Finally, we demonstrate that if models are built and model quality is assessed in combination with the sequence-template sequence similarity, an extra 7% of "best" models can be found.
Affiliation(s)
- M I Sadowski
- Bioinformatics Unit, Department of Computer Science, University College London, London WC1E 6BT, United Kingdom
|
1745
|
Glazko G, Makarenkov V, Liu J, Mushegian A. Evolutionary history of bacteriophages with double-stranded DNA genomes. Biol Direct 2007; 2:36. [PMID: 18062816 PMCID: PMC2222618 DOI: 10.1186/1745-6150-2-36] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 11/15/2007] [Accepted: 12/06/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Reconstruction of evolutionary history of bacteriophages is a difficult problem because of fast sequence drift and lack of omnipresent genes in phage genomes. Moreover, losses and recombinational exchanges of genes are so pervasive in phages that the plausibility of phylogenetic inference in phage kingdom has been questioned. RESULTS We compiled the profiles of presence and absence of 803 orthologous genes in 158 completely sequenced phages with double-stranded DNA genomes and used these gene content vectors to infer the evolutionary history of phages. There were 18 well-supported clades, mostly corresponding to accepted genera, but in some cases appearing to define new taxonomic groups. Conflicts between this phylogeny and trees constructed from sequence alignments of phage proteins were exploited to infer 294 specific acts of intergenome gene transfer. CONCLUSION A notoriously reticulate evolutionary history of fast-evolving phages can be reconstructed in considerable detail by quantitative comparative genomics.
Affiliation(s)
- Galina Glazko
- Stowers Institute for Medical Research, 1000 E 50th St, Kansas City, MO 64110, USA.
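The gene content vectors described in the abstract above (presence or absence of each orthologous gene in each phage genome) lend themselves to a small worked sketch. The genomes, the vectors and the choice of Jaccard distance are all assumptions made for illustration; the abstract does not state which distance or tree-building method the authors used.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |shared genes| / |genes present in either genome| for two 0/1 vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - shared / either if either else 0.0


# Hypothetical presence/absence profiles over six orthologous gene families.
gene_content = {
    "phage_A": [1, 1, 1, 0, 0, 0],
    "phage_B": [1, 1, 0, 0, 0, 1],
    "phage_C": [0, 0, 0, 1, 1, 1],
}

# Pairwise distance table that a distance-based tree method could take as input.
distances = {
    (p, q): jaccard_distance(gene_content[p], gene_content[q])
    for p, q in combinations(sorted(gene_content), 2)
}
```

Genomes sharing many gene families end up close together (here `phage_A` and `phage_B`), while genomes with disjoint content sit at distance 1.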
|
1746
|
Michelsen K, Schmid V, Metz J, Heusser K, Liebel U, Schwede T, Spang A, Schwappach B. Novel cargo-binding site in the beta and delta subunits of coatomer. J Cell Biol 2007; 179:209-17. [PMID: 17954604 PMCID: PMC2064757 DOI: 10.1083/jcb.200704142] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Indexed: 11/22/2022]
Abstract
Arginine (R)-based ER localization signals are sorting motifs that confer transient ER localization to unassembled subunits of multimeric membrane proteins. The COPI vesicle coat binds R-based signals but the molecular details remain unknown. Here, we use reporter membrane proteins based on the proteolipid Pmp2 fused to GFP and allele swapping of COPI subunits to map the recognition site for R-based signals. We show that two highly conserved stretches—in the β- and δ-COPI subunits—are required to maintain Pmp2GFP reporters exposing R-based signals in the ER. Combining a deletion of 21 residues in δ-COP together with the mutation of three residues in β-COP gave rise to a COPI coat that had lost its ability to recognize R-based signals, whilst the recognition of C-terminal di-lysine signals remained unimpaired. A homology model of the COPI trunk domain illustrates the recognition of R-based signals by COPI.
Affiliation(s)
- Kai Michelsen
- Zentrum für Molekulare Biologie der Universität Heidelberg, Im Neuenheimer Feld 282, D-69120 Heidelberg, Germany
|
1747
|
Bernardes JS, Dávila AMR, Costa VS, Zaverucha G. Improving model construction of profile HMMs for remote homology detection through structural alignment. BMC Bioinformatics 2007; 8:435. [PMID: 17999748 PMCID: PMC2245980 DOI: 10.1186/1471-2105-8-435] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 03/19/2007] [Accepted: 11/09/2007] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. RESULTS We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two tailed t-test. CONCLUSION We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.
Affiliation(s)
- Juliana S Bernardes
- COPPE, Programa de Engenharia de Sistemas e Computação, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Vítor S Costa
- DCC-FCUP e LIACC, Universidade do Porto, Porto, Portugal
- Gerson Zaverucha
- COPPE, Programa de Engenharia de Sistemas e Computação, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
|
1748
|
Sanchez-Pulido L, Andrade-Navarro MA. The FTO (fat mass and obesity associated) gene codes for a novel member of the non-heme dioxygenase superfamily. BMC Biochem 2007; 8:23. [PMID: 17996046 PMCID: PMC2241624 DOI: 10.1186/1471-2091-8-23] [Citation(s) in RCA: 150] [Impact Index Per Article: 8.3] [Received: 07/27/2007] [Accepted: 11/08/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Genetic variants in the FTO (fat mass and obesity associated) gene have been associated with an increased risk of obesity. However, the function of its protein product has not been experimentally studied and previously reported sequence similarity analyses suggested the absence of homologs in existing protein databases. Here, we present the first detailed computational analysis of the sequence and predicted structure of the protein encoded by FTO. RESULTS We performed a sequence similarity search using the human FTO protein as query and then generated a profile using the multiple sequence alignment of the homologous sequences. Profile-to-sequence and profile-to-profile based comparisons identified remote homologs of the non-heme dioxygenase family. CONCLUSION Our analysis suggests that human FTO is a member of the non-heme dioxygenase (Fe(II)- and 2-oxoglutarate-dependent dioxygenases) superfamily. Amino acid conservation patterns support this hypothesis and indicate that both 2-oxoglutarate and iron should be important for FTO function. This computational prediction of the function of FTO should suggest further steps for its experimental characterization and help to formulate hypotheses about the mechanisms by which it relates to obesity in humans.
Affiliation(s)
- Miguel A Andrade-Navarro
- Molecular Medicine, Ottawa Health Research Institute, Ottawa, Canada
- Faculty of Medicine, University of Ottawa, Ottawa, Canada
- Max Delbrück Center for Molecular Medicine, Berlin, Germany
|
1749
|
Minovitsky S, Stegmaier P, Kel A, Kondrashov AS, Dubchak I. Short sequence motifs, overrepresented in mammalian conserved non-coding sequences. BMC Genomics 2007; 8:378. [PMID: 17945028 PMCID: PMC2176071 DOI: 10.1186/1471-2164-8-378] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/21/2007] [Accepted: 10/18/2007] [Indexed: 12/22/2022] Open
Abstract
Background A substantial fraction of non-coding DNA sequences of multicellular eukaryotes is under selective constraint. In particular, ~5% of the human genome consists of conserved non-coding sequences (CNSs). CNSs differ from other genomic sequences in their nucleotide composition and must play important functional roles, which mostly remain obscure. Results We investigated relative abundances of short sequence motifs in all human CNSs present in the human/mouse whole-genome alignments vs. three background sets of sequences: (i) weakly conserved or unconserved non-coding sequences (non-CNSs); (ii) near-promoter sequences (located between nucleotides -500 and -1500, relative to a start of transcription); and (iii) random sequences with the same nucleotide composition as that of CNSs. When compared to non-CNSs and near-promoter sequences, CNSs possess an excess of AT-rich motifs, often containing runs of identical nucleotides. In contrast, when compared to random sequences, CNSs contain an excess of GC-rich motifs which, however, lack CpG dinucleotides. Thus, abundance of short sequence motifs in human CNSs, taken as a whole, is mostly determined by their overall compositional properties and not by overrepresentation of any specific short motifs. These properties are: (i) high AT-content of CNSs, (ii) a tendency, probably due to context-dependent mutation, of A's and T's to clump, (iii) presence of short GC-rich regions, and (iv) avoidance of CpG contexts, due to their hypermutability. Only a small number of short motifs, overrepresented in all human CNSs are similar to binding sites of transcription factors from the FOX family. Conclusion Human CNSs as a whole appear to be too broad a class of sequences to possess strong footprints of any short sequence-specific functions. Such footprints should be studied at the level of functional subclasses of CNSs, such as those which flank genes with a particular pattern of expression. 
Overall properties of CNSs are affected by patterns in mutation, suggesting that selection which causes their conservation is not always very strong.
Affiliation(s)
- Simon Minovitsky
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
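The comparison of short-motif abundance between conserved non-coding sequences and background sets, as described in the abstract above, can be sketched as a ratio of k-mer frequencies. The sequences, the value of `k` and the pseudocount are hypothetical choices for illustration; the statistics actually used by the authors are not given in the abstract and are not reproduced here.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of every k-mer observed in a collection of sequences."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def enrichment(foreground, background, k, pseudocount=1e-6):
    """Foreground/background frequency ratio for each k-mer seen in the foreground."""
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    return {
        kmer: (freq + pseudocount) / (bg.get(kmer, 0.0) + pseudocount)
        for kmer, freq in fg.items()
    }


# Toy data: an AT-rich "conserved" set against a mixed-composition background.
conserved = ["AAAAT", "TAAAA"]
background = ["ACGTA", "CGTAC"]
ratios = enrichment(conserved, background, k=2)
```

As the abstract cautions, ratios like these largely reflect overall base composition, so the choice of background set determines which motifs appear overrepresented.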
|
1750
|
Hardies SC, Thomas JA, Serwer P. Comparative genomics of Bacillus thuringiensis phage 0305phi8-36: defining patterns of descent in a novel ancient phage lineage. Virol J 2007; 4:97. [PMID: 17919320 PMCID: PMC2147016 DOI: 10.1186/1743-422x-4-97] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 06/05/2007] [Accepted: 10/05/2007] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND The recently sequenced 218 kb genome of morphologically atypical Bacillus thuringiensis phage 0305phi8-36 exhibited only limited detectable homology to known bacteriophages. The only known relative of this phage is a string of phage-like genes called BtI1 in the chromosome of B. thuringiensis israelensis. The high degree of divergence and novelty of phage genomes pose challenges in how to describe the phage from its genomic sequences. RESULTS Phage 0305phi8-36 and BtI1 are estimated to have diverged 2.0 - 2.5 billion years ago. Positionally biased Blast searches aligned 30 homologous structure or morphogenesis genes between 0305phi8-36 and BtI1 that have maintained the same gene order. Functional clustering of the genes helped identify additional gene functions. A conserved long tape measure gene indicates that a long tail is an evolutionarily stable property of this phage lineage. An unusual form of the tail chaperonin system split to two genes was characterized, as was a hyperplastic homologue of the T4gp27 hub gene. Within this region some segments were best described as encoding a conservative array of structure domains fused with a variable component of exchangeable domains. Other segments were best described as multigene units engaged in modular horizontal exchange. The non-structure genes of 0305phi8-36 appear to include the remnants of two replicative systems leading to the hypothesis that the genome plan was created by fusion of two ancestral viruses. The case for a member of the RNAi RNA-directed RNA polymerase family residing in 0305phi8-36 was strengthened by extending the hidden Markov model of this family. Finally, it was noted that prospective transcriptional promoters were distributed in a gradient of small to large transcripts starting from a fixed end of the genome. CONCLUSION Genomic organization at a level higher than individual gene sequence comparison can be analyzed to aid in understanding large phage genomes. 
Methods of analysis include 1) applying a time scale, 2) augmenting blast scores with positional information, 3) categorizing genomic rearrangements into one of several processes with characteristic rates and outcomes, and 4) correlating apparent transcript sizes with genomic position, gene content, and promoter motifs.
Affiliation(s)
- Stephen C Hardies
- Department of Biochemistry, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, Texas 78229-3900, USA
- Julie A Thomas
- Department of Biochemistry, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, Texas 78229-3900, USA
- Philip Serwer
- Department of Biochemistry, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, Texas 78229-3900, USA
|