1801
Gabaldón T, Snel B, van Zimmeren F, Hemrika W, Tabak H, Huynen MA. Origin and evolution of the peroxisomal proteome. Biol Direct 2006; 1:8. [PMID: 16556314 PMCID: PMC1472686 DOI: 10.1186/1745-6150-1-8] [Citation(s) in RCA: 137] [Impact Index Per Article: 7.2] [Received: 02/28/2006] [Accepted: 03/23/2006] [Indexed: 12/25/2022]
Abstract
BACKGROUND Peroxisomes are ubiquitous eukaryotic organelles involved in various oxidative reactions. Their enzymatic content varies between species, but the presence of common protein import and organelle biogenesis systems supports a single evolutionary origin. The precise scenario for this origin, however, remains to be established. The ability of peroxisomes to divide and import proteins post-translationally, just like mitochondria and chloroplasts, supports an endosymbiotic origin. However, this view has been challenged by recent discoveries that mutant, peroxisome-less cells restore peroxisomes upon introduction of the wild-type gene, and that peroxisomes are formed from the Endoplasmic Reticulum. The lack of a peroxisomal genome precludes the use of classical analyses, such as those performed with mitochondria or chloroplasts, to settle the debate. We therefore conducted large-scale phylogenetic analyses of the yeast and rat peroxisomal proteomes. RESULTS Our results show that most peroxisomal proteins (39-58%) are of eukaryotic origin, including all proteins involved in organelle biogenesis or maintenance. A significant fraction (13-18%), consisting mainly of enzymes, has an alpha-proteobacterial origin and appears to result from the recruitment of proteins originally targeted to mitochondria. Consistent with the finding that peroxisomes are formed in the Endoplasmic Reticulum, we find that the most universally conserved peroxisome biogenesis and maintenance proteins are homologous to proteins of the Endoplasmic Reticulum-associated degradation (ERAD) pathway. CONCLUSION Altogether, our results indicate that the peroxisome does not have an endosymbiotic origin and that its proteins were recruited from pools existing within the primitive eukaryote. Moreover, the reconstruction of primitive peroxisomal proteomes suggests that, ontogenetically as well as phylogenetically, peroxisomes stem from the Endoplasmic Reticulum.
REVIEWERS This article was reviewed by Arcady Mushegian, Gáspár Jékely and John Logsdon. For the full reviews, please go to the Reviewers' comments section.
Affiliation(s)
- Toni Gabaldón
- CMBI, Center for Molecular and Biomolecular Informatics; NCMLS, Nijmegen Center for Molecular Life Sciences. Radboud University Nijmegen Medical Center. Toernooiveld 1. 6525 ED Nijmegen. The Netherlands
- Present address: Bioinformatics department, Centro de Investigación Príncipe Felipe. Avda. Autopista del Saler, 16. 46013 Valencia, Spain
- Berend Snel
- CMBI, Center for Molecular and Biomolecular Informatics; NCMLS, Nijmegen Center for Molecular Life Sciences. Radboud University Nijmegen Medical Center. Toernooiveld 1. 6525 ED Nijmegen. The Netherlands
- Frank van Zimmeren
- CMBI, Center for Molecular and Biomolecular Informatics; NCMLS, Nijmegen Center for Molecular Life Sciences. Radboud University Nijmegen Medical Center. Toernooiveld 1. 6525 ED Nijmegen. The Netherlands
- Wieger Hemrika
- ABC-Expression Centre, University of Utrecht, Padualaan 8, 3584 CX Utrecht, The Netherlands
- Henk Tabak
- Laboratory of Cellular Protein Chemistry, University of Utrecht, Padualaan 8, 3584 CX Utrecht, The Netherlands
- Martijn A Huynen
- CMBI, Center for Molecular and Biomolecular Informatics; NCMLS, Nijmegen Center for Molecular Life Sciences. Radboud University Nijmegen Medical Center. Toernooiveld 1. 6525 ED Nijmegen. The Netherlands
1802
Cheng J, Baldi P. A machine learning information retrieval approach to protein fold recognition. Bioinformatics 2006; 22:1456-63. [PMID: 16547073 DOI: 10.1093/bioinformatics/btl102] [Citation(s) in RCA: 136] [Impact Index Per Article: 7.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION Recognizing proteins that have similar tertiary structure is the key step of template-based protein structure prediction methods. Traditionally, a variety of alignment methods are used to identify similar folds, based on sequence similarity and sequence-structure compatibility. Although these methods are complementary, their integration has not been thoroughly exploited. Statistical machine learning methods provide tools for integrating multiple features, but so far these methods have been used primarily for protein and fold classification, rather than for the retrieval problem of fold recognition: finding a proper template for a given query protein. RESULTS Here we present a two-stage machine learning, information retrieval approach to fold recognition. First, we use alignment methods to derive pairwise similarity features for query-template protein pairs. We also use global profile-profile alignments in combination with predicted secondary structure, relative solvent accessibility, contact map and beta-strand pairing to extract pairwise structural compatibility features. Second, we apply support vector machines to these features to predict the structural relevance (i.e. whether or not they share the same fold) of the query-template pairs. For each query, the continuous relevance scores are used to rank the templates. The FOLDpro approach is modular, scalable and effective. Compared with 11 other fold recognition methods, FOLDpro yields the best results in almost all standard categories on a comprehensive benchmark dataset. Using predictions of the top-ranked template, the sensitivity is approximately 85, 56, and 27% at the family, superfamily and fold levels, respectively. Using the 5 top-ranked templates, the sensitivity increases to 90, 70, and 48%.
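The ranking stage described above (turning per-pair relevance scores into an ordered template list, then reporting sensitivity for the top 1 or top 5 templates) can be sketched as follows. This is a minimal illustration under stated assumptions, not FOLDpro's implementation: the relevance scores are assumed to come from an already-trained classifier (for example, SVM decision values), "top-k sensitivity" is read here as the fraction of queries with at least one same-fold template among the k highest-ranked ones, and `rank_templates`, `top_k_sensitivity` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def rank_templates(scores: Dict[str, float]) -> List[str]:
    """Order template IDs by decreasing relevance score for one query."""
    return sorted(scores, key=lambda t: scores[t], reverse=True)


def top_k_sensitivity(
    all_scores: Dict[str, Dict[str, float]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of queries with at least one relevant template in the top k.

    all_scores maps query -> {template: relevance score} (assumed to come
    from a trained classifier); relevant maps query -> templates that share
    its fold (hypothetical benchmark labels).
    """
    hits = 0
    for query, scores in all_scores.items():
        top = rank_templates(scores)[:k]
        if any(t in relevant.get(query, set()) for t in top):
            hits += 1
    return hits / len(all_scores)


# Toy, made-up scores: not data from the paper.
scores = {
    "q1": {"tA": 0.9, "tB": 0.2, "tC": -0.5},
    "q2": {"tA": 0.1, "tB": 0.7, "tC": 0.3},
}
relevant = {"q1": {"tA"}, "q2": {"tC"}}
print(top_k_sensitivity(scores, relevant, k=1))  # 0.5
print(top_k_sensitivity(scores, relevant, k=2))  # 1.0
```

Keeping the classifier and the ranking separate mirrors the two-stage design: any model that emits a continuous score per query-template pair can be plugged in unchanged.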
Affiliation(s)
- Jianlin Cheng
- Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, CA, USA
1803
Becker E, Meyer V, Madaoui H, Guerois R. Detection of a tandem BRCT in Nbs1 and Xrs2 with functional implications in the DNA damage response. Bioinformatics 2006; 22:1289-92. [PMID: 16522671 DOI: 10.1093/bioinformatics/btl075] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION Human Nbs1 and its homolog Xrs2 in Saccharomyces cerevisiae are part of the conserved MRN complex (MRX in yeast), which plays a crucial role in maintaining genomic stability. NBS1 is the gene mutated in Nijmegen breakage syndrome (NBS), a radiation-hypersensitivity disorder. Despite the conservation and importance of the MRN complex, the high sequence divergence between Nbs1 and Xrs2 has precluded the identification of common domains downstream of the N-terminal Forkhead-associated (FHA) domain. RESULTS Using HMM-HMM profile comparisons and structure modelling, we detected a tandem BRCT following the FHA domain in both Nbs1 and Xrs2. The structure-based conservation analysis of the tandem BRCT in Nbs1 supports its function as a phosphoserine-binding domain. Remarkably, the 5 bp deletion observed in 95% of NBS patients cleaves the tandem at the linker region while preserving the structural integrity of each BRCT domain in the resulting truncated gene products.
Affiliation(s)
- Emmanuelle Becker
- Service de Biophysique des Fonctions Membranaires, URA CNRS 2096, Département de Biologie Joliot-Curie, CEA Saclay, 91191 Gif-Sur-Yvette Cedex, France
1804
Banfield JF, Verberkmoes NC, Hettich RL, Thelen MP. Proteogenomic approaches for the molecular characterization of natural microbial communities. OMICS 2005; 9:301-33. [PMID: 16402891 DOI: 10.1089/omi.2005.9.301] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.9] [Indexed: 11/13/2022]
Abstract
At the present time we know little about how microbial communities function in their natural habitats. For example, how do microorganisms interact with each other and their physical and chemical surroundings and respond to environmental perturbations? We might begin to answer these questions if we could monitor the ways in which metabolic roles are partitioned amongst members as microbial communities assemble, determine how resources such as carbon, nitrogen, and energy are allocated into metabolic pathways, and understand the mechanisms by which organisms and communities respond to changes in their surroundings. Because many organisms cannot be cultivated, and given that the metabolisms of those growing in monoculture are likely to differ from those of organisms growing as part of consortia, it is vital to develop methods to study microbial communities in situ. Chemoautotrophic biofilms growing in mine tunnels hundreds of meters underground drive pyrite (FeS(2)) dissolution and acid and metal release, creating habitats that select for a small number of organism types. The geochemical and microbial simplicity of these systems, the significant biomass, and clearly defined biological-inorganic feedbacks make these ecosystem microcosms ideal for development of methods for the study of uncultivated microbial consortia. Our approach begins with the acquisition of genomic data from biofilms that are sampled over time and in different growth conditions. We have demonstrated that it is possible to assemble shotgun sequence data to reveal the gene complement of the dominant community members and to use these data to confidently identify a significant fraction of proteins from the dominant organisms by mass spectrometry (MS)-based proteomics. However, there are technical obstacles currently restricting this type of "proteogenomic" analysis. 
Composite genomic sequences assembled from environmental data from natural microbial communities do not capture the full range of genetic potential of the associated populations. Thus, it is necessary to develop bioinformatics approaches to generate relatively comprehensive gene inventories for each organism type. These inventories are critical for expression and functional analyses. In proteomic studies, for example, peptides that differ from those predicted from gene sequences can be measured, but they generally cannot be identified by database matching, even if the difference is only a single amino acid residue. Furthermore, many of the identified proteins have no known function. We propose that these challenges can be addressed by developing proteogenomic, biochemical, and geochemical methods that will initially be deployed in a simple, natural model ecosystem. The resulting approach should be broadly applicable and will enhance the utility and significance of genomic data from isolates and consortia for the study of organisms in many habitats. Solutions draining pyrite-rich deposits are referred to as acid mine drainage (AMD). AMD is a widespread environmental problem worldwide, associated with energy and metal resources. The biological-mineralogical interactions that define these systems can be harnessed for energy-efficient metal recovery and removal of sulfur from coal. The detailed understanding of microbial ecology and ecosystem dynamics resulting from the proposed work will provide a scientific foundation for dealing with these environmental challenges and technological opportunities, and yield new methods for the analysis of more complex natural communities.
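The database-matching limitation mentioned above can be illustrated with a string-level stand-in for spectrum matching: predicted proteins are digested in silico, and an observed peptide carrying one amino acid substitution fails an exact lookup but is recovered when one mismatch is tolerated. This is a simplified sketch, not the authors' pipeline: real MS identification compares fragment masses rather than sequences, the trypsin rule used (cleave after K or R unless the next residue is P, no missed cleavages) is a common simplification, and the protein and peptide sequences are hypothetical.

```python
import re
from typing import List, Optional


def tryptic_digest(protein: str) -> List[str]:
    """Cleave after K or R unless followed by P (simplified trypsin rule,
    no missed cleavages). Splitting on a zero-width match needs Python 3.7+."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def match_peptide(
    peptide: str, database: List[str], max_mismatches: int = 0
) -> Optional[str]:
    """Return the first equal-length database peptide within max_mismatches
    substitutions of the observed peptide, or None if nothing matches."""
    for candidate in database:
        if len(candidate) != len(peptide):
            continue
        mismatches = sum(a != b for a, b in zip(candidate, peptide))
        if mismatches <= max_mismatches:
            return candidate
    return None


# Hypothetical predicted protein and an observed variant (S -> T substitution).
predicted = tryptic_digest("MAGTLKEVFDSRPLGAWK")
observed = "EVFDTRPLGAWK"

print(predicted)                         # ['MAGTLK', 'EVFDSRPLGAWK']
print(match_peptide(observed, predicted))      # None: exact matching fails
print(match_peptide(observed, predicted, 1))   # 'EVFDSRPLGAWK'
```

Allowing mismatches widens the search space quickly, which is one reason comprehensive, strain-resolved gene inventories are preferable to ever-looser matching.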
Affiliation(s)
- Jillian F Banfield
- Department of Earth and Planetary Science, and Environmental Science, Policy, and Management, University of California, Berkeley, CA 94720, USA.
1805
Gordon PMK, Weinel C, Jacobi C, Kämpf U, Kriventseva E, Sensen CW. Creating hierarchical models of protein families based on Expressed Sequence Tags: The “Sprockets” analysis pipeline. Anal Chim Acta 2006; 564:123-32. [PMID: 17723370 DOI: 10.1016/j.aca.2006.01.072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2005] [Revised: 01/18/2006] [Accepted: 01/23/2006] [Indexed: 11/23/2022]
Abstract
We have created an analysis pipeline called Sprockets, which can be used to classify proteins into hierarchical "families" and to build searchable models of these families. The construction of these families is based on data from Expressed Sequence Tags (ESTs) and Coding DNA Sequences (CDSs), making Sprockets clusters especially suitable for studying gene families in organisms for which a complete genome sequence does not (yet) exist. The pipeline consists of two main parts: pairwise analysis and grouping of sequences with Z-score statistics, followed by hierarchical splitting of clusters into alignable protein families. The computational and statistical techniques applied in Sprockets allow it to act as a massive and selective multiple sequence alignment engine for combining individual sequence collections with related public sequences. The end result is a database of gene Hidden Markov Models, each related to the others by three levels of similarity: secondary structure, function and evolutionary origin. For a sample set of 20,000 ESTs from Lactuca spp., Sprockets provided a 9% improvement in mapping function to unknown sequences over traditional pairwise search methods and InterPro mapping.
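The abstract does not give Sprockets' exact statistics, so the first stage (pairwise Z-scores, then grouping) is sketched here under common assumptions: a pair's Z-score compares its alignment score with scores from shuffled sequences, and sequences are grouped by single linkage over pairs whose Z-score passes a threshold. The threshold, the identifiers and the helper names are hypothetical. Single linkage tends to chain loosely related sequences together, which is one plausible motivation for a subsequent splitting step into alignable families.

```python
import statistics
from typing import Dict, List, Tuple


def z_score(score: float, shuffled_scores: List[float]) -> float:
    """Z = (S - mean) / stdev, with the null distribution assumed to come
    from alignment scores of shuffled (randomized) sequences."""
    mu = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    return (score - mu) / sd


def single_linkage(
    ids: List[str], pair_z: Dict[Tuple[str, str], float], threshold: float
) -> List[List[str]]:
    """Group sequences linked by any pair with Z >= threshold (union-find)."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), z in pair_z.items():
        if z >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Hypothetical EST/CDS identifiers and Z-scores, for illustration only.
pair_z = {("e1", "e2"): 12.0, ("e2", "e3"): 9.5, ("e3", "e4"): 2.0}
print(single_linkage(["e1", "e2", "e3", "e4"], pair_z, threshold=8.0))
# [['e1', 'e2', 'e3'], ['e4']]
```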
Affiliation(s)
- Paul M K Gordon
- University of Calgary, Faculty of Medicine, Sun Center of Excellence for Visual Genomics, 3330 Hospital Drive NW, Calgary, AB, Canada T2N 4N1
1806
Liu J, Glazko G, Mushegian A. Protein repertoire of double-stranded DNA bacteriophages. Virus Res 2006; 117:68-80. [PMID: 16490276 DOI: 10.1016/j.virusres.2006.01.015] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 09/07/2005] [Revised: 01/11/2006] [Accepted: 01/18/2006] [Indexed: 01/21/2023]
Abstract
The complexity and diversity of phage gene sets, which result from rapid evolution of phage genomes and rampant gene exchange among phages, hamper efforts to decipher the evolutionary relationships between individual phage proteins and to reconstruct the complete set of evolutionary events leading to the known phages. To start unraveling the natural history of phages, we built the phage orthologous groups (POGs), a natural system of phage protein families that includes 6378 genes from 164 complete genome sequences of double-stranded DNA bacteriophages. Phage proteomes have high POG coverage: on average, 39 genes per phage genome belong to POGs, which is close to half of all genes in most phages. In agreement with the role of phages in horizontal gene transfer, we see many cases of likely gene exchange between phages and their microbial hosts. At the same time, about 80% of all POGs are highly specific to phage genomes and are not commonly found in microbial genomes, indicating the coherence and large degree of evolutionary independence of phage gene sets. Information on orthologous genes is essential for the evolutionary classification of known bacteriophages and for the reconstruction of ancestral phage genomes.
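The abstract does not describe how the POGs were constructed. Orthologous-group systems of this kind (for example, COGs) are commonly built from genome-specific best hits, and reciprocal best hits between two genomes are the simplest version of that idea. The sketch below assumes precomputed, directed similarity scores (for example, alignment bit scores) and uses hypothetical gene and genome names; it illustrates the general technique, not the authors' procedure.

```python
from typing import Dict, List, Optional, Tuple


def best_hit(
    gene: str,
    target_genome: str,
    hits: Dict[Tuple[str, str], float],
    genome_of: Dict[str, str],
) -> Optional[str]:
    """Highest-scoring gene in target_genome for a query gene, or None."""
    best, best_score = None, float("-inf")
    for (query, subject), score in hits.items():
        if query == gene and genome_of[subject] == target_genome and score > best_score:
            best, best_score = subject, score
    return best


def reciprocal_best_hits(
    genome_a: str,
    genome_b: str,
    hits: Dict[Tuple[str, str], float],
    genome_of: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in genome_b and vice versa."""
    pairs = []
    for gene in sorted(g for g, genome in genome_of.items() if genome == genome_a):
        partner = best_hit(gene, genome_b, hits, genome_of)
        if partner is not None and best_hit(partner, genome_a, hits, genome_of) == gene:
            pairs.append((gene, partner))
    return pairs


# Hypothetical genes from two phage genomes and made-up directed scores.
genome_of = {"p1_a": "phage1", "p1_b": "phage1", "p2_a": "phage2", "p2_b": "phage2"}
hits = {
    ("p1_a", "p2_a"): 300.0, ("p1_a", "p2_b"): 50.0, ("p2_a", "p1_a"): 290.0,
    ("p1_b", "p2_b"): 80.0, ("p2_b", "p1_a"): 120.0, ("p2_b", "p1_b"): 75.0,
}
print(reciprocal_best_hits("phage1", "phage2", hits, genome_of))
# [('p1_a', 'p2_a')]: p1_b -> p2_b is not reciprocated, since p2_b prefers p1_a
```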
Affiliation(s)
- Jing Liu
- Stowers Institute for Medical Research, 1000 E 50th St., Kansas City, MO 64110, USA
1807
Claverie JM, Ogata H, Audic S, Abergel C, Suhre K, Fournier PE. Mimivirus and the emerging concept of "giant" virus. Virus Res 2006; 117:133-44. [PMID: 16469402 DOI: 10.1016/j.virusres.2006.01.008] [Citation(s) in RCA: 117] [Impact Index Per Article: 6.2] [Received: 09/02/2005] [Revised: 01/06/2006] [Accepted: 01/09/2006] [Indexed: 11/15/2022]
Abstract
The recently discovered Acanthamoeba polyphaga Mimivirus is the largest known DNA virus. Its particle size (750 nm), genome length (1.2 million bp) and large gene repertoire (911 protein-coding genes) blur the established boundaries between viruses and parasitic cellular organisms. In addition, analysis of its genome sequence identified many types of genes never before encountered in a virus, including aminoacyl-tRNA synthetases and other central components of the translation machinery previously thought to be the signature of cellular organisms. In this article, we examine how the discovery of such a giant virus might durably influence the way we look at microbial biodiversity, and lead us to revise the classification of microbial domains and life forms. We propose introducing the word "girus" to recognize the intermediate status of these giant DNA viruses, whose genome complexity makes them closer to small parasitic prokaryotes than to regular viruses.
Affiliation(s)
- Jean-Michel Claverie
- Information Génomique et Structurale, CNRS UPR 2589, IBSM, Parc Scientifique de Luminy, 163 Avenue de Luminy, Case 934, 13288 Marseille Cedex 9, France.
1808
Devos D, Dokudovskaya S, Williams R, Alber F, Eswar N, Chait BT, Rout MP, Sali A. Simple fold composition and modular architecture of the nuclear pore complex. Proc Natl Acad Sci U S A 2006; 103:2172-7. [PMID: 16461911 PMCID: PMC1413685 DOI: 10.1073/pnas.0506345103] [Citation(s) in RCA: 219] [Impact Index Per Article: 11.5] [Received: 07/26/2005] [Indexed: 11/18/2022]
Abstract
The nuclear pore complex (NPC) consists of multiple copies of approximately 30 different proteins [nucleoporins (nups)], forming a channel in the nuclear envelope that mediates macromolecular transport between the cytosol and the nucleus. With less than 5% of nup residues covered by currently available experimentally determined structures, little is known about the detailed structure of the NPC. Here, we use a combined computational and biochemical approach to assign folds for approximately 95% of the residues in the yeast and vertebrate nups. These fold assignments suggest an underlying simplicity in the composition and modularity in the architecture of all eukaryotic NPCs. The simplicity in NPC composition is reflected in the presence of only eight fold types, with the three most frequent folds accounting for approximately 85% of the residues. The modularity in NPC architecture is reflected in its hierarchical and symmetrical organization, which partitions the predicted nup folds into three groups: the transmembrane group, containing transmembrane helices and a cadherin fold; the central scaffold group, containing beta-propeller and alpha-solenoid folds; and the peripheral FG group, containing predominantly the FG repeats and the coiled-coil fold. Moreover, similarities between structures in coated vesicles and those in the NPC support our prior hypothesis of their common evolutionary origin in a progenitor protocoatomer. The small number of predicted fold types in the NPC and their internal symmetries suggest that the bulk of the NPC structure has evolved through extensive motif and gene duplication from a simple precursor set of only a few proteins.
Affiliation(s)
- Damien Devos
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and California Institute for Quantitative Biomedical Research, University of California, Mission Bay QB3, 1700 4th Street, Suite 503B, San Francisco, CA 94143-2552; and Laboratories of
- Frank Alber
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and California Institute for Quantitative Biomedical Research, University of California, Mission Bay QB3, 1700 4th Street, Suite 503B, San Francisco, CA 94143-2552; and Laboratories of
- Narayanan Eswar
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and California Institute for Quantitative Biomedical Research, University of California, Mission Bay QB3, 1700 4th Street, Suite 503B, San Francisco, CA 94143-2552; and Laboratories of
- Brian T. Chait
- Mass Spectrometry and Gaseous Ion Chemistry, The Rockefeller University, 1230 York Avenue, New York, NY 10021-6399
- Andrej Sali
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and California Institute for Quantitative Biomedical Research, University of California, Mission Bay QB3, 1700 4th Street, Suite 503B, San Francisco, CA 94143-2552; and Laboratories of
1809
Casbon JA, Saqi MAS. On single and multiple models of protein families for the detection of remote sequence relationships. BMC Bioinformatics 2006; 7:48. [PMID: 16448555 PMCID: PMC1397874 DOI: 10.1186/1471-2105-7-48] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/22/2005] [Accepted: 01/31/2006] [Indexed: 11/23/2022]
Abstract
Background The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily from direct comparison of the two sequences. Methods that compare sequence profiles have been shown to improve the detection of these remote sequence relationships. However, the best method for building a profile of a known set of sequences has not been established. Here we examine how the type of profile built affects its performance, both in detecting remote homologs and in the resulting alignment accuracy. In particular, we consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known cases of the superfamily, or to use multiple sequence-based profiles, each representing an individual member of the superfamily. Results Using profile-profile methods for remote homolog detection, we benchmark the performance of single structure-based superfamily models and multiple domain models. On average over all superfamilies, using a truncated receiver operating characteristic (ROC5), we find that multiple domain models outperform single superfamily models, except at low error rates, where the two models behave similarly. However, performance varies widely between superfamilies. For 12% of all superfamilies, the ROC5 value of the superfamily models exceeds that of the domain models by more than 0.2, and for 10% of superfamilies the domain models show a similar improvement over the superfamily models. Conclusion Using a sensitive profile-profile method, we have investigated the performance of single structure-based models and multiple sequence models (domain models) in detecting remote superfamily members.
We find that overall, multiple models perform better in recognition although single structure-based models display better alignment accuracy.
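ROC5 is the ROC_n score with n = 5: for a ranked hit list, it sums the number of true positives ranked above each of the first n false positives and normalizes by n times the total number of true positives, so a perfect ranking scores 1 and a ranking with no true positives above the fifth false positive scores 0. A minimal sketch under the usual ROC_n definition, assuming the hits are already sorted by score (best first) and labeled as true or false homologs; the paper's exact handling of lists with fewer than n false positives is not stated, and the padding used here is an assumption.

```python
from typing import Sequence


def roc_n(labels: Sequence[bool], n: int = 5) -> float:
    """ROC_n score for a ranked hit list (best score first).

    labels[i] is True if the i-th ranked hit is a true homolog. For each of
    the first n false positives, count the true positives ranked above it;
    normalize the sum by n * (total number of true positives).
    """
    total_tp = sum(labels)
    if total_tp == 0:
        return 0.0
    tp_seen = 0
    fp_seen = 0
    area = 0
    for is_tp in labels:
        if is_tp:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen
            if fp_seen == n:
                break
    # Assumption: if the list ends before n false positives, the missing
    # ones are treated as ranked below every true positive already seen.
    area += (n - fp_seen) * tp_seen
    return area / (n * total_tp)


ranked = [True, True, False, True, False, False, False, False]
print(roc_n(ranked))  # 14 / 15 = 0.9333...
```

Because only the first five false positives count, ROC5 rewards models that place true superfamily members at the very top of the list, which is why it is described as a truncated ROC.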
Affiliation(s)
- James A Casbon
- Bioinformatics Group, Institute of Cell and Molecular Science, The Genome Centre, Queen Mary's School of Medicine and Dentistry, Charterhouse Square, London, EC1M 6BQ, UK
- Mansoor AS Saqi
- Bioinformatics Group, Institute of Cell and Molecular Science, The Genome Centre, Queen Mary's School of Medicine and Dentistry, Charterhouse Square, London, EC1M 6BQ, UK
1810
Abstract
Gene duplication is key to molecular evolution in all three domains of life and may be the first step in the emergence of new gene function. It is a well-recognized feature of large DNA viruses but has not been studied extensively in the largest known virus to date, the recently discovered Acanthamoeba polyphaga Mimivirus. Here, I present a systematic analysis of gene and genome duplication events in the mimivirus genome. I found that one-third of the mimivirus genes are related to at least one other gene in the mimivirus genome, either through a large segmental genome duplication event that occurred in the more remote past or through more recent gene duplication events, which often occur in tandem. This shows that gene and genome duplication played a major role in shaping the mimivirus genome. Using multiple alignments, together with remote-homology detection methods based on Hidden Markov Model comparison, I assign putative functions to some of the paralogous gene families. I suggest that many of the duplicated mimivirus gene families are likely to interfere with important host cell processes, such as transcription control, protein degradation, and cell regulatory processes. My findings support the view that large DNA viruses are complex evolving organisms, possibly deeply rooted within the tree of life, and argue against the paradigm that viral evolution is dominated by lateral gene acquisition, at least with regard to large DNA viruses.
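Once genes have been assigned to paralogous families, tandem duplications show up as runs of adjacent genes from the same family along the genome. The sketch below finds such runs; it is an illustration, not the author's method, and it assumes family labels are already available (None for genes without paralogs), ignores strand and intervening unrelated genes, and uses hypothetical labels.

```python
from typing import List, Optional


def tandem_arrays(families: List[Optional[str]], min_size: int = 2) -> List[List[int]]:
    """Return index runs of adjacent genes assigned to the same family.

    families[i] is the paralogous-family label of the i-th gene in genome
    order, or None for a gene with no paralogs.
    """
    arrays: List[List[int]] = []
    run: List[int] = []
    for i, fam in enumerate(families):
        if run and fam is not None and fam == families[run[-1]]:
            run.append(i)  # extend the current tandem run
        else:
            if len(run) >= min_size:
                arrays.append(run)
            run = [i] if fam is not None else []
    if len(run) >= min_size:
        arrays.append(run)
    return arrays


# Hypothetical gene order with family labels.
order = ["A", "A", "B", None, "C", "C", "C", "A"]
print(tandem_arrays(order))  # [[0, 1], [4, 5, 6]]
```

Family A also appears at position 7, separated from its tandem pair: such dispersed paralogs are the kind of signal that, at larger scale, points to segmental rather than tandem duplication.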
Affiliation(s)
- Karsten Suhre
- Information Génomique et Structurale, UPR CNRS 2589, 31 Chemin Joseph-Aiguier, 13402 Marseille Cedex 20, France.
1811
Tkaczuk KL, Obarska A, Bujnicki JM. Molecular phylogenetics and comparative modeling of HEN1, a methyltransferase involved in plant microRNA biogenesis. BMC Evol Biol 2006; 6:6. [PMID: 16433904 PMCID: PMC1397878 DOI: 10.1186/1471-2148-6-6] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Received: 11/09/2005] [Accepted: 01/24/2006] [Indexed: 11/17/2022]
Abstract
Background Recently, the HEN1 protein from Arabidopsis thaliana was identified as an essential enzyme in plant microRNA (miRNA) biogenesis. HEN1 transfers a methyl group from S-adenosylmethionine to the 2'-OH or 3'-OH group of the last nucleotide of miRNA/miRNA* duplexes produced by the nuclease Dicer. Previously, it was found that HEN1 possesses a Rossmann-fold methyltransferase (RFM) domain and a long N-terminal extension including a putative double-stranded RNA-binding motif (DSRM). However, little is known about the structure and mechanism of action of this enzyme, or about its phylogenetic origin. Results Extensive database searches were carried out to identify orthologs and close paralogs of HEN1. Based on the multiple sequence alignment, a phylogenetic tree of the HEN1 family was constructed. Fold recognition was used to identify related methyltransferases with experimentally solved structures and to guide homology modeling of the HEN1 catalytic domain. Additionally, we identified a La-like predicted RNA-binding domain located C-terminally to the DSRM domain, and a domain with a peptidyl-prolyl cis/trans isomerase (PPIase) fold, but without the conserved PPIase active site, located N-terminally to the catalytic domain. Conclusion The bioinformatics analysis revealed that the catalytic domain of HEN1 is not closely related to any known RNA:2'-OH methyltransferases (e.g. the RrmJ/fibrillarin superfamily), but rather to small-molecule methyltransferases. The structural model was used as a platform to identify the putative active site and substrate-binding residues of HEN1 and to propose its mechanism of action.
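The abstract does not name the tree-building method. Distance-based methods such as neighbor joining start from a matrix of pairwise distances computed from the multiple sequence alignment, so this sketch shows only that first step, using uncorrected p-distances with gapped columns skipped pairwise. The sequence names and aligned fragments are hypothetical, and a real analysis would typically apply an evolutionary model correction on top of p-distances.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no overlapping ungapped columns")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every pair of aligned sequences."""
    names = list(alignment)
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(names, 2)
    }


# Hypothetical aligned fragments, for illustration only.
msa = {"seq1": "ACDE", "seq2": "ACDF", "seq3": "A-DF"}
for pair, d in distance_matrix(msa).items():
    print(pair, round(d, 3))
# ('seq1', 'seq2') 0.25
# ('seq1', 'seq3') 0.333
# ('seq2', 'seq3') 0.0
```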
Affiliation(s)
- Karolina L Tkaczuk
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland
- Institute of Technical Biochemistry, Technical University of Lodz, Stefanowskiego 4/10, 90-924 Lodz, Poland
- Agnieszka Obarska
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland
- Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland
- Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland
1812
Neugebauer H, Herrmann C, Kammer W, Schwarz G, Nordheim A, Braun V. ExbBD-dependent transport of maltodextrins through the novel MalA protein across the outer membrane of Caulobacter crescentus. J Bacteriol 2005; 187:8300-11. [PMID: 16321934 PMCID: PMC1317028 DOI: 10.1128/jb.187.24.8300-8311.2005] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.8] [Indexed: 11/20/2022]
Abstract
Analysis of the genome sequence of Caulobacter crescentus predicts 67 TonB-dependent outer membrane proteins. To demonstrate that some of them transport nutrients other than chelated Fe(3+) and vitamin B(12) (the substrates hitherto known to be transported by TonB-dependent transporters), the outer membrane protein profile of cells grown on different substrates was determined by two-dimensional electrophoresis. Maltose induced the synthesis of a hitherto unknown 99.5-kDa protein, designated here as MalA, encoded by the cc2287 genomic locus. MalA mediated growth on maltodextrins and transported [(14)C]maltodextrins ranging from [(14)C]maltose to [(14)C]maltopentaose. [(14)C]maltose transport showed biphasic kinetics, with a fast initial rate and a slower second rate. The initial transport had a K(d) of 0.2 microM, while the second transport had a K(d) of 5 microM. It is proposed that the fast rate reflects binding to MalA and the second rate reflects transport into the cells. Energy depletion of cells by 100 microM carbonyl cyanide 3-chlorophenylhydrazone abolished maltose binding and transport. Deletion of the malA gene reduced maltose transport to 1% of the wild-type level and impaired transport of the larger maltodextrins. The malA mutant was unable to grow on maltodextrins larger than maltotetraose. Deletion of two C. crescentus genes homologous to the exbB and exbD genes of Escherichia coli abolished [(14)C]maltodextrin binding and transport and growth on maltodextrins larger than maltotetraose. These mutants also showed impaired growth on Fe(3+)-rhodotorulate as the sole iron source, which provided evidence of energy-coupled transport. Unexpectedly, a deletion mutant of a tonB homolog transported maltose at the wild-type rate and grew on all maltodextrins tested. Since Fe(3+)-rhodotorulate served as an iron source for the tonB mutant, an additional gene encoding a protein with TonB function is postulated.
Permeation of maltose and maltotriose through the outer membrane of the C. crescentus malA mutant was slower than permeation through the outer membrane of an E. coli lamB mutant, which suggests a low porin activity in C. crescentus. The pores of the C. crescentus porins are slightly larger than those of E. coli K-12, since maltotetraose supported growth of the C. crescentus malA mutant but failed to support growth of the E. coli lamB mutant. The data are consistent with the proposal that binding of maltodextrins to MalA requires energy and MalA actively transports maltodextrins with K(d) values 1,000-fold smaller than those for the LamB porin and 100-fold larger than those for the vitamin B(12) and ferric siderophore outer membrane transporters. MalA is the first example of an outer membrane protein for which an ExbB/ExbD-dependent transport of a nutrient other than iron and vitamin B(12) has been demonstrated.
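To make the two reported K(d) values concrete, the sketch below evaluates a simple single-site saturation model, fraction occupied = [S] / (K(d) + [S]), for each phase at a few maltose concentrations. The hyperbolic model is an assumption chosen for illustration; the abstract reports the apparent constants but not the model used to fit them. At 1 microM maltose, the fast (binding) component is about 83% saturated, whereas the slow (transport) component is only about 17% saturated.

```python
def fractional_occupancy(substrate_um: float, kd_um: float) -> float:
    """Single-site hyperbolic saturation: [S] / (Kd + [S]), both in microM."""
    return substrate_um / (kd_um + substrate_um)


FAST_KD_UM = 0.2  # apparent K(d) of the fast initial phase (from the abstract)
SLOW_KD_UM = 5.0  # apparent K(d) of the slower second phase (from the abstract)

for s in (0.1, 1.0, 10.0):
    fast = fractional_occupancy(s, FAST_KD_UM)
    slow = fractional_occupancy(s, SLOW_KD_UM)
    print(f"{s:>5} microM maltose: fast {fast:.2f}, slow {slow:.2f}")
#   0.1 microM maltose: fast 0.33, slow 0.02
#   1.0 microM maltose: fast 0.83, slow 0.17
#  10.0 microM maltose: fast 0.98, slow 0.67
```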
Affiliation(s)
- Heidi Neugebauer
- Mikrobiologie/Membranphysiologie, Universität Tübingen, Auf der Morgenstelle 28, D-72076 Tübingen, Germany
1813
Ginzinger SW, Fischer J. SimShift: Identifying structural similarities from NMR chemical shifts. Bioinformatics 2005; 22:460-5. [PMID: 16317071 DOI: 10.1093/bioinformatics/bti805]
Abstract
MOTIVATION An important quantity that arises in NMR spectroscopy experiments is the chemical shift. The interpretation of these data is mostly done by human experts; to our knowledge there are no algorithms that predict protein structure from chemical shift sequences alone. One approach to facilitate this process could be to compare two such sequences, where the structure of one protein has already been resolved. Our claim is that similarity of chemical shifts thereby found implies structural similarity of the respective proteins. RESULTS We present an algorithm to identify structural similarities of proteins by aligning their associated chemical shift sequences. To evaluate the correctness of our predictions, we propose a benchmark set of protein pairs that have high structural similarity, but low sequence similarity (because with high sequence similarity the structural similarities could easily be detected by a sequence alignment algorithm). We compare our results with those of HHsearch and SSEA and show that our method outperforms both in >50% of all cases.
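Aligning two chemical shift sequences can be framed as a local sequence alignment in which residues are compared by their shift values instead of their amino acid identities. The sketch below is a minimal illustration using standard Smith-Waterman with a linear gap penalty; the per-residue score (`shift_score`, with its `tol` and `bonus` parameters) and the gap value are hypothetical choices, since the abstract does not describe SimShift's actual scoring function or gap model.

```python
from typing import List, Optional, Sequence, Tuple

# One residue = a tuple of chemical shifts in ppm (e.g. C-alpha, C-beta); None = unassigned.
Residue = Tuple[Optional[float], ...]


def shift_score(a: Residue, b: Residue, tol: float = 1.0, bonus: float = 1.0) -> float:
    """Hypothetical residue-pair score: positive when shifts agree within about tol ppm."""
    diffs = [abs(x - y) / tol for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        return 0.0  # no shift assigned in both residues: neutral score
    return bonus - sum(diffs) / len(diffs)


def local_align(q: Sequence[Residue], t: Sequence[Residue],
                gap: float = 1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns the best local score and the aligned (query, target) index pairs.
    """
    n, m = len(q), len(t)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + shift_score(q[i - 1], t[j - 1]),
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the local score falls to zero.
    pairs: List[Tuple[int, int]] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + shift_score(q[i - 1], t[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


if __name__ == "__main__":
    query = [(56.0, 30.0), (60.0, 38.0), (53.0, 19.0)]
    target = [(45.0, None), (56.2, 30.1), (60.1, 37.8), (52.9, 19.2)]
    print(local_align(query, target))
```

Local rather than global alignment is used here because two proteins with similar structure may share only a region, and unassigned shifts are simply skipped when scoring a residue pair.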
Affiliation(s)
- Simon W Ginzinger
- LFE Bioinformatik, Institut für Informatik, Ludwig-Maximilians-Universität München, Amalienstrasse 17, D-80333 München, Germany.
1814
Jin J, Cai Y, Yao T, Gottschalk AJ, Florens L, Swanson SK, Gutiérrez JL, Coleman MK, Workman JL, Mushegian A, Washburn MP, Conaway RC, Conaway JW. A mammalian chromatin remodeling complex with similarities to the yeast INO80 complex. J Biol Chem 2005; 280:41207-12. [PMID: 16230350 DOI: 10.1074/jbc.m509128200]
Abstract
The mammalian Tip49a and Tip49b proteins belong to an evolutionarily conserved family of AAA+ ATPases. In Saccharomyces cerevisiae, orthologs of Tip49a and Tip49b, called Rvb1 and Rvb2, respectively, are subunits of two distinct ATP-dependent chromatin remodeling complexes, SWR1 and INO80. We recently demonstrated that the mammalian Tip49a and Tip49b proteins are integral subunits of a chromatin remodeling complex bearing striking similarities to the S. cerevisiae SWR1 complex (Cai, Y., Jin, J., Florens, L., Swanson, S. K., Kusch, T., Li, B., Workman, J. L., Washburn, M. P., Conaway, R. C., and Conaway, J. W. (2005) J. Biol. Chem. 280, 13665-13670). In this report, we identify a new mammalian Tip49a- and Tip49b-containing ATP-dependent chromatin remodeling complex, which includes orthologs of 8 of the 15 subunits of the S. cerevisiae INO80 chromatin remodeling complex as well as at least five additional subunits unique to the human INO80 (hINO80) complex. Finally, we demonstrate that, similar to the yeast INO80 complex, the hINO80 complex exhibits DNA- and nucleosome-activated ATPase activity and catalyzes ATP-dependent nucleosome sliding.
Affiliation(s)
- Jingji Jin
- Stowers Institute for Medical Research, Kansas City, Missouri 64110, USA
1815
Söding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 2005; 33:W244-8. [PMID: 15980461 PMCID: PMC1160169 DOI: 10.1093/nar/gki408]
Abstract
HHpred is a fast server for remote protein homology detection and structure prediction and is the first to implement pairwise comparison of profile hidden Markov models (HMMs). It allows users to search a wide choice of databases, such as the PDB, SCOP, Pfam, SMART, COGs and CDD. It accepts a single query sequence or a multiple alignment as input. Within only a few minutes, it returns the search results in a user-friendly format similar to that of PSI-BLAST. Search options include local or global alignment and scoring of secondary structure similarity. HHpred can produce pairwise query-template alignments, multiple alignments of the query with a set of templates selected from the search results, as well as 3D structural models that are calculated by the MODELLER software from these alignments. A detailed help facility is available. As a demonstration, we analyze the sequence of SpoVT, a transcriptional regulator from Bacillus subtilis. HHpred can be accessed at http://protevo.eb.tuebingen.mpg.de/hhpred.
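Pairwise comparison of profile HMMs rests on a score for aligning one match-state column of the query profile with one column of the template profile. A commonly used form is the log of the summed product of the two columns' emission probabilities, each term divided by the background frequency of that residue. The sketch below is a minimal illustration of that column score only, assuming a toy four-letter alphabet and a uniform background; it omits insert/delete transitions, secondary-structure scoring, score offsets and the dynamic-programming search that a full HMM-HMM aligner performs.

```python
import math
from typing import Dict, Sequence

Column = Dict[str, float]  # residue -> emission probability (sums to 1)


def column_score(q: Column, p: Column, background: Column) -> float:
    """log sum_a q(a) * p(a) / f(a): log-odds that two profile columns emit alike."""
    total = sum(q.get(a, 0.0) * p.get(a, 0.0) / f for a, f in background.items())
    return math.log(total) if total > 0 else float("-inf")


def ungapped_score(query: Sequence[Column], template: Sequence[Column],
                   background: Column) -> float:
    """Sum of column scores along a gap-free, position-by-position alignment."""
    if len(query) != len(template):
        raise ValueError("an ungapped alignment needs equal-length profiles")
    return sum(column_score(q, p, background) for q, p in zip(query, template))


if __name__ == "__main__":
    bg = {a: 0.25 for a in "ACDE"}  # toy alphabet, uniform background (assumption)
    conserved = {"A": 1.0}
    uniform = dict(bg)
    print(column_score(conserved, conserved, bg))  # strongly positive: log 4
    print(column_score(uniform, uniform, bg))      # uninformative columns: log 1
```

Two fully conserved columns emitting the same residue score log(1/0.25) = log 4, while two columns that merely mirror the background score zero, so only shared deviations from the background contribute evidence of homology.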
Affiliation(s)
- Johannes Söding
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Spemannstrasse 35, 72076 Tübingen, Germany.
1816
Kosinski J, Feder M, Bujnicki JM. The PD-(D/E)XK superfamily revisited: identification of new members among proteins involved in DNA metabolism and functional predictions for domains of (hitherto) unknown function. BMC Bioinformatics 2005; 6:172. [PMID: 16011798 PMCID: PMC1189080 DOI: 10.1186/1471-2105-6-172] [Received: 04/13/2005] [Accepted: 07/12/2005]
Abstract
BACKGROUND The PD-(D/E)XK nuclease superfamily, initially identified in type II restriction endonucleases and later in many enzymes involved in DNA recombination and repair, is one of the most challenging targets for protein sequence analysis and structure prediction. Typically, the sequence similarity between these proteins is so low that most of the relationships between known members of the PD-(D/E)XK superfamily were identified only after the corresponding structures were determined experimentally. Thus, it is tempting to speculate that among the uncharacterized protein families there are potential nucleases that remain to be discovered, but their identification requires more sensitive tools than traditional PSI-BLAST searches. RESULTS The low degree of amino acid conservation hampers the identification of new members of the PD-(D/E)XK superfamily based solely on sequence comparisons to known members. Therefore, we used a recently developed method, HHsearch, for sensitive detection of remote similarities between protein families represented as profile Hidden Markov Models enhanced by secondary structure. We compared known families of PD-(D/E)XK nucleases to a database comprising the COG and PFAM profiles of both functionally characterized and uncharacterized protein families to detect significant similarities. The initial candidates for new nucleases were subsequently verified by sequence-structure threading, comparative modeling, and identification of potential active site residues.
CONCLUSION In this article, we report identification of the PD-(D/E)XK nuclease domain in numerous proteins implicated in interactions with DNA but with unknown structure and mechanism of action (such as putative recombinase RmuC, DNA competence factor CoiA, a DNA-binding protein SfsA, a large human protein predicted to be a DNA repair enzyme, predicted archaeal transcription regulators, and the head completion protein of phage T4) and in proteins for which no function was assigned to date (such as YhcG, various phage proteins, novel candidates for restriction enzymes). Our results contribute to the reduction of "white spaces" on the sequence-structure-function map of the protein universe and will help to jump-start the experimental characterization of new nucleases, many of which may be important for a complete understanding of the mechanisms that govern the evolution and stability of the genome.
Affiliation(s)
- Jan Kosinski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, PL-02-109 Warsaw, Poland
- Marcin Feder
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, PL-02-109 Warsaw, Poland
- Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, PL-02-109 Warsaw, Poland
1817
Simossis VA, Heringa J. PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Res 2005; 33:W289-94. [PMID: 15980472 PMCID: PMC1160151 DOI: 10.1093/nar/gki390] [Received: 02/11/2005] [Revised: 03/10/2005] [Accepted: 03/10/2005]
Abstract
PRofile ALIgNEment (PRALINE) is a fully customizable multiple sequence alignment application. In addition to a number of available alignment strategies, PRALINE can integrate information from database homology searches to generate a homology-extended multiple alignment. PRALINE also provides a choice of seven different secondary structure prediction programs that can be used individually or in combination as a consensus for integrating structural information into the alignment process. The program can be used through two separate interfaces: one has been designed to cater to more advanced needs of researchers in the field, and the other for standard construction of high confidence alignments. The web-based output is designed to facilitate the comprehensive visualization of the generated alignments by means of five default colour schemes based on: residue type, position conservation, position reliability, residue hydrophobicity and secondary structure, depending on the options set. A user can also define a custom colour scheme by selecting which colour will represent one or more amino acids in the alignment. All generated alignments are also made available in the PDF format for easy figure generation for publications. The grouping of sequences, on which the alignment is based, can also be visualized as a dendrogram. PRALINE is available at http://ibivu.cs.vu.nl/programs/pralinewww/.
Affiliation(s)
- V. A. Simossis
- Bioinformatics Section, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- J. Heringa
- Bioinformatics Section, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- Centre for Integrative Bioinformatics VU (IBIVU), Faculty of Sciences and Faculty of Earth & Life Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
1818
Chatterjee I, Richmond A, Putiri E, Shakes DC, Singson A. The Caenorhabditis elegans spe-38 gene encodes a novel four-pass integral membrane protein required for sperm function at fertilization. Development 2005; 132:2795-808. [PMID: 15930110 DOI: 10.1242/dev.01868]
Abstract
A mutation in the Caenorhabditis elegans spe-38 gene results in a sperm-specific fertility defect. spe-38 sperm are indistinguishable from wild-type sperm with regard to their morphology, motility and migratory behavior. spe-38 sperm make close contact with oocytes but fail to fertilize them. spe-38 sperm can also stimulate ovulation and engage in sperm competition. The spe-38 gene is predicted to encode a novel four-pass (tetraspan) integral membrane protein. Structurally similar tetraspan molecules have been implicated in processes such as gamete adhesion/fusion in mammals, membrane adhesion/fusion during yeast mating, and the formation/function of tight junctions in metazoa. In antibody localization experiments, SPE-38 was found to concentrate on the pseudopod of mature sperm, consistent with it playing a direct role in gamete interactions.
Affiliation(s)
- Indrani Chatterjee
- Waksman Institute and Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
1819
Tilburn J, Sánchez-Ferrero JC, Reoyo E, Arst HN, Peñalva MA. Mutational analysis of the pH signal transduction component PalC of Aspergillus nidulans supports distant similarity to BRO1 domain family members. Genetics 2005; 171:393-401. [PMID: 15944343 PMCID: PMC1456523 DOI: 10.1534/genetics.105.044644]
Abstract
The alkaline ambient pH signal transduction pathway component PalC has no assigned molecular role. Therefore we attempted a gene-specific mutational analysis and obtained 55 new palC loss-of-function alleles including 24 single residue substitutions. Refined similarity searches reveal conserved PalC regions including one with convincing similarity to the BRO1 domain, denoted PCBROH, where clustering of mutational changes, including PCBROH key residue substitutions, supports its structural and/or functional importance. Since the BRO1 domain occurs in the multivesicular body (MVB) pathway protein Bro1/Vps31 and also the pH signal transduction protein PalA (Rim20), both of which interact with MVB component (ESCRT-III protein) Vps32/Snf7, this might reflect a further link between the pH response and endocytosis.
Affiliation(s)
- Joan Tilburn
- Department of Infectious Diseases, Faculty of Medicine, Imperial College London, United Kingdom.
1820
Coles M, Djuranovic S, Söding J, Frickey T, Koretke K, Truffault V, Martin J, Lupas AN. AbrB-like Transcription Factors Assume a Swapped Hairpin Fold that Is Evolutionarily Related to Double-Psi β Barrels. Structure 2005; 13:919-28. [PMID: 15939023 DOI: 10.1016/j.str.2005.03.017] [Received: 02/25/2005] [Revised: 03/29/2005] [Accepted: 03/29/2005]
Abstract
AbrB is a key transition-state regulator of Bacillus subtilis. Based on the conservation of a βαβ structural unit, we proposed a β barrel fold for its DNA binding domain, similar to, but topologically distinct from, double-psi β barrels. However, the NMR structure revealed a novel fold, the "looped-hinge helix." To understand this discrepancy, we undertook a bioinformatics study of AbrB and its homologs; these form a large superfamily, which includes SpoVT, PrlF, MraZ, addiction module antidotes (PemI, MazE), plasmid maintenance proteins (VagC, VapB), and archaeal PhoU homologs. MazE and MraZ form swapped-hairpin β barrels. We therefore reexamined the fold of AbrB by NMR spectroscopy and found that it also forms a swapped-hairpin barrel. The conservation of the core βαβ element supports a common evolutionary origin for swapped-hairpin and double-psi barrels, which we group into a higher-order class, the cradle-loop barrels, based on the peculiar shape of their ligand binding site.
Affiliation(s)
- Murray Coles
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, 72076 Tübingen, Germany
1821
Watson JD, Laskowski RA, Thornton JM. Predicting protein function from sequence and structural data. Curr Opin Struct Biol 2005; 15:275-84. [PMID: 15963890 DOI: 10.1016/j.sbi.2005.04.003] [Received: 02/04/2005] [Revised: 02/04/2005] [Accepted: 04/18/2005]
Abstract
When a protein's function cannot be experimentally determined, it can often be inferred from sequence similarity. Should this process fail, analysis of the protein structure can provide functional clues or confirm tentative functional assignments inferred from the sequence. Many structure-based approaches exist (e.g. fold similarity, three-dimensional templates), but as no single method can be expected to be successful in all cases, a more prudent approach involves combining multiple methods. Several automated servers that integrate evidence from multiple sources have been released this year and particular improvements have been seen with methods utilizing the Gene Ontology functional annotation schema.
Affiliation(s)
- James D Watson
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
1822
Gibson A, Lewis AP, Affleck K, Aitken AJ, Meldrum E, Thompson N. hCLCA1 and mCLCA3 are secreted non-integral membrane proteins and therefore are not ion channels. J Biol Chem 2005; 280:27205-12. [PMID: 15919655 DOI: 10.1074/jbc.m504654200]
Abstract
Proteins of the CLCA gene family have been proposed to mediate calcium-activated chloride currents. In this study, we used detailed bioinformatics analysis and found that no transmembrane domains are predicted in hCLCA1 or mCLCA3 (Gob-5). Further analysis suggested that they are globular proteins containing domains that are likely to be involved in protein-protein interactions. In support of the bioinformatics analysis, biochemical studies showed that hCLCA1 and mCLCA3, when expressed in HEK293 cells, could be removed from the cell surface and could be detected in the extracellular medium, even after short incubation times. The accumulation in the medium was shown to be brefeldin A-sensitive, demonstrating that hCLCA1 is constitutively secreted. The N-terminal cleavage products of hCLCA1 and mCLCA3 could be detected in bronchoalveolar lavage fluid taken from asthmatic subjects and ovalbumin-challenged mice, demonstrating release from cells in a physiological setting. We conclude that hCLCA1 and mCLCA3 are non-integral membrane proteins and therefore cannot be chloride channels in their own right.
Affiliation(s)
- Adele Gibson
- GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, United Kingdom
1823
Wistrand M, Sonnhammer ELL. Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER. BMC Bioinformatics 2005; 6:99. [PMID: 15831105 PMCID: PMC1097716 DOI: 10.1186/1471-2105-6-99] [Received: 02/01/2005] [Accepted: 04/15/2005]
Abstract
Background Profile hidden Markov model (HMM) techniques are among the most powerful methods for protein homology detection. Yet the critical features for successful modelling are not fully known. In the present work we approached this question using two of the most popular HMM packages: SAM and HMMER. The programs' abilities to build models and score sequences were compared on a SCOP/Pfam-based test set. The comparison was done separately for local and global HMM scoring. Results Using default settings, SAM was overall more sensitive. SAM's model estimation was superior, while HMMER's model scoring was more accurate. Critical features for model building were then analysed by comparing the two packages' algorithmic choices and parameters. The weighting between prior probabilities and multiple alignment counts was the primary explanation for why SAM's model building was superior. Our analysis suggests that HMMER gives too much weight to the sequence counts. SAM's emission prior probabilities were also shown to be more sensitive. The relative sequence weighting schemes differ between the two packages but performed equivalently. Conclusion SAM model estimation was more sensitive, while HMMER model scoring was more accurate. By combining the best algorithmic features from both packages, the accuracy was substantially improved compared to their default performance.
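The trade-off between prior probabilities and alignment counts can be illustrated with the simplest pseudocount estimator, in which a column's emission probabilities are p(a) = (n_a + A·prior(a)) / (N + A), with n_a the (weighted) count of residue a, N the total count and A the prior weight. The sketch below is a minimal illustration under that assumption; SAM and HMMER typically use Dirichlet mixture priors, where the effective prior weight varies by column, so this is not either package's actual estimator, and `emission_estimate` is a hypothetical helper.

```python
from typing import Dict


def emission_estimate(counts: Dict[str, float], prior: Dict[str, float],
                      prior_weight: float) -> Dict[str, float]:
    """Blend observed residue counts with a single prior distribution.

    p(a) = (n_a + A * prior(a)) / (N + A), with A = prior_weight.
    A -> 0 trusts the alignment counts alone; a large A pulls p toward the prior.
    Assumes every residue in `counts` also appears in `prior`.
    """
    if prior_weight < 0:
        raise ValueError("prior_weight must be non-negative")
    total = sum(counts.values()) + prior_weight
    if total == 0:
        raise ValueError("need some counts or a positive prior weight")
    return {a: (counts.get(a, 0.0) + prior_weight * pa) / total
            for a, pa in prior.items()}


if __name__ == "__main__":
    counts = {"A": 3, "C": 1}       # toy column: 3 alanines, 1 cysteine
    prior = {"A": 0.5, "C": 0.5}    # toy two-letter prior (assumption)
    for weight in (0.0, 4.0):
        print(weight, emission_estimate(counts, prior, weight))
```

With no prior weight the toy column gives p(A) = 0.75; with A = 4 the estimate moves toward the prior, to p(A) = 0.625. A package that sets the effective A too low relative to N will, in this picture, give too much weight to the sequence counts.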
Affiliation(s)
- Markus Wistrand
- Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden
- Erik LL Sonnhammer
- Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden
1824
Abstract
The availability of advanced profile-profile comparison tools, such as PRC or HHsearch, demands sophisticated visualization tools not presently available. We introduce an approach built upon the concept of HMM logos. The method illustrates the similarities of pairs of protein family profiles in an intuitive way. Two HMM logos, one for each profile, are drawn one above the other. The aligned states are then highlighted and connected. AVAILABILITY A web interface offering online creation of pairwise HMM logos is available at http://www.sanger.ac.uk/Software/analysis/logomat-p. Furthermore, software developers may download a Perl package that includes methods for creating pairwise HMM logos locally. CONTACT bsb@sanger.ac.uk.
1825
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447491 DOI: 10.1002/cfg.425]