101
|
Sanchez-Pulido L, Devos D, Sung ZR, Calonje M. RAWUL: a new ubiquitin-like domain in PRC1 ring finger proteins that unveils putative plant and worm PRC1 orthologs. BMC Genomics 2008; 9:308. [PMID: 18588675 PMCID: PMC2447854 DOI: 10.1186/1471-2164-9-308] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.6] [Received: 03/14/2008] [Accepted: 06/27/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Polycomb group (PcG) proteins are a set of chromatin-modifying proteins that play a key role in epigenetic gene regulation. The PcG proteins form large multiprotein complexes with different activities. The two best-characterized PcG complexes are the PcG repressive complex 1 (PRC1) and 2 (PRC2) that respectively possess histone 2A lysine 119 E3 ubiquitin ligase and histone 3 lysine 27 methyltransferase activities. While PRC2-like complexes are conserved throughout the eukaryotic kingdoms, PRC1-like complexes have only been described in Drosophila and vertebrates. Since both complexes are required for the gene silencing mechanism in Drosophila and vertebrates, how PRC1 function is realized in organisms that apparently lack PRC1, such as plants, is so far unknown. In vertebrates, PRC1 includes three proteins, Ring1B, Ring1A, and Bmi-1, that form an E3 ubiquitin ligase complex. These PRC1 proteins have an N-terminally located Ring finger domain associated with a poorly characterized conserved C-terminal region. RESULTS We obtained statistically significant evidence of sequence similarity between the C-terminal region of the PRC1 Ring finger proteins and the ubiquitin (Ubq)-like family proteins, thus defining a new Ubq-like domain, the RAWUL domain. In addition, our analysis revealed the existence of plant and worm proteins that display the conserved combination of a Ring finger domain at the N-terminus and a RAWUL domain at the C-terminus. CONCLUSION Analysis of the conserved domain architecture among PRC1 Ring finger proteins revealed the existence of long-sought PRC1 protein orthologs in these organisms, suggesting the functional conservation of PRC1 throughout higher eukaryotes.
Affiliation(s)
- Luis Sanchez-Pulido
- Centro Nacional de Biotecnología (CNB-CSIC). Cantoblanco, E-28049 Madrid, Spain.
|
102
|
Grunt M, Žárský V, Cvrčková F. Roots of angiosperm formins: the evolutionary history of plant FH2 domain-containing proteins. BMC Evol Biol 2008; 8:115. [PMID: 18430232 PMCID: PMC2386819 DOI: 10.1186/1471-2148-8-115] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.8] [Received: 12/19/2007] [Accepted: 04/22/2008] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Shuffling of modular protein domains is an important source of evolutionary innovation. Formins are a family of actin-organizing proteins that share a conserved FH2 domain but their overall domain architecture differs dramatically between opisthokonts (metazoans and fungi) and plants. We performed a phylogenomic analysis of formins in most eukaryotic kingdoms, aiming to reconstruct an evolutionary scenario that may have produced the current diversity of domain combinations with focus on the origin of the angiosperm formin architectures. RESULTS The Rho GTPase-binding domain (GBD/FH3) reported from opisthokont and Dictyostelium formins was found in all lineages except plants, suggesting its ancestral character. Instead, mosses and vascular plants possess the two formin classes known from angiosperms: membrane-anchored Class I formins and Class II formins carrying a PTEN-like domain. PTEN-related domains were found also in stramenopile formins, where they have been probably acquired independently rather than by horizontal transfer, following a burst of domain rearrangements in the chromalveolate lineage. A novel RhoGAP-related domain was identified in some algal, moss and lycophyte (but not angiosperm) formins that define a specific branch (Class III) of the formin family. CONCLUSION We propose a scenario where formins underwent multiple domain rearrangements in several eukaryotic lineages, especially plants and chromalveolates. In plants this replaced GBD/FH3 by a probably inactive RhoGAP-like domain, preserving a formin-mediated association between (membrane-anchored) Rho GTPases and the actin cytoskeleton. Subsequent amplification of formin genes, possibly coincident with the expansion of plants to dry land, was followed by acquisition of alternative membrane attachment mechanisms present in extant Class I and Class II formins, allowing later loss of the RhoGAP-like domain-containing formins in angiosperms.
Affiliation(s)
- Michal Grunt
- Department of Plant Physiology, Faculty of Sciences, Charles University, Viničná 5, CZ 128 43 Praha 2, Czech Republic
- Viktor Žárský
- Department of Plant Physiology, Faculty of Sciences, Charles University, Viničná 5, CZ 128 43 Praha 2, Czech Republic
- Institute of Experimental Botany, Academy of Sciences of the Czech Republic, Rozvojová 135, CZ 165 02 Praha 6, Czech Republic
- Fatima Cvrčková
- Department of Plant Physiology, Faculty of Sciences, Charles University, Viničná 5, CZ 128 43 Praha 2, Czech Republic
|
103
|
Xie L, Bourne PE. Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proc Natl Acad Sci U S A 2008; 105:5441-6. [PMID: 18385384 PMCID: PMC2291117 DOI: 10.1073/pnas.0704422105] [Citation(s) in RCA: 181] [Impact Index Per Article: 10.6] [Received: 05/11/2007] [Indexed: 11/18/2022] Open
Abstract
Here, a scalable, accurate, reliable, and robust protein functional site comparison algorithm is presented. The key components of the algorithm consist of a reduced representation of the protein structure and a sequence order-independent profile-profile alignment (SOIPPA). We show that SOIPPA is able to detect distant evolutionary relationships in cases where both a global sequence and structure relationship remains obscure. Results suggest evolutionary relationships across several previously evolutionary distinct protein structure superfamilies. SOIPPA, along with an increased coverage of protein fold space afforded by the structural genomics initiative, can be used to further test the notion that fold space is continuous rather than discrete.
Affiliation(s)
- Lei Xie
- San Diego Supercomputer Center
- Philip E. Bourne
- San Diego Supercomputer Center
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093
|
104
|
Wang M, Yafremava LS, Caetano-Anollés D, Mittenthal JE, Caetano-Anollés G. Reductive evolution of architectural repertoires in proteomes and the birth of the tripartite world. Genome Res 2007; 17:1572-85. [PMID: 17908824 PMCID: PMC2045140 DOI: 10.1101/gr.6454307] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.7] [Received: 03/01/2007] [Accepted: 08/23/2007] [Indexed: 11/25/2022]
Abstract
The repertoire of protein architectures in proteomes is evolutionarily conserved and capable of preserving an accurate record of genomic history. Here we use a census of protein architecture in 185 genomes that have been fully sequenced to generate genome-based phylogenies that describe the evolution of the protein world at fold (F) and fold superfamily (FSF) levels. The patterns of representation of F and FSF architectures over evolutionary history suggest three epochs in the evolution of the protein world: (1) architectural diversification, where members of an architecturally rich ancestral community diversified their protein repertoire; (2) superkingdom specification, where superkingdoms Archaea, Bacteria, and Eukarya were specified; and (3) organismal diversification, where F and FSF specific to relatively small sets of organisms appeared as the result of diversification of organismal lineages. Functional annotation of FSF along these architectural chronologies revealed patterns of discovery of biological function. Most importantly, the analysis identified an early and extensive differential loss of architectures occurring primarily in Archaea that segregates the archaeal lineage from the ancient community of organisms and establishes the first organismal divide. Reconstruction of phylogenomic trees of proteomes reflects the timeline of architectural diversification in the emerging lineages. Thus, Archaea undertook a minimalist strategy using only a small subset of the full architectural repertoire and then crystallized into a diversified superkingdom late in evolution. Our analysis also suggests a communal ancestor to all life that was molecularly complex and adopted genomic strategies currently present in Eukarya.
Affiliation(s)
- Minglei Wang
- Department of Crop Sciences, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
- Liudmila S. Yafremava
- Department of Crop Sciences, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
- Derek Caetano-Anollés
- Department of Crop Sciences, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
- Jay E. Mittenthal
- Department of Cell and Developmental Biology, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
- Gustavo Caetano-Anollés
- Department of Crop Sciences, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
|
105
|
Wicher KB, Fries E. Convergent Evolution of Human and Bovine Haptoglobin: Partial Duplication of the Genes. J Mol Evol 2007; 65:373-9. [DOI: 10.1007/s00239-007-9002-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 10/09/2006] [Accepted: 04/17/2007] [Indexed: 10/22/2022]
|
106
|
Rasteiro R, Pereira-Leal JB. Multiple domain insertions and losses in the evolution of the Rab prenylation complex. BMC Evol Biol 2007; 7:140. [PMID: 17705859 PMCID: PMC1994686 DOI: 10.1186/1471-2148-7-140] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 06/16/2007] [Accepted: 08/17/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Rab proteins are regulators of vesicular trafficking that require a lipid modification, prenylation of C-terminal cysteines, for proper function. This is catalysed by a complex of a catalytic heterodimer (Rab Geranylgeranyl Transferase, RabGGTase) and an accessory protein (Rab Escort Protein, REP). Components of this complex display domain insertions relative to paralogous proteins. The function of these inserted domains is unclear. RESULTS We profiled the domain architecture of the components of the Rab prenylation complex in evolution. We identified the orthologues of the components of the Rab prenylation machinery in 43 organisms, representing the crown eukaryotic groups. We characterize in detail the domain structure of all these components and the phylogenetic relationships between the individual domains. CONCLUSION We found different domain insertions in different taxa, in alpha-subunits of RabGGTase and REP. Our results suggest that there were multiple insertions, expansions and contractions in the evolution of this prenylation complex.
Affiliation(s)
- Rita Rasteiro
- Instituto Gulbenkian de Ciência, Apartado 14, P-2781-901 Oeiras Portugal
|
107
|
Derelle R, Lopez P, Le Guyader H, Manuel M. Homeodomain proteins belong to the ancestral molecular toolkit of eukaryotes. Evol Dev 2007; 9:212-9. [PMID: 17501745 DOI: 10.1111/j.1525-142x.2007.00153.x] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Indexed: 11/30/2022]
Abstract
Multicellular organization arose several times by convergence during the evolution of eukaryotes (e.g., in terrestrial plants, several lineages of "algae," fungi, and metazoans). To reconstruct the evolutionary transitions between unicellularity and multicellularity, we need a proper understanding of the origin and diversification of regulatory molecules governing the construction of a multicellular organism in these various lineages. Homeodomain (HD) proteins offer a paradigm for studying such issues, because in multicellular eukaryotes, like animals, fungi and plants, these transcription factors are extensively used in fundamental developmental processes and are highly diversified. A number of large eukaryote lineages are exclusively unicellular, however, and it remains unclear to what extent this condition reflects their primitive lack of "good building blocks" such as the HD proteins. Taking advantage of the recent burst of sequence data from a wide variety of eukaryote taxa, we show here that HD-containing transcription factors already existed and had diversified (into at least two main classes) in the last common eukaryote ancestor. Although the family was retained and independently expanded in the multicellular taxa, it was lost in several lineages of unicellular parasites or intracellular symbionts. Our findings are consistent with the idea that the common ancestor of eukaryotes was complex in molecular terms and already possessed many of the regulatory molecules that later favored the multiple convergent acquisition of multicellularity.
Affiliation(s)
- Romain Derelle
- Université Pierre et Marie Curie-Paris 6, UMR 7138 CNRS UPMC MNHN IRD, Case 05, 7 quai St Bernard, 75005 Paris, France
|
108
|
Ekman D, Björklund AK, Elofsson A. Quantification of the elevated rate of domain rearrangements in metazoa. J Mol Biol 2007; 372:1337-48. [PMID: 17689563 DOI: 10.1016/j.jmb.2007.06.022] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Received: 03/21/2007] [Revised: 06/07/2007] [Accepted: 06/08/2007] [Indexed: 11/24/2022]
Abstract
Most eukaryotic proteins consist of multiple domains created through gene fusions or internal duplications. The most frequent change of a domain architecture (DA) is insertion or deletion of a domain at the N or C terminus. Still, the mechanisms underlying the evolution of multidomain proteins are not very well studied. Here, we have studied the evolution of multidomain architectures (MDA), guided by evolutionary information in the form of a phylogenetic tree. Our results show that Pfam domain families and MDAs have been created with comparable rates (0.1-1 per million years (My)). The major changes in DA evolution have occurred in the process of multicellularization and within the metazoan lineage. In contrast, creation of domains seems to have been frequent already in the early evolution. Furthermore, most of the architectures have been created from older domains or architectures, whereas novel domains are mainly found in single-domain proteins. However, a particular group of exon-bordering domains may have contributed to the rapid evolution of novel multidomain proteins in metazoan organisms. Finally, MDAs have evolved predominantly through insertions of domains, whereas domain deletions are less common. In conclusion, the rate of creation of multidomain proteins has accelerated in the metazoan lineage, which may partly be explained by the frequent insertion of exon-bordering domains into new architectures. However, our results indicate that other factors have contributed as well.
Affiliation(s)
- Diana Ekman
- Stockholm Bioinformatics Center, SCFAB, Stockholm University, SE-10691 Stockholm, Sweden
|
109
|
Taylor WR. Evolutionary transitions in protein fold space. Curr Opin Struct Biol 2007; 17:354-61. [PMID: 17580115 DOI: 10.1016/j.sbi.2007.06.002] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 02/09/2007] [Revised: 04/11/2007] [Accepted: 06/06/2007] [Indexed: 10/23/2022]
Abstract
With the number of known protein folds potentially approaching completion, the problems associated with their systematic classification are evaluated. It is argued that it will be difficult, if not impossible, to find a general metric based on pairwise comparison that will provide a satisfactory classification. It is suggested that some progress may be made through comparison against a library of idealised 'template' folds, but a proper solution can only be attained if this includes a model of the underlying evolutionary processes. These processes are considered with examples of some unexpected relationships among folds, including circular permutations. The problem is finally set in the wider context of the genetic environment, introducing complications relating to introns, gene fixation and population size.
Affiliation(s)
- William R Taylor
- Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK.
|
110
|
Caetano-Anollés G, Kim HS, Mittenthal JE. The origin of modern metabolic networks inferred from phylogenomic analysis of protein architecture. Proc Natl Acad Sci U S A 2007; 104:9358-63. [PMID: 17517598 PMCID: PMC1890499 DOI: 10.1073/pnas.0701214104] [Citation(s) in RCA: 118] [Impact Index Per Article: 6.6] [Indexed: 11/18/2022] Open
Abstract
Metabolism represents a complex collection of enzymatic reactions and transport processes that convert metabolites into molecules capable of supporting cellular life. Here we explore the origins and evolution of modern metabolism. Using phylogenomic information linked to the structure of metabolic enzymes, we sort out recruitment processes and discover that most enzymatic activities were associated with the nine most ancient and widely distributed protein fold architectures. An analysis of newly discovered functions showed enzymatic diversification occurred early, during the onset of the modern protein world. Most importantly, phylogenetic reconstruction exercises and other evidence suggest strongly that metabolism originated in enzymes with the P-loop hydrolase fold in nucleotide metabolism, probably in pathways linked to the purine metabolic subnetwork. Consequently, the first enzymatic takeover of an ancient biochemistry or prebiotic chemistry was related to the synthesis of nucleotides for the RNA world.
Affiliation(s)
- Gustavo Caetano-Anollés
- Departments of Crop Sciences and Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
|
111
|
te Velthuis AJ, Isogai T, Gerrits L, Bagowski CP. Insights into the molecular evolution of the PDZ/LIM family and identification of a novel conserved protein motif. PLoS One 2007; 2:e189. [PMID: 17285143 PMCID: PMC1781342 DOI: 10.1371/journal.pone.0000189] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Received: 11/10/2006] [Accepted: 01/11/2007] [Indexed: 01/01/2023] Open
Abstract
The PDZ and LIM domain-containing protein family is encoded by a diverse group of genes whose phylogeny has not yet been analyzed. In mammals, ten genes are found that encode both a PDZ domain and one or several LIM domains. These genes are: ALP, RIL, Elfin (CLP36), Mystique, Enigma (LMP-1), Enigma homologue (ENH), ZASP (Cypher, Oracle), LMO7 and the two LIM domain kinases (LIMK1 and LIMK2). As conventional alignment and phylogenetic procedures of full-length sequences fell short of elucidating the evolutionary history of these genes, we started to analyze the PDZ and LIM domain sequences themselves. Using information from most sequenced eukaryotic lineages, our phylogenetic analysis is based on full-length cDNA-derived, EST-derived and genomic PDZ and LIM domain sequences of over 25 species, ranging from yeast to humans. Plant and protozoan homologs were not found. Our phylogenetic analysis identifies a number of domain duplication and rearrangement events, and shows a single convergent event during evolution of the PDZ/LIM family. Further, we describe the separation of the ALP and Enigma subfamilies in lower vertebrates and identify a novel consensus motif, which we call ‘ALP-like motif’ (AM). This motif is highly conserved between ALP subfamily proteins of diverse organisms. Here we used a combinatorial approach to define the relation of the PDZ and LIM domain encoding genes and to reconstruct their phylogeny. This analysis allowed us to classify the PDZ/LIM family and to suggest a meaningful model for the molecular evolution of the diverse gene architectures found in this multi-domain family.
Affiliation(s)
- Aartjan J.W. te Velthuis
- Department of Molecular and Cellular Biology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Tadamoto Isogai
- Department of Molecular and Cellular Biology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Lieke Gerrits
- Department of Molecular and Cellular Biology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Christoph P. Bagowski
- Department of Integrative Zoology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Department of Molecular and Cellular Biology, Institute of Biology, Leiden University, Leiden, The Netherlands
|
112
|
Abstract
Phylogenetic analysis has changed greatly in the last decade, and the most important themes in that change are reviewed here. Sequence data have become the most common source of phylogenetic information. This means that explicit models for evolutionary processes have been developed in a likelihood context, which allow more realistic data analyses. These models are becoming increasingly complex, both for nucleotides and for amino acid sequences, and so all such models need to be quantitatively assessed for each data set, to find the most appropriate one for use in any particular tree-building analysis. Bayesian analysis has been developed for tree-building and is greatly increasing in popularity. This is because a good heuristic strategy exists, which allows large data sets to be analyzed with complex evolutionary models in a practical time. Perhaps the most disappointing aspect of tree interpretation is the ongoing confusion between rooted and unrooted trees, while the effect of taxon and character sampling is often overlooked when constructing a phylogeny (especially in parasitology). The review finishes with a detailed consideration of the analysis of a multi-gene data set for several dozen taxa of Cryptosporidium (Apicomplexa), illustrating many of the theoretical and practical points highlighted in the review.
Affiliation(s)
- David A Morrison
- Department of Parasitology (SWEPAR), National Veterinary Institute and Swedish University of Agricultural Sciences, 751 89 Uppsala, Sweden
|
113
|
Wang M, Caetano-Anollés G. Global phylogeny determined by the combination of protein domains in proteomes. Mol Biol Evol 2006; 23:2444-54. [PMID: 16971695 DOI: 10.1093/molbev/msl117] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022] Open
Abstract
The majority of proteins consist of multiple domains that are either repeated or combined in defined order. In this study, we survey the combination of protein domains defined at fold and fold superfamily levels in 185 genomes belonging to organisms that have been fully sequenced and introduce a method that reconstructs rooted phylogenomic trees from the content and arrangement of domains in proteins at a genomic level. We find that the majority of domain combinations were unique to Archaea, Bacteria, or Eukarya, suggesting most combinations originated after life had diversified. Domain repeat and domain repeat within multidomain proteins increased notably in eukaryotes, mainly at the expense of single-domain and domain-pair proteins. This increase was mostly confined to Metazoa. We also find an unbalanced sharing of domain combinations which suggests that Eukarya is more closely related to Bacteria than to Archaea, an observation that challenges the widely assumed eukaryote-archaebacterial sisterhood relationship. The occurrence and abundance of the molecular repertoire (interactome) of domain combinations was used to generate phylogenomic trees. These global interactome-based phylogenies described organismal histories satisfactorily, revealing the tripartite nature of life, and supporting controversial evolutionary patterns, such as the Coelomata hypothesis, the grouping of plants and animals, and the Gram-positive origin of bacteria. Results suggest strongly that the process of domain combination is not random but curbed by evolution, rejecting the null hypothesis of domain modules combining in the absence of natural selection or an optimality criterion.
Affiliation(s)
- Minglei Wang
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
114
|
Abstract
Many classification schemes for proteins and domains are either hierarchical or semi-hierarchical yet most databases, especially those offering genome-wide analysis, only provide assignments to sequences at one level of their hierarchy. Given an established hierarchy, the problem of assigning new sequences to lower levels of that existing hierarchy is less hard (but no less important) than the initial top level assignment which requires the detection of the most distant relationships. A solution to this problem is described here in the form of a new procedure which can be thought of as a hybrid between pairwise and profile methods. The hybrid method is a general procedure that can be applied to any pre-defined hierarchy, at any level, including in principle multiple sub-levels. It has been tested on the SCOP classification via the SUPERFAMILY database and performs significantly better than either pairwise or profile methods alone. Perhaps the greatest advantage of the hybrid method over other possible approaches to the problem is that within the framework of an existing profile library, the assignments are fully automatic and come at almost no additional computational cost. Hence it has already been applied at the SCOP family level to all genomes in the SUPERFAMILY database, providing a wealth of new data to the biological and bioinformatics communities.
Affiliation(s)
- Julian Gough
- Unite de Bioinformatique Structurale, Institut Pasteur, 25-28 Rue du Docteur Roux, 75724 Paris Cedex 15, Paris, France.
|
115
|
Weiner J, Beaussart F, Bornberg-Bauer E. Domain deletions and substitutions in the modular protein evolution. FEBS J 2006; 273:2037-47. [PMID: 16640566 DOI: 10.1111/j.1742-4658.2006.05220.x] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.1] [Indexed: 12/23/2022]
Abstract
The main mechanisms shaping the modular evolution of proteins are gene duplication, fusion and fission, recombination and loss of fragments. While a large body of research has focused on duplications and fusions, we concentrated, in this study, on how domains are lost. We investigated motif databases and introduced a measure of protein similarity that is based on domain arrangements. Proteins are represented as strings of domains and comparison was based on the classic dynamic alignment scheme. We found that domain losses and duplications were more frequent at the ends of proteins. We showed that losses can be explained by the introduction of start and stop codons which render the terminal domains nonfunctional, such that further shortening, until the whole domain is lost, is not evolutionarily selected against. We demonstrated that domains which also occur as single-domain proteins are less likely to be lost at the N terminus and in the middle, than at the C terminus. We conclude that fission/fusion events with single-domain proteins occur mostly at the C terminus. We found that domain substitutions are rare, in particular in the middle of proteins. We also showed that many cases of substitutions or losses result from erroneous annotations, but we were also able to find courses of evolutionary events where domains vanish over time. This is explained by a case study on the bacterial formate dehydrogenases.
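The idea of representing proteins as strings of domains and comparing them with a dynamic-programming alignment can be sketched as follows. This is only an illustration, assuming a Needleman-Wunsch-style global alignment with unit match, mismatch and gap scores; the abstract does not state the actual scoring scheme, and the function name `align_domains` and the domain labels in the usage example are hypothetical.

```python
from typing import List, Optional, Tuple

# One aligned column: a domain name from each protein, or None for a gap.
Column = Tuple[Optional[str], Optional[str]]


def align_domains(
    a: List[str],
    b: List[str],
    match: int = 1,      # assumed score for identical domains
    mismatch: int = -1,  # assumed score for a domain substitution
    gap: int = -1,       # assumed score for a domain loss/gain
) -> Tuple[int, List[Column]]:
    """Globally align two domain arrangements (lists of domain names)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two domains
                score[i - 1][j] + gap,            # domain only in a
                score[i][j - 1] + gap,            # domain only in b
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    columns: List[Column] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            columns.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            columns.append((a[i - 1], None))
            i -= 1
        else:
            columns.append((None, b[j - 1]))
            j -= 1
    columns.reverse()
    return score[n][m], columns


if __name__ == "__main__":
    # Hypothetical arrangements: the second protein lacks the N-terminal domain.
    total, cols = align_domains(["SH3", "SH2", "Pkinase"], ["SH2", "Pkinase"])
    print(total)
    for left, right in cols:
        print(left or "-", right or "-")
```

In such an alignment, a column with a gap on one side corresponds to a candidate domain loss or gain, and its position (first, last or internal column) indicates whether the event lies at a terminus or in the middle of the protein.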
Affiliation(s)
- January Weiner
- Division of Bioinformatics, School of Biological Sciences, The Westfalian Wilhelms University of Münster, Germany
|
116
|
Han JH, Kerrison N, Chothia C, Teichmann SA. Divergence of interdomain geometry in two-domain proteins. Structure 2006; 14:935-45. [PMID: 16698554 DOI: 10.1016/j.str.2006.01.016] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Received: 09/16/2005] [Revised: 12/23/2005] [Accepted: 01/18/2006] [Indexed: 10/24/2022]
Abstract
For homologous protein chains composed of two domains, we have determined the extent to which they conserve (1) their interdomain geometry and (2) the molecular structure of the domain interface. This work was carried out on 128 unique two-domain architectures. Of the 128, we find 75 conserve their interdomain geometry and the structure of their domain interface; 5 conserve their interdomain geometry but not the structure of their interface; and 48 have variable geometries and divergent interface structure. We describe how different types of interface changes or the absence of an interface is responsible for these differences in geometry. Variable interdomain geometries can be found in homologous structures with high sequence identities (70%).
Affiliation(s)
- Jung-Hoon Han
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, United Kingdom.
|
117
|
Abstract
BACKGROUND The Engrailed Homology 1 (EH1) motif is a small region, believed to have evolved convergently in homeobox and forkhead containing proteins, that interacts with the Drosophila protein groucho (C. elegans unc-37, Human Transducin-like Enhancers of Split). The small size of the motif makes its reliable identification by computational means difficult. I have systematically searched the predicted proteomes of Drosophila, C. elegans and human for further instances of the motif. RESULTS Using motif identification methods and database searching techniques, I delimit which homeobox and forkhead domain containing proteins also have likely EH1 motifs. I show that despite low database search scores, there is a significant association of the motif with transcription factor function. I further show that likely EH1 motifs are found in combination with T-Box, Zinc Finger and Doublesex domains as well as discussing other plausible candidate associations. I identify strong candidate EH1 motifs in basal metazoan phyla. CONCLUSION Candidate EH1 motifs exist in combination with a variety of transcription factor domains, suggesting that these proteins have repressor functions. The distribution of the EH1 motif is suggestive of convergent evolution, although in many cases, the motif has been conserved throughout bilaterian orthologs. Groucho mediated repression was established prior to the evolution of bilateria.
Affiliation(s)
- Richard R Copley
- Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK.
|
118
|
Light S, Kraulis P, Elofsson A. Preferential attachment in the evolution of metabolic networks. BMC Genomics 2005; 6:159. [PMID: 16281983 PMCID: PMC1316878 DOI: 10.1186/1471-2164-6-159] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.9] [Received: 07/07/2005] [Accepted: 11/10/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many biological networks show some characteristics of scale-free networks. Scale-free networks can evolve through preferential attachment where new nodes are preferentially attached to well connected nodes. In networks which have evolved through preferential attachment older nodes should have a higher average connectivity than younger nodes. Here we have investigated preferential attachment in the context of metabolic networks. RESULTS The connectivities of the enzymes in the metabolic network of Escherichia coli were determined and representatives for these enzymes were located in 11 eukaryotes, 17 archaea and 46 bacteria. E. coli enzymes which have representatives in eukaryotes have a higher average connectivity while enzymes which are represented only in the prokaryotes, and especially the enzymes only present in betagamma-proteobacteria, have lower connectivities than expected by chance. Interestingly, the enzymes which have been proposed as candidates for horizontal gene transfer have a higher average connectivity than the other enzymes. Furthermore, it was found that new edges are added to the highly connected enzymes at a faster rate than to enzymes with low connectivities which is consistent with preferential attachment. CONCLUSION Here, we have found indications of preferential attachment in the metabolic network of E. coli. A possible biological explanation for preferential attachment growth of metabolic networks is that novel enzymes created through gene duplication maintain some of the compounds involved in the original reaction, throughout its future evolution. In addition, we found that enzymes which are candidates for horizontal gene transfer have a higher average connectivity than other enzymes. This indicates that while new enzymes are attached preferentially to highly connected enzymes, these highly connected enzymes have sometimes been introduced into the E. coli genome by horizontal gene transfer. We speculate that E. coli has adjusted its metabolic network to a changing environment by replacing the relatively central enzymes for better adapted orthologs from other prokaryotic species.
Affiliation(s)
- Sara Light
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Albanova University Center, Stockholm University, Stockholm SE-10691, Sweden
- Per Kraulis
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Albanova University Center, Stockholm University, Stockholm SE-10691, Sweden
- Arne Elofsson
- Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Albanova University Center, Stockholm University, Stockholm SE-10691, Sweden
|
119
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2447491 DOI: 10.1002/cfg.425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
|