101
Pang E, Tan T, Lin K. Promiscuous domains: facilitating stability of the yeast protein–protein interaction network. Mol Biosyst 2012; 8:766-71. [DOI: 10.1039/c1mb05364g] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
102
Yu GX. RuleMiner: a knowledge system for supporting high-throughput protein function annotations. J Bioinform Comput Biol 2004; 2:615-37. [PMID: 15617156 DOI: 10.1142/s0219720004000752] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 11/04/2003] [Revised: 03/23/2004] [Accepted: 03/24/2004] [Indexed: 11/18/2022]
Abstract
In this paper, we present RuleMiner, a knowledge system that facilitates seamless integration of multi-sequence analysis tools and defines profile-based rules for supporting high-throughput protein function annotations. This system consists of three essential components: Protein Function Groups (PFGs), PFG profiles and rules. The PFGs, established from an integrated analysis of current knowledge of protein functions from the Swiss-Prot database and protein family-based sequence classifications, cover all possible cellular functions available in the database. The PFG profiles illustrate detailed protein features in the PFGs, such as sequence conservation, the occurrence of sequence-based motifs and domains, and species distributions. The rules, extracted from the PFG profiles, describe the clear relationships between these PFGs and all possible features. As a result, RuleMiner provides an enhanced capability for protein function analysis: for example, results from the integrated sequence analysis tools for given proteins can be analyzed comparatively because of the clear feature-PFG relationships. Much-needed guidance for such analysis is also readily available. If the rules describe one-to-one (unique) relationships between the protein features and the PFGs, these features can be used as unique functional identifiers, and the cellular functions of unknown proteins can be reliably determined. Otherwise, additional information has to be provided.
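A minimal sketch of the one-to-one rule idea, assuming each observation is a (feature, PFG) pair. The function names, data format and toy labels are hypothetical and do not reproduce RuleMiner itself: a feature serves as a unique functional identifier only when every occurrence of it falls in a single PFG; otherwise the protein needs additional evidence.

```python
from collections import defaultdict


def unique_identifiers(observations):
    """Map each feature seen in exactly one PFG to that PFG.

    observations: iterable of (feature, pfg) pairs (hypothetical format).
    """
    feature_to_pfgs = defaultdict(set)
    for feature, pfg in observations:
        feature_to_pfgs[feature].add(pfg)
    # Keep only one-to-one (unique) feature -> PFG relationships.
    return {f: next(iter(p)) for f, p in feature_to_pfgs.items() if len(p) == 1}


def annotate(features, identifiers):
    """PFGs implied by a protein's unique features; an empty set means more evidence is needed."""
    return {identifiers[f] for f in features if f in identifiers}


# Toy data: "domB" occurs in two PFGs, so it cannot act as a unique identifier.
obs = [("motifA", "PFG1"), ("motifA", "PFG1"), ("domB", "PFG1"),
       ("domB", "PFG2"), ("domC", "PFG3")]
ids = unique_identifiers(obs)
print(ids)                               # {'motifA': 'PFG1', 'domC': 'PFG3'}
print(annotate(["domB", "domC"], ids))   # {'PFG3'}
print(annotate(["domB"], ids))           # set()
```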
Affiliation(s)
- Gong-Xin Yu
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA.
103
Parikesit AA, Stadler PF, Prohaska SJ. Evolution and quantitative comparison of genome-wide protein domain distributions. Genes (Basel) 2011; 2:912-24. [PMID: 24710298 PMCID: PMC3927604 DOI: 10.3390/genes2040912] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/29/2011] [Revised: 10/07/2011] [Accepted: 10/25/2011] [Indexed: 02/01/2023]
Abstract
The metabolic and regulatory capabilities of an organism are implicit in its protein content. This is often hard to estimate, however, due to ascertainment biases inherent in the available genome annotations. Its complement of recognizable functional protein domains and their combinations convey essentially the same information and at the same time are much more readily accessible, although protein domain models trained for one phylogenetic group frequently fail on distantly related sequences. Pooling related domain models based on their GO-annotation in combination with de novo gene prediction methods provides estimates that seem to be less affected by phylogenetic biases. We show here for 18 diverse representatives from all eukaryotic kingdoms that a pooled analysis of the tendencies for co-occurrence or avoidance of protein domains is indeed feasible. This type of analysis can reveal general large-scale patterns in the domain co-occurrence and helps to identify lineage-specific variations in the evolution of protein domains. Somewhat surprisingly, we do not find strong ubiquitous patterns governing the evolutionary behavior of specific functional classes. Instead, there are strong variations between the major groups of Eukaryotes, pointing at systematic differences in their evolutionary constraints.
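The co-occurrence/avoidance tendency can be made concrete with a simple observed-versus-expected ratio. This is a sketch assuming each protein is represented as a set of domain names and that independent placement of domains is the null model; it is not the pooled, GO-based analysis used in the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_ratios(proteins):
    """Observed/expected co-occurrence for every pair of domains.

    proteins: list of sets of domain names, one set per protein.
    A ratio > 1 suggests co-occurrence, < 1 suggests avoidance.
    """
    n = len(proteins)
    single = Counter(d for doms in proteins for d in doms)
    pair = Counter(p for doms in proteins for p in combinations(sorted(doms), 2))
    ratios = {}
    for a, b in combinations(sorted(single), 2):
        # Expected number of proteins with both domains if they were independent.
        expected = single[a] * single[b] / n
        ratios[(a, b)] = pair[(a, b)] / expected
    return ratios


proteins = [{"A", "B"}, {"A", "B"}, {"A"}, {"C"}]  # hypothetical toy proteome
for pair_, ratio in cooccurrence_ratios(proteins).items():
    print(pair_, round(ratio, 3))
# ('A', 'B') 1.333
# ('A', 'C') 0.0
# ('B', 'C') 0.0
```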
Affiliation(s)
- Arli A Parikesit
- Computational EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.
- Peter F Stadler
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.
- Sonja J Prohaska
- Computational EvoDevo Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.
104
Nasir A, Naeem A, Khan MJ, Nicora HDL, Caetano-Anollés G. Annotation of Protein Domains Reveals Remarkable Conservation in the Functional Make up of Proteomes Across Superkingdoms. Genes (Basel) 2011; 2:869-911. [PMID: 24710297 PMCID: PMC3927607 DOI: 10.3390/genes2040869] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 09/16/2011] [Revised: 10/28/2011] [Accepted: 10/28/2011] [Indexed: 12/28/2022]
Abstract
The functional repertoire of a cell is largely embodied in its proteome, the collection of proteins encoded in the genome of an organism. The molecular functions of proteins are the direct consequence of their structure, and structure can be inferred from sequence using hidden Markov models of structural recognition. Here we analyze the functional annotation of protein domain structures in almost a thousand sequenced genomes, exploring the functional and structural diversity of proteomes. We find there is a remarkable conservation in the distribution of domains with respect to the molecular functions they perform in the three superkingdoms of life. In general, most of the protein repertoire is spent on functions related to metabolic processes, but there are significant differences in the usage of domains for regulatory and extra-cellular processes both within and between superkingdoms. Our results support the hypotheses that the proteomes of superkingdom Eukarya evolved via genome expansion mechanisms that were directed towards innovating new domain architectures for regulatory and extra/intracellular process functions needed, for example, to maintain the integrity of multicellular structure or to interact with environmental biotic and abiotic factors (e.g., cell signaling and adhesion, immune responses, and toxin production). Proteomes of microbial superkingdoms Archaea and Bacteria retained fewer domains and maintained simple and smaller protein repertoires. Viruses appear to play an important role in the evolution of superkingdoms. Finally, we identify a few genomic outliers that deviate significantly from the conserved functional design. These include Nanoarchaeum equitans, proteobacterial symbionts of insects with extremely reduced genomes, Tenericutes and Guillardia theta.
These organisms spend most of their domains on information functions, including translation and transcription, rather than on metabolism and harbor a domain repertoire characteristic of parasitic organisms. In contrast, the functional repertoire of the proteomes of the Planctomycetes-Verrucomicrobia-Chlamydiae superphylum was no different than the rest of bacteria, failing to support claims of them representing a separate superkingdom. In turn, Protista and Bacteria shared similar functional distribution patterns suggesting an ancestral evolutionary link between these groups.
Affiliation(s)
- Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA.
- Aisha Naeem
- Mammalian NutriPhysioGenomics Laboratory, Department of Animal Sciences, University of Illinois, Urbana, IL 61801, USA.
- Muhammad Jawad Khan
- Mammalian NutriPhysioGenomics Laboratory, Department of Animal Sciences, University of Illinois, Urbana, IL 61801, USA.
- Horacio D Lopez Nicora
- Plant Pathology Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA.
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA.
105
Wu YC, Rasmussen MD, Kellis M. Evolution at the subgene level: domain rearrangements in the Drosophila phylogeny. Mol Biol Evol 2011; 29:689-705. [PMID: 21900599 PMCID: PMC3258039 DOI: 10.1093/molbev/msr222] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 01/24/2023]
Abstract
Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.
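Maximum parsimony reconstruction can be illustrated with Fitch's small-parsimony algorithm for a single presence/absence character (for example, whether a species carries a given domain architecture). This sketch assumes a rooted binary species tree and hypothetical leaf states; the authors' model also handles domain generation, duplication, merge and split events, which this does not.

```python
def fitch(tree, states):
    """Fitch small parsimony for one binary character.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: leaf name -> 0/1 (e.g. absence/presence of a domain architecture).
    Returns (candidate root states, minimum number of state changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child state sets: one change is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


tree = (("dmel", "dsim"), ("dpse", "dvir"))
states = {"dmel": 1, "dsim": 1, "dpse": 0, "dvir": 0}  # hypothetical presence pattern
print(fitch(tree, states))  # ({0, 1}, 1)
```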
Affiliation(s)
- Yi-Chieh Wu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Massachusetts, USA.
106
Chen FC, Pan CL, Lin HY. Independent effects of alternative splicing and structural constraint on the evolution of mammalian coding exons. Mol Biol Evol 2011; 29:187-93. [PMID: 21795252 DOI: 10.1093/molbev/msr182] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
Alternative splicing (AS) is known to significantly affect exon-level protein evolutionary rates in mammals. In particular, alternatively spliced exons (ASEs) have a higher nonsynonymous-to-synonymous substitution rate (dN/dS) ratio than constitutively spliced exons (CSEs), possibly because the former are required only occasionally for normal biological functions. Meanwhile, intrinsically disordered regions (IDRs), the protein regions lacking fixed 3D structures, are also reported to have an increased evolutionary rate due to lack of structural constraint. Interestingly, IDRs tend to be located in alternative protein regions. Yet which of these two factors is the major determinant of the increased dN/dS in mammalian ASEs remains unclear. By comparing human-macaque and human-mouse one-to-one orthologous genes, we demonstrate that AS and protein structural disorder have independent effects on mammalian exon evolution. We performed analyses of covariance to demonstrate that the slopes of the (dN/dS-percentage of IDR) regression lines do not differ significantly between CSEs and ASEs. In other words, the dN/dS ratios of both ASEs and CSEs increase with the proportion of IDR (PIDR), whereas ASEs have higher dN/dS ratios than CSEs when they have similar PIDRs. Since ASEs and IDRs may less frequently overlap with protein domains (which also affect dN/dS), we also examined the correlations between dN/dS ratio and exon type/PIDR by controlling for the density of protein domains. We found that the effects of exon type and PIDR on dN/dS are both independent of domain density. Our results imply that nature can select for different biological features with regard to ASEs and IDRs, even though the two biological features tend to be localized in the same protein regions.
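The regression comparison can be sketched by fitting separate least-squares lines of dN/dS against PIDR for each exon class and comparing their slopes and intercepts. The data below are hypothetical toy values chosen to give parallel lines (shared slope, higher intercept for ASEs); a real analysis of covariance would also test whether the slope and intercept differences are significant, which this sketch omits.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical toy values, not data from the paper.
pidr = [0.0, 0.2, 0.4, 0.6]               # proportion of IDR per exon
cse_dnds = [0.1 + 0.2 * x for x in pidr]  # constitutively spliced exons
ase_dnds = [0.3 + 0.2 * x for x in pidr]  # alternatively spliced exons

cse_intercept, cse_slope = fit_line(pidr, cse_dnds)
ase_intercept, ase_slope = fit_line(pidr, ase_dnds)
print(f"CSE: intercept={cse_intercept:.3f}, slope={cse_slope:.3f}")  # CSE: intercept=0.100, slope=0.200
print(f"ASE: intercept={ase_intercept:.3f}, slope={ase_slope:.3f}")  # ASE: intercept=0.300, slope=0.200
```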
Affiliation(s)
- Feng-Chi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan, Republic of China.
107
Kersse K, Verspurten J, Vanden Berghe T, Vandenabeele P. The death-fold superfamily of homotypic interaction motifs. Trends Biochem Sci 2011; 36:541-52. [PMID: 21798745 DOI: 10.1016/j.tibs.2011.06.006] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Received: 04/12/2011] [Revised: 06/19/2011] [Accepted: 06/22/2011] [Indexed: 11/16/2022]
Abstract
The death-fold superfamily encompasses four structurally homologous subfamilies that engage in homotypic, subfamily-restricted interactions. The Death Domains (DDs), the Death Effector Domains (DEDs), the CAspase Recruitment Domains (CARDs) and the PYrin Domains (PYDs) constitute key building blocks involved in the assembly of multimeric complexes implicated in signaling cascades leading to inflammation and cell death. We review the molecular basis of these homotypic domain-domain interactions in light of their structure, function and evolution. In addition, we elaborate on three distinct types of asymmetric interactions that were recently identified from the crystal structures of three multimeric, death-fold complexes: the MyDDosome, the PIDDosome and the Fas/FADD-DISC. Insights into the mechanisms of interaction of death-fold domains will be useful to design strategies for specific modulation of complex formation and might lead to novel therapeutic applications.
Affiliation(s)
- Kristof Kersse
- Department for Molecular Biomedical Research, VIB, B-9052 Ghent (Zwijnaarde), Belgium
108
Lisacek F, Chichester C, Gonnet P, Jaillet O, Kappus S, Nikitin F, Roland P, Rossier G, Truong L, Appel R. Shaping biological knowledge: applications in proteomics. Comp Funct Genomics 2004; 5:190-5. [PMID: 18629073 PMCID: PMC2447358 DOI: 10.1002/cfg.379] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/14/2003] [Revised: 12/12/2003] [Accepted: 12/18/2003] [Indexed: 12/01/2022]
Abstract
The central dogma of molecular biology has provided a meaningful principle for data integration in the field of genomics. In this context, integration reflects the known transitions from a chromosome to a protein sequence: transcription, intron splicing, exon assembly and translation. There is no such clear principle for integrating proteomics data, since the laws governing protein folding and interactivity are not fully understood. In our effort to bring together independent pieces of information relating to proteins in a biologically meaningful way, we assess the bias of bioinformatics resources and the resulting approximations in the framework of small-scale studies. We analyse proteomics data following both a data-driven approach (focus on proteins smaller than 10 kDa) and a hypothesis-driven approach (focus on whole bacterial proteomes). These applications are potential sources of specialized complements to classical biological ontologies.
Affiliation(s)
- F Lisacek
- R&D GeneBio, 25 Avenue de Champel, Geneva 1206, Switzerland.
109
Reassessing domain architecture evolution of metazoan proteins: major impact of gene prediction errors. Genes (Basel) 2011; 2:449-501. [PMID: 24710207 PMCID: PMC3927609 DOI: 10.3390/genes2030449] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 05/24/2011] [Revised: 06/14/2011] [Accepted: 06/20/2011] [Indexed: 11/17/2022]
Abstract
Because the appearance of novel protein domain architectures (DA) is closely associated with biological innovations, there is growing interest in the genome-scale reconstruction of the evolutionary history of the domain architectures of multidomain proteins. In such analyses, however, it is usually ignored that a significant proportion of the Metazoan sequences analyzed is mispredicted and that this may seriously affect the validity of the conclusions. To estimate the contribution of errors in gene prediction to differences in DA of predicted proteins, we have used the high-quality, manually curated UniProtKB/Swiss-Prot database as a reference. For genome-scale analysis of domain architectures of predicted proteins we focused on RefSeq, EnsEMBL and NCBI's GNOMON predicted sequences of Metazoan species with completely sequenced genomes. Comparison of the DA of UniProtKB/Swiss-Prot sequences of worm, fly, zebrafish, frog, chick, mouse, rat and orangutan with those of human Swiss-Prot entries identified relatively few cases where orthologs had different DA, although the percentage with different DA increased with evolutionary distance. In contrast, comparison of the DA of human, orangutan, rat, mouse, chicken, frog, zebrafish, worm and fly RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with those of the corresponding/orthologous human Swiss-Prot entries identified a significantly higher proportion of domain architecture differences than the comparison of Swiss-Prot entries did. Analysis of RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences with DAs different from those of their Swiss-Prot orthologs confirmed that the higher rate of domain architecture differences is due to errors in gene prediction, the majority of which could be corrected with our FixPred protocol.
We have also demonstrated that contamination of databases with incomplete, abnormal or mispredicted sequences introduces a bias in DA differences inasmuch as it increases the proportion of terminal over internal DA differences. Here we have shown that in the case of RefSeq, EnsEMBL and NCBI's GNOMON predicted protein sequences of Metazoan species, the contribution of gene prediction errors to domain architecture differences of orthologs is comparable to or greater than that of true gene rearrangements. We have also demonstrated that domain architecture comparison may serve as a useful tool for the quality control of gene predictions and may thus guide the correction of sequence errors. Our findings caution that earlier genome-scale studies based on comparison of predicted (frequently mispredicted) protein sequences may have led to some erroneous conclusions about the evolution of novel domain architectures of multidomain proteins. A reassessment of the DA evolution of orthologous and paralogous proteins is presented in an accompanying paper [1].
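One way to separate terminal from internal DA differences is sketched below under an assumed, simplified definition: a difference is terminal when the shorter architecture is a contiguous run of the longer one (extra domains only at the N- and/or C-terminus), and internal otherwise. The function name and this rule are illustrative assumptions, not the FixPred protocol or the paper's exact criterion.

```python
def classify_da_difference(da1, da2):
    """Classify the difference between two domain architectures.

    Architectures are lists of domain names ordered from N- to C-terminus.
    Returns 'identical', 'terminal' or 'internal' (simplified, assumed rule).
    """
    if da1 == da2:
        return "identical"
    shorter, longer = sorted((da1, da2), key=len)
    k = len(shorter)
    # Terminal: the shorter architecture appears as a contiguous run of the
    # longer one, so the extra domains lie only at the ends.
    for start in range(len(longer) - k + 1):
        if longer[start:start + k] == shorter:
            return "terminal"
    return "internal"


print(classify_da_difference(["A", "B", "C", "D"], ["B", "C"]))       # terminal
print(classify_da_difference(["A", "B", "C", "D"], ["A", "C", "D"]))  # internal
```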
110
Cohen-Gihon I, Sharan R, Nussinov R. Processes of fungal proteome evolution and gain of function: gene duplication and domain rearrangement. Phys Biol 2011; 8:035009. [PMID: 21572172 PMCID: PMC3140765 DOI: 10.1088/1478-3975/8/3/035009] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/16/2023]
Abstract
During evolution, organisms have gained functional complexity mainly by modifying and improving existing functioning systems rather than creating new ones ab initio. Here we explore the interplay between two processes which during evolution have had major roles in the acquisition of new functions: gene duplication and protein domain rearrangements. We consider four possible evolutionary scenarios: gene families that have undergone none of these event types; only gene duplication; only domain rearrangement; or both events. We characterize each of the four evolutionary scenarios by functional attributes. Our analysis of ten fungal genomes indicates that, at least for the fungal clade, species appear to significantly gain complexity by gene duplication accompanied by the expansion of existing domain architectures via rearrangements. We show that paralogs gaining new domain architectures via duplication tend to adopt new functions more often than paralogs that preserve their domain architectures. We conclude that evolution of protein families through gene duplication and domain rearrangement is correlated with their functional properties. We suggest that in general, new functions are acquired via the integration of gene duplication and domain rearrangements rather than each process acting independently.
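The four evolutionary scenarios amount to a two-by-two classification. In this hypothetical sketch, a family counts as duplicated when some genome carries more than one copy, and as rearranged when more than one distinct domain architecture is observed; the paper's actual criteria may differ.

```python
def classify_family(max_copies_per_genome, architectures):
    """Assign a gene family to one of the four evolutionary scenarios.

    max_copies_per_genome: largest number of family members found in any one genome
        (assumption: > 1 means the family underwent gene duplication).
    architectures: domain architectures (tuples of domain names) seen in the family
        (assumption: more than one distinct architecture means domain rearrangement).
    """
    duplicated = max_copies_per_genome > 1
    rearranged = len(set(architectures)) > 1
    scenarios = {
        (False, False): "neither event",
        (True, False): "gene duplication only",
        (False, True): "domain rearrangement only",
        (True, True): "both events",
    }
    return scenarios[(duplicated, rearranged)]


print(classify_family(1, [("D1",), ("D1",)]))       # neither event
print(classify_family(3, [("D1",), ("D1", "D2")]))  # both events
```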
Affiliation(s)
- Inbar Cohen-Gihon
- Sackler Institute of Molecular Medicine, Department of Human Genetics, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Ruth Nussinov
- Sackler Institute of Molecular Medicine, Department of Human Genetics, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Center for Cancer Research Nanobiology Program, SAIC-Frederick, Inc., NCI-Frederick, Frederick, Maryland 21702
111
Lucchetti-Miganeh C, Goudenège D, Thybert D, Salbert G, Barloy-Hubler F. SORGOdb: Superoxide Reductase Gene Ontology curated DataBase. BMC Microbiol 2011; 11:105. [PMID: 21575179 PMCID: PMC3116461 DOI: 10.1186/1471-2180-11-105] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/15/2010] [Accepted: 05/16/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Superoxide reductases (SOR) catalyse the reduction of superoxide anions to hydrogen peroxide and are involved in the oxidative stress defences of anaerobic and facultative anaerobic organisms. Genes encoding SOR were discovered recently and suffer from annotation problems. These genes, named sor, are short and the transfer of annotations from previously characterized neelaredoxin, desulfoferrodoxin, superoxide reductase and rubredoxin oxidase has been heterogeneous. Consequently, many sor remain anonymous or mis-annotated. DESCRIPTION SORGOdb is an exhaustive database of SOR that proposes a new classification based on domain architecture. SORGOdb supplies a simple user-friendly web-based database for retrieving and exploring relevant information about the proposed SOR families. The database can be queried using an organism name, a locus tag or phylogenetic criteria, and also offers sequence similarity searches using BlastP. Genes encoding SOR have been re-annotated in all available genome sequences (prokaryotic and eukaryotic (complete and in draft) genomes, updated in May 2010). CONCLUSIONS SORGOdb contains 325 non-redundant and curated SOR, from 274 organisms. It proposes a new classification of SOR into seven different classes and allows biologists to explore and analyze sor in order to establish correlations between the class of SOR and organism phenotypes. SORGOdb is freely available at http://sorgo.genouest.org/index.php.
Affiliation(s)
- Céline Lucchetti-Miganeh
- CNRS UMR 6026, ICM, Equipe Sp@rte, Université de Rennes 1, Campus de Beaulieu, 35042 Rennes, France.
112
Eickholt J, Deng X, Cheng J. DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning. BMC Bioinformatics 2011; 12:43. [PMID: 21284866 PMCID: PMC3036623 DOI: 10.1186/1471-2105-12-43] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 10/15/2010] [Accepted: 02/01/2011] [Indexed: 11/17/2022]
Abstract
Background Accurate identification of protein domain boundaries is useful for protein structure determination and prediction. However, predicting protein domain boundaries from a sequence is still very challenging and largely unsolved. Results We developed a new method to integrate the classification power of machine learning with evolutionary signals embedded in protein families in order to improve protein domain boundary prediction. The method first extracts putative domain boundary signals from a multiple sequence alignment between a query sequence and its homologs. The putative sites are then classified and scored by support vector machines in conjunction with input features such as sequence profiles, secondary structures, solvent accessibilities around the sites and their positions. The method was evaluated on a domain benchmark by 10-fold cross-validation and 60% of true domain boundaries can be recalled at a precision of 60%. The trade-off between the precision and recall can be adjusted according to specific needs by using different decision thresholds on the domain boundary scores assigned by the support vector machines. Conclusions The good prediction accuracy and the flexibility of selecting domain boundary sites at different precision and recall values make our method a useful tool for protein structure determination and modelling. The method is available at http://sysbio.rnet.missouri.edu/dobo/.
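The precision/recall trade-off controlled by the decision threshold can be sketched as follows. The scores and labels are hypothetical stand-ins for SVM outputs on candidate boundary sites, and recall here is computed over the labelled candidates only, not over all true boundaries in a benchmark.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when sites scoring >= threshold are called boundaries."""
    called = [s >= threshold for s in scores]
    tp = sum(c and t for c, t in zip(called, labels))
    fp = sum(c and not t for c, t in zip(called, labels))
    fn = sum(t and not c for c, t in zip(called, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]           # hypothetical SVM scores
labels = [True, True, False, True, False, False]  # hypothetical true boundaries
for threshold in (0.75, 0.35):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.75: precision=1.00, recall=0.67
# threshold=0.35: precision=0.75, recall=1.00
```

Lowering the threshold calls more sites, which raises recall at the cost of precision.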
Affiliation(s)
- Jesse Eickholt
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
113
Strong functional patterns in the evolution of eukaryotic genomes revealed by the reconstruction of ancestral protein domain repertoires. Genome Biol 2011; 12:R4. [PMID: 21241503 PMCID: PMC3091302 DOI: 10.1186/gb-2011-12-1-r4] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Received: 11/23/2010] [Revised: 12/23/2010] [Accepted: 01/17/2011] [Indexed: 12/11/2022]
Abstract
Background Genome size and complexity, as measured by the number of genes or protein domains, are remarkably similar in most extant eukaryotes and generally exhibit no correlation with their morphological complexity. Underlying trends in the evolution of the functional content and capabilities of different eukaryotic genomes might be hidden by simultaneous gains and losses of genes. Results We reconstructed the domain repertoires of putative ancestral species at major divergence points, including the last eukaryotic common ancestor (LECA). We show that, surprisingly, during eukaryotic evolution domain losses in general outnumber domain gains. Only at the base of the animal and the vertebrate sub-trees do domain gains outnumber domain losses. The observed gain/loss balance has a distinct functional bias, most strikingly seen during animal evolution, where most of the gains represent domains involved in regulation and most of the losses represent domains with metabolic functions. This trend is so consistent that clustering of genomes according to their functional profiles results in an organization similar to the tree of life. Furthermore, our results indicate that metabolic functions lost during animal evolution are likely being replaced by the metabolic capabilities of symbiotic organisms such as gut microbes. Conclusions While protein domain gains and losses are common throughout eukaryote evolution, losses often outweigh gains and lead to significant differences in functional profiles. Results presented here provide additional arguments for a complex last eukaryotic common ancestor, but also show a general trend of losses in metabolic capabilities and gains in regulatory complexity during the rise of animals.
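Once ancestral repertoires are available, counting gains and losses along each branch reduces to set differences between parent and child. The tree, node names and domain labels below are hypothetical, and the ancestral reconstruction step itself is assumed to have been done already.

```python
def gains_and_losses(tree, repertoires):
    """Count domain gains and losses on every parent -> child branch.

    tree: dict mapping a parent node to its child nodes.
    repertoires: dict mapping every node (ancestral or extant) to a set of domains.
    Returns {(parent, child): (gains, losses)}.
    """
    counts = {}
    for parent, children in tree.items():
        for child in children:
            before, after = repertoires[parent], repertoires[child]
            counts[(parent, child)] = (len(after - before), len(before - after))
    return counts


tree = {"LECA": ["animal_anc", "plant_anc"]}  # hypothetical two-branch tree
repertoires = {
    "LECA": {"D1", "D2", "D3"},
    "animal_anc": {"D1", "D2", "D4", "D5"},
    "plant_anc": {"D1", "D3"},
}
for branch, (gained, lost) in gains_and_losses(tree, repertoires).items():
    print(branch, "gains:", gained, "losses:", lost)
# ('LECA', 'animal_anc') gains: 2 losses: 1
# ('LECA', 'plant_anc') gains: 0 losses: 1
```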
114
Zhao H, Yang Y, Zhou Y. Structure-based prediction of RNA-binding domains and RNA-binding sites and application to structural genomics targets. Nucleic Acids Res 2010; 39:3017-25. [PMID: 21183467 PMCID: PMC3082898 DOI: 10.1093/nar/gkq1266] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.3] [Indexed: 12/31/2022]
Abstract
Mechanistic understanding of many key cellular processes often involves identification of RNA binding proteins (RBPs) and RNA binding sites in two separate steps. Here, they are predicted simultaneously by structural alignment to known protein-RNA complex structures followed by binding assessment with a DFIRE-based statistical energy function. This method achieves 98% accuracy and 91% precision for predicting RBPs and 93% accuracy and 78% precision for predicting RNA-binding amino-acid residues for a large benchmark of 212 RNA binding and 6761 non-RNA binding domains (leave-one-out cross-validation). Additional tests revealed that the method makes no false positive prediction from 311 DNA binding domains but correctly detects six domains binding with both DNA and RNA. In addition, it correctly identified 31 of 75 unbound RNA-binding domains with 92% accuracy and 65% precision for predicted binding residues and achieved 86% success rate in its application to SCOP RNA binding domain superfamily (Structural Classification Of Proteins). It further predicts 25 targets as RBPs in 2076 structural genomics targets: 20 of 25 predicted ones (80%) are putatively RNA binding. The superior performance over existing methods indicates the importance of dividing structures into domains, using a Z-score to measure relative structural similarity, and a statistical energy function to measure protein-RNA binding affinity.
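Class imbalance in the benchmark (212 RNA-binding versus 6761 non-RNA-binding domains) is one reason to report precision alongside accuracy. A minimal sketch, using the benchmark sizes from the abstract and a hypothetical predictor that labels every domain as non-binding; the function name is illustrative.

```python
def accuracy_precision(tp, fp, tn, fn):
    """Return (accuracy, precision); precision is None when nothing is predicted positive."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else None
    return accuracy, precision


# Hypothetical trivial predictor on the benchmark: every domain called non-binding.
trivial = accuracy_precision(tp=0, fp=0, tn=6761, fn=212)
print(trivial)  # accuracy is about 0.97, yet no RNA-binding domain is found

# Structural genomics targets: 20 of the 25 predicted RBPs are putatively RNA binding.
precision_sg = 20 / 25
print(precision_sg)  # 0.8
```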
Affiliation(s)
- Huiying Zhao
- School of Informatics, Indiana University Purdue University and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
115
Arenas AF, Osorio-Méndez JF, Gutierrez AJ, Gomez-Marin JE. Genome-wide survey and evolutionary analysis of trypsin proteases in apicomplexan parasites. Genomics Proteomics Bioinformatics 2010; 8:103-12. [PMID: 20691395 PMCID: PMC5054444 DOI: 10.1016/s1672-0229(10)60011-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
Abstract
Apicomplexa are an extremely diverse group of unicellular organisms that infect humans and other animals. Despite the great advances in combating infectious diseases over the past century, these parasites still impose a tremendous social and economic burden on human societies, particularly in tropical and subtropical regions of the world. Proteases from apicomplexa have been characterized at the molecular and cellular levels, and central roles have been proposed for proteases in diverse processes. In this work, 16 new genes encoding trypsin proteases are identified in 8 apicomplexan genomes by a genome-wide survey. Phylogenetic analysis suggests that these genes were gained through both intracellular gene transfer and vertical gene transfer. Identification, characterization and understanding of the evolutionary origin of protease-mediated processes are crucial for increasing knowledge and improving strategies for the development of novel chemotherapeutic agents and vaccines.
Affiliation(s)
- Aylan Farid Arenas
- Grupo de Parasitología Molecular (GEPAMOL), Centro de Investigaciones Biomédicas, Universidad del Quindío, Armenia, Colombia
116
117
Wong WC, Maurer-Stroh S, Eisenhaber F. More than 1,001 problems with protein domain databases: transmembrane regions, signal peptides and the issue of sequence homology. PLoS Comput Biol 2010; 6:e1000867. [PMID: 20686689 PMCID: PMC2912341 DOI: 10.1371/journal.pcbi.1000867] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 03/11/2010] [Accepted: 06/25/2010] [Indexed: 12/16/2022] Open
Abstract
Large-scale genome sequencing gained general importance for life science because the theory of biomolecular sequence homology makes it possible to annotate the function of otherwise experimentally uncharacterized sequences. Historically, the paradigm that similar protein sequences imply common structure, function and ancestry was generalized from studies of globular domains. Having the same fold imposes strict conditions on the packing of the hydrophobic core, which requires similar hydrophobic patterns. The implications of sequence similarity among non-globular protein segments have not been studied to the same extent; nevertheless, homology considerations are silently extended to them. This appears especially detrimental for transmembrane helices (TMs) and signal peptides (SPs), where sequence similarity is necessarily a consequence of physical requirements rather than common ancestry. Matching SPs/TMs therefore creates the illusion of matching hydrophobic cores, and including SPs/TMs in domain models can give rise to wrong annotations. More than 1001 of the 10,340 models of Pfam release 23, and 18 of the 809 domains of SMART version 6, contain SP/TM regions. As expected, fragment-mode HMM searches generate promiscuous hits among clearly unrelated proteins that are limited solely to the SP/TM part. More worryingly, we show explicit examples in which the scores of clearly false-positive hits, even in global-mode searches, are elevated into the significance range just by matching the hydrophobic runs. Using conservative criteria, we find that in the PIR iProClass database v3.74 at least between 2.1% and 13.6% of the annotated Pfam hits for a set of validated domain models appear unjustified. Thus, false-positive domain hits enforced by SP/TM regions can lead to dramatic annotation errors in which the hit has nothing in common with the problematic domain model except the SP/TM region itself. We suggest a workflow for flagging problematic hits arising from SP/TM-containing models for critical reconsideration by annotation users.
Sequence homology is a fundamental principle of biology. It implies common phylogenetic ancestry of genes and, consequently, similarity of their protein products with regard to amino acid sequence, three-dimensional structure, and molecular and cellular function. Homology, originally an esoteric concept, is now used with sequence similarity as its proxy to justify transferring functional annotation from well-studied proteins to new sequences. Yet functional annotation via sequence similarity seems to have hit a plateau in recent years, because relentless annotation transfer has propagated errors across sequence databases and led experimental follow-up work astray. It must be emphasized that the trinity of sequence, 3D structural and functional similarity has been proven only for globular segments of proteins. For non-globular regions, sequence similarity is not necessarily the result of divergent evolution from a common ancestor but can be a consequence of amino acid sequence bias. In our investigation, we found that protein domain databases contain many domain models with transmembrane regions and signal peptides, which are non-globular protein segments with a hydrophobic bias. Many proteins have inherited completely wrong function assignments from these domain models. We fear that future function predictions will prove futile if this issue is not addressed immediately.
Affiliation(s)
- Wing-Cheong Wong
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore
- Sebastian Maurer-Stroh
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore
- School of Biological Sciences (SBS), Nanyang Technological University (NTU), Singapore
- Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore
- Department of Biological Sciences (DBS), National University of Singapore (NUS), Singapore
- School of Computer Engineering (SCE), Nanyang Technological University (NTU), Singapore
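The flagging workflow suggested in this abstract can be illustrated with a small interval check. This is a hypothetical sketch, not the authors' procedure. It assumes hits and SP/TM segments are given as inclusive 1-based residue intervals on the query sequence, that the SP/TM segments do not overlap one another, and it uses an arbitrary 50% cutoff.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive, 1-based (start, end)


def overlap(a: Interval, b: Interval) -> int:
    """Number of residues shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def fraction_outside(hit: Interval, sp_tm: List[Interval]) -> float:
    """Fraction of a domain hit lying outside all SP/TM segments.

    Assumes the SP/TM segments are mutually non-overlapping.
    """
    length = hit[1] - hit[0] + 1
    covered = sum(overlap(hit, region) for region in sp_tm)
    return (length - covered) / length


def flag_hit(hit: Interval, sp_tm: List[Interval], min_outside: float = 0.5) -> bool:
    """Flag a hit for manual review when it is mostly explained by SP/TM residues.

    The 0.5 default is a hypothetical cutoff, not a published value.
    """
    return fraction_outside(hit, sp_tm) < min_outside


tm_helix = [(5, 30)]
print(flag_hit((1, 40), tm_helix))   # True: 26 of 40 residues are in the TM helix
print(flag_hit((1, 100), tm_helix))  # False: most of the hit lies elsewhere
```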
118
Sharma P, Ignatchenko V, Grace K, Ursprung C, Kislinger T, Gramolini AO. Endoplasmic reticulum protein targeting of phospholamban: a common role for an N-terminal di-arginine motif in ER retention? PLoS One 2010; 5:e11496. [PMID: 20634894 PMCID: PMC2901339 DOI: 10.1371/journal.pone.0011496] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 11/16/2009] [Accepted: 06/16/2010] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Phospholamban (PLN) is an effective inhibitor of the sarco(endo)plasmic reticulum Ca(2+)-ATPase, which transports Ca(2+) into the SR lumen, leading to muscle relaxation. A mutation of PLN in which one of the di-arginine residues at positions 13 and 14 was deleted led to severe, early-onset dilated cardiomyopathy. Here we aimed to determine the cellular mechanisms involved in this disease-causing mutation. METHODOLOGY/PRINCIPAL FINDINGS Mutations deleting the codons for Arg13, Arg14, or both resulted in mislocalization of PLN from the ER. Our data show that PLN is recycled via the retrograde Golgi-to-ER membrane traffic pathway involving COP I vesicles: co-immunoprecipitation assays showed that COP I interactions depend on an intact di-arginine motif, because PLN RΔ14 did not co-precipitate with COP I-containing vesicles. Bioinformatic analysis showed that the di-arginine motif is present within the first 25 residues of a large number of ER/SR Gene Ontology (GO)-annotated proteins. Mutations in the di-arginine motif of the sigma 1-type opioid receptor, the beta-subunit of the signal recognition particle receptor, and sterol O-acyltransferase, three known ER-resident proteins identified in our bioinformatic screen, also caused their mislocalization. CONCLUSION We conclude that PLN is enriched in the ER owing to COP I-mediated transport that depends on its intact di-arginine motif, and that the N-terminal di-arginine motif may act as a general ER retrieval sequence.
Affiliation(s)
- Parveen Sharma
- Department of Physiology, University of Toronto, Toronto, Ontario, Canada
- Kevin Grace
- Department of Physiology, University of Toronto, Toronto, Ontario, Canada
- Claudia Ursprung
- Department of Physiology, University of Toronto, Toronto, Ontario, Canada
- Thomas Kislinger
- Ontario Cancer Institute, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Anthony O. Gramolini
- Department of Physiology, University of Toronto, Toronto, Ontario, Canada
- Division of Cellular and Molecular Biology, Toronto General Research Institute, Toronto, Ontario, Canada
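The bioinformatic screen in this abstract amounts to scanning the N-terminal region of each protein for an Arg-Arg pair. The sketch below is a minimal version. It assumes the motif is simply two consecutive arginines lying entirely within the first 25 residues, although the paper's exact motif definition may be broader. The two sequences are illustrative strings: one has arginines at positions 13 and 14, and the other has one of them deleted.

```python
from typing import Optional


def nterm_di_arginine_position(sequence: str, window: int = 25) -> Optional[int]:
    """Return the 1-based position of the first Arg-Arg pair lying entirely
    within the first `window` residues, or None if there is none."""
    # Slicing first guarantees both arginines fall inside the window.
    index = sequence[:window].upper().find("RR")
    return None if index == -1 else index + 1


wild_type = "MEKVQYLTRSAIRRASTIE"  # Arg at positions 13 and 14
deletion = "MEKVQYLTRSAIRASTIE"    # one of the two arginines removed
print(nterm_di_arginine_position(wild_type))  # 13
print(nterm_di_arginine_position(deletion))   # None
```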
119
Deeds EJ, Shakhnovich EI. A structure-centric view of protein evolution, design, and adaptation. Adv Enzymol Relat Areas Mol Biol 2010; 75:133-91, xi-xii. [PMID: 17124867 DOI: 10.1002/9780471224464.ch2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Proteins, by virtue of their central role in most biological processes, represent one of the key subjects in the study of molecular evolution. Inherent in the indispensability of proteins for living cells is the fact that a given protein can adopt a specific three-dimensional shape that is specified solely by its amino acid sequence. Over the past several decades, structural biologists have demonstrated that the array of structures that proteins may adopt is quite astounding, and this has led to a strong interest in understanding how protein structures change and evolve over time. In this review we consider a large body of recent work that attempts to illuminate this structure-centric picture of protein evolution. Much of this work has focused on the question of how completely new protein structures (i.e., new folds or topologies) are discovered by protein sequences as they evolve. Closely tied to this question of structural innovation has been a desire to describe and understand the observation that certain types of protein structures are far more abundant than others, and how this uneven distribution bears on the process through which new shapes are discovered. We consider a number of theoretical models that have been successful at explaining this heterogeneity in protein populations and discuss the increasing amount of evidence that structural evolution involves the divergence of protein sequences and structures from one another. We also consider the topic of protein designability, which concerns how a protein's structure influences the number of sequences that can fold successfully into that structure.
Understanding and quantifying the relationship between the physical features of a structure and its designability has been a long-standing goal of the study of protein structure and evolution, and we discuss a number of recent advances that have yielded a promising answer to this question. Finally, we review the relatively new field of protein structural phylogeny, in which information about the distribution of protein structures among different organisms is used to reconstruct the evolutionary relationships between them. Taken together, the work that we review presents an increasingly coherent picture of how these unique polymers have evolved over the course of life on Earth.
Affiliation(s)
- Eric J Deeds
- Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, USA
120
The evolutionary history of protein domains viewed by species phylogeny. PLoS One 2009; 4:e8378. [PMID: 20041107 PMCID: PMC2794708 DOI: 10.1371/journal.pone.0008378] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Received: 09/26/2009] [Accepted: 11/16/2009] [Indexed: 11/20/2022] Open
Abstract
Background Protein structural domains are evolutionary units whose relationships can be detected over long evolutionary distances. The evolutionary history of protein domains is of great interest, including the origin of protein domains, the identification of domain loss, transfer, duplication and combination with other domains to form new proteins, and the formation of the entire protein domain repertoire. Methodology/Principal Findings A methodology is presented for providing a parsimonious domain history based on gain, loss, vertical and horizontal transfer, derived from the complete genomic domain assignments of 1015 organisms across the tree of life. When these histories are mapped to species trees, the evolutionary history of domains and domain combinations is revealed, and the general evolutionary trends of domains and domain combinations are analyzed. Conclusions/Significance We show that this approach provides a powerful tool for studying how new proteins and functions emerged and for studying processes such as horizontal gene transfer among more distant species.
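Parsimonious reconstruction of domain gain and loss can be illustrated with Fitch's small-parsimony algorithm on presence/absence data. This is a swapped-in textbook technique, not the paper's method. It counts the minimum number of presence/absence changes on a fixed rooted binary tree, it treats gains and losses as equally costly, and it does not distinguish vertical from horizontal transfer. The four-species tree and the domain states are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Return (possible ancestral states, minimum number of changes) for `tree`.

    `states` maps each leaf to 1 (domain present) or 0 (domain absent).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state, so no extra change is needed here.
        return shared, left_cost + right_cost
    # The children disagree, so one gain or loss must occur below this node.
    return left_set | right_set, left_cost + right_cost + 1


species_tree = (("A", "B"), ("C", "D"))
print(fitch(species_tree, {"A": 1, "B": 1, "C": 0, "D": 0})[1])  # 1
print(fitch(species_tree, {"A": 1, "B": 0, "C": 1, "D": 0})[1])  # 2
```

In the first pattern a single gain or loss on one branch explains the data. The scattered second pattern needs two changes, and that kind of signal motivates considering horizontal transfer in the full method.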
121
Schmidt-Goenner T, Guerler A, Kolbeck B, Knapp EW. Circular permuted proteins in the universe of protein folds. Proteins 2009; 78:1618-30. [DOI: 10.1002/prot.22678] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
122
Abstract
Background The general method used to determine the function of newly discovered proteins is to transfer annotations from well-characterized homologous proteins. Approaches to selecting homologous proteins can largely be classified as sequence-based or domain-based. Compared with sequence-based methods, domain-based methods have several advantages for identifying distant homology and homology among proteins with multiple domains. However, these methods are challenged by large families defined by 'promiscuous' (or 'mobile') domains. Results Here we present a measure of domain architecture similarity, called Weighted Domain Architecture Comparison (WDAC), which can be used to identify homologs of multidomain proteins. To distinguish promiscuous domains from conventional protein domains, we assigned a weight score to each Pfam domain extracted from RefSeq proteins, based on its abundance and versatility. The similarity of two domain architectures is measured by cosine similarity, a similarity measure used in information retrieval. We combined sequence similarity with domain architecture comparison to identify proteins belonging to the same domain architecture. Using the human and nematode proteomes, we compared WDAC with an unweighted domain architecture comparison method (DAC) to evaluate the effectiveness of domain weight scores, and found that WDAC is better at identifying homology among multidomain proteins. Conclusion Our analysis indicates that considering domain weight scores in domain architecture comparisons improves protein homology identification. We developed a web-based server that allows users to compare their proteins with protein domain architectures.
Affiliation(s)
- Byungwook Lee
- Korean BioInformation Center, KRIBB, Daejeon 305-806, Korea.
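The weighted cosine comparison described in this abstract can be sketched as follows. The sketch assumes each architecture is represented as a vector of domain counts, each multiplied by a per-domain weight. The weight below is invented to mimic a down-weighted promiscuous domain. The paper derives its weights from domain abundance and versatility in RefSeq, which this sketch does not reproduce, and the two architectures are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def weighted_vector(architecture: List[str], weights: Dict[str, float]) -> Dict[str, float]:
    """Domain counts scaled by per-domain weights (weight 1.0 if unlisted)."""
    counts = Counter(architecture)
    return {domain: n * weights.get(domain, 1.0) for domain, n in counts.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


kinase = ["SH3", "Kinase"]
phosphatase = ["SH3", "Phosphatase"]
promiscuous = {"SH3": 0.1}  # hypothetical weight for a promiscuous domain

print(cosine(weighted_vector(kinase, {}), weighted_vector(phosphatase, {})))  # about 0.5
print(cosine(weighted_vector(kinase, promiscuous),
             weighted_vector(phosphatase, promiscuous)))  # about 0.0099
```

Down-weighting the shared promiscuous domain lowers the similarity from about 0.5 to about 0.01. Sharing a common mobile domain alone should not make two otherwise unrelated proteins look homologous.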
123
Jin J, Xie X, Chen C, Park JG, Stark C, James DA, Olhovsky M, Linding R, Mao Y, Pawson T. Eukaryotic protein domains as functional units of cellular evolution. Sci Signal 2009; 2:ra76. [PMID: 19934434 DOI: 10.1126/scisignal.2000546] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.6] [Indexed: 11/02/2022]
Abstract
Modular protein domains are functional units that can be modified through the acquisition of new intrinsic activities or by the formation of novel domain combinations, thereby contributing to the evolution of proteins with new biological properties. Here, we assign proteins to groups with related domain compositions and functional properties, termed "domain clubs," which we use to compare multiple eukaryotic proteomes. This analysis shows that different domain types can take distinct evolutionary trajectories, which correlate with the conservation, gain, expansion, or decay of particular biological processes. Evolutionary jumps are associated with a domain that coordinately acquires a new intrinsic function and enters new domain clubs, thereby providing the modified domain with access to a new cellular microenvironment. We also coordinately analyzed the covalent and noncovalent interactions of different domain types to assess the molecular compartment occupied by each domain. This reveals that specific subsets of domains demarcate particular cellular processes, such as growth factor signaling, chromatin remodeling, apoptotic and inflammatory responses, or vesicular trafficking. We suggest that domains, and the proteins in which they reside, are selected during evolution through reciprocal interactions with protein domains in their local microenvironment. Based on this scheme, we propose a mechanism by which Tudor domains may have evolved to support different modes of epigenetic regulation and suggest a role for the germline group of mammalian Tudor domains in Piwi-regulated RNA biology.
Affiliation(s)
- Jing Jin
- Centre for Systems Biology, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Ontario, Canada.
124
Peng Q, Li H. Direct observation of tug-of-war during the folding of a mutually exclusive protein. J Am Chem Soc 2009; 131:13347-54. [DOI: 10.1021/ja903480j] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Affiliation(s)
- Qing Peng
- Department of Chemistry, The University of British Columbia, Vancouver, BC V6T 1Z1, Canada
- Hongbin Li
- Department of Chemistry, The University of British Columbia, Vancouver, BC V6T 1Z1, Canada
125
Dessailly BH, Nair R, Jaroszewski L, Fajardo JE, Kouranov A, Lee D, Fiser A, Godzik A, Rost B, Orengo C. PSI-2: structural genomics to cover protein domain family space. Structure 2009; 17:869-81. [PMID: 19523904 PMCID: PMC2920419 DOI: 10.1016/j.str.2009.03.015] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.9] [Received: 11/26/2008] [Revised: 03/18/2009] [Accepted: 03/22/2009] [Indexed: 11/25/2022]
Abstract
One major objective of structural genomics efforts, including the NIH-funded Protein Structure Initiative (PSI), has been to increase the structural coverage of protein sequence space. Here, we present the target selection strategy used during the second phase of PSI (PSI-2). This strategy, jointly devised by the bioinformatics groups associated with the PSI-2 large-scale production centers, targets representatives from large, structurally uncharacterized protein domain families, and from structurally uncharacterized subfamilies in very large and diverse families with incomplete structural coverage. These very large families are extremely diverse both structurally and functionally, and are highly overrepresented in known proteomes. On the basis of several metrics, we then discuss to what extent PSI-2, during its first 3 years, has increased the structural coverage of genomes, and contributed structural and functional novelty. Together, the results presented here suggest that PSI-2 is successfully meeting its objectives and provides useful insights into structural and functional space.
Affiliation(s)
- Benoît H Dessailly
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK.
126
Faure G, Bornot A, de Brevern AG. Analysis of protein contacts into Protein Units. Biochimie 2009; 91:876-87. [PMID: 19383526 DOI: 10.1016/j.biochi.2009.04.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 07/24/2008] [Accepted: 04/13/2009] [Indexed: 11/18/2022]
Abstract
Three-dimensional structures of proteins underpin their biological functions. Their folds are maintained by inter-residue interactions, which are a main focus in understanding the mechanisms of protein folding and stability. Furthermore, protein structures can be composed of single or multiple functional domains that can fold and function independently. Hence, dividing a protein into domains is useful for accurate structure and function determination. In previous studies, we characterized protein contact properties under different definitions and developed a novel methodology named Protein Peeling. Within protein structures, Protein Peeling identifies small, successive compact units along the sequence, called protein units (PUs). The cutting done by Protein Peeling maximizes the number of contacts within the PUs and minimizes the number of contacts between them. The method is thus a relevant tool for protein folding research, particularly with regard to the hierarchical model proposed by George Rose. Here, we analyze in detail the PUs at different levels of cutting, using a non-redundant protein databank. The distributions of PU sizes, numbers of PUs and their accessibility are examined to determine their common and distinct features. Moreover, we highlight the preferential amino acid interactions inside and between PUs. Our results show that PUs are clearly an intermediate level between secondary structures and protein structural domains.
Affiliation(s)
- Guilhem Faure
- INSERM UMR-S 726, Equipe de Bioinformatique Génomique et Moléculaire (EBGM), DSIMB, Université Paris Diderot - Paris 7, case 7113, 2 place Jussieu, 75251 Paris, France
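The cutting criterion described in this abstract, with many contacts within units and few between them, can be illustrated with a single bisection. This is a deliberately simplified stand-in, not Protein Peeling itself. Protein Peeling cuts recursively and scores cuts with its own index. This sketch makes one cut in a precomputed contact list of 0-based residue pairs and picks the position with the fewest crossing contacts. For a single cut, minimizing crossing contacts is the same as maximizing contacts within the two pieces, because the two counts always sum to the fixed total.

```python
from typing import List, Tuple

Contact = Tuple[int, int]  # 0-based residue indices (i, j) in contact


def crossing_contacts(contacts: List[Contact], cut: int) -> int:
    """Number of contacts joining residues before `cut` to residues at or after it."""
    return sum(1 for i, j in contacts if (i < cut) != (j < cut))


def best_cut(n_residues: int, contacts: List[Contact], min_size: int = 3) -> int:
    """Cut splitting [0, cut) from [cut, n_residues) with the fewest crossing
    contacts; each piece keeps at least `min_size` residues."""
    candidates = range(min_size, n_residues - min_size + 1)
    return min(candidates, key=lambda cut: crossing_contacts(contacts, cut))


# Hypothetical 8-residue chain: two compact halves joined by one contact (3, 4).
toy_contacts = [(0, 2), (0, 3), (1, 3), (4, 6), (4, 7), (5, 7), (3, 4)]
print(best_cut(8, toy_contacts))  # 4
```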
127
Moreira IS, Fernandes PA, Ramos MJ. Protein-protein docking dealing with the unknown. J Comput Chem 2009; 31:317-42. [DOI: 10.1002/jcc.21276] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.8] [Indexed: 01/02/2023]
128
Aravind P, Suman SK, Mishra A, Sharma Y, Sankaranarayanan R. Three-dimensional domain swapping in nitrollin, a single-domain βγ-crystallin from Nitrosospira multiformis, controls protein conformation and stability but not dimerization. J Mol Biol 2008; 385:163-77. [PMID: 18976659 DOI: 10.1016/j.jmb.2008.10.035] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 07/24/2008] [Revised: 10/08/2008] [Accepted: 10/09/2008] [Indexed: 11/24/2022]
Abstract
The βγ-crystallin superfamily has a well-characterized protein fold, with several members found in both the prokaryotic and eukaryotic worlds. A majority of them contain two βγ-crystallin domains. A few examples, such as Ciona crystallin and spherulin 3a, represent the eukaryotic single-domain proteins of this superfamily. This study reports the high-resolution crystal structure of a single-domain βγ-crystallin protein, nitrollin, from the ammonium-oxidizing soil bacterium Nitrosospira multiformis. The structure retains the characteristic βγ-crystallin fold despite very low sequence identity. The protein exhibits a unique case of homodimerization among βγ-crystallins, using its N-terminal extension to undergo three-dimensional (3D) domain swapping with its partner. As determined by gel filtration and equilibrium unfolding studies, removal of the swapped strand results in partial loss of structure and stability but not in loss of dimerization per se. Overall, nitrollin represents a distinct single-domain prokaryotic member that has evolved a specialized mode of dimerization hitherto unknown in the realm of βγ-crystallins.
Affiliation(s)
- Penmatsa Aravind
- Center for Cellular and Molecular Biology, Council of Scientific and Industrial Research, Hyderabad, India
129
Phylogenetic profiles reveal evolutionary relationships within the "twilight zone" of sequence similarity. Proc Natl Acad Sci U S A 2008; 105:13474-9. [PMID: 18765810 DOI: 10.1073/pnas.0803860105] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022] Open
Abstract
Inferring evolutionary relationships among highly divergent protein sequences is a daunting task. In particular, when pairwise alignments between protein sequences fall below 25% identity, the phylogenetic relationships among the sequences cannot be estimated with statistical certainty. Here, we show that phylogenetic profiles generated with the Gestalt Domain Detection Algorithm-Basic Local Alignment Tool (GDDA-BLAST) can derive, ab initio, phylogenetic relationships for highly divergent proteins in a quantifiable and robust manner. Notably, the results of our computational case study of the highly divergent family of retroelements accord with previous estimates of their evolutionary relationships. Taken together, these data demonstrate that GDDA-BLAST provides an independent and powerful measure of evolutionary relationships that does not rely on potentially subjective sequence alignment. We demonstrate that evolutionary relationships can be measured with phylogenetic profiles, and we therefore propose that these measurements can provide key insights into relationships among distantly related and/or rapidly evolving proteins.
130
Strickland D, Moffat K, Sosnick TR. Light-activated DNA binding in a designed allosteric protein. Proc Natl Acad Sci U S A 2008; 105:10709-14. [PMID: 18667691 PMCID: PMC2504796 DOI: 10.1073/pnas.0709610105] [Citation(s) in RCA: 241] [Impact Index Per Article: 14.2] [Received: 10/09/2007] [Indexed: 11/18/2022] Open
Abstract
An understanding of how allostery, the conformational coupling of distant functional sites, arises in highly evolvable systems is of considerable interest in areas ranging from cell biology to protein design and signaling networks. We reasoned that the rigidity and defined geometry of an alpha-helical domain linker would make it effective as a conduit for allosteric signals. To test this idea, we rationally designed 12 fusions between the naturally photoactive LOV2 domain from Avena sativa phototropin 1 and the Escherichia coli trp repressor. When illuminated, one of the fusions selectively binds operator DNA and protects it from nuclease digestion. The ready success of our rational design strategy suggests that the helical "allosteric lever arm" is a general scheme for coupling the function of two proteins.
Affiliation(s)
- Devin Strickland
- Department of Biochemistry and Molecular Biology and Institute for Biophysical Dynamics, University of Chicago, 929 East 57th Street, Chicago, IL 60637
- Keith Moffat
- Department of Biochemistry and Molecular Biology and Institute for Biophysical Dynamics, University of Chicago, 929 East 57th Street, Chicago, IL 60637
- Tobin R. Sosnick
- Department of Biochemistry and Molecular Biology and Institute for Biophysical Dynamics, University of Chicago, 929 East 57th Street, Chicago, IL 60637
131
Moore AD, Björklund AK, Ekman D, Bornberg-Bauer E, Elofsson A. Arrangements in the modular evolution of proteins. Trends Biochem Sci 2008; 33:444-51. [PMID: 18656364 DOI: 10.1016/j.tibs.2008.05.008] [Citation(s) in RCA: 163] [Impact Index Per Article: 9.6] [Received: 11/20/2007] [Revised: 05/28/2008] [Accepted: 05/28/2008] [Indexed: 11/17/2022]
Abstract
It has been known for the last couple of decades that proteins evolve partly through rearrangements of larger fragments, typically domains. These units are considered the basic modules of protein structure, evolution and function. In the last few years, the analysis of protein-domain rearrangements has provided functional and evolutionary insights and has aided functional predictions and domain assignments for previously uncharacterised genes and proteins. Although some mechanisms that govern modular rearrangements of protein domains have been uncovered, such as the addition or deletion of a single N- or C-terminal domain, much is still unknown about the genetics behind these arrangements.
Affiliation(s)
- Andrew D Moore
- Evolutionary Bioinformatics, IEB, University of Münster, Hüfferstrasse 1, Münster, Germany
132
Mahdavi MA, Lin YH. Prediction of protein-protein interactions using protein signature profiling. Genomics Proteomics Bioinformatics 2008; 5:177-86. [PMID: 18267299 PMCID: PMC5963007 DOI: 10.1016/s1672-0229(08)60005-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 12/02/2022]
Abstract
Protein domains are conserved and functionally independent structures that play an important role in interactions among related proteins. Domain-domain interactions have recently been used to predict protein-protein interactions (PPI). In general, the interaction probability of a pair of domains is scored using a trained scoring function, and protein pairs carrying domain pairs whose scores satisfy a threshold are regarded as “interacting”. In this study, the signature contents of proteins were used to predict PPI pairs in Saccharomyces cerevisiae, Caenorhabditis elegans, and Homo sapiens. Similarity between protein signature patterns was scored, and PPI predictions were drawn based on a binary similarity scoring function. Results show that, when compared against a test set, the true positive rate of the proposed approach is approximately 32% higher than that of the maximum likelihood estimation method, resulting in a 22% increase in the area under the receiver operating characteristic (ROC) curve. When proteins containing one or two signatures were removed, the sensitivity of the predicted PPI pairs increased significantly. At a confidence level of 0.95, the predicted PPI pairs are on average 11 times more likely to interact than randomly selected pairs, and on average 4 times better than those predicted by either phylogenetic profiling or gene expression profiling.
Affiliation(s)
- Mahmood A Mahdavi
- Department of Chemical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
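The evaluation quoted in this abstract compares true positive rates and ROC areas. The rank-based definition of the area under the ROC curve makes that comparison concrete. The sketch covers the metrics only, not the signature-profiling predictor, and the scores and counts are hypothetical.

```python
from typing import Sequence


def true_positive_rate(tp: int, fn: int) -> float:
    """Sensitivity: fraction of actual interacting pairs that are predicted."""
    return tp / (tp + fn)


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive pair outscores a randomly chosen negative pair (ties count
    one half). This equals the normalized Mann-Whitney U statistic."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(true_positive_rate(tp=30, fn=70))  # 0.3
print(roc_auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

An AUC of 0.5 corresponds to a random ranking, and 1.0 corresponds to perfect separation of interacting from non-interacting pairs.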
133
Berrondo M, Ostermeier M, Gray JJ. Structure prediction of domain insertion proteins from structures of individual domains. Structure 2008; 16:513-27. [PMID: 18400174 DOI: 10.1016/j.str.2008.01.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 10/23/2007] [Revised: 12/17/2007] [Accepted: 01/13/2008] [Indexed: 11/28/2022]
Abstract
Multidomain proteins continue to be a major challenge in protein structure prediction. Here we present a Monte Carlo (MC) algorithm, implemented within Rosetta, to predict the structure of proteins in which one domain is inserted into another. Three MC moves combine rigid-body and loop movements to search the constrained conformational space by disrupting the structure and subsequently repairing chain breaks. Local searches show that the algorithm consistently samples and recovers near-native structures. Further global searches produced top-ranked structures within 5 Å in 31 of 50 cases in low-resolution mode, and refinement of the top-ranked low-resolution structures produced models within 2 Å in 21 of 50 cases. Rigid-body orientations were often correctly recovered despite errors in linker conformation. The algorithm is broadly applicable to de novo structure prediction of both naturally occurring and engineered domain insertion proteins.
Affiliation(s)
- Monica Berrondo
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD 21218, USA
134
Ye L, Liu T, Wu Z, Zhou R. Sequence-based protein domain boundary prediction using BP neural network with various property profiles. Proteins 2008; 71:300-7. [PMID: 17932915 DOI: 10.1002/prot.21745] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
Given the rapid growth in the number of sequences without known structures, it is becoming increasingly important not only to define protein structural domains accurately but also to predict domain boundaries from the amino-acid sequence alone. In this article, we present a back-propagation (BP) neural network method that uses 9 different sequence profiles, based on chemical, physical, and statistical properties, to predict the domain boundary of two-domain proteins from their one-dimensional sequences. We achieved an accuracy of 69% in 10-fold cross-validation on a nonredundant dataset of 238 two-domain proteins, which we built from a set common to both the SCOP and CATH classifications. Applied to a larger third-party dataset of 522 proteins, the method achieved an accuracy of 62%. On both datasets, our results are significantly better than those of other methods such as DomCut and DGS, and comparable to those of PPRODO, the method on which the larger dataset was based. Our cross-validation results are also noticeably better than those of previous BP neural network methods, probably because we used more property descriptors and significantly more training nodes in our network. Integration with PPRODO also indicates that the information obtained from our approach is complementary to that available from multiple sequence alignments. Finally, the relative importance of each property profile is analyzed in detail.
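The abstract does not list the nine property profiles or the window size. The sketch below shows how per-residue profiles are commonly turned into fixed-length network inputs. It assumes a sliding window centred on each residue and uses the Kyte-Doolittle hydropathy scale as the single example property. The network itself and the boundary labels are omitted.

```python
from __future__ import annotations

# Kyte-Doolittle hydropathy scale: one example of a per-residue property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_profile(seq: str, scale: dict[str, float]) -> list[float]:
    """Map each residue to its property value; unknown residues map to 0.0."""
    return [scale.get(aa, 0.0) for aa in seq.upper()]


def window_features(profile: list[float], half_width: int) -> list[list[float]]:
    """One vector per residue: the 2*half_width+1 values around it, zero-padded at the ends."""
    n = len(profile)
    return [
        [profile[j] if 0 <= j < n else 0.0 for j in range(i - half_width, i + half_width + 1)]
        for i in range(n)
    ]


def stack_profiles(*windowed: list[list[float]]) -> list[list[float]]:
    """Concatenate per-residue windows from several property profiles into one input vector."""
    return [sum(rows, []) for rows in zip(*windowed)]


prof = property_profile("MKV", KYTE_DOOLITTLE)
print(window_features(prof, half_width=1))
# [[0.0, 1.9, -3.9], [1.9, -3.9, 4.2], [-3.9, 4.2, 0.0]]
```

With several scales, `stack_profiles` concatenates one window per property for each residue, so the input width grows linearly with the number of profiles.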
Affiliation(s)
- Lei Ye
- Department of Computer Science, Zhejiang University, Hangzhou, China
135
Faure G, Bornot A, de Brevern AG. Protein contacts, inter-residue interactions and side-chain modelling. Biochimie 2008; 90:626-39. [DOI: 10.1016/j.biochi.2007.11.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 06/14/2007] [Accepted: 11/22/2007] [Indexed: 10/22/2022]
136
Haase A, Nordmann C, Sedehizade F, Borrmann C, Reiser G. RanBPM, a novel interaction partner of the brain-specific protein p42IP4/centaurin α-1. J Neurochem 2008; 105:2237-48. [DOI: 10.1111/j.1471-4159.2008.05308.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
137
Minary P, Levitt M. Probing protein fold space with a simplified model. J Mol Biol 2008; 375:920-33. [PMID: 18054792 PMCID: PMC2254652 DOI: 10.1016/j.jmb.2007.10.087] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Received: 08/15/2007] [Revised: 10/15/2007] [Accepted: 10/31/2007] [Indexed: 11/24/2022]
Abstract
We probe the stability and near-native energy landscape of protein fold space using powerful conformational sampling methods together with simple reduced models and statistical potentials. Fold space is represented by a set of 280 protein domains that span all topological classes and vary widely in length (33-300 residues), amino acid composition, and number of secondary structural elements. The degrees of freedom are the loop torsion angles; this choice preserves the native secondary structure but allows the tertiary structure to change. The proteins are represented by three-dimensional models with three points per residue and statistical potentials derived from a knowledge-based study of known protein structures. When this space is sampled by a combination of parallel tempering and equi-energy Monte Carlo, we find that the three-point model captures the known stability of native protein structures, with stable energy basins that are near-native. On average the basins lie at 4.77 Å for all-alpha, 2.93 Å for all-beta, 3.09 Å for alpha/beta, and 4.89 Å for alpha+beta proteins. They lie within 6 Å for 71.41%, 92.85%, 94.29%, and 64.28% of the all-alpha, all-beta, alpha/beta, and alpha+beta classes, respectively. Denatured structures also occur, and their structural properties shed light on the different landscape characteristics of alpha and beta folds. We find that alpha/beta proteins with alternating alpha and beta segments (such as the beta-barrel) are more stable than proteins in other fold classes.
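Parallel tempering runs copies (replicas) of the system at different temperatures and periodically proposes to exchange configurations between neighbouring temperatures. The sketch below implements the standard replica-exchange acceptance rule: accept with probability min(1, exp((beta_i - beta_j)(E_i - E_j))), where beta = 1/kT. The equi-energy moves and the within-replica Monte Carlo steps that the abstract also mentions are omitted. The toy energy function is an assumption.

```python
from __future__ import annotations

import math
import random
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")


def swap_accepted(beta_i: float, beta_j: float, e_i: float, e_j: float, rng: random.Random) -> bool:
    """Replica-exchange criterion: accept with probability min(1, exp((beta_i - beta_j) * (e_i - e_j)))."""
    delta = (beta_i - beta_j) * (e_i - e_j)
    return delta >= 0 or rng.random() < math.exp(delta)


def replica_exchange_sweep(
    states: Sequence[S],
    betas: Sequence[float],
    energy: Callable[[S], float],
    rng: random.Random,
) -> list[S]:
    """Attempt swaps between neighbouring temperatures; states[k] is the configuration at betas[k]."""
    new_states = list(states)
    for k in range(len(new_states) - 1):
        e_k, e_next = energy(new_states[k]), energy(new_states[k + 1])
        if swap_accepted(betas[k], betas[k + 1], e_k, e_next, rng):
            new_states[k], new_states[k + 1] = new_states[k + 1], new_states[k]
    return new_states


# Toy example: a state's "energy" is the state itself; betas run from cold (2.0) to hot (0.5).
print(replica_exchange_sweep([5.0, 1.0, 0.0], [2.0, 1.0, 0.5], lambda x: x, random.Random(0)))
# [1.0, 0.0, 5.0]
```

Because `states` is indexed by temperature, accepted swaps move low-energy configurations found at high temperature toward the cold end. This is what helps the cold replicas escape local minima.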
Affiliation(s)
- Peter Minary
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA.
138
Ferguson BJ, Alexander C, Rossi SW, Liiv I, Rebane A, Worth CL, Wong J, Laan M, Peterson P, Jenkinson EJ, Anderson G, Scott HS, Cooke A, Rich T. AIRE's CARD revealed, a new structure for central tolerance provokes transcriptional plasticity. J Biol Chem 2008; 283:1723-31. [PMID: 17974569 DOI: 10.1074/jbc.m707211200] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.1] [Indexed: 12/16/2023]
Abstract
Developing T cells encounter peripheral self-antigens in the thymus in order to delete autoreactive clones. It is now known that the autoimmune regulator protein (AIRE), which is expressed in thymic medullary epithelial cells, plays a key role in regulating the thymic transcription of these peripheral tissue-specific antigens. Mutations in the AIRE gene are associated with a severe multiorgan autoimmune syndrome (APECED), and autoimmune reactivities are manifest in AIRE-deficient mice. Functional AIRE protein is expressed as distinct nuclear puncta, although no structural basis existed to explain their relevance to disease. In addressing the cell biologic basis for APECED, we made the unexpected discovery that an AIRE mutation hot spot lies in a caspase recruitment domain. Combined homology modeling and in vitro data now show how APECED mutations influence the activity of this transcriptional regulator. We also provide novel in vivo evidence for AIRE's association with a global transcription cofactor, which may underlie AIRE's focal, genome-wide, alteration of the transcriptome.
Affiliation(s)
- Brian J Ferguson
- Department of Pathology, Divisions of Immunology and Cellular Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, United Kingdom
- Clare Alexander
- Department of Pathology, Divisions of Immunology and Cellular Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, United Kingdom
- Simona W Rossi
- Medical Research Council Centre for Immune Regulation, Institute for Biomedical Research, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Ingrid Liiv
- Molecular Pathology, University of Tartu, Biomedicum, 50411 Tartu, Estonia
- Ana Rebane
- Molecular Pathology, University of Tartu, Biomedicum, 50411 Tartu, Estonia
- Catherine L Worth
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, United Kingdom
- Joyce Wong
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, United Kingdom
- Martti Laan
- Molecular Pathology, University of Tartu, Biomedicum, 50411 Tartu, Estonia
- Pärt Peterson
- Molecular Pathology, University of Tartu, Biomedicum, 50411 Tartu, Estonia
- Eric J Jenkinson
- Medical Research Council Centre for Immune Regulation, Institute for Biomedical Research, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Graham Anderson
- Medical Research Council Centre for Immune Regulation, Institute for Biomedical Research, University of Birmingham, Birmingham B15 2TT, United Kingdom
- Hamish S Scott
- Walter and Eliza Hall Institute of Medical Research, 3050 Melbourne, Australia
- Anne Cooke
- Department of Pathology, Divisions of Immunology and Cellular Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, United Kingdom
- Tina Rich
- Department of Pathology, Divisions of Immunology and Cellular Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, United Kingdom
139
Franzosa E, Xia Y. Structural Perspectives on Protein Evolution. Annual Reports in Computational Chemistry 2008. [DOI: 10.1016/s1574-1400(08)00001-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
140
Abstract
Modern molecular biology approaches often result in the accumulation of abundant biological sequence data. Ideally, the function of individual proteins predicted using such data would be determined experimentally. However, if a gene of interest has no predictable function or if the amount of data is too large to experimentally assess individual genes, bioinformatics techniques may provide additional information to allow the inference of function. This chapter proposes a pipeline of freely available Web-based tools to analyze protein-coding DNA sequences of unknown function. Accumulated information obtained during each step of the pipeline is used to build a testable hypothesis of function. The basis and use of sequence similarity methods of homologue detection are described, with emphasis on BLAST and PSI-BLAST. Annotation of gene function through protein domain detection using SMART and Pfam, and the potential for comparison to whole genome data are discussed.
141
Yesylevskyy SO, Kharkyanen VN, Demchenko AP. The blind search for the closed states of hinge-bending proteins. Proteins 2007; 71:831-43. [DOI: 10.1002/prot.21743] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
142
Sonderegger P, Patthy L. Comment on "Tequila, a neurotrypsin ortholog, regulates long-term memory formation in Drosophila". Science 2007. [PMID: 17588915 DOI: 10.1126/science.1138410] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/02/2022]
Abstract
Didelot et al. (Reports, 11 August 2006, p. 851) claimed that Drosophila Tequila (Teq) and human neurotrypsin are orthologs and concluded that deficient long-term memory after Teq inactivation indicates that neurotrypsin plays its essential role for human cognitive functions through a similar mechanism. Our analyses suggest that Teq and neurotrypsin are not orthologous, leading us to question their equivalent roles in higher brain function.
Affiliation(s)
- Peter Sonderegger
- Department of Biochemistry, University of Zurich, CH-8057 Zürich, Switzerland.
143
Lerman G, Shakhnovich BE. Defining functional distance using manifold embeddings of gene ontology annotations. Proc Natl Acad Sci U S A 2007; 104:11334-9. [PMID: 17595300 PMCID: PMC2040899 DOI: 10.1073/pnas.0702965104] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
Although rigorous measures of similarity for sequence and structure are now well established, the problem of defining functional relationships has been particularly daunting. Here, we present several manifold embedding techniques to compute distances between Gene Ontology (GO) functional annotations and consequently estimate functional distances between protein domains. To evaluate accuracy, we correlate the functional distance to the well established measures of sequence, structural, and phylogenetic similarities. Finally, we show that manual classification of structures into folds and superfamilies is mirrored by proximity in the newly defined function space. We show how functional distances place structure-function relationships in biological context resulting in insight into divergent and convergent evolution. The methods and results in this paper can be readily generalized and applied to a wide array of biologically relevant investigations, such as accuracy of annotation transference, the relationship between sequence, structure, and function, or coherence of expression modules.
Affiliation(s)
- Gilad Lerman
- Department of Mathematics, University of Minnesota, Minneapolis, MN 55455
- Boris E. Shakhnovich
- Program in Bioinformatics, Boston University, Boston, MA 02215
144
Zhou H, Xue B, Zhou Y. DDOMAIN: Dividing structures into domains using a normalized domain-domain interaction profile. Protein Sci 2007; 16:947-55. [PMID: 17456745 PMCID: PMC2206635 DOI: 10.1110/ps.062597307] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Indexed: 10/23/2022]
Abstract
Dividing protein structures into domains has proven useful for more accurate structural and functional characterization of proteins. Here, we develop a method, called DDOMAIN, that divides structures into DOMAINs using a normalized contact-based domain-domain interaction profile. Results of DDOMAIN are compared to AUTHORS annotations (domain definitions given by the authors who solved the protein structures), as well as to the popular SCOP and CATH annotations made by human experts and automatic programs. DDOMAIN's automatic annotations are most consistent with the AUTHORS annotations when its three adjustable parameters are trained on the AUTHORS annotations. In that case it shows 90% agreement in number of domains, and 88% agreement in both number of domains and at least 85% overlap in the domain assignment of residues. By comparison, the agreement is 83% (81% under the 85% overlap criterion) between SCOP-trained DDOMAIN and SCOP annotations, and 77% (73%) between CATH-trained DDOMAIN and CATH annotations. The agreement between DDOMAIN and AUTHORS annotations extends beyond single-domain proteins (97%, 82%, and 56% for single-, two-, and three-domain proteins, respectively). For an "easy" data set of proteins whose CATH and SCOP annotations agree with each other in number of domains, the agreement is 90% (89%) between "easy-set"-trained DDOMAIN and CATH/SCOP annotations. The consistency between SCOP-trained DDOMAIN and SCOP annotations is superior to that of two other recently developed SCOP-trained automatic methods, PDP (protein domain parser) and DomainParser 2. We also tested a simple consensus method made of PDP, DomainParser 2, and DDOMAIN, and a different version of DDOMAIN based on a more sophisticated statistical energy function. The DDOMAIN server and its executable are available in the services section on http://sparks.informatics.iupui.edu.
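The abstract does not give DDOMAIN's normalized interaction profile or its three trained parameters. The sketch below only illustrates the underlying idea of dividing a chain into two contiguous domains by contacts. It scans every cut point and picks the one that minimizes contacts across the cut relative to contacts within each part. The normalization (dividing by the geometric mean of the two intra-domain counts), the minimum domain size and the input contact list are assumptions, not DDOMAIN's scoring function.

```python
from __future__ import annotations

import math


def split_score(cut: int, contacts: list[tuple[int, int]]) -> float:
    """Inter-domain contacts divided by the geometric mean of intra-domain contacts.

    Domain A is residues [0, cut) and domain B is [cut, n). Returns inf if either part has no contacts.
    """
    inter = intra_a = intra_b = 0
    for pair in contacts:
        i, j = min(pair), max(pair)
        if j < cut:
            intra_a += 1
        elif i >= cut:
            intra_b += 1
        else:
            inter += 1
    if intra_a == 0 or intra_b == 0:
        return math.inf
    return inter / math.sqrt(intra_a * intra_b)


def best_split(n_residues: int, contacts: list[tuple[int, int]], min_size: int = 2) -> tuple[int, float]:
    """Return (cut, score) for the lowest-scoring split into two contiguous domains."""
    cuts = range(min_size, n_residues - min_size + 1)
    return min(((c, split_score(c, contacts)) for c in cuts), key=lambda t: t[1])


# Hypothetical 8-residue chain: two tightly packed blocks joined by a single contact.
block_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
block_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
contacts = block_a + block_b + [(3, 4)]
print(best_split(8, contacts))  # (4, 0.16666666666666666)
```

In practice the contact list would come from a residue distance cutoff on the structure, and multi-domain proteins would need recursive or multi-cut splitting.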
Affiliation(s)
- Hongyi Zhou
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
145
Preat T, Da Lage JL, Colleaux L, Didelot G, Molinari F, Tchenio P, Milhiet E, Munnich A, Cariou ML. Response to Comment on "Tequila, a Neurotrypsin Ortholog, Regulates Long-Term Memory Formation in Drosophila". Science 2007. [DOI: 10.1126/science.1138579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
Sonderegger and Patthy argue that the trypsin catalytic domains of Drosophila Tequila and human neurotrypsin are not linked by an orthology relationship. We present analyses based both on BLAST (basic local alignment search tool) comparisons and on phylogenetic relationships, which show that these two proteases do share an orthologous region that includes the trypsin domain.
Affiliation(s)
- Thomas Preat, Jean-Luc Da Lage, Laurence Colleaux, Gérard Didelot, Florence Molinari, Paul Tchenio, Elodie Milhiet, Arnold Munnich, Marie-Louise Cariou
- Gènes et Dynamique des Systèmes de Mémoire, UMR CNRS 7637, Ecole Supérieure de Physique et de Chimie Industrielles, 10 rue Vauquelin 75005 Paris, France
- Evolution, Génomes et Spéciation, UPR CNRS 9034, avenue de la Terrasse 91198 Gif-sur-Yvette Cedex, France
- Département de Génétique et Unité de Recherche sur les Handicaps Génétiques de l'Enfant, INSERM U781, Hôpital Necker-Enfants Malades, 149 Rue de Sèvres, 75743 Paris Cedex 15, France
- Laboratoire Aimé Cotton, UPR CNRS 3321, Bâtiment 505, 91405 Orsay Cedex, France
146
Mitrophanov AY, Borodovsky M. Convergence rate estimation for the TKF91 model of biological sequence length evolution. Math Biosci 2007; 209:470-85. [PMID: 17448505 DOI: 10.1016/j.mbs.2007.02.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/06/2006] [Revised: 02/17/2007] [Accepted: 02/23/2007] [Indexed: 10/23/2022]
Abstract
The TKF91 model of biological sequence evolution describes changes in the sequence length via an infinite state-space birth-death process, which we term the TKF91-BD process. The TKF91 model assumes that, for any pair of modern sequences, the ancestral sequence has equilibrium length distribution, an assumption whose validity has not been rigorously investigated. We obtain explicit upper and lower bounds on the rate of convergence to equilibrium for the distribution of the TKF91-BD process. We show that the rate of convergence of the TKF91-BD process for protein sequences with parameter values inferred from sequence data on alpha and beta globins is too low to guarantee convergence to equilibrium on a reasonable timescale. For the analyzed nucleotide sequences, the convergence is faster, but the equilibrium sequence length is unrealistically small. The Jukes-Cantor model of nucleotide substitutions can converge considerably faster than the length evolution model for both amino acid and nucleotide sequences, while the speed of convergence for the Kimura model is close to that for the TKF91-BD process describing nucleotide sequences.
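In the TKF91 length process, a sequence of n residues gains a residue at rate λ(n + 1), because each residue plus the immortal link can insert. It loses a residue at rate μn. For λ < μ, detailed balance gives a geometric equilibrium, π_n = (1 - λ/μ)(λ/μ)^n, with mean λ/(μ - λ). The sketch below computes that distribution and checks detailed balance numerically. The rate values are arbitrary illustrations, not the globin-derived estimates analysed in the paper, and the convergence-rate bounds themselves are not reproduced.

```python
import math


def birth_rate(n: int, lam: float) -> float:
    """Insertion rate from length n: n residues plus the immortal link, each at rate lam."""
    return lam * (n + 1)


def death_rate(n: int, mu: float) -> float:
    """Deletion rate from length n: each residue is deleted at rate mu."""
    return mu * n


def equilibrium_length_distribution(lam: float, mu: float, n_max: int) -> list:
    """Geometric stationary distribution pi_n = (1 - r) * r**n with r = lam/mu, for n = 0..n_max."""
    if not 0 < lam < mu:
        raise ValueError("an equilibrium exists only for 0 < lam < mu")
    r = lam / mu
    return [(1 - r) * r**n for n in range(n_max + 1)]


def mean_equilibrium_length(lam: float, mu: float) -> float:
    """Mean of the geometric equilibrium: lam / (mu - lam)."""
    return lam / (mu - lam)


lam, mu = 0.9, 1.0  # arbitrary illustrative rates
pi = equilibrium_length_distribution(lam, mu, n_max=500)

# Detailed balance: probability flow n -> n+1 equals flow n+1 -> n.
for n in range(10):
    assert math.isclose(pi[n] * birth_rate(n, lam), pi[n + 1] * death_rate(n + 1, mu))

print(round(mean_equilibrium_length(lam, mu), 6))  # 9.0
```

The mean length λ/(μ - λ) depends only on the ratio λ/μ. For rate pairs where λ is not close to μ, the equilibrium length is therefore short.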
147
Anashkina A, Kuznetsov E, Esipova N, Tumanyan V. Comprehensive statistical analysis of residues interaction specificity at protein-protein interfaces. Proteins 2007; 67:1060-77. [PMID: 17357164 DOI: 10.1002/prot.21363] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
We calculated interchain contacts at the atomic level for a nonredundant set of 4602 protein-protein interfaces using an unbiased Voronoi-Delaunay tessellation method, and built 20x20 residue contact matrices for both homodimers and heterocomplexes. The contact areas and the distance distributions of these contacts were calculated at both the residue and the atomic levels. By analyzing the distribution of residue contact areas, we showed that there are two types of inter-residue contacts: stochastic and specific. We also derived formulas describing, in parametric form, the distribution of contact area for stochastic and specific interactions. The maximum pairing preference index was found for Cys-Cys contacts and for interactions between oppositely charged residues. Residue contacts differed significantly between homodimers and heterocomplexes: interfaces in homodimers were enriched in contacts between residues of the same type because of structural symmetry.
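The abstract does not define its pairing preference index. This sketch assumes a common construction. It compares the observed frequency of a residue-type pair in a symmetric contact matrix with the frequency expected if contact partners were chosen independently: P_ab = f_ab / (f_a f_b). Here f_ab is the matrix cell divided by the matrix total, and f_a is the row total divided by the matrix total. Values above 1 indicate enrichment. Each contact is counted in both (a, b) and (b, a), so same-type contacts add 2 to the diagonal. This keeps each row total equal to the number of times that residue type takes part in a contact. The contact list below is hypothetical, and the Voronoi-Delaunay contact detection itself is not shown.

```python
from __future__ import annotations

from collections import defaultdict


def contact_matrix(pairs: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Symmetric residue-type contact matrix; each contact is counted in both (a, b) and (b, a)."""
    m: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for a, b in pairs:
        m[a][b] += 1
        m[b][a] += 1  # for a == b this adds a second count to the diagonal
    return m


def pairing_preference(m: dict[str, dict[str, int]]) -> dict[tuple[str, str], float]:
    """Observed / expected contact frequency for every pair of residue types present."""
    types = sorted(m)
    total = sum(m[a][b] for a in types for b in types)
    marginal = {a: sum(m[a][b] for b in types) / total for a in types}
    return {
        (a, b): (m[a][b] / total) / (marginal[a] * marginal[b])
        for a in types
        for b in types
    }


# Hypothetical interface contacts between residue types.
pairs = [("CYS", "CYS")] * 5 + [("LYS", "ASP")] * 5
pref = pairing_preference(contact_matrix(pairs))
print(pref[("CYS", "CYS")], pref[("ASP", "LYS")])  # 2.0 4.0
```

Taking the logarithm of this ratio gives a log-odds score, another common way to report pairing preferences.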
Affiliation(s)
- Anastasya Anashkina
- Laboratory of bioinformatics and system biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia.
148
Suárez-Castillo EC, García-Arrarás JE. Molecular evolution of the ependymin protein family: a necessary update. BMC Evol Biol 2007; 7:23. [PMID: 17302986 PMCID: PMC1805737 DOI: 10.1186/1471-2148-7-23] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Received: 10/02/2006] [Accepted: 02/15/2007] [Indexed: 11/23/2022]
Abstract
BACKGROUND Ependymin (Epd), the predominant protein in the cerebrospinal fluid of teleost fishes, was originally associated with neuroplasticity and regeneration. Ependymin-related proteins (Epdrs) have been identified in other vertebrates, including amphibians and mammals. Recently, we reported the identification and characterization of an Epdr in echinoderms, showing that there are ependymin family members in non-vertebrate deuterostomes. We have now explored multiple databases to find Epdrs in different metazoan species. Using these sequences, we performed genome mapping, molecular phylogenetic analyses using Maximum Likelihood and Bayesian methods, and statistical tests of tree topologies to ascertain the phylogenetic relationships among ependymin proteins. RESULTS Our results demonstrate that ependymin genes are also present in protostomes. In addition, as a result of the putative fish-specific genome duplication event and subsequent divergence, the ependymin family can be divided into four groups according to their amino acid composition and branching pattern in the gene tree: 1) a brain-specific group of ependymin sequences that is unique to teleost fishes and encompasses the originally described ependymin; 2) a group expressed in non-brain tissue in fishes; 3) a group expressed in several tissues that appears to be deuterostome-specific; and 4) a group found in invertebrate deuterostomes and protostomes, with a broad pattern of expression, that probably represents the evolutionary origin of the ependymins. Using codon-substitution models to statistically assess the selective pressures acting on the ependymin protein family, we found evidence of episodic positive Darwinian selection and relaxed selective constraints in each of the post-duplication branches of the gene tree. However, purifying selection (with among-site variability) appears to be the main influence on the evolution of each subgroup within the family.
Functional divergence among the ependymin paralog groups is well supported and several amino acid positions are predicted to be critical for this divergence. CONCLUSION Ependymin proteins are present in vertebrates, invertebrate deuterostomes, and protostomes. Overall, our analyses suggest that the ependymin protein family is a suitable target to experimentally test subfunctionalization in gene copies that originated after gene or genome duplication events.
Affiliation(s)
- Edna C Suárez-Castillo
- Department of Biology, University of Puerto Rico, Río Piedras Campus, 00931, Puerto Rico
- José E García-Arrarás
- Department of Biology, University of Puerto Rico, Río Piedras Campus, 00931, Puerto Rico
149
te Velthuis AJ, Isogai T, Gerrits L, Bagowski CP. Insights into the molecular evolution of the PDZ/LIM family and identification of a novel conserved protein motif. PLoS One 2007; 2:e189. [PMID: 17285143 PMCID: PMC1781342 DOI: 10.1371/journal.pone.0000189] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Received: 11/10/2006] [Accepted: 01/11/2007] [Indexed: 01/01/2023]
Abstract
The PDZ and LIM domain-containing protein family is encoded by a diverse group of genes whose phylogeny has not yet been analyzed. In mammals, ten genes encode both a PDZ domain and one or more LIM domains: ALP, RIL, Elfin (CLP36), Mystique, Enigma (LMP-1), Enigma homologue (ENH), ZASP (Cypher, Oracle), LMO7, and the two LIM domain kinases (LIMK1 and LIMK2). Because conventional alignment and phylogenetic procedures on full-length sequences fell short of elucidating the evolutionary history of these genes, we analyzed the PDZ and LIM domain sequences themselves. Drawing on most sequenced eukaryotic lineages, our phylogenetic analysis is based on full-length cDNA-derived, EST-derived, and genomic PDZ and LIM domain sequences from over 25 species, ranging from yeast to humans. Plant and protozoan homologs were not found. Our phylogenetic analysis identifies a number of domain duplication and rearrangement events and shows a single convergent event during the evolution of the PDZ/LIM family. Further, we describe the separation of the ALP and Enigma subfamilies in lower vertebrates and identify a novel consensus motif, which we call the ‘ALP-like motif’ (AM). This motif is highly conserved among ALP subfamily proteins of diverse organisms. Here we used a combinatorial approach to define the relationships among the PDZ and LIM domain-encoding genes and to reconstruct their phylogeny. This analysis allowed us to classify the PDZ/LIM family and to suggest a meaningful model for the molecular evolution of the diverse gene architectures found in this multi-domain family.
Affiliation(s)
- Aartjan J.W. te Velthuis
- Department of Molecular and Cellular Biology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Tadamoto Isogai
- Department of Molecular and Cellular Biology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Lieke Gerrits
- Department of Molecular and Cellular Biology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Christoph P. Bagowski
- Department of Integrative Zoology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Department of Molecular and Cellular Biology, Institute of Biology, Leiden University, Leiden, The Netherlands
150
Dulin F, Callebaut I, Colloc'h N, Mornon JP. Sequence-based modeling of Aβ42 soluble oligomers. Biopolymers 2007; 85:422-37. [PMID: 17211889 DOI: 10.1002/bip.20675] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
Abeta fibrils, which are central to the pathology of Alzheimer's disease, form a cross-beta structure that likely contains parallel beta-sheets with a salt bridge between residues Asp23 and Lys28. Recent studies suggest that soluble oligomers of amyloid peptides have neurotoxic effects in cell cultures, raising interest in the structures of these intermediate forms. Here, we present three models of possible soluble Abeta forms. They are based on the sequence similarities, assumed to indicate local structural similarities, between the Abeta peptide and fragments of three proteins (adhesin, Semliki Forest virus capsid protein, and transthyretin). The three models share a similar structure in the C-terminal region, composed of two beta-strands connected by a loop, a segment that contains the Asp23-Lys28 salt bridge. This segment is also structurally well conserved in Abeta fibril forms. Differences between the three monomeric models occur in the N-terminal region and in the C-terminal tail. These three models might sample some of the most stable conformers of the soluble Abeta peptide within oligomeric assemblies, which were modeled here as dimers, trimers, tetramers, and hexamers. The consistency of these models with available experimental and theoretical data is discussed.
Affiliation(s)
- Fabienne Dulin
- Département de Biologie Structurale, IMPMC, CNRS UMR7590, Universités Pierre et Marie Curie-Paris 6 et Denis Diderot-Paris 7, F-75005 France