1
Liu CY, Cheng HP, Lin CP, Liao YT, Ko TP, Lin SJ, Lin SS, Wang HC. Structural insights into the molecular mechanism of phytoplasma immunodominant membrane protein. IUCrJ 2024; 11:384-394. [PMID: 38656311 PMCID: PMC11067747 DOI: 10.1107/s2052252524003075] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/25/2023] [Accepted: 04/09/2024] [Indexed: 04/26/2024]
Abstract
Immunodominant membrane protein (IMP) is a prevalent membrane protein in phytoplasma and has been confirmed to be an F-actin-binding protein. However, the intricate molecular mechanisms that govern the function of IMP require further elucidation. In this study, the X-ray crystallographic structure of IMP was determined and insights into its interaction with plant actin are provided. A comparative analysis with other proteins demonstrates that IMP shares structural homology with talin rod domain-containing protein 1 (TLNRD1), which also functions as an F-actin-binding protein. Subsequent molecular-docking studies of IMP and F-actin reveal that they possess complementary surfaces, suggesting a stable interaction. The low potential energy and high confidence score of the IMP-F-actin binding model indicate stable binding. Additionally, by employing immunoprecipitation and mass spectrometry, it was discovered that IMP serves as an interaction partner for the phytoplasmal effector causing phyllody 1 (PHYL1). It was then shown that both IMP and PHYL1 are highly expressed in the S2 stage of peanut witches' broom phytoplasma-infected Catharanthus roseus. The association between IMP and PHYL1 is substantiated through in vivo immunoprecipitation, an in vitro cross-linking assay and molecular-docking analysis. Collectively, these findings expand the current understanding of IMP interactions and enhance the comprehension of the interaction of IMP with plant F-actin. They also unveil a novel interaction pathway that may influence phytoplasma pathogenicity and host plant responses related to PHYL1. This discovery could pave the way for the development of new strategies to overcome phytoplasma-related plant diseases.
Affiliation(s)
- Chang-Yi Liu
  - The PhD Program for Translational Medicine, College of Medical Science and Technology, Taipei Medical University and Academia Sinica, Taipei, Taiwan
  - Graduate Institute of Translational Medicine, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Han-Pin Cheng
  - Institute of Biotechnology, National Taiwan University, Taipei, Taiwan
  - Department of Plant Pathology and Microbiology, National Taiwan University, Taipei, Taiwan
- Chan-Pin Lin
  - Institute of Biotechnology, National Taiwan University, Taipei, Taiwan
  - Department of Plant Pathology and Microbiology, National Taiwan University, Taipei, Taiwan
- Yi-Ting Liao
  - Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan
- Tzu-Ping Ko
  - Institute of Biological Chemistry, Academia Sinica, Taipei, Taiwan
- Shin-Jen Lin
  - International Center for the Scientific Development of Shrimp Aquaculture, National Cheng Kung University, Tainan, Taiwan
- Shih-Shun Lin
  - Institute of Biotechnology, National Taiwan University, Taipei, Taiwan
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan
  - Center of Biotechnology, National Taiwan University of Science and Technology, Taipei, Taiwan
- Hao-Ching Wang
  - The PhD Program for Translational Medicine, College of Medical Science and Technology, Taipei Medical University and Academia Sinica, Taipei, Taiwan
  - Graduate Institute of Translational Medicine, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
2
Svedberg D, Winiger RR, Berg A, Sharma H, Tellgren-Roth C, Debrunner-Vossbrinck BA, Vossbrinck CR, Barandun J. Functional annotation of a divergent genome using sequence and structure-based similarity. BMC Genomics 2024; 25:6. [PMID: 38166563 PMCID: PMC10759460 DOI: 10.1186/s12864-023-09924-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Accepted: 12/18/2023] [Indexed: 01/04/2024] Open
Abstract
BACKGROUND Microsporidia are a large taxon of intracellular pathogens characterized by extraordinarily streamlined genomes with unusually high sequence divergence and many species-specific adaptations. These unique factors pose challenges for traditional genome annotation methods based on sequence similarity. As a result, many of the microsporidian genomes sequenced to date contain numerous genes of unknown function. Recent innovations in rapid and accurate structure prediction and comparison, together with the growing amount of data in structural databases, provide new opportunities to assist in the functional annotation of newly sequenced genomes. RESULTS In this study, we established a workflow that combines sequence and structure-based functional gene annotation approaches employing a ChimeraX plugin named ANNOTEX (Annotation Extension for ChimeraX), allowing for visual inspection and manual curation. We employed this workflow on a high-quality telomere-to-telomere sequenced tetraploid genome of Vairimorpha necatrix. First, the 3080 predicted protein-coding DNA sequences, of which 89% were confirmed with RNA sequencing data, were used as input. Next, ColabFold was used to create protein structure predictions, followed by a Foldseek search for structural matching to the PDB and AlphaFold databases. The subsequent manual curation, using sequence and structure-based hits, increased the accuracy and quality of the functional genome annotation compared to results using only traditional annotation tools. Our workflow resulted in a comprehensive description of the V. necatrix genome, along with a structural summary of the most prevalent protein groups, such as the ricin B lectin family. In addition, and to test our tool, we identified the functions of several previously uncharacterized Encephalitozoon cuniculi genes. 
CONCLUSION We provide a new functional annotation tool for divergent organisms and employ it on a newly sequenced, high-quality microsporidian genome to shed light on this uncharacterized intracellular pathogen of Lepidoptera. The addition of a structure-based annotation approach can serve as a valuable template for studying other microsporidian or similarly divergent species.
Affiliation(s)
- Dennis Svedberg
  - Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, 90736, Sweden
- Rahel R Winiger
  - Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
- Alexandra Berg
  - Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, 90736, Sweden
- Himanshu Sharma
  - Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
  - Department of Medical Biochemistry and Biophysics, Umeå University, Umeå, 90736, Sweden
- Christian Tellgren-Roth
  - Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Charles R Vossbrinck
  - Department of Environmental Science, Connecticut Agricultural Experiment Station, New Haven, CT, 06504, USA
- Jonas Barandun
  - Department of Molecular Biology, The Laboratory for Molecular Infection Medicine Sweden (MIMS), Science for Life Laboratory, Umeå Centre for Microbial Research (UCMR), Umeå University, Umeå, 90187, Sweden
3
RTA1 Is Involved in Resistance to 7-Aminocholesterol and Secretion of Fungal Proteins in Cryptococcus neoformans. Pathogens 2022; 11:1239. [PMID: 36364991 PMCID: PMC9697666 DOI: 10.3390/pathogens11111239] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/01/2022] [Revised: 10/21/2022] [Accepted: 10/24/2022] [Indexed: 11/30/2022] Open
Abstract
Cryptococcus neoformans (Cn) is a pathogenic yeast that is the leading cause of fungal meningitis in immunocompromised patients. Various Cn virulence factors, such as the enzyme laccase and its product melanin, phospholipase, and capsular polysaccharide have been identified. During a screen of knockout mutants, the gene resistance to aminocholesterol 1 (RTA1) was identified, the function of which is currently unknown in Cn. Rta1 homologs in S. cerevisiae belong to a lipid-translocating exporter family of fungal proteins with transmembrane regions and confer resistance to the antimicrobial agent 7-aminocholesterol when overexpressed. To determine the role of RTA1 in Cn, the knock-out (rta1Δ) and reconstituted (rta1Δ+RTA1) strains were created and phenotypically tested. RTA1 was involved in resistance to 7-aminocholesterol, and also in exocyst complex component 3 (Sec6)-mediated secretion of urease, laccase, and the major capsule component, glucuronoxylomannan (GXM), which coincided with significantly smaller capsules in the rta1Δ and rta1Δ+RTA1 strains compared to the wild-type H99 strain. Furthermore, RTA1 expression was reduced in a secretory 14 mutant (sec14Δ) and increased in an RNAi Sec6 mutant. Transmission electron microscopy demonstrated vesicle accumulation inside the rta1Δ strain, predominantly near the cell membrane. Given that Rta1 is likely to be a transmembrane protein located at the plasma membrane, these data suggest that Rta1 may be involved in both secretion of various fungal virulence factors and resistance to 7-aminocholesterol in Cn.
4
van den Bent I, Makrodimitris S, Reinders M. The Power of Universal Contextualized Protein Embeddings in Cross-species Protein Function Prediction. Evol Bioinform Online 2021; 17:11769343211062608. [PMID: 34880594 PMCID: PMC8647222 DOI: 10.1177/11769343211062608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2021] [Accepted: 11/03/2021] [Indexed: 11/16/2022] Open
Abstract
Computationally annotating proteins with a molecular function is a difficult problem that is made even harder due to the limited amount of available labeled protein training data. Unsupervised protein embeddings partly circumvent this limitation by learning a universal protein representation from many unlabeled sequences. Such embeddings incorporate contextual information of amino acids, thereby modeling the underlying principles of protein sequences insensitive to the context of species. We used an existing pre-trained protein embedding method and subjected its molecular function prediction performance to detailed characterization, first to advance the understanding of protein language models, and second to determine areas of improvement. Then, we applied the model in a transfer learning task by training a function predictor based on the embeddings of annotated protein sequences of one training species and making predictions on the proteins of several test species with varying evolutionary distance. We show that this approach successfully generalizes knowledge about protein function from one eukaryotic species to various other species, outperforming both an alignment-based and a supervised-learning-based baseline. This implies that such a method could be effective for molecular function prediction in inadequately annotated species from understudied taxonomic kingdoms.
Affiliation(s)
- Irene van den Bent
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands
- Stavros Makrodimitris
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands
  - Keygene N.V., Wageningen, the Netherlands
- Marcel Reinders
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands
5
Hassan S, Töpel M, Aronsson H. Ligand Binding Site Comparison - LiBiSCo - a web-based tool for analyzing interactions between proteins and ligands to explore amino acid specificity within active sites. Proteins 2021; 89:1530-1540. [PMID: 34240464 DOI: 10.1002/prot.26175] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/30/2020] [Revised: 06/18/2021] [Accepted: 06/23/2021] [Indexed: 11/12/2022]
Abstract
Interactions between proteins and ligands are ubiquitous in a biological cell, and understanding these interactions at the atom level in protein-ligand complexes is crucial for structural bioinformatics and drug discovery. Here, we present a web-based protein-ligand interaction application named Ligand Binding Site Comparison (LiBiSCo) for comparing the amino acid residues interacting with atoms of a ligand molecule between different protein-ligand complexes available in the Protein Data Bank (PDB) database. The comparison is performed at the ligand atom level, irrespective of whether the protein structures of interest have similar binding sites. The input used in LiBiSCo is one or several PDB IDs of protein-ligand complex(es), and the tool returns a list of identified interactions at the ligand atom level, including both bonded and non-bonded interactions. A sequence profile of the interactions for each ligand atom is provided as a WebLogo. LiBiSCo is useful in understanding ligand binding specificity and structural promiscuity among families that are structurally unrelated. The LiBiSCo tool can be accessed through https://albiorix.bioenv.gu.se/LiBiSCo/HomePage.py.
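The per-ligand-atom comparison described in this abstract can be illustrated with a minimal sketch. It is not the LiBiSCo implementation: the function name `contacts_per_ligand_atom`, the tuple-based atom records, the residue labels and the 4.0 Å distance cutoff are all assumptions chosen for illustration, and a simple distance test stands in for the tool's own detection of bonded and non-bonded interactions.

```python
from collections import defaultdict
from math import dist


def contacts_per_ligand_atom(protein_atoms, ligand_atoms, cutoff=4.0):
    """Map each ligand atom name to the sorted residues that have an atom within `cutoff` angstroms.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples (hypothetical record format)
    ligand_atoms:  iterable of (atom_name, (x, y, z)) tuples (hypothetical record format)
    cutoff:        assumed contact distance, not a value taken from the paper
    """
    contacts = defaultdict(set)
    for atom_name, ligand_xyz in ligand_atoms:
        for residue_id, protein_xyz in protein_atoms:
            if dist(ligand_xyz, protein_xyz) <= cutoff:
                contacts[atom_name].add(residue_id)
    return {name: sorted(residues) for name, residues in contacts.items()}


# Toy coordinates, invented for the example.
protein_atoms = [
    ("SER195", (0.0, 0.0, 0.0)),
    ("HIS57", (3.0, 0.0, 0.0)),
    ("ASP102", (10.0, 0.0, 0.0)),
]
ligand_atoms = [("O1", (1.0, 0.0, 0.0)), ("C7", (9.0, 0.0, 0.0))]
result = contacts_per_ligand_atom(protein_atoms, ligand_atoms)
```

Running the same function on several complexes and collecting the residue lists for one ligand atom name would give the kind of per-atom profile that the abstract says is rendered as a WebLogo.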
Affiliation(s)
- Sameer Hassan
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
  - Karolinska Institutet, Division of Neurogeriatrics, Stockholm, Sweden
- Mats Töpel
  - Department of Marine Science, University of Gothenburg, Gothenburg, Sweden
- Henrik Aronsson
  - Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
6
Lasso G, Honig B, Shapira SD. A Sweep of Earth's Virome Reveals Host-Guided Viral Protein Structural Mimicry and Points to Determinants of Human Disease. Cell Syst 2020; 12:82-91.e3. [PMID: 33053371 PMCID: PMC7552982 DOI: 10.1016/j.cels.2020.09.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 07/24/2020] [Revised: 09/03/2020] [Accepted: 09/18/2020] [Indexed: 12/17/2022]
Abstract
Viruses deploy genetically encoded strategies to coopt host machinery and support viral replicative cycles. Here, we use protein structure similarity to scan for molecular mimicry, manifested by structural similarity between viral and endogenous host proteins, across thousands of cataloged viruses and hosts spanning broad ecological niches and taxonomic range, including bacteria, plants and fungi, invertebrates, and vertebrates. This survey identified over 6,000,000 instances of structural mimicry; more than 70% of viral mimics cannot be discerned through protein sequence alone. We demonstrate that the manner and degree to which viruses exploit molecular mimicry varies by genome size and nucleic acid type and identify 158 human proteins that are mimicked by coronaviruses, providing clues about cellular processes driving pathogenesis. Our observations point to molecular mimicry as a pervasive strategy employed by viruses and indicate that the protein structure space used by a given virus is dictated by the host proteome. A record of this paper's transparent peer review process is included in the Supplemental Information.
Affiliation(s)
- Gorka Lasso
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
  - Department of Microbiology and Immunology, Columbia University Medical Center, New York, NY, USA
  - Department of Microbiology and Immunology, Albert Einstein College of Medicine, New York, NY, USA
- Barry Honig
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University Medical Center, New York, NY, USA
  - Zuckerman Mind Brain Behavior Institute, Columbia University Medical Center, New York, NY, USA
  - Department of Medicine, Columbia University, New York, NY, USA
- Sagi D Shapira
  - Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
  - Department of Microbiology and Immunology, Columbia University Medical Center, New York, NY, USA
7
Mosquera J, García I, Henriksen-Lacey M, Martínez-Calvo M, Dhanjani M, Mascareñas JL, Liz-Marzán LM. Reversible Control of Protein Corona Formation on Gold Nanoparticles Using Host-Guest Interactions. ACS Nano 2020; 14:5382-5391. [PMID: 32105057 PMCID: PMC7254833 DOI: 10.1021/acsnano.9b08752] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 11/05/2019] [Accepted: 02/27/2020] [Indexed: 05/18/2023]
Abstract
When nanoparticles (NPs) are exposed to biological media, proteins are adsorbed, forming a so-called protein corona (PC). This cloud of protein aggregates hampers the targeting and transport capabilities of the NPs, thereby compromising their biomedical applications. Therefore, there is high interest in the development of technologies that allow control over PC formation, as this would provide a handle to manipulate NPs in biological fluids. We present a strategy that enables the reversible disruption of the PC using external stimuli, thereby allowing precise regulation of NP cellular uptake. The approach, demonstrated for gold nanoparticles (AuNPs), is based on bioorthogonal, supramolecular host-guest interactions between an anionic dye bound to the AuNP surface and a positively charged macromolecular cage. This supramolecular complex effectively behaves as a zwitterionic NP ligand, which is able not only to prevent PC formation but also to disrupt a previously formed hard corona. With this supramolecular stimulus, the cellular internalization of AuNPs can be enhanced by up to 30-fold in some cases, and even NP cellular uptake in phagocytic cells can be regulated. Additionally, we demonstrate that the conditional cell uptake of purposely designed gold nanorods can be used to selectively enhance photothermal cell death.
Affiliation(s)
- Jesús Mosquera
  - CIC biomaGUNE, Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014 Donostia-San Sebastián, Spain
- Isabel García
  - CIC biomaGUNE, Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014 Donostia-San Sebastián, Spain
  - CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), 20014 Donostia-San Sebastián, Spain
- Malou Henriksen-Lacey
  - CIC biomaGUNE, Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014 Donostia-San Sebastián, Spain
  - CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), 20014 Donostia-San Sebastián, Spain
- Miguel Martínez-Calvo
  - Departamento de Química Orgánica and Centro Singular de Investigación en Química Biolóxica e Materiais Moleculares (CIQUS), Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Mónica Dhanjani
  - CIC biomaGUNE, Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014 Donostia-San Sebastián, Spain
- José L. Mascareñas
  - Departamento de Química Orgánica and Centro Singular de Investigación en Química Biolóxica e Materiais Moleculares (CIQUS), Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Spain
- Luis M. Liz-Marzán
  - CIC biomaGUNE, Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014 Donostia-San Sebastián, Spain
  - CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), 20014 Donostia-San Sebastián, Spain
  - Ikerbasque, Basque Foundation for Science, 48013 Bilbao, Spain
8
Pseudo-Symmetric Assembly of Protodomains as a Common Denominator in the Evolution of Polytopic Helical Membrane Proteins. J Mol Evol 2020; 88:319-344. [PMID: 32189026 PMCID: PMC7162841 DOI: 10.1007/s00239-020-09934-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/26/2019] [Accepted: 02/16/2020] [Indexed: 11/05/2022]
Abstract
The polytopic helical membrane proteome is dominated by proteins containing seven transmembrane helices (7TMHs). They cannot be grouped under a monolithic fold or superfold. However, a parallel structural analysis of folds around that magic number of seven in distinct protein superfamilies (SWEET, PnuC, TRIC, FocA, Aquaporin, GPCRs) reveals a common homology, not in their structural fold, but in their systematic pseudo-symmetric construction during their evolution. Our analysis leads to guiding principles of intragenic duplication and pseudo-symmetric assembly of ancestral transmembrane helical protodomains, consisting of 3 (or 4) helices. A parallel deconstruction and reconstruction of these domains provides a structural and mechanistic framework for their evolutionary paths. It highlights the conformational plasticity inherent to fold formation itself, the role of structural as well as functional constraints in shaping that fold, and the usefulness of protodomains as a tool to probe convergent vs divergent evolution. In the case of FocA vs. Aquaporin, this protodomain analysis sheds new light on their potential divergent evolution at the protodomain level followed by duplication and parallel evolution of the two folds. GPCR domains, whose function does not seem to require symmetry, nevertheless exhibit structural pseudo-symmetry. Their construction follows the same protodomain assembly as any other pseudo-symmetric protein suggesting their potential evolutionary origins. Interestingly, all the 6/7/8TMH pseudo-symmetric folds in this study also assemble as oligomeric forms in the membrane, emphasizing the role of symmetry in evolution, revealing self-assembly and co-evolution not only at the protodomain level but also at the domain level.
9
Chakravarty D, McElfresh GW, Kundrotas PJ, Vakser IA. How to choose templates for modeling of protein complexes: Insights from benchmarking template-based docking. Proteins 2020; 88:1070-1081. [PMID: 31994759 DOI: 10.1002/prot.25875] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/18/2019] [Revised: 01/07/2020] [Accepted: 01/22/2020] [Indexed: 01/01/2023]
Abstract
Comparative docking is based on experimentally determined structures of protein-protein complexes (templates), following the paradigm that proteins with similar sequences and/or structures form similar complexes. Modeling utilizing structure similarity of target monomers to template complexes significantly expands structural coverage of the interactome. Template-based docking by structure alignment can be performed for the entire structures or by aligning targets to the bound interfaces of the experimentally determined complexes. Systematic benchmarking of docking protocols based on full and interface structure alignment showed that both protocols perform similarly, with top 1 docking success rate 26%. However, in terms of the models' quality, the interface-based docking performed marginally better. The interface-based docking is preferable when one would suspect a significant conformational change in the full protein structure upon binding, for example, a rearrangement of the domains in multidomain proteins. Importantly, if the same structure is selected as the top template by both full and interface alignment, the docking success rate increases 2-fold for both top 1 and top 10 predictions. Matching structural annotations of the target and template proteins for template detection, as a computationally less expensive alternative to structural alignment, did not improve the docking performance. Sophisticated remote sequence homology detection added templates to the pool of those identified by structure-based alignment, suggesting that for practical docking, the combination of the structure alignment protocols and the remote sequence homology detection may be useful in order to avoid potential flaws in generation of the structural templates library.
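The abstract reports that docking success roughly doubles when full-structure and interface alignment select the same top template. A minimal sketch of how that agreement could be used as a confidence flag follows. The function name `consensus_template`, the ranked-list inputs and the template IDs are hypothetical, and falling back to the interface-based choice on disagreement is an assumption loosely motivated by the abstract's remark that interface-based models were marginally better; it is not a rule stated by the authors.

```python
def consensus_template(full_ranked, interface_ranked):
    """Return (template_id, agrees) from two ranked template lists.

    full_ranked:      template IDs ranked by full-structure alignment (best first)
    interface_ranked: template IDs ranked by interface alignment (best first)
    agrees is True when both protocols pick the same top template.
    """
    if not full_ranked or not interface_ranked:
        raise ValueError("both ranked lists must contain at least one template")
    top_full, top_interface = full_ranked[0], interface_ranked[0]
    if top_full == top_interface:
        return top_full, True
    # Assumed fallback: prefer the interface-based pick when the protocols disagree.
    return top_interface, False


# Hypothetical template IDs, invented for the example.
agreeing = consensus_template(["1abc", "2xyz"], ["1abc", "3def"])
disagreeing = consensus_template(["1abc", "2xyz"], ["3def", "1abc"])
```

The boolean is the useful part: a caller might treat models built from agreeing templates as higher-confidence predictions and send the rest for closer inspection.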
Affiliation(s)
- G W McElfresh
  - Computational Biology Program, The University of Kansas, Lawrence, Kansas
- Petras J Kundrotas
  - Computational Biology Program, The University of Kansas, Lawrence, Kansas
- Ilya A Vakser
  - Computational Biology Program, The University of Kansas, Lawrence, Kansas
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas
10
Bauer TL, Buchholz PCF, Pleiss J. The modular structure of α/β-hydrolases. FEBS J 2019; 287:1035-1053. [PMID: 31545554 DOI: 10.1111/febs.15071] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Received: 06/25/2019] [Revised: 08/15/2019] [Accepted: 09/19/2019] [Indexed: 12/22/2022]
Abstract
The α/β-hydrolase fold family is highly diverse in sequence, structure and biochemical function. To investigate the sequence-structure-function relationships, the Lipase Engineering Database (https://led.biocatnet.de) was updated. Overall, 280 638 protein sequences and 1557 protein structures were analysed. All α/β-hydrolases consist of the catalytically active core domain, but they might also contain additional structural modules, resulting in 12 different architectures: core domain only, additional lids at three different positions, three different caps, additional N- or C-terminal domains and combinations of N- and C-terminal domains with caps and lids respectively. In addition, the α/β-hydrolases were distinguished by their oxyanion hole signature (GX-, GGGX- and Y-types). The N-terminal domains show two different folds, the Rossmann fold or the β-propeller fold. The C-terminal domains show a β-sandwich fold. The N-terminal β-propeller domain and the C-terminal β-sandwich domain are structurally similar to carbohydrate-binding proteins such as lectins. The classification was applied to the newly discovered polyethylene terephthalate (PET)-degrading PETases and MHETases, which are core domain α/β-hydrolases of the GX- and the GGGX-type respectively. To investigate evolutionary relationships, sequence networks were analysed. The degree distribution followed a power law with a scaling exponent γ = 1.4, indicating a highly inhomogeneous network which consists of a few hubs and a large number of less connected sequences. The hub sequences have many functional neighbours and therefore are expected to be robust toward possible deleterious effects of mutations. The cluster size distribution followed a power law with an extrapolated scaling exponent τ = 2.6, which strongly supports the connectedness of the sequence space of α/β-hydrolases. 
DATABASE: Supporting data about domains from other proteins with structural similarity to the N- or C-terminal domains of α/β-hydrolases are available in Data Repository of the University of Stuttgart (DaRUS) under doi: https://doi.org/10.18419/darus-458.
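The two scaling exponents quoted in this abstract can be written out explicitly. The abstract gives only the values of γ and τ; the functional forms below follow the usual convention for power-law distributions in network analysis and are an assumption here, as are the symbols P(k) for the fraction of sequences with k neighbours and n(s) for the number of clusters of size s.

```latex
% Degree distribution of the sequence network (assumed conventional form)
P(k) \propto k^{-\gamma}, \qquad \gamma = 1.4

% Cluster size distribution (assumed conventional form)
n(s) \propto s^{-\tau}, \qquad \tau = 2.6
```

Under this reading, a small exponent such as γ = 1.4 gives a heavy-tailed degree distribution, which is consistent with the abstract's description of a few highly connected hubs alongside many sparsely connected sequences.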
Affiliation(s)
- Tabea L Bauer
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Germany
- Patrick C F Buchholz
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Germany
- Jürgen Pleiss
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Germany
11
Youkharibache P. Protodomains: Symmetry-Related Supersecondary Structures in Proteins and Self-Complementarity. Methods Mol Biol 2019; 1958:187-219. [PMID: 30945220 PMCID: PMC8323591 DOI: 10.1007/978-1-4939-9161-7_10] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/17/2022]
Abstract
We will consider in this chapter supersecondary structures (SSS) as a set of secondary structure elements (SSEs) found in protein domains. Some SSS arrangements/topologies have been consistently observed within known tertiary structural domains. We use them in the context of repeating supersecondary structures that self-assemble in a symmetric arrangement to form a domain. We call them protodomains (or protofolds). Protodomains are some of the most interesting and insightful SSSs. Within a given 3D protein domain/fold, recognizing such sets may give insights into a possible evolutionary process of duplication, fusion, and coevolution of these protodomains, pointing to possible original protogenes. On protein folding itself, pseudosymmetric domains may point to a "directed" assembly of pseudosymmetric protodomains, directed by the sole fact that they are tethered together in a protein chain. On function, tertiary functional sites often occur at protodomain interfaces, as they often occur at domain-domain interfaces in quaternary arrangements. First, we will briefly review some lessons learned from a previously published census of pseudosymmetry in protein domains (Myers-Turnbull, D. et al., J Mol Biol. 426:2255-2268, 2014) to introduce protodomains/protofolds. We will observe that the most abundant and diversified folds, or superfolds, in the currently known protein structure universe are indeed pseudosymmetric. Then, we will learn by example: we select a few domain representatives of important pseudosymmetric folds, chief among them the immunoglobulin (Ig) fold, and go over a pseudosymmetry supersecondary structure (protodomain) analysis in tertiary and quaternary structures. We will point to currently available software tools that help in identifying pseudosymmetry and delineating protodomains, and see how the study of pseudosymmetry and the underlying supersecondary structures can enrich a structural analysis. This should potentially help in protein engineering, especially in the development of biologics and immunoengineering.
12
Structure-based prediction of ligand-protein interactions on a genome-wide scale. Proc Natl Acad Sci U S A 2017; 114:13685-13690. [PMID: 29229851 DOI: 10.1073/pnas.1705381114] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Indexed: 11/18/2022] Open
Abstract
We report a template-based method, LT-scanner, which scans the human proteome using protein structural alignment to identify proteins that are likely to bind ligands that are present in experimentally determined complexes. A scoring function that rapidly accounts for binding site similarities between the template and the proteins being scanned is a crucial feature of the method. The overall approach is first tested based on its ability to predict the residues on the surface of a protein that are likely to bind small-molecule ligands. The algorithm that we present, LBias, is shown to compare very favorably to existing algorithms for binding site residue prediction. LT-scanner's performance is evaluated based on its ability to identify known targets of Food and Drug Administration (FDA)-approved drugs and it too proves to be highly effective. The specificity of the scoring function that we use is demonstrated by the ability of LT-scanner to identify the known targets of FDA-approved kinase inhibitors based on templates involving other kinases. Combining sequence with structural information further improves LT-scanner performance. The approach we describe is extendable to the more general problem of identifying binding partners of known ligands even if they do not appear in a structurally determined complex, although this will require the integration of methods that combine protein structure and chemical compound databases.
13
Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths. Proc Natl Acad Sci U S A 2017; 114:11703-11708. [PMID: 29078314 PMCID: PMC5676897 DOI: 10.1073/pnas.1707642114] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Indexed: 01/13/2023]
Abstract
We question a central paradigm: namely, that the protein domain is the “atomic unit” of evolution. In conflict with the current textbook view, our results unequivocally show that duplication of protein segments happens both above and below the domain level among amino acid segments of diverse lengths. Indeed, we show that significant evolutionary information is lost when the protein is approached as a string of domains. Our finer-grained approach reveals a far more complicated picture, where reused segments often intertwine and overlap with each other. Our results are consistent with a recursive model of evolution, in which segments of various lengths, typically smaller than domains, “hop” between environments. The fit segments remain, leaving traces that can still be detected. Proteins share similar segments with one another. Such “reused parts”—which have been successfully incorporated into other proteins—are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment “reuse” across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for “themes”—segments of at least 35 residues of similar sequence and structure—reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. 
The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.
14
Das S, Bhadra P, Ramakumar S, Pal D. Molecular Dynamics Information Improves cis-Peptide-Based Function Annotation of Proteins. J Proteome Res 2017. [PMID: 28633522 DOI: 10.1021/acs.jproteome.7b00217] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
cis-Peptide bonds, whose occurrence in proteins is rare but evolutionarily conserved, are implicated to play an important role in protein function. This has led to their previous use in a homology-independent, fragment-match-based protein function annotation method. However, proteins are not static molecules; dynamics is integral to their activity. This is nicely epitomized by the geometric isomerization of cis-peptide to trans form for molecular activity. Hence we have incorporated both static (cis-peptide) and dynamics information to improve the prediction of protein molecular function. Our results show that cis-peptide information alone cannot detect functional matches in cases where cis-trans isomerization exists but 3D coordinates have been obtained for only the trans isomer or when the cis-peptide bond is incorrectly assigned as trans. On the contrary, use of dynamics information alone includes false-positive matches for cases where fragments with similar secondary structure show similar dynamics, but the proteins do not share a common function. Combining the two methods reduces errors while detecting the true matches, thereby enhancing the utility of our method in function annotation. A combined approach, therefore, opens up new avenues of improving existing automated function annotation methodologies.
Affiliation(s)
- Sreetama Das
  - Department of Physics and Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
- Pratiti Bhadra
  - Department of Physics and Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
- Suryanarayanarao Ramakumar
  - Department of Physics and Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
- Debnath Pal
  - Department of Physics and Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
15
Mudgal R, Srinivasan N, Chandra N. Resolving protein structure-function-binding site relationships from a binding site similarity network perspective. Proteins 2017; 85:1319-1335. [PMID: 28342236 DOI: 10.1002/prot.25293] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/24/2016] [Revised: 03/18/2017] [Accepted: 03/20/2017] [Indexed: 11/05/2022]
Abstract
Functional annotation is seldom straightforward, with complexities arising due to functional divergence in protein families or functional convergence between non-homologous protein families, leading to mis-annotations. An enzyme may contain multiple domains and not all domains may be involved in a given function, adding to the complexity in function annotation. To address this, we use binding site information from bound cognate ligands and catalytic residues, since it can help in resolving fold-function relationships at a finer level and with higher confidence. A comprehensive database of 2,020 fold-function-binding site relationships has been systematically generated. A network-based approach is employed to capture the complexity in these relationships, from which different types of associations are deciphered that identify versatile protein folds performing diverse functions, the same function associated with multiple folds, and one-to-one relationships. Binding site similarity networks integrated with fold, function, and ligand similarity information are generated to understand the depth of these relationships. Apart from the observed continuity in the functional site space, network properties of these revealed versatile families with topologically different or dissimilar binding sites and structural families that perform very similar functions. As a case study, subtle changes in the active site of a set of evolutionarily related superfamilies are studied using these networks. Tracing of such similarities in evolutionarily related proteins provides clues into the transition and evolution of protein functions. Insights from this study will be helpful in accurate and reliable functional annotations of uncharacterized proteins, poly-pharmacology, and designing enzymes with new functional capabilities. Proteins 2017; 85:1319-1335. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Richa Mudgal
  - IISc Mathematics Initiative, Indian Institute of Science, Bangalore, Karnataka, 560 012, India
- Nagasuma Chandra
  - Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka, 560 012, India
16
Dybas JM, Fiser A. Development of a motif-based topology-independent structure comparison method to identify evolutionarily related folds. Proteins 2016; 84:1859-1874. [PMID: 27671894 PMCID: PMC5118133 DOI: 10.1002/prot.25169] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/31/2016] [Revised: 08/17/2016] [Accepted: 08/25/2016] [Indexed: 11/09/2022]
Abstract
Structure conservation, functional similarities, and homologous relationships that exist across diverse protein topologies suggest that some regions of the protein fold universe are continuous. However, the current structure classification systems are based on hierarchical organizations, which cannot accommodate structural relationships that span fold definitions. Here, we describe a novel, super-secondary-structure motif-based, topology-independent structure comparison method (SmotifCOMP) that is able to quantitatively identify structural relationships between disparate topologies. The basis of SmotifCOMP is a systematically defined super-secondary-structure motif library whose representative geometries are shown to be saturated in the Protein Data Bank and exhibit a unique distribution within the known folds. SmotifCOMP offers a robust and quantitative technique to compare domains that adopt different topologies since the method does not rely on a global superposition. SmotifCOMP is used to perform an exhaustive comparison of the known folds and the identified relationships are used to produce a nonhierarchical representation of the fold space that reflects the notion of a continuous and connected fold universe. The current work offers insight into previously hypothesized evolutionary relationships between disparate folds and provides a resource for exploring novel ones. Proteins 2016; 84:1859-1874. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Joseph M. Dybas
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
  - Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Andras Fiser
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
  - Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
17
Molecular mechanisms involved in the side effects of fatty acid amide hydrolase inhibitors: a structural phenomics approach to proteome-wide cellular off-target deconvolution and disease association. NPJ Syst Biol Appl 2016; 2:16023. [PMID: 28725477 PMCID: PMC5516858 DOI: 10.1038/npjsba.2016.23] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/26/2016] [Revised: 07/14/2016] [Accepted: 08/02/2016] [Indexed: 01/20/2023]
Abstract
Fatty acid amide hydrolase (FAAH) is a promising therapeutic target for the treatment of pain and CNS disorders. However, the development of potent and safe FAAH inhibitors is hindered by their off-target mediated side effect that leads to brain cell death. Its physiological off-targets and their associations with phenotypes may not be characterized using existing experimental and computational techniques as these methods fail to have sufficient proteome coverage and/or ignore native biological assemblies (BAs; i.e., protein quaternary structures). To understand the mechanisms of the side effects from FAAH inhibitors and other drugs, we develop a novel structural phenomics approach to identifying the physiological off-targets binding profile in the cellular context and on a structural proteome scale, and investigate the roles of these off-targets in impacting human physiology and pathology using text mining-based phenomics analysis. Using this integrative approach, we discover that FAAH inhibitors may bind to the dimerization interface of NMDA receptor (NMDAR) and several other BAs, and thus disrupt their cellular functions. Specifically, the malfunction of the NMDAR is associated with a wide spectrum of brain disorders that are directly related to the observed side effects of FAAH inhibitors. This finding is consistent with the existing literature, and provides testable hypotheses for investigating the molecular origin of the side effects of FAAH inhibitors. Thus, the in silico method proposed here, which can for the first time predict proteome-wide drug interactions with cellular BAs and link BA–ligand interaction with clinical outcomes, can be valuable in off-target screening. The development and application of such methods will accelerate the development of more safe and effective therapeutics.
18
Esmaielbeiki R, Krawczyk K, Knapp B, Nebel JC, Deane CM. Progress and challenges in predicting protein interfaces. Brief Bioinform 2016; 17:117-31. [PMID: 25971595 PMCID: PMC4719070 DOI: 10.1093/bib/bbv027] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 01/29/2015] [Revised: 03/18/2015] [Indexed: 12/31/2022]
Abstract
The majority of biological processes are mediated via protein-protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming and thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field.
19
Structure of a herpesvirus nuclear egress complex subunit reveals an interaction groove that is essential for viral replication. Proc Natl Acad Sci U S A 2015; 112:9010-5. [PMID: 26150520 DOI: 10.1073/pnas.1511140112] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 12/26/2022]
Abstract
Herpesviruses require a nuclear egress complex (NEC) for efficient transit of nucleocapsids from the nucleus to the cytoplasm. The NEC orchestrates multiple steps during herpesvirus nuclear egress, including disruption of nuclear lamina and particle budding through the inner nuclear membrane. In the important human pathogen human cytomegalovirus (HCMV), this complex consists of nuclear membrane protein UL50, and nucleoplasmic protein UL53, which is recruited to the nuclear membrane through its interaction with UL50. Here, we present an NMR-determined solution-state structure of the murine CMV homolog of UL50 (M50; residues 1-168) with a strikingly intricate protein fold that is matched by no other known protein folds in its entirety. Using NMR methods, we mapped the interaction of M50 with a highly conserved UL53-derived peptide, corresponding to a segment that is required for heterodimerization. The UL53 peptide binding site mapped onto an M50 surface groove, which harbors a large cavity. Point mutations of UL50 residues corresponding to surface residues in the characterized M50 heterodimerization interface substantially decreased UL50-UL53 binding in vitro, eliminated UL50-UL53 colocalization, prevented disruption of nuclear lamina, and halted productive virus replication in HCMV-infected cells. Our results provide detailed structural information on a key protein-protein interaction involved in nuclear egress and suggest that NEC subunit interactions can be an attractive drug target.
20
Lhota J, Hauptman R, Hart T, Ng C, Xie L. A new method to improve network topological similarity search: applied to fold recognition. Bioinformatics 2015; 31:2106-14. [PMID: 25717198 DOI: 10.1093/bioinformatics/btv125] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/08/2014] [Accepted: 02/21/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of the similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may not be able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework, Enrichment of Network Topological Similarity (ENTS), to improve the performance of large scale similarity searches in bioinformatics. RESULTS: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. AVAILABILITY AND IMPLEMENTATION: Source code is freely available upon request. CONTACT: lxie@iscb.org.
Affiliation(s)
- John Lhota
  - Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Ruth Hauptman
  - Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Thomas Hart
  - Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Clara Ng
  - Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
- Lei Xie
  - Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York, New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A.
21
Aumentado-Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol Biol 2015; 10:7. [PMID: 25713596 PMCID: PMC4338852 DOI: 10.1186/s13015-015-0033-9] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 08/23/2014] [Accepted: 01/07/2015] [Indexed: 12/19/2022]
Abstract
Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
22
Das S, Ramakumar S, Pal D. Identifying functionally important cis-peptide containing segments in proteins and their utility in molecular function annotation. FEBS J 2014; 281:5602-21. [PMID: 25291238 DOI: 10.1111/febs.13100] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/06/2013] [Revised: 09/21/2014] [Accepted: 10/03/2014] [Indexed: 01/09/2023]
Abstract
Cis-peptide embedded segments are rare in proteins but often highlight their important role in molecular function when they do occur. The high evolutionary conservation of these segments illustrates this observation almost universally, although no attempt has been made to systematically use this information for the purpose of function annotation. In the present study, we demonstrate how geometric clustering and level-specific Gene Ontology molecular-function terms (also known as annotations) can be used in a statistically significant manner to identify cis-embedded segments in a protein linked to its molecular function. The present study identifies novel cis-peptide fragments, which are subsequently used for fragment-based function annotation. Annotation recall benchmarks interpreted using the receiver-operator characteristic plot returned an area-under-curve > 0.9, corroborating the utility of the annotation method. In addition, we identified cis-peptide fragments occurring in conjunction with functionally important trans-peptide fragments, providing additional insights into molecular function. We further illustrate the applicability of our method in function annotation where homology-based annotation transfer is not possible. The findings of the present study add to the repertoire of function annotation approaches and also facilitate engineering, design and allied studies around the cis-peptide neighborhood of proteins.
Affiliation(s)
- Sreetama Das
  - Department of Physics, Indian Institute of Science, Bangalore, India
23
Skolnick J, Gao M, Zhou H. On the role of physics and evolution in dictating protein structure and function. Isr J Chem 2014; 54:1176-1188. [PMID: 25484448 PMCID: PMC4255337 DOI: 10.1002/ijch.201400013] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 12/17/2022]
Abstract
How many of the structural and functional properties of proteins are inherent? Computer simulations provide a powerful tool to address this question. A series of studies on QS, quasi-spherical, compact polypeptides which lack any secondary structure; ART, artificial, proteins comprised of compact homopolypeptides with protein-like secondary structure; and PDB, native, single domain proteins shows that essentially all native global folds, pockets and protein-protein interfaces are in the ART library. This suggests that many protein properties are inherent and that evolution is involved in fine-tuning. The completeness of the space of ligand binding pockets and protein-protein interfaces suggests that promiscuous interactions are intrinsic to proteins and that the capacity to perform the biochemistry of life at low level does not require evolution. If so, this has profound consequences for the origin of life.
Affiliation(s)
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
- Mu Gao
  - Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
24
Sheng R, Kim H, Lee H, Xin Y, Chen Y, Tian W, Cui Y, Choi JC, Doh J, Han JK, Cho W. Cholesterol selectively activates canonical Wnt signalling over non-canonical Wnt signalling. Nat Commun 2014; 5:4393. [PMID: 25024088 PMCID: PMC4100210 DOI: 10.1038/ncomms5393] [Citation(s) in RCA: 118] [Impact Index Per Article: 10.7] [Received: 02/06/2014] [Accepted: 06/13/2014] [Indexed: 12/19/2022]
Abstract
Wnt proteins control diverse biological processes through β-catenin-dependent canonical signalling and β-catenin-independent non-canonical signalling. The mechanisms by which these signalling pathways are differentially triggered and controlled are not fully understood. Dishevelled (Dvl) is a scaffold protein that serves as the branch point of these pathways. Here, we show that cholesterol selectively activates canonical Wnt signalling over non-canonical signalling under physiological conditions by specifically facilitating the membrane recruitment of the PDZ domain of Dvl and its interaction with other proteins. Single-molecule imaging analysis shows that cholesterol is enriched around the Wnt-activated Frizzled and low-density lipoprotein receptor-related protein 5/6 receptors and plays an essential role for Dvl-mediated formation and maintenance of the canonical Wnt signalling complex. Collectively, our results suggest a new regulatory role of cholesterol in Wnt signalling and a potential link between cellular cholesterol levels and the balance between canonical and non-canonical Wnt signalling activities.
Affiliation(s)
- Ren Sheng
  - Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA
- Yao Xin
  - Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA
- Yong Chen
  - Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA
- Wen Tian
  - Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA
- Yang Cui
  - Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA
- Jong-Cheol Choi
  - Mechanical Engineering, Pohang University of Science and Technology, Pohang, 790-784, Korea
- Junsang Doh
  - Mechanical Engineering, Pohang University of Science and Technology, Pohang, 790-784, Korea
- Wonhwa Cho
  - Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60607, USA
25
Zhang D, Iyer LM, Burroughs AM, Aravind L. Resilience of biochemical activity in protein domains in the face of structural divergence. Curr Opin Struct Biol 2014; 26:92-103. [PMID: 24952217 DOI: 10.1016/j.sbi.2014.05.008] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 04/15/2014] [Accepted: 05/20/2014] [Indexed: 01/07/2023]
Abstract
Recent studies point to the prevalence of the evolutionary phenomenon of drastic structural transformation of protein domains while continuing to preserve their basic biochemical function. These transformations span a wide spectrum, including simple domains incorporated into larger structural scaffolds, changes in the structural core, major active site shifts, topological rewiring and extensive structural transmogrifications. Proteins from biological conflict systems, such as toxin-antitoxin, restriction-modification, CRISPR/Cas, polymorphic toxin and secondary metabolism systems commonly display such transformations. These include endoDNases, metal-independent RNases, deaminases, ADP ribosyltransferases, immunity proteins, kinases and E1-like enzymes. In eukaryotes such transformations are seen in domains involved in chromatin-related peptide recognition and protein/DNA-modification. Intense selective pressures from 'arms-race'-like situations in conflict and macromolecular modification systems could favor drastic structural divergence while preserving function.
Affiliation(s)
- Dapeng Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Lakshminarayan M Iyer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- A Maxwell Burroughs
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
26
|
Pulavarti SVSRK, Eletsky A, Lee HW, Acton TB, Xiao R, Everett JK, Prestegard JH, Montelione GT, Szyperski T. Solution NMR structure of CD1104B from pathogenic Clostridium difficile reveals a distinct α-helical architecture and provides first structural representative of protein domain family PF14203. J Struct Funct Genomics 2013; 14:155-160. [PMID: 24048810 PMCID: PMC3844015 DOI: 10.1007/s10969-013-9164-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/03/2013] [Accepted: 09/10/2013] [Indexed: 05/30/2023]
Abstract
A high-quality structure of the 68-residue protein CD1104B from Clostridium difficile strain 630 exhibits a distinct all α-helical fold. The structure presented here is the first representative of bacterial protein domain family PF14203 (currently 180 members) of unknown function (DUF4319) and reveals that the side-chains of the only two strictly conserved residues (Glu 8 and Lys 48) form a salt bridge. Moreover, these two residues are located in the vicinity of the largest surface cleft which is predicted to contribute to a surface area involved in protein-protein interactions. This, along with its coding in transposon CTn4, suggests that CD1104B (and very likely all members of Pfam 14203) functions by interacting with other proteins required for the transfer of transposons between different bacterial species.
Affiliation(s)
- Surya VSRK Pulavarti
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
- Alexander Eletsky
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
- Hsiau-Wei Lee
- Complex Carbohydrate Research Center, University of Georgia, and Northeast Structural Genomics Consortium, Athens, GA 30602, USA
- Thomas B. Acton
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
- Rong Xiao
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
- John K. Everett
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
- James H. Prestegard
- Complex Carbohydrate Research Center, University of Georgia, and Northeast Structural Genomics Consortium, Athens, GA 30602, USA
- Gaetano T. Montelione
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, UMDNJ, Piscataway NJ 08854, USA
- Thomas Szyperski
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
|
27
|
Shazman S, Lee H, Socol Y, Mann RS, Honig B. OnTheFly: a database of Drosophila melanogaster transcription factors and their binding sites. Nucleic Acids Res 2013; 42:D167-71. [PMID: 24271386 PMCID: PMC3965123 DOI: 10.1093/nar/gkt1165] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 01/11/2023]
Abstract
We present OnTheFly (http://bhapp.c2b2.columbia.edu/OnTheFly/index.php), a database comprising a systematic collection of transcription factors (TFs) of Drosophila melanogaster and their DNA-binding sites. TFs predicted in the Drosophila melanogaster genome are annotated and classified and their structures, obtained via experiment or homology models, are provided. All known preferred TF DNA-binding sites obtained from the B1H, DNase I and SELEX methodologies are presented. DNA shape parameters predicted for these sites are obtained from a high throughput server or from crystal structures of protein–DNA complexes where available. An important feature of the database is that all DNA-binding domains and their binding sites are fully annotated in a eukaryote using structural criteria and evolutionary homology. OnTheFly thus provides a comprehensive view of TFs and their binding sites that will be a valuable resource for deciphering non-coding regulatory DNA.
Affiliation(s)
- Shula Shazman
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Department of Systems Biology, Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Avenue, New York, NY 10032, USA, Department of Life Science, Open University of Israel, Ra'anana 43107, Israel and Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West 168th Street, HHSC 1104, New York, NY 10032, USA
|
28
|
Eberhardt RY, Chang Y, Bateman A, Murzin AG, Axelrod HL, Hwang WC, Aravind L. Filling out the structural map of the NTF2-like superfamily. BMC Bioinformatics 2013; 14:327. [PMID: 24246060 PMCID: PMC3924330 DOI: 10.1186/1471-2105-14-327] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 07/11/2013] [Accepted: 11/15/2013] [Indexed: 12/03/2022]
Abstract
Background: The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The sequences of these domains are very diverse and they share no common sequence motif. These domains serve a range of different functions within the proteins in which they are found, including both catalytic and non-catalytic versions. Clues to the function of protein domains belonging to such a diverse superfamily can be gleaned from analysis of the proteins and organisms in which they are found. Results: Here we describe three protein domains of unknown function found mainly in bacteria: DUF3828, DUF3887 and DUF4878. Structures of representatives of each of these domains, BT_3511 from Bacteroides thetaiotaomicron (strain VPI-5482) [PDB:3KZT], Cj0202c from Campylobacter jejuni subsp. jejuni serotype O:2 (strain NCTC 11168) [PDB:3K7C] and RUMGNA_01855 from Ruminococcus gnavus (strain ATCC 29149) [PDB:4HYZ], have been solved by X-ray crystallography. All three domains are similar in structure and all belong to the NTF2-like superfamily. Although the function of these domains remains unknown at present, our analysis enables us to present a hypothesis concerning their role. Conclusions: Our analysis of these three protein domains suggests a potential non-catalytic ligand-binding role. This may regulate the activities of domains with which they are combined in the same polypeptide or via operonic linkages, such as signaling domains (e.g. serine/threonine protein kinase), peptidoglycan-processing hydrolases (e.g. NlpC/P60 peptidases) or nucleic acid binding domains (e.g. Zn-ribbons).
Affiliation(s)
- Ruth Y Eberhardt
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK.
|
29
|
Caprari S, Toti D, Viet Hung L, Di Stefano M, Polticelli F. ASSIST: a fast versatile local structural comparison tool. Bioinformatics 2013; 30:1022-4. [DOI: 10.1093/bioinformatics/btt664] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
|
30
|
Prediction and experimental validation of enzyme substrate specificity in protein structures. Proc Natl Acad Sci U S A 2013; 110:E4195-202. [PMID: 24145433 DOI: 10.1073/pnas.1305162110] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 11/18/2022]
Abstract
Structural Genomics aims to elucidate protein structures to identify their functions. Unfortunately, the variation of just a few residues can be enough to alter activity or binding specificity and limit the functional resolution of annotations based on sequence and structure; in enzymes, substrates are especially difficult to predict. Here, large-scale controls and direct experiments show that the local similarity of five or six residues selected because they are evolutionarily important and on the protein surface can suffice to identify an enzyme activity and substrate. A motif of five residues predicted that a previously uncharacterized Silicibacter sp. protein was a carboxylesterase for short fatty acyl chains, similar to hormone-sensitive-lipase-like proteins that share less than 20% sequence identity. Assays and directed mutations confirmed this activity and showed that the motif was essential for catalysis and substrate specificity. We conclude that evolutionary and structural information may be combined on a Structural Genomics scale to create motifs of mixed catalytic and noncatalytic residues that identify enzyme activity and substrate specificity.
|
31
|
Pulavarti SVSRK, He Y, Feldmann EA, Eletsky A, Acton TB, Xiao R, Everett JK, Montelione GT, Kennedy MA, Szyperski T. Solution NMR structures provide first structural coverage of the large protein domain family PF08369 and complementary structural coverage of dark operative protochlorophyllide oxidoreductase complexes. J Struct Funct Genomics 2013; 14:119-126. [PMID: 23963952 PMCID: PMC3982801 DOI: 10.1007/s10969-013-9159-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2013] [Accepted: 07/16/2013] [Indexed: 06/02/2023]
Abstract
High-quality NMR structures of the C-terminal domain comprising residues 484-537 of the 537-residue protein Bacterial chlorophyll subunit B (BchB) from Chlorobium tepidum and residues 9-61 of the 61-residue Asr4154 from Nostoc sp. (strain PCC 7120) exhibit a mixed α/β fold comprising three α-helices and a small β-sheet packed against the second α-helix. These two proteins share 29% sequence similarity and their structures are globally quite similar. The structures of BchB(484-537) and Asr4154(9-61) are the first representative structures for the large protein family (Pfam) PF08369, a family of unknown function currently containing 610 members in bacteria and eukaryotes. Furthermore, BchB(484-537) complements the structural coverage of the dark-operating protochlorophyllide oxidoreductase.
Affiliation(s)
- Surya VSRK Pulavarti
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
- Yunfen He
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
- Erik A. Feldmann
- Department of Chemistry and Biochemistry, Miami University, and Northeast Structural Genomics Consortium, Oxford, OH 45056, USA
- Alexander Eletsky
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
- Thomas B. Acton
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
- Rong Xiao
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
- John K. Everett
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
- Gaetano T. Montelione
- Center of Advanced Biotechnology and Medicine and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, UMDNJ, Piscataway NJ 08854, USA
- Michael A. Kennedy
- Department of Chemistry and Biochemistry, Miami University, and Northeast Structural Genomics Consortium, Oxford, OH 45056, USA
- Thomas Szyperski
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
|
32
|
Sheng R, Chen Y, Yung Gee H, Stec E, Melowic HR, Blatner NR, Tun MP, Kim Y, Källberg M, Fujiwara TK, Hye Hong J, Pyo Kim K, Lu H, Kusumi A, Goo Lee M, Cho W. Cholesterol modulates cell signaling and protein networking by specifically interacting with PDZ domain-containing scaffold proteins. Nat Commun 2013; 3:1249. [PMID: 23212378 PMCID: PMC3526836 DOI: 10.1038/ncomms2221] [Citation(s) in RCA: 134] [Impact Index Per Article: 11.2] [Received: 02/21/2012] [Accepted: 10/22/2012] [Indexed: 02/06/2023]
Abstract
Cholesterol is known to modulate the physical properties of cell membranes but its direct involvement in cellular signaling has not been thoroughly investigated. Here we show that cholesterol specifically binds many PDZ domains found in scaffold proteins, including the N-terminal PDZ domain of NHERF1/EBP50. This modular domain has a cholesterol-binding site topologically distinct from its canonical protein-binding site and serves as a dual specificity domain that bridges the membrane and juxta-membrane signaling complexes. Disruption of the cholesterol binding activity of NHERF1 largely abrogates its dynamic colocalization with and activation of cystic fibrosis transmembrane conductance regulator, one of its binding partners in the plasma membrane of mammalian cells. At least seven more PDZ domains from other scaffold proteins also bind cholesterol and have cholesterol-binding sites, suggesting that cholesterol modulates cell signaling through direct interactions with these scaffold proteins. This mechanism may provide an alternative explanation for the formation of signaling platforms in cholesterol-rich membrane domains.
Affiliation(s)
- Ren Sheng
- Department of Chemistry, University of Illinois at Chicago, Chicago, Illinois 60607, USA
|
33
|
Interplay of physics and evolution in the likely origin of protein biochemical function. Proc Natl Acad Sci U S A 2013; 110:9344-9. [PMID: 23690621 DOI: 10.1073/pnas.1300011110] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 01/11/2023]
Abstract
The intrinsic ability of protein structures to exhibit the geometric and sequence properties required for ligand binding without evolutionary selection is shown by the coincidence of the properties of pockets in native, single domain proteins with those in computationally generated, compact homopolypeptide, artificial (ART) structures. The library of native pockets is covered by a remarkably small number of representative pockets (∼400), with virtually every native pocket having a statistically significant match in the ART library, suggesting that the library is complete. When sequences are selected for ART structures based on fold stability, pocket sequence conservation is coincident to native. The fact that structurally and sequentially similar pockets occur across fold classes combined with the small number of representative pockets in native proteins implies that promiscuous interactions are inherent to proteins. Based on comparison of PDB (real, single domain protein structures found in the Protein Data Bank) and ART structures and pockets, the widespread assumption that the co-occurrence of global structure, pocket similarity, and amino acid conservation demands an evolutionary relationship between proteins is shown to significantly underestimate the random background probability. Indeed, many features of biochemical function arise from the physical properties of proteins that evolution likely fine-tunes to achieve specificity. Finally, our study suggests that a repertoire of thermodynamically (marginally) stable proteins could engage in many of the biochemical reactions needed for living systems without selection for function, a conclusion with significant implications for the origin of life.
|
34
|
Understanding Protein–Protein Interactions Using Local Structural Features. J Mol Biol 2013; 425:1210-24. [DOI: 10.1016/j.jmb.2013.01.014] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 08/01/2012] [Revised: 01/08/2013] [Accepted: 01/14/2013] [Indexed: 11/21/2022]
|
35
|
Eberhardt RY, Bartholdson SJ, Punta M, Bateman A. The SHOCT domain: a widespread domain under-represented in model organisms. PLoS One 2013; 8:e57848. [PMID: 23451277 PMCID: PMC3581485 DOI: 10.1371/journal.pone.0057848] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/30/2012] [Accepted: 01/29/2013] [Indexed: 11/18/2022]
Abstract
We have identified a new protein domain, which we have named the SHOCT domain (Short C-terminal domain). This domain is widespread in bacteria, with over a thousand examples, but we found that it is missing from the most commonly studied model organisms, despite being present in closely related species. Its predominantly C-terminal location, co-occurrence with numerous other domains and short size are reminiscent of the Gram-positive anchor motif; however, it is present in a much wider range of species. We suggest several hypotheses about the function of SHOCT, including oligomerisation and nucleic acid binding. Our initial experiments do not support its role as an oligomerisation domain.
Affiliation(s)
- Ruth Y Eberhardt
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
|
36
|
Dey F, Cliff Zhang Q, Petrey D, Honig B. Toward a "structural BLAST": using structural relationships to infer function. Protein Sci 2013; 22:359-66. [PMID: 23349097 DOI: 10.1002/pro.2225] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 12/07/2012] [Revised: 01/17/2013] [Accepted: 01/17/2013] [Indexed: 02/05/2023]
Abstract
We outline a set of strategies to infer protein function from structure. The overall approach depends on extensive use of homology modeling, the exploitation of a wide range of global and local geometric relationships between protein structures and the use of machine learning techniques. The combination of modeling with broad searches of protein structure space defines a "structural BLAST" approach to infer function with high genomic coverage. Applications are described to the prediction of protein-protein and protein-ligand interactions. In the context of protein-protein interactions, our structure-based prediction algorithm, PrePPI, has comparable accuracy to high-throughput experiments. An essential feature of PrePPI involves the use of Bayesian methods to combine structure-derived information with non-structural evidence (e.g. co-expression) to assign a likelihood for each predicted interaction. This, combined with a structural BLAST approach significantly expands the range of applications of protein structure in the annotation of protein function, including systems level biological applications where it has previously played little role.
Affiliation(s)
- Fabian Dey
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics and Initiative in Systems Biology, Columbia University, New York, New York 10032, USA
|
37
|
Finding protein targets for small biologically relevant ligands across fold space using inverse ligand binding predictions. Structure 2013; 20:1815-22. [PMID: 23141694 DOI: 10.1016/j.str.2012.09.011] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 06/07/2012] [Revised: 08/14/2012] [Accepted: 09/16/2012] [Indexed: 01/12/2023]
Abstract
Inverse ligand binding prediction utilizes a few protein-ligand (drug) complexes to predict other secondary therapeutic and off-targets of a given drug molecule on a proteomic scale. We adapt two binding site predictors, FINDSITE and SMAP, to perform the inverse predictions and evaluate them on over 30 representative ligands. Use of just one complex allows the identification of other protein targets; the availability of additional complexes improves the results. Both methods offer comparable quality when using three complexes with diverse proteins. SMAP is better when fewer complexes are available, while FINDSITE provides stronger predictions for smaller ligands. We propose a consensus that combines (and outperforms) the two complementary approaches implemented by FINDSITE and SMAP. Most importantly, we demonstrate that these methods successfully find distant targets that belong to structurally different folds compared to the proteins in the input complexes.
|
38
|
Goldman AD, Baross JA, Samudrala R. The enzymatic and metabolic capabilities of early life. PLoS One 2012; 7:e39912. [PMID: 22970111 PMCID: PMC3438178 DOI: 10.1371/journal.pone.0039912] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 01/13/2011] [Accepted: 06/04/2012] [Indexed: 12/24/2022]
Abstract
We introduce the concept of metaconsensus and employ it to make high confidence predictions of early enzyme functions and the metabolic properties that they may have produced. Several independent studies have used comparative bioinformatics methods to identify taxonomically broad features of genomic sequence data, protein structure data, and metabolic pathway data in order to predict physiological features that were present in early, ancestral life forms. But all such methods carry with them some level of technical bias. Here, we cross-reference the results of these previous studies to determine enzyme functions predicted to be ancient by multiple methods. We survey modern metabolic pathways to identify those that maintain the highest frequency of metaconsensus enzymes. Using the full set of modern reactions catalyzed by these metaconsensus enzyme functions, we reconstruct a representative metabolic network that may reflect the core metabolism of early life forms. Our results show that ten enzyme functions, four hydrolases, three transferases, one oxidoreductase, one lyase, and one ligase, are determined by metaconsensus to be present at least as late as the last universal common ancestor. Subnetworks within central metabolic processes related to sugar and starch metabolism, amino acid biosynthesis, phospholipid metabolism, and CoA biosynthesis, have high frequencies of these enzyme functions. We demonstrate that a large metabolic network can be generated from this small number of enzyme functions.
Affiliation(s)
- Aaron David Goldman
- Department of Ecology and Evolutionary Biology, Princeton, New Jersey, United States of America.
|
39
|
Sandhya S, Mudgal R, Jayadev C, Abhinandan KR, Sowdhamini R, Srinivasan N. Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins. Mol Biosyst 2012; 8:2076-84. [PMID: 22692068 DOI: 10.1039/c2mb25113b] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for the generation of 'protein-like' sequences that serves to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix of multiply aligned sequences of bona fide family members, serves as the starting point in this algorithm. The observed amino acid propensity and the selection of a random number dictate the selection of a residue for each position in the sequence. In a systematic manner, and by applying a 'roulette-wheel' selection approach at each position, we generate parent family-like sequences and thus facilitate an enlargement of sequence space around the family. When generated for a large number of families, we demonstrate that they expand the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples from proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that the success of including designed sequences in a database positively sensitized methods such as PSI-BLAST and Cascade PSI-BLAST and is a promising opportunity for enormously improved remote homology recognition using sequence information alone.
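The 'roulette-wheel' step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy three-column profile, the function names `roulette_pick` and `generate_sequence`, and the fixed random seed are all assumptions made for illustration, and a real profile would come from the position-specific scoring matrix of an aligned family.

```python
import random


def roulette_pick(propensities, rng):
    """Pick one residue with probability proportional to its propensity.

    `propensities` maps residue letters to non-negative weights
    (a hypothetical stand-in for one column of a sequence profile).
    """
    total = sum(propensities.values())
    spin = rng.random() * total  # random point on the wheel
    running = 0.0
    for residue, weight in propensities.items():
        running += weight
        if spin < running:
            return residue
    return residue  # guard against floating-point round-off


def generate_sequence(profile, rng):
    """Draw one 'family-like' sequence, one roulette spin per position."""
    return "".join(roulette_pick(column, rng) for column in profile)


# Toy profile with made-up frequencies (assumption, not real data).
profile = [
    {"M": 1.0},
    {"A": 0.7, "G": 0.3},
    {"K": 0.5, "R": 0.5},
]

rng = random.Random(0)  # seeded only to make the sketch repeatable
designed = [generate_sequence(profile, rng) for _ in range(5)]
print(designed)
```

Each call to `generate_sequence` yields one artificial sequence; residues that are more frequent in a column are drawn more often, which is the sense in which the generated sequences stay close to the parent family.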
Affiliation(s)
- S Sandhya
- National Centre for Biological Sciences, UAS-GKVK Campus, Bangalore 560065, India
|
40
|
Wu Y, Punta M, Xiao R, Acton TB, Sathyamoorthy B, Dey F, Fischer M, Skerra A, Rost B, Montelione GT, Szyperski T. NMR structure of lipoprotein YxeF from Bacillus subtilis reveals a calycin fold and distant homology with the lipocalin Blc from Escherichia coli. PLoS One 2012; 7:e37404. [PMID: 22693626 PMCID: PMC3367933 DOI: 10.1371/journal.pone.0037404] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/13/2012] [Accepted: 04/19/2012] [Indexed: 11/18/2022]
Abstract
The soluble monomeric domain of lipoprotein YxeF from the Gram-positive bacterium B. subtilis was selected by the Northeast Structural Genomics Consortium (NESG) as a target of a biomedical theme project focusing on the structure determination of the soluble domains of bacterial lipoproteins. The solution NMR structure of YxeF reveals a calycin fold and distant homology with the lipocalin Blc from the Gram-negative bacterium E. coli. In particular, the characteristic β-barrel, which is open to the solvent at one end, is extremely well conserved in YxeF with respect to Blc. The identification of YxeF as the first lipocalin homologue occurring in a Gram-positive bacterium suggests that lipocalins emerged before the evolutionary divergence of Gram-positive and Gram-negative bacteria. Since YxeF is devoid of the α-helix that packs in all lipocalins with known structure against the β-barrel to form a second hydrophobic core, we propose to introduce a new lipocalin sub-family named ‘slim lipocalins’, with YxeF and the other members of Pfam family PF11631 to which YxeF belongs constituting the first representatives. The results presented here exemplify the impact of structural genomics to enhance our understanding of biology and to generate new biological hypotheses.
Collapse
Affiliation(s)
- Yibing Wu
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York, United States of America
- Northeast Structural Genomics Consortium
| | - Marco Punta
- Department of Computer Science and Institute for Advanced Study, Technical University of Munich, Munich, Germany
- Northeast Structural Genomics Consortium
| | - Rong Xiao
- Center of Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Northeast Structural Genomics Consortium
| | - Thomas B. Acton
- Center of Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Northeast Structural Genomics Consortium
| | - Bharathwaj Sathyamoorthy
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York, United States of America
| | - Fabian Dey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Northeast Structural Genomics Consortium
| | - Markus Fischer
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Northeast Structural Genomics Consortium
| | - Arne Skerra
- Munich Center for Integrated Protein Science, CIPS-M, and Lehrstuhl für Biologische Chemie, Technische Universität München, Freising-Weihenstephan, Germany
| | - Burkhard Rost
- Department of Computer Science and Institute for Advanced Study, Technical University of Munich, Munich, Germany
- Northeast Structural Genomics Consortium
| | - Gaetano T. Montelione
- Center of Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Robert Wood Johnson Medical School, The State University of New Jersey, Piscataway, New Jersey, United States of America
- Northeast Structural Genomics Consortium
| | - Thomas Szyperski
- Department of Chemistry, State University of New York at Buffalo, Buffalo, New York, United States of America
- Northeast Structural Genomics Consortium
41
Aramini JM, Petrey D, Lee DY, Janjua H, Xiao R, Acton TB, Everett JK, Montelione GT. Solution NMR structure of Alr2454 from Nostoc sp. PCC 7120, the first structural representative of Pfam domain family PF11267. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2012; 13:171-6. [PMID: 22592539 DOI: 10.1007/s10969-012-9135-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/04/2012] [Accepted: 02/02/2012] [Indexed: 10/28/2022]
Abstract
Protein domain family PF11267 (DUF3067) is a family of proteins of unknown function found in both bacteria and eukaryotes. Here we present the solution NMR structure of the 102-residue Alr2454 protein from Nostoc sp. PCC 7120, which constitutes the first structural representative from this conserved protein domain family. The structure of Nostoc sp. Alr2454 adopts a novel protein fold.
Affiliation(s)
- James M Aramini
- Department of Molecular Biology and Biochemistry, Center for Advanced Biotechnology and Medicine, Piscataway, NJ, USA.
42
Janda JO, Busch M, Kück F, Porfenenko M, Merkl R. CLIPS-1D: analysis of multiple sequence alignments to deduce for residue-positions a role in catalysis, ligand-binding, or protein structure. BMC Bioinformatics 2012; 13:55. [PMID: 22480135 PMCID: PMC3391178 DOI: 10.1186/1471-2105-13-55] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/22/2011] [Accepted: 04/05/2012] [Indexed: 11/12/2022] [Open Access]
Abstract
Background: One aim of the in silico characterization of proteins is to identify all residue-positions, which are crucial for function or structure. Several sequence-based algorithms exist, which predict functionally important sites. However, with respect to sequence information, many functionally and structurally important sites are hard to distinguish and consequently a large number of incorrectly predicted functional sites have to be expected. This is why we were interested to design a new classifier that differentiates between functionally and structurally important sites and to assess its performance on representative datasets. Results: We have implemented CLIPS-1D, which predicts a role in catalysis, ligand-binding, or protein structure for residue-positions in a mutually exclusive manner. By analyzing a multiple sequence alignment, the algorithm scores conservation as well as abundance of residues at individual sites and their local neighborhood and categorizes by means of a multiclass support vector machine. A cross-validation confirmed that residue-positions involved in catalysis were identified with state-of-the-art quality; the mean MCC-value was 0.34. For structurally important sites, prediction quality was considerably higher (mean MCC = 0.67). For ligand-binding sites, prediction quality was lower (mean MCC = 0.12), because binding sites and structurally important residue-positions share conservation and abundance values, which makes their separation difficult. We show that classification success varies for residues in a class-specific manner. This is why our algorithm computes residue-specific p-values, which allow for the statistical assessment of each individual prediction. CLIPS-1D is available as a Web service at http://www-bioinf.uni-regensburg.de/. Conclusions: CLIPS-1D is a classifier, whose prediction quality has been determined separately for catalytic sites, ligand-binding sites, and structurally important sites. It generates hypotheses about residue-positions important for a set of homologous proteins and focuses on conservation and abundance signals. Thus, the algorithm can be applied in cases where function cannot be transferred from well-characterized proteins by means of sequence comparison.
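The MCC values quoted in this abstract (0.34, 0.67 and 0.12) are Matthews correlation coefficients. As a reminder of how that score is obtained from a binary confusion matrix, a minimal sketch follows. The function name `mcc` and the example counts are illustrative assumptions; they are not taken from CLIPS-1D or its benchmark data.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns a value in [-1, 1]; 0.0 is returned by convention when any
    marginal sum is zero and the coefficient is otherwise undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts for one residue class (e.g. "catalytic" vs. "not catalytic").
    print(round(mcc(tp=20, fp=30, tn=900, fn=50), 3))
```

Because MCC uses all four cells of the confusion matrix, it stays informative when one class (such as catalytic residues) is much rarer than the other, which is one common reason to prefer it over plain accuracy for site prediction.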
Affiliation(s)
- Jan-Oliver Janda
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, 93040 Regensburg, Germany.
43
Rai A, Suprasanna P, D'Souza SF, Kumar V. Membrane topology and predicted RNA-binding function of the 'early responsive to dehydration (ERD4)' plant protein. PLoS One 2012; 7:e32658. [PMID: 22431979 PMCID: PMC3303787 DOI: 10.1371/journal.pone.0032658] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 08/26/2011] [Accepted: 02/02/2012] [Indexed: 12/21/2022] [Open Access]
Abstract
Functional annotation of uncharacterized genes is the main focus of computational methods in the post-genomic era. These tools search for similarity between proteins on the premise that those sharing sequence or structural motifs usually perform related functions, and are thus particularly useful for membrane proteins. Early responsive to dehydration (ERD) genes are rapidly induced in response to dehydration stress in a variety of plant species. In the present work we characterized the function of the Brassica juncea ERD4 gene using computational approaches. The ERD4 protein, of unknown function, possesses the ubiquitous DUF221 domain (residues 312-634) and is conserved in all plant species. We suggest that the protein is localized in the chloroplast membrane with at least nine transmembrane helices. We detected a globular domain of 165 amino acid residues (183-347) in plant ERD4 proteins and expect this to be positioned inside the chloroplast. The structural-functional annotation of the globular domain was arrived at using fold recognition methods, which suggested the presence in its sequence of two tandem RNA-recognition motif (RRM) domains, each folded into a βαββαβ topology. The structure-based sequence alignment with known RNA-binding proteins revealed conservation of two non-canonical ribonucleoprotein sub-motifs in both putative RNA-recognition domains of the ERD4 protein. The function of the highly conserved ERD4 protein may thus be associated with its RNA-binding ability during the stress response. This is the first functional annotation of the ERD4 family of proteins and can be useful in designing experiments to unravel crucial aspects of the stress tolerance mechanism.
Affiliation(s)
- Archana Rai
- Nuclear Agricultural & Biotechnology Division, Bhabha Atomic Research Centre, Mumbai, India
| | - Penna Suprasanna
- Nuclear Agricultural & Biotechnology Division, Bhabha Atomic Research Centre, Mumbai, India
| | - Stanislaus F. D'Souza
- Nuclear Agricultural & Biotechnology Division, Bhabha Atomic Research Centre, Mumbai, India
- * E-mail: (SFD); (VK)
| | - Vinay Kumar
- High Pressure & Synchrotron Radiation Physics Division, Bhabha Atomic Research Centre, Mumbai, India
- * E-mail: (SFD); (VK)
44
Eletsky A, Acton TB, Xiao R, Everett JK, Montelione GT, Szyperski T. Solution NMR structures reveal a distinct architecture and provide first structures for protein domain family PF04536. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2012; 13:9-14. [PMID: 22198206 PMCID: PMC3609422 DOI: 10.1007/s10969-011-9122-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/24/2011] [Accepted: 12/13/2011] [Indexed: 11/29/2022]
Abstract
The protein family (Pfam) PF04536 is a broadly conserved domain family of unknown function (DUF477), with more than 1,350 members in prokaryotic and eukaryotic proteins. High-quality NMR structures of the N-terminal domain comprising residues 41-180 of the 684-residue protein CG2496 from Corynebacterium glutamicum and the N-terminal domain comprising residues 35-182 of the 435-residue protein PG0361 from Porphyromonas gingivalis both exhibit an α/β fold comprised of a four-stranded β-sheet, three α-helices packed against one side of the sheet, and a fourth α-helix attached to the other side. In spite of low sequence similarity (18%) assessed by structure-based sequence alignment, the two structures are globally quite similar. However, moderate structural differences are observed for the relative orientation of two of the four helices. Comparison with known protein structures reveals that the α/β architecture of CG2496(41-180) and PG0361(35-182) has previously not been characterized. Moreover, calculation of surface charge potential and identification of surface clefts indicate that the two domains very likely have different functions.
Affiliation(s)
- Alexander Eletsky
- Department of Chemistry, The State University of New York at Buffalo, Buffalo, NY 14260, USA
45
Eletsky A, Petrey D, Cliff Zhang Q, Lee HW, Acton TB, Xiao R, Everett JK, Prestegard JH, Honig B, Montelione GT, Szyperski T. Solution NMR structures reveal unique homodimer formation by a winged helix-turn-helix motif and provide first structures for protein domain family PF10771. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2012; 13:1-7. [PMID: 22223187 PMCID: PMC3654790 DOI: 10.1007/s10969-011-9121-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/24/2011] [Accepted: 12/13/2011] [Indexed: 11/29/2022]
Abstract
High-quality NMR structures of the homo-dimeric proteins Bvu3908 (69 residues per monomeric unit) from Bacteroides vulgatus and Bt2368 (74 residues) from Bacteroides thetaiotaomicron reveal the presence of winged helix-turn-helix (wHTH) motifs mediating tight complex formation. Such homo-dimer formation by winged HTH motifs is otherwise found only in two DNA-binding proteins with known structure: the C-terminal wHTH domain of transcriptional activator FadR from E. coli and protein TubR from B. thuringiensis, which is involved in plasmid DNA segregation. However, the relative orientation of the wHTH motifs is different and residues involved in DNA-binding are not conserved in Bvu3908 and Bt2368. Hence, the proteins of the present study are not very likely to bind DNA, but are likely to exhibit a function that has thus far not been ascribed to homo-dimers formed by winged HTH motifs. The structures of Bvu3908 and Bt2368 are the first atomic resolution structures for PFAM family PF10771, a family of unknown function (DUF2582) currently containing 128 members.
Affiliation(s)
- Alexander Eletsky
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
| | - Donald Petrey
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA
| | - Qiangfeng Cliff Zhang
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA
| | - Hsiau-Wei Lee
- Complex Carbohydrate Research Center, University of Georgia, and Northeast Structural Genomics Consortium, Athens, GA 30602, USA
| | - Thomas B. Acton
- Department of Molecular Biology and Biochemistry, Center of Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - Rong Xiao
- Department of Molecular Biology and Biochemistry, Center of Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - John K. Everett
- Department of Molecular Biology and Biochemistry, Center of Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - James H. Prestegard
- Complex Carbohydrate Research Center, University of Georgia, and Northeast Structural Genomics Consortium, Athens, GA 30602, USA
| | - Barry Honig
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA
| | - Gaetano T. Montelione
- Department of Molecular Biology and Biochemistry, Center of Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, and Northeast Structural Genomics Consortium, Piscataway, NJ 08854, USA
| | - Thomas Szyperski
- Department of Chemistry, The State University of New York at Buffalo, and Northeast Structural Genomics Consortium, Buffalo, NY 14260, USA
46
Kuziemko A, Honig B, Petrey D. Using structure to explore the sequence alignment space of remote homologs. PLoS Comput Biol 2011; 7:e1002175. [PMID: 21998567 PMCID: PMC3188491 DOI: 10.1371/journal.pcbi.1002175] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 02/01/2011] [Accepted: 07/14/2011] [Indexed: 11/18/2022] [Open Access]
Abstract
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is “optimal” in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are “suboptimal” in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for “modelability”, we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
It has been suggested that, for nearly every protein sequence, there is already a protein with a similar structure in current protein structure databases. However, with poor or undetectable sequence relationships, it is expected that accurate alignments and models cannot be generated. Here we show that this is not the case, and that whenever structural relationship exists, there are usually local sequence relationships that can be used to generate an accurate alignment, no matter what the global sequence identity. However, this requires an alternative to the traditional dynamic programming algorithm and the consideration of a small ensemble of alignments. We present an algorithm, S4, and demonstrate that it is capable of generating accurate alignments in nearly all cases where a structural relationship exists between two proteins. Our results thus constitute an important advance in the full exploitation of the information in structural databases. That is, the expectation of an accurate alignment suggests that a meaningful model can be generated for nearly every sequence for which a suitable template exists.
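For readers unfamiliar with the "DP score" that this abstract contrasts with structure-based alignment, the sketch below computes the optimal global alignment score with the classic Needleman-Wunsch recurrence. It is not the S4 algorithm; the function name `nw_score` and the scoring parameters (match +1, mismatch -1, linear gap -2) are assumptions chosen only for illustration.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty).

    Only the score is returned; the DP table is kept one row at a time,
    so memory use is proportional to len(b).
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b` costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` aligned against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in `b`
            left = curr[j - 1] + gap  # gap inserted in `a`
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
    print(nw_score("AC", "A"))
```

The recurrence maximizes a sequence-derived score only, which is exactly why, as the abstract argues, the top-scoring alignment of two remote homologs need not coincide with the alignment implied by superimposing their structures.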
Affiliation(s)
- Andrew Kuziemko
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
| | - Barry Honig
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
| | - Donald Petrey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
47
Nguyen CD, Gardiner KJ, Cios KJ. Protein annotation from protein interaction networks and Gene Ontology. J Biomed Inform 2011; 44:824-9. [PMID: 21571095 PMCID: PMC3176917 DOI: 10.1016/j.jbi.2011.04.010] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 08/09/2010] [Revised: 04/17/2011] [Accepted: 04/26/2011] [Indexed: 01/12/2023]
Abstract
We introduce a novel method for annotating protein function that combines Naïve Bayes and association rules, and takes advantage of the underlying topology in protein interaction networks and the structure of graphs in the Gene Ontology. We apply our method to proteins from the Human Protein Reference Database (HPRD) and show that, in comparison with other approaches, it predicts protein functions with significantly higher recall with no loss of precision. Specifically, it achieves 51% precision and 60% recall versus 45% and 26% for Majority and 24% and 61% for χ²-statistics, respectively.
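To compare the three precision/recall pairs quoted above on a single scale, one option is the F1 score, the harmonic mean of precision and recall. F1 is not reported in the abstract; the sketch below derives it purely from the quoted percentages as an illustration, and the function name `f1_score` and the dictionary labels are assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract.
reported = {
    "proposed method": (0.51, 0.60),
    "Majority": (0.45, 0.26),
    "chi-square statistics": (0.24, 0.61),
}

if __name__ == "__main__":
    for name, (p, r) in reported.items():
        print(f"{name}: F1 = {f1_score(p, r):.2f}")
    # prints 0.55, 0.33 and 0.34 respectively
```

On this derived measure the proposed method leads both baselines, which is consistent with the abstract's claim of higher recall at no loss of precision.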
Affiliation(s)
- Cao D Nguyen
- Centre for Diabetes Research, The Western Australian Institute for Medical Research, Australia.
48
A critical comparative assessment of predictions of protein-binding sites for biologically relevant organic compounds. Structure 2011; 19:613-21. [PMID: 21565696 DOI: 10.1016/j.str.2011.02.015] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 10/28/2010] [Revised: 02/04/2011] [Accepted: 02/26/2011] [Indexed: 11/23/2022]
Abstract
Protein function annotation and rational drug discovery rely on the knowledge of binding sites for small organic compounds, and yet the quality of existing binding site predictors was never systematically evaluated. We assess predictions of ten representative geometry-, energy-, threading-, and consensus-based methods on a new benchmark data set that considers apo and holo protein structures with multiple binding sites for biologically relevant ligands. Statistical tests show that threading-based Findsite outperforms other predictors when its templates have high similarity with the input protein. However, Findsite is equivalent or inferior to some geometry-, energy-, and consensus-based methods when the similarity is lower. We demonstrate that geometry-, energy-, and consensus-based predictors benefit from the usage of holo structures and that the top four methods, Findsite, Q-SiteFinder, ConCavity, and MetaPocket, perform better for larger binding sites. Predictions from these four methods are complementary, and our simple meta-predictor improves over the best single predictor.
49
Aramini JM, Rossi P, Fischer M, Xiao R, Acton TB, Montelione GT. Solution NMR structure of VF0530 from Vibrio fischeri reveals a nucleic acid-binding function. Proteins 2011; 79:2988-91. [PMID: 21905121 DOI: 10.1002/prot.23121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/06/2011] [Revised: 06/10/2011] [Accepted: 06/15/2011] [Indexed: 11/11/2022]
Abstract
Protein domain family PF09905 (DUF2132) is a family of small domains of unknown function that are conserved in a wide range of bacteria. Here we describe the solution NMR structure of the 80-residue VF0530 protein from Vibrio fischeri, the first structural representative from this protein domain family. We demonstrate that the structure of VF0530 adopts a unique four-helix motif that shows some similarity to the C-terminal double-stranded DNA (dsDNA) binding domain of RecA, as well as other nucleic acid binding domains. Moreover, gel shift binding data indicate a potential dsDNA binding role for VF0530.
Affiliation(s)
- James M Aramini
- Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA.
50
Fischer M, Zhang QC, Dey F, Chen BY, Honig B, Petrey D. MarkUs: a server to navigate sequence-structure-function space. Nucleic Acids Res 2011; 39:W357-61. [PMID: 21672961 PMCID: PMC3125806 DOI: 10.1093/nar/gkr468] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] [Open Access]
Abstract
We describe MarkUs, a web server for analysis and comparison of the structural and functional properties of proteins. In contrast to a ‘structure in/function out’ approach to protein function annotation, the server is designed to be highly interactive and to allow flexibility in the examination of possible functions, suggested either automatically by various similarity measures or specified by a user directly. This is combined with tools that allow a user to assess independently whether or not a suggested function is consistent with the bioinformatic and biophysical properties of a given query structure, further allowing the user to generate testable hypotheses. The server is available at http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:Mark-Us.
Affiliation(s)
- Markus Fischer
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032, USA