1
Zheng R, Huang Z, Deng L. Large-scale predicting protein functions through heterogeneous feature fusion. Brief Bioinform 2023:bbad243. [PMID: 37401369 DOI: 10.1093/bib/bbad243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Revised: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 07/05/2023] Open
Abstract
As the volume of protein sequence and structure data grows rapidly, the functions of the overwhelming majority of proteins cannot be experimentally determined. Automated annotation of protein function at a large scale is becoming increasingly important. Existing computational prediction methods are typically based on expanding the relatively small number of experimentally determined functions to large collections of proteins with various clues, including sequence homology, protein-protein interaction, gene co-expression, etc. Although there has been some progress in protein function prediction in recent years, the development of accurate and reliable solutions still has a long way to go. Here we exploit AlphaFold predicted three-dimensional structural information, together with other non-structural clues, to develop a large-scale approach termed PredGO to annotate Gene Ontology (GO) functions for proteins. We use a pre-trained language model, geometric vector perceptrons and attention mechanisms to extract heterogeneous features of proteins and fuse these features for function prediction. The computational results demonstrate that the proposed method outperforms other state-of-the-art approaches for predicting GO functions of proteins in terms of both coverage and accuracy. Coverage improves for two reasons: the number of structures predicted by AlphaFold has greatly increased, and PredGO can make extensive use of non-structural information for functional prediction. Moreover, we show that over 205 000 (~100%) human entries in UniProt are annotated by PredGO, over 186 000 (~90%) of which are based on predicted structure. The webserver and database are available at http://predgo.denglab.org/.
Affiliation(s)
- Rongtao Zheng
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
- Zhijian Huang
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
2
Trosset JY, Cavé C. In Silico Target Druggability Assessment: From Structural to Systemic Approaches. Methods Mol Biol 2019; 1953:63-88. [PMID: 30912016 DOI: 10.1007/978-1-4939-9145-7_5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/10/2023]
Abstract
This chapter focuses on current in silico direct and indirect approaches to assessing therapeutic target druggability. The direct approach tries to infer from the 3D structure the capacity of the target protein to bind small molecules that modulate its biological function. Algorithms that recognize and characterize the quality of ligand interaction sites, whether within buried protein cavities or within large protein-protein interfaces, are reviewed in the first part of the chapter. Once a ligand-binding site has been identified, indirect aspects of target druggability can be assessed. These indirect approaches focus first on target promiscuity and the potential difficulties in developing specific drugs; this assessment is based on large-scale comparison of protein-binding sites. The second aspect concerns the capacity of the target to induce resistance pathways once it is inhibited or activated by a drug. The emergence of drug-resistance pathways can be assessed through systemic analysis of biological networks that model metabolism and/or cell regulation signaling.
Affiliation(s)
- Christian Cavé
- BioCIS UFR Pharmacie UMR CNRS 8076, Université Paris Saclay, Orsay, France
3
Ivanisenko VA, Ivanisenko TV, Saik OV, Demenkov PS, Afonnikov DA, Kolchanov NA. Web-Based Computational Tools for the Prediction and Analysis of Posttranslational Modifications of Proteins. Methods Mol Biol 2019; 1934:1-20. [PMID: 31256369 DOI: 10.1007/978-1-4939-9055-9_1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/09/2023]
Abstract
The increase in the number of Web-based resources on posttranslational modification sites (PTMSs) in proteins is accelerating. This chapter presents a set of computational protocols describing how to work with Internet resources when dealing with PTMSs. The protocols are intended for querying PTMS-related databases, searching for PTMSs in protein sequences and structures, and calculating the pI and molecular mass of PTM isoforms. Thus, modern bioinformatics prediction tools make it feasible to express protein modification in broader quantitative terms.
Affiliation(s)
- Vladimir A Ivanisenko
- Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Novosibirsk, Russia
- Timofey V Ivanisenko
- Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Novosibirsk, Russia
- Olga V Saik
- Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Novosibirsk, Russia
- Pavel S Demenkov
- Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Novosibirsk, Russia
- Dmitry A Afonnikov
- Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Novosibirsk, Russia
- Nikolay A Kolchanov
- Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Novosibirsk, Russia
4
Medvedeva IV, Demenkov PS, Ivanisenko VA. SITEX 2.0: Projections of protein functional sites on eukaryotic genes. Extension with orthologous genes. J Bioinform Comput Biol 2017; 15:1650044. [PMID: 28110602 DOI: 10.1142/s021972001650044x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Functional sites define the diversity of protein functions and are a central object of research into the structural and functional organization of proteins. The mechanisms underlying the emergence of protein functional sites and their variability during evolution include duplication, shuffling, insertion and deletion of exons in genes. Studying the correlation between site structure and exon structure provides a basis for an in-depth understanding of site organization. In this regard, the development of software resources that enable the mutual projection of the exon structure of genes and the primary and tertiary structures of the encoded proteins remains a pressing problem. Previously, we developed the SitEx system, which provides information about protein and gene sequences with mapped exon borders and the amino acid positions of protein functional sites. The database included information on proteins with known 3D structure. However, data on orthologs were not available. Therefore, in SitEx 2.0 we added the projection of site positions onto the exon structures of orthologs. We implemented database search by site conservation variability and by site discontinuity across the exon structure. Including information on orthologs expands the possibilities of using SitEx to solve problems in the analysis of the structural and functional organization of proteins. Database URL: http://www-bionet.sscc.ru/sitex/ .
Affiliation(s)
- Irina V Medvedeva
- * Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Lavrentyeva 10, Novosibirsk, 630090, Russia; † Novosibirsk State University, Pirogova 1, Novosibirsk 630090, Russia
- Pavel S Demenkov
- * Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Lavrentyeva 10, Novosibirsk, 630090, Russia; † Novosibirsk State University, Pirogova 1, Novosibirsk 630090, Russia
- Vladimir A Ivanisenko
- * Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, Lavrentyeva 10, Novosibirsk, 630090, Russia
5
Abstract
The dramatic increase in the number of protein sequences and structures deposited in biological databases has led to the development of many bioinformatics tools and programs to manage, validate, compare, and interpret this large volume of data. In addition, powerful tools are being developed to use this sequence and structural data to facilitate protein classification and infer biological function of newly identified proteins. This chapter covers freely available bioinformatics resources on the World Wide Web that are commonly used for protein structure analysis.
Affiliation(s)
- Jason J Paxman
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Rm 521, LIMS1, Kingsbury Drive, Bundoora, Melbourne, VIC, 3086, Australia
- Begoña Heras
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Rm 521, LIMS1, Kingsbury Drive, Bundoora, Melbourne, VIC, 3086, Australia
6
Sirota FL, Maurer-Stroh S, Eisenhaber B, Eisenhaber F. Single-residue posttranslational modification sites at the N-terminus, C-terminus or in-between: To be or not to be exposed for enzyme access. Proteomics 2016; 15:2525-46. [PMID: 26038108 PMCID: PMC4745020 DOI: 10.1002/pmic.201400633] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/31/2014] [Revised: 04/17/2015] [Accepted: 05/29/2015] [Indexed: 11/30/2022]
Abstract
Many protein posttranslational modifications (PTMs) are the result of an enzymatic reaction. The modifying enzyme has to recognize the substrate protein's sequence motif containing the residue(s) to be modified; thus, the enzyme's catalytic cleft engulfs these residue(s) and the respective sequence environment. This residue accessibility condition principally limits the range where enzymatic PTMs can occur in the protein sequence. Non‐globular, flexible, intrinsically disordered segments or large loops/accessible long side chains should be preferred whereas residues buried in the core of structures should be void of what we call canonical, enzyme‐generated PTMs. We investigate whether PTM sites annotated in UniProtKB (with MOD_RES/LIPID keys) are situated within sequence ranges that can be mapped to known 3D structures. We find that N‐ or C‐termini harbor essentially exclusively canonical PTMs. We also find that the overwhelming majority of all other PTMs are also canonical though, later in the protein's life cycle, the PTM sites can become buried due to complex formation. Among the remaining cases, some can be explained (i) with autocatalysis, (ii) with modification before folding or after temporary unfolding, or (iii) as products of interaction with small, diffusible reactants. Others require further research how these PTMs are mechanistically generated in vivo.
Affiliation(s)
- Fernanda L Sirota
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore
- Sebastian Maurer-Stroh
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore; School of Biological Sciences (SBS), Nanyang Technological University (NTU), Singapore
- Birgit Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore
- Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), Matrix, Singapore; Department of Biological Sciences (DBS), National University of Singapore (NUS), Singapore; School of Computer Engineering (SCE), Nanyang Technological University (NTU), Singapore
7
Cupincin: A Unique Protease Purified from Rice (Oryza sativa L.) Bran Is a New Member of the Cupin Superfamily. PLoS One 2016; 11:e0152819. [PMID: 27064905 PMCID: PMC4827828 DOI: 10.1371/journal.pone.0152819] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/21/2016] [Accepted: 03/18/2016] [Indexed: 12/03/2022] Open
Abstract
The cupin superfamily is one of the most diverse superfamilies. This study reports, for the first time, the purification and characterization of a novel cupin-domain-containing protease from rice bran. Hypothetical protein OsI_13867 was identified and named cupincin. Cupincin was purified 4.4-fold with a recovery of 4.9%. Cupincin had an optimum pH of 4.0 and an optimum temperature of 60°C. Cupincin was found to be a homotrimer, consisting of three distinct subunits with apparent molecular masses of 33.45 kDa, 22.35 kDa and 16.67 kDa as determined by MALDI-TOF, whereas it eluted as a single unit with an apparent molecular mass of 135.33 ± 3.52 kDa in analytical gel filtration and migrated as a single band in native PAGE, suggesting its homogeneity. The sequence identity of cupincin was deduced by determining the amino-terminal sequence of the polypeptide chains and by de novo sequencing. To understand the hydrolysing mechanism of cupincin, its three-dimensional model was developed. Structural analysis indicated that cupincin contains His313, His326 and Glu318 with a zinc ion as the putative active site residues; inhibition of enzyme activity by 1,10-phenanthroline and atomic absorption spectroscopy confirmed the presence of the zinc ion. Cupincin cleaved the oxidized B-chain of insulin with high specificity, at the Leu15-Tyr16 position; specificity was also determined using neurotensin as a substrate, where it cleaved only at the Glu1-Tyr2 position. Limited proteolysis of the protease suggests a specific function for cupincin. These results demonstrate that cupincin is a completely new protease.
8
Parasuram R, Mills CL, Wang Z, Somasundaram S, Beuning PJ, Ondrechen MJ. Local structure based method for prediction of the biochemical function of proteins: Applications to glycoside hydrolases. Methods 2016; 93:51-63. [DOI: 10.1016/j.ymeth.2015.11.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/01/2015] [Revised: 11/05/2015] [Accepted: 11/09/2015] [Indexed: 01/07/2023] Open
9
Nakamura T, Tomii K. Protein ligand-binding site comparison by a reduced vector representation derived from multidimensional scaling of generalized description of binding sites. Methods 2016; 93:35-40. [DOI: 10.1016/j.ymeth.2015.08.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/01/2015] [Revised: 07/25/2015] [Accepted: 08/10/2015] [Indexed: 11/25/2022] Open
10
An approach for isolation of circulating nucleoprotein complexes from blood. Russ Chem Bull 2015. [DOI: 10.1007/s11172-015-1032-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
11
Hanson B, Westin C, Rosa M, Grier A, Osipovitch M, MacDonald ML, Dodge G, Boli PM, Corwin CW, Kessler H, McKay T, Bernstein HJ, Craig PA. Estimation of protein function using template-based alignment of enzyme active sites. BMC Bioinformatics 2014; 15:87. [PMID: 24669788 PMCID: PMC4229977 DOI: 10.1186/1471-2105-15-87] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 03/06/2013] [Accepted: 01/24/2014] [Indexed: 11/25/2022] Open
Abstract
Background The accumulation of protein structural data occurs more rapidly than it can be characterized by traditional laboratory means. This has motivated widespread efforts to predict enzyme function computationally. The most useful/accurate strategies employed to date are based on the detection of motifs in novel structures that correspond to a specific function. Functional residues are critical components of predictively useful motifs. We have implemented a novel method, to complement current approaches, which detects motifs solely on the basis of distance restraints between catalytic residues. Results ProMOL is a plugin for the PyMOL molecular graphics environment that can be used to create active site motifs for enzymes. A library of 181 active site motifs has been created with ProMOL, based on definitions published in the Catalytic Site Atlas (CSA). Searches with ProMOL produce better than 50% useful Enzyme Commission (EC) class suggestions for level 1 searches in EC classes 1, 4 and 5, and produce some useful results for other classes. 261 additional motifs automatically translated from Jonathan Barker’s JESS motif set [Bioinformatics 19:1644–1649, 2003] and a set of NMR motifs is under development. Alignments are evaluated by visual superposition, Levenshtein distance and root-mean-square deviation (RMSD) and are reasonably consistent with related search methods. Conclusion The ProMOL plugin for PyMOL provides ready access to template-based local alignments. Recent improvements to ProMOL, including the expanded motif library, RMSD calculations and output selection formatting, have greatly increased the program’s usability and speed, and have improved the way that the results are presented.
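The abstract above describes motif detection from distance restraints between catalytic residues, with alignments scored in part by Levenshtein distance. The following is a minimal Python sketch of those two ideas, not ProMOL's implementation: the function names, the use of one representative point per residue, and the 1.0 Å tolerance are all assumptions made for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Distances between every pair of points, in a fixed (i < j) order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def satisfies_restraints(template, candidate, tolerance=1.0):
    """True if each inter-residue distance in the candidate lies within
    `tolerance` (hypothetical default, in angstroms) of the template's.

    Both arguments are lists of (x, y, z) tuples, one point per residue,
    listed in corresponding order.
    """
    if len(template) != len(candidate):
        return False
    return all(
        abs(dt - dc) <= tolerance
        for dt, dc in zip(pairwise_distances(template),
                          pairwise_distances(candidate))
    )


def levenshtein(a, b):
    """Edit distance between two strings by row-wise dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

Because only internal distances are compared, a candidate that is a rigid translation or rotation of the template passes the check without any superposition step; `levenshtein("HDS", "HES")` gives the number of residue-letter edits between two motif strings.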
Affiliation(s)
- Paul A Craig
- Rochester Institute of Technology, School of Chemistry & Materials Science, 1 Lomb Memorial Drive, Rochester, NY 14623, USA
12
Nemoto W, Saito A, Oikawa H. Recent advances in functional region prediction by using structural and evolutionary information - Remaining problems and future extensions. Comput Struct Biotechnol J 2013; 8:e201308007. [PMID: 24688747 PMCID: PMC3962155 DOI: 10.5936/csbj.201308007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/27/2013] [Revised: 11/12/2013] [Accepted: 11/13/2013] [Indexed: 11/22/2022] Open
Abstract
Structural genomics projects have solved many new structures with unknown functions. One strategy to investigate the function of a structure is to computationally find the functionally important residues or regions on it. Therefore, the development of functional region prediction methods has become an important research subject. An effective approach is to use a method employing structural and evolutionary information, such as the evolutionary trace (ET) method. ET ranks the residues of a protein structure by calculating the scores for relative evolutionary importance, and locates functionally important sites by identifying spatial clusters of highly ranked residues. After ET was developed, numerous ET-like methods were subsequently reported, and many of them are in practical use, although they require certain conditions. In this mini review, we first introduce the remaining problems and the recent improvements in the methods using structural and evolutionary information. We then summarize the recent developments of the methods. Finally, we conclude by describing possible extensions of the evolution- and structure-based methods.
Affiliation(s)
- Wataru Nemoto
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Akira Saito
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Hayato Oikawa
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
13
Bazan JF, Macdonald BT, He X. The TIKI/TraB/PrgY family: a common protease fold for cell signaling from bacteria to metazoa? Dev Cell 2013; 25:225-7. [PMID: 23673329 DOI: 10.1016/j.devcel.2013.04.019] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
Abstract
We report that the metazoan Wnt protease and signaling inhibitor TIKI shares sequence homology with bacterial TraB/PrgY proteins, inhibitors of pheromone signaling essential for propagation of antibiotic resistance. Our analysis suggests that these proteins represent an ancient metalloprotease clan regulating cellular communications across biological kingdoms.
14
Küçükural A, Szilagyi A, Sezerman OU, Zhang Y. Protein Homology Analysis for Function Prediction with Parallel Sub-Graph Isomorphism. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022] Open
Abstract
To annotate the biological function of a protein molecule, it is essential to have information on its 3D structure. Many successful methods for function prediction are based on determining structurally conserved regions, because functional residues have been shown to be more conserved than others in protein evolution. Since the 3D conformation of a protein can be represented by a contact map graph, graph matching algorithms are often employed to identify the conserved residues in weakly homologous protein pairs. However, general graph matching is computationally expensive because graph similarity searching is essentially an NP-hard problem. Parallel implementations of graph matching are often exploited to speed up the process. In this chapter, the authors review theoretical and computational approaches of graph theory and the recently developed graph matching algorithms for protein function prediction.
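The contact map graph mentioned in this abstract can be illustrated with a short Python sketch. It is an assumption-laden example rather than the chapter's method: one coordinate per residue (for instance the Cα atom), an 8 Å distance cutoff, and a minimum sequence separation of two residues are common conventions chosen here for illustration, and `contact_map` is a hypothetical name.

```python
import math


def contact_map(residue_coords, cutoff=8.0, min_separation=2):
    """Build an undirected contact graph as an adjacency dict.

    residue_coords: list of (x, y, z) tuples, one per residue in sequence order.
    cutoff: maximum distance (angstroms) for two residues to be in contact.
    min_separation: minimum index difference, so that trivially close
        sequence neighbours are not reported as contacts.
    """
    n = len(residue_coords)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(residue_coords[i], residue_coords[j]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph
```

Residues become nodes and spatial proximity becomes edges, so comparing two proteins reduces to comparing two graphs. This construction is quadratic in chain length and cheap; the expensive step the abstract refers to is the subsequent graph matching.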
15
Wang Z, Yin P, Lee JS, Parasuram R, Somarowthu S, Ondrechen MJ. Protein function annotation with Structurally Aligned Local Sites of Activity (SALSAs). BMC Bioinformatics 2013; 14 Suppl 3:S13. [PMID: 23514271 PMCID: PMC3584854 DOI: 10.1186/1471-2105-14-s3-s13] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022] Open
Abstract
Background The prediction of biochemical function from the 3D structure of a protein has proved to be much more difficult than was originally foreseen. A reliable method to test the likelihood of putative annotations and to predict function from structure would add tremendous value to structural genomics data. We report on a new method, Structurally Aligned Local Sites of Activity (SALSA), for the prediction of biochemical function based on a local structural match at the predicted catalytic or binding site. Results Implementation of the SALSA method is described. For the structural genomics protein PY01515 (PDB ID 2aqw) from Plasmodium yoelii, it is shown that the putative annotation, Orotidine 5'-monophosphate decarboxylase (OMPDC), is most likely correct. SALSA analysis of YP_001304206.1 (PDB ID 3h3l), a putative sugar hydrolase from Parabacteroides distasonis, shows that its active site does not bear close resemblance to any previously characterized member of its superfamily, the Concanavalin A-like lectins/glucanases. It is noted that three residues in the active site of the thermophilic beta-1,4-xylanase from Nonomuraea flexuosa (PDB ID 1m4w), Y78, E87, and E176, overlap with POOL-predicted residues of similar type, Y168, D153, and E232, in YP_001304206.1. The substrate recognition regions of the two proteins are rather different, suggesting that YP_001304206.1 is a new functional type within the superfamily. A structural genomics protein from Mycobacterium avium (PDB ID 3q1t) has been reported to be an enoyl-CoA hydratase (ECH), but SALSA analysis shows a poor match between the predicted residues for the SG protein and those of known ECHs. A better local structural match is obtained with Anabaena beta-diketone hydrolase (ABDH), a known β-diketone hydrolase from Cyanobacterium anabaena (PDB ID 2j5s). This suggests that the reported ECH function of the SG protein is incorrect and that it is more likely a β-diketone hydrolase. 
Conclusions A local site match provides a more compelling function prediction than that obtainable from a simple 3D structure match. The present method can confirm putative annotations, identify misannotation, and in some cases suggest a more probable annotation.
Affiliation(s)
- Zhouxi Wang
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
16
Abstract
The focus of this chapter is on the important concepts behind the in silico techniques that are used today to assess target druggability. The first step of the assessment consists of finding cavity space in the protein using 2D and/or 3D topological concepts. These concepts underlie the geometry- and energy-based pocket-finder algorithms. The analysis then turns to the physico-chemical complementarity between the binding site and the drug-like molecule. Geometrical and molecular flexibility aspects are also included in this assessment. The presence of hot interaction spots is shown to be particularly important for targeting protein-protein interactions. Finally, binding site promiscuity can be assessed by large-scale structural comparison with other targets. Common chemical features among protein cavities can predict potential cross-reactivity with unwanted targets.
17
Pintus SS, Ivanisenko NV, Demenkov PS, Ivanisenko TV, Ramachandran S, Kolchanov NA, Ivanisenko VA. The substitutions G245C and G245D in the Zn(2+)-binding pocket of the p53 protein result in differences of conformational flexibility of the DNA-binding domain. J Biomol Struct Dyn 2012; 31:78-86. [PMID: 22803791 DOI: 10.1080/07391102.2012.691364] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/29/2022]
Abstract
Transcription activation of the proapoptotic target genes is a means by which the p53 protein implements its function of tumor suppression. Zn(2+) is a known regulator of p53 binding to the target genes. We have previously obtained evidence that amino acid substitutions in the p53 Zn(2+)-binding pocket may influence the Zn(2+) position in the Zn(2+)-p53 complex and thereby affect p53 binding to DNA. Against this background, our aim was to estimate the effect of the putative changes in the Zn(2+) position in its binding pocket due to the G245C and G245D substitutions on the conformation of the p53 DNA-binding motif. Statistical analysis of the molecular dynamics (MD) trajectories of the mutant p53-Zn(2+) complexes was used to detect significant deviations in conformation of the mutant p53 forms. MD simulations demonstrated that (1) the two substitutions in the Zn(2+)-binding pocket caused changes in the conformation of the p53 DNA-binding motif, as compared with the wild-type (WT) p53; (2) binding of Zn(2+) to the p53 mutant forms reduced the effect of the substitutions on conformational change; and (3) Zn(2+) binding in the normal position compensated for the effect of the mutations on the conformation in comparison to the altered Zn(2+) position.
Affiliation(s)
- S S Pintus
- Laboratory of Computational Proteomics, Institute of Cytology and Genetics SB RAS, Lavrentyev av. 10, Novosibirsk, 630090, Russia
18
Structure-based computational analysis of protein binding sites for function and druggability prediction. J Biotechnol 2012; 159:123-34. [DOI: 10.1016/j.jbiotec.2011.12.005] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 10/02/2011] [Revised: 12/02/2011] [Accepted: 12/06/2011] [Indexed: 11/19/2022]
19
Medvedeva I, Demenkov P, Kolchanov N, Ivanisenko V. SitEx: a computer system for analysis of projections of protein functional sites on eukaryotic genes. Nucleic Acids Res 2011; 40:D278-83. [PMID: 22139920 PMCID: PMC3245165 DOI: 10.1093/nar/gkr1187] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
The search for interrelationships between the structural-functional organization of a protein and the exon structure of the encoding gene provides insights into the function, origin and evolution of genes and proteins. The functions of proteins and their domains are defined mostly by functional sites. The relation of the exon-intron structure of the gene to the protein functional sites has been little studied. Resources containing data on projections of protein functional sites on eukaryotic genes are needed. We have developed SitEx, a database that contains information on functional site amino acid positions in the exon structure of the encoding gene. SitEx is integrated with the BLAST and 3DExonScan programs. BLAST is used for searching sequence similarity between the query protein and polypeptides encoded by single exons stored in SitEx. The 3DExonScan program is used for searching for structural similarity of the given protein with these polypeptides using superimpositions. The developed computer system allows users to analyze the coding features of functional sites by taking into account the exon structure of the gene, to detect the exons involved in shuffling in protein evolution, and to design protein-engineering experiments. SitEx is accessible at http://www-bionet.sscc.ru/sitex/. Currently, it contains information about 9994 functional sites present in 2021 proteins described in proteomes of 17 organisms.
Affiliation(s)
- Irina Medvedeva
- Computer Proteomics Laboratory, Institute of Cytology and Genetics SB RAS, 10 Lavrentyeva Avenue, 630090 Novosibirsk, Russia
20
Kochańczyk M. Prediction of functionally important residues in globular proteins from unusual central distances of amino acids. BMC Structural Biology 2011; 11:34. [PMID: 21923943 PMCID: PMC3188475 DOI: 10.1186/1472-6807-11-34] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/22/2011] [Accepted: 09/18/2011] [Indexed: 12/12/2022]
Abstract
BACKGROUND Well-performing automated protein function recognition approaches usually comprise several complementary techniques. Besides constructing better consensus, their predictive power can be improved by either adding or refining independent modules that explore orthogonal features of proteins. In this work, we demonstrated how the exploration of global atomic distributions can be used to indicate functionally important residues. RESULTS Using a set of carefully selected globular proteins, we parametrized continuous probability density functions describing preferred central distances of individual protein atoms. Relative preferred burials were estimated using mixture models of radial density functions dependent on the amino acid composition of a protein under consideration. The unexpectedness of extraordinary locations of atoms was evaluated in an information-theoretic manner and used directly for the identification of key amino acids. In the validation study, we tested the capabilities of a tool built upon our approach, called SurpResi, by searching for binding sites interacting with ligands. The tool indicated multiple candidate sites achieving success rates comparable to several geometric methods. We also showed that the unexpectedness is a property of regions involved in protein-protein interactions, and thus can be used for the ranking of protein docking predictions. The computational approach implemented in this work is freely available via a Web interface at http://www.bioinformatics.org/surpresi. CONCLUSIONS Probabilistic analysis of atomic central distances in globular proteins is capable of capturing distinct orientational preferences of amino acids as resulting from different sizes, charges and hydrophobic characters of their side chains. When idealized spatial preferences can be inferred from the sole amino acid composition of a protein, residues located in hydrophobically unfavorable environments can be easily detected. Such residues often turn out to be directly involved in binding ligands or interfacing with other proteins.
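The information-theoretic scoring described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the preferred central distance of an atom type follows a single Gaussian density; the abstract refers to mixture models of radial density functions whose exact form is not given here. All function names, residue labels and parameter values below are hypothetical and are not taken from SurpResi.

```python
import math

def gaussian_pdf(r, mean, sd):
    """Normal density at r (illustrative stand-in for a radial density function)."""
    z = (r - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def surprisal(r, mean, sd):
    """Unexpectedness of a central distance r in bits: -log2 p(r)."""
    return -math.log2(gaussian_pdf(r, mean, sd))

def rank_residues(distances, mean, sd):
    """Sort residue ids from the most to the least unexpected location."""
    return sorted(
        distances,
        key=lambda res: surprisal(distances[res], mean, sd),
        reverse=True,
    )

# Hypothetical central distances (angstroms) for three hydrophobic residues.
distances = {"LEU12": 6.0, "ILE47": 7.5, "VAL88": 15.0}
ranking = rank_residues(distances, mean=7.0, sd=2.5)
print(ranking)
```

Because the score is computed from a density rather than a probability, only differences between surprisal values are meaningful; the residue lying farthest from its preferred burial is ranked first.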
Affiliation(s)
- Marek Kochańczyk
- Faculty of Physics, Jagiellonian University, ul. Reymonta 4, 30-059 Krakow, Poland.
|
21
|
Ivanisenko VA, Demenkov PS, Ivanisenko TV, Kolchanov NA. [Protein Structure Discovery: software package to perform computational proteomics tasks]. Russian Journal of Bioorganic Chemistry 2011; 37:22-35. [PMID: 21460878 DOI: 10.1134/s1068162011010080] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
The software and information system Protein Structure Discovery was developed. The system can be used for a wide range of tasks in computational proteomics, including prediction of the function, structure and immunological properties of proteins. A dedicated section of the system allows the quantitative and qualitative effects of mutations on the structural and functional properties of proteins to be evaluated. Nineteen different programs are integrated into the system, including PDBSite, a database of protein functional sites; PDBSiteScan, a program for the prediction of functional sites in three-dimensional protein structures; and WebProAnalyst, a program for the quantitative analysis of structure-activity relationships of proteins. Protein Structure Discovery has a Web interface and is available to users through the Internet (http://www-bionet.sscc.ru/psd/). For example, for the binding sites of zinc ion and ADP, the PDBSiteScan method showed high robustness to errors in the reconstruction of protein spatial structures when recognizing functional sites in model structures.
|
22
|
Han GW, Ko J, Farr CL, Deller MC, Xu Q, Chiu HJ, Miller MD, Sefcikova J, Somarowthu S, Beuning PJ, Elsliger MA, Deacon AM, Godzik A, Lesley SA, Wilson IA, Ondrechen MJ. Crystal structure of a metal-dependent phosphoesterase (YP_910028.1) from Bifidobacterium adolescentis: Computational prediction and experimental validation of phosphoesterase activity. Proteins 2011; 79:2146-60. [PMID: 21538547 DOI: 10.1002/prot.23035] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 12/09/2010] [Revised: 03/07/2011] [Accepted: 03/15/2011] [Indexed: 11/09/2022]
Abstract
The crystal structures of an unliganded and adenosine 5'-monophosphate (AMP) bound, metal-dependent phosphoesterase (YP_910028.1) from Bifidobacterium adolescentis are reported at 2.4 and 1.94 Å, respectively. Functional characterization of this enzyme was guided by computational analysis and then confirmed by experiment. The structure consists of a polymerase and histidinol phosphatase (PHP, Pfam: PF02811) domain with a second domain (residues 105-178) inserted in the middle of the PHP sequence. The insert domain functions in binding AMP, but the precise function and substrate specificity of this domain are unknown. Initial bioinformatics analyses yielded multiple potential functional leads, with most of them suggesting DNA polymerase or DNA replication activity. Phylogenetic analysis indicated a potential DNA polymerase function that was somewhat supported by global structural comparisons identifying the closest structural match to the alpha subunit of DNA polymerase III. However, several other functional predictions, including phosphoesterase, could not be excluded. Theoretical microscopic anomalous titration curve shapes, a computational method for the prediction of active sites from protein 3D structures, identified potential reactive residues in YP_910028.1. Further analysis of the predicted active site and local comparison with its closest structure matches strongly suggested phosphoesterase activity, which was confirmed experimentally. Primer extension assays on both normal and mismatched DNA show neither extension nor degradation and provide evidence that YP_910028.1 has neither DNA polymerase activity nor DNA-proofreading activity. These results suggest that many of the sequence neighbors previously annotated as having DNA polymerase activity may actually be misannotated.
Affiliation(s)
- Gye Won Han
- Joint Center for Structural Genomics, Scripps Research Institute, La Jolla, California 92037, USA
|
23
|
Gong S, Worth CL, Cheng TMK, Blundell TL. Meet Me Halfway: When Genomics Meets Structural Bioinformatics. J Cardiovasc Transl Res 2011; 4:281-303. [DOI: 10.1007/s12265-011-9259-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/30/2010] [Accepted: 02/08/2011] [Indexed: 01/08/2023]
|
24
|
Kato T, Nagano N. Discriminative structural approaches for enzyme active-site prediction. BMC Bioinformatics 2011; 12 Suppl 1:S49. [PMID: 21342581 PMCID: PMC3044306 DOI: 10.1186/1471-2105-12-s1-s49] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Predicting enzyme active-sites in proteins is an important issue not only for protein sciences but also for a variety of practical applications such as drug design. Because enzyme reaction mechanisms are based on the local structures of enzyme active-sites, various template-based methods that compare local structures in proteins have been developed to date. In comparing such local sites, a simple measurement, RMSD, has been used so far. RESULTS This paper introduces new machine learning algorithms that refine the similarity/deviation for comparison of local structures. The similarity/deviation is applied to two types of applications, single template analysis and multiple template analysis. In the single template analysis, a single template is used as a query to search proteins for active sites, whereas a protein structure is examined as a query to discover the possible active-sites using a set of templates in the multiple template analysis. CONCLUSIONS This paper experimentally illustrates that the machine learning algorithms effectively improve the similarity/deviation measurements for both the analyses.
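The baseline measurement mentioned in this abstract, RMSD between two local structures, can be sketched as follows. This is a minimal illustration that assumes the two atom sets are already matched one-to-one and superimposed; it does not perform an optimal superposition and does not reproduce the learned similarity/deviation measures the paper introduces. The coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Hypothetical template and candidate site, three matched atoms each (angstroms).
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
candidate = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(rmsd(template, candidate))
```

RMSD weights every atom equally, which is the limitation a learned deviation measure is intended to address.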
Affiliation(s)
- Tsuyoshi Kato
- Graduate school of Engineering, Gunma University, Tenjin-cho 1-5-1, Kiryu, Gunma 376-8515, Japan.
|
25
|
Grant MA. Integrating computational protein function prediction into drug discovery initiatives. Drug Dev Res 2010; 72:4-16. [PMID: 25530654 DOI: 10.1002/ddr.20397] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/24/2022]
Abstract
Pharmaceutical researchers must evaluate vast numbers of protein sequences and formulate innovative strategies for identifying valid targets and discovering leads against them as a way of accelerating drug discovery. The ever increasing number and diversity of novel protein sequences identified by genomic sequencing projects and the success of worldwide structural genomics initiatives have spurred great interest and impetus in the development of methods for accurate, computationally empowered protein function prediction and active site identification. Previously, in the absence of direct experimental evidence, homology-based protein function annotation remained the gold-standard for in silico analysis and prediction of protein function. However, with the continued exponential expansion of sequence databases, this approach is not always applicable, as fewer query protein sequences demonstrate significant homology to protein gene products of known function. As a result, several non-homology based methods for protein function prediction that are based on sequence features, structure, evolution, biochemical and genetic knowledge have emerged. Herein, we review current bioinformatic programs and approaches for protein function prediction/annotation and discuss their integration into drug discovery initiatives. The development of such methods to annotate protein functional sites and their application to large protein functional families is crucial to successfully utilizing the vast amounts of genomic sequence information available to drug discovery and development processes.
Affiliation(s)
- Marianne A Grant
- Division of Molecular and Vascular Medicine and Center for Vascular Biology Research, Beth Israel Deaconess Medical Center, Department of Medicine, Harvard Medical School, Boston, Massachusetts, 02215
|
26
|
Abstract
Motivation: Finding functionally analogous enzymes based on the local structures of active sites is an important problem. Conventional methods use templates of local structures to search for analogous sites, but their performance depends on the selection of atoms for inclusion in the templates. Results: The metric learning algorithm selects atoms automatically so that site matches can be discriminated from mismatches. The algorithm provides not only good predictions, but also some insights into which atoms are important for the prediction. Our experimental results suggest that the metric learning automatically provides more effective templates than those whose atoms are selected manually. Availability: Online software is available at http://www.net-machine.net/~kato/lpmetric1/ Contact: kato-tsuyoshi@k.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tsuyoshi Kato
- GSFS, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, Japan.
|
27
|
Thangudu RR, Tyagi M, Shoemaker BA, Bryant SH, Panchenko AR, Madej T. Knowledge-based annotation of small molecule binding sites in proteins. BMC Bioinformatics 2010; 11:365. [PMID: 20594344 PMCID: PMC2909224 DOI: 10.1186/1471-2105-11-365] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/16/2010] [Accepted: 07/01/2010] [Indexed: 11/16/2022] Open
Abstract
Background The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity. Results We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation onto protein sequences without experimental data available. To ensure biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones. 
Conclusions A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
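To illustrate how a position specific score matrix (PSSM) built from a binding site alignment can rank candidate sites, here is a minimal sketch. It assumes gap-free aligned site sequences of equal length, a uniform background frequency over the 20 amino acids and a simple additive pseudocount; these choices, the function names and the example sequences are hypothetical and do not describe the published method's implementation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Return one {amino acid: log-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for pos in range(len(alignment[0])):
        column = [seq[pos] for seq in alignment]
        denom = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / denom
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score_site(pssm, site):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(column[aa] for column, aa in zip(pssm, site))

# Hypothetical three-residue binding sites taken from homologous structures.
alignment = ["HDH", "HDH", "HEH", "HDC"]
pssm = build_pssm(alignment)
candidates = ["AAA", "HEH", "HDH"]
ranked = sorted(candidates, key=lambda s: score_site(pssm, s), reverse=True)
print(ranked)
```

Candidates whose residues match the conserved columns receive higher scores, so the most conserved pattern is ranked first.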
Affiliation(s)
- Ratna R Thangudu
- National Center for Biotechnology Information, 8600 Rockville Pike, Building 38A, Bethesda, MD 20894, USA
|
28
|
Martin J. Beauty is in the eye of the beholder: proteins can recognize binding sites of homologous proteins in more than one way. PLoS Comput Biol 2010; 6:e1000821. [PMID: 20585553 PMCID: PMC2887470 DOI: 10.1371/journal.pcbi.1000821] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 02/05/2010] [Accepted: 05/18/2010] [Indexed: 11/18/2022] Open
Abstract
Understanding the mechanisms of protein-protein interaction is a fundamental problem with many practical applications. The fact that different proteins can bind similar partners suggests that convergently evolved binding interfaces are reused in different complexes. A set of protein complexes composed of non-homologous domains interacting with homologous partners at equivalent binding sites was collected in 2006, offering an opportunity to investigate this point. We considered 433 pairs of protein-protein complexes from the ABAC database (AB and AC binary protein complexes sharing a homologous partner A) and analyzed the extent of physico-chemical similarity at the atomic and residue level at the protein-protein interface. Homologous partners of the complexes were superimposed using Multiprot, and similar atoms at the interface were quantified using a five class grouping scheme and a distance cut-off. We found that the number of interfacial atoms with similar properties is systematically lower in the non-homologous proteins than in the homologous ones. We assessed the significance of the similarity by bootstrapping the atomic properties at the interfaces. We found that the similarity of binding sites is very significant between homologous proteins, as expected, but generally insignificant between the non-homologous proteins that bind to homologous partners. Furthermore, evolutionarily conserved residues are not colocalized within the binding sites of non-homologous proteins. We could only identify a limited number of cases of structural mimicry at the interface, suggesting that this property is less generic than previously thought. Our results support the hypothesis that different proteins can interact with similar partners using alternate strategies, but do not support convergent evolution.
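The idea of assessing similarity by resampling atomic properties can be sketched as below. This is a permutation-style resampling used here as a simple stand-in for the bootstrap procedure the abstract mentions, whose details are not given. It assumes interface atoms have already been paired (for example after superposition with a distance cut-off) and labelled with one of five property classes; the class names, function names and data are hypothetical.

```python
import random

def count_matches(classes_a, classes_b):
    """Number of paired interface atoms that share the same property class."""
    return sum(1 for a, b in zip(classes_a, classes_b) if a == b)

def resampling_p_value(classes_a, classes_b, n_resamples=2000, seed=0):
    """Fraction of label shufflings that match at least as well as observed."""
    rng = random.Random(seed)
    observed = count_matches(classes_a, classes_b)
    shuffled = list(classes_b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if count_matches(classes_a, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical class labels for ten paired interface atoms.
site_a = ["hyd", "hyd", "pol", "pol", "pos", "neg", "aro", "hyd", "pol", "neg"]
site_b = list(site_a)  # identical interface, as for a close homologue
p_value = resampling_p_value(site_a, site_b)
print(p_value)
```

A small p-value indicates that the observed number of similar atoms is unlikely to arise from the class composition alone.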
Affiliation(s)
- Juliette Martin
- Université de Lyon, Lyon, France; Université Lyon 1, IFR 128, CNRS, UMR 5086 Institut de Biologie et Chimie des Protéines (IBCP), Lyon, France.
|
29
|
Das S, Krein MP, Breneman CM. PESDserv: a server for high-throughput comparison of protein binding site surfaces. Bioinformatics 2010; 26:1913-4. [PMID: 20538727 DOI: 10.1093/bioinformatics/btq288] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
Abstract
SUMMARY Structure-based approaches complement ligand-based approaches for lead-discovery and cross-reactivity prediction. We present to the scientific community a web server for comparing the surface of a ligand bound site of a protein against a ligand bound site surface database of 106 796 sites. The web server implements the property encoded shape distributions (PESD) algorithm for surface comparison. A typical virtual screen takes 5 min to complete. The output provides a ranked list of sites (by site similarity), hyperlinked to the corresponding entries in the PDB and PDBeChem databases. AVAILABILITY The server is freely accessible at http://reccr.chem.rpi.edu/Software/pesdserv/
Affiliation(s)
- Sourav Das
- Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, NY 12180, USA
|
30
|
Ren J, Xie L, Li WW, Bourne PE. SMAP-WS: a parallel web service for structural proteome-wide ligand-binding site comparison. Nucleic Acids Res 2010; 38:W441-4. [PMID: 20484373 PMCID: PMC2896174 DOI: 10.1093/nar/gkq400] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022] Open
Abstract
The proteome-wide characterization and analysis of protein ligand-binding sites and their interactions with ligands can provide pivotal information in understanding the structure, function and evolution of proteins and for designing safe and efficient therapeutics. The SMAP web service (SMAP-WS) meets this need through parallel computations designed for 3D ligand-binding site comparison and similarity searching on a structural proteome scale. SMAP-WS implements a shape descriptor (the Geometric Potential) that characterizes both local and global topological properties of the protein structure and which can be used to predict the likely ligand-binding pocket [Xie,L. and Bourne,P.E. (2007) A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand-binding sites. BMC bioinformatics, 8 (Suppl. 4.), S9.]. Subsequently a sequence order independent profile–profile alignment (SOIPPA) algorithm is used to detect and align similar pockets thereby finding protein functional and evolutionary relationships across fold space [Xie, L. and Bourne, P.E. (2008) Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proc. Natl Acad. Sci. USA, 105, 5441–5446]. An extreme value distribution model estimates the statistical significance of the match [Xie, L., Xie, L. and Bourne, P.E. (2009) A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics, 25, i305–i312.]. These algorithms have been extensively benchmarked and shown to outperform most existing algorithms. Moreover, several predictions resulting from SMAP-WS have been validated experimentally. Thus far SMAP-WS has been applied to predict drug side effects, and to repurpose existing drugs for new indications. 
SMAP-WS provides both a user-friendly web interface and a programming API for scientists to address a wide range of compute-intensive questions in biology and drug discovery. AVAILABILITY SMAP-WS is available from the URL http://smap.nbcr.net.
Affiliation(s)
- Jingyuan Ren
- San Diego Supercomputer Center, National Biomedical Computation Resource and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Lei Xie
- San Diego Supercomputer Center, National Biomedical Computation Resource and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- *To whom correspondence should be addressed. Tel: +1 858 822 3686; Fax: +1 858 822 0873;
- Wilfred W. Li
- San Diego Supercomputer Center, National Biomedical Computation Resource and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- *Correspondence may also be addressed to Wilfred W. Li. Tel: +1 858 534 0591; Fax: +1 858 822 1619;
- Philip E. Bourne
- San Diego Supercomputer Center, National Biomedical Computation Resource and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
|
31
|
Erdin S, Ward RM, Venner E, Lichtarge O. Evolutionary trace annotation of protein function in the structural proteome. J Mol Biol 2009; 396:1451-73. [PMID: 20036248 DOI: 10.1016/j.jmb.2009.12.037] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 05/06/2009] [Revised: 12/05/2009] [Accepted: 12/18/2009] [Indexed: 11/16/2022]
Abstract
By design, structural genomics (SG) solves many structures that cannot be assigned function based on homology to known proteins. Alternative function annotation methods are therefore needed and this study focuses on function prediction with three-dimensional (3D) templates: small structural motifs built of just a few functionally critical residues. Although experimentally proven functional residues are scarce, we show here that Evolutionary Trace (ET) rankings of residue importance are sufficient to build 3D templates, match them, and then assign Gene Ontology (GO) functions in enzymes and non-enzymes alike. In a high-specificity mode, this Evolutionary Trace Annotation (ETA) method covered half (53%) of the 2384 annotated SG protein controls. Three-quarters (76%) of predictions were both correct and complete. The positive predictive value for all GO depths (all-depth PPV) was 84%, and it rose to 94% over GO depths 1-3 (depth 3 PPV). In a high-sensitivity mode, coverage rose significantly (84%), while accuracy fell moderately: 68% of predictions were both correct and complete, all-depth PPV was 75%, and depth 3 PPV was 86%. These data concur with prior mutational experiments showing that ET rank information identifies key functional determinants in proteins. In practice, ETA predicted functions in 42% of 3461 unannotated SG proteins. In 529 cases--including 280 non-enzymes and 21 for metal ion ligands--the expected accuracy is 84% at any GO depth and 94% down to GO depth 3, while for the remaining 931 the expected accuracies are 60% and 71%, respectively. Thus, local structural comparisons of evolutionarily important residues can help decipher protein functions to known reliability levels and without prior assumption on functional mechanisms. ETA is available at http://mammoth.bcm.tmc.edu/eta.
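The coverage and accuracy figures quoted in this abstract follow from simple ratios, which the sketch below makes explicit. The counts 529, 931 and 3461 and the accuracies 84% and 60% come from the abstract; the pooled accuracy is a size-weighted average computed here for illustration and is not a figure reported by the authors. The true/false positive counts in the PPV example and all function names are hypothetical.

```python
def coverage(n_predicted, n_total):
    """Fraction of proteins for which a prediction was made."""
    return n_predicted / n_total

def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)

def pooled_accuracy(groups):
    """Size-weighted mean accuracy over (count, accuracy) groups."""
    total = sum(count for count, _ in groups)
    return sum(count * accuracy for count, accuracy in groups) / total

# 529 high-confidence plus 931 remaining predictions among 3461 unannotated proteins.
cov = coverage(529 + 931, 3461)
# Expected all-depth accuracies of the two groups, as quoted in the abstract.
pooled = pooled_accuracy([(529, 0.84), (931, 0.60)])
# Hypothetical counts chosen only to show the PPV formula.
ppv = positive_predictive_value(true_positives=84, false_positives=16)
print(f"coverage={cov:.3f} pooled_accuracy={pooled:.3f} ppv={ppv:.2f}")
```

The computed coverage of about 0.42 agrees with the 42% stated in the abstract, which confirms that the 529 and 931 cases together make up the predicted set.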
Affiliation(s)
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
|
32
|
Bray T, Chan P, Bougouffa S, Greaves R, Doig AJ, Warwicker J. SitesIdentify: a protein functional site prediction tool. BMC Bioinformatics 2009; 10:379. [PMID: 19922660 PMCID: PMC2783165 DOI: 10.1186/1471-2105-10-379] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 06/05/2009] [Accepted: 11/18/2009] [Indexed: 01/31/2023] Open
Abstract
Background The rate of protein structures being deposited in the Protein Data Bank surpasses the capacity to experimentally characterise them and therefore computational methods to analyse these structures have become increasingly important. Identifying the region of the protein most likely to be involved in function is useful in order to gain information about its potential role. There are many available approaches to predict functional site, but many are not made available via a publicly-accessible application. Results Here we present a functional site prediction tool (SitesIdentify), based on combining sequence conservation information with geometry-based cleft identification, that is freely available via a web-server. We have shown that SitesIdentify compares favourably to other functional site prediction tools in a comparison of seven methods on a non-redundant set of 237 enzymes with annotated active sites. Conclusion SitesIdentify is able to produce comparable accuracy in predicting functional sites to its closest available counterpart, but in addition achieves improved accuracy for proteins with few characterised homologues. SitesIdentify is available via a webserver at http://www.manchester.ac.uk/bioinformatics/sitesidentify/
Affiliation(s)
- Tracey Bray
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK.
|
33
|
Bandyopadhyay A, Arora A, Jain S, Laskar A, Mandal C, Ivanisenko VA, Fomin ES, Pintus SS, Kolchanov NA, Maiti S, Ramachandran S. Expression and molecular characterization of the Mycobacterium tuberculosis PII protein. J Biochem 2009; 147:279-89. [PMID: 19884192 DOI: 10.1093/jb/mvp174] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
Abstract
The signal transduction protein PII plays an important role in cellular nitrogen assimilation and regulation. The molecular characteristics of the Mycobacterium tuberculosis PII (Mtb PII) were investigated using biophysical experiments. The Mtb PII coding ORF Rv2919c was cloned and expressed in Escherichia coli. The binding characteristics of the purified protein with ATP and ADP were investigated using surface plasmon resonance (SPR) and isothermal titration calorimetry (ITC). Mtb PII binds to ATP strongly with Kd in the range 1.93-6.44 μM. This binding strength was not significantly affected by the presence of 2-ketoglutarate even at molar concentrations in 66-fold (ITC) or 636-fold (SPR) excess of the protein concentration. However, an additional enthalpy of 0.3 kcal/mol was released in the presence of 2-ketoglutarate. Binding of Mtb PII to ADP was weaker by an order of magnitude. Binding of ATP and 2-ketoglutarate was analysed by docking studies on the Mtb PII crystal structure (PDB id 3BZQ). We observed that hydrogen bonds involving the gamma-phosphate of ATP contribute to enhanced binding of ATP compared with ADP. Glutaraldehyde crosslinking showed that Mtb PII exists in a homotrimeric state, which is consistent with other PII proteins. Phylogenetic analysis showed that Mtb PII consistently grouped with other actinobacterial PII proteins.
Affiliation(s)
- Anannya Bandyopadhyay
- Functional Genomics Unit, Institute of Genomics and Integrative Biology (CSIR), Delhi 110 007, India
|
34
|
Abstract
Bioinformatics is a central discipline in modern life sciences aimed at describing the complex properties of living organisms starting from large-scale data sets of cellular constituents such as genes and proteins. In order for this wealth of information to provide useful biological knowledge, databases and software tools for data collection, analysis and interpretation need to be developed. In this paper, we review recent advances in the design and implementation of bioinformatics resources devoted to the study of metals in biological systems, a research field traditionally at the heart of bioinorganic chemistry. We show how metalloproteomes can be extracted from genome sequences, how structural properties can be related to function, how databases can be implemented, and how hints on interactions can be obtained from bioinformatics.
Affiliation(s)
- Ivano Bertini
- Magnetic Resonance Center (CERM)-University of Florence, Via L. Sacconi 6, Sesto Fiorentino, Italy.
|
35
|
Redfern OC, Dessailly BH, Dallman TJ, Sillitoe I, Orengo CA. FLORA: a novel method to predict protein function from structure in diverse superfamilies. PLoS Comput Biol 2009; 5:e1000485. [PMID: 19714201 PMCID: PMC2721411 DOI: 10.1371/journal.pcbi.1000485] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 11/27/2008] [Accepted: 07/23/2009] [Indexed: 11/18/2022] Open
Abstract
Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues.
Understanding how the three-dimensional (3D) molecular structure of proteins influences their function can provide insights into the workings of biological systems. Structural Genomics Initiatives have been set up to investigate these structures on a large scale and make the data available to the wider biological research community. However, in a significant number of cases, there is little known about the functions of the structures that are solved. To address this, computational methods can be used as a predictive tool to guide future experimental investigations. One such approach is to exploit global structural comparison to assign the protein in question to an evolutionary family, which has already been functionally characterised. However, this is problematic in some large evolutionary families, which contain a number of different functional sub-families. We have developed a new method (FLORA) which is able to calculate 3D “motifs” which are specific to each of these sub-families. Any new protein structure can then be compared against these motifs to make a more accurate prediction of its function. Our paper shows that FLORA substantially outperforms other standard approaches for predicting function from structure. We use our method to make confident functional predictions for a set of proteins solved by the structural genomics projects, which could not have been assigned reliably by global structure comparison.
Affiliation(s)
- Oliver C. Redfern
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Benoît H. Dessailly
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Timothy J. Dallman
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Christine A. Orengo
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
|
36
|
Xie L, Xie L, Bourne PE. A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics 2009; 25:i305-12. [PMID: 19478004 PMCID: PMC2687974 DOI: 10.1093/bioinformatics/btp220] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.3] [Indexed: 11/13/2022] Open
Abstract
Functional relationships between proteins that do not share global structure similarity can be established by detecting their ligand-binding-site similarity. For a large-scale comparison, it is critical to accurately and efficiently assess the statistical significance of this similarity. Here, we report an efficient statistical model that supports local sequence order independent ligand-binding-site similarity searching. Most existing statistical models only take into account the matching vertices between two sites that are defined by a fixed number of points. In reality, the boundary of the binding site is not known or is dependent on the bound ligand, making these approaches limited. To address these shortcomings and to perform binding-site mapping on a genome-wide scale, we developed a sequence-order independent profile-profile alignment (SOIPPA) algorithm that is able to detect local similarity between unknown binding sites a priori. The SOIPPA scoring integrates geometric, evolutionary and physical information into a unified framework. However, this imposes a significant challenge in assessing the statistical significance of the similarity because the conventional probability model that is based on fixed-point matching cannot be applied. Here we find that scores for binding-site matching by SOIPPA follow an extreme value distribution (EVD). Benchmark studies show that the EVD model is at least two orders of magnitude faster and more accurate than the non-parametric statistical method in the previous SOIPPA version. Efficient statistical analysis makes it possible to apply SOIPPA to genome-based drug discovery. Consequently, we have applied the approach to the structural genome of Mycobacterium tuberculosis to construct a protein-ligand interaction network. The network reveals highly connected proteins, which represent suitable targets for promiscuous drugs.
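As a rough illustration of how an extreme value distribution can turn a raw similarity score into a significance estimate, the sketch below fits a Gumbel (type I EVD) to a set of background scores and computes a right-tail p-value. It is not the SOIPPA implementation: the Gumbel form, the method-of-moments fit, and the function names `fit_gumbel_moments` and `gumbel_p_value` are assumptions made for illustration only.

```python
import math
import statistics

# Euler-Mascheroni constant, used in the Gumbel mean: mean = mu + gamma * beta
EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (beta) by the method of moments.

    Assumption: background scores are well described by a type I EVD.
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    beta = std * math.sqrt(6) / math.pi   # Gumbel variance = (pi * beta)^2 / 6
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """Right-tail probability P(S >= score) under Gumbel(mu, beta)."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 for accuracy at small p-values
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    # Hypothetical background scores, for demonstration only
    background = [1.0, 2.0, 3.0, 4.0, 5.0]
    mu, beta = fit_gumbel_moments(background)
    print(gumbel_p_value(6.0, mu, beta))
```

Because the p-value has a closed form once the two parameters are fitted, each new score costs only a few arithmetic operations. That is one plausible reason a parametric model can be much faster than a resampling-based estimate.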
Affiliation(s)
- Lei Xie
- San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA.
|
37
|
Drug discovery using chemical systems biology: identification of the protein-ligand binding network to explain the side effects of CETP inhibitors. PLoS Comput Biol 2009; 5:e1000387. [PMID: 19436720 PMCID: PMC2676506 DOI: 10.1371/journal.pcbi.1000387] [Citation(s) in RCA: 185] [Impact Index Per Article: 12.3] [Received: 01/22/2009] [Accepted: 04/13/2009] [Indexed: 01/11/2023] Open
Abstract
Systematic identification of protein-drug interaction networks is crucial to correlate complex modes of drug action to clinical indications. We introduce a novel computational strategy to identify protein-ligand binding profiles on a genome-wide scale and apply it to elucidating the molecular mechanisms associated with the adverse drug effects of Cholesteryl Ester Transfer Protein (CETP) inhibitors. CETP inhibitors are a new class of preventive therapies for the treatment of cardiovascular disease. However, clinical studies indicated that one CETP inhibitor, Torcetrapib, has deadly off-target effects as a result of hypertension, and hence it has been withdrawn from phase III clinical trials. We have identified a panel of off-targets for Torcetrapib and other CETP inhibitors from the human structural genome and map those targets to biological pathways via the literature. The predicted protein-ligand network is consistent with experimental results from multiple sources and reveals that the side-effect of CETP inhibitors is modulated through the combinatorial control of multiple interconnected pathways. Given that combinatorial control is a common phenomenon observed in many biological processes, our findings suggest that adverse drug effects might be minimized by fine-tuning multiple off-target interactions using single or multiple therapies. This work extends the scope of chemogenomics approaches and exemplifies the role that systems biology has in the future of drug discovery.
Both the cost to launch a new drug and the attrition rate during the late stage of the drug discovery and development process are increasing. Torcetrapib is a case in point, having been withdrawn from phase III clinical trials after 15 years of development and an estimated cost of US $800 M. Torcetrapib represents a new class of therapies for the treatment of cardiovascular disease; however, clinical studies indicated that Torcetrapib has deadly side-effects as a result of hypertension. To understand the origins of these adverse drug reactions from Torcetrapib and other related drugs undergoing clinical trials, we introduce a systematic strategy to identify off-targets in the human structural proteome and investigate the roles of these off-targets in impacting human physiology and pathology using biochemical pathway analysis. Our findings suggest that potential side-effects of a new drug can be identified at an early stage of the development cycle and be minimized by fine-tuning multiple off-target interactions. The hope is that this can reduce both the cost of drug development and the mortality rates during clinical trials.
|
38
|
Loewenstein Y, Raimondo D, Redfern OC, Watson J, Frishman D, Linial M, Orengo C, Thornton J, Tramontano A. Protein function annotation by homology-based inference. Genome Biol 2009; 10:207. [PMID: 19226439 PMCID: PMC2688287 DOI: 10.1186/gb-2009-10-2-207] [Citation(s) in RCA: 147] [Impact Index Per Article: 9.8] [Indexed: 11/10/2022] Open
Abstract
Where information on homologous proteins is available, progress is being made in automated prediction of protein function from sequence and structure. With many genomes now sequenced, computational annotation methods to characterize genes and proteins from their sequence are increasingly important. The BioSapiens Network has developed tools to address all stages of this process, and here we review progress in the automated prediction of protein function based on protein sequence and structure.
Affiliation(s)
- Yaniv Loewenstein
- Department of Biological Chemistry, The Hebrew University of Jerusalem, Sudarsky Center, Jerusalem 91904, Israel
|
39
|
Punta M, Ofran Y. The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS Comput Biol 2008; 4:e1000160. [PMID: 18974821 PMCID: PMC2518264 DOI: 10.1371/journal.pcbi.1000160] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.6] [Indexed: 12/22/2022] Open
Affiliation(s)
- Marco Punta
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Columbia University Center for Computational Biology and Bioinformatics (C2B2), New York, New York, United States of America
- Northeast Structural Genomics Consortium (NESG), Columbia University, New York, New York, United States of America
- Yanay Ofran
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
|
40
|
Gherardini PF, Helmer-Citterich M. Structure-based function prediction: approaches and applications. Briefings in Functional Genomics and Proteomics 2008; 7:291-302. [PMID: 18599513 DOI: 10.1093/bfgp/eln030] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
The ever increasing number of protein structures determined by structural genomic projects has spurred much interest in the development of methods for structure-based function prediction. Existing methods can be roughly classified in two groups: some use a comparative approach looking for the presence of structural motifs possibly associated with a known biochemical function. Other methods try to identify functional patches on the surface of a protein using only its physicochemical characteristics. This review will cover both kinds of approaches to structure-based function prediction as well as their use in real-world cases. The main issues and limitations in using protein structure to predict function will also be discussed. These are mainly: the assessment of the statistical significance of structural similarities and the extent to which these methods depend on the accuracy and availability of structural data.
Affiliation(s)
- Pier Federico Gherardini
- Department of Biology, Centre for Molecular Bioinformatics, University of Tor Vergata, Rome, Italy.
|
41
|
Ward RM, Erdin S, Tran TA, Kristensen DM, Lisewski AM, Lichtarge O. De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features. PLoS One 2008; 3:e2136. [PMID: 18461181 PMCID: PMC2362850 DOI: 10.1371/journal.pone.0002136] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 03/13/2008] [Accepted: 03/25/2008] [Indexed: 12/01/2022] Open
Abstract
Function prediction frequently relies on comparing genes or gene products to search for relevant similarities. Because the number of protein structures with unknown function is mushrooming, however, we asked here whether such comparisons could be improved by focusing narrowly on the key functional features of protein structures, as defined by the Evolutionary Trace (ET). Therefore a series of algorithms was built to (a) extract local motifs (3D templates) from protein structures based on ET ranking of residue importance; (b) to assess their geometric and evolutionary similarity to other structures; and (c) to transfer enzyme annotation whenever a plurality was reached across matches. Whereas a prototype had only been 80% accurate and was not scalable, here a speedy new matching algorithm enabled large-scale searches for reciprocal matches and thus raised annotation specificity to 100% in both positive and negative controls of 49 enzymes and 50 non-enzymes, respectively—in one case even identifying an annotation error—while maintaining sensitivity (∼60%). Critically, this Evolutionary Trace Annotation (ETA) pipeline requires no prior knowledge of functional mechanisms. It could thus be applied in a large-scale retrospective study of 1218 structural genomics enzymes and reached 92% accuracy. Likewise, it was applied to all 2935 unannotated structural genomics proteins and predicted enzymatic functions in 320 cases: 258 on first pass and 62 more on second pass. Controls and initial analyses suggest that these predictions are reliable. Thus the large-scale evolutionary integration of sequence-structure-function data, here through reciprocal identification of local, functionally important structural features, may contribute significantly to de-orphaning the structural proteome.
Affiliation(s)
- R. Matthew Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Tuan A. Tran
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- David M. Kristensen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
|
42
|
Liu ZP, Wu LY, Wang Y, Zhang XS, Chen L. Bridging protein local structures and protein functions. Amino Acids 2008; 35:627-50. [PMID: 18421562 PMCID: PMC7088341 DOI: 10.1007/s00726-008-0088-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/21/2008] [Accepted: 03/10/2008] [Indexed: 12/11/2022]
Abstract
One of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions. In this review, we summarize the repertoire of methods currently being applied and report recent progress in the field of in silico annotation of protein function based on the accumulation of vast amounts of sequence and structure data. In particular, we emphasize the newly developed structure-based methods, which are able to identify local structural motifs and reveal their relationship with protein functions. These methods include computational tools to identify the structural motifs and reveal the strong relationship between these pre-computed local structures and protein functions. We also discuss remaining problems and possible directions for this exciting and challenging area.
Affiliation(s)
- Zhi-Ping Liu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100080, Beijing, China
|
43
|
Xie L, Bourne PE. Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. Proc Natl Acad Sci U S A 2008; 105:5441-6. [PMID: 18385384 PMCID: PMC2291117 DOI: 10.1073/pnas.0704422105] [Citation(s) in RCA: 209] [Impact Index Per Article: 13.1] [Received: 05/11/2007] [Indexed: 11/18/2022] Open
Abstract
Here, a scalable, accurate, reliable, and robust protein functional site comparison algorithm is presented. The key components of the algorithm consist of a reduced representation of the protein structure and a sequence order-independent profile-profile alignment (SOIPPA). We show that SOIPPA is able to detect distant evolutionary relationships in cases where both global sequence and global structure relationships remain obscure. Results suggest evolutionary relationships across several protein structure superfamilies previously considered evolutionarily distinct. SOIPPA, along with an increased coverage of protein fold space afforded by the structural genomics initiative, can be used to further test the notion that fold space is continuous rather than discrete.
Affiliation(s)
- Lei Xie
- San Diego Supercomputer Center
- Philip E. Bourne
- San Diego Supercomputer Center
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093
|
44
|
Sgobba M, Degliesposti G, Ferrari AM, Rastelli G. Structural models and binding site prediction of the C-terminal domain of human Hsp90: a new target for anticancer drugs. Chem Biol Drug Des 2008; 71:420-433. [PMID: 18373550 DOI: 10.1111/j.1747-0285.2008.00650.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
Heat shock protein 90 is a valuable target for anticancer drugs because of its role in the activation and stabilization of multiple oncogenic signalling proteins. While several compounds inhibit heat shock protein 90 by binding the N-terminal domain, recent studies have proved that the C-terminal domain is important for dimerization of the chaperone and contains an additional binding site for inhibitors. Heat shock protein 90 inhibition achieved with molecules binding to the C-terminal domain provides an additional and novel opportunity to design and develop drugs. Therefore, for the first time, we have investigated the structure and the dynamic behaviour of the C-terminal domain of human heat shock protein 90 with and without the small-middle domain, using homology modelling and molecular dynamics simulations. In addition, secondary structure predictions and peptide folding simulations proved useful to investigate a putative additional alpha-helix located between H18 and beta20 of the C-terminal domain. Finally, we used the structural information to infer the location of the binding site located in the C-terminal domain by using a number of computational tools. The predicted pocket is formed by two grooves located between helix H18, the loop downstream of H18 and the loop connecting helices H20 and H21 of each monomer of the C-terminal domain, with only two amino acids contributing from each middle domain.
Affiliation(s)
- Miriam Sgobba
- Dipartimento di Scienze Farmaceutiche, Università di Modena e Reggio Emilia, via Campi 183, 41100 Modena, Italy
- Gianluca Degliesposti
- Dipartimento di Scienze Farmaceutiche, Università di Modena e Reggio Emilia, via Campi 183, 41100 Modena, Italy
- Anna Maria Ferrari
- Dipartimento di Scienze Farmaceutiche, Università di Modena e Reggio Emilia, via Campi 183, 41100 Modena, Italy
- Giulio Rastelli
- Dipartimento di Scienze Farmaceutiche, Università di Modena e Reggio Emilia, via Campi 183, 41100 Modena, Italy
|
45
|
Kristensen DM, Ward RM, Lisewski AM, Erdin S, Chen BY, Fofanov VY, Kimmel M, Kavraki LE, Lichtarge O. Prediction of enzyme function based on 3D templates of evolutionarily important amino acids. BMC Bioinformatics 2008; 9:17. [PMID: 18190718 PMCID: PMC2219985 DOI: 10.1186/1471-2105-9-17] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.9] [Received: 06/11/2007] [Accepted: 01/11/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates - structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates. RESULTS Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 share the first three Enzyme Commission digits as the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required to both geometrically match the 3D template and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable. CONCLUSION These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome.
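The plurality vote described in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the ETA pipeline: the function name `plurality_ec_vote`, the representation of matches as Enzyme Commission strings, and the choice to return `None` on a tie are all hypothetical.

```python
from collections import Counter


def plurality_ec_vote(match_ec_numbers, digits=3):
    """Pick the most frequent EC prefix among template matches.

    match_ec_numbers: EC numbers (e.g. "3.4.21.4") of the proteins a 3D
    template matched. Only the first `digits` levels are compared.
    Returns None when there are no matches or when the top two prefixes
    tie (tie handling is an assumption of this sketch).
    """
    prefixes = [".".join(ec.split(".")[:digits]) for ec in match_ec_numbers]
    top = Counter(prefixes).most_common(2)
    if not top:
        return None
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # no single plurality winner
    return top[0][0]


if __name__ == "__main__":
    # Hypothetical matches for one template
    print(plurality_ec_vote(["3.4.21.4", "3.4.21.5", "1.1.1.1"]))
```

Truncating to three EC digits mirrors the level of agreement the abstract reports, while the `digits` parameter leaves room to vote at a coarser or finer level.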
Affiliation(s)
- David M Kristensen
- Department of Molecular and Human Genetics, Biophysics, Baylor College of Medicine, Houston, TX 77030, USA.
|
46
|
Ivanisenko VA, Afonnikov DA, Kolchanov NA. Web-based computational tools for the prediction and analysis of post-translational modifications of proteins. Methods Mol Biol 2008; 446:363-384. [PMID: 18373270 DOI: 10.1007/978-1-60327-084-7_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The number of Web-based resources on post-translational modification sites (PTMSs) in proteins is growing at an accelerating rate. The paper presents a set of computational protocols describing how to work with Internet resources when dealing with PTMSs. The protocols are intended for querying PTMS-related databases, searching for PTMSs in protein sequences and structures, and calculating the pI and molecular mass of PTM isoforms. Thus, modern bioinformatics prediction tools make it feasible to express protein modification in broader quantitative terms.
Affiliation(s)
- Vladimir A Ivanisenko
- Institute of Cytology and Genetics SB RAS, Novosibirsk State University, Novosibirsk, Russia
|
47
|
Lu CH, Lin YS, Chen YC, Yu CS, Chang SY, Hwang JK. The fragment transformation method to detect the protein structural motifs. Proteins 2006; 63:636-43. [PMID: 16470805 DOI: 10.1002/prot.20904] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
Identifying functional structural motifs in protein structures of unknown function has become increasingly important in recent years due to the progress of the structural genomics initiatives. Although certain structural patterns such as the Asp-His-Ser catalytic triad are easy to detect because of their conserved residues and stringently constrained geometry, it is usually more challenging to detect more general structural motifs such as the ββα-metal binding motif, which has a much more variable conformation and sequence. At present, the identification of these motifs usually relies on manual procedures based on different structure and sequence analysis tools. In this study, we develop a structural alignment algorithm combining both structural and sequence information to identify local structural motifs. We applied our method to the following examples: the ββα-metal binding motif and the treble clef motif. The ββα-metal binding motif plays an important role in nonspecific DNA interactions and cleavage in host defense and apoptosis. The treble clef motif is a zinc-binding motif adaptable to diverse functions such as the binding of nucleic acid and hydrolysis of phosphodiester bonds. Our results are encouraging, indicating that we can effectively identify these structural motifs in an automatic fashion. Our method may provide a useful means for automatic functional annotation through detecting structural motifs associated with particular functions.
Affiliation(s)
- Chih-Hao Lu
- Institute of Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan, Republic of China
|
48
|
Kristensen DM, Chen BY, Fofanov VY, Ward RM, Lisewski AM, Kimmel M, Kavraki LE, Lichtarge O. Recurrent use of evolutionary importance for functional annotation of proteins based on local structural similarity. Protein Sci 2006; 15:1530-6. [PMID: 16672239 PMCID: PMC2242527 DOI: 10.1110/ps.062152706] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 10/24/2022]
Abstract
The annotation of protein function has not kept pace with the exponential growth of raw sequence and structure data. An emerging solution to this problem is to identify 3D motifs or templates in protein structures that are necessary and sufficient determinants of function. Here, we demonstrate the recurrent use of evolutionary trace information to construct such 3D templates for enzymes, search for them in other structures, and distinguish true from spurious matches. Serine protease templates built from evolutionarily important residues distinguish between proteases and other proteins nearly as well as the classic Ser-His-Asp catalytic triad. In 53 enzymes spanning 33 distinct functions, an automated pipeline identifies functionally related proteins with an average positive predictive power of 62%, including correct matches to proteins with the same function but with low sequence identity (the average identity for some templates is only 17%). Although these template building, searching, and match classification strategies are not yet optimized, their sequential implementation demonstrates a functional annotation pipeline which does not require experimental information, but only local molecular mimicry among a small number of evolutionarily important residues.
Affiliation(s)
- David M Kristensen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
|
49
|
Fomin ES, Ivanisenko VA. Corroboration of the functional role of the additional zinc binding site in the G245C mutant form of the p53 protein. Biophysics (Nagoya-shi) 2006. [DOI: 10.1134/s0006350906070074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
|
50
|
Ausiello G, Via A, Helmer-Citterich M. Query3d: a new method for high-throughput analysis of functional residues in protein structures. BMC Bioinformatics 2005; 6 Suppl 4:S5. [PMID: 16351754 PMCID: PMC1866380 DOI: 10.1186/1471-2105-6-s4-s5] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022] Open
Abstract
Background The identification of local similarities between two protein structures can provide clues of a common function. Many different methods exist for searching for similar subsets of residues in proteins of known structure. However, the lack of functional and structural information on single residues, together with the low level of integration of this information in comparison methods, is a limitation that prevents these methods from being fully exploited in high-throughput analyses. Results Here we describe Query3d, a program that is both a structural DBMS (Database Management System) and a local comparison method. The method conserves a copy of all the residues of the Protein Data Bank annotated with a variety of functional and structural information. New annotations can be easily added from a variety of methods and known databases. The algorithm makes it possible to create complex queries based on the residues' function and then to compare only subsets of the selected residues. Functional information is also essential to speed up the comparison and the analysis of the results. Conclusion With Query3d, users can easily obtain statistics on how many and which residues share certain properties in all proteins of known structure. At the same time, the method also finds their structural neighbours in the whole PDB. Programs and data can be accessed through the PdbFun web interface.
Affiliation(s)
- Gabriele Ausiello
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 00133 Rome, Italy
- Allegra Via
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 00133 Rome, Italy
- Manuela Helmer-Citterich
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 00133 Rome, Italy
|