Reference Citation Analysis

For: Lopez G, Valencia A, Tress M. FireDB--a database of functionally important residues from proteins of known structure. Nucleic Acids Res 2006;35:D219-23. [PMID: 17132832 PMCID: PMC1716728 DOI: 10.1093/nar/gkl897] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Indexed: 11/26/2022] Open



Cited by Other Article(s)

Santana CA, Izidoro SC, de Melo-Minardi RC, Tyzack JD, Ribeiro AJM, Pires DEV, Thornton JM, de A Silveira S. GRaSP-web: a machine learning strategy to predict binding sites based on residue neighborhood graphs. Nucleic Acids Res 2022;50:W392-W397. [PMID: 35524575 PMCID: PMC9252730 DOI: 10.1093/nar/gkac323] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/21/2022] [Revised: 04/14/2022] [Accepted: 04/22/2022] [Indexed: 11/14/2022] Open

Brackenridge DA, McGuffin LJ. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods with a Focus on FunFOLD3. Methods Mol Biol 2021;2365:43-58. [PMID: 34432238 DOI: 10.1007/978-1-0716-1665-9_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]

Santana CA, Silveira SDA, Moraes JPA, Izidoro SC, de Melo-Minardi RC, Ribeiro AJM, Tyzack JD, Borkakoti N, Thornton JM. GRaSP: a graph-based residue neighborhood strategy to predict binding sites. Bioinformatics 2020;36:i726-i734. [DOI: 10.1093/bioinformatics/btaa805] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Accepted: 09/08/2020] [Indexed: 01/22/2023] Open

Abstract

Motivation

The discovery of protein–ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein–ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites was proposed as they can be scalable, fast and present low cost.

Results

We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made compatible or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable as it took 10–20 s on average to predict the binding site for a protein complex whereas the state-of-the-art residue-centric method takes 2–5 h on average.

Availability and implementation

The source code and datasets are available at https://github.com/charles-abreu/GRaSP.

Supplementary information

Supplementary data are available at Bioinformatics online.

CavBench: A benchmark for protein cavity detection methods. PLoS One 2019;14:e0223596. [PMID: 31609980 PMCID: PMC6791542 DOI: 10.1371/journal.pone.0223596] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/26/2019] [Accepted: 09/24/2019] [Indexed: 11/19/2022] Open

Garrido-Martín D, Pazos F. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues. BMC Bioinformatics 2018;19:67. [PMID: 29482506 PMCID: PMC5827975 DOI: 10.1186/s12859-018-2084-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/07/2017] [Accepted: 02/21/2018] [Indexed: 11/10/2022] Open

Ding Y, Tang J, Guo F. Identification of Protein-Ligand Binding Sites by Sequence Information and Ensemble Classifier. J Chem Inf Model 2017;57:3149-3161. [PMID: 29125297 DOI: 10.1021/acs.jcim.7b00307] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Indexed: 11/29/2022]

Pons T, Vazquez M, Matey-Hernandez ML, Brunak S, Valencia A, Izarzugaza JM. KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily. BMC Genomics 2016;17 Suppl 2:396. [PMID: 27357839 PMCID: PMC4928150 DOI: 10.1186/s12864-016-2723-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 01/28/2023] Open

Abstract

Background

The association between aberrant signal processing by protein kinases and human diseases such as cancer was established a long time ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanistic complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease.

Results

KinMutRF is a novel random-forest method to automatically identify pathogenic variants in human kinases. Twenty-six decision trees implemented as a random forest ponder a battery of features that characterize the variants: a) at the gene level, including membership to a Kinbase group and Gene Ontology terms; b) at the PFAM domain level; and c) at the residue level, the types of amino acids involved, changes in biochemical properties, functional annotations from UniProt, Phospho.ELM and FireDB. KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic. All unclassified variants were excluded from the training set. Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in the training and testing, Kin-Driver (643 variants) and Pon-BTK (1495 variants). Moreover, we provide predictions for the 848 protein kinase variants in UniProt that remained unclassified.

A public implementation of KinMutRF, including documentation and examples, is available online (http://kinmut2.bioinfo.cnio.es). The source code for local installation is released under a GPL version 3 license, and can be downloaded from https://github.com/Rbbt-Workflows/KinMut2.

Conclusions

KinMutRF is capable of classifying kinase variation with good performance. Predictions by KinMutRF compare favorably in a benchmark with other state-of-the-art methods (i.e. SIFT, Polyphen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM, and VEST). Kinase-specific features rank as the most elucidatory in terms of information gain and are likely the source of the improvement in prediction performance. This advocates for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2723-1) contains supplementary material, which is available to authorized users.
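The F-score quoted in the KinMutRF abstract above can be cross-checked against the reported precision and recall. The sketch below assumes the abstract's F-score is the standard F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative and not taken from the KinMut2 code.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1).

    Assumption: the abstract's "F-score" is F1; this is not confirmed
    by the abstract itself.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (Prec: 0.82, Rec: 0.75).
reported_precision = 0.82
reported_recall = 0.75

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 2))  # → 0.78, consistent with the reported F-score
```

Under that assumption, the three reported values are mutually consistent to two decimal places.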


Impact of germline and somatic missense variations on drug binding sites. Pharmacogenomics J 2016;17:128-136. [PMID: 26810135 PMCID: PMC5380835 DOI: 10.1038/tpj.2015.97] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/17/2015] [Revised: 11/02/2015] [Accepted: 11/13/2015] [Indexed: 11/10/2022]

Abstract

Advancements in next-generation sequencing (NGS) technologies are generating a vast amount of data. This exacerbates the current challenge of translating NGS data into actionable clinical interpretations. We have comprehensively combined germline and somatic nonsynonymous single-nucleotide variations (nsSNVs) that affect drug binding sites in order to investigate their prevalence. The integrated data thus generated in conjunction with exome or whole-genome sequencing can be used to identify patients who may not respond to a specific drug because of alterations in drug binding efficacy due to nsSNVs in the target protein's gene. To identify the nsSNVs that may affect drug binding, protein–drug complex structures were retrieved from Protein Data Bank (PDB) followed by identification of amino acids in the protein–drug binding sites using an occluded surface method. Then, the germline and somatic mutations were mapped to these amino acids to identify which of these alter protein–drug binding sites. Using this method we identified 12 993 amino acid–drug binding sites across 253 unique proteins bound to 235 unique drugs. The integration of amino acid–drug binding sites data with both germline and somatic nsSNVs data sets revealed 3133 nsSNVs affecting amino acid–drug binding sites. In addition, a comprehensive drug target discovery was conducted based on protein structure similarity and conservation of amino acid–drug binding sites. Using this method, 81 paralogs were identified that could serve as alternative drug targets. In addition, non-human mammalian proteins bound to drugs were used to identify 142 homologs in humans that can potentially bind to drugs. In the current protein–drug pairs that contain somatic mutations within their binding site, we identified 85 proteins with significant differential gene expression changes associated with specific cancer types. 
Information on protein–drug binding, predicted drug target proteins, and the prevalence of both somatic and germline nsSNVs that disrupt these binding sites can provide valuable knowledge for personalized medicine treatment. A web portal is available where nsSNVs from an individual patient can be checked by scanning against DrugVar to determine whether any of the SNVs affect the binding of any drug in the database.


Medvedeva IV, Demenkov PS, Ivanisenko VA. Computer analysis of protein functional sites projection on exon structure of genes in Metazoa. BMC Genomics 2015;16 Suppl 13:S2. [PMID: 26693737 PMCID: PMC4686782 DOI: 10.1186/1471-2164-16-s13-s2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications for designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of functional sites at the protein level and at the level of the exon-intron structure of the coding gene remains an open problem.

RESULTS

One approach to this analysis is the projection of amino acid residue positions of the functional sites along with the exon boundaries to the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. We observed a tendency for the exons that code functional sites to cluster, and such clusters could be considered as the unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to exon shuffling.

CONCLUSIONS

These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity.


Yu DJ, Hu J, Li QM, Tang ZM, Yang JY, Shen HB. Constructing query-driven dynamic machine learning model with application to protein-ligand binding sites prediction. IEEE Trans Nanobioscience 2015;14:45-58. [PMID: 25730499 DOI: 10.1109/tnb.2015.2394328] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/11/2022]

Abstract

We are facing an era in which annotated biological data are rapidly and continuously generated. How to effectively incorporate new annotated data into the learning step is crucial for enhancing the performance of a bioinformatics prediction model. Although machine-learning-based methods have been extensively used for dealing with various biological problems, existing approaches usually train static prediction models based on fixed training datasets. Static approaches have several disadvantages, such as low scalability and impracticality when the training dataset is huge. In view of this, we propose a dynamic learning framework for constructing query-driven prediction models. The key difference between the proposed framework and the existing approaches is that the training set for the machine learning algorithm of the proposed framework is dynamically generated according to the query input, as opposed to training a general model regardless of queries in traditional static methods. Accordingly, a query-driven predictor based on the smaller set of data specifically selected from the entire annotated base dataset will be applied on the query. The new way of constructing the dynamic model enables us to update the annotated base dataset flexibly, and using the most relevant core subset as the training set gives the constructed model better generalization ability on the query, showing a "part could be better than all" phenomenon. According to the new framework, we have implemented a dynamic protein-ligand binding sites predictor called OSML (On-site model for ligand binding sites prediction). Computer experiments on 10 different ligand types of three hierarchically organized levels show that OSML outperforms most existing predictors. The results indicate that the current dynamic framework is a promising future direction for bridging the gap between the rapidly accumulated annotated biological data and the effective machine-learning-based predictors.
The OSML web server and datasets are freely available at: http://www.csbio.sjtu.edu.cn/bioinf/OSML/ for academic use.
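The query-driven idea described in this abstract (build the training set from the query, then train a small model on that subset) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from OSML: the Euclidean similarity measure, the nearest-centroid classifier, the toy feature vectors, and all function names.

```python
import math
from collections import defaultdict

# Hypothetical sketch of query-driven training; NOT the OSML implementation.
# A sample is a (feature_vector, label) pair.


def select_training_subset(base, query, k):
    """Return the k base samples closest to the query (Euclidean distance)."""
    return sorted(base, key=lambda sample: math.dist(sample[0], query))[:k]


def fit_centroids(samples):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, query):
    """Assign the label whose centroid lies closest to the query."""
    return min(centroids, key=lambda label: math.dist(centroids[label], query))


def query_driven_predict(base, query, k=3):
    """Train on a query-specific subset of the base data, then classify."""
    subset = select_training_subset(base, query, k)
    return predict(fit_centroids(subset), query)


# Toy annotated base dataset (made-up two-dimensional features).
base_dataset = [
    ((0.0, 0.0), "non-binding"),
    ((0.1, 0.2), "non-binding"),
    ((1.0, 1.0), "binding"),
    ((0.9, 1.1), "binding"),
    ((5.0, 5.0), "non-binding"),
]

print(query_driven_predict(base_dataset, (0.95, 1.0), k=3))
```

Because the model is refit per query, new annotated samples can simply be appended to `base_dataset` without retraining a global model, which is the flexibility the abstract emphasizes.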


Vazquez M, Pons T, Brunak S, Valencia A, Izarzugaza JMG. wKinMut-2: Identification and Interpretation of Pathogenic Variants in Human Protein Kinases. Hum Mutat 2015;37:36-42. [PMID: 26443060 DOI: 10.1002/humu.22914] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/02/2015] [Accepted: 09/22/2015] [Indexed: 12/31/2022]

A fast topological analysis algorithm for large-scale similarity evaluations of ligands and binding pockets. J Cheminform 2015;7:42. [PMID: 26561508 PMCID: PMC4631714 DOI: 10.1186/s13321-015-0091-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/02/2015] [Accepted: 07/22/2015] [Indexed: 11/10/2022] Open

Giri Rao VVH, Gosavi S. In the multi-domain protein adenylate kinase, domain insertion facilitates cooperative folding while accommodating function at domain interfaces. PLoS Comput Biol 2014;10:e1003938. [PMID: 25393408 PMCID: PMC4230728 DOI: 10.1371/journal.pcbi.1003938] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 05/12/2014] [Accepted: 09/25/2014] [Indexed: 12/30/2022] Open

Abstract

Having multiple domains in proteins can lead to partial folding and increased aggregation. Folding cooperativity, the all or nothing folding of a protein, can reduce this aggregation propensity. In agreement with bulk experiments, a coarse-grained structure-based model of the three-domain protein, E. coli Adenylate kinase (AKE), folds cooperatively. Domain interfaces have previously been implicated in the cooperative folding of multi-domain proteins. To understand their role in AKE folding, we computationally create mutants with deleted inter-domain interfaces and simulate their folding. We find that inter-domain interfaces play a minor role in the folding cooperativity of AKE. On further analysis, we find that unlike other multi-domain proteins whose folding has been studied, the domains of AKE are not singly-linked. Two of its domains have two linkers to the third one, i.e., they are inserted into the third one. We use circular permutation to modify AKE chain-connectivity and convert inserted-domains into singly-linked domains. We find that domain insertion in AKE achieves the following: (1) It facilitates folding cooperativity even when domains have different stabilities. Insertion constrains the N- and C-termini of inserted domains and stabilizes their folded states. Therefore, domains that perform conformational transitions can be smaller with fewer stabilizing interactions. (2) Inter-domain interactions are not needed to promote folding cooperativity and can be tuned for function. In AKE, these interactions help promote conformational dynamics limited catalysis. Finally, using structural bioinformatics, we suggest that domain insertion may also facilitate the cooperative folding of other multi-domain proteins.

Most individual protein domains fold in an all or nothing fashion. This cooperative folding is important because it reduces the existence of partially folded proteins which can stick to each other and create disease causing aggregates. However, numerous proteins have multiple domains, independent units of folding, stability and/or function. Several such proteins also fold cooperatively. It is thought that strong interactions between individual domains allow the folding to propagate from a nucleating domain to neighbouring ones and this enables cooperative folding in multi-domain proteins. Here, we computationally study the folding of the three-domain protein AKE and find instead that the topology of the protein, wherein the two less stable domains are inserted into the more stable one, promotes folding cooperativity. When the more stable domain is folded, the ends of the inserted domains are constrained and this allows them to fold easily. In such a protein topology, strong inter-domain interactions are not needed to promote folding cooperativity. Interface amino acids which would have been involved in ensuring that the domains fit together correctly can now be tuned for binding or catalysis or conformational transitions. Thus, inserted domains may be present in multi-domain proteins to promote both function and folding.


Gallo Cassarino T, Bordoli L, Schwede T. Assessment of ligand binding site predictions in CASP10. Proteins 2014;82 Suppl 2:154-63. [PMID: 24339001 DOI: 10.1002/prot.24495] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 06/12/2013] [Revised: 12/04/2013] [Accepted: 12/09/2013] [Indexed: 12/27/2022]

Izarzugaza JMG, Vazquez M, del Pozo A, Valencia A. wKinMut: an integrated tool for the analysis and interpretation of mutations in human protein kinases. BMC Bioinformatics 2013;14:345. [PMID: 24289158 PMCID: PMC3879071 DOI: 10.1186/1471-2105-14-345] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/07/2012] [Accepted: 05/30/2013] [Indexed: 11/13/2022] Open

Abstract

Background

Protein kinases are involved in relevant physiological functions and a broad number of mutations in this superfamily have been reported in the literature to affect protein function and stability. Unfortunately, the exploration of the consequences on the phenotypes of each individual mutation remains a considerable challenge.

Results

The wKinMut web-server offers direct prediction of the potential pathogenicity of the mutations from a number of methods, including our recently developed prediction method based on the combination of information from a range of diverse sources, including physicochemical properties and functional annotations from FireDB and Swissprot and kinase-specific characteristics such as the membership to specific kinase groups, the annotation with disease-associated GO terms or the occurrence of the mutation in PFAM domains, and the relevance of the residues in determining kinase subfamily specificity from S3Det. This predictor yields interesting results that compare favourably with other methods in the field when applied to protein kinases.

Together with the predictions, wKinMut offers a number of integrated services for the analysis of mutations. These include: the classification of the kinase, information about associations of the kinase with other proteins extracted from iHop, the mapping of the mutations onto PDB structures, pathogenicity records from a number of databases and the classification of mutations in large-scale cancer studies. Importantly, wKinMut is connected with the SNP2L system that extracts mentions of mutations directly from the literature, and therefore increases the possibilities of finding interesting functional information associated to the studied mutations.

Conclusions

wKinMut facilitates the exploration of the information available about individual mutations by integrating prediction approaches with the automatic extraction of information from the literature (text mining) and several state-of-the-art databases.

wKinMut has been used during the last year for the analysis of the consequences of mutations in the context of a number of cancer genome projects, including the recent analysis of Chronic Lymphocytic Leukemia cases and is publicly available at http://wkinmut.bioinfo.cnio.es.


Khazanov NA, Carlson HA. Exploring the composition of protein-ligand binding sites on a large scale. PLoS Comput Biol 2013;9:e1003321. [PMID: 24277997 PMCID: PMC3836696 DOI: 10.1371/journal.pcbi.1003321] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Received: 03/06/2013] [Accepted: 09/23/2013] [Indexed: 12/21/2022] Open

Abstract

The residue composition of a ligand binding site determines the interactions available for diffusion-mediated ligand binding, and understanding general composition of these sites is of great importance if we are to gain insight into the functional diversity of the proteome. Many structure-based drug design methods utilize such heuristic information for improving prediction or characterization of ligand-binding sites in proteins of unknown function. The Binding MOAD database is one of the largest curated sets of protein-ligand complexes, and provides a source of diverse, high-quality data for establishing general trends of residue composition from currently available protein structures. We present an analysis of 3,295 non-redundant proteins with 9,114 non-redundant binding sites to identify residues over-represented in binding regions versus the rest of the protein surface. The Binding MOAD database delineates biologically-relevant “valid” ligands from “invalid” small-molecule ligands bound to the protein. Invalids are present in the crystallization medium and serve no known biological function. Contacts are found to differ between these classes of ligands, indicating that residue composition of biologically relevant binding sites is distinct not only from the rest of the protein surface, but also from surface regions capable of opportunistic binding of non-functional small molecules. To confirm these trends, we perform a rigorous analysis of the variation of residue propensity with respect to the size of the dataset and the content bias inherent in structure sets obtained from a large protein structure database. The optimal size of the dataset for establishing general trends of residue propensities, as well as strategies for assessing the significance of such trends, are suggested for future studies of binding-site composition.

Describing the general structure of protein binding sites is fundamentally important for guiding drug design and better understanding structure-function relationships. Here, we analyze small molecules bound to proteins within our large database, Binding MOAD (Mother of All Databases, pronounced like “mode” as a pun referring to ligand-binding modes). We focus on different contacts across the residues in the binding sites, and we normalize the data relative to the protein's entire surface. A key feature of this study is the use of a “control” where we compare real, functional binding sites to the random contacts seen for crystallographic additives against the protein surface. Controls are required in experimental biology, but they are ill-defined in many computational approaches. This allows us to describe how true binding sites are unique on the protein surface and distinct from random patches that attract common, small molecules.


Maietta P, Lopez G, Carro A, Pingilley BJ, Leon LG, Valencia A, Tress ML. FireDB: a compendium of biological and pharmacologically relevant ligands. Nucleic Acids Res 2013;42:D267-72. [PMID: 24243844 PMCID: PMC3965074 DOI: 10.1093/nar/gkt1127] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022] Open

Yu DJ, Hu J, Yang J, Shen HB, Tang J, Yang JY. Designing template-free predictor for targeting protein-ligand binding sites with classifier ensemble and spatial clustering. IEEE/ACM Trans Comput Biol Bioinform 2013;10:994-1008. [PMID: 24334392 DOI: 10.1109/tcbb.2013.104] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.2] [Indexed: 06/03/2023]

Gu X, Zou Y, Su Z, Huang W, Zhou Z, Arendsee Z, Zeng Y. An update of DIVERGE software for functional divergence analysis of protein family. Mol Biol Evol 2013;30:1713-9. [PMID: 23589455 DOI: 10.1093/molbev/mst069] [Citation(s) in RCA: 146] [Impact Index Per Article: 12.2] [Indexed: 12/19/2022] Open

Rodriguez JM, Maietta P, Ezkurdia I, Pietrelli A, Wesselink JJ, Lopez G, Valencia A, Tress ML. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res 2012;41:D110-7. [PMID: 23161672 PMCID: PMC3531113 DOI: 10.1093/nar/gks1058] [Citation(s) in RCA: 165] [Impact Index Per Article: 12.7] [Indexed: 12/03/2022] Open

Yang J, Roy A, Zhang Y. BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res 2012;41:D1096-103. [PMID: 23087378 PMCID: PMC3531193 DOI: 10.1093/nar/gks966] [Citation(s) in RCA: 487] [Impact Index Per Article: 37.5] [Indexed: 11/23/2022] Open

Izarzugaza JMG, Krallinger M, Valencia A. Interpretation of the consequences of mutations in protein kinases: combined use of bioinformatics and text mining. Front Physiol 2012;3:323. [PMID: 23055974 PMCID: PMC3449330 DOI: 10.3389/fphys.2012.00323] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 05/23/2012] [Accepted: 07/23/2012] [Indexed: 11/30/2022] Open

Valencia A, Hidalgo M. Getting personalized cancer genome analysis into the clinic: the challenges in bioinformatics. Genome Med 2012;4:61. [PMID: 22839973 PMCID: PMC3580417 DOI: 10.1186/gm362] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022] Open

Izarzugaza JMG, del Pozo A, Vazquez M, Valencia A. Prioritization of pathogenic mutations in the protein kinase superfamily. BMC Genomics 2012;13 Suppl 4:S3. [PMID: 22759651 PMCID: PMC3303724 DOI: 10.1186/1471-2164-13-s4-s3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 01/23/2023] Open

Abstract

BACKGROUND

Most of the many mutations described in human protein kinases are tolerated without significant disruption of the corresponding structures or molecular functions, while some of them have been associated with a variety of human diseases, including cancer. In the last decade, a plethora of computational methods to predict the effect of missense single-nucleotide variants (SNVs) have been developed. Still, current high-throughput sequencing efforts and the concomitant need for massive interpretation of protein sequence variants will demand more efficient and/or accurate computational methods in the forthcoming years.

RESULTS

We present KinMut, a support vector machine (SVM) approach to identify pathogenic mutations in the protein kinase superfamily. KinMut relies on a combination of sequence-derived features that describe mutations at different levels: (1) Gene level: membership of a specific group in Kinbase and the annotation with GO terms; (2) Domain level: annotated PFAM domains; and (3) Residue level: physicochemical features of amino acids, specificity determining positions, and functional annotations from SwissProt and FireDB. The system has been trained with the set of 3492 human kinase mutations in UniProt for which experimental validation of their pathogenic or neutral character exists. In addition, we discuss the relative importance of these independent properties and their combination for the development of a kinase-specific predictor. Finally, we compare KinMut with other state-of-the-art prediction methods.

CONCLUSIONS

Family-specific features appear among the most discriminative information sources, allowing us to produce accurate results in a reliable and very simple way with minimal supervision. Our study aims to broaden the knowledge of the mechanisms by which mutations in the human kinome contribute to disease, with a particular focus on cancer. The classifier as well as further documentation is available at http://kinmut.bioinfo.cnio.es/.

Lahti JL, Tang GW, Capriotti E, Liu T, Altman RB. Bioinformatics and variability in drug response: a protein structural perspective. J R Soc Interface 2012;9:1409-37. [PMID: 22552919 DOI: 10.1098/rsif.2011.0843] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Indexed: 12/21/2022] Open

Huang LT, Gromiha MM. Real value prediction of protein folding rate change upon point mutation. J Comput Aided Mol Des 2012;26:339-47. [DOI: 10.1007/s10822-012-9560-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/21/2011] [Accepted: 03/02/2012] [Indexed: 10/28/2022]

Drew K, Winters P, Butterfoss GL, Berstis V, Uplinger K, Armstrong J, Riffle M, Schweighofer E, Bovermann B, Goodlett DR, Davis TN, Shasha D, Malmström L, Bonneau R. The Proteome Folding Project: proteome-scale prediction of structure and function. Genome Res 2011;21:1981-94. [PMID: 21824995 DOI: 10.1101/gr.121475.111] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]

Izarzugaza JMG, Hopcroft LEM, Baresic A, Orengo CA, Martin ACR, Valencia A. Characterization of pathogenic germline mutations in human protein kinases. BMC Bioinformatics 2011;12 Suppl 4:S1. [PMID: 21992016 PMCID: PMC3194193 DOI: 10.1186/1471-2105-12-s4-s1] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/16/2023] Open

Abstract

BACKGROUND

Protein Kinases are a superfamily of proteins involved in crucial cellular processes such as cell cycle regulation and signal transduction. Accordingly, they play an important role in cancer biology. To contribute to the study of the relation between kinases and disease we compared pathogenic mutations to neutral mutations as an extension to our previous analysis of cancer somatic mutations. First, we analyzed native and mutant proteins in terms of amino acid composition. Secondly, mutations were characterized according to their potential structural effects and finally, we assessed the location of the different classes of polymorphisms with respect to kinase-relevant positions in terms of subfamily specificity, conservation, accessibility and functional sites.

RESULTS

Pathogenic Protein Kinase mutations perturb essential aspects of protein function, including disruption of substrate binding and/or effector recognition at family-specific positions. Interestingly, these mutations in Protein Kinases display a tendency to avoid structurally relevant positions, which represents a significant difference from the average distribution of pathogenic mutations in other protein families.

CONCLUSIONS

Disease-associated mutations display clear differences with respect to neutral mutations: several amino acids are specific to each mutation type, different structural properties characterize each class, and the distribution of pathogenic mutations within the consensus structure of the Protein Kinase domain is substantially different from that of non-pathogenic mutations. This preferential distribution confirms previous observations about the functional and structural distribution of the controversial cancer driver and passenger somatic mutations and their use as a proxy for the study of the involvement of somatic mutations in cancer development.

Lopez G, Maietta P, Rodriguez JM, Valencia A, Tress ML. firestar--advances in the prediction of functionally important residues. Nucleic Acids Res 2011;39:W235-41. [PMID: 21672959 PMCID: PMC3125799 DOI: 10.1093/nar/gkr437] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 11/20/2022] Open

Wass MN, David A, Sternberg MJE. Challenges for the prediction of macromolecular interactions. Curr Opin Struct Biol 2011;21:382-90. [DOI: 10.1016/j.sbi.2011.03.013] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Received: 12/20/2010] [Revised: 03/04/2011] [Accepted: 03/24/2011] [Indexed: 12/14/2022]

Roche DB, Tetchner SJ, McGuffin LJ. FunFOLD: an improved automated method for the prediction of ligand binding residues using 3D models of proteins. BMC Bioinformatics 2011;12:160. [PMID: 21575183 PMCID: PMC3123233 DOI: 10.1186/1471-2105-12-160] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Received: 02/02/2011] [Accepted: 05/16/2011] [Indexed: 11/30/2022] Open

Abstract

Background

The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those which used structural superpositions of 3D models and related templates with bound ligands in order to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustments of parameters, such as the distance thresholds used for each target, have often been required to prevent over-prediction. Here we describe a novel method, FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software provided can easily be integrated into existing fold recognition servers, requiring only a 3D model and a list of templates as inputs. A simple web interface is also provided, allowing access to non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9.

Results

The FunFOLD method shows a significant improvement over the best available servers and is shown to be competitive with the top manual prediction groups that were tested at CASP8. The FunFOLD method is also competitive with both the top server and manual methods tested at CASP9. When tested using common subsets of targets, the predictions from FunFOLD are shown to achieve significantly higher mean Matthews Correlation Coefficient (MCC) scores and Binding-site Distance Test (BDT) scores than all server methods that were tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested.

Conclusions

The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new fully automated approach for structure based function prediction using 3D models of proteins.

Yahalom R, Reshef D, Wiener A, Frankel S, Kalisman N, Lerner B, Keasar C. Structure-based identification of catalytic residues. Proteins 2011;79:1952-63. [PMID: 21491495 DOI: 10.1002/prot.23020] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/05/2010] [Revised: 01/14/2011] [Accepted: 01/28/2011] [Indexed: 11/10/2022]

Roche DB, Buenavista MT, Tetchner SJ, McGuffin LJ. The IntFOLD server: an integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction. Nucleic Acids Res 2011;39:W171-6. [PMID: 21459847 PMCID: PMC3125722 DOI: 10.1093/nar/gkr184] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Indexed: 12/02/2022] Open

Huang LT, Gromiha MM. First insight into the prediction of protein folding rate change upon point mutation. Bioinformatics 2010;26:2121-7. [DOI: 10.1093/bioinformatics/btq350] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open

Campagna-Slater V, Arrowsmith AG, Zhao Y, Schapira M. Pharmacophore screening of the protein data bank for specific binding site chemistry. J Chem Inf Model 2010;50:358-67. [PMID: 20112952 DOI: 10.1021/ci900427b] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]

Horan K, Shelton CR, Girke T. Predicting conserved protein motifs with Sub-HMMs. BMC Bioinformatics 2010;11:205. [PMID: 20420695 PMCID: PMC2879284 DOI: 10.1186/1471-2105-11-205] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/17/2009] [Accepted: 04/26/2010] [Indexed: 11/16/2022] Open

Hinz U. From protein sequences to 3D-structures and beyond: the example of the UniProt knowledgebase. Cell Mol Life Sci 2010;67:1049-64. [PMID: 20043185 PMCID: PMC2835715 DOI: 10.1007/s00018-009-0229-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 08/14/2009] [Revised: 12/01/2009] [Accepted: 12/07/2009] [Indexed: 11/12/2022]

Tendulkar AV, Krallinger M, de la Torre V, López G, Wangikar PP, Valencia A. FragKB: structural and literature annotation resource of conserved peptide fragments and residues. PLoS One 2010;5:e9679. [PMID: 20305778 PMCID: PMC2841175 DOI: 10.1371/journal.pone.0009679] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/01/2009] [Accepted: 02/12/2010] [Indexed: 01/21/2023] Open

Izarzugaza JMG, Redfern OC, Orengo CA, Valencia A. Cancer-associated mutations are preferentially distributed in protein kinase functional sites. Proteins 2010;77:892-903. [PMID: 19626714 DOI: 10.1002/prot.22512] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 02/01/2023]

Rausell A, Juan D, Pazos F, Valencia A. Protein interactions and ligand binding: from protein subfamilies to functional specificity. Proc Natl Acad Sci U S A 2010;107:1995-2000. [PMID: 20133844 DOI: 10.1073/pnas.0908044107] [Citation(s) in RCA: 108] [Impact Index Per Article: 7.2] [Indexed: 11/18/2022] Open

López G, Ezkurdia I, Tress ML. Assessment of ligand binding residue predictions in CASP8. Proteins 2010;77 Suppl 9:138-46. [PMID: 19714771 DOI: 10.1002/prot.22557] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 11/07/2022]

Vanhee P, Reumers J, Stricher F, Baeten L, Serrano L, Schymkowitz J, Rousseau F. PepX: a structural database of non-redundant protein-peptide complexes. Nucleic Acids Res 2009;38:D545-51. [PMID: 19880386 PMCID: PMC2808939 DOI: 10.1093/nar/gkp893] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.7] [Indexed: 02/02/2023] Open

Oh M, Joo K, Lee J. Protein-binding site prediction based on three-dimensional protein modeling. Proteins 2009;77 Suppl 9:152-6. [DOI: 10.1002/prot.22572] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022]

Reeves GA, Talavera D, Thornton JM. Genome and proteome annotation: organization, interpretation and integration. J R Soc Interface 2009;6:129-47. [PMID: 19019817 DOI: 10.1098/rsif.2008.0341] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022] Open

Dessailly BH, Lensink MF, Orengo CA, Wodak SJ. LigASite--a database of biologically relevant binding sites in proteins with known apo-structures. Nucleic Acids Res 2007;36:D667-73. [PMID: 17933762 PMCID: PMC2238865 DOI: 10.1093/nar/gkm839] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.7] [Indexed: 11/13/2022] Open

López G, Valencia A, Tress ML. firestar--prediction of functionally important residues using structural templates and alignment reliability. Nucleic Acids Res 2007;35:W573-7. [PMID: 17584799 PMCID: PMC1933227 DOI: 10.1093/nar/gkm297] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.9] [Indexed: 11/28/2022] Open

López G, Rojas A, Tress M, Valencia A. Assessment of predictions submitted for the CASP7 function prediction category. Proteins 2007;69 Suppl 8:165-74. [PMID: 17654548 DOI: 10.1002/prot.21651] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Indexed: 11/05/2022]