Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	La D, Sutch B, Livesay DR. Predicting protein functional sites with phylogenetic motifs. Proteins 2006;58:309-20. [PMID: 15573397 DOI: 10.1002/prot.20321] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]

Number

Cited by Other Article(s)

Qiao F, Binkowski TA, Broughan I, Chen W, Natarajan A, Schiltz GE, Scheidt KA, Anderson WF, Bergan R. Protein Structure Inspired Discovery of a Novel Inducer of Anoikis in Human Melanoma. Cancers (Basel) 2024;16:3177. [PMID: 39335149 PMCID: PMC11429909 DOI: 10.3390/cancers16183177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2024] [Revised: 09/11/2024] [Accepted: 09/12/2024] [Indexed: 09/30/2024] Open

Qiao F, Binknowski TA, Broughan I, Chen W, Natarajan A, Schiltz GE, Scheidt KA, Anderson WF, Bergan R. Protein Structure Inspired Drug Discovery. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.17.594634. [PMID: 38826221 PMCID: PMC11142055 DOI: 10.1101/2024.05.17.594634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2024]

Abstract

Drug discovery starts with known function, either of a compound or a protein, in-turn prompting investigations to probe 3D structure of the compound-protein interface. As protein structure determines function, we hypothesized that unique 3D structural motifs represent primary information denoting unique function that can drive discovery of novel agents. Using a physics-based protein structure analysis platform developed by us, designed to conduct computationally intensive analysis at supercomputing speeds, we probed a high-resolution protein x-ray crystallographic library developed by us. We selected 3D structural motifs whose function was not otherwise established, that offered environments supporting binding of drug-like chemicals and were present on proteins that were not established therapeutic targets. For each of eight potential binding pockets on six different proteins we accessed a 60 million compound library and used our analysis platform to evaluate binding. Using eight-day colony formation assays acquired compounds were screened for efficacy against human breast, prostate, colon and lung cancer cells and toxicity against human bone marrow stem cells. Compounds selectively inhibiting cancer growth segregated to two pockets on separate proteins. The compound, Dxr2-017, exhibited selective activity against human melanoma cells in the NCI-60 cell line screen, had an IC50 of 19 nM against human melanoma M14 cells in our eight-day assay, while over 2100-fold higher concentrations inhibited stem cells by less than 30%. We show that Dxr2-017 induces anoikis, a unique form of programmed cell death in need of targeted therapeutics. The predicted target protein for Dxr2-017 is expressed in bacteria, not in humans. This supports our strategy of focusing on unique 3D structural motifs. It is known that functionally important 3D structures are evolutionarily conserved. Here we demonstrate proof-of-concept that protein structure represents high value primary data to support discovery of novel therapeutics. This approach is widely applicable.

Collapse

Shen X, Zhang S, Long J, Chen C, Wang M, Cui Z, Chen B, Tan T. A Highly Sensitive Model Based on Graph Neural Networks for Enzyme Key Catalytic Residue Prediction. J Chem Inf Model 2023;63:4277-4290. [PMID: 37399293 DOI: 10.1021/acs.jcim.3c00273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/05/2023]

Abstract

Determining the catalytic site of enzymes is a great help for understanding the relationship between protein sequence, structure, and function, which provides the basis and targets for designing, modifying, and enhancing enzyme activity. The unique local spatial configuration bound to the substrate at the active center of the enzyme determines the catalytic ability of enzymes and plays an important role in the catalytic site prediction. As a suitable tool, the graph neural network can better understand and identify the residue sites with unique local spatial configurations due to its remarkable ability to characterize the three-dimensional structural features of proteins. Consequently, a novel model for predicting enzyme catalytic sites has been developed, which incorporates a uniquely designed adaptive edge-gated graph attention neural network (AEGAN). This model is capable of effectively handling sequential and structural characteristics of proteins at various levels, and the extracted features enable an accurate description of the local spatial configuration of the enzyme active site by sampling the local space around candidate residues and special design of amino acid physical and chemical properties. To evaluate its performance, the model was compared with existing catalytic site prediction models using different benchmark datasets and achieved the best results on each benchmark dataset. The model exhibited a sensitivity of 0.9659, accuracy of 0.9226, and area under the precision-recall curve (AUPRC) of 0.9241 on the independent test set constructed for evaluation. Furthermore, the F1-score of this model is nearly four times higher than that of the best-performing similar model in previous studies. This research can serve as a valuable tool to help researchers understand protein sequence-structure-function relationships while facilitating the characterization of novel enzymes of unknown function.

Collapse

Pazos F. Computational prediction of protein functional sites-Applications in biotechnology and biomedicine. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022;130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]

Pazos F. Prediction of Protein Sites and Physicochemical Properties Related to Functional Specificity. Bioengineering (Basel) 2021;8:bioengineering8120201. [PMID: 34940354 PMCID: PMC8698372 DOI: 10.3390/bioengineering8120201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2021] [Revised: 11/25/2021] [Accepted: 11/29/2021] [Indexed: 11/16/2022] Open

Song J, Li F, Takemoto K, Haffari G, Akutsu T, Chou KC, Webb GI. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework. J Theor Biol 2018;443:125-137. [DOI: 10.1016/j.jtbi.2018.01.023] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2017] [Revised: 01/17/2018] [Accepted: 01/18/2018] [Indexed: 10/18/2022]

Rsite2: an efficient computational method to predict the functional sites of noncoding RNAs. Sci Rep 2016;6:19016. [PMID: 26751501 PMCID: PMC4707467 DOI: 10.1038/srep19016] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2015] [Accepted: 12/02/2015] [Indexed: 01/11/2023] Open

Parente DJ, Ray JCJ, Swint-Kruse L. Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores. Proteins 2015;83:2293-306. [PMID: 26503808 DOI: 10.1002/prot.24948] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2015] [Revised: 09/21/2015] [Accepted: 10/14/2015] [Indexed: 12/21/2022]

Abstract

As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: For a given sequence alignment, alternative algorithms usually identify different, top pairwise scores. We reconciled results from five commonly-used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints-detectable by divergent algorithms--that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions.

Collapse

EXIA2: web server of accurate and rapid protein catalytic residue prediction. BIOMED RESEARCH INTERNATIONAL 2014;2014:807839. [PMID: 25295274 PMCID: PMC4177735 DOI: 10.1155/2014/807839] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/21/2014] [Revised: 05/27/2014] [Accepted: 06/11/2014] [Indexed: 11/18/2022]

Catazaro J, Caprez A, Guru A, Swanson D, Powers R. Functional evolution of PLP-dependent enzymes based on active-site structural similarities. Proteins 2014;82:2597-608. [PMID: 24920327 DOI: 10.1002/prot.24624] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2014] [Revised: 05/30/2014] [Accepted: 06/05/2014] [Indexed: 12/29/2022]

Roshan U. Multiple sequence alignment using Probcons and Probalign. Methods Mol Biol 2014;1079:147-153. [PMID: 24170400 DOI: 10.1007/978-1-62703-646-7_9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]

Parente DJ, Swint-Kruse L. Multiple co-evolutionary networks are supported by the common tertiary scaffold of the LacI/GalR proteins. PLoS One 2013;8:e84398. [PMID: 24391951 PMCID: PMC3877293 DOI: 10.1371/journal.pone.0084398] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2013] [Accepted: 11/15/2013] [Indexed: 11/18/2022] Open

Dukka BK. Structure-based Methods for Computational Protein Functional Site Prediction. Comput Struct Biotechnol J 2013;8:e201308005. [PMID: 24688745 PMCID: PMC3962076 DOI: 10.5936/csbj.201308005] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2013] [Revised: 11/07/2013] [Accepted: 11/11/2013] [Indexed: 11/22/2022] Open

On the structural context and identification of enzyme catalytic residues. BIOMED RESEARCH INTERNATIONAL 2013;2013:802945. [PMID: 23484160 PMCID: PMC3581254 DOI: 10.1155/2013/802945] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/29/2012] [Accepted: 12/28/2012] [Indexed: 11/25/2022]

Accurate prediction of protein catalytic residues by side chain orientation and residue contact density. PLoS One 2012;7:e47951. [PMID: 23110141 PMCID: PMC3480458 DOI: 10.1371/journal.pone.0047951] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2012] [Accepted: 09/18/2012] [Indexed: 11/19/2022] Open

Han L, Zhang YJ, Song J, Liu MS, Zhang Z. Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues. PLoS One 2012;7:e41370. [PMID: 22829945 PMCID: PMC3400608 DOI: 10.1371/journal.pone.0041370] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2012] [Accepted: 06/20/2012] [Indexed: 11/18/2022] Open

Abstract

Enzymes play a fundamental role in almost all biological processes and identification of catalytic residues is a crucial step for deciphering the biological functions and understanding the underlying catalytic mechanisms. In this work, we developed a novel structural feature called MEDscore to identify catalytic residues, which integrated the microenvironment (ME) and geometrical properties of amino acid residues. Firstly, we converted a residue's ME into a series of spatially neighboring residue pairs, whose likelihood of being located in a catalytic ME was deduced from a benchmark enzyme dataset. We then calculated an ME-based score, termed as MEscore, by summing up the likelihood of all residue pairs. Secondly, we defined a parameter called Dscore to measure the relative distance of a residue to the center of the protein, provided that catalytic residues are typically located in the center of the protein structure. Finally, we defined the MEDscore feature based on an effective nonlinear integration of MEscore and Dscore. When evaluated on a well-prepared benchmark dataset using five-fold cross-validation tests, MEDscore achieved a robust performance in identifying catalytic residues with an AUC1.0 of 0.889. At a ≤ 10% false positive rate control, MEDscore correctly identified approximately 70% of the catalytic residues. Remarkably, MEDscore achieved a competitive performance compared with the residue conservation score (e.g. CONscore), the most informative singular feature predominantly employed to identify catalytic residues. To the best of our knowledge, MEDscore is the first singular structural feature exhibiting such an advantage. More importantly, we found that MEDscore is complementary with CONscore and a significantly improved performance can be achieved by combining CONscore with MEDscore in a linear manner. As an implementation of this work, MEDscore has been made freely accessible at http://protein.cau.edu.cn/mepi/.

Collapse

Dou Y, Wang J, Yang J, Zhang C. L1pred: a sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier. PLoS One 2012;7:e35666. [PMID: 22558194 PMCID: PMC3338704 DOI: 10.1371/journal.pone.0035666] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2012] [Accepted: 03/19/2012] [Indexed: 12/01/2022] Open

Choi K, Kim S. Sequence-based enzyme catalytic domain prediction using clustering and aggregated mutual information content. J Bioinform Comput Biol 2012;9:597-611. [PMID: 21976378 DOI: 10.1142/s0219720011005677] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2011] [Accepted: 08/08/2011] [Indexed: 11/18/2022]

Abstract

Characterizing enzyme sequences and identifying their active sites is a very important task. The current experimental methods are too expensive and labor intensive to handle the rapidly accumulating protein sequences and structure data. Thus accurate, high-throughput in silico methods for identifying catalytic residues and enzyme function prediction are much needed. In this paper, we propose a novel sequence-based catalytic domain prediction method using a sequence clustering and an information-theoretic approaches. The first step is to perform the sequence clustering analysis of enzyme sequences from the same functional category (those with the same EC label). The clustering analysis is used to handle the problem of widely varying sequence similarity levels in enzyme sequences. The clustering analysis constructs a sequence graph where nodes are enzyme sequences and edges are a pair of sequences with a certain degree of sequence similarity, and uses graph properties, such as biconnected components and articulation points, to generate sequence segments common to the enzyme sequences. Then amino acid subsequences in the common shared regions are aligned and then an information theoretic approach called aggregated column related scoring scheme is performed to highlight potential active sites in enzyme sequences. The aggregated information content scoring scheme is shown to be effective to highlight residues of active sites effectively. The proposed method of combining the clustering and the aggregated information content scoring methods was successful in highlighting known catalytic sites in enzymes of Escherichia coli K12 in terms of the Catalytic Site Atlas database. Our method is shown to be not only accurate in predicting potential active sites in the enzyme sequences but also computationally efficient since the clustering approach utilizes two graph properties that can be computed in linear to the number of edges in the sequence graph and computation of mutual information does not require much time. We believe that the proposed method can be useful for identifying active sites of enzyme sequences from many genome projects.

Collapse

Vishnoi N, Flaherty K, Hancock LC, Ferreira ME, Amin AD, Prochasson P. Separation-of-function mutation in HPC2, a member of the HIR complex in S. cerevisiae, results in derepression of the histone genes but does not confer cryptic TATA phenotypes. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2011;1809:557-66. [PMID: 21782987 DOI: 10.1016/j.bbagrm.2011.07.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/18/2011] [Revised: 07/02/2011] [Accepted: 07/06/2011] [Indexed: 12/29/2022]

Tuominen LK, Johnson VE, Tsai CJ. Differential phylogenetic expansions in BAHD acyltransferases across five angiosperm taxa and evidence of divergent expression among Populus paralogues. BMC Genomics 2011;12:236. [PMID: 21569431 PMCID: PMC3123328 DOI: 10.1186/1471-2164-12-236] [Citation(s) in RCA: 117] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2010] [Accepted: 05/12/2011] [Indexed: 11/26/2022] Open

Tungtur S, Parente DJ, Swint-Kruse L. Functionally important positions can comprise the majority of a protein's architecture. Proteins 2011;79:1589-608. [PMID: 21374721 PMCID: PMC3076786 DOI: 10.1002/prot.22985] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2010] [Revised: 12/08/2010] [Accepted: 12/15/2010] [Indexed: 01/13/2023]

Novel feature for catalytic protein residues reflecting interactions with other residues. PLoS One 2011;6:e16932. [PMID: 21468322 PMCID: PMC3066176 DOI: 10.1371/journal.pone.0016932] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2010] [Accepted: 01/10/2011] [Indexed: 11/29/2022] Open

Kc DB, Livesay DR. Topology improves phylogenetic motif functional site predictions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011;8:226-233. [PMID: 21071810 DOI: 10.1109/tcbb.2009.60] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]

Comparing the functional roles of nonconserved sequence positions in homologous transcription repressors: implications for sequence/function analyses. J Mol Biol 2009;395:785-802. [PMID: 19818797 DOI: 10.1016/j.jmb.2009.10.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2009] [Revised: 10/01/2009] [Accepted: 10/02/2009] [Indexed: 11/21/2022]

Abstract

The explosion of protein sequences deduced from genetic code has led to both a problem and a potential resource: Efficient data use requires interpreting the functional impact of sequence change without experimentally characterizing each protein variant. Several groups have hypothesized that interpretation could be aided by analyzing the sequences of naturally occurring homologues. To that end, myriad sequence/function analyses have been developed to predict which conserved, semi-conserved, and nonconserved positions are functionally important. These positions must be discriminated from the nonconserved positions that are functionally silent. However, the assumptions that underlie sequence analyses are based on experimental results that are sparse and usually designed to address different questions. Here, we use three homologues from a test family common to bioinformatics-the LacI/GalR transcription repressors-to test a common assumption: If a position is functionally important for one family member, it has similar importance in all homologues. We generated experimental sequence/function information for each nonconserved position in the 18 amino acids that link the DNA-binding and regulatory domains of three LacI/GalR homologues. We find that the functional importance of each position is preserved among the three linkers, albeit to different degrees. We also find that every linker position contributes to function, which has twofold implications. (1) Since the linker positions range from highly conserved to semi-conserved to nonconserved and contribute to affinity, selectivity, and allosteric response, we assert that sequence/function analyses must identify positions in the LacI/GalR linkers to be qualified as "successful". Many analyses overlook this region since most of the residues do not directly contact ligand. (2) No position in the LacI/GalR linker is functionally silent. This finding is inconsistent with another underlying principle of many analyses: Using sequence sets to discriminate important from non-contributing positions obligates silent positions, which denotes that most homologues tolerate a variety of amino acid substitutions at the position without functional change. Instead, additional combinatorial mutants in the LacI/GalR linkers show that particular substitutions can be silent in a context-dependent manner. Thus, specific permutations of sequence change (rather than change at silent positions) would facilitate neutral drift during evolution. Finally, the combinatorial mutants also reveal functional synergy between semi- and nonconserved positions. Such functional relationships would be missed by analyses that rely primarily upon co-evolution.

Collapse

Thomas J, Ramakrishnan N, Bailey-Kellogg C. Graphical models of protein-protein interaction specificity from correlated mutations and interaction data. Proteins 2009;76:911-29. [DOI: 10.1002/prot.22398] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Wang K, Horst JA, Cheng G, Nickle DC, Samudrala R. Protein meta-functional signatures from combining sequence, structure, evolution, and amino acid property information. PLoS Comput Biol 2008;4:e1000181. [PMID: 18818722 PMCID: PMC2526173 DOI: 10.1371/journal.pcbi.1000181] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2008] [Accepted: 08/07/2008] [Indexed: 11/19/2022] Open

Dukka BKC, Livesay DR. Improving position-specific predictions of protein functional sites using phylogenetic motifs. ACTA ACUST UNITED AC 2008;24:2308-16. [PMID: 18723520 DOI: 10.1093/bioinformatics/btn454] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Zhang T, Zhang H, Chen K, Shen S, Ruan J, Kurgan L. Accurate sequence-based prediction of catalytic residues. ACTA ACUST UNITED AC 2008;24:2329-38. [PMID: 18710875 DOI: 10.1093/bioinformatics/btn433] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Abstract

MOTIVATION

Prediction of catalytic residues provides useful information for the research on function of enzymes. Most of the existing prediction methods are based on structural information, which limits their use. We propose a sequence-based catalytic residue predictor that provides predictions with quality comparable to modern structure-based methods and that exceeds quality of state-of-the-art sequence-based methods.

RESULTS

Our method (CRpred) uses sequence-based features and the sequence-derived PSI-BLAST profile. We used feature selection to reduce the dimensionality of the input (and explain the input) to support vector machine (SVM) classifier that provides predictions. Tests on eight datasets and side-by-side comparison with six modern structure- and sequence-based predictors show that CRpred provides predictions with quality comparable to current structure-based methods and better than sequence-based methods. The proposed method obtains 15-19% precision and 48-58% TP (true positive) rate, depending on the dataset used. CRpred also provides confidence values that allow selecting a subset of predictions with higher precision. The improved quality is due to newly designed features and careful parameterization of the SVM. The features incorporate amino acids characterized by the highest and the lowest propensities to constitute catalytic residues, Gly that provides flexibility for catalytic sites and sequence motifs characteristic to certain catalytic reactions. Our features indicate that catalytic residues are on average more conserved when compared with the general population of residues and that highly conserved amino acids characterized by high catalytic propensity are likely to form catalytic sites. We also show that local (with respect to the sequence) hydrophobicity contributes towards the prediction.

Collapse

Chien TY, Chang DTH, Chen CY, Weng YZ, Hsu CM. E1DS: catalytic site prediction based on 1D signatures of concurrent conservation. Nucleic Acids Res 2008;36:W291-6. [PMID: 18524800 PMCID: PMC2447799 DOI: 10.1093/nar/gkn324] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2008] [Revised: 04/25/2008] [Accepted: 05/07/2008] [Indexed: 11/21/2022] Open

Affiliation(s)

Ting-Ying Chien Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
Darby Tien-Hao Chang Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
Chien-Yu Chen Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
Yi-Zhong Weng Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC
Chen-Ming Hsu Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Department of Electrical Engineering, National Cheng Kung University, Tainan 701, Department of Bio-Industrial Mechatronics Engineering, National Taiwan University, Taipei 106 and Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 320, Taiwan, ROC

Collapse

Xie BB, Chen XL, Zhang XY, He HL, Zhang YZ, Zhou BC. Predicting protein interaction interfaces from protein sequences: case studies of subtilisin and phycocyanin. Proteins 2008;71:1461-74. [PMID: 18076046 DOI: 10.1002/prot.21836] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Li B, Turuvekere S, Agrawal M, La D, Ramani K, Kihara D. Characterization of local geometry of protein surfaces with the visibility criterion. Proteins 2008;71:670-83. [PMID: 17975834 DOI: 10.1002/prot.21732] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Abstract

Experimentally determined protein tertiary structures are rapidly accumulating in a database, partly due to the structural genomics projects. Included are proteins of unknown function, whose function has not been investigated by experiments and was not able to be predicted by conventional sequence-based search. Those uncharacterized protein structures highlight the urgent need of computational methods for annotating proteins from tertiary structures, which include function annotation methods through characterizing protein local surfaces. Toward structure-based protein annotation, we have developed VisGrid algorithm that uses the visibility criterion to characterize local geometric features of protein surfaces. Unlike existing methods, which only concerns identifying pockets that could be potential ligand-binding sites in proteins, VisGrid is also aimed to identify large protrusions, hollows, and flat regions, which can characterize geometric features of a protein structure. The visibility used in VisGrid is defined as the fraction of visible directions from a target position on a protein surface. A pocket or a hollow is recognized as a cluster of positions with a small visibility. A large protrusion in a protein structure is recognized as a pocket in the negative image of the structure. VisGrid correctly identified 95.0% of ligand-binding sites as one of the three largest pockets in 5616 benchmark proteins. To examine how natural flexibility of proteins affects pocket identification, VisGrid was tested on distorted structures by molecular dynamics simulation. Sensitivity decreased approximately 20% for structures of a root mean square deviation of 2.0 A to the original crystal structure, but specificity was not much affected. Because of its intuitiveness and simplicity, the visibility criterion will lay the foundation for characterization and function annotation of local shape of proteins.

Collapse

Sterner B, Singh R, Berger B. Predicting and annotating catalytic residues: an information theoretic approach. J Comput Biol 2007;14:1058-73. [PMID: 17887954 DOI: 10.1089/cmb.2007.0042] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open

Abstract

We introduce a computational method to predict and annotate the catalytic residues of a protein using only its sequence information, so that we describe both the residues' sequence locations (prediction) and their specific biochemical roles in the catalyzed reaction (annotation). While knowing the chemistry of an enzyme's catalytic residues is essential to understanding its function, the challenges of prediction and annotation have remained difficult, especially when only the enzyme's sequence and no homologous structures are available. Our sequence-based approach follows the guiding principle that catalytic residues performing the same biochemical function should have similar chemical environments; it detects specific conservation patterns near in sequence to known catalytic residues and accordingly constrains what combination of amino acids can be present near a predicted catalytic residue. We associate with each catalytic residue a short sequence profile and define a Kullback-Leibler (KL) distance measure between these profiles, which, as we show, effectively captures even subtle biochemical variations. We apply the method to the class of glycohydrolase enzymes. This class includes proteins from 96 families with very different sequences and folds, many of which perform important functions. In a cross-validation test, our approach correctly predicts the location of the enzymes' catalytic residues with a sensitivity of 80% at a specificity of 99.4%, and in a separate cross-validation we also correctly annotate the biochemical role of 80% of the catalytic residues. Our results compare favorably to existing methods. Moreover, our method is more broadly applicable because it relies on sequence and not structure information; it may, furthermore, be used in conjunction with structure-based methods.

Collapse

Sharma A, Rastogi T, Bhartiya M, Shasany AK, Khanuja SPS. Type 2 diabetes mellitus: phylogenetic motifs for predicting protein functional sites. J Biosci 2007;32:999-1004. [PMID: 17914241 DOI: 10.1007/s12038-007-0098-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

Livesay DR, Kidd PD, Eskandari S, Roshan U. Assessing the ability of sequence-based methods to provide functional insight within membrane integral proteins: a case study analyzing the neurotransmitter/Na+ symporter family. BMC Bioinformatics 2007;8:397. [PMID: 17941992 PMCID: PMC2194793 DOI: 10.1186/1471-2105-8-397] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2007] [Accepted: 10/17/2007] [Indexed: 01/09/2023] Open

Abstract

BACKGROUND

Efforts to predict functional sites from globular proteins is increasingly common; however, the most successful of these methods generally require structural insight. Unfortunately, despite several recent technological advances, structural coverage of membrane integral proteins continues to be sparse. ConSequently, sequence-based methods represent an important alternative to illuminate functional roles. In this report, we critically examine the ability of several computational methods to provide functional insight within two specific areas. First, can phylogenomic methods accurately describe the functional diversity across a membrane integral protein family? And second, can sequence-based strategies accurately predict key functional sites? Due to the presence of a recently solved structure and a vast amount of experimental mutagenesis data, the neurotransmitter/Na+ symporter (NSS) family is an ideal model system to assess the quality of our predictions.

RESULTS

The raw NSS sequence dataset contains 181 sequences, which have been aligned by various methods. The resultant phylogenetic trees always contain six major subfamilies are consistent with the functional diversity across the family. Moreover, in well-represented subfamilies, phylogenetic clustering recapitulates several nuanced functional distinctions. Functional sites are predicted using six different methods (phylogenetic motifs, two methods that identify subfamily-specific positions, and three different conservation scores). A canonical set of 34 functional sites identified by Yamashita et al. within the recently solved LeuTAa structure is used to assess the quality of the predictions, most of which are predicted by the bioinformatic methods. Remarkably, the importance of these sites is largely confirmed by experimental mutagenesis. Furthermore, the collective set of functional site predictions qualitatively clusters along the proposed transport pathway, further demonstrating their utility. Interestingly, the various prediction schemes provide results that are predominantly orthogonal to each other. However, when the methods do provide overlapping results, specificity is shown to increase dramatically (e.g., sites predicted by any three methods have both accuracy and coverage greater than 50%).

CONCLUSION

The results presented herein clearly establish the viability of sequence-based bioinformatic strategies to provide functional insight within the NSS family. As such, we expect similar bioinformatic investigations will streamline functional investigations within membrane integral families in the absence of structure.

Collapse

Ahmad I, Hoessli DC, Gupta R, Walker-Nasir E, Rafik SM, Choudhary MI, Shakoori AR. In silico determination of intracellular glycosylation and phosphorylation sites in human selectins: implications for biological function. J Cell Biochem 2007;100:1558-72. [PMID: 17230456 DOI: 10.1002/jcb.21156] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Ahmad I, Hoessli DC, Walker-Nasir E, Choudhary MI, Rafik SM, Shakoori AR. Phosphorylation and glycosylation interplay: protein modifications at hydroxy amino acids and prediction of signaling functions of the human beta3 integrin family. J Cell Biochem 2007;99:706-18. [PMID: 16676352 DOI: 10.1002/jcb.20814] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Xie L, Bourne PE. A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites. BMC Bioinformatics 2007;8 Suppl 4:S9. [PMID: 17570152 PMCID: PMC1892088 DOI: 10.1186/1471-2105-8-s4-s9] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

How accurate and statistically robust are catalytic site predictions based on closeness centrality? BMC Bioinformatics 2007;8:153. [PMID: 17498304 PMCID: PMC1876251 DOI: 10.1186/1471-2105-8-153] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2006] [Accepted: 05/11/2007] [Indexed: 11/25/2022] Open

Abstract

Background

We examine the accuracy of enzyme catalytic residue predictions from a network representation of protein structure. In this model, amino acid α-carbons specify vertices within a graph and edges connect vertices that are proximal in structure. Closeness centrality, which has shown promise in previous investigations, is used to identify important positions within the network. Closeness centrality, a global measure of network centrality, is calculated as the reciprocal of the average distance between vertex i and all other vertices.

Results

We benchmark the approach against 283 structurally unique proteins within the Catalytic Site Atlas. Our results, which are inline with previous investigations of smaller datasets, indicate closeness centrality predictions are statistically significant. However, unlike previous approaches, we specifically focus on residues with the very best scores. Over the top five closeness centrality scores, we observe an average true to false positive rate ratio of 6.8 to 1. As demonstrated previously, adding a solvent accessibility filter significantly improves predictive power; the average ratio is increased to 15.3 to 1. We also demonstrate (for the first time) that filtering the predictions by residue identity improves the results even more than accessibility filtering. Here, we simply eliminate residues with physiochemical properties unlikely to be compatible with catalytic requirements from consideration. Residue identity filtering improves the average true to false positive rate ratio to 26.3 to 1. Combining the two filters together has little affect on the results. Calculated p-values for the three prediction schemes range from 2.7E-9 to less than 8.8E-134. Finally, the sensitivity of the predictions to structure choice and slight perturbations is examined.

Conclusion

Our results resolutely confirm that closeness centrality is a viable prediction scheme whose predictions are statistically significant. Simple filtering schemes substantially improve the method's predicted power. Moreover, no clear effect on performance is observed when comparing ligated and unligated structures. Similarly, the CC prediction results are robust to slight structural perturbations from molecular dynamics simulation.

Collapse

Dong Q, Wang X, Lin L, Guan Y. Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins. BMC Bioinformatics 2007;8:147. [PMID: 17480235 PMCID: PMC1885810 DOI: 10.1186/1471-2105-8-147] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2007] [Accepted: 05/05/2007] [Indexed: 01/14/2023] Open

Abstract

Background

Recognition of binding sites in proteins is a direct computational approach to the characterization of proteins in terms of biological and biochemical function. Residue preferences have been widely used in many studies but the results are often not satisfactory. Although different amino acid compositions among the interaction sites of different complexes have been observed, such differences have not been integrated into the prediction process. Furthermore, the evolution information has not been exploited to achieve a more powerful propensity.

Result

In this study, the residue interface propensities of four kinds of complexes (homo-permanent complexes, homo-transient complexes, hetero-permanent complexes and hetero-transient complexes) are investigated. These propensities, combined with sequence profiles and accessible surface areas, are inputted to the support vector machine for the prediction of protein binding sites. Such propensities are further improved by taking evolutional information into consideration, which results in a class of novel propensities at the profile level, i.e. the binary profiles interface propensities. Experiment is performed on the 1139 non-redundant protein chains. Although different residue interface propensities among different complexes are observed, the improvement of the classifier with residue interface propensities can be negligible in comparison with that without propensities. The binary profile interface propensities can significantly improve the performance of binding sites prediction by about ten percent in term of both precision and recall.

Conclusion

Although there are minor differences among the four kinds of complexes, the residue interface propensities cannot provide efficient discrimination for the complicated interfaces of proteins. The binary profile interface propensities can significantly improve the performance of binding sites prediction of protein, which indicates that the propensities at the profile level are more accurate than those at the residue level.

Collapse

Powers R, Copeland JC, Germer K, Mercier KA, Ramanathan V, Revesz P. Comparison of protein active site structures for functional annotation of proteins and drug design. Proteins 2006;65:124-35. [PMID: 16862592 DOI: 10.1002/prot.21092] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Roshan U, Livesay DR. Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 2006;22:2715-21. [PMID: 16954142 DOI: 10.1093/bioinformatics/btl472] [Citation(s) in RCA: 145] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Pazos F, Rausell A, Valencia A. Phylogeny-independent detection of functional residues. Bioinformatics 2006;22:1440-8. [PMID: 16551661 DOI: 10.1093/bioinformatics/btl104] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Ahmad I, Hoessli DC, Walker-Nasir E, Rafik SM, Shakoori AR. Oct-2 DNA binding transcription factor: functional consequences of phosphorylation and glycosylation. Nucleic Acids Res 2006;34:175-84. [PMID: 16431844 PMCID: PMC1326018 DOI: 10.1093/nar/gkj401] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open

Aharonovsky E, Trifonov EN. Protein sequence modules. J Biomol Struct Dyn 2005;23:237-42. [PMID: 16218751 DOI: 10.1080/07391102.2005.10507062] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Sterner R, Höcker B. Catalytic Versatility, Stability, and Evolution of the (βα)8-Barrel Enzyme Fold. Chem Rev 2005;105:4038-55. [PMID: 16277370 DOI: 10.1021/cr030191z] [Citation(s) in RCA: 166] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Pei J, Cai W, Kinch LN, Grishin NV. Prediction of functional specificity determinants from protein sequences using log-likelihood ratios. Bioinformatics 2005;22:164-71. [PMID: 16278237 DOI: 10.1093/bioinformatics/bti766] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

La D, Livesay DR. MINER: software for phylogenetic motif identification. Nucleic Acids Res 2005;33:W267-70. [PMID: 15980467 PMCID: PMC1160226 DOI: 10.1093/nar/gki465] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Livesay DR, La D. The evolutionary origins and catalytic importance of conserved electrostatic networks within TIM-barrel proteins. Protein Sci 2005;14:1158-70. [PMID: 15840824 PMCID: PMC2253277 DOI: 10.1110/ps.041221105] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]

Watson JD, Laskowski RA, Thornton JM. Predicting protein function from sequence and structural data. Curr Opin Struct Biol 2005;15:275-84. [PMID: 15963890 DOI: 10.1016/j.sbi.2005.04.003] [Citation(s) in RCA: 204] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2005] [Revised: 02/04/2005] [Accepted: 04/18/2005] [Indexed: 10/25/2022]

La D, Livesay DR. Predicting functional sites with an automated algorithm suitable for heterogeneous datasets. BMC Bioinformatics 2005;6:116. [PMID: 15890082 PMCID: PMC1142304 DOI: 10.1186/1471-2105-6-116] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2005] [Accepted: 05/13/2005] [Indexed: 11/25/2022] Open

Abstract

Background

In a previous report (La et al., Proteins, 2005), we have demonstrated that the identification of phylogenetic motifs, protein sequence fragments conserving the overall familial phylogeny, represent a promising approach for sequence/function annotation. Across a structurally and functionally heterogeneous dataset, phylogenetic motifs have been demonstrated to correspond to a wide variety of functional site archetypes, including those defined by surface loops, active site clefts, and less exposed regions. However, in our original demonstration of the technique, phylogenetic motif identification is dependent upon a manually determined similarity threshold, prohibiting large-scale application of the technique.

Results

In this report, we present an algorithmic approach that determines thresholds without human subjectivity. The approach relies on significant raw data preprocessing to improve signal detection. Subsequently, Partition Around Medoids Clustering (PAMC) of the similarity scores assesses sequence fragments where functional annotation remains in question. The accuracy of the approach is confirmed through comparisons to our previous (manual) results and structural analyses. Triosephosphate isomerase and arginyl-tRNA synthetase are discussed as exemplar cases. A quantitative functional site prediction assessment algorithm indicates that the phylogenetic motif predictions, which require sequence information only, are nearly as good as those from evolutionary trace methods that do incorporate structure.

Conclusion

The automated threshold detection algorithm has been incorporated into MINER, our web-based phylogenetic motif identification server. MINER is freely available on the web at . Pre-calculated functional site predictions of the COG database and an implementation of the threshold detection algorithm, in the R statistical language, can also be accessed at the website.

Collapse