1
Shehadi IA, Abyzov A, Uzun A, Wei Y, Murga LF, Ilyin V, Ondrechen MJ. Active site prediction for comparative model structures with THEMATICS. J Bioinform Comput Biol 2011; 3:127-43. [PMID: 15751116 DOI: 10.1142/s0219720005000916] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 03/20/2004] [Revised: 06/25/2004] [Accepted: 07/10/2004] [Indexed: 11/18/2022]
Abstract
THEMATICS (Theoretical Microscopic Titration Curves) is a simple, reliable computational predictor of the active sites of enzymes from structure. Our method, based on well-established Finite Difference Poisson–Boltzmann techniques, identifies the ionisable residues with anomalous predicted titration behavior. A cluster of two or more such perturbed residues is a very reliable predictor of the active site. The protein does not have to bear any resemblance in sequence or structure to any previously characterized protein, but the method does require the three-dimensional structure. We now present evidence that THEMATICS can also locate the active site in structures built by comparative modeling from similar structures. Results are given for a total of 21 sets of proteins, including 21 templates and 83 comparative model structures. Detailed results are presented for three sets of orthologous proteins (Triosephosphate isomerase, 6-Hydroxymethyl-7,8-dihydropterin pyrophosphokinase, and Aspartate aminotransferase) and for one set of human homologues of Aldose reductase with different functions. THEMATICS correctly locates the active site in the model structures. This suggests that the method can be applicable to a much larger set of proteins for which an experimentally determined structure is unavailable. With a few exceptions, the predicted active sites in the comparative model structures are similar to that of the corresponding template structure.
Affiliation(s)
- Ihsan A Shehadi
- Department of Chemistry, United Arab Emirates University, Al-Ain, United Arab Emirates.
2
Tong W, Wei Y, Murga LF, Ondrechen MJ, Williams RJ. Partial order optimum likelihood (POOL): maximum likelihood prediction of protein active site residues using 3D structure and sequence properties. PLoS Comput Biol 2009; 5:e1000266. [PMID: 19148270 PMCID: PMC2612599 DOI: 10.1371/journal.pcbi.1000266] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 06/11/2008] [Accepted: 12/04/2008] [Indexed: 11/24/2022]
Abstract
A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. 
Included is an analysis demonstrating that, when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.
Genome sequencing has revealed the codes for thousands of previously unknown proteins for humans and for hundreds of other species. Many of these proteins are of unknown or unclear function. The information contained in the genome sequences holds tremendous potential benefit to humankind, including new approaches to the diagnosis and treatment of disease. In order to realize these benefits, a key step is to understand the functions of the proteins for which these genes hold the code. A first step in understanding the function of a protein is to identify the functional site, the local area on the surface of a protein where it affects its functional activity. This paper reports on a new computational methodology to predict protein functional sites from protein 3D structures. A new machine learning approach called Partial Order Optimum Likelihood (POOL) is introduced here. It is shown that POOL outperforms previous methods for the prediction of protein functional sites from 3D structures.
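As a rough illustration of the monotonicity-constrained estimation the abstract describes, the sketch below fits a non-decreasing probability curve to 0/1 active-site labels that have been sorted by a single feature score. It uses the classic one-dimensional pool-adjacent-violators procedure, which is a deliberate simplification: POOL itself works with a partial order over several features, and that is not reproduced here. The function name `pav_isotonic` and the toy labels are assumptions made for this example only.

```python
def pav_isotonic(labels, weights=None):
    """Least-squares non-decreasing fit via pool-adjacent-violators (1-D only).

    `labels` are assumed to be ordered by increasing feature score.
    For 0/1 labels each fitted value is the fraction of positives
    in the block of neighbours that had to be pooled together.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    blocks = []  # each block holds [mean, total weight, number of points]
    for y, w in zip(labels, weights):
        blocks.append([float(y), float(w), 1])
        # Merge backwards while the last two blocks break monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical labels (1 = annotated active-site residue), sorted by score.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_fit = pav_isotonic(toy_labels)
```

In this toy case the out-of-order pair (1 followed by 0) is pooled into a single block, so residues can then be ranked by the fitted values without any ranking reversals relative to the score.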
Affiliation(s)
- Wenxu Tong
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Ying Wei
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- Leonel F. Murga
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- Mary Jo Ondrechen
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, United States of America
- * E-mail: (MO); (RJW)
- Ronald J. Williams
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Complex Scientific Software, Northeastern University, Boston, Massachusetts, United States of America
- * E-mail: (MO); (RJW)
3
Abstract
The predictability of catalytic and binding sites from apo structures is addressed for proteins that undergo significant conformational change upon binding. Theoretical microscopic titration curves (THEMATICS), an electrostatics-based method for the prediction of functional sites, is performed on a test set of 24 proteins with both apo and holo structures available. For 23 of these 24 proteins (96%), THEMATICS predicts the correct catalytic or binding site for both the apo and holo forms. For only one of the 24 proteins, THEMATICS makes the correct prediction for the holo structure but fails for the apo structure. The metrics used by THEMATICS to identify functional residues generally are larger in absolute value for the functional residues in the holo forms compared to the corresponding residues in the apo forms. However, even in the apo forms, these identifying metrics are still statistically significantly larger for functional residues than for residues not involved in catalysis or binding. This indicates that some of the unusual electrostatic properties of functional residues are preserved in the apo conformation. Evidence is presented that certain residues immediately surrounding the active catalytic and binding residues impart functionally important chemical and electrostatic properties to the active residues. At least parts of these microenvironments exist in the unbound conformations, such that THEMATICS is able to distinguish the functional residues even in the apo structures.
Affiliation(s)
- Leonel F Murga
- Department of Biochemistry, Rosenstiel Basic Medical Sciences Research Center, Brandeis University, Waltham, Massachusetts 02454-9110, USA
4
Tong W, Williams RJ, Wei Y, Murga LF, Ko J, Ondrechen MJ. Enhanced performance in prediction of protein active sites with THEMATICS and support vector machines. Protein Sci 2007; 17:333-41. [PMID: 18096640 DOI: 10.1110/ps.073213608] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 10/22/2022]
Abstract
Theoretical microscopic titration curves (THEMATICS) is a computational method for the identification of active sites in proteins through deviations in computed titration behavior of ionizable residues. While the sensitivity to catalytic sites is high, the previously reported sensitivity to catalytic residues was not as high, about 50%. Here THEMATICS is combined with support vector machines (SVM) to improve sensitivity for catalytic residue prediction from protein 3D structure alone. For a test set of 64 proteins taken from the Catalytic Site Atlas (CSA), the average recall rate for annotated catalytic residues is 61%; good precision is maintained, with only 4% of all residues selected. The average false positive rate, using the CSA annotations, is only 3.2%, far lower than other 3D-structure-based methods. THEMATICS-SVM returns higher precision, lower false positive rate, and better overall performance, compared with other 3D-structure-based methods. Comparison is also made with the latest machine learning methods that are based on both sequence alignments and 3D structures. For annotated sets of well-characterized enzymes, THEMATICS-SVM performance compares very favorably with methods that utilize sequence homology. However, since THEMATICS depends only on the 3D structure of the query protein, no decline in performance is expected when applied to novel folds, proteins with few sequence homologues, or even orphan sequences. An extension of the method to predict non-ionizable catalytic residues is also presented. THEMATICS-SVM predicts a local network of ionizable residues with strong interactions between protonation events; this appears to be a special feature of enzyme active sites.
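The recall, precision, and false positive rate quoted above are per-residue quantities. A minimal sketch of how such figures can be computed for one protein is given below, assuming the predicted and annotated residues are available as sets of residue numbers. The function name `residue_metrics` and the toy residue numbers are hypothetical and are not taken from the paper or from the CSA.

```python
def residue_metrics(predicted, annotated, all_residues):
    """Per-protein recall, precision and false positive rate for residue predictions."""
    predicted, annotated, all_residues = set(predicted), set(annotated), set(all_residues)
    tp = len(predicted & annotated)                    # annotated residues that were predicted
    fp = len(predicted - annotated)                    # predicted but not annotated
    fn = len(annotated - predicted)                    # annotated but missed
    tn = len(all_residues - predicted - annotated)     # correctly left unselected
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "fraction_selected": len(predicted) / len(all_residues) if all_residues else 0.0,
    }


# Hypothetical 100-residue protein with four annotated catalytic residues.
toy = residue_metrics(
    predicted={10, 20, 30, 55, 60},
    annotated={10, 20, 30, 40},
    all_residues=range(1, 101),
)
```

Averaging these per-protein values over a test set would give figures of the kind reported in the abstract; whether the authors averaged per protein or pooled residues across proteins is not stated here, so that choice is left open.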
Affiliation(s)
- Wenxu Tong
- College of Computer and Information Science, Northeastern University, Boston, Massachusetts 02115, USA
5
Murga LF, Wei Y, Ondrechen MJ. Computed protonation properties: unique capabilities for protein functional site prediction. Genome Inform 2007; 19:107-118. [PMID: 18546509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
Prediction of protein functional sites from 3D structure is an important problem, particularly as structural genomics projects produce hundreds of structures of unknown function, including novel folds and the structures of orphan sequences. The present paper shows how computed protonation properties provide unique and powerful capabilities for the prediction of catalytic sites from the 3D structure alone. These protonation properties of the ionizable residues in a protein may be computed from the 3D structure using the calculated electrical potential function. In particular, the shapes of the theoretical microscopic titration curves (THEMATICS) enable selection of the residues involved in catalysis or small molecule recognition with good sensitivity and precision. Results are shown for 169 annotated enzymes in the Catalytic Site Atlas (CSA). Performance, as measured by residue recall and precision, is clearly better than that of other 3D-structure-based methods. When compared with methods based on sequence alignments and structural comparisons, THEMATICS performance is competitive for well-characterized enzymes. However, THEMATICS performance does not degrade in the absence of similarity, as do the alignment-based methods, even if there are few or no sequence homologues or few or no proteins of similar structure. It is further shown that the protonation properties perform well on open, unbound structures, even if there is substantial conformational change upon ligand binding.
Affiliation(s)
- Leonel F Murga
- Department of Chemistry & Chemical Biology and Institute for Complex Scientific Software, Northeastern University, Boston, MA 02115, USA.
6
Abstract
MOTIVATION: Identification of functional information for a protein from its three-dimensional (3D) structure is a major challenge in genomics. The power of theoretical microscopic titration curves (THEMATICS), when coupled with a statistical analysis, provides a method for high-throughput screening for identification of catalytic sites and binding sites with high accuracy and precision. The method requires only the 3D structure of the query protein as input, but it performs as well as other methods that depend on sequence alignments and structural similarities.
Affiliation(s)
- Jaeju Ko
- Department of Chemistry, Indiana University of Pennsylvania, 975 Oakland Avenue, Indiana, PA 15705, USA
7
Ko J, Murga LF, André P, Yang H, Ondrechen MJ, Williams RJ, Agunwamba A, Budil DE. Statistical criteria for the identification of protein active sites using theoretical microscopic titration curves. Proteins 2005; 59:183-95. [PMID: 15739204 DOI: 10.1002/prot.20418] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.5] [Indexed: 11/07/2022]
Abstract
Theoretical Microscopic Titration Curves (THEMATICS) may be used to identify chemically important residues in active sites of enzymes by characteristic deviations from the normal, sigmoidal Henderson-Hasselbalch titration behavior. Clusters of such deviant residues in physical proximity constitute reliable predictors of the location of the active site. Originally the residues with deviant predicted behavior were identified by human observation of the computed titration curves. However, it is preferable to select the unusual residues by mathematically well-defined criteria, in order to reduce the chance of error, eliminate any possible biases, and substantially speed up the selection process. Here we present some simple statistical tests that constitute such selection criteria. The first derivatives of the predicted titration curves resemble distribution functions and are normalized. The moments of these first derivative functions are computed. It is shown that the third and fourth moments, measures of asymmetry and kurtosis, respectively, are good measures of the deviations from normal behavior. Results are presented for 44 different enzymes. Detailed results are given for 4 enzymes with 4 different types of chemistry: arginine kinase from Limulus polyphemus (horseshoe crab); beta-lactamase from Escherichia coli; glutamate racemase from Aquifex pyrophilus; and 3-isopropylmalate dehydrogenase from Thiobacillus ferrooxidans. The relationship between the statistical measures of nonsigmoidal behavior in the predicted titration curves and the catalytic activity of the residue is discussed.
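The moment analysis described above can be sketched numerically. The example below treats the negative first derivative of a titration curve as a normalized distribution over pH and computes its mean and its second, third, and fourth central moments. The curves used here are assumptions for illustration: an ideal Henderson-Hasselbalch curve, and an artificial two-step curve standing in for a perturbed residue. Real THEMATICS curves come from electrostatics calculations on the protein structure, which this sketch does not attempt. The names `hh_occupancy` and `derivative_moments` are hypothetical.

```python
def hh_occupancy(ph, pka):
    """Mean proton occupancy of an ideal, non-interacting site (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def derivative_moments(ph_grid, occupancy):
    """Mean and 2nd, 3rd, 4th central moments of the normalized -d(occupancy)/d(pH)."""
    mids, weights = [], []
    for i in range(len(ph_grid) - 1):
        mids.append(0.5 * (ph_grid[i] + ph_grid[i + 1]))
        # Drop in occupancy across the interval = derivative times step, sign-flipped.
        weights.append(occupancy[i] - occupancy[i + 1])
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so the weights sum to 1
    mean = sum(w * x for w, x in zip(weights, mids))

    def central(k):
        return sum(w * (x - mean) ** k for w, x in zip(weights, mids))

    return mean, central(2), central(3), central(4)


# pH grid symmetric about 7 and wide enough that both curves are fully titrated.
ph_grid = [-10.0 + 0.05 * i for i in range(681)]
ideal = [hh_occupancy(ph, 7.0) for ph in ph_grid]
# Artificial two-step curve: an assumed stand-in for a coupled, perturbed residue.
perturbed = [0.7 * hh_occupancy(ph, 5.0) + 0.3 * hh_occupancy(ph, 9.0) for ph in ph_grid]

ideal_moments = derivative_moments(ph_grid, ideal)
perturbed_moments = derivative_moments(ph_grid, perturbed)
```

For the ideal curve the derivative is symmetric about the pKa, so the third moment is essentially zero, while the two-step curve gives a clearly nonzero third moment and a broader distribution. Any thresholds for calling a residue deviant would have to come from the paper's statistical criteria, which are not reproduced here.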
Affiliation(s)
- Jaeju Ko
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, USA
8
Murga LF, Wei Y, André P, Clifton JG, Ringe D, Ondrechen MJ. Physicochemical Methods for Prediction of Functional Information for Proteins. Isr J Chem 2004. [DOI: 10.1560/q3yd-pedl-jru8-8fvm] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
9
Murga LF, Ondrechen MJ. Numerical Aspects of the Calculation of Second Hyperpolarizabilities Using the Finite Field Method Coupled with a Simple Lanczos Algorithm. J Comput Chem 2001. [DOI: 10.1002/1096-987x(200103)22:4<468::aid-jcc1017>3.0.co;2-a] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
10
Ferretti A, Lami A, Murga LF, Shehadi IA, Ondrechen MJ, Villani G. Theory of Electroabsorption Spectroscopy in Pyrazine-Bridged Ru Dimers. J Am Chem Soc 1999. [DOI: 10.1021/ja9814218] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Affiliation(s)
- Alessandro Ferretti
- Contribution from the Istituto di Chimica Quantistica ed Energetica Molecolare del CNR, Via Risorgimento 35, I-56126 Pisa, Italy, and Department of Chemistry, Northeastern University, Boston, Massachusetts 02115
- Alessandro Lami
- Contribution from the Istituto di Chimica Quantistica ed Energetica Molecolare del CNR, Via Risorgimento 35, I-56126 Pisa, Italy, and Department of Chemistry, Northeastern University, Boston, Massachusetts 02115
- Leonel F. Murga
- Contribution from the Istituto di Chimica Quantistica ed Energetica Molecolare del CNR, Via Risorgimento 35, I-56126 Pisa, Italy, and Department of Chemistry, Northeastern University, Boston, Massachusetts 02115
- Ihsan A. Shehadi
- Contribution from the Istituto di Chimica Quantistica ed Energetica Molecolare del CNR, Via Risorgimento 35, I-56126 Pisa, Italy, and Department of Chemistry, Northeastern University, Boston, Massachusetts 02115
- Mary Jo Ondrechen
- Contribution from the Istituto di Chimica Quantistica ed Energetica Molecolare del CNR, Via Risorgimento 35, I-56126 Pisa, Italy, and Department of Chemistry, Northeastern University, Boston, Massachusetts 02115
- Giovanni Villani
- Contribution from the Istituto di Chimica Quantistica ed Energetica Molecolare del CNR, Via Risorgimento 35, I-56126 Pisa, Italy, and Department of Chemistry, Northeastern University, Boston, Massachusetts 02115
13
Murga LF, Ferretti A, Lami A, Ondrechen MJ, Villani G. Theory of the Stark Effect spectral lineshape for a delocalized mixed-valence complex. Inorg Chem Commun 1998. [DOI: 10.1016/s1387-7003(98)00036-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]