1
Qiao F, Binkowski TA, Broughan I, Chen W, Natarajan A, Schiltz GE, Scheidt KA, Anderson WF, Bergan R. Protein Structure Inspired Discovery of a Novel Inducer of Anoikis in Human Melanoma. Cancers (Basel) 2024; 16:3177. [PMID: 39335149 PMCID: PMC11429909 DOI: 10.3390/cancers16183177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2024] [Revised: 09/11/2024] [Accepted: 09/12/2024] [Indexed: 09/30/2024]
Abstract
Drug discovery historically starts with an established function, either that of compounds or proteins. This can hamper discovery of novel therapeutics. As structure determines function, we hypothesized that unique 3D protein structures constitute primary data that can inform novel discovery. Using a computationally intensive physics-based analytical platform operating at supercomputing speeds, we probed a high-resolution protein X-ray crystallographic library developed by us. For each of the eight identified novel 3D structures, we analyzed binding of sixty million compounds. Top-ranking compounds were acquired and screened for efficacy against breast, prostate, colon, or lung cancer, and for toxicity on normal human bone marrow stem cells, both using eight-day colony formation assays. Effective and non-toxic compounds segregated to two pockets. One compound, Dxr2-017, exhibited selective anti-melanoma activity in the NCI-60 cell line screen. In eight-day assays, Dxr2-017 had an IC50 of 12 nM against melanoma cells, while concentrations over 2100-fold higher had minimal stem cell toxicity. Dxr2-017 induced anoikis, a unique form of programmed cell death in need of targeted therapeutics. Our findings demonstrate proof-of-concept that protein structures represent high-value primary data to support the discovery of novel acting therapeutics. This approach is widely applicable.
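A quick check on the selectivity window reported above, using only the abstract's own numbers: a concentration 2100-fold above the 12 nM melanoma IC50 is roughly 25 µM, so the stem-cell concentrations described as minimally toxic lie above about 25 µM.

```latex
\[
12\ \text{nM} \times 2100 = 25\,200\ \text{nM} \approx 25.2\ \mu\text{M}
\]
```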
Affiliation(s)
- Fangfang Qiao
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Irene Broughan
- Department of Medicine, Northwestern University, Chicago, IL 60611, USA
- Weining Chen
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Amarnath Natarajan
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Gary E Schiltz
- Department of Chemistry, Northwestern University, Evanston, IL 60208, USA
- Karl A Scheidt
- Department of Chemistry, Northwestern University, Evanston, IL 60208, USA
- Wayne F Anderson
- Department of Biochemistry and Molecular Genetics, Northwestern University, Chicago, IL 60611, USA
- Raymond Bergan
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68105, USA
2
Qiao F, Binkowski TA, Broughan I, Chen W, Natarajan A, Schiltz GE, Scheidt KA, Anderson WF, Bergan R. Protein Structure Inspired Drug Discovery. bioRxiv 2024; 2024.05.17.594634. [PMID: 38826221 PMCID: PMC11142055 DOI: 10.1101/2024.05.17.594634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Drug discovery starts with known function, either of a compound or a protein, in turn prompting investigations to probe the 3D structure of the compound-protein interface. As protein structure determines function, we hypothesized that unique 3D structural motifs represent primary information denoting unique function that can drive discovery of novel agents. Using a physics-based protein structure analysis platform developed by us, designed to conduct computationally intensive analysis at supercomputing speeds, we probed a high-resolution protein X-ray crystallographic library developed by us. We selected 3D structural motifs whose function was not otherwise established, that offered environments supporting binding of drug-like chemicals, and that were present on proteins that were not established therapeutic targets. For each of eight potential binding pockets on six different proteins, we accessed a 60-million-compound library and used our analysis platform to evaluate binding. Acquired compounds were screened, using eight-day colony formation assays, for efficacy against human breast, prostate, colon and lung cancer cells and for toxicity against human bone marrow stem cells. Compounds selectively inhibiting cancer growth segregated to two pockets on separate proteins. The compound Dxr2-017 exhibited selective activity against human melanoma cells in the NCI-60 cell line screen and had an IC50 of 19 nM against human melanoma M14 cells in our eight-day assay, while concentrations over 2100-fold higher inhibited stem cells by less than 30%. We show that Dxr2-017 induces anoikis, a unique form of programmed cell death in need of targeted therapeutics. The predicted target protein for Dxr2-017 is expressed in bacteria, not in humans. This supports our strategy of focusing on unique 3D structural motifs. It is known that functionally important 3D structures are evolutionarily conserved.
Here we demonstrate proof-of-concept that protein structure represents high value primary data to support discovery of novel therapeutics. This approach is widely applicable.
Affiliation(s)
- Fangfang Qiao
- Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Irene Broughan
- Department of Medicine, Northwestern University, Chicago, IL 60611, USA
- Weining Chen
- Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Amarnath Natarajan
- Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Gary E. Schiltz
- Department of Chemistry, Northwestern University, Evanston, IL 60208, USA
- Karl A. Scheidt
- Department of Chemistry, Northwestern University, Evanston, IL 60208, USA
- Wayne F. Anderson
- Department of Biochemistry and Molecular Genetics, Northwestern University, Chicago, IL 60611, USA
- Raymond Bergan
- Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE 68105, USA
3
Konecki DM, Hamrick S, Wang C, Agosto MA, Wensel TG, Lichtarge O. CovET: A covariation-evolutionary trace method that identifies protein structure-function modules. J Biol Chem 2023; 299:104896. [PMID: 37290531 PMCID: PMC10338321 DOI: 10.1016/j.jbc.2023.104896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/25/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/10/2023]
Abstract
Measuring the relative effect that any two sequence positions have on each other may improve protein design or help better interpret coding variants. Current approaches use statistics and machine learning but rarely consider phylogenetic divergences which, as shown by Evolutionary Trace studies, provide insight into the functional impact of sequence perturbations. Here, we reframe covariation analyses in the Evolutionary Trace framework to measure the relative tolerance to perturbation of each residue pair during evolution. This approach (CovET) systematically accounts for phylogenetic divergences: at each divergence event, we penalize covariation patterns that belie evolutionary coupling. We find that while CovET approximates the performance of existing methods to predict individual structural contacts, it performs significantly better at finding structural clusters of coupled residues and ligand binding sites. For example, CovET found more functionally critical residues when we examined the RNA recognition motif and WW domains, and it correlates better with large-scale epistasis screen data. In the dopamine D2 receptor, top CovET residue pairs accurately recovered the allosteric activation pathway characterized for Class A G protein-coupled receptors. These data suggest that CovET ranks highest the sequence position pairs that play critical functional roles through epistatic and allosteric interactions in evolutionarily relevant structure-function motifs. CovET complements current methods and may shed light on fundamental molecular mechanisms of protein structure and function.
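For orientation only, the sketch below computes mutual information between two alignment columns, a conventional statistical covariation measure. It ignores the phylogenetic divergences that CovET explicitly accounts for, so it is a baseline illustration and not the CovET algorithm; the function name and the one-string-per-column input format are assumptions.

```python
import math
from collections import Counter


def column_mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (natural log) between two alignment columns.

    Each string holds one residue per aligned sequence, in the same sequence
    order. Generic covariation baseline; no phylogenetic weighting (not CovET).
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and equally long")
    n = len(col_i)
    p_i = Counter(col_i)                 # residue counts in column i
    p_j = Counter(col_j)                 # residue counts in column j
    p_ij = Counter(zip(col_i, col_j))    # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi
```

Perfectly coupled columns such as `"AAKK"` and `"DDEE"` score ln 2, while `"AAKK"` and `"DEDE"`, whose residues vary independently, score 0.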
Affiliation(s)
- Daniel M Konecki
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA
- Spencer Hamrick
- Chemical, Physical, and Structural Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA
- Chen Wang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Melina A Agosto
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
- Theodore G Wensel
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Cancer and Cell Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA
- Olivier Lichtarge
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Cancer and Cell Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas, USA
4
Recurrent high-impact mutations at cognate structural positions in class A G protein-coupled receptors expressed in tumors. Proc Natl Acad Sci U S A 2021; 118:2113373118. [PMID: 34916293 PMCID: PMC8713800 DOI: 10.1073/pnas.2113373118] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 11/01/2021] [Indexed: 12/23/2022]
Abstract
GPCRs and GPCR pathways are increasingly being implicated in human malignancies, placing them among the most promising cancer drug candidates. Our results reveal enrichment of highly impactful, recurrent GPCR mutations within cancers. We found that cognate mutations in selected class A GPCRs have deleterious effects on signaling function. The results also suggest that olfactory receptors, often considered inconsequential, display a nonrandom mutation pattern in tumors in which they are expressed. These findings support the idea that protein paralogs can act in parallel as members of an onco-group.
G protein-coupled receptors (GPCRs) are the largest family of human proteins. They have a common structure and, signaling through a much smaller set of G proteins, arrestins, and effectors, activate downstream pathways that often modulate hallmark mechanisms of cancer. Because there are many more GPCRs than effectors, mutations in different receptors could perturb signaling similarly so as to favor a tumor. We hypothesized that somatic mutations in tumor samples may not be enriched within a single gene but rather that cognate mutations with similar effects on GPCR function are distributed across many receptors. To test this possibility, we systematically aggregated somatic cancer mutations across class A GPCRs and found a nonrandom distribution of positions with variant amino acid residues. Individual cancer types were enriched for highly impactful, recurrent mutations at selected cognate positions of known functional motifs. We also discovered that no single receptor drives this pattern, but rather multiple receptors contain amino acid substitutions at a few cognate positions. Phenotypic characterization suggests these mutations induce perturbation of G protein activation and/or β-arrestin recruitment. These data suggest that recurrent impactful oncogenic mutations perturb different GPCRs to subvert signaling and promote tumor growth or survival.
The possibility that multiple different GPCRs could moonlight as drivers or enablers of a given cancer through mutations located at cognate positions across GPCR paralogs opens a window into cancer mechanisms and potential approaches to therapeutics.
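To make the aggregation idea concrete: instead of counting mutations per gene, each mutation is mapped to a shared cognate position and the counts are pooled across receptors. The sketch below is hypothetical throughout: receptor names and residue numbers are placeholders, the position labels are merely written in the style of a generic GPCR numbering scheme, and the paper's actual alignment and statistics are not reproduced.

```python
from collections import Counter

# Hypothetical mapping from (receptor, receptor-specific residue number) to a
# shared cognate position label. A real analysis would derive this from a
# structure-based alignment of the receptors.
COGNATE_MAP = {
    ("RECEPTOR_A", 131): "3.50",
    ("RECEPTOR_B", 140): "3.50",
    ("RECEPTOR_C", 128): "3.50",
    ("RECEPTOR_A", 322): "7.53",
}


def count_cognate_hits(mutations):
    """Count somatic mutations per cognate position, pooled across receptors.

    mutations: iterable of (receptor, residue_number) tuples.
    Mutations at positions without a cognate label are skipped.
    """
    counts = Counter()
    for receptor, position in mutations:
        label = COGNATE_MAP.get((receptor, position))
        if label is not None:
            counts[label] += 1
    return counts
```

With one mutation in each of three receptors at the position labelled `"3.50"`, that cognate position collects three hits even though no single gene is recurrently mutated, which is the pattern the abstract describes.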
5
Fleury C, Gracy J, Gautier MF, Pons JL, Dufayard JF, Labesse G, Ruiz M, de Lamotte F. Comprehensive classification of the plant non-specific lipid transfer protein superfamily towards its sequence-structure-function analysis. PeerJ 2019; 7:e7504. [PMID: 31428542 PMCID: PMC6698131 DOI: 10.7717/peerj.7504] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/25/2019] [Accepted: 07/17/2019] [Indexed: 11/20/2022]
Abstract
Background: Non-specific lipid transfer proteins (nsLTPs) are widely distributed in the plant kingdom and constitute a superfamily of related proteins. Several hundred nsLTP sequences have been characterized so far, and the number keeps growing, but their biological functions remain unclear. They have long been recognized as being of interest for agronomic and nutritional applications. Deciphering their functions means collecting and analyzing a variety of data, from gene sequence to protein structure and from cellular localization to physiological role. As a huge and growing number of new protein sequences becomes available, extracting meaningful knowledge from sequence–structure–function relationships calls for new tools and approaches. Because nsLTPs show high evolutionary divergence but a conserved common right-handed superhelix structural fold, and because they play a large number of key roles in plant development and defense, they are a stimulating case study for validating such an approach.
Methods: In this study, we comprehensively investigated 797 nsLTP protein sequences, including a phylogenetic analysis of canonical protein sequences, three-dimensional structure modeling, and functional annotation using several well-established bioinformatics programs. Additionally, two integrative methodologies using original tools were developed. The first was a new method for detecting (i) conserved amino acid residues involved in structure stabilization and (ii) residues potentially involved in ligand interaction. The second was a structure–function classification based on the evolutionary trace display method using a new tree visualization interface. We also present a new tool for visualizing phylogenetic trees.
Results: Following this new protocol, an updated classification of the nsLTP superfamily was established and a new functional hypothesis for key residues is suggested.
Lastly, this work allows a better representation of the diversity of plant nsLTPs in terms of sequence, structure and function.
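The abstract does not describe how the authors' new method detects conserved residues. As a generic illustration only, a common starting point is per-column Shannon entropy over a multiple sequence alignment, where fully conserved columns score 0 bits. The function names and the 0.5-bit threshold below are assumptions, not values from the paper.

```python
import math
from collections import Counter


def column_entropy(column: str, ignore_gaps: bool = True) -> float:
    """Shannon entropy (bits) of one alignment column; 0 means fully conserved."""
    residues = [r for r in column if not (ignore_gaps and r == "-")]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conserved_positions(alignment, max_entropy=0.5):
    """Return 0-based column indices whose entropy is at or below max_entropy."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = ["".join(seq[k] for seq in alignment) for k in range(length)]
    return [k for k, col in enumerate(columns) if column_entropy(col) <= max_entropy]
```

For the toy alignment `["CAC", "CGC", "CTC", "CAC"]`, columns 0 and 2 are invariant and are returned, while the variable middle column (1.5 bits) is not.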
Affiliation(s)
- Jérôme Gracy
- CBS, CNRS Univ Montpellier INSERM, Montpellier, France
- Jean-Luc Pons
- CBS, CNRS Univ Montpellier INSERM, Montpellier, France
6
Evolutionary action and structural basis of the allosteric switch controlling β2AR functional selectivity. Nat Commun 2017; 8:2169. [PMID: 29255305 PMCID: PMC5735088 DOI: 10.1038/s41467-017-02257-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 11/13/2016] [Accepted: 11/15/2017] [Indexed: 12/18/2022]
Abstract
Functional selectivity of G-protein-coupled receptors is believed to originate from ligand-specific conformations that activate only subsets of signaling effectors. In this study, to identify molecular motifs playing important roles in transducing ligand binding into distinct signaling responses, we combined in silico evolutionary lineage analysis and structure-guided site-directed mutagenesis with large-scale functional signaling characterization and non-negative matrix factorization clustering of signaling profiles. Clustering based on the signaling profiles of 28 variants of the β2-adrenergic receptor reveals three clearly distinct phenotypic clusters, showing selective impairments of either the Gi or β-arrestin/endocytosis pathways with no effect on Gs activation. Robustness of the results is confirmed using simulation-based error propagation. The structural changes resulting from functionally biasing mutations centered around the DRY, NPxxY, and PIF motifs, selectively linking these micro-switches to unique signaling profiles. Our data identify different receptor regions that are important for the stabilization of distinct conformations underlying functional selectivity.
Ligand-induced biased signaling is thought to result in part from ligand-specific receptor conformations that cause the engagement of distinct effectors. Here the authors trace and evaluate the impact of mutations of the β2-adrenergic receptor on multiple signaling outputs to provide structural-level insight into the determinants of GPCR functional selectivity.
7
Katsonis P, Lichtarge O. Objective assessment of the evolutionary action equation for the fitness effect of missense mutations across CAGI-blinded contests. Hum Mutat 2017; 38:1072-1084. [PMID: 28544059 DOI: 10.1002/humu.23266] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 08/26/2016] [Revised: 03/13/2017] [Accepted: 05/17/2017] [Indexed: 01/09/2023]
Abstract
A major challenge in genome interpretation is to estimate the fitness effect of coding variants of unknown significance (VUS). The labor involved, limited understanding of protein functions, and lack of assays generally limit direct experimental assessment of VUS, and make robust and accurate computational approaches a necessity. Often, however, algorithms that predict mutational effect disagree among themselves and with experimental data, slowing their adoption for clinical diagnostics. To objectively assess such methods, the Critical Assessment of Genome Interpretation (CAGI) community organizes contests to predict unpublished experimental data, available only to CAGI assessors. We review here the CAGI performance of evolutionary action (EA) predictions of mutational impact. EA models the fitness effect of coding mutations analytically, as the product of the gradient of the fitness landscape and the perturbation size. In practice, these terms are computed from phylogenetic considerations as the functional sensitivity of the mutated site and as the magnitude of amino acid substitution, respectively, and yield the percentage loss of wild-type activity. In five CAGI challenges, EA consistently performed on par with or better than sophisticated machine learning approaches. This objective assessment suggests that a simple differential model of evolution can interpret the fitness effect of coding variations, opening diverse clinical applications.
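EA is characterized above as the product of a site's functional sensitivity and the magnitude of the amino acid substitution, expressed as percentage loss of wild-type activity. The toy sketch below illustrates only that arithmetic. It assumes both terms are already computed and scaled to [0, 1] (the phylogenetic analysis that produces them is not reproduced) and, as an assumed normalization step, ranks each product against all other candidate substitutions so that scores run from 0 to 100. Function and variable names are hypothetical.

```python
from typing import Dict, Tuple

Substitution = Tuple[int, str, str]  # (position, wild-type residue, mutant residue)


def evolutionary_action_scores(
    site_sensitivity: Dict[int, float],
    substitution_magnitude: Dict[Substitution, float],
) -> Dict[Substitution, float]:
    """Toy EA-style scores: gradient term times perturbation term, then ranked.

    site_sensitivity: per-position sensitivity in [0, 1] (assumed precomputed).
    substitution_magnitude: per-substitution perturbation size in [0, 1].
    Returns scores in [0, 100]; higher means larger predicted loss of function.
    """
    # Product of the site term and the substitution term for each candidate.
    raw = {
        sub: site_sensitivity[sub[0]] * magnitude
        for sub, magnitude in substitution_magnitude.items()
    }
    ordered = sorted(raw.values())
    n = len(ordered)
    if n == 1:
        return {sub: 100.0 for sub in raw}
    # Percentile rank: share of the other substitutions with a strictly smaller product.
    return {
        sub: 100.0 * sum(v < value for v in ordered) / (n - 1)
        for sub, value in raw.items()
    }
```

A substitution at a highly sensitive site with a large perturbation ends up at the top of the ranking, while even a drastic change at an insensitive site stays near the bottom, mirroring the product form of the equation.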
Affiliation(s)
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas; Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas; Department of Pharmacology, Baylor College of Medicine, Houston, Texas; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
8
Abstract
Protein-ligand binding site prediction methods aim to predict, from amino acid sequence, protein-ligand interactions, putative ligands, and ligand binding site residues using either sequence information, structural information, or a combination of both. In silico characterization of protein-ligand interactions has become extremely important to help determine a protein's functionality, as in vivo-based functional elucidation is unable to keep pace with the current growth of sequence databases. Additionally, in vitro biochemical functional elucidation is time-consuming, costly, and may not be feasible for large-scale analyses such as drug discovery. Thus, in silico prediction of protein-ligand interactions is needed to aid functional elucidation. Here, we briefly discuss protein function prediction, prediction of protein-ligand interactions, the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) competitions, along with their role in shaping the field. We also discuss in detail our cutting-edge web-server method, FunFOLD, for the structurally informed prediction of protein-ligand interactions. Furthermore, we provide a step-by-step guide to using the FunFOLD web server and the FunFOLD3 downloadable application, along with some real-world examples where the FunFOLD methods have been used to aid functional elucidation.
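Structure-based methods such as FunFOLD infer binding-site residues from ligands placed on a protein model. The sketch below shows only a common final step of that kind of pipeline: flagging residues that have any atom within a distance cutoff of any ligand atom. The 4.0 Å default, the data layout and the function name are assumptions, and the template-based steps described in the FunFOLD papers are not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def binding_site_residues(
    residue_atoms: Dict[str, List[Coord]],
    ligand_atoms: Sequence[Coord],
    cutoff: float = 4.0,
) -> List[str]:
    """Return IDs of residues with any atom within `cutoff` angstroms of any ligand atom.

    Residue IDs are returned in the order they appear in `residue_atoms`.
    """
    hits = []
    for residue_id, atoms in residue_atoms.items():
        if any(math.dist(a, lig) <= cutoff for a in atoms for lig in ligand_atoms):
            hits.append(residue_id)
    return hits
```

The brute-force double loop is fine for a single pocket; for whole proteins a spatial index would usually replace it.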
9
Roche DB, Brackenridge DA, McGuffin LJ. Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods. Int J Mol Sci 2015; 16:29829-42. [PMID: 26694353 PMCID: PMC4691145 DOI: 10.3390/ijms161226202] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 09/20/2015] [Revised: 12/02/2015] [Accepted: 12/10/2015] [Indexed: 01/14/2023]
Abstract
Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.
Affiliation(s)
- Daniel Barry Roche
- Institut de Biologie Computationnelle, LIRMM, CNRS, Université de Montpellier, Montpellier 34095, France.
- Centre de Recherche de Biochimie Macromoléculaire, CNRS-UMR 5237, Montpellier 34293, France.
10
Maghawry HA, Mostafa MGM, Gharib TF. A new protein structure representation for efficient protein function prediction. J Comput Biol 2015; 21:936-46. [PMID: 25343279 DOI: 10.1089/cmb.2014.0137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/12/2023]
Abstract
One of the challenging problems in bioinformatics is the prediction of protein function. Protein function is a primary basis for classifying proteins. It can be inferred experimentally, at very low throughput, or computationally, at very high throughput. Computational methods are either sequence based or structure based, and structure-based methods produce more accurate protein function predictions. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity across six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods, naïve Bayes, k-nearest neighbors, and random forest, to predict the enzyme superfamily of a given protein. The proposed representation yields higher prediction accuracy than a recently introduced representation based only on distance patterns, reaching accuracy of up to 98%, an improvement of about 10% on average.
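Of the three classifiers named above, k-nearest neighbors is the simplest to illustrate. The sketch below assumes each protein has already been turned into a fixed-length numeric feature vector; the paper's residue-pattern representation is not reproduced, and the vectors, labels and function name here are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(
    training: List[Tuple[Sequence[float], str]],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Predict a label by majority vote among the k nearest training vectors.

    Distances are Euclidean; a tied vote goes to the tied label whose
    member is closest to the query.
    """
    if not training:
        raise ValueError("training set is empty")
    ranked = sorted(training, key=lambda item: math.dist(item[0], query))
    neighbours = ranked[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    # neighbours is distance-ordered, so the first tied label is the closest one.
    for _, label in neighbours:
        if votes[label] == best:
            return label
    raise AssertionError("unreachable: at least one label has the top vote")
```

With two invented clusters of feature vectors labelled `"crotonase"` and `"amidohydrolase"`, a query near the first cluster is assigned `"crotonase"` by a 2-to-1 vote among its three nearest neighbours.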
Affiliation(s)
- Huda A Maghawry
- Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
11
Neskey DM, Osman AA, Ow TJ, Katsonis P, McDonald T, Hicks SC, Hsu TK, Pickering CR, Ward A, Patel A, Yordy JS, Skinner HD, Giri U, Sano D, Story MD, Beadle BM, El-Naggar AK, Kies MS, William WN, Caulin C, Frederick M, Kimmel M, Myers JN, Lichtarge O. Evolutionary Action Score of TP53 Identifies High-Risk Mutations Associated with Decreased Survival and Increased Distant Metastases in Head and Neck Cancer. Cancer Res 2015; 75:1527-36. [PMID: 25634208 DOI: 10.1158/0008-5472.can-14-2735] [Citation(s) in RCA: 120] [Impact Index Per Article: 12.0] [Received: 09/18/2014] [Accepted: 12/02/2014] [Indexed: 01/25/2023]
Abstract
TP53 is the most frequently altered gene in head and neck squamous cell carcinoma, with mutations occurring in over two-thirds of cases, but the prognostic significance of these mutations remains elusive. In the current study, we evaluated a novel computational approach termed evolutionary action (EAp53) to stratify patients with tumors harboring TP53 mutations as high or low risk, and validated this system in both in vivo and in vitro models. Patients with high-risk TP53 mutations had the poorest survival outcomes and the shortest time to the development of distant metastases. Tumor cells expressing high-risk TP53 mutations were more invasive and tumorigenic, and they exhibited a higher incidence of lung metastases. We also documented an association between the presence of high-risk mutations and decreased expression of TP53 target genes, highlighting key cellular pathways that are likely to be dysregulated by this subset of p53 mutations that confer particularly aggressive tumor behavior. Overall, our work validated EAp53 as a novel computational tool that may be useful in clinical prognosis of tumors harboring p53 mutations.
Affiliation(s)
- David M Neskey
- Department of Otolaryngology Head and Neck Surgery, Hollings Cancer Center, Medical University of South Carolina, Charleston, South Carolina
- Abdullah A Osman
- Department of Head and Neck Surgery, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Thomas J Ow
- Department of Otolaryngology Head and Neck Surgery, Albert Einstein School of Medicine, Yeshiva University, New York, New York
- Panagiotis Katsonis
- Department of Human and Molecular Genetics, Baylor College of Medicine, Houston, Texas
- Stephanie C Hicks
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts
- Teng-Kuei Hsu
- Department of Human and Molecular Genetics, Baylor College of Medicine, Houston, Texas
- Curtis R Pickering
- Department of Head and Neck Surgery, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Alexandra Ward
- Department of Head and Neck Surgery, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Ameeta Patel
- Department of Head and Neck Surgery, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- John S Yordy
- Radiation Oncology, UT Southwestern Medical Center, Dallas, Texas
- Heath D Skinner
- Department of Thoracic Radiation Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Uma Giri
- Department of Experimental Radiation Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Daisuke Sano
- Department of Otolaryngology-Head and Neck Surgery, Yokohama University, Yokohama, Japan
- Michael D Story
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, Texas
- Beth M Beadle
- Department of Head and Neck Radiation Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Adel K El-Naggar
- Department of Pathology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Merrill S Kies
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- William N William
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Carlos Caulin
- Department of Head and Neck Surgery, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Mitchell Frederick
- Department of Head and Neck Surgery, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Marek Kimmel
- Department of Statistics, Rice University, Houston, Texas
- Jeffrey N Myers
- Department of Head and Neck Surgery, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
- Olivier Lichtarge
- Department of Human and Molecular Genetics, Baylor College of Medicine, Houston, Texas
12
Katsonis P, Lichtarge O. A formal perturbation equation between genotype and phenotype determines the Evolutionary Action of protein-coding variations on fitness. Genome Res 2014; 24:2050-8. [PMID: 25217195 PMCID: PMC4248321 DOI: 10.1101/gr.176214.114] [Citation(s) in RCA: 118] [Impact Index Per Article: 10.7] [Indexed: 11/25/2022]
Abstract
The relationship between genotype mutations and phenotype variations determines health in the short term and evolution over the long term, and it hinges on the action of mutations on fitness. A fundamental difficulty in determining this action, however, is that it depends on the unique context of each mutation, which is complex and often cryptic. As a result, the effect of most genome variations on molecular function and overall fitness remains unknown and stands apart from population genetics theories linking fitness effect to polymorphism frequency. Here, we hypothesize that evolution is a continuous and differentiable physical process coupling genotype to phenotype. This leads to a formal equation for the action of coding mutations on fitness that can be interpreted as a product of the evolutionary importance of the mutated site with the difference in amino acid similarity. Approximations for these terms are readily computable from phylogenetic sequence analysis, and we show mutational, clinical, and population genetic evidence that this action equation predicts the effect of point mutations in vivo and in vitro in diverse proteins, correlates disease-causing gene mutations with morbidity, and determines the frequency of human coding polymorphisms, respectively. Thus, elementary calculus and phylogenetics can be integrated into a perturbation analysis of the evolutionary relationship between genotype and phenotype that quantitatively links point mutations to function and fitness and that opens a new analytic framework for equations of biology. In practice, this work explicitly bridges molecular evolution with population genetics with applications from protein redesign to the clinical assessment of human genetic variations.
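The abstract's key step, treating evolution as differentiable so that the fitness effect of a substitution becomes a gradient times a perturbation, can be written as a first-order expansion. The notation below (φ for fitness, r for the genotype state, r_i for residue i, X → Y for the substitution) is ours, chosen for illustration, and may differ from the paper's.

```latex
\[
\delta\phi_{i,\,X \to Y} \;=\; \phi(\mathbf{r} + \Delta\mathbf{r}) - \phi(\mathbf{r})
\;\approx\; \frac{\partial \phi}{\partial r_i}\,\Delta r_i^{\,X \to Y}
\]
```

Here the partial derivative plays the role the abstract assigns to the evolutionary importance of the mutated site, and the perturbation term plays the role of the difference in amino acid similarity between X and Y; both are approximated from phylogenetic sequence analysis, and higher-order terms are dropped.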
Affiliation(s)
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Department of Biochemistry & Molecular Biology, Department of Pharmacology, Baylor College of Medicine, Houston, Texas 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas 77030, USA
13
Computational Approaches and Resources in Single Amino Acid Substitutions Analysis Toward Clinical Research. Adv Protein Chem Struct Biol 2014; 94:365-423. [DOI: 10.1016/b978-0-12-800168-4.00010-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 02/08/2023]
14
Prediction and experimental validation of enzyme substrate specificity in protein structures. Proc Natl Acad Sci U S A 2013; 110:E4195-202. [PMID: 24145433 DOI: 10.1073/pnas.1305162110] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 11/18/2022]
Abstract
Structural Genomics aims to elucidate protein structures to identify their functions. Unfortunately, the variation of just a few residues can be enough to alter activity or binding specificity and limit the functional resolution of annotations based on sequence and structure; in enzymes, substrates are especially difficult to predict. Here, large-scale controls and direct experiments show that the local similarity of five or six residues selected because they are evolutionarily important and on the protein surface can suffice to identify an enzyme activity and substrate. A motif of five residues predicted that a previously uncharacterized Silicibacter sp. protein was a carboxylesterase for short fatty acyl chains, similar to hormone-sensitive-lipase-like proteins that share less than 20% sequence identity. Assays and directed mutations confirmed this activity and showed that the motif was essential for catalysis and substrate specificity. We conclude that evolutionary and structural information may be combined on a Structural Genomics scale to create motifs of mixed catalytic and noncatalytic residues that identify enzyme activity and substrate specificity.
15
Wilkins AD, Venner E, Marciano DC, Erdin S, Atri B, Lua RC, Lichtarge O. Accounting for epistatic interactions improves the functional analysis of protein structures. Bioinformatics 2013; 29:2714-21. [PMID: 24021383 PMCID: PMC3799481 DOI: 10.1093/bioinformatics/btt489] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022]
Abstract
Motivation: The constraints under which sequence, structure and function coevolve are not fully understood. Bringing this mutual relationship to light can reveal the molecular basis of binding, catalysis and allostery, thereby identifying function and rationally guiding protein redesign. Underlying these relationships are the epistatic interactions that occur when the consequences of a mutation to a protein are determined by the genetic background in which it occurs. Based on prior data, we hypothesize that epistatic forces operate most strongly between residues nearby in the structure, resulting in smooth evolutionary importance across the structure. Methods and Results: We find that when residue scores of evolutionary importance are distributed smoothly between nearby residues, functional site prediction accuracy improves. Accordingly, we designed a novel measure of evolutionary importance that focuses on the interaction between pairs of structurally neighboring residues. This measure, which we term pair-interaction Evolutionary Trace, yields greater functional site overlap and better structure-based proteome-wide functional predictions. Conclusions: Our data show that the structural smoothness of evolutionary importance is a fundamental feature of the coevolution of sequence, structure and function. Mutations operate on individual residues, but selective pressure depends in part on the extent to which a mutation perturbs interactions with neighboring residues. In practice, this principle led us to redefine the importance of a residue in terms of the importance of its epistatic interactions with neighbors, yielding better annotation of functional residues, motivating experimental validation of a novel functional site in LexA and refining protein function prediction. Contact: lichtarge@bcm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
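The abstract does not give enough detail here to reproduce the pair-interaction measure. Its premise of structural smoothness, that nearby residues should carry similar evolutionary importance, can still be illustrated with a simple neighbor average. This sketch is a stand-in, not pair-interaction Evolutionary Trace. It assumes one coordinate per residue and a hypothetical 6 Å cutoff, and `smooth_scores` is an illustrative name.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def smooth_scores(coords: Sequence[Coord], scores: Sequence[float], cutoff: float = 6.0) -> List[float]:
    """Average each residue's score over all residues within `cutoff` (angstroms).

    coords: one (x, y, z) position per residue, e.g. C-alpha atoms (assumption)
    scores: per-residue evolutionary importance, in the same order as coords
    Each residue counts as its own neighbor (distance 0), so no average is empty.
    """
    if len(coords) != len(scores):
        raise ValueError("coords and scores must have the same length")
    smoothed = []
    for ci in coords:
        neighbors = [s for cj, s in zip(coords, scores) if math.dist(ci, cj) <= cutoff]
        smoothed.append(sum(neighbors) / len(neighbors))
    return smoothed


# Two residues 3 Å apart are pulled toward each other; an isolated residue keeps its score.
print(smooth_scores([(0, 0, 0), (3, 0, 0), (20, 0, 0)], [1.0, 0.0, 0.2]))  # [0.5, 0.5, 0.2]
```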
Affiliation(s)
- Angela D Wilkins
- Department of Molecular and Human Genetics, CIBR Center for Computational and Integrative Biomedical Research and Program in Structural and Computational Biology & Molecular Biophysics, Baylor College of Medicine, Houston, TX 77030 and Center for Human Genetic Research, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
16
Erdin S, Venner E, Lisewski AM, Lichtarge O. Function prediction from networks of local evolutionary similarity in protein structure. BMC Bioinformatics 2013; 14 Suppl 3:S6. [PMID: 23514548 PMCID: PMC3584919 DOI: 10.1186/1471-2105-14-s3-s6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
BACKGROUND Annotating protein function with both high accuracy and sensitivity remains a major challenge in structural genomics. One proven computational strategy has been to group a few key functional amino acids into templates and search for these templates in other protein structures, so as to transfer function when a match is found. To this end, we previously developed Evolutionary Trace Annotation (ETA) and showed that diffusing known annotations over a network of template matches on a structural genomic scale improved predictions of function. In order to further increase sensitivity, we now let each protein contribute multiple templates rather than just one, and also let the template size vary. RESULTS Retrospective benchmarks in 605 Structural Genomics enzymes showed that multiple templates increased sensitivity by up to 14% when combined with single template predictions, while maintaining accuracy above 91%. Diffusing function globally on networks of single and multiple template matches marginally increased the area under the ROC curve to over 0.97, but in a subset of proteins that could not be annotated by ETA, the network approach recovered annotations for the most confident 20-23 of 91 cases with 100% accuracy. CONCLUSIONS We improve the accuracy and sensitivity of predictions by using multiple templates per protein structure when constructing networks of ETA matches and diffusing annotations.
Affiliation(s)
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Eric Venner
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
17
Skolnick J, Zhou H, Gao M. Are predicted protein structures of any value for binding site prediction and virtual ligand screening? Curr Opin Struct Biol 2013; 23:191-7. [PMID: 23415854 DOI: 10.1016/j.sbi.2013.01.009] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 12/19/2012] [Revised: 01/04/2013] [Accepted: 01/23/2013] [Indexed: 01/03/2023]
Abstract
The recently developed field of ligand homology modeling (LHM) that extends the ideas of protein homology modeling to the prediction of ligand binding sites and for use in virtual ligand screening has emerged as a powerful new approach. Unlike traditional docking methodologies, LHM can be applied to low-to-moderate resolution predicted as well as experimental structures with little if any diminution in performance; thereby enabling ≈ 75% of an average proteome to have potentially significant virtual screening predictions. In large scale benchmarking, LHM is able to predict off-target ligand binding. Thus, despite the widespread belief to the contrary, low-to-moderate resolution predicted structures have considerable utility for biochemical function prediction.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA.
18
Furnham N, Laskowski RA, Thornton JM. Abstracting knowledge from the protein data bank. Biopolymers 2012; 99:183-8. [DOI: 10.1002/bip.22107] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/30/2012] [Accepted: 05/25/2012] [Indexed: 12/27/2022]
19
Durston KK, Chiu DKY, Wong AKC, Li GCL. Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring. EURASIP J Bioinform Syst Biol 2012; 2012:8. [PMID: 22793672 PMCID: PMC3524763 DOI: 10.1186/1687-4153-2012-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/02/2011] [Accepted: 05/29/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recent work indicates that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. RESULTS The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers.
In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appears to play an important role in the molecular structure and/or function. CONCLUSIONS Our results demonstrate that the method we present here, a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins, can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
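The k-modes procedure above groups alignment columns by a normalized mutual information (NMI) measure. The sketch below computes NMI for a single pair of columns and does not implement the clustering. The abstract does not state which normalization is used. Dividing by the geometric mean of the two column entropies is an assumption here, since several normalizations are in common use.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def entropy(items: Iterable[Hashable]) -> float:
    """Shannon entropy (natural log) of the empirical distribution of items."""
    counts = Counter(items)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def normalized_mutual_information(col_a: Sequence[str], col_b: Sequence[str]) -> float:
    """NMI between two alignment columns, normalized as I(A;B) / sqrt(H(A) * H(B)).

    Each column is read down the alignment (one residue per sequence).
    Returns 0.0 when either column is invariant, since the MI is then zero.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same alignment")
    h_a, h_b = entropy(col_a), entropy(col_b)
    if h_a == 0 or h_b == 0:
        return 0.0
    mi = h_a + h_b - entropy(zip(col_a, col_b))  # I(A;B) = H(A) + H(B) - H(A,B)
    return mi / math.sqrt(h_a * h_b)
```

Columns that co-vary perfectly, such as `"AAGG"` and `"LLVV"`, give an NMI of 1 up to floating-point rounding. Independent columns such as `"AAGG"` and `"LVLV"` give 0.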
Affiliation(s)
- Kirk K Durston
- School of Computer Science, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- David KY Chiu
- School of Computer Science, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- Andrew KC Wong
- Department of System Design Engineering, University of Waterloo, 200 University Ave. W, Waterloo, ON, N2L 3G1, Canada
- Gary CL Li
- Department of System Design Engineering, University of Waterloo, 200 University Ave. W, Waterloo, ON, N2L 3G1, Canada
20
Wilkins AD, Bachman BJ, Erdin S, Lichtarge O. The use of evolutionary patterns in protein annotation. Curr Opin Struct Biol 2012; 22:316-25. [PMID: 22633559 DOI: 10.1016/j.sbi.2012.05.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 04/03/2012] [Accepted: 05/01/2012] [Indexed: 01/13/2023]
Abstract
With genomic data skyrocketing, their biological interpretation remains a serious challenge. Diverse computational methods address this problem by pointing to the existence of recurrent patterns among sequence, structure, and function. These patterns emerge naturally from evolutionary variation, natural selection, and divergence (the defining features of biological systems), and they identify molecular events and shapes that underlie specificity of function and allosteric communication. Here we review these methods, and the patterns they identify in case studies and in proteome-wide applications, to infer and rationally redesign function.
Affiliation(s)
- Angela D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
21
da Fonsêca MM, Zaha A, Caffarena ER, Vasconcelos ATR. Structure-based functional inference of hypothetical proteins from Mycoplasma hyopneumoniae. J Mol Model 2012; 18:1917-25. [PMID: 21870198 PMCID: PMC3340535 DOI: 10.1007/s00894-011-1212-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 06/20/2011] [Accepted: 08/05/2011] [Indexed: 10/27/2022]
Abstract
Enzootic pneumonia caused by Mycoplasma hyopneumoniae is a major constraint to efficient pork production throughout the world. This pathogen has a small genome with 716 coding sequences, of which 418 are homologous to proteins with known functions. However, almost 42% of the 716 coding sequences are annotated as hypothetical proteins. Alternative methodologies such as threading and comparative modeling can be used to predict the structures and functions of such hypothetical proteins, and they can often answer questions about the properties of a model system faster than experiments. In this study, we used these structure-based approaches to predict the structures of seven proteins annotated as hypothetical in M. hyopneumoniae. Three proteins were predicted to be involved in metabolic processes and two in transcription. No function could be assigned to the remaining two, although their modeled structures suggested experimental designs to identify their functions. Our findings help to narrow the gap between the large number of hypothetical proteins in the M. hyopneumoniae genome and the incomplete annotation of its important metabolic pathways.
Affiliation(s)
- Marbella Maria da Fonsêca
- Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ Brazil
- Laboratório Nacional de Computação Científica, Laboratório de Bioinformática, Petrópolis, 25651-075 RJ Brazil
- Arnaldo Zaha
- Laboratório de Genômica Estrutural e Funcional, Centro de Biotecnologia, UFRGS, Porto Alegre, RS Brazil
- Ernesto R. Caffarena
- Programa de Computação Científica, Fundação Oswaldo Cruz, Rio de Janeiro, RJ Brazil
22
Ueno K, Mineta K, Ito K, Endo T. Exploring functionally related enzymes using radially distributed properties of active sites around the reacting points of bound ligands. BMC Struct Biol 2012; 12:5. [PMID: 22536854 PMCID: PMC3408369 DOI: 10.1186/1472-6807-12-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/14/2011] [Accepted: 04/26/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Structural genomics approaches, particularly those solving the 3D structures of many proteins with unknown functions, have increased the desire for structure-based function predictions. However, prediction of enzyme function is difficult because one member of a superfamily may catalyze a different reaction than other members, whereas members of different superfamilies can catalyze the same reaction. In addition, conformational changes, mutations or the absence of a particular catalytic residue can prevent inference of the mechanism by which catalytic residues stabilize and promote the elementary reaction. A major hurdle for alignment-based methods for prediction of function is the absence (despite its importance) of a measure of similarity of the physicochemical properties of catalytic sites. To solve this problem, the physicochemical features radially distributed around catalytic sites should be considered in addition to structural and sequence similarities. RESULTS We showed that radial distribution functions (RDFs), which are associated with the local structural and physicochemical properties of catalytic active sites, are capable of clustering oxidoreductases and transferases by function. The catalytic sites of these enzymes were also characterized using the RDFs. The RDFs provided a measure of the similarity among the catalytic sites, detecting conformational changes caused by mutation of catalytic residues. Furthermore, the RDFs reinforced the classification of enzyme functions based on conventional sequence and structural alignments. CONCLUSIONS Our results demonstrate that the application of RDFs provides advantages in the functional classification of enzymes by providing information about catalytic sites.
Affiliation(s)
- Keisuke Ueno
- Division of Bioinformatics, Hokkaido University Research Center for Zoonosis Control, North 20 West 10, Sapporo, Hokkaido 001-0020, Japan
23
Janda JO, Busch M, Kück F, Porfenenko M, Merkl R. CLIPS-1D: analysis of multiple sequence alignments to deduce for residue-positions a role in catalysis, ligand-binding, or protein structure. BMC Bioinformatics 2012; 13:55. [PMID: 22480135 PMCID: PMC3391178 DOI: 10.1186/1471-2105-13-55] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/22/2011] [Accepted: 04/05/2012] [Indexed: 11/12/2022]
Abstract
Background One aim of the in silico characterization of proteins is to identify all residue-positions that are crucial for function or structure. Several sequence-based algorithms exist that predict functionally important sites. However, on the basis of sequence information alone, many functionally and structurally important sites are hard to distinguish, so a large number of incorrectly predicted functional sites must be expected. We therefore set out to design a new classifier that differentiates between functionally and structurally important sites and to assess its performance on representative datasets. Results We have implemented CLIPS-1D, which assigns each residue-position exactly one of three roles: catalysis, ligand-binding, or protein structure. By analyzing a multiple sequence alignment, the algorithm scores the conservation and abundance of residues at individual sites and in their local neighborhood, and classifies each site with a multiclass support vector machine. Cross-validation confirmed that residue-positions involved in catalysis were identified with state-of-the-art quality (mean MCC = 0.34). For structurally important sites, prediction quality was considerably higher (mean MCC = 0.67). For ligand-binding sites, prediction quality was lower (mean MCC = 0.12), because binding sites and structurally important residue-positions share similar conservation and abundance values, which makes them difficult to separate. We show that classification success varies in a class-specific manner. For this reason, our algorithm computes residue-specific p-values that allow each individual prediction to be assessed statistically. CLIPS-1D is available as a Web service at http://www-bioinf.uni-regensburg.de/. Conclusions CLIPS-1D is a classifier whose prediction quality has been determined separately for catalytic sites, ligand-binding sites, and structurally important sites.
It generates hypotheses about residue-positions important for a set of homologous proteins and focuses on conservation and abundance signals. Thus, the algorithm can be applied in cases where function cannot be transferred from well-characterized proteins by means of sequence comparison.
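The per-class prediction quality above is reported as the Matthews correlation coefficient (MCC). The sketch below shows how MCC follows from confusion-matrix counts when one class is treated against all the others. That one-vs-rest framing is an assumption about how per-class values are obtained, and the counts in the usage line are made up.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from one-vs-rest confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one class, e.g. "catalytic" vs. all other residue-positions.
print(round(mcc(tp=30, fp=10, tn=50, fn=10), 3))  # 0.583
```

MCC is a common choice in this setting because it stays informative when one class is much rarer than the rest, as catalytic residues are.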
Affiliation(s)
- Jan-Oliver Janda
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, 93040 Regensburg, Germany.
24
Abstract
The evolutionary trace (ET) is the single most validated approach to identify protein functional determinants and to target mutational analysis, protein engineering and drug design to the most relevant sites of a protein. It applies to the entire proteome; its predictions come with a reliability score; and its results typically reach significance in most protein families with 20 or more sequence homologs. In order to identify functional hot spots, ET scans a multiple sequence alignment for residue variations that correlate with major evolutionary divergences. In case studies this enables the selective separation, recoding, or mimicry of functional sites and, on a large scale, this enables specific function predictions based on motifs built from select ET-identified residues. ET is therefore an accurate, scalable and efficient method to identify the molecular determinants of protein function and to direct their rational perturbation for therapeutic purposes. Public ET servers are located at: http://mammoth.bcm.tmc.edu/.
25
Erdin S, Lisewski AM, Lichtarge O. Protein function prediction: towards integration of similarity metrics. Curr Opin Struct Biol 2011; 21:180-8. [PMID: 21353529 PMCID: PMC3120633 DOI: 10.1016/j.sbi.2011.02.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 02/03/2011] [Accepted: 02/03/2011] [Indexed: 11/16/2022]
Abstract
Genomic centers discover increasingly many protein sequences and structures, but not necessarily their full biological functions. Thus, currently, less than one percent of proteins have experimentally verified biochemical activities. To fill this gap, function prediction algorithms apply metrics of similarity between proteins on the premise that those sufficiently alike in sequence, or structure, will perform identical functions. Although high sensitivity is elusive, network analyses that integrate these metrics together hold the promise of rapid gains in function prediction specificity.
Affiliation(s)
- Serkan Erdin
- Department of Molecular and Human Genetics, 1 Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, 1 Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, 1 Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
26
Brylinski M, Skolnick J. FINDSITE-metal: integrating evolutionary information and machine learning for structure-based metal-binding site prediction at the proteome level. Proteins 2011; 79:735-51. [PMID: 21287609 PMCID: PMC3060289 DOI: 10.1002/prot.22913] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 08/04/2010] [Revised: 09/27/2010] [Accepted: 10/07/2010] [Indexed: 12/13/2022]
Abstract
The rapid accumulation of gene sequences, many of which are hypothetical proteins with unknown function, has stimulated the development of accurate computational tools for protein function prediction with evolution/structure-based approaches showing considerable promise. In this article, we present FINDSITE-metal, a new threading-based method designed specifically to detect metal-binding sites in modeled protein structures. Comprehensive benchmarks using protein structures of differing quality show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE-metal. Combining structure/evolutionary information with machine learning results in highly accurate metal-binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) from a bound metal in the crystal structure. For most of the targets, multiple metal-binding sites are detected with the best predicted binding site at rank 1 and within the top two ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium, and magnesium ions, the binding metal can be predicted with high, typically 70% to 90%, accuracy. FINDSITE-metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome-wide application of FINDSITE-metal that quantifies the metal-binding complement of the human proteome. FINDSITE-metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite-metal/.
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
27
Venner E, Lisewski AM, Erdin S, Ward RM, Amin SR, Lichtarge O. Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities. PLoS One 2010; 5:e14286. [PMID: 21179190 PMCID: PMC3001439 DOI: 10.1371/journal.pone.0014286] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 06/29/2010] [Accepted: 11/10/2010] [Indexed: 12/24/2022]
Abstract
High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities; then, starting from proteins with known functions, competing functional labels diffuse link-by-link over the entire network. Every node is thus assigned a likelihood z-score for every function, and the most significant one at each node wins and defines its annotation. In high-throughput controls, this competitive diffusion process recovered enzyme activity annotations with 99% and 97% accuracy at half-coverage for the third and fourth Enzyme Commission (EC) levels, respectively. This corresponds to false positive rates 4-fold lower than nearest-neighbor and 5-fold lower than sequence-based annotations. In practice, experimental validation of the predicted carboxylesterase activity in a protein from Staphylococcus aureus illustrated the effectiveness of this approach in the context of an increasingly drug-resistant microbe. This study further links molecular function to a small number of evolutionarily important residues recognizable by Evolutionary Tracing and it points to the specificity and sensitivity of functional annotation by competitive global network diffusion. A web server is at http://mammoth.bcm.tmc.edu/networks.
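The competitive diffusion step described above can be approximated with a label-propagation sketch. Each known function spreads from its labeled proteins over the similarity network, and each function's scores are converted to z-scores across nodes. Every unlabeled node then takes the function with the highest z-score. The damping factor `alpha`, the iteration count and the z-score normalization are illustrative assumptions. This is not the code behind the ETA web server.

```python
import statistics
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def diffuse(graph: Graph, seeds: Set[str], alpha: float = 0.5, steps: int = 50) -> Dict[str, float]:
    """Damped neighbor averaging: each step mixes the mean of a node's neighbors
    (weight alpha) with a restart term equal to 1 at seed nodes and 0 elsewhere."""
    values = {node: (1.0 if node in seeds else 0.0) for node in graph}
    for _ in range(steps):
        values = {
            node: alpha * (sum(values[n] for n in nbrs) / len(nbrs) if nbrs else 0.0)
            + (1 - alpha) * (1.0 if node in seeds else 0.0)
            for node, nbrs in graph.items()
        }
    return values


def competitive_annotation(graph: Graph, known: Dict[str, str]) -> Dict[str, str]:
    """Diffuse every known function separately, z-score each function's values
    across all nodes, and give each unlabeled node its highest-scoring function."""
    functions = sorted(set(known.values()))
    z = {}
    for func in functions:
        seeds = {node for node, f in known.items() if f == func}
        vals = diffuse(graph, seeds)
        scores = list(vals.values())
        mean = statistics.mean(scores)
        sd = statistics.pstdev(scores) or 1.0  # guard against zero spread
        z[func] = {node: (v - mean) / sd for node, v in vals.items()}
    return {
        node: max(functions, key=lambda f: z[f][node])
        for node in graph
        if node not in known
    }


# Hypothetical chain network A-B-C-D-E with A and E annotated.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(competitive_annotation(chain, {"A": "EC 3.1.1", "E": "EC 2.7.1"}))
```

In this toy chain, B inherits the label of its nearer annotated neighbor A, and D inherits the label of E. The midpoint C is an exact tie, which `max` resolves by label order. A real network would need an explicit tie or confidence rule.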
Affiliation(s)
- Eric Venner
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- W. M. Keck Center for Interdisciplinary Bioscience Training, Houston, Texas, United States of America
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- W. M. Keck Center for Interdisciplinary Bioscience Training, Houston, Texas, United States of America
- R. Matthew Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- W. M. Keck Center for Interdisciplinary Bioscience Training, Houston, Texas, United States of America
- Shivas R. Amin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- W. M. Keck Center for Interdisciplinary Bioscience Training, Houston, Texas, United States of America
28
Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification. PLoS Comput Biol 2010; 6:e1000978. [PMID: 21079665 PMCID: PMC2973806 DOI: 10.1371/journal.pcbi.1000978] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/19/2010] [Accepted: 09/27/2010] [Indexed: 11/19/2022]
Abstract
Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the catalytic site atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
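The conservation measures compared above differ in what they measure against. The sketch below computes Shannon entropy and a Kullback-Leibler (KL) score for one alignment column. It assumes a uniform background over the 20 amino acids purely for simplicity. With a uniform background the KL score equals log2(20) minus the entropy, so the two measures rank columns identically. They only diverge once a non-uniform background, such as database-derived amino acid frequencies, is supplied through the hypothetical `background` parameter.

```python
import math
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Uniform background over the 20 amino acids: a simplifying assumption.
UNIFORM_BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_frequencies(column: str) -> Dict[str, float]:
    """Residue frequencies in one alignment column, ignoring gaps and unknown symbols."""
    residues = [aa for aa in column if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("column contains no standard amino acids")
    return {aa: c / len(residues) for aa, c in Counter(residues).items()}


def shannon_entropy(column: str) -> float:
    """Shannon entropy in bits; lower means more conserved."""
    return -sum(p * math.log2(p) for p in column_frequencies(column).values())


def kl_conservation(column: str, background: Dict[str, float] = UNIFORM_BACKGROUND) -> float:
    """KL divergence (bits) of the column from the background; higher means more conserved."""
    return sum(p * math.log2(p / background[aa]) for aa, p in column_frequencies(column).items())


for col in ["CCCCCCCC", "CCCC-CGG", "ACDEFGHI"]:
    print(col, round(shannon_entropy(col), 3), round(kl_conservation(col), 3))
```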
29
Gracy J, Chiche L. Optimizing structural modeling for a specific protein scaffold: knottins or inhibitor cystine knots. BMC Bioinformatics 2010; 11:535. [PMID: 21029427 PMCID: PMC2984590 DOI: 10.1186/1471-2105-11-535] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/17/2010] [Accepted: 10/28/2010] [Indexed: 12/03/2022]
Abstract
Background Knottins are small, diverse and stable proteins with important drug design potential. They can be classified into 30 families, which cover a wide range of sequences (1621 sequenced), three-dimensional structures (155 solved) and functions (more than 10). Inter-knottin similarity lies mainly between 15% and 40% sequence identity and 1.5 to 4.5 Å backbone deviation, although all knottins share a tightly knotted disulfide core. This considerable variability likely arises from the highly diverse loops that connect the successive knotted cysteines. Predicting structural models for all knottin sequences would open new directions for the analysis of interaction sites and provide a better understanding of the structural and functional organization of proteins sharing this scaffold. Results We have designed an automated modeling procedure for predicting the three-dimensional structure of knottins. Each step of the homology modeling pipeline was carefully optimized against a test set of knottins with known structures: template selection and alignment, extraction of structural constraints and model building, and model evaluation and refinement. After optimization, the accuracy of predicted models was shown to lie between 1.50 and 1.96 Å from native structures at 50% and 10% maximum sequence identity levels, respectively. These average model deviations represent an improvement of 0.74 to 1.17 Å over basic homology modeling derived from a single template. A database of 1621 structural models for all known knottin sequences was generated and is freely accessible from our web server at http://knottin.cbs.cnrs.fr. Models can also be interactively constructed from any knottin sequence using the structure prediction module Knoter1D3D, available from our protein analysis toolkit PAT at http://pat.cbs.cnrs.fr. Conclusions This work explores different directions for systematic homology modeling of a diverse family of protein sequences. In particular, we have shown that the accuracy of models constructed at a low level of sequence identity can be improved by 1) careful optimization of the modeling procedure, 2) the combination of multiple structural templates and 3) the use of conserved structural features as modeling restraints.
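One reading of the "50% and 10% maximum sequence identity levels" above is a benchmark in which templates more similar to the target than the threshold are withheld, and the best remaining templates are combined. The sketch below illustrates only that selection step under this assumption; `percent_identity` and `select_templates` are hypothetical helpers, the toy sequences are invented, and nothing here reproduces the Knoter1D3D pipeline or its alignment scoring.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def select_templates(query, templates, max_identity, k=3):
    """Hypothetical multi-template selection: drop templates above
    max_identity to the query, rank the rest by identity, keep the best k."""
    scored = [(percent_identity(query, seq), name)
              for name, seq in templates.items()]
    eligible = [(pid, name) for pid, name in scored if pid <= max_identity]
    # Highest identity first; ties broken alphabetically for determinism.
    eligible.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in eligible[:k]]


# Invented, pre-aligned toy sequences (not real knottins).
query = "CKGCRC-"
templates = {"t1": "CKGCRCA", "t2": "CAGCAC-", "t3": "CRACKC-"}
print(select_templates(query, templates, max_identity=70))  # prints ['t2', 't3']
```

Returning several templates rather than one mirrors the abstract's point that combining multiple structural templates improves low-identity models; how the selected templates are then merged into restraints is not described here.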
Affiliation(s)
- Jérôme Gracy
- CNRS, UMR5048, Université Montpellier 1 et 2, Centre de Biochimie Structurale, 34090 Montpellier, France.
30
Lua RC, Lichtarge O. PyETV: a PyMOL evolutionary trace viewer to analyze functional site predictions in protein complexes. Bioinformatics 2010; 26:2981-2. [PMID: 20929911 DOI: 10.1093/bioinformatics/btq566] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022]
Abstract
SUMMARY PyETV is a PyMOL plugin for viewing, analyzing and manipulating predictions of evolutionarily important residues and sites in protein structures and their complexes. It seamlessly captures the output of the Evolutionary Trace server, namely the ranked importance of residues, for multiple chains of a complex. It then provides a high-resolution graphical interface showing their distribution and clustering throughout a quaternary structure, including at interfaces. Together with other tools in the popular PyMOL viewer, PyETV thus provides a novel way to integrate evolutionary forces into the design of experiments targeting the most functionally relevant sites of a protein. AVAILABILITY The PyETV module is written in Python. Installation instructions and video demonstrations may be found at http://mammoth.bcm.tmc.edu/traceview/HelpDocs/PyETVHelp/pyInstructions.html. CONTACT lichtarge@bcm.tmc.edu.
Affiliation(s)
- Rhonald C Lua
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
31
Wilkins AD, Lua R, Erdin S, Ward RM, Lichtarge O. Sequence and structure continuity of evolutionary importance improves protein functional site discovery and annotation. Protein Sci 2010; 19:1296-311. [PMID: 20506260 DOI: 10.1002/pro.406] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 12/19/2022]
Abstract
Protein functional sites control most biological processes and are important targets for drug design and protein engineering. To characterize them, the evolutionary trace (ET) ranks the relative importance of residues according to their evolutionary variations. Generally, top-ranked residues cluster spatially to define evolutionary hotspots that predict functional sites in structures. Here, various functions that measure the physical continuity of ET ranks among neighboring residues in the structure, or in the sequence, are shown to inform sequence selection and to improve functional site resolution. This is shown first in 110 proteins, in which the overlap between top-ranked residues and actual functional sites rose by 8% in significance. Then, on a structural proteomic scale, optimized ET led to better 3D structure-function motifs (3D templates) and, in turn, to enzyme function prediction by the Evolutionary Trace Annotation (ETA) method with improved sensitivity (40% to 53%) and positive predictive value (93% to 94%). This suggests that the similarity of evolutionary importance among neighboring residues in the sequence and in the structure is a universal feature of protein evolution. In practice, this yields a tool for optimizing sequence selections for comparative analysis and, via ET, for better predictions of functional sites and function. This should prove useful for the efficient mutational redesign of protein function and for pharmaceutical targeting.
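The idea of "continuity of ET ranks among neighboring residues" and the two reported ETA metrics can be made concrete with a few toy functions. These are illustrative stand-ins, not the continuity functions from the paper (the abstract does not define them): the names, the mean-absolute-difference score, and the top-fraction cutoff are all assumptions, and the code assumes rank 1 denotes the most important residue. Contacts would normally come from a 3D structure; here they are supplied by hand.

```python
def rank_discontinuity(ranks, contacts):
    """Hypothetical score: mean absolute ET-rank difference over contacting
    residue pairs; lower values mean importance varies more smoothly."""
    diffs = [abs(ranks[i] - ranks[j]) for i, j in contacts]
    return sum(diffs) / len(diffs) if diffs else 0.0


def sequence_neighbors(residues):
    """Pairs of residues adjacent in sequence (sequence continuity)."""
    ordered = sorted(residues)
    return list(zip(ordered, ordered[1:]))


def top_ranked(ranks, fraction):
    """Residues in the top `fraction` of ET ranks (assumes rank 1 = most important)."""
    k = max(1, int(len(ranks) * fraction))
    return set(sorted(ranks, key=ranks.get)[:k])


def top_cluster_contacts(ranks, contacts, fraction):
    """Crude hotspot measure: contacts whose two residues are both top-ranked."""
    top = top_ranked(ranks, fraction)
    return sum(1 for i, j in contacts if i in top and j in top)


def sensitivity_ppv(predicted, actual):
    """Sensitivity = TP / |actual|; positive predictive value = TP / |predicted|."""
    tp = len(predicted & actual)
    sensitivity = tp / len(actual) if actual else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Toy ET ranks for residues 1-4 (invented values).
ranks = {1: 1, 2: 2, 3: 3, 4: 10}
contacts = sequence_neighbors(ranks)             # [(1, 2), (2, 3), (3, 4)]
print(rank_discontinuity(ranks, contacts))       # prints 3.0
print(top_cluster_contacts(ranks, contacts, 0.5))  # prints 1
```

Passing structural contacts instead of `sequence_neighbors` gives the structure-based variant of the same score; comparing the score against shuffled ranks would be one way to judge whether the observed continuity is significant.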
Affiliation(s)
- A D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
32
Lichtarge O, Wilkins A. Evolution: a guide to perturb protein function and networks. Curr Opin Struct Biol 2010; 20:351-9. [PMID: 20444593 PMCID: PMC2916956 DOI: 10.1016/j.sbi.2010.04.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 04/06/2010] [Accepted: 04/08/2010] [Indexed: 12/11/2022]
Abstract
Protein interactions give rise to networks that control cell fate in health and disease; selective means to probe these interactions are therefore of wide interest. We discuss here Evolutionary Tracing (ET), a comparative method to identify protein functional sites and to guide experiments that selectively block, recode, or mimic their amino acid determinants. These studies suggest, in principle, a scalable approach to perturb individual links in protein networks.
Affiliation(s)
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
33
Evolution-guided discovery and recoding of allosteric pathway specificity determinants in psychoactive bioamine receptors. Proc Natl Acad Sci U S A 2010; 107:7787-92. [PMID: 20385837 DOI: 10.1073/pnas.0914877107] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.6] [Indexed: 11/18/2022]
Abstract
G protein-coupled receptors for dopamine and serotonin control signaling pathways targeted by many psychoactive drugs. A puzzle is how receptors with similar functions and nearly identical binding site structures, such as D2 dopamine receptors and 5-HT2A serotonin receptors, could evolve a mechanism that discriminates stringently between endogenous neurotransmitters in their cellular responses. We used the Difference Evolutionary Trace (Difference-ET) and residue swapping to uncover two distinct sets of specificity-determining sequence positions. One set, at the ligand-binding pocket, determines the relative affinities for these two ligands, while a distinct and surprising set of positions outside the binding site determines whether a bound ligand can trigger the conformational rearrangement leading to G protein activation. Thus one site specifies affinity while the other encodes a filter for efficacy. These findings demonstrate that allosteric pathways linking distant interactions via alternate conformational states enforce specificity independently of the ligand-binding site, such that either one may be rationally rekeyed to different ligands. The conversion of a dopamine receptor effectively into a serotonin receptor illustrates the plasticity of GPCR signaling during evolution, or in pathological states, and suggests new approaches to drug discovery targeting both classes of sites.
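A deliberately simplified picture of what "specificity-determining positions" means: alignment columns that are conserved within each receptor group but carry different residues between the groups. This is not Difference-ET, which works from evolutionary-tree partitions of the two families; the function names, the `min_fraction` conservation cutoff, and the toy sequences below are all hypothetical and do not correspond to real D2 or 5-HT2A sequences.

```python
from collections import Counter


def consensus(column, min_fraction):
    """Majority residue of a column if it reaches min_fraction, else None."""
    residue, count = Counter(column).most_common(1)[0]
    return residue if count / len(column) >= min_fraction else None


def specificity_positions(group_a, group_b, min_fraction=1.0):
    """Hypothetical heuristic: columns conserved within each aligned group but
    with different consensus residues between the groups."""
    positions = []
    for pos, (col_a, col_b) in enumerate(zip(zip(*group_a), zip(*group_b))):
        cons_a = consensus(col_a, min_fraction)
        cons_b = consensus(col_b, min_fraction)
        if (cons_a is not None and cons_b is not None
                and cons_a != cons_b and "-" not in (cons_a, cons_b)):
            positions.append((pos, cons_a, cons_b))
    return positions


# Invented toy alignments for two receptor groups (same alignment length).
group_a = ["SDWA", "SDFA"]
group_b = ["SEWG", "SEYG"]
print(specificity_positions(group_a, group_b))
# prints [(1, 'D', 'E'), (3, 'A', 'G')]
```

In the study, such candidate positions were then tested experimentally by residue swapping; a sequence-only filter like this one cannot tell an affinity site from an allosteric efficacy filter.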