1
Sardag I, Duvenci ZS, Belkaya S, Timucin E. Rational design of monomeric IL37 variants guided by stability and dynamical analyses of IL37 dimers. Comput Struct Biotechnol J 2024; 23:1854-1863. [PMID: 38882680 PMCID: PMC11177541 DOI: 10.1016/j.csbj.2024.04.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 04/07/2024] [Accepted: 04/14/2024] [Indexed: 06/18/2024]
Abstract
IL37 plays important roles in the regulation of innate immunity, and its oligomeric status is critical to these roles. In its monomeric state, IL37 can effectively inhibit the inflammatory response of IL18 by binding to IL18Rα, a capacity lost in its dimeric form, underlining the pivotal role of the oligomeric status of IL37 in its anti-inflammatory action. To date, two IL37 dimer structures have been deposited in the PDB, and their dimer interfaces differ substantially. Given this discrepancy, we analyzed the PDB structures of the IL37 dimer (PDB IDs: 6ncu, 5hn1) along with an AF2-multimer prediction by molecular dynamics (MD) simulations. Results showed that the 5hn1 and AF2-predicted dimers have the same interface and stably maintained their conformations throughout the simulations, while the more recent IL37 dimer (PDB ID: 6ncu), which has a different interface, did not, suggesting a possible issue with the 6ncu structure. Next, focusing on the stable dimer structures, we identified five critical positions (V71, Y85, I86, E89, and S114), three of which are new compared to the literature, at which mutations would reduce dimer stability without affecting the monomer structure. Two quintuple mutants were tested by MD simulations and showed partial or complete dissociation of the dimer. Overall, the insights gained from this study reinforce the validity of the 5hn1 and AF2 multimer structures, while also advancing our understanding of the IL37 dimer interface through the generation of monomer-locked IL37 variants.
Affiliation(s)
- Inci Sardag
- Bogazici University, Department of Molecular Biology and Genetics, Istanbul 34342, Turkey
- Zeynep Sevval Duvenci
- Acibadem Mehmet Ali Aydinlar University, Institute of Health Sciences, Department of Biostatistics and Bioinformatics, Istanbul 34752, Turkey
- Serkan Belkaya
- Bilkent University, Department of Molecular Biology and Genetics, Ankara 06800, Turkey
- Bilkent University, The National Nanotechnology Research Center (UNAM), Ankara 06800, Turkey
- Emel Timucin
- Acibadem Mehmet Ali Aydinlar University, Institute of Health Sciences, Department of Biostatistics and Bioinformatics, Istanbul 34752, Turkey
- Acibadem Mehmet Ali Aydinlar University, School of Medicine, Biostatistics and Medical Informatics, Istanbul 34752, Turkey
2
Vorontsov IE, Kozin I, Abramov S, Boytsov A, Jolma A, Albu M, Ambrosini G, Faltejskova K, Gralak AJ, Gryzunov N, Inukai S, Kolmykov S, Kravchenko P, Kribelbauer-Swietek JF, Laverty KU, Nozdrin V, Patel ZM, Penzar D, Plescher ML, Pour SE, Razavi R, Yang AWH, Yevshin I, Zinkevich A, Weirauch MT, Bucher P, Deplancke B, Fornes O, Grau J, Grosse I, Kolpakov FA, Makeev VJ, Hughes TR, Kulakovskiy IV. Cross-platform DNA motif discovery and benchmarking to explore binding specificities of poorly studied human transcription factors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.11.11.619379. [PMID: 39605530 PMCID: PMC11601219 DOI: 10.1101/2024.11.11.619379] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/29/2024]
Abstract
A DNA sequence pattern, or "motif", is an essential representation of DNA-binding specificity of a transcription factor (TF). Any particular motif model has potential flaws due to shortcomings of the underlying experimental data and computational motif discovery algorithm. As a part of the Codebook/GRECO-BIT initiative, here we evaluated at large scale the cross-platform recognition performance of positional weight matrices (PWMs), which remain popular motif models in many practical applications. We applied ten different DNA motif discovery tools to generate PWMs from the "Codebook" data comprising 4,237 experiments from five different platforms profiling the DNA-binding specificity of 394 human proteins, focusing on understudied transcription factors of different structural families. For many of the proteins, there was no prior knowledge of a genuine motif. By benchmarking-supported human curation, we constructed an approved subset of experiments comprising about 30% of all experiments and 50% of tested TFs which displayed consistent motifs across platforms and replicates. We present the Codebook Motif Explorer (https://mex.autosome.org), a detailed online catalog of DNA motifs, including the top-ranked PWMs, and the underlying source and benchmarking data. We demonstrate that in the case of high-quality experimental data, most of the popular motif discovery tools detect valid motifs and generate PWMs, which perform well both on genomic and synthetic data. Yet, for each of the algorithms, there were problematic combinations of proteins and platforms, and the basic motif properties such as nucleotide composition and information content offered little help in detecting such pitfalls. By combining multiple PWMs in decision trees, we demonstrate how our setup can be readily adapted to train and test binding specificity models more complex than PWMs. Overall, our study provides a rich motif catalog as a solid baseline for advanced models and highlights the power of the multi-platform multi-tool approach for reliable mapping of DNA binding specificities.
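As a rough illustration of what a PWM does when it is used to recognize binding sites, the sketch below converts a small probability matrix to log-odds scores and scans a sequence for the best-scoring window. The matrix, the uniform background, and the function names are assumptions made up for this example; this is not the Codebook benchmarking pipeline.

```python
import math

# Hypothetical 4-position probability matrix (rows: positions; columns: A, C, G, T).
# Values are illustrative only and are not taken from the Codebook data.
PPM = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
BASES = "ACGT"
BACKGROUND = 0.25  # assumed uniform nucleotide background


def to_log_odds(ppm, background=BACKGROUND):
    """Convert position probabilities to log2 odds against the background."""
    return [[math.log2(p / background) for p in row] for row in ppm]


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one sequence window."""
    return sum(pwm[i][BASES.index(base)] for i, base in enumerate(window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [
        (score_window(pwm, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(hits)


PWM = to_log_odds(PPM)
best_score, best_pos = best_hit(PWM, "TTAGCTAA")
```

Here the consensus AGCT sits at offset 2 of the scanned sequence, so that window receives the top score. Benchmarking of the kind described in the abstract would compare such scores between bound and unbound sequences across platforms.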
Affiliation(s)
- Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
- Ivan Kozin
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
- Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Altius Institute for Biomedical Sciences, 98121, Seattle, WA, USA
- Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Altius Institute for Biomedical Sciences, 98121, Seattle, WA, USA
- Arttu Jolma
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Mihai Albu
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Katerina Faltejskova
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, 160 00 Praha 6, Czech Republic
- Computer Science Institute, Faculty of Mathematics and Physics, Charles University, 118 00 Praha 1, Czech Republic
- Antoni J Gralak
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Nikita Gryzunov
- Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
- Sachi Inukai
- Chugai Pharmaceutical Co., Ltd, Tokyo, 103-8324, Japan
- Semyon Kolmykov
- Department of Computational Biology, Sirius University of Science and Technology, 354340, Sirius, Krasnodar region, Russia
- Judith F Kribelbauer-Swietek
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Kaitlin U Laverty
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Vladimir Nozdrin
- Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
- Zain M Patel
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Dmitry Penzar
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Marie-Luise Plescher
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
- Sara E Pour
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Rozita Razavi
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Ally W H Yang
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Arsenii Zinkevich
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
- Philipp Bucher
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Oriol Fornes
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
- Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
- Fedor A Kolpakov
- Department of Computational Biology, Sirius University of Science and Technology, 354340, Sirius, Krasnodar region, Russia
- Bioinformatics Laboratory, Federal Research Center for Information and Computational Technologies, 630090, Novosibirsk, Russia
- Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Moscow Center for Advanced Studies, 123592, Moscow, Russia
- Timothy R Hughes
- Donnelly Centre and Department of Molecular Genetics, Toronto, ON M5S 3E1, Canada
- Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Life Improvement by Future Technologies (LIFT) Center, 121205, Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
3
O’Brien BCV, Thao S, Weber L, Danielson HL, Boldt AD, Hueffer K, Weltzin MM. The human alpha7 nicotinic acetylcholine receptor is a host target for the rabies virus glycoprotein. Front Cell Infect Microbiol 2024; 14:1394713. [PMID: 38836054 PMCID: PMC11148329 DOI: 10.3389/fcimb.2024.1394713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 04/29/2024] [Indexed: 06/06/2024]
Abstract
The rabies virus enters the nervous system by interacting with several molecular targets on host cells to modify behavior and trigger receptor-mediated endocytosis of the virion by poorly understood mechanisms. The rabies virus glycoprotein (RVG) interacts with the muscle acetylcholine receptor and the neuronal α4β2 subtype of the nicotinic acetylcholine receptor (nAChR) family by the putative neurotoxin-like motif. Given that the neurotoxin-like motif is highly homologous to the α7 nAChR subtype selective snake toxin α-bungarotoxin (αBTX), other nAChR subtypes are likely involved. The purpose of this study is to determine the activity of the RVG neurotoxin-like motif on nAChR subtypes that are expressed in brain regions involved in rabid animal behavior. nAChRs were expressed in Xenopus laevis oocytes, and two-electrode voltage clamp electrophysiology was used to collect concentration-response data to measure the functional effects. The RVG peptide preferentially and completely inhibits α7 nAChR ACh-induced currents by a competitive antagonist mechanism. Tested heteromeric nAChRs are also inhibited, but to a lesser extent than the α7 subtype. Residues of the RVG peptide with high sequence homology to αBTX and other neurotoxins were substituted with alanine. Altered RVG neurotoxin-like peptides showed that residues phenylalanine 192, arginine 196, and arginine 199 are important determinants of RVG peptide apparent potency on α7 nAChRs, while serine 195 is not. The evaluation of the rabies ectodomain reaffirmed the observations made with the RVG peptide, illustrating a significant inhibitory impact on α7 nAChR with potency in the nanomolar range. In a mammalian cell culture model of neurons, we confirm that the RVG peptide binds preferentially to cells expressing the α7 nAChR. Defining the activity of the RVG peptide on nAChRs expands our understanding of basic mechanisms in host-pathogen interactions that result in neurological disorders.
Affiliation(s)
- Brittany C. V. O’Brien
- Department of Chemistry and Biochemistry, University of Alaska Fairbanks, Fairbanks, AK, United States
- Shelly Thao
- Department of Chemistry and Biochemistry, University of Alaska Fairbanks, Fairbanks, AK, United States
- Lahra Weber
- Department of Chemistry and Biochemistry, University of Alaska Fairbanks, Fairbanks, AK, United States
- Helen L. Danielson
- Department of Chemistry and Biochemistry, University of Alaska Fairbanks, Fairbanks, AK, United States
- Agatha D. Boldt
- Department of Chemistry and Biochemistry, University of Alaska Fairbanks, Fairbanks, AK, United States
- Karsten Hueffer
- Department of Veterinary Medicine, University of Alaska Fairbanks, Fairbanks, AK, United States
- Maegan M. Weltzin
- Department of Chemistry and Biochemistry, University of Alaska Fairbanks, Fairbanks, AK, United States
4
Wenzel M, Grüner E, Strodthoff N. Insights into the inner workings of transformer models for protein function prediction. Bioinformatics 2024; 40:btae031. [PMID: 38244570 PMCID: PMC10950482 DOI: 10.1093/bioinformatics/btae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/14/2023] [Accepted: 01/16/2024] [Indexed: 01/22/2024]
Abstract
MOTIVATION We explored how explainable artificial intelligence (XAI) can help to shed light on the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients so that latent representations inside transformer models, which were finetuned for Gene Ontology term and Enzyme Commission number prediction, can be inspected too. RESULTS The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g. transmembrane regions, active sites) across many proteins. AVAILABILITY AND IMPLEMENTATION Source code can be accessed at https://github.com/markuswenzel/xai-proteins.
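For readers unfamiliar with integrated gradients, the following minimal sketch shows the basic attribution rule on a toy function: gradients are averaged along a straight path from a baseline to the input and multiplied by the input difference. The toy function, its hand-written gradient, and all names are assumptions for illustration; the paper's extension to latent transformer representations is not reproduced here.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_fn returns the gradient of the model output with respect to its input.
    """
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th path segment
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    # Scale the path-averaged gradient by the distance from the baseline.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def toy_model(v):
    """Hypothetical stand-in for a network output."""
    return v[0] * v[1] + v[2] ** 2


def toy_gradient(v):
    """Analytic gradient of toy_model."""
    return [v[1], v[0], 2.0 * v[2]]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_gradient, x, baseline)
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline.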
Affiliation(s)
- Markus Wenzel
- Department of Artificial Intelligence, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, HHI, Einsteinufer 37, 10587 Berlin, Germany
- Erik Grüner
- Department of Artificial Intelligence, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, HHI, Einsteinufer 37, 10587 Berlin, Germany
- Nils Strodthoff
- School VI - Medicine and Health Services, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstr. 114-118, 26129 Oldenburg, Germany
5
Hall MWJ, Shorthouse D, Alcraft R, Jones PH, Hall BA. Mutations observed in somatic evolution reveal underlying gene mechanisms. Commun Biol 2023; 6:753. [PMID: 37468606 PMCID: PMC10356810 DOI: 10.1038/s42003-023-05136-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/17/2023] [Accepted: 07/11/2023] [Indexed: 07/21/2023]
Abstract
Highly sensitive DNA sequencing techniques have allowed the discovery of large numbers of somatic mutations in normal tissues. Some mutations confer a competitive advantage over wild-type cells, generating expanding clones that spread through the tissue. Competition between mutant clones leads to selection. This process can be considered a large scale, in vivo screen for mutations increasing cell fitness. It follows that somatic missense mutations may offer new insights into the relationship between protein structure, function and cell fitness. We present a flexible statistical method for exploring the selection of structural features in data sets of somatic mutants. We show how this approach can evidence selection of specific structural features in key drivers in aged tissues. Finally, we show how drivers may be classified as fitness-enhancing and fitness-suppressing through different patterns of mutation enrichment. This method offers a route to understanding the mechanism of protein function through in vivo mutant selection.
Affiliation(s)
- David Shorthouse
- Department of Medical Physics and Biomedical Engineering, Malet Place Engineering Building, University College London, Gower Street, London, WC1E 6BT, UK
- Rachel Alcraft
- Advanced Research Computing, University College London, London, UK
- Philip H Jones
- Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Department of Oncology, University of Cambridge, Cambridge, CB2 0XZ, UK
- Benjamin A Hall
- Department of Medical Physics and Biomedical Engineering, Malet Place Engineering Building, University College London, Gower Street, London, WC1E 6BT, UK.
6
Marquet C, Heinzinger M, Olenyi T, Dallago C, Erckert K, Bernhofer M, Nechaev D, Rost B. Embeddings from protein language models predict conservation and variant effects. Hum Genet 2022; 141:1629-1647. [PMID: 34967936 PMCID: PMC8716573 DOI: 10.1007/s00439-021-02411-y] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 06/01/2021] [Accepted: 12/06/2021] [Indexed: 12/13/2022]
Abstract
The emergence of SARS-CoV-2 variants stressed the demand for tools that allow interpreting the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation almost as accurately from single sequences as ConSeq using MSAs (two-state Matthews Correlation Coefficient, MCC, of 0.596 ± 0.006 for ProtT5 embeddings vs. 0.608 ± 0.006 for ConSeq). Inputting the conservation prediction along with BLOSUM62 substitution scores and pLM mask reconstruction probabilities into a simplistic logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA) predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments to other methods (incl. ESM-1v, DeepSequence, and GEMME) revealed our approach as competitive with the state-of-the-art (SOTA) methods using MSA input. No method outperformed all others, either consistently or statistically significantly, independently of the performance measure applied (Spearman and Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. Our method predicted SAV effects for the entire human proteome (~20k proteins) within 40 min on one Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA, and PredictProtein.
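The two-state Matthews Correlation Coefficient quoted above can be computed from a binary confusion matrix. The sketch below is a generic illustration with made-up labels (1 = conserved, 0 = not conserved); the function names and data are assumptions and are not part of the VESPA code base.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Two-state MCC; returns 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up per-residue labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
mcc = matthews_corrcoef(*confusion_counts(y_true, y_pred))
```

MCC ranges from -1 to 1 and, unlike plain accuracy, stays informative when the two classes are unbalanced.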
Affiliation(s)
- Céline Marquet
- Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany.
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
- Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Tobias Olenyi
- Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Christian Dallago
- Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Kyra Erckert
- Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Michael Bernhofer
- Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Dmitrii Nechaev
- Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany
- TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
7
Comprehensive mutagenesis identifies the peptide repertoire of a p53 T-cell receptor mimic antibody that displays no toxicity in mice transgenic for human HLA-A*0201. PLoS One 2021; 16:e0249967. [PMID: 33836029 PMCID: PMC8034716 DOI: 10.1371/journal.pone.0249967] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/28/2020] [Accepted: 03/11/2021] [Indexed: 11/24/2022]
Abstract
T-cell receptor mimic (TCRm) antibodies have expanded the repertoire of antigens targetable by monoclonal antibodies, to include peptides derived from intracellular proteins that are presented by major histocompatibility complex class I (MHC-I) molecules on the cell surface. We have previously used this approach to target p53, which represents a valuable target for cancer immunotherapy because of the high frequency of its deregulation by mutation or other mechanisms. The T1-116C TCRm antibody targets the wild type p53(65-73) peptide (RMPEAAPPV) presented by HLA-A*0201 (HLA-A2) and exhibited in vivo efficacy against triple receptor negative breast cancer xenografts. Here we report a comprehensive mutational analysis of the p53 RMPEAAPPV peptide to assess the T1-116C epitope and its peptide specificity. Antibody binding absolutely required the N-terminal arginine residue, while amino acids in the center of the peptide contributed little to specificity. Data mining the immune epitope database with the T1-116C binding consensus and validation of peptide recognition using the T2 stabilization assay identified additional tumor antigens targeted by T1-116C, including WT1, gp100, Tyrosinase and NY-ESO-1. Most peptides recognized by T1-116C were conserved in mice, and human HLA-A2 transgenic mice showed no toxicity when treated with T1-116C in vivo. We conclude that comprehensive validation of TCRm antibody target specificity is critical for assessing their safety profile.
8
Corces MR, Shcherbina A, Kundu S, Gloudemans MJ, Frésard L, Granja JM, Louie BH, Eulalio T, Shams S, Bagdatli ST, Mumbach MR, Liu B, Montine KS, Greenleaf WJ, Kundaje A, Montgomery SB, Chang HY, Montine TJ. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer's and Parkinson's diseases. Nat Genet 2020; 52:1158-1168. [PMID: 33106633 PMCID: PMC7606627 DOI: 10.1038/s41588-020-00721-x] [Citation(s) in RCA: 240] [Impact Index Per Article: 48.0] [Received: 12/27/2019] [Accepted: 09/18/2020] [Indexed: 02/06/2023]
Abstract
Genome-wide association studies of neurological diseases have identified thousands of variants associated with disease phenotypes. However, most of these variants do not alter coding sequences, making it difficult to assign their function. Here, we present a multi-omic epigenetic atlas of the adult human brain through profiling of single-cell chromatin accessibility landscapes and three-dimensional chromatin interactions of diverse adult brain regions across a cohort of cognitively healthy individuals. We developed a machine-learning classifier to integrate this multi-omic framework and predict dozens of functional SNPs for Alzheimer's and Parkinson's diseases, nominating target genes and cell types for previously orphaned loci from genome-wide association studies. Moreover, we dissected the complex inverted haplotype of the MAPT (encoding tau) Parkinson's disease risk locus, identifying putative ectopic regulatory interactions in neurons that may mediate this disease association. This work expands understanding of inherited variation and provides a roadmap for the epigenomic dissection of causal regulatory variation in disease.
Affiliation(s)
- M Ryan Corces
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Anna Shcherbina
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Soumya Kundu
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Michael J Gloudemans
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Laure Frésard
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Jeffrey M Granja
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Program in Biophysics, Stanford University, Stanford, CA, USA
- Bryan H Louie
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Tiffany Eulalio
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Shadi Shams
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- S Tansu Bagdatli
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Maxwell R Mumbach
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Boxiang Liu
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biology, Stanford University, Stanford, CA, USA
- Baidu Research, Sunnyvale, CA, USA
- Kathleen S Montine
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- William J Greenleaf
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Department of Applied Physics, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anshul Kundaje
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Stephen B Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Howard Y Chang
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA.
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
- Program in Epithelial Biology, Stanford University, Stanford, CA, USA.
- Howard Hughes Medical Institute, Stanford University, Stanford, CA, USA.
- Thomas J Montine
- Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA.
9
Qiu J, Nechaev D, Rost B. Protein-protein and protein-nucleic acid binding residues important for common and rare sequence variants in human. BMC Bioinformatics 2020; 21:452. [PMID: 33050876 PMCID: PMC7557062 DOI: 10.1186/s12859-020-03759-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/01/2020] [Accepted: 09/16/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Any two unrelated people differ by about 20,000 missense mutations (also referred to as SAVs: Single Amino acid Variants or missense SNVs). Many SAVs have been predicted to strongly affect molecular protein function. Common SAVs (> 5% of population) were predicted to have, on average, more effect on molecular protein function than rare SAVs (< 1% of population). We hypothesized that the prevalence of effect in common over rare SAVs might partially be caused by common SAVs more often occurring at interfaces of proteins with other proteins, DNA, or RNA, thereby creating subgroup-specific phenotypes. We analyzed SAVs from 60,706 people through the lens of two prediction methods, one (SNAP2) predicting the effects of SAVs on molecular protein function, the other (ProNA2020) predicting residues in DNA-, RNA- and protein-binding interfaces. RESULTS Three results stood out. Firstly, SAVs predicted to occur at binding interfaces were predicted to more likely affect molecular function than those predicted as not binding (p value < 2.2 × 10⁻¹⁶). Secondly, for SAVs predicted to occur at binding interfaces, common SAVs were predicted more strongly with effect on protein function than rare SAVs (p value < 2.2 × 10⁻¹⁶). Restriction to SAVs with experimental annotations confirmed all results, although the resulting subsets were too small to establish statistical significance for any result. Thirdly, the fraction of SAVs predicted at binding interfaces differed significantly between tissues, e.g. urinary bladder tissue was found abundant in SAVs predicted at protein-binding interfaces, and reproductive tissues (ovary, testis, vagina, seminal vesicle and endometrium) in SAVs predicted at DNA-binding interfaces. CONCLUSIONS Overall, the results suggested that residues at protein-, DNA-, and RNA-binding interfaces contributed toward predicting that common SAVs more likely affect molecular function than rare SAVs.
Affiliation(s)
- Jiajun Qiu
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), 85748, Garching, Germany; Biobank of Ninth People's Hospital, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200125, China.
| | - Dmitrii Nechaev
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), 85748, Garching, Germany
| | - Burkhard Rost
- Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching, Munich, Germany; Institute for Food and Plant Sciences (WZW) Weihenstephan, Alte Akademie 8, 85354, Freising, Germany
| |
|
10
|
Shrikumar A, Prakash E, Kundaje A. GkmExplain: fast and accurate interpretation of nonlinear gapped k-mer SVMs. Bioinformatics 2020; 35:i173-i182. [PMID: 31510661 PMCID: PMC6612808 DOI: 10.1093/bioinformatics/btz322] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 12/11/2022] Open
Abstract
SUMMARY Support Vector Machines with gapped k-mer kernels (gkm-SVMs) have been used to learn predictive models of regulatory DNA sequence. However, interpreting predictive sequence patterns learned by gkm-SVMs can be challenging. Existing interpretation methods such as deltaSVM, in-silico mutagenesis (ISM) or SHAP either do not scale well or make limiting assumptions about the model that can produce misleading results when the gkm kernel is combined with nonlinear kernels. Here, we propose GkmExplain: a computationally efficient feature attribution method for interpreting predictive sequence patterns from gkm-SVM models that has theoretical connections to the method of Integrated Gradients. Using simulated regulatory DNA sequences, we show that GkmExplain identifies predictive patterns with high accuracy while avoiding pitfalls of deltaSVM and ISM and being orders of magnitude more computationally efficient than SHAP. By applying GkmExplain and a recently developed motif discovery method called TF-MoDISco to gkm-SVM models trained on in vivo transcription factor (TF) binding data, we recover consolidated, non-redundant TF motifs. Mutation impact scores derived using GkmExplain consistently outperform deltaSVM and ISM at identifying regulatory genetic variants from gkm-SVM models of chromatin accessibility in lymphoblastoid cell-lines. AVAILABILITY AND IMPLEMENTATION Code and example notebooks to reproduce results are at https://github.com/kundajelab/gkmexplain. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
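The in-silico mutagenesis (ISM) baseline mentioned in this abstract can be illustrated with a minimal sketch. It is not GkmExplain and not the authors' code: `toy_score` is a hypothetical stand-in for a trained model's output (here it simply counts the motif "GATA"), and the function names are assumptions made for illustration. ISM scores every single-base substitution and records the change relative to the reference sequence, so the cost grows with sequence length times the number of alternative bases, which is one reason the abstract describes such methods as scaling poorly.

```python
from typing import Callable, Dict, List

BASES = "ACGT"


def in_silico_mutagenesis(
    seq: str, score: Callable[[str], float]
) -> List[Dict[str, float]]:
    """Return, for each position, the score change caused by each alternative base."""
    ref_score = score(seq)
    effects: List[Dict[str, float]] = []
    for i, ref_base in enumerate(seq):
        deltas: Dict[str, float] = {}
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the reference base itself
            mutant = seq[:i] + alt + seq[i + 1:]
            deltas[alt] = score(mutant) - ref_score
        effects.append(deltas)
    return effects


def toy_score(seq: str) -> float:
    """Hypothetical model output: the number of 'GATA' motif occurrences."""
    return float(seq.count("GATA"))


effects = in_silico_mutagenesis("TTGATACC", toy_score)
for position, deltas in enumerate(effects):
    print(position, deltas)
```

In this toy setting, only substitutions inside the motif change the score, which is the kind of per-position importance profile that attribution methods aim to recover more cheaply.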
Affiliation(s)
- Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Eva Prakash
- Computer Science, BASIS Independent Silicon Valley, San Jose, CA, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA
| |
|
11
|
Wada M, Hayashi Y, Arai M. Mutational analysis of a catalytically important loop containing active site and substrate-binding site in Escherichia coli phytase AppA. Biosci Biotechnol Biochem 2019; 83:860-868. [DOI: 10.1080/09168451.2019.1571897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
A phytase from Escherichia coli, AppA, has been the target of protein engineering to reduce the amount of undigested phosphates from livestock manure by making phosphorus from phytic acid available as a nutrient. To understand the contribution of each amino acid in the active site loop to AppA activity, alanine and glycine scanning mutagenesis was undertaken. The phytase activity assays demonstrated loss of activity upon mutation of charged residues within the conserved motif, supporting their importance in catalytic activity. In contrast, both conserved non-polar residues and non-conserved residues tended to tolerate Ala and/or Gly mutations. Correlation analyses of the chemical/structural characteristics of each mutation site against mutant activity revealed that loop residues located closer to the substrate contribute more to the activity of AppA. These results may be useful in efficiently engineering AppA to improve its catalytic activity.
Abbreviations: AppA: pH 2.5 acid phosphatase; CSU: contacts of structural units; HAPs: histidine acid phosphatases; SASA: solvent accessible surface area; SDS-PAGE: sodium dodecyl sulfate-polyacrylamide gel electrophoresis; SSM: site-saturation mutagenesis; WT: wild type
Affiliation(s)
- Manami Wada
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
| | - Yuuki Hayashi
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
| | - Munehito Arai
- Department of Life Sciences, Graduate School of Arts and Sciences, The University of Tokyo, Tokyo, Japan
- Department of Physics, Graduate School of Science, The University of Tokyo, Tokyo, Japan
| |
|
12
|
Raimondi D, Orlando G, Tabaro F, Lenaerts T, Rooman M, Moreau Y, Vranken WF. Large-scale in-silico statistical mutagenesis analysis sheds light on the deleteriousness landscape of the human proteome. Sci Rep 2018; 8:16980. [PMID: 30451933 PMCID: PMC6242909 DOI: 10.1038/s41598-018-34959-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 06/22/2018] [Accepted: 10/26/2018] [Indexed: 12/18/2022] Open
Abstract
Next generation sequencing technologies are providing increasing amounts of sequencing data, paving the way for improvements in clinical genetics and precision medicine. The interpretation of the observed genomic variants in the light of their phenotypic effects is thus emerging as a crucial task to solve in order to advance our understanding of how exomic variants affect proteins and how the proteins' functional changes affect human health. Since the experimental evaluation of the effects of every observed variant is unfeasible, Bioinformatics methods are being developed to address this challenge in-silico, by predicting the impact of millions of variants, thus providing insight into the deleteriousness landscape of entire proteomes. Here we show the feasibility of this approach by using the recently developed DEOGEN2 variant-effect predictor to perform the largest in-silico mutagenesis scan to date. We computed the deleteriousness score of 170 million variants over 15000 human proteins and we analysed the results, investigating how the predicted deleteriousness landscape of the proteins relates to known functionally and structurally relevant protein regions and biophysical properties. Moreover, we qualitatively validated our results by comparing them with two mutagenesis studies targeting two specific proteins, showing the consistency of DEOGEN2 predictions with respect to experimental data.
Affiliation(s)
- Daniele Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, 1050, Brussels, Belgium
- ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, 3001, Leuven, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
| | - Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, 1050, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
| | - Francesco Tabaro
- Institute of Biosciences and Medical Technology, Arvo Ylpön katu 34, 33520, Tampere, Finland
| | - Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, 1050, Brussels, Belgium
- Machine Learning Group, ULB, La Plaine Campus, 1050, Brussels, Belgium
| | - Marianne Rooman
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, 1050, Brussels, Belgium
- Department of BioModeling, BioInformatics & BioProcesses, Université Libre de Bruxelles, 1050, Brussels, Belgium
| | - Yves Moreau
- ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, 3001, Leuven, Belgium
- Imec, 3001, Leuven, Belgium
| | - Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, 1050, Brussels, Belgium.
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium.
| |
|
13
|
Roxas JL, Monasky RC, Roxas BAP, Agellon AB, Mansoor A, Kaper JB, Vedantam G, Viswanathan V. Enteropathogenic Escherichia coli EspH-Mediated Rho GTPase Inhibition Results in Desmosomal Perturbations. Cell Mol Gastroenterol Hepatol 2018; 6:163-180. [PMID: 30003123 PMCID: PMC6039986 DOI: 10.1016/j.jcmgh.2018.04.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/09/2017] [Accepted: 04/20/2018] [Indexed: 12/27/2022]
Abstract
BACKGROUND & AIMS The diarrheagenic pathogen, enteropathogenic Escherichia coli (EPEC), uses a type III secretion system to deliver effector molecules into intestinal epithelial cells (IECs). While exploring the basis for the lateral membrane separation of EPEC-infected IECs, we observed infection-induced loss of the desmosomal cadherin desmoglein-2 (DSG2). We sought to identify the molecule(s) involved in, and delineate the mechanisms and consequences of, EPEC-induced DSG2 loss. METHODS DSG2 abundance and localization were monitored via immunoblotting and immunofluorescence, respectively. Junctional perturbations were visualized by electron microscopy, and cell-cell adhesion was assessed using dispase assays. EspH alanine-scan mutants as well as pharmacologic agents were used to evaluate impacts on desmosomal alterations. EPEC-mediated DSG2 loss, and its impact on bacterial colonization in vivo, was assessed using a murine model. RESULTS The secreted virulence protein EspH mediates EPEC-induced DSG2 degradation, and contributes to desmosomal perturbation, loss of cell junction integrity, and barrier disruption in infected IECs. EspH sequesters Rho guanine nucleotide exchange factors and inhibits Rho guanosine triphosphatase signaling; EspH mutants impaired for Rho guanine nucleotide exchange factor interaction failed to inhibit RhoA or deplete DSG2. Cytotoxic necrotizing factor 1, which locks Rho guanosine triphosphatase in the active state, jasplakinolide, a molecule that promotes actin polymerization, and the lysosomal inhibitor bafilomycin A each rescued infected cells from EPEC-induced DSG2 loss. Wild-type EPEC, but not an espH-deficient strain, colonizes mouse intestines robustly, widens paracellular junctions, and induces DSG2 re-localization in vivo. CONCLUSIONS Our studies define the mechanism and consequences of EPEC-induced desmosomal alterations in IECs. These perturbations contribute to the colonization and virulence of EPEC, and likely related pathogens.
Key Words
- A/E, attaching and effacing
- BSA, bovine serum albumin
- CM, calcium and magnesium
- DMEM, Dulbecco's modified Eagle medium
- DSC, desmocollin
- DSG, desmoglein
- DSG2
- Desmoglein
- EPEC
- EPEC, enteropathogenic Escherichia coli
- GEF, guanine nucleotide exchange factors
- GTPase, guanosine triphosphatase
- Host–Pathogen Interaction
- IEC, intestinal epithelial cell
- IF, intermediate filament
- PBS, phosphate-buffered saline
- T3SS, type 3 secretion system
- TER, transepithelial electrical resistance
- TJ, tight junction
- WT, wild-type
Affiliation(s)
- Jennifer Lising Roxas
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona
| | - Ross Calvin Monasky
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona
| | - Bryan Angelo P. Roxas
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona
| | - Al B. Agellon
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona
- BIO5 Institute for Collaborative Research, University of Arizona, Tucson, Arizona
| | - Asad Mansoor
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona
| | - James B. Kaper
- University of Maryland School of Medicine, Baltimore, Maryland
| | - Gayatri Vedantam
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona
- BIO5 Institute for Collaborative Research, University of Arizona, Tucson, Arizona
- Department of Immunobiology, University of Arizona, Tucson, Arizona
- Southern Arizona VA Healthcare System, Tucson, Arizona
| | - V.K. Viswanathan
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona
- BIO5 Institute for Collaborative Research, University of Arizona, Tucson, Arizona
- Department of Immunobiology, University of Arizona, Tucson, Arizona
- Address correspondence to: V. K. Viswanathan, PhD, School of Animal and Comparative Biomedical Sciences, 1006 E. Lowell, Building 106, Room 231, University of Arizona, Tucson, Arizona 85721. fax: (520) 621-6366.
| |
|
14
|
Pandey B, Grover S, Goyal S, Kumari A, Singh A, Jamal S, Kaur J, Grover A. Alanine mutation of the catalytic sites of Pantothenate Synthetase causes distinct conformational changes in the ATP binding region. Sci Rep 2018; 8:903. [PMID: 29343701 PMCID: PMC5772511 DOI: 10.1038/s41598-017-19075-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/24/2017] [Accepted: 12/19/2017] [Indexed: 02/01/2023] Open
Abstract
The enzyme pantothenate synthetase (PS) represents a potential drug target in Mycobacterium tuberculosis. Its X-ray crystallographic structure has demonstrated the importance of conserved active site residues, including His44, His47, Asn69, Gln72, Lys160 and Gln164, in substrate binding and formation of the pantoyl adenylate intermediate. In the current study, the molecular mechanism of the decreased affinity of the enzyme for ATP caused by alanine mutations was investigated using molecular dynamics (MD) simulations and free energy calculations. A total of seven systems, namely wild-type + ATP, H44A + ATP, H47A + ATP, N69A + ATP, Q72A + ATP, K160A + ATP and Q164A + ATP, were subjected to 50 ns MD simulations. Docking score, MM-GBSA and interaction profile analysis showed weaker interactions between ATP (substrate) and PS (enzyme) in the H47A and K160A mutants than in the wild-type, leading to reduced catalytic activity. However, principal component analysis (PCA) and free energy landscape (FEL) analysis revealed that ATP was strongly bound to the catalytic core of the wild-type, limiting its movement to form a stable complex as compared to mutants. The study gives insight into ATP binding to PS at the atomic level and will facilitate the design of a non-reactive analogue of pantoyl adenylate that could act as a specific inhibitor of PS.
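The mutant naming used in this abstract follows the usual convention of one-letter wild-type residue, position, then substituted residue. As a small illustration, the sketch below derives the alanine-mutant labels and the seven simulated systems from the six active site residues named in the abstract. The helper name `alanine_mutants` and the lookup table are assumptions made for this example, not part of the study's workflow.

```python
# Three-letter to one-letter codes for the residue types named in the abstract.
THREE_TO_ONE = {"His": "H", "Asn": "N", "Gln": "Q", "Lys": "K"}


def alanine_mutants(residues):
    """Convert labels such as 'His44' into alanine-mutant labels such as 'H44A'."""
    labels = []
    for residue in residues:
        name, position = residue[:3], residue[3:]
        labels.append(f"{THREE_TO_ONE[name]}{position}A")
    return labels


active_site = ["His44", "His47", "Asn69", "Gln72", "Lys160", "Gln164"]
mutants = alanine_mutants(active_site)

# One wild-type system plus one system per alanine mutant, each with ATP bound.
systems = ["wild-type + ATP"] + [f"{m} + ATP" for m in mutants]
print(systems)
```

Position 160 is a lysine, so its alanine mutant is written K160A.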
Affiliation(s)
- Bharati Pandey
- Department of Biotechnology, Panjab University, Chandigarh, 160014, India
| | - Sonam Grover
- Kusuma School of Biological Sciences, Indian Institute of Technology Delhi, New Delhi, 110016, India
| | - Sukriti Goyal
- Department of Bioscience and Biotechnology, Banasthali University, Tonk, Rajasthan, 304022, India
| | - Anchala Kumari
- Department of Biotechnology, TERI University, VasantKunj, New Delhi, 110070, India
| | - Aditi Singh
- Department of Biotechnology, TERI University, VasantKunj, New Delhi, 110070, India
| | - Salma Jamal
- Department of Bioscience and Biotechnology, Banasthali University, Tonk, Rajasthan, 304022, India
| | - Jagdeep Kaur
- Department of Biotechnology, Panjab University, Chandigarh, 160014, India
| | - Abhinav Grover
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, 110067, India.
| |
|
15
|
Analysis of Large-Scale Mutagenesis Data To Assess the Impact of Single Amino Acid Substitutions. Genetics 2017; 207:53-61. [PMID: 28751422 PMCID: PMC5586385 DOI: 10.1534/genetics.117.300064] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Received: 05/20/2017] [Accepted: 07/24/2017] [Indexed: 11/18/2022] Open
Abstract
Mutagenesis is a widely used method for identifying protein positions that are important for function or ligand binding. Advances in high-throughput DNA sequencing and mutagenesis techniques have enabled measurement of the effects of nearly all possible amino acid substitutions in many proteins. The resulting large-scale mutagenesis data sets offer a unique opportunity to draw general conclusions about the effects of different amino acid substitutions. Thus, we analyzed 34,373 mutations in 14 proteins whose effects were measured using large-scale mutagenesis approaches. Methionine was the most tolerated substitution, while proline was the least tolerated. We found that several substitutions, including histidine and asparagine, best recapitulated the effects of other substitutions, even when the identity of the wild-type amino acid was considered. The effects of histidine and asparagine substitutions also correlated best with the effects of other substitutions in different structural contexts. Furthermore, highly disruptive substitutions like aspartic and glutamic acid had the most discriminatory power for detecting ligand interface positions. Our work highlights the utility of large-scale mutagenesis data, and our conclusions can help guide future single substitution mutational scans.
|
16
|
Edrees BM, Athar M, Abduljaleel Z, Al-Allaf FA, Taher MM, Khan W, Bouazzaoui A, Al-Harbi N, Safar R, Al-Edressi H, Alansary K, Anazi A, Altayeb N, Ahmed MA. Functional alterations due to amino acid changes and evolutionary comparative analysis of ARPKD and ADPKD genes. Genomics Data 2016; 10:127-134. [PMID: 27843768 PMCID: PMC5099264 DOI: 10.1016/j.gdata.2016.10.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 08/01/2016] [Revised: 10/18/2016] [Accepted: 10/30/2016] [Indexed: 12/15/2022]
Abstract
Targeted customized sequencing of genes implicated in the autosomal recessive polycystic kidney disease (ARPKD) phenotype was performed to identify candidate variants using Ion Torrent PGM next-generation sequencing. The results identified four potentially pathogenic variants in the PKHD1 gene [c.4870C > T, p.(Arg1624Trp), c.5725C > T, p.(Arg1909Trp), c.1736C > T, p.(Thr579Met) and c.10628T > G, p.(Leu3543Trp)] among 12 out of 18 samples. One variant, c.4870C > T, p.(Arg1624Trp), was common among eight patients. Some patient samples also showed a few variants in the autosomal dominant polycystic kidney disease (ADPKD) disease-causing genes PKD1 and PKD2, such as c.12433G > A, p.(Val4145Ile) and c.1445T > G, p.(Phe482Cys), respectively. All causative variants were validated by capillary sequencing, which confirmed the presence of a novel homozygous variant c.10628T > G, p.(Leu3543Trp) in a male proband. We have recently published the results of these studies (Edrees et al., 2016). Here we report for the first time the effect of the common mutation p.(Arg1624Trp), found in eight samples, on the structure and function of the PKHD1 protein using molecular dynamics simulations. Such computational approaches provide tools to predict the phenotypic effect of a variant on the structure and function of the altered protein. The native and mutant modeled proteins carrying the common mutation p.(Arg1624Trp) were also analyzed for solvent accessibility, secondary structure and stabilizing residues to compare the stability of the wild-type and mutant forms. Furthermore, comparative genomics and evolutionary analyses of variants observed in the PKHD1, PKD1, and PKD2 genes were performed in some mammalian species, including human, to understand the complexity of genomes among closely related mammalian species. Taken together, the results revealed that evolutionary comparative analyses and characterization of the PKHD1, PKD1, and PKD2 genes among various related and unrelated mammalian species will provide important insights into their evolutionary process and support further disease characterization and management.
Affiliation(s)
- Burhan M Edrees
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; King Fahad Medical City, P.O. Box 59046, Riyadh 11525, Saudi Arabia
| | - Mohammad Athar
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Science and Technology Unit, Umm Al Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia
| | - Zainularifeen Abduljaleel
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Science and Technology Unit, Umm Al Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia
| | - Faisal A Al-Allaf
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Science and Technology Unit, Umm Al Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Molecular Diagnostics Unit, Department of Laboratory and Blood Bank, King Abdullah Medical City, Makkah 21955, Saudi Arabia
| | - Mohiuddin M Taher
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Science and Technology Unit, Umm Al Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia
| | - Wajahatullah Khan
- Department of Basic Sciences, College of Science and Health Professions, King Saud Bin Abdulaziz University for Health Sciences, P.O. Box 3660, Riyadh 11426, Saudi Arabia
| | - Abdellatif Bouazzaoui
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia; Science and Technology Unit, Umm Al Qura University, P.O. Box 715, Makkah 21955, Saudi Arabia
| | - Naffaa Al-Harbi
- Department of Pediatric, King Faisal Specialist Hospital and Research Centre, P.O. Box 40047, Jeddah 21499, Saudi Arabia
| | - Ramzia Safar
- Madinah Maternity and Children's Hospital, P.O. Box 5073, Madinah 42318, Saudi Arabia
| | - Howaida Al-Edressi
- Madinah Maternity and Children's Hospital, P.O. Box 5073, Madinah 42318, Saudi Arabia
| | - Khawala Alansary
- King Fahad Medical City, P.O. Box 59046, Riyadh 11525, Saudi Arabia
| | - Abulkareem Anazi
- King Fahad Medical City, P.O. Box 59046, Riyadh 11525, Saudi Arabia
| | - Naji Altayeb
- King Fahad Medical City, P.O. Box 59046, Riyadh 11525, Saudi Arabia
| | - Muawia A Ahmed
- King Salman Armed Forces Hospital, P.O. box 100, Tabuk, Saudi Arabia
| |
|
17
|
Reeb J, Hecht M, Mahlich Y, Bromberg Y, Rost B. Predicted Molecular Effects of Sequence Variants Link to System Level of Disease. PLoS Comput Biol 2016; 12:e1005047. [PMID: 27536940 PMCID: PMC4990455 DOI: 10.1371/journal.pcbi.1005047] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/15/2016] [Accepted: 07/04/2016] [Indexed: 11/19/2022] Open
Abstract
Developments in experimental and computational biology are advancing our understanding of how protein sequence variation impacts molecular protein function. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains barred. Here, we present new results emphasizing earlier work that suggested some links from molecular function to disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects upon molecular function often correctly predict human (OMIM) and animal (OMIA) Mendelian disease-causing variants. We also predicted effects of human disease-causing variants in the mouse model, i.e. we put OMIM SAVs into mouse orthologs. Overall, fewer variants were predicted with effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help to understand the macro system level of disease. The variations in the genetic sequence between individuals affect the gene-product, i.e. the protein differently. Some variants have no measurable effect (are neutral), while others affect protein function. Some of those effects are so severe they cause so called monogenic Mendelian diseases, i.e. diseases triggered by a single letter change. Some in silico methods predict the molecular impact of sequence variation. However, both experimental and computational analyses struggle to generalize from the effect upon molecular protein function to the effect upon the organism such as a disease. 
Here, we confirmed that methods predicting molecular effects correctly capture the type of effects causing Mendelian diseases in human and introduced a data set for animal diseases that was also captured by prediction methods. Fewer variants were predicted to have an effect when human variants were tested in silico in an animal model (here mouse). This is important to know because “mouse models” are commonly used to study human diseases. Overall, we provided some evidence for a link between the molecular level and some type of disease.
Affiliation(s)
- Jonas Reeb
- Department of Informatics, Bioinformatics & Computational Biology—i12, Technische Universität München, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Technische Universität München, Garching, Germany
| | - Maximilian Hecht
- Department of Informatics, Bioinformatics & Computational Biology—i12, Technische Universität München, Garching/Munich, Germany
| | - Yannick Mahlich
- Department of Informatics, Bioinformatics & Computational Biology—i12, Technische Universität München, Garching/Munich, Germany
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, United States of America
- Institute for Advanced Study (TUM-IAS), Garching/Munich, Germany
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, United States of America
- Institute for Advanced Study (TUM-IAS), Garching/Munich, Germany
| | - Burkhard Rost
- Department of Informatics, Bioinformatics & Computational Biology—i12, Technische Universität München, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Garching/Munich, Germany
- Institute for Food and Plant Sciences WZW, Technische Universität München, Weihenstephan, Freising, Germany
| |
|
18
|
Melo R, Fieldhouse R, Melo A, Correia JDG, Cordeiro MNDS, Gümüş ZH, Costa J, Bonvin AMJJ, Moreira IS. A Machine Learning Approach for Hot-Spot Detection at Protein-Protein Interfaces. Int J Mol Sci 2016; 17:E1215. [PMID: 27472327 PMCID: PMC5000613 DOI: 10.3390/ijms17081215] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 05/24/2016] [Revised: 07/11/2016] [Accepted: 07/18/2016] [Indexed: 12/17/2022] Open
Abstract
Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology that predicts Hot-Spots (HS) in protein-protein interfaces from their native complex structure more accurately than previously published Machine Learning (ML) techniques. Our model is trained on a large number of complexes and on a significantly larger number of different structural- and evolutionary sequence-based features. In particular, we added interface size, type of interaction between residues at the interface of the complex, number of different types of residues at the interface and the Position-Specific Scoring Matrix (PSSM), for a total of 79 features. We used twenty-seven algorithms, ranging from a simple linear-based function to support-vector machine models with different cost functions. The best model was achieved by the use of the conditional inference random forest (c-forest) algorithm with a dataset pre-processed by the normalization of features and with up-sampling of the minor class. The method has an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82 for the independent test set.
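For readers unfamiliar with the four metrics reported above, the following sketch shows how accuracy, F1-score, sensitivity and specificity are computed from a binary confusion matrix. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's test-set results, and the function name is an assumption.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: fraction of true hot-spots recovered
    specificity = tn / (tn + fp)   # fraction of non-hot-spots correctly rejected
    precision = tp / (tp + fp)     # fraction of predicted hot-spots that are real
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts: 50 hot-spots and 100 non-hot-spots in a test set.
metrics = binary_metrics(tp=40, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because hot-spots are typically the minority class, accuracy alone can look good for a model that rarely predicts them, which is why sensitivity and F1 are reported alongside it.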
Affiliation(s)
- Rita Melo
- Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, Estrada Nacional 10 (ao km 139,7), 2695-066 Bobadela LRS, Portugal.
- CNC-Center for Neuroscience and Cell Biology; Rua Larga, Faculdade de Medicina, Polo I, 1ºandar, Universidade de Coimbra, 3004-504 Coimbra, Portugal.
| | - Robert Fieldhouse
- Department of Genetics and Genomics and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
| | - André Melo
- REQUIMTE (Rede de Química e Tecnologia), Faculdade de Ciências da Universidade do Porto, Departamento de Química e Bioquímica, Rua do Campo Alegre, 4169-007 Porto, Portugal.
| | - João D G Correia
- Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, Estrada Nacional 10 (ao km 139,7), 2695-066 Bobadela LRS, Portugal.
| | - Maria Natália D S Cordeiro
- REQUIMTE (Rede de Química e Tecnologia), Faculdade de Ciências da Universidade do Porto, Departamento de Química e Bioquímica, Rua do Campo Alegre, 4169-007 Porto, Portugal.
| | - Zeynep H Gümüş
- Department of Genetics and Genomics and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
| | - Joaquim Costa
- CMUP/FCUP, Centro de Matemática da Universidade do Porto, Faculdade de Ciências, Rua do Campo Alegre, 4169-007 Porto, Portugal.
| | - Alexandre M J J Bonvin
- Bijvoet Center for Biomolecular Research, Faculty of Science-Chemistry, Utrecht University, Utrecht 3584CH, The Netherlands.
| | - Irina S Moreira
- CNC-Center for Neuroscience and Cell Biology; Rua Larga, Faculdade de Medicina, Polo I, 1ºandar, Universidade de Coimbra, 3004-504 Coimbra, Portugal.
- Bijvoet Center for Biomolecular Research, Faculty of Science-Chemistry, Utrecht University, Utrecht 3584CH, The Netherlands.
|
19
|
Al-Allaf FA, Alashwal A, Abduljaleel Z, Taher MM, Siddiqui SS, Bouazzaoui A, Abalkhail H, Aun R, Al-Allaf AF, AbuMansour I, Azhar Z, Ba-Hammam FA, Khan W, Athar M. Identification of a recurrent frameshift mutation at the LDLR exon 14 (c.2027delG, p.(G676Afs*33)) causing familial hypercholesterolemia in Saudi Arab homozygous children. Genomics 2015; 107:24-32. [PMID: 26688439 DOI: 10.1016/j.ygeno.2015.12.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 10/14/2015] [Revised: 12/06/2015] [Accepted: 12/09/2015] [Indexed: 11/25/2022]
Abstract
Familial hypercholesterolemia (FH) is an autosomal dominant disease, predominantly caused by variants in the low-density lipoprotein (LDL) receptor gene (LDLR). Herein, we describe genetic analysis of severely affected homozygous FH patients who were mostly resistant to statin therapy and were managed on an apheresis program. We identified a recurrent frameshift mutation p.(G676Afs*33) in exon 14 of the LDLR gene in 9 probands and their relatives in apparently unrelated Saudi families. We also describe a three-dimensional homology model of the LDL receptor protein (LDLR) structure and examine the consequence of the frameshift mutation p.(G676Afs*33), as this could affect the LDLR structure in a region involved in dimer formation, and protein stability. This finding of a recurrent mutation causing FH in the Saudi population could serve to develop a rapid genetic screening procedure for FH, and the 3D-structure analysis of the mutant LDLR may provide tools to develop a mechanistic model of LDLR function.
Affiliation(s)
- Faisal A Al-Allaf
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia; Science and Technology Unit, Umm Al-Qura University, Makkah, Saudi Arabia; Molecular Diagnostics Unit, Department of Laboratory and Blood Bank, King Abdullah Medical City, Makkah, Saudi Arabia.
- Abdullah Alashwal
- King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Zainularifeen Abduljaleel
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia; Science and Technology Unit, Umm Al-Qura University, Makkah, Saudi Arabia
- Mohiuddin M Taher
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia; Science and Technology Unit, Umm Al-Qura University, Makkah, Saudi Arabia
- Shahid S Siddiqui
- Department of Oral and Basic Sciences, Faculty of Dentistry, Umm Al-Qura University, Makkah, Saudi Arabia
- Abdellatif Bouazzaoui
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia; Science and Technology Unit, Umm Al-Qura University, Makkah, Saudi Arabia
- Hala Abalkhail
- King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Rakan Aun
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Iman AbuMansour
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Zohor Azhar
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Faisal A Ba-Hammam
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Wajahatullah Khan
- Department of Basic Sciences, College of Science and Health Professions, King Saud Bin Abdul Aziz University for Health Sciences, Riyadh, Saudi Arabia
- Mohammad Athar
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia; Science and Technology Unit, Umm Al-Qura University, Makkah, Saudi Arabia.
|
20
|
Abstract
Elucidating the effects of naturally occurring genetic variation is one of the major challenges for personalized health and personalized medicine. Here, we introduce SNAP2, a novel neural network based classifier that improves over the state-of-the-art in distinguishing between effect and neutral variants. Our method's improved performance results from screening many potentially relevant protein features and from refining our development data sets. Cross-validated on >100k experimentally annotated variants, SNAP2 significantly outperformed other methods, attaining a two-state accuracy (effect/neutral) of 83%. SNAP2 also outperformed combinations of other methods. Performance increased for human variants but much more so for other organisms. Our method's carefully calibrated reliability index informs selection of variants for experimental follow up, with the most strongly predicted half of all effect variants predicted at over 96% accuracy. As expected, the evolutionary information from automatically generated multiple sequence alignments gave the strongest signal for the prediction. However, we also optimized our new method to perform surprisingly well even without alignments. This feature reduces prediction runtime by over two orders of magnitude, enables cross-genome comparisons, and renders our new method as the best solution for the 10-20% of sequence orphans. SNAP2 is available at: https://rostlab.org/services/snap2web
|
21
|
Flores DI, Sotelo-Mundo RR, Brizuela CA. A simple extension to the CMASA method for the prediction of catalytic residues in the presence of single point mutations. PLoS One 2014; 9:e108513. [PMID: 25268770 PMCID: PMC4182483 DOI: 10.1371/journal.pone.0108513] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/19/2014] [Accepted: 08/31/2014] [Indexed: 11/23/2022] Open
Abstract
The automatic identification of catalytic residues still remains an important challenge in structural bioinformatics. Sequence-based methods are good alternatives when the query shares a high percentage of identity with a well-annotated enzyme. However, when the homology is not apparent, which occurs with many structures from the structural genome initiative, structural information should be exploited. A local structural comparison is preferred to a global structural comparison when predicting functional residues. CMASA is a recently proposed method for predicting catalytic residues based on a local structure comparison. The method achieves high accuracy and a high value for the Matthews correlation coefficient. However, point substitutions or a lack of relevant data strongly affect the performance of the method. In the present study, we propose a simple extension to the CMASA method to overcome this difficulty. Extensive computational experiments are shown as proof of concept instances, as well as for a few real cases. The results show that the extension performs well when the catalytic site contains mutated residues or when some residues are missing. The proposed modification could correctly predict the catalytic residues of a mutant thymidylate synthase, 1EVF. It also successfully predicted the catalytic residues for 3HRC despite the lack of information for a relevant side chain atom in the PDB file.
Affiliation(s)
- David I. Flores
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, México
- Carlos A. Brizuela
- Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, México
|
22
|
Anand P, Nagarajan D, Mukherjee S, Chandra N. ABS-Scan: In silico alanine scanning mutagenesis for binding site residues in protein-ligand complex. F1000Res 2014; 3:214. [PMID: 25685322 DOI: 10.12688/f1000research.5165.1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Accepted: 09/01/2014] [Indexed: 03/24/2024] Open
Abstract
Most physiological processes in living systems are fundamentally regulated by protein-ligand interactions. Understanding the process of ligand recognition by proteins is a vital activity in molecular biology and biochemistry. It is well known that the residues present at the binding site of the protein form pockets that provide a conducive environment for recognition of specific ligands. In many cases, the boundaries of these sites are not well defined. Here, we provide a web-server to systematically evaluate important residues in the binding site of the protein that contribute towards the ligand recognition through in silico alanine-scanning mutagenesis experiments. Each of the residues present at the binding site is computationally mutated to alanine. The ligand interaction energy is computed for each mutant and the corresponding ΔΔG values are computed by comparing it to the wild type protein, thus evaluating individual residue contributions towards ligand interaction. The server will thus provide clues to researchers about residues to obtain loss-of-function mutations and to understand drug resistant mutations. This web-tool can be freely accessed through the following address: http://proline.biochem.iisc.ernet.in/abscan/.
Affiliation(s)
- Praveen Anand
- Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Deepesh Nagarajan
- Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Sumanta Mukherjee
- IISc Mathematics Initiative, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Nagasuma Chandra
- Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka, 560012, India
|
23
|
Anand P, Nagarajan D, Mukherjee S, Chandra N. ABS-Scan: In silico alanine scanning mutagenesis for binding site residues in protein-ligand complex. F1000Res 2014; 3:214. [PMID: 25685322 PMCID: PMC4319546 DOI: 10.12688/f1000research.5165.2] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Accepted: 11/24/2014] [Indexed: 12/27/2022] Open
Abstract
Most physiological processes in living systems are fundamentally regulated by protein-ligand interactions. Understanding the process of ligand recognition by proteins is a vital activity in molecular biology and biochemistry. It is well known that the residues present at the binding site of the protein form pockets that provide a conducive environment for recognition of specific ligands. In many cases, the boundaries of these sites are not well defined. Here, we provide a web-server to systematically evaluate important residues in the binding site of the protein that contribute towards the ligand recognition through in silico alanine-scanning mutagenesis experiments. Each of the residues present at the binding site is computationally mutated to alanine. The ligand interaction energy is computed for each mutant and the corresponding ΔΔG values are calculated by comparing it to the wild type protein, thus evaluating individual residue contributions towards ligand interaction. The server will thus provide a ranked list of residues to the user in order to obtain loss-of-function mutations. This web-tool can be freely accessed through the following address: http://proline.biochem.iisc.ernet.in/abscan/.
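The ΔΔG ranking described in this abstract can be sketched in a few lines of Python. The residue labels, the energy values and the sign convention (a more negative interaction energy means tighter binding, so a positive ΔΔG marks a residue whose mutation to alanine weakens ligand binding) are illustrative assumptions and not output from ABS-Scan.

```python
def rank_alanine_scan(wild_type_energy, mutant_energies):
    """Rank residues by ddG = E(mutant) - E(wild type), largest loss of binding first."""
    ddg = {
        residue: energy - wild_type_energy
        for residue, energy in mutant_energies.items()
    }
    return sorted(ddg.items(), key=lambda item: item[1], reverse=True)


# Hypothetical interaction energies in kcal/mol (illustrative only).
wild_type = -9.5
alanine_mutants = {"ARG45": -6.0, "TYR88": -8.9, "GLY12": -9.5}

ranking = rank_alanine_scan(wild_type, alanine_mutants)
for residue, value in ranking:
    print(f"{residue}: ddG = {value:+.1f} kcal/mol")
```

Residues at the top of such a list are the natural first candidates for loss-of-function mutagenesis.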
Affiliation(s)
- Praveen Anand
- Department of Biochemistry, Indian Institute of Science, Bangalore, 560012, India
- Deepesh Nagarajan
- Department of Biochemistry, Indian Institute of Science, Bangalore, 560012, India
- Sumanta Mukherjee
- IISc Mathematics Initiative, Indian Institute of Science, Bangalore, 560012, India
- Nagasuma Chandra
- Department of Biochemistry, Indian Institute of Science, Bangalore, 560012, India
|
24
|
Abduljaleel Z, Al-Allaf FA, Khan W, Athar M, Shahzad N, Taher MM, Elrobh M, Alanazi MS, El-Huneidi W. Evidence of trem2 variant associated with triple risk of Alzheimer's disease. PLoS One 2014; 9:e92648. [PMID: 24663666 PMCID: PMC3963925 DOI: 10.1371/journal.pone.0092648] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 10/13/2013] [Accepted: 02/25/2014] [Indexed: 12/27/2022] Open
Abstract
Alzheimer's disease is one of the main causes of dementia among elderly individuals and leads to the neurodegeneration of different areas of the brain, resulting in memory impairments and loss of cognitive functions. Recently, a rare variant that is associated with 3-fold higher risk of Alzheimer's disease onset has been found. The rare variant discovered is a missense mutation in the loop region of exon 2 of Trem2 (rs75932628-T, Arg47His). The aim of this study was to investigate the evidence for potential structural and functional significance of the Trem2 gene variant (Arg47His) through molecular dynamics simulations. Our results showed that the alteration caused by the variant in the TREM2 protein has a significant effect on the ligand binding affinity as well as the structural configuration. Based on molecular dynamics (MD) simulation under solvation, the results confirmed that the native form of the variant (Arg47His) might be responsible for improved compactness, and hence improved protein folding. Protein simulation was carried out at different temperatures. At 300K, the deviation of the theoretical model of TREM2 protein increased from 2.0 Å at 10 ns. In contrast, the deviation of the Arg47His mutation was maintained at 1.2 Å until the end of the simulation (t = 10 ns), which indicated that Arg47His had reached its folded state. The mutant residue was in a highly conserved region and was similar to "immunoglobulin V-set" and "immunoglobulin-like folds". Taken together, the results from this study provide biophysical insight into how the studied variant could contribute to the genetic susceptibility to Alzheimer's disease.
Affiliation(s)
- Zainularifeen Abduljaleel
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Genome Research Chair Unit, Department of Biochemistry, College of Science, King Saud University, Riyadh, Saudi Arabia
- Faisal A. Al-Allaf
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Wajahatullah Khan
- Department of Basic Sciences, College of Science and Health Professions, King Saud Bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- Mohammad Athar
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Naiyer Shahzad
- Department of Pharmacology and Toxicology, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Mohiuddin M. Taher
- Department of Medical Genetics, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Mohamed Elrobh
- Genome Research Chair Unit, Department of Biochemistry, College of Science, King Saud University, Riyadh, Saudi Arabia
- Mohammed S. Alanazi
- Genome Research Chair Unit, Department of Biochemistry, College of Science, King Saud University, Riyadh, Saudi Arabia
- Waseem El-Huneidi
- Department of Basic Sciences, College of Science and Health Professions, King Saud Bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
|
25
|
Ozbek P, Soner S, Haliloglu T. Hot spots in a network of functional sites. PLoS One 2013; 8:e74320. [PMID: 24023934 PMCID: PMC3759471 DOI: 10.1371/journal.pone.0074320] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 12/11/2012] [Accepted: 08/02/2013] [Indexed: 12/05/2022] Open
Abstract
Understanding how proteins interact is of significant interest, as it is key to biological function. Using dynamic fluctuations in high frequency modes, we show that the Gaussian Network Model (GNM) predicts hot spot residues with success rates ranging between S 8–58%, C 84–95%, P 5–19% and A 81–92% on unbound structures and S 8–51%, C 97–99%, P 14–50%, A 94–97% on complex structures for sensitivity, specificity, precision and accuracy, respectively. High specificity and accuracy rates with a single property on unbound protein structures suggest that hot spots are predefined in the dynamics of unbound structures and form the binding core of interfaces, whereas the prediction of other functional residues with similar dynamic behavior explains the lower precision values. The latter is demonstrated with three case studies: ubiquitin, hen egg-white lysozyme and the M2 proton channel. The dynamic fluctuations suggest a pseudo network of residues with high frequency fluctuations, which could be plausible for the mechanism of biological interactions and allosteric regulation.
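As a rough illustration of the quantity involved, the Python sketch below builds a GNM Kirchhoff (connectivity) matrix from Cα coordinates and sums the per-residue fluctuations over the highest-frequency modes. The 7.3 Å cutoff, the number of modes, the normalization and the toy coordinates are assumptions chosen for illustration; the abstract does not state the authors' settings, and this is not their implementation.

```python
import numpy as np


def gnm_high_frequency_fluctuations(coords, cutoff=7.3, n_modes=3):
    """Normalized per-residue fluctuation profile in the fastest GNM modes."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise distances between all nodes (one node per residue).
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Kirchhoff matrix: -1 for contacting pairs, node degree on the diagonal.
    kirchhoff = -(dists <= cutoff).astype(float)
    np.fill_diagonal(kirchhoff, 0.0)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # eigh returns eigenvalues in ascending order, so the last columns
    # correspond to the highest-frequency (fastest) modes.
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)
    fast_vals = eigvals[-n_modes:]
    fast_vecs = eigvecs[:, -n_modes:]
    # Contribution of mode k to residue i is proportional to u_k[i]^2 / lambda_k.
    fluctuations = (fast_vecs ** 2 / fast_vals).sum(axis=1)
    return fluctuations / fluctuations.sum()


# Toy chain of 10 nodes spaced 3.8 Å apart (illustrative only, not a real protein).
toy_coords = [[3.8 * i, 0.0, 0.0] for i in range(10)]
profile = gnm_high_frequency_fluctuations(toy_coords)
print(np.round(profile, 3))
```

In an analysis of this kind, residues at the peaks of the high-frequency profile would be the hot spot candidates.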
Affiliation(s)
- Pemra Ozbek
- Department of Bioengineering, Marmara University, Goztepe, Istanbul, Turkey
- Seren Soner
- Department of Chemical Engineering and Polymer Research Center, Bogazici University, Bebek, Turkey
- Turkan Haliloglu
- Department of Chemical Engineering and Polymer Research Center, Bogazici University, Bebek, Turkey
|
26
|
Affiliation(s)
- Robert O J Weinzierl
- Department of Life Sciences, Division of Biomolecular Sciences, Imperial College London , Sir Alexander Fleming Building, Exhibition Road, London SW7 2AZ, United Kingdom
|
27
|
Hecht M, Bromberg Y, Rost B. News from the protein mutability landscape. J Mol Biol 2013; 425:3937-48. [PMID: 23896297 DOI: 10.1016/j.jmb.2013.07.028] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 04/30/2013] [Revised: 07/08/2013] [Accepted: 07/19/2013] [Indexed: 12/16/2022]
Abstract
Some mutations of protein residues matter more than others, and these are often conserved evolutionarily. The explosion of deep sequencing and genotyping increasingly requires the distinction between effect and neutral variants. The simplest approach predicts all mutations of conserved residues to have an effect; however, this works poorly, at best. Many computational tools that are optimized to predict the impact of point mutations provide more detail. Here, we expand the perspective from the view of single variants to the level of sketching the entire mutability landscape. This landscape is defined by the impact of substituting every residue at each position in a protein by each of the 19 non-native amino acids. We review some of the powerful conclusions about protein function, stability and their robustness to mutation that can be drawn from such an analysis. Large-scale experimental and computational mutagenesis experiments are increasingly furthering our understanding of protein function and of the genotype-phenotype associations. We also discuss how these can be used to improve predictions of protein function and pathogenicity of missense variants.
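The landscape defined in this abstract covers 19 substitutions at every position, so a protein of length L has 19 × L single-point variants. A minimal Python sketch that enumerates them is given below; the three-residue sequence and the `M1A`-style labels are illustrative choices, and scoring each variant would require a separate predictor that is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def single_point_variants(sequence):
    """Yield every single-residue substitution as native + position + substitute."""
    for position, native in enumerate(sequence, start=1):
        for substitute in AMINO_ACIDS:
            if substitute != native:
                yield f"{native}{position}{substitute}"


# Hypothetical three-residue peptide (illustrative only).
variants = list(single_point_variants("MKV"))
print(len(variants), variants[:3])
```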
Affiliation(s)
- Maximilian Hecht
- Department of Bioinformatics and Computational Biology I12, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany.
|
28
|
Dehouck Y, Kwasigroch JM, Rooman M, Gilis D. BeAtMuSiC: Prediction of changes in protein-protein binding affinity on mutations. Nucleic Acids Res 2013; 41:W333-9. [PMID: 23723246 PMCID: PMC3692068 DOI: 10.1093/nar/gkt450] [Citation(s) in RCA: 268] [Impact Index Per Article: 22.3] [Indexed: 01/11/2023] Open
Abstract
The ability of proteins to establish highly selective interactions with a variety of (macro)molecular partners is a crucial prerequisite to the realization of their biological functions. The availability of computational tools to evaluate the impact of mutations on protein–protein binding can therefore be valuable in a wide range of industrial and biomedical applications, and help rationalize the consequences of non-synonymous single-nucleotide polymorphisms. BeAtMuSiC (http://babylone.ulb.ac.be/beatmusic) is a coarse-grained predictor of the changes in binding free energy induced by point mutations. It relies on a set of statistical potentials derived from known protein structures, and combines the effect of the mutation on the strength of the interactions at the interface, and on the overall stability of the complex. The BeAtMuSiC server requires as input the structure of the protein–protein complex, and gives the possibility to assess rapidly all possible mutations in a protein chain or at the interface, with predictive performances that are in line with the best current methodologies.
Affiliation(s)
- Yves Dehouck
- Department of BioModelling, BioInformatics and BioProcesses, Université Libre de Bruxelles, CP165/61, Av. Fr. Roosevelt 50, 1050 Brussels, Belgium.
|
29
|
Schaefer C, Bromberg Y, Achten D, Rost B. Disease-related mutations predicted to impact protein function. BMC Genomics 2012; 13 Suppl 4:S11. [PMID: 22759649 PMCID: PMC3394413 DOI: 10.1186/1471-2164-13-s4-s11] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/17/2022] Open
Abstract
Background: Non-synonymous single nucleotide polymorphisms (nsSNPs) alter the protein sequence and can cause disease. The impact has been described by reliable experiments for relatively few mutations. Here, we study predictions for functional impact of disease-annotated mutations from OMIM, PMD and Swiss-Prot and of variants not linked to disease. Results: Most disease-causing mutations were predicted to impact protein function. More surprisingly, the raw prediction scores for disease-causing mutations were higher than the scores for the function-altering data set originally used for developing the prediction method (here SNAP). We might expect that diseases are caused by change-of-function mutations. However, it is surprising how well prediction methods developed for different purposes identify this link. Conversely, our predictions suggest that the set of nsSNPs not currently linked to diseases contains very few strong disease associations to be discovered. Conclusions: Firstly, annotations of disease-causing nsSNPs are on average so reliable that they can be used as proxies for functional impact. Secondly, disease-causing nsSNPs can be identified very well by methods that predict the impact of mutations on protein function. This implies that the existing prediction methods provide a very good means of choosing a set of suspect SNPs relevant for disease.
Affiliation(s)
- Christian Schaefer
- Bioinformatics-i12, Informatics, Technical University Munich, Boltzmannstrasse 3, Garching/Munich, Germany.
|
30
|
Gray VE, Kukurba KR, Kumar S. Performance of computational tools in evaluating the functional impact of laboratory-induced amino acid mutations. Bioinformatics 2012; 28:2093-6. [PMID: 22685075 PMCID: PMC3413386 DOI: 10.1093/bioinformatics/bts336] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 11/14/2022] Open
Abstract
Summary: Site-directed mutagenesis is frequently used by scientists to investigate the functional impact of amino acid mutations in the laboratory. Over 10 000 such laboratory-induced mutations have been reported in the UniProt database along with the outcomes of functional assays. Here, we explore the performance of state-of-the-art computational tools (Condel, PolyPhen-2 and SIFT) in correctly annotating the function-altering potential of 10 913 laboratory-induced mutations from 2372 proteins. We find that computational tools are very successful in diagnosing laboratory-induced mutations that elicit significant functional change in the laboratory (up to 92% accuracy). However, these tools consistently fail to correctly annotate laboratory-induced mutations that show no functional impact in the laboratory assays. Therefore, the overall accuracy of computational tools for laboratory-induced mutations is much lower than that observed for the naturally occurring human variants. We tested and rejected the possibilities that the preponderance of changes to alanine and the presence of multiple base-pair mutations in the laboratory were the reasons for the observed discordance between the performance of computational tools for natural and laboratory mutations. Instead, we discover that the laboratory-induced mutations occur predominantly at the highly conserved positions in proteins, where the computational tools have the lowest accuracy of correct prediction for variants that do not impact function (neutral). Therefore, the comparisons of experimental-profiling results with those from computational predictions need to be sensitive to the evolutionary conservation of the positions harboring the amino acid change. Contact: s.kumar@asu.edu
Affiliation(s)
- Vanessa E Gray
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA
|
31
|
Binding site characterization of G protein-coupled receptor by alanine-scanning mutagenesis using molecular dynamics and binding free energy approach: application to C-C chemokine receptor-2 (CCR2). Mol Divers 2012; 16:401-13. [PMID: 22528270 DOI: 10.1007/s11030-012-9368-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/23/2011] [Accepted: 03/16/2012] [Indexed: 10/28/2022]
Abstract
The C-C chemokine receptor 2 (CCR2) has been shown to be a multidrug target in many diseases such as diabetes, inflammation and AIDS, but rational drug design on this target is still lagging behind, as information on the exact binding site and the crystal structure is not yet available. Therefore, for successful structure-based drug design, an accurate receptor model in the ligand-bound state is necessary. In this study, binding-site residues of CCR2 were determined using in silico alanine scanning mutagenesis and the interactions between TAK-779 and the developed homology model of CCR2. Molecular dynamics simulation and the Molecular Mechanics-Generalized Born Surface Area method were applied to calculate the binding free energy difference between the template and mutated protein. Upon mutating 29 amino acids of the template protein and comparing the binding free energies with that of the wild type, six residues were identified as putative hot spots of CCR2.
|
32
|
Koes DR, Camacho CJ. PocketQuery: protein-protein interaction inhibitor starting points from protein-protein interaction structure. Nucleic Acids Res 2012; 40:W387-92. [PMID: 22523085 PMCID: PMC3394328 DOI: 10.1093/nar/gks336] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Indexed: 12/22/2022] Open
Abstract
PocketQuery (http://pocketquery.csb.pitt.edu) is a web interface for exploring the properties of protein–protein interaction (PPI) interfaces with a focus on the discovery of promising starting points for small-molecule design. PocketQuery rapidly focuses attention on the key interacting residues of an interaction using a ‘druggability’ score that provides an estimate of how likely the chemical mimicry of a cluster of interface residues would result in a small-molecule inhibitor of an interaction. These residue clusters are chemical starting points that can be seamlessly exported to a pharmacophore-based drug discovery workflow. PocketQuery is updated on a weekly basis to contain all applicable PPI structures deposited in the Protein Data Bank and allows users to upload their own custom structures for analysis.
Affiliation(s)
- David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh, 3501 Fifth Avenue, Pittsburgh, PA 15260, USA.
|
33
|
Adams DR, Sincan M, Fuentes Fajardo K, Mullikin JC, Pierson TM, Toro C, Boerkoel CF, Tifft CJ, Gahl WA, Markello TC. Analysis of DNA sequence variants detected by high-throughput sequencing. Hum Mutat 2012; 33:599-608. [PMID: 22290882 DOI: 10.1002/humu.22035] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 08/11/2011] [Accepted: 12/02/2011] [Indexed: 12/18/2022]
Abstract
The Undiagnosed Diseases Program at the National Institutes of Health uses high-throughput sequencing (HTS) to diagnose rare and novel diseases. HTS techniques generate large numbers of DNA sequence variants, which must be analyzed and filtered to find candidates for disease causation. Despite the publication of an increasing number of successful exome-based projects, there has been little formal discussion of the analytic steps applied to HTS variant lists. We present the results of our experience with over 30 families for whom HTS sequencing was used in an attempt to find clinical diagnoses. For each family, exome sequence was augmented with high-density SNP-array data. We present a discussion of the theory and practical application of each analytic step and provide example data to illustrate our approach. The article is designed to provide an analytic roadmap for variant analysis, thereby enabling a wide range of researchers and clinical genetics practitioners to perform direct analysis of HTS data for their patients and projects.
Affiliation(s)
- David R Adams
- NIH Undiagnosed Diseases Program, NIH, Bethesda, Maryland, USA.
|
34
|
Koes DR, Camacho CJ. Small-molecule inhibitor starting points learned from protein-protein interaction inhibitor structure. Bioinformatics 2011; 28:784-91. [PMID: 22210869 PMCID: PMC3307105 DOI: 10.1093/bioinformatics/btr717] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 01/09/2023]
Abstract
MOTIVATION: Protein-protein interactions (PPIs) are a promising, but challenging target for pharmaceutical intervention. One approach for addressing these difficult targets is the rational design of small-molecule inhibitors that mimic the chemical and physical properties of small clusters of key residues at the protein-protein interface. The identification of appropriate clusters of interface residues provides starting points for inhibitor design and supports an overall assessment of the susceptibility of PPIs to small-molecule inhibition. RESULTS: We extract Small-Molecule Inhibitor Starting Points (SMISPs) from protein-ligand and protein-protein complexes in the Protein Data Bank (PDB). These SMISPs are used to train two distinct classifiers, a support vector machine and an easy-to-interpret exhaustive rule classifier. Both classifiers achieve better than 70% leave-one-complex-out cross-validation accuracy and correctly predict SMISPs of known PPI inhibitors not in the training set. A PDB-wide analysis suggests that nearly half of all PPIs may be susceptible to small-molecule inhibition.
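Leave-one-complex-out cross-validation holds out every sample from one complex at a time, so that no complex contributes to both training and evaluation in the same fold. A minimal Python sketch of the split logic follows; the complex identifiers are hypothetical, and the classifier training step is deliberately left out.

```python
def leave_one_complex_out(complex_ids):
    """Yield (held_out_complex, train_indices, test_indices) for each complex."""
    for held_out in sorted(set(complex_ids)):
        train = [i for i, cid in enumerate(complex_ids) if cid != held_out]
        test = [i for i, cid in enumerate(complex_ids) if cid == held_out]
        yield held_out, train, test


# Hypothetical sample-to-complex assignment (illustrative only).
sample_complexes = ["cplxA", "cplxA", "cplxB", "cplxC", "cplxC"]

splits = list(leave_one_complex_out(sample_complexes))
for held_out, train, test in splits:
    print(held_out, "train:", train, "test:", test)
```

Grouping by complex rather than by individual sample avoids scoring a model on residue clusters from a complex it has already seen during training.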
Affiliation(s)
- David Ryan Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA.
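The leave-one-complex-out cross-validation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each training example carries the identifier of the complex it came from; the function names and the `fit`/`predict` callables are hypothetical placeholders, not the authors' code or any library's API.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices), one split per group."""
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    for held_out, test_indices in by_group.items():
        # Every example from the held-out complex is excluded from training.
        train_indices = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_indices, test_indices


def cross_validated_accuracy(features, labels, groups, fit, predict):
    """Train on all complexes but one, test on the held-out one, pool the results."""
    correct = 0
    for _, train, test in leave_one_group_out(groups):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in test)
    return correct / len(labels)
```

Splitting by complex rather than by individual example keeps residues from the same complex out of both the training and the test set at once, which is the usual reason for grouping the folds this way.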
35
Mencarelli M, Dubern B, Alili R, Maestrini S, Benajiba L, Tagliaferri M, Galan P, Rinaldi M, Simon C, Tounian P, Hercberg S, Liuzzi A, Di Blasio AM, Clement K. Rare melanocortin-3 receptor mutations with in vitro functional consequences are associated with human obesity. Hum Mol Genet 2010; 20:392-9. [PMID: 21047972 DOI: 10.1093/hmg/ddq472] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022] Open
Abstract
In contrast to the melanocortin 4 receptor, the possible role of the melanocortin 3 receptor (MC3R) in regulating body weight is still debated. We have previously reported three mutations in the MC3R gene showing association with human obesity, but these results were not confirmed in a study of severely obese North American adults. In this study, we evaluated the entire coding region of MC3R in 839 severely obese subjects and 967 lean controls of Italian and French origin. In vitro functional analysis of the mutations detected was also performed. The total prevalence of rare MC3R variants was not significantly different in obese subjects when compared with controls (P = 0.18). However, the prevalence of mutations with functional alterations was significantly higher in the obese group (P = 0.022). In conclusion, the results of this large study demonstrate that, in the populations studied, functionally significant MC3R variants are associated with obesity, supporting the current hypothesis that rare variants might have a stronger impact on individual susceptibility to weight gain. They also underline the importance of detailed in vitro functional studies in order to prove the pathogenic effect of such variants. Further investigations in larger cohorts will be needed in order to define the specific phenotypic characteristics potentially correlated with reduced MC3R signalling.
Affiliation(s)
- Monica Mencarelli
- Molecular Biology Laboratory, Istituto Auxologico Italiano, Verbania, Italy
36
Bryant DH, Moll M, Chen BY, Fofanov VY, Kavraki LE. Analysis of substructural variation in families of enzymatic proteins with applications to protein function prediction. BMC Bioinformatics 2010; 11:242. [PMID: 20459833 PMCID: PMC2885373 DOI: 10.1186/1471-2105-11-242] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 09/13/2009] [Accepted: 05/11/2010] [Indexed: 12/02/2022] Open
Abstract
Background: Structural variations caused by a wide range of physico-chemical and biological sources directly influence the function of a protein. For enzymatic proteins, the structure and chemistry of the catalytic binding site residues can be loosely defined as a substructure of the protein. Comparative analysis of drug-receptor substructures across and within species has been used for lead evaluation. Substructure-level similarity between the binding sites of functionally similar proteins has also been used to identify instances of convergent evolution among proteins. In functionally homologous protein families, shared chemistry and geometry at catalytic sites provide a common, local point of comparison among proteins that may differ significantly at the sequence, fold, or domain topology levels.
Results: This paper describes two key results that can be used separately or in combination for protein function analysis. The Family-wise Analysis of SubStructural Templates (FASST) method uses all-against-all substructure comparison to determine Substructural Clusters (SCs). SCs characterize the binding site substructural variation within a protein family. In this paper we focus on examples of automatically determined SCs that can be linked to phylogenetic distance between family members, segregation by conformation, and organization by homology among convergent protein lineages. The Motif Ensemble Statistical Hypothesis (MESH) framework constructs a representative motif for each protein cluster among the SCs determined by FASST to build motif ensembles that are shown through a series of function prediction experiments to improve the function prediction power of existing motifs.
Conclusions: FASST contributes a critical feedback and assessment step to existing binding site substructure identification methods and can be used for the thorough investigation of structure-function relationships. The application of MESH allows for an automated, statistically rigorous procedure for incorporating structural variation data into protein function prediction pipelines. Our work provides an unbiased, automated assessment of the structural variability of identified binding site substructures among protein structure families and a technique for exploring the relation of substructural variation to protein function. As available proteomic data continues to expand, the techniques proposed will be indispensable for the large-scale analysis and interpretation of structural data.
Affiliation(s)
- Drew H Bryant
- Department of Computer Science, Rice University, Houston, TX, USA
37
Izarzugaza JMG, Redfern OC, Orengo CA, Valencia A. Cancer-associated mutations are preferentially distributed in protein kinase functional sites. Proteins 2010; 77:892-903. [PMID: 19626714 DOI: 10.1002/prot.22512] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 02/01/2023]
Abstract
Protein kinases are a superfamily involved in many crucial cellular processes, including signal transmission and regulation of the cell cycle. As a consequence of this role, kinases have been reported to be associated with many types of cancer and are considered potential therapeutic targets. We analyzed the distribution of pathogenic somatic point mutations (drivers) in the protein kinase superfamily with respect to their location in the protein, such as in structurally, evolutionarily, and functionally relevant regions. We find these driver mutations are more clearly associated with key protein features than other somatic mutations (passengers) that have not been directly linked to tumor progression. This observation fits well with the expected implication of alterations in protein kinase function in cancer pathogenicity. To explain the relevance of the detected association of cancer driver mutations at the molecular level in the human kinome, we compare these with genetically inherited mutations (SNPs). We find that the subset of nonsynonymous SNPs that are associated with disease, but sufficiently mild to be widespread in the population, tend to avoid those key protein regions, where they could be more detrimental to protein function. This tendency contrasts with the one detected for cancer-associated driver mutations, which seem to be more directly implicated in the alteration of protein function. The detailed analysis of protein kinase groups and a number of relevant examples confirm the relation between cancer-associated driver mutations and key regions for protein kinase structure and function.
Affiliation(s)
- Jose M G Izarzugaza
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), C/Melchor Fernández Almagro 3, Madrid E28029, Spain
38
Lise S, Archambeau C, Pontil M, Jones DT. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods. BMC Bioinformatics 2009; 10:365. [PMID: 19878545 PMCID: PMC2777894 DOI: 10.1186/1471-2105-10-365] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Received: 02/16/2009] [Accepted: 10/30/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino acids are systematically mutated to alanine and changes in free energy of binding (ΔΔG) measured. Several experiments have shown that protein-protein interactions are critically dependent on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the free energy of binding and if mutated they can disrupt the interaction. As mutagenesis studies require significant experimental efforts, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition.
RESULTS: We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to optimally combine and integrate them, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and it compares favourably to other available methods. In particular we find the best performances using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as those residues for which ΔΔG ≥ 2 kcal/mol, our method achieves a precision of 56% and a recall of 65%.
CONCLUSION: We have developed a hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although so far these two types of approaches have mainly been applied separately to biomolecular problems, the results of our investigation indicate that there are substantial benefits to be gained by their integration.
Affiliation(s)
- Stefano Lise
- Department of Computer Science, University College London, UK.
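The hot spot definition and the precision and recall figures quoted in the abstract above can be made concrete with a small sketch. Only the 2 kcal/mol cutoff comes from the abstract; the function names are hypothetical, and callers supply their own measured ΔΔG values and predictions.

```python
HOT_SPOT_THRESHOLD = 2.0  # kcal/mol, the cutoff quoted in the abstract


def is_hot_spot(ddg, threshold=HOT_SPOT_THRESHOLD):
    """A residue counts as a hot spot when its measured ddG meets the cutoff."""
    return ddg >= threshold


def precision_recall(measured_ddg, predicted_hot):
    """Compare predicted hot spots with labels derived from measured ddG values."""
    actual = [is_hot_spot(value) for value in measured_ddg]
    tp = sum(a and p for a, p in zip(actual, predicted_hot))
    fp = sum((not a) and p for a, p in zip(actual, predicted_hot))
    fn = sum(a and (not p) for a, p in zip(actual, predicted_hot))
    # Precision: fraction of predicted hot spots that are real.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of real hot spots that were predicted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Read this way, a precision of 56% means that slightly more than half of the residues flagged as hot spots meet the experimental cutoff, and a recall of 65% means that about two thirds of the experimental hot spots are recovered.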
39
Bromberg Y, Overton J, Vaisse C, Leibel RL, Rost B. In silico mutagenesis: a case study of the melanocortin 4 receptor. FASEB J 2009; 23:3059-69. [PMID: 19417090 PMCID: PMC2735358 DOI: 10.1096/fj.08-127530] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 12/27/2022]
Abstract
The melanocortin 4 receptor (MC4R) is a G-protein-coupled receptor (GPCR) and a key molecule in the regulation of energy homeostasis. At least 159 substitutions in the coding region of human MC4R (hMC4R) have been described experimentally; over 80 of those occur naturally, and many have been implicated in obesity. However, assessment of the presumably functionally essential residues remains incomplete. Here we have performed a complete in silico mutagenesis analysis to assess the functional essentiality of all possible nonnative point mutants in the entire hMC4R protein (332 residues). We applied SNAP, which is a method for quantifying functional consequences of single amino acid (AA) substitutions, to calculate the effects of all possible substitutions at each position in the hMC4R AA sequence. We compiled a mutability score that reflects the degree to which a particular residue is likely to be functionally important. We performed the same experiment for a paralogous human melanocortin receptor (hMC1R) and a mouse orthologue (mMC4R) in order to compare computational evaluations of highly related sequences. Three results are most salient: 1) our predictions largely agree with the available experimental annotations; 2) this analysis identified several AAs that are likely to be functionally critical, but have not yet been studied experimentally; and 3) the differential analysis of the receptors implicates a number of residues as specifically important to MC4Rs vs. other GPCRs, such as hMC1R.
Affiliation(s)
- Yana Bromberg
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
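The exhaustive in silico mutagenesis described in the abstract above amounts to enumerating 19 nonnative substitutions at each of the 332 positions. The sketch below shows that enumeration with a placeholder per-position score. The scoring rule (fraction of substitutions a predictor calls non-neutral) is an illustrative assumption, not the mutability score defined in the paper, and `is_non_neutral` is a hypothetical stand-in for a predictor such as SNAP.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def point_mutants(sequence):
    """Yield (position, native, substitute) for every nonnative single substitution."""
    for position, native in enumerate(sequence, start=1):
        for substitute in AMINO_ACIDS:
            if substitute != native:
                yield position, native, substitute


def mutability_scores(sequence, is_non_neutral):
    """Per-position fraction of substitutions the predictor flags as non-neutral.

    is_non_neutral(position, native, substitute) -> bool is supplied by the
    caller; this scoring rule is an assumption made for illustration only.
    """
    scores = {}
    for position, native in enumerate(sequence, start=1):
        substitutes = [aa for aa in AMINO_ACIDS if aa != native]
        hits = sum(is_non_neutral(position, native, aa) for aa in substitutes)
        scores[position] = hits / len(substitutes)
    return scores
```

For a 332-residue protein made of standard amino acids, `point_mutants` yields 332 × 19 = 6308 variants, which gives a sense of the number of predictions behind a full scan.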
40
Won HH, Kim JW. In Silico Functional Assessment of Sequence Variations: Predicting Phenotypic Functions of Novel Variations. Genomics Inform 2008. [DOI: 10.5808/gi.2008.6.4.166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
41
Calton MA, Ersoy BA, Zhang S, Kane JP, Malloy MJ, Pullinger CR, Bromberg Y, Pennacchio LA, Dent R, McPherson R, Ahituv N, Vaisse C. Association of functionally significant Melanocortin-4 but not Melanocortin-3 receptor mutations with severe adult obesity in a large North American case-control study. Hum Mol Genet 2008; 18:1140-7. [PMID: 19091795 DOI: 10.1093/hmg/ddn431] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.7] [Indexed: 01/13/2023] Open
Abstract
Functionally significant heterozygous mutations in the Melanocortin-4 receptor (MC4R) have been implicated in 2.5% of early onset obesity cases in European cohorts. The role of mutations in this gene in severely obese adults, particularly in smaller North American patient cohorts, has been less convincing. More recently, it has been proposed that mutations in a phylogenetically and physiologically related receptor, the Melanocortin-3 receptor (MC3R), could also be a cause of severe human obesity. The objectives of this study were to determine if mutations impairing the function of MC4R or MC3R were associated with severe obesity in North American adults. We studied MC4R and MC3R mutations detected in a total of 1821 adults (889 severely obese and 932 lean controls) from two cohorts. We systematically and comparatively evaluated the functional consequences of all mutations found in both MC4R and MC3R. The total prevalence of rare MC4R variants in severely obese North American adults was 2.25% (CI(95%): 1.44-3.47) compared with 0.64% (CI(95%): 0.26-1.43) in lean controls (P < 0.005). After classification of functional consequence, the prevalence of MC4R mutations with functional alterations was significantly greater when compared with controls (P < 0.005). In contrast, the prevalence of rare MC3R variants was not significantly increased in severely obese adults [0.67% (CI(95%): 0.27-1.50) versus 0.32% (CI(95%): 0.06-0.99)] (P = 0.332). Our results confirm that mutations in MC4R are a significant cause of severe obesity, extending this finding to North American adults. However, our data suggest that MC3R mutations are not associated with severe obesity in this population.
Affiliation(s)
- Melissa A Calton
- Diabetes Center, University of California San Francisco, San Francisco, CA 94143, USA
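The prevalence figures in the abstract above can be related to the cohort sizes with a short sketch. The carrier counts used in the usage note (20 of 889 obese adults, 6 of 932 controls) are back-calculated from the reported percentages and are an assumption, since the abstract does not state them. The Wilson score interval is a swapped-in, generic method; the abstract does not say how its intervals were computed, so the output need not match the published values.

```python
import math


def prevalence_percent(carriers, total):
    """Carrier prevalence as a percentage of the cohort."""
    return 100.0 * carriers / total


def wilson_interval(carriers, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (as fractions)."""
    p = carriers / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width
```

With the assumed counts, `prevalence_percent(20, 889)` and `prevalence_percent(6, 932)` round to the 2.25% and 0.64% quoted for MC4R variants; `wilson_interval(20, 889)` then gives one plausible interval around the first figure.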
42
Bromberg Y, Yachdav G, Rost B. SNAP predicts effect of mutations on protein function. Bioinformatics 2008; 24:2397-8. [PMID: 18757876 PMCID: PMC2562009 DOI: 10.1093/bioinformatics/btn435] [Citation(s) in RCA: 198] [Impact Index Per Article: 11.6] [Received: 05/29/2008] [Revised: 08/10/2008] [Accepted: 08/14/2008] [Indexed: 11/13/2022] Open
Abstract
Many non-synonymous single nucleotide polymorphisms (nsSNPs) in humans are suspected to impact protein function. Here, we present a publicly available server implementation of the method SNAP (screening for non-acceptable polymorphisms) that predicts the functional effects of single amino acid substitutions. SNAP identifies over 80% of the non-neutral mutations at 77% accuracy and over 76% of the neutral mutations at 80% accuracy at its default threshold. Each prediction is associated with a reliability index that correlates with accuracy and thereby enables experimentalists to zoom into the most promising predictions.
Affiliation(s)
- Yana Bromberg
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.