1
Cheng C, McCauley BS, Matulionis N, Vogelauer M, Camacho D, Christofk HR, Dang W, Irwin NAT, Kurdistani SK. Histone H3 cysteine 110 enhances iron metabolism and modulates replicative life span in Saccharomyces cerevisiae. Sci Adv 2025; 11:eadv4082. [PMID: 40215312 PMCID: PMC11988410 DOI: 10.1126/sciadv.adv4082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2024] [Accepted: 03/06/2025] [Indexed: 04/14/2025]
Abstract
The discovery of histone H3 copper reductase activity provides a novel metabolic framework for understanding the functions of core histone residues, which, unlike N-terminal residues, have remained largely unexplored. We previously demonstrated that histone H3 cysteine 110 (H3C110) contributes to cupric (Cu2+) ion binding and its reduction to the cuprous (Cu1+) form. However, this residue is absent in Saccharomyces cerevisiae, raising questions about its evolutionary and functional significance. Here, we report that H3C110 has been lost in many fungal lineages despite near-universal conservation across eukaryotes. Introduction of H3C110 into S. cerevisiae increased intracellular Cu1+ levels and ameliorated the iron homeostasis defects caused by inactivation of the Cup1 metallothionein or glutathione depletion. Enhanced histone copper reductase activity also extended replicative life span under oxidative growth conditions but reduced it under fermentative conditions. Our findings suggest that a trade-off between histone copper reductase activity, iron metabolism, and life span may underlie the loss or retention of H3C110 across eukaryotes.
Affiliation(s)
- Chen Cheng
- Department of Biological Chemistry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Brenna S. McCauley
- Huffington Center on Aging, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Nedas Matulionis
- Department of Biological Chemistry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Maria Vogelauer
- Department of Biological Chemistry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Dimitrios Camacho
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Heather R. Christofk
- Department of Biological Chemistry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- UCLA Jonsson Comprehensive Cancer Center, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Weiwei Dang
- Huffington Center on Aging, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Nicholas A. T. Irwin
- Gregor Mendel Institute (GMI), Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria
- Siavash K. Kurdistani
- Department of Biological Chemistry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA

2
Yin J, Waman VP, Sen N, Firdaus-Raih M, Lam SD, Orengo C. Understanding the structural and functional diversity of ATP-PPases using protein domains and functional families in the CATH database. Structure 2025; 33:613-631.e6. [PMID: 39826548 DOI: 10.1016/j.str.2024.12.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/18/2024] [Accepted: 12/19/2024] [Indexed: 01/22/2025]
Abstract
ATP-pyrophosphatases (ATP-PPases) are the most primordial lineage of the large and diverse HUP (HIGH-motif proteins, universal stress proteins, ATP-pyrophosphatases) superfamily. There are four different ATP-PPase substrate-specificity groups (SSGs), and members of each group show considerable sequence variation across the domains of life despite sharing the same catalytic function. Taking advantage of the expanded number of ATP-PPase domain structures made available by AlphaFold2 (AF2) structure prediction, we have characterized the two most populated ATP-PPase SSGs, the nicotinamide adenine dinucleotide synthases (NADSs) and the guanosine monophosphate synthases (GMPSs). Local structural and sequence comparisons of NADS and GMPS identified taxonomic-group-specific functional motifs. As GMPS and NADS are potential drug targets in pathogenic microorganisms, including Mycobacterium tuberculosis, the bacterial GMPS- and NADS-specific functional motifs reported in this study may contribute to antibacterial drug development.
Affiliation(s)
- Jialin Yin
- Department of Structural and Molecular Biology, University College London, London, UK
- Vaishali P Waman
- Department of Structural and Molecular Biology, University College London, London, UK
- Neeladri Sen
- Department of Structural and Molecular Biology, University College London, London, UK
- Mohd Firdaus-Raih
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia
- Su Datt Lam
- Department of Structural and Molecular Biology, University College London, London, UK; Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia
- Christine Orengo
- Department of Structural and Molecular Biology, University College London, London, UK

3
Hernández Berthet AS, Aptekmann AA, Tejero J, Sánchez IE, Noguera ME, Roman EA. Associating protein sequence positions with the modulation of quantitative phenotypes. Arch Biochem Biophys 2024; 755:109979. [PMID: 38583654 DOI: 10.1016/j.abb.2024.109979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 03/11/2024] [Accepted: 03/27/2024] [Indexed: 04/09/2024]
Abstract
Although protein sequences encode the information for folding and function, understanding the link between them is not an easy task. Unfortunately, predicting how specific amino acids contribute to these features remains considerably difficult. Here, we developed a simple algorithm that finds positions in a protein sequence with the potential to modulate the studied quantitative phenotypes. Starting from a few hundred protein sequences, we perform a multiple sequence alignment, obtain the pairwise differences for both the sequence (per position) and the observed phenotypes, and calculate the correlation between these two quantities. We tested our methodology on four cases: archaeal adenylate kinases and the organisms' optimal growth temperatures, microbial rhodopsins and their maximal absorption wavelengths, mammalian myoglobins and their muscle concentrations, and the inhibition of HIV protease clinical isolates by two different molecules. Depending on the case, we found 3 to 10 positions tightly associated with the phenotype. These correlations appear for individual positions, but they improve when the most correlated positions are analyzed jointly. Notably, we performed phenotype predictions using a simple linear model that links per-position divergences to differences in the observed phenotypes. The predictions are comparable to those of state-of-the-art methods, which in most cases are far more complex. All calculations have a very low information cost, since the only input needed is a multiple sequence alignment of protein sequences with their associated quantitative phenotypes. The diversity of the explored systems makes our work a valuable tool for finding sequence determinants of biological activity modulation and for predicting functional features of uncharacterized members of a protein family.
Affiliation(s)
- Ayelén S Hernández Berthet
- Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Intendente Güiraldes 2160 - Ciudad Universitaria, 1428EGA, C.A.B.A., Argentina.
- Ariel A Aptekmann
- Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Buenos Aires, Argentina; Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, 08873, USA; Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, 08901, USA.
- Jesús Tejero
- Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, 15260, USA; Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
- Ignacio E Sánchez
- Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Buenos Aires, Argentina.
- Martín E Noguera
- Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química y Fisicoquímica Biológicas Dr. Alejandro Paladini, Junín 956, 1113AAD, C.A.B.A., Argentina; Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Saenz Peña 352, B1876BXD, Bernal, Argentina.
- Ernesto A Roman
- Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Intendente Güiraldes 2160 - Ciudad Universitaria, 1428EGA, C.A.B.A., Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química y Fisicoquímica Biológicas Dr. Alejandro Paladini, Junín 956, 1113AAD, C.A.B.A., Argentina.

4
McWhite CD, Armour-Garb I, Singh M. Leveraging protein language models for accurate multiple sequence alignments. Genome Res 2023; 33:1145-1153. [PMID: 37414576 PMCID: PMC10538487 DOI: 10.1101/gr.277675.123] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/06/2023] [Accepted: 06/29/2023] [Indexed: 07/08/2023]
Abstract
Multiple sequence alignment (MSA) is a critical step in the study of protein sequence and function. Typically, MSA algorithms progressively align pairs of sequences and combine these alignments with the aid of a guide tree. These alignment algorithms use scoring systems based on substitution matrices to measure amino acid similarities. Although successful, standard methods struggle on sets of proteins with low sequence identity: the so-called twilight zone of protein alignment. For these difficult cases, another source of information is needed. Protein language models are a powerful new approach that leverages massive sequence data sets to produce high-dimensional contextual embeddings for each amino acid in a sequence. These embeddings have been shown to reflect physicochemical and higher-order structural and functional attributes of amino acids within proteins. Here, we present a novel approach to MSA, based on clustering and ordering amino acid contextual embeddings. Our method for aligning semantically consistent groups of proteins circumvents the need for many standard components of MSA algorithms, avoiding initial guide tree construction, intermediate pairwise alignments, gap penalties, and substitution matrices. The added information from contextual embeddings leads to higher-accuracy alignments for structurally similar proteins with low amino acid similarity. We anticipate that protein language models will become a fundamental component of the next generation of algorithms for generating MSAs.
Affiliation(s)
- Claire D McWhite
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Isabel Armour-Garb
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Department of Computer Science, Princeton University, Princeton, New Jersey 08544, USA
- Mona Singh
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Department of Computer Science, Princeton University, Princeton, New Jersey 08544, USA

5
Adeyelu T, Bordin N, Waman VP, Sadlej M, Sillitoe I, Moya-Garcia AA, Orengo CA. KinFams: De-Novo Classification of Protein Kinases Using CATH Functional Units. Biomolecules 2023; 13:277. [PMID: 36830646 PMCID: PMC9953599 DOI: 10.3390/biom13020277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 02/05/2023]
Abstract
Protein kinases are important targets for treating human disorders, and they are the second most targeted protein family after G-protein-coupled receptors. Several resources classify kinases into evolutionary families (based on sequence homology); however, very few systematically classify functional families (FunFams) comprising evolutionary relatives that share similar functional properties. We have developed the FunFam-MARC (Multidomain ARchitecture-based Clustering) protocol, which uses the multidomain architectures of protein kinases and specificity-determining residues for functional family classification. FunFam-MARC predicts 2210 kinase functional families (KinFams), which have increased functional coherence, in terms of EC annotations, compared with the widely used KinBase classification. Our protocol provides a comprehensive classification of kinase sequences from >10,000 organisms. We associate human KinFams with diseases and drugs and identify 28 druggable human KinFams, i.e., KinFams enriched in clinically approved drugs. Since relatives in the same druggable KinFam tend to be structurally conserved, including the drug-binding site, these KinFams may be valuable for shortlisting therapeutic targets. Information on the human KinFams and associated 3D structures from AlphaFold2 is provided via our CATH FTP website and Zenodo. This includes a representative domain structure for each KinFam together with information on any available drug compounds. For 32% of the KinFams, we provide information on highly conserved residue sites that may be associated with specificity.
Affiliation(s)
- Tolulope Adeyelu
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Department of Comparative Biomedical Science, Louisiana State University, Baton Rouge, LA 70803, USA
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Vaishali P. Waman
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Marta Sadlej
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Aurelio A. Moya-Garcia
- Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, 29071 Málaga, Spain
- Laboratorio de Biología Molecular del Cáncer, Centro de Investigaciones Médico-Sanitarias (CIMES), Universidad de Málaga, 29071 Málaga, Spain
- Christine A. Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK

6
Timonina DS, Suplatov DA. Analysis of Multiple Protein Alignments Using 3D-Structural Information on the Orientation of Amino Acid Side-Chains. Mol Biol 2022. [DOI: 10.1134/s0026893322040136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

7
McGreig JE, Uri H, Antczak M, Sternberg MJE, Michaelis M, Wass MN. 3DLigandSite: structure-based prediction of protein-ligand binding sites. Nucleic Acids Res 2022; 50:W13-W20. [PMID: 35412635 PMCID: PMC9252821 DOI: 10.1093/nar/gkac250] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 02/09/2022] [Revised: 03/13/2022] [Accepted: 04/03/2022] [Indexed: 01/13/2023]
Abstract
3DLigandSite is a web tool for the prediction of ligand-binding sites in proteins. Here, we report a significant update since the first release of 3DLigandSite in 2010. The overall methodology remains the same, with candidate binding sites in proteins inferred using known binding sites in related protein structures as templates. However, the initial structural modelling step now uses the newly available structures from the AlphaFold database or alternatively Phyre2 when AlphaFold structures are not available. Further, a sequence-based search using HHSearch has been introduced to identify template structures with bound ligands that are used to infer the ligand-binding residues in the query protein. Finally, we introduced a machine learning element as the final prediction step, which improves the accuracy of predictions and provides a confidence score for each residue predicted to be part of a binding site. Validation of 3DLigandSite on a set of 6416 binding sites obtained 92% recall at 75% precision for non-metal binding sites and 52% recall at 75% precision for metal binding sites. 3DLigandSite is available at https://www.wass-michaelislab.org/3dligandsite. Users submit either a protein sequence or structure. Results are displayed in multiple formats including an interactive Mol* molecular visualization of the protein and the predicted binding sites.
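Figures such as "92% recall at 75% precision" rest on standard definitions. Precision is TP/(TP+FP), the fraction of predicted binding residues that truly bind. Recall is TP/(TP+FN), the fraction of true binding residues that are recovered. Quoting recall at a fixed precision usually means choosing a confidence-score cut-off that yields that precision. The minimal sketch below assumes predictions and annotations are given as sets of residue indices. This input format and the helper name `precision_recall` are hypothetical, not 3DLigandSite's output. The abstract also does not say whether counts are pooled over all 6416 sites or averaged per site.

```python
def precision_recall(predicted, actual):
    """Residue-level precision and recall for one predicted binding site.

    predicted, actual: sets of residue indices (a hypothetical input format).
    """
    true_positives = len(predicted & actual)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# 3 of 4 predicted residues are correct; 3 of 5 true binding residues are found.
precision, recall = precision_recall({10, 11, 12, 40}, {10, 11, 12, 13, 14})
```

In this toy case the prediction has 75% precision but only 60% recall. Raising the score threshold typically trades recall for precision.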
Affiliation(s)
- Jake E McGreig
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Hannah Uri
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Magdalena Antczak
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Martin Michaelis
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK
- Mark N Wass
- School of Biosciences, Division of Natural Sciences, University of Kent, Canterbury, Kent CT2 7NJ, UK

8
Pascarelli S, Laurino P. Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins. PLoS Comput Biol 2022; 18:e1010016. [PMID: 35377869 PMCID: PMC9009777 DOI: 10.1371/journal.pcbi.1010016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2022] [Revised: 04/14/2022] [Accepted: 03/12/2022] [Indexed: 11/25/2022]
Abstract
Connecting protein sequence to function is becoming increasingly relevant as high-throughput sequencing studies accumulate large amounts of genomic data. To go beyond existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether their function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify “inter-paralog inversions”, i.e., sites where the relationship between ancestry and functional signal is decoupled. The amino acids at these sites are masked from recognition by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that underwent inversion after gene duplication, mostly located in the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplication with direct implications for protein functional annotation and sequence evolution. The developed method is optimized to work with large protein datasets and can be readily included in a targeted protein analysis pipeline.
Proteins are critical components of living systems because they facilitate most biological processes, such as protein synthesis, DNA replication, and chemical catalysis. Proteins are encoded by genes. During evolution, genes accumulate mutations that are translated into changes at the protein level. These mutations can be “neutral” if they do not affect protein function immediately and directly; otherwise, they can be functional if they directly modify protein function.
An event that provides an opportunity to study protein function is gene duplication, namely, the appearance of two copies of a gene encoding the same protein. One copy often retains the original function, while the other is free to diverge and specialize in a different function. This work sheds light on an alternative outcome of gene duplication that might be critical for discerning between neutral and functional mutations. By looking at 88 fish genomes, we found proteins whose sequence evolution does not follow the expected pattern of divergence after gene duplication. In these cases, the protein sequence of a subgroup of species diverges in the copy expected to retain its function, while the sequence is retained in the copy expected to diverge. We called this event “inter-paralog amino acid inversion”. Our data show that this “inversion” event is correlated with function, and its detection has to be considered when assigning protein functions.
Affiliation(s)
- Stefano Pascarelli
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, Japan
- Paola Laurino
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, Japan

9
Exploiting protein family and protein network data to identify novel drug targets for bladder cancer. Oncotarget 2022; 13:105-117. [PMID: 35035776 PMCID: PMC8758182 DOI: 10.18632/oncotarget.28175] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/01/2021] [Accepted: 12/08/2021] [Indexed: 12/11/2022]
Abstract
Bladder cancer remains one of the most common forms of cancer, yet few targeted small-molecule therapies are available for it. Here, we present a computational platform to identify new potential targets for bladder cancer therapy. Our method initially exploited a set of known driver genes for bladder cancer combined with predicted bladder cancer genes from mutationally enriched protein domain families. We enriched this initial set of genes using protein network data to identify a comprehensive set of 323 putative bladder cancer targets. Pathway and cancer hallmark analyses highlighted putative mechanisms in agreement with those previously reported for this cancer and revealed protein network modules highly enriched in potential drivers likely to be good targets for targeted therapies. Twenty-one of our potential drug targets are targeted by FDA-approved drugs for other diseases; some of them are known drivers or are already being targeted in bladder cancer (FGFR3, ERBB3, HDAC3, EGFR). A further 4 potential drug targets were identified by inheriting drug mappings across our in-house CATH domain functional families (FunFams). Our FunFam data also allowed us to identify drug targets in families that are less prone to side effects, i.e., families whose structurally similar protein domain relatives are less dispersed across the human protein network. We provide information on our novel potential cancer driver genes, together with information on the pathways, network modules and hallmarks associated with the predicted and known bladder cancer drivers, and we highlight the drivers we predict to be likely drug targets.

10
Pazos F. Prediction of Protein Sites and Physicochemical Properties Related to Functional Specificity. Bioengineering (Basel) 2021; 8:bioengineering8120201. [PMID: 34940354 PMCID: PMC8698372 DOI: 10.3390/bioengineering8120201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 11/25/2021] [Accepted: 11/29/2021] [Indexed: 11/16/2022]
Abstract
Specificity Determining Positions (SDPs) are protein sites responsible for functional specificity within a family of homologous proteins. These positions are extracted from a family’s multiple sequence alignment and complement the fully conserved positions as predictors of functional sites. SDP analysis is now routinely used to locate these specificity-related sites in protein families of biomedical or biotechnological interest, with the aim of mutating them to switch specificities or design new ones. There are many different approaches for detecting these positions in multiple sequence alignments. Nevertheless, existing methods report the potential SDPs but give no clue about the physicochemical basis of the functional specificity, which has to be inferred a posteriori by manually inspecting these positions in the alignment. In this work, a new methodology is presented that, concomitantly with the detection of the SDPs, automatically provides information on the amino acid physicochemical properties most related to the change in specificity. The new method is applied to two different multiple sequence alignments of homologues of the well-studied RasH protein, representing different cases of functional specificity, and the results are discussed in detail.
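The abstract does not spell out the algorithm. The toy sketch below illustrates only the general idea of linking a specificity change to a physicochemical property; it is not the published method. It assumes an alignment column has already been flagged as a candidate SDP and split into two subfamilies. It then asks which amino acid property scale best separates them. Several choices are assumptions: the two scales (Kyte-Doolittle hydropathy and a crude charge scale), normalization by each scale's range, and the helper name `property_shift`.

```python
from statistics import mean

# Kyte-Doolittle hydropathy scale.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Crude side-chain charge at neutral pH; His is treated as neutral (a simplification).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})
SCALES = {"hydropathy": HYDROPATHY, "charge": CHARGE}


def property_shift(column_a, column_b):
    """Rank property scales by how well they separate two subfamilies at one column.

    column_a, column_b: residues observed at the same alignment column in each subfamily.
    Returns (best scale name, dict of normalized mean differences per scale).
    """
    scores = {}
    for name, scale in SCALES.items():
        # Normalize by the scale's range so scales with different units are comparable.
        span = max(scale.values()) - min(scale.values())
        diff = abs(mean(scale[r] for r in column_a) - mean(scale[r] for r in column_b))
        scores[name] = diff / span
    best = max(scores, key=scores.get)
    return best, scores


best, scores = property_shift(["D", "E", "D"], ["K", "R", "K"])
# best == "charge": the subfamilies differ mainly in side-chain charge
```

A practical tool would use many more property scales and a statistical test rather than a raw difference of means.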
Affiliation(s)
- Florencio Pazos
- Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), c/Darwin, 3, 28049 Madrid, Spain

11
TwinCons: Conservation score for uncovering deep sequence similarity and divergence. PLoS Comput Biol 2021; 17:e1009541. [PMID: 34714829 PMCID: PMC8580257 DOI: 10.1371/journal.pcbi.1009541] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/16/2021] [Revised: 11/10/2021] [Accepted: 10/06/2021] [Indexed: 11/19/2022]
Abstract
We have developed the program TwinCons to detect noisy signals of deep ancestry of proteins or nucleic acids. As input, the program uses a composite alignment containing pre-defined groups, and it mathematically determines a 'cost' of transforming one group into the other at each position of the alignment. The output distinguishes conserved, variable and signature positions. A signature is conserved within groups but differs between groups. The method automatically detects continuous characteristic stretches (segments) within alignments. TwinCons provides a convenient representation of conserved, variable and signature positions as a single score, enabling the structural mapping and visualization of these characteristics. Structure is more conserved than sequence. TwinCons highlights alternative sequences of conserved structures. Using TwinCons, we detected highly similar segments between proteins from the translation and transcription systems. TwinCons detects conserved residues within regions of high functional importance for the ribosomal RNA (rRNA) and demonstrates that signatures are not confined to specific regions but are distributed across the rRNA structure. The ability to evaluate both nucleic acid and protein alignments allows TwinCons to be used in combined sequence and structural analysis of signatures and conservation in rRNA and in ribosomal proteins (rProteins). TwinCons detects a strong sequence conservation signal between bacterial and archaeal rProteins related by circular permutation. This conserved sequence is structurally colocalized with conserved rRNA, as indicated by TwinCons scores of rRNA alignments of bacterial and archaeal groups. This combined analysis revealed deep co-evolution of rRNA and rProteins buried within the deepest branching points in the tree of life.
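TwinCons scores each position with a substitution-based cost of transforming one group into the other. The simplified sketch below uses plain residue frequencies instead; this is an assumption, not TwinCons's scoring. It only shows how the three position classes defined above relate. A column is labeled conserved if both groups are dominated by the same residue. It is a signature if each group is dominated by a different residue. Otherwise it is variable. The 0.8 dominance threshold and the function names are hypothetical.

```python
from collections import Counter


def dominant_residue(residues):
    """Return the most common residue in a column slice and its frequency."""
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


def classify_columns(group_a, group_b, threshold=0.8):
    """Label each column of a two-group alignment (a simplified heuristic).

    conserved: both groups are dominated by the same residue
    signature: each group is dominated by its own, different residue
    variable:  at least one group is not dominated by a single residue
    """
    labels = []
    for pos in range(len(group_a[0])):
        res_a, freq_a = dominant_residue([seq[pos] for seq in group_a])
        res_b, freq_b = dominant_residue([seq[pos] for seq in group_b])
        if freq_a >= threshold and freq_b >= threshold:
            labels.append("conserved" if res_a == res_b else "signature")
        else:
            labels.append("variable")
    return labels


labels = classify_columns(["AGK", "AGR", "AGE"], ["AWD", "AWN", "AWK"])
# labels == ["conserved", "signature", "variable"]
```

A substitution-matrix cost, as TwinCons uses, would also reward groups that differ only by chemically similar residues, which this exact-match heuristic cannot capture.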

12
Etzion-Fuchs A, Todd DA, Singh M. dSPRINT: predicting DNA, RNA, ion, peptide and small molecule interaction sites within protein domains. Nucleic Acids Res 2021; 49:e78. [PMID: 33999210 PMCID: PMC8287948 DOI: 10.1093/nar/gkab356] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/12/2020] [Revised: 03/30/2021] [Accepted: 04/22/2021] [Indexed: 01/08/2023]
Abstract
Domains are instrumental in facilitating protein interactions with DNA, RNA, small molecules, ions and peptides. Identifying ligand-binding domains within sequences is a critical step in protein function annotation, and the ligand-binding properties of proteins are frequently analyzed based upon whether they contain one of these domains. To date, however, knowledge of whether and how protein domains interact with ligands has been limited to domains that have been observed in co-crystal structures; this leaves approximately two-thirds of human protein domain families uncharacterized with respect to whether and how they bind DNA, RNA, small molecules, ions and peptides. To fill this gap, we introduce dSPRINT, a novel ensemble machine learning method for predicting whether a domain binds DNA, RNA, small molecules, ions or peptides, along with the positions within it that participate in these types of interactions. In stringent cross-validation testing, we demonstrate that dSPRINT performs excellently at uncovering ligand-binding positions and domains. We also apply dSPRINT to newly characterize the molecular functions of domains of unknown function. dSPRINT's predictions can be transferred from domains to sequences, enabling predictions about the ligand-binding properties of 95% of human genes. The dSPRINT framework and its predictions for 6503 human protein domains are freely available at http://protdomain.princeton.edu/dsprint.
Affiliation(s)
- Anat Etzion-Fuchs
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA
- David A Todd
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Mona Singh
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
13
Rauer C, Sen N, Waman VP, Abbasian M, Orengo CA. Computational approaches to predict protein functional families and functional sites. Curr Opin Struct Biol 2021; 70:108-122. [PMID: 34225010 DOI: 10.1016/j.sbi.2021.05.012] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/29/2021] [Revised: 05/13/2021] [Accepted: 05/25/2021] [Indexed: 01/06/2023]
Abstract
Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.
Affiliation(s)
- Clemens Rauer
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Mahnaz Abbasian
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
14
Karakulak T, Rifaioglu AS, Rodrigues JPGLM, Karaca E. Predicting the Specificity-Determining Positions of Receptor Tyrosine Kinase Axl. Front Mol Biosci 2021; 8:658906. [PMID: 34195226 PMCID: PMC8236827 DOI: 10.3389/fmolb.2021.658906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 11/22/2022]
Abstract
Owing to its clinical significance, modulation of functionally relevant amino acids in protein-protein complexes has attracted a great deal of attention. To this end, many approaches have been proposed to predict the partner-selecting amino acid positions in evolutionarily close complexes. These approaches can be grouped into sequence-based machine learning and structure-based energy-driven methods. In this work, we assessed these methods’ ability to map the specificity-determining positions of Axl, a receptor tyrosine kinase involved in cancer progression and immune system diseases. For sequence-based predictions, we used SDPpred, Multi-RELIEF, and Sequence Harmony. For structure-based predictions, we utilized HADDOCK refinement and molecular dynamics simulations. As a result, we observed that (i) sequence-based methods overpredict partner-selecting residues of Axl and that (ii) combining Multi-RELIEF with HADDOCK-based predictions provides the key Axl residues, covered by the extensive molecular dynamics simulations. Expanding on these results, we propose that a sequence-structure-based approach is necessary to determine specificity-determining positions of Axl, which can guide the development of therapeutic molecules to combat Axl misregulation.
Affiliation(s)
- Tülay Karakulak
- Izmir Biomedicine and Genome Center, Izmir, Turkey
- Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ahmet Sureyya Rifaioglu
- Department of Electrical-Electronics Engineering, İskenderun Technical University, Hatay, Turkey
- João P G L M Rodrigues
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, United States
- Ezgi Karaca
- Izmir Biomedicine and Genome Center, Izmir, Turkey
- Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
15
Das S, Scholes HM, Sen N, Orengo C. CATH functional families predict functional sites in proteins. Bioinformatics 2021; 37:1099-1106. [PMID: 33135053 PMCID: PMC8150129 DOI: 10.1093/bioinformatics/btaa937] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/05/2020] [Revised: 09/30/2020] [Accepted: 10/27/2020] [Indexed: 01/12/2023]
Abstract
MOTIVATION Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. AVAILABILITY AND IMPLEMENTATION https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sayoni Das
- PrecisionLife Ltd., Long Hanborough, OX29 8LJ Oxford, UK
- Harry M Scholes
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
16
Pitarch B, Ranea JAG, Pazos F. Protein residues determining interaction specificity in paralogous families. Bioinformatics 2021; 37:1076-1082. [PMID: 33135068 DOI: 10.1093/bioinformatics/btaa934] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/12/2020] [Revised: 10/06/2020] [Accepted: 10/22/2020] [Indexed: 02/06/2023]
Abstract
MOTIVATION Predicting the residues controlling a protein's interaction specificity is important not only to better understand its interactions but also to design mutations aimed at fine-tuning or swapping them. RESULTS In this work, we present a methodology that combines sequence information (in the form of multiple sequence alignments) with interactome information to detect such residues in paralogous families of proteins. The interactome is used to define pairwise similarities of interaction contexts for the proteins in the alignment. The method looks for alignment positions with patterns of amino-acid changes reflecting the similarities/differences in the interaction neighborhoods of the corresponding proteins. We tested this new methodology in a large set of human paralogous families with structurally characterized interactions, and discuss in detail the results for the RasH family. We show that this approach is a better predictor of interfacial residues than both sequence conservation and an equivalent 'unsupervised' method that does not use interactome information. AVAILABILITY AND IMPLEMENTATION http://csbg.cnb.csic.es/pazos/Xdet/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Borja Pitarch
- Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
- Juan A G Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga 29071, Spain
- CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Malaga, Spain
- Florencio Pazos
- Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
17
Littmann M, Bordin N, Heinzinger M, Schütze K, Dallago C, Orengo C, Rost B. Clustering FunFams using sequence embeddings improves EC purity. Bioinformatics 2021; 37:3449-3455. [PMID: 33978744 PMCID: PMC8545299 DOI: 10.1093/bioinformatics/btab371] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/25/2021] [Revised: 04/02/2021] [Accepted: 05/11/2021] [Indexed: 12/05/2022]
Abstract
Motivation Classifying proteins into functional families can improve our understanding of protein function and can allow transferring annotations within one family. For this, functional families need to be ‘pure’, i.e., contain only proteins with identical function. Functional Families (FunFams) cluster proteins within CATH superfamilies into such groups of proteins sharing function. Of all FunFams, 11% (22 830 of 203 639) contain EC annotations and, of those, 7% (1526 of 22 830) have inconsistent functional annotations. Results We propose an approach to further cluster FunFams into functionally more consistent sub-families by encoding their sequences through embeddings. These embeddings originate from language models transferring knowledge gained from predicting missing amino acids in a sequence (ProtBERT) and have been further optimized to distinguish between proteins belonging to the same or a different CATH superfamily (PB-Tucker). Using distances between embeddings and DBSCAN to cluster FunFams and identify outliers doubled the number of pure clusters per FunFam compared to random clustering. Our approach was not limited to FunFams but also succeeded on families created using sequence similarity alone. Complementing EC annotations, we observed similar results for binding annotations. Thus, we expect an increased purity also for other aspects of function. Our results can help in generating FunFams; the resulting clusters with improved functional consistency allow more reliable inference of annotations. We expect this approach to succeed equally for any other grouping of proteins by their phenotypes. Availability and implementation Code and embeddings are available via GitHub: https://github.com/Rostlab/FunFamsClustering. Supplementary information Supplementary data are available at Bioinformatics online.
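The clustering step described above can be sketched with a minimal DBSCAN in Python. This is a generic textbook DBSCAN, not the authors' code: the 2-D points stand in for real ProtBERT/PB-Tucker embeddings, and Euclidean distance, `eps` and `min_samples` are illustrative assumptions. Points labelled -1 correspond to the outliers the abstract mentions.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: return one label per point; -1 marks noise (outliers).

    Assumes Euclidean distance; a point counts itself as a neighbour.
    """
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


# Hypothetical 2-D "embeddings": two tight groups and one outlier.
toy_embeddings = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
                  (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (10.0, 0.0)]
print(dbscan(toy_embeddings, eps=0.5, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```

For real, high-dimensional embeddings a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` (which takes the same `eps` and `min_samples` parameters) would be the more practical choice; the hand-written version is only meant to show how core points, border points and outliers arise.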
Affiliation(s)
- Maria Littmann
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Michael Heinzinger
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Konstantin Schütze
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Christian Dallago
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748 Garching, Germany
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Burkhard Rost
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany
- TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
18
Analysis of Sequence Divergence in Mammalian ABCGs Predicts a Structural Network of Residues That Underlies Functional Divergence. Int J Mol Sci 2021; 22:ijms22063012. [PMID: 33809494 PMCID: PMC8001107 DOI: 10.3390/ijms22063012] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/01/2021] [Revised: 03/08/2021] [Accepted: 03/12/2021] [Indexed: 12/17/2022]
Abstract
The five members of the mammalian G subfamily of ATP-binding cassette transporters differ greatly in their substrate specificity. Four members of the subfamily are important in lipid transport and the wide substrate specificity of one of the members, ABCG2, is of significance due to its role in multidrug resistance. To explore the origin of substrate selectivity in members 1, 2, 4, 5 and 8 of this subfamily, we have analysed the differences in conservation between members in a multiple sequence alignment of ABCG sequences from mammals. Mapping sets of residues with similar patterns of conservation onto the resolved 3D structure of ABCG2 reveals possible explanations for differences in function, via a connected network of residues from the cytoplasmic to transmembrane domains. In ABCG2, this network of residues may confer extra conformational flexibility, enabling it to transport a wider array of substrates.
19
Assessing Protein Function Through Structural Similarities with CATH. Methods Mol Biol 2021. [PMID: 32006277 DOI: 10.1007/978-1-0716-0270-6_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
The functional diversity of proteins is closely related to their differences in sequence and structure. Despite variations in functional sites, global structural similarity is a valuable source of information when assessing potential functional similarities between proteins. The CATH database contains a well-established hierarchical classification of more than 430,000 protein domain structures and nearly 95 million protein domain sequences, with integrated functional annotations for each represented family. The present chapter provides an overview of the main features of CATH with emphasis on exploiting structural similarities to obtain functional information for proteins.
20
Bradley D, Viéitez C, Rajeeve V, Selkrig J, Cutillas PR, Beltrao P. Sequence and Structure-Based Analysis of Specificity Determinants in Eukaryotic Protein Kinases. Cell Rep 2021; 34:108602. [PMID: 33440154 PMCID: PMC7809594 DOI: 10.1016/j.celrep.2020.108602] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/20/2018] [Revised: 11/03/2020] [Accepted: 12/14/2020] [Indexed: 01/04/2023]
Abstract
Protein kinases lie at the heart of cell-signaling processes and are often mutated in disease. Kinase target recognition at the active site is in part determined by a few amino acids around the phosphoacceptor residue. However, relatively little is known about how most preferences are encoded in the kinase sequence or how these preferences evolved. Here, we used alignment-based approaches to predict 30 specificity-determining residues (SDRs) for 16 preferences. These were studied with structural models and were validated by activity assays of mutant kinases. Cancer mutation data revealed that kinase SDRs are mutated more frequently than catalytic residues. We have observed that, throughout evolution, kinase specificity has been strongly conserved across orthologs but can diverge after gene duplication, as illustrated by the G protein-coupled receptor kinase family. The identified SDRs can be used to predict kinase specificity from sequence and aid in the interpretation of evolutionary or disease-related genomic variants.
Affiliation(s)
- David Bradley
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Cristina Viéitez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Vinothini Rajeeve
- Integrative Cell Signalling & Proteomics, Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Joel Selkrig
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Pedro R Cutillas
- Integrative Cell Signalling & Proteomics, Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
21
Alexander MR, Schoeder CT, Brown JA, Smart CD, Moth C, Wikswo JP, Capra JA, Meiler J, Chen W, Madhur MS. Predicting susceptibility to SARS-CoV-2 infection based on structural differences in ACE2 across species. FASEB J 2020; 34:15946-15960. [PMID: 33015868 PMCID: PMC7675292 DOI: 10.1096/fj.202001808r] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 07/23/2020] [Revised: 09/17/2020] [Accepted: 09/18/2020] [Indexed: 12/17/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of the global pandemic of coronavirus disease-2019 (COVID-19). SARS-CoV-2 is a zoonotic disease, but little is known about variations in species susceptibility that could identify potential reservoir species, animal models, and the risk to pets, wildlife, and livestock. Certain species, such as domestic cats and tigers, are susceptible to SARS-CoV-2 infection, while other species such as mice and chickens are not. Most animal species, including those in close contact with humans, have unknown susceptibility. Hence, methods to predict the infection risk of animal species are urgently needed. SARS-CoV-2 spike protein binding to angiotensin-converting enzyme 2 (ACE2) is critical for viral cell entry and infection. Here we integrate species differences in susceptibility with multiple in-depth structural analyses to identify key ACE2 amino acid positions including 30, 83, 90, 322, and 354 that distinguish susceptible from resistant species. Using differences in these residues across species, we developed a susceptibility score that predicts an elevated risk of SARS-CoV-2 infection for multiple species including horses and camels. We also demonstrate that SARS-CoV-2 is nearly optimal for binding ACE2 of humans compared to other animals, which may underlie the highly contagious transmissibility of this virus among humans. Taken together, our findings define potential ACE2 and SARS-CoV-2 residues for therapeutic targeting and identification of animal species on which to focus research and protection measures for environmental and public health.
Affiliation(s)
- Matthew R. Alexander
- Department of Medicine, Division of Cardiovascular Medicine, Vanderbilt University Medical Center (VUMC), Nashville, TN, USA
- Department of Medicine, Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
- Clara T. Schoeder
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Jacquelyn A. Brown
- Department of Physics and Astronomy, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, TN, USA
- Charles D. Smart
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
- Chris Moth
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- John P. Wikswo
- Department of Physics and Astronomy, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Institute for Integrative Biosystems Research and Education, Vanderbilt University, Nashville, TN, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- John A. Capra
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Institute for Drug Discovery, Leipzig University Medical School, Leipzig, Germany
- Wenbiao Chen
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
- Meena S. Madhur
- Department of Medicine, Division of Cardiovascular Medicine, Vanderbilt University Medical Center (VUMC), Nashville, TN, USA
- Department of Medicine, Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Institute for Infection, Immunology, and Inflammation, Nashville, TN, USA
22
Alexander MR, Schoeder CT, Brown JA, Smart CD, Moth C, Wikswo JP, Capra JA, Meiler J, Chen W, Madhur MS. Which animals are at risk? Predicting species susceptibility to Covid-19. bioRxiv 2020. [PMID: 32676592 DOI: 10.1101/2020.07.09.194563] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/28/2022]
Abstract
In only a few months, the novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a global pandemic, leaving physicians, scientists, and public health officials racing to understand, treat, and contain this zoonotic disease. SARS-CoV-2 has made the leap from animals to humans, but little is known about variations in species susceptibility that could identify potential reservoir species, animal models, and the risk to pets, wildlife, and livestock. While there is evidence that certain species, such as cats, are susceptible, the vast majority of animal species, including those in close contact with humans, have unknown susceptibility. Hence, methods to predict their infection risk are urgently needed. SARS-CoV-2 spike protein binding to angiotensin converting enzyme 2 (ACE2) is critical for viral cell entry and infection. Here we identified key ACE2 residues that distinguish susceptible from resistant species using in-depth sequence and structural analyses of ACE2 and its binding to SARS-CoV-2. Our findings have important implications for identification of ACE2 and SARS-CoV-2 residues for therapeutic targeting and identification of animal species with increased susceptibility for infection on which to focus research and protection measures for environmental and public health.
23
Sergeeva AP, Katsamba PS, Cosmanescu F, Brewer JJ, Ahlsen G, Mannepalli S, Shapiro L, Honig B. DIP/Dpr interactions and the evolutionary design of specificity in protein families. Nat Commun 2020; 11:2125. [PMID: 32358559 PMCID: PMC7195491 DOI: 10.1038/s41467-020-15981-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 10/07/2019] [Accepted: 04/06/2020] [Indexed: 01/10/2023]
Abstract
Differential binding affinities among closely related protein family members underlie many biological phenomena, including cell-cell recognition. Drosophila DIP and Dpr proteins mediate neuronal targeting in the fly through highly specific protein-protein interactions. We show here that DIPs/Dprs segregate into seven specificity subgroups defined by binding preferences between their DIP and Dpr members. We then describe a sequence-, structure- and energy-based computational approach, combined with experimental binding affinity measurements, to reveal how specificity is coded on the canonical DIP/Dpr interface. We show that binding specificity of DIP/Dpr subgroups is controlled by "negative constraints", which interfere with binding. To achieve specificity, each subgroup utilizes a different combination of negative constraints, which are broadly distributed and cover the majority of the protein-protein interface. We discuss the structural origins of negative constraints, and potential general implications for the evolutionary origins of binding specificity in multi-protein families.
Affiliation(s)
- Alina P Sergeeva
- Department of Systems Biology, Columbia University, New York, NY, USA
- Phinikoula S Katsamba
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Filip Cosmanescu
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Joshua J Brewer
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Goran Ahlsen
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Seetha Mannepalli
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Lawrence Shapiro
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Barry Honig
- Department of Systems Biology, Columbia University, New York, NY, USA
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Department of Medicine, Columbia University, New York, NY, USA
24
Marcus K, Mattos C. Water in Ras Superfamily Evolution. J Comput Chem 2020; 41:402-414. [PMID: 31483874 DOI: 10.1002/jcc.26060] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/30/2019] [Revised: 07/17/2019] [Accepted: 08/16/2019] [Indexed: 01/14/2023]
Abstract
The Ras GTPase superfamily of proteins coordinates a diverse set of cellular outcomes, including cell morphology, vesicle transport, and cell proliferation. Primary amino acid sequence analysis has identified specificity determinant positions (SDPs) that drive diversified functions specific to the Ras, Rho, Rab, and Arf subfamilies (Rojas et al. 2012, J Cell Biol 196:189-201). The inclusion of water molecules in structural and functional adaptation is likely to be a major response to the selection pressures that drive evolution, yet hydration patterns are not included in phylogenetic analysis. This article shows that conserved crystallographic water molecules coevolved with SDP residues in the differentiation of proteins within the Ras superfamily of small GTPases. The patterns of water conservation between protein subfamilies parallel those of sequence-based evolutionary trees. Thus, hydration patterns have the potential to help elucidate the functional significance of amino acid residues observed in phylogenetic analysis of homologous proteins. © 2019 Wiley Periodicals, Inc.
Affiliation(s)
- Kendra Marcus
- Department of Chemistry and Chemical Biology, Northeastern University, 360 Huntington Ave, Boston, Massachusetts 02115
- Carla Mattos
- Department of Chemistry and Chemical Biology, Northeastern University, 360 Huntington Ave, Boston, Massachusetts 02115
25
Deep Analysis of Residue Constraints (DARC): identifying determinants of protein functional specificity. Sci Rep 2020; 10:1691. [PMID: 32015389 PMCID: PMC6997377 DOI: 10.1038/s41598-019-55118-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/26/2019] [Accepted: 11/23/2019] [Indexed: 01/03/2023]
Abstract
Protein functional constraints are manifest as superfamily and functional-subgroup conserved residues, and as pairwise correlations. Deep Analysis of Residue Constraints (DARC) aids the visualization of these constraints, characterizes how they correlate with each other and with structure, and estimates statistical significance. This can identify determinants of protein functional specificity, as we illustrate for bacterial DNA clamp loader ATPases. These load ring-shaped sliding clamps onto DNA to keep polymerase attached during replication and contain one δ, three γ, and one δ’ AAA+ subunits semi-circularly arranged in the order δ-γ1-γ2-γ3-δ’. Only γ is active, though both γ and δ’ functionally influence an adjacent γ subunit. DARC identifies, as functionally-congruent features linking allosterically the ATP, DNA, and clamp binding sites: residues distinctive of γ and of γ/δ’ that mutually interact in trans, centered on the catalytic base; several γ/δ’-residues and six γ/δ’-covariant residue pairs within the DNA binding N-termini of helices α2 and α3; and γ/δ’-residues associated with the α2 C-terminus and the clamp-binding loop. Most notable is a trans-acting γ/δ’ hydroxyl group that 99% of other AAA+ proteins lack. Mutation of this hydroxyl to a methyl group impedes clamp binding and opening, DNA binding, and ATP hydrolysis—implying a remarkably clamp-loader-specific function.
26
Wang J, Yang B, Leier A, Marquez-Lago TT, Hayashida M, Rocker A, Zhang Y, Akutsu T, Chou KC, Strugnell RA, Song J, Lithgow T. Bastion6: a bioinformatics approach for accurate prediction of type VI secreted effectors. Bioinformatics 2019; 34:2546-2555. [PMID: 29547915 DOI: 10.1093/bioinformatics/bty155] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 10/23/2017] [Accepted: 03/09/2018] [Indexed: 12/28/2022]
Abstract
Motivation: Many Gram-negative bacteria use type VI secretion systems (T6SS) to export effector proteins into adjacent target cells. These secreted effectors (T6SEs) play vital roles in competitive survival within bacterial populations, as well as in bacterial pathogenesis. Although various computational analyses have previously been applied to identify effectors secreted by certain bacterial species, no universal method is available to accurately predict T6SS effector proteins from the growing tide of bacterial genome sequence data. Results: We extracted a wide range of features from T6SE protein sequences and comprehensively analyzed the prediction performance of these features through unsupervised and supervised learning. By integrating these features, we subsequently developed a two-layer SVM-based ensemble model with fine-grained optimized parameters to identify potential T6SEs. We further validated the predictive model on an independent dataset, on which it achieved an accuracy (ACC) of 0.943, an F-value of 0.946, a Matthews correlation coefficient (MCC) of 0.892 and an area under the ROC curve (AUC) of 0.976. To demonstrate applicability, we used this method to correctly identify two recently validated T6SE proteins, which are challenging prediction targets because they differ significantly from previously known T6SEs in sequence similarity and cellular function. Furthermore, a genome-wide prediction across 12 bacterial species, involving 54 212 protein sequences in total, identified 94 putative T6SE candidates. We envisage that both this information and our publicly accessible web server will facilitate future discoveries of novel T6SEs. Availability and implementation: http://bastion6.erc.monash.edu/. Supplementary information: Supplementary data are available at Bioinformatics online.
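For readers less familiar with the evaluation metrics quoted in the Bastion6 abstract (ACC, F-value, MCC and AUC), the following is a minimal Python sketch of their standard textbook definitions, computed from binary confusion-matrix counts and classifier scores. It is not the authors' code: the function names `binary_metrics` and `auc_from_scores` are hypothetical, "F-value" is assumed here to mean F1, and the counts and scores in the usage lines are invented, not Bastion6 results.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, F-value (assumed to be F1) and Matthews correlation coefficient
    from the four cells of a binary confusion matrix."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"ACC": acc, "F": f1, "MCC": mcc}


def auc_from_scores(pos_scores: list[float], neg_scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count half), i.e. the Mann-Whitney U statistic / (n_pos * n_neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores for illustration only (not Bastion6 results).
print(binary_metrics(tp=45, fp=3, tn=47, fn=5))
print(auc_from_scores([0.9, 0.7, 0.6], [0.65, 0.2, 0.1]))
```

MCC uses all four confusion-matrix cells, which is why it is often reported alongside accuracy when positive and negative classes are imbalanced. The pairwise AUC loop is quadratic in the number of examples and is meant only to make the definition concrete.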
Affiliation(s)
- Jiawei Wang
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, VIC, Australia
- Bingjiao Yang
- Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Tatiana T Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Morihiro Hayashida
- National Institute of Technology, Matsue College, Matsue, Shimane, Japan
- Andrea Rocker
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, VIC, Australia
- Yanju Zhang
- Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
- Richard A Strugnell
- Department of Microbiology and Immunology and Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Parkville, VIC, Australia
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Clayton, VIC, Australia; ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, Clayton, VIC, Australia
- Trevor Lithgow
- Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, VIC, Australia
27
Gazara RK, Moharana KC, Bellieny-Rabelo D, Venancio TM. Expansion and diversification of the gibberellin receptor GIBBERELLIN INSENSITIVE DWARF1 (GID1) family in land plants. Plant Mol Biol 2018; 97:435-449. [PMID: 29956113 DOI: 10.1007/s11103-018-0750-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 01/18/2018] [Accepted: 06/14/2018] [Indexed: 05/13/2023]
Abstract
Here we uncover the major evolutionary events that shaped the GID1 family of gibberellin receptors in land plants at the sequence, structure and gene expression levels. Gibberellic acid (gibberellin, GA) controls key developmental processes in the life cycle of land plants. By interacting with the GIBBERELLIN INSENSITIVE DWARF1 (GID1) receptor, GA regulates the expression of a wide range of genes through different pathways. We report the systematic identification and classification of GID1s in 54 plant genomes, ranging from bryophytes and lycophytes to several monocots and eudicots. We investigated the evolutionary relationships of GID1s using a comparative genomics framework and found strong support for a previously proposed phylogenetic classification of this family in land plants. We identified lineage-specific expansions of particular subfamilies (i.e. GID1ac and GID1b) in different eudicot lineages (e.g. GID1b in legumes). Further, we found both shared and divergent structural features between the GID1ac and GID1b subgroups in eudicots that provide mechanistic insights into their functions. Gene expression data from several species show that at least one GID1 gene is expressed in every sampled tissue, with a strong bias of GID1b expression towards underground tissues and dry legume seeds (which typically have low GA levels). Taken together, our results indicate that GID1ac retained canonical GA signaling roles, whereas GID1b specialized in conditions of low GA concentrations. We propose that this functional specialization occurred initially at the gene expression level and was later fine-tuned by mutations that conferred greater GA affinity on GID1b, including a Phe residue in the GA-binding pocket. Finally, we discuss the importance of our findings for understanding the diversification of GA perception mechanisms in land plants.
Affiliation(s)
- Rajesh K Gazara
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Av. Alberto Lamego 2000/P5/217, Parque Califórnia, Campos dos Goytacazes, RJ, CEP: 28013-602, Brazil
- Kanhu C Moharana
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Av. Alberto Lamego 2000/P5/217, Parque Califórnia, Campos dos Goytacazes, RJ, CEP: 28013-602, Brazil
- Daniel Bellieny-Rabelo
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Av. Alberto Lamego 2000/P5/217, Parque Califórnia, Campos dos Goytacazes, RJ, CEP: 28013-602, Brazil
- Department of Microbiology and Plant Pathology, University of Pretoria, Lunnon Road, Pretoria, 0028, South Africa
- Thiago M Venancio
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Av. Alberto Lamego 2000/P5/217, Parque Califórnia, Campos dos Goytacazes, RJ, CEP: 28013-602, Brazil
28
Han M, Song Y, Qian J, Ming D. Sequence-based prediction of physicochemical interactions at protein functional sites using a function-and-interaction-annotated domain profile database. BMC Bioinformatics 2018; 19:204. [PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2017] [Accepted: 05/15/2018] [Indexed: 01/16/2023]
Abstract
Background: Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed to predict PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking. Results: In this paper, we present a sequence-based method for predicting physicochemical interactions at PFSs. The method is based on a functional site- and physicochemical interaction-annotated domain profile database, called fiDPD, which was built from protein domains found in the Protein Data Bank. The method was applied to 13 target proteins from the recent Critical Assessment of Structure Prediction rounds (CASP10/11), yielding a Matthews correlation coefficient (MCC) of 0.66 for PFS prediction and 80% recall in predicting the associated physicochemical interactions. Conclusions: Our results show that, in addition to the PFSs themselves, the physical interactions at these sites are also conserved during protein evolution. This work provides a valuable sequence-based tool for rational drug design and side-effect assessment. The method is freely available at http://202.119.249.49.
Affiliation(s)
- Min Han
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Yifan Song
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Jiaqiang Qian
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
- Dengming Ming
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangsu, 211816, Nanjing, People's Republic of China
29
Garrido-Martín D, Pazos F. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues. BMC Bioinformatics 2018; 19:67. [PMID: 29482506 PMCID: PMC5827975 DOI: 10.1186/s12859-018-2084-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/07/2017] [Accepted: 02/21/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The exponential accumulation of new sequences in public databases is expected to improve the performance of all approaches for predicting protein structural and functional features. Nevertheless, this has never been assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as their only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites; consequently, understanding how the sequence content of the databases affects them is relevant and timely. RESULTS: In this work we evaluate how the growth and change over time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do so by recreating historical versions of the multiple sequence alignments that would have been obtained in the past from the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows us to quantify the temporal variation in their performance. Our results show that the number of families to which these methods can be applied increases sharply with time, while their ability to detect potentially functional residues remains almost constant. CONCLUSIONS: These results are informative for the methods' developers and end users, and may have implications for the design of new sequencing initiatives.
Affiliation(s)
- Diego Garrido-Martín
- Present address: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, c/ Dr. Aiguader, 88, 08003, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Plaça de la Mercè, 10-12, 08002, Barcelona, Spain
- Florencio Pazos
- Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), c/ Darwin, 3, 28049, Madrid, Spain
30
Kalaivani R, Reema R, Srinivasan N. Recognition of sites of functional specialisation in all known eukaryotic protein kinase families. PLoS Comput Biol 2018; 14:e1005975. [PMID: 29438395 PMCID: PMC5826538 DOI: 10.1371/journal.pcbi.1005975] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/05/2017] [Revised: 02/26/2018] [Accepted: 01/13/2018] [Indexed: 11/25/2022]
Abstract
The conserved function of protein phosphorylation, catalysed by members of the protein kinase superfamily, is regulated in different ways in different kinase families. Differences in activating triggers, cellular localisation, domain architecture and substrate specificity between kinase families are also well known. While the transfer of the γ-phosphate from ATP to the hydroxyl group of Ser/Thr/Tyr is mediated by a conserved Asp, the characteristic functional and regulatory sites are specialised at the level of families or subfamilies. Such family-specific sites of functional specialisation are unknown for most kinase families. In this work, we systematically identify family-specific residue features by comparing, between kinase families, the extent of conservation of physicochemical properties, Shannon entropy and the statistical probability of residue distributions. An integrated discriminatory score combining these three features is developed to demarcate the functionally specialised sites in a kinase family from other sites. We achieved an area under the ROC curve of 0.992 for the discrimination of kinase families. Our approach was extensively tested on the well-studied CDK and MAPK families, in which specific protein interaction sites and substrate recognition sites were successfully detected (p-value < 0.05). We also find that known family-specific oncogenic driver mutation sites were scored highly by our method. The method was applied to all known kinases, encompassing 107 families from diverse eukaryotic organisms, yielding a comprehensive list of family-specific functional sites. Among other uses, our method facilitates the identification of specific protein interaction sites and drug target sites in a kinase family.
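Shannon entropy of alignment columns is one of the three features the abstract combines. The sketch below shows, under simplifying assumptions, how per-column entropy can be computed and contrasted between two families: a position with low entropy inside each family but high entropy once the families are pooled is conserved differently in each, which is the intuition behind a family-specific site. This is not the authors' integrated discriminatory score; the function names, the thresholds `max_within` and `min_pooled`, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (in bits) of the residues in one alignment column; gaps are ignored."""
    residues = [r for r in column.upper() if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def entropy_profile(alignment: list[str]) -> list[float]:
    """Per-column entropy for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


def family_specific_positions(fam_a: list[str], fam_b: list[str],
                              max_within: float = 0.5, min_pooled: float = 0.9) -> list[int]:
    """Hypothetical heuristic: columns conserved within each family (low entropy)
    but variable once the two families are pooled (high entropy)."""
    ent_a = entropy_profile(fam_a)
    ent_b = entropy_profile(fam_b)
    ent_pooled = entropy_profile(fam_a + fam_b)
    return [i for i, h in enumerate(ent_pooled)
            if ent_a[i] <= max_within and ent_b[i] <= max_within and h >= min_pooled]


# Toy alignments (invented): column 1 is D in family A and E in family B.
family_a = ["ADKS", "ADKT", "ADKS"]
family_b = ["AEKS", "AEKT", "AEKS"]
print(family_specific_positions(family_a, family_b))  # prints [1]
```

Raw entropy treats D and E as unrelated residues; the abstract's score additionally uses conservation of physicochemical properties and residue-distribution probabilities, which a sketch like this does not capture.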
Affiliation(s)
- Raju Kalaivani
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Raju Reema
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
31
Neuwald AF, Aravind L, Altschul SF. Inferring joint sequence-structural determinants of protein functional specificity. eLife 2018; 7:e29880. [PMID: 29336305 PMCID: PMC5770160 DOI: 10.7554/elife.29880] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/23/2017] [Accepted: 12/22/2017] [Indexed: 01/05/2023]
Abstract
Residues responsible for allostery, cooperativity, and other subtle but functionally important interactions remain difficult to detect. To aid such detection, we employ statistical inference based on the assumption that residues distinguishing a protein subgroup from evolutionarily divergent subgroups often constitute an interacting functional network. We identify such networks with the aid of two measures of statistical significance. One measure aids identification of divergent subgroups based on distinguishing residue patterns. For each subgroup, a second measure identifies structural interactions involving pattern residues. Such interactions are derived either from atomic coordinates or from Direct Coupling Analysis scores, used as surrogates for structural distances. Applying this approach to N-acetyltransferases, P-loop GTPases, RNA helicases, synaptojanin-superfamily phosphatases and nucleases, and thymine/uracil DNA glycosylases yielded results congruent with biochemical understanding of these proteins, and also revealed striking sequence-structural features overlooked by other methods. These and similar analyses can aid the design of drugs targeting allosteric sites.
Affiliation(s)
- Andrew F Neuwald
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, United States; Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, Baltimore, United States
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
- Stephen F Altschul
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
32
Sánchez-Gracia A, Guirao-Rico S, Hinojosa-Alvarez S, Rozas J. Computational prediction of the phenotypic effects of genetic variants: basic concepts and some application examples in Drosophila nervous system genes. J Neurogenet 2017; 31:307-319. [DOI: 10.1080/01677063.2017.1398241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Alejandro Sánchez-Gracia
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Sara Guirao-Rico
- Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Bellaterra, Spain
- Silvia Hinojosa-Alvarez
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Julio Rozas
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
33
Slama P. Two-domain analysis of JmjN-JmjC and PHD-JmjC lysine demethylases: Detecting an inter-domain evolutionary stress. Proteins 2017; 86:3-12. [PMID: 28975662 DOI: 10.1002/prot.25394] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/24/2017] [Revised: 09/26/2017] [Accepted: 10/03/2017] [Indexed: 11/09/2022]
Abstract
Residues at different positions of a multiple sequence alignment sometimes evolve together, owing to correlated structural or functional stress at these positions. Such co-evolution has been demonstrated computationally in multiple proteins and protein domains. Here, we ask whether an evolutionary stress is exerted on a sequence alignment across protein domains, i.e., over longer sequence separations than within a single protein domain. JmjC-containing lysine demethylases were chosen for analysis as a follow-up to previous studies; these proteins are important multidomain epigenetic regulators. In these proteins, the JmjC domain is responsible for the demethylase activity, and the surrounding domains interact with histones, DNA or partner proteins. This family of enzymes was analyzed at the sequence level to determine whether the sequence of JmjC domains is affected by the presence of a neighboring JmjN domain or PHD finger in the protein. At multiple positions within JmjC sequences, the residue distributions were significantly altered by the presence of the second domain. Structural considerations confirmed the relevance of the analysis for JmjN-JmjC proteins, while among PHD-JmjC proteins, the length of the linker region could be correlated with the residues observed at the most affected positions. The correlation of domain architecture with residue types at certain positions, as well as that of overall architecture with protein function, is discussed. These results provide evidence for an across-domain evolutionary stress in JmjC-containing demethylases and further insights into the overall domain architecture of JmjC domain-containing proteins.
Affiliation(s)
- Patrick Slama
- Independent researcher, Paris, France; Center for Imaging Science, the Johns Hopkins University, Clark Hall, 3400 N Charles Street, Baltimore, Maryland, 21218
34
Anderson KA, Huynh FK, Fisher-Wellman K, Stuart JD, Peterson BS, Douros JD, Wagner GR, Thompson JW, Madsen AS, Green MF, Sivley RM, Ilkayeva OR, Stevens RD, Backos DS, Capra JA, Olsen CA, Campbell JE, Muoio DM, Grimsrud PA, Hirschey MD. SIRT4 Is a Lysine Deacylase that Controls Leucine Metabolism and Insulin Secretion. Cell Metab 2017; 25:838-855.e15. [PMID: 28380376 PMCID: PMC5444661 DOI: 10.1016/j.cmet.2017.03.003] [Citation(s) in RCA: 245] [Impact Index Per Article: 30.6] [Received: 01/13/2016] [Revised: 09/26/2016] [Accepted: 03/06/2017] [Indexed: 01/17/2023]
Abstract
Sirtuins are NAD+-dependent protein deacylases that regulate several aspects of metabolism and aging. In contrast to the other mammalian sirtuins, the primary enzymatic activity of mitochondrial sirtuin 4 (SIRT4) and its overall role in metabolic control have remained enigmatic. Using a combination of phylogenetics, structural biology, and enzymology, we show that SIRT4 removes three acyl moieties from lysine residues: methylglutaryl (MG)-, hydroxymethylglutaryl (HMG)-, and 3-methylglutaconyl (MGc)-lysine. The metabolites leading to these post-translational modifications are intermediates in leucine oxidation, and we show a primary role for SIRT4 in controlling this pathway in mice. Furthermore, we find that dysregulated leucine metabolism in SIRT4KO mice leads to elevated basal and stimulated insulin secretion, which progressively develops into glucose intolerance and insulin resistance. These findings identify a robust enzymatic activity for SIRT4, uncover a mechanism controlling branched-chain amino acid flux, and position SIRT4 as a crucial player maintaining insulin secretion and glucose homeostasis during aging.
Affiliation(s)
- Kristin A Anderson
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA; Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA
- Frank K Huynh
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA
- Kelsey Fisher-Wellman
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA
- J Darren Stuart
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA
- Brett S Peterson
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA
- Jonathan D Douros
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA
- Gregory R Wagner
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA
- J Will Thompson
- Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA; Duke Proteomics and Metabolomics Shared Resource, Duke University Medical Center, Durham, NC 27710, USA
- Andreas S Madsen
- Center for Biopharmaceuticals and Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Michelle F Green
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA
- R Michael Sivley
- Department of Biological Sciences, Department of Biomedical Informatics, Vanderbilt Genetics Institute, Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Olga R Ilkayeva
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA
- Robert D Stevens
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA
- Donald S Backos
- Computational Chemistry and Biology Core Facility, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- John A Capra
- Department of Biological Sciences, Department of Biomedical Informatics, Vanderbilt Genetics Institute, Center for Structural Biology, Vanderbilt University, Nashville, TN 37235, USA
- Christian A Olsen
- Center for Biopharmaceuticals and Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Universitetsparken 2, 2100 Copenhagen, Denmark
- Jonathan E Campbell
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA; Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA; Department of Medicine, Division of Endocrinology, Metabolism, and Nutrition, Duke University Medical Center, Durham, NC 27710, USA
- Deborah M Muoio
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA; Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA; Department of Medicine, Division of Endocrinology, Metabolism, and Nutrition, Duke University Medical Center, Durham, NC 27710, USA
- Paul A Grimsrud
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA
- Matthew D Hirschey
- Duke Molecular Physiology Institute and Sarah W. Stedman Nutrition and Metabolism Center, Duke University Medical Center, Durham, NC 27701, USA; Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, NC 27710, USA; Department of Medicine, Division of Endocrinology, Metabolism, and Nutrition, Duke University Medical Center, Durham, NC 27710, USA
35
CATH-Gene3D: Generation of the Resource and Its Use in Obtaining Structural and Functional Annotations for Protein Sequences. Methods Mol Biol 2017; 1558:79-110. [PMID: 28150234 DOI: 10.1007/978-1-4939-6783-4_4] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 12/05/2022]
Abstract
This chapter describes the generation of the data in the CATH-Gene3D online resource and how it can be used to study protein domains and their evolutionary relationships. Methods will be presented for: comparing protein structures, recognizing homologs, predicting domain structures within protein sequences, and subclassifying superfamilies into functionally pure families, together with a guide on using the webpages.
36
Neuwald AF, Altschul SF. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations. PLoS Comput Biol 2016; 12:e1005294. [PMID: 28002465 PMCID: PMC5225019 DOI: 10.1371/journal.pcbi.1005294] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/27/2016] [Revised: 01/10/2017] [Accepted: 12/08/2016] [Indexed: 11/25/2022]
Abstract
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes’ theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu). Protein sequence data, when gathered in great quantity, contain important but implicit biological information manifest as statistical correlations. Here we describe an approach to access this information by comprehensively modeling and characterizing the distribution of sequences belonging to a major protein superfamily. 
This approach takes as input a large set of unaligned sequences belonging to the superfamily. By applying the minimum description length principle, it seeks the statistical model that best explains the sequences while avoiding over-fitting the data. It concurrently aligns the sequences and, to model evolutionary divergence, partitions them into subgroups that are hierarchically-arranged based upon correlated residue patterns. Auxiliary routines create PyMOL scripts to visualize the locations of correlated residues within available structures. Because these correlations likely arise from structural and biochemical constraints, they can help elucidate protein properties important for functional specificity. Comparing and contrasting sequence and structural features in this way may therefore suggest, in the light of published studies, plausible biological hypotheses for experimental investigation. We illustrate this approach with N-acetyltransferases.
Affiliation(s)
- Andrew F. Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, BioPark II, Room 617, Baltimore, MD, United States of America
- Stephen F. Altschul
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America
| |
37
Singh C, Glaab E, Linster CL. Molecular Identification of d-Ribulokinase in Budding Yeast and Mammals. J Biol Chem 2016; 292:1005-1028. [PMID: 27909055 DOI: 10.1074/jbc.m116.760744] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 09/28/2016] [Revised: 11/29/2016] [Indexed: 12/13/2022]
Abstract
Proteomes of even well characterized organisms still contain a high percentage of proteins with unknown or uncertain molecular and/or biological function. A significant fraction of those proteins is predicted to have catalytic properties. Here we aimed at identifying the function of the Saccharomyces cerevisiae Ydr109c protein and its human homolog FGGY, both of which belong to the broadly conserved FGGY family of carbohydrate kinases. Functionally identified members of this family phosphorylate 3- to 7-carbon sugars or sugar derivatives, but the endogenous substrate of S. cerevisiae Ydr109c and human FGGY has remained unknown. Untargeted metabolomics analysis of an S. cerevisiae deletion mutant of YDR109C revealed ribulose as one of the metabolites with the most significantly changed intracellular concentration as compared with a wild-type strain. In human HEK293 cells, ribulose could only be detected when ribitol was added to the cultivation medium, and under this condition, FGGY silencing led to ribulose accumulation. Biochemical characterization of the recombinant purified Ydr109c and FGGY proteins showed a clear substrate preference of both kinases for d-ribulose over a range of other sugars and sugar derivatives tested, including l-ribulose. Detailed sequence and structural analyses of Ydr109c and FGGY as well as homologs thereof furthermore allowed the definition of a 5-residue d-ribulokinase signature motif (TCSLV). The physiological role of the herein identified eukaryotic d-ribulokinase remains unclear, but we speculate that S. cerevisiae Ydr109c and human FGGY could act as metabolite repair enzymes, serving to re-phosphorylate free d-ribulose generated by promiscuous phosphatases from d-ribulose 5-phosphate. In human cells, FGGY can additionally participate in ribitol metabolism.
Affiliation(s)
- Charandeep Singh
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Enrico Glaab
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Carole L Linster
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
38
Abstract
The structural organization of a protein family is investigated by devising a method based on random matrix theory (RMT) that uses the physicochemical properties of amino acids together with a multiple sequence alignment. A graphical method to represent protein sequences using physicochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, and noise reduction and information filtering are done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlations (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while its long-range correlations are the same as those of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT predictions and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the eigenvectors of the smallest eigenvalues (low eigenmodes) are highly localized as well as informative. When processed, these low-eigenmode eigenvectors give clusters involving positions that have well-defined biological and structural importance consistent with experiments. The approach is crucial for the recognition of structural motifs, as shown in β-lactamase (and other families), and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.
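The abstract names two computable ingredients: a noise band derived from Wishart (random correlation) matrices, and an entropy-based measure of how much information an eigenvector carries. The Python sketch below illustrates both under stated assumptions: it uses the standard Marchenko-Pastur bounds for the correlation matrix of N uncorrelated, unit-variance positions estimated from L ≥ N sequences, and it takes the Shannon entropy of the squared eigenvector components as the "entropic estimate". The authors' exact estimator may differ, and the function names are hypothetical.

```python
import math


def marchenko_pastur_bounds(n_positions: int, n_sequences: int) -> tuple[float, float]:
    """Noise band (lambda_min, lambda_max) for the eigenvalues of a correlation
    matrix of n_positions uncorrelated, unit-variance variables estimated from
    n_sequences samples (Wishart ensemble, Q = n_sequences / n_positions >= 1)."""
    if n_sequences < n_positions:
        raise ValueError("sketch assumes at least as many sequences as positions")
    r = math.sqrt(n_positions / n_sequences)  # sqrt(1/Q)
    return (1 - r) ** 2, (1 + r) ** 2


def eigenvector_entropy(v: list[float]) -> float:
    """Shannon entropy of the squared components of eigenvector v (an assumed
    stand-in for the paper's 'entropic estimate'): about ln(N) when the weight
    is spread evenly over N positions, near 0 when it is localized."""
    norm = sum(x * x for x in v)
    weights = [x * x / norm for x in v]
    return -sum(w * math.log(w) for w in weights if w > 0)


# Hypothetical sizes: 100 alignment positions, 400 sequences.
lo, hi = marchenko_pastur_bounds(100, 400)
print(lo, hi)  # 0.25 2.25
```

Eigenvalues falling inside the band would be treated as noise; an eigenvector whose entropy is close to 0 is concentrated on a few alignment positions, which is the localization property the abstract attributes to the informative low eigenmodes.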
Affiliation(s)
- Pradeep Bhadola
  - Department of Physics and Astrophysics, University of Delhi, Delhi 110007, India
- Nivedita Deo
  - Department of Physics and Astrophysics, University of Delhi, Delhi 110007, India
39
Sloutsky R, Naegle KM. High-Resolution Identification of Specificity Determining Positions in the LacI Protein Family Using Ensembles of Sub-Sampled Alignments. PLoS One 2016; 11:e0162579. [PMID: 27681038 PMCID: PMC5040260 DOI: 10.1371/journal.pone.0162579] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/22/2016] [Accepted: 08/08/2016] [Indexed: 01/24/2023]
Abstract
Since the advent of large-scale genomic sequencing, and the consequent availability of large numbers of homologous protein sequences, there has been burgeoning development of methods for extracting functional information from multiple sequence alignments (MSAs). One type of analysis seeks to identify specificity determining positions (SDPs) based on the assumption that such positions are highly conserved within groups of sequences sharing functional specificity, but conserved to different amino acids in different specificity groups. This unsupervised approach to utilizing evolutionary information may elucidate mechanisms of specificity in protein-protein interactions, catalytic activity of enzymes, sensitivity to allosteric regulation, and other types of protein functionality. We present an analysis of SDPs in the LacI family of transcriptional regulators in which we 1) relax the constraint that all specificity groups must contribute to SDP signal, and 2) use a novel approach to the robust treatment of sequence alignment uncertainty based on sub-sampling. We find that the vast majority of SDP signal occurs at positions with a conservation pattern that significantly complicates detection by previously described methods. This pattern, which we term “partial SDP”, consists of the commonly accepted SDP conservation pattern among a subset of specificity groups and strong degeneracy among the rest. An upshot of this fact is that the SDP complement of every specificity group appears to be unique. Additionally, sub-sampling gives us the ability to assign a confidence interval to the SDP score and to increase fidelity, as compared to analysis of a single, comprehensive alignment, the current standard in multiple sequence alignment methodologies.
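The difference between a classical SDP and the "partial SDP" pattern can be made concrete with a small Python sketch that classifies a single alignment column whose residues are already split by specificity group. The 0.8 frequency threshold, the function name and the decision rule are illustrative assumptions, not the study's scoring function, which additionally works over ensembles of sub-sampled alignments.

```python
from collections import Counter


def classify_column(groups: dict[str, list[str]], min_freq: float = 0.8) -> str:
    """Classify one alignment column whose residues are split by specificity group.

    Hypothetical rule (groups assumed non-empty): a group counts as conserved
    when its most common residue reaches min_freq.
    'SDP': every group is conserved and at least two consensus residues differ.
    'partial SDP': only a subset (two or more) of groups is conserved, to at
    least two different residues, while the other groups fall below min_freq.
    'other': anything else, including a column conserved to one residue overall.
    """
    consensus: dict[str, str] = {}
    for name, residues in groups.items():
        residue, count = Counter(residues).most_common(1)[0]
        if count / len(residues) >= min_freq:
            consensus[name] = residue
    if len(consensus) < 2 or len(set(consensus.values())) < 2:
        return "other"
    return "SDP" if len(consensus) == len(groups) else "partial SDP"


# Hypothetical column: groups A and B are conserved (R vs K), group C is degenerate.
column = {"A": list("RRRRR"), "B": list("KKKKK"), "C": list("DEGHS")}
print(classify_column(column))  # partial SDP
```

A full-SDP detector that requires every group to be conserved would score this column as uninformative, which illustrates why the partial pattern is harder to detect with earlier methods.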
Affiliation(s)
- Roman Sloutsky
  - Biomedical Engineering Department, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
  - Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
- Kristen M. Naegle
  - Biomedical Engineering Department, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
  - Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
40
Moll M, Finn PW, Kavraki LE. Structure-guided selection of specificity determining positions in the human Kinome. BMC Genomics 2016; 17 Suppl 4:431. [PMID: 27556159 PMCID: PMC5001202 DOI: 10.1186/s12864-016-2790-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
Abstract
Background: The human kinome contains many important drug targets. It is well known that inhibitors of protein kinases bind with very different selectivity profiles. This is also the case for inhibitors of many other protein families. The increased availability of protein 3D structures has provided much information on the structural variation within a given protein family. However, the relationship between structural variations and binding specificity is complex and incompletely understood. We have developed a structural bioinformatics approach that analyzes key determinants of binding selectivity as a tool to enhance the rational design of drugs with a specific selectivity profile. Results: We propose a greedy algorithm that computes a subset of residue positions in a multiple sequence alignment such that structural and chemical variation in those positions helps explain known binding affinities. The main purpose of the algorithm is to give experimentalists possible insights into how the selectivity profile of certain inhibitors is achieved, which is useful for lead optimization. In addition, the algorithm can be used to predict binding affinities for structures whose affinity for a given inhibitor is unknown. The algorithm’s performance is demonstrated using an extensive dataset for the human kinome. Conclusion: We show that the binding affinity of 38 different kinase inhibitors can be explained with consistently high precision and accuracy using the variation of at most six residue positions in the kinome binding site. We also show, for several inhibitors, that we are able to identify residues that are known to be functionally important.
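The abstract does not specify the objective that the greedy algorithm optimizes. As a loose illustration of greedy position selection only, the Python sketch below swaps in a simple set-cover criterion: affinities are reduced to a binary binder/non-binder label for one inhibitor, and each round adds the alignment column that distinguishes the most remaining binder/non-binder pairs, capped at six columns to echo the "at most six residue positions" result. The published method scores structural and chemical variation against measured affinities, so the criterion, the toy sequences and all names here are hypothetical.

```python
from itertools import combinations


def _separated(pairs: set[tuple[str, str]], seqs: dict[str, str], col: int) -> int:
    """Number of (binder, non-binder) pairs whose residues differ at column col."""
    return sum(seqs[a][col] != seqs[b][col] for a, b in pairs)


def greedy_positions(seqs: dict[str, str], binds: dict[str, bool],
                     max_positions: int = 6) -> list[int]:
    """Greedily choose alignment columns that separate binders from non-binders.

    seqs maps a (hypothetical) kinase name to its aligned binding-site sequence,
    all of equal length; binds maps the same names to a binary binding label.
    """
    length = len(next(iter(seqs.values())))
    # Pairs with different labels that no chosen column distinguishes yet.
    unexplained = {(a, b) for a, b in combinations(seqs, 2) if binds[a] != binds[b]}
    chosen: list[int] = []
    while unexplained and len(chosen) < max_positions:
        candidates = [c for c in range(length) if c not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda c: _separated(unexplained, seqs, c))
        if _separated(unexplained, seqs, best) == 0:
            break  # no remaining column distinguishes any further pair
        chosen.append(best)
        unexplained = {(a, b) for a, b in unexplained if seqs[a][best] == seqs[b][best]}
    return chosen


# Toy data: binders k1 and k3 share L at column 3, non-binders carry M there.
kinases = {"k1": "AGTL", "k2": "AGTM", "k3": "SGTL", "k4": "SGTM"}
labels = {"k1": True, "k2": False, "k3": True, "k4": False}
print(greedy_positions(kinases, labels))  # [3]
```

Greedy set-cover selection is not guaranteed to find the smallest possible set of columns, but it is cheap and has a well-known logarithmic approximation bound, which makes it a reasonable first pass when candidate positions number in the hundreds.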
Affiliation(s)
- Mark Moll
  - Department of Computer Science, Rice University, PO Box 1892, Houston, 77251, TX, USA
- Paul W Finn
  - University of Buckingham, Hunter St, Buckingham, UK
- Lydia E Kavraki
  - Department of Computer Science, Rice University, PO Box 1892, Houston, 77251, TX, USA
41
Boari de Lima E, Meira W, de Melo-Minardi RC. Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering. PLoS Comput Biol 2016; 12:e1005001. [PMID: 27348631 PMCID: PMC4922564 DOI: 10.1371/journal.pcbi.1005001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 11/30/2015] [Accepted: 05/22/2016] [Indexed: 01/14/2023]
Abstract
As increasingly more genomes are sequenced, the vast majority of proteins may only be annotated computationally, given that experimental investigation is extremely costly. This highlights the need for computational methods to determine protein functions quickly and reliably. We believe that dividing a protein family into subtypes that share specific functions uncommon to the whole family reduces the complexity of the function annotation problem. Hence, the purpose of this work is to detect isofunctional subfamilies inside a family of unknown function while identifying differentiating residues. Similarity between protein pairs according to various properties is interpreted as evidence of functional similarity. Data are integrated using genetic programming and provided to a spectral clustering algorithm, which creates clusters of similar proteins. The proposed framework was applied to well-known protein families and to a family of unknown function, then compared to ASMC. Results showed that our fully automated technique obtained better clusters than ASMC for two families and equivalent results for two others, including one whose clusters were manually defined. Clusters produced by our framework corresponded closely to the known subfamilies and were more contrasting than those produced by ASMC. Additionally, for the families whose specificity determining positions are known, such residues were among those our technique considered most important to differentiate a given group. When run with the crotonase and enolase SFLD superfamilies, the results showed great agreement with this gold standard. Best results consistently involved multiple data types, thus confirming our hypothesis that similarities according to different knowledge domains may be used as evidence of functional similarity.
Our main contributions are the proposed strategy for selecting and integrating data types, along with the ability to work with noisy and incomplete data; domain knowledge usage for detecting subfamilies in a family with different specificities, thus reducing the complexity of the experimental function characterization problem; and the identification of residues responsible for specificity.
Affiliation(s)
- Elisa Boari de Lima
  - Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Wagner Meira
  - Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
42
Lee D, Das S, Dawson NL, Dobrijevic D, Ward J, Orengo C. Novel Computational Protocols for Functionally Classifying and Characterising Serine Beta-Lactamases. PLoS Comput Biol 2016; 12:e1004926. [PMID: 27332861 PMCID: PMC4917113 DOI: 10.1371/journal.pcbi.1004926] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/02/2015] [Accepted: 04/19/2016] [Indexed: 11/23/2022]
Abstract
Beta-lactamases represent the main bacterial mechanism of resistance to beta-lactam antibiotics and are a significant challenge to modern medicine. We have developed an automated classification and analysis protocol that exploits structure- and sequence-based approaches and allows us to propose a grouping of serine beta-lactamases that more consistently captures and rationalizes the existing three classification schemes: Classes (A, C and D, which vary in their implementation of the mechanism of action); Types (which largely reflect evolutionary distance measured by sequence similarity); and Variant groups (which largely correspond to the Bush-Jacoby clinical groups). Our analysis platform exploits a suite of in-house and public tools to identify Functional Determinants (FDs), i.e. residue sites responsible for conferring different phenotypes between different classes, different types and different variants. We focused on Class A beta-lactamases, the most highly populated and clinically relevant class, to identify FDs implicated in the distinct phenotypes associated with different Class A Types and Variants. We show that our FunFHMMer method can separate the known beta-lactamase classes and identify those positions likely to be responsible for the different implementations of the mechanism of action in these enzymes. Two novel algorithms, ASSP and SSPA, allow detection of FD sites likely to contribute to the broadening of the substrate profiles. Using our approaches, we recognise 151 Class A types in UniProt. Finally, we used our beta-lactamase FunFams and ASSP profiles to detect 4 novel Class A types in microbiome samples. Our platforms have been validated by literature studies, in silico analysis and some targeted experimental verification. Although developed for the serine beta-lactamases, they could be used to classify and analyse any diverse protein superfamily where sub-families have diverged over both long and short evolutionary timescales.
Affiliation(s)
- David Lee
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Sayoni Das
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Natalie L. Dawson
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Dragana Dobrijevic
  - Department of Biochemical Engineering, University College London, London, United Kingdom
- John Ward
  - Department of Biochemical Engineering, University College London, London, United Kingdom
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, United Kingdom
43
Huang Y, Wang X, Ge S, Rao GY. Divergence and adaptive evolution of the gibberellin oxidase genes in plants. BMC Evol Biol 2015; 15:207. [PMID: 26416509 PMCID: PMC4587577 DOI: 10.1186/s12862-015-0490-2] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 05/04/2015] [Accepted: 09/17/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Gibberellins (GAs), an important class of phytohormones, play key roles in various developmental processes. GA oxidases (GAoxs) are critical enzymes in the GA synthesis pathway, but their classification, evolutionary history and the forces driving the evolution of plant GAox genes remain poorly understood. RESULTS: This study provides the first large-scale evolutionary analysis of GAox genes in plants, using an extensive whole-genome dataset of 41 species representing green algae, bryophytes, pteridophytes, and seed plants. We defined eight subfamilies within the GAox family, namely C19-GA2ox, C20-GA2ox, GA20ox, GA3ox, GAox-A, GAox-B, GAox-C and GAox-D. Of these, subfamilies GAox-A, GAox-B, GAox-C and GAox-D are described for the first time. On the basis of phylogenetic analyses and characteristic motifs of GAox genes, we demonstrated a rapid expansion and functional divergence of the GAox genes during the diversification of land plants. We also detected subfamily-specific motifs and potential sites in some GAox genes that might have evolved under positive selection. CONCLUSIONS: GAox genes originated very early, before the divergence of bryophytes and vascular plants, and their diversification is associated with functional divergence and could be driven by positive selection. Our study not only provides information on the classification of GAox genes, but also facilitates the further functional characterization and analysis of GA oxidases.
Affiliation(s)
- Yuan Huang
  - College of Life Sciences, Peking University, Beijing, 100871, China
- Xi Wang
  - College of Life Sciences, Peking University, Beijing, 100871, China
- Song Ge
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Guang-Yuan Rao
  - College of Life Sciences, Peking University, Beijing, 100871, China
44
Chagoyen M, García-Martín JA, Pazos F. Practical analysis of specificity-determining residues in protein families. Brief Bioinform 2015; 17:255-61. [DOI: 10.1093/bib/bbv045] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 04/01/2015] [Accepted: 06/15/2015] [Indexed: 12/17/2022]
45
Das S, Lee D, Sillitoe I, Dawson NL, Lees JG, Orengo CA. Functional classification of CATH superfamilies: a domain-based approach for protein function annotation. Bioinformatics 2015; 31:3460-7. [PMID: 26139634 PMCID: PMC4612221 DOI: 10.1093/bioinformatics/btv398] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Received: 05/02/2015] [Accepted: 06/24/2015] [Indexed: 11/18/2022]
Abstract
Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap, especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer. Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than those of other domain-based resources. FunFHMMer currently identifies 110 439 FunFams in 2735 superfamilies, which can be used to functionally annotate >16 million domain sequences. Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam. Contact: sayoni.das.12@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sayoni Das
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- David Lee
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Natalie L Dawson
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Jonathan G Lees
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
46
Revuelta MV, van Kan JAL, Kay J, Ten Have A. Extensive expansion of A1 family aspartic proteinases in fungi revealed by evolutionary analyses of 107 complete eukaryotic proteomes. Genome Biol Evol 2015; 6:1480-94. [PMID: 24869856 PMCID: PMC4079213 DOI: 10.1093/gbe/evu110] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
The A1 family of eukaryotic aspartic proteinases (APs) forms one of the 16 AP families. Although it is one of the best-characterized families, the recent increase in genome sequence data has revealed many fungal AP homologs with novel sequence characteristics. This study was performed to explore the fungal AP sequence space and to obtain an in-depth understanding of fungal AP evolution. Using a comprehensive phylogeny of approximately 700 AP sequences from the complete proteomes of 87 fungi and 20 nonfungal eukaryotes, 11 major clades of APs were defined, of which clade I largely corresponds to the A1A subfamily of pepsin-archetype APs. Clade II largely corresponds to the A1B subfamily of nepenthesin-archetype APs. Remarkably, the nine other clades contain only fungal APs, indicating that fungal APs have undergone a large sequence diversification. The topology of the tree indicates that fungal APs have been subject to both “birth and death” evolution and “functional redundancy and diversification.” This is substantiated by coclustering of certain functional sequence characteristics. A meta-analysis aimed at identifying Cluster Determining Positions (CDPs) was performed to investigate the structural and biochemical basis for diversification. Seven CDPs contribute to the secondary structure of the enzyme. Three other CDPs are found in the vicinity of the substrate binding cleft. Tree topology, the large sequence variation among fungal APs, and the apparent functional diversification suggest that updating the current A1 AP classification on the basis of a comprehensive phylogenetic clustering might help refine the classification in the MEROPS peptidase database.
Affiliation(s)
- María V Revuelta
  - Instituto de Investigaciones Biológicas-CONICET, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
- Jan A L van Kan
  - Laboratory of Phytopathology, Wageningen University, The Netherlands
- John Kay
  - School of Biosciences, Cardiff University, United Kingdom
- Arjen Ten Have
  - Instituto de Investigaciones Biológicas-CONICET, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
47
Sillitoe I, Lewis TE, Cuff A, Das S, Ashford P, Dawson NL, Furnham N, Laskowski RA, Lee D, Lees JG, Lehtinen S, Studer RA, Thornton J, Orengo CA. CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res 2015; 43:D376-81. [PMID: 25348408 PMCID: PMC4384018 DOI: 10.1093/nar/gku947] [Citation(s) in RCA: 309] [Impact Index Per Article: 30.9] [Received: 01/09/2014] [Accepted: 09/29/2014] [Indexed: 11/19/2022]
Abstract
The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235,000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.
Affiliation(s)
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Tony E Lewis
  - Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Alison Cuff
  - Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Sayoni Das
  - Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Paul Ashford
  - Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Natalie L Dawson
  - Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Nicholas Furnham
  - London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Roman A Laskowski
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- David Lee
  - Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Jonathan G Lees
  - Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Sonja Lehtinen
  - Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
- Romain A Studer
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Janet Thornton
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, UCL, 636 Darwin Building, Gower Street, WC1E 6BT, UK
48
Lee TW, Yang ASP, Brittain T, Birch NP. An analysis approach to identify specific functional sites in orthologous proteins using sequence and structural information: application to neuroserpin reveals regions that differentially regulate inhibitory activity. Proteins 2015; 83:135-52. [PMID: 25363759 DOI: 10.1002/prot.24711] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/12/2014] [Revised: 10/22/2014] [Accepted: 10/27/2014] [Indexed: 01/12/2023]
Abstract
The analysis of sequence conservation is commonly used to predict functionally important sites in proteins. We have developed an approach that first identifies highly conserved sites in a set of orthologous sequences using a weighted substitution-matrix-based conservation score and then filters these conserved sites based on the pattern of conservation present in a wider alignment of sequences from the same family and structural information to identify surface-exposed sites. This allows us to detect specific functional sites in the target protein and exclude regions that are likely to be generally important for the structure or function of the wider protein family. We applied our method to two members of the serpin family of serine protease inhibitors. We first confirmed that our method successfully detected the known heparin binding site in antithrombin while excluding residues known to be generally important in the serpin family. We next applied our sequence analysis approach to neuroserpin and used our results to guide site-directed polyalanine mutagenesis experiments. The majority of the mutant neuroserpin proteins were found to fold correctly and could still form inhibitory complexes with tissue plasminogen activator (tPA). Kinetic analysis of tPA inhibition, however, revealed altered inhibitory kinetics in several of the mutant proteins, with some mutants showing decreased association with tPA and others showing more rapid dissociation of the covalent complex. Altogether, these results confirm that our sequence analysis approach is a useful tool that can be used to guide mutagenesis experiments for the detection of specific functional sites in proteins.
Affiliation(s)
- Tet Woo Lee
  - School of Biological Sciences and Centre for Brain Research, University of Auckland, Auckland, New Zealand
49
Andreani J, Guerois R. Evolution of protein interactions: From interactomes to interfaces. Arch Biochem Biophys 2014; 554:65-75. [DOI: 10.1016/j.abb.2014.05.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 02/10/2014] [Revised: 04/28/2014] [Accepted: 05/12/2014] [Indexed: 12/16/2022]
50
Abstract
Unravelling the genotype–phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs (nsSNVs) in proteins, the difficulty lies first in identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants, and then in linking this functional change to phenotype. Here we present VarMod (Variant Modeller), a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod builds on recent observations that functional nsSNVs are enriched at protein–protein interfaces and protein–ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs, VarMod's performance is comparable to that of an existing state-of-the-art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions, including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod.
Affiliation(s)
- Morena Pappalardo
  - Centre for Molecular Processing, School of Biosciences, University of Kent, CT2 7NH, UK
- Mark N Wass
  - Centre for Molecular Processing, School of Biosciences, University of Kent, CT2 7NH, UK