1. Malatesta M, Fornasier E, Di Salvo ML, Tramonti A, Zangelmi E, Peracchi A, Secchi A, Polverini E, Giachin G, Battistutta R, Contestabile R, Percudani R. One substrate many enzymes virtual screening uncovers missing genes of carnitine biosynthesis in human and mouse. Nat Commun 2024; 15:3199. [PMID: 38615009 PMCID: PMC11016064 DOI: 10.1038/s41467-024-47466-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 03/26/2024] [Indexed: 04/15/2024] Open
Abstract
The increasing availability of experimental and computational protein structures entices their use for function prediction. Here we develop an automated procedure to identify enzymes involved in metabolic reactions by assessing substrate conformations docked to a library of protein structures. By screening AlphaFold-modeled vitamin B6-dependent enzymes, we find that a metric based on catalytically favorable conformations at the enzyme active site performs best (AUROC Score=0.84) in identifying genes associated with known reactions. Applying this procedure, we identify the mammalian gene encoding hydroxytrimethyllysine aldolase (HTMLA), the second enzyme of carnitine biosynthesis. Upon experimental validation, we find that the top-ranked candidates, serine hydroxymethyl transferase (SHMT) 1 and 2, catalyze the HTMLA reaction. However, a mouse protein absent in humans (threonine aldolase; Tha1) catalyzes the reaction more efficiently. Tha1 did not rank highest based on the AlphaFold model, but its rank improved to second place using the experimental crystal structure we determined at 2.26 Å resolution. Our findings suggest that humans have lost a gene involved in carnitine biosynthesis, with HTMLA activity of SHMT partially compensating for its function.
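The AUROC quoted in this abstract measures how well a docking-derived score ranks true enzyme-reaction pairs above false ones. The following is a minimal sketch of that metric only, not the authors' pipeline; the function name, scores and labels are hypothetical illustrations.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0);
    ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical docking-based scores for four candidate enzymes;
# label 1 marks an enzyme known to catalyze the reaction.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
example_auroc = auroc(example_scores, example_labels)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of known from unknown associations.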
Affiliation(s)
- Marco Malatesta
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Martino Luigi Di Salvo
  - Istituto Pasteur Italia-Fondazione Cenci Bolognetti and Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza University of Rome, Rome, Italy
- Angela Tramonti
  - Institute of Molecular Biology and Pathology, Italian National Research Council, Rome, Italy
- Erika Zangelmi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Alessio Peracchi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Andrea Secchi
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy
- Eugenia Polverini
  - Department of Mathematical, Physical and Computer Sciences, University of Parma, Parma, Italy
- Gabriele Giachin
  - Department of Chemical Sciences, University of Padua, Padova, Italy
- Roberto Contestabile
  - Istituto Pasteur Italia-Fondazione Cenci Bolognetti and Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza University of Rome, Rome, Italy
- Riccardo Percudani
  - Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, Parma, Italy

2. Freiberger MI, Ruiz-Serra V, Pontes C, Romero-Durana M, Galaz-Davison P, Ramírez-Sarmiento CA, Schuster CD, Marti MA, Wolynes PG, Ferreiro DU, Parra RG, Valencia A. Local energetic frustration conservation in protein families and superfamilies. Nat Commun 2023; 14:8379. [PMID: 38104123 PMCID: PMC10725452 DOI: 10.1038/s41467-023-43801-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/23/2023] [Accepted: 11/20/2023] [Indexed: 12/19/2023] Open
Abstract
Energetic local frustration offers a biophysical perspective to interpret the effects of sequence variability on protein families. Here we present a methodology to analyze local frustration patterns within protein families and superfamilies that allows us to uncover constraints related to stability and function, and identify differential frustration patterns in families with a common ancestry. We analyze these signals in very well studied protein families such as PDZ, SH3, α and β globins and RAS families. Recent advances in protein structure prediction make it possible to analyze a vast majority of the protein space. An automatic and unsupervised proteome-wide analysis on the SARS-CoV-2 virus demonstrates the potential of our approach to enhance our understanding of the natural phenotypic diversity of protein families beyond single protein instances. We apply our method to modify biophysical properties of natural proteins based on their family properties, as well as perform unsupervised analysis of large datasets to shed light on the physicochemical signatures of poorly characterized proteins such as the ones belonging to emergent pathogens.
Affiliation(s)
- Maria I Freiberger
  - Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, C1428EGA, Argentina
- Victoria Ruiz-Serra
  - Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Camila Pontes
  - Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Miguel Romero-Durana
  - Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Pablo Galaz-Davison
  - Institute for Biological and Medical Engineering, Schools of Engineering, Medicine, and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, 7820436, Chile
  - ANID - Millennium Science Initiative Program - Millennium Institute for Integrative Biology (iBio), Santiago, 8331150, Chile
- Cesar A Ramírez-Sarmiento
  - Institute for Biological and Medical Engineering, Schools of Engineering, Medicine, and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, 7820436, Chile
  - ANID - Millennium Science Initiative Program - Millennium Institute for Integrative Biology (iBio), Santiago, 8331150, Chile
- Claudio D Schuster
  - Laboratorio de Bioinformática, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EGA, Buenos Aires, Argentina
- Marcelo A Marti
  - Laboratorio de Bioinformática, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, C1428EGA, Buenos Aires, Argentina
- Peter G Wolynes
  - Center for Theoretical Biological Physics and Department of Chemistry, Rice University, Houston, TX, 77005, USA
- Diego U Ferreiro
  - Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica - IQUIBICEN/CONICET, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, C1428EGA, Argentina
- R Gonzalo Parra
  - Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Alfonso Valencia
  - Computational Biology Group, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain

3. Dennler O, Coste F, Blanquart S, Belleannée C, Théret N. Phylogenetic inference of the emergence of sequence modules and protein-protein interactions in the ADAMTS-TSL family. PLoS Comput Biol 2023; 19:e1011404. [PMID: 37651409 PMCID: PMC10499240 DOI: 10.1371/journal.pcbi.1011404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 09/13/2023] [Accepted: 08/01/2023] [Indexed: 09/02/2023] Open
Abstract
Numerous computational methods based on sequences or structures have been developed for the characterization of protein function, but they are still unsatisfactory to deal with the multiple functions of multi-domain protein families. Here we propose an original approach based on 1) the detection of conserved sequence modules using partial local multiple alignment, 2) the phylogenetic inference of species/genes/modules/functions evolutionary histories, and 3) the identification of co-appearances of modules and functions. Applying our framework to the multidomain ADAMTS-TSL family including ADAMTS (A Disintegrin-like and Metalloproteinase with ThromboSpondin motif) and ADAMTS-like proteins over nine species including human, we identify 45 sequence module signatures that are associated with the occurrence of 278 Protein-Protein Interactions in ancestral genes. Some of these signatures are supported by published experimental data and the others provide new insights (e.g. ADAMTS-5). The module signatures of ADAMTS ancestors notably highlight the dual variability of the propeptide and ancillary regions suggesting the importance of these two regions in the specialization of ADAMTS during evolution. Our analyses further indicate convergent interactions of ADAMTS with COMP and CCN2 proteins. Overall, our study provides 186 sequence module signatures that discriminate distinct subgroups of ADAMTS and ADAMTSL and that may result from selective pressures on novel functions and phenotypes.
Affiliation(s)
- Olivier Dennler
  - Univ Rennes, Inria, CNRS, IRISA, UMR 6074, Rennes, France
  - Univ Rennes, Inserm, EHESP, Irset, UMR S1085, Rennes, France
- François Coste
  - Univ Rennes, Inria, CNRS, IRISA, UMR 6074, Rennes, France
- Nathalie Théret
  - Univ Rennes, Inria, CNRS, IRISA, UMR 6074, Rennes, France
  - Univ Rennes, Inserm, EHESP, Irset, UMR S1085, Rennes, France

4. Liu G, Ekmen E, Jalalypour F, Mertens HDT, Jeffries CM, Svergun D, Atilgan AR, Atilgan C, Sayers Z. Conformational multiplicity of bacterial ferric binding protein revealed by small angle x-ray scattering and molecular dynamics calculations. J Chem Phys 2023; 158:085101. [PMID: 36859088 DOI: 10.1063/5.0136558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023] Open
Abstract
This study combines molecular dynamics (MD) simulations with small angle x-ray scattering (SAXS) measurements to investigate the range of conformations that can be adopted by a pH/ionic strength (IS) sensitive protein and to quantify its distinct populations in solution. To explore how the conformational distribution of proteins may be modified in the environmental niches of biological media, we focus on the periplasmic ferric binding protein A (FbpA) from Haemophilus influenzae involved in the mechanism by which bacteria capture iron from higher organisms. We examine iron-binding/release mechanisms of FbpA in varying conditions simulating its biological environment. While we show that these changes fall within the detectable range for SAXS as evidenced by differences observed in the theoretical scattering patterns calculated from the crystal structure models of apo and holo forms, detection of conformational changes due to the point mutation D52A and changes in ionic strength (IS) from SAXS scattering profiles have been challenging. Here, to reach conclusions, statistical analyses with SAXS profiles and results from different techniques were combined in a complementary fashion. The SAXS data complemented by size exclusion chromatography point to multiple and/or alternative conformations at physiological IS, whereas they are well-explained by single crystallographic structures in low IS buffers. By fitting the SAXS data with unique conformations sampled by a series of MD simulations under conditions mimicking the buffers, we quantify the populations of the occupied substates. We also find that the D52A mutant that we predicted by coarse-grained computational modeling to allosterically control the iron binding site in FbpA, responds to the environmental changes in our experiments with conformational selection scenarios that differ from those of the wild type.
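Quantifying substate populations from a scattering profile can be illustrated with the simplest possible case: two conformations whose theoretical profiles are mixed linearly. The sketch below is an assumption-laden simplification, not the fitting procedure used in the study; it ignores intensity scaling, background and experimental errors, and all names and numbers are hypothetical.

```python
def two_state_population(i_obs, i_a, i_b):
    """Least-squares weight w of state A in the mixture
    I_obs(q) ~ w * I_A(q) + (1 - w) * I_B(q), clipped to [0, 1].

    All three profiles must be sampled on the same q grid.
    """
    if not i_obs or not (len(i_obs) == len(i_a) == len(i_b)):
        raise ValueError("profiles must be non-empty and of equal length")
    diff = [a - b for a, b in zip(i_a, i_b)]
    denominator = sum(d * d for d in diff)
    if denominator == 0.0:
        raise ValueError("the two reference profiles are identical")
    numerator = sum((o - b) * d for o, b, d in zip(i_obs, i_b, diff))
    # Populations are fractions, so keep the estimate in the physical range.
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical intensities at three q values for two conformations.
profile_a = [10.0, 5.0, 1.0]
profile_b = [8.0, 6.0, 2.0]
# Synthetic "measurement" built from a 30% / 70% mixture.
measured = [0.3 * a + 0.7 * b for a, b in zip(profile_a, profile_b)]
weight_a = two_state_population(measured, profile_a, profile_b)
```

With more than two candidate conformations the same idea becomes a non-negative least-squares problem with weights summing to one.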
Affiliation(s)
- Goksin Liu
  - Sabanci University, Faculty of Engineering and Natural Sciences, Orhanli, Tuzla, 34956 Istanbul, Türkiye
- Erhan Ekmen
  - Sabanci University, Faculty of Engineering and Natural Sciences, Orhanli, Tuzla, 34956 Istanbul, Türkiye
- Farzaneh Jalalypour
  - Sabanci University, Faculty of Engineering and Natural Sciences, Orhanli, Tuzla, 34956 Istanbul, Türkiye
- Haydyn D T Mertens
  - European Molecular Biology Laboratory - Hamburg Unit, Notkestrasse 85, 22603 Hamburg, Germany
- Cy M Jeffries
  - European Molecular Biology Laboratory - Hamburg Unit, Notkestrasse 85, 22603 Hamburg, Germany
- Dmitri Svergun
  - European Molecular Biology Laboratory - Hamburg Unit, Notkestrasse 85, 22603 Hamburg, Germany
- Ali Rana Atilgan
  - Sabanci University, Faculty of Engineering and Natural Sciences, Orhanli, Tuzla, 34956 Istanbul, Türkiye
- Canan Atilgan
  - Sabanci University, Faculty of Engineering and Natural Sciences, Orhanli, Tuzla, 34956 Istanbul, Türkiye
- Zehra Sayers
  - Sabanci University, Faculty of Engineering and Natural Sciences, Orhanli, Tuzla, 34956 Istanbul, Türkiye

5. Durairaj J, de Ridder D, van Dijk AD. Beyond sequence: Structure-based machine learning. Comput Struct Biotechnol J 2022; 21:630-643. [PMID: 36659927 PMCID: PMC9826903 DOI: 10.1016/j.csbj.2022.12.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Revised: 12/21/2022] [Accepted: 12/21/2022] [Indexed: 12/31/2022] Open
Abstract
Recent breakthroughs in protein structure prediction demarcate the start of a new era in structural bioinformatics. Combined with various advances in experimental structure determination and the uninterrupted pace at which new structures are published, this promises an age in which protein structure information is as prevalent and ubiquitous as sequence. Machine learning in protein bioinformatics has been dominated by sequence-based methods, but this is now changing to make use of the deluge of rich structural information as input. Machine learning methods making use of structures are scattered across literature and cover a number of different applications and scopes; while some try to address questions and tasks within a single protein family, others aim to capture characteristics across all available proteins. In this review, we look at the variety of structure-based machine learning approaches, how structures can be used as input, and typical applications of these approaches in protein biology. We also discuss current challenges and opportunities in this all-important and increasingly popular field.
Affiliation(s)
- Janani Durairaj
  - Biozentrum, University of Basel, Basel, Switzerland
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Dick de Ridder
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D.J. van Dijk
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands

6. Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022; 50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]
Abstract
The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
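One simple way to quantify a coevolutionary relationship between two alignment positions is the mutual information of their residue distributions. The sketch below uses that measure as an illustration only; the review does not state that it is the method discussed, and the toy alignments and function name are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length aligned sequences. High values mean
    that knowing the residue at position i helps predict the residue at j.
    """
    n = len(alignment)
    if n == 0:
        raise ValueError("alignment is empty")
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment where the two positions always change together.
covarying = ["AK", "AK", "DE", "DE"]
# Toy alignment where the two positions vary independently.
independent = ["AK", "AE", "DK", "DE"]
mi_covarying = column_mutual_information(covarying, 0, 1)
mi_independent = column_mutual_information(independent, 0, 1)
```

Raw mutual information is known to be inflated by phylogenetic relatedness and small sample sizes, so practical analyses usually apply corrections before ranking positions for experiments.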
Affiliation(s)
- Emily N. Kennedy
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Clay A. Foster
  - Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Sarah A. Barr
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Robert B. Bourret
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America

7. Sen N, Anishchenko I, Bordin N, Sillitoe I, Velankar S, Baker D, Orengo C. Characterizing and explaining the impact of disease-associated mutations in proteins without known structures or structural homologs. Brief Bioinform 2022; 23:6596316. [PMID: 35641150 PMCID: PMC9294430 DOI: 10.1093/bib/bbac187] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/05/2021] [Revised: 04/23/2022] [Accepted: 04/27/2022] [Indexed: 12/12/2022] Open
Abstract
Mutations in human proteins lead to diseases. The structure of these proteins can help understand the mechanism of such diseases and develop therapeutics against them. With improved deep learning techniques, such as RoseTTAFold and AlphaFold, we can predict the structure of proteins even in the absence of structural homologs. We modeled and extracted the domains from 553 disease-associated human proteins without known protein structures or close homologs in the Protein Data Bank. We noticed that the model quality was higher and the root-mean-square deviation (RMSD) lower between AlphaFold and RoseTTAFold models for domains that could be assigned to CATH families as compared to those which could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein–protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations were in the proximity of these predicted functional sites, whether they destabilized the protein structure based on ddG calculations or whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations based on proximity to functional sites, structural destabilization or pathogenicity. When compared to polymorphisms, a larger percentage of disease-associated missense mutations were buried, closer to predicted functional sites, predicted as destabilizing and pathogenic. Usage of models from the two state-of-the-art techniques provides better confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models which could not be explained based solely on AlphaFold models.
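The RMSD used above to compare AlphaFold and RoseTTAFold models is the root of the mean squared distance between equivalent atoms. The following is a minimal sketch assuming the two models are already superposed and their atoms are listed in the same order; it does not perform the superposition step, and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of
    (x, y, z) coordinates, assumed to be superposed and in matching order."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Two hypothetical two-atom models differing by 2 Å at one atom.
model_one = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_two = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
deviation = rmsd(model_one, model_two)
```

A lower value indicates closer agreement between the two predicted structures, which is why it serves as a consistency check between independent modeling methods.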
Affiliation(s)
- Neeladri Sen
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Ivan Anishchenko
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Sameer Velankar
  - Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
  - Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK

8. A Comprehensive Review of Computation-Based Metal-Binding Prediction Approaches at the Residue Level. BioMed Research International 2022; 2022:8965712. [PMID: 35402609 PMCID: PMC8989566 DOI: 10.1155/2022/8965712] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/02/2022] [Accepted: 03/04/2022] [Indexed: 12/29/2022]
Abstract
Clear evidence has shown that metal ions strongly connect and delicately tune the dynamic homeostasis in living bodies. They have been proved to be associated with protein structure, stability, regulation, and function. Even small changes in the concentration of metal ions can shift their effects from natural beneficial functions to harmful. This leads to degenerative diseases, malignant tumors, and cancers. Accurate characterizations and predictions of metalloproteins at the residue level promise informative clues to the investigation of intrinsic mechanisms of protein-metal ion interactions. Compared to biophysical or biochemical wet-lab technologies, computational methods provide open web interfaces of high-resolution databases and high-throughput predictors for efficient investigation of metal-binding residues. This review surveys and details 18 public databases of metal-protein binding. We collect a comprehensive set of 44 computation-based methods and classify them into four categories, namely, learning-, docking-, template-, and meta-based methods. We analyze the benchmark datasets, assessment criteria, feature construction, and algorithms. We also compare several methods on two benchmark testing datasets and include a discussion about currently publicly available predictive tools. Finally, we summarize the challenges and underlying limitations of the current studies and propose several prospective directions concerning the future development of the related databases and methods.

9. Shegay MV, Švedas VK, Voevodin VV, Suplatov DA, Popova NN. Guide tree optimization with genetic algorithm to improve multiple protein 3D-structure alignment. Bioinformatics 2022; 38:985-989. [PMID: 34849594 DOI: 10.1093/bioinformatics/btab798] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Revised: 10/23/2021] [Accepted: 11/19/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: With the increasing availability of 3D-data, the focus of comparative bioinformatic analysis is shifting from protein sequence alignments toward more content-rich 3D-alignments. This raises the need for new ways to improve the accuracy of 3D-superimposition.
RESULTS: We proposed guide tree optimization with genetic algorithm (GA) as a universal tool to improve the alignment quality of multiple protein 3D-structures systematically. As a proof of concept, we implemented the suggested GA-based approach in popular Matt and Caretta multiple protein 3D-structure alignment (M3DSA) algorithms, leading to a statistically significant improvement of the TM-score quality indicator by up to 220-1523% on 'SABmark Superfamilies' (in 49-77% of cases) and 'SABmark Twilight' (in 59-80% of cases) datasets. The observed improvement in collections of distant homologies highlights the potentials of GA to optimize 3D-alignments of diverse protein superfamilies as one plausible tool to study the structure-function relationship.
AVAILABILITY AND IMPLEMENTATION: The source codes of patched gaCaretta and gaMatt programs are available open-access at https://github.com/n-canter/gamaps.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
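The TM-score named above as the quality indicator can be written down compactly for a fixed superposition. The sketch below evaluates the standard TM-score formula for one given set of aligned-residue distances; it is not taken from the gaCaretta or gaMatt code, it omits the search over superpositions that the full definition maximizes over, and the function name and inputs are hypothetical.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    `distances` holds the distance in Å between each pair of aligned residues;
    `target_length` is the length of the reference protein used for
    normalization. Unaligned residues contribute nothing to the sum.
    """
    if target_length < 19:
        # Below 19 residues the d0 expression is zero, negative or undefined.
        raise ValueError("d0 is not positive for chains shorter than 19 residues")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than residues in the target")
    # Length-dependent distance scale that makes the score size-independent.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# Hypothetical cases for a 100-residue target.
perfect = tm_score([0.0] * 100, 100)       # every residue aligned at 0 Å
half_covered = tm_score([0.0] * 50, 100)   # only half of the target aligned
```

Because the score is normalized by the target length, it rewards both close superposition and high coverage, which makes it a natural fitness value for an optimizer such as a genetic algorithm.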
Affiliation(s)
- Maksim V Shegay
  - Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
- Vytas K Švedas
  - Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
- Vladimir V Voevodin
  - Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
  - Research Computing Center, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
- Dmitry A Suplatov
  - Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia
- Nina N Popova
  - Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Vorobjev Hills, Moscow 119991, Russia

10. Pazos F. Computational prediction of protein functional sites-Applications in biotechnology and biomedicine. Advances in Protein Chemistry and Structural Biology 2022; 130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
There are many computational approaches for predicting protein functional sites based on different sequence and structural features. These methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. They complement the more expensive and time-consuming experimental approaches by pointing them to possible candidate positions. In many cases they are jointly used to characterize the functional sites in proteins of biotechnological and biomedical interest and eventually modify them for different purposes. There is a clear trend towards approaches based on machine learning and those using structural information, due to the recent developments in these areas. Nevertheless, "classic" methods based on sequence and evolutionary features are still playing an important role as these features are strongly related to functionality. In this review, the main approaches for predicting general functional sites in a protein are discussed, with a focus on sequence-based approaches.
Affiliation(s)
- Florencio Pazos
  - Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Madrid, Spain