1
Chatonnet A, Perochon M, Velluet E, Marchot P. The ESTHER database on alpha/beta hydrolase fold proteins - An overview of recent developments. Chem Biol Interact 2023; 383:110671. [PMID: 37582413 DOI: 10.1016/j.cbi.2023.110671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 08/01/2023] [Accepted: 08/12/2023] [Indexed: 08/17/2023]
Abstract
The ESTHER database, dedicated to ESTerases and alpha/beta-Hydrolase Enzymes and Relatives (https://bioweb.supagro.inra.fr/ESTHER/general?what=index), offers online access to a continuously updated, sequence-based classification of proteins harboring the alpha/beta hydrolase fold into families and subfamilies. In particular, the database provides links to the sequences, structures, ligands and huge diversity of functions of these proteins, and to the related literature and other databases. Taking advantage of the promiscuity of enzymatic function, many engineered esterases, lipases, epoxide hydrolases and haloalkane dehalogenases are used for biotechnological applications. Finding means of detoxifying those protein members that are targeted by insecticides, herbicides or antibiotics, or of reactivating human cholinesterases inhibited by nerve gas, remains an active area of research. Using or improving the capacity of some enzymes to break down plastics, with the aim of recycling valuable material and reducing waste, is an emerging challenge. Most hydrolases in the superfamily are water-soluble and act on or are inhibited by small organic compounds, yet in a few subfamilies some members interact with other, unrelated proteins to modulate activity or trigger functional partnerships. Recent developments in 3D structure prediction brought by AI-based programs now permit analysis of enzymatic mechanisms for a variety of hydrolases with no experimental 3D structure available. Finally, mutations in as many as 34 of the 120 human genes compiled in the database are now linked to genetic diseases, a feature fueling research on early detection, metabolic pathways, pharmacological treatment or enzyme replacement therapy. Here we review the developments in the database that took place over the last decade and discuss potential new applications as well as recent and expected future research in the field.
Affiliation(s)
- Arnaud Chatonnet
  - DMEM, Université de Montpellier, INRAE, 34000 Montpellier, France
- Michel Perochon
  - DMEM, Université de Montpellier, INRAE, 34000 Montpellier, France
- Eric Velluet
  - INRAE-AgroM / UIC, Place Viala, 34060, Montpellier, France
- Pascale Marchot
  - CNRS / Aix-Marseille Univ, lab Architecture et Fonction des Macromolécules Biologiques, Marseille, France
2
Rademaker DT, Xue LC, 't Hoen PAC, Vriend G. Entropy and Variability: A Second Opinion by Deep Learning. Biomolecules 2022; 12:1740. [PMID: 36551168 PMCID: PMC9775329 DOI: 10.3390/biom12121740] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/31/2022] [Revised: 11/13/2022] [Accepted: 11/19/2022] [Indexed: 11/25/2022]
Abstract
BACKGROUND: Analysis of the distribution of amino acid types found at equivalent positions in multiple sequence alignments has found applications in human genetics, protein engineering, drug design, protein structure prediction, and many other fields. These analyses tend to revolve around measures of the distribution of the twenty amino acid types found at evolutionarily equivalent positions: the columns in multiple sequence alignments. Commonly used measures are variability, average hydrophobicity, and Shannon entropy. One of these techniques, entropy-variability analysis, reduces the distribution of observed residue types in one column to two numbers: the Shannon entropy and the variability, defined as the number of residue types observed.
RESULTS: We applied a deep learning, unsupervised feature extraction method to analyse the multiple sequence alignments of all human proteins. An auto-encoder neural architecture was trained on 27,835 multiple sequence alignments for human proteins to obtain the two features that best describe the seven million variability patterns. These two unsupervised learned features strongly resemble entropy and variability, indicating that these are the projections that retain most information when reducing the dimensionality of the information hidden in columns in multiple sequence alignments.
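The two numbers that entropy-variability analysis assigns to each alignment column can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy alignment, the handling of gaps (ignored) and the use of the natural logarithm are all assumptions made for illustration only.

```python
import math
from collections import Counter


def column_stats(column, gap_chars="-."):
    """Return (Shannon entropy, variability) for one alignment column.

    Assumptions for this sketch: gaps are ignored, residues are
    case-insensitive, and entropy uses the natural logarithm.
    """
    residues = [c.upper() for c in column if c not in gap_chars]
    if not residues:
        return 0.0, 0
    counts = Counter(residues)
    total = len(residues)
    # Shannon entropy: sum over residue types of -p * ln(p)
    entropy = sum(-(n / total) * math.log(n / total) for n in counts.values())
    # Variability: number of distinct residue types observed in the column
    variability = len(counts)
    return entropy, variability


def entropy_variability(alignment):
    """Compute (entropy, variability) for every column of an alignment.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    return [column_stats(col) for col in zip(*alignment)]


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the paper.
    toy_alignment = ["ACDA", "ACEG", "ACDL", "ACE-"]
    for pos, (h, v) in enumerate(entropy_variability(toy_alignment), start=1):
        print(f"column {pos}: entropy={h:.3f} variability={v}")
```

A fully conserved column yields entropy 0 and variability 1, whereas a column with several equally frequent residue types scores high on both measures.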
Affiliation(s)
- Daniel T. Rademaker
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, 260 Nijmegen, The Netherlands
- Li C. Xue
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, 260 Nijmegen, The Netherlands
- Peter A. C. 't Hoen
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, 260 Nijmegen, The Netherlands
- Gert Vriend
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, 260 Nijmegen, The Netherlands
  - Baco Institute for Protein Science (BIPS), Mindoro 5201, Philippines
3
Roumia AF, Tsirigos KD, Theodoropoulou MC, Tamposis IA, Hamodrakas SJ, Bagos PG. OMPdb: A Global Hub of Beta-Barrel Outer Membrane Proteins. Front Bioinform 2021; 1:646581. [PMID: 36303794 PMCID: PMC9581022 DOI: 10.3389/fbinf.2021.646581] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/27/2020] [Accepted: 03/18/2021] [Indexed: 11/14/2022]
Abstract
OMPdb (www.ompdb.org) was introduced as a database for β-barrel outer membrane proteins from Gram-negative bacteria in 2011 and then included 69,354 entries classified into 85 families. The database has been updated continuously using a collection of characteristic profile Hidden Markov Models able to discriminate between the different families of prokaryotic transmembrane β-barrels. The number of families has increased ultimately to a total of 129 families in the current, second major version of OMPdb. New additions have been made in parallel with efforts to update existing families and add novel families. Here, we present the upgrade of OMPdb, which from now on aims to become a global repository for all transmembrane β-barrel proteins, both eukaryotic and bacterial.
Affiliation(s)
- Ahmed F. Roumia
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Ioannis A. Tamposis
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
- Stavros J. Hamodrakas
  - Section of Cell Biology and Biophysics, Department of Biology, School of Sciences, National and Kapodistrian University of Athens, Athens, Greece
- Pantelis G. Bagos
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
  - Correspondence: Pantelis G. Bagos
4
Lange J, Baakman C, Pistorius A, Krieger E, Hooft R, Joosten RP, Vriend G. Facilities that make the PDB data collection more powerful. Protein Sci 2019; 29:330-344. [PMID: 31724231 PMCID: PMC6933850 DOI: 10.1002/pro.3788] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/11/2019] [Accepted: 11/11/2019] [Indexed: 01/13/2023]
Abstract
We describe a series of databases and tools that directly or indirectly support biomedical research on macromolecules, with a focus on their applicability in protein structure bioinformatics research. DSSP, which determines secondary structures of proteins, has been updated to work well with extremely large structures in multiple formats. The PDBREPORT database, which lists anomalies in protein structures, has been remade to remove many small problems. These reports are now available as PDF-formatted files with a computer-readable summary. The VASE software has been added to analyze and visualize HSSP multiple sequence alignments for protein structures. The Lists collection of databases has been extended with a series of databases, most notably a database that gives each protein structure a grade for usefulness in protein structure bioinformatics projects. The PDB-REDO collection of reanalyzed and re-refined protein structures that were solved by X-ray crystallography has been improved by dealing better with sugar residues and hydrogen bonds, and by adding many missing surface loops. All academic software underlying these protein structure bioinformatics applications and databases is now publicly accessible, either directly from the authors or from the GitHub software repository.
Affiliation(s)
- Joanna Lange
  - Bio-Prodict, Nijmegen, The Netherlands
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, Nijmegen, The Netherlands
- Coos Baakman
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, Nijmegen, The Netherlands
- Arthur Pistorius
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, Nijmegen, The Netherlands
- Elmar Krieger
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, Nijmegen, The Netherlands
- Rob Hooft
  - Department of Computer Science, Dutch Techcentre for Life Sciences (DTL), Amsterdam, The Netherlands
  - Department of Computer Science, Vrije Universiteit Amsterdam (VU), Amsterdam, The Netherlands
- Robbie P Joosten
  - Biochemistry department, Netherlands Cancer Institute (NKI), Amsterdam, The Netherlands
- Gert Vriend
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboudumc, Nijmegen, The Netherlands
  - Baco Institute of Protein Science (BIPS), Mindoro, Philippines
5
Holliday GL, Brown SD, Akiva E, Mischel D, Hicks MA, Morris JH, Huang CC, Meng EC, Pegg SCH, Ferrin TE, Babbitt PC. Biocuration in the structure-function linkage database: the anatomy of a superfamily. Database (Oxford) 2017; 2017:3074783. [PMID: 28365730 PMCID: PMC5467563 DOI: 10.1093/database/bax006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/28/2016] [Accepted: 01/23/2017] [Indexed: 12/11/2022]
Abstract
With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, there is a bottleneck in annotating molecular function to a sequence. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of the challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry. Also presented is the Enzyme Structure-Function Ontology (ESFO), which has been designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD and is used to guide the biocuration processes within the SFLD. Database URL: http://sfld.rbvi.ucsf.edu/
Affiliation(s)
- Gemma L Holliday
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Shoshana D Brown
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Eyal Akiva
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- David Mischel
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
- Michael A Hicks
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
  - Human Longevity, Inc, San Diego, CA 92121, USA
- John H Morris
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- Conrad C Huang
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- Elaine C Meng
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
- Thomas E Ferrin
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
  - California Institute for Quantitative Biosciences, University of California, San Francisco, CA 94158, USA
- Patricia C Babbitt
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
  - California Institute for Quantitative Biosciences, University of California, San Francisco, CA 94158, USA
6
Affiliation(s)
- Mohamed Helmy
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Gary D. Bader
  - The Donnelly Centre, University of Toronto, Toronto, ON, Canada
7
Rost B, Radivojac P, Bromberg Y. Protein function in precision medicine: deep understanding with machine learning. FEBS Lett 2016; 590:2327-41. [PMID: 27423136 PMCID: PMC5937700 DOI: 10.1002/1873-3468.12307] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 06/27/2016] [Revised: 07/12/2016] [Accepted: 07/12/2016] [Indexed: 12/21/2022]
Abstract
Precision medicine and personalized health efforts propose leveraging complex molecular, medical and family history, along with other types of personal data toward better life. We argue that this ambitious objective will require advanced and specialized machine learning solutions. Simply skimming some low-hanging results off the data wealth might have limited potential. Instead, we need to better understand all parts of the system to define medically relevant causes and effects: how do particular sequence variants affect particular proteins and pathways? How do these effects, in turn, cause the health or disease-related phenotype? Toward this end, deeper understanding will not simply diffuse from deeper machine learning, but from more explicit focus on understanding protein function, context-specific protein interaction networks, and impact of variation on both.
Affiliation(s)
- Burkhard Rost
  - Department of Informatics and Bioinformatics, Institute for Advanced Studies, Technical University of Munich, Garching, Germany
- Predrag Radivojac
  - School of Informatics and Computing, Indiana University, Bloomington, IN, USA
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
8
Munk C, Isberg V, Mordalski S, Harpsøe K, Rataj K, Hauser AS, Kolb P, Bojarski AJ, Vriend G, Gloriam DE. GPCRdb: the G protein-coupled receptor database - an introduction. Br J Pharmacol 2016; 173:2195-207. [PMID: 27155948 PMCID: PMC4919580 DOI: 10.1111/bph.13509] [Citation(s) in RCA: 145] [Impact Index Per Article: 16.1] [Received: 01/19/2016] [Revised: 03/18/2016] [Accepted: 04/24/2016] [Indexed: 12/16/2022]
Abstract
GPCRs make up the largest family of human membrane proteins and of drug targets. Recent advances in GPCR pharmacology and crystallography have shed new light on signal transduction, allosteric modulation and biased signalling, translating into new mechanisms and principles for drug design. The GPCR database, GPCRdb, has served the community for over 20 years and has recently been extended to include a more multidisciplinary audience. This review is intended to introduce new users to the services in GPCRdb, which meets three overall purposes: firstly, to provide reference data in an integrated, annotated and structured fashion, with a focus on sequences, structures, single‐point mutations and ligand interactions. Secondly, to equip the community with a suite of web tools for swift analysis of structures, sequence similarities, receptor relationships, and ligand target profiles. Thirdly, to facilitate dissemination through interactive diagrams of, for example, receptor residue topologies, phylogenetic relationships and crystal structure statistics. Herein, these services are described for the first time; visitors and guides are provided with good practices for their utilization. Finally, we describe complementary databases cross‐referenced by GPCRdb and web servers with corresponding functionality.
Affiliation(s)
- C Munk
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- V Isberg
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- S Mordalski
  - Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
- K Harpsøe
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- K Rataj
  - Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
- A S Hauser
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- P Kolb
  - Department of Pharmaceutical Chemistry, Philipps-University Marburg, Marburg, Germany
- A J Bojarski
  - Department of Medicinal Chemistry, Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
- G Vriend
  - Centre for Molecular and Biomolecular Informatics, Radboudumc, Nijmegen, The Netherlands
- D E Gloriam
  - Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark