1
Soleymani S, Gravel N, Huang LC, Yeung W, Bozorgi E, Bendzunas NG, Kochut KJ, Kannan N. Dark kinase annotation, mining, and visualization using the Protein Kinase Ontology. PeerJ 2023; 11:e16087. [PMID: 38077442 PMCID: PMC10704995 DOI: 10.7717/peerj.16087] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/05/2023] [Accepted: 08/22/2023] [Indexed: 12/18/2023]
Abstract
The Protein Kinase Ontology (ProKinO) is an integrated knowledge graph that conceptualizes the complex relationships among protein kinase sequence, structure, function, and disease in a human- and machine-readable format. In this study, we have significantly expanded ProKinO by incorporating additional data on expression patterns and drug interactions. Furthermore, we have developed a completely new browser from the ground up to make the knowledge graph visible and interactive on the web. We have enriched ProKinO with new classes and relationships that capture information on kinase ligand binding sites, expression patterns, and functional features. These additions extend ProKinO's capabilities as a discovery tool, enabling it to uncover novel insights about understudied members of the protein kinase family. We next demonstrate applications of ProKinO. Specifically, through graph mining and aggregate SPARQL queries, we identify the p21-activated protein kinase 5 (PAK5) as one of the most frequently mutated dark kinases in human cancers, with abnormal expression in multiple cancers, including a previously unappreciated role in acute myeloid leukemia. We have identified recurrent oncogenic mutations in the PAK5 activation loop predicted to alter substrate binding and phosphorylation. Additionally, we have identified common ligand/drug binding residues in PAK family kinases, underscoring ProKinO's potential application in drug discovery. The updated ontology browser and the addition of a web component, ProtVista, which enables interactive mining of kinase sequence annotations in 3D structures and AlphaFold models, provide a valuable resource for the signaling community. The updated ProKinO database is accessible at https://prokino.uga.edu.
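The aggregate-query idea in this abstract (ranking dark kinases by how often they are mutated) can be illustrated with a small Python sketch. It does not use ProKinO's actual schema or SPARQL endpoint: the record layout, kinase labels, and sample IDs below are hypothetical, and plain Python grouping stands in for a SPARQL GROUP BY/COUNT query.

```python
from collections import Counter

# Hypothetical toy records: (kinase, is_dark, sample_id, mutation).
# Labels and values are invented for illustration; they are NOT ProKinO data.
RECORDS = [
    ("DARK_A", True, "S1", "m1"),
    ("DARK_A", True, "S2", "m2"),
    ("DARK_A", True, "S2", "m3"),  # second mutation in the same sample
    ("DARK_B", True, "S3", "m4"),
    ("WELL_C", False, "S1", "m5"),
    ("WELL_C", False, "S4", "m6"),
    ("WELL_C", False, "S5", "m7"),
]


def count_mutated_samples(records, dark_only=True):
    """Rank kinases by the number of distinct samples carrying a mutation.

    Mirrors the shape of an aggregate SPARQL query:
    COUNT(DISTINCT ?sample) ... GROUP BY ?kinase ORDER BY DESC(count).
    """
    # Deduplicate (kinase, sample) pairs so that several mutations in one
    # sample are counted once; optionally keep only dark kinases.
    pairs = {
        (kinase, sample)
        for kinase, is_dark, sample, _mutation in records
        if is_dark or not dark_only
    }
    counts = Counter(kinase for kinase, _sample in pairs)
    # Highest count first; ties broken alphabetically for a stable result.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(count_mutated_samples(RECORDS))  # [('DARK_A', 2), ('DARK_B', 1)]
```

Counting distinct samples rather than raw mutation records keeps a single heavily mutated sample from inflating a kinase's rank.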
Affiliation(s)
- Saber Soleymani
- Department of Computer Science, University of Georgia, Athens, GA, United States
- Nathan Gravel
- Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Liang-Chin Huang
- Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Wayland Yeung
- Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Elika Bozorgi
- Department of Computer Science, University of Georgia, Athens, GA, United States
- Nathaniel G. Bendzunas
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States
- Krzysztof J. Kochut
- Department of Computer Science, University of Georgia, Athens, GA, United States
- Natarajan Kannan
- Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States

2
Chu S, Xie X, Payan C, Stochaj U. Valosin containing protein (VCP): initiator, modifier, and potential drug target for neurodegenerative diseases. Mol Neurodegener 2023; 18:52. [PMID: 37545006 PMCID: PMC10405438 DOI: 10.1186/s13024-023-00639-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 06/27/2023] [Indexed: 08/08/2023]
Abstract
The AAA+ ATPase valosin containing protein (VCP) is essential for cell and organ homeostasis, especially in cells of the nervous system. As part of a large network, VCP collaborates with many cofactors to ensure proteostasis under normal, stress, and disease conditions. A large number of mutations have revealed the importance of VCP for human health. In particular, VCP facilitates the dismantling of protein aggregates and the removal of dysfunctional organelles. These are critical events to prevent malfunction of the brain and other parts of the nervous system. In line with this idea, VCP mutants are linked to the onset and progression of neurodegeneration and other diseases. The intricate molecular mechanisms that connect VCP mutations to distinct brain pathologies continue to be uncovered. Emerging evidence supports the model that VCP controls cellular functions on multiple levels and in a cell type specific fashion. Accordingly, VCP mutants derail cellular homeostasis through several mechanisms that can instigate disease. Our review focuses on the association between VCP malfunction and neurodegeneration. We discuss the latest insights in the field, emphasize open questions, and speculate on the potential of VCP as a drug target for some of the most devastating forms of neurodegeneration.
Affiliation(s)
- Siwei Chu
- Department of Physiology, McGill University, Montreal, H3G 1Y6, Canada
- Xinyi Xie
- Department of Physiology, McGill University, Montreal, H3G 1Y6, Canada
- Carla Payan
- Department of Physiology, McGill University, Montreal, H3G 1Y6, Canada
- Ursula Stochaj
- Department of Physiology, McGill University, Montreal, H3G 1Y6, Canada
- Quantitative Life Sciences Program, McGill University, Montreal, Canada

3
Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S. UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023; 22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023]
Abstract
The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications, including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information, and presents a summary of experimentally verified or computationally predicted functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
Affiliation(s)
- E H Bowler-Barnett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- J Fan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- J Luo
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- M Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- M J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- S Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom

4
Salazar GA, Luciani A, Watkins X, Kandasaamy S, Rice DL, Blum M, Bateman A, Martin M. Nightingale: web components for protein feature visualization. Bioinform Adv 2023; 3:vbad064. [PMID: 37359723 PMCID: PMC10287899 DOI: 10.1093/bioadv/vbad064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 04/20/2023] [Accepted: 05/23/2023] [Indexed: 06/28/2023]
Abstract
Motivation: The visualization of biological data is a fundamental technique that enables researchers to understand and explain biology. Some of these visualizations have become iconic, for instance tree views for taxonomy, cartoon renderings of 3D protein structures, or tracks that represent features in a gene or protein, as in a genome browser. Nightingale provides visualizations in the context of proteins and protein features.
Results: Nightingale is a library of reusable data visualization web components that are currently used by UniProt and InterPro, among other projects. The components can be used to display protein sequence features, variants, interaction data, 3D structures, and more. These components are flexible, allowing users to easily view multiple data sources within the same context, as well as to compose these components into a customized view.
Availability and implementation: Nightingale examples and documentation are freely available at https://ebi-webcomponents.github.io/nightingale/. It is distributed under the MIT license, and its source code can be found at https://github.com/ebi-webcomponents/nightingale.
Affiliation(s)
- Gustavo A Salazar
- Macromolecules, Structure, Chemistry and Bioimaging Section, European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Aurélien Luciani
- Macromolecules, Structure, Chemistry and Bioimaging Section, European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Xavier Watkins
- Macromolecules, Structure, Chemistry and Bioimaging Section, European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Swaathi Kandasaamy
- Macromolecules, Structure, Chemistry and Bioimaging Section, European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Daniel L Rice
- Macromolecules, Structure, Chemistry and Bioimaging Section, European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Matthias Blum
- Macromolecules, Structure, Chemistry and Bioimaging Section, European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Alex Bateman
- Macromolecules, Structure, Chemistry and Bioimaging Section, European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Maria Martin
- Macromolecules, Structure, Chemistry and Bioimaging Section, European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK

5
UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res 2023; 51:D523-D531. [PMID: 36408920 DOI: 10.1093/nar/gkac1052] [Citation(s) in RCA: 1024] [Impact Index Per Article: 1024.0] [Received: 09/16/2022] [Revised: 10/05/2022] [Accepted: 10/25/2022] [Indexed: 11/22/2022]
Abstract
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues their contributions of publications and annotations to UniProt entries of their interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users' experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.

6
Roddy JW, Lesica GT, Wheeler TJ. SODA: a TypeScript/JavaScript library for visualizing biological sequence annotation. NAR Genom Bioinform 2022; 4:lqac077. [PMID: 36212708 PMCID: PMC9535774 DOI: 10.1093/nargab/lqac077] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/12/2022] [Revised: 08/27/2022] [Accepted: 09/27/2022] [Indexed: 12/02/2022]
Abstract
We present SODA, a lightweight and open-source visualization library for biological sequence annotations that enables straightforward development of flexible, dynamic and interactive web graphics. SODA is implemented in TypeScript and can be used as a library within TypeScript and JavaScript.
Affiliation(s)
- Jack W Roddy
- Department of Computer Science, University of Montana, Missoula, MT 59812, USA
- Department of Pharmacy Practice and Science, University of Arizona, Tucson, AZ 85721, USA
- George T Lesica
- Department of Computer Science, University of Montana, Missoula, MT 59812, USA
- Travis J Wheeler
- Department of Computer Science, University of Montana, Missoula, MT 59812, USA
- Department of Pharmacy Practice and Science, University of Arizona, Tucson, AZ 85721, USA

7
Kelleher KJ, Sheils TK, Mathias SL, Yang JJ, Metzger V, Siramshetty V, Nguyen DT, Jensen LJ, Vidović D, Schürer S, Holmes J, Sharma K, Pillai A, Bologa C, Edwards J, Mathé E, Oprea T. Pharos 2023: an integrated resource for the understudied human proteome. Nucleic Acids Res 2022; 51:D1405-D1416. [PMID: 36624666 PMCID: PMC9825581 DOI: 10.1093/nar/gkac1033] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 09/15/2022] [Revised: 10/12/2022] [Accepted: 11/28/2022] [Indexed: 11/30/2022]
Abstract
The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products of the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for users to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.
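The abstract does not state which statistic Pharos uses for its enrichment calculations. As an assumption, the sketch below shows a generic one-sided hypergeometric over-representation test, a common way to ask whether an annotation is enriched in a user-selected subset of targets. It is not Pharos's implementation, and the function name and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: draw n targets (the user's subset) without
    replacement from N background targets, of which K carry the annotation.
    k is the number of annotated targets actually observed in the subset.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # math.comb returns 0 when asked to choose more items than exist, so
    # impossible terms contribute nothing to the tail sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical numbers: 5 of 10 background targets are annotated,
# and all 5 targets in a 5-target subset carry the annotation.
print(hypergeom_enrichment_p(k=5, n=5, K=5, N=10))  # 1/252, about 0.00397
```

Exact integer arithmetic from math.comb avoids the rounding problems of multiplying many small floating-point probabilities; in practice such p-values would also need correction for testing many annotations at once.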
Affiliation(s)
- Keith J Kelleher
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Timothy K Sheils
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Stephen L Mathias
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Jeremy J Yang
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Vincent T Metzger
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Vishal B Siramshetty
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Dac-Trung Nguyen
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen 2200, Copenhagen, Denmark
- Dušica Vidović
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Stephan C Schürer
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Jayme Holmes
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Karlie R Sharma
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Ajay Pillai
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Cristian G Bologa
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Jeremy S Edwards
- Correspondence may also be addressed to Jeremy Edwards. Tel: +1 505 277 6655
- Ewy A Mathé
- To whom correspondence should be addressed. Tel: +1 301 402 8953
- Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA

8
Martin EC, Ion CF, Ifrimescu F, Spiridon L, Bakker J, Goverse A, Petrescu AJ. NLRscape: an atlas of plant NLR proteins. Nucleic Acids Res 2022; 51:D1470-D1482. [PMID: 36350627 PMCID: PMC9825502 DOI: 10.1093/nar/gkac1014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 10/18/2022] [Accepted: 10/27/2022] [Indexed: 11/11/2022]
Abstract
NLRscape is a webserver that curates a collection of over 80 000 plant protein sequences identified in UniProtKB as containing NOD-like receptor signatures, and additionally hosts a number of tools aimed at exploring the complex sequence landscape of this class of plant proteins. Each entry gathers sequence information and domain and motif annotations from multiple third-party sources, as well as in-house advanced annotations aimed at addressing caveats of the existing broad-based annotations. NLRscape provides a top-down perspective of the NLR sequence landscape, but also services that assist a bottom-up approach starting from a given input sequence. Sequences are clustered by their domain organization layout, global homology and taxonomic spread, to allow analysis of how particular traits of an NLR family are scattered within the plant kingdom. Tools are provided for users to locate their own protein of interest in the overall NLR landscape, generate custom clusters centered around it and perform a large number of sequence and structural analyses using the included interactive online instruments. These include taxonomy distribution plots, homology cluster graphs, identity matrices and an interactive MSA that synchronizes secondary structure and motif predictions. NLRscape can be found at: https://nlrscape.biochim.ro/.
Affiliation(s)
- Eliza C Martin
- Department of Bioinformatics and Structural Biochemistry, Institute of Biochemistry of the Romanian Academy, Bucharest 060031, Romania
- Catalin F Ion
- Department of Bioinformatics and Structural Biochemistry, Institute of Biochemistry of the Romanian Academy, Bucharest 060031, Romania
- Florin Ifrimescu
- Department of Bioinformatics and Structural Biochemistry, Institute of Biochemistry of the Romanian Academy, Bucharest 060031, Romania
- Laurentiu Spiridon
- Department of Bioinformatics and Structural Biochemistry, Institute of Biochemistry of the Romanian Academy, Bucharest 060031, Romania
- Jaap Bakker
- Laboratory of Nematology, Wageningen University and Research, Wageningen 6700ES, The Netherlands
- Aska Goverse
- Laboratory of Nematology, Wageningen University and Research, Wageningen 6700ES, The Netherlands

9
Velecký J, Hamsikova M, Stourac J, Musil M, Damborský J, Bednar D, Mazurenko S. SoluProtMutDB: a manually curated database of protein solubility changes upon mutations. Comput Struct Biotechnol J 2022; 20:6339-6347. [DOI: 10.1016/j.csbj.2022.11.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 11/04/2022] [Accepted: 11/04/2022] [Indexed: 11/11/2022]

10
Meng Y, Zhang L, Zhang L, Wang Z, Wang X, Li C, Chen Y, Shang S, Li L. CysModDB: a comprehensive platform with the integration of manually curated resources and analysis tools for cysteine posttranslational modifications. Brief Bioinform 2022; 23:6775608. [PMID: 36305460 PMCID: PMC9677505 DOI: 10.1093/bib/bbac460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/25/2022] [Revised: 08/27/2022] [Accepted: 09/26/2022] [Indexed: 12/14/2022]
Abstract
The unique chemical reactivity of cysteine residues results in various posttranslational modifications (PTMs), which are implicated in regulating a range of fundamental biological processes. With the advent of chemical proteomics technology, thousands of cysteine PTM (CysPTM) sites have been identified from multiple species. A few CysPTM-based databases have been developed, but they mainly focus on data collection rather than on annotation and analytical integration. Here, we present a platform, dubbed CysModDB, that integrates comprehensive CysPTM resources and analysis tools. CysModDB contains five parts: (1) 70 536 experimentally verified CysPTM sites with annotations of sample origin and enrichment techniques, (2) 21 654 modified proteins annotated with functional regions and structure information, (3) cross-references to external databases such as protein-protein interaction databases, (4) online computational tools for predicting CysPTM sites and (5) integrated analysis tools such as gene enrichment and investigation of sequence features. These parts are integrated using a customized graphic browser and a Basket. The browser uses graphs to represent the distribution of modified sites with different CysPTM types on protein sequences and maps these sites to protein structures and functional regions, which assists in exploring cross-talk between the modified sites and their potential effect on protein function. The Basket connects proteins and CysPTM sites to the analysis tools. In summary, CysModDB is an integrated platform to facilitate CysPTM research, freely accessible via https://cysmoddb.bioinfogo.org/.
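To make the site-to-region mapping concrete, here is a minimal Python sketch of how modified cysteines might be grouped by the functional regions that contain them, so that regions carrying several PTM types (candidates for cross-talk) stand out. The data layout, function name, toy sequence, and region names are hypothetical and do not reflect CysModDB's internal format.

```python
from collections import defaultdict


def map_sites_to_regions(sequence, sites, regions):
    """Group cysteine PTM sites by the annotated region(s) that contain them.

    sequence: protein sequence (one-letter codes).
    sites:    list of (position, ptm_type); positions are 1-based.
    regions:  list of (name, start, end); 1-based, inclusive bounds.
    """
    by_region = defaultdict(list)
    for pos, ptm_type in sites:
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")
        residue = sequence[pos - 1]
        if residue != "C":
            raise ValueError(f"position {pos} is {residue}, not cysteine")
        # A site can fall in several overlapping regions.
        for name, start, end in regions:
            if start <= pos <= end:
                by_region[name].append((pos, ptm_type))
    return dict(by_region)


seq = "MACKCLLC"  # toy sequence, not a real protein
sites = [(3, "S-nitrosylation"), (5, "S-palmitoylation"), (8, "S-sulfenylation")]
regions = [("region_1", 1, 5), ("region_2", 5, 8)]
print(map_sites_to_regions(seq, sites, regions))
```

Checking that each reported position really is a cysteine catches off-by-one or isoform mismatches before sites are placed on regions or structures.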
Affiliation(s)
- Laizhi Zhang
- School of Basic Medicine, Qingdao University, Qingdao, China
- Ziyu Wang
- School of Basic Medicine, Qingdao University, Qingdao, China
- Xuanwen Wang
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Chan Li
- School of Basic Medicine, Qingdao University, Qingdao, China
- Yu Chen
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Shipeng Shang
- School of Basic Medicine, Qingdao University, Qingdao 266071, China (corresponding author; Tel.: +86 532 8595 1111; Fax: +86 532 8581 2983)
- Lei Li
- Faculty of Biomedical and Rehabilitation Engineering, University of Health and Rehabilitation Sciences, Qingdao 266071, China (corresponding author; Tel/Fax: +86 532 8581 2983)

11
Malladi S, Powell HR, David A, Islam SA, Copeland MM, Kundrotas PJ, Sternberg MJ, Vakser IA. GWYRE: A resource for mapping variants onto experimental and modeled structures of human protein complexes. J Mol Biol 2022; 434:167608. [PMID: 35662458 PMCID: PMC9188266 DOI: 10.1016/j.jmb.2022.167608] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 03/31/2022] [Accepted: 04/20/2022] [Indexed: 02/08/2023]
Abstract
- The structure of protein complexes is important for interpreting genetic variation.
- Data on single amino acid variants are available from high-throughput sequencing.
- An integrated modeling approach was applied to proteins and their complexes.
- The GWYRE resource incorporates predicted protein complexes with mapped mutations.
Rapid progress in structural modeling of proteins and their interactions is powered by advances in knowledge-based methodologies along with better understanding of physical principles of protein structure and function. The pool of structural data for modeling of proteins and protein–protein complexes is constantly increasing due to the rapid growth of protein interaction databases and Protein Data Bank. The GWYRE (Genome Wide PhYRE) project capitalizes on these developments by advancing and applying new powerful modeling methodologies to structural modeling of protein–protein interactions and genetic variation. The methods integrate knowledge-based tertiary structure prediction using Phyre2 and quaternary structure prediction using template-based docking by a full-structure alignment protocol to generate models for binary complexes. The predictions are incorporated in a comprehensive public resource for structural characterization of the human interactome and the location of human genetic variants. The GWYRE resource facilitates better understanding of principles of protein interaction and structure/function relationships. The resource is available at http://www.gwyre.org.

12
Perez-Riverol Y, Bai J, Bandla C, García-Seisdedos D, Hewapathirana S, Kamatchinathan S, Kundu D, Prakash A, Frericks-Zipper A, Eisenacher M, Walzer M, Wang S, Brazma A, Vizcaíno J. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res 2022; 50:D543-D552. [PMID: 34723319 PMCID: PMC8728295 DOI: 10.1093/nar/gkab1038] [Citation(s) in RCA: 2306] [Impact Index Per Article: 1153.0] [Received: 09/11/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021] [Indexed: 12/12/2022]
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. Submissions to PRIDE Archive (the archival component of PRIDE) averaged around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. Notably, the MAGE-TAB file format for proteomics has been developed to improve sample metadata annotation. Additionally, the PRIDE Peptidome resource provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.
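The abstract mentions Universal Spectrum Identifiers (USIs). The PSI USI specification writes them as colon-separated strings that begin with mzspec; the Python sketch below splits one into its parts. It is a simplified parser under stated assumptions (for instance, it does not validate the peptidoform syntax and assumes the leading fields contain no colons), not PRIDE's implementation, and the example identifier is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

INDEX_TYPES = {"scan", "index", "nativeId"}


@dataclass(frozen=True)
class USI:
    collection: str          # e.g. a ProteomeXchange dataset accession
    ms_run: str              # MS run name (e.g. a file name)
    index_type: str          # "scan", "index" or "nativeId"
    index: str               # spectrum identifier within the run
    interpretation: Optional[str] = None  # optional peptidoform/charge


def parse_usi(text: str) -> USI:
    """Split a USI of the form
    mzspec:<collection>:<msRun>:<indexType>:<index>[:<interpretation>].

    Assumes the first four fields after the prefix contain no colons;
    everything after the fifth colon is kept as the interpretation, which
    may itself contain colons (e.g. modification names such as UNIMOD:35).
    """
    parts = text.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {text!r}")
    _, collection, ms_run, index_type, index, *rest = parts
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {index_type!r}")
    return USI(collection, ms_run, index_type, index, rest[0] if rest else None)


# Hypothetical identifier in the documented shape (not a real dataset).
print(parse_usi("mzspec:PXD000000:example_run:scan:1234:PEPTIDEK/2"))
```

Limiting the split to five colons is the design choice that keeps modification names inside the interpretation intact.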
Affiliation(s)
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jingwen Bai
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Chakradhar Bandla
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David García-Seisdedos
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Suresh Hewapathirana
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Selvakumar Kamatchinathan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Deepti J Kundu
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ananth Prakash
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anika Frericks-Zipper
  - Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
  - Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
- Martin Eisenacher
  - Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, D-44801 Bochum, Germany
  - Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany
- Mathias Walzer
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Shengbo Wang
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alvis Brazma
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
13
Abstract
Pseudokinases regulate diverse cellular processes associated with normal cellular functions and disease. They are defined bioinformatically based on the absence of one or more catalytic residues that are required for canonical protein kinase functions. The ability to define pseudokinases based on primary sequence comparison has enabled the systematic mapping and cataloging of pseudokinase orthologs across the tree of life. While these sequences contain critical information regarding pseudokinase evolution and functional specialization, extracting this information and generating testable hypotheses based on integrative mining of sequence and structural data requires specialized computational tools and resources. In this chapter, we review recent advances in the development and application of open-source tools and resources for pseudokinase research. Specifically, we describe the application of an interactive data analytics framework, KinView, for visualizing the patterns of conservation and variation in the catalytic domain motifs of pseudokinases and evolutionarily related canonical kinases, using a consistent set of curated alignments organized according to the widely used kinome evolutionary hierarchy. We also demonstrate the application of an integrated Protein Kinase Ontology (ProKinO) and an interactive viewer, ProtVista, for mapping and analyzing primary sequence motifs and annotations in the context of 3D structures and AlphaFold2 models. We provide examples and protocols for generating testable hypotheses on pseudokinase functions for both bench biologists and advanced users.
Affiliation(s)
- Brady O’Boyle
  - Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Safal Shrestha
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Krzysztof Kochut
  - Department of Computer Science, University of Georgia, Athens, GA 30602, USA
- Patrick A Eyers
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, UK
- Natarajan Kannan (corresponding author)
  - Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA 30602, USA
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
14
Bernhofer M, Dallago C, Karl T, Satagopam V, Heinzinger M, Littmann M, Olenyi T, Qiu J, Schütze K, Yachdav G, Ashkenazy H, Ben-Tal N, Bromberg Y, Goldberg T, Kajan L, O’Donoghue S, Sander C, Schafferhans A, Schlessinger A, Vriend G, Mirdita M, Gawron P, Gu W, Jarosz Y, Trefois C, Steinegger M, Schneider R, Rost B. PredictProtein - Predicting Protein Structure and Function for 29 Years. Nucleic Acids Res 2021; 49:W535-W540. [PMID: 33999203 PMCID: PMC8265159 DOI: 10.1093/nar/gkab354] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 02/23/2021] [Revised: 04/06/2021] [Accepted: 05/10/2021] [Indexed: 12/12/2022]
Abstract
Since 1992, PredictProtein (https://predictprotein.org) has been a one-stop online resource for protein sequence analysis; its main site is hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and was queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions. It pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges) and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA-binding). PredictProtein's infrastructure has moved to the LCSB, increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold (apparently without lowering the performance of prediction methods); user interface elements improved usability; and new prediction methods were added. PredictProtein recently included predictions from deep learning embeddings (GO and secondary structure) and a method for predicting proteins and residues that bind DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings.
Affiliation(s)
- Michael Bernhofer
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
- Christian Dallago
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
- Tim Karl
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Venkata Satagopam
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
  - ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Michael Heinzinger
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
- Maria Littmann
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
- Tobias Olenyi
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Jiajun Qiu
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - Department of Otolaryngology Head & Neck Surgery, The Ninth People's Hospital & Ear Institute, School of Medicine & Shanghai Key Laboratory of Translational Medicine on Ear and Nose Diseases, Shanghai Jiao Tong University, Shanghai, China
- Konstantin Schütze
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Guy Yachdav
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Haim Ashkenazy
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
- Nir Ben-Tal
  - Department of Biochemistry & Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA
- Tatyana Goldberg
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
- Laszlo Kajan
  - Roche Polska Sp. z o.o., Domaniewska 39B, 02-672 Warsaw, Poland
- Chris Sander
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02215, USA
  - Broad Institute of MIT and Harvard, Boston, MA 02142, USA
- Andrea Schafferhans
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - HSWT (Hochschule Weihenstephan Triesdorf | University of Applied Sciences), Department of Bioengineering Sciences, Am Hofgarten 10, 85354 Freising, Germany
- Avner Schlessinger
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Milot Mirdita
  - Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Piotr Gawron
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Wei Gu
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
  - ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Yohan Jarosz
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
  - ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Christophe Trefois
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
  - ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, South Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Reinhard Schneider
  - Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
  - ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Burkhard Rost
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
15
Fonseca NJ, Afonso MQL, Carrijo L, Bleicher L. CONAN: a web application to detect specificity determinants and functional sites by amino acids co-variation network analysis. Bioinformatics 2021; 37:1026-1028. [PMID: 32780795 DOI: 10.1093/bioinformatics/btaa713] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/29/2020] [Revised: 08/01/2020] [Accepted: 08/05/2020] [Indexed: 11/12/2022]
Abstract
SUMMARY: CONAN is a web application developed to detect specificity determinants and function-related sites by amino acid co-variation network analysis, emphasizing local coevolutionary constraints. The software allows the characterization of structurally and functionally relevant groups of residues and their relationship with subsets of sequences by automatic cross-referencing with GO terms, UniProtKB annotations and InterPro. AVAILABILITY AND IMPLEMENTATION: CONAN is free and open-source, distributed under the terms of the GPLv3 license. The software is available both as a web application and as a Python script and can be accessed at http://bioinfo.icb.ufmg.br/conan. We also provide running instructions, the source code and a user guide.
Affiliation(s)
- N J Fonseca
  - Cellular Structure and 3D Bioimaging, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- M Q L Afonso
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- L Carrijo
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- L Bleicher
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
16
Hufsky F, Lamkiewicz K, Almeida A, Aouacheria A, Arighi C, Bateman A, Baumbach J, Beerenwinkel N, Brandt C, Cacciabue M, Chuguransky S, Drechsel O, Finn RD, Fritz A, Fuchs S, Hattab G, Hauschild AC, Heider D, Hoffmann M, Hölzer M, Hoops S, Kaderali L, Kalvari I, von Kleist M, Kmiecinski R, Kühnert D, Lasso G, Libin P, List M, Löchel HF, Martin MJ, Martin R, Matschinske J, McHardy AC, Mendes P, Mistry J, Navratil V, Nawrocki EP, O’Toole ÁN, Ontiveros-Palacios N, Petrov AI, Rangel-Pineros G, Redaschi N, Reimering S, Reinert K, Reyes A, Richardson L, Robertson DL, Sadegh S, Singer JB, Theys K, Upton C, Welzel M, Williams L, Marz M. Computational strategies to combat COVID-19: useful tools to accelerate SARS-CoV-2 and coronavirus research. Brief Bioinform 2021; 22:642-663. [PMID: 33147627 PMCID: PMC7665365 DOI: 10.1093/bib/bbaa232] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 05/27/2020] [Revised: 07/28/2020] [Accepted: 08/26/2020] [Indexed: 12/16/2022]
Abstract
SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to gain insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.
Affiliation(s)
- Christian Brandt
  - Institute of Infectious Disease and Infection Control at Jena University Hospital, Germany
- Marco Cacciabue
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) working on FMDV virology at the Instituto de Agrobiotecnología y Biología Molecular (IABiMo, INTA-CONICET) and at the Departamento de Ciencias Básicas, Universidad Nacional de Luján (UNLu), Argentina
- Oliver Drechsel
  - Bioinformatics department at the Robert Koch-Institute, Germany
- Adrian Fritz
  - Computational Biology of Infection Research group of Alice C. McHardy at the Helmholtz Centre for Infection Research, Germany
- Stephan Fuchs
  - Bioinformatics department at the Robert Koch-Institute, Germany
- Georges Hattab
  - Bioinformatics Division at Philipps-University Marburg, Germany
- Dominik Heider
  - Data Science in Biomedicine at the Philipps-University of Marburg, Germany
- Stefan Hoops
  - Biocomplexity Institute and Initiative at the University of Virginia, USA
- Lars Kaderali
  - Bioinformatics and head of the Institute of Bioinformatics at University Medicine Greifswald, Germany
- Max von Kleist
  - Bioinformatics department at the Robert Koch-Institute, Germany
- René Kmiecinski
  - Bioinformatics department at the Robert Koch-Institute, Germany
- Gorka Lasso
  - Chandran Lab, Albert Einstein College of Medicine, USA
- Alice C McHardy
  - Computational Biology of Infection Research Lab at the Helmholtz Centre for Infection Research in Braunschweig, Germany
- Pedro Mendes
  - Center for Quantitative Medicine of the University of Connecticut School of Medicine, USA
- Vincent Navratil
  - Bioinformatics and Systems Biology at the Rhône Alpes Bioinformatics core facility, Université de Lyon, France
- Nicole Redaschi
  - Development of the Swiss-Prot group at the SIB for UniProt and SIB resources that cover viral biology (ViralZone)
- Susanne Reimering
  - Computational Biology of Infection Research group of Alice C. McHardy at the Helmholtz Centre for Infection Research
- Sepideh Sadegh
  - Chair of Experimental Bioinformatics at Technical University of Munich, Germany
- Joshua B Singer
  - MRC-University of Glasgow Centre for Virus Research, Glasgow, Scotland, UK
- Chris Upton
  - Department of Biochemistry and Microbiology, University of Victoria, Canada
- Manja Marz
  - Friedrich Schiller University Jena, Germany
17
Bhowmick P, Roome S, Borchers CH, Goodlett DR, Mohammed Y. An Update on MRMAssayDB: A Comprehensive Resource for Targeted Proteomics Assays in the Community. J Proteome Res 2021; 20:2105-2115. [PMID: 33683131 PMCID: PMC8041396 DOI: 10.1021/acs.jproteome.0c00961] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
Abstract
Precise multiplexed quantification of proteins in biological samples can be achieved by targeted proteomics using multiple or parallel reaction monitoring (MRM/PRM). Combined with internal standards, the method achieves very good repeatability and reproducibility, enabling excellent protein quantification and allowing longitudinal and cohort studies. A laborious part of performing such experiments lies in the preparation steps dedicated to the development and validation of individual protein assays. Several public repositories host information on targeted proteomics assays, including NCI’s Clinical Proteomic Tumor Analysis Consortium assay portals, PeptideAtlas SRM Experiment Library, SRMAtlas, PanoramaWeb, and PeptideTracker, each offering varying levels of detail. We introduced MRMAssayDB in 2018 as an integrated resource for targeted proteomics assays. The Web-based application maps and links the assays from the repositories, includes comprehensive up-to-date protein and sequence annotations, and provides multiple visualization options at the peptide and protein level. We have extended MRMAssayDB with more assays and extensive annotations. Currently it contains >828 000 assays covering >51 000 proteins from 94 organisms, of which >17 000 proteins are present in >2400 biological pathways, and >48 000 map to >21 000 Gene Ontology terms. This is roughly a fourfold increase in the number of assays since its introduction. We have expanded annotations of interactions, biological pathways, and disease associations. A newly added visualization module for coupled molecular structural annotation browsing allows the user to interactively examine peptide sequences and any known PTMs and disease mutations, and to map all of them onto available protein 3D structures. Because of its integrative approach, MRMAssayDB enables a holistic view of suitable proteotypic peptides and commonly used transitions in empirical data. Availability: http://mrmassaydb.proteincentre.com.
Affiliation(s)
- Pallab Bhowmick
  - University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
  - University of Victoria, Victoria, British Columbia V8P 5C2, Canada
- Simon Roome
  - University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
  - University of Victoria, Victoria, British Columbia V8P 5C2, Canada
- Christoph H Borchers
  - Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, Quebec H3T 1E2, Canada
  - Gerald Bronfman Department of Oncology, Jewish General Hospital, Montreal, Quebec H3T 1E2, Canada
  - Department of Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Nobel Street, Moscow 121205, Russia
- David R Goodlett
  - University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
  - University of Victoria, Victoria, British Columbia V8P 5C2, Canada
  - University of Gdansk, International Centre for Cancer Vaccine Science, 80-309 Gdansk, Poland
- Yassene Mohammed
  - University of Victoria - Genome BC Proteomics Centre, Victoria, British Columbia V8Z 7X8, Canada
  - University of Victoria, Victoria, British Columbia V8P 5C2, Canada
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, 2333 ZA Leiden, Netherlands
18
Blum M, Chang HY, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 2021; 49:D344-D354. [PMID: 33156333 PMCID: PMC7778928 DOI: 10.1093/nar/gkaa977] [Citation(s) in RCA: 1048] [Impact Index Per Article: 349.3] [Received: 09/14/2020] [Revised: 10/08/2020] [Accepted: 10/23/2020] [Indexed: 01/22/2023]
Abstract
The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.
Affiliation(s)
- Matthias Blum
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Hsin-Yu Chang
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Sara Chuguransky
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Tiago Grego
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Swaathi Kandasaamy
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Alex Mitchell
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Gift Nuka
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Typhaine Paysan-Lafosse
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Matloob Qureshi
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Shriya Raj
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Lorna Richardson
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Gustavo A Salazar
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Lowri Williams
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Peer Bork
  - European Molecular Biology Laboratory, Structural and Computational Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Alan Bridge
  - Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
- Julian Gough
  - Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Ave, Trumpington, Cambridge CB2 0QH, UK
- Daniel H Haft
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda MD 20894 USA
- Ivica Letunic
  - Biobyte Solutions GmbH, Bothestr 142, 69126 Heidelberg, Germany
- Aron Marchler-Bauer
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda MD 20894 USA
- Huaiyu Mi
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Darren A Natale
  - Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Marco Necci
  - Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- Christine A Orengo
  - Department of Structural and Molecular Biology, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
- Arun P Pandurangan
  - Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Ave, Trumpington, Cambridge CB2 0QH, UK
- Catherine Rivoire
  - Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
- Christian J A Sigrist
  - Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
- Ian Sillitoe
  - Department of Structural and Molecular Biology, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
- Narmada Thanki
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda MD 20894 USA
- Paul D Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Silvio C E Tosatto
  - Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
- Cathy H Wu
  - Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Robert D Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
19
Mohammed Y, Bhowmick P, Michaud SA, Sickmann A, Borchers CH. Mouse Quantitative Proteomics Knowledgebase: reference protein concentration ranges in 20 mouse tissues using 5000 quantitative proteomics assays. Bioinformatics 2021; 37:1900-1908. [PMID: 33483739 DOI: 10.1093/bioinformatics/btab018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/23/2020] [Revised: 12/12/2020] [Accepted: 01/08/2021] [Indexed: 12/21/2022]
Abstract
The laboratory mouse is the most widely used animal model in biological research, largely due to its highly conserved synteny with human. Researchers use mice to answer various questions, ranging from determining the pathological effect of a knocked-out or knocked-in gene to understanding drug metabolism. Our group developed >5000 quantitative targeted proteomics assays for 20 mouse tissues and determined the concentration ranges of a total of more than 1600 proteins using heavy-labelled internal standards. We describe here MouseQuaPro, a knowledgebase that hosts this collection of carefully curated experimental data. The Web-based application includes protein concentrations from >700 mouse tissue samples from three common research strains, corresponding to more than 200k experimentally determined concentrations. The knowledgebase integrates the assay and protein concentration information with their human orthologs, functional and molecular annotations, biological pathways, related human diseases, and known gene expression data. At its core are the protein concentration ranges, which provide insights into (dis)similarities between tissues, strains, and sexes. MouseQuaPro implements advanced search and filtering functionalities with a simple interface and interactive visualization. This information-rich resource provides an initial map of absolute protein concentrations in mouse tissues and allows guided design of proteomics phenotyping experiments. The knowledgebase is available at mousequapro.proteincentre.com. (Reviewer access username and password: mousequapro_reviewer1234567).
Affiliation(s)
- Yassene Mohammed
- University of Victoria-Genome BC Proteomics Centre, Victoria, BC, Canada; Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
- Pallab Bhowmick
- University of Victoria-Genome BC Proteomics Centre, Victoria, BC, Canada
- Sarah A Michaud
- University of Victoria-Genome BC Proteomics Centre, Victoria, BC, Canada
- Albert Sickmann
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Dortmund, Germany
- Christoph H Borchers
- University of Victoria, Victoria, BC, Canada; Gerald Bronfman Department of Oncology, Jewish General Hospital, Montreal, Quebec, Canada
20
Stourac J, Dubrava J, Musil M, Horackova J, Damborsky J, Mazurenko S, Bednar D. FireProtDB: database of manually curated protein stability data. Nucleic Acids Res 2021; 49:D319-D324. [PMID: 33166383 PMCID: PMC7778887 DOI: 10.1093/nar/gkaa981] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 08/14/2020] [Revised: 09/18/2020] [Accepted: 10/12/2020] [Indexed: 01/13/2023]
Abstract
The majority of naturally occurring proteins have evolved to function under mild conditions inside living organisms. One of the critical obstacles to the use of proteins in biotechnological applications is their insufficient stability at elevated temperatures or in the presence of salts. Since experimental screening for stabilizing mutations is typically laborious and expensive, in silico predictors are often used to narrow down the mutational landscape. Recent advances in machine learning and artificial intelligence further facilitate the development of such computational tools. However, the accuracy of these predictors strongly depends on the quality and amount of data used for training and testing, which have often been reported as the current bottleneck of the approach. To address this problem, we present FireProtDB, a novel database of experimental thermostability data for single-point mutants. The database combines published datasets, data extracted manually from the recent literature, and data collected in our laboratory. Its user interface is designed to support both expected types of use: (i) interactive exploration of individual entries at the level of a protein or mutation and (ii) construction of highly customized, machine-learning-friendly datasets using advanced searching and filtering. The database is freely available at https://loschmidt.chemi.muni.cz/fireprotdb.
Affiliation(s)
- Jan Stourac
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- Juraj Dubrava
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Milos Musil
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic; Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Jana Horackova
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic
- David Bednar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Masaryk University, Brno, Czech Republic; International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic
21
Abstract
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
22
Bateman A, Martin MJ, Orchard S, Magrane M, Agivetova R, Ahmad S, Alpi E, Bowler-Barnett EH, Britto R, Bursteinas B, Bye-A-Jee H, Coetzee R, Cukura A, Da Silva A, Denny P, Dogan T, Ebenezer T, Fan J, Castro LG, Garmiri P, Georghiou G, Gonzales L, Hatton-Ellis E, Hussein A, Ignatchenko A, Insana G, Ishtiaq R, Jokinen P, Joshi V, Jyothi D, Lock A, Lopez R, Luciani A, Luo J, Lussi Y, MacDougall A, Madeira F, Mahmoudy M, Menchi M, Mishra A, Moulang K, Nightingale A, Oliveira CS, Pundir S, Qi G, Raj S, Rice D, Lopez MR, Saidi R, Sampson J, Sawford T, Speretta E, Turner E, Tyagi N, Vasudev P, Volynkin V, Warner K, Watkins X, Zaru R, Zellner H, Bridge A, Poux S, Redaschi N, Aimo L, Argoud-Puy G, Auchincloss A, Axelsen K, Bansal P, Baratin D, Blatter MC, Bolleman J, Boutet E, Breuza L, Casals-Casas C, de Castro E, Echioukh KC, Coudert E, Cuche B, Doche M, Dornevil D, Estreicher A, Famiglietti ML, Feuermann M, Gasteiger E, Gehant S, Gerritsen V, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, Hyka-Nouspikel N, Jungo F, Keller G, Kerhornou A, Lara V, Le Mercier P, Lieberherr D, Lombardot T, Martin X, Masson P, Morgat A, Neto TB, Paesano S, Pedruzzi I, Pilbout S, Pourcel L, Pozzato M, Pruess M, Rivoire C, Sigrist C, Sonesson K, Stutz A, Sundaram S, Tognolli M, Verbregue L, Wu CH, Arighi CN, Arminski L, Chen C, Chen Y, Garavelli JS, Huang H, Laiho K, McGarvey P, Natale DA, Ross K, Vinayaka CR, Wang Q, Wang Y, Yeh LS, Zhang J, Ruch P, Teodoro D. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 2021; 49:D480-D489. [PMID: 33237286 PMCID: PMC7778908 DOI: 10.1093/nar/gkaa1100] [Citation(s) in RCA: 3523] [Impact Index Per Article: 1174.3] [Received: 09/15/2020] [Revised: 10/21/2020] [Accepted: 11/02/2020] [Indexed: 02/07/2023]
Abstract
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from the literature to add to reviewed entries and supplement these in unreviewed entries with annotations provided by automated systems such as the newly implemented Association-Rule-Based Annotator (ARBA). We have developed a credit-based publication submission interface to allow the community to contribute publications and annotations to UniProt entries. We describe how UniProtKB responded to the COVID-19 pandemic through expert curation of relevant entries that were rapidly made available to the research community through a dedicated portal. UniProt resources are available under a CC-BY (4.0) license via the web at https://www.uniprot.org/.
23
Sheils TK, Mathias SL, Kelleher KJ, Siramshetty VB, Nguyen DT, Bologa CG, Jensen LJ, Vidović D, Koleti A, Schürer SC, Waller A, Yang JJ, Holmes J, Bocci G, Southall N, Dharkar P, Mathé E, Simeonov A, Oprea TI. TCRD and Pharos 2021: mining the human proteome for disease biology. Nucleic Acids Res 2021; 49:D1334-D1346. [PMID: 33156327 PMCID: PMC7778974 DOI: 10.1093/nar/gkaa993] [Citation(s) in RCA: 84] [Impact Index Per Article: 28.0] [Received: 09/15/2020] [Revised: 10/09/2020] [Accepted: 10/14/2020] [Indexed: 12/13/2022]
Abstract
In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins by aggregating a multitude of data sources, ranking targets by the amount of data available, and presenting data in a machine-learning-ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein-protein interactions, protein-disease and protein-phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos that help users find new areas of study in the druggable genome.
Affiliation(s)
- Timothy K Sheils
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Stephen L Mathias
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Keith J Kelleher
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Vishal B Siramshetty
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Dac-Trung Nguyen
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Cristian G Bologa
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Dušica Vidović
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Amar Koleti
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Stephan C Schürer
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
- Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Anna Waller
- UNM Center for Molecular Discovery, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Jeremy J Yang
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Jayme Holmes
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Giovanni Bocci
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Noel Southall
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Poorva Dharkar
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Ewy Mathé
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Anton Simeonov
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
- Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- UNM Comprehensive Cancer Center, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
- Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at University of Gothenburg, 40530 Gothenburg, Sweden
24
Segura J, Rose Y, Westbrook J, Burley SK, Duarte JM. RCSB Protein Data Bank 1D Tools and Services. Bioinformatics 2020; 36:5526-5527. [PMID: 33313665 PMCID: PMC8016458 DOI: 10.1093/bioinformatics/btaa1012] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/31/2020] [Revised: 10/14/2020] [Accepted: 11/23/2020] [Indexed: 11/17/2022]
Abstract
Motivation: Interoperability between polymer sequences and structural data is essential for providing a complete picture of protein and gene features and for helping to understand biomolecular function. Results: Herein, we present two resources designed to improve interoperability between the RCSB Protein Data Bank, NCBI and UniProtKB data resources and to visualize the integrated data. The underlying tools provide a flexible means of mapping between the different coordinate spaces, and an interactive tool allows convenient visualization of the one-dimensional data over the web. Availability and implementation: https://1d-coordinates.rcsb.org and https://rcsb.github.io/rcsb-saguaro. Supplementary information: Supplementary data are available at Bioinformatics online.
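The coordinate mapping described above can be pictured with a small sketch. It assumes that an alignment between a UniProt sequence and a PDB entity sequence is stored as a list of gap-free aligned blocks. The `AlignedRegion` type, the `map_position` function and the example block boundaries are all hypothetical illustrations. They are not the RCSB 1D coordinate service's actual schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlignedRegion:
    """Hypothetical gap-free alignment block (1-based, inclusive coordinates)."""
    query_begin: int
    query_end: int
    target_begin: int
    target_end: int


def map_position(pos: int, regions: list[AlignedRegion]) -> Optional[int]:
    """Map a 1-based query residue index into the target coordinate space.

    Returns None when the residue lies outside every aligned block,
    e.g. an unobserved loop or a terminal tag.
    """
    for r in regions:
        if r.query_begin <= pos <= r.query_end:
            # Inside a gap-free block the offset is constant.
            return r.target_begin + (pos - r.query_begin)
    return None


# Hypothetical alignment: UniProt 10-50 -> entity 1-41, UniProt 56-120 -> entity 42-106
# (UniProt residues 51-55 have no counterpart in the entity sequence).
regions = [
    AlignedRegion(10, 50, 1, 41),
    AlignedRegion(56, 120, 42, 106),
]

print(map_position(10, regions))  # 1
print(map_position(60, regions))  # 46
print(map_position(53, regions))  # None
```

Within a gap-free block, mapping a position needs only one offset. For that reason, a block-based representation is a compact way to translate residue numbering between sequence databases and structures.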
Affiliation(s)
- Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, California, USA
- Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, California, USA
- John Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, California, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank; Institute for Quantitative Biomedicine
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, California, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank; Institute for Quantitative Biomedicine; Department of Chemistry and Chemical Biology, The State University of New Jersey, New Jersey, USA; Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Jersey, USA
- Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, California, USA
25
Porras P, Barrera E, Bridge A, Del-Toro N, Cesareni G, Duesbury M, Hermjakob H, Iannuccelli M, Jurisica I, Kotlyar M, Licata L, Lovering RC, Lynn DJ, Meldal B, Nanduri B, Paneerselvam K, Panni S, Pastrello C, Pellegrini M, Perfetto L, Rahimzadeh N, Ratan P, Ricard-Blum S, Salwinski L, Shirodkar G, Shrivastava A, Orchard S. Towards a unified open access dataset of molecular interactions. Nat Commun 2020; 11:6144. [PMID: 33262342 PMCID: PMC7708836 DOI: 10.1038/s41467-020-19942-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 04/17/2020] [Accepted: 11/09/2020] [Indexed: 12/16/2022]
Abstract
The International Molecular Exchange (IMEx) Consortium provides scientists with a single body of experimentally verified protein interactions curated in rich contextual detail to an internationally agreed standard. In this update to the work of the IMEx Consortium, we discuss how this initiative has been working in practice, how it has ensured database sustainability, and how it is meeting emerging annotation challenges through the introduction of new interactor types and data formats. Additionally, we provide examples of how IMEx data are being used by biomedical researchers and integrated in other bioinformatic tools and resources.
Affiliation(s)
- Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
- Elisabet Barrera
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
- Alan Bridge
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, CH-1211, Geneva, Switzerland
- Noemi Del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
- Gianni Cesareni
- University of Rome Tor Vergata, Rome, Italy; IRCCS Fondazione Santa Lucia, 00143, Rome, Italy
- Margaret Duesbury
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK; UCLA-DOE Institute, University of California, Los Angeles, CA, 90095, USA
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
- Igor Jurisica
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute, and Krembil Research Institute, University Health Network, 60 Leonard Avenue, 5KD-407, Toronto, ON, M5T 0S8, Canada; Departments of Medical Biophysics, and Computer Science, University of Toronto, Toronto, ON, Canada; Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
- Max Kotlyar
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute, and Krembil Research Institute, University Health Network, 60 Leonard Avenue, 5KD-407, Toronto, ON, M5T 0S8, Canada
- Ruth C Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London, WC1E 6JF, UK
- David J Lynn
- Computational and Systems Biology Program, Precision Medicine Theme, South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia; College of Medicine and Public Health, Flinders University, Bedford Park, SA, 5042, Australia
- Birgit Meldal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
- Bindu Nanduri
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, MS, USA
- Kalpana Paneerselvam
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
- Simona Panni
- Università della Calabria, Dipartimento di Biologia, Ecologia e Scienze della Terra, Via Pietro Bucci Cubo 6/C, Rende, CS, Italy
- Chiara Pastrello
- Osteoarthritis Research Program, Division of Orthopedic Surgery, Schroeder Arthritis Institute, and Krembil Research Institute, University Health Network, 60 Leonard Avenue, 5KD-407, Toronto, ON, M5T 0S8, Canada
- Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, UCLA, Box 951606, Los Angeles, CA, 90095-1606, USA
- Livia Perfetto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
- Negin Rahimzadeh
- UCLA-DOE Institute, University of California, Los Angeles, CA, 90095, USA
- Prashansa Ratan
- UCLA-DOE Institute, University of California, Los Angeles, CA, 90095, USA
- Sylvie Ricard-Blum
- ICBMS, UMR 5246 University Lyon 1 - CNRS, Univ. Lyon, 69622, Villeurbanne, France
- Lukasz Salwinski
- UCLA-DOE Institute, University of California, Los Angeles, CA, 90095, USA
- Gautam Shirodkar
- UCLA-DOE Institute, University of California, Los Angeles, CA, 90095, USA
- Anjalia Shrivastava
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, CB10 1SD, UK
26
Vedithi SC, Malhotra S, Skwark MJ, Munir A, Acebrón-García-De-Eulate M, Waman VP, Alsulami A, Ascher DB, Blundell TL. HARP: a database of structural impacts of systematic missense mutations in drug targets of Mycobacterium leprae. Comput Struct Biotechnol J 2020; 18:3692-3704. [PMID: 33304465 DOI: 10.1016/j.csbj.2020.11.013] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 09/07/2020] [Accepted: 11/08/2020] [Indexed: 12/20/2022]
Abstract
Computational saturation mutagenesis is an in silico approach that systematically mutates each amino acid residue in a protein to all other amino acid types and predicts the resulting changes in thermodynamic stability and in affinity for other subunits/protein partners, ligands and nucleic acid molecules. The data thus generated are useful for understanding the functional consequences of mutations in antimicrobial resistance phenotypes. In this study, we applied computational saturation mutagenesis to three important drug targets in Mycobacterium leprae (M. leprae), namely dihydropteroate synthase (DHPS), RNA polymerase (RNAP) and DNA gyrase (GYR), the targets of dapsone, rifampin and ofloxacin, respectively. M. leprae causes leprosy and is an obligate intracellular bacillus, and little protein structural information is available to associate mutations with phenotypic resistance outcomes in leprosy. Experimentally solved structures of M. leprae DHPS, RNAP and GYR are not available in the Protein Data Bank. We therefore modelled the structures of these proteins using template-based comparative modelling and introduced systematic mutations into each model, generating 80,902 mutations and mutant structures across all three proteins. Impacts of mutations on stability and on protein-subunit, protein-ligand and protein-nucleic acid affinities were computed using various in-house and other published protein stability and affinity prediction software. A consensus impact was estimated for each mutation using qualitative scoring metrics for physicochemical properties and a categorical grouping of stability and affinity predictions. We developed a web database named HARP (a database of Hansen's Disease Antimicrobial Resistance Profiles), accessible at https://harp-leprosy.org, which provides the details of each of these predictions.
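The enumeration step of saturation mutagenesis is simple to illustrate. Each position is substituted by each of the 19 other standard amino acids, so a sequence of length L gives 19 × L single-point mutants. Under that assumption, the 80,902 mutations reported above would correspond to 4,258 modelled positions across the three proteins (80,902 / 19). That figure is an inference, not a number stated by the authors. The sketch below covers only the enumeration. The stability and affinity predictors used for HARP are not reproduced. The function name and the example peptide are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def saturation_mutants(sequence: str) -> list:
    """Enumerate every single-point substitution in a protein sequence.

    Mutants are written as wild-type residue, 1-based position, new residue
    (e.g. 'M1A'). Hypothetical helper for illustration only.
    """
    mutants = []
    for i, wild_type in enumerate(sequence, start=1):
        for aa in AMINO_ACIDS:
            if aa != wild_type:  # skip the synonymous "mutation"
                mutants.append(f"{wild_type}{i}{aa}")
    return mutants


seq = "MKTAYIAK"  # hypothetical peptide
muts = saturation_mutants(seq)
print(len(muts))  # 152 (8 positions x 19 substitutions)
print(muts[:3])   # ['M1A', 'M1C', 'M1D']
```

In a real pipeline, each generated mutant would then be built into the structural model and scored. That scoring step is where the predictors cited in the abstract come in.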
27
Iqbal S, Hoksza D, Pérez-Palma E, May P, Jespersen JB, Ahmed SS, Rifat ZT, Heyne HO, Rahman MS, Cottrell JR, Wagner FF, Daly MJ, Campbell AJ, Lal D. MISCAST: MIssense variant to protein StruCture Analysis web SuiTe. Nucleic Acids Res 2020; 48:W132-W139. [PMID: 32402084 PMCID: PMC7319582 DOI: 10.1093/nar/gkaa361] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/05/2020] [Revised: 04/17/2020] [Accepted: 05/11/2020] [Indexed: 12/19/2022]
Abstract
Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpreting the molecular-level effect of missense variants, however, remains challenging and requires close investigation of amino acid substitutions in the context of protein structure and function. Answers to questions such as 'Does a variant perturb a site involved in key macromolecular interactions and/or cellular signaling?' or 'Does a variant change an amino acid located in the protein core or within a 3D cluster of known pathogenic mutations?' are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive, user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. In addition, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and is displayed on structures alongside the variants, giving users the biological context of the variant location in an integrated platform. We have also made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by the wider biological community.
Affiliation(s)
- Sumaiya Iqbal
- Center for Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- David Hoksza
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg; Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- Eduardo Pérez-Palma
- Genomic Medicine Institute, Lerner Research Institute Cleveland Clinic, Cleveland, OH 44195, USA
- Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Jakob B Jespersen
- Department of Bio and Health Informatics, Technical University of Denmark, Lyngby, Denmark
- Shehab S Ahmed
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, ECE Building, West Palashi, Dhaka-1205, Bangladesh
- Zaara T Rifat
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, ECE Building, West Palashi, Dhaka-1205, Bangladesh
- Henrike O Heyne
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, 00100 Helsinki, Finland
- M Sohel Rahman
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, ECE Building, West Palashi, Dhaka-1205, Bangladesh
- Jeffrey R Cottrell
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Florence F Wagner
- Center for Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Mark J Daly
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, 00100 Helsinki, Finland
- Arthur J Campbell
- Center for Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Dennis Lal
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Genomic Medicine Institute, Lerner Research Institute Cleveland Clinic, Cleveland, OH 44195, USA; Cologne Center for Genomics, University of Cologne, Cologne, Germany; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195, USA
28
Mészáros B, Erdős G, Szabó B, Schád É, Tantos Á, Abukhairan R, Horváth T, Murvai N, Kovács OP, Kovács M, Tosatto SCE, Tompa P, Dosztányi Z, Pancsa R. PhaSePro: the database of proteins driving liquid-liquid phase separation. Nucleic Acids Res 2020; 48:D360-D367. [PMID: 31612960 PMCID: PMC7145634 DOI: 10.1093/nar/gkz848] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 08/07/2019] [Revised: 09/11/2019] [Accepted: 10/07/2019] [Indexed: 11/13/2022]
Abstract
Membraneless organelles (MOs) are dynamic liquid condensates that host a variety of specific cellular processes, such as ribosome biogenesis or RNA degradation. MOs form through liquid-liquid phase separation (LLPS), a process that relies on multivalent weak interactions of the constituent proteins and other macromolecules. Since the first discoveries of proteins able to drive LLPS, LLPS has emerged as a general mechanism for the effective organization of cellular space that is exploited in all kingdoms of life. While numerous experimental studies report novel cases, the computational identification of LLPS drivers lags behind, and many open questions remain about the sequence determinants, composition, regulation and biological relevance of the resulting condensates. Our limited ability to address these issues is largely due to the lack of a dedicated LLPS database. Therefore, we introduce PhaSePro (https://phasepro.elte.hu), an openly accessible, comprehensive, manually curated database of experimentally validated LLPS driver proteins/protein regions. It not only provides a wealth of information on such systems but also improves data standardization by introducing novel LLPS-specific controlled vocabularies. PhaSePro can be accessed through an appealing, user-friendly interface and thus has the potential to become the central resource in this dynamically developing field.
Affiliation(s)
- Bálint Mészáros
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest H-1117, Hungary
- Gábor Erdős
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest H-1117, Hungary
- Beáta Szabó
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Éva Schád
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Ágnes Tantos
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Rawan Abukhairan
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Tamás Horváth
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Nikoletta Murvai
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Orsolya P Kovács
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Márton Kovács
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Padova, Italy.,CNR Institute of Neuroscience, Padova, Italy
- Péter Tompa
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary.,Structural Biology (CSB), Brussels, Belgium; Structural Biology Brussels (SBB), Vrije Universiteit Brussel (VUB), Brussels 1050, Belgium
- Zsuzsanna Dosztányi
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest H-1117, Hungary
- Rita Pancsa
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
29
Sillitoe I, Andreeva A, Blundell TL, Buchan DWA, Finn RD, Gough J, Jones D, Kelley LA, Paysan-Lafosse T, Lam SD, Murzin AG, Pandurangan AP, Salazar GA, Skwark MJ, Sternberg MJE, Velankar S, Orengo C. Genome3D: integrating a collaborative data pipeline to expand the depth and breadth of consensus protein structure annotation. Nucleic Acids Res 2020; 48:D314-D319. [PMID: 31733063 PMCID: PMC7139969 DOI: 10.1093/nar/gkz967] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 09/23/2019] [Revised: 10/09/2019] [Accepted: 11/07/2019] [Indexed: 12/20/2022]
Abstract
Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and no longer needing to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate opening Genome3D up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.
Affiliation(s)
- Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, Gower Street, London WC1E 6BT, UK
- Antonina Andreeva
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Old Addenbrooke's Site, 80 Tennis Court Road, Cambridge CB2 0QH, UK
- Daniel W A Buchan
- Department of Computer Science, UCL, Gower Street, London WC1E 6BT, UK.,The Francis Crick Institute, 1 Midland Rd, London NW1 1AT, UK
- Robert D Finn
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Julian Gough
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- David Jones
- Department of Computer Science, UCL, Gower Street, London WC1E 6BT, UK.,The Francis Crick Institute, 1 Midland Rd, London NW1 1AT, UK
- Lawrence A Kelley
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Typhaine Paysan-Lafosse
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Su Datt Lam
- Institute of Structural and Molecular Biology, UCL, Gower Street, London WC1E 6BT, UK.,Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor 43600, Malaysia
- Alexey G Murzin
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Gustavo A Salazar
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Marcin J Skwark
- Department of Biochemistry, University of Cambridge, Old Addenbrooke's Site, 80 Tennis Court Road, Cambridge CB2 0QH, UK
- Michael J E Sternberg
- Centre for Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Sameer Velankar
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, UCL, Gower Street, London WC1E 6BT, UK
30
Bonnardel F, Mariethoz J, Salentin S, Robin X, Schroeder M, Perez S, Lisacek F, Imberty A. UniLectin3D, a database of carbohydrate binding proteins with curated information on 3D structures and interacting ligands. Nucleic Acids Res 2019; 47:D1236-D1244. [PMID: 30239928 PMCID: PMC6323968 DOI: 10.1093/nar/gky832] [Citation(s) in RCA: 66] [Impact Index Per Article: 16.5] [Received: 08/09/2018] [Accepted: 09/07/2018] [Indexed: 01/02/2023]
Abstract
Lectins, and related receptors such as adhesins and toxins, are glycan-binding proteins from all origins that decipher the glycocode, i.e. the structural information encoded in the conformation of complex carbohydrates present on the surface of all cells. Lectins are still poorly classified and annotated, but since their functions are based on ligand recognition, their 3D structures provide a solid foundation for characterization. UniLectin3D is a curated database that classifies lectins by origin and fold, with cross-links to literature, other databases in glycosciences and functional data such as known specificity. The database provides detailed information on lectins and their bound glycan ligands, and describes their interactions using the Protein–Ligand Interaction Profiler (PLIP) server. Special care was devoted to describing the bound glycan ligands through a simple graphical representation and a numerical format for cross-linking to other databases in glycoscience. We designed the database architecture and the navigation tools to cover all organisms, and to allow searching for oligosaccharide epitopes complexed within specified binding sites. UniLectin3D is accessible at https://www.unilectin.eu/unilectin3D.
Affiliation(s)
- François Bonnardel
- Univ. Grenoble Alpes, CNRS, CERMAV, 38000 Grenoble, France.,Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland.,Department of Computer Science, University of Geneva, Route de Drize 7, CH-1227 Geneva, Switzerland
- Julien Mariethoz
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland.,Department of Computer Science, University of Geneva, Route de Drize 7, CH-1227 Geneva, Switzerland
- Sebastian Salentin
- Biotechnology Center (BIOTEC), TU Dresden, Tatzberg 47-49, 01307 Dresden, Germany
- Xavier Robin
- Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland.,Computational Structural Biology Group, SIB Swiss Institute of Bioinformatics, CH-4056 Basel, Switzerland
- Michael Schroeder
- Biotechnology Center (BIOTEC), TU Dresden, Tatzberg 47-49, 01307 Dresden, Germany
- Serge Perez
- Univ. Grenoble Alpes, CNRS, DPM, 38000 Grenoble, France
- Frédérique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland.,Department of Computer Science, University of Geneva, Route de Drize 7, CH-1227 Geneva, Switzerland.,Section of Biology, University of Geneva, CH-1205 Geneva, Switzerland
- Anne Imberty
- Univ. Grenoble Alpes, CNRS, CERMAV, 38000 Grenoble, France
31
Chow CN, Lee TY, Hung YC, Li GZ, Tseng KC, Liu YH, Kuo PL, Zheng HQ, Chang WC. PlantPAN3.0: a new and updated resource for reconstructing transcriptional regulatory networks from ChIP-seq experiments in plants. Nucleic Acids Res 2019; 47:D1155-D1163. [PMID: 30395277 PMCID: PMC6323957 DOI: 10.1093/nar/gky1081] [Citation(s) in RCA: 228] [Impact Index Per Article: 57.0] [Received: 09/14/2018] [Accepted: 10/22/2018] [Indexed: 01/01/2023]
Abstract
The Plant Promoter Analysis Navigator (PlantPAN; http://PlantPAN.itps.ncku.edu.tw/) is an effective resource for predicting regulatory elements and reconstructing transcriptional regulatory networks for plant genes. In this release (PlantPAN 3.0), 17 230 transcription factors (TFs) were collected from 78 plant species. To explore regulatory landscapes, genomic locations of TF binding sites (TFBSs) have been captured from 662 public ChIP-seq samples using standard data processing. A total of 1 233 999 regulatory linkages were identified from 99 regulatory factors (TFs, histones and other DNA-binding proteins) and their target genes across seven species. This new version also adds 2449 matrices extracted from ChIP-seq peaks for cis-regulatory element prediction. Beyond the integrated ChIP-seq data, four major improvements provide more comprehensive information on TF binding events: (i) 1107 experimentally verified TF matrices from the literature, (ii) gene regulation network comparison between two species, (iii) 3D structures of TFs and TF-DNA complexes and (iv) condition-specific co-expression networks of TFs and their target genes, extended to four species. PlantPAN 3.0 can be used efficiently not only to investigate critical cis- and trans-regulatory elements in plant promoters, but also to reconstruct high-confidence TF–target relationships under specific conditions.
Affiliation(s)
- Chi-Nga Chow
- Graduate Program in Translational Agricultural Sciences, National Cheng Kung University and Academia Sinica, Taiwan
- Tzong-Yi Lee
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China
- Yu-Cheng Hung
- Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Guan-Zhen Li
- Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Kuan-Chieh Tseng
- Department of Life Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Ya-Hsin Liu
- Department of Life Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Po-Li Kuo
- Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Han-Qin Zheng
- Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Wen-Chi Chang
- Graduate Program in Translational Agricultural Sciences, National Cheng Kung University and Academia Sinica, Taiwan.,Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan.,Department of Life Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
32
Köhler S, Carmody L, Vasilevsky N, Jacobsen JOB, Danis D, Gourdine JP, Gargano M, Harris NL, Matentzoglu N, McMurry JA, Osumi-Sutherland D, Cipriani V, Balhoff JP, Conlin T, Blau H, Baynam G, Palmer R, Gratian D, Dawkins H, Segal M, Jansen AC, Muaz A, Chang WH, Bergerson J, Laulederkind SJF, Yüksel Z, Beltran S, Freeman AF, Sergouniotis PI, Durkin D, Storm AL, Hanauer M, Brudno M, Bello SM, Sincan M, Rageth K, Wheeler MT, Oegema R, Lourghi H, Della Rocca MG, Thompson R, Castellanos F, Priest J, Cunningham-Rundles C, Hegde A, Lovering RC, Hajek C, Olry A, Notarangelo L, Similuk M, Zhang XA, Gómez-Andrés D, Lochmüller H, Dollfus H, Rosenzweig S, Marwaha S, Rath A, Sullivan K, Smith C, Milner JD, Leroux D, Boerkoel CF, Klion A, Carter MC, Groza T, Smedley D, Haendel MA, Mungall C, Robinson PN. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res 2019; 47:D1018-D1027. [PMID: 30476213 PMCID: PMC6324074 DOI: 10.1093/nar/gky1105] [Citation(s) in RCA: 403] [Impact Index Per Article: 100.8] [Received: 09/17/2018] [Accepted: 10/24/2018] [Indexed: 12/12/2022]
Abstract
The Human Phenotype Ontology (HPO)—a standardized vocabulary of phenotypic abnormalities associated with 7000+ diseases—is used by thousands of researchers, clinicians, informaticians and electronic health record systems around the world. Its detailed descriptions of clinical abnormalities and computable disease definitions have made HPO the de facto standard for deep phenotyping in the field of rare disease. The HPO’s interoperability with other ontologies has enabled it to be used to improve diagnostic accuracy by incorporating model organism data. It also plays a key role in the popular Exomiser tool, which identifies potential disease-causing variants from whole-exome or whole-genome sequencing data. Since the HPO was first introduced in 2008, its users have become both more numerous and more diverse. To meet these emerging needs, the project has added new content, language translations, mappings and computational tooling, as well as integrations with external community data. The HPO continues to collaborate with clinical adopters to improve specific areas of the ontology and extend standardized disease descriptions. The newly redesigned HPO website (www.human-phenotype-ontology.org) simplifies browsing terms and exploring clinical features, diseases, and human genes.
Affiliation(s)
- Sebastian Köhler
- Charité Centrum für Therapieforschung, Charité-Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin 10117, Germany.,Einstein Center Digital Future, Berlin 10117, Germany.,Monarch Initiative, monarchinitiative.org
- Leigh Carmody
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Nicole Vasilevsky
- Monarch Initiative, monarchinitiative.org.,Oregon Health & Science University, Portland, OR 97217, USA
- Julius O B Jacobsen
- Monarch Initiative, monarchinitiative.org.,Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
- Daniel Danis
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Jean-Philippe Gourdine
- Monarch Initiative, monarchinitiative.org.,Oregon Health & Science University, Portland, OR 97217, USA
- Michael Gargano
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Nomi L Harris
- Monarch Initiative, monarchinitiative.org.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Nicolas Matentzoglu
- Monarch Initiative, monarchinitiative.org.,European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Julie A McMurry
- Monarch Initiative, monarchinitiative.org.,Linus Pauling Institute, Oregon State University, Corvallis, OR, USA
- David Osumi-Sutherland
- Monarch Initiative, monarchinitiative.org.,European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Valentina Cipriani
- Monarch Initiative, monarchinitiative.org.,William Harvey Research Institute, Queen Mary University of London.,UCL Genetics Institute, University College London.,UCL Institute of Ophthalmology, University College London
- James P Balhoff
- Monarch Initiative, monarchinitiative.org.,Renaissance Computing Institute, University of North Carolina at Chapel Hill
- Tom Conlin
- Monarch Initiative, monarchinitiative.org.,Linus Pauling Institute, Oregon State University, Corvallis, OR, USA
- Hannah Blau
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Gareth Baynam
- Western Australian Register of Developmental Anomalies and Genetic Services of Western Australia, Department of Health, Government of Western Australia, WA, Australia.,School of Paediatrics and Telethon Kids Institute, University of Western Australia, Perth, WA, Australia.,Institute for Immunology and Infectious Diseases, Murdoch University, Perth, WA, Australia.,Spatial Sciences, Department of Science and Engineering, Curtin University, Perth, WA, Australia.,The Office of Population Health Genomics, Department of Health, Government of Western Australia, Perth, WA, Australia
- Richard Palmer
- Spatial Sciences, Department of Science and Engineering, Curtin University, Perth, WA, Australia
- Dylan Gratian
- Western Australian Register of Developmental Anomalies and Genetic Services of Western Australia, Department of Health, Government of Western Australia, WA, Australia
- Hugh Dawkins
- The Office of Population Health Genomics, Department of Health, Government of Western Australia, Perth, WA, Australia
- Anna C Jansen
- Neurogenetics Research Group, Vrije Universiteit Brussel, Brussels, Belgium.,Pediatric Neurology Unit, Department of Pediatrics, UZ Brussel, Brussels, Belgium
- Ahmed Muaz
- Monarch Initiative, monarchinitiative.org.,Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
- Willie H Chang
- Centre for Computational Medicine, Hospital for Sick Children and Department of Computer Science, University of Toronto, Toronto, Canada
- Jenna Bergerson
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Stanley J F Laulederkind
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin & Marquette University, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA
- Sergi Beltran
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona 08028, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Alexandra F Freeman
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Daniel Durkin
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Andrea L Storm
- ICF, Rockville, MD, USA.,National Center for Advancing Translational Sciences, Office of Rare Diseases Research, National Institutes of Health, Bethesda, MD, USA
- Marc Hanauer
- INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Michael Brudno
- Centre for Computational Medicine, Hospital for Sick Children and Department of Computer Science, University of Toronto, Toronto, Canada
- Murat Sincan
- Sanford Imagenetics, Sanford Health, Sioux Falls, SD, USA
- Kayli Rageth
- Sanford Imagenetics, Sanford Health, Sioux Falls, SD, USA
- Matthew T Wheeler
- Center for Undiagnosed Diseases, Stanford University School of Medicine, Stanford, CA, USA
- Renske Oegema
- Department of Genetics, University Medical Center Utrecht, the Netherlands
- Halima Lourghi
- INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Maria G Della Rocca
- ICF, Rockville, MD, USA.,National Center for Advancing Translational Sciences, Office of Rare Diseases Research, National Institutes of Health, Bethesda, MD, USA
- Rachel Thompson
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK
- James Priest
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Ayushi Hegde
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Ruth C Lovering
- Institute of Cardiovascular Science, University College London, UK
- Annie Olry
- INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Luigi Notarangelo
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Morgan Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Xingmin A Zhang
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- David Gómez-Andrés
- Child Neurology Unit, Hospital Universitari Vall d'Hebron, Vall d'Hebron Research Institute (VHIR), Barcelona, Spain
- Hanns Lochmüller
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona 08028, Spain.,Department of Neuropediatrics and Muscle Disorders, Medical Center-University of Freiburg, Faculty of Medicine, Freiburg, Germany.,Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Canada.,Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Hélène Dollfus
- Centre for Rare Eye Diseases CARGO, SENSGENE FSMR Network, Strasbourg University Hospital, Strasbourg, France
- Sergio Rosenzweig
- Immunology Service, Department of Laboratory Medicine, NIH Clinical Center, Bethesda, MD, USA
- Shruti Marwaha
- Center for Undiagnosed Diseases, Stanford University School of Medicine, Stanford, CA, USA
- Ana Rath
- INSERM, US14-Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Kathleen Sullivan
- Department of Pediatrics, Division of Allergy Immunology, The Children's Hospital of Philadelphia, University of Pennsylvania Perelman School of Medicine, 3615 Civic Center Boulevard, Philadelphia, PA 19104, USA
- Joshua D Milner
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Dorothée Leroux
- Centre for Rare Eye Diseases CARGO, SENSGENE FSMR Network, Strasbourg University Hospital, Strasbourg, France
- Amy Klion
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Melody C Carter
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Tudor Groza
- Monarch Initiative, monarchinitiative.org.,Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
- Damian Smedley
- Monarch Initiative, monarchinitiative.org.,Genomics England, Queen Mary University of London, Dawson Hall, Charterhouse Square, London EC1M 6BQ, UK
- Melissa A Haendel
- Monarch Initiative, monarchinitiative.org.,Oregon Health & Science University, Portland, OR 97217, USA.,Linus Pauling Institute, Oregon State University, Corvallis, OR, USA
- Chris Mungall
- Monarch Initiative, monarchinitiative.org.,Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Peter N Robinson
- Monarch Initiative, monarchinitiative.org.,The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.,Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
33
Mitchell AL, Attwood TK, Babbitt PC, Blum M, Bork P, Bridge A, Brown SD, Chang HY, El-Gebali S, Fraser MI, Gough J, Haft DR, Huang H, Letunic I, Lopez R, Luciani A, Madeira F, Marchler-Bauer A, Mi H, Natale DA, Necci M, Nuka G, Orengo C, Pandurangan AP, Paysan-Lafosse T, Pesseat S, Potter SC, Qureshi MA, Rawlings ND, Redaschi N, Richardson LJ, Rivoire C, Salazar GA, Sangrador-Vegas A, Sigrist CJA, Sillitoe I, Sutton GG, Thanki N, Thomas PD, Tosatto SCE, Yong SY, Finn RD. InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Res 2019; 47:D351-D360. [PMID: 30398656 PMCID: PMC6323941 DOI: 10.1093/nar/gky1100] [Citation(s) in RCA: 966] [Impact Index Per Article: 241.5] [Received: 09/27/2018] [Accepted: 10/22/2018] [Indexed: 12/15/2022]
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.
Affiliation(s)
- Alex L Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Teresa K Attwood
- School of Computer Science, The University of Manchester, Manchester M13 9PL, UK
- Patricia C Babbitt
- Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94158, USA
- Matthias Blum
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Meyerhofstr. 1, 69117 Heidelberg, Germany
- Alan Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
- Shoshana D Brown
- Department of Bioengineering & Therapeutic Sciences, University of California, San Francisco, CA 94158, USA
- Hsin-Yu Chang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sara El-Gebali
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew I Fraser
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Julian Gough
- Medical Research Council Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK
- David R Haft
- J. Craig Venter Institute (JCVI), 9605 Medical Center Drive, Suite 150, Rockville, MD 20850, USA
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA
- Ivica Letunic
- Biobyte Solutions GmbH, Bothestr 142, 69126 Heidelberg, Germany
- Rodrigo Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Aurélien Luciani
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fabio Madeira
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, NIH Bldg 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
- Marco Necci
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy.,Department of Agricultural Sciences, University of Udine, via Palladio 8, 33100 Udine, Italy.,Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige, Italy
- Gift Nuka
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Christine Orengo
- Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
- Arun P Pandurangan
- Medical Research Council Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK
- Typhaine Paysan-Lafosse
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sebastien Pesseat
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Simon C Potter
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matloob A Qureshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Neil D Rawlings
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nicole Redaschi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Lorna J Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Catherine Rivoire
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Amaia Sangrador-Vegas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Christian J A Sigrist
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
| | - Ian Sillitoe
- Structural and Molecular Biology, University College London, Darwin Building, London WC1E 6BT, UK
| | - Granger G Sutton
- J. Craig Venter Institute (JCVI), 9605 Medical Center Drive, Suite 150, Rockville, MD 20850, USA
| | - Narmada Thanki
- National Center for Biotechnology Information, National Library of Medicine, NIH Bldg, 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
| | - Siew-Yit Yong
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
34
Cook CE, Lopez R, Stroe O, Cochrane G, Brooksbank C, Birney E, Apweiler R. The European Bioinformatics Institute in 2018: tools, infrastructure and training. Nucleic Acids Res 2019; 47:D15-D22. [PMID: 30445657 PMCID: PMC6323906 DOI: 10.1093/nar/gky1124] [Received: 10/03/2018] [Accepted: 11/11/2018]
Abstract
The European Bioinformatics Institute (https://www.ebi.ac.uk/) archives, curates and analyses life sciences data produced by researchers throughout the world, and makes these data available for re-use globally (https://www.ebi.ac.uk/). Data volumes continue to grow exponentially: total raw storage capacity now exceeds 160 petabytes, and we manage these increasing data flows while maintaining the quality of our services. This year we have improved the efficiency of our computational infrastructure and doubled the bandwidth of our connection to the worldwide web. We report two new data resources, the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/), which is a component of the Expression Atlas; and the PDBe-Knowledgebase (https://www.ebi.ac.uk/pdbe/pdbe-kb), which collates functional annotations and predictions for structure data in the Protein Data Bank. Additionally, Europe PMC (http://europepmc.org/) has added preprint abstracts to its search results, supplementing results from peer-reviewed publications. EMBL-EBI maintains over 150 analytical bioinformatics tools that complement our data resources. We make these tools available for users through a web interface as well as programmatically using application programming interfaces, whilst ensuring the latest versions are available for our users. Our training team, with support from all of our staff, continued to provide on-site, off-site and web-based training opportunities for thousands of researchers worldwide this year.
Affiliation(s)
- Charles E Cook
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rodrigo Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Oana Stroe
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Guy Cochrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cath Brooksbank
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Rolf Apweiler
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
35
Autin L, Maritan M, Barbaro BA, Gardner A, Olson AJ, Sanner M, Goodsell DS. Mesoscope: A Web-based Tool for Mesoscale Data Integration and Curation. MolVa (2020) 2020; 2020:23-31. [PMID: 37928321 PMCID: PMC10624244 DOI: 10.2312/molva.20201098]
Abstract
Interest is growing in 3D models of the biological mesoscale, the intermediate scale between the nanometer scale of molecular structure and the micrometer scale of cellular biology. However, it is currently difficult to gather, curate and integrate all the data required to define such models. To address this challenge we developed Mesoscope (mesoscope.scripps.edu/beta), a web-based data integration and curation tool. Mesoscope allows users to begin with a listing of molecules (such as data from proteomics), and to use resources at UniProt and the PDB to identify, prepare and validate appropriate structures and representations for each molecule, ultimately producing a portable output file used by CellPACK and other modeling tools for generation of 3D models of the biological mesoscale. The availability of this tool has proven essential in several exploratory applications, given the high complexity of mesoscale models and the heterogeneity of the available data sources.
Affiliation(s)
- L Autin
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- M Maritan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- B A Barbaro
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- A Gardner
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- A J Olson
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- M Sanner
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- D S Goodsell
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- RCSB Protein Data Bank and Center for Integrative Proteomics Research, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA
36
Bye-A-Jee H, Zaru R, Magrane M, Orchard S. Caenorhabditis elegans phosphatase complexes in UniProtKB and Complex Portal. FEBS J 2020; 287:2664-2684. [PMID: 31944606 DOI: 10.1111/febs.15213] [Received: 09/04/2019] [Revised: 11/25/2019] [Accepted: 01/13/2020]
Abstract
Phosphatases play an essential role in the regulation of protein phosphorylation. Less abundant than kinases, many phosphatases are components of one or more macromolecular complexes with different substrate specificities and specific functionalities. The expert scientific curation of phosphatase complexes for the UniProt and Complex Portal databases supports the whole scientific community by collating and organising small- and large-scale experimental data from the scientific literature into context-specific central resources, where the data can be freely accessed and used to further academic and translational research. In this review, we discuss how the diverse biological functions of phosphatase complexes are presented in UniProt and the Complex Portal, and how understanding the biological significance of phosphatase complexes in Caenorhabditis elegans offers insight into the mechanisms of substrate diversity in a variety of cellular and molecular processes.
Affiliation(s)
- Hema Bye-A-Jee
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Rossana Zaru
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Michele Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
-
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4, Switzerland
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
- Protein Information Resource, University of Delaware, Newark, DE, USA
37
Paladin L, Schaeffer M, Gaudet P, Zahn-Zabal M, Michel PA, Piovesan D, Tosatto SCE, Bairoch A. The Feature-Viewer: a visualization tool for positional annotations on a sequence. Bioinformatics 2020; 36:3244-3245. [DOI: 10.1093/bioinformatics/btaa055] [Received: 09/26/2019] [Revised: 01/02/2020] [Accepted: 01/20/2020]
Abstract
Summary
The Feature-Viewer is a lightweight library for the visualization of biological data mapped to a protein or nucleotide sequence. It is designed for ease of use while allowing full customization. The library is already used by several biological data resources and allows intuitive visual mapping of a full spectrum of sequence features for different uses.
Availability and implementation
The Feature-Viewer is open source, compatible with state-of-the-art development technologies and responsive, also for mobile viewing. Documentation and usage examples are available online.
Affiliation(s)
- Lisanna Paladin
- Department of Biomedical Sciences, University of Padua, Padova 35121, Italy
- Mathieu Schaeffer
- CALIPHO Group, Swiss Institute of Bioinformatics, University of Geneva, Geneva 1206, Switzerland
- Pascale Gaudet
- CALIPHO Group, Swiss Institute of Bioinformatics, University of Geneva, Geneva 1206, Switzerland
- Monique Zahn-Zabal
- CALIPHO Group, Swiss Institute of Bioinformatics, University of Geneva, Geneva 1206, Switzerland
- Pierre-André Michel
- CALIPHO Group, Swiss Institute of Bioinformatics, University of Geneva, Geneva 1206, Switzerland
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Padova 35121, Italy
- Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, Padova 35121, Italy
- CNR Institute of Neuroscience, Padova 35121, Italy
- Amos Bairoch
- CALIPHO Group, Swiss Institute of Bioinformatics, University of Geneva, Geneva 1206, Switzerland
38
Varadi M, Berrisford J, Deshpande M, Nair SS, Gutmanas A, Armstrong D, Pravda L, Al-Lazikani B, Anyango S, Barton GJ, Berka K, Blundell T, Borkakoti N, Dana J, Das S, Dey S, Micco PD, Fraternali F, Gibson T, Helmer-Citterich M, Hoksza D, Huang LC, Jain R, Jubb H, Kannas C, Kannan N, Koca J, Krivak R, Kumar M, Levy ED, Madeira F, Madhusudhan MS, Martell HJ, MacGowan S, McGreig JE, Mir S, Mukhopadhyay A, Parca L, Paysan-Lafosse T, Radusky L, Ribeiro A, Serrano L, Sillitoe I, Singh G, Skoda P, Svobodova R, Tyzack J, Valencia A, Fernandez EV, Vranken W, Wass M, Thornton J, Sternberg M, Orengo C, Velankar S. PDBe-KB: a community-driven resource for structural and functional annotations. Nucleic Acids Res 2020; 48:D344-D353. [PMID: 31584092 PMCID: PMC6943075 DOI: 10.1093/nar/gkz853] [Received: 08/14/2019] [Revised: 09/11/2019] [Accepted: 10/01/2019]
Abstract
The Protein Data Bank in Europe-Knowledge Base (PDBe-KB, https://pdbe-kb.org) is a community-driven, collaborative resource for literature-derived, manually curated and computationally predicted structural and functional annotations of macromolecular structure data, contained in the Protein Data Bank (PDB). The goal of PDBe-KB is two-fold: (i) to increase the visibility and reduce the fragmentation of annotations contributed by specialist data resources, and to make these data more findable, accessible, interoperable and reusable (FAIR) and (ii) to place macromolecular structure data in their biological context, thus facilitating their use by the broader scientific community in fundamental and applied research. Here, we describe the guidelines of this collaborative effort, the current status of contributed data, and the PDBe-KB infrastructure, which includes the data exchange format, the deposition system for added value annotations, the distributable database containing the assembled data, and programmatic access endpoints. We also describe a series of novel web pages, the PDBe-KB aggregated views of structure data, which combine information on macromolecular structures from many PDB entries. We have recently released the first set of pages in this series, which provide an overview of available structural and functional information for a protein of interest, referenced by a UniProtKB accession.
Affiliation(s)
- Mihaly Varadi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- John Berrisford
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Mandar Deshpande
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Sreenath S Nair
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Aleksandras Gutmanas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- David Armstrong
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Lukas Pravda
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Bissan Al-Lazikani
- Cancer Research UK Cancer Therapeutics Unit, Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
- Stephen Anyango
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Karel Berka
- Department of Physical Chemistry, Palacky University, Olomouc
- Neera Borkakoti
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Jose Dana
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Sayoni Das
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Patrizio Di Micco
- Cancer Research UK Cancer Therapeutics Unit, Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
- Franca Fraternali
- Randall Centre for Cell & Molecular Biophysics, King's College London, London, UK
- Toby Gibson
- European Molecular Biology Laboratory, Heidelberg, Germany
- Manuela Helmer-Citterich
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
- David Hoksza
- Charles University, Prague, Czech Republic
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
- Liang-Chin Huang
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Rishabh Jain
- European Molecular Biology Laboratory, Heidelberg, Germany
- Christos Kannas
- Cancer Research UK Cancer Therapeutics Unit, Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
- Natarajan Kannan
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Jaroslav Koca
- CEITEC, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Brno, Czech Republic
- Manjeet Kumar
- European Molecular Biology Laboratory, Heidelberg, Germany
- F Madeira
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- M S Madhusudhan
- Indian Institute of Science Education and Research, Pune 411008, India
- Saqib Mir
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Abhik Mukhopadhyay
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Luca Parca
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
- Typhaine Paysan-Lafosse
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Antonio Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Luis Serrano
- Centre for Genomic Regulation (CRG), Barcelona, Spain
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Gulzar Singh
- Indian Institute of Science Education and Research, Pune 411008, India
- Petr Skoda
- Charles University, Prague, Czech Republic
- Radka Svobodova
- CEITEC, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Brno, Czech Republic
- Jonathan Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Eloy Villasclaras Fernandez
- Cancer Research UK Cancer Therapeutics Unit, Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
- Wim Vranken
- Vrije Universiteit Brussel, Brussels, Belgium
- Mark Wass
- University of Kent, Canterbury, Kent, CT2 7NJ, UK
- Janet Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
39
Armstrong DR, Berrisford JM, Conroy MJ, Gutmanas A, Anyango S, Choudhary P, Clark AR, Dana JM, Deshpande M, Dunlop R, Gane P, Gáborová R, Gupta D, Haslam P, Koča J, Mak L, Mir S, Mukhopadhyay A, Nadzirin N, Nair S, Paysan-Lafosse T, Pravda L, Sehnal D, Salih O, Smart O, Tolchard J, Varadi M, Svobodova-Vařeková R, Zaki H, Kleywegt GJ, Velankar S. PDBe: improved findability of macromolecular structure data in the PDB. Nucleic Acids Res 2020; 48:D335-D343. [PMID: 31691821 PMCID: PMC7145656 DOI: 10.1093/nar/gkz990] [Received: 09/17/2019] [Revised: 10/11/2019] [Accepted: 10/25/2019]
Abstract
The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has developed an advanced search facility with ∼100 data categories and sequence searches. New features have been included in the LiteMol viewer at PDBe, with updated visualization of carbohydrates and nucleic acids. Small molecules are now mapped more extensively to external databases and their visual representation has been enhanced. These advances help users to more easily find and interpret macromolecular structure data in order to solve scientific problems.
Affiliation(s)
- David R Armstrong
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- John M Berrisford
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew J Conroy
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Aleksandras Gutmanas
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stephen Anyango
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Preeti Choudhary
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alice R Clark
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jose M Dana
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mandar Deshpande
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Roisin Dunlop
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Gane
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Romana Gáborová
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic
- Deepti Gupta
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Pauline Haslam
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jaroslav Koča
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic
- Lora Mak
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Saqib Mir
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Abhik Mukhopadhyay
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Nurul Nadzirin
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sreenath Nair
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Typhaine Paysan-Lafosse
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- InterPro, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Lukas Pravda
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David Sehnal
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic
- Osman Salih
- Electron Microscopy Data Bank (EMDB), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Oliver Smart
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- James Tolchard
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mihaly Varadi
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Radka Svobodova-Vařeková
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic
- Hossam Zaki
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Gerard J Kleywegt
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Electron Microscopy Data Bank (EMDB), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
40
Breuza L, Arighi CN, Argoud-Puy G, Casals-Casas C, Estreicher A, Famiglietti ML, Georghiou G, Gos A, Gruaz-Gumowski N, Hinz U, Hyka-Nouspikel N, Kramarz B, Lovering RC, Lussi Y, Magrane M, Masson P, Perfetto L, Poux S, Rodriguez-Lopez M, Stoeckert C, Sundaram S, Wang LS, Wu E, Orchard S. A Coordinated Approach by Public Domain Bioinformatics Resources to Aid the Fight Against Alzheimer's Disease Through Expert Curation of Key Protein Targets. J Alzheimers Dis 2020; 77:257-273. [PMID: 32716361 PMCID: PMC7592670 DOI: 10.3233/jad-200206] [Accepted: 06/05/2020]
Abstract
Background
The analysis and interpretation of data generated from patient-derived clinical samples relies on access to high-quality bioinformatics resources. These are maintained and updated by expert curators, who extract knowledge from unstructured biological data described in free-text journal articles and convert it into more structured, computationally accessible forms. This enables analyses such as functional enrichment of sets of genes/proteins using the Gene Ontology, and makes searching the data more productive by managing issues such as gene/protein name synonyms, identifier mapping, and data quality.
Objective
To undertake a coordinated annotation update of key public-domain resources to better support Alzheimer's disease research.
Methods
We have systematically identified target proteins critical to the disease process, in part by drawing on informed input from the clinical research community.
Results
Data from 954 papers have been added to the UniProtKB, Gene Ontology, and International Molecular Exchange Consortium (IMEx) databases, with 299 human proteins and 279 orthologs updated in UniProtKB. In addition, 745 binary interactions were added to the IMEx human molecular interaction dataset.
Conclusion
This represents a significant enhancement of the expert-curated data pertinent to Alzheimer's disease available in a number of biomedical databases. Relevant protein entries have been updated in UniProtKB and concomitantly in the Gene Ontology. Molecular interaction networks have been significantly extended in the IMEx Consortium dataset, and a set of reference protein complexes has been created. All the resources described are open-source and freely available to the research community, and we provide examples of how these data could be exploited by researchers.
Affiliation(s)
- Lionel Breuza
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Cecilia N. Arighi
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
- Protein Information Resource, University of Delaware, Newark, DE, USA
- Ghislaine Argoud-Puy
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Cristina Casals-Casas
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Anne Estreicher
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Maria Livia Famiglietti
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- George Georghiou
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Arnaud Gos
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Nadine Gruaz-Gumowski
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Ursula Hinz
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Nevila Hyka-Nouspikel
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Barbara Kramarz
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, UK
- Ruth C. Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, UK
- Yvonne Lussi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Michele Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Patrick Masson
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Livia Perfetto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Sylvain Poux
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Milagros Rodriguez-Lopez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Christian Stoeckert
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Shyamala Sundaram
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Li-San Wang
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- IMEx Consortium, UniProt Consortium
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
- Protein Information Resource, University of Delaware, Newark, DE, USA
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, UK
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Alzforum, Cambridge, MA, USA

41
Hoksza D, Gawron P, Ostaszewski M, Schneider R. MolArt: a molecular structure annotation and visualization tool. Bioinformatics 2019; 34:4127-4128. [PMID: 29931246 PMCID: PMC6247942 DOI: 10.1093/bioinformatics/bty489] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/26/2018] [Accepted: 06/13/2018] [Indexed: 11/16/2022]
Abstract
Summary MolArt fills the gap between sequence and structure visualization by providing a lightweight, interactive environment for exploring sequence annotations in the context of available experimental or predicted protein structures. Given a UniProt ID, MolArt downloads and displays sequence annotations, the sequence-structure mapping and relevant structures. The sequence and structure views are interlinked, so sequence annotations can be color-overlaid onto the mapped structures, providing an enhanced understanding and interpretation of the available molecular data. Availability and implementation MolArt is released under the Apache 2 license and is available at https://github.com/davidhoksza/MolArt. The project web page https://davidhoksza.github.io/MolArt/ features examples and applications of the tool.
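MolArt overlays sequence annotations on structures through a sequence-structure mapping. The Python sketch below shows one simple way such an overlay can work, assuming the mapping is given as gap-free segments that pair a run of UniProt positions with structure residue numbers (a SIFTS-like layout). The `Segment` class and both functions are hypothetical and do not reflect MolArt's own implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """One contiguous, gap-free block of a sequence-to-structure mapping (hypothetical)."""
    unp_start: int  # first UniProt position (1-based, inclusive)
    unp_end: int    # last UniProt position (inclusive)
    pdb_start: int  # structure residue number aligned to unp_start


def unp_to_pdb(pos: int, segments: List[Segment]) -> Optional[int]:
    """Map one UniProt position to a structure residue number, if observed."""
    for seg in segments:
        if seg.unp_start <= pos <= seg.unp_end:
            return seg.pdb_start + (pos - seg.unp_start)
    return None  # position is not resolved in the structure


def overlay_feature(start: int, end: int, segments: List[Segment]) -> List[int]:
    """Structure residues covered by an annotation spanning UniProt start..end."""
    mapped = (unp_to_pdb(p, segments) for p in range(start, end + 1))
    return [res for res in mapped if res is not None]
```

With segments `[Segment(10, 20, 1), Segment(25, 40, 16)]`, a feature at UniProt positions 18 to 27 maps to residues 9, 10, 11, 16, 17 and 18; positions 21 to 24 fall in an unresolved gap and are skipped, which is the colored subset a viewer would highlight.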
Affiliation(s)
- David Hoksza
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské nám., 118 00 Prague, Czech Republic
- Piotr Gawron
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
- Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg
- Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, Luxembourg

42
Lewis TE, Sillitoe I, Dawson N, Lam SD, Clarke T, Lee D, Orengo C, Lees J. Gene3D: Extensive prediction of globular domains in proteins. Nucleic Acids Res 2019; 46:D435-D439. [PMID: 29112716 PMCID: PMC5753370 DOI: 10.1093/nar/gkx1069] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 09/15/2017] [Accepted: 10/18/2017] [Indexed: 11/28/2022]
Abstract
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of globular domain annotations for millions of available protein sequences. Gene3D has previously featured in the Database issue of NAR and here we report a significant update to the Gene3D database. The current release, Gene3D v16, has significantly expanded its domain coverage over the previous version and now contains over 95 million domain assignments. We also report a new method for dealing with complex domain architectures that exist in Gene3D, arising from discontinuous domains. Amongst other updates, we have added visualization tools for exploring domain annotations in the context of other sequence features and in gene families. We also provide web-pages to visualize other domain families that co-occur with a given query domain family.
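Gene3D's update mentions a new method for complex domain architectures arising from discontinuous domains. The abstract does not describe that method, so the Python sketch below is only a simple greedy stand-in: it shows how a domain hit made of several segments can be represented and how overlapping hits might be resolved. The data layout and scores are hypothetical, and a greedy pass is not guaranteed to find the best-scoring architecture.

```python
from typing import List, Set, Tuple

Segment = Tuple[int, int]               # inclusive (start, end) residue range
Hit = Tuple[str, float, List[Segment]]  # (family id, score, segments); hypothetical layout


def covered_residues(segments: List[Segment]) -> Set[int]:
    """All residue positions covered by a possibly discontinuous domain."""
    covered: Set[int] = set()
    for start, end in segments:
        covered.update(range(start, end + 1))
    return covered


def resolve_hits(hits: List[Hit]) -> List[Hit]:
    """Greedily keep the best-scoring hits that do not overlap earlier picks."""
    chosen: List[Hit] = []
    used: Set[int] = set()
    for hit in sorted(hits, key=lambda h: h[1], reverse=True):
        if not hit[2]:
            continue  # ignore hits without any segments
        residues = covered_residues(hit[2])
        if residues.isdisjoint(used):
            chosen.append(hit)
            used |= residues
    # Report the surviving domains in sequence order (by first residue).
    return sorted(chosen, key=lambda h: min(start for start, _ in h[2]))
```

A discontinuous hit such as `("A", 50.0, [(1, 40), (120, 150)])` can coexist with a second domain inserted into its gap, `("B", 40.0, [(41, 119)])`, while a lower-scoring hit overlapping either of them is discarded.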
Affiliation(s)
- Tony E Lewis
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- Ian Sillitoe
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- Natalie Dawson
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- Su Datt Lam
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
- Tristan Clarke
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- David Lee
- Bristol Life Sciences Building, University of Bristol, Bristol, BS8 1TQ, UK
- Christine Orengo
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- Jonathan Lees
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London WC1E 6BT, UK
- Oxford Brookes University, Faculty of Health and Life Sciences, Oxford, Oxfordshire, UK

43
Varadi M, De Baets G, Vranken WF, Tompa P, Pancsa R. AmyPro: a database of proteins with validated amyloidogenic regions. Nucleic Acids Res 2019; 46:D387-D392. [PMID: 29040693 PMCID: PMC5753394 DOI: 10.1093/nar/gkx950] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 08/15/2017] [Accepted: 10/10/2017] [Indexed: 01/05/2023]
Abstract
Soluble functional proteins may transform into insoluble amyloid fibrils that deposit in a variety of tissues. Amyloid formation is a hallmark of age-related degenerative disorders. Perhaps surprisingly, amyloid fibrils can also be beneficial and are frequently exploited for diverse functional roles in organisms. Here we introduce AmyPro, an open-access database providing a comprehensive, carefully curated collection of validated amyloid fibril-forming proteins from all kingdoms of life classified into broad functional categories (http://amypro.net). In particular, AmyPro provides the boundaries of experimentally validated amyloidogenic sequence regions, short descriptions of the functional relevance of the proteins and their amyloid state, a list of the experimental techniques applied to study the amyloid state, important structural/functional/variation/mutation data transferred from UniProt, a list of relevant PDB structures categorized according to protein states, database cross-references and literature references. AmyPro greatly improves on similar currently available resources by incorporating both prions and functional amyloids in addition to pathogenic amyloids, and allows users to screen their sequences against the entire collection of validated amyloidogenic sequence fragments. By enabling further elucidation of the sequential determinants of amyloid fibril formation, we hope AmyPro will enhance the development of new methods for the precise prediction of amyloidogenic regions within proteins.
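AmyPro lets users screen their sequences against its validated amyloidogenic fragments. The abstract does not state how the screen matches sequences; the Python sketch below assumes the simplest case, exact substring matching, and the function name, fragment ids and sequences used with it are hypothetical rather than AmyPro entries.

```python
def screen_sequence(query: str, fragments: dict) -> list:
    """Report every exact occurrence of each fragment in the query.

    fragments maps a fragment id to its amino-acid sequence. Returns
    (fragment_id, start, end) tuples with 1-based inclusive coordinates,
    sorted by start position; overlapping occurrences are all reported.
    """
    query = query.upper()
    hits = []
    for frag_id, fragment in fragments.items():
        fragment = fragment.upper()
        if not fragment:
            continue  # an empty fragment would "match" everywhere
        start = query.find(fragment)
        while start != -1:
            hits.append((frag_id, start + 1, start + len(fragment)))
            start = query.find(fragment, start + 1)
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

Exact matching is the most conservative choice; a real screen might also tolerate substitutions, which would call for an alignment step instead of `str.find`.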
Affiliation(s)
- Mihaly Varadi
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Greet De Baets
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK
- Wim F Vranken
- Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Brussels, 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels (IB)², ULB-VUB, Brussels, 1050, Belgium
- VIB Center for Structural Biology, Vrije Universiteit Brussel (VUB), Brussels, 1050, Belgium
- Peter Tompa
- Structural Biology Brussels, Vrije Universiteit Brussel (VUB), Brussels, 1050, Belgium
- VIB Center for Structural Biology, Vrije Universiteit Brussel (VUB), Brussels, 1050, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences of the HAS, Budapest, 1117, Hungary
- Rita Pancsa
- MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK

44
Abstract
The increase in the number of both patients and healthcare practitioners who grew up using the Internet and computers (so-called "digital natives") is likely to impact the practice of precision medicine, and requires novel platforms for data integration and mining, as well as contextualized information retrieval. The "Illuminating the Druggable Genome Knowledge Management Center" (IDG KMC) quantifies data availability from a wide range of chemical, biological, and clinical resources, and has developed platforms that can be used to navigate understudied proteins (the "dark genome") and their potential contribution to specific pathologies. The "Target Importance and Novelty Explorer" (TIN-X) highlights the role of LRRC10 (a dark gene) in dilated cardiomyopathy. Combining mouse and human phenotype data increases the strength of evidence, as discussed for four additional dark genes: SLX4IP and its role in glucose metabolism, the role of HSF2BP in coronary artery disease, the involvement of ELFN1 in attention-deficit hyperactivity disorder, and the role of VPS13D in mouse neural tube development and its confirmed role in childhood-onset movement disorders. The workflow and tools described here are aimed at guiding further experimental research, particularly within the context of precision medicine.
Affiliation(s)
- Tudor I Oprea
- Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, NM, USA
- UNM Comprehensive Cancer Center, Albuquerque, NM, USA
- Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
- Faculty of Health and Medical Sciences, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark

45
Nightingale A, Antunes R, Alpi E, Bursteinas B, Gonzales L, Liu W, Luo J, Qi G, Turner E, Martin M. The Proteins API: accessing key integrated protein and genome information. Nucleic Acids Res 2019; 45:W539-W544. [PMID: 28383659 PMCID: PMC5570157 DOI: 10.1093/nar/gkx237] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 01/18/2017] [Accepted: 04/03/2017] [Indexed: 01/19/2023]
Abstract
The Proteins API provides searching and programmatic access to protein and associated genomics data, such as curated protein sequence positional annotations from UniProtKB, as well as mapped variation and proteomics data from large-scale data sources (LSS). Using the coordinates service, researchers can retrieve the genomic sequence coordinates for proteins in UniProtKB. The LSS genomics and proteomics data for UniProt proteins are programmatically available only through this service. A Swagger UI has been implemented to provide documentation and an interface that lets users with little or no programming experience 'talk' to the services, quickly and easily formulate queries, and obtain dynamically generated source code for popular programming languages such as Java, Perl, Python and Ruby. Search results are returned as standard JSON, XML or GFF data objects. The Proteins API is a scalable, reliable, fast, easy-to-use set of RESTful services that provides a broad protein information resource, allowing users to ask questions based on their field of expertise and to gain an integrated overview of the protein annotations available to aid their understanding of proteins in biological processes. The Proteins API is available at http://www.ebi.ac.uk/proteins/api/doc.
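Because the Proteins API is a set of RESTful services returning JSON, XML or GFF, a client mostly needs to build a URL and ask for a format. The Python sketch below uses only the standard library; the base path and the service names in `SERVICES` are assumptions inferred from the documentation URL above and should be checked against the Swagger UI, and `build_url` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base path and service names; verify them against the Swagger UI
# before relying on them.
BASE_URL = "https://www.ebi.ac.uk/proteins/api"
SERVICES = {"proteins", "features", "variation", "proteomics", "coordinates"}


def build_url(service: str, accession: str) -> str:
    """Build the URL for one UniProtKB accession in one service."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    return f"{BASE_URL}/{service}/{quote(accession)}"


def fetch_json(service: str, accession: str, timeout: float = 30.0):
    """Request a JSON response (the API also serves XML and GFF) and decode it."""
    request = urllib.request.Request(
        build_url(service, accession), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Under these assumptions, `fetch_json("coordinates", "P04637")` would request the genomic coordinates for that UniProtKB accession; it needs network access, and the structure of the returned JSON should be taken from the service documentation rather than from this sketch.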
Affiliation(s)
- Ricardo Antunes
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Emanuele Alpi
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Leonardo Gonzales
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Wudong Liu
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Jie Luo
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Guoying Qi
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Edd Turner
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Maria Martin
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK

46
Chow CN, Lee TY, Hung YC, Li GZ, Tseng KC, Liu YH, Kuo PL, Zheng HQ, Chang WC. PlantPAN3.0: a new and updated resource for reconstructing transcriptional regulatory networks from ChIP-seq experiments in plants. Nucleic Acids Res 2019. [PMID: 30395277 DOI: 10.1093/nar/gky1081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/01/2023]
Abstract
The Plant Promoter Analysis Navigator (PlantPAN; http://PlantPAN.itps.ncku.edu.tw/) is an effective resource for predicting regulatory elements and reconstructing transcriptional regulatory networks for plant genes. In this release (PlantPAN 3.0), 17 230 transcription factors (TFs) were collected from 78 plant species. To explore regulatory landscapes, genomic locations of TF binding sites (TFBSs) have been captured from 662 public ChIP-seq samples using standard data processing. A total of 1 233 999 regulatory linkages were identified from 99 regulatory factors (TFs, histones and other DNA-binding proteins) and their target genes across seven species. Additionally, this new version adds 2449 matrices extracted from ChIP-seq peaks for cis-regulatory element prediction. Beyond the integrated ChIP-seq data, four major improvements provide more comprehensive information on TF binding events: (i) 1107 experimentally verified TF matrices from the literature, (ii) gene regulation network comparison between two species, (iii) 3D structures of TFs and TF-DNA complexes and (iv) condition-specific co-expression networks of TFs and their target genes, extended to four species. PlantPAN 3.0 can be used not only to efficiently investigate critical cis- and trans-regulatory elements in plant promoters, but also to reconstruct high-confidence relationships among TF-targets under specific conditions.
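PlantPAN uses TF binding matrices for cis-regulatory element prediction. The abstract does not describe the scoring scheme, so the Python sketch below assumes a standard log-odds position weight matrix scan; the pseudocount, uniform background, forward-strand-only scan and threshold are illustrative choices, not PlantPAN's, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=None):
    """Turn per-position base counts (a list of dicts) into a log2-odds matrix."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Score every forward-strand window; return (1-based start, score) hits."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i + 1, score))
    return hits
```

For a four-column matrix built from ten `T`, `A`, `T`, `A` observations per column, scanning `GGTATAGG` with a threshold of 5.0 reports a single site starting at position 3. A promoter-scale scan would also score the reverse complement, which is omitted here for brevity.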
Affiliation(s)
- Chi-Nga Chow
- Graduate Program in Translational Agricultural Sciences, National Cheng Kung University and Academia Sinica, Taiwan
- Tzong-Yi Lee
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China
- Yu-Cheng Hung
- Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Guan-Zhen Li
- Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Kuan-Chieh Tseng
- Department of Life Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Ya-Hsin Liu
- Department of Life Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Po-Li Kuo
- Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Han-Qin Zheng
- Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Wen-Chi Chang
- Graduate Program in Translational Agricultural Sciences, National Cheng Kung University and Academia Sinica, Taiwan
- Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Department of Life Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan

47
Bhowmick P, Mohammed Y, Borchers CH. MRMAssayDB: an integrated resource for validated targeted proteomics assays. Bioinformatics 2018; 34:3566-3571. [PMID: 29762640 PMCID: PMC6184479 DOI: 10.1093/bioinformatics/bty385] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/15/2017] [Revised: 04/28/2018] [Accepted: 05/10/2018] [Indexed: 02/04/2023]
Abstract
Motivation Multiple Reaction Monitoring (MRM)-based targeted proteomics is increasingly being used to study the molecular basis of disease. When combined with an internal standard, MRM allows absolute quantification of proteins in virtually any type of sample, but developing and validating an MRM assay for a specific protein is laborious. Therefore, several public repositories now host targeted proteomics MRM assays, including NCI's Clinical Proteomic Tumor Analysis Consortium assay portals, PeptideAtlas SRM Experiment Library, SRMAtlas, PanoramaWeb and PeptideTracker, all of which contain different levels of information. Results Here we present MRMAssayDB, a web-based application that integrates these repositories into a single resource. MRMAssayDB maps and links the targeted assays, annotates the proteins with information from UniProtKB, KEGG pathways and the Gene Ontology, and provides several visualization options at the peptide and protein level. Currently MRMAssayDB contains >168K assays covering more than 34K proteins from 63 organisms; >13.5K of these proteins are present in >2.3K KEGG biological pathways corresponding to >300 master pathways, and map to >13K GO biological processes. MRMAssayDB allows comprehensive searches for a targeted-proteomics assay according to the user's interests, using the target-protein name or accession number, or annotations such as subcellular localization, biological pathway, or disease or drug associations. The user can see how many data repositories include a specific peptide assay, and the commonly used transitions for each peptide across all empirical data from the repositories. Availability and implementation http://mrmassaydb.proteincentre.com. Supplementary information Supplementary data are available at Bioinformatics online.
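The last two features (how many repositories include a peptide assay, and which transitions are most common) amount to grouping assay records by protein and peptide. The Python sketch below assumes a hypothetical flat record layout of (repository, protein accession, peptide, transition); MRMAssayDB's actual schema and mapping steps are not described in the abstract.

```python
from collections import Counter, defaultdict


def summarize(records, top=3):
    """Merge assay records from several repositories by (protein, peptide).

    records: iterable of (repository, protein_accession, peptide, transition)
    tuples; this flat layout is hypothetical.
    Returns, per (protein, peptide) key, the number of distinct repositories
    and the most frequently listed transitions across all records.
    """
    repositories = defaultdict(set)
    transitions = defaultdict(Counter)
    for repo, accession, peptide, transition in records:
        key = (accession, peptide)
        repositories[key].add(repo)
        transitions[key][transition] += 1
    return {
        key: {
            "n_repositories": len(repositories[key]),
            "common_transitions": [t for t, _ in transitions[key].most_common(top)],
        }
        for key in repositories
    }
```

Grouping on the (protein, peptide) pair rather than the peptide alone keeps shared peptides from different proteins apart; whether MRMAssayDB does the same is not stated.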
Affiliation(s)
- Pallab Bhowmick
- University of Victoria – Genome British Columbia Proteomics Centre, Victoria, BC, Canada
- Yassene Mohammed
- University of Victoria – Genome British Columbia Proteomics Centre, Victoria, BC, Canada
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, ZA, The Netherlands
- Christoph H Borchers
- University of Victoria – Genome British Columbia Proteomics Centre, Victoria, BC, Canada
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
- Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, QC, Canada
- Gerald Bronfman Department of Oncology, Jewish General Hospital, Montreal, QC, Canada

48
Fu X, Liao B, Zhu W, Cai L. New 3D graphical representation for RNA structure analysis and its application in the pre-miRNA identification of plants. RSC Adv 2018; 8:30833-30841. [PMID: 35548744 PMCID: PMC9085476 DOI: 10.1039/c8ra04138e] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/15/2018] [Accepted: 08/24/2018] [Indexed: 11/26/2022]
Abstract
MicroRNAs (miRNAs) are a family of short non-coding RNAs that play significant roles as post-transcriptional regulators. Consequently, various methods have been proposed to identify precursor miRNAs (pre-miRNAs), among which comparative studies of miRNA structures are the most important. To measure and classify the structural similarity of miRNAs, we propose a new three-dimensional (3D) graphical representation of miRNA secondary structure, in which an miRNA secondary structure is first transformed into a characteristic sequence based on physicochemical properties and base frequency. A numerical characterization of the 3D graph is then used to represent the miRNA secondary structure. We use a novel Euclidean distance method based on this representation to compute the distance between miRNA sequences for sequence similarity analysis. Finally, we apply this similarity analysis method to identify plant pre-miRNAs in three commonly used datasets. The results show that the method is reasonable and effective.
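The paper's representation converts secondary structure into a characteristic sequence and then into a 3D graph whose exact construction the abstract does not give. The Python sketch below substitutes a much simpler numerical characterization (the relative frequency of each base in the paired or unpaired state of a dot-bracket structure) and then applies Euclidean-distance nearest-neighbour labelling. It only illustrates distance-based similarity on structure-derived vectors and does not reproduce the authors' method; the function names, labels and example sequences are hypothetical.

```python
import math
from itertools import product

BASES = "ACGU"
STATES = "PU"  # P = paired ('(' or ')'), U = unpaired ('.'); a simplification
FEATURES = list(product(BASES, STATES))  # 8 (base, state) combinations


def characteristic_vector(sequence: str, dot_bracket: str) -> list:
    """Relative frequency of each (base, pairing state) combination."""
    if len(sequence) != len(dot_bracket) or not sequence:
        raise ValueError("sequence and structure must be non-empty and of equal length")
    counts = dict.fromkeys(FEATURES, 0)
    for base, symbol in zip(sequence.upper(), dot_bracket):
        state = "U" if symbol == "." else "P"
        if base in BASES:  # ignore ambiguity codes such as N
            counts[(base, state)] += 1
    return [counts[f] / len(sequence) for f in FEATURES]


def nearest_label(query: list, references: list) -> str:
    """Return the label of the reference vector closest in Euclidean distance."""
    label, _ = min(references, key=lambda ref: math.dist(query, ref[1]))
    return label
```

Each reference is a `(label, vector)` pair; a query hairpin such as `GGGAAUCCC` with structure `(((...)))` lands closer to a stem-loop reference than to an unstructured one, which is the kind of comparison the Euclidean-distance step performs.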
Affiliation(s)
- Xiangzheng Fu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Wen Zhu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China

49
Segura J, Sanchez-Garcia R, Martinez M, Cuenca-Alba J, Tabas-Madrid D, Sorzano COS, Carazo JM. 3DBIONOTES v2.0: a web server for the automatic annotation of macromolecular structures. Bioinformatics 2018; 33:3655-3657. [PMID: 28961691 PMCID: PMC5870569 DOI: 10.1093/bioinformatics/btx483] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/13/2017] [Accepted: 07/27/2017] [Indexed: 11/13/2022]
Abstract
Motivation Complementing structural information with biochemical and biomedical annotations is a powerful approach to explore the biological function of macromolecular complexes. However, currently the compilation of annotations and structural data is a feature only available for those structures that have been released as entries to the Protein Data Bank. Results To help researchers in assessing the consistency between structures and biological annotations for structural models not deposited in databases, we present 3DBIONOTES v2.0, a web application designed for the automatic annotation of biochemical and biomedical information onto macromolecular structural models determined by any experimental or computational technique. Availability and implementation The web server is available at http://3dbionotes-ws.cnb.csic.es. Contact jsegura@cnb.csic.es. Supplementary information Supplementary data are available at Bioinformatics online.
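3DBIONOTES transfers annotations onto structural models that are not in the PDB. The abstract does not say how residues are matched, but a common approach is to align the model's chain sequence to the annotated (for example UniProt) sequence. The Python sketch below assumes that approach and uses a plain Needleman-Wunsch global alignment with arbitrary scores (`match=1`, `mismatch=-1`, `gap=-2`); real tools typically use substitution matrices and affine gap penalties, and `align_positions` is a hypothetical name.

```python
def align_positions(reference: str, model: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> dict:
    """Needleman-Wunsch global alignment of two sequences.

    Returns a dict mapping 0-based reference positions to 0-based model
    positions for every aligned (non-gap) column.
    """
    n, m = len(reference), len(model)

    def pair_score(i: int, j: int) -> int:
        return match if reference[i - 1] == model[j - 1] else mismatch

    # Fill the dynamic-programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring diagonal moves.
    mapping = {}
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1  # reference residue has no counterpart in the model
        else:
            j -= 1  # model residue has no counterpart in the reference
    return mapping
```

Once the mapping is known, any annotation defined on reference positions can be moved onto the model's residues by looking each position up in the returned dict; positions missing from the dict have no modelled counterpart.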
Affiliation(s)
- Joan Segura
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid 28049, Spain
- Ruben Sanchez-Garcia
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid 28049, Spain
- Marta Martinez
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid 28049, Spain
- Jesus Cuenca-Alba
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid 28049, Spain
- Daniel Tabas-Madrid
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid 28049, Spain
- C O S Sorzano
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid 28049, Spain
- J M Carazo
- GN7 of the Spanish National Institute for Bioinformatics (INB), Biocomputing Unit, National Center of Biotechnology (CSIC), Instruct Image Processing Center, Madrid 28049, Spain

50
Meeting Report of the International Life Science Integration Workshop 2018. Glycobiology 2018; 28:552-5. [PMID: 29982395 DOI: 10.1093/glycob/cwy056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2018] [Indexed: 11/14/2022]