1
|
Perspectives of Proteomics in Respiratory Allergic Diseases. Int J Mol Sci 2023; 24:12924. [PMID: 37629105 PMCID: PMC10454482 DOI: 10.3390/ijms241612924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 07/18/2023] [Accepted: 07/27/2023] [Indexed: 08/27/2023] Open
Abstract
Proteomics in respiratory allergic diseases has such a battery of techniques and programs that one would almost think there is nothing impossible to find, invent or mold. All the resources that we document here are involved in solving problems in allergic diseases, both diagnostic and prognostic treatment, and immunotherapy development. The main perspectives, according to this version, are in three strands and/or a lockout immunological system: (1) Blocking the diapedesis of the cells involved, (2) Modifications and blocking of paratopes and epitopes being understood by modifications to antibodies, antagonisms, or blocking them, and (3) Blocking FcεRI high-affinity receptors to prevent specific IgEs from sticking to mast cells and basophils. These tools and targets in the allergic landscape are, in our view, the prospects in the field. However, there are still many allergens to identify, including some homologies between allergens and cross-reactions, through the identification of structures and epitopes. The current vision of using proteomics for this purpose remains a constant; this is also true for the basis of diagnostic and controlled systems for immunotherapy. Ours is an open proposal to use this vision for treatment.
|
2
|
UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023; 22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023] Open
Abstract
The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications, including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
|
3
|
Searching and Navigating UniProt Databases. Curr Protoc 2023; 3:e700. [PMID: 36912607 DOI: 10.1002/cpz1.700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt website receives about 800,000 unique visitors per month and is the primary means to access UniProt. It provides 10 searchable datasets and four main tools. The key UniProt datasets are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), the UniProt Archive (UniParc), and protein sets for completely sequenced genomes (Proteomes). Other supporting datasets include information about proteins that is present in UniProtKB protein entries, such as literature citations, taxonomy, and subcellular locations, among others. This article focuses on how to use UniProt datasets. The first basic protocol describes navigation and searching mechanisms for the UniProt datasets, and two additional protocols build on the first protocol to describe advanced search and query building. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Searching UniProt datasets. Basic Protocol 2: Advanced search and query building. Basic Protocol 3: Adding parameters using advanced search.
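As a rough illustration of the query building that these protocols describe, the sketch below assembles a search URL for the UniProt REST interface. The endpoint, the query fields (`gene`, `organism_id`, `reviewed`) and the helper `build_search_url` are assumptions made for illustration; the abstract itself only describes searching through the website.

```python
from urllib.parse import urlencode

# Assumed base URL of the UniProtKB search endpoint (not stated in the abstract).
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(terms, result_format="json", size=5):
    """Join field:value terms with AND and return a search URL (hypothetical helper)."""
    query = " AND ".join(f"{field}:{value}" for field, value in terms.items())
    params = {"query": query, "format": result_format, "size": size}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


# Example: reviewed human entries for the gene BRCA1.
url = build_search_url({"gene": "BRCA1", "organism_id": "9606", "reviewed": "true"})
print(url)
```

The URL is only built here, not fetched, so the sketch runs without network access; passing it to any HTTP client would be the next step.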
|
4
|
Abstract
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data (UniProt Consortium, 2023). The UniProt website receives about 800,000 unique visitors per month and is the primary means to access UniProt. Along with various datasets that you can search, UniProt provides four main tools. These are the "BLAST" tool for sequence similarity searching, the "Align" tool for multiple sequence alignment, the "Peptide Search" tool for retrieving proteins containing a short peptide sequence, and the "Retrieve/ID Mapping" tool for using a list of identifiers to retrieve UniProt Knowledgebase (UniProtKB) proteins and to convert database identifiers from UniProt to external databases or vice versa. This article provides four basic protocols and seven alternate protocols for using UniProt tools. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Basic local alignment search tool (BLAST) in UniProt Alternate Protocol 1: BLAST through UniProt text search results pages Alternate Protocol 2: BLAST through UniProt basket Basic Protocol 2: Multiple sequence alignment in UniProt Alternate Protocol 3: Align tool through UniProt results pages and entry pages Alternate Protocol 4: Align tool through UniProt basket Basic Protocol 3: Peptide search in UniProt Basic Protocol 4: Batch retrieval and ID mapping in UniProt Alternate Protocol 5: Retrieve/ID Mapping tool through UniProt text search results pages and BLAST and Align results pages Alternate Protocol 6: Retrieve/ID Mapping tool through UniProt basket Alternate Protocol 7: Retrieve/ID Mapping tool through UniProt search box.
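The "Retrieve/ID Mapping" step can also be scripted. The sketch below only prepares the request for a mapping job; the endpoint URL, the database names `UniProtKB_AC-ID` and `PDB`, the example accessions, and the helper `build_id_mapping_request` are assumptions for illustration and are not taken from the protocols.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed URL for submitting an ID mapping job (not stated in the abstract).
ID_MAPPING_RUN = "https://rest.uniprot.org/idmapping/run"


def build_id_mapping_request(ids, source_db="UniProtKB_AC-ID", target_db="PDB"):
    """Prepare (but do not send) a POST request that maps a list of identifiers."""
    form = urlencode({"from": source_db, "to": target_db, "ids": ",".join(ids)})
    return Request(ID_MAPPING_RUN, data=form.encode("ascii"), method="POST")


# Example accessions chosen only for illustration.
request = build_id_mapping_request(["P05067", "P12345"])
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Sending the request with `urllib.request.urlopen(request)` would submit the job; the sketch stops before that so it can run offline.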
|
5
|
Redefining the catalytic HECT domain boundaries for the HECT E3 ubiquitin ligase family. Biosci Rep 2022; 42:BSR20221036. [PMID: 36111624 PMCID: PMC9547173 DOI: 10.1042/bsr20221036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 09/07/2022] [Accepted: 09/08/2022] [Indexed: 11/17/2022] Open
Abstract
There are 28 unique human members of the homologous to E6AP C-terminus (HECT) E3 ubiquitin ligase family. Each member of the HECT E3 ubiquitin ligases contains a conserved bilobal HECT domain of approximately 350 residues found near its C-terminus that is responsible for its ubiquitylation activity. Recent studies have begun to elucidate specific roles that each HECT E3 ubiquitin ligase has in various cancers, age-induced neurodegeneration, and neurological disorders. New structural models have been recently released for some of the HECT E3 ubiquitin ligases, but many HECT domain structures have yet to be examined due to chronic insolubility and/or protein folding issues. Building on these recently published structural studies coupled with our in-house experiments discussed in the present study, we suggest that the addition of ∼50 conserved residues N-terminal to the current UniProt-defined boundaries of the HECT domain is required for isolating soluble, stable, and active HECT domains. We show using in silico bioinformatic analyses coupled with secondary structural prediction software that this predicted N-terminal α-helix found in all 28 human HECT E3 ubiquitin ligases forms an obligate amphipathic α-helix that binds to a hydrophobic pocket found within the HECT N-terminal lobe. The present study brings forth the proposal to redefine the residue boundaries of the HECT domain to include this N-terminal extension that will likely be critical for future biochemical, structural, and therapeutic studies on the HECT E3 ubiquitin ligase family.
|
6
|
Abstract
Public phosphorylation databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available mass spectrometry (MS) data. However, there is no database-level control for false discovery of sites, likely leading to the overestimation of true phosphosites. By profiling the human phosphoproteome, we estimate the false discovery rate (FDR) of phosphosites and predict a more realistic count of true identifications. We rank sites into phosphorylation likelihood sets and analyze them in terms of conservation across 100 species, sequence properties, and functional annotations. We demonstrate significant differences between the sets and develop a method for independent phosphosite FDR estimation. Remarkably, we report estimated FDRs of 84, 98, and 82% within sets of phosphoserine (pSer), phosphothreonine (pThr), and phosphotyrosine (pTyr) sites, respectively, that are supported by only a single piece of identification evidence, as is the case for the majority of sites in PSP. We estimate that around 62 000 Ser, 8000 Thr, and 12 000 Tyr phosphosites in the human proteome are likely to be true, which is lower than most published estimates. Furthermore, our analysis estimates that 86 000 Ser, 50 000 Thr, and 26 000 Tyr phosphosites are likely false-positive identifications, highlighting the significant potential of false-positive data to be present in phosphorylation databases.
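The counts quoted above imply an overall share of likely false-positive sites per residue type. The short sketch below works out that share from the abstract's own numbers. It is plain arithmetic, not the authors' FDR estimation method, and the resulting fractions are distinct from the 84, 98, and 82% figures, which apply only to sites with a single piece of evidence. The helper name `false_positive_fraction` is introduced here for illustration.

```python
# Counts quoted in the abstract: (likely true, likely false-positive) phosphosites.
reported = {
    "Ser": (62_000, 86_000),
    "Thr": (8_000, 50_000),
    "Tyr": (12_000, 26_000),
}


def false_positive_fraction(true_sites, false_sites):
    """Share of all reported sites that are estimated to be false positives."""
    return false_sites / (true_sites + false_sites)


fractions = {
    residue: false_positive_fraction(true_sites, false_sites)
    for residue, (true_sites, false_sites) in reported.items()
}

for residue, fraction in fractions.items():
    print(f"{residue}: {fraction:.1%} of reported sites estimated false-positive")
```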
|
7
|
ProSight annotator: Complete control and customization of protein entries in UniProt xml files. Proteomics 2022; 22:e2100209. [PMID: 35286768 DOI: 10.1002/pmic.202100209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2021] [Revised: 12/07/2021] [Accepted: 01/25/2022] [Indexed: 11/07/2022]
Abstract
The effectiveness of any proteomics database search depends on the theoretical candidate information contained in the protein database. Unfortunately, candidate entries from protein databases such as UniProt rarely contain all the post-translational modifications (PTMs), disulfide bonds, or endogenous cleavages of interest to researchers. These omissions can limit discovery of novel and biologically important proteoforms. Conversely, searching for a specific proteoform becomes a computationally difficult task for heavily modified proteins. Both situations require updates to the database through user-annotated entries. Unfortunately, manually creating properly formatted UniProt XML files is tedious and prone to errors. ProSight Annotator solves these issues by providing a graphical interface for adding user-defined features to UniProt-formatted XML files for better informed proteoform searches. It can be downloaded from http://prosightannotator.northwestern.edu.
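To make the idea of a user-annotated entry concrete, the sketch below adds a modified-residue feature to a minimal UniProt-style XML entry with Python's standard library. It is not how ProSight Annotator works internally; the stripped-down entry, the accession `P00000`, and the helper `add_modified_residue` are hypothetical, and the element layout (`feature`, `location`, `position`) is assumed from the general shape of UniProt XML.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for UniProt XML files.
NS = "http://uniprot.org/uniprot"
ET.register_namespace("", NS)

# Hypothetical, heavily reduced entry used only for this illustration.
ENTRY_XML = (
    f'<entry xmlns="{NS}">'
    "<accession>P00000</accession>"
    '<sequence length="5">MSTAK</sequence>'
    "</entry>"
)


def add_modified_residue(entry, position, description):
    """Insert a 'modified residue' feature just before the sequence element."""
    feature = ET.Element(
        f"{{{NS}}}feature",
        {"type": "modified residue", "description": description},
    )
    location = ET.SubElement(feature, f"{{{NS}}}location")
    ET.SubElement(location, f"{{{NS}}}position", {"position": str(position)})
    children = list(entry)
    # Place the feature before <sequence>; append at the end if none is present.
    index = next(
        (i for i, child in enumerate(children) if child.tag == f"{{{NS}}}sequence"),
        len(children),
    )
    entry.insert(index, feature)
    return feature


entry = ET.fromstring(ENTRY_XML)
add_modified_residue(entry, 2, "Phosphoserine")
print(ET.tostring(entry, encoding="unicode"))
```

Editing a real file by hand in this way is exactly the error-prone route the abstract warns about, which is the motivation for a graphical tool.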
|
8
|
Identification of Iron-Sulfur (Fe-S) Cluster and Zinc (Zn) Binding Sites Within Proteomes Predicted by DeepMind's AlphaFold2 Program Dramatically Expands the Metalloproteome. J Mol Biol 2022; 434:167377. [PMID: 34838520 PMCID: PMC8785651 DOI: 10.1016/j.jmb.2021.167377] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 10/12/2021] [Revised: 11/17/2021] [Accepted: 11/18/2021] [Indexed: 02/01/2023]
Abstract
DeepMind's AlphaFold2 software has ushered in a revolution in high quality, 3D protein structure prediction. In very recent work by the DeepMind team, structure predictions have been made for entire proteomes of twenty-one organisms, with >360,000 structures made available for download. Here we show that thousands of novel binding sites for iron-sulfur (Fe-S) clusters and zinc (Zn) ions can be identified within these predicted structures by exhaustive enumeration of all potential ligand-binding orientations. We demonstrate that AlphaFold2 routinely makes highly specific predictions of ligand binding sites: for example, binding sites that are comprised exclusively of four cysteine sidechains fall into three clusters, representing binding sites for 4Fe-4S clusters, 2Fe-2S clusters, or individual Zn ions. We show further: (a) that the majority of known Fe-S cluster and Zn binding sites documented in UniProt are recovered by the AlphaFold2 structures, (b) that there are occasional disputes between AlphaFold2 and UniProt with AlphaFold2 predicting highly plausible alternative binding sites, (c) that the Fe-S cluster binding sites that we identify in E. coli agree well with previous bioinformatics predictions, (d) that cysteines predicted here to be part of ligand binding sites show little overlap with those shown via chemoproteomics techniques to be highly reactive, and (e) that AlphaFold2 occasionally appears to build erroneous disulfide bonds between cysteines that should instead coordinate a ligand. These results suggest that AlphaFold2 could be an important tool for the functional annotation of proteomes, and the methodology presented here is likely to be useful for predicting other ligand-binding sites.
|
9
|
IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds. Int J Mol Sci 2021; 22:ijms222313066. [PMID: 34884870 PMCID: PMC8657696 DOI: 10.3390/ijms222313066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/18/2021] [Revised: 11/23/2021] [Accepted: 11/24/2021] [Indexed: 11/16/2022] Open
Abstract
Parasite species of the genus Plasmodium cause malaria, which remains a major global health problem due to parasite resistance to available antimalarial drugs and increasing treatment costs. Consequently, computational prediction of new antimalarial compounds with novel targets in the proteome of Plasmodium sp. is a very important goal for the pharmaceutical industry. We can expect that the success of a pre-clinical assay depends on the conditions of the assay per se, the chemical structure of the drug, the structure of the target protein, as well as on factors governing the expression of this protein in the proteome, such as gene (deoxyribonucleic acid, DNA) sequence and/or chromosome structure. However, there are no reports of computational models that consider all these factors simultaneously. Some of the difficulties for this kind of analysis are the dispersion of data in different datasets, the high heterogeneity of data, etc. In this work, we analyzed three databases, ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information Genome Data Viewer), to achieve this goal. The ChEMBL dataset contains outcomes for 17,758 unique assays of potential antimalarial compounds, including numeric descriptors (variables) for the structure of compounds as well as a huge amount of information about the conditions of assays. The NCBI-GDV and UniProt datasets include the sequences of genes and proteins and their functions. In addition, we also created two partitions (cassayj = caj and cdataj = cdj) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about experimental conditions of preclinical assays (caj) or about the nature and quality of data (cdj).
These categorical variables include information about 22 parameters of biological activity (ca0), 28 target proteins (ca1), and 9 organisms of assay (ca2), etc. We also created another partition (cprotj = cpj) including categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (cp0), 10 chromosomes (cp1), gene orientation (cp2), and 31 protein functions (cp3). We used a Perturbation-Theory Machine Learning Information Fusion (IFPTML) algorithm to map all this information (from three databases) and train a predictive model. Shannon's entropy measure Shk (numerical variables) was used to quantify the information about the structure of drugs, protein sequences, gene sequences, and chromosomes on the same information scale. Perturbation Theory Operators (PTOs) with the form of Moving Average (MA) operators have been used to quantify perturbations (deviations) in the structural variables with respect to their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC model presented the best performance, with Sensitivity Sn(%) = 83.6/85.1 and Specificity Sp(%) = 89.8/89.7 for training/validation sets, respectively. This model could become a useful tool for the optimization of preclinical assays of new antimalarial compounds vs. different proteins in the proteome of Plasmodium.
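The two numerical ingredients named above, a Shannon entropy descriptor and a moving-average perturbation operator, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: entropy is computed here from plain symbol frequencies of a sequence, and the operator is taken to be the deviation of each value from the mean of its categorical subset. The function names and toy inputs are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(sequence):
    """Shannon entropy (natural log) of the symbol frequencies in a sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def moving_average_deviation(values, categories):
    """Deviation of each value from the mean of the values sharing its category."""
    groups = defaultdict(list)
    for value, category in zip(values, categories):
        groups[category].append(value)
    means = {category: sum(vs) / len(vs) for category, vs in groups.items()}
    return [value - means[category] for value, category in zip(values, categories)]


# Toy inputs: entropy of a short sequence, then deviations within two subsets.
entropy_ab = shannon_entropy("AB")
deviations = moving_average_deviation([1.0, 3.0, 10.0], ["ca_x", "ca_x", "ca_y"])
print(entropy_ab, deviations)
```

In the setting described above, the values would be entropy descriptors of drugs, proteins, or genes, and the categories would be levels of the assay or protein partitions (for example a target protein or an organism of assay).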
|
10
|
VarSite: Disease variants and protein structure. Protein Sci 2020; 29:111-119. [PMID: 31606900 PMCID: PMC6933866 DOI: 10.1002/pro.3746] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 07/01/2019] [Revised: 10/04/2019] [Accepted: 10/07/2019] [Indexed: 12/20/2022]
Abstract
VarSite is a web server mapping known disease-associated variants from UniProt and ClinVar, together with natural variants from gnomAD, onto protein 3D structures in the Protein Data Bank. The analyses are primarily image-based and provide both an overview for each human protein, as well as a report for any specific variant of interest. The information can be useful in assessing whether a given variant might be pathogenic or benign. The structural annotations for each position in the protein include protein secondary structure, interactions with ligand, metal, DNA/RNA, or other protein, and various measures of a given variant's possible impact on the protein's function. The 3D locations of the disease-associated variants can be viewed interactively via the 3dmol.js JavaScript viewer, as well as in RasMol and PyMOL. Users can search for specific variants, or sets of variants, by providing the DNA coordinates of the base change(s) of interest. Additionally, various agglomerative analyses are given, such as the mapping of disease and natural variants onto specific Pfam or CATH domains. The server is freely accessible to all at: https://www.ebi.ac.uk/thornton-srv/databases/VarSite.
|
11
|
UniprotR: Retrieving and visualizing protein sequence and functional information from Universal Protein Resource (UniProt knowledgebase). J Proteomics 2019; 213:103613. [PMID: 31843688 DOI: 10.1016/j.jprot.2019.103613] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 10/24/2019] [Revised: 11/26/2019] [Accepted: 12/13/2019] [Indexed: 02/06/2023]
Abstract
UniprotR is a software package designed to easily retrieve, cluster and visualize protein data from the UniProt knowledgebase (UniProtKB) using the R language. The package is implemented mainly to process, parse and illustrate proteomics data in a handy and time-saving approach, allowing researchers to summarize all required protein information available at UniProtKB in a readable data frame, Excel CSV file, and/or graphical output. UniprotR generates a set of graphics including gene ontology, chromosomal location, protein scoring and status, protein networking, sequence phylogenetic tree, and physicochemical properties. In addition, the package supports clustering of proteins based on primary gene name or chromosomal location, facilitating additional downstream analysis. SIGNIFICANCE: In this work, we implemented a robust package for retrieving and visualizing information from multiple sources such as UniProtKB, SWISS-MODEL, and STRING. UniprotR contains functions that enable retrieving and clustering data in a handy way and visualizing data in publishable graphs to facilitate researchers' work and fulfill their needs. UniprotR will aid in saving time for downstream data analysis instead of manual, time-consuming data analysis. AVAILABILITY AND IMPLEMENTATION: UniprotR is released as free open source code under the GPLv3 license and is available from CRAN (The Comprehensive R Archive Network) and GitHub (https://cran.r-project.org/web/packages/UniprotR/index.html; https://github.com/Proteomicslab57357/UniprotR).
|
12
|
Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019; 26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]
Abstract
The interactions between RNAs and proteins play critical roles in many biological processes. Therefore, characterizing these interactions is critical for mechanistic, biomedical, and clinical studies. Many experimental methods can be used to determine different aspects of RNA-protein interactions. However, because RNA-protein interactions are tissue-specific and condition-specific, and because these interactions are weak and frequently compete with each other, those experimental techniques cannot be fully exploited to discover the complete spectrum of RNA-protein interactions. To mitigate these issues, continuous efforts have been devoted to developing high-quality computational techniques to study the interactions between RNAs and proteins. Much important progress has been achieved with the application of novel techniques and strategies, such as machine learning techniques. In particular, with the development and application of CLIP techniques, more and more experimental data on RNA-protein interactions under specific biological conditions are available. Together, these CLIP data provide a rich source for developing advanced machine learning predictors. In this review, recent progress on computational predictors for RNA-protein interaction is summarized with respect to the following aspects: dataset, prediction strategies, and input features. Possible future developments are also discussed at the end of the review.
|
13
|
Analysing Point Mutations in Protein Cleavage Sites by Using Enzyme Specificity Matrices. Front Endocrinol (Lausanne) 2019; 10:267. [PMID: 31130917 PMCID: PMC6509992 DOI: 10.3389/fendo.2019.00267] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/17/2018] [Accepted: 04/11/2019] [Indexed: 11/13/2022] Open
|
14
|
Abstract
Public availability of biological sequences is essential for their widespread access and use by the research community. The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and functional data. While most protein sequences entering UniProt are imported from other source databases containing nucleotide or 3-D structure data, protein sequences determined at the protein level can be submitted directly to UniProt. To this end, UniProt provides a Web interface called SPIN. This service enables researchers to make their de novo-sequenced proteins available to the scientific community and acquire UniProt accession numbers for use in publications. This unit explains the process of submitting a protein sequence to UniProt using SPIN. The basic protocol describes all the necessary steps for a single sequence. A support protocol gives guidance on how best to deal with exceptionally large datasets. © 2018 by John Wiley & Sons, Inc.
|
15
|
Toward Spectral Library-Free Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry Bacterial Identification. J Proteome Res 2018; 17:2124-2130. [PMID: 29749232 PMCID: PMC5989274 DOI: 10.1021/acs.jproteome.8b00065] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Abstract
Bacterial identification is of great importance in clinical diagnosis, environmental monitoring, and food safety control. Among various strategies, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has drawn significant interest and has been clinically used. Nevertheless, current bioinformatics solutions use spectral libraries for the identification of bacterial strains. Spectral library generation requires acquisition of MALDI-TOF spectra from monoculture bacterial colonies, which is time-consuming and not possible for many species and strains. We propose a strategy for bacterial typing by MALDI-TOF using protein sequences from a public database, namely UniProt. Ten genes were identified to encode the proteins most often observed by MALDI-TOF from bacteria through a 10-fold double cross-validation procedure repeated 500 times, using 403 MALDI-TOF spectra corresponding to 14 genera, 81 species, and 403 strains, and the protein sequences of 1276 species in UniProt. The 10 genes were then used to annotate peaks on MALDI-TOF spectra of bacteria for bacterial identification. With this approach, bacteria can be identified at the genus level by searching against a database containing the protein sequences of 42 genera of bacteria from UniProt. Our approach identified 84.1% of the 403 spectra correctly at the genus level. Source code of the algorithm is available at https://github.com/dipcarbon/BacteriaMSLF.
|
16
|
Abstract
The Universal Protein Resource (UniProt) is a freely available comprehensive resource for protein sequence and annotation data. UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). Across the three institutes more than 100 people are involved through different tasks such as expert curation, software development, and support. This chapter introduces the functionality and data provided by UniProt. It describes example use cases for which you might come to UniProt and the methods to help you achieve your goals.
|
17
|
Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe. Protein Sci 2016; 25:2164-2174. [PMID: 27636733 DOI: 10.1002/pro.3041] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 08/17/2016] [Revised: 09/12/2016] [Accepted: 09/12/2016] [Indexed: 12/22/2022]
Abstract
Intrinsic disorder (ID) in proteins has been extensively described over the last decade, but a large-scale classification of ID in proteins is mostly missing. Here, we provide an extensive analysis of ID in the protein universe on the UniProt database derived from sequence-based predictions in MobiDB. Almost half the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of over 20 residues; such regions are more abundant in Eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered and mostly found in Viruses. Most proteins have only one ID region, with short ID regions evenly distributed along the sequence and long ID regions overrepresented in the center. The charged residue composition of Das and Pappu was used to classify ID proteins by structural propensities and corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both Prokaryotes and Eukaryotes. In Bacteria, they are confined in the nucleoid and in Viruses provide DNA binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in Eukaryotes but are involved in killing other organisms and cytolysis in Bacteria. The Undefined class is used by Bacteria to bind toxic substances and mediate transport and movement between and within organisms in Viruses. Fully disordered proteins behave similarly, but are enriched for glycine residues and extracellular structures.
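The length thresholds quoted above (regions of at least five residues, long regions of over 20 residues, fraction of the sequence covered) can be applied to a per-residue disorder prediction as sketched below. The mask encoding (`D` for disordered, `-` for ordered), the function names, and the toy mask are assumptions for illustration; they do not reproduce the MobiDB data format or the authors' pipeline.

```python
from itertools import groupby


def disordered_regions(mask):
    """Return (start, end) half-open intervals for each run of 'D' in the mask."""
    regions = []
    position = 0
    for is_disordered, run in groupby(mask, key=lambda symbol: symbol == "D"):
        length = len(list(run))
        if is_disordered:
            regions.append((position, position + length))
        position += length
    return regions


def summarize_disorder(mask, min_length=5, long_length=20):
    """Count ID regions using the thresholds quoted in the abstract."""
    regions = [r for r in disordered_regions(mask) if r[1] - r[0] >= min_length]
    long_regions = [r for r in regions if r[1] - r[0] > long_length]
    covered = sum(end - start for start, end in regions)
    return {
        "regions": regions,
        "long_regions": long_regions,
        "fraction_disordered": covered / len(mask),
        "fully_disordered": covered == len(mask),
    }


# Toy 50-residue mask: a 6-residue region and a 25-residue region.
toy_mask = "D" * 6 + "-" * 10 + "D" * 25 + "-" * 9
summary = summarize_disorder(toy_mask)
print(summary)
```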
|
18
|
Abstract
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data (UniProt Consortium, 2015). The UniProt Web site receives ∼400,000 unique visitors per month and is the primary means to access UniProt. Along with various datasets that you can search, UniProt provides three main tools. These are the 'BLAST' tool for sequence similarity searching, the 'Align' tool for multiple sequence alignment, and the 'Retrieve/ID Mapping' tool for using a list of identifiers to retrieve UniProtKB proteins and to convert database identifiers from UniProt to external databases or vice versa. This unit provides three basic protocols, three alternate protocols, and two support protocols for using UniProt tools.
|
19
|
Data from comprehensive analysis of nuclear localization signals. Data Brief 2016; 6:200-3. [PMID: 26862559 PMCID: PMC4707185 DOI: 10.1016/j.dib.2015.11.064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/04/2015] [Revised: 11/26/2015] [Accepted: 11/27/2015] [Indexed: 11/27/2022] Open
Abstract
This article describes data related to a research article titled "Comprehensive analysis of the dynamic structure of nuclear localization signals" by Yamagishi et al. [1]. In this article, we provide data covering a wider range of the mammalian NLSs in UniProt (Universal Protein Resource) [2], regardless of their conformations. More specifically, we extracted all NLSs that are clearly indicated as "NLS" with an evidence type (a code from the Evidence Codes Ontology) [3] in UniProt. A total of 1364 NLSs in 1186 proteins were extracted from UniProt. The number of NLSs found in each protein (UniProt ID), the sequence length of the NLSs, and their distribution are shown.
|
20
|
UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol Biol 2016; 1374:23-54. [PMID: 26519399 DOI: 10.1007/978-1-4939-3167-5_2] [Citation(s) in RCA: 450] [Impact Index Per Article: 56.3] [Indexed: 05/08/2023]
Abstract
The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available expertly manually annotated protein sequences obtained from a broad spectrum of organisms. Plant protein entries are produced in the frame of the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High level annotations provided by UniProtKB/Swiss-Prot are widely used to predict annotation of newly available proteins through automatic pipelines. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We will also present some of the tools and databases that are linked to each entry.
|
21
|
Abstract
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt Web site receives ∼400,000 unique visitors per month and is the primary means to access UniProt. It provides ten searchable datasets and three main tools. The key UniProt datasets are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), the UniProt Archive (UniParc), and protein sets for completely sequenced genomes (Proteomes). Other supporting datasets include information about proteins that is present in UniProtKB protein entries such as literature citations, taxonomy, and subcellular locations, among others. This paper focuses on how to use UniProt datasets. The basic protocol describes navigation and searching mechanisms for the UniProt datasets, while two alternative protocols build on the basic protocol to describe advanced search and query building.
|
22
|
Identification of novel mutations in the ABCA12 gene, c.1857delA and c.5653-5655delTAT, causing harlequin ichthyosis. Gene 2013; 531:510-3. [PMID: 24055722 DOI: 10.1016/j.gene.2013.07.046] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/13/2013] [Revised: 07/09/2013] [Accepted: 07/12/2013] [Indexed: 10/26/2022]
Abstract
Harlequin ichthyosis (HI) is a severe autosomal recessive developmental disorder of the skin that is frequently but not always fatal in the first few days of life. In HI, mutations in both ABCA12 gene alleles must have a severe impact on protein function, and most mutations are truncating. The presence of at least one nontruncating mutation (predicting a residual protein function) usually causes a less severe congenital ichthyosis (lamellar ichthyosis or congenital ichthyosiform erythroderma). Here we report on a girl with severe HI diagnosed by prenatal ultrasound at 33 5/7 weeks' gestation. Ultrasound findings included ectropion, eclabium, deformed nose, hands and feet, joint contractures, hyperechogenic amniotic fluid and polyhydramnion. After birth, palliative treatment was provided and she died on her first day of life. Sequence analysis of the ABCA12 gene identified two novel mutations, c.1857delA (predicting p.Lys619) in exon 15 and c.5653-5655delTAT (predicting p.1885delTyr) in exon 37, each in heterozygous state. The c.5653-5655delTAT mutation is not truncating, but the deleted tyrosine at position 1885 is perfectly conserved among vertebrates, and molecular studies evaluated the mutation as probably disease causing and damaging.
|