Reference Citation Analysis: search results for "UniProt"

Total Articles: 22 (from Reference Citation Analysis)
Article PDFs: 7
Cited by > 0: 19

1. Perspectives of Proteomics in Respiratory Allergic Diseases. Int J Mol Sci 2023;24:12924. [PMID: 37629105 PMCID: PMC10454482 DOI: 10.3390/ijms241612924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 07/18/2023] [Accepted: 07/27/2023] [Indexed: 08/27/2023]
Abstract: Proteomics in respiratory allergic diseases has such a battery of techniques and programs that one would almost think there is nothing impossible to find, invent or mold. All the resources that we document here are involved in solving problems in allergic diseases, both diagnostic and prognostic treatment, and immunotherapy development. The main perspectives, according to this version, are in three strands and/or a lockout immunological system: (1) blocking the diapedesis of the cells involved, (2) modifying and blocking paratopes and epitopes, understood as modifications to antibodies, antagonisms, or blocking them, and (3) blocking FcεRI high-affinity receptors to prevent specific IgEs from sticking to mast cells and basophils. These tools and targets in the allergic landscape are, in our view, the prospects in the field. However, there are still many allergens to identify, including some homologies between allergens and cross-reactions, through the identification of structures and epitopes. The current vision of using proteomics for this purpose remains a constant; this is also true for the basis of diagnostic and controlled systems for immunotherapy. Ours is an open proposal to use this vision for treatment.
Key Words: ELISA Luminex UniProt liquid chromatography mass spectrometry proteomics
MeSH Headings: Humans; Proteomics; Hypersensitivity/diagnosis; Hypersensitivity/therapy; Respiration Disorders; Respiratory Tract Diseases; Epitopes; Allergens

2. UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023;22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023]
Abstract: The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications, including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
Key Words: UniProt complete proteomes protein function protein sequence database sequence identifiers
MeSH Headings: Humans; Proteomics; Proteome/genetics; Databases, Protein; Amino Acid Sequence; Peptides
Grants: BB/T010541/1 Biotechnology and Biological Sciences Research Council; BB/S01781X/1 Biotechnology and Biological Sciences Research Council

3. Searching and Navigating UniProt Databases. Curr Protoc 2023;3:e700. [PMID: 36912607 DOI: 10.1002/cpz1.700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract: The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt website receives about 800,000 unique visitors per month and is the primary means to access UniProt. It provides 10 searchable datasets and four main tools. The key UniProt datasets are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), the UniProt Archive (UniParc), and protein sets for completely sequenced genomes (Proteomes). Other supporting datasets include information about proteins that is present in UniProtKB protein entries, such as literature citations, taxonomy, and subcellular locations, among others. This article focuses on how to use UniProt datasets. The first basic protocol describes navigation and searching mechanisms for the UniProt datasets, and two additional protocols build on the first protocol to describe advanced search and query building. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Searching UniProt datasets
Basic Protocol 2: Advanced search and query building
Basic Protocol 3: Adding parameters using advanced search
Key Words: UniProt database navigation protein search tutorial
MeSH Headings: Databases, Protein; Amino Acid Sequence; Proteome; Knowledge Bases; Archives
Grants: BB/S01781X/1 Biotechnology and Biological Sciences Research Council; NIA NIH HHS; NHLBI NIH HHS; NIGMS NIH HHS; ODCDC CDC HHS; NIDDK NIH HHS; U24HG007822 NIH HHS; NEI NIH HHS; BB/T010541/1 Biotechnology and Biological Sciences Research Council; NCI NIH HHS

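The searching and query-building protocols above are web-based. For readers who want to script the same kind of query, here is a minimal sketch that only builds a search URL. It assumes the UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/search` with its `query`, `fields`, `format`, and `size` parameters, plus the query fields used in the example (`gene`, `organism_id`, `reviewed`) and the return fields (`accession`, `protein_name`, `length`). None of these are described in the abstract, so check them against the current UniProt API documentation before use.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Assumed UniProt REST search endpoint; not described in the abstract above.
UNIPROTKB_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query: str, fields: Optional[List[str]] = None,
                     fmt: str = "tsv", size: int = 25) -> str:
    """Build (but do not send) a UniProtKB text-search URL."""
    params = {"query": query, "format": fmt, "size": size}
    if fields:
        # Result columns are requested as one comma-separated value.
        params["fields"] = ",".join(fields)
    return f"{UNIPROTKB_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Reviewed human TP53 entries; the printed URL can be fetched with any HTTP client.
    print(build_search_url("gene:TP53 AND organism_id:9606 AND reviewed:true",
                           fields=["accession", "protein_name", "length"]))
```

Keeping URL construction separate from fetching makes the sketch testable offline and leaves the choice of HTTP client to the reader.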
4. UniProt Tools: BLAST, Align, Peptide Search, and ID Mapping. Curr Protoc 2023;3:e697. [PMID: 36943033 PMCID: PMC10034637 DOI: 10.1002/cpz1.697] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 03/23/2023]
Abstract: The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data (UniProt Consortium, 2023). The UniProt website receives about 800,000 unique visitors per month and is the primary means to access UniProt. Along with various datasets that you can search, UniProt provides four main tools. These are the "BLAST" tool for sequence similarity searching, the "Align" tool for multiple sequence alignment, the "Peptide Search" tool for retrieving proteins containing a short peptide sequence, and the "Retrieve/ID Mapping" tool for using a list of identifiers to retrieve UniProt Knowledgebase (UniProtKB) proteins and to convert database identifiers from UniProt to external databases or vice versa. This article provides four basic protocols and seven alternate protocols for using UniProt tools. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Basic local alignment search tool (BLAST) in UniProt
Alternate Protocol 1: BLAST through UniProt text search results pages
Alternate Protocol 2: BLAST through UniProt basket
Basic Protocol 2: Multiple sequence alignment in UniProt
Alternate Protocol 3: Align tool through UniProt results pages and entry pages
Alternate Protocol 4: Align tool through UniProt basket
Basic Protocol 3: Peptide search in UniProt
Basic Protocol 4: Batch retrieval and ID mapping in UniProt
Alternate Protocol 5: Retrieve/ID Mapping tool through UniProt text search results pages and BLAST and Align results pages
Alternate Protocol 6: Retrieve/ID Mapping tool through UniProt basket
Alternate Protocol 7: Retrieve/ID Mapping tool through UniProt search box
Key Words: BLAST UniProt alignment navigation search tutorial
MeSH Headings: Databases, Protein; Peptides; Software; Proteins/metabolism; Amino Acid Sequence
Grants: U24 HG007822 NHGRI NIH HHS

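At its core, the Peptide Search tool described above returns every protein whose sequence contains a short query peptide. The toy function below illustrates only that containment idea, using exact substring matching over an in-memory dictionary. The identifiers and sequences are hypothetical, and the optional I/L-equivalence flag is our own illustrative choice, motivated by leucine and isoleucine having identical masses. This is not how UniProt implements the tool, which searches its full databases.

```python
from typing import Dict, List


def peptide_search(peptide: str, proteins: Dict[str, str],
                   il_equivalent: bool = False) -> List[str]:
    """Return the sorted identifiers of proteins whose sequence contains `peptide`."""
    def normalize(seq: str) -> str:
        seq = seq.upper()
        # Leucine and isoleucine have identical masses, so the optional
        # I/L-equivalent mode maps both to L before comparing.
        return seq.replace("I", "L") if il_equivalent else seq

    query = normalize(peptide)
    return sorted(acc for acc, seq in proteins.items() if query in normalize(seq))


# Hypothetical identifiers and sequences, for illustration only.
toy_db = {"TOY1": "MKTAYIAKQR", "TOY2": "MKTAYLAKQR", "TOY3": "GGSSGG"}
print(peptide_search("AYIAK", toy_db))                      # exact match only
print(peptide_search("AYIAK", toy_db, il_equivalent=True))  # I and L treated alike
```

A linear scan is enough for a toy dictionary; searching a database the size of UniProtKB would need an index structure instead.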
5. Redefining the catalytic HECT domain boundaries for the HECT E3 ubiquitin ligase family. Biosci Rep 2022;42:BSR20221036. [PMID: 36111624 PMCID: PMC9547173 DOI: 10.1042/bsr20221036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 09/07/2022] [Accepted: 09/08/2022] [Indexed: 11/17/2022]
Abstract: There are 28 unique human members of the homologous to E6AP C-terminus (HECT) E3 ubiquitin ligase family. Each member of the HECT E3 ubiquitin ligases contains a conserved bilobal HECT domain of approximately 350 residues near its C-terminus that is responsible for its ubiquitylation activity. Recent studies have begun to elucidate specific roles that each HECT E3 ubiquitin ligase has in various cancers, age-induced neurodegeneration, and neurological disorders. New structural models have recently been released for some of the HECT E3 ubiquitin ligases, but many HECT domain structures have yet to be examined due to chronic insolubility and/or protein folding issues. Building on these recently published structural studies coupled with our in-house experiments discussed in the present study, we suggest that adding ∼50 conserved residues N-terminal to the current UniProt-defined boundaries of the HECT domain is required for isolating soluble, stable, and active HECT domains. Using in silico bioinformatic analyses coupled with secondary structure prediction software, we show that this predicted N-terminal α-helix, found in all 28 human HECT E3 ubiquitin ligases, forms an obligate amphipathic α-helix that binds to a hydrophobic pocket within the HECT N-terminal lobe. The present study proposes redefining the residue boundaries of the HECT domain to include this N-terminal extension, which will likely be critical for future biochemical, structural, and therapeutic studies on the HECT E3 ubiquitin ligase family.
Key Words: HECT E3 ubiquitin ligase HECT domain UniProt multiple sequence alignment ubiquitin ubiquitylation
MeSH Headings: Catalytic Domain; Humans; Ubiquitin-Protein Ligases/metabolism; Ubiquitination; Ubiquitins/metabolism
Grants: R15 GM126432 NIGMS NIH HHS

6. Profiling the Human Phosphoproteome to Estimate the True Extent of Protein Phosphorylation. J Proteome Res 2022;21:1510-1524. [PMID: 35532924 PMCID: PMC9171898 DOI: 10.1021/acs.jproteome.2c00131] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 12/11/2022]
Abstract: Public phosphorylation databases such as PhosphoSitePlus (PSP) and PeptideAtlas (PA) compile results from published papers or openly available mass spectrometry (MS) data. However, there is no database-level control for false discovery of sites, likely leading to the overestimation of true phosphosites. By profiling the human phosphoproteome, we estimate the false discovery rate (FDR) of phosphosites and predict a more realistic count of true identifications. We rank sites into phosphorylation likelihood sets and analyze them in terms of conservation across 100 species, sequence properties, and functional annotations. We demonstrate significant differences between the sets and develop a method for independent phosphosite FDR estimation. Remarkably, we report estimated FDRs of 84, 98, and 82% within sets of phosphoserine (pSer), phosphothreonine (pThr), and phosphotyrosine (pTyr) sites, respectively, that are supported by only a single piece of identification evidence; such sites make up the majority of sites in PSP. We estimate that around 62 000 Ser, 8000 Thr, and 12 000 Tyr phosphosites in the human proteome are likely to be true, which is lower than most published estimates. Furthermore, our analysis estimates that 86 000 Ser, 50 000 Thr, and 26 000 Tyr phosphosites are likely false-positive identifications, highlighting the significant potential of false-positive data to be present in phosphorylation databases.
Key Words: PeptideAtlas PhosphoSitePlus UniProt database evolutionary conservation false discovery rate mass spectrometry phosphopeptides phosphoproteomics phosphorylation phosphosites proteome proteomics

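The false discovery rate used in this abstract is the fraction of reported identifications that are false, FDR = FP / (FP + TP). As a back-of-the-envelope illustration only, and assuming the abstract's "likely true" and "likely false" counts for each residue can be treated as two parts of one pool (the abstract does not say so), the implied FDRs would be roughly 58% for pSer, 86% for pThr, and 68% for pTyr. This is not the authors' estimation method, which the abstract does not detail.

```python
def fdr(false_positives: int, true_positives: int) -> float:
    """False discovery rate: the share of reported identifications that are false."""
    total = false_positives + true_positives
    if total == 0:
        raise ValueError("FDR is undefined with no identifications")
    return false_positives / total


# (likely true, likely false) counts quoted in the abstract; pooling them is our assumption.
estimates = {"pSer": (62_000, 86_000), "pThr": (8_000, 50_000), "pTyr": (12_000, 26_000)}
for residue, (likely_true, likely_false) in estimates.items():
    print(f"{residue}: implied FDR about {fdr(likely_false, likely_true):.0%}")
```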
7. ProSight annotator: Complete control and customization of protein entries in UniProt xml files. Proteomics 2022;22:e2100209. [PMID: 35286768 DOI: 10.1002/pmic.202100209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2021] [Revised: 12/07/2021] [Accepted: 01/25/2022] [Indexed: 11/07/2022]
Abstract: The effectiveness of any proteomics database search depends on the theoretical candidate information contained in the protein database. Unfortunately, candidate entries from protein databases such as UniProt rarely contain all the post-translational modifications (PTMs), disulfide bonds, or endogenous cleavages of interest to researchers. These omissions can limit discovery of novel and biologically important proteoforms. Conversely, searching for a specific proteoform becomes a computationally difficult task for heavily modified proteins. Both situations require updates to the database through user-annotated entries. Unfortunately, manually creating properly formatted UniProt XML files is tedious and prone to errors. ProSight Annotator solves these issues by providing a graphical interface for adding user-defined features to UniProt-formatted XML files for better informed proteoform searches. It can be downloaded from http://prosightannotator.northwestern.edu.
Key Words: Bottom-Up Proteomics Post Translational Modification Proteoforms Proteomics software Top-Down Proteomics UniProt

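ProSight Annotator is a graphical tool, and its internals are not described here. As a rough illustration of what a user-added annotation amounts to in a UniProt-formatted XML file, the sketch below uses Python's standard `xml.etree.ElementTree` to append a modified-residue `feature` to a tiny hand-written entry. The namespace `http://uniprot.org/uniprot`, the element and attribute names (`feature`, `type`, `description`, `location`, `position`), and the placement of features before `sequence` follow the public UniProt XML format as we understand it; the accession `P00000` is a placeholder. Validate against the UniProt XSD before relying on this.

```python
import xml.etree.ElementTree as ET

NS = "http://uniprot.org/uniprot"
ET.register_namespace("", NS)  # serialize with a default namespace, not "ns0:"

# A deliberately tiny, hypothetical entry; real entries carry many more elements.
ENTRY_XML = f"""<uniprot xmlns="{NS}">
  <entry dataset="Swiss-Prot">
    <accession>P00000</accession>
    <sequence length="10">MKTAYSAKQR</sequence>
  </entry>
</uniprot>"""


def add_modified_residue(root: ET.Element, accession: str,
                         position: int, description: str) -> ET.Element:
    """Append a 'modified residue' feature at `position` to the entry with `accession`."""
    for entry in root.findall(f"{{{NS}}}entry"):
        if entry.findtext(f"{{{NS}}}accession") == accession:
            feature = ET.Element(f"{{{NS}}}feature",
                                 {"type": "modified residue", "description": description})
            location = ET.SubElement(feature, f"{{{NS}}}location")
            ET.SubElement(location, f"{{{NS}}}position", {"position": str(position)})
            # Features precede <sequence> in the schema's element order.
            seq = entry.find(f"{{{NS}}}sequence")
            index = list(entry).index(seq) if seq is not None else len(entry)
            entry.insert(index, feature)
            return feature
    raise KeyError(f"accession {accession} not found")


root = ET.fromstring(ENTRY_XML)
add_modified_residue(root, "P00000", 6, "Phosphoserine")  # S6 in MKTAYSAKQR
print(ET.tostring(root, encoding="unicode"))
```

A graphical tool adds value by handling the many other feature types and schema constraints; this sketch covers a single feature type only.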
8. Identification of Iron-Sulfur (Fe-S) Cluster and Zinc (Zn) Binding Sites Within Proteomes Predicted by DeepMind's AlphaFold2 Program Dramatically Expands the Metalloproteome. J Mol Biol 2022;434:167377. [PMID: 34838520 PMCID: PMC8785651 DOI: 10.1016/j.jmb.2021.167377] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 10/12/2021] [Revised: 11/17/2021] [Accepted: 11/18/2021] [Indexed: 02/01/2023]
Abstract: DeepMind's AlphaFold2 software has ushered in a revolution in high-quality 3D protein structure prediction. In very recent work by the DeepMind team, structure predictions have been made for entire proteomes of twenty-one organisms, with >360,000 structures made available for download. Here we show that thousands of novel binding sites for iron-sulfur (Fe-S) clusters and zinc (Zn) ions can be identified within these predicted structures by exhaustive enumeration of all potential ligand-binding orientations. We demonstrate that AlphaFold2 routinely makes highly specific predictions of ligand binding sites: for example, binding sites composed exclusively of four cysteine side chains fall into three clusters, representing binding sites for 4Fe-4S clusters, 2Fe-2S clusters, or individual Zn ions. We show further: (a) that the majority of known Fe-S cluster and Zn binding sites documented in UniProt are recovered by the AlphaFold2 structures, (b) that there are occasional disputes between AlphaFold2 and UniProt, with AlphaFold2 predicting highly plausible alternative binding sites, (c) that the Fe-S cluster binding sites that we identify in E. coli agree well with previous bioinformatics predictions, (d) that cysteines predicted here to be part of ligand binding sites show little overlap with those shown via chemoproteomics techniques to be highly reactive, and (e) that AlphaFold2 occasionally appears to build erroneous disulfide bonds between cysteines that should instead coordinate a ligand. These results suggest that AlphaFold2 could be an important tool for the functional annotation of proteomes, and the methodology presented here is likely to be useful for predicting other ligand-binding sites.
Key Words: UniProt functional annotation ligands metalloproteomics
MeSH Headings: Binding Sites; Computational Biology; Escherichia coli/genetics; Escherichia coli/metabolism; Iron/chemistry; Iron/metabolism; Iron-Sulfur Proteins/chemistry; Iron-Sulfur Proteins/metabolism; Ligands; Models, Molecular; Protein Conformation; Proteome/metabolism; Sulfur/chemistry; Sulfur/metabolism; Zinc/chemistry; Zinc/metabolism
Grants: R35 GM122466 NIGMS NIH HHS

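The abstract notes that binding sites made up of four cysteine side chains cluster into 4Fe-4S, 2Fe-2S, and Zn sites. The authors enumerate ligand-binding orientations exhaustively; the sketch below is a much simpler, swapped-in geometric filter that only lists groups of four cysteines whose sulfur (SG) atoms are all mutually within a distance cutoff. The 7.0 Å cutoff, residue numbers, and coordinates are made-up placeholders, not values from the paper.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def candidate_cys4_sites(sg_coords: Dict[int, Coord],
                         cutoff: float = 7.0) -> List[Tuple[int, ...]]:
    """List residue-number quadruples whose Cys SG atoms are all pairwise within `cutoff` (Å).

    Exhaustive over all 4-subsets, which is fine for the handful of cysteines in one chain.
    """
    sites = []
    for quad in combinations(sorted(sg_coords), 4):
        if all(dist(sg_coords[a], sg_coords[b]) <= cutoff
               for a, b in combinations(quad, 2)):
            sites.append(quad)
    return sites


# Made-up SG coordinates: four cysteines at the corners of a small tetrahedron
# (pairwise distance about 3.7 Å) and one distant cysteine.
toy_sg = {
    12: (1.3, 1.3, 1.3), 15: (1.3, -1.3, -1.3),
    40: (-1.3, 1.3, -1.3), 43: (-1.3, -1.3, 1.3),
    90: (20.0, 20.0, 20.0),
}
print(candidate_cys4_sites(toy_sg))
```

A distance filter like this cannot distinguish a 4Fe-4S site from a Zn site; telling them apart is exactly what the authors' ligand-placement approach adds.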
9. IFPTML Mapping of Drug Graphs with Protein and Chromosome Structural Networks vs. Pre-Clinical Assay Information for Discovery of Antimalarial Compounds. Int J Mol Sci 2021;22:ijms222313066. [PMID: 34884870 PMCID: PMC8657696 DOI: 10.3390/ijms222313066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/18/2021] [Revised: 11/23/2021] [Accepted: 11/24/2021] [Indexed: 11/16/2022]
Abstract: Parasites of the genus Plasmodium cause malaria, which remains a major global health problem due to parasite resistance to available antimalarial drugs and increasing treatment costs. Consequently, computational prediction of new antimalarial compounds with novel targets in the proteome of Plasmodium sp. is a very important goal for the pharmaceutical industry. We can expect the success of a preclinical assay to depend on the conditions of the assay per se, the chemical structure of the drug, and the structure of the target protein, as well as on factors governing the expression of this protein in the proteome, such as gene (deoxyribonucleic acid, DNA) sequence and/or chromosome structure. However, there are no reports of computational models that consider all these factors simultaneously. Difficulties for this kind of analysis include the dispersion of data across different datasets and the high heterogeneity of the data. In this work, we analyzed three databases, ChEMBL (Chemical database of the European Molecular Biology Laboratory), UniProt (Universal Protein Resource), and NCBI-GDV (National Center for Biotechnology Information Genome Data Viewer), to achieve this goal. The ChEMBL dataset contains outcomes for 17,758 unique assays of potential antimalarial compounds, including numeric descriptors (variables) for the structure of compounds as well as a large amount of information about assay conditions. The NCBI-GDV and UniProt datasets include the sequences of genes and proteins and their functions. In addition, we created two partitions (c_assayj = c_aj and c_dataj = c_dj) of categorical variables from the ChEMBL dataset. These partitions contain variables that encode information about the experimental conditions of preclinical assays (c_aj) or about the nature and quality of the data (c_dj). These categorical variables include information about 22 parameters of biological activity (c_a0), 28 target proteins (c_a1), and 9 assay organisms (c_a2), among others. We also created another partition (c_protj = c_pj) of categorical variables with biological information about the target proteins, genes, and chromosomes. These variables cover 32 genes (c_p0), 10 chromosomes (c_p1), gene orientation (c_p2), and 31 protein functions (c_p3). We used a Perturbation-Theory Machine Learning Information Fusion (IFPTML) algorithm to map all this information from the three databases into a single predictive model and to train it. Shannon's entropy measure Sh_k (numerical variables) was used to quantify the information about the structure of drugs, protein sequences, gene sequences, and chromosomes on the same information scale. Perturbation Theory Operators (PTOs) in the form of Moving Average (MA) operators were used to quantify perturbations (deviations) in the structural variables with respect to their expected values for different subsets (partitions) of categorical variables. We obtained three IFPTML models using General Discriminant Analysis (GDA), Classification Tree with Univariate Splits (CTUS), and Classification Tree with Linear Combinations (CTLC). The IFPTML-CTLC model showed the best performance, with sensitivity Sn(%) = 83.6/85.1 and specificity Sp(%) = 89.8/89.7 for the training/validation sets, respectively. This model could become a useful tool for optimizing preclinical assays of new antimalarial compounds against different proteins in the proteome of Plasmodium.
Key Words: Antimalarial compounds ChEMBL NCBI-GDV Plasmodium proteome UniProt complex networks machine learning perturbation theory
MeSH Headings: Algorithms; Antimalarials/chemistry; Antimalarials/pharmacology; Databases, Pharmaceutical; Drug Discovery/methods; Drug Evaluation, Preclinical; Genome, Protozoan; Machine Learning; Markov Chains; Models, Theoretical; Plasmodium falciparum/genetics; Protozoan Proteins/chemistry; Protozoan Proteins/genetics; Protozoan Proteins/metabolism; Reproducibility of Results

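Two ingredients of the IFPTML approach described above can be sketched in a few lines: a Shannon entropy that puts sequences on a common information scale, and a moving-average style perturbation, meaning the deviation of a value from the mean of all records sharing the same categorical condition. This is a simplified illustration under our own assumptions (entropy in bits over symbol frequencies, a plain arithmetic mean per category); the paper's exact definitions of Sh_k and the PTOs are not given in the abstract. The sensitivity and specificity helpers use the standard definitions Sn = TP/(TP+FN) and Sp = TN/(TN+FP). The category labels in the usage lines are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2
from statistics import mean
from typing import Dict, List, Sequence


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the symbol frequencies in `seq`."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def ma_deviation(values: Sequence[float], categories: Sequence[str]) -> List[float]:
    """Deviation of each value from the mean of all values sharing its category."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for v, c in zip(values, categories):
        groups[c].append(v)
    means = {c: mean(vs) for c, vs in groups.items()}
    return [v - means[c] for v, c in zip(values, categories)]


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)


# Toy usage: entropies of three sequences, then deviations within hypothetical target classes.
seqs = ["AAAA", "ACGT", "MKTAYIAKQR"]
sh = [shannon_entropy(s) for s in seqs]
print(sh)
print(ma_deviation(sh, ["kinase", "kinase", "protease"]))
```

Expressing each structural variable as a deviation from its category mean is what lets a single classifier compare records measured under different assay conditions.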
10. VarSite: Disease variants and protein structure. Protein Sci 2020;29:111-119. [PMID: 31606900 PMCID: PMC6933866 DOI: 10.1002/pro.3746] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 07/01/2019] [Revised: 10/04/2019] [Accepted: 10/07/2019] [Indexed: 12/20/2022]
Abstract: VarSite is a web server mapping known disease-associated variants from UniProt and ClinVar, together with natural variants from gnomAD, onto protein 3D structures in the Protein Data Bank. The analyses are primarily image-based and provide both an overview for each human protein, as well as a report for any specific variant of interest. The information can be useful in assessing whether a given variant might be pathogenic or benign. The structural annotations for each position in the protein include protein secondary structure, interactions with ligand, metal, DNA/RNA, or other protein, and various measures of a given variant's possible impact on the protein's function. The 3D locations of the disease-associated variants can be viewed interactively via the 3dmol.js JavaScript viewer, as well as in RasMol and PyMOL. Users can search for specific variants, or sets of variants, by providing the DNA coordinates of the base change(s) of interest. Additionally, various agglomerative analyses are given, such as the mapping of disease and natural variants onto specific Pfam or CATH domains. The server is freely accessible to all at: https://www.ebi.ac.uk/thornton-srv/databases/VarSite.
Key Words: 3D protein structure CATH ClinVar PDB Pfam UniProt VarMap VarSite disease variants gnomAD molecular interactions natural variants schematic diagrams
MeSH Headings: Cloud Computing; Computational Biology; Databases, Genetic; Genetic Predisposition to Disease; Genetic Variation; Humans; Models, Molecular; Protein Conformation; Proteins/chemistry; Proteins/genetics; User-Computer Interface
Grants: Wellcome Trust; 104960/Z/14/Z Wellcome Trust

11. UniprotR: Retrieving and visualizing protein sequence and functional information from Universal Protein Resource (UniProt knowledgebase). J Proteomics 2019;213:103613. [PMID: 31843688 DOI: 10.1016/j.jprot.2019.103613] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 10/24/2019] [Revised: 11/26/2019] [Accepted: 12/13/2019] [Indexed: 02/06/2023]
Abstract: UniprotR is a software package designed to easily retrieve, cluster and visualize protein data from the UniProt knowledgebase (UniProtKB) using the R language. The package is implemented mainly to process, parse and illustrate proteomics data in a handy and time-saving way, allowing researchers to summarize all required protein information available at UniProtKB in a readable data frame, Excel CSV file, and/or graphical output. UniprotR generates a set of graphics including gene ontology, chromosomal location, protein scoring and status, protein networking, sequence phylogenetic tree, and physicochemical properties. In addition, the package supports clustering of proteins based on primary gene name or chromosomal location, facilitating additional downstream analysis. SIGNIFICANCE: In this work, we implemented a robust package for retrieving and visualizing information from multiple sources such as UniProtKB, SWISS-MODEL, and STRING. UniprotR contains functions for retrieving and clustering data conveniently and for visualizing data in publishable graphs, to facilitate researchers' work and fulfill their needs. UniprotR saves time in downstream data analysis compared with manual, time-consuming analysis. AVAILABILITY AND IMPLEMENTATION: UniprotR is released as free open-source code under the GPLv3 license and is available on CRAN (The Comprehensive R Archive Network) (https://cran.r-project.org/web/packages/UniprotR/index.html) and GitHub (https://github.com/Proteomicslab57357/UniprotR).
Key Words: Bioinformatics Proteomics R package UniProt UniProtKB

12. Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019;26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]
Abstract: The interactions between RNAs and proteins play critical roles in many biological processes. Therefore, characterizing these interactions is critical for mechanistic, biomedical, and clinical studies. Many experimental methods can be used to determine RNA-protein interactions in multiple aspects. However, because RNA-protein interactions are tissue-specific and condition-specific, and because these interactions are weak and frequently compete with each other, experimental techniques cannot be fully exploited to discover the complete spectrum of RNA-protein interactions. To mitigate these issues, continuous efforts have been devoted to developing high-quality computational techniques to study the interactions between RNAs and proteins. Much important progress has been achieved with the application of novel techniques and strategies, such as machine learning. In particular, with the development and application of CLIP techniques, more and more experimental data on RNA-protein interactions under specific biological conditions are available. Together, these CLIP data provide a rich source for developing advanced machine learning predictors. In this review, recent progress on computational predictors for RNA-protein interaction is summarized in the following aspects: dataset, prediction strategies, and input features. Possible future developments are also discussed at the end of the review.
Key Words: CLIP PDB PSSM RNA-binding domain RNA-binding motif RNA-binding protein RNA-binding residue RNA-protein interaction UniProt deep learning evolutionary information machine learning meta-strategy physicochemical feature protein-binding nucleotide sequence feature structural feature

13. Analysing Point Mutations in Protein Cleavage Sites by Using Enzyme Specificity Matrices. Front Endocrinol (Lausanne) 2019;10:267. [PMID: 31130917 PMCID: PMC6509992 DOI: 10.3389/fendo.2019.00267] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/17/2018] [Accepted: 04/11/2019] [Indexed: 11/13/2022]
Key Words: Ensembl MEGA MEROPS UniProt cleavage sites enzymes point mutations

14. SPIN: Submitting Sequences Determined at Protein Level to UniProt. Curr Protoc Bioinformatics 2018;62:e52. [PMID: 29927080 DOI: 10.1002/cpbi.52] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/05/2022]
Abstract: Public availability of biological sequences is essential for their widespread access and use by the research community. The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and functional data. While most protein sequences entering UniProt are imported from other source databases containing nucleotide or 3-D structure data, protein sequences determined at the protein level can be submitted directly to UniProt. To this end, UniProt provides a Web interface called SPIN. This service enables researchers to make their de novo-sequenced proteins available to the scientific community and acquire UniProt accession numbers for use in publications. This unit explains the process of submitting a protein sequence to UniProt using SPIN. The basic protocol describes all the necessary steps for a single sequence. A support protocol gives guidance on how best to deal with exceptionally large datasets. © 2018 by John Wiley & Sons, Inc.
Key Words: UniProt direct protein sequencing submission

15	Toward Spectral Library-Free Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry Bacterial Identification. J Proteome Res 2018;17:2124-2130. [PMID: 29749232 PMCID: PMC5989274 DOI: 10.1021/acs.jproteome.8b00065] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022] Abstract Bacterial identification is of great importance in clinical diagnosis, environmental monitoring, and food safety control. Among various strategies, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has drawn significant interest and is used clinically. Nevertheless, current bioinformatics solutions rely on spectral libraries to identify bacterial strains. Generating a spectral library requires acquiring MALDI-TOF spectra from monoculture bacterial colonies, which is time-consuming and not possible for many species and strains. We propose a strategy for bacterial typing by MALDI-TOF that uses protein sequences from a public database, namely UniProt. Ten genes encoding the proteins most often observed by MALDI-TOF in bacteria were identified through a 10-fold double cross-validation procedure repeated 500 times. This procedure used 403 MALDI-TOF spectra corresponding to 14 genera, 81 species, and 403 strains, together with the protein sequences of 1276 species in UniProt. The 10 genes were then used to annotate peaks on MALDI-TOF spectra for bacterial identification. With this approach, bacteria can be identified at the genus level by searching against a database containing the protein sequences of 42 bacterial genera from UniProt. Our approach correctly identified 84.1% of the 403 spectra at the genus level. Source code of the algorithm is available at https://github.com/dipcarbon/BacteriaMSLF.
Key Words: MALDI-TOF; MALDI-TOF peak annotation; UniProt; bacterial identification; data filtering; double-cross validation; library-free; parameter optimization; proteomics; ribosomal proteins
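The genus-level search described above amounts to scoring each candidate genus by how many of its theoretical marker-protein masses appear among the observed peaks. The sketch below covers only that matching step, under stated assumptions. The ±500 ppm tolerance, the genus names, and the mass values are all hypothetical. The masses are taken as precomputed, whereas the published method derives them from the UniProt sequences of its ten marker genes. This is not the authors' algorithm; their code is in the BacteriaMSLF repository cited in the abstract.

```python
from typing import Dict, List, Sequence, Tuple


def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if an observed m/z lies within tol_ppm of a theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def score_genus(peaks: Sequence[float], masses: Sequence[float], tol_ppm: float) -> int:
    """Count the theoretical masses that match at least one observed peak."""
    return sum(1 for m in masses if any(within_ppm(p, m, tol_ppm) for p in peaks))


def rank_genera(
    peaks: Sequence[float],
    reference: Dict[str, List[float]],
    tol_ppm: float = 500.0,  # hypothetical tolerance, not taken from the paper
) -> List[Tuple[str, int]]:
    """Rank genera by matched-mass count; ties are broken alphabetically."""
    scores = [(genus, score_genus(peaks, masses, tol_ppm)) for genus, masses in reference.items()]
    return sorted(scores, key=lambda gs: (-gs[1], gs[0]))


# Hypothetical reference: genus -> theoretical [M+H]+ masses of marker proteins.
reference = {
    "GenusA": [5000.0, 6200.0, 7300.0],
    "GenusB": [5100.0, 6800.0, 9100.0],
}
peaks = [5001.0, 6202.0, 9101.0]
print(rank_genera(peaks, reference))  # [('GenusA', 2), ('GenusB', 1)]
```

Counting matched reference masses, rather than matched peaks, keeps one noisy spectrum with many peaks from inflating every genus's score equally.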
16	UniProt Protein Knowledgebase. Methods Mol Biol 2017;1558:41-55. [PMID: 28150232 DOI: 10.1007/978-1-4939-6783-4_2] [Citation(s) in RCA: 167] [Impact Index Per Article: 23.9] [Indexed: 12/05/2022] Abstract The Universal Protein Resource (UniProt) is a freely available, comprehensive resource for protein sequence and annotation data. UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). Across the three institutes, more than 100 people are involved in tasks such as expert curation, software development, and support. This chapter introduces the functionality and data provided by UniProt. It describes example use cases for which you might come to UniProt and the methods that help you achieve your goals.
Key Words: Protein data; Protein tools; UniProt
17	Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe. Protein Sci 2016;25:2164-2174. [PMID: 27636733 DOI: 10.1002/pro.3041] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 08/17/2016] [Revised: 09/12/2016] [Accepted: 09/12/2016] [Indexed: 12/22/2022] Abstract Intrinsic disorder (ID) in proteins has been extensively described over the last decade, yet a large-scale classification of ID in proteins is still mostly missing. Here, we provide an extensive analysis of ID across the protein universe in the UniProt database, derived from sequence-based predictions in MobiDB. Almost half of the sequences contain an ID region of at least five residues. About 9% of proteins have a long ID region of more than 20 residues. Such regions are more abundant in eukaryotic organisms and most frequently cover less than 20% of the sequence. A small subset of about 67,000 (out of over 80 million) proteins is fully disordered, and these are found mostly in viruses. Most proteins have only one ID region, with short ID regions evenly distributed along the sequence and long ones overrepresented in the center. The charged-residue composition classification of Das and Pappu was used to classify ID proteins by structural propensity and the corresponding functional enrichment. Swollen Coils seem to be used mainly as structural components and in biosynthesis in both prokaryotes and eukaryotes. In bacteria, they are confined to the nucleoid, and in viruses they provide DNA-binding function. Coils & Hairpins seem to be specialized in ribosome binding and methylation activities. Globules & Tadpoles bind antigens in eukaryotes but are involved in killing other organisms and in cytolysis in bacteria. The Undefined class is used by bacteria to bind toxic substances and, in viruses, to mediate transport and movement between and within organisms. Fully disordered proteins behave similarly but are enriched for glycine residues and extracellular structures.
Key Words: MobiDB; UniProt; classification; intrinsic disorder; protein sequence; protein structure
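The Das and Pappu classes named above are defined on two composition parameters: the fraction of charged residues (FCR) and the net charge per residue (NCPR). The following is a minimal sketch built on several assumptions. K and R count as positive, D and E count as negative, and histidine is treated as uncharged. The region boundaries are the commonly cited ones from the Das and Pappu diagram of states: FCR below 0.25, FCR from 0.25 to 0.35, and FCR above 0.35 split by |NCPR| at 0.35. Mapping the paper's "Undefined" class onto the intermediate, context-dependent region is also our assumption. MobiDB's exact implementation may differ.

```python
from typing import Tuple

POSITIVE = frozenset("KR")  # assumption: His counted as uncharged
NEGATIVE = frozenset("DE")


def charge_parameters(seq: str) -> Tuple[float, float]:
    """Return (FCR, NCPR) for a protein or disordered-region sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_pos = sum(aa in POSITIVE for aa in seq)
    n_neg = sum(aa in NEGATIVE for aa in seq)
    return (n_pos + n_neg) / len(seq), (n_pos - n_neg) / len(seq)


def das_pappu_class(seq: str) -> str:
    """Assign one of the four class labels used in the abstract (approximate boundaries)."""
    fcr, ncpr = charge_parameters(seq)
    if fcr < 0.25:
        return "Globules & Tadpoles"  # weak polyampholytes / polyelectrolytes
    if fcr <= 0.35:
        return "Undefined"            # assumed: context-dependent intermediate region
    if abs(ncpr) <= 0.35:
        return "Coils & Hairpins"     # strong polyampholytes
    return "Swollen Coils"            # strong polyelectrolytes


for s in ["GSGSGSGSGS", "KEKGGGGGGG", "KEKEKEKEKE", "EEEEEEEEEE"]:
    print(s, das_pappu_class(s))
```

Because NCPR can never exceed FCR in magnitude, a single FCR threshold is enough to separate the weakly charged class before NCPR is consulted.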
18	UniProt Tools. Curr Protoc Bioinformatics 2016;53:1.29.1-1.29.15. [PMID: 27010333 DOI: 10.1002/0471250953.bi0129s53] [Citation(s) in RCA: 112] [Impact Index Per Article: 14.0] [Indexed: 11/06/2022] Abstract The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data (UniProt Consortium, 2015). The UniProt Web site receives ∼400,000 unique visitors per month and is the primary means of accessing UniProt. Along with various searchable datasets, UniProt provides three main tools: the 'BLAST' tool for sequence similarity searching, the 'Align' tool for multiple sequence alignment, and the 'Retrieve/ID Mapping' tool. The last of these retrieves UniProtKB proteins from a list of identifiers and converts identifiers from UniProt to external databases or vice versa. This unit provides three basic protocols, three alternate protocols, and two support protocols for using UniProt tools.
Key Words: UniProt; navigation; search; tutorial
19	Data from comprehensive analysis of nuclear localization signals. Data Brief 2016;6:200-3. [PMID: 26862559 PMCID: PMC4707185 DOI: 10.1016/j.dib.2015.11.064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/04/2015] [Revised: 11/26/2015] [Accepted: 11/27/2015] [Indexed: 11/27/2022] Abstract This article describes data related to a research article titled "Comprehensive analysis of the dynamic structure of nuclear localization signals" by Yamagishi et al. [1]. Here we provide data covering a wider range of mammalian nuclear localization signals (NLSs) in UniProt (Universal Protein Resource) [2], regardless of their conformations. Specifically, we extracted all NLSs that are explicitly annotated as "NLS" with an evidence type (a code from the Evidence Codes Ontology) [3] in UniProt. A total of 1364 NLSs in 1186 proteins were extracted. The number of NLSs found in each protein (UniProt ID), the sequence lengths of the NLSs, and their distribution are shown.
Key Words: Comprehensive analysis; Distribution; Nuclear localization signal; Nuclear transport; UniProt
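The extraction described above keeps NLS annotations that carry an evidence code, then counts them per protein and by length. It can be sketched against the UniProtKB flat-file format, under several assumptions. The parser expects the current feature-table layout: an "FT   MOTIF   start..end" line followed by /note and /evidence qualifier lines. That layout may differ from the one available when these data were collected. The note text "Nuclear localization signal" is matched exactly. The two sample entries, including the accessions HYPO01 and HYPO02, are invented for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

NLS_NOTE = "Nuclear localization signal"  # assumed exact note text
MOTIF_RE = re.compile(r"MOTIF\s+(\d+)\.\.(\d+)")


def extract_nls(flat_text: str) -> List[Tuple[str, int, int]]:
    """Return (accession, start, end) for NLS motifs that carry an evidence code."""
    records: List[Tuple[str, int, int]] = []
    accession: Optional[str] = None
    feature: Optional[Dict] = None

    def flush(f: Optional[Dict]) -> None:
        # Keep a finished feature only if it is an NLS with some evidence attached.
        if f and f["note"] == NLS_NOTE and f["evidence"]:
            records.append((accession, f["start"], f["end"]))

    for line in flat_text.splitlines():
        if line.startswith("//"):  # end of entry
            flush(feature)
            accession, feature = None, None
        elif line.startswith("AC   ") and accession is None:
            accession = line[5:].split(";")[0].strip()  # primary accession
        elif line.startswith("FT   "):
            body = line[5:]
            if not body.startswith(" "):  # a new feature key begins
                flush(feature)
                m = MOTIF_RE.match(body)
                feature = ({"start": int(m.group(1)), "end": int(m.group(2)),
                            "note": None, "evidence": None} if m else None)
            elif feature is not None:  # qualifier line of the current MOTIF
                q = body.strip()
                if q.startswith('/note="'):
                    feature["note"] = q[len('/note="'):].rstrip('"')
                elif q.startswith('/evidence="'):
                    feature["evidence"] = q[len('/evidence="'):].rstrip('"')
    flush(feature)
    return records


def summarize(records: List[Tuple[str, int, int]]) -> Tuple[Counter, Counter]:
    """NLS count per protein, and number of NLSs per motif length."""
    per_protein = Counter(acc for acc, _, _ in records)
    lengths = Counter(end - start + 1 for _, start, end in records)
    return per_protein, lengths


SAMPLE = "\n".join([
    "ID   HYPO1_HUMAN             Reviewed;         120 AA.",
    "AC   HYPO01;",
    "FT   MOTIF           10..16",
    'FT                   /note="Nuclear localization signal"',
    'FT                   /evidence="ECO:0000269"',
    "FT   MOTIF           40..56",
    'FT                   /note="Nuclear localization signal"',
    'FT                   /evidence="ECO:0000255"',
    "//",
    "ID   HYPO2_HUMAN             Reviewed;         150 AA.",
    "AC   HYPO02;",
    "FT   MOTIF           60..66",
    'FT                   /note="Nuclear localization signal"',  # no evidence: skipped
    "FT   MOTIF           100..105",
    'FT                   /note="Nuclear localization signal"',
    'FT                   /evidence="ECO:0000255"',
    "//",
])

records = extract_nls(SAMPLE)
per_protein, lengths = summarize(records)
print(f"{len(records)} NLSs in {len(per_protein)} proteins")  # 3 NLSs in 2 proteins
```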
20	UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods Mol Biol 2016;1374:23-54. [PMID: 26519399 DOI: 10.1007/978-1-4939-3167-5_2] [Citation(s) in RCA: 450] [Impact Index Per Article: 56.3] [Indexed: 05/08/2023] Abstract The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI), and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases, including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available protein sequences, manually annotated by experts, from a broad spectrum of organisms. Plant protein entries are produced within the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High-level annotations provided by UniProtKB/Swiss-Prot are widely used to predict annotations of newly available proteins through automatic pipelines. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We also present some of the tools and databases linked to each entry.
Key Words: Amino-acid sequence; Manual annotation; Protein database; Swiss-Prot; TrEMBL; UniProt
MESH Headings: Animals; Computational Biology/methods; Databases, Protein; Humans; Web Browser
Grants: 5R01GM080646-07 (NIGMS NIH HHS); 2P41HG02273 (NHGRI NIH HHS); 8P20GM103446-12 (NIGMS NIH HHS); 5G08LM010720-03 (NLM NIH HHS); 1 U41 HG006104 (NHGRI NIH HHS); 3R01GM080646-07S1 (NIGMS NIH HHS); U41 HG007822 (NHGRI NIH HHS)
21	Searching and Navigating UniProt Databases. Curr Protoc Bioinformatics 2015;50:1.27.1-1.27.10. [PMID: 26088053 DOI: 10.1002/0471250953.bi0127s50] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Indexed: 11/07/2022] Abstract The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt Web site receives ∼400,000 unique visitors per month and is the primary means of accessing UniProt. It provides ten searchable datasets and three main tools. The key UniProt datasets are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), the UniProt Archive (UniParc), and protein sets for completely sequenced genomes (Proteomes). Other supporting datasets hold information that appears in UniProtKB protein entries, such as literature citations, taxonomy, and subcellular locations. This paper focuses on how to use UniProt datasets. The basic protocol describes navigation and searching mechanisms for the UniProt datasets, while two alternate protocols build on the basic protocol to describe advanced search and query building.
Key Words: UniProt; navigation; search; tutorial
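The advanced search and query building described above can also be done programmatically. The following is a minimal sketch that builds a UniProtKB search URL. It assumes the current REST endpoint at rest.uniprot.org, together with its query fields (gene, organism_id, reviewed) and return fields (accession, protein_name, length). That REST service postdates this 2015 protocol, so it may not match the Web-site interface the protocol describes. The example values (the human ABCA12 gene, taxonomy ID 9606) are illustrations only.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: current UniProt REST search endpoint, which postdates the 2015 protocol.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_query(gene: str, organism_id: int, reviewed: bool = True) -> str:
    """Combine field-specific clauses with AND, as an advanced-search builder would."""
    clauses = [f"gene:{gene}", f"organism_id:{organism_id}"]
    if reviewed:
        clauses.append("reviewed:true")  # restrict to UniProtKB/Swiss-Prot entries
    return " AND ".join(clauses)


def build_search_url(query: str, fields=("accession", "protein_name", "length"),
                     fmt: str = "tsv", size: int = 25) -> str:
    """Encode the query and the requested result columns into a search URL."""
    params = {"query": query, "fields": ",".join(fields), "format": fmt, "size": size}
    return f"{SEARCH_URL}?{urlencode(params)}"


def fetch(url: str) -> str:
    """Download one page of results (requires network access)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


url = build_search_url(build_query("ABCA12", 9606))
print(url)
# print(fetch(url))  # uncomment to send the query to UniProt
```

Keeping query construction separate from the network call makes it easy to inspect or log the exact query before any request is sent.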
22	Identification of novel mutations in the ABCA12 gene, c.1857delA and c.5653-5655delTAT, causing harlequin ichthyosis. Gene 2013;531:510-3. [PMID: 24055722 DOI: 10.1016/j.gene.2013.07.046] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/13/2013] [Revised: 07/09/2013] [Accepted: 07/12/2013] [Indexed: 10/26/2022] Abstract Harlequin ichthyosis (HI) is a severe autosomal recessive developmental disorder of the skin that is frequently, but not always, fatal in the first few days of life. In HI, mutations in both ABCA12 alleles must have a severe impact on protein function, and most mutations are truncating. The presence of at least one nontruncating mutation (predicting residual protein function) usually causes a less severe congenital ichthyosis (lamellar ichthyosis or congenital ichthyosiform erythroderma). Here we report on a girl with severe HI diagnosed by prenatal ultrasound at 33 5/7 weeks' gestation. Ultrasound findings included ectropion, eclabium, a deformed nose, hands, and feet, joint contractures, hyperechogenic amniotic fluid, and polyhydramnios. After birth, palliative treatment was provided, and she died on her first day of life. Sequence analysis of the ABCA12 gene identified two novel mutations, c.1857delA (predicting p.Lys619) in exon 15 and c.5653-5655delTAT (predicting p.1885delTyr) in exon 37, each in the heterozygous state. The c.5653-5655delTAT mutation is not truncating. However, the deleted tyrosine at position 1885 is perfectly conserved among vertebrates, and molecular studies evaluated the mutation as probably disease-causing and damaging.
Key Words: 3D; ABCA12; ABCA12 gene; ATP-binding cassette, subfamily A, member 12; HI; Harlequin ichthyosis; Late gestation diagnosis; NCBI; National Center for Biotechnology Information; Novel mutation c.1857delA; Novel mutation c.5653–5655delTAT; OMIM; Online Mendelian Inheritance in Man; UniProt; Universal Protein Resource; base pair; bp; harlequin ichthyosis; no; number; three-dimensional
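The protein-level predictions quoted above follow from simple coding-sequence arithmetic. Nucleotide c.N lies in codon ceil(N/3), and a deletion whose length is not a multiple of three shifts the reading frame. The sketch below reproduces the codon numbers given in the abstract: 619 for c.1857 and 1885 for c.5653. It assumes standard c. numbering, counted from the A of the ATG start codon. It does not model the resulting amino-acid sequence or where a premature stop codon falls.

```python
def codon_of(c_pos: int) -> int:
    """Codon number containing coding-DNA nucleotide c.<c_pos> (1-based)."""
    if c_pos < 1:
        raise ValueError("only coding positions c.1 and above are handled")
    return (c_pos - 1) // 3 + 1


def deletion_effect(start: int, end: int) -> str:
    """Classify a deletion of c.<start> to c.<end> by its effect on the reading frame."""
    length = end - start + 1
    first, last = codon_of(start), codon_of(end)
    if length % 3:
        return f"frameshift starting in codon {first}"
    if (start - 1) % 3 == 0 and first == last:
        # Deletion begins at a codon's first base and removes exactly that codon.
        return f"in-frame deletion of codon {first}"
    return f"in-frame deletion affecting codons {first}..{last}"


print(deletion_effect(1857, 1857))  # c.1857delA -> frameshift starting in codon 619
print(deletion_effect(5653, 5655))  # c.5653-5655delTAT -> in-frame deletion of codon 1885
```

Because c.5653 is the first base of codon 1885 (5652 is divisible by 3), deleting TAT removes that whole codon and leaves the downstream frame intact. This matches the abstract's description of the mutation as nontruncating.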