1
|
Global analysis of the yeast knockout phenome. SCIENCE ADVANCES 2023; 9:eadg5702. [PMID: 37235661 DOI: 10.1126/sciadv.adg5702] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/06/2023] [Accepted: 04/20/2023] [Indexed: 05/28/2023]
Abstract
Genome-wide phenotypic screens in the budding yeast Saccharomyces cerevisiae, enabled by its knockout collection, have produced the largest, richest, and most systematic phenotypic description of any organism. However, integrative analyses of this rich data source have been virtually impossible because of the lack of a central data repository and consistent metadata annotations. Here, we describe the aggregation, harmonization, and analysis of ~14,500 yeast knockout screens, which we call Yeast Phenome. Using this unique dataset, we characterized two unknown genes (YHR045W and YGL117W) and showed that tryptophan starvation is a by-product of many chemical treatments. Furthermore, we uncovered an exponential relationship between phenotypic similarity and intergenic distance, which suggests that gene positions in both yeast and human genomes are optimized for function.
2
|
Overview of the COVID-19 text mining tool interactive demonstration track in BioCreative VII. Database (Oxford) 2022; 2022:6748864. [PMID: 36197453 PMCID: PMC9534061 DOI: 10.1093/database/baac084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 08/18/2022] [Accepted: 09/08/2022] [Indexed: 11/06/2022]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has compelled biomedical researchers to communicate data in real time to establish more effective medical treatments and public health policies. Nontraditional sources such as preprint publications, i.e. articles not yet validated by peer review, have become crucial hubs for the dissemination of scientific results. Natural language processing (NLP) systems have been recently developed to extract and organize COVID-19 data in reasoning systems. Given this scenario, the BioCreative COVID-19 text mining tool interactive demonstration track was created to assess the landscape of the available tools and to gauge user interest, thereby providing a two-way communication channel between NLP system developers and potential end users. The goal was to inform system designers about the performance and usability of their products and to suggest new additional features. Considering the exploratory nature of this track, the call for participation solicited teams to apply for the track, based on their system's ability to perform COVID-19-related tasks and interest in receiving user feedback. We also recruited volunteer users to test systems. Seven teams registered systems for the track, and >30 individuals volunteered as test users; these volunteer users covered a broad range of specialties, including bench scientists, bioinformaticians and biocurators. The users, who had the option to participate anonymously, were provided with written and video documentation to familiarize themselves with the NLP tools and completed a survey to record their evaluation. Additional feedback was also provided by NLP system developers. The track was well received as shown by the overall positive feedback from the participating teams and the users. Database URL: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-4/.
3
|
The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci 2021; 30:187-200. [PMID: 33070389 PMCID: PMC7737760 DOI: 10.1002/pro.3978] [Citation(s) in RCA: 594] [Impact Index Per Article: 198.0] [Received: 08/13/2020] [Revised: 10/09/2020] [Accepted: 10/13/2020] [Indexed: 02/06/2023]
Abstract
The BioGRID (Biological General Repository for Interaction Datasets, thebiogrid.org) is an open-access database resource that houses manually curated protein and genetic interactions from multiple species including yeast, worm, fly, mouse, and human. The ~1.93 million curated interactions in BioGRID can be used to build complex networks to facilitate biomedical discoveries, particularly as related to human health and disease. All BioGRID content is curated from primary experimental evidence in the biomedical literature, and includes both focused low-throughput studies and large high-throughput datasets. BioGRID also captures protein post-translational modifications and protein or gene interactions with bioactive small molecules including many known drugs. A built-in network visualization tool combines all annotations and allows users to generate network graphs of protein, genetic and chemical interactions. In addition to general curation across species, BioGRID undertakes themed curation projects in specific aspects of cellular regulation, for example the ubiquitin-proteasome system, as well as specific disease areas, such as for the SARS-CoV-2 virus that causes COVID-19 severe acute respiratory syndrome. A recent extension of BioGRID, named the Open Repository of CRISPR Screens (ORCS, orcs.thebiogrid.org), captures single mutant phenotypes and genetic interactions from published high throughput genome-wide CRISPR/Cas9-based genetic screens. BioGRID-ORCS contains datasets for over 1,042 CRISPR screens carried out to date in human, mouse and fly cell lines. The biomedical research community can freely access all BioGRID data through the web interface, standardized file downloads, or via model organism databases and partner meta-databases.
4
|
Proteome-wide, Structure-Based Prediction of Protein-Protein Interactions/New Molecular Interactions Viewer. PLANT PHYSIOLOGY 2019; 179:1893-1907. [PMID: 30679268 PMCID: PMC6446796 DOI: 10.1104/pp.18.01216] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/02/2018] [Accepted: 01/15/2019] [Indexed: 05/04/2023]
Abstract
Determining the complete Arabidopsis (Arabidopsis thaliana) protein-protein interaction network is essential for understanding the functional organization of the proteome. Numerous small-scale studies and a couple of large-scale ones have elucidated a fraction of the estimated 300,000 binary protein-protein interactions in Arabidopsis. In this study, we provide evidence that a docking algorithm has the ability to identify real interactions using both experimentally determined and predicted protein structures. We ranked 0.91 million interactions generated by all possible pairwise combinations of 1,346 predicted structure models from an Arabidopsis predicted "structure-ome" and found a significant enrichment of real interactions for the top-ranking predicted interactions, as shown by cosubcellular enrichment analysis and yeast two-hybrid validation. Our success rate for computationally predicted, structure-based interactions was 63% of the success rate for published interactions naively tested using the yeast two-hybrid system and 2.7 times better than for randomly picked pairs of proteins. This study provides another perspective in interactome exploration and biological network reconstruction using protein structural information. We have made these interactions freely accessible through an improved Arabidopsis Interactions Viewer and have created community tools for accessing these and ∼2.8 million other protein-protein and protein-DNA interactions for hypothesis generation by researchers worldwide. The Arabidopsis Interactions Viewer is freely available at http://bar.utoronto.ca/interactions2/.
5
|
A Computational Framework for Genome-wide Characterization of the Human Disease Landscape. Cell Syst 2019; 8:152-162.e6. [PMID: 30685436 PMCID: PMC7374759 DOI: 10.1016/j.cels.2018.12.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/08/2018] [Revised: 10/16/2018] [Accepted: 12/20/2018] [Indexed: 01/21/2023]
Abstract
A key challenge for the diagnosis and treatment of complex human diseases is identifying their molecular basis. Here, we developed a unified computational framework, URSAHD (Unveiling RNA Sample Annotation for Human Diseases), that leverages machine learning and the hierarchy of anatomical relationships present among diseases to integrate thousands of clinical gene expression profiles and identify molecular characteristics specific to each of the hundreds of complex diseases. URSAHD can distinguish between closely related diseases more accurately than literature-validated genes or traditional differential-expression-based computational approaches and is applicable to any disease, including rare and understudied ones. We demonstrate the utility of URSAHD in classifying related nervous system cancers and experimentally verifying novel neuroblastoma-associated genes identified by URSAHD. We highlight the applications for potential targeted drug-repurposing and for quantitatively assessing the molecular response to clinical therapies. URSAHD is freely available for public use, including the use of underlying models, at ursahd.princeton.edu.
6
|
The BioGRID interaction database: 2019 update. Nucleic Acids Res 2019; 47:D529-D541. [PMID: 30476227 PMCID: PMC6324058 DOI: 10.1093/nar/gky1079] [Citation(s) in RCA: 822] [Impact Index Per Article: 164.4] [Received: 09/23/2018] [Revised: 10/15/2018] [Accepted: 11/22/2018] [Indexed: 12/17/2022]
Abstract
The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the curation and archival storage of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2018 (build 3.4.164), BioGRID contains records for 1 598 688 biological interactions manually annotated from 55 809 publications for 71 species, as classified by an updated set of controlled vocabularies for experimental detection methods. BioGRID also houses records for >700 000 post-translational modification sites. BioGRID now captures chemical interaction data, including chemical-protein interactions for human drug targets drawn from the DrugBank database and manually curated bioactive compounds reported in the literature. A new dedicated aspect of BioGRID annotates genome-wide CRISPR/Cas9-based screens that report gene-phenotype and gene-gene relationships. An extension of the BioGRID resource called the Open Repository for CRISPR Screens (ORCS) database (https://orcs.thebiogrid.org) currently contains over 500 genome-wide screens carried out in human or mouse cell lines. All data in BioGRID is made freely available without restriction, is directly downloadable in standard formats and can be readily incorporated into existing applications via our web service platforms. BioGRID data are also freely distributed through partner model organism databases and meta-databases.
7
|
The BioC-BioGRID corpus: full text articles annotated for curation of protein-protein and genetic interactions. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017; 2017:baw147. [PMID: 28077563 PMCID: PMC5225395 DOI: 10.1093/database/baw147] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/30/2016] [Revised: 10/14/2016] [Accepted: 10/18/2016] [Indexed: 11/13/2022]
Abstract
A great deal of information on the molecular genetics and biochemistry of model organisms has been reported in the scientific literature. However, this data is typically described in free text form and is not readily amenable to computational analyses. To this end, the BioGRID database systematically curates the biomedical literature for genetic and protein interaction data. This data is provided in a standardized computationally tractable format and includes structured annotation of experimental evidence. BioGRID curation necessarily involves substantial human effort by expert curators who must read each publication to extract the relevant information. Computational text-mining methods offer the potential to augment and accelerate manual curation. To facilitate the development of practical text-mining strategies, a new challenge was organized in BioCreative V for the BioC task, the collaborative Biocurator Assistant Task. This was a non-competitive, cooperative task in which the participants worked together to build BioC-compatible modules into an integrated pipeline to assist BioGRID curators. As an integral part of this task, a test collection of full text articles was developed that contained both biological entity annotations (gene/protein and organism/species) and molecular interaction annotations (protein–protein and genetic interactions (PPIs and GIs)). This collection, which we call the BioC-BioGRID corpus, was annotated by four BioGRID curators over three rounds of annotation and contains 120 full text articles curated in a dataset representing two major model organisms, namely budding yeast and human. The BioC-BioGRID corpus contains annotations for 6409 mentions of genes and their Entrez Gene IDs, 186 mentions of organism names and their NCBI Taxonomy IDs, 1867 mentions of PPIs and 701 annotations of PPI experimental evidence statements, 856 mentions of GIs and 399 annotations of GI evidence statements. 
The purpose, characteristics and possible future uses of the BioC-BioGRID corpus are detailed in this report. Database URL: http://bioc.sourceforge.net/BioC-BioGRID.html
8
|
The BioGRID interaction database: 2017 update. Nucleic Acids Res 2016; 45:D369-D379. [PMID: 27980099 PMCID: PMC5210573 DOI: 10.1093/nar/gkw1102] [Citation(s) in RCA: 666] [Impact Index Per Article: 83.3] [Received: 09/29/2016] [Revised: 10/25/2016] [Accepted: 10/27/2016] [Indexed: 01/05/2023]
Abstract
The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the annotation and archival of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2016 (build 3.4.140), the BioGRID contains 1 072 173 genetic and protein interactions, and 38 559 post-translational modifications, as manually annotated from 48 114 publications. This dataset represents interaction records for 66 model organisms and represents a 30% increase compared to the previous 2015 BioGRID update. BioGRID curates the biomedical literature for major model organism species, including humans, with a recent emphasis on central biological processes and specific human diseases. To facilitate network-based approaches to drug discovery, BioGRID now incorporates 27 501 chemical-protein interactions for human drug targets, as drawn from the DrugBank database. A new dynamic interaction network viewer allows the easy navigation and filtering of all genetic and protein interaction data, as well as for bioactive compounds and their established targets. BioGRID data are directly downloadable without restriction in a variety of standardized formats and are freely distributed through partner model organism databases and meta-databases.
9
|
BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw121. [PMID: 27589962 PMCID: PMC5009341 DOI: 10.1093/database/baw121] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 05/06/2016] [Accepted: 08/02/2016] [Indexed: 11/14/2022]
Abstract
BioC is a simple XML format for text, annotations and relations, and was developed to achieve interoperability for biomedical text processing. Following the success of BioC in BioCreative IV, the BioCreative V BioC track addressed a collaborative task to build an assistant system for BioGRID curation. In this paper, we describe the framework of the collaborative BioC task and discuss our findings based on the user survey. This track consisted of eight subtasks including gene/protein/organism named entity recognition, protein-protein/genetic interaction passage identification and annotation visualization. Using BioC as their data-sharing and communication medium, nine teams, world-wide, participated and contributed either new methods or improvements of existing tools to address different subtasks of the BioC track. Results from different teams were shared in BioC and made available to other teams as they addressed different subtasks of the track. In the end, all submitted runs were merged using a machine learning classifier to produce an optimized output. The biocurator assistant system was evaluated by four BioGRID curators in terms of practical usability. The curators' feedback was overall positive and highlighted the user-friendly design and the convenient gene/protein curation tool based on text mining. Database URL: http://www.biocreative.org/tasks/biocreative-v/track-1-bioc/.
10
|
An extended set of yeast-based functional assays accurately identifies human disease mutations. Genome Res 2016; 26:670-80. [PMID: 26975778 PMCID: PMC4864455 DOI: 10.1101/gr.192526.115] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 03/26/2015] [Accepted: 03/08/2016] [Indexed: 12/19/2022]
Abstract
We can now routinely identify coding variants within individual human genomes. A pressing challenge is to determine which variants disrupt the function of disease-associated genes. Both experimental and computational methods exist to predict pathogenicity of human genetic variation. However, a systematic performance comparison between them has been lacking. Therefore, we developed and exploited a panel of 26 yeast-based functional complementation assays to measure the impact of 179 variants (101 disease- and 78 non-disease-associated variants) from 22 human disease genes. Using the resulting reference standard, we show that experimental functional assays in a 1-billion-year diverged model organism can identify pathogenic alleles with significantly higher precision and specificity than current computational methods.
11
|
BioGRID: A Resource for Studying Biological Interactions in Yeast. Cold Spring Harb Protoc 2016; 2016:pdb.top080754. [PMID: 26729913 DOI: 10.1101/pdb.top080754] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Indexed: 01/12/2023]
Abstract
The Biological General Repository for Interaction Datasets (BioGRID) is a freely available public database that provides the biological and biomedical research communities with curated protein and genetic interaction data. Structured experimental evidence codes, an intuitive search interface, and visualization tools enable the discovery of individual gene, protein, or biological network function. BioGRID houses interaction data for the major model organism species--including yeast, nematode, fly, zebrafish, mouse, and human--with particular emphasis on the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe as pioneer eukaryotic models for network biology. BioGRID has achieved comprehensive curation coverage of the entire literature for these two major yeast models, which is actively maintained through monthly curation updates. As of September 2015, BioGRID houses approximately 335,400 biological interactions for budding yeast and approximately 67,800 interactions for fission yeast. BioGRID also supports an integrated posttranslational modification (PTM) viewer that incorporates more than 20,100 yeast phosphorylation sites curated through its sister database, the PhosphoGRID.
12
|
Abstract
The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein interactions curated from the primary biomedical literature for all major model organism species and humans. As of September 2014, the BioGRID contains 749 912 interactions as drawn from 43 149 publications that represent 30 model organisms. This interaction count represents a 50% increase compared to our previous 2013 BioGRID update. BioGRID data are freely distributed through partner model organism databases and meta-databases and are directly downloadable in a variety of formats. In addition to general curation of the published literature for the major model species, BioGRID undertakes themed curation projects in areas of particular relevance for biomedical sciences, such as the ubiquitin-proteasome system and various human disease-associated interaction networks. BioGRID curation is coordinated through an Interaction Management System (IMS) that facilitates the compilation of interaction records through structured evidence codes, phenotype ontologies, and gene annotation. The BioGRID architecture has been improved in order to support a broader range of interaction and post-translational modification types, to allow the representation of more complex multi-gene/protein interactions, to account for cellular phenotypes through structured ontologies, to expedite curation through semi-automated text-mining approaches, and to enhance curation quality control.
13
|
RLIMS-P: an online text-mining tool for literature-based extraction of protein phosphorylation information. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau081. [PMID: 25122463 PMCID: PMC4131691 DOI: 10.1093/database/bau081] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 01/05/2023]
Abstract
Protein phosphorylation is central to the regulation of most aspects of cell function. Given its importance, it has been the subject of active research as well as the focus of curation in several biological databases. We have developed Rule-based Literature Mining System for protein Phosphorylation (RLIMS-P), an online text-mining tool to help curators identify biomedical research articles relevant to protein phosphorylation. The tool presents information on protein kinases, substrates and phosphorylation sites automatically extracted from the biomedical literature. The utility of the RLIMS-P Web site has been evaluated by curators from Phospho.ELM, PhosphoGRID/BioGRID and Protein Ontology as part of the BioCreative IV user interactive task (IAT). The system achieved F-scores of 0.76, 0.88 and 0.92 for the extraction of kinase, substrate and phosphorylation sites, respectively, and a precision of 0.88 in the retrieval of relevant phosphorylation literature. The system also received highly favorable feedback from the curators in a user survey. Based on the curators’ suggestions, the Web site has been enhanced to improve its usability. In the RLIMS-P Web site, phosphorylation information can be retrieved by PubMed IDs or keywords, with an option for selecting targeted species. The result page displays a sortable table with phosphorylation information. The text evidence page displays the abstract with color-coded entity mentions and includes links to UniProtKB entries via normalization, i.e. the linking of entity mentions to database identifiers, facilitated by the GenNorm tool and by the links to the bibliography in UniProt. Log in and editing capabilities are offered to any user interested in contributing to the validation of RLIMS-P results. Retrieved phosphorylation information can also be downloaded in CSV format and the text evidence in the BioC format. RLIMS-P is freely available. Database URL: http://www.proteininformationresource.org/rlimsp/
14
|
The PhosphoGRID Saccharomyces cerevisiae protein phosphorylation site database: version 2.0 update. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013; 2013:bat026. [PMID: 23674503 PMCID: PMC3653121 DOI: 10.1093/database/bat026] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Indexed: 11/14/2022]
Abstract
PhosphoGRID is an online database that curates and houses experimentally verified in vivo phosphorylation sites in the Saccharomyces cerevisiae proteome (www.phosphogrid.org). Phosphosites are annotated with specific protein kinases and/or phosphatases, along with the condition(s) under which the phosphorylation occurs and/or the effects on protein function. We report here an updated data set, including nine additional high-throughput (HTP) mass spectrometry studies. The version 2.0 data set contains information on 20 177 unique phosphorylated residues, representing a 4-fold increase from version 1.0, and includes 1614 unique phosphosites derived from focused low-throughput (LTP) studies. The overlap between HTP and LTP studies represents only ∼3% of the total unique sites, but importantly 45% of sites from LTP studies with defined function were discovered in at least two independent HTP studies. The majority of new phosphosites in this update occur on previously documented proteins, suggesting that coverage of phosphoproteins in the yeast proteome is approaching saturation. We will continue to update the PhosphoGRID data set, with the expectation that the integration of information from LTP and HTP studies will enable the development of predictive models of phosphorylation-based signaling networks. Database URL: http://www.phosphogrid.org/
15
|
Abstract
The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access archive of genetic and protein interactions that are curated from the primary biomedical literature for all major model organism species. As of September 2012, BioGRID houses more than 500 000 manually annotated interactions from more than 30 model organisms. BioGRID maintains complete curation coverage of the literature for the budding yeast Saccharomyces cerevisiae, the fission yeast Schizosaccharomyces pombe and the model plant Arabidopsis thaliana. A number of themed curation projects in areas of biomedical importance are also supported. BioGRID has established collaborations and/or shares data records for the annotation of interactions and phenotypes with most major model organism databases, including Saccharomyces Genome Database, PomBase, WormBase, FlyBase and The Arabidopsis Information Resource. BioGRID also actively engages with the text-mining community to benchmark and deploy automated tools to expedite curation workflows. BioGRID data are freely accessible through both a user-defined interactive interface and in batch downloads in a wide variety of formats, including PSI-MI2.5 and tab-delimited files. BioGRID records can also be interrogated and analyzed with a series of new bioinformatics tools, which include a post-translational modification viewer, a graphical viewer, a REST service and a Cytoscape plugin.
16
|
Inferring protein function from homology using the Princeton Protein Orthology Database (P-POD). Curr Protoc Bioinformatics 2011; Chapter 6:Unit 6.11. [PMID: 21400696 DOI: 10.1002/0471250953.bi0611s33] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Inferring a protein's function by homology is a powerful tool for biologists. The Princeton Protein Orthology Database (P-POD) offers a simple way to visualize and analyze the relationships between homologous proteins in order to infer function. P-POD contains computationally generated analysis distinguishing orthologs from paralogs combined with curated published information on functional complementation and on human diseases. P-POD also features an applet, Notung, for users to explore and modify phylogenetic trees and generate their own ortholog/paralog calls. This unit describes how to search P-POD for precomputed data, how to find and use the associated curated information from the literature, and how to use Notung to analyze and refine the results.
17
|
A free NCRR resource for finding and using human disease models (65.46). THE JOURNAL OF IMMUNOLOGY 2011. [DOI: 10.4049/jimmunol.186.supp.65.46] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2023]
Abstract
LAMHDI.org, a free web-based resource, helps researchers identify model systems to investigate disease mechanisms and therapies by bringing together information about diseases and model organisms. The LAMHDI portal allows search of disease models across species (non-human primates, zebrafish, mice, rats, flies, and yeast, with others in the pipeline) using gene orthology and pathway membership as key linkages. New work includes matching specific phenotypes and common pathways, a zebrafish atlas, and a graphical search based on spatial models of the brain. LAMHDI’s collaboration with BioGRID permits the exploration of networks linked to immune response, such as the NFkappaB and Interferon gamma signaling pathways. Interactions involving homologous members of these pathways provide a valuable resource for exploration across animal models. The goal is to facilitate the identification of models for disease research, make better use of existing model organisms and data about them, and provide the ability to discover new relationships between disease, phenotypes and genes that will further our understanding of disease. Companion efforts will speed the discovery and validation of novel drug candidates in areas as diverse as infectious disease, neuroscience, cardiovascular and metabolic disorders, autoimmunity, and cancer.
18
|
Abstract
The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions.
|
19
|
|
20
|
Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is a scientific database for the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast. The information in SGD includes functional annotations, mapping and sequence information, protein domains and structure, expression data, mutant phenotypes, physical and genetic interactions and the primary literature from which these data are derived. Here we describe how published phenotypes and genetic interaction data are annotated and displayed in SGD.
|
21
|
Abstract
The Biological General Repository for Interaction Datasets (BioGRID) database (http://www.thebiogrid.org) was developed to house and distribute collections of protein and genetic interactions from major model organism species. BioGRID currently contains over 198 000 interactions from six different species, as derived from both high-throughput studies and conventional focused studies. Through comprehensive curation efforts, BioGRID now includes a virtually complete set of interactions reported to date in the primary literature for both the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. A number of new features have been added to the BioGRID including an improved user interface to display interactions based on different attributes, a mirror site and a dedicated interaction management system to coordinate curation across different locations. The BioGRID provides interaction data with monthly updates to Saccharomyces Genome Database, FlyBase and Entrez Gene. Source code for the BioGRID and the linked Osprey network visualization system is now freely available without restriction.
|
22
|
Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt; http://www.ebi.ac.uk/GOA/). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current.
|
23
|
The Princeton Protein Orthology Database (P-POD): a comparative genomics analysis tool for biologists. PLoS One 2007; 2:e766. [PMID: 17712414 PMCID: PMC1942082 DOI: 10.1371/journal.pone.0000766] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.0] [Received: 05/18/2007] [Accepted: 07/18/2007] [Indexed: 02/07/2023] Open
Abstract
Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be extremely useful to typical experimental biologists wanting to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful because they can use the P-POD database infrastructure when developing their own comparative genomics resources and database tools.
|
24
|
The Saccharomyces Genome Database provides comprehensive information about the biology of S. cerevisiae and tools for studies in comparative genomics. FASEB J 2007. [DOI: 10.1096/fasebj.21.5.a264-c] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
|
25
|
Abstract
The recent explosion in protein data generated from both directed small-scale studies and large-scale proteomics efforts has greatly expanded the quantity of available protein information and has prompted the Saccharomyces Genome Database (SGD) to enhance the depth and accessibility of protein annotations. In particular, we have expanded ongoing efforts to improve the integration of experimental information and sequence-based predictions and have redesigned the protein information web pages. A key feature of this redesign is the development of a GBrowse-derived interactive Proteome Browser customized to improve the visualization of sequence-based protein information. This Proteome Browser has enabled SGD to unify the display of hidden Markov model (HMM) domains, protein family HMMs, motifs, transmembrane regions, signal peptides, hydropathy plots and profile hits using several popular prediction algorithms. In addition, a physico-chemical properties page has been introduced to provide easy access to basic protein information. Improvements to the layout of the Protein Information page and integration of the Proteome Browser will facilitate the ongoing expansion of sequence-specific experimental information captured in SGD, including post-translational modifications and other user-defined annotations. Finally, SGD continues to improve upon the availability of genetic and physical interaction data in an ongoing collaboration with BioGRID by providing direct access to more than 82 000 manually curated interactions.
|
26
|
Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. J Biol 2006; 5:11. [PMID: 16762047 PMCID: PMC1561585 DOI: 10.1186/jbiol36] [Citation(s) in RCA: 257] [Impact Index Per Article: 14.3] [Received: 10/18/2005] [Revised: 03/17/2006] [Accepted: 03/30/2006] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND The study of complex biological networks and prediction of gene function has been enabled by high-throughput (HTP) methods for detection of genetic and protein interactions. Sparse coverage in HTP datasets may, however, distort network properties and confound predictions. Although a vast number of well substantiated interactions are recorded in the scientific literature, these data have not yet been distilled into networks that enable system-level inference. RESULTS We describe here a comprehensive database of genetic and protein interactions, and associated experimental evidence, for the budding yeast Saccharomyces cerevisiae, as manually curated from over 31,793 abstracts and online publications. This literature-curated (LC) dataset contains 33,311 interactions, on the order of all extant HTP datasets combined. Surprisingly, HTP protein-interaction datasets currently achieve only around 14% coverage of the interactions in the literature. The LC network nevertheless shares attributes with HTP networks, including scale-free connectivity and correlations between interactions, abundance, localization, and expression. We find that essential genes or proteins are enriched for interactions with other essential genes or proteins, suggesting that the global network may be functionally unified. This interconnectivity is supported by a substantial overlap of protein and genetic interactions in the LC dataset. We show that the LC dataset considerably improves the predictive power of network-analysis approaches. The full LC dataset is available at the BioGRID (http://www.thebiogrid.org) and SGD (http://www.yeastgenome.org/) databases. CONCLUSION Comprehensive datasets of biological interactions derived from the primary literature provide critical benchmarks for HTP methods, augment functional prediction, and reveal system-level attributes of biological networks.
|
27
|
Genome Snapshot: a new resource at the Saccharomyces Genome Database (SGD) presenting an overview of the Saccharomyces cerevisiae genome. Nucleic Acids Res 2006; 34:D442-5. [PMID: 16381907 PMCID: PMC1347479 DOI: 10.1093/nar/gkj117] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.6] [Indexed: 11/27/2022] Open
Abstract
Sequencing and annotation of the entire Saccharomyces cerevisiae genome has made it possible to gain a genome-wide perspective on yeast genes and gene products. To make this information available on an ongoing basis, the Saccharomyces Genome Database (SGD) has created the Genome Snapshot. The Genome Snapshot summarizes the current state of knowledge about the genes and chromosomal features of S. cerevisiae. The information is organized into two categories: (i) number of each type of chromosomal feature annotated in the genome and (ii) number and distribution of genes annotated to Gene Ontology terms. Detailed lists are accessible through SGD's Advanced Search tool, and all the data presented on this page are available from the SGD FTP site.
|
28
|
Characterization of E3Histone, a novel testis ubiquitin protein ligase which ubiquitinates histones. Mol Cell Biol 2005; 25:2819-31. [PMID: 15767685 PMCID: PMC1061639 DOI: 10.1128/mcb.25.7.2819-2831.2005] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.6] [Indexed: 01/25/2023] Open
Abstract
During spermatogenesis, a large fraction of cellular proteins is degraded as the spermatids evolve to their elongated mature forms. In particular, histones must be degraded in early elongating spermatids to permit chromatin condensation. Our laboratory previously demonstrated the activation of ubiquitin conjugation during spermatogenesis. This activation is dependent on the ubiquitin-conjugating enzyme (E2) UBC4, and a testis-specific isoform, UBC4-testis, is induced when histones are degraded. Therefore, we tested whether there are UBC4-dependent ubiquitin protein ligases (E3s) that can ubiquitinate histones. Indeed, a novel enzyme, E3Histone, which could conjugate ubiquitin to histones H1, H2A, H2B, H3, and H4 in vitro, was found. Only the UBC4/UBC5 family of E2s supported E3Histone-dependent ubiquitination of histone H2A, and of this family, UBC4-1 and UBC4-testis are the preferred E2s. We purified this ligase activity 3,600-fold to near homogeneity. Mass spectrometry of the final material revealed the presence of a 482-kDa HECT domain-containing protein, which was previously named LASU1. Anti-LASU1 antibodies immunodepleted E3Histone activity. Mass spectrometry and size analysis by gel filtration and glycerol gradient centrifugation suggested that E3Histone is a monomer of LASU1. Our assays also show that this enzyme is the major UBC4-1-dependent histone-ubiquitinating E3. E3Histone is therefore a HECT domain E3 that likely plays an important role in the chromatin condensation that occurs during spermatid maturation.
|
29
|
Fungal BLAST and Model Organism BLASTP Best Hits: new comparison resources at the Saccharomyces Genome Database (SGD). Nucleic Acids Res 2005; 33:D374-7. [PMID: 15608219 PMCID: PMC539977 DOI: 10.1093/nar/gki023] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022] Open
Abstract
The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is a scientific database of gene, protein and genomic information for the yeast Saccharomyces cerevisiae. SGD has recently developed two new resources that facilitate nucleotide and protein sequence comparisons between S. cerevisiae and other organisms. The Fungal BLAST tool provides directed searches against all fungal nucleotide and protein sequences available from GenBank, divided into categories according to organism, status of completeness and annotation, and source. The Model Organism BLASTP Best Hits resource displays, for each S. cerevisiae protein, the single most similar protein from several model organisms and presents links to the database pages of those proteins, facilitating access to curated information about potential orthologs of yeast proteins.
|
30
|
Abstract
TargetDB is a centralized target registration database that includes protein target data from the NIH structural genomics centers and a number of international sites. TargetDB, which is hosted by the Protein Data Bank (RCSB PDB), provides status information on target sequences and tracks their progress through the various stages of protein production and structure determination. A simple search form permits queries based on contributing site, target ID, protein name, sequence, status and other data. The progress of individual targets or entire structural genomics projects may be tracked over time, and target data from all contributing centers may also be downloaded in the XML format. AVAILABILITY TargetDB is available at http://targetdb.pdb.org/
|
31
|
Abstract
Ligand Depot is an integrated data resource for finding information about small molecules bound to proteins and nucleic acids. The initial release (version 1.0, November 2003) focuses on providing chemical and structural information for small molecules found as part of the structures deposited in the Protein Data Bank. Ligand Depot accepts keyword-based queries and also provides a graphical interface for performing chemical substructure searches. A wide variety of web resources that contain information on small molecules may also be accessed through Ligand Depot. AVAILABILITY Ligand Depot is available at http://ligand-depot.rutgers.edu/. Version 1.0 supports multiple operating systems including Windows, Unix, Linux and the Macintosh operating system. The current drawing tool works in Internet Explorer, Netscape and Mozilla on Windows, Unix and Linux.
|
32
|
Characterization of rat100, a 300-kilodalton ubiquitin-protein ligase induced in germ cells of the rat testis and similar to the Drosophila hyperplastic discs gene. Endocrinology 2002; 143:3740-7. [PMID: 12239083 DOI: 10.1210/en.2002-220262] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
Abstract
Conjugation of ubiquitin to proteins is activated during spermatogenesis. Ubiquitination is mediated by ubiquitin-activating enzyme (E1), ubiquitin-conjugating enzymes (UBCs or E2s), and ubiquitin protein ligases (E3s). Since we previously showed that the activated ubiquitination is UBC4 dependent, we characterized Rat100, a UBC4-dependent E3 expressed in the testis. Analysis of expressed sequence tag sequences and immunoblotting showed that Rat100 is actually a 300-kDa protein expressed mainly in the brain and testis and is similar to the human E3 identified by differential display (EDD) protein and the Drosophila hyperplastic discs gene, mutants of which cause a defect in spermatogenesis. Rat100 is induced during postnatal development of the rat testis, peaking at day 25. It is localized only in germ cells and is highly expressed in spermatocytes, moderately in round and slightly in elongating spermatids. In contrast to UBC4, whose removal from a testis extract abrogates much of the conjugation activity, immunodepletion of Rat100 from the extracts had little effect. Rat100 therefore has a limited subset of substrates, some of which appear associated with the E3 as the immunoprecipitate containing Rat100 supported incorporation of (125)I-ubiquitin into high molecular weight proteins. Thus, Rat100 is the homolog of human EDD and likely of Drosophila hyperplastic discs. This homology, together with our results, suggests that induction of this E3 results in ubiquitination of specific substrates, some of which are important in male germ cell development.
|
33
|
Identification of amino acid residues in a class I ubiquitin-conjugating enzyme involved in determining specificity of conjugation of ubiquitin to proteins. J Biol Chem 1998; 273:18435-42. [PMID: 9660812 DOI: 10.1074/jbc.273.29.18435] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022] Open
Abstract
The ubiquitin pathway is a major system for selective proteolysis in eukaryotes. However, the mechanisms underlying substrate selectivity by the ubiquitin system remain unclear. We previously identified isoforms of a rat ubiquitin-conjugating enzyme (E2) homologous to the Saccharomyces cerevisiae class I E2 genes, UBC4/UBC5. Two isoforms, although 93% identical, show distinct features. UBC4-1 is expressed ubiquitously, whereas UBC4-testis is expressed in spermatids. Interestingly, although these isoforms interacted similarly with some ubiquitin-protein ligases (E3s) such as E6-AP and rat p100 and an E3 that conjugates ubiquitin to histone H2A, they also supported conjugation of ubiquitin to distinct subsets of testis proteins. UBC4-1 showed an 11-fold greater ability to support conjugation of ubiquitin to endogenous substrates present in a testis nuclear fraction. Site-directed mutagenesis of the UBC4-testis isoform was undertaken to identify regions of the molecule responsible for the observed difference in substrate specificity. Four residues (Gln-15, Ala-49, Ser-107, and Gln-125) scattered on surfaces away from the active site appeared necessary and sufficient for UBC4-1-like conjugation. These four residues identify a large surface of the E2 core domain that may represent an area of binding to E3s or substrates. These findings demonstrate that a limited number of amino acid substitutions in E2s can dictate conjugation of ubiquitin to different proteins and indicate a mechanism by which small E2 molecules can encode a wide range of substrate specificities.
|