1
Leonidou N, Renz A, Winnerling B, Grekova A, Grein F, Dräger A. Genome-scale metabolic model of Staphylococcus epidermidis ATCC 12228 matches in vitro conditions. mSystems 2025:e0041825. [PMID: 40396730 DOI: 10.1128/msystems.00418-25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2025] [Accepted: 04/15/2025] [Indexed: 05/22/2025] Open
Abstract
Staphylococcus epidermidis, a commensal bacterium inhabiting collagen-rich areas like human skin, has gained significance due to its probiotic potential in the nasal microbiome and as a leading cause of nosocomial infections. While infrequently leading to severe illnesses, S. epidermidis exerts a significant influence, particularly in its close association with implant-related infections and its role as a classic opportunistic biofilm former. Understanding its opportunistic nature is crucial for developing novel therapeutic strategies, addressing both its beneficial and pathogenic aspects, and alleviating the burdens it imposes on patients and healthcare systems. Here, we employ genome-scale metabolic modeling as a powerful tool to elucidate the metabolic capabilities of S. epidermidis. We created a comprehensive computational resource for understanding the organism's growth conditions within diverse habitats by reconstructing and analyzing a manually curated and experimentally validated metabolic model. The final network, iSep23, incorporates 1,415 reactions, 1,051 metabolites, and 705 genes, adhering to established community standards and modeling guidelines. Benchmarking with the Metabolic Model Testing suite yields a high score, indicating the model's remarkable semantic quality. Following the findable, accessible, interoperable, and reusable (FAIR) data principles, iSep23 becomes a valuable and publicly accessible asset for subsequent studies. Growth simulations and carbon source utilization predictions align with experimental results, showcasing the model's predictive power. Ultimately, this work provides a robust foundation for future research aimed at both exploiting the probiotic potential and mitigating the pathogenic risks posed by S. epidermidis.
IMPORTANCE Staphylococcus epidermidis, a bacterium commonly found on human skin, has shown probiotic effects in the nasal microbiome and is a notable causative agent of hospital-acquired infections. While these infections are typically non-life-threatening, their economic impact is considerable, with annual costs reaching billions of dollars in the United States. To better understand its opportunistic nature, we employed genome-scale metabolic modeling to construct a detailed network of S. epidermidis's metabolic capabilities. This model, comprising over a thousand reactions, metabolites, and genes, adheres to established standards and demonstrates solid benchmarking performance. Following the findable, accessible, interoperable, and reusable (FAIR) data principles, the model provides a valuable resource for future research. Growth simulations and predictions closely match experimental data, underscoring the model's predictive accuracy. Overall, this work lays a solid foundation for future studies aimed at leveraging the beneficial properties of S. epidermidis while mitigating its pathogenic potential.
Affiliation(s)
- Nantia Leonidou
- Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard Karl University of Tübingen, Tübingen, Germany
- Department of Computer Science, Eberhard Karl University of Tübingen, Tübingen, Germany
- German Center for Infection Research (DZIF), Tübingen, Germany
- Quantitative Biology Center (QBiC), Eberhard Karl University of Tübingen, Tübingen, Germany
- Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), Heidelberg, Baden-Württemberg, Germany
- Alina Renz
- Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard Karl University of Tübingen, Tübingen, Germany
- Department of Computer Science, Eberhard Karl University of Tübingen, Tübingen, Germany
- Benjamin Winnerling
- Institute for Pharmaceutical Microbiology, University of Bonn, Bonn, North Rhine-Westphalia, Germany
- German Center for Infection Research (DZIF), Bonn, Germany
- Anastasiia Grekova
- Structural and Computational Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Baden-Württemberg, Germany
- Fabian Grein
- Institute for Pharmaceutical Microbiology, University of Bonn, Bonn, North Rhine-Westphalia, Germany
- German Center for Infection Research (DZIF), Bonn, Germany
- Andreas Dräger
- Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard Karl University of Tübingen, Tübingen, Germany
- German Center for Infection Research (DZIF), Tübingen, Germany
- Quantitative Biology Center (QBiC), Eberhard Karl University of Tübingen, Tübingen, Germany
- Data Analytics and Bioinformatics, Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
2
Mulero-Hernández J, Mironov V, Miñarro-Giménez JA, Kuiper M, Fernández-Breis J. Integration of chromosome locations and functional aspects of enhancers and topologically associating domains in knowledge graphs enables versatile queries about gene regulation. Nucleic Acids Res 2024; 52:e69. [PMID: 38967009 PMCID: PMC11347148 DOI: 10.1093/nar/gkae566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 06/12/2024] [Accepted: 06/19/2024] [Indexed: 07/06/2024] Open
Abstract
Knowledge about transcription factor binding and regulation, target genes, cis-regulatory modules and topologically associating domains is not only defined by functional associations like biological processes or diseases but also has a determinative genome location aspect. Here, we exploit these location and functional aspects together to develop new strategies to enable advanced data querying. Many databases have been developed to provide information about enhancers, but a schema that allows the standardized representation of data, securing interoperability between resources, has been lacking. In this work, we use knowledge graphs for the standardized representation of enhancers and topologically associating domains, together with data about their target genes, transcription factors, location on the human genome, and functional data about diseases and gene ontology annotations. We used this schema to integrate twenty-five enhancer datasets and two domain datasets, creating the most powerful integrative resource in this field to date. The knowledge graphs have been implemented using the Resource Description Framework and integrated within the open-access BioGateway knowledge network, generating a resource that contains an interoperable set of knowledge graphs (enhancers, TADs, genes, proteins, diseases, GO terms, and interactions between domains). We show how advanced queries, which combine functional and location restrictions, can be used to develop new hypotheses about functional aspects of gene expression regulation.
Affiliation(s)
- Juan Mulero-Hernández
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Vladimir Mironov
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- José Antonio Miñarro-Giménez
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Jesualdo Tomás Fernández-Breis
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
3
Baron JA, Johnson CSB, Schor MA, Olley D, Nickel L, Felix V, Munro J, Bello S, Bearer C, Lichenstein R, Bisordi K, Koka R, Greene C, Schriml L. The DO-KB Knowledgebase: a 20-year journey developing the disease open science ecosystem. Nucleic Acids Res 2024; 52:D1305-D1314. [PMID: 37953304 PMCID: PMC10767934 DOI: 10.1093/nar/gkad1051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/20/2023] [Accepted: 10/23/2023] [Indexed: 11/14/2023] Open
Abstract
In 2003, the Human Disease Ontology (DO, https://disease-ontology.org/) was established at Northwestern University. In the intervening 20 years, the DO has expanded to become a highly utilized disease knowledge resource. Serving as the nomenclature and classification standard for human diseases, the DO provides a stable, etiology-based structure integrating mechanistic drivers of human disease. Over the past two decades, the DO has grown from a collection of clinical vocabularies into an expertly curated semantic resource of over 11300 common and rare diseases linking disease concepts through more than 37000 vocabulary cross mappings (v2023-08-08). Here, we introduce the recently launched DO Knowledgebase (DO-KB), which expands the DO's representation of the diseaseome and enhances the findability, accessibility, interoperability and reusability (FAIR) of disease data through a new SPARQL service and a new Faceted Search Interface. The DO-KB is an integrated data system, built upon the DO's semantic disease knowledge backbone, with resources that expose and connect the DO's semantic knowledge with disease-related data across Open Linked Data resources. This update includes descriptions of efforts to assess the DO's global impact and improvements to data quality and content, with emphasis on changes in the last two years.
Affiliation(s)
- J Allen Baron
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Michael A Schor
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Dustin Olley
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Lance Nickel
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Victor Felix
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- James B Munro
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Animal and Plant Health Inspection Service, Plant Protection and Quarantine, USDA, USA
- Susan M Bello
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
- Rima Koka
- University of Maryland School of Medicine, Baltimore, MD, USA
- Carol Greene
- University of Maryland School of Medicine, Baltimore, MD, USA
- Lynn M Schriml
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
4
Putman TE, Schaper K, Matentzoglu N, Rubinetti V, Alquaddoomi F, Cox C, Caufield JH, Elsarboukh G, Gehrke S, Hegde H, Reese J, Braun I, Bruskiewich R, Cappelletti L, Carbon S, Caron A, Chan L, Chute C, Cortes K, De Souza V, Fontana T, Harris N, Hartley E, Hurwitz E, Jacobsen JB, Krishnamurthy M, Laraway B, McLaughlin J, McMurry J, Moxon ST, Mullen K, O’Neil S, Shefchek K, Stefancsik R, Toro S, Vasilevsky N, Walls R, Whetzel P, Osumi-Sutherland D, Smedley D, Robinson P, Mungall C, Haendel M, Munoz-Torres M. The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species. Nucleic Acids Res 2024; 52:D938-D949. [PMID: 38000386 PMCID: PMC10767791 DOI: 10.1093/nar/gkad1082] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/18/2023] [Revised: 10/21/2023] [Accepted: 11/02/2023] [Indexed: 11/26/2023] Open
Abstract
Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.
Affiliation(s)
- Tim E Putman
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kevin Schaper
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Vincent P Rubinetti
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Faisal S Alquaddoomi
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Corey Cox
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- J Harry Caufield
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Glass Elsarboukh
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Sarah Gehrke
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Harshad Hegde
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Justin T Reese
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Ian Braun
- Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, USA
- Seth Carbon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Anita R Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
- Christopher G Chute
- Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD 21205, USA
- Katherina G Cortes
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Tommaso Fontana
- Dipartimento di Informatica, Università degli Studi di Milano Statale, Milano, Italy
- Nomi L Harris
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Emily L Hartley
- Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, USA
- Eric Hurwitz
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Julius O B Jacobsen
- William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Madan Krishnamurthy
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Bryan J Laraway
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Julie A McMurry
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Sierra A T Moxon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Kathleen R Mullen
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Shawn T O’Neil
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kent A Shefchek
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Sabrina Toro
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Ramona L Walls
- Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, USA
- Patricia L Whetzel
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Melissa A Haendel
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Monica C Munoz-Torres
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
5
Prakash SJ, Van Auken KM, Hill DP, Sternberg PW. Semantic representation of neural circuit knowledge in Caenorhabditis elegans. Brain Inform 2023; 10:30. [PMID: 37947958 PMCID: PMC10638142 DOI: 10.1186/s40708-023-00208-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 09/22/2023] [Indexed: 11/12/2023] Open
Abstract
In modern biology, new knowledge is generated quickly, making it challenging for researchers to efficiently acquire and synthesise new information from the large volume of primary publications. To address this problem, computational approaches that generate machine-readable representations of scientific findings in the form of knowledge graphs have been developed. These representations can integrate different types of experimental data from multiple papers and biological knowledge bases in a unifying data model, providing a complementary method to manual review for interacting with published knowledge. The Gene Ontology Consortium (GOC) has created a semantic modelling framework that extends individual functional gene annotations to structured descriptions of causal networks representing biological processes (Gene Ontology-Causal Activity Modelling, or GO-CAM). In this study, we explored whether the GO-CAM framework could represent knowledge of the causal relationships between environmental inputs, neural circuits and behavior in the model nematode C. elegans [C. elegans Neural-Circuit Causal Activity Modelling (CeN-CAM)]. We found that, given extensions to several relevant ontologies, a wide variety of author statements from the literature about the neural circuit basis of egg-laying and carbon dioxide (CO2) avoidance behaviors could be faithfully represented with CeN-CAM. Through this process, we were able to generate generic data models for several categories of experimental results. We also discuss how semantic modelling may be used to functionally annotate the C. elegans connectome. Thus, Gene Ontology-based semantic modelling has the potential to support various machine-readable representations of neurobiological knowledge.
Affiliation(s)
- Sharan J Prakash
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- Kimberly M Van Auken
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA
- David P Hill
- The Jackson Laboratory, Bar Harbor, ME, 04609, USA
- Paul W Sternberg
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
6
Prakash SJ, Van Auken KM, Hill DP, Sternberg PW. Semantic Representation of Neural Circuit Knowledge in Caenorhabditis elegans. bioRxiv 2023:2023.04.28.538760. [PMID: 37162850 PMCID: PMC10168330 DOI: 10.1101/2023.04.28.538760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
In modern biology, new knowledge is generated quickly, making it challenging for researchers to efficiently acquire and synthesise new information from the large volume of primary publications. To address this problem, computational approaches that generate machine-readable representations of scientific findings in the form of knowledge graphs have been developed. These representations can integrate different types of experimental data from multiple papers and biological knowledge bases in a unifying data model, providing a complementary method to manual review for interacting with published knowledge. The Gene Ontology Consortium (GOC) has created a semantic modelling framework that extends individual functional gene annotations to structured descriptions of causal networks representing biological processes (Gene Ontology Causal Activity Modelling, or GO-CAM). In this study, we explored whether the GO-CAM framework could represent knowledge of the causal relationships between environmental inputs, neural circuits and behavior in the model nematode C. elegans (C. elegans Neural Circuit Causal Activity Modelling (CeN-CAM)). We found that, given extensions to several relevant ontologies, a wide variety of author statements from the literature about the neural circuit basis of egg-laying and carbon dioxide (CO2) avoidance behaviors could be faithfully represented with CeN-CAM. Through this process, we were able to generate generic data models for several categories of experimental results. We also discuss how semantic modelling may be used to functionally annotate the C. elegans connectome. Thus, Gene Ontology-based semantic modelling has the potential to support various machine-readable representations of neurobiological knowledge.
Affiliation(s)
- Sharan J Prakash
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Kimberly M Van Auken
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- David P Hill
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Paul W Sternberg
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
7
Mészáros B, Hatos A, Palopoli N, Quaglia F, Salladini E, Van Roey K, Arthanari H, Dosztányi Z, Felli IC, Fischer PD, Hoch JC, Jeffries CM, Longhi S, Maiani E, Orchard S, Pancsa R, Papaleo E, Pierattelli R, Piovesan D, Pritisanac I, Tenorio L, Viennet T, Tompa P, Vranken W, Tosatto SCE, Davey NE. Minimum information guidelines for experiments structurally characterizing intrinsically disordered protein regions. Nat Methods 2023; 20:1291-1303. [PMID: 37400558 DOI: 10.1038/s41592-023-01915-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/12/2022] [Accepted: 05/18/2023] [Indexed: 07/05/2023]
Abstract
An unambiguous description of an experiment, and the subsequent biological observation, is vital for accurate data interpretation. Minimum information guidelines define the fundamental complement of data that can support an unambiguous conclusion based on experimental observations. We present the Minimum Information About Disorder Experiments (MIADE) guidelines to define the parameters required for the wider scientific community to understand the findings of an experiment studying the structural properties of intrinsically disordered regions (IDRs). MIADE guidelines provide recommendations for data producers to describe the results of their experiments at source, for curators to annotate experimental data to community resources and for database developers maintaining community resources to disseminate the data. The MIADE guidelines will improve the interpretability of experimental results for data consumers, facilitate direct data submission, simplify data curation, improve data exchange among repositories and standardize the dissemination of the key metadata on an IDR experiment by IDR data sources.
Affiliation(s)
- Bálint Mészáros
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Department of Structural Biology and Center for Data Driven Discovery, St Jude Children's Research Hospital, Memphis, TN, USA
- András Hatos
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Department of Oncology, Lausanne University Hospital, Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Swiss Cancer Center Leman, Lausanne, Switzerland
- Nicolas Palopoli
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires, Argentina
- Federica Quaglia
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Edoardo Salladini
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Kim Van Roey
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Haribabu Arthanari
- Harvard Medical School (HMS), Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
- Isabella C Felli
- Department of Chemistry 'Ugo Schiff' and Magnetic Resonance Center, University of Florence, Sesto Fiorentino (Florence), Italy
- Patrick D Fischer
- Harvard Medical School (HMS), Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
- Jeffrey C Hoch
- Department of Molecular Biology and Biophysics, UConn Health, Farmington, CT, USA
- Cy M Jeffries
- European Molecular Biology Laboratory (EMBL), Hamburg Unit, c/o Deutsches Elektronen-Synchrotron, Hamburg, Germany
- Sonia Longhi
- Laboratory Architecture et Fonction des Macromolécules Biologiques (AFMB), UMR 7257, Aix Marseille University and Centre National de la Recherche Scientifique (CNRS), Marseille, France
- Emiliano Maiani
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- UniCamillus - Saint Camillus International University of Health and Medical Sciences, Rome, Italy
- Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, UK
- Rita Pancsa
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Elena Papaleo
- Cancer Structural Biology, Danish Cancer Society Research Center, Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
- Roberta Pierattelli
- Department of Chemistry 'Ugo Schiff' and Magnetic Resonance Center, University of Florence, Sesto Fiorentino (Florence), Italy
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Iva Pritisanac
- Hospital for Sick Children, Toronto, Ontario, Canada
- Medical University of Graz, Graz, Austria
- Luiggi Tenorio
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Thibault Viennet
- Harvard Medical School (HMS), Boston, MA, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute (DFCI), Boston, MA, USA
- Peter Tompa
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- VIB-VUB Center for Structural Biology, Brussels, Belgium
- Structural Biology Brussels, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Wim Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Structural Biology Brussels, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Norman E Davey
- Division Of Cancer Biology, Institute of Cancer Research, Chester Beatty Laboratories, Chelsea, London, UK.
8
Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S. UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023; 22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023] Open
Abstract
The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications, including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
Affiliation(s)
- E H Bowler-Barnett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- J Fan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Luo
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - S Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom.
|
9
|
Cuzick A, Seager J, Wood V, Urban M, Rutherford K, Hammond-Kosack KE. A framework for community curation of interspecies interactions literature. eLife 2023; 12:e84658. [PMID: 37401199 DOI: 10.7554/elife.84658] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2022] [Accepted: 05/18/2023] [Indexed: 07/05/2023] Open
Abstract
The quantity and complexity of data being generated and published in biology has increased substantially, but few methods exist for capturing knowledge about phenotypes derived from molecular interactions between diverse groups of species, in such a way that is amenable to data-driven biology and research. To improve access to this knowledge, we have constructed a framework for the curation of the scientific literature studying interspecies interactions, using data curated for the Pathogen-Host Interactions database (PHI-base) as a case study. The framework provides a curation tool, phenotype ontology, and controlled vocabularies to curate pathogen-host interaction data, at the level of the host, pathogen, strain, gene, and genotype. The concept of a multispecies genotype, the 'metagenotype,' is introduced to facilitate capturing changes in the disease-causing abilities of pathogens, and host resistance or susceptibility, observed by gene alterations. We report on this framework and describe PHI-Canto, a community curation tool for use by publication authors.
Affiliation(s)
- Alayne Cuzick
- Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
| | - James Seager
- Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
| | - Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Martin Urban
- Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
| | - Kim Rutherford
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Kim E Hammond-Kosack
- Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
|
10
|
Leng J, Xing Z, Li X, Bao X, Zhu J, Zhao Y, Wu S, Yang J. Assessment of Diagnosis, Prognosis and Immune Infiltration Response to the Expression of the Ferroptosis-Related Molecule HAMP in Clear Cell Renal Cell Carcinoma. Int J Environ Res Public Health 2023; 20:913. [PMID: 36673667 PMCID: PMC9858726 DOI: 10.3390/ijerph20020913] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/10/2022] [Revised: 12/20/2022] [Accepted: 12/31/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND Hepcidin antimicrobial peptide (HAMP) is a key factor in maintaining iron metabolism, which may induce ferroptosis when upregulated. However, its prognostic value and relation to immune infiltrating cells remain unclear. METHODS This study analyzed the expression levels of HAMP in the Oncomine, Timer and Ualcan databases, and examined its prognostic potential in KIRC with R programming. The Timer and GEPIA databases were used to estimate the correlations between HAMP and immune infiltration and the markers of immune cells. The intersection genes and the co-expression PPI network were constructed via STRING, R programming and GeneMANIA, and the hub genes were selected with Cytoscape. In addition, we analyzed the gene set enrichment and GO/KEGG pathways by GSEA. RESULTS Our study revealed higher HAMP expression levels in tumor tissues including KIRC, which were related to poor prognosis in terms of OS, DSS and PFI. The expression of HAMP was positively related to the immune infiltration level of macrophages, Tregs, etc., corresponding with the immune biomarkers. Based on the intersection genes, we constructed the PPI network and used the 10 top hub genes. Further, we performed a pathway enrichment analysis of the gene sets, including Huntington's disease, the JAK-STAT signaling pathway, ammonium ion metabolic process, and so on. CONCLUSION In summary, our study gave an insight into the potential prognosis of HAMP, which may act as a diagnostic biomarker and therapeutic target related to immune infiltration in KIRC.
Affiliation(s)
- Jing Leng
- Department of Medical Oncology, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
| | - Zixuan Xing
- Department of Infectious Diseases, The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
| | - Xiang Li
- Department of Medical Oncology, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
| | - Xinyue Bao
- Department of Medical Oncology, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
| | - Junzheya Zhu
- Department of Medical Oncology, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
| | - Yunhan Zhao
- Department of Medical Oncology, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
| | - Shaobo Wu
- Department of Medical Oncology, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
| | - Jiao Yang
- Department of Medical Oncology, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
|
11
|
Shevkoplyas D, Vuu YM, Davie JR, Rastegar M. The Chromatin Structure at the MECP2 Gene and In Silico Prediction of Potential Coding and Non-Coding MECP2 Splice Variants. Int J Mol Sci 2022; 23:15643. [PMID: 36555295 PMCID: PMC9779294 DOI: 10.3390/ijms232415643] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2022] [Revised: 11/30/2022] [Accepted: 12/05/2022] [Indexed: 12/14/2022] Open
Abstract
Methyl CpG binding protein 2 (MeCP2) is an epigenetic reader that binds to methylated CpG dinucleotides and regulates gene transcription. The Mecp2/MECP2 gene has 4 exons, encoding the protein isoforms MeCP2E1 and MeCP2E2. MeCP2 plays key roles in neurodevelopment; therefore, its gain- and loss-of-function mutations lead to neurodevelopmental disorders including Rett Syndrome. Here, we describe the structure, functional domains, and supporting evidence for potential additional alternatively spliced MECP2 transcripts and protein isoforms. We conclude that NCBI MeCP2 isoforms 3 and 4 contain certain MeCP2 functional domains. Our in silico analysis led to identification of histone modification and accessibility profiles at the MECP2 gene and its cis-regulatory elements. We conclude that the human MECP2 gene associated histone post-translational modifications exhibit high similarity between males and females. Between brain regions, histone modifications were found to be less conserved and enriched within larger genomic segments named as "S1-S11". We also identified highly conserved DNA accessibility regions in different tissues and brain regions, named as "A1-A9" and "B1-B9". The DNA methylation profile was similar across the mid-frontal gyrus of donors 35 days to 25 years of age. Based on ATAC-seq data, the identified hypomethylated regions "H1-H8" intersected with most regions of the accessible chromatin (A regions).
|
12
|
Agapite J, Albou LP, Aleksander SA, Alexander M, Anagnostopoulos AV, Antonazzo G, Argasinska J, Arnaboldi V, Attrill H, Becerra A, Bello SM, Blake JA, Blodgett O, Bradford YM, Bult CJ, Cain S, Calvi BR, Carbon S, Chan J, Chen WJ, Michael Cherry J, Cho J, Christie KR, Crosby MA, Davis P, da Veiga Beltrame E, De Pons JL, D’Eustachio P, Diamantakis S, Dolan ME, dos Santos G, Douglass E, Dunn B, Eagle A, Ebert D, Engel SR, Fashena D, Foley S, Frazer K, Gao S, Gibson AC, Gondwe F, Goodman J, Sian Gramates L, Grove CA, Hale P, Harris T, Thomas Hayman G, Hill DP, Howe DG, Howe KL, Hu Y, Jha S, Kadin JA, Kaufman TC, Kalita P, Karra K, Kishore R, Kwitek AE, Laulederkind SJF, Lee R, Longden I, Luypaert M, MacPherson KA, Martin R, Marygold SJ, Matthews B, McAndrews MS, Millburn G, Miyasato S, Motenko H, Moxon S, Muller HM, Mungall CJ, Muruganujan A, Mushayahama T, Nalabolu HS, Nash RS, Ng P, Nuin P, Paddock H, Paulini M, Perrimon N, Pich C, Quinton-Tulloch M, Raciti D, Ramachandran S, Richardson JE, Gelbart SR, Ruzicka L, Schaper K, Schindelman G, Shimoyama M, Simison M, Shaw DR, Shrivatsav A, Singer A, Skrzypek M, Smith CM, Smith CL, Smith JR, Stein L, Sternberg PW, Tabone CJ, Thomas PD, Thorat K, Thota J, Toro S, Tomczuk M, Trovisco V, Tutaj MA, Tutaj M, Urbano JM, Van Auken K, Van Slyke CE, Wang Q, Wang SJ, Weng S, Westerfield M, Williams G, Wilming LG, Wong ED, Wright A, Yook K, Zarowiecki M, Zhou P, Zytkovicz M. Harmonizing model organism data in the Alliance of Genome Resources. Genetics 2022; 220:iyac022. [PMID: 35380658 PMCID: PMC8982023 DOI: 10.1093/genetics/iyac022] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2021] [Accepted: 01/26/2022] [Indexed: 02/06/2023] Open
Abstract
The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein-protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.
|
13
|
Quaglia F, Mészáros B, Salladini E, Hatos A, Pancsa R, Chemes LB, Pajkos M, Lazar T, Peña-Díaz S, Santos J, Ács V, Farahi N, Fichó E, Aspromonte M, Bassot C, Chasapi A, Davey N, Davidović R, Dobson L, Elofsson A, Erdős G, Gaudet P, Giglio M, Glavina J, Iserte J, Iglesias V, Kálmán Z, Lambrughi M, Leonardi E, Longhi S, Macedo-Ribeiro S, Maiani E, Marchetti J, Marino-Buslje C, Mészáros A, Monzon A, Minervini G, Nadendla S, Nilsson JF, Novotný M, Ouzounis C, Palopoli N, Papaleo E, Pereira P, Pozzati G, Promponas V, Pujols J, Rocha AS, Salas M, Sawicki LR, Schad E, Shenoy A, Szaniszló T, Tsirigos K, Veljkovic N, Parisi G, Ventura S, Dosztányi Z, Tompa P, Tosatto SCE, Piovesan D. DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation. Nucleic Acids Res 2022; 50:D480-D487. [PMID: 34850135 PMCID: PMC8728214 DOI: 10.1093/nar/gkab1082] [Citation(s) in RCA: 107] [Impact Index Per Article: 35.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2021] [Revised: 10/15/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023] Open
Abstract
The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process and significant content growth of around 30%. Higher quality and consistency of annotations is provided by a newly implemented reviewing process and training of curators. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia and the support of the ELIXIR infrastructure.
Affiliation(s)
- Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | - Bálint Mészáros
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
| | - Edoardo Salladini
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | - András Hatos
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | - Rita Pancsa
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
| | - Lucía B Chemes
- Instituto de Investigaciones Biotecnológicas (IIBiO-CONICET), Universidad Nacional de San Martín, Av. 25 de Mayo y Francia, CP1650 Buenos Aires, Argentina
| | - Mátyás Pajkos
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
| | - Tamas Lazar
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnology, Brussels, Belgium
- Structural Biology Brussels (SBB), Bioengineering Sciences Department, Vrije Universiteit Brussel (VUB), Brussels, Belgium
| | - Samuel Peña-Díaz
- Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Jaime Santos
- Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Veronika Ács
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
| | - Nazanin Farahi
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnology, Brussels, Belgium
- Structural Biology Brussels (SBB), Bioengineering Sciences Department, Vrije Universiteit Brussel (VUB), Brussels, Belgium
| | - Erzsébet Fichó
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
- Cytocast Kft., Vecsés, Hungary
| | - Maria Cristina Aspromonte
- Department of Woman and Child Health, University of Padova, Padova, Italy
- Pediatric Research Institute, Città della Speranza, Padova, Italy
| | - Claudio Bassot
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
| | - Anastasia Chasapi
- Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thermi, Thessalonica 57001, Greece
| | - Norman E Davey
- Institute of Cancer Research, Chester Beatty Laboratories, 237 Fulham Rd, Chelsea, London, UK
| | - Radoslav Davidović
- Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
| | - Laszlo Dobson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
| | - Arne Elofsson
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
| | - Gábor Erdős
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
| | - Pascale Gaudet
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
| | - Michelle Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W. Baltimore St., Baltimore, MD 21201, USA
| | - Juliana Glavina
- Instituto de Investigaciones Biotecnológicas (IIBiO-CONICET), Universidad Nacional de San Martín, Av. 25 de Mayo y Francia, CP1650 Buenos Aires, Argentina
| | - Javier Iserte
- Bioinformatics Unit, Fundación Instituto Leloir, Buenos Aires, C1405BWE, Argentina
| | - Valentín Iglesias
- Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Zsófia Kálmán
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Práter u. 50/A, 1083 Budapest, Hungary
| | - Matteo Lambrughi
- Cancer Structural Biology, Danish Cancer Society Research Center, Strandboulevarden 49, 2100 Copenhagen, Denmark
| | - Emanuela Leonardi
- Department of Woman and Child Health, University of Padova, Padova, Italy
- Pediatric Research Institute, Città della Speranza, Padova, Italy
| | - Sonia Longhi
- Lab. Architecture et Fonction des Macromolécules Biologiques (AFMB), UMR 7257, Aix Marseille University and Centre National de la Recherche Scientifique (CNRS), 163 Avenue de Luminy, Case 932, 13288, Marseille, France
| | - Sandra Macedo-Ribeiro
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, 4200-135 Porto, Portugal
- Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, 4200-135 Porto, Portugal
| | - Emiliano Maiani
- Cancer Structural Biology, Danish Cancer Society Research Center, Strandboulevarden 49, 2100 Copenhagen, Denmark
| | - Julia Marchetti
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires B1876BXD, Argentina
| | | | - Attila Mészáros
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnology, Brussels, Belgium
- Structural Biology Brussels (SBB), Bioengineering Sciences Department, Vrije Universiteit Brussel (VUB), Brussels, Belgium
| | | | | | - Suvarna Nadendla
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W. Baltimore St., Baltimore, MD 21201, USA
| | - Juliet F Nilsson
- Lab. Architecture et Fonction des Macromolécules Biologiques (AFMB), UMR 7257, Aix Marseille University and Centre National de la Recherche Scientifique (CNRS), 163 Avenue de Luminy, Case 932, 13288, Marseille, France
| | - Marian Novotný
- Dep. of Cell Biology, Faculty of Science, Vinicna 7, 128 43, Prague, Czech Republic
| | - Christos A Ouzounis
- Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thermi, Thessalonica 57001, Greece
- Biological Computation & Computational Biology Group, Artificial Intelligence & Information Analysis Lab, Department of Computer Science, Aristotle University of Thessalonica, Thessalonica 54124, Greece
| | - Nicolás Palopoli
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires B1876BXD, Argentina
| | - Elena Papaleo
- Cancer Structural Biology, Danish Cancer Society Research Center, Strandboulevarden 49, 2100 Copenhagen, Denmark
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, Lyngby, Denmark
| | - Pedro José Barbosa Pereira
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, 4200-135 Porto, Portugal
- Instituto de Investigação e Inovação em Saúde (i3S), Universidade do Porto, 4200-135 Porto, Portugal
| | - Gabriele Pozzati
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
| | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Jordi Pujols
- Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
| | | | - Martin Salas
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires B1876BXD, Argentina
| | - Luciana Rodriguez Sawicki
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires B1876BXD, Argentina
| | - Eva Schad
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
| | - Aditi Shenoy
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
| | - Tamás Szaniszló
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
| | - Konstantinos D Tsirigos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
| | - Nevena Veljkovic
- Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
| | - Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes - CONICET, Bernal, Buenos Aires B1876BXD, Argentina
| | - Salvador Ventura
- Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- ICREA, Barcelona, Spain
| | - Zsuzsanna Dosztányi
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
| | - Peter Tompa
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest 1117, Hungary
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnology, Brussels, Belgium
- Structural Biology Brussels (SBB), Bioengineering Sciences Department, Vrije Universiteit Brussel (VUB), Brussels, Belgium
| | | | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova, Italy
|
14
|
Nadendla S, Jackson R, Munro J, Quaglia F, Mészáros B, Olley D, Hobbs ET, Goralski SM, Chibucos M, Mungall CJ, Tosatto SCE, Erill I, Giglio MG. ECO: the Evidence and Conclusion Ontology, an update for 2022. Nucleic Acids Res 2022; 50:D1515-D1521. [PMID: 34986598 PMCID: PMC8728134 DOI: 10.1093/nar/gkab1025] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2021] [Revised: 10/12/2021] [Accepted: 10/18/2021] [Indexed: 11/12/2022] Open
Abstract
The Evidence and Conclusion Ontology (ECO) is a community resource that provides an ontology of terms used to capture the type of evidence that supports biomedical annotations and assertions. Consistent capture of evidence information with ECO allows tracking of annotation provenance, establishment of quality control measures, and evidence-based data mining. ECO is in use by dozens of data repositories and resources with both specific and general areas of focus. ECO is continually being expanded and enhanced in response to user requests as well as our aim to adhere to community best-practices for ontology development. The ECO support team engages in multiple collaborations with other ontologies and annotating groups. Here we report on recent updates to the ECO ontology itself as well as associated resources that are available through this project. ECO project products are freely available for download from the project website (https://evidenceontology.org/) and GitHub (https://github.com/evidenceontology/evidenceontology). ECO is released into the public domain under a CC0 1.0 Universal license.
Affiliation(s)
- Suvarna Nadendla
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Rebecca Jackson
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - James Munro
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Federica Quaglia
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | - Bálint Mészáros
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg 69117, Germany
| | - Dustin Olley
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Elizabeth T Hobbs
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
| | - Stephen M Goralski
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
| | - Marcus Chibucos
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
| | - Christopher John Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab, Berkeley, California, USA
| | | | - Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States
| | - Michelle G Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA
|
15
|
Abstract
Regeneration experiments can produce complex phenotypes including morphological outcomes and gene expression patterns that are crucial for the understanding of the mechanisms of regeneration. However, due to their inherent complexity, variability between individuals, and heterogeneous data spreading across the literature, extracting mechanistic knowledge from them is a current challenge. Toward this goal, here we present protocols to unambiguously formalize the phenotypes of regeneration and their experimental procedures using precise mathematical morphological descriptions and standardized gene expression patterns. We illustrate the application of the methodology with step-by-step protocols for planaria and limb regeneration phenotypes. The curated datasets with these methods are not only helpful for human scientists, but they represent a key formalized resource that can be easily integrated into downstream reverse engineering methodologies for the automatic extraction of mechanistic knowledge. This approach can pave the way for discovering comprehensive systems-level models of regeneration.
Affiliation(s)
- Daniel Lobo
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA.
|
16
|
Farrell CM, Goldfarb T, Rangwala SH, Astashyn A, Ermolaeva OD, Hem V, Katz KS, Kodali VK, Ludwig F, Wallin CL, Pruitt KD, Murphy TD. RefSeq Functional Elements as experimentally assayed nongenic reference standards and functional interactions in human and mouse. Genome Res 2022; 32:175-188. [PMID: 34876495 PMCID: PMC8744684 DOI: 10.1101/gr.275819.121] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2021] [Accepted: 12/02/2021] [Indexed: 11/25/2022]
Abstract
Eukaryotic genomes contain many nongenic elements that function in gene regulation, chromosome organization, recombination, repair, or replication, and mutation of those elements can affect genome function and cause disease. Although numerous epigenomic studies provide high coverage of gene regulatory regions, those data are not usually exposed in traditional genome annotation and can be difficult to access and interpret without field-specific expertise. The National Center for Biotechnology Information (NCBI) therefore provides RefSeq Functional Elements (RefSeqFEs), which represent experimentally validated human and mouse nongenic elements derived from the literature. The curated data set is comprised of richly annotated sequence records, descriptive records in the NCBI Gene database, reference genome feature annotation, and activity-based interactions between nongenic regions, target genes, and each other. The data set provides succinct functional details and transparent experimental evidence, leverages data from multiple experimental sources, is readily accessible and adaptable, and uses a flexible data model. The data have multiple uses for basic functional discovery, bioinformatics studies, genetic variant interpretation; as known positive controls for epigenomic data evaluation; and as reference standards for functional interactions. Comparisons to other gene regulatory data sets show that the RefSeqFE data set includes a wider range of feature types representing more areas of biology, but it is comparatively smaller and subject to data selection biases. RefSeqFEs thus provide an alternative and complementary resource for experimentally assayed functional elements, with future data set growth expected.
Affiliation(s)
- Catherine M Farrell
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Tamara Goldfarb
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Sanjida H Rangwala
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Alexander Astashyn
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Olga D Ermolaeva
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Vichet Hem
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Kenneth S Katz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Vamsi K Kodali
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Frank Ludwig
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Craig L Wallin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| |
Collapse
|
17
|
Zaitzeff A, Leiby N, Motta FC, Haase SB, Singer JM. Improved datasets and evaluation methods for the automatic prediction of DNA-binding proteins. Bioinformatics 2021; 38:44-51. [PMID: 34415301 DOI: 10.1093/bioinformatics/btab603] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/09/2021] [Revised: 08/04/2021] [Accepted: 08/18/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Accurate automatic annotation of protein function relies on both innovative models and robust datasets. Due to their importance in biological processes, the identification of DNA-binding proteins directly from protein sequence has been the focus of many studies. However, the datasets used to train and evaluate these methods have suffered from substantial flaws. We describe some of the weaknesses of the datasets used in previous DNA-binding protein literature and provide several new datasets addressing these problems. We suggest new evaluative benchmark tasks that more realistically assess real-world performance for protein annotation models. We propose a simple new model for the prediction of DNA-binding proteins and compare its performance on the improved datasets to two previously published models. In addition, we provide extensive tests showing how the best models predict across taxa.
RESULTS: Our new gradient boosting model, which uses features derived from a published protein language model, outperforms the earlier models. Perhaps surprisingly, so does a baseline nearest neighbor model using BLAST percent identity. We evaluate the sensitivity of these models to perturbations of DNA-binding regions and control regions of protein sequences. The successful data-driven models learn to focus on DNA-binding regions. When predicting across taxa, the best models are highly accurate across species in the same kingdom and can provide some information when predicting across kingdoms.
AVAILABILITY AND IMPLEMENTATION: The data and results for this article can be found at https://doi.org/10.5281/zenodo.5153906. The code for this article can be found at https://doi.org/10.5281/zenodo.5153683. The code, data and results can also be found at https://github.com/AZaitzeff/tools_for_dna_binding_proteins.
Affiliation(s)
- Nicholas Leiby
- Two Six Research, Two Six Technologies, Arlington, VA 22203, USA
- Francis C Motta
- Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA
- Steven B Haase
- Department of Biology, Duke University, Durham, NC 27708, USA
18
Dudek CA, Jahn D. PRODORIC: state-of-the-art database of prokaryotic gene regulation. Nucleic Acids Res 2021; 50:D295-D302. [PMID: 34850133 PMCID: PMC8728284 DOI: 10.1093/nar/gkab1110] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 09/15/2021] [Revised: 10/12/2021] [Accepted: 11/01/2021] [Indexed: 11/14/2022] Open
Abstract
PRODORIC is one of the largest collections worldwide of prokaryotic transcription factor binding sites from multiple bacterial sources, with corresponding interpretation and visualization tools. With the introduction of PRODORIC2 in 2017, the transition to a modern web interface and maintainable backend was started. With this latest PRODORIC release, the database backend is now fully API-based and provides programmatic access to the complete PRODORIC data. The visualization tools Genome Browser and ProdoNet from the original PRODORIC have been reintroduced and integrated into the PRODORIC website. Input and output options of the original Virtual Footprint that had been missing were added again for position weight matrix pattern-based searches. The whole PRODORIC dataset was reannotated. Every transcription factor binding site was re-evaluated to increase the overall database quality. During this process, additional parameters, such as bound effectors, regulation type, and different types of experimental evidence, have been added for every transcription factor. Additionally, 109 new transcription factors and 6 new organisms have been added. PRODORIC is publicly available at https://www.prodoric.de.
Affiliation(s)
- Christian-Alexander Dudek
- Institute of Microbiology and Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, Braunschweig D-38106, Germany
- Dieter Jahn
- Institute of Microbiology and Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, Braunschweig D-38106, Germany
19
Schriml LM, Munro JB, Schor M, Olley D, McCracken C, Felix V, Baron JA, Jackson R, Bello SM, Bearer C, Lichenstein R, Bisordi K, Dialo NC, Giglio M, Greene C. The Human Disease Ontology 2022 update. Nucleic Acids Res 2021; 50:D1255-D1261. [PMID: 34755882 PMCID: PMC8728220 DOI: 10.1093/nar/gkab1063] [Citation(s) in RCA: 127] [Impact Index Per Article: 31.8] [Received: 09/22/2021] [Revised: 10/13/2021] [Accepted: 10/18/2021] [Indexed: 01/31/2023] Open
Abstract
The Human Disease Ontology (DO) database (www.disease-ontology.org) has significantly expanded the disease content and enhanced our userbase and website since the DO’s 2018 Nucleic Acids Research DATABASE issue paper. Conservatively, based on available resource statistics, terms from the DO have been annotated to over 1.5 million biomedical data elements and citations, a 10× increase in the past 5 years. The DO, funded as an NHGRI Genomic Resource, plays a key role in disease knowledge organization, representation, and standardization, serving as a reference framework for multiscale biomedical data integration and analysis across thousands of clinical, biomedical and computational research projects and genomic resources around the world. This update reports on the addition of 1,793 new disease terms, a 14% increase in textual definitions, and the integration of 22,137 new SubClassOf axioms defining disease-to-disease connections representing the DO’s complex disease classification. The DO’s updated website provides multifaceted etiology searching, enhanced documentation and educational resources.
Affiliation(s)
- Lynn M Schriml
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- James B Munro
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Mike Schor
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Dustin Olley
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Carrie McCracken
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Victor Felix
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- J Allen Baron
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Susan M Bello
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
- Michelle Giglio
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
- Carol Greene
- University of Maryland School of Medicine, Baltimore, MD, USA
20
Hanspers K, Kutmon M, Coort SL, Digles D, Dupuis LJ, Ehrhart F, Hu F, Lopes EN, Martens M, Pham N, Shin W, Slenter DN, Waagmeester A, Willighagen EL, Winckers LA, Evelo CT, Pico AR. Ten simple rules for creating reusable pathway models for computational analysis and visualization. PLoS Comput Biol 2021; 17:e1009226. [PMID: 34411100 PMCID: PMC8375987 DOI: 10.1371/journal.pcbi.1009226] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/23/2022] Open
Affiliation(s)
- Kristina Hanspers
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, California, United States of America
- Martina Kutmon
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Susan L. Coort
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Daniela Digles
- Department of Pharmaceutical Sciences, Division of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Lauren J. Dupuis
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Friederike Ehrhart
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Finterly Hu
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Elisson N. Lopes
- Instituto de Ciencias Biologicas, Departamento de Bioquimica e Imunologia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Marvin Martens
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Nhung Pham
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Woosub Shin
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Denise N. Slenter
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Egon L. Willighagen
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Laurent A. Winckers
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Chris T. Evelo
- Department of Bioinformatics—BiGCaT, NUTRIM, Maastricht University, Maastricht, the Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, the Netherlands
- Alexander R. Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, California, United States of America
21
Nowotarski SH, Davies EL, Robb SMC, Ross EJ, Matentzoglu N, Doddihal V, Mir M, McClain M, Sánchez Alvarado A. Planarian Anatomy Ontology: a resource to connect data within and across experimental platforms. Development 2021; 148:271068. [PMID: 34318308 PMCID: PMC8353266 DOI: 10.1242/dev.196097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/18/2020] [Accepted: 06/28/2021] [Indexed: 12/23/2022]
Abstract
As the planarian research community expands, the need for an interoperable data organization framework for tool building has become increasingly apparent. Such software would streamline data annotation and enhance cross-platform and cross-species searchability. We created the Planarian Anatomy Ontology (PLANA), an extendable relational framework of defined Schmidtea mediterranea (Smed) anatomical terms used in the field. At publication, PLANA contains over 850 terms describing Smed anatomy from subcellular to system levels across all life cycle stages, in intact animals and regenerating body fragments. Terms from other anatomy ontologies were imported into PLANA to promote interoperability and comparative anatomy studies. To demonstrate the utility of PLANA as a tool for data curation, we created resources for planarian embryogenesis, including a staging series and molecular fate-mapping atlas, and the Planarian Anatomy Gene Expression database, which allows retrieval of a variety of published transcript/gene expression data associated with PLANA terms. As an open-source tool built using FAIR (findable, accessible, interoperable, reusable) principles, our strategy for continued curation and versioning of PLANA also provides a platform for community-led growth and evolution of this resource. Summary: Description of the construction of an anatomy ontology tool for planaria with examples of its potential use to curate and mine data across multiple experimental platforms.
Affiliation(s)
- Stephanie H Nowotarski
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Erin L Davies
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA; Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
- Sofia M C Robb
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Eric J Ross
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Nicolas Matentzoglu
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Viraj Doddihal
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Mol Mir
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Melainia McClain
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Alejandro Sánchez Alvarado
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
22
Duek P, Mary C, Zahn-Zabal M, Bairoch A, Lane L. Functionathon: a manual data mining workflow to generate functional hypotheses for uncharacterized human proteins and its application by undergraduate students. Database (Oxford) 2021; 2021:baab046. [PMID: 34318869 PMCID: PMC8317215 DOI: 10.1093/database/baab046] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/07/2020] [Revised: 07/06/2021] [Accepted: 07/12/2021] [Indexed: 12/11/2022]
Abstract
About 10% of human proteins have no annotated function in protein knowledge bases. A workflow to generate hypotheses for the function of these uncharacterized proteins has been developed, based on predicted and experimental information on protein properties, interactions, tissue expression, subcellular localization, conservation in other organisms, as well as phenotypic data in mutant model organisms. This workflow has been applied to seven uncharacterized human proteins (C6orf118, C7orf25, CXorf58, RSRP1, SMLR1, TMEM53 and TMEM232) as part of a course-based undergraduate research experience named Functionathon, organized at the University of Geneva to teach undergraduate students how to use biological databases and bioinformatics tools and interpret the results. C6orf118, CXorf58 and TMEM232 were proposed to be involved in cilia-related functions; TMEM53 and SMLR1 were proposed to be involved in lipid metabolism; and C7orf25 and RSRP1 were proposed to be involved in RNA metabolism and gene expression. Experimental strategies to test these hypotheses were also discussed. The results of this manual data mining study may contribute to the project recently launched by the Human Proteome Organization (HUPO) Human Proteome Project aiming to fill gaps in the functional annotation of human proteins. Database URL: http://www.nextprot.org.
Affiliation(s)
- Paula Duek
- CALIPHO group, SIB Swiss Institute of Bioinformatics
- Department of microbiology and molecular medicine, Faculty of medicine, University of Geneva, Geneva, Switzerland
- Camille Mary
- Department of microbiology and molecular medicine, Faculty of medicine, University of Geneva, Geneva, Switzerland
- Amos Bairoch
- CALIPHO group, SIB Swiss Institute of Bioinformatics
- Department of microbiology and molecular medicine, Faculty of medicine, University of Geneva, Geneva, Switzerland
- Lydie Lane
- CALIPHO group, SIB Swiss Institute of Bioinformatics
- Department of microbiology and molecular medicine, Faculty of medicine, University of Geneva, Geneva, Switzerland
23
Zhang W, Zeng B, Lin H, Guan W, Mo J, Wu S, Wei Y, Zhang Q, Yu D, Li W, Chan GCF. CanImmunother: a manually curated database for identification of cancer immunotherapies associating with biomarkers, targets, and clinical effects. Oncoimmunology 2021; 10:1944553. [PMID: 34345532 PMCID: PMC8288037 DOI: 10.1080/2162402x.2021.1944553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2020] [Revised: 06/12/2021] [Accepted: 06/15/2021] [Indexed: 12/01/2022] Open
Abstract
As immunotherapy is evolving into an essential armamentarium against cancers, numerous translational studies associated with relevant biomarkers, targets, and clinical effects have been reported in recent years. However, a large amount of associated experimental data remains unexplored because it is difficult to access and use. Here, we established a comprehensive high-quality database for cancer immunotherapy called CanImmunother (http://www.biomedical-web.com/cancerit/) through manual curation of 4515 publications. CanImmunother contains 3267 experimentally validated associations between 218 cancer sub-types across 34 body parts and 484 immunotherapies with 642 biomarkers, 108 targets, and 121 control therapies. Each association was manually curated by professional curators, enriched with annotation and cross references, and assigned an association score for prioritization. To help clinicians and researchers identify and discover better cancer immunotherapies and their respective biomarkers and targets, CanImmunother offers user-friendly web applications including search, browse, Excel table, association prioritization, and network visualization. CanImmunother presents a landscape of experimental cancer immunotherapy association data, serving as a useful resource to improve our insight and to facilitate further discovery of advanced immunotherapy options for cancer patients.
Affiliation(s)
- Wenliang Zhang
- Department of Pediatrics, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Chinese Academy of Sciences, Shenzhen Institute of Advanced Technology, Shenzhen, Guangdong, China
- Department of Bioinformatics, Outstanding Biotechnology Co., Ltd.-Shenzhen, Shenzhen, China
- Binghui Zeng
- Guangdong Provincial Key Laboratory of Stomatology, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Huancai Lin
- Guangdong Provincial Key Laboratory of Stomatology, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Wen Guan
- Department of Bioinformatics, Outstanding Biotechnology Co., Ltd.-Shenzhen, Shenzhen, China
- Guangdong Key Laboratory of Animal Conservation and Resource Utilization, Institute of Zoology, Guangdong Academy of Sciences, Guangzhou, China
- Jing Mo
- Department of Bioinformatics, Outstanding Biotechnology Co., Ltd.-Shenzhen, Shenzhen, China
- Song Wu
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Yanjie Wei
- Chinese Academy of Sciences, Shenzhen Institute of Advanced Technology, Shenzhen, Guangdong, China
- Center for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- CAS Key Laboratory of Health Informatics, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Qianshen Zhang
- Department of Pediatrics, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Dongsheng Yu
- Guangdong Provincial Key Laboratory of Stomatology, Guanghua School of Stomatology, Hospital of Stomatology, Sun Yat-sen University, Guangzhou, China
- Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control of Ministry of Education, Sun Yat-sen University, Guangzhou, China
- Godfrey Chi-Fung Chan
- Department of Pediatrics, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Department of Pediatrics and Adolescent Medicine, Faculty of Medicine, The University of Hong Kong, Hong Kong
24
Hobbs ET, Goralski SM, Mitchell A, Simpson A, Leka D, Kotey E, Sekira M, Munro JB, Nadendla S, Jackson R, Gonzalez-Aguirre A, Krallinger M, Giglio M, Erill I. ECO-CollecTF: A Corpus of Annotated Evidence-Based Assertions in Biomedical Manuscripts. Front Res Metr Anal 2021; 6:674205. [PMID: 34327299 PMCID: PMC8313968 DOI: 10.3389/frma.2021.674205] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/28/2021] [Accepted: 06/28/2021] [Indexed: 11/20/2022] Open
Abstract
Analysis of high-throughput experiments in the life sciences frequently relies upon standardized information about genes, gene products, and other biological entities. To provide this information, expert curators are increasingly relying on text mining tools to identify, extract and harmonize statements from biomedical journal articles that discuss findings of interest. For determining reliability of the statements, curators need the evidence used by the authors to support their assertions. It is important to annotate the evidence directly used by authors to qualify their findings rather than simply annotating mentions of experimental methods without the context of what findings they support. Text mining tools require tuning and adaptation to achieve accurate performance. Many annotated corpora exist to enable developing and tuning text mining tools; however, none currently provides annotations of evidence based on the extensive and widely used Evidence and Conclusion Ontology. We present the ECO-CollecTF corpus, a novel, freely available, biomedical corpus of 84 documents that captures high-quality, evidence-based statements annotated with the Evidence and Conclusion Ontology.
Affiliation(s)
- Elizabeth T Hobbs
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Stephen M Goralski
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Ashley Mitchell
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Andrew Simpson
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Dorjan Leka
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Emmanuel Kotey
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- Matt Sekira
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
- James B Munro
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Suvarna Nadendla
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Rebecca Jackson
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Martin Krallinger
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; Centro Nacional de Investigaciones Oncológicas (CNIO), Madrid, Spain
- Michelle Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, United States
- Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, United States
25
Cao Y, Dong Q, Wang D, Liu Y, Zhang P, Yu X, Niu C. TIDB: a comprehensive database of trained immunity. Database (Oxford) 2021; 2021:6318070. [PMID: 34244719 PMCID: PMC8271126 DOI: 10.1093/database/baab041] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/07/2020] [Revised: 06/21/2021] [Accepted: 06/25/2021] [Indexed: 11/19/2022]
Abstract
Trained immunity is a newly emerging concept that defines the ability of the innate immune system to form immune memory and provide long-lasting protection against previously encountered antigens. Accumulating evidence reveals that trained immunity not only has broad benefits to host defense but is also harmful to the host in chronic inflammatory diseases. However, trained immunity-related information is scattered across the literature and thus is difficult to access. Here, we describe Trained Immunity DataBase (TIDB), a comprehensive database that provides well-studied trained immunity-related genes from human, rat and mouse as well as the related literature evidence. Moreover, TIDB also provides three modules to analyze the function of the trained immunity-related genes of interest, including Reactome pathway over-representation analysis, Gene Ontology enrichment analysis and protein–protein interaction subnetwork reconstruction. We believe TIDB will help develop valuable strategies for vaccine design and immune-mediated disease therapy. Database URL: http://www.ieom-tm.com/tidb
Affiliation(s)
- Yang Cao
- Department of Environmental Medicine, Tianjin Institute of Environmental and Operational Medicine, No.1 Dali Road, Heping District, Tianjin 300050, China
- Qingyang Dong
- Department of Environmental Medicine, Tianjin Institute of Environmental and Operational Medicine, No.1 Dali Road, Heping District, Tianjin 300050, China
- Dan Wang
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No.38 Life Science Park Road, Changping District, Beijing 102206, China
- Ying Liu
- Department of Environmental Medicine, Tianjin Institute of Environmental and Operational Medicine, No.1 Dali Road, Heping District, Tianjin 300050, China
- Pengcheng Zhang
- Department of Environmental Medicine, Tianjin Institute of Environmental and Operational Medicine, No.1 Dali Road, Heping District, Tianjin 300050, China
- Xiaobo Yu
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, No.38 Life Science Park Road, Changping District, Beijing 102206, China
- Chao Niu
- Department of Environmental Medicine, Tianjin Institute of Environmental and Operational Medicine, No.1 Dali Road, Heping District, Tianjin 300050, China
26
Kramarz B, Huntley RP, Rodríguez-López M, Roncaglia P, Saverimuttu SCC, Parkinson H, Bandopadhyay R, Martin MJ, Orchard S, Hooper NM, Brough D, Lovering RC. Gene Ontology Curation of Neuroinflammation Biology Improves the Interpretation of Alzheimer's Disease Gene Expression Data. J Alzheimers Dis 2021; 75:1417-1435. [PMID: 32417785 PMCID: PMC7369085 DOI: 10.3233/jad-200207] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/17/2022]
Abstract
BACKGROUND: Gene Ontology (GO) is a major bioinformatic resource used for analysis of large biomedical datasets, for example from genome-wide association studies, applied universally across biological fields, including Alzheimer's disease (AD) research.
OBJECTIVE: We aim to demonstrate the applicability of GO for interpretation of AD datasets to improve the understanding of the underlying molecular disease mechanisms, including the involvement of inflammatory pathways and dysregulated microRNAs (miRs).
METHODS: We have undertaken a systematic full article GO annotation approach focused on microglial proteins implicated in AD and the miRs regulating their expression. PANTHER was used for enrichment analysis of previously published AD data. Cytoscape was used for visualizing and analyzing miR-target interactions captured from published experimental evidence.
RESULTS: We contributed 3,084 new annotations for 494 entities, i.e., on average six new annotations per entity. This included a total of 1,352 annotations for 40 prioritized microglial proteins implicated in AD and 66 miRs regulating their expression, yielding an average of twelve annotations per prioritized entity. The updated GO resource was then used to re-analyze previously published data. The re-analysis showed novel processes associated with AD-related genes, not identified in the original study, such as 'gliogenesis', 'regulation of neuron projection development', or 'response to cytokine', demonstrating enhanced applicability of GO for neuroscience research.
CONCLUSIONS: This study highlights ongoing development of the neurobiological aspects of GO and demonstrates the value of biocuration activities in the area, thus helping to delineate the molecular bases of AD to aid the development of diagnostic tools and treatments.
Collapse
Affiliation(s)
- Barbara Kramarz
- Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London, UK
- Rachael P Huntley
- Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London, UK
- Milagros Rodríguez-López
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Paola Roncaglia
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Shirin C C Saverimuttu
- Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London, UK
- Helen Parkinson
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Rina Bandopadhyay
- UCL Institute of Neurology and Reta Lila Weston Institute of Neurological Studies, University College London, London, UK
- Maria-Jesus Martin
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Nigel M Hooper
- Division of Neuroscience and Experimental Psychology, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK
- David Brough
- Division of Neuroscience and Experimental Psychology, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester, UK
- Ruth C Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London, UK
27
Meldal BHM, Pons C, Perfetto L, Del-Toro N, Wong E, Aloy P, Hermjakob H, Orchard S, Porras P. Analysing the yeast complexome-the Complex Portal rising to the challenge. Nucleic Acids Res 2021; 49:3156-3167. [PMID: 33677561 PMCID: PMC8034636 DOI: 10.1093/nar/gkab077] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/03/2020] [Revised: 01/22/2021] [Accepted: 01/27/2021] [Indexed: 02/06/2023] Open
Abstract
The EMBL-EBI Complex Portal is a knowledgebase of macromolecular complexes providing persistent stable identifiers. Entries are linked to literature evidence and provide details of complex membership, function, structure and complex-specific Gene Ontology annotations. Data are freely available and downloadable in HUPO-PSI community standards and missing entries can be requested for curation. In collaboration with Saccharomyces Genome Database and UniProt, the yeast complexome, a compendium of all known heteromeric assemblies from the model organism Saccharomyces cerevisiae, was curated. This expansion of knowledge and scope has led to a 50% increase in curated complexes compared to the previously published dataset, CYC2008. The yeast complexome is used as a reference resource for the analysis of complexes from large-scale experiments. Our analysis showed that genes coding for proteins in complexes tend to have more genetic interactions, are co-expressed with more genes, are more multifunctional, localize more often in the nucleus, and are more often involved in nucleic acid-related metabolic processes and processes where large machineries are the predominant functional drivers. A comparison to genetic interactions showed that about 40% of expanded co-complex pairs also have genetic interactions, suggesting strong functional links between complex members.
Affiliation(s)
- Birgit H M Meldal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, 08028 Barcelona, Catalonia, Spain
- Livia Perfetto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Noemi Del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Edith Wong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305-5477, USA
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, 08028 Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Catalonia, Spain
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
28
Good BM, Van Auken K, Hill DP, Mi H, Carbon S, Balhoff JP, Albou LP, Thomas PD, Mungall CJ, Blake JA, D'Eustachio P. Reactome and the Gene Ontology: Digital convergence of data resources. Bioinformatics 2021; 37:3343-3348. [PMID: 33964129 PMCID: PMC8504636 DOI: 10.1093/bioinformatics/btab325] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/23/2020] [Revised: 03/18/2021] [Accepted: 04/27/2021] [Indexed: 12/22/2022] Open
Abstract
Motivation: Gene Ontology Causal Activity Models (GO-CAMs) assemble individual associations of gene products with cellular components, molecular functions and biological processes into causally linked activity flow models. Pathway databases such as the Reactome Knowledgebase create detailed molecular process descriptions of reactions and assemble them, based on sharing of entities between individual reactions into pathway descriptions. Results: To convert the rich content of Reactome into GO-CAMs, we have developed a software tool, Pathways2GO, to convert the entire set of normal human Reactome pathways into GO-CAMs. This conversion yields standard GO annotations from Reactome content and supports enhanced quality control for both Reactome and GO, yielding a nearly seamless conversion between these two resources for the bioinformatics community. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Benjamin M Good
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley CA 94720 USA
- Kimberly Van Auken
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena CA 91125 USA
- Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles CA 90033 USA
- Seth Carbon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley CA 94720 USA
- James P Balhoff
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517 USA
- Laurent-Philippe Albou
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles CA 90033 USA
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles CA 90033 USA
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley CA 94720 USA
- Peter D'Eustachio
- Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York NY 10016 USA
29
Touré V, Vercruysse S, Acencio ML, Lovering RC, Orchard S, Bradley G, Casals-Casas C, Chaouiya C, Del-Toro N, Flobak Å, Gaudet P, Hermjakob H, Hoyt CT, Licata L, Lægreid A, Mungall CJ, Niknejad A, Panni S, Perfetto L, Porras P, Pratt D, Saez-Rodriguez J, Thieffry D, Thomas PD, Türei D, Kuiper M. The Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). Bioinformatics 2021; 36:5712-5718. [PMID: 32637990 PMCID: PMC8023674 DOI: 10.1093/bioinformatics/btaa622] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/10/2020] [Revised: 06/06/2020] [Accepted: 06/30/2020] [Indexed: 12/30/2022] Open
Abstract
Motivation: A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called ‘causal interaction’ takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g. gene regulation) need to be described with a careful appreciation of the underlying molecular reactions. A proper description of this information enables archiving, sharing and reuse by humans and for automated computational processing. Various representations of causal relationships between biological components are currently used in a variety of resources. Results: Here, we propose a checklist that accommodates current representations, called the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information, as well as a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while fostering uniformity and interoperability of the data across resources. Availability and implementation: The checklist together with examples is accessible at https://github.com/MI2CAST/MI2CAST. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vasundra Touré
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
- Steven Vercruysse
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
- Marcio Luis Acencio
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
- Ruth C Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, UCL, University College London, London WC1E 6JF, UK
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Glyn Bradley
- Computational Biology, Functional Genomics, GSK, Stevenage SG1 2NY, UK
- Claudine Chaouiya
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M Marseille 13331, France
- Noemi Del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Åsmund Flobak
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
- The Cancer Clinic, St. Olav's Hospital, Trondheim University Hospital, Trondheim 7030, Norway
- Pascale Gaudet
- SIB Swiss Institute of Bioinformatics, Geneva 1211, Switzerland
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Luana Licata
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 00133 Rome, Italy
- Astrid Lægreid
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Anne Niknejad
- Vital-IT Group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, Amphipole Building, 1015 Lausanne, Switzerland
- Simona Panni
- Department of Biology, Ecology and Earth Sciences, University of Calabria, Ecology and Earth Science, Via Pietro Bucci Cubo 6/C, Rende 87036, CS, Italy
- Livia Perfetto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Dexter Pratt
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Julio Saez-Rodriguez
- Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, 69120 Heidelberg, Germany
- Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Faculty of Medicine, RWTH Aachen University, Aachen 52062, Germany
- Denis Thieffry
- Institut de Biologie de l'ENS (IBENS), Département de Biologie, École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90007, USA
- Dénes Türei
- Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Faculty of Medicine, RWTH Aachen University, Aachen 52062, Germany
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim 7491, Norway
30
Renz A, Widerspick L, Dräger A. First Genome-Scale Metabolic Model of Dolosigranulum pigrum Confirms Multiple Auxotrophies. Metabolites 2021; 11:232. [PMID: 33918864 PMCID: PMC8069353 DOI: 10.3390/metabo11040232] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/18/2021] [Revised: 03/21/2021] [Accepted: 04/06/2021] [Indexed: 12/11/2022] Open
Abstract
Dolosigranulum pigrum is a quite recently discovered Gram-positive coccus. It has gained increasing attention due to its negative correlation with Staphylococcus aureus, which is one of the most successful modern pathogens causing severe infections with tremendous morbidity and mortality due to its multiple resistances. As the possible mechanisms behind its inhibition of S. aureus remain unclear, a genome-scale metabolic model (GEM) is of enormous interest and high importance to better study its role in this fight. This article presents the first GEM of D. pigrum, which was curated using automated reconstruction tools and extensive manual curation steps to yield a high-quality GEM. It was evaluated and validated using all currently available experimental data of D. pigrum. With this model, already predicted auxotrophies and biosynthetic pathways could be verified. The model was used to define a minimal medium for further laboratory experiments and to predict various carbon sources' growth capacities. This model will pave the way to better understand D. pigrum's role in the fight against S. aureus.
Affiliation(s)
- Alina Renz
- Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, 72076 Tübingen, Germany
- Department of Computer Science, University of Tübingen, 72076 Tübingen, Germany
- Cluster of Excellence ‘Controlling Microbes to Fight Infections’, University of Tübingen, 72076 Tübingen, Germany
- Lina Widerspick
- Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, 72076 Tübingen, Germany
- Andreas Dräger
- Computational Systems Biology of Infections and Antimicrobial-Resistant Pathogens, Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, 72076 Tübingen, Germany
- Department of Computer Science, University of Tübingen, 72076 Tübingen, Germany
- Cluster of Excellence ‘Controlling Microbes to Fight Infections’, University of Tübingen, 72076 Tübingen, Germany
- German Center for Infection Research (DZIF), Partner site Tübingen, 72076 Tübingen, Germany
31
Feuermann M, Boutet E, Morgat A, Axelsen KB, Bansal P, Bolleman J, de Castro E, Coudert E, Gasteiger E, Géhant S, Lieberherr D, Lombardot T, Neto TB, Pedruzzi I, Poux S, Pozzato M, Redaschi N, Bridge A. Diverse Taxonomies for Diverse Chemistries: Enhanced Representation of Natural Product Metabolism in UniProtKB. Metabolites 2021; 11:48. [PMID: 33445429 PMCID: PMC7827101 DOI: 10.3390/metabo11010048] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/19/2020] [Revised: 01/05/2021] [Accepted: 01/07/2021] [Indexed: 01/28/2023] Open
Abstract
The UniProt Knowledgebase UniProtKB is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study of natural products and for their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions, that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.
Affiliation(s)
- Marc Feuermann
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Emmanuel Boutet
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Anne Morgat
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Kristian B. Axelsen
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Parit Bansal
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Jerven Bolleman
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Edouard de Castro
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Elisabeth Coudert
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Elisabeth Gasteiger
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Sébastien Géhant
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Damien Lieberherr
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Thierry Lombardot
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Teresa B. Neto
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Ivo Pedruzzi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Sylvain Poux
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Monica Pozzato
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Nicole Redaschi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- Alan Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- on behalf of the UniProt Consortium
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel-Servet, CH-1211 Geneva 4, Switzerland
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Protein Information Resource, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA
- Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven Street NorthWest, Suite 1200, Washington, DC 20007, USA
32
Carbon S, Douglass E, Good BM, Unni DR, Harris NL, Mungall CJ, Basu S, Chisholm RL, Dodson RJ, Hartline E, Fey P, Thomas PD, Albou LP, Ebert D, Kesling MJ, Mi H, Muruganujan A, Huang X, Mushayahama T, LaBonte SA, Siegele DA, Antonazzo G, Attrill H, Brown NH, Garapati P, Marygold SJ, Trovisco V, dos Santos G, Falls K, Tabone C, Zhou P, Goodman JL, Strelets VB, Thurmond J, Garmiri P, Ishtiaq R, Rodríguez-López M, Acencio ML, Kuiper M, Lægreid A, Logie C, Lovering RC, Kramarz B, Saverimuttu SCC, Pinheiro SM, Gunn H, Su R, Thurlow KE, Chibucos M, Giglio M, Nadendla S, Munro J, Jackson R, Duesbury MJ, Del-Toro N, Meldal BHM, Paneerselvam K, Perfetto L, Porras P, Orchard S, Shrivastava A, Chang HY, Finn RD, Mitchell AL, Rawlings ND, Richardson L, Sangrador-Vegas A, Blake JA, Christie KR, Dolan ME, Drabkin HJ, Hill DP, Ni L, Sitnikov DM, Harris MA, Oliver SG, Rutherford K, Wood V, Hayles J, Bähler J, Bolton ER, De Pons JL, Dwinell MR, Hayman GT, Kaldunski ML, Kwitek AE, Laulederkind SJF, Plasterer C, Tutaj MA, Vedi M, Wang SJ, D’Eustachio P, Matthews L, Balhoff JP, Aleksander SA, Alexander MJ, Cherry JM, Engel SR, Gondwe F, Karra K, Miyasato SR, Nash RS, Simison M, Skrzypek MS, Weng S, Wong ED, Feuermann M, Gaudet P, Morgat A, Bakker E, Berardini TZ, Reiser L, Subramaniam S, Huala E, Arighi CN, Auchincloss A, Axelsen K, Argoud-Puy G, Bateman A, Blatter MC, Boutet E, Bowler E, Breuza L, Bridge A, Britto R, Bye-A-Jee H, Casas CC, Coudert E, Denny P, Estreicher A, Famiglietti ML, Georghiou G, Gos A, Gruaz-Gumowski N, Hatton-Ellis E, Hulo C, Ignatchenko A, Jungo F, Laiho K, Le Mercier P, Lieberherr D, Lock A, Lussi Y, MacDougall A, Magrane M, Martin MJ, Masson P, Natale DA, Hyka-Nouspikel N, Orchard S, Pedruzzi I, Pourcel L, Poux S, Pundir S, Rivoire C, Speretta E, Sundaram S, Tyagi N, Warner K, Zaru R, Wu CH, Diehl AD, Chan JN, Grove C, Lee RYN, Muller HM, Raciti D, Van Auken K, Sternberg PW, Berriman M, Paulini M, Howe K, Gao S, Wright A, Stein L, Howe DG, Toro S, Westerfield M, Jaiswal P, Cooper L, Elser J. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res 2021; 49:D325-D334. [PMID: 33290552 PMCID: PMC7779012 DOI: 10.1093/nar/gkaa1113] [Citation(s) in RCA: 2172] [Impact Index Per Article: 543.0] [Received: 09/15/2020] [Revised: 10/22/2020] [Accepted: 12/02/2020] [Indexed: 12/28/2022] Open
Abstract
The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings, and, to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
33
Egorova KS, Smirnova NS, Toukach PV. CSDB_GT, a curated glycosyltransferase database with close-to-full coverage on three most studied nonanimal species. Glycobiology 2020; 31:524-529. [PMID: 33242091 DOI: 10.1093/glycob/cwaa107] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/03/2020] [Revised: 11/13/2020] [Accepted: 11/18/2020] [Indexed: 11/13/2022] Open
Abstract
We report the accomplishment of the first stage of the development of a novel manually curated database on glycosyltransferase (GT) activities, CSDB_GT. CSDB_GT (http://csdb.glycoscience.ru/gt.html) has been supplemented with GT activities from Saccharomyces cerevisiae. Now it provides the close-to-complete coverage on experimentally confirmed GTs from the three most studied model organisms from the three kingdoms: plantae (Arabidopsis thaliana, ca. 930 activities), bacteria (Escherichia coli, ca. 820 activities) and fungi (S. cerevisiae, ca. 270 activities).
Affiliation(s)
- Ksenia S Egorova
- Laboratory of Metal-Complex and Nano-Scale Catalysts, N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky prospect 47, Moscow 119991, Russia
- Nadezhda S Smirnova
- Kurnakov Institute of General and Inorganic Chemistry, Russian Academy of Sciences, Leninsky prospect 31, Moscow 119991, Russia
- Philip V Toukach
- Laboratory of Carbohydrate Chemistry, N.D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Leninsky prospect 47, Moscow 119991, Russia
34
MacDougall A, Volynkin V, Saidi R, Poggioli D, Zellner H, Hatton-Ellis E, Joshi V, O’Donovan C, Orchard S, Auchincloss AH, Baratin D, Bolleman J, Coudert E, de Castro E, Hulo C, Masson P, Pedruzzi I, Rivoire C, Arighi C, Wang Q, Chen C, Huang H, Garavelli J, Vinayaka CR, Yeh LS, Natale DA, Laiho K, Martin MJ, Renaux A, Pichler K. UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase. Bioinformatics 2020; 36:4643-4648. [PMID: 32399560 PMCID: PMC7750954 DOI: 10.1093/bioinformatics/btaa485] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/30/2020] [Revised: 04/13/2020] [Accepted: 05/05/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run the rules, are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.
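The rule-based propagation described in this abstract can be illustrated with a minimal Python sketch. It is not UniRule or UniFIRE code: the `Rule` and `ProteinRecord` classes, the `propagate` function, the signature labels, and the example records are all hypothetical, and real rules carry far richer conditions than a signature set plus one taxonomic constraint.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical curated rule: signatures + taxon constraint -> annotation."""
    rule_id: str
    required_signatures: frozenset
    taxon: str
    annotation: str


@dataclass
class ProteinRecord:
    """Hypothetical protein record with family signatures and a lineage."""
    accession: str
    signatures: set
    lineage: list
    reviewed: bool = False
    annotations: list = field(default_factory=list)


def rule_matches(rule, record):
    # A record qualifies when it carries every required signature
    # and its lineage satisfies the taxonomic constraint.
    return (rule.required_signatures <= record.signatures
            and rule.taxon in record.lineage)


def propagate(rules, records):
    """Attach annotations to unreviewed records; return how many were annotated."""
    annotated = 0
    for record in records:
        if record.reviewed:
            continue  # reviewed entries already carry curated annotation
        hit = False
        for rule in rules:
            if rule_matches(rule, record):
                record.annotations.append((rule.rule_id, rule.annotation))
                hit = True
        annotated += hit
    return annotated


# Made-up example data (identifiers are placeholders, not real entries).
rules = [Rule("RULE_1", frozenset({"SIG_A", "SIG_B"}), "Bacteria", "example function")]
records = [
    ProteinRecord("P_UNREV_1", {"SIG_A", "SIG_B", "SIG_C"}, ["Bacteria", "Firmicutes"]),
    ProteinRecord("P_UNREV_2", {"SIG_A"}, ["Bacteria"]),
    ProteinRecord("P_UNREV_3", {"SIG_A", "SIG_B"}, ["Eukaryota"]),
]
count = propagate(rules, records)
```

In this toy data only the first record satisfies both the signature and the taxonomic condition, so it alone receives the annotation.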
Affiliation(s)
- Alistair MacDougall
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Vladimir Volynkin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Rabie Saidi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Diego Poggioli
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kantar Consulting, Casalecchio Di Reno, 40033 Bologna, Italy
| | - Hermann Zellner
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Emma Hatton-Ellis
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Vishal Joshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Claire O’Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrea H Auchincloss
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Delphine Baratin
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Jerven Bolleman
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Elisabeth Coudert
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Edouard de Castro
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Chantal Hulo
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Patrick Masson
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Ivo Pedruzzi
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Catherine Rivoire
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Cecilia Arighi
- Protein Information Resource, University of Delaware, Newark, DE 19711, USA
| | - Qinghua Wang
- Protein Information Resource, University of Delaware, Newark, DE 19711, USA
| | - Chuming Chen
- Protein Information Resource, University of Delaware, Newark, DE 19711, USA
| | - Hongzhan Huang
- Protein Information Resource, University of Delaware, Newark, DE 19711, USA
| | - John Garavelli
- Protein Information Resource, University of Delaware, Newark, DE 19711, USA
| | - C R Vinayaka
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Lai-Su Yeh
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Kati Laiho
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alexandre Renaux
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Klemens Pichler
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
35
|
Arnaud E, Laporte MA, Kim S, Aubert C, Leonelli S, Miro B, Cooper L, Jaiswal P, Kruseman G, Shrestha R, Buttigieg PL, Mungall CJ, Pietragalla J, Agbona A, Muliro J, Detras J, Hualla V, Rathore A, Das RR, Dieng I, Bauchet G, Menda N, Pommier C, Shaw F, Lyon D, Mwanzia L, Juarez H, Bonaiuti E, Chiputwa B, Obileye O, Auzoux S, Yeumo ED, Mueller LA, Silverstein K, Lafargue A, Antezana E, Devare M, King B. The Ontologies Community of Practice: A CGIAR Initiative for Big Data in Agrifood Systems. Patterns (New York, N.Y.) 2020; 1:100105. [PMID: 33205138 PMCID: PMC7660444 DOI: 10.1016/j.patter.2020.100105] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/06/2020] [Revised: 05/28/2020] [Accepted: 08/24/2020] [Indexed: 12/15/2022]
Abstract
Heterogeneous and multidisciplinary data generated by research on sustainable global agriculture and agrifood systems requires quality data labeling or annotation in order to be interoperable. As recommended by the FAIR principles, data, labels, and metadata must use controlled vocabularies and ontologies that are popular in the knowledge domain and commonly used by the community. Despite the existence of robust ontologies in the Life Sciences, there is currently no comprehensive set of ontologies recommended for data annotation across agricultural research disciplines. In this paper, we discuss the added value of the Ontologies Community of Practice (CoP) of the CGIAR Platform for Big Data in Agriculture for harnessing relevant expertise in ontology development and identifying innovative solutions that support quality data annotation. The Ontologies CoP stimulates knowledge sharing among stakeholders, such as researchers, data managers, domain experts, experts in ontology design, and platform development teams.
Affiliation(s)
- Elizabeth Arnaud
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
| | - Marie-Angélique Laporte
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
| | - Soonho Kim
- Markets, Trade and Institutions Division (MTID), International Food Policy Research Institute (IFPRI), Washington, DC, USA
| | - Céline Aubert
- Environment and Production Technology Division (EPTD), International Food Policy Research Institute (IFPRI), Washington, DC, USA
| | - Sabina Leonelli
- Department of Sociology, Philosophy and Anthropology & Exeter Centre for the Study of the Life Sciences (Egenis), University of Exeter, Exeter, UK
| | - Berta Miro
- Agrifood Policy Platform, International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines
| | - Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
| | - Gideon Kruseman
- Socio-Economics Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of México, Mexico
| | - Rosemary Shrestha
- Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of México, México
| | - Pier Luigi Buttigieg
- Helmholtz Metadata Collaboration, GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany
| | - Christopher J. Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | | | - Afolabi Agbona
- Cassava Breeding Program, International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
| | | | - Jeffrey Detras
- Bioinformatics Cluster, Strategic Innovation Platform, International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines
| | - Vilma Hualla
- Research Informatics Unit (RIU), International Potato Center (CIP), Lima, Peru
| | - Abhishek Rathore
- Statistics, Bioinformatics & Data Management (SBDM) Theme, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
| | - Roma Rani Das
- Statistics, Bioinformatics & Data Management (SBDM) Theme, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
| | - Ibnou Dieng
- Biometrics Unit, International Institute of Tropical Agriculture (IITA), Ibadan, Oyo State, Nigeria
| | - Guillaume Bauchet
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | - Naama Menda
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | - Cyril Pommier
- BioinfOmics, Plant Bioinformatics Facility, Université Paris-Saclay, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Versailles, France
| | - Felix Shaw
- Digital Biology, Earlham Institute, Norwich, Norfolk, UK
| | - David Lyon
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | - Leroy Mwanzia
- Performance, Innovation and Strategic Analysis, International Center for Tropical Agriculture (CIAT), Regional Office for Africa, Nairobi, Kenya
| | - Henry Juarez
- Research Informatics Unit (RIU), International Potato Center (CIP), Lima, Peru
| | - Enrico Bonaiuti
- Monitoring, Evaluation and Learning Team, International Center for Agricultural Research in the Dry Areas (ICARDA), Beirut, Lebanon
| | - Brian Chiputwa
- Research Methods Group (RMG), World Agroforestry (ICRAF), Nairobi, Kenya
| | - Olatunbosun Obileye
- Data Management Section, International Institute of Tropical Agriculture (IITA), Ibadan, Oyo State, Nigeria
| | - Sandrine Auzoux
- UPR AIDA, The French Agricultural Research Centre for International Development (CIRAD), Sainte-Clotilde, Réunion, France
- Université de Montpellier, Montpellier, France
| | - Esther Dzalé Yeumo
- Unité Délégation à l’Information Scientifique et Technique - DIST, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Versailles, France
| | - Lukas A. Mueller
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
| | | | | | - Erick Antezana
- Bayer Crop Science SA-NV, Diegem, Belgium
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - Medha Devare
- Environment and Production Technology Division (EPTD), International Food Policy Research Institute (IFPRI), Washington, DC, USA
| | - Brian King
- CGIAR Platform for Big Data in Agriculture, International Center for Tropical Agriculture (CIAT), Cali, Colombia
|
36
|
Wood V, Carbon S, Harris MA, Lock A, Engel SR, Hill DP, Van Auken K, Attrill H, Feuermann M, Gaudet P, Lovering RC, Poux S, Rutherford KM, Mungall CJ. Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns. Open Biol 2020; 10:200149. [PMID: 32875947 PMCID: PMC7536087 DOI: 10.1098/rsob.200149] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/22/2020] [Accepted: 08/06/2020] [Indexed: 12/11/2022]
Abstract
Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.
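The co-annotation check described in this abstract can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `find_violations`, the gene names, and the single rule (taken from the abstract's own example pair) are hypothetical, and the sketch compares term labels directly rather than reasoning over the ontology hierarchy as a real workflow would.

```python
from itertools import combinations

# Hypothetical rule set: each rule is an unordered pair of processes
# that should not be co-annotated to the same gene product.
EXCLUSION_RULES = {frozenset({"amino acid metabolism", "cytokinesis"})}


def find_violations(annotations, rules):
    """Return (gene, term_a, term_b) for every co-annotation that breaks a rule."""
    violations = []
    for gene, terms in sorted(annotations.items()):
        # Check every unordered pair of terms annotated to this gene.
        for term_a, term_b in combinations(sorted(terms), 2):
            if frozenset({term_a, term_b}) in rules:
                violations.append((gene, term_a, term_b))
    return violations


# Made-up annotations for three placeholder genes.
annotations = {
    "geneA": {"amino acid metabolism", "cytokinesis"},
    "geneB": {"cytokinesis"},
    "geneC": {"amino acid metabolism", "translation"},
}
flagged = find_violations(annotations, EXCLUSION_RULES)
```

A flagged pair is a candidate for review, not proof of an error; in the workflow the abstract describes, such annotations are then traced back to their sources.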
Affiliation(s)
- Valerie Wood
- Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
| | - Seth Carbon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Midori A. Harris
- Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
| | - Antonia Lock
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6B, UK
| | - Stacia R. Engel
- Department of Genetics, Stanford University, Palo Alto, CA 94304-5477, USA
| | - David P. Hill
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Kimberly Van Auken
- Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, USA
| | - Helen Attrill
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
| | - Marc Feuermann
- Swiss Institute of Bioinformatics, 1 Michel-Servet, 1204 Geneva, Switzerland
| | - Pascale Gaudet
- Swiss Institute of Bioinformatics, 1 Michel-Servet, 1204 Geneva, Switzerland
| | - Ruth C. Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London, London WC1E 6JF, UK
| | - Sylvain Poux
- Swiss Institute of Bioinformatics, 1 Michel-Servet, 1204 Geneva, Switzerland
| | - Kim M. Rutherford
- Cambridge Systems Biology Centre, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
| | - Christopher J. Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
|
37
|
Zhang W, Yao G, Wang J, Yang M, Wang J, Zhang H, Li W. ncRPheno: a comprehensive database platform for identification and validation of disease related noncoding RNAs. RNA Biol 2020; 17:943-955. [PMID: 32122231 PMCID: PMC7549653 DOI: 10.1080/15476286.2020.1737441] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 10/18/2019] [Revised: 02/24/2020] [Accepted: 02/25/2020] [Indexed: 12/31/2022]
Abstract
Noncoding RNAs (ncRNAs) play critical roles in many biological processes and have become a novel class of potential targets and biomarkers for disease diagnosis, therapy, and prognosis. Annotating and analysing ncRNA-disease association data are essential but challenging. Current computational resources lack comprehensive database platforms to consistently interpret and prioritize ncRNA-disease association data for biomedical investigation and application. Here, we present the ncRPheno database platform (http://lilab2.sysu.edu.cn/ncrpheno), which comprehensively integrates and annotates ncRNA-disease association data and provides novel searches, visualizations, and utilities for association identification and validation. ncRPheno contains 482,751 non-redundant associations between 14,494 ncRNAs and 3,210 disease phenotypes across 11 species with supporting evidence in the literature. A scoring model was refined to prioritize the associations based on evidential metrics. Moreover, ncRPheno provides user-friendly web interfaces, novel visualizations, and programmatic access to enable easy exploration, analysis, and utilization of the association data. A case study through ncRPheno demonstrated a comprehensive landscape of ncRNA dysregulation associated with 22 cancers and uncovered 821 cancer-associated common ncRNAs. As a unique database platform, ncRPheno outperforms the existing similar databases in terms of data coverage and utilities, and it will assist studies in encoding ncRNAs associated with phenotypes ranging from genetic disorders to complex diseases.
ABBREVIATIONS APIs: application programming interfaces; circRNA: circular RNA; ECO: Evidence & Conclusion Ontology; EFO: Experimental Factor Ontology; FDR: false discovery rate; GO: Gene Ontology; GWAS: genome wide association studies; HPO: Human Phenotype Ontology; ICGC: International Cancer Genome Consortium; lncRNA: long noncoding RNA; miRNA: micro RNA; ncRNA: noncoding RNA; NGS: next generation sequencing; OMIM: Online Mendelian Inheritance in Man; piRNA: piwi-interacting RNA; snoRNA: small nucleolar RNA; TCGA: The Cancer Genome Atlas.
Affiliation(s)
- Wenliang Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Guocai Yao
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Jianbo Wang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Minglei Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Jing Wang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
| | - Haiyue Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Tropical Disease Control, Sun Yat-Sen University, Ministry of Education, China
|
38
|
Abdelhakim M, McMurray E, Syed AR, Kafkas S, Kamau AA, Schofield PN, Hoehndorf R. DDIEM: drug database for inborn errors of metabolism. Orphanet J Rare Dis 2020; 15:146. [PMID: 32527280 PMCID: PMC7291537 DOI: 10.1186/s13023-020-01428-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/19/2020] [Accepted: 05/28/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND Inborn errors of metabolism (IEM) represent a subclass of rare inherited diseases caused by a wide range of defects in metabolic enzymes or their regulation. Of over a thousand characterized IEMs, only about half are understood at the molecular level, and overall the development of treatment and management strategies has proved challenging. An overview of the changing landscape of therapeutic approaches is helpful in assessing strategic patterns in the approach to therapy, but the information is scattered throughout the literature and public data resources. RESULTS We gathered data on therapeutic strategies for 300 diseases into the Drug Database for Inborn Errors of Metabolism (DDIEM). Therapeutic approaches, including both successful and ineffective treatments, were manually classified by their mechanisms of action using a new ontology. CONCLUSIONS We present a manually curated, ontologically formalized knowledgebase of drugs, therapeutic procedures, and mitigated phenotypes. DDIEM is freely available through a web interface and for download at http://ddiem.phenomebrowser.net.
Affiliation(s)
- Marwa Abdelhakim
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, PO 23955 Saudi Arabia
| | - Eunice McMurray
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG United Kingdom
| | - Ali Raza Syed
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, PO 23955 Saudi Arabia
| | - Senay Kafkas
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, PO 23955 Saudi Arabia
| | - Allan Anthony Kamau
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955 Kingdom of Saudi Arabia
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG United Kingdom
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, PO 23955 Saudi Arabia
|
39
|
Mészáros B, Erdős G, Szabó B, Schád É, Tantos Á, Abukhairan R, Horváth T, Murvai N, Kovács OP, Kovács M, Tosatto SCE, Tompa P, Dosztányi Z, Pancsa R. PhaSePro: the database of proteins driving liquid-liquid phase separation. Nucleic Acids Res 2020; 48:D360-D367. [PMID: 31612960 PMCID: PMC7145634 DOI: 10.1093/nar/gkz848] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 08/07/2019] [Revised: 09/11/2019] [Accepted: 10/07/2019] [Indexed: 11/13/2022]
Abstract
Membraneless organelles (MOs) are dynamic liquid condensates that host a variety of specific cellular processes, such as ribosome biogenesis or RNA degradation. MOs form through liquid-liquid phase separation (LLPS), a process that relies on multivalent weak interactions of the constituent proteins and other macromolecules. Since the first discoveries of certain proteins being able to drive LLPS, it has emerged as a general mechanism for the effective organization of cellular space that is exploited in all kingdoms of life. While numerous experimental studies report novel cases, the computational identification of LLPS drivers is lagging behind, and many open questions remain about the sequence determinants, composition, regulation and biological relevance of the resulting condensates. Our limited ability to overcome these issues is largely due to the lack of a dedicated LLPS database. Therefore, here we introduce PhaSePro (https://phasepro.elte.hu), an openly accessible, comprehensive, manually curated database of experimentally validated LLPS driver proteins/protein regions. It not only provides a wealth of information on such systems, but also improves the standardization of data by introducing novel LLPS-specific controlled vocabularies. PhaSePro can be accessed through an appealing, user-friendly interface and thus has definite potential to become the central resource in this dynamically developing field.
Affiliation(s)
- Bálint Mészáros
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest H-1117, Hungary
| | - Gábor Erdős
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest H-1117, Hungary
| | - Beáta Szabó
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
| | - Éva Schád
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
| | - Ágnes Tantos
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
| | - Rawan Abukhairan
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
| | - Tamás Horváth
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
| | - Nikoletta Murvai
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
| | - Orsolya P Kovács
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
| | - Márton Kovács
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova CNR Institute of Neuroscience, Padova, Italy
| | - Péter Tompa
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
- Structural Biology (CSB), Brussels, Belgium; Structural Biology Brussels (SBB), Vrije Universiteit Brussel (VUB), Brussels 1050, Belgium
| | - Zsuzsanna Dosztányi
- MTA-ELTE Momentum Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest H-1117, Hungary
| | - Rita Pancsa
- Institute of Enzymology, Research Centre for Natural Sciences of the Hungarian Academy of Sciences, Budapest H-1117, Hungary
|
40
|
Lock A, Harris MA, Rutherford K, Hayles J, Wood V. Community curation in PomBase: enabling fission yeast experts to provide detailed, standardized, sharable annotation from research publications. Database: The Journal of Biological Databases and Curation 2020; 2020:5827230. [PMID: 32353878 PMCID: PMC7192550 DOI: 10.1093/database/baaa028] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/10/2020] [Revised: 02/28/2020] [Accepted: 03/22/2020] [Indexed: 11/22/2022]
Abstract
Maximizing the impact and value of scientific research requires efficient knowledge distribution, which increasingly depends on the integration of standardized published data into online databases. To make data integration more comprehensive and efficient for fission yeast research, PomBase has pioneered a community curation effort that engages publication authors directly in FAIR-sharing of data representing detailed biological knowledge from hypothesis-driven experiments. Canto, an intuitive online curation tool that enables biologists to describe their detailed functional data using shared ontologies, forms the core of PomBase’s system. With 8 years’ experience, and as the author response rate reaches 50%, we review community curation progress and the insights we have gained from the project. We highlight incentives and nudges we deploy to maximize participation, and summarize project outcomes, which include increased knowledge integration and dissemination as well as the unanticipated added value arising from co-curation by publication authors and professional curators.
Affiliation(s)
- Antonia Lock
- Department of Genetics, Evolution and Environment, University College London, Gower street, London WC1E 6BT, UK
| | - Midori A Harris
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
| | - Kim Rutherford
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
| | - Jacqueline Hayles
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
| | - Valerie Wood
- Cell Cycle Laboratory, The Francis Crick Institute, Midland Rd, London NW1 1AT, UK
|
41
|
Mabee PM, Balhoff JP, Dahdul WM, Lapp H, Mungall CJ, Vision TJ. A Logical Model of Homology for Comparative Biology. Syst Biol 2020; 69:345-362. [PMID: 31596473 PMCID: PMC7672696 DOI: 10.1093/sysbio/syz067] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/09/2019] [Revised: 09/20/2019] [Accepted: 09/26/2019] [Indexed: 01/09/2023]
Abstract
There is a growing body of research on the evolution of anatomy in a wide variety of organisms. Discoveries in this field could be greatly accelerated by computational methods and resources that enable these findings to be compared across different studies and different organisms and linked with the genes responsible for anatomical modifications. Homology is a key concept in comparative anatomy; two important types are historical homology (the similarity of organisms due to common ancestry) and serial homology (the similarity of repeated structures within an organism). We explored how to most effectively represent historical and serial homology across anatomical structures to facilitate computational reasoning. We assembled a collection of homology assertions from the literature with a set of taxon phenotypes for the skeletal elements of vertebrate fins and limbs from the Phenoscape Knowledgebase. Using seven competency questions, we evaluated the reasoning ramifications of two logical models: the Reciprocal Existential Axioms (REA) homology model and the Ancestral Value Axioms (AVA) homology model. The AVA model returned all user-expected results in addition to the search term and any of its subclasses. The AVA model also returned any superclass of the query term in which a homology relationship has been asserted. The REA model returned the user-expected results for five out of seven queries. We identify some challenges of implementing complete homology queries due to limitations of OWL reasoning. This work lays the foundation for homology reasoning to be incorporated into other ontology-based tools, such as those that enable synthetic supermatrix construction and candidate gene discovery. [Homology; ontology; anatomy; morphology; evolution; knowledgebase; phenoscape.]
Affiliation(s)
- Paula M Mabee
- Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
| | - James P Balhoff
- Renaissance Computing Institute, University of North Carolina, 100 Europa Drive, Suite 540, Chapel Hill, NC 27517, USA
| | - Wasila M Dahdul
- Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
| | - Hilmar Lapp
- Center for Genomic and Computational Biology, Duke University, 101 Science Drive, Durham, NC 27708, USA
| | - Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Todd J Vision
- Department of Biology and School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3280, USA
42
Roy J, Cheung E, Bhatti J, Muneem A, Lobo D. Curation and annotation of planarian gene expression patterns with segmented reference morphologies. Bioinformatics 2020; 36:2881-2887. [DOI: 10.1093/bioinformatics/btaa023] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/18/2019] [Revised: 12/07/2019] [Accepted: 01/14/2020] [Indexed: 12/30/2022] Open
Abstract
Motivation
Morphological and genetic spatial data from functional experiments based on genetic, surgical and pharmacological perturbations are being produced at an extraordinary pace in developmental and regenerative biology. However, our ability to extract knowledge from these large datasets is hindered by the lack of formalization methods and tools able to unambiguously describe, centralize and interpret them. Formalizing spatial phenotypes and gene expression patterns is especially challenging in organisms with highly variable morphologies such as planarian worms, which, owing to their extraordinary regenerative capability, can experimentally result in phenotypes with almost any combination of body regions or parts.
Results
Here, we present a computational methodology and mathematical formalism to encode and curate the morphological outcomes and gene expression patterns in planaria. Worm morphologies are encoded with mathematical graphs based on anatomical ontology terms to automatically generate reference morphologies. Gene expression patterns are registered to these standard reference morphologies, which can then be annotated automatically with anatomical ontology terms by analyzing the spatial expression patterns and their textual descriptions. This methodology enables the curation and annotation of complex experimental morphologies together with their gene expression patterns in a centralized standardized dataset, paving the way for the extraction of knowledge and reverse-engineering of the much sought-after mechanistic models in planaria and other regenerative organisms.
Availability and implementation
We implemented this methodology in a user-friendly graphical software tool, PlanGexQ, freely available together with the data in the manuscript at https://lobolab.umbc.edu/plangexq.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Joy Roy
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
| | - Eric Cheung
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
| | - Junaid Bhatti
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
| | - Abraar Muneem
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
| | - Daniel Lobo
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD 21250, USA
43
Yang M, Zhang W, Yao G, Zhang H, Li W. Combined alignments of sequences and domains characterize unknown proteins with remotely related protein search PSISearch2D. Database (Oxford) 2020; 2019:5532822. [PMID: 31317184 PMCID: PMC6637259 DOI: 10.1093/database/baz092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/27/2018] [Revised: 06/12/2019] [Accepted: 06/13/2019] [Indexed: 11/20/2022]
Abstract
Iterative homology search has been widely used in identification of remotely related proteins. Our previous study has found that the query-seeded sequence iterative search can reduce homologous over-extension errors and greatly improve selectivity. However, iterative homology search remains challenging in protein functional prediction. More sensitive scoring models are highly needed to improve the predictive performance of the alignment methods, and alignment annotation with better visualization has also become imperative for result interpretation. Here we report an open-source application PSISearch2D that runs query-seeded iterative sequence search for remotely related protein detection. PSISearch2D retrieves domain annotation from Pfam, UniProtKB, CDD and PROSITE for resulting hits and demonstrates combined domain and sequence alignments in novel visualizations. A scoring model called C-value is newly defined to re-order hits with consideration of the combination of sequence and domain alignments. The benchmarking on the use of C-value indicates that PSISearch2D outperforms the original PSISearch2 tool in terms of both accuracy and specificity. PSISearch2D improves the characterization of unknown proteins in remote protein detection. Our evaluation tests show that PSISearch2D has provided annotation for 77 695 of 139 503 unknown bacteria proteins and 140 751 of 352 757 unknown virus proteins in UniProtKB, about 2.3-fold and 1.8-fold more characterization than the original PSISearch2, respectively. Together with advanced features of auto-iteration mode to handle large-scale data and optional programs for global and local sequence alignments, PSISearch2D enhances remotely related protein search.
Affiliation(s)
- Minglei Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Wenliang Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Guocai Yao
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Haiyue Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
| | - Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China; Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China; Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education
44
Kishore R, Arnaboldi V, Van Slyke CE, Chan J, Nash RS, Urbano JM, Dolan ME, Engel SR, Shimoyama M, Sternberg PW, The Alliance of Genome Resources. Automated generation of gene summaries at the Alliance of Genome Resources. Database (Oxford) 2020; 2020:baaa037. [PMID: 32559296 PMCID: PMC7304461 DOI: 10.1093/database/baaa037] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 02/14/2020] [Revised: 04/06/2020] [Accepted: 04/29/2020] [Indexed: 12/28/2022]
Abstract
Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed an algorithm that uses curated, structured gene data at the Alliance of Genome Resources (Alliance; www.alliancegenome.org) to automatically generate gene summaries that simulate natural language. The gene data used for this purpose include curated associations (annotations) to ontology terms from the Gene Ontology, Disease Ontology, model organism knowledgebase (MOK)-specific anatomy ontologies and Alliance orthology data. The method uses sentence templates for each data category included in the gene summary in order to build a natural language sentence from the list of terms associated with each gene. To improve readability of the summaries when numerous gene annotations are present, we developed a new algorithm that traverses ontology graphs in order to group terms by their common ancestors. The algorithm optimizes the coverage of the initial set of terms and limits the length of the final summary, using measures of information content of each ontology term as a criterion for inclusion in the summary. The automated gene summaries are generated with each Alliance release, ensuring that they reflect current data at the Alliance. Our method effectively leverages category-specific curation efforts of the Alliance member databases to create modular, structured and standardized gene summaries for seven member species of the Alliance. These automatically generated gene summaries make cross-species gene function comparisons tenable and increase discoverability of potential models of human disease. In addition to being displayed on Alliance gene pages, these summaries are also included on several MOK gene pages.
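The grouping step described in this abstract can be illustrated with a minimal sketch. This is not the Alliance algorithm: the toy ontology, the annotation counts, the greedy selection rule, and the names `PARENTS`, `COUNTS`, `group_terms`, `min_ic` and `max_groups` are all assumptions made for illustration. It only shows the general idea of replacing many specific terms with a shared ancestor, while using information content to avoid overly generic ancestors and capping the number of groups to limit summary length.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "neuron development": ["cell development"],
    "muscle cell development": ["cell development"],
    "cell development": ["biological process"],
    "DNA repair": ["biological process"],
    "biological process": [],
}

# Hypothetical annotation counts, used only to estimate information content.
COUNTS = {
    "neuron development": 5,
    "muscle cell development": 5,
    "cell development": 20,
    "DNA repair": 10,
    "biological process": 100,
}
ROOT = "biological process"


def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def info_content(term):
    """Information content: rarer terms score higher, the root scores zero."""
    return -math.log(COUNTS[term] / COUNTS[ROOT])


def group_terms(terms, min_ic=1.0, max_groups=3):
    """Greedily pick informative ancestors that cover the most remaining terms."""
    remaining = set(terms)
    groups = {}
    while remaining and len(groups) < max_groups:
        candidates = {}
        for term in remaining:
            for anc in ancestors(term):
                if info_content(anc) >= min_ic:  # skip overly generic ancestors
                    candidates.setdefault(anc, set()).add(term)
        if not candidates:
            break
        # Prefer wider coverage, then higher information content, then name.
        best = max(candidates, key=lambda a: (len(candidates[a]), info_content(a), a))
        groups[best] = sorted(candidates[best])
        remaining -= candidates[best]
    return groups


if __name__ == "__main__":
    annotated = ["neuron development", "muscle cell development", "DNA repair"]
    for label, members in group_terms(annotated).items():
        print(label, "<-", members)
```

In this toy case the two development terms collapse into "cell development", while "DNA repair" stays on its own because its only other ancestor is the uninformative root.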
Affiliation(s)
- Ranjana Kishore
- WormBase, Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, USA
| | - Valerio Arnaboldi
- WormBase, Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, USA
| | - Ceri E Van Slyke
- ZFIN, The Institute of Neuroscience, 222 Huestis Hall, University of Oregon, Eugene, OR 97403-1254, USA
| | - Juancarlos Chan
- WormBase, Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, USA
| | - Robert S Nash
- Saccharomyces Genome Database, Department of Genetics, Stanford University, 3165 Porter Drive, Palo Alto, CA 94304, USA
| | - Jose M Urbano
- FlyBase, Department of Physiology, Development and Neuroscience, 7 Downing Pl, University of Cambridge, Cambridge CB2 3DY, UK
| | - Mary E Dolan
- MGI, The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Stacia R Engel
- Saccharomyces Genome Database, Department of Genetics, Stanford University, 3165 Porter Drive, Palo Alto, CA 94304, USA
| | - Mary Shimoyama
- Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin and Marquette University, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA
| | - Paul W Sternberg
- WormBase, Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125, USA
45
Nash RS, Weng S, Karra K, Wong ED, Engel SR, Cherry JM. Incorporation of a unified protein abundance dataset into the Saccharomyces genome database. Database (Oxford) 2020; 2020:5775554. [PMID: 32128557 PMCID: PMC7054198 DOI: 10.1093/database/baaa008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
The identification and accurate quantitation of protein abundance has been a major objective of proteomics research. Abundance studies have the potential to provide users with data that can be used to gain a deeper understanding of protein function and regulation and can also help identify cellular pathways and modules that operate under various environmental stress conditions. One of the central missions of the Saccharomyces Genome Database (SGD; https://www.yeastgenome.org) is to work with researchers to identify and incorporate datasets of interest to the wider scientific community, thereby enabling hypothesis-driven research. A large number of studies have detailed efforts to generate proteome-wide abundance data, but deeper analyses of these data have been hampered by the inability to compare results between studies. Recently, a unified protein abundance dataset was generated through the evaluation of more than 20 abundance datasets, which were normalized and converted to common measurement units, in this case molecules per cell. We have incorporated these normalized protein abundance data and associated metadata into the SGD database, as well as the SGD YeastMine data warehouse, resulting in the addition of 56 487 values for untreated cells grown in either rich or defined media and 28 335 values for cells treated with environmental stressors. Abundance data for protein-coding genes are displayed in a sortable, filterable table on Protein pages, available through Locus Summary pages. A median abundance value was incorporated, and a median absolute deviation was calculated for each protein-coding gene and incorporated into SGD. These values are displayed in the Protein section of the Locus Summary page. The inclusion of these data has enhanced the quality and quantity of protein experimental information presented at SGD and provides opportunities for researchers to access and utilize the data to further their research.
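The per-gene summary statistics mentioned in this abstract, a median abundance and a median absolute deviation (MAD), can be computed as in the short sketch below. The function name `median_and_mad` and the sample values are hypothetical and are not taken from SGD. The sketch uses the unscaled MAD; whether SGD applies a scaling constant is not stated in the abstract.

```python
from statistics import median


def median_and_mad(values):
    """Return (median, median absolute deviation) for a non-empty list of numbers."""
    center = median(values)
    # MAD: the median of the absolute distances from the median (unscaled).
    mad = median(abs(v - center) for v in values)
    return center, mad


if __name__ == "__main__":
    # Hypothetical abundances (molecules per cell) for one protein across studies.
    abundances = [1200, 1500, 1350, 9000, 1400]
    center, mad = median_and_mad(abundances)
    print(f"median={center}, MAD={mad}")
```

Both statistics are robust to outliers: the single extreme value of 9000 in the hypothetical data barely affects either number, which is one reason they suit data pooled from heterogeneous studies.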
Affiliation(s)
- Robert S Nash
- Department of Genetics, Stanford University, 3165 Porter Drive, Palo Alto, CA 94304, USA
| | - Shuai Weng
- Department of Genetics, Stanford University, 3165 Porter Drive, Palo Alto, CA 94304, USA
| | - Kalpana Karra
- Department of Genetics, Stanford University, 3165 Porter Drive, Palo Alto, CA 94304, USA
| | - Edith D Wong
- Department of Genetics, Stanford University, 3165 Porter Drive, Palo Alto, CA 94304, USA
| | - Stacia R Engel
- Department of Genetics, Stanford University, 3165 Porter Drive, Palo Alto, CA 94304, USA
| | - J Michael Cherry
- Department of Genetics, Stanford University, 3165 Porter Drive, Palo Alto, CA 94304, USA
46
Ribeiro AJM, Tyzack JD, Borkakoti N, Thornton JM. Identifying pseudoenzymes using functional annotation: pitfalls of common practice. FEBS J 2019; 287:4128-4140. [PMID: 31733177 DOI: 10.1111/febs.15142] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/21/2019] [Accepted: 11/14/2019] [Indexed: 12/13/2022]
Abstract
Pseudoenzymes are proteins that are evolutionarily related to enzymes but lack relevant catalytic activity. They have usually evolved from enzymatic ancestors that lost their catalytic activities. The loss of catalytic function is one extreme among the evolutionary changes that can occur to enzymes, such as changes in substrate specificity or in the reaction catalysed. However, events in which catalytic function is lost remain poorly characterised, except for some notable examples, like the pseudokinases. In this review, we aim to analyse current knowledge related to pseudoenzymes across a large number of enzyme families. This aims to be a review of the data available in biological databases, rather than a more traditional literature review. In particular, we use UniProtKB as the source for functional annotation and M-CSA (Mechanism and Catalytic Site Atlas) for information on the catalytic residues of enzymes. We show that explicit annotation of lack of activity is not exhaustive in UniProtKB and that a protocol using lack of catalytic annotation as an indication for lack of function can be an adequate alternative, after some corrections. After identifying pseudoenzymes related to enzymes in M-CSA, we were able to comment on their prevalence across enzyme families, and on the correlation between lack of catalytic function and the mutation of catalytic residues. These analyses challenge two common ideas in the emerging literature: that pseudoenzymes are ubiquitous across enzyme families and that mutations in the catalytic residues of enzyme homologues are always a good indication of lack of activity.
Affiliation(s)
- Antonio J M Ribeiro
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Jonathan D Tyzack
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Neera Borkakoti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Janet M Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
47
Kanavy DM, McNulty SM, Jairath MK, Brnich SE, Bizon C, Powell BC, Berg JS. Comparative analysis of functional assay evidence use by ClinGen Variant Curation Expert Panels. Genome Med 2019; 11:77. [PMID: 31783775 PMCID: PMC6884856 DOI: 10.1186/s13073-019-0683-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 06/14/2019] [Accepted: 11/05/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND The 2015 American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) guidelines for clinical sequence variant interpretation state that "well-established" functional studies can be used as evidence in variant classification. These guidelines articulated key attributes of functional data, including that assays should reflect the biological environment and be analytically sound; however, details of how to evaluate these attributes were left to expert judgment. The Clinical Genome Resource (ClinGen) designates Variant Curation Expert Panels (VCEPs) in specific disease areas to make gene-centric specifications to the ACMG/AMP guidelines, including more specific definitions of appropriate functional assays. We set out to evaluate the existing VCEP guidelines for functional assays.
METHODS We evaluated the functional criteria (PS3/BS3) of six VCEPs (CDH1, Hearing Loss, Inherited Cardiomyopathy-MYH7, PAH, PTEN, RASopathy). We then established criteria for evaluating functional studies based on disease mechanism, general class of assay, and the characteristics of specific assay instances described in the primary literature. Using these criteria, we extensively curated assay instances cited by each VCEP in their pilot variant classification to analyze VCEP recommendations and their use in the interpretation of functional studies.
RESULTS Unsurprisingly, our analysis highlighted the breadth of VCEP-approved assays, reflecting the diversity of disease mechanisms among VCEPs. We also noted substantial variability between VCEPs in the method used to select these assays and in the approach used to specify strength modifications, as well as differences in suggested validation parameters. Importantly, we observed discrepancies between the parameters VCEPs specified as required for approved assay instances and the fulfillment of these requirements in the individual assays cited in pilot variant interpretation.
CONCLUSIONS Interpretation of the intricacies of functional assays often requires expert-level knowledge of the gene and disease, and current VCEP recommendations for functional assay evidence are a useful tool to improve the accessibility of functional data by providing a starting point for curators to identify approved functional assays and key metrics. However, our analysis suggests that further guidance is needed to standardize this process and ensure consistency in the application of functional evidence.
Affiliation(s)
- Dona M Kanavy
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Shannon M McNulty
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Meera K Jairath
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Sarah E Brnich
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Chris Bizon
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Bradford C Powell
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jonathan S Berg
- Department of Genetics, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
48
Koopmans F, van Nierop P, Andres-Alonso M, Byrnes A, Cijsouw T, Coba MP, Cornelisse LN, Farrell RJ, Goldschmidt HL, Howrigan DP, Hussain NK, Imig C, de Jong APH, Jung H, Kohansalnodehi M, Kramarz B, Lipstein N, Lovering RC, MacGillavry H, Mariano V, Mi H, Ninov M, Osumi-Sutherland D, Pielot R, Smalla KH, Tang H, Tashman K, Toonen RFG, Verpelli C, Reig-Viader R, Watanabe K, van Weering J, Achsel T, Ashrafi G, Asi N, Brown TC, De Camilli P, Feuermann M, Foulger RE, Gaudet P, Joglekar A, Kanellopoulos A, Malenka R, Nicoll RA, Pulido C, de Juan-Sanz J, Sheng M, Südhof TC, Tilgner HU, Bagni C, Bayés À, Biederer T, Brose N, Chua JJE, Dieterich DC, Gundelfinger ED, Hoogenraad C, Huganir RL, Jahn R, Kaeser PS, Kim E, Kreutz MR, McPherson PS, Neale BM, O'Connor V, Posthuma D, Ryan TA, Sala C, Feng G, Hyman SE, Thomas PD, Smit AB, Verhage M. SynGO: An Evidence-Based, Expert-Curated Knowledge Base for the Synapse. Neuron 2019; 103:217-234.e4. [PMID: 31171447 PMCID: PMC6764089 DOI: 10.1016/j.neuron.2019.05.002] [Citation(s) in RCA: 525] [Impact Index Per Article: 87.5] [Received: 02/19/2019] [Revised: 04/02/2019] [Accepted: 04/30/2019] [Indexed: 12/23/2022]
Abstract
Synapses are fundamental information-processing units of the brain, and synaptic dysregulation is central to many brain disorders ("synaptopathies"). However, systematic annotation of synaptic genes and ontology of synaptic processes are currently lacking. We established SynGO, an interactive knowledge base that accumulates available research about synapse biology using Gene Ontology (GO) annotations to novel ontology terms: 87 synaptic locations and 179 synaptic processes. SynGO annotations are exclusively based on published, expert-curated evidence. Using 2,922 annotations for 1,112 genes, we show that synaptic genes are exceptionally well conserved and less tolerant to mutations than other genes. Many SynGO terms are significantly overrepresented among gene variations associated with intelligence, educational attainment, ADHD, autism, and bipolar disorder and among de novo variants associated with neurodevelopmental disorders, including schizophrenia. SynGO is a public, universal reference for synapse research and an online analysis platform for interpretation of large-scale -omics data (https://syngoportal.org and http://geneontology.org).
Affiliation(s)
- Frank Koopmans
- Department of Functional Genomics, CNCR, VU University and UMC Amsterdam, 1081 HV Amsterdam, the Netherlands; Department of Molecular and Cellular Neurobiology, CNCR, VU University and UMC Amsterdam, 1081 HV Amsterdam, the Netherlands
| | - Pim van Nierop
- Department of Molecular and Cellular Neurobiology, CNCR, VU University and UMC Amsterdam, 1081 HV Amsterdam, the Netherlands
| | - Maria Andres-Alonso
- RG Neuroplasticity, Leibniz Institute for Neurobiology, 39118 Magdeburg, Germany; Leibniz Group "Dendritic Organelles and Synaptic Function," ZMNH, University MC, Hamburg, 20251, Germany
| | - Andrea Byrnes
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Tony Cijsouw
- Department of Neuroscience, Tufts University School of Medicine, Boston, MA 02111, USA
| | - Marcelo P Coba
- Zilkha Neurogenetic Institute and Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - L Niels Cornelisse
- Department of Functional Genomics, CNCR, VU University and UMC Amsterdam, 1081 HV Amsterdam, the Netherlands
| | - Ryan J Farrell
- Department of Biochemistry, Weill Cornell Medicine, New York, NY 10065, USA
| | - Hana L Goldschmidt
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Daniel P Howrigan
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Natasha K Hussain
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Cordelia Imig
- Department of Molecular Neurobiology, Max Planck Institute of Experimental Medicine, 37075 Göttingen, Germany
| | - Arthur P H de Jong
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Hwajin Jung
- Center for Synaptic Brain Dysfunctions, IBS, and Department of Biological Sciences, KAIST, Daejeon 34141, South Korea
| | - Mahdokht Kohansalnodehi
- Department of Neurobiology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
| | - Barbara Kramarz
- Functional Gene Annotation, Institute of Cardiovascular Science, UCL, London WC1E 6JF, UK
| | - Noa Lipstein
- Department of Molecular Neurobiology, Max Planck Institute of Experimental Medicine, 37075 Göttingen, Germany
| | - Ruth C Lovering
- Functional Gene Annotation, Institute of Cardiovascular Science, UCL, London WC1E 6JF, UK
| | - Harold MacGillavry
- Cell Biology, Department of Biology, Faculty of Science, Utrecht University, 3584 CH Utrecht, the Netherlands
| | - Vittoria Mariano
- Department of Fundamental Neurosciences, University of Lausanne, 1006 Lausanne, Switzerland; Department of Biomedicine and Prevention, University of Rome Tor Vergata, 00133 Rome, Italy
| | - Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Momchil Ninov
- Department of Neurobiology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
| | - David Osumi-Sutherland
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
| | - Rainer Pielot
- Leibniz Institute for Neurobiology, CBBS and Medical Faculty, Otto von Guericke University, 39120 Magdeburg, Germany
| | - Karl-Heinz Smalla
- Leibniz Institute for Neurobiology, CBBS and Medical Faculty, Otto von Guericke University, 39120 Magdeburg, Germany
| | - Haiming Tang
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Katherine Tashman
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Ruud F G Toonen
- Department of Functional Genomics, CNCR, VU University and UMC Amsterdam, 1081 HV Amsterdam, the Netherlands
| | - Chiara Verpelli
- CNR Neuroscience Institute Milan and Department of Biotechnology and Translational Medicine, University of Milan, 20129 Milan, Italy
| | - Rita Reig-Viader
- Molecular Physiology of the Synapse Laboratory, Biomedical Research Institute Sant Pau, 08025 Barcelona, Spain; Universitat Autònoma de Barcelona, 08193 Bellaterra, Cerdanyola del Vallès, Spain
| | - Kyoko Watanabe
- Department Complex Trait Genetics, CNCR, Neuroscience Campus Amsterdam, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, the Netherlands; Department of Clinical Genetics, UMC Amsterdam, 1081 HV Amsterdam, the Netherlands
| | - Jan van Weering
- Department of Functional Genomics, CNCR, VU University and UMC Amsterdam, 1081 HV Amsterdam, the Netherlands
| | - Tilmann Achsel
- Department of Fundamental Neurosciences, University of Lausanne, 1006 Lausanne, Switzerland; Department of Biomedicine and Prevention, University of Rome Tor Vergata, 00133 Rome, Italy
| | - Ghazaleh Ashrafi
- Department of Biochemistry, Weill Cornell Medicine, New York, NY 10065, USA
| | - Nimra Asi
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Tyler C Brown
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Pietro De Camilli
- Departments of Neuroscience and Cell Biology, HHMI, Kavli Institute for Neuroscience, Yale University School of Medicine, 295 Congress Avenue, New Haven, CT 06510, USA
| | - Marc Feuermann
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland
| | - Rebecca E Foulger
- Functional Gene Annotation, Institute of Cardiovascular Science, UCL, London WC1E 6JF, UK
| | - Pascale Gaudet
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland
| | - Anoushka Joglekar
- Brain and Mind Research Institute and Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
| | - Alexandros Kanellopoulos
- Department of Fundamental Neurosciences, University of Lausanne, 1006 Lausanne, Switzerland; Department of Biomedicine and Prevention, University of Rome Tor Vergata, 00133 Rome, Italy
| | - Robert Malenka
- Nancy Pritzker Laboratory, Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA
| | - Roger A Nicoll
- Departments of Cellular and Molecular Pharmacology and Physiology, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Camila Pulido
- Department of Biochemistry, Weill Cornell Medicine, New York, NY 10065, USA
| | - Jaime de Juan-Sanz
- Department of Biochemistry, Weill Cornell Medicine, New York, NY 10065, USA
| | - Morgan Sheng
- Department of Neuroscience, Genentech, South San Francisco, CA 94080, USA
| | - Thomas C Südhof
- Department of Molecular and Cellular Physiology, Howard Hughes Medical Institute, Stanford University, Stanford, CA 94305, USA
| | - Hagen U Tilgner
- Brain and Mind Research Institute and Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
| | - Claudia Bagni
- Department of Fundamental Neurosciences, University of Lausanne, 1006 Lausanne, Switzerland; Department of Biomedicine and Prevention, University of Rome Tor Vergata, 00133 Rome, Italy
| | - Àlex Bayés
- Molecular Physiology of the Synapse Laboratory, Biomedical Research Institute Sant Pau, 08025 Barcelona, Spain; Universitat Autònoma de Barcelona, 08193 Bellaterra, Cerdanyola del Vallès, Spain
| | - Thomas Biederer
- Department of Neuroscience, Tufts University School of Medicine, Boston, MA 02111, USA
| | - Nils Brose
- Department of Molecular Neurobiology, Max Planck Institute of Experimental Medicine, 37075 Göttingen, Germany
| | - John Jia En Chua
- Department of Physiology, Yong Loo Lin School of Medicine and Neurobiology/Ageing Program, Life Sciences Institute, National University of Singapore and Institute of Molecular and Cell Biology, A*STAR, Singapore, Singapore
| | - Daniela C Dieterich
- Leibniz Institute for Neurobiology, CBBS and Medical Faculty, Otto von Guericke University, 39120 Magdeburg, Germany
| | - Eckart D Gundelfinger
- Leibniz Institute for Neurobiology, CBBS and Medical Faculty, Otto von Guericke University, 39120 Magdeburg, Germany
| | - Casper Hoogenraad
- Cell Biology, Department of Biology, Faculty of Science, Utrecht University, 3584 CH Utrecht, the Netherlands
| | - Richard L Huganir
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Reinhard Jahn
- Department of Neurobiology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany
| | - Pascal S Kaeser
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Eunjoon Kim
- Center for Synaptic Brain Dysfunctions, IBS, and Department of Biological Sciences, KAIST, Daejeon 34141, South Korea
| | - Michael R Kreutz
- RG Neuroplasticity, Leibniz Institute for Neurobiology, 39118 Magdeburg, Germany; Leibniz Group "Dendritic Organelles and Synaptic Function," ZMNH, University MC, Hamburg, 20251, Germany
| | - Peter S McPherson
- Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
| | - Ben M Neale
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Vincent O'Connor
- Biological Sciences, University of Southampton, Southampton SO17 1BJ, UK
| | - Danielle Posthuma
- Department Complex Trait Genetics, CNCR, Neuroscience Campus Amsterdam, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, the Netherlands; Department of Clinical Genetics, UMC Amsterdam, 1081 HV Amsterdam, the Netherlands
| | - Timothy A Ryan
- Department of Biochemistry, Weill Cornell Medicine, New York, NY 10065, USA
| | - Carlo Sala
- CNR Neuroscience Institute Milan and Department of Biotechnology and Translational Medicine, University of Milan, 20129 Milan, Italy
| | - Guoping Feng
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Steven E Hyman
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - August B Smit
- Department of Molecular and Cellular Neurobiology, CNCR, VU University and UMC Amsterdam, 1081 HV Amsterdam, the Netherlands.
| | - Matthijs Verhage
- Department of Functional Genomics, CNCR, VU University and UMC Amsterdam, 1081 HV Amsterdam, the Netherlands.
| |
|
49
|
Siegele DA, LaBonte SA, Wu PIF, Chibucos MC, Nandendla S, Giglio MG, Hu JC. Phenotype annotation with the ontology of microbial phenotypes (OMP). J Biomed Semantics 2019; 10:13. [PMID: 31307550 PMCID: PMC6631659 DOI: 10.1186/s13326-019-0205-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 09/18/2018] [Accepted: 06/19/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Microbial genetics has formed a foundation for understanding many aspects of biology. Systematic annotation that supports computational data mining should reveal further insights for microbes, microbiomes, and conserved functions beyond microbes. The Ontology of Microbial Phenotypes (OMP) was created to support such annotation. RESULTS We define standards for an OMP-based annotation framework that supports the capture of a variety of phenotypes and provides flexibility for different levels of detail based on a combination of pre- and post-composition using OMP and other Open Biomedical Ontology (OBO) projects. A system for entering and viewing OMP annotations has been added to our online, public, web-based data portal. CONCLUSIONS The annotation framework described here is ready to support projects to capture phenotypes from the experimental literature for a variety of microbes. Defining the OMP annotation standard should support the development of new software tools for data mining and analysis in comparative phenomics.
Affiliation(s)
- Deborah A Siegele
- Department of Biology, Texas A&M University, College Station, TX, USA
| | - Sandra A LaBonte
- Department of Biochemistry and Biophysics, Texas A&M University and Texas AgriLife Research, College Station, TX, USA
| | - Peter I-Fan Wu
- Department of Biochemistry and Biophysics, Texas A&M University and Texas AgriLife Research, College Station, TX, USA
| | - Marcus C Chibucos
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Suvarna Nandendla
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Michelle G Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - James C Hu
- Department of Biochemistry and Biophysics, Texas A&M University and Texas AgriLife Research, College Station, TX, USA.
| |
|
50
|
Kafkas Ş, Abdelhakim M, Hashish Y, Kulmanov M, Abdellatif M, Schofield PN, Hoehndorf R. PathoPhenoDB, linking human pathogens to their phenotypes in support of infectious disease research. Sci Data 2019; 6:79. [PMID: 31160594 PMCID: PMC6546783 DOI: 10.1038/s41597-019-0090-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/10/2018] [Accepted: 05/07/2019] [Indexed: 12/11/2022] Open
Abstract
Understanding the relationship between the pathophysiology of infectious disease, the biology of the causative agent and the development of therapeutic and diagnostic approaches is dependent on the synthesis of a wide range of types of information. Provision of a comprehensive and integrated disease phenotype knowledgebase has the potential to provide novel and orthogonal sources of information for the understanding of infectious agent pathogenesis, and support for research on disease mechanisms. We have developed PathoPhenoDB, a database containing pathogen-to-phenotype associations. PathoPhenoDB relies on manual curation of pathogen-disease relations, on ontology-based text mining as well as manual curation to associate host disease phenotypes with infectious agents. Using Semantic Web technologies, PathoPhenoDB also links to knowledge about drug resistance mechanisms and drugs used in the treatment of infectious diseases. PathoPhenoDB is accessible at http://patho.phenomebrowser.net/, and the data are freely available through a public SPARQL endpoint.
Affiliation(s)
- Şenay Kafkas
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Marwa Abdelhakim
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Yasmeen Hashish
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Maxat Kulmanov
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Marwa Abdellatif
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, United Kingdom
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia.
| |
|