1
Seah BKB. Paying it forward: Crowdsourcing the harmonisation and linking of taxon names and biodiversity identifiers. Biodivers Data J 2023; 11:e114076. [PMID: 38312332 PMCID: PMC10838036 DOI: 10.3897/bdj.11.e114076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 11/06/2023] [Indexed: 02/06/2024] Open
Abstract
Linking records for the same taxa between different databases is an essential step when working with biodiversity data. However, name-matching alone is error-prone, because of issues such as homonyms (unrelated taxa with the same name) and synonyms (same taxon under different names). Therefore, most projects will require some curation to ensure that taxon identifiers are correctly linked. Unfortunately, formal guidance on such curation is uncommon and these steps are often ad hoc and poorly documented, which hinders transparency and reproducibility, yet the task requires specialist knowledge and cannot be easily automated without careful validation. Here, we present a case study on linking identifiers between the GBIF and NCBI taxonomies for a species checklist. This represents a common scenario: finding published sequence data (from NCBI) for species chosen by occurrence or geographical distribution (from GBIF). Wikidata, a publicly editable knowledge base of structured data, can serve as an additional information source for identifier linking. We suggest a software toolkit for taxon name-matching and data-cleaning, describe common issues encountered during curation and propose concrete steps to address them. For example, about 2.8% of the taxa in our dataset had wrong identifiers linked on Wikidata because of errors in name-matching caused by homonyms. By correcting such errors during data-cleaning, either directly (through editing Wikidata) or indirectly (by reporting errors in GBIF or NCBI), we crowdsource the curation and contribute to community resources, thereby improving the quality of downstream analyses.
Affiliation(s)
- Brandon Kwee Boon Seah
- Thünen Institute for Biodiversity, Braunschweig, Germany
2
Evenstein Sigalov S, Nachmias R. Investigating the potential of the semantic web for education: Exploring Wikidata as a learning platform. Education and Information Technologies 2023:1-50. [PMID: 37361737 PMCID: PMC10009355 DOI: 10.1007/s10639-023-11664-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Accepted: 02/07/2023] [Indexed: 06/28/2023]
Abstract
Wikidata is a free, multilingual, open knowledge-base that stores structured, linked data. It has grown rapidly and as of December 2022 contains over 100 million items and millions of statements, making it the largest semantic knowledge-base in existence. Changing the interaction between people and knowledge, Wikidata offers various learning opportunities, leading to new applications in sciences, technology and cultures. These learning opportunities stem in part from the ability to query this data and ask questions that were difficult to answer in the past. They also stem from the ability to visualize query results, for example on a timeline or a map, which, in turn, helps users make sense of the data and draw additional insights from it. Research on the semantic web as learning platform and on Wikidata in the context of education is almost non-existent, and we are just beginning to understand how to utilize it for educational purposes. This research investigates the Semantic Web as a learning platform, focusing on Wikidata as a prime example. To that end, a methodology of multiple case studies was adopted, demonstrating Wikidata uses by early adopters. Seven semi-structured, in-depth interviews were conducted, out of which 10 distinct projects were extracted. A thematic analysis approach was deployed, revealing eight main uses, as well as benefits and challenges to engaging with the platform. The results shed light on Wikidata's potential as a lifelong learning process, enabling opportunities for improved Data Literacy and a worldwide social impact.
Affiliation(s)
- Rafi Nachmias
- School of Education, Tel Aviv University, 55 Levanon St., 6997801 Tel Aviv, Israel
3
Obregón Sierra Á. Inserción de metadatos de las bibliotecas españolas en Wikidata: un modelo de datos abiertos enlazados [Insertion of metadata of Spanish libraries into Wikidata: a linked open data model]. Revista Española de Documentación Científica 2022. [DOI: 10.3989/redc.2022.3.1870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
The term linked open data has become very common in library science in recent years, owing to the uses that can be made of the metadata that libraries hold. The discussion usually centres on releasing the information libraries possess, but not on where these institutions are physically located or on other data about the institutions themselves, so that any user can also make use of it. This information could be useful for finding the nearest libraries, accessing the communication channels they offer or looking up the identifiers they have in other databases. The aim of this article is to show the procedure used to insert all the libraries of Spain into a free database, with the intention of making them accessible to everyone. Using various free tools, the data were cleaned and 7861 new libraries were inserted into Wikidata, after which existing duplicates were corrected and new fields were added. Finally, different possibilities for reusing the data in Wikipedia are presented, demonstrating that these data can be useful to users who want to use them in the future.
4
Bundhoo E, Ghoorah AW, Jaufeerally-Fakim Y. TAGOPSIN: collating taxa-specific gene and protein functional and structural information. BMC Bioinformatics 2021; 22:517. [PMID: 34688246 PMCID: PMC8541804 DOI: 10.1186/s12859-021-04429-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/13/2020] [Accepted: 10/06/2021] [Indexed: 11/25/2022] Open
Abstract
Background: The wealth of biological information available nowadays in public databases has triggered an unprecedented rise in multi-database search and data retrieval for obtaining detailed information about key functional and structural entities. This concerns investigations ranging from gene or genome analysis to protein structural analysis. However, the retrieval of interconnected data from a number of different databases is very often done repeatedly in an unsystematic way. Results: Here, we present TAxonomy, Gene, Ontology, Protein, Structure INtegrated (TAGOPSIN), a command line program written in Java for rapid and systematic retrieval of select data from seven of the most popular public biological databases relevant to comparative genomics and protein structure studies. The program allows a user to retrieve organism-centred data and assemble them in a single data warehouse which constitutes a useful resource for several biological applications. TAGOPSIN was tested with a number of organisms encompassing eukaryotes, prokaryotes and viruses. For example, it successfully integrated data for about 17,000 UniProt entries of Homo sapiens and 21 UniProt entries of human coronavirus. Conclusion: TAGOPSIN demonstrates efficient data integration whereby manipulation of interconnected data is more convenient than doing multi-database queries. The program facilitates for instance interspecific comparative analyses of protein-coding genes in a molecular evolutionary study, or identification of taxa-specific protein domains and three-dimensional structures. TAGOPSIN is available as a JAR file at https://github.com/ebundhoo/TAGOPSIN and is released under the GNU General Public License. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04429-5.
Affiliation(s)
- Eshan Bundhoo
- Department of Agricultural and Food Science, Faculty of Agriculture, University of Mauritius, Reduit, 80837, Mauritius
- Anisah W Ghoorah
- Department of Digital Technologies, Faculty of Information, Communication and Digital Technologies, University of Mauritius, Reduit, 80837, Mauritius
- Yasmina Jaufeerally-Fakim
- Department of Agricultural and Food Science, Faculty of Agriculture, University of Mauritius, Reduit, 80837, Mauritius
5
Ruiz-Saavedra S, García-González H, Arboleya S, Salazar N, Emilio Labra-Gayo J, Díaz I, Gueimonde M, González S, de los Reyes-Gavilán CG. Intestinal microbiota alterations by dietary exposure to chemicals from food cooking and processing. Application of data science for risk prediction. Comput Struct Biotechnol J 2021; 19:1081-1091. [PMID: 33680352 PMCID: PMC7892627 DOI: 10.1016/j.csbj.2021.01.037] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/28/2020] [Revised: 01/22/2021] [Accepted: 01/22/2021] [Indexed: 01/07/2023] Open
Abstract
Diet is one of the main sources of exposure to toxic chemicals with carcinogenic potential, some of which are generated during food processing, depending on the type of food (primarily meat, fish, bread and potatoes), cooking methods and temperature. Although demonstrated in animal models at high doses, an unequivocal link between dietary exposure to these compounds with disease has not been proven in humans. A major difficulty in assessing the actual intake of these toxic compounds is the lack of standardised and harmonised protocols for collecting and analysing dietary information. The intestinal microbiota (IM) has a great influence on health and is altered in some diseases such as colorectal cancer (CRC). Diet influences the composition and activity of the IM, and the net exposure to genotoxicity of potential dietary carcinogens in the gut depends on the interaction among these compounds, IM and diet. This review analyses critically the difficulties and challenges in the study of interactions among these three actors on the onset of CRC. Machine Learning (ML) of data obtained in subclinical and precancerous stages would help to establish risk thresholds for the intake of toxic compounds generated during food processing as related to diet and IM profiles, whereas Semantic Web could improve data accessibility and usability from different studies, as well as helping to elucidate novel interactions among those chemicals, IM and diet.
Affiliation(s)
- Sergio Ruiz-Saavedra
- Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA-CSIC), 33300 Villaviciosa, Asturias, Spain
- Department of Functional Biology, University of Oviedo, 33006 Oviedo, Asturias, Spain
- Diet, Microbiota and Health Group, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
- Herminio García-González
- Department of Computer Science, University of Oviedo, C/ Federico García Lorca S/N, 33007 Oviedo, Asturias, Spain
- IT and Communications Service, University of Oviedo, C/ Fernando Bongera S/N, 33006 Oviedo, Asturias, Spain
- Silvia Arboleya
- Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA-CSIC), 33300 Villaviciosa, Asturias, Spain
- Diet, Microbiota and Health Group, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
- Nuria Salazar
- Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA-CSIC), 33300 Villaviciosa, Asturias, Spain
- Diet, Microbiota and Health Group, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
- José Emilio Labra-Gayo
- Department of Computer Science, University of Oviedo, C/ Federico García Lorca S/N, 33007 Oviedo, Asturias, Spain
- Irene Díaz
- Department of Computer Science, University of Oviedo, C/ Federico García Lorca S/N, 33007 Oviedo, Asturias, Spain
- Miguel Gueimonde
- Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA-CSIC), 33300 Villaviciosa, Asturias, Spain
- Diet, Microbiota and Health Group, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
- Sonia González
- Department of Functional Biology, University of Oviedo, 33006 Oviedo, Asturias, Spain
- Diet, Microbiota and Health Group, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
- Clara G. de los Reyes-Gavilán
- Department of Microbiology and Biochemistry of Dairy Products, Instituto de Productos Lácteos de Asturias (IPLA-CSIC), 33300 Villaviciosa, Asturias, Spain
- Diet, Microbiota and Health Group, Instituto de Investigación Sanitaria del Principado de Asturias (ISPA), 33011 Oviedo, Spain
6
Waagmeester A, Stupp G, Burgstaller-Muehlbacher S, Good BM, Griffith M, Griffith OL, Hanspers K, Hermjakob H, Hudson TS, Hybiske K, Keating SM, Manske M, Mayers M, Mietchen D, Mitraka E, Pico AR, Putman T, Riutta A, Queralt-Rosinach N, Schriml LM, Shafee T, Slenter D, Stephan R, Thornton K, Tsueng G, Tu R, Ul-Hasan S, Willighagen E, Wu C, Su AI. Wikidata as a knowledge graph for the life sciences. eLife 2020; 9:e52614. [PMID: 32180547 PMCID: PMC7077981 DOI: 10.7554/elife.52614] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 10/09/2019] [Accepted: 02/28/2020] [Indexed: 12/22/2022] Open
Abstract
Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing.
Affiliation(s)
- Gregory Stupp
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Sebastian Burgstaller-Muehlbacher
- Center for Integrative Bioinformatics Vienna, Max Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria
- Benjamin M Good
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Malachi Griffith
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, United States
- Obi L Griffith
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, United States
- Kristina Hanspers
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
- Toby S Hudson
- School of Chemistry, The University of Sydney, Sydney, Australia
- Kevin Hybiske
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, United States
- Sarah M Keating
- European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
- Magnus Manske
- Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Michael Mayers
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Daniel Mietchen
- School of Data Science, University of Virginia, Charlottesville, United States
- Elvira Mitraka
- University of Maryland School of Medicine, Baltimore, United States
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
- Timothy Putman
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Anders Riutta
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
- Nuria Queralt-Rosinach
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Lynn M Schriml
- University of Maryland School of Medicine, Baltimore, United States
- Thomas Shafee
- Department of Animal Plant and Soil Sciences, La Trobe University, Melbourne, Australia
- Denise Slenter
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
- Ginger Tsueng
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Roger Tu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Sabah Ul-Hasan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Egon Willighagen
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
- Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
7
Abstract
Publishing authoritative genomic annotation data, keeping it up to date, linking it to related information, and allowing community annotation is difficult and hard to support with limited resources. Here, we show how importing GeneDB annotation data into Wikidata allows for leveraging existing resources, integrating volunteer and scientific communities, and enriching the original information.
Affiliation(s)
- Magnus Manske
- Parasites and Microbes, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK
- Ulrike Böhme
- Parasites and Microbes, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK
- Christoph Püthe
- Parasites and Microbes, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK
- Matt Berriman
- Parasites and Microbes, Wellcome Trust Sanger Institute, Cambridge, CB10 1SA, UK
8
Slenter DN, Kutmon M, Hanspers K, Riutta A, Windsor J, Nunes N, Mélius J, Cirillo E, Coort SL, Digles D, Ehrhart F, Giesbertz P, Kalafati M, Martens M, Miller R, Nishida K, Rieswijk L, Waagmeester A, Eijssen LMT, Evelo CT, Pico AR, Willighagen EL. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res 2019; 46:D661-D667. [PMID: 29136241 PMCID: PMC5753270 DOI: 10.1093/nar/gkx1064] [Citation(s) in RCA: 634] [Impact Index Per Article: 105.7] [Received: 09/14/2017] [Accepted: 10/25/2017] [Indexed: 02/06/2023] Open
Abstract
WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing a database in a curated, machine readable way, omics data analysis and visualization is enabled. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities.
Affiliation(s)
- Denise N Slenter
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Martina Kutmon
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 ER Maastricht, The Netherlands
- Anders Riutta
- Gladstone Institutes, San Francisco, CA 94158, USA
- Jacob Windsor
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Nuno Nunes
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Jonathan Mélius
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Elisa Cirillo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Susan L Coort
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Daniela Digles
- University of Vienna, Department of Pharmaceutical Chemistry, 1090 Vienna, Austria
- Friederike Ehrhart
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Pieter Giesbertz
- Chair of Nutritional Physiology, Technische Universität München, 85350 Freising, Germany
- Marianthi Kalafati
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 ER Maastricht, The Netherlands
- Marvin Martens
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Ryan Miller
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Kozo Nishida
- Laboratory for Biochemical Simulation, RIKEN Quantitative Biology Center, Suita, Osaka 565-0874, Japan
- Linda Rieswijk
- Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, CA 94720, USA
- Andra Waagmeester
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Micelio, Antwerp, Belgium
- Lars M T Eijssen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; School for Mental Health and Neuroscience, Department of Psychiatry and Neuropsychology, Maastricht University Medical Centre, 6229 ER Maastricht, The Netherlands
- Chris T Evelo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 ER Maastricht, The Netherlands
- Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
9
Putman T, Hybiske K, Jow D, Afrasiabi C, Lelong S, Cano MA, Stupp GS, Waagmeester A, Good BM, Wu C, Su AI. ChlamBase: a curated model organism database for the Chlamydia research community. Database (Oxford) 2019; 2019:baz041. [PMID: 30985891 PMCID: PMC6463448 DOI: 10.1093/database/baz041] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/31/2018] [Revised: 02/22/2019] [Accepted: 03/07/2019] [Indexed: 02/06/2023]
Abstract
The accelerating growth of genomic and proteomic information for Chlamydia species, coupled with unique biological aspects of these pathogens, necessitates bioinformatic tools and features that are not provided by major public databases. To meet these growing needs, we developed ChlamBase, a model organism database for Chlamydia that is built upon the WikiGenomes application framework, and Wikidata, a community-curated database. ChlamBase was designed to serve as a central access point for genomic and proteomic information for the Chlamydia research community. ChlamBase integrates information from numerous external databases, as well as important data extracted from the literature that are otherwise not available in structured formats that are easy to use. In addition, a key feature of ChlamBase is that it empowers users in the field to contribute new annotations and data as the field advances with continued discoveries. ChlamBase is freely and publicly available at chlambase.org.
Affiliation(s)
- Tim Putman
- Ontology Development Group, Library, Oregon Health and Science University, Portland, OR, USA
- Kevin Hybiske
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, WA, USA
- Derek Jow
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Cyrus Afrasiabi
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Sebastien Lelong
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Marco Alvarado Cano
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Gregory S Stupp
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Benjamin M Good
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
10
Mignone A, Grand A, Fiori A, Medico E, Bertotti A. Semalytics: a semantic analytics platform for the exploration of distributed and heterogeneous cancer data in translational research. Database: The Journal of Biological Databases and Curation 2019; 2019:5529975. [PMID: 31287543 PMCID: PMC6615453 DOI: 10.1093/database/baz080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/19/2018] [Revised: 05/06/2019] [Accepted: 05/29/2019] [Indexed: 12/26/2022]
Abstract
Each cancer is a complex system with unique molecular features determining its dynamics, such as its prognosis and response to therapies. Understanding the role of these biological traits is fundamental in order to personalize cancer clinical care according to the characteristics of each patient’s disease. To achieve this, translational researchers propagate patients’ samples through in vivo and in vitro cultures to test different therapies on the same tumor and to compare their outcomes with the molecular profile of the disease. This in turn generates information that can be subsequently translated into the development of predictive biomarkers for clinical use. These large-scale experiments generate huge collections of hierarchical data (i.e. experimental trees) with relative annotations that are extremely difficult to analyze. To address such issues in data analyses, we came up with the Semalytics data framework, the core of an analytical platform that processes experimental information through Semantic Web technologies. Semalytics allows (i) the efficient exploration of experimental trees with irregular structures together with their annotations. Moreover, (ii) the platform links its data to a wider open knowledge base (i.e. Wikidata) to add an extended knowledge layer without the need to manage and curate those data locally. Altogether, Semalytics provides augmented perspectives on experimental data, allowing the generation of new hypotheses, which were not anticipated by the user a priori. In this work, we present the data core we created for Semalytics, focusing on its semantic nucleus and on how it exploits semantic reasoning and data integration to tackle issues of this kind of analyses. 
Finally, we describe a proof-of-concept study based on the examination of several dozen cases of metastatic colorectal cancer in order to illustrate how Semalytics can help researchers generate hypotheses about the role of genes alterations in causing resistance or sensitivity of cancer cells to specific drugs.
Affiliation(s)
- Andrea Mignone
- Candiolo Cancer Institute, FPO, IRCCS, Candiolo, Torino, Italy
- Alberto Grand
- Candiolo Cancer Institute, FPO, IRCCS, Candiolo, Torino, Italy
- Enzo Medico
- Candiolo Cancer Institute, FPO, IRCCS, Candiolo, Torino, Italy; Department of Oncology, University of Torino, Torino, Italy
- Andrea Bertotti
- Candiolo Cancer Institute, FPO, IRCCS, Candiolo, Torino, Italy; Department of Oncology, University of Torino, Torino, Italy
11
Combinatorial Interactions of Biotic and Abiotic Stresses in Plants and Their Molecular Mechanisms: Systems Biology Approach. Mol Biotechnol 2018; 60:636-650. [PMID: 29943149 DOI: 10.1007/s12033-018-0100-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 12/12/2022]
Abstract
Plants continually face biotic and abiotic stresses and must respond and adapt to survive. The plant response to multiple, combined biotic and abiotic stresses is more complex and varied than the response to an individual stress. These stresses alter plant behavior by regulating the levels of microRNAs and heat shock proteins and through epigenetic variations. These variations can have many adverse effects on the growth and development of the plant. Further, in natural conditions, several factors that cause abiotic stress make the plant more susceptible to pathogen infections, and vice versa. Intricate and multifaceted interactions of various biomolecules are involved in metabolic pathways that can lead to cross-tolerance and an improved plant defence system. A systems biology approach plays a significant role in the investigation of these molecular interactions. The information obtained by systems biology will help to develop plant varieties resistant to multiple stresses. Thus, this review aims to decipher various multilevel interactions at the molecular level under combinatorial biotic and abiotic stresses and the role of systems biology in understanding these molecular interactions.
|
12
|
Mietchen D, Wodak S, Wasik S, Szostak N, Dessimoz C. Submit a Topic Page to PLOS Computational Biology and Wikipedia. PLoS Comput Biol 2018; 14:e1006137. [PMID: 29851950 PMCID: PMC5978877 DOI: 10.1371/journal.pcbi.1006137] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open
Affiliation(s)
- Daniel Mietchen
- Data Science Institute, University of Virginia, Charlottesville, Virginia, United States of America
| | - Shoshana Wodak
- Vlaams Instituut voor Biotechnologie-Vrije Universiteit Brussel Centre for Structural Biology, Brussels, Belgium
| | - Szymon Wasik
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- European Centre for Bioinformatics and Genomics, Poznan University of Technology, Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Natalia Szostak
- Institute of Computing Science, Poznan University of Technology, Poznan, Poland
- European Centre for Bioinformatics and Genomics, Poznan University of Technology, Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Christophe Dessimoz
- University College London, London, United Kingdom
- University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
13
|
Stroehlein AJ, Young ND, Gasser RB. Improved strategy for the curation and classification of kinases, with broad applicability to other eukaryotic protein groups. Sci Rep 2018; 8:6808. [PMID: 29717207 PMCID: PMC5931623 DOI: 10.1038/s41598-018-25020-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 01/16/2018] [Accepted: 04/12/2018] [Indexed: 12/20/2022] Open
Abstract
Despite the substantial amount of genomic and transcriptomic data available for a wide range of eukaryotic organisms, most genomes are still in a draft state and can have inaccurate gene predictions. To gain a sound understanding of the biology of an organism, it is crucial that inferred protein sequences are accurately identified and annotated. However, this can be challenging to achieve, particularly for organisms such as parasitic worms (helminths), as most gene prediction approaches do not account for substantial phylogenetic divergence from model organisms, such as Caenorhabditis elegans and Drosophila melanogaster, whose genomes are well-curated. In this paper, we describe a bioinformatic strategy for the curation of gene families and subsequent annotation of encoded proteins. This strategy relies on pairwise gene curation between at least two closely related species using genomic and transcriptomic data sets, and is built on recent work on kinase complements of parasitic worms. Here, we discuss salient technical aspects of this strategy and its implications for the curation of protein families more generally.
Affiliation(s)
- Andreas J Stroehlein
- Melbourne Veterinary School, Department of Veterinary Biosciences, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria, 3010, Australia
| | - Neil D Young
- Melbourne Veterinary School, Department of Veterinary Biosciences, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria, 3010, Australia
| | - Robin B Gasser
- Melbourne Veterinary School, Department of Veterinary Biosciences, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria, 3010, Australia
| |
|