Reference Citation Analysis

For: Caufield JH, Putman T, Schaper K, Unni DR, Hegde H, Callahan TJ, Cappelletti L, Moxon SAT, Ravanmehr V, Carbon S, Chan LE, Cortes K, Shefchek KA, Elsarboukh G, Balhoff J, Fontana T, Matentzoglu N, Bruskiewich RM, Thessen AE, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ, Reese JT. KG-Hub-building and exchanging biological knowledge graphs. Bioinformatics 2023;39:btad418. [PMID: 37389415 PMCID: PMC10336030 DOI: 10.1093/bioinformatics/btad418] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/30/2023] [Revised: 05/09/2023] [Accepted: 06/29/2023] [Indexed: 07/01/2023]

Cited by Other Article(s)

Kushida T, de Farias TM, Sima AC, Dessimoz C, Chiba H, Bastian FB, Masuya H. Federated SPARQL query performance evaluation for exploring disease model mouse: combining gene expression, orthology, and disease knowledge graphs. BMC Med Inform Decis Mak 2025;25:189. [PMID: 40380154 DOI: 10.1186/s12911-025-03013-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 04/09/2025] [Indexed: 05/19/2025]

Abstract

BACKGROUND

The RIKEN BRC develops and maintains the RIKEN BioResource MetaDatabase to help users explore appropriate target bioresources for their experiments and prepare precise and high-quality data infrastructures. The Swiss Institute of Bioinformatics develops two multi-species databases for the study of gene expression and orthology: Bgee and Orthologous MAtrix (OMA).

METHODS

This study combines the RIKEN BioResource data with Resource Description Framework (RDF) datasets from Bgee (a gene expression database), OMA, DisGeNET (a database of human gene-disease associations), Mouse Genome Informatics (MGI), UniProt, and four disease ontologies in the RIKEN BioResource MetaDatabase. Our aim is to evaluate distributed SPARQL query performance when exploring which model organisms are most appropriate for specific medical science research applications across these interoperable datasets. More precisely, in our biomedical use cases we investigate disease-related genes and the anatomical parts where these genes are expressed, and subsequently identify appropriate bioresource candidates available for specific disease research applications.

RESULTS

We illustrate this through two use cases, one targeting Alzheimer's disease and the other melanoma. We identified 14 Alzheimer's disease-related genes that were expressed in the prefrontal cortex (e.g., APP and APOE) and 55 RIKEN bioresources (genetically modified mice related to these genes) predicted to be relevant to Alzheimer's disease research. Furthermore, by executing a transitive search over the Uberon terms using the Property Paths function, we identified 14 melanoma-related genes (e.g., HRAS and PTEN) and 12 anatomical parts in which these genes were expressed, such as the "skin of limb". Finally, we compared the performance of the federated SPARQL query via the remote Bgee SPARQL endpoint with the performance of a centralized SPARQL query using the Bgee dataset as part of the RIKEN BioResource MetaDatabase.

CONCLUSIONS

We confirmed that query performance degraded under the federated approach. We reduced this degradation for federated queries from the BioResource MetaDatabase to the SIB by refining the transferred data through a subquery and by enhancing the server specifications, thereby optimizing the triple store query evaluation.

Vendetti J, Harris NL, Dorf MV, Skrenchuk A, Caufield JH, Gonçalves RS, Graybeal JB, Hegde H, Redmond T, Mungall CJ, Musen MA. BioPortal: an open community resource for sharing, searching, and utilizing biomedical ontologies. Nucleic Acids Res 2025:gkaf402. [PMID: 40357648 DOI: 10.1093/nar/gkaf402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2025] [Revised: 04/11/2025] [Accepted: 05/02/2025] [Indexed: 05/15/2025]

McLaughlin J, Lagrimas J, Iqbal H, Parkinson H, Harmse H. OLS4: a new Ontology Lookup Service for a growing interdisciplinary knowledge ecosystem. Bioinformatics 2025;41:btaf279. [PMID: 40323307 DOI: 10.1093/bioinformatics/btaf279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2025] [Revised: 03/24/2025] [Accepted: 05/01/2025] [Indexed: 05/23/2025]

Hegde H, Vendetti J, Goutte-Gattat D, Caufield JH, Graybeal JB, Harris NL, Karam N, Kindermann C, Matentzoglu N, Overton JA, Musen MA, Mungall CJ. A change language for ontologies and knowledge graphs. Database (Oxford) 2025;2025:baae133. [PMID: 39841813 PMCID: PMC11753292 DOI: 10.1093/database/baae133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 11/21/2024] [Accepted: 12/30/2024] [Indexed: 01/24/2025]

Abstract

Ontologies and knowledge graphs (KGs) are general-purpose computable representations of some domain, such as human anatomy, and are frequently a crucial part of modern information systems. Most of these structures change over time, incorporating new knowledge or information that was previously missing. Managing these changes is a challenge, both in terms of communicating changes to users and providing mechanisms to make it easier for multiple stakeholders to contribute. To fill that need, we have created KGCL, the Knowledge Graph Change Language (https://github.com/INCATools/kgcl), a standard data model for describing changes to KGs and ontologies at a high level, and an accompanying human-readable Controlled Natural Language (CNL). This language serves two purposes: a curator can use it to request desired changes, and it can also be used to describe changes that have already happened, corresponding to the concepts of "apply patch" and "diff" commonly used for managing changes in text documents and computer programs. Another key feature of KGCL is that descriptions are at a high enough level to be useful to and understood by a variety of stakeholders; e.g., ontology edits can be specified by commands like "add synonym 'arm' to 'forelimb'" or "move 'Parkinson disease' under 'neurodegenerative disease'." We have also built a suite of tools for managing ontology changes. These include an automated agent that integrates with and monitors GitHub ontology repositories and applies any requested changes, and a new component in the BioPortal ontology resource that allows users to make change requests directly from within the BioPortal user interface. Overall, the KGCL data model, its CNL, and associated tooling allow for easier management and processing of changes associated with the development of ontologies and KGs. Database URL: https://github.com/INCATools/kgcl.

Charlet J, Cui L. Knowledge Representation and Management: 2023 Highlights and the Rise of Knowledge Graph Embeddings. Yearb Med Inform 2024;33:223-226. [PMID: 40199309 PMCID: PMC12020553 DOI: 10.1055/s-0044-1800748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2025]

Joachimiak MP, Caufield JH, Harris NL, Kim H, Mungall CJ. Gene Set Summarization Using Large Language Models. arXiv 2024:arXiv:2305.13338v3. [PMID: 37292480 PMCID: PMC10246080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]

Abstract

Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling Large Language Models (LLMs) to use scientific texts directly and avoid reliance on a KB. TALISMAN (Terminological ArtificiaL Intelligence SuMmarization of Annotation and Narratives) uses generative AI to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct retrieval from the model. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for an input gene set. However, LLM-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, in our experiments these methods were rarely able to recapitulate the most precise and informative term from standard enrichment analysis. We also observe minor differences depending on prompt input information, with GO term descriptions leading to higher recall but lower precision. However, newer LLM models perform statistically significantly better than the oldest model across all performance metrics, suggesting that future models may lead to further improvements. Overall, the results are nondeterministic, with minor variations in prompt resulting in radically different term lists, true to the stochastic nature of LLMs. 
Our results show that, at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis; however, they may provide summarization benefits for implicit knowledge integration across extant but unstandardized knowledge, for large sets of features, and where the amount of information is difficult for humans to process.

Di Maria A, Bellomo L, Billeci F, Cardillo A, Alaimo S, Ferragina P, Ferro A, Pulvirenti A. NetMe 2.0: a web-based platform for extracting and modeling knowledge from biomedical literature as a labeled graph. Bioinformatics 2024;40:btae194. [PMID: 38597890 PMCID: PMC11074003 DOI: 10.1093/bioinformatics/btae194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/29/2024] [Accepted: 04/08/2024] [Indexed: 04/11/2024]

Callahan TJ, Tripodi IJ, Stefanski AL, Cappelletti L, Taneja SB, Wyrwa JM, Casiraghi E, Matentzoglu NA, Reese J, Silverstein JC, Hoyt CT, Boyce RD, Malec SA, Unni DR, Joachimiak MP, Robinson PN, Mungall CJ, Cavalleri E, Fontana T, Valentini G, Mesiti M, Gillenwater LA, Santangelo B, Vasilevsky NA, Hoehndorf R, Bennett TD, Ryan PB, Hripcsak G, Kahn MG, Bada M, Baumgartner WA, Hunter LE. An open source knowledge graph ecosystem for the life sciences. Sci Data 2024;11:363. [PMID: 38605048 PMCID: PMC11009265 DOI: 10.1038/s41597-024-03171-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/21/2024] [Indexed: 04/13/2024]

Affiliation(s)

Tiffany J Callahan Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA. Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA.
Ignacio J Tripodi Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO, 80301, USA
Adrianne L Stefanski Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
Luca Cappelletti AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
Sanya B Taneja Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
Jordan M Wyrwa Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
Elena Casiraghi AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy. Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Nicolas A Matentzoglu Semanticly, Athens, Greece
Justin Reese Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Jonathan C Silverstein Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
Charles Tapley Hoyt Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
Richard D Boyce Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
Scott A Malec Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
Deepak R Unni SIB Swiss Institute of Bioinformatics, Basel, Switzerland
Marcin P Joachimiak Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Peter N Robinson Berlin Institute of Health at Charité-Universitätsmedizin, 10117, Berlin, Germany
Christopher J Mungall Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Emanuele Cavalleri AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
Tommaso Fontana AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
Giorgio Valentini AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy. ELLIS, European Laboratory for Learning and Intelligent Systems, Milan Unit, Italy
Marco Mesiti AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
Lucas A Gillenwater Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA. Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Brook Santangelo Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA. Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Nicole A Vasilevsky Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ, 85718, USA
Robert Hoehndorf Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
Tellen D Bennett Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA. Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Patrick B Ryan Janssen Research and Development, Raritan, NJ, 08869, USA
George Hripcsak Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
Michael G Kahn Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Michael Bada Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
William A Baumgartner Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
Lawrence E Hunter Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA. Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA.

Cappelletti L, Rekerle L, Fontana T, Hansen P, Casiraghi E, Ravanmehr V, Mungall CJ, Yang JJ, Spranger L, Karlebach G, Caufield JH, Carmody L, Coleman B, Oprea TI, Reese J, Valentini G, Robinson PN. Node-degree aware edge sampling mitigates inflated classification performance in biomedical random walk-based graph representation learning. Bioinformatics Advances 2024;4:vbae036. [PMID: 38577542 PMCID: PMC10994718 DOI: 10.1093/bioadv/vbae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 01/31/2024] [Accepted: 02/29/2024] [Indexed: 04/06/2024]

Affiliation(s)

Luca Cappelletti AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano 20133, Italy
Lauren Rekerle The Jackson Laboratory for Genomic Medicine, CT 06032, United States
Tommaso Fontana AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano 20133, Italy
Peter Hansen The Jackson Laboratory for Genomic Medicine, CT 06032, United States
Elena Casiraghi AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano 20133, Italy. Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, United States
Vida Ravanmehr The Jackson Laboratory for Genomic Medicine, CT 06032, United States
Christopher J Mungall Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, United States
Jeremy J Yang Department of Internal Medicine and UNM Comprehensive Cancer Center, UNM School of Medicine, Albuquerque, NM 87102, United States
Leonard Spranger Institute of Bioinformatics, Freie Universität Berlin, Berlin, 14195, Germany
Guy Karlebach The Jackson Laboratory for Genomic Medicine, CT 06032, United States
J Harry Caufield Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, United States
Leigh Carmody The Jackson Laboratory for Genomic Medicine, CT 06032, United States
Ben Coleman The Jackson Laboratory for Genomic Medicine, CT 06032, United States. Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, United States
Tudor I Oprea Department of Internal Medicine and UNM Comprehensive Cancer Center, UNM School of Medicine, Albuquerque, NM 87102, United States
Justin Reese Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, United States
Giorgio Valentini AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano 20133, Italy. ELLIS, European Laboratory for Learning and Intelligent Systems
Peter N Robinson The Jackson Laboratory for Genomic Medicine, CT 06032, United States. Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, United States. ELLIS, European Laboratory for Learning and Intelligent Systems. Berlin Institute of Health, Charité – Universitätsmedizin Berlin, Berlin, 10117, Germany
