1
Fecho K, Tucker N, Beasley JM, Auerbach SS, Bizon C, Tropsha A. Elucidating the mechanistic relationships between peroxisome proliferator-activated receptors and hepatic fibrosis using the ROBOKOP knowledge graph. Frontiers in Toxicology 2025; 7:1549268. [PMID: 40330554 PMCID: PMC12052891 DOI: 10.3389/ftox.2025.1549268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2024] [Accepted: 03/14/2025] [Indexed: 05/08/2025] [Open Access]
Abstract
We developed the Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP) application as an open-source knowledge graph system to support evidence-based biomedical discovery and hypothesis generation. This study aimed to apply ROBOKOP to suggest biological mechanisms that might explain the hypothesized relationship between exposure to the herbicide and lipid-lowering drug clofibrate, an activator of peroxisome proliferator-activated receptor-α (PPARA), and hepatic fibrosis. We queried ROBOKOP first to establish that it could demonstrate a relationship between clofibrate and PPARA as a validation test, and second to identify intermediary genes and biological processes or activities that might relate the activation of PPARA by clofibrate to hepatic fibrosis. Queries of ROBOKOP returned several paths relating clofibrate, PPARA, and hepatic fibrosis. One path suggested the following: clofibrate - affects / increases_expression_of / increases_activity_of / increases_response_to / decreases_response_to / is_related_to - PPARA - is_actively_involved_in - cellular response to lipid - actively_involves - CCL2 - is_genetically_associated_with - hepatic fibrosis. This result established a relationship between clofibrate and PPARA and further suggested that PPARA is actively involved in the cellular response to lipids, which actively involves the chemokine ligand CCL2, a gene genetically associated with hepatic fibrosis; thus, we can infer that PPARA, upon activation by clofibrate, plays a role in hepatic fibrosis. We conclude that ROBOKOP can be used to derive insights into biological mechanisms that might explain relationships between environmental exposures and liver toxicity.
Affiliation(s)
- Karamarie Fecho
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
  - Copperline Professional Solutions, LLC, Pittsboro, NC, United States
- Nyssa Tucker
  - UNC Eshelman School of Pharmacy and Curriculum in Toxicology and Environmental Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Jon-Michael Beasley
  - UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Scott S. Auerbach
  - Division of Translational Toxicology, National Institute of Environmental Health Sciences, Durham, NC, United States
- Chris Bizon
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Alexander Tropsha
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
  - UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
  - Predictive, LLC, Raleigh, NC, United States
2
Tropsha A, Martin HJ, Cherkasov A. The Six Ds of Exponentials and drug discovery: A path toward reversing Eroom's law. Drug Discov Today 2025; 30:104341. [PMID: 40122449 PMCID: PMC12043357 DOI: 10.1016/j.drudis.2025.104341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2025] [Revised: 03/09/2025] [Accepted: 03/18/2025] [Indexed: 03/25/2025]
Abstract
Many technological sectors have recently undergone exponential growth because of digital disruption, a phenomenon Peter Diamandis characterized as the 'Six Ds of Exponentials': digitization, deception, disruption, demonetization, dematerialization, and democratization. In contrast, drug discovery has been marked by rising costs and modest growth, if any, in annual drug approvals. We argue that exponential growth in drug discovery can also be achieved through digital disruption brought by data expansion, mature artificial intelligence (AI), automation of experiments, public-private partnerships, and open science. We detected the emergence of all 'Six Ds of Exponentials' within modern drug discovery and discuss how each of the 'Six Ds' can further empower the field and forcefully address the societal demand for novel, potent, affordable, and accessible medicines.
Affiliation(s)
- Alexander Tropsha
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Holli-Joi Martin
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Artem Cherkasov
  - Vancouver Prostate Centre, Department of Urologic Sciences, University of British Columbia, Vancouver, BC, Canada
3
Sukhwal PC, Rajan V, Kankanhalli A. A Joint LLM-KG System for Disease Q&A. IEEE J Biomed Health Inform 2025; 29:2257-2270. [PMID: 40030566 DOI: 10.1109/jbhi.2024.3514659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Medical question answer (QA) assistants respond to lay users' health-related queries by synthesizing information from multiple sources using natural language processing and related techniques. They can serve as vital tools to alleviate issues of misinformation, information overload, and complexity of medical language, thus addressing lay users' information needs while reducing the burden on healthcare professionals. QA systems, the engines of such assistants, have often used large language models (LLMs) or knowledge graphs (KG), though the approaches could be complementary. LLM-based QA systems excel at understanding complex questions and providing well-formed answers but are prone to factual mistakes. KG-based QA systems, which represent facts well, are mostly limited to answering short-answer questions with pre-created templates. While a few studies have used both LLM and KG for text-based QA, the approaches are still prone to incomplete or inaccurate answers. Extant QA systems also have limitations in terms of automation and performance. We address these challenges by designing a novel, automated disease QA system named Disease Guru-Long-Form Question Answer (DG-LFQA), which effectively utilizes both LLM and KG techniques through a joint reasoning approach to answer disease-related questions appropriate for lay users. Our evaluation of the system using a range of quality metrics demonstrates its efficacy over related baseline systems.
4
Frau F, Loustalot P, Törnqvist M, Temam N, Cupe J, Montmerle M, Augé F. Connecting electronic health records to a biomedical knowledge graph to link clinical phenotypes and molecular endotypes in atopic dermatitis. Sci Rep 2025; 15:3082. [PMID: 39856093 PMCID: PMC11760951 DOI: 10.1038/s41598-024-78794-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 11/04/2024] [Indexed: 01/27/2025] [Open Access]
Abstract
Precision medicine is defined by the U.S. Food & Drug Administration as "an innovative approach to tailoring disease prevention and treatment that considers differences in people's genes, environments, and lifestyles". To succeed in providing personalized medicine to patients, it will be necessary to integrate medical, biological and molecular data in order to identify all complex disease subtypes and understand their pathobiological mechanism. Since biomedical knowledge graphs (BKGs) are limited to the integration of prior knowledge data and do not integrate real-world data (RWD) that would allow for the incorporation of patient level information, we propose a first step towards using RWD, BKGs and graph machine learning (ML) to enable a fully integrated precision medicine strategy. In this study, we established a link between RWD and a BKG. Our methodology introduced a novel patient representation using graph ML applied to the BKG. This approach facilitated the interpretation and extension of ML findings, particularly in disease subtype identification with molecular data contained in the BKG. We applied our innovative methodology to deepen our understanding of atopic dermatitis, a condition with a complex underlying pathophysiological mechanism. Through our analysis, we identified seven subgroups of patients each characterized by clinical and genomic characteristics.
Affiliation(s)
- Francesca Frau
  - Sanofi R&D, Development Real World Evidence, 65926, Frankfurt am Main, Germany
- Nina Temam
  - Quinten Health, 8 rue Vernier, 75017, Paris, France
- Jean Cupe
  - Quinten Health, 8 rue Vernier, 75017, Paris, France
- Franck Augé
  - Sanofi R&D - Translational Medicine & Early Development - Translational Precision Medicine, 13 Quai Jules Guesde, 94400, Vitry-sur-Seine, France
5
Lotz JC, Ropella G, Anderson P, Yang Q, Hedderich MA, Bailey J, Hunt CA. An exploration of knowledge-organizing technologies to advance transdisciplinary back pain research. JOR Spine 2023; 6:e1300. [PMID: 38156063 PMCID: PMC10751978 DOI: 10.1002/jsp2.1300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 10/02/2023] [Accepted: 10/29/2023] [Indexed: 12/30/2023] [Open Access]
Abstract
Chronic low back pain (LBP) is influenced by a broad spectrum of patient-specific factors as codified in domains of the biopsychosocial model (BSM). Operationalizing the BSM into research and clinical care is challenging because most investigators work in silos that concentrate on only one or two BSM domains. Furthermore, the expanding, multidisciplinary nature of BSM research creates practical limitations as to how individual investigators integrate current data into their processes of generating impactful hypotheses. The rapidly advancing field of artificial intelligence (AI) is providing new tools for organizing knowledge, but the practical aspects of how AI may advance LBP research and clinical care are beginning to be explored. The goals of the work presented here are to: (1) explore the current capabilities of knowledge integration technologies (large language models (LLMs), similarity graphs (SGs), and knowledge graphs (KGs)) to synthesize biomedical literature and depict multimodal relationships reflected in the BSM; and (2) highlight limitations, implementation details, and future areas of research to improve performance. We demonstrate preliminary evidence that LLMs, like GPT-3, may be useful in helping scientists analyze and distinguish cLBP publications across multiple BSM domains and determine the degree to which the literature supports or contradicts emergent hypotheses. We show that SG representations and KGs enable exploring LBP's literature in novel ways, possibly providing trans-disciplinary perspectives or insights that are currently difficult, if not infeasible, to achieve. The SG approach is automated, simple, and inexpensive to execute, and thereby may be useful for early-phase literature and narrative explorations beyond one's areas of expertise. Likewise, we show that KGs can be constructed using automated pipelines, queried to provide semantic information, and analyzed to explore trans-domain linkages. The examples presented support the feasibility of LBP-tailored AI protocols to organize knowledge and support developing and refining trans-domain hypotheses.
Affiliation(s)
- Jeffrey C. Lotz
  - Department of Orthopaedic Surgery, University of California at San Francisco, San Francisco, California, USA
- Paul Anderson
  - Department of Computer Science & Software Engineering, California Polytechnic State University, San Luis Obispo, California, USA
- Qian Yang
  - Department of Information Science, Cornell University, Ithaca, New York, USA
- Jeannie Bailey
  - Department of Orthopaedic Surgery, University of California at San Francisco, San Francisco, California, USA
- C. Anthony Hunt
  - Department of Bioengineering & Therapeutic Sciences, University of California at San Francisco, San Francisco, California, USA
6
Callaghan J, Xu CH, Xin J, Cano MA, Riutta A, Zhou E, Juneja R, Yao Y, Narayan M, Hanspers K, Agrawal A, Pico AR, Wu C, Su AI. BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs. Bioinformatics 2023; 39:7273783. [PMID: 37707514 PMCID: PMC11015316 DOI: 10.1093/bioinformatics/btad570] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/18/2023] [Revised: 08/18/2023] [Accepted: 09/12/2023] [Indexed: 09/15/2023] [Open Access]
Abstract
SUMMARY Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralization and integration of data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs for each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time. AVAILABILITY AND IMPLEMENTATION More information can be found at https://explorer.biothings.io and code is available at https://github.com/biothings/biothings_explorer.
Affiliation(s)
- Jackson Callaghan
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Colleen H Xu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Jiwen Xin
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Marco Alvarado Cano
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Anders Riutta
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Eric Zhou
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Rohan Juneja
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Yao Yao
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Madhumita Narayan
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Kristina Hanspers
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Ayushi Agrawal
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Alexander R Pico
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
- Chunlei Wu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Andrew I Su
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
7
Shen Y, Kioumourtzoglou MA, Wu H, Vokonas P, Spiro A, Navas-Acien A, Baccarelli AA, Gao F. Cohort Network: A Knowledge Graph toward Data Dissemination and Knowledge-Driven Discovery for Cohort Studies. Environmental Science & Technology 2023; 57:8236-8244. [PMID: 37224396 PMCID: PMC10597774 DOI: 10.1021/acs.est.2c08174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
Contemporary environmental health sciences draw on large-scale longitudinal studies to understand the impact of environmental exposures and behavioral factors on the risk of disease and to identify potential underlying mechanisms. In such studies, cohorts of individuals are assembled and followed up over time. Each cohort generates hundreds of publications, which are typically neither coherently organized nor summarized, limiting knowledge-driven dissemination. We therefore propose the Cohort Network, a multilayer knowledge graph approach to extract exposures, outcomes, and their connections. We applied the Cohort Network to 121 peer-reviewed papers published over the past 10 years from the Veterans Affairs (VA) Normative Aging Study (NAS). The Cohort Network visualized connections between exposures and outcomes across different publications and identified key exposures and outcomes, such as air pollution, DNA methylation, and lung function. We demonstrated the utility of the Cohort Network for new hypothesis generation, e.g., identification of potential mediators of exposure-outcome associations. The Cohort Network can be used by investigators to summarize the cohort's research and facilitate knowledge-driven discovery and dissemination.
Affiliation(s)
- Yike Shen
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Marianthi-Anna Kioumourtzoglou
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Haotian Wu
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Pantel Vokonas
  - VA Normative Aging Study, VA Boston Healthcare System, Boston, Massachusetts 02130, United States
  - Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts 02118, United States
- Avron Spiro
  - VA Normative Aging Study, VA Boston Healthcare System, Boston, Massachusetts 02130, United States
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts 02118, United States
  - Department of Psychiatry, Boston University Chobanian & Avedisian School of Medicine, Boston, Massachusetts 02118, United States
- Ana Navas-Acien
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Andrea A Baccarelli
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
- Feng Gao
  - Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, New York 10032, United States
8
Callaghan J, Xu CH, Xin J, Cano MA, Riutta A, Zhou E, Juneja R, Yao Y, Narayan M, Hanspers K, Agrawal A, Pico AR, Wu C, Su AI. BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs. arXiv 2023; arXiv:2304.09344v1. [PMID: 37131885 PMCID: PMC10153288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralization and integration of data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs for each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time. More information can be found at https://explorer.biothings.io, and code is available at https://github.com/biothings/biothings_explorer.
Affiliation(s)
- Jackson Callaghan
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Colleen H Xu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Jiwen Xin
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Marco Alvarado Cano
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Anders Riutta
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Eric Zhou
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Rohan Juneja
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Yao Yao
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Madhumita Narayan
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Kristina Hanspers
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Ayushi Agrawal
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Alexander R Pico
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Chunlei Wu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Andrew I Su
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
9
Morris JH, Soman K, Akbas RE, Zhou X, Smith B, Meng EC, Huang CC, Cerono G, Schenk G, Rizk-Jackson A, Harroud A, Sanders L, Costes SV, Bharat K, Chakraborty A, Pico AR, Mardirossian T, Keiser M, Tang A, Hardi J, Shi Y, Musen M, Israni S, Huang S, Rose PW, Nelson CA, Baranzini SE. The scalable precision medicine open knowledge engine (SPOKE): a massive knowledge graph of biomedical information. Bioinformatics 2023; 39:btad080. [PMID: 36759942 PMCID: PMC9940622 DOI: 10.1093/bioinformatics/btad080] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 07/26/2022] [Revised: 01/17/2023] [Accepted: 02/08/2023] [Indexed: 02/11/2023] [Open Access]
Abstract
MOTIVATION Knowledge graphs (KGs) are being adopted in industry, commerce and academia. Biomedical KGs present a challenge due to the complexity, size and heterogeneity of the underlying information. RESULTS In this work, we present the Scalable Precision Medicine Open Knowledge Engine (SPOKE), a biomedical KG connecting millions of concepts via semantically meaningful relationships. SPOKE contains 27 million nodes of 21 different types and 53 million edges of 55 types downloaded from 41 databases. The graph is built on the framework of 11 ontologies that maintain its structure, enable mappings and facilitate navigation. SPOKE is built weekly by Python scripts which download each resource, check for integrity and completeness, and then create a 'parent table' of nodes and edges. Graph queries are translated by a REST API and users can submit searches directly via an API or a graphical user interface. CONCLUSIONS/SIGNIFICANCE SPOKE enables the integration of seemingly disparate information to support precision medicine efforts. AVAILABILITY AND IMPLEMENTATION The SPOKE neighborhood explorer is available at https://spoke.rbvi.ucsf.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- John H Morris
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, San Francisco, CA 94158, USA
- Karthik Soman
  - Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Rabia E Akbas
  - Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Xiaoyuan Zhou
  - Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Brett Smith
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Elaine C Meng
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, San Francisco, CA 94158, USA
- Conrad C Huang
  - Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, San Francisco, CA 94158, USA
- Gabriel Cerono
  - Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Gundolf Schenk
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Angela Rizk-Jackson
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Adil Harroud
  - Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Lauren Sanders
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Sylvain V Costes
  - Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA 94035, USA
- Krish Bharat
  - Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Arjun Chakraborty
  - Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Alexander R Pico
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Taline Mardirossian
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143-2550, USA
- Michael Keiser
  - Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94143-2550, USA
- Alice Tang
  - UCSF-UC Berkeley Bioengineering Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Josef Hardi
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305-5479, USA
- Yongmei Shi
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Mark Musen
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305-5479, USA
- Sharat Israni
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Sui Huang
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Peter W Rose
  - San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Charlotte A Nelson
  - Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Sergio E Baranzini
  - Department of Neurology, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
10
Path-Based Recommender System for Learning Activities Using Knowledge Graphs. Information 2022. [DOI: 10.3390/info14010009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022] [Open Access]
Abstract
Recommender systems can offer a fertile ground in e-learning software, since they can assist users by presenting them with learning material in which they can be more interested, based on their preferences. To this end, in this paper, we present a new method for a knowledge-graph-based, path-based recommender system for learning activities. The suggested approach makes better learning activity recommendations by using connections between people and/or products. By pre-defining meta-paths or automatically mining connective patterns, our method uses the student-learning activity graph to find path-level commonalities for learning activities. The path-based approach can provide an explanation for the result as well. Our methodology is used in an intelligent tutoring system with Java programming as the domain being taught. The system keeps track of user behavior and can recommend learning activities to students using a knowledge-graph-based recommender system. Numerous metadata, such as kind, complexity, and number of questions, are used to describe each activity. The system has been evaluated with promising results that highlight the effectiveness of the path-based recommendations for learning activities, while preserving the pedagogical affordance.
11
Wood EC, Glen AK, Kvarfordt LG, Womack F, Acevedo L, Yoon TS, Ma C, Flores V, Sinha M, Chodpathumwan Y, Termehchy A, Roach JC, Mendoza L, Hoffman AS, Deutsch EW, Koslicki D, Ramsey SA. RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine. BMC Bioinformatics 2022; 23:400. [PMID: 36175836 PMCID: PMC9520835 DOI: 10.1186/s12859-022-04932-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/25/2022] [Accepted: 09/14/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Biomedical translational science is increasingly using computational reasoning on repositories of structured knowledge (such as UMLS, SemMedDB, ChEMBL, Reactome, DrugBank, and SMPDB) in order to facilitate discovery of new therapeutic targets and modalities. The NCATS Biomedical Data Translator project is working to federate autonomous reasoning agents and knowledge providers within a distributed system for answering translational questions. Within that project and the broader field, there is a need for a framework that can efficiently and reproducibly build an integrated, standards-compliant, and comprehensive biomedical knowledge graph that can be downloaded in standard serialized form or queried via a public application programming interface (API). RESULTS To create a knowledge provider system within the Translator project, we have developed RTX-KG2, an open-source software system for building (and hosting a web API for querying) a biomedical knowledge graph that uses an Extract-Transform-Load approach to integrate 70 knowledge sources (including the aforementioned core six sources) into a knowledge graph with provenance information including (where available) citations. The semantic layer and schema for RTX-KG2 follow the standard Biolink model to maximize interoperability. RTX-KG2 is currently being used by multiple Translator reasoning agents, both in its downloadable form and via its SmartAPI-registered interface. Serializations of RTX-KG2 are available for download in both the pre-canonicalized form and in canonicalized form (in which synonyms are merged). The current canonicalized version (KG2.7.3) of RTX-KG2 contains 6.4M nodes and 39.3M edges with a hierarchy of 77 relationship types from Biolink. CONCLUSION RTX-KG2 is the first knowledge graph that integrates UMLS, SemMedDB, ChEMBL, DrugBank, Reactome, SMPDB, and 64 additional knowledge sources within a knowledge graph that conforms to the Biolink standard for its semantic layer and schema.
RTX-KG2 is publicly available for querying via its API at arax.rtx.ai/api/rtxkg2/v1.2/openapi.json. The code to build RTX-KG2 is publicly available at github:RTXteam/RTX-KG2.
Affiliation(s)
- E C Wood
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Amy K Glen
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Lindsey G Kvarfordt
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Finn Womack
  - Computer Science and Engineering, Penn State University, State College, PA, USA
- Liliana Acevedo
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Timothy S Yoon
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Chunyu Ma
  - Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
- Veronica Flores
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Meghamala Sinha
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Arash Termehchy
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Andrew S Hoffman
  - Interdisciplinary Hub for Digitalization and Society, Radboud University, Nijmegen, The Netherlands
- David Koslicki
  - Computer Science and Engineering, Penn State University, State College, PA, USA
  - Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
  - Department of Biology, Penn State University, State College, PA, USA
- Stephen A Ramsey
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
  - Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
|
12
|
Heacock ML, Lopez AR, Amolegbe SM, Carlin DJ, Henry HF, Trottier BA, Velasco ML, Suk WA. Enhancing Data Integration, Interoperability, and Reuse to Address Complex and Emerging Environmental Health Problems. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:7544-7552. [PMID: 35549252 PMCID: PMC9227711 DOI: 10.1021/acs.est.1c08383] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/09/2021] [Indexed: 05/21/2023]
Abstract
Environmental health sciences (EHS) span many diverse disciplines. Within the EHS community, the National Institute of Environmental Health Sciences Superfund Research Program (SRP) funds multidisciplinary research aimed to address pressing and complex issues on how people are exposed to hazardous substances and their related health consequences with the goal of identifying strategies to reduce exposures and protect human health. While disentangling the interrelationships that contribute to environmental exposures and their effects on human health over the course of life remains difficult, advances in data science and data sharing offer a path forward to explore data across disciplines to reveal new insights. Multidisciplinary SRP-funded teams are well-positioned to examine how to best integrate EHS data across diverse research domains to address multifaceted environmental health problems. As such, SRP supported collaborative research projects designed to foster and enhance the interoperability and reuse of diverse and complex data streams. This perspective synthesizes those experiences as a landscape view of the challenges identified while working to increase the FAIR-ness (Findable, Accessible, Interoperable, and Reusable) of EHS data and opportunities to address them.
Affiliation(s)
- Michelle L. Heacock
  - Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
  - Tel: 984-287-3267
- Sara M. Amolegbe
  - Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
- Danielle J. Carlin
  - Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
- Heather F. Henry
  - Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
- Brittany A. Trottier
  - Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
- William A. Suk
  - Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
|