1
|
Zong N, Li N, Wen A, Ngo V, Yu Y, Huang M, Chowdhury S, Jiang C, Fu S, Weinshilboum R, Jiang G, Hunter L, Liu H. BETA: a comprehensive benchmark for computational drug-target prediction. Brief Bioinform 2022; 23:6596989. [PMID: 35649342 PMCID: PMC9294420 DOI: 10.1093/bib/bbac199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2022] [Revised: 04/10/2022] [Accepted: 04/29/2022] [Indexed: 11/14/2022] Open
Abstract
Internal validation is the most popular evaluation strategy used for drug-target predictive models. The simple random shuffling in the cross-validation, however, is not always ideal to handle large, diverse and copious datasets as it could potentially introduce bias. Hence, these predictive models cannot be comprehensively evaluated to provide insight into their general performance on a variety of use-cases (e.g. permutations of different levels of connectiveness and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that aims to address this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities and (ii) presenting evaluation strategies that reflect seven cases (i.e. general, screening with different connectivity, target and drug screening based on categories, searching for specific drugs and targets and drug repurposing for specific diseases), a total of seven Tests (consisting of 344 Tasks in total) across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based and network-based) were tested across all the developed Tasks. The best-worst performing cases have been analyzed to demonstrate the ability of the proposed benchmark to identify limitations of the tested methods for running over the benchmark tasks. The results highlight BETA as a benchmark in the selection of computational strategies for drug repurposing and target discovery.
Affiliation(s)
- Nansu Zong
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Ning Li
  - Center for Structure Biology, Center for Cancer Research, National Cancer Institute, Frederick, MD
- Andrew Wen
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Victoria Ngo
  - Betty Irene Moore School of Nursing, University of California Davis Health, Sacramento, CA
  - Stanford Health Policy, Stanford School of Medicine and Freeman Spogli Institute for International Studies, Palo Alto, CA
- Yue Yu
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Ming Huang
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Shaika Chowdhury
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Chao Jiang
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL
- Sunyang Fu
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Richard Weinshilboum
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN
- Guoqian Jiang
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
- Lawrence Hunter
  - Department of Pharmacology, University of Colorado Denver, Aurora, CO
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics Research, Mayo Clinic, Rochester, MN
|
2
|
Kamdar MR, Musen MA. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 2021; 8:24. [PMID: 33479214 PMCID: PMC7819992 DOI: 10.1038/s41597-021-00797-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/04/2020] [Accepted: 12/04/2020] [Indexed: 01/29/2023] Open
Abstract
While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 biomedical linked open data sources into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.
Affiliation(s)
- Maulik R Kamdar
  - Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
  - Elsevier Health Markets, Philadelphia, PA, USA
- Mark A Musen
  - Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
|
3
|
Malec SA, Boyce RD. Exploring Novel Computable Knowledge in Structured Drug Product Labels. AMIA Jt Summits Transl Sci Proc 2020; 2020:403-412. [PMID: 32477661 PMCID: PMC7233092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
This paper introduces a database derived from Structured Product Labels (SPLs). SPLs are legally mandated snapshots containing information on all drugs released to market in the United States. Since publication is not required for pre-trial findings, we hypothesize that SPLs may contain knowledge absent in the literature, and hence "novel." SemMedDB is an existing database of computable knowledge derived from the literature. If SPL content could be similarly transformed, novel clinically relevant assertions in the SPLs could be identified through comparison with SemMedDB. After we derive a database (containing 4,297,481 assertions), we compare the extracted content with SemMedDB for recent FDA drug approvals. We find that novelty between the SPLs and the literature is nuanced, due to the redundancy of SPLs. Highlighting areas for improvement and future work, we conclude that SPLs contain a wealth of novel knowledge relevant to research and complementary to the literature.
Affiliation(s)
- Scott A Malec
  - University of Pittsburgh Department of Biomedical Informatics, Pittsburgh, PA
- Richard D Boyce
  - University of Pittsburgh Department of Biomedical Informatics, Pittsburgh, PA
|
4
|
Natsiavas P, Malousi A, Bousquet C, Jaulent MC, Koutkias V. Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches. Front Pharmacol 2019; 10:415. [PMID: 31156424 PMCID: PMC6533857 DOI: 10.3389/fphar.2019.00415] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/27/2018] [Accepted: 04/02/2019] [Indexed: 12/12/2022] Open
Abstract
Drug Safety (DS) is a domain with significant public health and social impact. Knowledge Engineering (KE) is the Computer Science discipline elaborating on methods and tools for developing “knowledge-intensive” systems, depending on a conceptual “knowledge” schema and some kind of “reasoning” process. The present systematic and mapping review aims to investigate KE-based approaches employed for DS and highlight the introduced added value as well as trends and possible gaps in the domain. Journal articles published between 2006 and 2017 were retrieved from PubMed/MEDLINE and Web of Science® (873 in total) and filtered based on a comprehensive set of inclusion/exclusion criteria. The 80 finally selected articles were reviewed on full-text, while the mapping process relied on a set of concrete criteria (concerning specific KE and DS core activities, special DS topics, employed data sources, reference ontologies/terminologies, and computational methods, etc.). The analysis results are publicly available as online interactive analytics graphs. The review clearly depicted increased use of KE approaches for DS. The collected data illustrate the use of KE for various DS aspects, such as Adverse Drug Event (ADE) information collection, detection, and assessment. Moreover, the quantified analysis of using KE for the respective DS core activities highlighted room for intensifying research on KE for ADE monitoring, prevention and reporting. Finally, the assessed use of the various data sources for DS special topics demonstrated extensive use of dominant data sources for DS surveillance, i.e., Spontaneous Reporting Systems, but also increasing interest in the use of emerging data sources, e.g., observational healthcare databases, biochemical/genetic databases, and social media. 
Various exemplar applications were identified with promising results, e.g., improvement in Adverse Drug Reaction (ADR) prediction, detection of drug interactions, and novel ADE profiles related with specific mechanisms of action, etc. Nevertheless, since the reviewed studies mostly concerned proof-of-concept implementations, more intense research is required to increase the maturity level that is necessary for KE approaches to reach routine DS practice. In conclusion, we argue that efficiently addressing DS data analytics and management challenges requires the introduction of high-throughput KE-based methods for effective knowledge discovery and management, resulting ultimately, in the establishment of a continuous learning DS system.
Affiliation(s)
- Pantelis Natsiavas
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
  - Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Andigoni Malousi
  - Laboratory of Biological Chemistry, Department of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Cédric Bousquet
  - Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
  - Public Health and Medical Information Unit, University Hospital of Saint-Etienne, Saint-Étienne, France
- Marie-Christine Jaulent
  - Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Vassilis Koutkias
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
|
5
|
Yin W, Gao C, Xu Y, Li B, Ruderfer DM, Chen Y. Learning Opportunities for Drug Repositioning via GWAS and PheWAS Findings. AMIA Jt Summits Transl Sci Proc 2018; 2017:237-246. [PMID: 29888080 PMCID: PMC5961790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Drug repositioning for available medications can be preferred over traditional drug development, which requires substantially more effort to uncover new insights into medications and diseases. Genome-Wide Association Studies (GWAS) and Phenome-Wide Association Studies (PheWAS) are two complementary methods for finding novel associations between genes and diseases. We hypothesize that discoveries from these studies could be leveraged to find new targets for existing drugs. Thus, we propose a framework to learn opportunities for inferring such relationships via overlapped genes between disease-associated genes (e.g. GWAS and PheWAS findings) and drug-targeted genes. We use drug indications found in Medication Indication Resource (MEDI) as a gold standard to evaluate if drug indications learned from GWAS and PheWAS findings have clinical indications. We examined 151,011 <drug, GWAS phenotype> pairs from 987 drugs across 153 diseases and 763 pairs were statistically significant. Out of these 763 pairs, 16 of them were found to have clinical indications.
Affiliation(s)
- Cheng Gao
  - Vanderbilt University, Nashville, TN
- Yaomin Xu
  - Vanderbilt University, Nashville, TN
- You Chen
  - Vanderbilt University, Nashville, TN
|
6
|
Romagnoli KM, Nelson SD, Hines L, Empey P, Boyce RD, Hochheiser H. Information needs for making clinical recommendations about potential drug-drug interactions: a synthesis of literature review and interviews. BMC Med Inform Decis Mak 2017; 17:21. [PMID: 28228132 PMCID: PMC5322613 DOI: 10.1186/s12911-017-0419-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 09/02/2016] [Accepted: 02/14/2017] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Drug information compendia and drug-drug interaction information databases are critical resources for clinicians and pharmacists working to avoid adverse events due to exposure to potential drug-drug interactions (PDDIs). Our goal is to develop information models, annotated data, and search tools that will facilitate the interpretation of PDDI information. To better understand the information needs and work practices of specialists who search and synthesize PDDI evidence for drug information resources, we conducted an inquiry that combined a thematic analysis of published literature with unstructured interviews. METHODS Starting from an initial set of relevant articles, we developed search terms and conducted a literature search. Two reviewers conducted a thematic analysis of included articles. Unstructured interviews with drug information experts were conducted and similarly coded. Information needs, work processes, and indicators of potential strengths and weaknesses of information systems were identified. RESULTS Review of 92 papers and 10 interviews identified 56 categories of information needs related to the interpretation of PDDI information including drug and interaction information; study design; evidence including clinical details, quality and content of reports, and consequences; and potential recommendations. We also identified strengths/weaknesses of PDDI information systems. CONCLUSIONS We identified the kinds of information that might be most effective for summarizing PDDIs. The drug information experts we interviewed had differing goals, suggesting a need for detailed information models and flexible presentations. Several information needs not discussed in previous work were identified, including temporal overlaps in drug administration, biological plausibility of interactions, and assessment of the quality and content of reports. 
Richly structured depictions of PDDI information may help drug information experts more effectively interpret data and develop recommendations. Effective information models and system designs will be needed to maximize the utility of this information.
Affiliation(s)
- Katrina M. Romagnoli
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA USA
- Scott D. Nelson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN USA
- Lisa Hines
  - Pharmacy Quality Alliance, Springfield, VA USA
- Philip Empey
  - School of Pharmacy and Therapeutics, University of Pittsburgh, Pittsburgh, PA USA
- Richard D. Boyce
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA USA
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA USA
|
7
|
Herrero-Zazo M, Segura-Bedmar I, Martínez P. Conceptual models of drug-drug interactions: A summary of recent efforts. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.10.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
|
8
|
Leung TI, Dumontier M. Overlap in drug-disease associations between clinical practice guidelines and drug structured product label indications. J Biomed Semantics 2016; 7:37. [PMID: 27277160 PMCID: PMC4898307 DOI: 10.1186/s13326-016-0081-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/01/2015] [Accepted: 05/24/2016] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Clinical practice guidelines (CPGs) recommend pharmacologic treatments for clinical conditions, and drug structured product labels (SPLs) summarize approved treatment indications. Both resources are intended to promote evidence-based medical practices and guide clinicians' prescribing decisions. However, it is unclear how well CPG recommendations about pharmacologic therapies match SPL indications for recommended drugs. In this study, we perform text mining of CPG summaries to examine drug-disease associations in CPG recommendations and in SPL treatment indications for 15 common chronic conditions. METHODS We constructed an initial text corpus of guideline summaries from the National Guideline Clearinghouse (NGC) from a set of manually selected ICD-9 codes for each of the 15 conditions. We obtained 377 relevant guideline summaries and their Major Recommendations section, which excludes guidelines for pediatric patients, pregnant or breastfeeding women, or for medical diagnoses not meeting inclusion criteria. A vocabulary of drug terms was derived from five medical taxonomies. We used named entity recognition, in combination with dictionary-based and ontology-based methods, to identify drug term occurrences in the text corpus and construct drug-disease associations. The ATC (Anatomical Therapeutic Chemical Classification) was utilized to perform drug name and drug class matching to construct the drug-disease associations from CPGs. We then obtained drug-disease associations from SPLs using conditions mentioned in their Indications section in SIDER. The primary outcomes were the frequency of drug-disease associations in CPGs and SPLs, and the frequency of overlap between the two sets of drug-disease associations, with and without using taxonomic information from ATC. RESULTS Without taxonomic information, we identified 1444 drug-disease associations across CPGs and SPLs for 15 common chronic conditions. 
Of these, 195 drug-disease associations overlapped between CPGs and SPLs, 917 associations occurred in CPGs only and 332 associations occurred in SPLs only. With taxonomic information, 859 unique drug-disease associations were identified, of which 152 of these drug-disease associations overlapped between CPGs and SPLs, 541 associations occurred in CPGs only, and 166 associations occurred in SPLs only. CONCLUSIONS Our results suggest that CPG-recommended pharmacologic therapies and SPL indications do not overlap frequently when identifying drug-disease associations using named entity recognition, although incorporating taxonomic relationships between drug names and drug classes into the approach improves the overlap. This has important implications in practice because conflicting or inconsistent evidence may complicate clinical decision making and implementation or measurement of best practices.
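The counts reported in this abstract can be checked with a few lines of arithmetic. The Python sketch below recomputes the totals and the fraction of associations found in both sources. The function name `overlap_summary` is hypothetical, and the use of the Jaccard index (shared associations divided by all unique associations) is an assumption made for illustration; the abstract does not say which overlap measure the authors used.

```python
def overlap_summary(both, cpg_only, spl_only):
    """Return (total unique associations, Jaccard overlap) for two sources.

    Hypothetical helper: the Jaccard index is an assumed measure,
    not one named in the abstract.
    """
    total = both + cpg_only + spl_only
    return total, both / total


# Counts as reported in the abstract.
without_atc = overlap_summary(both=195, cpg_only=917, spl_only=332)
with_atc = overlap_summary(both=152, cpg_only=541, spl_only=166)

for label, (total, share) in [("without ATC", without_atc), ("with ATC", with_atc)]:
    print(f"{label}: {total} associations, {share:.1%} shared")
```

Under that assumption, the totals match the reported 1444 and 859, and the shared fraction rises from about 13.5% to about 17.7% once ATC class information is used, which is consistent with the abstract's statement that taxonomic information improves the overlap.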
Affiliation(s)
- Tiffany I Leung
  - Division of General Medical Disciplines, Stanford University, Stanford, CA, USA
- Michel Dumontier
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
|
9
|
Hochheiser H, Ning Y, Hernandez A, Horn JR, Jacobson R, Boyce RD. Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study. JMIR Res Protoc 2016; 5:e40. [PMID: 27066806 PMCID: PMC4844909 DOI: 10.2196/resprot.5028] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/17/2015] [Revised: 11/25/2015] [Accepted: 12/19/2015] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND Because vital details of potential pharmacokinetic drug-drug interactions are often described in free-text structured product labels, manual curation is a necessary but expensive step in the development of electronic drug-drug interaction information resources. The use of nonexperts to annotate potential drug-drug interaction (PDDI) mentions in drug product label annotation may be a means of lessening the burden of manual curation. OBJECTIVE Our goal was to explore the practicality of using nonexpert participants to annotate drug-drug interaction descriptions from structured product labels. By presenting annotation tasks to both pharmacy experts and relatively naïve participants, we hoped to demonstrate the feasibility of using nonexpert annotators for drug-drug information annotation. We were also interested in exploring whether and to what extent natural language processing (NLP) preannotation helped improve task completion time, accuracy, and subjective satisfaction. METHODS Two experts and 4 nonexperts were asked to annotate 208 structured product label sections under 4 conditions completed sequentially: (1) no NLP assistance, (2) preannotation of drug mentions, (3) preannotation of drug mentions and PDDIs, and (4) a repeat of the no-annotation condition. Results were evaluated within the 2 groups and relative to an existing gold standard. Participants were asked to provide reports on the time required to complete tasks and their perceptions of task difficulty. RESULTS One of the experts and 3 of the nonexperts completed all tasks. Annotation results from the nonexpert group were relatively strong in every scenario and better than the performance of the NLP pipeline. The expert and 2 of the nonexperts were able to complete most tasks in less than 3 hours. Usability perceptions were generally positive (3.67 for expert, mean of 3.33 for nonexperts). 
CONCLUSIONS The results suggest that nonexpert annotation might be a feasible option for comprehensive labeling of annotated PDDIs across a broader range of drug product labels. Preannotation of drug mentions may ease the annotation task. However, preannotation of PDDIs, as operationalized in this study, presented the participants with difficulties. Future work should test if these issues can be addressed by the use of better performing NLP and a different approach to presenting the PDDI preannotations to users during the annotation workflow.
Affiliation(s)
- Harry Hochheiser
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
|
10
|
Ayvaz S, Horn J, Hassanzadeh O, Zhu Q, Stan J, Tatonetti NP, Vilar S, Brochhausen M, Samwald M, Rastegar-Mojarad M, Dumontier M, Boyce RD. Toward a complete dataset of drug-drug interaction information from publicly available sources. J Biomed Inform 2015; 55:206-17. [PMID: 25917055 PMCID: PMC4464899 DOI: 10.1016/j.jbi.2015.04.006] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 10/22/2014] [Revised: 03/30/2015] [Accepted: 04/15/2015] [Indexed: 10/23/2022]
Abstract
Although potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm, there is currently no single complete source of PDDI information. In the current study, all publicly available sources of PDDI information that could be identified using a comprehensive and broad search were combined into a single dataset. The combined dataset merged fourteen different sources including 5 clinically-oriented information sources, 4 Natural Language Processing (NLP) Corpora, and 5 Bioinformatics/Pharmacovigilance information sources. As a comprehensive PDDI source, the merged dataset might benefit the pharmacovigilance text mining community by making it possible to compare the representativeness of NLP corpora for PDDI text extraction tasks, and specifying elements that can be useful for future PDDI extraction purposes. An analysis of the overlap between and across the data sources showed that there was little overlap. Even comprehensive PDDI lists such as DrugBank, KEGG, and the NDF-RT had less than 50% overlap with each other. Moreover, all of the comprehensive lists had incomplete coverage of two data sources that focus on PDDIs of interest in most clinical settings. Based on this information, we think that systems that provide access to the comprehensive lists, such as APIs into RxNorm, should be careful to inform users that the lists may be incomplete with respect to PDDIs that drug experts suggest clinicians be aware of. In spite of the low degree of overlap, several dozen cases were identified where PDDI information provided in drug product labeling might be augmented by the merged dataset. Moreover, the combined dataset was also shown to improve the performance of an existing PDDI NLP pipeline and a recently published PDDI pharmacovigilance protocol.
Future work will focus on improvement of the methods for mapping between PDDI information sources, identifying methods to improve the use of the merged dataset in PDDI NLP algorithms, integrating high-quality PDDI information from the merged dataset into Wikidata, and making the combined dataset accessible as Semantic Web Linked Data.
Affiliation(s)
- Serkan Ayvaz
  - Department of Computer Science, Kent State University, 241 Math and Computer Science Building, Kent, OH 44242, USA
- John Horn
  - Department of Pharmacy, School of Pharmacy and University of Washington Medicine, Pharmacy Services, University of Washington, H375V Health Sciences Bldg, Box 357630, Seattle, WA 98195, USA
- Oktie Hassanzadeh
  - IBM T.J. Watson Research Center, 1101 Kitchawan Rd Route 134, P.O. Box 218, Yorktown Heights, NY 10598, USA
- Qian Zhu
  - Department of Information Systems, University of Maryland Baltimore County, Baltimore, MD 21250, USA
- Johann Stan
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Nicholas P Tatonetti
  - Departments of Biomedical Informatics, Systems Biology, and Medicine, Columbia University, 622 West 168th St VC5, New York, NY 10032, USA
- Santiago Vilar
  - Departments of Biomedical Informatics, Systems Biology, and Medicine, Columbia University, 622 West 168th St VC5, New York, NY 10032, USA
- Mathias Brochhausen
  - Division of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W. Markham St, #782, Little Rock, AR 72205-7199, USA
- Matthias Samwald
  - Section for Medical Expert and Knowledge-Based Systems, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090 Vienna, Austria
- Majid Rastegar-Mojarad
  - Biomedical Statistics & Informatics, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, USA
- Michel Dumontier
  - Stanford Center for Biomedical Informatics Research, Stanford, CA 94305, USA
- Richard D Boyce
  - Department of Biomedical Informatics, Suite 419, 5607 Baum Blvd, Pittsburgh, PA 15206-3701, USA
|
11
|
Pfundner A, Schönberg T, Horn J, Boyce RD, Samwald M. Utilizing the Wikidata system to improve the quality of medical content in Wikipedia in diverse languages: a pilot study. J Med Internet Res 2015; 17:e110. [PMID: 25944105 PMCID: PMC4468594 DOI: 10.2196/jmir.4163] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/21/2014] [Revised: 03/12/2015] [Accepted: 03/14/2015] [Indexed: 11/13/2022] Open
Abstract
Background Wikipedia is an important source of medical information for both patients and medical professionals. Given its wide reach, improving the quality, completeness, and accessibility of medical information on Wikipedia could have a positive impact on global health. Objective We created a prototypical implementation of an automated system for keeping drug-drug interaction (DDI) information in Wikipedia up to date with current evidence about clinically significant drug interactions. Our work is based on Wikidata, a novel, graph-based database backend of Wikipedia currently in development. Methods We set up an automated process for integrating data from the Office of the National Coordinator for Health Information Technology (ONC) high priority DDI list into Wikidata. We set up exemplary implementations demonstrating how the DDI data we introduced into Wikidata could be displayed in Wikipedia articles in diverse languages. Finally, we conducted a pilot analysis to explore if adding the ONC high priority data would substantially enhance the information currently available on Wikipedia. Results We derived 1150 unique interactions from the ONC high priority list. Integration of the potential DDI data from Wikidata into Wikipedia articles proved to be straightforward and yielded useful results. We found that even though the majority of current English Wikipedia articles about pharmaceuticals contained sections detailing contraindications, only a small fraction of articles explicitly mentioned interaction partners from the ONC high priority list. For 91.30% (1050/1150) of the interaction pairs we tested, neither of the 2 articles corresponding to the interacting substances explicitly mentioned the interaction partner. For 7.21% (83/1150) of the pairs, only 1 of the 2 associated Wikipedia articles mentioned the interaction partner; for only 1.48% (17/1150) of the pairs, both articles contained explicit mentions of the interaction partner.
Conclusions Our prototype demonstrated that automated updating of medical content in Wikipedia through Wikidata is a viable option, albeit further refinements and community-wide consensus building are required before integration into public Wikipedia is possible. A long-term endeavor to improve the medical information in Wikipedia through structured data representation and automated workflows might lead to a significant improvement of the quality of medical information in one of the world’s most popular Web resources.
Affiliation(s)
- Alexander Pfundner
- Section for Medical Expert and Knowledge-Based Systems, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
|
12
|
Automating Identification of Multiple Chronic Conditions in Clinical Practice Guidelines. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2015; 2015:456-60. [PMID: 26306285 PMCID: PMC4525235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
Many clinical practice guidelines (CPGs) are intended to provide evidence-based guidance to clinicians on a single disease, and are frequently considered inadequate when caring for patients with multiple chronic conditions (MCC), defined as two or more chronic conditions. It is unclear to what degree disease-specific CPGs provide guidance about MCC. In this study, we develop a method for extracting knowledge from single-disease chronic condition CPGs to determine how frequently they mention commonly co-occurring chronic diseases. We focus on 15 highly prevalent chronic conditions. We use publicly available resources, including a repository of guideline summaries from the National Guideline Clearinghouse to build a text corpus, a data dictionary of ICD-9 codes from the Medicare Chronic Conditions Data Warehouse (CCW) to construct an initial list of disease terms, and disease synonyms from the National Center for Biomedical Ontology to enhance the list of disease terms. First, for each disease guideline, we determined the frequency of comorbid condition mentions (a disease-comorbidity pair) by exactly matching disease synonyms in the text corpus. Next, we developed an annotated reference standard using a sample subset of guidelines and used it to evaluate our approach. Finally, we compared the co-prevalence of common pairs of chronic conditions from Medicare CCW data to the frequency of disease-comorbidity pairs in CPGs. Our results show that some disease-comorbidity pairs occur more frequently in CPGs than others. Sixty-one (29.0%) of 210 possible disease-comorbidity pairs occurred zero times; for example, no guideline on chronic kidney disease mentioned depression, while heart failure guidelines mentioned ischemic heart disease the most frequently. Our method adequately identifies comorbid chronic conditions in CPG recommendations with precision 0.82, recall 0.75, and F-measure 0.78.
Our work identifies knowledge currently embedded in the free text of clinical practice guideline recommendations and provides an initial view of the extent to which CPGs mention common comorbid conditions. Knowledge extracted from CPG text in this way may be useful to inform gaps in guideline recommendations regarding MCC and therefore identify potential opportunities for guideline improvement.
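The counting step and the reported figures can be illustrated with a short sketch. It is not the authors' implementation: the function names, the toy guideline texts, the tiny synonym lists, and the choice of case-insensitive whole-phrase matching are assumptions made for illustration. The sketch also checks two pieces of arithmetic from the abstract: 15 conditions give 15 × 14 = 210 ordered disease-comorbidity pairs, and precision 0.82 with recall 0.75 gives an F-measure of about 0.78.

```python
import re
from itertools import permutations

def count_comorbidity_mentions(guidelines, synonyms):
    """guidelines: {index disease: guideline text}
    synonyms:   {disease: [terms]}
    Returns {(index disease, comorbidity): number of term matches}."""
    counts = {}
    for disease, comorbidity in permutations(synonyms, 2):
        text = guidelines.get(disease, "")
        total = 0
        for term in synonyms[comorbidity]:
            # Assumption: whole-phrase, case-insensitive matching.
            pattern = r"\b" + re.escape(term) + r"\b"
            total += len(re.findall(pattern, text, flags=re.IGNORECASE))
        counts[(disease, comorbidity)] = total
    return counts

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# Toy inputs (hypothetical, not from the National Guideline Clearinghouse).
synonyms = {
    "heart failure": ["heart failure", "HF"],
    "ischemic heart disease": ["ischemic heart disease", "coronary artery disease"],
    "depression": ["depression"],
}
guidelines = {
    "heart failure": "Evaluate for ischemic heart disease; coronary artery disease is a common cause.",
    "ischemic heart disease": "Screen for depression after myocardial infarction.",
    "depression": "Offer psychotherapy or medication.",
}

counts = count_comorbidity_mentions(guidelines, synonyms)
zero_pairs = [pair for pair, n in counts.items() if n == 0]
pairs_for_15_conditions = len(list(permutations(range(15), 2)))
f1 = round(f_measure(0.82, 0.75), 2)
```

Note that overlapping synonyms (for example, a short term contained in a longer one) would be double counted by this simple loop; a real system would need a rule for resolving such overlaps.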
|
13
|
|
14
|
Low YS, Sedykh AY, Rusyn I, Tropsha A. Integrative approaches for predicting in vivo effects of chemicals from their structural descriptors and the results of short-term biological assays. Curr Top Med Chem 2014; 14:1356-64. [PMID: 24805064 PMCID: PMC5344042 DOI: 10.2174/1568026614666140506121116] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 01/19/2014] [Revised: 02/05/2014] [Accepted: 02/05/2014] [Indexed: 12/22/2022]
Abstract
Cheminformatics approaches such as Quantitative Structure Activity Relationship (QSAR) modeling have been used traditionally for predicting chemical toxicity. In recent years, high throughput biological assays have been increasingly employed to elucidate mechanisms of chemical toxicity and predict toxic effects of chemicals in vivo. The data generated in such assays can be considered as biological descriptors of chemicals that can be combined with molecular descriptors and employed in QSAR modeling to improve the accuracy of toxicity prediction. In this review, we discuss several approaches for integrating chemical and biological data for predicting biological effects of chemicals in vivo and compare their performance across several data sets. We conclude that while no method consistently shows superior performance, the integrative approaches rank consistently among the best yet offer enriched interpretation of models over those built with either chemical or biological data alone. We discuss the outlook for such interdisciplinary methods and offer recommendations to further improve the accuracy and interpretability of computational models that predict chemical toxicity.
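One simple way to combine chemical and biological descriptors is to concatenate them into a single feature vector before modeling. The sketch below illustrates that idea only; the abstract does not name the specific integrative methods reviewed, so the nearest-neighbor classifier, the function names, and all descriptor values here are assumptions chosen for brevity.

```python
import math

def hybrid_descriptor(chemical, biological):
    """Concatenate chemical and biological descriptors into one vector."""
    return list(chemical) + list(biological)

def nearest_neighbor_label(query, training):
    """Return the label of the training vector closest to the query
    (Euclidean distance). training: list of (vector, label) tuples."""
    best = min(training, key=lambda item: math.dist(query, item[0]))
    return best[1]

# Toy compounds (hypothetical values): (chemical descriptors, assay readouts, label)
compounds = [
    ([0.10, 0.20], [1.0, 0.0], "toxic"),
    ([0.10, 0.25], [0.0, 1.0], "nontoxic"),
]
query_chem, query_bio = [0.10, 0.24], [0.9, 0.1]

# Model built on chemical descriptors alone.
chem_only = [(chem, label) for chem, _, label in compounds]
label_chem = nearest_neighbor_label(query_chem, chem_only)

# Model built on the concatenated (hybrid) descriptors.
hybrid = [(hybrid_descriptor(chem, bio), label) for chem, bio, label in compounds]
label_hybrid = nearest_neighbor_label(hybrid_descriptor(query_chem, query_bio), hybrid)

print("chemical only:", label_chem)
print("hybrid:", label_hybrid)
```

In this toy case the two models disagree, which shows how assay data can change a prediction; it says nothing about which answer is correct. In practice the two descriptor types would also need scaling so that neither dominates the distance.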
Affiliation(s)
- Alexander Tropsha
- 100K Beard Hall, Campus Box 7568, University of North Carolina, Chapel Hill, NC 27599-7568, USA.
|
15
|
Boyce RD, Freimuth RR, Romagnoli KM, Pummer T, Hochheiser H, Empey PE. Toward semantic modeling of pharmacogenomic knowledge for clinical and translational decision support. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2013; 2013:28-32. [PMID: 24303292 PMCID: PMC3814496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
The purpose of this paper is to describe pilot work on a semantic model of the pharmacogenomics information found in drug product labels. The model's development is driven by a series of use cases that we have developed to demonstrate how structured pharmacogenomics information could be more effectively used to support clinical and translational efforts. Using an iterative process, the semantic model was field-tested by five pharmacists, who used it to manually annotate a subset of the sections that the Food and Drug Administration's Table of Pharmacogenomic Biomarkers in Drug Labels cites as containing pharmacogenomics information. The five pharmacists identified a total of 213 pharmacogenomics statements in 29 sections. The model showed the potential to make the unstructured pharmacogenomic information currently written in product labeling more accessible and actionable through structured annotations of pharmacogenomics effects and clinical recommendations.
|
16
|
Extending the "web of drug identity" with knowledge extracted from United States product labels. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2013; 2013:64-8. [PMID: 24303301 PMCID: PMC3814463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Structured Product Labels (SPLs) contain information about drugs that can be valuable to clinical and translational research, especially if it can be linked to other sources that provide data about drug targets, chemical properties, interactions, and biological pathways. Unfortunately, SPLs currently provide coarsely structured drug information and lack the detailed annotation that is required to support computational use cases. To help address this issue we created LinkedSPLs, a Linked Data resource that extends the "web of drug identity" using information extracted from SPLs. In this paper we describe the mapping that LinkedSPLs provides between SPL active ingredients and DrugBank chemical entities. These mappings were created using three approaches: comparison of InChI chemical structure descriptors, exact string matching based on the chemical name, and automatic (unsupervised) linkage identification. Comparison of the approaches found that, while these three approaches are complementary, the automatic approach performs well in terms of precision and recall.
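Two of the three mapping approaches, InChI comparison and exact name matching, can be sketched in a few lines. This is a hypothetical illustration rather than the LinkedSPLs code: the record layout, the function names, the placeholder identifiers and InChI strings, the light name normalization (lowercasing and whitespace collapsing), and the rule that an InChI match takes precedence over a name match are all assumptions. The automatic (unsupervised) linkage approach is not shown.

```python
def normalize(name):
    """Lowercase and collapse whitespace (an assumed normalization step)."""
    return " ".join(name.lower().split())

def map_ingredients(spl_ingredients, drugbank_entries):
    """Map SPL active ingredients to DrugBank-style entries.
    Returns {ingredient name: (entry id, method)} for matched ingredients."""
    by_inchi = {e["inchi"]: e["id"] for e in drugbank_entries if e.get("inchi")}
    by_name = {normalize(e["name"]): e["id"] for e in drugbank_entries}
    mappings = {}
    for ing in spl_ingredients:
        if ing.get("inchi") in by_inchi:
            # Structure-based match: identical InChI strings.
            mappings[ing["name"]] = (by_inchi[ing["inchi"]], "inchi")
        elif normalize(ing["name"]) in by_name:
            # Fallback: exact match on the normalized chemical name.
            mappings[ing["name"]] = (by_name[normalize(ing["name"])], "name")
    return mappings

# Placeholder records (hypothetical identifiers and InChI strings).
drugbank_entries = [
    {"id": "ENTRY-1", "name": "Compound Alpha", "inchi": "InChI=1S/EXAMPLE-A"},
    {"id": "ENTRY-2", "name": "Compound Beta", "inchi": None},
]
spl_ingredients = [
    {"name": "COMPOUND ALPHA", "inchi": "InChI=1S/EXAMPLE-A"},
    {"name": "compound  beta", "inchi": None},
    {"name": "Compound Gamma", "inchi": "InChI=1S/EXAMPLE-C"},
]

mappings = map_ingredients(spl_ingredients, drugbank_entries)
```

Recording the matching method alongside each mapping makes it easy to compare the approaches afterwards, for example by checking where the structure-based and name-based matches agree.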
|