1
|
Cuffy C, McInnes BT. Predicting implicit concept embeddings for singular relationship discovery replication of closed literature-based discovery. Front Res Metr Anal 2025; 10:1509502. [PMID: 40110121 PMCID: PMC11920161 DOI: 10.3389/frma.2025.1509502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Accepted: 01/27/2025] [Indexed: 03/22/2025] Open
Abstract
Objective: Literature-based Discovery (LBD) identifies new knowledge by leveraging existing literature. It exploits interconnecting implicit relationships to build bridges between isolated sets of non-interacting literatures. It has been used to facilitate drug repurposing, new drug discovery, and study adverse event reactions. Within the last decade, LBD systems have transitioned from using statistical methods to exploring deep learning (DL) to analyze semantic spaces between non-interacting literatures. Recent works explore knowledge graphs (KG) to represent explicit relationships. These works envision LBD as a knowledge graph completion (KGC) task and use DL to generate implicit relationships. However, these systems require the researcher to have domain-expert knowledge when submitting relevant queries for novel hypothesis discovery.
Methods: Our method explores a novel approach to identify all implicit hypotheses given the researcher's search query and expedites the knowledge discovery process. We revise the KGC task as the task of predicting interconnecting vertex embeddings within the graph. We train our model using a similarity learning objective and compare our model's predictions against all known vertices within the graph to determine the likelihood of an implicit relationship (i.e., connecting edge). We also explore three approaches to represent edge connections between vertices within the KG: average, concatenation, and Hadamard. Lastly, we explore an approach to induce inductive biases and expedite model convergence (i.e., input representation scaling).
Results: We evaluate our method by replicating five known discoveries within the Hallmark of Cancer (HOC) datasets and compare our method to two existing works. Our results show no significant difference in reported ranks or model convergence rate between scaling our input representations and leaving them unscaled. Comparing our method to previous works, we found our method achieves optimal performance on two of five datasets and achieves comparable performance on the remaining datasets. We further analyze our results using statistical significance testing to demonstrate the efficacy of our method.
Conclusion: We found our similarity-based learning objective predicts linking vertex embeddings for single relationship closed discovery replication. Our method also provides a ranked list of linking vertices between a set of inputs. This approach reduces researcher burden and allows further exploration of generated hypotheses.
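The three edge representations named in this abstract (average, concatenation, Hadamard) and the comparison of a predicted embedding against all known vertices can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity is an assumption here, and the function names and toy vectors are hypothetical.

```python
import math

def average(u, v):
    """Element-wise mean of two vertex embeddings."""
    return [(a + b) / 2 for a, b in zip(u, v)]

def concatenate(u, v):
    """Join two vertex embeddings end to end (doubles the dimension)."""
    return list(u) + list(v)

def hadamard(u, v):
    """Element-wise product of two vertex embeddings."""
    return [a * b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity (assumed metric; not stated in the abstract)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_vertices(predicted, vertices):
    """Rank known vertices by similarity to a predicted linking embedding."""
    scores = {name: cosine(predicted, emb) for name, emb in vertices.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy embeddings for illustration only.
known = {"vertex_a": [0.0, 1.0], "vertex_b": [1.0, 0.1]}
ranking = rank_vertices([1.0, 0.0], known)
```

The ranked list plays the role of the "ranked list of linking vertices" the abstract describes; a real system would use learned embeddings and a trained model to produce the predicted vector.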
Affiliation(s)
- Clint Cuffy
- Natural Language Processing Lab, Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Bridget T McInnes
- Natural Language Processing Lab, Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| |
|
2
|
Díaz-Santiago E, Moya-García AA, Pérez-García J, Yahyaoui R, Orengo C, Pazos F, Perkins JR, Ranea JAG. Better understanding the phenotypic effects of drugs through shared targets in genetic disease networks. Front Pharmacol 2025; 15:1470931. [PMID: 39911831 PMCID: PMC11794328 DOI: 10.3389/fphar.2024.1470931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Accepted: 12/12/2024] [Indexed: 02/07/2025] Open
Abstract
Introduction: Most drugs fail during development and there is a clear and unmet need for approaches to better understand mechanistically how drugs exert both their intended and adverse effects. Gaining traction in this field is the use of disease data linking genes with pathological phenotypes and combining this with drug-target interaction data.
Methods: We introduce methodology to associate drugs with effects, both intended and adverse, using a tripartite network approach that combines drug-target and target-phenotype data, in which targets can be represented as proteins and protein domains.
Results: We were able to detect associations for over 140,000 ChEMBL drugs and 3,800 phenotypes, represented as Human Phenotype Ontology (HPO) terms. The overlap of these results with the SIDER databases of known drug side effects was up to 10 times higher than random, depending on the target type, disease database and score threshold used. In terms of overlap with drug-phenotype pairs extracted from the literature, the performance of our methodology was up to 17.47 times greater than random. The top results include phenotype-drug associations that represent intended effects, particularly for cancers such as chronic myelogenous leukemia, which was linked with nilotinib. They also include adverse side effects, such as blurred vision being linked with tetracaine.
Discussion: This work represents an important advance in our understanding of how drugs cause intended and adverse side effects through their action on disease-causing genes and has potential applications for drug development and repositioning.
Affiliation(s)
- Elena Díaz-Santiago
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
| | | | - Jesús Pérez-García
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
| | - Raquel Yahyaoui
- Laboratory of Inherited Metabolic Diseases and Newborn Screening, Malaga Regional University Hospital, Malaga, Spain
- Instituto de Investigación Biomédica de Málaga y Plataforma en Nanomedicina-IBIMA Plataforma BIONAND, Malaga, Spain
| | - Christine Orengo
- Department of Structural and Molecular Biology, University College London, London, United Kingdom
| | - Florencio Pazos
- Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
| | - James R. Perkins
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- Instituto de Investigación Biomédica de Málaga y Plataforma en Nanomedicina-IBIMA Plataforma BIONAND, Malaga, Spain
- CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
| | - Juan A. G. Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- Instituto de Investigación Biomédica de Málaga y Plataforma en Nanomedicina-IBIMA Plataforma BIONAND, Malaga, Spain
- CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
| |
|
3
|
Cocco M, Carnovale C, Clementi E, Barbieri MA, Battini V, Sessa M. Exploring the impact of co-exposure timing on drug-drug interactions in signal detection through spontaneous reporting system databases: a scoping review. Expert Rev Clin Pharmacol 2024; 17:441-453. [PMID: 38619027 DOI: 10.1080/17512433.2024.2343875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Accepted: 04/12/2024] [Indexed: 04/16/2024]
Abstract
INTRODUCTION: Drug-drug interactions (DDIs) are defined as the pharmacological effects produced by the concomitant administration of two or more drugs. To minimize false positive signals and ensure their validity when analyzing Spontaneous Reporting System (SRS) databases, it has been suggested to incorporate key pharmacological principles, such as temporal plausibility.
AREAS COVERED: The scoping review of the literature was completed using MEDLINE from inception to March 2023. Included studies had to provide detailed methods for identifying DDIs in SRS databases. Any methodological approach and adverse event were accepted. Descriptive analyses were excluded as we focused on automatic signal detection methods. The result is an overview of all the available methods for DDI signal detection in SRS databases, with a specific focus on the evaluation of the co-exposure time of the interacting drugs. It is worth noting that only a limited number of studies (n = 3) have attempted to address the issue of overlapping drug administration times.
EXPERT OPINION: Current guidelines for signal validation focus on factors like the number of reports and temporal association, but they lack guidance on addressing overlapping drug administration times, highlighting a need for further research and method development.
Affiliation(s)
- Marianna Cocco
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
- Department of Drug Sciences, University of Pavia, Pavia, Italy
| | - Carla Carnovale
- Pharmacovigilance & Clinical Research, International Centre for Pesticides and Health Risk Prevention, Department of Biomedical and Clinical Sciences (DIBIC), ASST Fatebenefratelli-Sacco University Hospital, Università Degli Studi di Milano, Milan, Italy
| | - Emilio Clementi
- Pharmacovigilance & Clinical Research, International Centre for Pesticides and Health Risk Prevention, Department of Biomedical and Clinical Sciences (DIBIC), ASST Fatebenefratelli-Sacco University Hospital, Università Degli Studi di Milano, Milan, Italy
- Scientific Institute, IRCCS E. Medea, Bosisio Parini, LC, Italy
| | - Maria Antonietta Barbieri
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
- Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
| | - Vera Battini
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| | - Maurizio Sessa
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| |
|
4
|
Malec SA, Taneja SB, Albert SM, Elizabeth Shaaban C, Karim HT, Levine AS, Munro P, Callahan TJ, Boyce RD. Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: A use case studying depression as a risk factor for Alzheimer's disease. J Biomed Inform 2023; 142:104368. [PMID: 37086959 PMCID: PMC10355339 DOI: 10.1016/j.jbi.2023.104368] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/19/2022] [Revised: 03/03/2023] [Accepted: 04/17/2023] [Indexed: 04/24/2023]
Abstract
BACKGROUND: Causal feature selection is essential for estimating effects from observational data. Identifying confounders is a crucial step in this process. Traditionally, researchers employ content-matter expertise and literature review to identify confounders. Uncontrolled confounding from unidentified confounders threatens validity, conditioning on intermediate variables (mediators) weakens estimates, and conditioning on common effects (colliders) induces bias. Additionally, without special treatment, erroneous conditioning on variables combining roles introduces bias. However, the vast literature is growing exponentially, making it infeasible to assimilate this knowledge. To address these challenges, we introduce a novel knowledge graph (KG) application enabling causal feature selection by combining computable literature-derived knowledge with biomedical ontologies. We present a use case of our approach specifying a causal model for estimating the total causal effect of depression on the risk of developing Alzheimer's disease (AD) from observational data.
METHODS: We extracted computable knowledge from a literature corpus using three machine reading systems and inferred missing knowledge using logical closure operations. Using a KG framework, we mapped the output to target terminologies and combined it with ontology-grounded resources. We translated epidemiological definitions of confounder, collider, and mediator into queries for searching the KG and summarized the roles played by the identified variables. We compared the results with output from a complementary method and published observational studies and examined a selection of confounding and combined role variables in depth.
RESULTS: Our search identified 128 confounders, including 58 phenotypes, 47 drugs, 35 genes, 23 collider, and 16 mediator phenotypes. However, only 31 of the 58 confounder phenotypes were found to behave exclusively as confounders, while the remaining 27 phenotypes played other roles. Obstructive sleep apnea emerged as a potential novel confounder for depression and AD. Anemia exemplified a variable playing combined roles.
CONCLUSION: Our findings suggest combining machine reading and KG could augment human expertise for causal feature selection. However, the complexity of causal feature selection for depression with AD highlights the need for standardized field-specific databases of causal variables. Further work is needed to optimize KG search and transform the output for human consumption.
Affiliation(s)
- Scott A Malec
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Sanya B Taneja
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
| | - Steven M Albert
- Department of Behavioral and Community Health Sciences, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - C Elizabeth Shaaban
- Department of Epidemiology, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Helmet T Karim
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA; Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
| | - Arthur S Levine
- Department of Neurobiology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; The Brain Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Paul Munro
- School of Computing and Information, University of Pittsburgh, Pittsburgh, PA, USA
| | - Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Richard D Boyce
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
| |
|
5
|
De Pretis F, van Gils M, Forsberg MM. A smart hospital-driven approach to precision pharmacovigilance. Trends Pharmacol Sci 2022; 43:473-481. [PMID: 35490032 DOI: 10.1016/j.tips.2022.03.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/08/2021] [Revised: 02/25/2022] [Accepted: 03/22/2022] [Indexed: 01/03/2023]
Abstract
Researchers, regulatory agencies, and the pharmaceutical industry are moving towards precision pharmacovigilance as a comprehensive framework for drug safety assessment, at the service of the individual patient, by clustering specific risk groups in different databases. This article explores its implementation by focusing on: (i) designing a new data collection infrastructure, (ii) exploring new computational methods suitable for drug safety data, and (iii) providing a computer-aided framework for distributed clinical decisions with the aim of compiling a personalized information leaflet with specific reference to a drug's risks and adverse drug reactions. These goals can be achieved by using 'smart hospitals' as the principal data sources and by employing methods of precision medicine and medical statistics to supplement current public health decisions.
Affiliation(s)
- Francesco De Pretis
- VTT Technical Research Centre of Finland Ltd, 70210 Kuopio, Finland; Department of Communication and Economics, University of Modena and Reggio Emilia, 42121 Reggio Emilia, Italy.
| | - Mark van Gils
- Faculty of Medicine and Health Technology, Tampere University, 33720 Tampere, Finland
| | - Markus M Forsberg
- VTT Technical Research Centre of Finland Ltd, 70210 Kuopio, Finland; School of Pharmacy, University of Eastern Finland, 70211 Kuopio, Finland
| |
|
6
|
Zheng NS, Kerchberger VE, Borza VA, Eken HN, Smith JC, Wei WQ. An updated, computable MEDication-Indication resource for biomedical research. Sci Rep 2021; 11:18953. [PMID: 34556781 PMCID: PMC8460636 DOI: 10.1038/s41598-021-98579-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/26/2020] [Accepted: 09/02/2021] [Indexed: 11/09/2022] Open
Abstract
The MEDication-Indication (MEDI) knowledgebase has been utilized in research with electronic health records (EHRs) since its publication in 2013. To account for new drugs and terminology updates, we rebuilt MEDI to overhaul the knowledgebase for modern EHRs. Indications for prescribable medications were extracted using natural language processing and ontology relationships from six publicly available resources: RxNorm, Side Effect Resource 4.1, Mayo Clinic, WebMD, MedlinePlus, and Wikipedia. We compared the estimated precision and recall between the previous MEDI (MEDI-1) and the updated version (MEDI-2) with manual review. MEDI-2 contains 3031 medications and 186,064 indications. The MEDI-2 high precision subset (HPS) includes indications found within RxNorm or at least three other resources. MEDI-2 and MEDI-2 HPS contain 13% more medications and over triple the indications compared to MEDI-1 and MEDI-1 HPS, respectively. Manual review showed MEDI-2 achieves the same precision (0.60) with better recall (0.89 vs. 0.79) compared to MEDI-1. Likewise, MEDI-2 HPS had the same precision (0.92) and improved recall (0.65 vs. 0.55) compared to MEDI-1 HPS. The combination of MEDI-1 and MEDI-2 achieved a recall of 0.95. In updating MEDI, we present a more comprehensive medication-indication knowledgebase that can continue to facilitate applications and research with EHRs.
Affiliation(s)
- Neil S Zheng
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Yale School of Medicine, New Haven, CT, USA
| | - V Eric Kerchberger
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Division of Allergy, Pulmonary and Critical Care Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | | | - H Nur Eken
- Vanderbilt School of Medicine, Nashville, TN, USA
| | - Joshua C Smith
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA.
- Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Avenue Suite 1500, Nashville, TN, 37232-6602, USA.
| |
|
7
|
Henry S, Wijesinghe DS, Myers A, McInnes BT. Using Literature Based Discovery to Gain Insights Into the Metabolomic Processes of Cardiac Arrest. Front Res Metr Anal 2021; 6:644728. [PMID: 34250435 PMCID: PMC8267364 DOI: 10.3389/frma.2021.644728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/21/2020] [Accepted: 05/07/2021] [Indexed: 12/19/2022] Open
Abstract
In this paper, we describe how we applied LBD techniques to discover lecithin cholesterol acyltransferase (LCAT) as a druggable target for cardiac arrest. We fully describe our process, which includes the use of high-throughput metabolomic analysis to identify metabolites significantly related to cardiac arrest, and how we used LBD to gain insights into how these metabolites relate to cardiac arrest. These insights led to our proposal (for the first time) of LCAT as a druggable target, the effects of which are supported by in vivo studies that were brought forth by this work. Metabolites are the end product of many biochemical pathways within the human body. Observed changes in metabolite levels are indicative of changes in these pathways, and provide valuable insights toward the cause, progression, and treatment of diseases. Following cardiac arrest, we observed changes in metabolite levels pre- and post-resuscitation. We used LBD to help discover diseases implicitly linked via these metabolites of interest. Results of LBD indicated a strong link between fish eye disease and cardiac arrest. Since fish eye disease is characterized by an LCAT deficiency, this finding prompted an investigation into the effect of LCAT on cardiac arrest survival. In the investigation, we found that decreased LCAT activity may increase cardiac arrest survival rates by increasing ω-3 polyunsaturated fatty acid availability in circulation. We verified the effects of ω-3 polyunsaturated fatty acids on increasing survival rate following cardiac arrest in vivo using rat models.
Affiliation(s)
- Sam Henry
- Department of Physics, Computer Science and Engineering, Christopher Newport University, Newport News, VA, United States
| | - D. Shanaka Wijesinghe
- Department of Pharmacotherapy and Outcomes Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Aidan Myers
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| | - Bridget T. McInnes
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
| |
|
8
|
Zhao S, Su C, Lu Z, Wang F. Recent advances in biomedical literature mining. Brief Bioinform 2021; 22:bbaa057. [PMID: 32422651 PMCID: PMC8138828 DOI: 10.1093/bib/bbaa057] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 12/11/2019] [Revised: 03/22/2020] [Accepted: 03/25/2020] [Indexed: 01/26/2023] Open
Abstract
Recent years have witnessed a rapid increase in the number of scientific articles in the biomedical domain. This literature is mostly available and readily accessible in electronic format. The domain knowledge hidden in it is critical for biomedical research and applications, which puts biomedical literature mining (BLM) techniques in high demand. Numerous efforts have been made on this topic by both the biomedical informatics (BMI) and computer science (CS) communities. The BMI community focuses more on concrete application problems and thus prefers more interpretable and descriptive methods, while the CS community pursues superior performance and generalization ability and thus develops more sophisticated and universal models. The goal of this paper is to provide a review of the recent advances in BLM from both communities and inspire new research directions.
Affiliation(s)
- Sendong Zhao
- Department of Healthcare Policy and Research, Weill Medical College of Cornell University, New York, NY 10065, USA
| | - Chang Su
- Division of Health Informatics, Department of Healthcare Policy and Research at Weill Cornell Medicine at Cornell University, New York, NY, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI) at National Library of Medicine, National Institute of Health, Bethesda, MD, USA
| | - Fei Wang
- Department of Healthcare Policy and Research, Weill Medical College of Cornell University, New York, NY 10065, USA
| |
|
9
|
Schotland P, Racz R, Jackson DB, Soldatos TG, Levin R, Strauss DG, Burkhart K. Target Adverse Event Profiles for Predictive Safety in the Postmarket Setting. Clin Pharmacol Ther 2021; 109:1232-1243. [PMID: 33090463 PMCID: PMC8246740 DOI: 10.1002/cpt.2074] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 09/20/2019] [Accepted: 08/31/2020] [Indexed: 12/21/2022]
Abstract
We improved a previous pharmacological target adverse-event (TAE) profile model to predict adverse events (AEs) on US Food and Drug Administration (FDA) drug labels at the time of approval. The new model uses more drugs and features for learning as well as a new algorithm. Comparator drugs sharing similar target activities to a drug of interest were evaluated by aggregating AEs from the FDA Adverse Event Reporting System (FAERS), FDA drug labels, and medical literature. An ensemble machine learning model was used to evaluate FAERS case count, disproportionality scores, percent of comparator drug labels with a specific AE, and percent of comparator drugs with the reports of the event in the literature. Overall classifier performance was F1 of 0.71, area under the precision-recall curve of 0.78, and area under the receiver operating characteristic curve of 0.87. TAE analysis continues to show promise as a method to predict adverse events at the time of approval.
Affiliation(s)
- Peter Schotland
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Present address: Office of Oncologic Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
| | - Rebecca Racz
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
| | | | | | - Robert Levin
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
| | - David G. Strauss
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
| | - Keith Burkhart
- Division of Applied Regulatory Science, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
| |
|
10
|
Malec SA, Wei P, Bernstam EV, Boyce RD, Cohen T. Using computable knowledge mined from the literature to elucidate confounders for EHR-based pharmacovigilance. J Biomed Inform 2021; 117:103719. [PMID: 33716168 PMCID: PMC8559730 DOI: 10.1016/j.jbi.2021.103719] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/15/2020] [Revised: 12/31/2020] [Accepted: 01/04/2021] [Indexed: 10/21/2022]
Abstract
INTRODUCTION: Drug safety research asks causal questions but relies on observational data. Confounding bias threatens the reliability of studies using such data. The successful control of confounding requires knowledge of variables called confounders affecting both the exposure and outcome of interest. However, causal knowledge of dynamic biological systems is complex and challenging. Fortunately, computable knowledge mined from the literature may hold clues about confounders. In this paper, we tested the hypothesis that incorporating literature-derived confounders can improve causal inference from observational data.
METHODS: We introduce two methods (semantic vector-based and string-based confounder search) that query literature-derived information for confounder candidates to control, using SemMedDB, a database of computable knowledge mined from the biomedical literature. These methods search SemMedDB for confounders by applying semantic constraint search for indications treated by the drug (exposure) and that are also known to cause the adverse event (outcome). We then include the literature-derived confounder candidates in statistical and causal models derived from free-text clinical notes. For evaluation, we use a reference dataset widely used in drug safety containing labeled pairwise relationships between drugs and adverse events and attempt to rediscover these relationships from a corpus of 2.2 M NLP-processed free-text clinical notes. We employ standard adjustment and causal inference procedures to predict and estimate causal effects by informing the models with varying numbers of literature-derived confounders and instantiating the exposure, outcome, and confounder variables in the models with dichotomous EHR-derived data. Finally, we compare the results from applying these procedures with naive measures of association (χ² and reporting odds ratio) and with each other.
RESULTS AND CONCLUSIONS: We found semantic vector-based search to be superior to string-based search at reducing confounding bias. However, the effect of including more rather than fewer literature-derived confounders was inconclusive. We recommend using targeted learning estimation methods that can address treatment-confounder feedback, where confounders also behave as intermediate variables, and engaging subject-matter experts to adjudicate the handling of problematic covariates.
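For readers unfamiliar with the two naive association measures named in this abstract, the following is a minimal sketch of how a reporting odds ratio and a Pearson χ² statistic are commonly computed from a 2×2 drug/event contingency table. It illustrates the standard textbook formulas only, not the authors' pipeline; the function names and counts are hypothetical, and no continuity correction is applied.

```python
def reporting_odds_ratio(a, b, c, d):
    """ROR for a 2x2 table:
    a = drug & event,     b = drug & no event,
    c = no drug & event,  d = no drug & no event.
    """
    if b == 0 or c == 0:
        raise ValueError("ROR is undefined when b or c is zero")
    return (a * d) / (b * c)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, uncorrected)."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("chi-square is undefined when a margin is zero")
    return n * (a * d - b * c) ** 2 / denominator

# Hypothetical counts for illustration only.
ror = reporting_odds_ratio(20, 80, 10, 90)
chi2 = chi_square_2x2(20, 80, 10, 90)
```

Both measures ignore confounding entirely, which is why the abstract treats them as naive baselines against which the adjusted models are compared.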
Affiliation(s)
- Scott A Malec
- University of Pittsburgh School of Medicine, Department of Biomedical Informatics, Pittsburgh, PA, United States.
| | - Peng Wei
- The University of Texas MD Anderson Cancer Center, Department of Biostatistics, Houston, TX, United States
| | - Elmer V Bernstam
- University of Texas Health Science Center at Houston, School of Biomedical Informatics, Houston, TX, United States
| | - Richard D Boyce
- University of Pittsburgh School of Medicine, Department of Biomedical Informatics, Pittsburgh, PA, United States
| | - Trevor Cohen
- University of Washington, Department of Biomedical Informatics and Medical Education, Seattle, WA, United States
| |
|
11
|
|
12
|
Zheng NS, Feng Q, Kerchberger VE, Zhao J, Edwards TL, Cox NJ, Stein CM, Roden DM, Denny JC, Wei WQ. PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records. J Am Med Inform Assoc 2020; 27:1675-1687. [PMID: 32974638 PMCID: PMC7751140 DOI: 10.1093/jamia/ocaa104] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 01/22/2020] [Revised: 05/06/2020] [Accepted: 05/13/2020] [Indexed: 01/16/2023] Open
Abstract
OBJECTIVE: Developing algorithms to extract phenotypes from electronic health records (EHRs) can be challenging and time-consuming. We developed PheMap, a high-throughput phenotyping approach that leverages multiple independent, online resources to streamline the phenotyping process within EHRs.
MATERIALS AND METHODS: PheMap is a knowledge base of medical concepts with quantified relationships to phenotypes that have been extracted by natural language processing from publicly available resources. PheMap searches EHRs for each phenotype's quantified concepts and uses them to calculate an individual's probability of having this phenotype. We compared PheMap to clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network for type 2 diabetes mellitus (T2DM), dementia, and hypothyroidism using 84 821 individuals from Vanderbilt University Medical Center's BioVU DNA Biobank. We implemented PheMap-based phenotypes for genome-wide association studies (GWAS) for T2DM, dementia, and hypothyroidism, and phenome-wide association studies (PheWAS) for variants in FTO, HLA-DRB1, and TCF7L2.
RESULTS: In this initial iteration, the PheMap knowledge base contains quantified concepts for 841 disease phenotypes. For T2DM, dementia, and hypothyroidism, the accuracy of the PheMap phenotypes was >97% using a 50% threshold and eMERGE case-control status as a reference standard. In the GWAS analyses, PheMap-derived phenotype probabilities replicated 43 of 51 previously reported disease-associated variants for the 3 phenotypes. For 9 of the 11 top associations, PheMap provided an equivalent or more significant P value than eMERGE-based phenotypes. The PheMap-based PheWAS showed comparable or better performance to a traditional phecode-based PheWAS. PheMap is publicly available online.
CONCLUSIONS: PheMap significantly streamlines the process of extracting research-quality phenotype information from EHRs, with comparable or better performance to current phenotyping approaches.
Affiliation(s)
- Neil S Zheng
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- QiPing Feng
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- V Eric Kerchberger
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Juan Zhao
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Todd L Edwards
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Nancy J Cox
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- C Michael Stein
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Pharmacology, Vanderbilt University, Nashville, Tennessee, USA
- Dan M Roden
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Pharmacology, Vanderbilt University, Nashville, Tennessee, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
13
|
Crichton G, Baker S, Guo Y, Korhonen A. Neural networks for open and closed Literature-based Discovery. PLoS One 2020; 15:e0232891. [PMID: 32413059 PMCID: PMC7228051 DOI: 10.1371/journal.pone.0232891] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/02/2019] [Accepted: 04/23/2020] [Indexed: 12/18/2022] Open
Abstract
Literature-based Discovery (LBD) aims to discover new knowledge automatically from large collections of literature. Scientific literature is growing at an exponential rate, making it difficult for researchers to stay current in their discipline and easy to miss knowledge necessary to advance their research. LBD can facilitate hypothesis testing and generation and thus accelerate scientific progress. Neural networks have demonstrated improved performance on LBD-related tasks but are yet to be applied to it. We propose four graph-based, neural network methods to perform open and closed LBD. We compared our methods with those used by the state-of-the-art LION LBD system on the same evaluations to replicate recently published findings in cancer biology. We also applied them to a time-sliced dataset of human-curated peer-reviewed biological interactions. These evaluations and the metrics they employ represent performance on real-world knowledge advances and are thus robust indicators of approach efficacy. In the first experiments, our best methods performed 2-4 times better than the baselines in closed discovery and 2-3 times better in open discovery. In the second, our best methods performed almost 2 times better than the baselines in open discovery. These results are strong indications that neural LBD is potentially a very effective approach for generating new scientific discoveries from existing literature. The code for our models and other information can be found at: https://github.com/cambridgeltl/nn_for_LBD.
Affiliation(s)
- Gamal Crichton
  - Language Technology Laboratory, TAL, University of Cambridge, Cambridge, United Kingdom
- Simon Baker
  - Language Technology Laboratory, TAL, University of Cambridge, Cambridge, United Kingdom
- Yufan Guo
  - Language Technology Laboratory, TAL, University of Cambridge, Cambridge, United Kingdom
- Anna Korhonen
  - Language Technology Laboratory, TAL, University of Cambridge, Cambridge, United Kingdom
|
14
|
Kilicoglu H, Rosemblat G, Fiszman M, Shin D. Broad-coverage biomedical relation extraction with SemRep. BMC Bioinformatics 2020; 21:188. [PMID: 32410573 PMCID: PMC7222583 DOI: 10.1186/s12859-020-3517-7] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 02/18/2020] [Accepted: 04/29/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships. RESULTS A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F1 score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F1 score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F1 score. The recall and the F1 score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level. CONCLUSIONS SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes. 
In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.
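As a quick consistency check on the figures quoted in this abstract, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it for the three settings where both values are reported; the function name and the dictionary keys are illustrative labels chosen here, not identifiers from SemRep.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract above.
reported = {
    "strict": (0.55, 0.34),
    "relaxed": (0.69, 0.42),
    "cdr": (0.90, 0.24),
}

# Round to two decimals, matching the precision used in the abstract.
f1_by_setting = {
    name: round(f1_score(precision, recall), 2)
    for name, (precision, recall) in reported.items()
}
print(f1_by_setting)
```

The recomputed values agree with the reported 0.42, 0.52, and 0.38. The sentence-bound CDR setting is left out because the abstract does not restate its precision.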
Affiliation(s)
- Halil Kilicoglu
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, 20894 MD USA
  - University of Illinois at Urbana-Champaign, School of Information Sciences, 501 E Daniel Street, Champaign, 61820 IL USA
- Graciela Rosemblat
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, 20894 MD USA
- Dongwook Shin
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, 20894 MD USA
|
15
|
Portanova J, Murray N, Mower J, Subramanian D, Cohen T. aer2vec: Distributed Representations of Adverse Event Reporting System Data as a Means to Identify Drug/Side-Effect Associations. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2020; 2019:717-726. [PMID: 32308867 PMCID: PMC7153155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Adverse event report (AER) data are a key source of signal for post marketing drug surveillance. The standard methodology to analyze AER data applies disproportionality metrics, which estimate the strength of drug/side-effect associations from discrete counts of their occurrence at report level. However, in other domains, improvements in predictive modeling accuracy have been obtained through representation learning, where discrete features are replaced by distributed representations learned from unlabeled data. This paper describes aer2vec, a novel representational approach for AER data in which concept embeddings emerge from neural networks trained to predict drug/side-effect co-occurrence. Trained models are evaluated for their utility in identifying drug/side-effect relationships, with improvements over disproportionality metrics in most cases. In addition, we evaluate the utility of an otherwise-untapped resource in the Food and Drug Administration (FDA) AER system - reporter designations of suspected causality - and find that incorporating this information enhances performance of all models evaluated.
|
16
|
Mower J, Cohen T, Subramanian D. Complementing Observational Signals with Literature-Derived Distributed Representations for Post-Marketing Drug Surveillance. Drug Saf 2019; 43:67-77. [PMID: 31646442 DOI: 10.1007/s40264-019-00872-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
Abstract
INTRODUCTION As a result of the well documented limitations of data collected by spontaneous reporting systems (SRS), such as bias and under-reporting, a number of authors have evaluated the utility of other data sources for the purpose of pharmacovigilance, including the biomedical literature. Previous work has demonstrated the utility of literature-derived distributed representations (concept embeddings) with machine learning for the purpose of drug side-effect prediction. In terms of data sources, these methods are complementary, observing drug safety from two different perspectives (knowledge extracted from the literature and statistics from SRS data). However, the combined utility of these pharmacovigilance methods has yet to be evaluated. OBJECTIVE This research investigates the utility of directly or indirectly combining an observational signal from SRS with literature-derived distributed representations into a single feature vector or in an ensemble approach for downstream machine learning (logistic regression). METHODS Leveraging a recently developed representation scheme, concept embeddings were generated from relational connections extracted from the literature and composed to represent drug and associated adverse reactions, as defined by two reference standards of positive (likely causal) and negative (no causal evidence) pairs. Embeddings were presented with and without common measures of observational signal from SRS sources to logistic regressors, and performance was evaluated with the receiver operating characteristic (ROC) area under the curve (AUC) metric. RESULTS ROC AUC performance with these composite models improves up to ≈ 20% over SRS-based disproportionality metrics alone and exceeds the best prior results reported in the literature when models leverage both sources of information. 
CONCLUSIONS Results from this study support the hypothesis that knowledge extracted from the literature can enhance the performance of SRS-based methods (and vice versa). Across reference sets, using literature and SRS information together performed better than using either source alone, providing strong support for the complementary nature of these approaches to post-marketing drug surveillance.
Affiliation(s)
- Justin Mower
  - Department of Computer Science, Rice University, Houston, TX, 77018, USA
- Trevor Cohen
  - University of Washington, Biomedical Informatics and Medical Education, Seattle, WA, 98195, USA
- Devika Subramanian
  - Department of Computer Science, Rice University, Houston, TX, 77018, USA
|
17
|
Natsiavas P, Malousi A, Bousquet C, Jaulent MC, Koutkias V. Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches. Front Pharmacol 2019; 10:415. [PMID: 31156424 PMCID: PMC6533857 DOI: 10.3389/fphar.2019.00415] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/27/2018] [Accepted: 04/02/2019] [Indexed: 12/12/2022] Open
Abstract
Drug Safety (DS) is a domain with significant public health and social impact. Knowledge Engineering (KE) is the Computer Science discipline elaborating on methods and tools for developing “knowledge-intensive” systems, depending on a conceptual “knowledge” schema and some kind of “reasoning” process. The present systematic and mapping review aims to investigate KE-based approaches employed for DS and highlight the introduced added value as well as trends and possible gaps in the domain. Journal articles published between 2006 and 2017 were retrieved from PubMed/MEDLINE and Web of Science® (873 in total) and filtered based on a comprehensive set of inclusion/exclusion criteria. The 80 finally selected articles were reviewed on full-text, while the mapping process relied on a set of concrete criteria (concerning specific KE and DS core activities, special DS topics, employed data sources, reference ontologies/terminologies, and computational methods, etc.). The analysis results are publicly available as online interactive analytics graphs. The review clearly depicted increased use of KE approaches for DS. The collected data illustrate the use of KE for various DS aspects, such as Adverse Drug Event (ADE) information collection, detection, and assessment. Moreover, the quantified analysis of using KE for the respective DS core activities highlighted room for intensifying research on KE for ADE monitoring, prevention and reporting. Finally, the assessed use of the various data sources for DS special topics demonstrated extensive use of dominant data sources for DS surveillance, i.e., Spontaneous Reporting Systems, but also increasing interest in the use of emerging data sources, e.g., observational healthcare databases, biochemical/genetic databases, and social media. 
Various exemplar applications were identified with promising results, e.g., improvement in Adverse Drug Reaction (ADR) prediction, detection of drug interactions, and novel ADE profiles related with specific mechanisms of action, etc. Nevertheless, since the reviewed studies mostly concerned proof-of-concept implementations, more intense research is required to increase the maturity level that is necessary for KE approaches to reach routine DS practice. In conclusion, we argue that efficiently addressing DS data analytics and management challenges requires the introduction of high-throughput KE-based methods for effective knowledge discovery and management, resulting ultimately, in the establishment of a continuous learning DS system.
Affiliation(s)
- Pantelis Natsiavas
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
  - Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Andigoni Malousi
  - Laboratory of Biological Chemistry, Department of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Cédric Bousquet
  - Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
  - Public Health and Medical Information Unit, University Hospital of Saint-Etienne, Saint-Étienne, France
- Marie-Christine Jaulent
  - Sorbonne Université, INSERM, Univ Paris 13, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, Paris, France
- Vassilis Koutkias
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
|
18
|
Fathiamini S, Johnson AM, Zeng J, Holla V, Sanchez NS, Meric-Bernstam F, Bernstam EV, Cohen T. Rapamycin - mTOR + BRAF = ? Using relational similarity to find therapeutically relevant drug-gene relationships in unstructured text. J Biomed Inform 2019; 90:103094. [PMID: 30615938 PMCID: PMC6386529 DOI: 10.1016/j.jbi.2019.103094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2018] [Revised: 11/30/2018] [Accepted: 12/27/2018] [Indexed: 11/17/2022]
Affiliation(s)
- Safa Fathiamini
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, United States
- Amber M Johnson
  - Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Jia Zeng
  - Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Vijaykumar Holla
  - Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Nora S Sanchez
  - Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Funda Meric-Bernstam
  - Sheikh Khalifa Al Nahyan Ben Zayed Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
  - Department of Investigational Cancer Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
  - Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Elmer V Bernstam
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, United States
  - Division of General Internal Medicine, Department of Internal Medicine, The University of Texas Health Science Center at Houston, TX, United States
- Trevor Cohen
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
|
19
|
Thilakaratne M, Falkner K, Atapattu T. A systematic review on literature-based discovery workflow. PeerJ Comput Sci 2019; 5:e235. [PMID: 33816888 PMCID: PMC7924697 DOI: 10.7717/peerj-cs.235] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/08/2019] [Accepted: 10/17/2019] [Indexed: 05/02/2023]
Abstract
As scientific publication rates increase, knowledge acquisition and the research development process have become more complex and time-consuming. Literature-Based Discovery (LBD), supporting automated knowledge discovery, helps facilitate this process by eliciting novel knowledge by analysing existing scientific literature. This systematic review provides a comprehensive overview of the LBD workflow by answering nine research questions related to the major components of the LBD workflow (i.e., input, process, output, and evaluation). With regards to the input component, we discuss the data types and data sources used in the literature. The process component presents filtering techniques, ranking/thresholding techniques, domains, generalisability levels, and resources. Subsequently, the output component focuses on the visualisation techniques used in LBD discipline. As for the evaluation component, we outline the evaluation techniques, their generalisability, and the quantitative measures used to validate results. To conclude, we summarise the findings of the review for each component by highlighting the possible future research directions.
Affiliation(s)
- Menasha Thilakaratne
  - Faculty of Engineering, Computer and Mathematical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Katrina Falkner
  - Faculty of Engineering, Computer and Mathematical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Thushari Atapattu
  - Faculty of Engineering, Computer and Mathematical Sciences, The University of Adelaide, Adelaide, South Australia, Australia
|
20
|
Chu J, Dong W, He K, Duan H, Huang Z. Using neural attention networks to detect adverse medical events from electronic health records. J Biomed Inform 2018; 87:118-130. [DOI: 10.1016/j.jbi.2018.10.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 04/25/2018] [Revised: 10/10/2018] [Accepted: 10/12/2018] [Indexed: 01/24/2023]
|
21
|
La MK, Sedykh A, Fourches D, Muratov E, Tropsha A. Predicting Adverse Drug Effects from Literature- and Database-Mined Assertions. Drug Saf 2018; 41:1059-1072. [PMID: 29876834 PMCID: PMC6212308 DOI: 10.1007/s40264-018-0688-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/11/2022]
Abstract
INTRODUCTION Given that adverse drug effects (ADEs) have led to post-market patient harm and subsequent drug withdrawal, failure of candidate agents in the drug development process, and other negative outcomes, it is essential to attempt to forecast ADEs and other relevant drug-target-effect relationships as early as possible. Current pharmacologic data sources, providing multiple complementary perspectives on the drug-target-effect paradigm, can be integrated to facilitate the inference of relationships between these entities. OBJECTIVE This study aims to identify both existing and unknown relationships between chemicals (C), protein targets (T), and ADEs (E) based on evidence in the literature. MATERIALS AND METHODS Cheminformatics and data mining approaches were employed to integrate and analyze publicly available clinical pharmacology data and literature assertions interrelating drugs, targets, and ADEs. Based on these assertions, a C-T-E relationship knowledge base was developed. Known pairwise relationships between chemicals, targets, and ADEs were collected from several pharmacological and biomedical data sources. These relationships were curated and integrated according to Swanson's paradigm to form C-T-E triangles. Missing C-E edges were then inferred as C-E relationships. RESULTS Unreported associations between drugs, targets, and ADEs were inferred, and inferences were prioritized as testable hypotheses. Several C-E inferences, including testosterone → myocardial infarction, were identified using inferences based on the literature sources published prior to confirmatory case reports. Timestamping approaches confirmed the predictive ability of this inference strategy on a larger scale. CONCLUSIONS The presented workflow, based on free-access databases and an association-based inference scheme, provided novel C-E relationships that have been validated post hoc in case reports. 
With refinement of prioritization schemes for the generated C-E inferences, this workflow may provide an effective computational method for the early detection of potential drug candidate ADEs that can be followed by targeted experimental investigations.
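The triangle-closing step described in this abstract (known chemical-target and target-effect pairs, with a missing chemical-effect edge inferred) can be sketched as below. This is a minimal illustration of Swanson-style inference under assumptions made here, not the authors' workflow: the function name, the ranking by number of shared targets, and all entity names in the toy data are hypothetical placeholders.

```python
from collections import defaultdict


def infer_chemical_effect_pairs(chem_target, target_effect, known_chem_effect):
    """Propose chemical-effect pairs that share at least one target and are not already known.

    Returns (pair, supporting_targets) tuples, most supporting targets first.
    """
    targets_by_chem = defaultdict(set)
    for chem, target in chem_target:
        targets_by_chem[chem].add(target)

    effects_by_target = defaultdict(set)
    for target, effect in target_effect:
        effects_by_target[target].add(effect)

    known = set(known_chem_effect)
    support = defaultdict(set)
    for chem, targets in targets_by_chem.items():
        for target in targets:
            for effect in effects_by_target.get(target, ()):
                # Only unreported chemical-effect edges count as hypotheses.
                if (chem, effect) not in known:
                    support[(chem, effect)].add(target)

    # Assumed prioritization: more bridging targets rank higher, ties broken by name.
    return sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))


# Toy data with placeholder names (not real pharmacology).
chem_target = [("drug_a", "target_1"), ("drug_a", "target_2"), ("drug_b", "target_2")]
target_effect = [("target_1", "effect_x"), ("target_2", "effect_x"), ("target_2", "effect_y")]
known_chem_effect = [("drug_a", "effect_y")]

hypotheses = infer_chemical_effect_pairs(chem_target, target_effect, known_chem_effect)
for pair, targets in hypotheses:
    print(pair, sorted(targets))
```

Counting bridging targets is only one possible prioritization; the abstract notes that the prioritization scheme itself needs refinement, so any ranking here should be read as a stand-in.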
Affiliation(s)
- Mary K La
  - Division of Practice Advancement and Clinical Education, UNC Eshelman School of Pharmacy, 301 Pharmacy Lane, Chapel Hill, NC, 27599, USA
- Alexander Sedykh
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, 301 Pharmacy Lane, Chapel Hill, NC, 27599, USA
  - Sciome LLC, 2 Davis Drive, Research Triangle Park, NC, 27709, USA
- Denis Fourches
  - Department of Chemistry, North Carolina State University, Raleigh, NC, 27695, USA
- Eugene Muratov
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, 301 Pharmacy Lane, Chapel Hill, NC, 27599, USA
- Alexander Tropsha
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, 301 Pharmacy Lane, Chapel Hill, NC, 27599, USA
|
22
|
Abatemarco D, Perera S, Bao SH, Desai S, Assuncao B, Tetarenko N, Danysz K, Mockute R, Widdowson M, Fornarotto N, Beauchamp S, Cicirello S, Mingle E. Training Augmented Intelligent Capabilities for Pharmacovigilance: Applying Deep-learning Approaches to Individual Case Safety Report Processing. Pharmaceut Med 2018; 32:391-401. [PMID: 30546259 PMCID: PMC6267537 DOI: 10.1007/s40290-018-0251-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/11/2022]
Abstract
Introduction Regulations are increasing the scope of activities that fall under the remit of drug safety. Currently, individual case safety report (ICSR) collection and collation is done manually, requiring pharmacovigilance professionals to perform many transactional activities before data are available for assessment and aggregated analyses. For a biopharmaceutical company to meet its responsibilities to patients and regulatory bodies regarding the safe use and distribution of its products, improved business processes must be implemented to drive the industry forward in the best interest of patients globally. Augmented intelligent capabilities have already demonstrated success in capturing adverse events from diverse data sources. It has potential to provide a scalable solution for handling the ever-increasing ICSR volumes experienced within the industry by supporting pharmacovigilance professionals’ decision-making. Objective The aim of this study was to train and evaluate a consortium of cognitive services to identify key characteristics of spontaneous ICSRs satisfying an acceptable level of accuracy determined by considering business requirements and effective use in a real-world setting. The results of this study will serve as supporting evidence for or against implementing augmented intelligence in case processing to increase operational efficiency and data quality consistency. Methods A consortium of ten cognitive services to augment aspects of ICSR processing were identified and trained through deep-learning approaches. The input data for model training were 20,000 ICSRs received by Celgene drug safety over a 2-year period. The data were manually made machine-readable through the process of transcription, which converts images into text. The machine-readable documents were manually annotated for pharmacovigilance data elements to facilitate the training and testing of the cognitive services. 
Once trained by cognitive developers, the cognitive services’ output was reviewed by pharmacovigilance subject-matter experts against the accepted ground-truth for correctness and completeness. To be considered adequately trained and functional, each cognitive service was required to reach a threshold of F1 or accuracy score ≥ 75%. Results All ten cognitive services under development have reached an evaluative score ≥ 75% for spontaneous ICSRs. Conclusion All cognitive services under development have achieved the minimum evaluative threshold to be considered adequately trained, demonstrating how machine-learning and natural language processing techniques together provide accurate outputs that may augment pharmacovigilance professionals’ processing of spontaneous ICSRs quickly and accurately. The intention of augmented intelligence is not to replace the pharmacovigilance professional, but rather support them in their consistent decision-making so that they may better handle the overwhelming amount of data otherwise manually curated and monitored for ongoing drug surveillance requirements. Through this supported decision-making, pharmacovigilance professionals may have more time to apply their knowledge in assessing the case rather than spending it performing transactional tasks to simply capture the pertinent data within a safety database. By capturing data consistently and efficiently, we begin to build a corpus of data upon which analyses may be conducted and insights gleaned. Cognitive services may be key to an organization’s transformation to more proactive decision-making needed to meet regulatory requirements and enhance patient safety.
Affiliation(s)
- Sujan Perera
  - IBM Watson Health, 75 Binney Street, Cambridge, MA 02142 USA
- Sheng Hua Bao
  - IBM Watson Health, 75 Binney Street, Cambridge, MA 02142 USA
- Sameen Desai
  - Celgene Corporation, 86 Morris Avenue, Summit, NJ 07901 USA
- Bruno Assuncao
  - Celgene Corporation, 86 Morris Avenue, Summit, NJ 07901 USA
- Niki Tetarenko
  - Celgene Corporation, 86 Morris Avenue, Summit, NJ 07901 USA
- Karolina Danysz
  - Celgene Corporation, 86 Morris Avenue, Summit, NJ 07901 USA
- Ruta Mockute
  - Celgene Corporation, 86 Morris Avenue, Summit, NJ 07901 USA
- Mark Widdowson
  - Celgene Corporation, 86 Morris Avenue, Summit, NJ 07901 USA
- Edward Mingle
  - Celgene Corporation, 86 Morris Avenue, Summit, NJ 07901 USA
|
23
|
Wolfe L, Chisolm MS, Bohsali F. Clinically Excellent Use of the Electronic Health Record: Review. JMIR Hum Factors 2018; 5:e10426. [PMID: 30291099 PMCID: PMC6231887 DOI: 10.2196/10426] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/27/2018] [Revised: 06/27/2018] [Accepted: 07/17/2018] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND The transition to the electronic health record (EHR) has brought forth a rapid cultural shift in the world of medicine, presenting both new challenges as well as opportunities for improving health care. As clinicians work to adapt to the changes imposed by the EHR, identification of best practices around the clinically excellent use of the EHR is needed. OBJECTIVE Using the domains of clinical excellence previously defined by the Johns Hopkins Miller Coulson Academy of Clinical Excellence, this review aims to identify best practices around the clinically excellent use of the EHR. METHODS The authors searched the PubMed database, using keywords related to clinical excellence domains and the EHR, to capture the English-language, peer-reviewed literature published between January 1, 2000, and August 2, 2016. One author independently reviewed each article and extracted relevant data. RESULTS The search identified 606 titles, with the majority (393/606, 64.9%) in the domain of communication and interpersonal skills. Twenty-eight of the 606 (4.6%) titles were excluded from full-text review, primarily due to lack of availability of the full-text article. The remaining 578 full-text articles reviewed were related to clinical excellence generally (3/578, 0.5%) or the specific domains of communication and interpersonal skills (380/578, 65.7%), diagnostic acumen (31/578, 5.4%), skillful negotiation of the health care system (4/578, 0.7%), scholarly approach to clinical practice (41/578, 7.1%), professionalism and humanism (2/578, 0.4%), knowledge (97/578, 16.8%), and passion for clinical medicine (20/578, 3.5%). CONCLUSIONS Results suggest that as familiarity and expertise are developed, clinicians are leveraging the EHR to provide clinically excellent care. Best practices identified included deliberate physical configuration of the clinical space to involve sharing the screen with patients and limiting EHR use during difficult and emotional topics. 
Promising horizons for the EHR include the ability to augment participation in pragmatic trials, identify adverse drug effects, correlate genomic data to clinical outcomes, and follow data-driven guidelines. Clinician and patient satisfaction with the EHR has generally improved with time, and continued clinician and patient input will hopefully lead to a system that satisfies all.
Affiliation(s)
- Leah Wolfe
- Division of General Internal Medicine, Johns Hopkins Bayview Medical Center, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Margaret Smith Chisolm
- Department of Psychiatry and Behavioral Sciences, School of Medicine, Johns Hopkins University, Baltimore, MD, United States
- Fuad Bohsali
- Department of Medicine, School of Medicine, Duke University, Durham, NC, United States
|
24
|
Trifirò G, Sultana J, Bate A. From Big Data to Smart Data for Pharmacovigilance: The Role of Healthcare Databases and Other Emerging Sources. Drug Saf 2018; 41:143-149. [PMID: 28840504 DOI: 10.1007/s40264-017-0592-4] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Indexed: 12/15/2022]
Abstract
In the last decade 'big data' has become a buzzword used in several industrial sectors, including but not limited to telephony, finance and healthcare. Despite its popularity, it is not always clear what big data refers to exactly. Big data has become a very popular topic in healthcare, where the term primarily refers to the vast and growing volumes of computerized medical information available in the form of electronic health records, administrative or health claims data, disease and drug monitoring registries and so on. This kind of data is generally collected routinely during administrative processes and clinical practice by different healthcare professionals: from doctors recording their patients' medical history, drug prescriptions or medical claims to pharmacists registering dispensed prescriptions. For a long time, this data accumulated without its value being fully recognized and leveraged. Today big data has an important place in healthcare, including in pharmacovigilance. The expanding role of big data in pharmacovigilance includes signal detection, substantiation and validation of drug or vaccine safety signals, and increasingly new sources of information such as social media are also being considered. The aim of the present paper is to discuss the uses of big data for drug safety post-marketing assessment.
Affiliation(s)
- Gianluca Trifirò
- Department of Biomedical and Dental Sciences and Morpho-Functional Imaging, University of Messina, Messina, Italy
- Department of Medical Informatics, Erasmus Medical Centre, Rotterdam, The Netherlands
- Janet Sultana
- Department of Biomedical and Dental Sciences and Morpho-Functional Imaging, University of Messina, Messina, Italy
- Department of Medical Informatics, Erasmus Medical Centre, Rotterdam, The Netherlands
- Andrew Bate
- Epidemiology Group Lead, Analytics, Worldwide Safety, Pfizer, Tadworth, UK
- Department of Clinical Pharmacology, New York University (NYU), New York, USA
|
25
|
Mower J, Subramanian D, Cohen T. Learning predictive models of drug side-effect relationships from distributed representations of literature-derived semantic predications. J Am Med Inform Assoc 2018; 25:1339-1350. [PMID: 30010902 PMCID: PMC6454491 DOI: 10.1093/jamia/ocy077] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/24/2018] [Revised: 04/23/2018] [Accepted: 06/05/2018] [Indexed: 02/01/2023] Open
Abstract
Objective The aim of this work is to leverage relational information extracted from biomedical literature using a novel synthesis of unsupervised pretraining, representational composition, and supervised machine learning for drug safety monitoring. Methods Using ≈80 million concept-relationship-concept triples extracted from the literature using the SemRep Natural Language Processing system, distributed vector representations (embeddings) were generated for concepts as functions of their relationships utilizing two unsupervised representational approaches. Embeddings for drugs and side effects of interest from two widely used reference standards were then composed to generate embeddings of drug/side-effect pairs, which were used as input for supervised machine learning. This methodology was developed and evaluated using cross-validation strategies and compared to contemporary approaches. To qualitatively assess generalization, models trained on the Observational Medical Outcomes Partnership (OMOP) drug/side-effect reference set were evaluated against a list of ≈1100 drugs from an online database. Results The employed method improved performance over previous approaches. Cross-validation results advance the state of the art (AUC 0.96; F1 0.90 and AUC 0.95; F1 0.84 across the two sets), outperforming methods utilizing literature and/or spontaneous reporting system data. Examination of predictions for unseen drug/side-effect pairs indicates the ability of these methods to generalize, with over tenfold label support enrichment in the top 100 predictions versus the bottom 100 predictions. Discussion and Conclusion Our methods can assist the pharmacovigilance process using information from the biomedical literature. Unsupervised pretraining generates a rich relationship-based representational foundation for machine learning techniques to classify drugs in the context of a putative side effect, given known examples.
Affiliation(s)
- Justin Mower
- Baylor College of Medicine, Quantitative and Computational Biosciences, Houston, Texas, USA
- Trevor Cohen
- School of Biomedical Informatics, University of Texas Health Science Center Houston, Texas, USA
|
26
|
Usui M, Aramaki E, Iwao T, Wakamiya S, Sakamoto T, Mochizuki M. Extraction and Standardization of Patient Complaints from Electronic Medication Histories for Pharmacovigilance: Natural Language Processing Analysis in Japanese. JMIR Med Inform 2018; 6:e11021. [PMID: 30262450 PMCID: PMC6231790 DOI: 10.2196/11021] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/12/2018] [Revised: 08/07/2018] [Accepted: 08/25/2018] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Despite the growing number of studies using natural language processing for pharmacovigilance, there are few reports on manipulating free text patient information in Japanese. OBJECTIVE This study aimed to establish a method of extracting and standardizing patient complaints from electronic medication histories accumulated in a Japanese community pharmacy for the detection of possible adverse drug event (ADE) signals. METHODS Subjective information included in electronic medication history data provided by a Japanese pharmacy operating in Hiroshima, Japan from September 1, 2015 to August 31, 2016, was used as patients' complaints. We formulated search rules based on morphological analysis and daily (nonmedical) speech and developed a system that automatically executes the search rules and annotates free text data with International Classification of Diseases, Tenth Revision (ICD-10) codes. The performance of the system was evaluated through comparisons with data manually annotated by health care workers for a data set of 5000 complaints. RESULTS Of 5000 complaints, the system annotated 2236 complaints with ICD-10 codes, whereas health care workers annotated 2348 statements. There was a match in the annotation of 1480 complaints between the system and manual work. System performance was .66 regarding precision, .63 in recall, and .65 for the F-measure. CONCLUSIONS Our results suggest that the system may be helpful in extracting and standardizing patients' speech related to symptoms from massive amounts of free text data, replacing manual work. After improving the extraction accuracy, we expect to utilize this system to detect signals of possible ADEs from patients' complaints in the future.
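The reported scores can be checked from the counts given in the abstract. The sketch below assumes the usual definitions, which the abstract does not spell out: precision is the number of matched annotations divided by the number of system annotations, recall is matched divided by manual annotations, and the F-measure is their harmonic mean. The function and parameter names are illustrative only.

```python
def precision_recall_f1(system_count, manual_count, matched_count):
    """Compute precision, recall and F-measure from annotation counts.

    Assumed definitions (not stated in the abstract):
      precision = matched / annotated by the system
      recall    = matched / annotated manually
      F-measure = harmonic mean of precision and recall
    """
    precision = matched_count / system_count
    recall = matched_count / manual_count
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported in the abstract: 2236 system annotations,
# 2348 manual annotations, 1480 matches.
p, r, f = precision_recall_f1(system_count=2236, manual_count=2348, matched_count=1480)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
# prints "precision=0.66 recall=0.63 F=0.65"
```

Under these assumed definitions the counts reproduce the published figures of .66, .63, and .65.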
Affiliation(s)
- Misa Usui
- Division of Hospital Pharmacy Science, Graduate School of Pharmaceutical Sciences, Keio University, Tokyo, Japan
- Eiji Aramaki
- Social Computing Lab, Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
- Tomohide Iwao
- Social Computing Lab, Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
- Shoko Wakamiya
- Social Computing Lab, Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan
- Mayumi Mochizuki
- Division of Hospital Pharmacy Science, Faculty of Pharmacy, Keio University, Tokyo, Japan; Department of Pharmacy, Keio University Hospital, Tokyo, Japan
|
27
|
Wong A, Plasek JM, Montecalvo SP, Zhou L. Natural Language Processing and Its Implications for the Future of Medication Safety: A Narrative Review of Recent Advances and Challenges. Pharmacotherapy 2018; 38:822-841. [PMID: 29884988 DOI: 10.1002/phar.2151] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Indexed: 01/12/2023]
Abstract
The safety of medication use has been a priority in the United States since the late 1930s. Recently, it has gained prominence due to the increasing amount of data suggesting that a large amount of patient harm is preventable and can be mitigated with effective risk strategies that have not been sufficiently adopted. Adverse events from medications are part of clinical practice, but the ability to identify a patient's risk and to minimize that risk must be a priority. The ability to identify adverse events has been a challenge due to limitations of available data sources, which are often free text. The use of natural language processing (NLP) may help to address these limitations. NLP is the artificial intelligence domain of computer science that uses computers to manipulate unstructured data (i.e., narrative text or speech data) in the context of a specific task. In this narrative review, we illustrate the fundamentals of NLP and discuss NLP's application to medication safety in four data sources: electronic health records, Internet-based data, published literature, and reporting systems. Given the magnitude of available data from these sources, a growing area is the use of computer algorithms to help automatically detect associations between medications and adverse effects. The main benefit of NLP is in the time savings associated with automation of various medication safety tasks such as the medication reconciliation process facilitated by computers, as well as the potential for near-real-time identification of adverse events for postmarketing surveillance such as those posted on social media that would otherwise go unanalyzed. NLP is limited by a lack of data sharing between health care organizations due to insufficient interoperability capabilities, inhibiting large-scale adverse event monitoring across populations. 
We anticipate that future work in this area will focus on the integration of data sources from different domains to improve the ability to identify potential adverse events more quickly and to improve clinical decision support with regard to a patient's estimated risk for specific adverse events at the time of medication prescription or review.
Affiliation(s)
- Adrian Wong
- Department of Pharmacy and Therapeutics, University of Pittsburgh, Pittsburgh, Pennsylvania; Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts
- Joseph M Plasek
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts; Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, Utah
- Li Zhou
- Harvard Medical School, Boston, Massachusetts
|
28
|
Hwang Y, Oh M, Jang G, Lee T, Park C, Ahn J, Yoon Y. Identifying the common genetic networks of ADR (adverse drug reaction) clusters and developing an ADR classification model. Molecular BioSystems 2018; 13:1788-1796. [PMID: 28702565 DOI: 10.1039/c7mb00059f] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/08/2023]
Abstract
Adverse drug reactions (ADRs) are one of the major concerns threatening public health and have resulted in failures in drug development. Thus, predicting ADRs and discovering the mechanisms underlying ADRs have become important tasks in pharmacovigilance. Identification of potential ADRs by computational approaches in the early stages would be advantageous in drug development. Here we propose a computational method that elucidates the action mechanisms of ADRs and predicts potential ADRs by utilizing ADR genes, drug features, and protein-protein interaction (PPI) networks. If some ADRs share similar features, there is a high possibility that they may appear together in a drug and share analogous mechanisms. Proceeding from this assumption, we clustered ADRs according to interactions of ADR genes in the PPI networks and the frequency of co-occurrence of ADRs in drugs. ADR clusters were verified based on a side effect database and literature data regarding whether ADRs have relevance to other ADRs in the same cluster. Gene networks shared by ADRs in each cluster were constructed by cumulating the shortest paths between drug target genes and ADR genes in the PPI network. We developed a classification model to predict potential ADRs using these gene networks shared by ADRs and calculated cross-validation AUC (area under the curve) values for each ADR cluster. In addition, in order to demonstrate correlations between gene networks shared by ADRs and ADRs in a cluster, we applied the Wilcoxon rank sum statistical test to the literature data and results of a Google query search. We attained statistically meaningful p-values (<0.05) for every ADR cluster. The results suggest that our approach provides insights into discovering the action mechanisms of ADRs and is a novel attempt to predict ADRs in a biological aspect.
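The step of cumulating shortest paths between drug target genes and ADR genes can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes an unweighted, undirected PPI network stored as adjacency lists, takes a single shortest path per gene pair via breadth-first search, and uses placeholder gene names. The functions `shortest_path` and `shared_gene_network` are hypothetical helpers.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one shortest path from source to target via BFS, or None."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                if neighbour == target:
                    # Walk back through the parent links to rebuild the path.
                    path = [target]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None


def shared_gene_network(graph, target_genes, adr_genes):
    """Union of edges on one shortest path per (target gene, ADR gene) pair."""
    edges = set()
    for target_gene in target_genes:
        for adr_gene in adr_genes:
            path = shortest_path(graph, target_gene, adr_gene)
            if path:
                edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return edges


# Hypothetical toy PPI network; T1 is a drug target gene, A1 and A2 are ADR genes.
ppi = {
    "T1": ["G1"],
    "G1": ["T1", "G2", "A1"],
    "G2": ["G1", "A2"],
    "A1": ["G1"],
    "A2": ["G2"],
}
network = shared_gene_network(ppi, target_genes=["T1"], adr_genes=["A1", "A2"])
print(sorted(tuple(sorted(edge)) for edge in network))
```

A real PPI network may contain several equally short paths between two genes. This sketch keeps only the first one that the search finds, which is a simplification.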
Affiliation(s)
- Youhyeon Hwang
- Dept. of Computer Science, University of Southern California, USA.
|
29
|
Smalheiser NR. Rediscovering Don Swanson: the Past, Present and Future of Literature-Based Discovery. Journal of Data and Information Science 2017; 2:43-64. [PMID: 29355246 PMCID: PMC5771422 DOI: 10.1515/jdis-2017-0019] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 01/25/2023] Open
Abstract
The late Don R. Swanson was well appreciated during his lifetime as Dean of the Graduate Library School at University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles. In this informal essay, I will give my personal perspective on Don's contributions to science, and outline some current and future directions in literature-based discovery that are rooted in concepts that he developed.
Affiliation(s)
- Neil R Smalheiser
- Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612 USA, +1 312-413-4581
|
30
|
Henry S, McInnes BT. Literature Based Discovery: Models, methods, and trends. J Biomed Inform 2017; 74:20-32. [PMID: 28838802 DOI: 10.1016/j.jbi.2017.08.011] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 03/03/2017] [Revised: 07/21/2017] [Accepted: 08/20/2017] [Indexed: 01/25/2023]
Abstract
OBJECTIVES This paper provides an introduction and overview of literature based discovery (LBD) in the biomedical domain. It introduces the reader to modern and historical LBD models, key system components, evaluation methodologies, and current trends. After completion, the reader will be familiar with the challenges and methodologies of LBD. The reader will be capable of distinguishing between recent LBD systems and publications, and be capable of designing an LBD system for a specific application. TARGET AUDIENCE From biomedical researchers curious about LBD, to someone looking to design an LBD system, to an LBD expert trying to catch up on trends in the field. The reader need not be familiar with LBD, but knowledge of biomedical text processing tools is helpful. SCOPE This paper describes a unifying framework for LBD systems. Within this framework, different models and methods are presented to both distinguish and show overlap between systems. Topics include term and document representation, system components, and an overview of models including co-occurrence models, semantic models, and distributional models. Other topics include uninformative term filtering, term ranking, results display, system evaluation, an overview of the application areas of drug development, drug repurposing, and adverse drug event prediction, and challenges and future directions. A timeline showing contributions to LBD, and a table summarizing the works of several authors is provided. Topics are presented from a high level perspective. References are given if more detailed analysis is required.
Affiliation(s)
- Sam Henry
- Department of Computer Science, Virginia Commonwealth University, 401 S. Main St., Rm E4222, Richmond, VA 23284, USA
- Bridget T McInnes
- Department of Computer Science, Virginia Commonwealth University, 401 S. Main St., Rm E4222, Richmond, VA 23284, USA
|
31
|
Raja K, Patrick M, Elder JT, Tsoi LC. Machine learning workflow to enhance predictions of Adverse Drug Reactions (ADRs) through drug-gene interactions: application to drugs for cutaneous diseases. Sci Rep 2017. [PMID: 28623363 PMCID: PMC5473874 DOI: 10.1038/s41598-017-03914-3] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Indexed: 12/14/2022] Open
Abstract
Adverse drug reactions (ADRs) pose critical public health issues, affecting over 6% of hospitalized patients. While knowledge of potential drug-drug interactions (DDIs) is necessary to prevent ADRs, the rapid pace of drug discovery makes it challenging to maintain a strong insight into DDIs. In this study, we present a novel literature-mining framework for enhancing the predictions of DDIs and ADR types by integrating drug-gene interactions (DGIs). The ADR types were adapted from a DDI corpus, including i) adverse effect; ii) effect at molecular level; iii) effect related to pharmacokinetics; and iv) DDIs without known ADRs. Using a random forest classifier, our approach achieves an F-score of 0.87 across the ADR classification using only the DDI features. We then enhanced the performance of the classifier by including DGIs (F-score = 0.90), and applied the classification model trained with the DDI corpus to identify the drugs that might interact with the drugs for cutaneous diseases. We successfully predict previously known ADRs for drugs prescribed for cutaneous diseases, and are also able to identify promising new ADRs.
Affiliation(s)
- Kalpana Raja
- Department of Dermatology, University of Michigan Medical School, Ann Arbor, MI, USA
- Matthew Patrick
- Department of Dermatology, University of Michigan Medical School, Ann Arbor, MI, USA
- James T Elder
- Department of Dermatology, University of Michigan Medical School, Ann Arbor, MI, USA
- Lam C Tsoi
- Department of Dermatology, University of Michigan Medical School, Ann Arbor, MI, USA; Department of Computational Medicine & Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA; Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
|
32
|
Abstract
Background Literature based discovery (LBD) automatically infers missed connections between concepts in literature. It is often assumed that LBD generates more information than can be reasonably examined. Methods We present a detailed analysis of the quantity of hidden knowledge produced by an LBD system and the effect of various filtering approaches upon this. The investigation of filtering combined with single or multi-step linking term chains is carried out on all articles in PubMed. Results The evaluation is carried out using both replication of existing discoveries, which provides justification for multi-step linking chain knowledge in specific cases, and using timeslicing, which gives a large scale measure of performance. Conclusions While the quantity of hidden knowledge generated by LBD can be vast, we demonstrate that (a) intelligent filtering can greatly reduce the number of hidden knowledge pairs generated, (b) for a specific term, the number of single step connections can be manageable, and (c) in the absence of single step hidden links, considering multiple steps can provide valid links.
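The difference between single-step and multi-step linking chains can be illustrated with a small sketch. It is not the system evaluated in the abstract: it assumes known connections are given as undirected term pairs, applies no filtering, and treats a hidden link as a term that is reachable through one or more linking terms but is not directly connected to the starting term. The functions `build_links` and `hidden_links` are hypothetical names, and the toy pairs follow Swanson's well-known fish oil and Raynaud example.

```python
from collections import defaultdict


def build_links(pairs):
    """Build an undirected adjacency map from known (term, term) connections."""
    links = defaultdict(set)
    for a, b in pairs:
        links[a].add(b)
        links[b].add(a)
    return links


def hidden_links(links, start, max_linking_terms=1):
    """Map each term not directly connected to start to the linking chains
    (tuples of 1..max_linking_terms intermediate terms) that reach it."""
    direct = links.get(start, set())
    found = defaultdict(list)
    # Each frontier entry is (current term, chain of linking terms used so far).
    frontier = [(b, (b,)) for b in sorted(direct)]
    for _ in range(max_linking_terms):
        next_frontier = []
        for term, chain in frontier:
            for candidate in sorted(links.get(term, set())):
                if candidate == start or candidate in chain:
                    continue
                if candidate not in direct:
                    found[candidate].append(chain)
                next_frontier.append((candidate, chain + (candidate,)))
        frontier = next_frontier
    return dict(found)


# Toy literature links for illustration only.
literature = build_links([
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud disease"),
    ("platelet aggregation", "Raynaud disease"),
])
single_step = hidden_links(literature, "fish oil")
print(single_step)

# A chain where the hidden term needs two linking terms.
chain_links = build_links([("A", "B"), ("B", "C"), ("C", "D")])
two_step = hidden_links(chain_links, "A", max_linking_terms=2)
print(two_step)
```

With `max_linking_terms=1` the second example yields only `C`. Allowing two linking terms also surfaces `D`, which corresponds to case (c) in the conclusions, where multi-step chains supply links that single steps miss.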
Affiliation(s)
- Judita Preiss
- Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, UK
- Mark Stevenson
- Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, UK
|
33
|
Abstract
Literature-based discovery systems aim at discovering valuable latent connections between previously disparate research areas. This is achieved by analyzing the contents of their respective literatures with the help of various intelligent computational techniques. In this paper, we review the progress of literature-based discovery research, focusing on understanding their technical features and evaluating their performance. The present literature-based discovery techniques can be divided into two general approaches: the traditional approach and the emerging approach. The traditional approach, which dominates the current research landscape, consists mainly of techniques that rely on lexical statistics, knowledge-based and visualization methods to address literature-based discovery problems. On the other hand, we have also observed the birth of new trends and unprecedented paradigm shifts among the recently emerging literature-based discovery approaches. These trends are likely to shape the future trajectory of the next generation of literature-based discovery systems.
|
34
|
Cohen T, Widdows D. Embedding of semantic predications. J Biomed Inform 2017; 68:150-166. [PMID: 28284761 DOI: 10.1016/j.jbi.2017.03.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 11/13/2016] [Revised: 02/27/2017] [Accepted: 03/05/2017] [Indexed: 11/20/2022]
Abstract
This paper concerns the generation of distributed vector representations of biomedical concepts from structured knowledge, in the form of subject-relation-object triplets known as semantic predications. Specifically, we evaluate the extent to which a representational approach we have developed for this purpose previously, known as Predication-based Semantic Indexing (PSI), might benefit from insights gleaned from neural-probabilistic language models, which have enjoyed a surge in popularity in recent years as a means to generate distributed vector representations of terms from free text. To do so, we develop a novel neural-probabilistic approach to encoding predications, called Embedding of Semantic Predications (ESP), by adapting aspects of the Skipgram with Negative Sampling (SGNS) algorithm to this purpose. We compare ESP and PSI across a number of tasks including recovery of encoded information, estimation of semantic similarity and relatedness, and identification of potentially therapeutic and harmful relationships using both analogical retrieval and supervised learning. We find advantages for ESP in some, but not all of these tasks, revealing the contexts in which the additional computational work of neural-probabilistic modeling is justified.
Affiliation(s)
- Trevor Cohen
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, United States
|
35
|
Large-scale adverse effects related to treatment evidence standardization (LAERTES): an open scalable system for linking pharmacovigilance evidence sources with clinical data. J Biomed Semantics 2017; 8:11. [PMID: 28270198 PMCID: PMC5341176 DOI: 10.1186/s13326-017-0115-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 11/23/2016] [Accepted: 01/13/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Integrating multiple sources of pharmacovigilance evidence has the potential to advance the science of safety signal detection and evaluation. In this regard, there is a need for more research on how to integrate multiple disparate evidence sources while making the evidence computable from a knowledge representation perspective (i.e., semantic enrichment). Existing frameworks suggest promising outcomes for such integration but employ a rather limited number of sources. In particular, none have been specifically designed to support both regulatory and clinical use cases, nor have any been designed to add new resources and use cases through an open architecture. This paper discusses the architecture and functionality of a system called Large-scale Adverse Effects Related to Treatment Evidence Standardization (LAERTES) that aims to address these shortcomings. RESULTS LAERTES provides a standardized, open, and scalable architecture for linking evidence sources relevant to the association of drugs with health outcomes of interest (HOIs). Standard terminologies are used to represent different entities. For example, drugs and HOIs are represented in RxNorm and Systematized Nomenclature of Medicine -- Clinical Terms, respectively. At the time of this writing, six evidence sources have been loaded into the LAERTES evidence base and are accessible through a prototype evidence exploration user interface and a set of Web application programming interface services. This system operates within a larger software stack provided by the Observational Health Data Sciences and Informatics clinical research framework, including the relational Common Data Model for observational patient data created by the Observational Medical Outcomes Partnership. Elements of the Linked Data paradigm facilitate the systematic and scalable integration of relevant evidence sources. 
CONCLUSIONS The prototype LAERTES system provides useful functionality while creating opportunities for further research. Future work will involve improving the method for normalizing drug and HOI concepts across the integrated sources, aggregating evidence at different levels of a hierarchy of HOI concepts, and developing a more advanced user interface for drug-HOI investigations.
|
36
|
Mower J, Subramanian D, Shang N, Cohen T. Classification-by-Analogy: Using Vector Representations of Implicit Relationships to Identify Plausibly Causal Drug/Side-effect Relationships. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2017; 2016:1940-1949. [PMID: 28269953 PMCID: PMC5333205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
An important aspect of post-marketing drug surveillance involves identifying potential side-effects utilizing adverse drug event (ADE) reporting systems and/or Electronic Health Records. These data are noisy, necessitating that identified drug/ADE associations be manually reviewed, a human-intensive process that scales poorly with large numbers of possibly dangerous associations and rapid growth of biomedical literature. Recent work has employed Literature Based Discovery methods that exploit implicit relationships between biomedical entities within the literature to estimate the plausibility of drug/ADE connections. We extend this work by evaluating machine learning classifiers applied to high-dimensional vector representations of relationships extracted from the literature as a means to identify substantiated drug/ADE connections. Using a curated reference standard, we show that applying classifiers to such representations improves performance (+≈37% AUC) over previous approaches. These trained systems reproduce outcomes of the manual literature review process used to create the reference standard, but further research is required to establish their generalizability.
Affiliation(s)
- Justin Mower
- Baylor College of Medicine, Houston, Texas; University of Texas Health Science Center at Houston, Houston, Texas
- Trevor Cohen
- Baylor College of Medicine, Houston, Texas; University of Texas Health Science Center at Houston, Houston, Texas
|
37
|
Malec SA, Wei P, Xu H, Bernstam EV, Myneni S, Cohen T. Literature-Based Discovery of Confounding in Observational Clinical Data. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2017; 2016:1920-1929. [PMID: 28269951 PMCID: PMC5333204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Observational data recorded in the Electronic Health Record (EHR) can help us better understand the effects of therapeutic agents in routine clinical practice. As such data were not collected for research purposes, their reuse for research must compensate for additional information that may bias analyses and lead to faulty conclusions. Confounding is present when factors aside from the given predictor(s) affect the response of interest. However, these additional factors may not be known at the outset. In this paper, we present a scalable literature-based confounding variable discovery method for biomedical research applications with pharmacovigilance as our use case. We hypothesized that statistical models, adjusted with literature-derived confounders, will more accurately identify causative drug-adverse drug event (ADE) relationships. We evaluated our method with a curated reference standard, and found a pattern of improved performance (~5%) in two out of three models for gastrointestinal bleeding (pre-adjusted Area Under Curve ≥ 0.6).
Affiliation(s)
- Hua Xu
- School of Biomedical Informatics
- Elmer V Bernstam
- School of Biomedical Informatics; Division of General Internal Medicine, Medical School, Houston, TX
|
38
|
Koutkias VG, Lillo-Le Louët A, Jaulent MC. Exploiting heterogeneous publicly available data sources for drug safety surveillance: computational framework and case studies. Expert Opin Drug Saf 2016; 16:113-124. [PMID: 27813420 DOI: 10.1080/14740338.2017.1257604] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/17/2023]
Abstract
OBJECTIVE Driven by the need of pharmacovigilance centres and companies to routinely collect and review all available data about adverse drug reactions (ADRs) and adverse events of interest, we introduce and validate a computational framework exploiting dominant as well as emerging publicly available data sources for drug safety surveillance. METHODS Our approach relies on appropriate query formulation for data acquisition and subsequent filtering, transformation and joint visualization of the obtained data. We acquired data from the FDA Adverse Event Reporting System (FAERS), PubMed and Twitter. In order to assess the validity and the robustness of the approach, we elaborated on two important case studies, namely, clozapine-induced cardiomyopathy/myocarditis versus haloperidol-induced cardiomyopathy/myocarditis, and apixaban-induced cerebral hemorrhage. RESULTS The analysis of the obtained data provided interesting insights (identification of potential patient and health-care professional experiences regarding ADRs in Twitter, information/arguments against an ADR existence across all sources), while illustrating the benefits (complementing data from multiple sources to strengthen/confirm evidence) and the underlying challenges (selecting search terms, data presentation) of exploiting heterogeneous information sources, thereby advocating the need for the proposed framework. CONCLUSIONS This work contributes to establishing a continuous learning system for drug safety surveillance by exploiting heterogeneous publicly available data sources via appropriate support tools.
Affiliation(s)
- Vassilis G Koutkias
- Institute of Applied Biosciences, Centre for Research & Technology Hellas, Thermi, Thessaloniki, Greece.,INSERM, U1142, LIMICS, F-75006, Paris, France.,Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS 1142, LIMICS, F-75006, Paris, France.,Université Paris 13, Sorbonne Paris Cité, LIMICS, (UMR_S 1142), F-93430, Villetaneuse, France
| | - Agnès Lillo-Le Louët
- Centre Régional de Pharmacovigilance, Hôpital Européen Georges-Pompidou, AP-HP, F-75015, Paris, France
| | - Marie-Christine Jaulent
- INSERM, U1142, LIMICS, F-75006, Paris, France.,Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1142, LIMICS 1142, LIMICS, F-75006, Paris, France.,Université Paris 13, Sorbonne Paris Cité, LIMICS, (UMR_S 1142), F-93430, Villetaneuse, France
| |
|
39
|
Teixeira PL, Wei WQ, Cronin RM, Mo H, VanHouten JP, Carroll RJ, LaRose E, Bastarache LA, Rosenbloom ST, Edwards TL, Roden DM, Lasko TA, Dart RA, Nikolai AM, Peissig PL, Denny JC. Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals. J Am Med Inform Assoc 2016; 24:162-171. [PMID: 27497800 DOI: 10.1093/jamia/ocw071] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 11/01/2015] [Revised: 04/03/2016] [Accepted: 04/07/2016] [Indexed: 12/11/2022] Open
Abstract
OBJECTIVE Phenotyping algorithms applied to electronic health record (EHR) data enable investigators to identify large cohorts for clinical and genomic research. Algorithm development is often iterative, depends on fallible investigator intuition, and is time- and labor-intensive. We developed and evaluated 4 types of phenotyping algorithms and categories of EHR information to identify hypertensive individuals and controls and provide a portable module for implementation at other sites. MATERIALS AND METHODS We reviewed the EHRs of 631 individuals followed at Vanderbilt for hypertension status. We developed features and phenotyping algorithms of increasing complexity. Input categories included International Classification of Diseases, Ninth Revision (ICD9) codes, medications, vital signs, narrative-text search results, and Unified Medical Language System (UMLS) concepts extracted using natural language processing (NLP). We developed a module and tested portability by replicating 10 of the best-performing algorithms at the Marshfield Clinic. RESULTS Random forests using billing codes, medications, vitals, and concepts had the best performance with a median area under the receiver operator characteristic curve (AUC) of 0.976. Normalized sums of all 4 categories also performed well (0.959 AUC). The best non-NLP algorithm combined normalized ICD9 codes, medications, and blood pressure readings with a median AUC of 0.948. Blood pressure cutoffs or ICD9 code counts alone had AUCs of 0.854 and 0.908, respectively. Marshfield Clinic results were similar. CONCLUSION This work shows that billing codes or blood pressure readings alone yield good hypertension classification performance. However, even simple combinations of input categories improve performance. The most complex algorithms classified hypertension with excellent recall and precision.
Affiliation(s)
- Pedro L Teixeira
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Robert M Cronin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Huan Mo
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Jacob P VanHouten
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.,Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Eric LaRose
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 N Oak Ave - ML8, Marshfield, WI 54449, USA
| | - Lisa A Bastarache
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - S Trent Rosenbloom
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.,Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Todd L Edwards
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Dan M Roden
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA.,Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Thomas A Lasko
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Richard A Dart
- Center for Human Genetics, Marshfield Clinic Research Foundation, 1000 N Oak Ave-MLR, Marshfield, WI 54449, USA
| | - Anne M Nikolai
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 N Oak Ave - ML8, Marshfield, WI 54449, USA
| | - Peggy L Peissig
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 N Oak Ave - ML8, Marshfield, WI 54449, USA
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA.,Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
| |
|
40
|
Kastrin A, Rindflesch TC, Hristovski D. Link Prediction on a Network of Co-occurring MeSH Terms: Towards Literature-based Discovery. Methods Inf Med 2016; 55:340-6. [PMID: 27435341 DOI: 10.3414/me15-01-0108] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 08/17/2015] [Accepted: 05/19/2016] [Indexed: 12/24/2022]
Abstract
OBJECTIVES Literature-based discovery (LBD) is a text mining methodology for automatically generating research hypotheses from existing knowledge. We frame the process of LBD as a classification problem on a graph of MeSH terms. We employ unsupervised and supervised link prediction methods for predicting previously unknown connections between biomedical concepts. METHODS We evaluate the effectiveness of link prediction through a series of experiments using a MeSH network that contains the history of link formation between biomedical concepts. We performed link prediction using proximity measures, such as common neighbor (CN), Jaccard coefficient (JC), Adamic/Adar index (AA) and preferential attachment (PA). Our approach relies on the assumption that similar nodes are more likely to establish a link in the future. RESULTS Applying an unsupervised approach, the AA measure achieved the best performance in terms of area under the ROC curve (AUC = 0.76), followed by CN, JC, and PA. In a supervised approach, we evaluate whether proximity measures can be combined to define a model of link formation across all four predictors. We applied various classifiers, including decision trees, k-nearest neighbors, logistic regression, multilayer perceptron, naïve Bayes, and random forests. The random forest classifier achieved the best performance (AUC = 0.87). CONCLUSIONS The link prediction approach proved to be effective for LBD processing. Supervised statistical learning approaches clearly outperform an unsupervised approach to link prediction.
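The four proximity measures named in this abstract can be illustrated with a minimal sketch. The toy graph, node names, and function names below are hypothetical and are not taken from the paper; the formulas are the standard textbook definitions of CN, JC, AA, and PA for an undirected graph.

```python
import math
from itertools import combinations

# Toy undirected co-occurrence graph; node names are hypothetical placeholders.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]

def build_adjacency(edge_list):
    """Map each node to the set of its neighbors."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def proximity_scores(adj, u, v):
    """Return the four proximity scores for the node pair (u, v)."""
    shared = adj[u] & adj[v]
    union = adj[u] | adj[v]
    return {
        # Common neighbors: number of shared neighbors.
        "CN": len(shared),
        # Jaccard coefficient: shared neighbors over all neighbors.
        "JC": len(shared) / len(union) if union else 0.0,
        # Adamic/Adar: shared neighbors weighted by inverse log degree.
        "AA": sum(1.0 / math.log(len(adj[w])) for w in shared if len(adj[w]) > 1),
        # Preferential attachment: product of the two degrees.
        "PA": len(adj[u]) * len(adj[v]),
    }

adj = build_adjacency(edges)
# Candidate links are node pairs that are not yet connected.
candidates = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
# Rank candidates by one measure (AA here), highest score first.
ranked = sorted(candidates, key=lambda p: proximity_scores(adj, *p)["AA"], reverse=True)
```

In the unsupervised setting a single score ranks the candidate links directly; in a supervised setting the four scores would instead serve as features for a classifier.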
Affiliation(s)
- Andrej Kastrin
- Andrej Kastrin, PhD, Faculty of Information Studies, Ljubljanska cesta 31A, SI-8000 Novo Mesto, Slovenia
| |
|
41
|
Koutkias V, Jaulent MC. A Multiagent System for Integrated Detection of Pharmacovigilance Signals. J Med Syst 2015; 40:37. [DOI: 10.1007/s10916-015-0378-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/31/2015] [Accepted: 10/09/2015] [Indexed: 12/23/2022]
Affiliation(s)
- Vassilis Koutkias
- INSERM, U1142, LIMICS, 75006, Paris, France.,Sorbonne Universités, UPMC University Paris 06, UMR_S 1142, LIMICS, 75006, Paris, France.,Université Paris 13, Sorbonne Paris Cité, LIMICS, UMR_S 1142, 93430, Villetaneuse, France.
| | - Marie-Christine Jaulent
- INSERM, U1142, LIMICS, 75006, Paris, France.,Sorbonne Universités, UPMC University Paris 06, UMR_S 1142, LIMICS, 75006, Paris, France.,Université Paris 13, Sorbonne Paris Cité, LIMICS, UMR_S 1142, 93430, Villetaneuse, France.
| |
|
42
|
Widdows D, Cohen T. Reasoning with Vectors: A Continuous Model for Fast Robust Inference. LOGIC JOURNAL OF THE IGPL 2015; 23:141-173. [PMID: 26582967 PMCID: PMC4646228 DOI: 10.1093/jigpal/jzu028] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 06/05/2023]
Abstract
This paper describes the use of continuous vector space models for reasoning with a formal knowledge base. The practical significance of these models is that they support fast, approximate but robust inference and hypothesis generation, which is complementary to the slow, exact, but sometimes brittle behavior of more traditional deduction engines such as theorem provers. The paper explains the way logical connectives can be used in semantic vector models, and summarizes the development of Predication-based Semantic Indexing, which involves the use of Vector Symbolic Architectures to represent the concepts and relationships from a knowledge base of subject-predicate-object triples. Experiments show that the use of continuous models for formal reasoning is not only possible, but already demonstrably effective for some recognized informatics tasks, and showing promise in other traditional problem areas. Examples described in this paper include: predicting new uses for existing drugs in biomedical informatics; removing unwanted meanings from search results in information retrieval and concept navigation; type-inference from attributes; comparing words based on their orthography; and representing tabular data, including modelling numerical values. The algorithms and techniques described in this paper are all publicly released and freely available in the Semantic Vectors open-source software package.
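The idea of encoding subject-predicate-object triples with a Vector Symbolic Architecture can be sketched as follows. This is a minimal illustration under stated assumptions, not the Semantic Vectors implementation: it assumes random bipolar (+1/-1) vectors, element-wise multiplication as the binding operator, and summation as bundling. All concept names, the dimensionality, and the function names are hypothetical.

```python
import random

DIM = 2048  # assumed dimensionality, chosen only for illustration
rng = random.Random(0)

def random_vector():
    """Random bipolar elemental vector."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bind(a, b):
    """Bind two vectors; element-wise product is its own inverse for bipolar vectors."""
    return [x * y for x, y in zip(a, b)]

def bundle(vectors):
    """Superpose several vectors by element-wise addition."""
    return [sum(col) for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Elemental vectors for hypothetical predicates and concepts.
names = ["TREATS", "INHIBITS", "disease_1", "gene_1", "disease_2"]
elemental = {name: random_vector() for name in names}

# Semantic vector for a hypothetical drug, built from two triples:
# (drug, TREATS, disease_1) and (drug, INHIBITS, gene_1).
drug = bundle([
    bind(elemental["TREATS"], elemental["disease_1"]),
    bind(elemental["INHIBITS"], elemental["gene_1"]),
])

# Query "what does the drug treat?": unbind with TREATS, then find the nearest concept.
probe = bind(drug, elemental["TREATS"])
candidates = ["disease_1", "gene_1", "disease_2"]
best = max(candidates, key=lambda name: cosine(probe, elemental[name]))
```

The retrieval is approximate: the probe equals the target vector plus noise from the other bundled triple, which is why the abstract describes this style of inference as fast and robust rather than exact.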
Affiliation(s)
- Trevor Cohen
- University of Texas School of Biomedical Informatics at Houston
| |
|
43
|
In silico assessment of adverse drug reactions and associated mechanisms. Drug Discov Today 2015; 21:58-71. [PMID: 26272036 DOI: 10.1016/j.drudis.2015.07.018] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 04/13/2015] [Revised: 07/15/2015] [Accepted: 07/31/2015] [Indexed: 12/31/2022]
Abstract
During recent years, various in silico approaches have been developed to estimate chemical and biological drug features, for example chemical fragments, protein targets, pathways, among others, that correlate with adverse drug reactions (ADRs) and explain the associated mechanisms. These features have also been used for the creation of predictive models that enable estimation of ADRs during the early stages of drug development. In this review, we discuss various in silico approaches to predict these features for a certain drug, estimate correlations with ADRs, establish causal relationships between selected features and ADR mechanisms and create corresponding predictive models.
|
44
|
Cairelli MJ, Fiszman M, Zhang H, Rindflesch TC. Networks of neuroinjury semantic predications to identify biomarkers for mild traumatic brain injury. J Biomed Semantics 2015; 6:25. [PMID: 25992264 PMCID: PMC4436163 DOI: 10.1186/s13326-015-0022-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 04/04/2014] [Accepted: 04/22/2015] [Indexed: 12/13/2022] Open
Abstract
Objective Mild traumatic brain injury (mTBI) has high prevalence in the military, among athletes, and in the general population worldwide (largely due to falls). Consequences can include a range of neuropsychological disorders. Unfortunately, such neural injury often goes undiagnosed due to the difficulty in identifying symptoms, so the discovery of an effective biomarker would greatly assist diagnosis; however, no single biomarker has been identified. We identify several body substances as potential components of a panel of biomarkers to support the diagnosis of mild traumatic brain injury. Methods Our approach to diagnostic biomarker discovery combines ideas and techniques from systems medicine, natural language processing, and graph theory. We create a molecular interaction network that represents neural injury and is composed of relationships automatically extracted from the literature. We retrieve citations related to neurological injury and extract relationships (semantic predications) that contain potential biomarkers. After linking all relationships together to create a network representing neural injury, we filter the network by relationship frequency and concept connectivity to reduce the set to a manageable size of higher interest substances. Results 99,437 relevant citations yielded 26,441 unique relations. 18,085 of these contained a potential biomarker as subject or object with a total of 6,246 unique concepts. After filtering by graph metrics, the set was reduced to 1,021 relationships with 49 unique concepts, including 17 potential biomarkers. Conclusion We created a network of relationships containing substances derived from 99,437 citations and filtered using graph metrics to provide a set of 17 potential biomarkers. We discuss the interaction of several of these (glutamate, glucose, and lactate) as the basis for more effective diagnosis than is currently possible. This method provides an opportunity to focus the effort of wet bench research on those substances with the highest potential as biomarkers for mTBI.
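The filtering step described in the Methods (by relationship frequency, then by concept connectivity) might look roughly like the sketch below. The abstract does not state the actual thresholds or graph metrics, so the cutoffs, the example predications, and the function name `filter_network` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical extracted predications (subject, predicate, object); not data from the paper.
predications = [
    ("glutamate", "AFFECTS", "brain injury"),
    ("glutamate", "AFFECTS", "brain injury"),
    ("glucose", "ASSOCIATED_WITH", "brain injury"),
    ("glucose", "ASSOCIATED_WITH", "brain injury"),
    ("lactate", "INTERACTS_WITH", "glucose"),
    ("lactate", "INTERACTS_WITH", "glucose"),
    ("substance X", "AFFECTS", "brain injury"),
]

def filter_network(triples, min_frequency=2, min_degree=2):
    """Keep frequent relationships, then keep those touching a well-connected concept.

    Both thresholds are assumed values for illustration only.
    """
    # Step 1: relationship frequency, counting how often each triple was extracted.
    counts = Counter(triples)
    frequent = [t for t, n in counts.items() if n >= min_frequency]

    # Step 2: concept connectivity, measured here as the number of distinct neighbors.
    neighbors = defaultdict(set)
    for subj, _, obj in frequent:
        neighbors[subj].add(obj)
        neighbors[obj].add(subj)
    hubs = {c for c, ns in neighbors.items() if len(ns) >= min_degree}

    # Retain relationships with at least one well-connected endpoint.
    kept = [t for t in frequent if t[0] in hubs or t[2] in hubs]
    return kept, hubs

kept, hubs = filter_network(predications)
```

Degree is used here as the simplest connectivity measure; other graph metrics could be substituted without changing the overall two-stage structure.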
Affiliation(s)
- Michael J Cairelli
- National Institutes of Health, National Library of Medicine, 38A 9N912A, 8600 Rockville Pike, Bethesda, MD 20892 USA
| | - Marcelo Fiszman
- National Institutes of Health, National Library of Medicine, 38A 9N912A, 8600 Rockville Pike, Bethesda, MD 20892 USA
| | - Han Zhang
- Department of Medical Informatics, China Medical University, Shenyang, Liaoning 110001 China
| | - Thomas C Rindflesch
- National Institutes of Health, National Library of Medicine, 38A 9N912A, 8600 Rockville Pike, Bethesda, MD 20892 USA
| |
|
45
|
Koutkias VG, Jaulent MC. Computational approaches for pharmacovigilance signal detection: toward integrated and semantically-enriched frameworks. Drug Saf 2015; 38:219-32. [PMID: 25749722 PMCID: PMC4374117 DOI: 10.1007/s40264-015-0278-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 02/04/2023]
Abstract
Computational signal detection constitutes a key element of postmarketing drug monitoring and surveillance. Diverse data sources are considered within the 'search space' of pharmacovigilance scientists, and respective data analysis methods are employed, all with their qualities and shortcomings, towards more timely and accurate signal detection. Recent systematic comparative studies highlighted not only event-based and data-source-based differential performance across methods but also their complementarity. These findings reinforce the arguments for exploiting all possible information sources for drug safety and the parallel use of multiple signal detection methods. Combinatorial signal detection has been pursued in few studies up to now, employing a rather limited number of methods and data sources but illustrating promising outcomes. However, the large-scale realization of this approach requires systematic frameworks to address the challenges of the concurrent analysis setting. In this paper, we argue that semantic technologies provide the means to address some of these challenges, and we particularly highlight their contribution in (a) annotating data sources and analysis methods with quality attributes to facilitate their selection given the analysis scope; (b) consistently defining study parameters such as health outcomes and drugs of interest, and providing guidance for study setup; (c) expressing analysis outcomes in a common format enabling data sharing and systematic comparisons; and (d) assessing/supporting the novelty of the aggregated outcomes through access to reference knowledge sources related to drug safety. A semantically-enriched framework can facilitate seamless access and use of different data sources and computational methods in an integrated fashion, bringing a new perspective for large-scale, knowledge-intensive signal detection.
Affiliation(s)
- Vassilis G Koutkias
- INSERM, U1142, LIMICS, Campus des Cordeliers, 15 rue de l'École de Médecine, 75006, Paris, France
| |
|