1
Davies H, Noble PJ, Fins IS, Pinchbeck G, Singleton D, Pirmohamed M, Killick D. Developing electronic health records as a source of real-world data for veterinary pharmacoepidemiology. Front Vet Sci 2025; 12:1550468. [PMID: 40235568 PMCID: PMC11996780 DOI: 10.3389/fvets.2025.1550468] [Received: 12/23/2024] [Accepted: 02/03/2025] [Indexed: 04/17/2025]
Abstract
Spontaneous reporting of adverse events (AEs) by veterinary professionals and the public is the cornerstone of post-marketing safety surveillance for veterinary medicinal products (VMPs). However, studies suggest that most veterinary AEs remain unreported. Veterinary medicine regulators, including the United Kingdom Veterinary Medicines Directorate and the European Medicines Agency, have included the exploration of big data utilization to support pharmacovigilance efforts in their regulatory strategies. In this study, we describe the application of veterinary electronic healthcare records (EHRs) from the SAVSNET veterinary first opinion informatics system to conduct pharmacoepidemiological analyses. Five VMP-AE pairs were selected for investigation in a proof-of-concept study, where drug exposure was identified from semi-structured treatment data and AEs from the unstructured free-text clinical narrative. Dictionaries were developed to identify AEs based on standard terminology. The precision of these dictionaries improved when they were expanded using word vectorization and expert opinion. A key strength of first-opinion EHR datasets is their ability to enable cohort studies and facilitate calculations of absolute incidence and relative risk. Thus, we demonstrate that unstructured free-text clinical narratives can be used to identify outcomes for veterinary pharmacoepidemiological studies and, consequently, support and expand pharmacovigilance efforts based on spontaneous AE reports.
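The abstract above notes that first-opinion EHR cohorts allow absolute incidence and relative risk to be calculated. A minimal sketch of that arithmetic follows; the class name, function names, and counts are illustrative assumptions, not values or code from the study.

```python
from dataclasses import dataclass


@dataclass
class CohortCounts:
    """Counts for one exposure group (hypothetical structure for illustration)."""
    events: int  # animals with the adverse event recorded
    total: int   # animals in the group


def absolute_incidence(group: CohortCounts) -> float:
    """Proportion of the group in which the event was recorded."""
    if group.total <= 0:
        raise ValueError("group must contain at least one animal")
    return group.events / group.total


def relative_risk(exposed: CohortCounts, unexposed: CohortCounts) -> float:
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    baseline = absolute_incidence(unexposed)
    if baseline == 0:
        raise ValueError("relative risk is undefined when the unexposed incidence is zero")
    return absolute_incidence(exposed) / baseline


# Made-up counts, used only to exercise the functions.
exposed = CohortCounts(events=30, total=1000)
unexposed = CohortCounts(events=10, total=1000)
rr = relative_risk(exposed, unexposed)
```

Spontaneous-report databases lack the denominator (`total`), which is why this calculation needs cohort data of the kind the abstract describes.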
Affiliation(s)
- Heather Davies
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- Peter-John Noble
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- Ivo S. Fins
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- Gina Pinchbeck
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- David Singleton
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- Munir Pirmohamed
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- David Killick
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
2
Golder S, Xu D, O'Connor K, Wang Y, Batra M, Hernandez GG. Leveraging Natural Language Processing and Machine Learning Methods for Adverse Drug Event Detection in Electronic Health/Medical Records: A Scoping Review. Drug Saf 2025; 48:321-337. [PMID: 39786481 PMCID: PMC11903561 DOI: 10.1007/s40264-024-01505-6] [Accepted: 11/24/2024] [Indexed: 01/12/2025]
Abstract
BACKGROUND Natural language processing (NLP) and machine learning (ML) techniques may help harness unstructured free-text electronic health record (EHR) data to detect adverse drug events (ADEs) and thus improve pharmacovigilance. However, evidence of their real-world effectiveness remains unclear. OBJECTIVE To summarise the evidence on the effectiveness of NLP/ML in detecting ADEs from unstructured EHR data and ultimately improve pharmacovigilance in comparison to other data sources. METHODS A scoping review was conducted by searching six databases in July 2023. Studies leveraging NLP/ML to identify ADEs from EHR were included. Titles/abstracts were screened by two independent researchers as were full-text articles. Data extraction was conducted by one researcher and checked by another. A narrative synthesis summarises the research techniques, ADEs analysed, model performance and pharmacovigilance impacts. RESULTS Seven studies met the inclusion criteria covering a wide range of ADEs and medications. The utilisation of rule-based NLP, statistical models, and deep learning approaches was observed. Natural language processing/ML techniques with unstructured data improved the detection of under-reported adverse events and safety signals. However, substantial variability was noted in the techniques and evaluation methods employed across the different studies and limitations exist in integrating the findings into practice. CONCLUSIONS Natural language processing (NLP) and machine learning (ML) have promising possibilities in extracting valuable insights with regard to pharmacovigilance from unstructured EHR data. These approaches have demonstrated proficiency in identifying specific adverse events and uncovering previously unknown safety signals that would not have been apparent through structured data alone. 
Nevertheless, challenges such as the absence of standardised methodologies and validation criteria obstruct the widespread adoption of NLP/ML for pharmacovigilance leveraging of unstructured EHR data.
Affiliation(s)
- Su Golder
- Department of Health Sciences, University of York, York, YO10 5DD, UK
- Dongfang Xu
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Karen O'Connor
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yunwen Wang
- William Allen White School of Journalism and Mass Communications, The University of Kansas, Lawrence, KS, USA
- Mahak Batra
- Department of Health Sciences, University of York, York, YO10 5DD, UK
3
Berkowitz J, Weissenbacher D, Srinivasan A, Friedrich NA, Acitores Cortina JM, Kivelson S, Hernandez GG, Tatonetti NP. Probing Large Language Model Hidden States for Adverse Drug Reaction Knowledge. medRxiv 2025:2025.02.09.25321620. [PMID: 39990542 PMCID: PMC11844579 DOI: 10.1101/2025.02.09.25321620] [Indexed: 02/25/2025]
Abstract
Large language models (LLMs) integrate knowledge from diverse sources into a single set of internal weights. However, these representations are difficult to interpret, complicating our understanding of the models' learning capabilities. Sparse autoencoders (SAEs) linearize LLM embeddings, creating monosemantic features that both provide insight into the model's comprehension and simplify downstream machine learning tasks. These features are especially important in biomedical applications where explainability is critical. Here, we evaluate the use of Gemma Scope SAEs to identify how LLMs store known facts involving adverse drug reactions (ADRs). We transform hidden-state embeddings of drug names from Gemma2-9b-it into interpretable features and train a linear classifier on these features to classify ADR likelihood, evaluating against an established benchmark. These embeddings provide strong predictive performance, giving AUC-ROC of 0.957 for identifying acute kidney injury, 0.902 for acute liver injury, 0.954 for acute myocardial infarction, and 0.963 for gastrointestinal bleeds. Notably, there are no significant differences (p > 0.05) in performance between the simple linear classifiers built on SAE outputs and neural networks trained on the raw embeddings, suggesting that the information lost in reconstruction is minimal. This finding suggests that SAE-derived representations retain the essential information from the LLM while reducing model complexity, paving the way for more transparent, compute-efficient strategies. We believe that this approach can help synthesize the biomedical knowledge our models learn in training and be used for downstream applications, such as expanding reference sets for pharmacovigilance.
Affiliation(s)
- Jacob Berkowitz
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Davy Weissenbacher
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Apoorva Srinivasan
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Nadine A Friedrich
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Sophia Kivelson
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Nicholas P Tatonetti
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Cedars-Sinai Cancer, Cedars-Sinai Medical Center, Los Angeles, California, USA
4
Campillos-Llanos L, Valverde-Mateos A, Capllonch-Carrión A. Hybrid natural language processing tool for semantic annotation of medical texts in Spanish. BMC Bioinformatics 2025; 26:7. [PMID: 39780059 PMCID: PMC11708069 DOI: 10.1186/s12859-024-05949-6] [Received: 02/28/2024] [Accepted: 09/30/2024] [Indexed: 01/11/2025]
Abstract
BACKGROUND Natural language processing (NLP) enables the extraction of information embedded within unstructured texts, such as clinical case reports and trial eligibility criteria. By identifying relevant medical concepts, NLP facilitates the generation of structured and actionable data, supporting complex tasks like cohort identification and the analysis of clinical records. To accomplish those tasks, we introduce a deep learning-based and lexicon-based named entity recognition (NER) tool for texts in Spanish. It performs medical NER and normalization, medication information extraction and detection of temporal entities, negation and speculation, and temporality or experiencer attributes (Age, Contraindicated, Negated, Speculated, Hypothetical, Future, Family_member, Patient and Other). We built the tool with a dedicated lexicon and rules adapted from NegEx and HeidelTime. Using these resources, we annotated a corpus of 1200 texts, with high inter-annotator agreement (average F1 = 0.841 ± 0.045 for entities, and average F1 = 0.881 ± 0.032 for attributes). We used this corpus to train Transformer-based models (RoBERTa-based models, mBERT and mDeBERTa). We integrated them with the dictionary-based system in a hybrid tool, and distribute the models via the Hugging Face hub. For an internal validation, we used a held-out test set and conducted an error analysis. For an external validation, eight medical professionals evaluated the system by revising the annotation of 200 new texts not used in development. RESULTS In the internal validation, the models yielded F1 values up to 0.915. In the external validation with 100 clinical trials, the tool achieved an average F1 score of 0.858 (± 0.032); and in 100 anonymized clinical cases, it achieved an average F1 score of 0.910 (± 0.019). CONCLUSIONS The tool is available at https://claramed.csic.es/medspaner . We also release the code ( https://github.com/lcampillos/medspaner ) and the annotated corpus to train the models.
Affiliation(s)
- Ana Valverde-Mateos
- Medical Terminology Unit, Spanish Royal Academy of Medicine, C/Arrieta 12, 28013, Madrid, Spain
- Adrián Capllonch-Carrión
- Centro de Salud Retiro, Hospital Universitario Gregorio Marañon, C/Lope de Rueda, 43, 28009, Madrid, Spain
5
Olaker VR, Fry S, Terebuh P, Davis PB, Tisch DJ, Xu R, Miller MG, Dorney I, Palchuk MB, Kaelber DC. With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de-identified electronic health record data for research. Clin Transl Sci 2025; 18:e70093. [PMID: 39740190 DOI: 10.1111/cts.70093] [Received: 08/17/2024] [Revised: 10/31/2024] [Accepted: 11/04/2024] [Indexed: 01/02/2025]
Abstract
Electronic health records (EHRs), though they are maintained and utilized for clinical and billing purposes, may provide a wealth of information for research. Currently, sources are available that offer insight into the health histories of well over a quarter of a billion people. Their use, however, is fraught with hazards, including introduction or reinforcement of biases, clarity of disease definitions, protection of patient privacy, definitions of covariates or confounders, accuracy of medication usage compared with prescriptions, the need to introduce other data sources such as vaccination or death records and the ensuing potential for inaccuracy, duplicative records, and understanding and interpreting the outcomes of data queries. On the other hand, the possibility of study of rare disorders or the ability to link apparently disparate events are extremely valuable. Strategies for avoiding the worst pitfalls and hewing to conservative interpretations are essential. This article summarizes many of the approaches that have been used to avoid the most common pitfalls and extract the maximum information from aggregated, standardized, and de-identified EHR data. This article describes 26 topics broken into three major areas: (1) 14 topics related to design issues for observational study using EHR data, (2) 7 topics related to analysis issues when analyzing EHR data, and (3) 5 topics related to reporting studies using EHR data.
Affiliation(s)
- Veronica R Olaker
- Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Sarah Fry
- Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Pauline Terebuh
- Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Pamela B Davis
- Center for Community Health Integration, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Daniel J Tisch
- Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Rong Xu
- Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Margaret G Miller
- Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Ian Dorney
- The Center for Clinical Informatics Research and Education, The MetroHealth System, Cleveland, Ohio, USA
- David C Kaelber
- The Center for Clinical Informatics Research and Education, The MetroHealth System, Cleveland, Ohio, USA
- The Departments of Internal Medicine, Pediatrics and Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
6
Abrisqueta-Costa P, García-Marco JA, Gutiérrez A, Hernández-Rivas JÁ, Andreu-Lapiedra R, Arguello-Tomas M, Leiva-Farré C, López-Roda MD, Callejo-Mellén Á, Álvarez-García E, Loscertales J. Real-World Evidence on Adverse Events and Healthcare Resource Utilization in Patients with Chronic Lymphocytic Leukaemia in Spain Using Natural Language Processing: The SRealCLL Study. Cancers (Basel) 2024; 16:4004. [PMID: 39682190 DOI: 10.3390/cancers16234004] [Received: 11/05/2024] [Accepted: 11/26/2024] [Indexed: 12/18/2024]
Abstract
Objectives: The SRealCLL study described the occurrence of adverse events (AEs) and healthcare resource utilization in patients with chronic lymphocytic leukaemia (CLL) using artificial intelligence in a real-world scenario in Spain. Methods: We collected real-world data on patients with CLL from seven Spanish hospitals between January 2016 and December 2018, focusing on their AE and healthcare service utilization. Data extraction from electronic health records of 385,904 patients was performed using the EHRead® technology, which is based on natural language processing and machine learning. Results: Among the 534 CLL patients finally included, 270 (50.6%) were categorized as watch and wait (W&W), 230 (43.1%) as first-line treatment (1L), and 58 (10.9%) as relapse/refractory with second-line treatment (2L). The median study follow-up periods were 14.4, 8.4, and 6 months for W&W, 1L, and 2L, respectively. The most common antineoplastic treatments were ibrutinib (64.8%) and bendamustine + rituximab (12.6%) in 1L, and ibrutinib (62.1%) and venetoclax (15.5%) in 2L. Among the most frequent AEs, anaemia and thrombocytopenia presented higher rates in the treated groups (1L and 2L) compared with W&W (2.01 and 2.32 vs. 0.93; p ≤ 0.05 and 1.29 and 1.62 vs. 0.42; p ≤ 0.05). Moreover, several AEs, such as major bleeding, digestive symptoms, general symptoms, or Richter syndrome, were more frequent in 1L than W&W (all p ≤ 0.05). No differences were shown between groups in the rates of outpatient visits. However, rates of outpatient visits due to AE were higher in 1L than in W&W (1.07 vs. 0.65, p ≤ 0.05). The rates of patients being hospitalized were higher in the treated groups compared to W&W (1.68 and 1.9 vs. 0.88; p ≤ 0.05), and those due to AE were higher in 1L than W&W (1.23 vs. 0.60; p ≤ 0.05). Conclusions: Patients with CLL in 1L or 2L treatments often require healthcare resources due to AEs, particularly cytopenias. 
The methodology used in this study likely enabled us to identify higher rates of AEs that may be underreported using other real-world approaches. Addressing AEs with effective agents that maximize patient safety and optimize healthcare resource use is crucial in this typically older and comorbid population.
Affiliation(s)
- Pau Abrisqueta-Costa
- Haematology Department, Hospital Universitari Vall d'Hebron, 08035 Barcelona, Spain
- Antonio Gutiérrez
- Haematology Department, Hospital Son Espases, IdISBa, 07120 Palma de Mallorca, Spain
- José Ángel Hernández-Rivas
- Haematology Department, Hospital Universitario Infanta Leonor, Universidad Complutense, 28051 Madrid, Spain
- Miguel Arguello-Tomas
- Haematology Department, Hospital de la Santa Creu i Sant Pau, 08041 Barcelona, Spain
- Javier Loscertales
- Haematology Department, Hospital Universitario de la Princesa, 28004 Madrid, Spain
7
Kawazoe Y, Shimamoto K, Seki T, Tsuchiya M, Shinohara E, Yada S, Wakamiya S, Imai S, Hori S, Aramaki E. Post-marketing surveillance of anticancer drugs using natural language processing of electronic medical records. NPJ Digit Med 2024; 7:315. [PMID: 39521935 PMCID: PMC11550814 DOI: 10.1038/s41746-024-01323-1] [Received: 06/24/2024] [Accepted: 10/30/2024] [Indexed: 11/16/2024]
Abstract
This study demonstrates that adverse events (AEs) extracted using natural language processing (NLP) from clinical texts reflect the known frequencies of AEs associated with anticancer drugs. Using data from 44,502 cancer patients at a single hospital, we identified cases prescribed anticancer drugs (platinum, PLT; taxane, TAX; pyrimidine, PYA) and compared them to non-treatment (NTx) group using propensity score matching. Over 365 days, AEs (peripheral neuropathy, PN; oral mucositis, OM; taste abnormality, TA; appetite loss, AL) were extracted from clinical text using an NLP tool. The hazard ratios (HRs) for the anticancer drugs were: PN, 1.15-1.95; OM, 3.11-3.85; TA, 3.48-4.71; and AL, 1.98-3.84; the HRs were significantly higher than that of the NTx group. Sensitivity analysis revealed that the HR for TA may have been underestimated; however, the remaining three types of AEs extracted from clinical text by NLP were consistently associated with the three anticancer drugs.
Affiliation(s)
- Yoshimasa Kawazoe
- Artificial Intelligence and Digital Twin in Healthcare, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kiminori Shimamoto
- Artificial Intelligence and Digital Twin in Healthcare, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Tomohisa Seki
- Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Masami Tsuchiya
- Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Emiko Shinohara
- Artificial Intelligence and Digital Twin in Healthcare, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Shuntaro Yada
- Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Shoko Wakamiya
- Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
- Shungo Imai
- Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Satoko Hori
- Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Eiji Aramaki
- Division of Information Science, Graduate School of Science and Technology, Nara Institute of Science and Technology, Nara, Japan
8
Steckhan N, Ring R, Borchert F, Koppold DA. Triangulation of Questionnaires, Qualitative Data and Natural Language Processing: A Differential Approach to Religious Bahá'í Fasting in Germany. J Relig Health 2024; 63:3360-3373. [PMID: 37878201 PMCID: PMC11502581 DOI: 10.1007/s10943-023-01929-x] [Accepted: 09/25/2023] [Indexed: 10/26/2023]
Abstract
Approaches to integrating mixed methods into medical research are gaining popularity. To get a holistic understanding of the effects of behavioural interventions, we investigated religious fasting using a triangulation of quantitative, qualitative, and natural language analysis. We analysed an observational study of Bahá'í fasting in Germany using a between-method triangulation that is based on links between qualitative and quantitative analyses. Individual interviews show an increase in the mindfulness and well-being categories. Sentiment scores, extracted from the interviews through natural language processing, positively correlate with questionnaire outcomes on quality of life (WHO-5: Spearman correlation r = 0.486, p = 0.048). Five questionnaires contribute to the first principal component capturing the spectrum of mood states (50.1% explained variance). Integrating the findings of the between-method triangulation enabled us to converge on the underlying effects of this kind of intermittent fasting. TRIAL REGISTRATION: NCT03443739.
Affiliation(s)
- Nico Steckhan
- Digital Health, Hasso Plattner Institute, University of Potsdam, 14482, Potsdam, Germany
- Institute of Social Medicine, Epidemiology and Health Economics, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117, Berlin, Germany
- Department of Internal and Complementary Medicine, Immanuel Hospital Berlin, Berlin, 14109, Germany
- Raphaela Ring
- Institute of Social Medicine, Epidemiology and Health Economics, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117, Berlin, Germany
- Florian Borchert
- Digital Health, Hasso Plattner Institute, University of Potsdam, 14482, Potsdam, Germany
- Daniela A Koppold
- Institute of Social Medicine, Epidemiology and Health Economics, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 10117, Berlin, Germany
- Department of Internal and Complementary Medicine, Immanuel Hospital Berlin, Berlin, 14109, Germany
9
Sree Sudha TY, Meena B, Pareek S, Singh H. Enhancing pharmacovigilance for robust drug safety monitoring: addressing underreporting and collaborative solutions. Ther Adv Drug Saf 2024; 15:20420986241285927. [PMID: 39364334 PMCID: PMC11447715 DOI: 10.1177/20420986241285927] [Indexed: 10/05/2024]
Affiliation(s)
- Tanguturi Yella Sree Sudha
- Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Deoghar, GE Road, Deoghar, Jharkhand 814112, India
- Bhumika Meena
- All India Institute of Medical Sciences (AIIMS), Deoghar, Jharkhand, India
- Sumit Pareek
- All India Institute of Medical Sciences (AIIMS), Deoghar, Jharkhand, India
- Harminder Singh
- Department of Pharmacology, All India Institute of Medical Sciences (AIIMS), Deoghar, Jharkhand, India
10
Davies H, Nenadic G, Alfattni G, Arguello Casteleiro M, Al Moubayed N, Farrell S, Radford AD, Noble PJM. Text mining for disease surveillance in veterinary clinical data: part two, training computers to identify features in clinical text. Front Vet Sci 2024; 11:1352726. [PMID: 39239390 PMCID: PMC11376235 DOI: 10.3389/fvets.2024.1352726] [Received: 12/08/2023] [Accepted: 07/17/2024] [Indexed: 09/07/2024]
Abstract
In part two of this mini-series, we evaluate the range of machine-learning tools now available for application to veterinary clinical text-mining. These tools will be vital to automate extraction of information from large datasets of veterinary clinical narratives curated by projects such as the Small Animal Veterinary Surveillance Network (SAVSNET) and VetCompass, where volumes of millions of records preclude reading records and the complexities of clinical notes limit usefulness of more "traditional" text-mining approaches. We discuss the application of various machine learning techniques ranging from simple models for identifying words and phrases with similar meanings to expand lexicons for keyword searching, to the use of more complex language models. Specifically, we describe the use of language models for record annotation, unsupervised approaches for identifying topics within large datasets, and discuss more recent developments in the area of generative models (such as ChatGPT). As these models become increasingly complex it is pertinent that researchers and clinicians work together to ensure that the outputs of these models are explainable in order to instill confidence in any conclusions drawn from them.
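The lexicon expansion for keyword searching mentioned in this abstract can be pictured with a small sketch. Everything here is an assumption made for illustration: the seed terms, the added variants (which a similarity model or an expert reviewer might suggest), and the function names. None of it comes from the paper.

```python
import re

# Hypothetical seed lexicon for one clinical sign; terms are illustrative only.
SEED_TERMS = {"vomiting", "vomit", "emesis"}
# Variants that a word-similarity model or expert review might add (assumed).
EXPANDED_TERMS = SEED_TERMS | {"vomitting", "v+", "threw up"}


def compile_lexicon(terms):
    """Build one case-insensitive pattern matching any term as a whole token."""
    # Longest terms first, so multi-word phrases are tried before shorter ones.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def find_mentions(narrative, pattern):
    """Return the lower-cased lexicon terms found in a free-text narrative."""
    return [m.group(0).lower() for m in pattern.finditer(narrative)]


# Made-up clinical note.
note = "Owner reports dog threw up twice, no further V+ since."
seed_hits = find_mentions(note, compile_lexicon(SEED_TERMS))
expanded_hits = find_mentions(note, compile_lexicon(EXPANDED_TERMS))
```

The sketch does no negation handling ("no further V+" still counts as a mention), which is one of the limits of plain keyword search that motivates the language-model approaches the abstract goes on to discuss.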
Affiliation(s)
- Heather Davies
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- Goran Nenadic
- Department of Computer Science, Manchester University, Manchester, United Kingdom
- Ghada Alfattni
- Department of Computer Science, Manchester University, Manchester, United Kingdom
- Noura Al Moubayed
- Department of Computer Science, Durham University, Durham, United Kingdom
- Sean Farrell
- Department of Computer Science, Durham University, Durham, United Kingdom
- Alan D Radford
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- P-J M Noble
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
11
Desai MK. Artificial intelligence in pharmacovigilance - Opportunities and challenges. Perspect Clin Res 2024; 15:116-121. [PMID: 39140015 PMCID: PMC11318788 DOI: 10.4103/picr.picr_290_23] [Received: 10/16/2023] [Revised: 12/04/2023] [Accepted: 12/09/2023] [Indexed: 08/15/2024]
Abstract
Pharmacovigilance (PV) is a data-driven process to identify medicine safety issues at the earliest by processing suspected adverse event (AE) reports and extraction of health data. The PV case processing cycle starts with data collection, data entry, initial checking completeness and validity, coding, medical assessment for causality, expectedness, severity, and seriousness, subsequently submitting report, quality checking followed by data storage and maintenance. This requires a workforce and technical expertise and therefore, is expensive and time-consuming. There has been exponential growth in the number of suspected AE reports in the PV database due to smart collection and reporting of individual case safety reports, widening the base by increased awareness and participation by health-care professionals and patients. Processing of the enormous volume and variety of data, making its sensible use and separating "needles from haystack," is a challenge for key stakeholders such as pharmaceutical firms, regulatory authorities, medical and PV experts, and National Pharmacovigilance Program managers. Artificial intelligence (AI) in health care has been very impressive in specialties that rely heavily on the interpretation of medical images. Similarly, there has been a growing interest to adopt AI tools to complement and automate the PV process. The advanced technology can certainly complement the routine, repetitive, manual task of case processing, and boost efficiency; however, its implementation across the PV lifecycle and practical impact raises several questions and challenges. Full automation of PV system is a double-edged sword and needs to consider two aspects - people and processes. The focus should be a collaborative approach of technical expertise (people) combined with intelligent technology (processes) to augment human talent that meets the objective of the PV system and benefit all stakeholders. 
AI technology should enhance human intelligence rather than substitute human experts. What is important is to emphasize and ensure that AI brings more benefits to PV rather than challenges. This review describes the benefits and the outstanding scientific, technological, and policy issues, and the maturity of AI tools for full automation in the context to the Indian health-care system.
Affiliation(s)
- Mira Kirankumar Desai
- Department of Pharmacology, Dr. M. K. Shah Medical College and Research Centre, Ahmedabad, Gujarat, India
12
Abedian Kalkhoran H, Zwaveling J, van Hunsel F, Kant A. An innovative method to strengthen evidence for potential drug safety signals using Electronic Health Records. J Med Syst 2024; 48:51. [PMID: 38753223 PMCID: PMC11098892 DOI: 10.1007/s10916-024-02070-2] [Received: 07/12/2023] [Accepted: 04/25/2024] [Indexed: 05/19/2024]
Abstract
Reports from spontaneous reporting systems (SRS) are hypothesis generating. Additional evidence such as more reports is required to determine whether the generated drug-event associations are in fact safety signals. However, underreporting of adverse drug reactions (ADRs) delays signal detection. Through the use of natural language processing, different sources of real-world data can be used to proactively collect additional evidence for potential safety signals. This study aims to explore the feasibility of using Electronic Health Records (EHRs) to identify additional cases based on initial indications from spontaneous ADR reports, with the goal of strengthening the evidence base for potential safety signals. For two confirmed and two potential signals generated by the SRS of the Netherlands Pharmacovigilance Centre Lareb, targeted searches in the EHR of the Leiden University Medical Centre were performed using a text-mining based tool, CTcue. The search for additional cases was done by constructing and running queries in the structured and free-text fields of the EHRs. We identified at least five additional cases for the confirmed signals and one additional case for each potential safety signal. The majority of the identified cases for the confirmed signals were documented in the EHRs before signal detection by the Dutch Medicines Evaluation Board. The identified cases for the potential signals were reported to Lareb as further evidence for signal detection. Our findings highlight the feasibility of performing targeted searches in the EHR based on an underlying hypothesis to provide further evidence for signal generation.
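The targeted search described in this abstract, which combines structured fields with free-text notes, can be illustrated with a minimal sketch. This is not the CTcue tool and not the authors' queries: the record layout (`id`, `medications`, `note`), the drug name, and the search terms are all hypothetical, and simple keyword matching like this does not handle negation ("no hyponatraemia") or misspellings.

```python
import re

# Hypothetical search terms for one drug-event hypothesis (not taken from the study).
EVENT_TERMS = ["hyponatremia", "hyponatraemia", "low sodium"]


def find_candidate_cases(records, drug, event_terms):
    """Return IDs of records whose structured medication list contains `drug`
    and whose free-text note mentions any of `event_terms` (case-insensitive)."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in event_terms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for rec in records:
        # Structured criterion: exposure recorded in the medication list.
        on_drug = drug.lower() in (m.lower() for m in rec["medications"])
        # Free-text criterion: any event term appears in the clinical note.
        if on_drug and pattern.search(rec["note"]):
            hits.append(rec["id"])
    return hits


# Toy records, invented for illustration only.
records = [
    {"id": "A", "medications": ["DrugX"], "note": "Patient presents with Hyponatraemia."},
    {"id": "B", "medications": ["DrugY"], "note": "Low sodium noted on labs."},
    {"id": "C", "medications": ["DrugX"], "note": "No complaints today."},
]

candidates = find_candidate_cases(records, "DrugX", EVENT_TERMS)
print(candidates)
```

Any candidate found this way would still need manual review before being counted as a case.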
Affiliation(s)
- H Abedian Kalkhoran
  - Department of Clinical Pharmacology and Toxicology, Leiden University Medical Centre, Leiden, the Netherlands.
  - Department of Pharmacy, Haga Teaching Hospital, The Hague, the Netherlands.
- J Zwaveling
  - Department of Clinical Pharmacology and Toxicology, Leiden University Medical Centre, Leiden, the Netherlands
- F van Hunsel
  - The Netherlands Pharmacovigilance Centre Lareb, 's-Hertogenbosch, the Netherlands
- A Kant
  - Department of Clinical Pharmacology and Toxicology, Leiden University Medical Centre, Leiden, the Netherlands
  - The Netherlands Pharmacovigilance Centre Lareb, 's-Hertogenbosch, the Netherlands

13
Gallifant J, Celi LA, Sharon E, Bitterman DS. Navigating the Complexities of Artificial Intelligence-Enabled Real-World Data Collection for Oncology Pharmacovigilance. JCO Clin Cancer Inform 2024; 8:e2400051. [PMID: 38713889 PMCID: PMC11466373 DOI: 10.1200/cci.24.00051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 04/03/2024] [Indexed: 05/09/2024] Open
Abstract
This new editorial discusses the promise and challenges of successful integration of natural language processing methods into electronic health records for timely, robust, and fair oncology pharmacovigilance.
Affiliation(s)
- Jack Gallifant
  - Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Critical Care, Guy’s & St Thomas’ NHS Trust, London, United Kingdom, SE1 7EH
- Leo Anthony Celi
  - Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02215
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
- Elad Sharon
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Danielle S. Bitterman
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute, Boston, MA, USA

14
Kiguba R, Isabirye G, Mayengo J, Owiny J, Tregunno P, Harrison K, Pirmohamed M, Ndagije HB. Navigating duplication in pharmacovigilance databases: a scoping review. BMJ Open 2024; 14:e081990. [PMID: 38684275 PMCID: PMC11086478 DOI: 10.1136/bmjopen-2023-081990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 04/12/2024] [Indexed: 05/02/2024] Open
Abstract
OBJECTIVES Pharmacovigilance databases play a critical role in monitoring drug safety. The duplication of reports in pharmacovigilance databases, however, undermines their data integrity. This scoping review sought to provide a comprehensive understanding of duplication in pharmacovigilance databases worldwide. DESIGN A scoping review. DATA SOURCES Reviewers comprehensively searched the literature in PubMed, Web of Science, Wiley Online Library, EBSCOhost, Google Scholar and other relevant websites. ELIGIBILITY CRITERIA Peer-reviewed publications and grey literature, without language restriction, describing duplication and/or methods relevant to duplication in pharmacovigilance databases from inception to 1 September 2023. DATA EXTRACTION AND SYNTHESIS We used the Joanna Briggs Institute guidelines for scoping reviews and conformed with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews. Two reviewers independently screened titles, abstracts and full texts. One reviewer extracted the data and performed descriptive analysis, which the second reviewer assessed. Disagreements were resolved by discussion and consensus or in consultation with a third reviewer. RESULTS We screened 22 745 unique titles and 156 were eligible for full-text review. Of the 156 titles, 58 (47 peer-reviewed; 11 grey literature) fulfilled the inclusion criteria for the scoping review. Included titles addressed the extent (5 papers), prevention strategies (15 papers), causes (32 papers), detection methods (25 papers), management strategies (24 papers) and implications (14 papers) of duplication in pharmacovigilance databases. The papers overlapped, discussing more than one field. Advances in artificial intelligence, particularly natural language processing, hold promise in enhancing the efficiency and precision of deduplication of large and complex pharmacovigilance databases. 
CONCLUSION Duplication in pharmacovigilance databases compromises risk assessment and decision-making, potentially threatening patient safety. Therefore, efficient duplicate prevention, detection and management are essential for more reliable pharmacovigilance data. To minimise duplication, consistent use of worldwide unique identifiers as the key case identifiers is recommended alongside recent advances in artificial intelligence.
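The recommendation to use a worldwide unique identifier as the key case identifier can be illustrated with a minimal sketch. The field name `worldwide_case_id` and the toy reports are hypothetical, and exact-key matching is only the simplest form of duplicate detection; it is not one of the specific methods the review surveys.

```python
def deduplicate_by_case_id(reports, key="worldwide_case_id"):
    """Keep the first report seen for each identifier and collect later
    reports with the same identifier as suspected duplicates."""
    first_seen = {}
    duplicates = []
    for report in reports:
        case_id = report[key]
        if case_id in first_seen:
            duplicates.append(report)
        else:
            first_seen[case_id] = report
    return list(first_seen.values()), duplicates


# Toy reports, invented for illustration only.
reports = [
    {"worldwide_case_id": "XX-0001", "source": "manufacturer"},
    {"worldwide_case_id": "XX-0002", "source": "clinician"},
    {"worldwide_case_id": "XX-0001", "source": "national centre"},
]

unique_reports, suspected_duplicates = deduplicate_by_case_id(reports)
print(len(unique_reports), len(suspected_duplicates))
```

Reports that reach a database without a shared identifier would not be caught by this approach, which is where the probabilistic and language-based methods mentioned in the abstract come in.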
Affiliation(s)
- Ronald Kiguba
  - Department of Pharmacology and Therapeutics, College of Health Sciences, Makerere University, Kampala, Uganda
- Gerald Isabirye
  - National Pharmacovigilance Centre, National Drug Authority, Kampala, Uganda
- Julius Mayengo
  - National Pharmacovigilance Centre, National Drug Authority, Kampala, Uganda
- Jonathan Owiny
  - National Pharmacovigilance Centre, National Drug Authority, Kampala, Uganda
- Phil Tregunno
  - Safety and Surveillance Group, Medicines and Healthcare Products Regulatory Agency, London, UK
- Kendal Harrison
  - Safety and Surveillance Group, Medicines and Healthcare Products Regulatory Agency, London, UK
- Munir Pirmohamed
  - Centre for Drug Safety Science and Wolfson Centre for Personalised Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK

15
Moulson R, Feugère G, Moreira-Lucas TS, Dequen F, Weiss J, Smith J, Brezden-Masley C. Real-World Treatment Patterns and Clinical Outcomes among Patients Receiving CDK4/6 Inhibitors for Metastatic Breast Cancer in a Canadian Setting Using AI-Extracted Data. Curr Oncol 2024; 31:2172-2184. [PMID: 38668064 PMCID: PMC11049664 DOI: 10.3390/curroncol31040161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 03/20/2024] [Accepted: 04/07/2024] [Indexed: 04/28/2024] Open
Abstract
Cyclin-dependent kinase 4/6 inhibitors (CDK4/6i) are widely used in patients with hormone receptor-positive (HR+)/human epidermal growth factor receptor 2 negative (HER2-) advanced/metastatic breast cancer (ABC/MBC) in first line (1L), but little is known about their long-term real-world use and clinical outcomes in Canada. This study used Pentavere's previously validated artificial intelligence (AI) to extract real-world data on the treatment patterns and outcomes of patients receiving CDK4/6i + endocrine therapy (ET) for HR+/HER2- ABC/MBC at Sinai Health in Toronto, Canada. Between 1 January 2016 and 1 July 2021, 48 patients were diagnosed with HR+/HER2- ABC/MBC and received CDK4/6i + ET. A total of 38 out of 48 patients received CDK4/6i + ET in 1L, of which 34 of the 38 (89.5%) received palbociclib + ET. In 2L, 12 of the 21 (57.1%) patients received CDK4/6i + ET, of which 58.3% received abemaciclib. In 3L, most patients received chemotherapy (10/12, 83.3%). For the patients receiving CDK4/6i in 1L, the median (95% CI) time to the next treatment was 42.3 (41.2, NA) months. The median (95% CI) time to chemotherapy was 46.5 (41.4, NA) months. The two-year overall survival (95% CI) was 97.4% (92.4, 100.0), and the median (range) follow-up was 28.7 (3.4-67.6) months. Despite the limitations inherent in real-world studies and a limited number of patients, these AI-extracted data complement previous studies, demonstrating the effectiveness of CDK4/6i + ET in the Canadian real-world 1L, with most patients receiving palbociclib as CDK4/6i in 1L.
Affiliation(s)
- Janet Smith
  - Mount Sinai Hospital, Toronto, ON M5G 1X5, Canada (C.B.-M.)

16
Rani N, Kaushik A, Kardam S, Kag S, Raj VS, Ambasta RK, Kumar P. Reimagining old drugs with new tricks: Mechanisms, strategies and notable success stories in drug repurposing for neurological diseases. Prog Mol Biol Transl Sci 2024; 205:23-70. [PMID: 38789181 DOI: 10.1016/bs.pmbts.2024.03.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
Recent advances in drug repurposing have raised new hopes, especially in the fight against neurodegenerative diseases (NDDs). The traditional approach to developing novel drugs for these complex disorders is laborious, time-consuming, and often unsuccessful. However, drug reprofiling, which is the identification of novel therapeutic applications for existing approved drugs, has shown potential as a promising strategy to accelerate the hunt for therapeutics. The advancement of computational approaches and artificial intelligence has expedited drug repurposing. These technologies have enabled scientists to analyse extensive datasets and predict potential drug-disease interactions. By drawing on existing pharmacological knowledge, scientists can recognise potential therapeutic candidates for reprofiling, saving precious time and resources. Preclinical models have also played a pivotal role in this field, confirming the effectiveness and mechanisms of action of repurposed drugs. Several studies have been conducted in recent years, including the discovery of available drugs that demonstrate significant protective effects in NDDs, relieve debilitating symptoms, or slow down the progression of the disease. These findings highlight the potential of repurposed drugs to change the landscape of NDD treatment. Here, we present an overview of recent developments and major advances in drug repurposing, intending to provide an in-depth analysis of traditional drug discovery and the strategies, approaches and technologies that have contributed to drug repositioning. In addition, this chapter attempts to highlight successful case studies of drug repositioning in various therapeutic areas related to NDDs and explore the clinical trials, challenges and limitations faced by researchers in the field. Finally, the importance of drug repositioning in drug discovery and development and its potential to address unmet medical needs is also highlighted.
Affiliation(s)
- Neetu Rani
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, Delhi, India
- Aastha Kaushik
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, Delhi, India
- Shefali Kardam
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, Delhi, India
- Sonika Kag
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, Delhi, India
- V Samuel Raj
  - Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India
- Rashmi K Ambasta
  - Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, Delhi, India.

17
Li Y, Tao W, Li Z, Sun Z, Li F, Fenton S, Xu H, Tao C. Artificial intelligence-powered pharmacovigilance: A review of machine and deep learning in clinical text-based adverse drug event detection for benchmark datasets. J Biomed Inform 2024; 152:104621. [PMID: 38447600 DOI: 10.1016/j.jbi.2024.104621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 02/19/2024] [Accepted: 03/03/2024] [Indexed: 03/08/2024]
Abstract
OBJECTIVE The primary objective of this review is to investigate the effectiveness of machine learning and deep learning methodologies in the context of extracting adverse drug events (ADEs) from clinical benchmark datasets. We conduct an in-depth analysis, aiming to compare the merits and drawbacks of both machine learning and deep learning techniques, particularly within the framework of named-entity recognition (NER) and relation classification (RC) tasks related to ADE extraction. Additionally, our focus extends to the examination of specific features and their impact on the overall performance of these methodologies. In a broader perspective, our research extends to ADE extraction from various sources, including biomedical literature, social media data, and drug labels, removing the limitation to exclusively machine learning or deep learning methods. METHODS We conducted an extensive literature review on PubMed using the query "(((machine learning [Medical Subject Headings (MeSH) Terms]) OR (deep learning [MeSH Terms])) AND (adverse drug event [MeSH Terms])) AND (extraction)", and supplemented this with a snowballing approach to review 275 references sourced from retrieved articles. RESULTS In our analysis, we included twelve articles for review. For the NER task, deep learning models outperformed machine learning models. In the RC task, gradient boosting, multilayer perceptron and random forest models excelled. The Bidirectional Encoder Representations from Transformers (BERT) model consistently achieved the best performance in the end-to-end task. Future efforts in the end-to-end task should prioritize improving NER accuracy, especially for 'ADE' and 'Reason'. CONCLUSION These findings hold significant implications for advancing the field of ADE extraction and pharmacovigilance, ultimately contributing to improved drug safety monitoring and healthcare outcomes.
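The PubMed query quoted in the METHODS can be turned into a search request against NCBI's E-utilities `esearch` endpoint. The sketch below only builds the request URL and does not send it. It assumes the spelled-out tag "[Medical Subject Headings (MeSH) Terms]" in the abstract corresponds to PubMed's `[MeSH Terms]` field tag; the `retmax` value is arbitrary, and nothing here is taken from the authors' actual search procedure.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query from the abstract, with the expanded tag written as [MeSH Terms] (assumption).
QUERY = (
    "(((machine learning [MeSH Terms]) OR (deep learning [MeSH Terms])) "
    "AND (adverse drug event [MeSH Terms])) AND (extraction)"
)


def build_esearch_url(term, retmax=20):
    """Build (but do not send) an esearch request URL for PubMed."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


url = build_esearch_url(QUERY)
print(url)
```

Fetching this URL would return matching PMIDs; the number of hits will differ from the review's because PubMed's index changes over time.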
Affiliation(s)
- Yiming Li
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Wei Tao
  - Department of Biostatistics & Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Zehan Li
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Zenan Sun
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Fang Li
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, USA
- Susan Fenton
  - McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Hua Xu
  - Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, USA
- Cui Tao
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, USA.

18
Mashima Y, Tanigawa M, Yokoi H. Information heterogeneity between progress notes by physicians and nurses for inpatients with digestive system diseases. Sci Rep 2024; 14:7656. [PMID: 38561333 PMCID: PMC10984979 DOI: 10.1038/s41598-024-56324-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 03/05/2024] [Indexed: 04/04/2024] Open
Abstract
This study focused on the heterogeneity in progress notes written by physicians or nurses. A total of 806 days of progress notes written by physicians or nurses from 83 randomly selected patients hospitalized in the Gastroenterology Department at Kagawa University Hospital from January to December 2021 were analyzed. We extracted symptoms as the International Classification of Diseases (ICD) Chapter 18 (R00-R99, hereinafter R codes) from each progress note using MedNER-J natural language processing software and counted the days one or more symptoms were extracted to calculate the extraction rate. The R-code extraction rate was significantly higher from progress notes by nurses than by physicians (physicians 68.5% vs. nurses 75.2%; p = 0.00112), regardless of specialty. By contrast, the R-code subcategory R10-R19 for digestive system symptoms (44.2 vs. 37.5%, respectively; p = 0.00299) and many chapters of ICD codes for disease names, as represented by Chapter 11 K00-K93 (68.4 vs. 30.9%, respectively; p < 0.001), were frequently extracted from the progress notes by physicians, reflecting their specialty. We believe that understanding the information heterogeneity of medical documents, which can be the basis of medical artificial intelligence, is crucial, and this study is a pioneering step in that direction.
Affiliation(s)
- Yukinori Mashima
  - Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan.
  - Department of Medical Informatics, Faculty of Medicine, Kagawa University, Kagawa, Japan.
- Masatoshi Tanigawa
  - Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan
- Hideto Yokoi
  - Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan
  - Department of Medical Informatics, Faculty of Medicine, Kagawa University, Kagawa, Japan

19
Qiao H, Chen Y, Qian C, Guo Y. Clinical data mining: challenges, opportunities, and recommendations for translational applications. J Transl Med 2024; 22:185. [PMID: 38378565 PMCID: PMC10880222 DOI: 10.1186/s12967-024-05005-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 02/18/2024] [Indexed: 02/22/2024] Open
Abstract
Clinical data mining of predictive models offers significant advantages for re-evaluating and leveraging large amounts of complex clinical real-world data and experimental comparison data for tasks such as risk stratification, diagnosis, classification, and survival prediction. However, its translational application is still limited. One challenge is that the proposed clinical requirements and data mining are not synchronized. Additionally, the exotic predictions of data mining are difficult to apply directly in local medical institutions. Hence, it is necessary to incisively review the translational application of clinical data mining, providing an analytical workflow for developing and validating prediction models to ensure the scientific validity of analytic workflows in response to clinical questions. This review systematically revisits the purpose, process, and principles of clinical data mining and discusses the key causes contributing to the detachment from practice and the misuse of model verification in developing predictive models for research. Based on this, we propose a niche-targeting framework of four principles: Clinical Contextual, Subgroup-Oriented, Confounder- and False Positive-Controlled (CSCF), to provide guidance for clinical data mining prior to the model's development in clinical settings. Eventually, it is hoped that this review can help guide future research and develop personalized predictive models to achieve the goal of discovering subgroups with varied remedial benefits or risks and ensuring that precision medicine can deliver its full potential.
Affiliation(s)
- Huimin Qiao
  - Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China
- Yijing Chen
  - School of Public Health and Health Management, Gannan Medical University, Ganzhou, China
- Changshun Qian
  - School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, China
- You Guo
  - Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China.
  - School of Public Health and Health Management, Gannan Medical University, Ganzhou, China.
  - School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, China.
  - Ganzhou Key Laboratory of Medical Big Data, Ganzhou, China.

20
Fayos De Arizón L, Viera ER, Pilco M, Perera A, De Maeztu G, Nicolau A, Furlano M, Torra R. Artificial intelligence: a new field of knowledge for nephrologists? Clin Kidney J 2023; 16:2314-2326. [PMID: 38046016 PMCID: PMC10689169 DOI: 10.1093/ckj/sfad182] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/23/2023] [Indexed: 12/05/2023] Open
Abstract
Artificial intelligence (AI) is a science that involves creating machines that can imitate human intelligence and learn. AI is ubiquitous in our daily lives, from search engines like Google to home assistants like Alexa and, more recently, OpenAI with its chatbot. AI can improve clinical care and research, but its use requires a solid understanding of its fundamentals, the promises and perils of algorithmic fairness, the barriers and solutions to its clinical implementation, and the pathways to developing an AI-competent workforce. The potential of AI in the field of nephrology is vast, particularly in the areas of diagnosis, treatment and prediction. One of the most significant advantages of AI is the ability to improve diagnostic accuracy. Machine learning algorithms can be trained to recognize patterns in patient data, including lab results, imaging and medical history, in order to identify early signs of kidney disease and thereby allow timely diagnoses and prompt initiation of treatment plans that can improve outcomes for patients. In short, AI holds the promise of advancing personalized medicine to new levels. While AI has tremendous potential, there are also significant challenges to its implementation, including data access and quality, data privacy and security, bias, trustworthiness, computing power, AI integration and legal issues. The European Commission's proposed regulatory framework for AI technology will play a significant role in ensuring the safe and ethical implementation of these technologies in the healthcare industry. Training nephrologists in the fundamentals of AI is imperative because traditionally, decision-making pertaining to the diagnosis, prognosis and treatment of renal patients has relied on ingrained practices, whereas AI serves as a powerful tool for swiftly and confidently synthesizing this information.
Affiliation(s)
- Leonor Fayos De Arizón
  - Nephrology Department, Fundació Puigvert; Institut d'Investigacions Biomèdiques Sant Pau (IIB-Sant Pau); Departament de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Elizabeth R Viera
  - Nephrology Department, Fundació Puigvert; Institut d'Investigacions Biomèdiques Sant Pau (IIB-Sant Pau); Departament de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Melissa Pilco
  - Nephrology Department, Fundació Puigvert; Institut d'Investigacions Biomèdiques Sant Pau (IIB-Sant Pau); Departament de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Alexandre Perera
  - Center for Biomedical Engineering Research (CREB), Universitat Politècnica de Barcelona (UPC), Barcelona, Spain; Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain; Institut de Recerca Sant Joan de Déu, Esplugues de Llobregat, Barcelona, Spain
- Monica Furlano
  - Nephrology Department, Fundació Puigvert; Institut d'Investigacions Biomèdiques Sant Pau (IIB-Sant Pau); Departament de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Roser Torra
  - Nephrology Department, Fundació Puigvert; Institut d'Investigacions Biomèdiques Sant Pau (IIB-Sant Pau); Departament de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain

21
Sun B, Yew PY, Chi CL, Song M, Loth M, Zhang R, Straka RJ. Development and application of pharmacological statin-associated muscle symptoms phenotyping algorithms using structured and unstructured electronic health records data. JAMIA Open 2023; 6:ooad087. [PMID: 37881784 PMCID: PMC10597587 DOI: 10.1093/jamiaopen/ooad087] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/24/2023] [Revised: 08/03/2023] [Accepted: 10/03/2023] [Indexed: 10/27/2023] Open
Abstract
Importance Statins are widely prescribed cholesterol-lowering medications in the United States, but their clinical benefits can be diminished by statin-associated muscle symptoms (SAMS), leading to discontinuation. Objectives In this study, we aimed to develop and validate a pharmacological SAMS clinical phenotyping algorithm using electronic health records (EHRs) data from Minnesota Fairview. Materials and Methods We retrieved structured and unstructured EHR data of statin users and manually ascertained a gold standard set of SAMS cases and controls using the published SAMS-Clinical Index tool from clinical notes in 200 patients. We developed machine learning algorithms and rule-based algorithms that incorporated various criteria, including ICD codes, statin allergy, creatine kinase elevation, and keyword mentions in clinical notes. We applied the best-performing algorithm to the statin cohort to identify SAMS. Results We identified 16 889 patients who started statins in the Fairview EHR system from 2010 to 2020. The combined rule-based (CRB) algorithm, which utilized both clinical notes and structured data criteria, achieved similar performance compared to machine learning algorithms with a precision of 0.85, recall of 0.71, and F1 score of 0.77 against the gold standard set. Applying the CRB algorithm to the statin cohort, we identified the pharmacological SAMS prevalence to be 1.9% and selective risk factors which included female gender, coronary artery disease, hypothyroidism, and use of immunosuppressants or fibrates. Discussion and Conclusion Our study developed and validated a simple pharmacological SAMS phenotyping algorithm that can be used to create SAMS case/control cohort to enable further analysis which can lead to the development of a SAMS risk prediction model.
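The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the two figures given in the abstract; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the combined rule-based (CRB) algorithm, as reported.
f1 = f1_score(0.85, 0.71)
print(round(f1, 2))  # 0.77
```

The result agrees with the reported F1 of 0.77 to two decimal places.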
Affiliation(s)
- Boguang Sun
  - Department of Experimental and Clinical Pharmacology, University of Minnesota College of Pharmacy, Minneapolis, MN 55455, United States
- Pui Ying Yew
  - Institute for Health Informatics, Office of Academic Clinical Affairs, University of Minnesota, Minneapolis, MN 55455, United States
- Chih-Lin Chi
  - Institute for Health Informatics, Office of Academic Clinical Affairs, University of Minnesota, Minneapolis, MN 55455, United States
  - School of Nursing, University of Minnesota, Minneapolis, MN 55455, United States
- Meijia Song
  - School of Nursing, University of Minnesota, Minneapolis, MN 55455, United States
- Matt Loth
  - Center for Learning Health System Sciences, University of Minnesota Medical School, Minneapolis, MN 55455, United States
- Rui Zhang
  - Institute for Health Informatics, Office of Academic Clinical Affairs, University of Minnesota, Minneapolis, MN 55455, United States
  - Center for Learning Health System Sciences, University of Minnesota Medical School, Minneapolis, MN 55455, United States
- Robert J Straka
  - Department of Experimental and Clinical Pharmacology, University of Minnesota College of Pharmacy, Minneapolis, MN 55455, United States

22
Oborotov GA, Koshechkin KA, Orlov YL. Application of Artificial Intelligence or machine learning in risk sharing agreements for pharmacotherapy risk management. J Integr Bioinform 2023; 20:jib-2023-0014. [PMID: 38073025 PMCID: PMC10757074 DOI: 10.1515/jib-2023-0014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Accepted: 11/17/2023] [Indexed: 12/31/2023] Open
Abstract
Applications of Artificial Intelligence in medical informatics solutions for risk sharing have social value. At a time of ever-increasing costs for the provision of medicines to citizens, there is a need to restrain the growth of health care costs. The search for computer technologies to stop or slow down the growth of costs therefore takes on new significance. We discuss two information technologies in pharmacotherapy and the possibility of combining them, namely risk-sharing agreements and Machine Learning, the latter made possible by the development of Artificial Intelligence (AI). Neural networks could be used to predict outcomes in order to reduce the risk factors for treatment. AI-based data processing automation technologies could also be used to automate risk-sharing agreements.
Collapse
Affiliation(s)
- Grigory A. Oborotov
- Chair of Information and Internet Technologies, Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), Moscow, Russia
- Konstantin A. Koshechkin
- Chair of Information and Internet Technologies, Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), Moscow, Russia
- Yuriy L. Orlov
- Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Agrarian and Technological Institute, Peoples’ Friendship University of Russia, Moscow, Russia
23
Loscertales J, Abrisqueta-Costa P, Gutierrez A, Hernández-Rivas JÁ, Andreu-Lapiedra R, Mora A, Leiva-Farré C, López-Roda MD, Callejo-Mellén Á, Álvarez-García E, García-Marco JA. Real-World Evidence on the Clinical Characteristics and Management of Patients with Chronic Lymphocytic Leukemia in Spain Using Natural Language Processing: The SRealCLL Study. Cancers (Basel) 2023; 15:4047. [PMID: 37627075 PMCID: PMC10452602 DOI: 10.3390/cancers15164047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 08/04/2023] [Accepted: 08/07/2023] [Indexed: 08/27/2023] Open
Abstract
The SRealCLL study aimed to obtain real-world evidence on the clinical characteristics and treatment patterns of patients with chronic lymphocytic leukemia (CLL) using natural language processing (NLP). Electronic health records (EHRs) from seven Spanish hospitals (January 2016-December 2018) were analyzed using EHRead® technology, based on NLP and machine learning. A total of 534 CLL patients were assessed. No treatment was detected in 270 (50.6%) patients (watch-and-wait, W&W). First-line (1L) treatment was identified in 230 (43.1%) patients and relapsed/refractory (2L) treatment was identified in 58 (10.9%). The median age ranged from 71 to 75 years, with a uniform male predominance (54.8-63.8%). The main comorbidities included hypertension (W&W: 35.6%; 1L: 38.3%; 2L: 39.7%), diabetes mellitus (W&W: 24.4%; 1L: 24.3%; 2L: 31%), cardiac arrhythmia (W&W: 16.7%; 1L: 17.8%; 2L: 17.2%), heart failure (W&W: 16.3%; 1L: 17.4%; 2L: 17.2%), and dyslipidemia (W&W: 13.7%; 1L: 18.7%; 2L: 19.0%). The most common antineoplastic treatment was ibrutinib in 1L (64.8%) and 2L (62.1%), followed by bendamustine + rituximab (12.6%), obinutuzumab + chlorambucil (5.2%), rituximab + chlorambucil (4.8%), and idelalisib + rituximab (3.9%) in 1L and venetoclax (15.5%), idelalisib + rituximab (6.9%), bendamustine + rituximab (3.5%), and venetoclax + rituximab (3.5%) in 2L. This study expands the information available on patients with CLL in Spain, describing the diversity in patient characteristics and therapeutic approaches in clinical practice.
Affiliation(s)
- Javier Loscertales
- Hematology Department, Hospital Universitario de la Princesa, Calle de Diego de León 62, 28006 Madrid, Spain
- Pau Abrisqueta-Costa
- Hematology Department, Hospital Universitari Vall d’Hebron, Pg de la vall d’Hebron 199, 08035 Barcelona, Spain
- Antonio Gutierrez
- Hematology Department, Hospital Son Espases/IdISBa, Carretera de Valldemossa 79, 07120 Palma de Mallorca, Spain
- José Ángel Hernández-Rivas
- Hematology Department, Hospital Universitario Infanta Leonor, Avda. Gran Vía del Este 80, 28031 Madrid, Spain
- Rafael Andreu-Lapiedra
- Hematology Department, Hospital Universitario La Fe, Avinguda de Fernando Abril Martorell 106, 46026 Valencia, Spain
- Alba Mora
- Hematology Department, Hospital de la Santa Creu i Sant Pau, Calle de St. Antoni Maria Claret 167, 08025 Barcelona, Spain
- Carolina Leiva-Farré
- Medical Department, Astrazeneca Farmacéutica Spain S.A., Calle del Puerto de Somport 21, 28050 Madrid, Spain; (C.L.-F.); (M.D.L.-R.); (Á.C.-M.); (E.Á.-G.)
- María Dolores López-Roda
- Medical Department, Astrazeneca Farmacéutica Spain S.A., Calle del Puerto de Somport 21, 28050 Madrid, Spain; (C.L.-F.); (M.D.L.-R.); (Á.C.-M.); (E.Á.-G.)
- Ángel Callejo-Mellén
- Medical Department, Astrazeneca Farmacéutica Spain S.A., Calle del Puerto de Somport 21, 28050 Madrid, Spain; (C.L.-F.); (M.D.L.-R.); (Á.C.-M.); (E.Á.-G.)
- Esther Álvarez-García
- Medical Department, Astrazeneca Farmacéutica Spain S.A., Calle del Puerto de Somport 21, 28050 Madrid, Spain; (C.L.-F.); (M.D.L.-R.); (Á.C.-M.); (E.Á.-G.)
- José Antonio García-Marco
- Hematology Department, Hospital Universitario Puerta de Hierro-Majadahonda, Calle Joaquín Rodrigo 1, 28222 Majadahonda, Spain
24
González-Juanatey C, Anguita-Sánchez M, Barrios V, Núñez-Gil I, Gómez-Doblas JJ, García-Moll X, Lafuente-Gormaz C, Rollán-Gómez MJ, Peral-Disdier V, Martínez-Dolz L, Rodríguez-Santamarta M, Viñolas-Prat X, Soriano-Colomé T, Muñoz-Aguilera R, Plaza I, Curcio-Ruigómez A, Orts-Soler E, Segovia-Cubero J, Fanjul V, Marín-Corral J, Cequier Á, SAVANA Research Group. Impact of Advanced Age on the Incidence of Major Adverse Cardiovascular Events in Patients with Type 2 Diabetes Mellitus and Stable Coronary Artery Disease in a Real-World Setting in Spain. J Clin Med 2023; 12:5218. [PMID: 37629262 PMCID: PMC10456002 DOI: 10.3390/jcm12165218] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/04/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023] Open
Abstract
Patients with type 2 diabetes mellitus (T2DM) and coronary artery disease (CAD) without myocardial infarction (MI) or stroke are at high risk for major adverse cardiovascular events (MACEs). We aimed to provide real-world data on age-related clinical characteristics, treatment management, and incidence of major cardiovascular outcomes in T2DM-CAD patients in Spain from 2014 to 2018. We used EHRead® technology, which is based on natural language processing and machine learning, to extract unstructured clinical information from electronic health records (EHRs) from 12 hospitals. Of the 4072 included patients, 30.9% were younger than 65 years (66.3% male), 34.2% were aged 65-75 years (66.4% male), and 34.8% were older than 75 years (54.3% male). These older patients were more likely to have hypertension (OR 2.85), angina (OR 1.64), heart valve disease (OR 2.13), or peripheral vascular disease (OR 2.38) than those aged <65 years (p < 0.001 for all comparisons). In general, they were also more likely to receive pharmacological and interventional treatments. Moreover, these patients had a significantly higher risk of MACEs (HR 1.29; p = 0.003) and ischemic stroke (HR 2.39; p < 0.001). In summary, patients with T2DM-CAD in routine clinical practice tend to be older, have more comorbidities, are more heavily treated, and have a higher risk of developing MACEs than is commonly assumed from clinical trial data.
Affiliation(s)
- Manuel Anguita-Sánchez
- Instituto Maimonides de Investigación Biomédica de Córdoba (IMIBIC), Hospital Universitario Reina Sofía, Universidad de Córdoba, 14014 Cordoba, Spain
- Iván Núñez-Gil
- Cardiology Department, Hospital Clínico Universitario San Carlos, 28040 Madrid, Spain
- Faculty of Biomedical and Health Sciences, Universidad Europea de Madrid, Villaviciosa de Odón, 28670 Madrid, Spain
- Juan José Gómez-Doblas
- IBIMA (Instituto de Investigación Biomédica de Málaga), Hospital Universitario Virgen de la Victoria, CIBERCV (Centro de Investigación Biomédica en Red Enfermedades Cardiovasculares), 29010 Malaga, Spain
- Xavier García-Moll
- Hospital Universitario Santa Creu i Sant Pau, 08041 Barcelona, Spain; (X.G.-M.); (X.V.-P.)
- Luis Martínez-Dolz
- Hospital Universitario y Politécnico La Fe, CIBERCV (Centro de Investigación Biomédica en Red Enfermedades Cardiovasculares), IIS La Fe, 46026 Valencia, Spain
- Xavier Viñolas-Prat
- Hospital Universitario Santa Creu i Sant Pau, 08041 Barcelona, Spain; (X.G.-M.); (X.V.-P.)
- Toni Soriano-Colomé
- Hospital Vall d’Hebron, CIBERCV (Centro de Investigación Biomédica en Red Enfermedades Cardiovasculares), 08035 Barcelona, Spain
- Ernesto Orts-Soler
- Hospital General Universitario de Castellón, 12004 Castellon de la Plana, Spain
- Víctor Fanjul
- Savana Research SL, 28013 Madrid, Spain; (V.F.); (J.M.-C.)
- Ángel Cequier
- Hospital Universitario de Bellvitge, IDIBELL (Instituto de Investigación Biomédica de Bellvitge), Universidad de Barcelona, 08007 Barcelona, Spain
25
Muñoz AJ, Souto JC, Lecumberri R, Obispo B, Sanchez A, Aparicio J, Aguayo C, Gutierrez D, Palomo AG, Fanjul V, Del Rio-Bermudez C, Viñuela-Benéitez MC, Hernández-Presa MÁ. Development of a predictive model of venous thromboembolism recurrence in anticoagulated cancer patients using machine learning. Thromb Res 2023; 228:181-188. [PMID: 37348318 DOI: 10.1016/j.thromres.2023.06.015] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 10/27/2022] [Revised: 05/29/2023] [Accepted: 06/12/2023] [Indexed: 06/24/2023]
Abstract
INTRODUCTION: Patients with cancer and venous thromboembolism (VTE) show a high risk of VTE recurrence during anticoagulant treatment. This study aimed to develop a predictive model to assess, at the moment of primary VTE diagnosis, the risk of VTE recurrence within 6 months in these patients. MATERIALS AND METHODS: Using the EHRead® technology, based on Natural Language Processing (NLP) and machine learning (ML), the unstructured data in electronic health records from 9 Spanish hospitals between 2014 and 2018 were extracted. Both clinically- and ML-driven feature selection were performed to identify predictors for VTE recurrence. Logistic regression (LR), decision tree (DT), and random forest (RF) algorithms were used to train different prediction models, which were subsequently validated in a hold-out data set. RESULTS: A total of 16,407 anticoagulated cancer patients with diagnosis of VTE were identified (54.4% male and median age 70). Deep vein thrombosis, pulmonary embolism and metastases were observed in 67.2%, 26.6%, and 47.7% of the patients, respectively. During the study follow-up, 11.4% of the patients developed a recurrent VTE, which was more frequent in patients with lung cancer. Feature selection and model training based on ML identified primary pulmonary embolism, deep vein thrombosis, metastasis, adenocarcinoma, hemoglobin and serum creatinine levels, platelet and leukocyte count, family history of VTE, and patients' age as predictors of VTE recurrence within 6 months of VTE diagnosis. The LR model had an AUC-ROC (95% CI) of 0.66 (0.61, 0.70), the DT of 0.69 (0.65, 0.72) and the RF of 0.68 (0.63, 0.72). CONCLUSIONS: This is the first ML-based predictive model designed to predict 6-month VTE recurrence in patients with cancer. These results hold great potential to assist clinicians in identifying high-risk patients and improving their clinical management.
Affiliation(s)
- Andres J Muñoz
- Gregorio Marañón Health Research Institute, Complutense University, Madrid, Spain.
- Juan Carlos Souto
- Hematology Department, Santa Creu I Sant Pau Hospital, Barcelona, Spain
- Ramón Lecumberri
- Hematology Service, Clínica Universidad de Navarra, Pamplona, Spain; CIBERCV, Carlos III Health Institute, Madrid, Spain
- Berta Obispo
- Oncology Department, Infanta Leonor Hospital, Madrid, Spain
- Antonio Sanchez
- Oncology Department, Puerta de Hierro Hospital, Madrid, Spain
- Jorge Aparicio
- Oncology Department, Polytechnic and University Hospital of La Fé, Valencia, Spain
26
Davis SE, Zabotka L, Desai RJ, Wang SV, Maro JC, Coughlin K, Hernández-Muñoz JJ, Stojanovic D, Shah NH, Smith JC. Use of Electronic Health Record Data for Drug Safety Signal Identification: A Scoping Review. Drug Saf 2023; 46:725-742. [PMID: 37340238 PMCID: PMC11635839 DOI: 10.1007/s40264-023-01325-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 05/31/2023] [Indexed: 06/22/2023]
Abstract
INTRODUCTION: Pharmacovigilance programs protect patient health and safety by identifying adverse event signals through postmarketing surveillance of claims data and spontaneous reports. Electronic health records (EHRs) provide new opportunities to address limitations of traditional approaches and promote discovery-oriented pharmacovigilance. METHODS: To evaluate the current state of EHR-based medication safety signal identification, we conducted a scoping literature review of studies aimed at identifying safety signals from routinely collected patient-level EHR data. We extracted information on study design, EHR data elements utilized, analytic methods employed, drugs and outcomes evaluated, and key statistical and data analysis choices. RESULTS: We identified 81 eligible studies. Disproportionality methods were the predominant analytic approach, followed by data mining and regression. Variability in study design makes direct comparisons difficult. Studies varied widely in terms of data, confounding adjustment, and statistical considerations. CONCLUSION: Despite broad interest in utilizing EHRs for safety signal identification, current efforts fail to leverage the full breadth and depth of available data or to rigorously control for confounding. The development of best practices and application of common data models would promote the expansion of EHR-based pharmacovigilance.
Affiliation(s)
- Sharon E Davis
- Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA
- Vanderbilt University School of Medicine, Nashville, TN, USA
- Rishi J Desai
- Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Shirley V Wang
- Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Judith C Maro
- Harvard Medical School, Boston, MA, USA
- Harvard Pilgrim Health Care Institute, Boston, MA, USA
- Nigam H Shah
- School of Medicine, Stanford University, Stanford, CA, USA
- Stanford Health Care, Palo Alto, CA, USA
- Joshua C Smith
- Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA.
- Vanderbilt University School of Medicine, Nashville, TN, USA.
27
Huang H, Zheng O, Wang D, Yin J, Wang Z, Ding S, Yin H, Xu C, Yang R, Zheng Q, Shi B. ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model. Int J Oral Sci 2023; 15:29. [PMID: 37507396 PMCID: PMC10382494 DOI: 10.1038/s41368-023-00239-y] [Citation(s) in RCA: 65] [Impact Index Per Article: 32.5] [Received: 03/26/2023] [Revised: 07/06/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023] Open
Abstract
ChatGPT, a lightweight conversational variant of Generative Pretrained Transformer 4 (GPT-4) developed by OpenAI, is one of the milestone Large Language Models (LLMs), with billions of parameters. LLMs have stirred up much interest among researchers and practitioners for their impressive skills in natural language processing tasks, which profoundly impact various fields. This paper mainly discusses the future applications of LLMs in dentistry. We introduce two primary LLM deployment methods in dentistry, automated dental diagnosis and cross-modal dental diagnosis, and examine their potential applications. In particular, when equipped with a cross-modal encoder, a single LLM can manage multi-source data and conduct advanced natural language reasoning to perform complex clinical operations. We also present cases to demonstrate the potential of a fully automatic multi-modal LLM AI system for clinical application in dentistry. While LLMs offer significant potential benefits, challenges such as data privacy, data quality, and model bias need further study. Overall, LLMs have the potential to revolutionize dental diagnosis and treatment, which indicates a promising avenue for clinical application and research in dentistry.
Affiliation(s)
- Hanyao Huang
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China.
- Ou Zheng
- Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA.
- Dongdong Wang
- Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA
- Jiayi Yin
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Zijin Wang
- Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, USA
- Shengxuan Ding
- College of Transportation Engineering, University of Central Florida, Orlando, USA
- Heng Yin
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Chuan Xu
- School of Transportation and Logistics, Southwest Jiaotong University, Chengdu, China
- C2SMART Center, Tandon School of Engineering, New York University, Brooklyn, USA
- Renjie Yang
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Eastern Clinic, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Qian Zheng
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Bing Shi
- State Key Laboratory of Oral Diseases & National Clinical Research Center for Oral Diseases & Department of Oral and Maxillofacial Surgery, West China Hospital of Stomatology, Sichuan University, Chengdu, China
28
Trajanov D, Trajkovski V, Dimitrieva M, Dobreva J, Jovanovik M, Klemen M, Žagar A, Robnik-Šikonja M. Review of Natural Language Processing in Pharmacology. Pharmacol Rev 2023; 75:714-738. [PMID: 36931724 DOI: 10.1124/pharmrev.122.000715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/04/2022] [Revised: 01/18/2023] [Accepted: 03/07/2023] [Indexed: 03/19/2023] Open
Abstract
Natural language processing (NLP) is an area of artificial intelligence that applies information technologies to process human language, understand it to a certain degree, and use it in various applications. This area has rapidly developed in the past few years and now employs modern variants of deep neural networks to extract relevant patterns from large text corpora. The main objective of this work is to survey the recent use of NLP in the field of pharmacology. As our work shows, NLP is a highly relevant information extraction and processing approach for pharmacology. It has been used extensively, from intelligent searches through thousands of medical documents to finding traces of adverse drug interactions in social media. We split our coverage into five categories to survey modern NLP: methodology, commonly addressed tasks, relevant textual data, knowledge bases, and useful programming libraries. We split each of the five categories into appropriate subcategories, describe their main properties and ideas, and summarize them in a tabular form. The resulting survey presents a comprehensive overview of the area, useful to practitioners and interested observers. SIGNIFICANCE STATEMENT: The main objective of this work is to survey the recent use of NLP in the field of pharmacology in order to provide a comprehensive overview of the current state of the area after the rapid developments of the past few years. The resulting survey will be useful to practitioners and interested observers in the domain.
Affiliation(s)
- Dimitar Trajanov
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.-Š.)
- Vangel Trajkovski
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.-Š.)
- Makedonka Dimitrieva
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.-Š.)
- Jovana Dobreva
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.-Š.)
- Milos Jovanovik
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.-Š.)
- Matej Klemen
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.-Š.)
- Aleš Žagar
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.-Š.)
- Marko Robnik-Šikonja
- Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.-Š.)
29
Chen S, Guevara M, Ramirez N, Murray A, Warner JL, Aerts HJWL, Miller TA, Savova GK, Mak RH, Bitterman DS. Natural Language Processing to Automatically Extract the Presence and Severity of Esophagitis in Notes of Patients Undergoing Radiotherapy. JCO Clin Cancer Inform 2023; 7:e2300048. [PMID: 37506330 DOI: 10.1200/cci.23.00048] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/28/2023] [Revised: 05/05/2023] [Accepted: 05/26/2023] [Indexed: 07/30/2023] Open
Abstract
PURPOSE: Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT. METHODS: Our corpus consisted of a gold-labeled data set of 1,524 clinical notes from 124 patients with lung cancer treated with RT, manually annotated for Common Terminology Criteria for Adverse Events (CTCAE) v5.0 esophagitis grade, and a silver-labeled data set of 2,420 notes from 1,832 patients from whom toxicity grades had been collected as structured data during clinical care. We fine-tuned statistical and pretrained Bidirectional Encoder Representations from Transformers-based models for three esophagitis classification tasks: task 1, no esophagitis versus grade 1-3; task 2, grade ≤1 versus >1; and task 3, no esophagitis versus grade 1 versus grade 2-3. Transferability was tested on 345 notes from patients with esophageal cancer undergoing RT. RESULTS: Fine-tuning of PubMedBERT yielded the best performance. The best macro-F1 was 0.92, 0.82, and 0.74 for tasks 1, 2, and 3, respectively. Selecting the most informative note sections during fine-tuning improved macro-F1 by ≥2% for all tasks. Silver-labeled data improved the macro-F1 by ≥3% across all tasks. For the esophageal cancer notes, the best macro-F1 was 0.73, 0.74, and 0.65 for tasks 1, 2, and 3, respectively, without additional fine-tuning. CONCLUSION: To our knowledge, this is the first effort to automatically extract esophagitis toxicity severity according to CTCAE guidelines from clinical notes. This provides proof of concept for NLP-based automated detailed toxicity monitoring in expanded domains.
Affiliation(s)
- Shan Chen
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Marco Guevara
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Nicolas Ramirez
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Arpi Murray
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Jeremy L Warner
- Population Sciences Program, Legorreta Cancer Center, Brown University, Providence, RI
- Lifespan Cancer Institute, Providence, RI
- Hugo J W L Aerts
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Radiology and Nuclear Medicine, GROW & CARIM, Maastricht University, Maastricht, the Netherlands
- Timothy A Miller
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Raymond H Mak
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
- Danielle S Bitterman
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA
30
Botsis T, Kreimeyer K. Improving drug safety with adverse event detection using natural language processing. Expert Opin Drug Saf 2023; 22:659-668. [PMID: 37339273 DOI: 10.1080/14740338.2023.2228197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 06/19/2023] [Indexed: 06/22/2023]
Abstract
INTRODUCTION: Pharmacovigilance (PV) involves monitoring and aggregating adverse event information from a variety of data sources, including health records, biomedical literature, spontaneous adverse event reports, product labels, and patient-generated content like social media posts, but the most pertinent details in these sources are typically available in narrative free-text formats. Natural language processing (NLP) techniques can be used to extract clinically relevant information from PV texts to inform decision-making. AREAS COVERED: We conducted a non-systematic literature review by querying the PubMed database to examine the uses of NLP in drug safety and distilled the findings to present our expert opinion on the topic. EXPERT OPINION: New NLP techniques and approaches continue to be applied for drug safety use cases; however, systems that are fully deployed and in use in a clinical environment remain vanishingly rare. To see high-performing NLP techniques implemented in the real setting will require long-term engagement with end users and other stakeholders and revised workflows in fully formulated business plans for the targeted use cases. Additionally, we found little to no evidence of extracted information placed into standardized data models, which should be a way to make implementations more portable and adaptable.
Affiliation(s)
- Taxiarchis Botsis
- Department of Oncology, the Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kory Kreimeyer
- Department of Oncology, the Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
31
Murali L, Gopakumar G, Viswanathan DM, Nedungadi P. Towards electronic health record-based medical knowledge graph construction, completion, and applications: A literature study. J Biomed Inform 2023:104403. [PMID: 37230406 DOI: 10.1016/j.jbi.2023.104403] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/13/2023] [Revised: 05/16/2023] [Accepted: 05/19/2023] [Indexed: 05/27/2023]
Abstract
With the growth of data and intelligent technologies, the healthcare sector has opened up numerous technology-enabled services for patients, clinicians, and researchers. One major hurdle in achieving state-of-the-art results in health informatics is domain-specific terminologies and their semantic complexities. A knowledge graph crafted from medical concepts, events, and relationships acts as a medical semantic network to extract new links and hidden patterns from health data sources. Current medical knowledge graph construction studies are limited to generic techniques and opportunities and focus less on exploiting real-world data sources in knowledge graph construction. A knowledge graph constructed from Electronic Health Records (EHR) data obtains real-world data from healthcare records. It ensures better results in subsequent tasks like knowledge extraction and inference, knowledge graph completion, and medical knowledge graph applications such as diagnosis predictions, clinical recommendations, and clinical decision support. This review critically analyses existing works on medical knowledge graphs that used EHR data as the data source at the (i) representation level, (ii) extraction level, and (iii) completion level. In this investigation, we found that EHR-based knowledge graph construction involves challenges such as high complexity and dimensionality of data, lack of knowledge fusion, and dynamic update of the knowledge graph. In addition, the study presents possible ways to tackle the challenges identified. Our findings conclude that future research should focus on knowledge graph integration and knowledge graph completion challenges.
Affiliation(s)
- Lino Murali
  - Center for Research in Analytics and Technologies for Education (CREATE), Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India; Division of Information technology, School of Engineering, Cochin University of Science and Technology, Kochi, 682022, Kerala, India
- G Gopakumar
  - Department of Computer Science and Engineering, School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India
- Daleesha M Viswanathan
  - Division of Information technology, School of Engineering, Cochin University of Science and Technology, Kochi, 682022, Kerala, India
- Prema Nedungadi
  - Center for Research in Analytics and Technologies for Education (CREATE), Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India; Department of Computer Science and Engineering, School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India.
32
Rani S, Jain A. Optimizing healthcare system by amalgamation of text processing and deep learning: a systematic review. Multimedia Tools and Applications 2023:1-25. [PMID: 37362695 PMCID: PMC10183315 DOI: 10.1007/s11042-023-15539-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Revised: 05/18/2022] [Accepted: 04/19/2023] [Indexed: 06/28/2023]
Abstract
The explosion of clinical textual data has drawn the attention of researchers. Owing to the abundance of clinical data, it is becoming difficult for healthcare professionals to take real-time measures. The tools and methods are lacking when compared to the amount of clinical data generated every day. This review aims to survey the text processing pipeline with deep learning methods such as CNN, RNN, LSTM, and GRU in the healthcare domain and discuss various applications such as clinical concept detection and extraction, medically aware dialogue systems, sentiment analysis of drug reviews shared online, clinical trial matching, and pharmacovigilance. In addition, we highlighted the major challenges in deploying text processing with deep learning to clinical textual data and identified the scope of research in this domain. Furthermore, we have discussed various resources that can be used in the future to optimize the healthcare domain by amalgamating text processing and deep learning.
Affiliation(s)
- Somiya Rani
  - Department of Computer Science and Engineering, NSUT East Campus (erstwhile AIACTR), Affiliated to Guru Gobind Singh Indraprastha University, Delhi, India
- Amita Jain
  - Department of Computer Science and Engineering, Netaji Subhas University of Technology, Delhi, India
33
Sun B, Yew PY, Chi CL, Song M, Loth M, Zhang R, Straka RJ. Development and Application of Pharmacological Statin-Associated Muscle Symptoms Phenotyping Algorithms Using Structured and Unstructured Electronic Health Records Data. medRxiv: the preprint server for health sciences 2023:2023.05.04.23289523. [PMID: 37215024 PMCID: PMC10197715 DOI: 10.1101/2023.05.04.23289523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Background Statins are widely prescribed cholesterol-lowering medications in the US, but their clinical benefits can be diminished by statin-associated muscle symptoms (SAMS), leading to discontinuation. In this study, we aimed to develop and validate a pharmacological SAMS clinical phenotyping algorithm using electronic health records (EHRs) data from Minnesota Fairview. Methods We retrieved structured and unstructured EHR data of statin users and manually ascertained a gold standard set of SAMS cases and controls using the SAMS-CI tool from clinical notes in 200 patients. We developed machine learning algorithms and rule-based algorithms that incorporated various criteria, including ICD codes, statin allergy, creatine kinase elevation, and keyword mentions in clinical notes. We applied the best performing algorithm to the statin cohort to identify SAMS. Results We identified 16,889 patients who started statins in the Fairview EHR system from 2010-2020. The combined rule-based (CRB) algorithm, which utilized both clinical notes and structured data criteria, achieved similar performance compared to machine learning algorithms with a precision of 0.85, recall of 0.71, and F1 score of 0.77 against the gold standard set. Applying the CRB algorithm to the statin cohort, we identified the pharmacological SAMS prevalence to be 1.9% and selective risk factors which included female gender, coronary artery disease, hypothyroidism, use of immunosuppressants or fibrates. Conclusion Our study developed and validated a simple pharmacological SAMS phenotyping algorithm that can be used to create SAMS case/control cohort for further analysis such as developing SAMS risk prediction model.
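The precision, recall, and F1 figures reported in this abstract are tied together by the usual definition of F1 as the harmonic mean of precision and recall. As a minimal sketch (the function name `f1_score` is an illustrative choice here, not something taken from the study), the following checks that the reported precision of 0.85 and recall of 0.71 are consistent with the reported F1 of 0.77:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the combined rule-based (CRB) algorithm.
crb_f1 = f1_score(precision=0.85, recall=0.71)
print(round(crb_f1, 2))  # 0.77
```

Because the harmonic mean is pulled toward the smaller of the two inputs, the F1 score sits closer to the recall than a simple average of the two values would.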
Affiliation(s)
- Boguang Sun
  - Department of Experimental and Clinical Pharmacology, College of Pharmacy, Minneapolis, Minnesota
- Pui Ying Yew
  - Institute for Health Informatics, Minneapolis, Minnesota
- Chih-Lin Chi
  - Institute for Health Informatics, Minneapolis, Minnesota
  - School of Nursing, Minneapolis, Minnesota
- Meijia Song
  - Institute for Health Informatics, Minneapolis, Minnesota
- Matt Loth
  - Center for Learning Health System Sciences, Minneapolis, Minnesota
- Rui Zhang
  - Institute for Health Informatics, Minneapolis, Minnesota
  - Center for Learning Health System Sciences, Minneapolis, Minnesota
- Robert J. Straka
  - Department of Experimental and Clinical Pharmacology, College of Pharmacy, Minneapolis, Minnesota
34
Mora D, Mateo J, Nieto JA, Bikdeli B, Yamashita Y, Barco S, Jimenez D, Demelo-Rodriguez P, Rosa V, Yoo HHB, Sadeghipour P, Monreal M. Machine learning to predict major bleeding during anticoagulation for venous thromboembolism: possibilities and limitations. Br J Haematol 2023; 201:971-981. [PMID: 36942630 DOI: 10.1111/bjh.18737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 02/20/2023] [Accepted: 02/24/2023] [Indexed: 03/23/2023]
Abstract
Predictive tools for major bleeding (MB) using machine learning (ML) might be advantageous over traditional methods. We used data from the Registro Informatizado de Enfermedad TromboEmbólica (RIETE) to develop ML algorithms to identify patients with venous thromboembolism (VTE) at increased risk of MB during the first 3 months of anticoagulation. A total of 55 baseline variables were used as predictors. New data prospectively collected from the RIETE were used for further validation. The RIETE and VTE-BLEED scores were used for comparisons. External validation was performed with the COMMAND-VTE database. Learning was carried out with data from 49 587 patients, of whom 873 (1.8%) had MB. The best performing ML method was XGBoost. In the prospective validation cohort the sensitivity, specificity, positive predictive value and F1 score were: 33.2%, 93%, 10%, and 15.4% respectively. F1 value for the RIETE and VTE-BLEED scores were 8.6% and 6.4% respectively. In the external validation cohort the metrics were 10.3%, 87.6%, 3.5% and 5.2% respectively. In that cohort, the F1 value for the RIETE score was 17.3% and for the VTE-BLEED score 9.75%. The performance of the XGBoost algorithm was better than that from the RIETE and VTE-BLEED scores only in the prospective validation cohort, but not in the external validation cohort.
Affiliation(s)
- Damián Mora
  - Department of Internal Medicine, Hospital Virgen de la Luz, Cuenca, Spain
- Jorge Mateo
  - Institute of Technology, Universidad de Castilla-La Mancha, Cuenca, Spain
- José A Nieto
  - Department of Internal Medicine, Hospital Virgen de la Luz, Cuenca, Spain
- Behnood Bikdeli
  - Cardiovascular Medicine Division, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Thrombosis Research Group, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - YNHH/Yale Center for Outcomes Research and Evaluation (CORE), New Haven, Connecticut, USA
  - Cardiovascular Research Foundation (CRF), New York, New York, USA
- Yugo Yamashita
  - Department of Cardiovascular Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Stefano Barco
  - Department of Angiology, University Hospital Zurich, Zurich, Switzerland
  - Center for Thrombosis and Hemostasis, University Hospital Mainz, Mainz, Germany
- David Jimenez
  - Respiratory Department, Hospital Ramón y Cajal and Universidad de Alcalá (IRYCIS), Madrid, Spain
  - CIBER de Enfermedades Respiratorias (CIBERES), Madrid, Spain
- Pablo Demelo-Rodriguez
  - Department of Internal Medicine, Hospital General Universitario Gregorio Marañón, Madrid, Spain
- Vladimir Rosa
  - Department of Internal Medicine, Hospital Universitario Virgen de Arrixaca, Murcia, Spain
- Hugo Hyung Bok Yoo
  - Department of Internal Medicine - Pulmonary Division, Botucatu Medical School - São Paulo State University (UNESP), São Paulo, Brazil
- Parham Sadeghipour
  - Department of Peripheral Vascular Diseases, Rajaie Cardiovascular Medical and Research Center, Tehran, Iran
- Manuel Monreal
  - Chair of Thromboembolic Diseases, Universidad Católica San Antonio de Murcia, Murcia, Spain
35
Hossain E, Rana R, Higgins N, Soar J, Barua PD, Pisani AR, Turner K. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput Biol Med 2023; 155:106649. [PMID: 36805219 DOI: 10.1016/j.compbiomed.2023.106649] [Citation(s) in RCA: 82] [Impact Index Per Article: 41.0] [Received: 10/03/2022] [Revised: 01/04/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]
Abstract
BACKGROUND Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs. Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to understand the limitations and opportunities in this space comprehensively. METHODOLOGY After screening 261 articles from 11 databases, we included 127 papers for full-text review covering seven categories of articles: (1) medical note classification, (2) clinical entity recognition, (3) text summarisation, (4) deep learning (DL) and transfer learning architecture, (5) information extraction, (6) medical language translation and (7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. RESULT AND DISCUSSION EHR was the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, with prediction or classification being the most common application of ML or DL. The most common use cases were: the International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders. CONCLUSION We find that the adopted ML models were not adequately assessed. In addition, data imbalance is an important problem, and techniques to address this underlying problem are still needed. Future studies should address key limitations in existing studies, primarily in identifying lupus nephritis, suicide attempts, and perinatal self-harm, and in ICD-9 classification.
Affiliation(s)
- Elias Hossain
  - School of Engineering & Physical Sciences, North South University, Dhaka 1229, Bangladesh.
- Rajib Rana
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Niall Higgins
  - School of Management and Enterprise, University of Southern Queensland, Darling Heights QLD 4350, Australia; School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia; Metro North Mental Health, Herston QLD 4029, Australia
- Jeffrey Soar
  - School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Prabal Datta Barua
  - School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Anthony R Pisani
  - Center for the Study and Prevention of Suicide, University of Rochester, Rochester, NY, United States
- Kathryn Turner
  - School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia
36
Lee RY, Kross EK, Torrence J, Li KS, Sibley J, Cohen T, Lober WB, Engelberg RA, Curtis JR. Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome. JAMA Netw Open 2023; 6:e231204. [PMID: 36862411 PMCID: PMC9982698 DOI: 10.1001/jamanetworkopen.2023.1204] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Indexed: 03/03/2023] Open
Abstract
IMPORTANCE Many clinical trial outcomes are documented in free-text electronic health records (EHRs), making manual data collection costly and infeasible at scale. Natural language processing (NLP) is a promising approach for measuring such outcomes efficiently, but ignoring NLP-related misclassification may lead to underpowered studies. OBJECTIVE To evaluate the performance, feasibility, and power implications of using NLP to measure the primary outcome of EHR-documented goals-of-care discussions in a pragmatic randomized clinical trial of a communication intervention. DESIGN, SETTING, AND PARTICIPANTS This diagnostic study compared the performance, feasibility, and power implications of measuring EHR-documented goals-of-care discussions using 3 approaches: (1) deep-learning NLP, (2) NLP-screened human abstraction (manual verification of NLP-positive records), and (3) conventional manual abstraction. The study included hospitalized patients aged 55 years or older with serious illness enrolled between April 23, 2020, and March 26, 2021, in a pragmatic randomized clinical trial of a communication intervention in a multihospital US academic health system. MAIN OUTCOMES AND MEASURES Main outcomes were natural language processing performance characteristics, human abstractor-hours, and misclassification-adjusted statistical power of methods of measuring clinician-documented goals-of-care discussions. Performance of NLP was evaluated with receiver operating characteristic (ROC) curves and precision-recall (PR) analyses, and the effects of misclassification on power were examined using mathematical substitution and Monte Carlo simulation. RESULTS A total of 2512 trial participants (mean [SD] age, 71.7 [10.8] years; 1456 [58%] female) amassed 44 324 clinical notes during 30-day follow-up. In a validation sample of 159 participants, deep-learning NLP trained on a separate training data set identified patients with documented goals-of-care discussions with moderate accuracy (maximal F1 score, 0.82; area under the ROC curve, 0.924; area under the PR curve, 0.879). Manual abstraction of the outcome from the trial data set would require an estimated 2000 abstractor-hours and would power the trial to detect a risk difference of 5.4% (assuming 33.5% control-arm prevalence, 80% power, and 2-sided α = .05). Measuring the outcome by NLP alone would power the trial to detect a risk difference of 7.6%. Measuring the outcome by NLP-screened human abstraction would require 34.3 abstractor-hours to achieve estimated sensitivity of 92.6% and would power the trial to detect a risk difference of 5.7%. Monte Carlo simulations corroborated misclassification-adjusted power calculations. CONCLUSIONS AND RELEVANCE In this diagnostic study, deep-learning NLP and NLP-screened human abstraction had favorable characteristics for measuring an EHR outcome at scale. Adjusted power calculations accurately quantified power loss from NLP-related misclassification, suggesting that incorporation of this approach into the design of studies using NLP would be beneficial.
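One textbook way to see why outcome misclassification costs power: if an outcome with true risk p is measured with sensitivity Se and specificity Sp in both arms, the expected measured risk is Se·p + (1 − Sp)·(1 − p), so a true risk difference shrinks by the factor Se + Sp − 1. The sketch below illustrates only that relationship and is not the authors' power calculation. The 33.5% control-arm prevalence and 92.6% sensitivity come from the abstract, while the 10-point true risk difference, the perfect specificity, and the function names are assumptions made for illustration.

```python
def observed_risk(true_risk: float, sensitivity: float, specificity: float) -> float:
    """Expected measured risk when the outcome is imperfectly classified."""
    true_positives = sensitivity * true_risk
    false_positives = (1 - specificity) * (1 - true_risk)
    return true_positives + false_positives


def observed_risk_difference(p_control: float, p_treated: float,
                             sensitivity: float, specificity: float) -> float:
    """Risk difference as measured, assuming the same error rates in both arms."""
    return (observed_risk(p_treated, sensitivity, specificity)
            - observed_risk(p_control, sensitivity, specificity))


# Assumed for illustration: a true difference of 10 percentage points
# and specificity of 1.0 (no false positives).
rd = observed_risk_difference(0.335, 0.435, sensitivity=0.926, specificity=1.0)
print(round(rd, 4))  # 0.0926
```

A smaller measured difference needs a larger sample to detect, which is the intuition behind adjusting power calculations for misclassification; a full adjustment also has to account for the changed variance of the measured proportions.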
Affiliation(s)
- Robert Y. Lee
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle
- Erin K. Kross
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle
- Janaki Torrence
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle
- Kevin S. Li
  - Division of Biomedical and Health Informatics, Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
- James Sibley
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle
- Trevor Cohen
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Biomedical and Health Informatics, Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
- William B. Lober
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Biomedical and Health Informatics, Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
  - Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle
  - Department of Global Health, University of Washington, Seattle
- Ruth A. Engelberg
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle
- J. Randall Curtis
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle
  - Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle
  - Department of Health Systems and Population Health, University of Washington, Seattle
37
Singh D, Singh A, Chawla PA. An overview of current strategies and future prospects in drug repurposing in tuberculosis. Exploration of Medicine 2023. [DOI: 10.37349/emed.2023.00125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023] Open
Abstract
A large number of the population faces mortality as an effect of tuberculosis (TB). The line of treatment in the management of TB faces a jolt with ever-increasing multi-drug resistance (DR) cases. Further, the drugs engaged in the treatment of TB are associated with different toxicities, such as renal and hepatic toxicity. Different combinations are sought for effective anti-tuberculosis (anti-TB) effects with a decrease in toxicity. In this regard, drug repurposing has been very promising in improving the efficacy of drugs by enhancement of bioavailability and widening the safety margin. The success in drug repurposing lies in specified binding and inhibition of a particular target in the drug molecule. Different drugs have been repurposed for various ailments like cancer, Alzheimer’s disease, acquired immunodeficiency syndrome (AIDS), hair loss, etc. Repurposing in anti-TB drugs holds great potential too. The use of whole-cell screening assays and the availability of large chemical compounds for testing against Mycobacterium tuberculosis poses a challenge in this development. The target-based discovery of sites has emerged in the form of phenotypic screening as ethionamide R (EthR) and malate synthase inhibitors are similar to pharmaceuticals. In this review, the authors have thoroughly described the drug repurposing techniques on the basis of pharmacogenomics and drug metabolism, pathogen-targeted therapy, host-directed therapy, and bioinformatics approaches for the identification of drugs. Further, the significance of repurposing of drugs elaborated on large databases has been revealed. The role of genomics and network-based methods in drug repurposing has been also discussed in this article.
Affiliation(s)
- Dilpreet Singh
  - Department of Pharmaceutics, ISF College of Pharmacy, Moga 142001, Punjab, India
- Amrinder Singh
  - Department of Pharmaceutics, ISF College of Pharmacy, Moga 142001, Punjab, India
- Pooja A. Chawla
  - Department of Pharmaceutical Chemistry, ISF College of Pharmacy, Moga 142001, Punjab, India
38
Keloth VK, Zhou S, Lindemann L, Zheng L, Elhanan G, Einstein AJ, Geller J, Perl Y. Mining of EHR for interface terminology concepts for annotating EHRs of COVID patients. BMC Med Inform Decis Mak 2023; 23:40. [PMID: 36829139 PMCID: PMC9951157 DOI: 10.1186/s12911-023-02136-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/22/2022] [Accepted: 02/09/2023] [Indexed: 02/26/2023] Open
Abstract
BACKGROUND Two years into the COVID-19 pandemic and with more than five million deaths worldwide, the healthcare establishment continues to struggle with every new wave of the pandemic resulting from a new coronavirus variant. Research has demonstrated that there are variations in the symptoms, and even in the order of symptom presentations, in COVID-19 patients infected by different SARS-CoV-2 variants (e.g., Alpha and Omicron). Textual data in the form of admission notes and physician notes in the Electronic Health Records (EHRs) is rich in information regarding the symptoms and their orders of presentation. Unstructured EHR data is often underutilized in research due to the lack of annotations that enable automatic extraction of useful information from the available extensive volumes of textual data. METHODS We present the design of a COVID Interface Terminology (CIT), not just a generic COVID-19 terminology, but one serving a specific purpose of enabling automatic annotation of EHRs of COVID-19 patients. CIT was constructed by integrating existing COVID-related ontologies and mining additional fine granularity concepts from clinical notes. The iterative mining approach utilized the techniques of 'anchoring' and 'concatenation' to identify potential fine granularity concepts to be added to the CIT. We also tested the generalizability of our approach on a hold-out dataset and compared the annotation coverage to the coverage obtained for the dataset used to build the CIT. RESULTS Our experiments demonstrate that this approach results in higher annotation coverage compared to existing ontologies such as SNOMED CT and Coronavirus Infectious Disease Ontology (CIDO). The final version of CIT achieved about 20% more coverage than SNOMED CT and 50% more coverage than CIDO. In the future, the concepts mined and added into CIT could be used as training data for machine learning models for mining even more concepts into CIT and further increasing the annotation coverage. 
CONCLUSION In this paper, we demonstrated the construction of a COVID interface terminology that can be utilized for automatically annotating EHRs of COVID-19 patients. The techniques presented can identify frequently documented fine granularity concepts that are missing in other ontologies thereby increasing the annotation coverage.
Affiliation(s)
- Vipina K Keloth
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
- Shuxin Zhou
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Luke Lindemann
  - School of Medicine and Health Sciences, The George Washington University, Washington (D.C.), USA
- Ling Zheng
  - Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ, USA
- Gai Elhanan
  - Renown Institute for Health Innovation, Desert Research Institute, Reno, NV, USA
- Andrew J Einstein
  - Cardiology Division, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- James Geller
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Yehoshua Perl
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
39
Carpenter KA, Altman RB. Using GPT-3 to Build a Lexicon of Drugs of Abuse Synonyms for Social Media Pharmacovigilance. Biomolecules 2023; 13:biom13020387. [PMID: 36830756 PMCID: PMC9953178 DOI: 10.3390/biom13020387] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 01/11/2023] [Revised: 02/09/2023] [Accepted: 02/16/2023] [Indexed: 02/22/2023] Open
Abstract
Drug abuse is a serious problem in the United States, with over 90,000 drug overdose deaths nationally in 2020. A key step in combating drug abuse is detecting, monitoring, and characterizing its trends over time and location, also known as pharmacovigilance. While federal reporting systems accomplish this to a degree, they often have high latency and incomplete coverage. Social-media-based pharmacovigilance has zero latency, is easily accessible and unfiltered, and benefits from drug users being willing to share their experiences online pseudo-anonymously. However, unlike highly structured official data sources, social media text is rife with misspellings and slang, making automated analysis difficult. Generative Pretrained Transformer 3 (GPT-3) is a large autoregressive language model specialized for few-shot learning that was trained on text from the entire internet. We demonstrate that GPT-3 can be used to generate slang and common misspellings of terms for drugs of abuse. We repeatedly queried GPT-3 for synonyms of drugs of abuse and filtered the generated terms using automated Google searches and cross-references to known drug names. When generated terms for alprazolam were manually labeled, we found that our method produced 269 synonyms for alprazolam, 221 of which were new discoveries not included in an existing drug lexicon for social media. We repeated this process for 98 drugs of abuse, of which 22 are widely-discussed drugs of abuse, building a lexicon of colloquial drug synonyms that can be used for pharmacovigilance on social media.
Affiliation(s)
- Kristy A. Carpenter
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Russ B. Altman
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
  - Departments of Bioengineering, Genetics, and Medicine, Stanford University, Stanford, CA 94305, USA
  - Correspondence:
40
Segura T, Medrano IH, Collazo S, Maté C, Sguera C, Del Rio-Bermudez C, Casero H, Salcedo I, García-García J, Alcahut-Rodríguez C, Taberna M. Symptoms timeline and outcomes in amyotrophic lateral sclerosis using artificial intelligence. Sci Rep 2023; 13:702. [PMID: 36639403 PMCID: PMC9839769 DOI: 10.1038/s41598-023-27863-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/26/2022] [Accepted: 01/09/2023] [Indexed: 01/15/2023] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is a fatal, neurodegenerative motor neuron disease. Although an early diagnosis is crucial to provide adequate care and improve survival, patients with ALS experience a significant diagnostic delay. This study aimed to use real-world data to describe the clinical profile and timing between symptom onset, diagnosis, and relevant outcomes in ALS. Retrospective and multicenter study in 5 representative hospitals and Primary Care services in the SESCAM Healthcare Network (Castilla-La Mancha, Spain). Using Natural Language Processing (NLP), the clinical information in electronic health records of all patients with ALS was extracted between January 2014 and December 2018. From a source population of all individuals attended in the participating hospitals, 250 ALS patients were identified (61.6% male, mean age 64.7 years). Of these, 64% had spinal and 36% bulbar ALS. For most defining symptoms, including dyspnea, dysarthria, dysphagia and fasciculations, the overall diagnostic delay from symptom onset was 11 (6-18) months. Prior to diagnosis, only 38.8% of patients had visited the neurologist. In a median post-diagnosis follow-up of 25 months, 52% underwent gastrostomy, 64% non-invasive ventilation, 16.4% tracheostomy, and 87.6% riluzole treatment; these were more commonly reported (all Ps < 0.05) and showed greater probability of occurrence (all Ps < 0.03) in bulbar ALS. Our results highlight the diagnostic delay in ALS and revealed differences in the clinical characteristics and occurrence of major disease-specific events across ALS subtypes. NLP holds great promise for its application in the wider context of rare neurological diseases.
Affiliation(s)
- Tomás Segura
  - University Hospital of Albacete, Albacete, Spain.
- Carlo Sguera
  - Savana Research, Madrid, Spain; UC3M-Santander Big Data Institute, Madrid, Spain
41
Murphy RM, Klopotowska JE, de Keizer NF, Jager KJ, Leopold JH, Dongelmans DA, Abu-Hanna A, Schut MC. Adverse drug event detection using natural language processing: A scoping review of supervised learning methods. PLoS One 2023; 18:e0279842. [PMID: 36595517 PMCID: PMC9810201 DOI: 10.1371/journal.pone.0279842] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 03/22/2022] [Accepted: 12/15/2022] [Indexed: 01/04/2023] Open
Abstract
To reduce adverse drug events (ADEs), hospitals need a system to support them in monitoring ADE occurrence routinely, rapidly, and at scale. Natural language processing (NLP), a computerized approach to analyze text data, has shown promising results for the purpose of ADE detection in the context of pharmacovigilance. However, a detailed qualitative assessment and critical appraisal of NLP methods for ADE detection in the context of ADE monitoring in hospitals is lacking. Therefore, we have conducted a scoping review to close this knowledge gap, and to provide directions for future research and practice. We included articles where NLP was applied to detect ADEs in clinical narratives within electronic health records of inpatients. Quantitative and qualitative data items relating to NLP methods were extracted and critically appraised. Out of 1,065 articles screened for eligibility, 29 articles met the inclusion criteria. Most frequent tasks included named entity recognition (n = 17; 58.6%) and relation extraction/classification (n = 15; 51.7%). Clinical involvement was reported in nine studies (31%). Multiple NLP modelling approaches seem suitable, with Long Short Term Memory and Conditional Random Field methods most commonly used. Although reported overall performance of the systems was high, it provides an inflated impression given a steep drop in performance when predicting the ADE entity or ADE relation class. When annotating corpora, treating an ADE as a relation between a drug and non-drug entity seems the best practice. Future research should focus on semi-automated methods to reduce the manual annotation effort, and examine implementation of the NLP methods in practice.
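The annotation practice recommended in this abstract, treating an ADE as a relation between a drug and a non-drug entity rather than as an entity type of its own, can be pictured with a small data structure. This is an illustrative sketch only: the class names, the label strings, and the example sentence are hypothetical and are not taken from any of the reviewed corpora.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # hypothetical label set, e.g. "DRUG" or "SIGN_SYMPTOM"
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


@dataclass(frozen=True)
class Relation:
    head: Entity  # the drug
    tail: Entity  # the non-drug entity (sign, symptom, finding)
    label: str


def span(sentence: str, phrase: str, label: str) -> Entity:
    """Build an Entity from the first occurrence of phrase in sentence."""
    start = sentence.index(phrase)
    return Entity(phrase, label, start, start + len(phrase))


sentence = "Patient developed a rash after starting amoxicillin."
drug = span(sentence, "amoxicillin", "DRUG")
symptom = span(sentence, "rash", "SIGN_SYMPTOM")

# The symptom keeps its neutral label; only the link to the drug marks an ADE.
ade = Relation(head=drug, tail=symptom, label="ADE")
print(ade.head.text, "->", ade.tail.text, ade.label)
```

Keeping the symptom label neutral means the same mention of a rash could appear in another note with no drug relation attached, without the entity annotation itself having to change.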
Affiliation(s)
- Rachel M. Murphy
  - Department of Medical Informatics, Amsterdam UMC (location AMC), Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Joanna E. Klopotowska
  - Department of Medical Informatics, Amsterdam UMC (location AMC), Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Nicolette F. de Keizer
  - Department of Medical Informatics, Amsterdam UMC (location AMC), Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Kitty J. Jager
  - Department of Medical Informatics, Amsterdam UMC (location AMC), Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Jan Hendrik Leopold
  - Department of Medical Informatics, Amsterdam UMC (location AMC), Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Dave A. Dongelmans
  - Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
  - Department of Intensive Care Medicine, Amsterdam UMC (location AMC), Amsterdam, The Netherlands
- Ameen Abu-Hanna
  - Department of Medical Informatics, Amsterdam UMC (location AMC), Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Martijn C. Schut
  - Department of Medical Informatics, Amsterdam UMC (location AMC), Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
|
42
|
Sorbello A, Haque SA, Hasan R, Jermyn R, Hussein A, Vega A, Zembrzuski K, Ripple A, Ahadpour M. Artificial Intelligence-Enabled Software Prototype to Inform Opioid Pharmacovigilance From Electronic Health Records: Development and Usability Study. JMIR AI 2023; 2:e45000. [PMID: 37771410 PMCID: PMC10538589 DOI: 10.2196/45000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 03/29/2023] [Accepted: 06/02/2023] [Indexed: 09/30/2023]
Abstract
Background: The use of patient health and treatment information captured in structured and unstructured formats in computerized electronic health record (EHR) repositories could potentially augment the detection of safety signals for drug products regulated by the US Food and Drug Administration (FDA). Natural language processing and other artificial intelligence (AI) techniques provide novel methodologies that could be leveraged to extract clinically useful information from EHR resources. Objective: Our aim is to develop a novel AI-enabled software prototype to identify adverse drug event (ADE) safety signals from free-text discharge summaries in EHRs to enhance opioid drug safety and research activities at the FDA. Methods: We developed a prototype for web-based software that leverages keyword and trigger-phrase searching with rule-based algorithms and deep learning to extract candidate ADEs for specific opioid drugs from discharge summaries in the Medical Information Mart for Intensive Care III (MIMIC III) database. The prototype uses MedSpacy components to identify relevant sections of discharge summaries and a pretrained natural language processing (NLP) model, Spark NLP for Healthcare, for named entity recognition. Fifteen FDA staff members provided feedback on the prototype's features and functionalities. Results: Using the prototype, we were able to identify known, labeled, opioid-related adverse drug reactions from text in EHRs. The AI-enabled model achieved accuracy, recall, precision, and F1-scores of 0.66, 0.69, 0.64, and 0.67, respectively. FDA participants assessed the prototype as highly desirable in user satisfaction, visualizations, and in the potential to support drug safety signal detection for opioid drugs from EHR data while saving time and manual effort. Actionable design recommendations included (1) enlarging the tabs and visualizations; (2) enabling more flexibility and customizations to fit end users' individual needs; (3) providing additional instructional resources; (4) adding multiple graph export functionality; and (5) adding project summaries. Conclusions: The novel prototype uses innovative AI-based techniques to automate searching for, extracting, and analyzing clinically useful information captured in unstructured text in EHRs. It increases efficiency in harnessing real-world data for opioid drug safety and increases the usability of the data to support regulatory review while decreasing the manual research burden.
Affiliation(s)
- Alfred Sorbello
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Syed Arefinul Haque
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Rashedul Hasan
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
- Richard Jermyn
  - Neuromuscular Institute, Rowan-Virtua School of Osteopathic Medicine, Stratford, NJ, United States
- Ahmad Hussein
  - Neuromuscular Institute, Rowan-Virtua School of Osteopathic Medicine, Stratford, NJ, United States
- Alex Vega
  - Neuromuscular Institute, Rowan-Virtua School of Osteopathic Medicine, Stratford, NJ, United States
- Krzysztof Zembrzuski
  - Neuromuscular Institute, Rowan-Virtua School of Osteopathic Medicine, Stratford, NJ, United States
- Anna Ripple
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine-National Institutes of Health, Rockville, MD, United States
- Mitra Ahadpour
  - Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, United States
|
43
|
Valdés Sanz N, García-Layana A, Colas T, Moriche M, Montero Moreno M, Ciprandi G. Clinical Characterization of Inpatients with Acute Conjunctivitis: A Retrospective Analysis by Natural Language Processing and Machine Learning. Applied Sciences 2022; 12:12352. [DOI: 10.3390/app122312352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2023]
Abstract
Background: Acute bacterial conjunctivitis (ABC) is a relatively common medical condition caused by different pathogens. Although it rarely threatens vision, it is one of the most common conditions that cause red eyes and may be accompanied by discomfort and discharge. The study aimed to identify and characterize inpatients with ABC treated with topical antibiotics. Methods: The EHRead® technology, based on natural language processing (NLP) and machine learning, was used to extract and analyze the clinical information in the electronic health records (EHRs) of antibiotic-treated patients with conjunctivitis and admitted to five hospitals in Spain between January 2014 and December 2018. Categorical variables were described by frequency, whereas numerical variables included the mean, standard deviation, median, and quartiles. Results: From a source population of 2,071,812 adult patients who attended the participating hospitals in the study period, 11,110 patients diagnosed with acute conjunctivitis were identified. Six thousand five hundred eighty-three patients were treated with antibiotics, comprising the final study population. Microbiology was tested only on 12.1% of patients. Antibiotics, mainly tobramycin, and corticosteroids, mainly dexamethasone, were usually prescribed. NSAIDs were also used in about 50% of patients, always combined with antibiotics. Conclusions: The present study provided a realistic representation of the hospital practice concerning managing patients with acute antibiotic-treated conjunctivitis. The diagnosis is usually based on the clinical ground, microbiology is rarely tested, few bacteria species are involved, and local antibiotics are frequently associated with corticosteroids and/or NSAIDs. Moreover, this study provided clinically relevant outcomes, based on new technology, that could be applied in clinical practice.
Affiliation(s)
- Nuria Valdés Sanz
  - Ophthalmology Department, Hospital Puerta de Hierro Hospital, 28222 Madrid, Spain
- Teresa Colas
  - Ophthalmology Department, Infanta Leonor University Hospital, 28031 Madrid, Spain
- Manuel Moriche
  - Ophthalmology Department, Infanta Sofía University Hospital, 28703 Madrid, Spain
- Giorgio Ciprandi
  - Outpatients Clinic, Casa di Cura Villa Montallegro, 16145 Genoa, Italy
|
44
|
Kline A, Wang H, Li Y, Dennis S, Hutch M, Xu Z, Wang F, Cheng F, Luo Y. Multimodal machine learning in precision health: A scoping review. NPJ Digit Med 2022; 5:171. [PMID: 36344814 PMCID: PMC9640667 DOI: 10.1038/s41746-022-00712-8] [Citation(s) in RCA: 126] [Impact Index Per Article: 42.0] [Received: 06/20/2022] [Accepted: 10/14/2022] [Indexed: 11/09/2022] Open
Abstract
Machine learning is frequently being leveraged to tackle problems in the health sector including utilization for clinical decision-support. Its use has historically been focused on single modal data. Attempts to improve prediction and mimic the multimodal nature of clinical expert decision-making have been met in the biomedical field of machine learning by fusing disparate data. This review was conducted to summarize the current studies in this field and identify topics ripe for future research. We conducted this review in accordance with the PRISMA extension for Scoping Reviews to characterize multi-modal data fusion in health. Search strings were established and used in databases: PubMed, Google Scholar, and IEEEXplore from 2011 to 2021. A final set of 128 articles was included in the analysis. The most common health areas utilizing multi-modal methods were neurology and oncology. Early fusion was the most common data merging strategy. Notably, there was an improvement in predictive performance when using data fusion. Lacking from the papers were clear clinical deployment strategies, FDA-approval, and analysis of how using multimodal approaches from diverse sub-populations may improve biases and healthcare disparities. These findings provide a summary on multimodal data fusion as applied to health diagnosis/prognosis problems. Few papers compared the outputs of a multimodal approach with a unimodal prediction. However, those that did achieved an average increase of 6.4% in predictive accuracy. Multi-modal machine learning, while more robust in its estimations over unimodal methods, has drawbacks in its scalability and the time-consuming nature of information concatenation.
Affiliation(s)
- Adrienne Kline
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Hanyin Wang
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Yikuan Li
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Saya Dennis
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Meghan Hutch
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
- Zhenxing Xu
  - Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
- Fei Wang
  - Department of Population Health Sciences, Cornell University, New York, 10065, NY, USA
- Feixiong Cheng
  - Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, 44195, OH, USA
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University, Chicago, 60201, IL, USA
|
45
|
Sarri G, Bennett D, Debray T, Deruaz‐Luyet A, Soriano Gabarró M, Largent JA, Li X, Liu W, Lund JL, Moga DC, Gokhale M, Rentsch CT, Wen X, Yanover C, Ye Y, Yun H, Zullo AR, Lin KJ. ISPE-Endorsed Guidance in Using Electronic Health Records for Comparative Effectiveness Research in COVID-19: Opportunities and Trade-Offs. Clin Pharmacol Ther 2022; 112:990-999. [PMID: 35170021 PMCID: PMC9087010 DOI: 10.1002/cpt.2560] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/20/2021] [Accepted: 02/02/2022] [Indexed: 11/08/2022]
Abstract
As the scientific research community along with healthcare professionals and decision makers around the world fight tirelessly against the coronavirus disease 2019 (COVID-19) pandemic, the need for comparative effectiveness research (CER) on preventive and therapeutic interventions for COVID-19 is immense. Randomized controlled trials markedly under-represent the frail and complex patients seen in routine care, and they do not typically have data on long-term treatment effects. The increasing availability of electronic health records (EHRs) for clinical research offers the opportunity to generate timely real-world evidence reflective of routine care for optimal management of COVID-19. However, there are many potential threats to the validity of CER based on EHR data that are not originally generated for research purposes. To ensure unbiased and robust results, we need high-quality healthcare databases, rigorous study designs, and proper implementation of appropriate statistical methods. We aimed to describe opportunities and challenges in EHR-based CER for COVID-19-related questions and to introduce best practices in pharmacoepidemiology to minimize potential biases. We structured our discussion into the following topics: (1) study population identification based on exposure status; (2) ascertainment of outcomes; (3) common biases and potential solutions; and (4) data operational challenges specific to COVID-19 CER using EHRs. We provide structured guidance for the proper conduct and appraisal of drug and vaccine effectiveness and safety research using EHR data for the pandemic. This paper is endorsed by the International Society for Pharmacoepidemiology (ISPE).
Affiliation(s)
- Dimitri Bennett
  - Takeda Global Evidence and Outcomes, Takeda Pharmaceuticals USA, Inc, Cambridge, Massachusetts, USA
  - Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Thomas Debray
  - Julius Center for Health Sciences and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands
  - Smart Data Analysis and Statistics, Utrecht, The Netherlands
- Anouk Deruaz‐Luyet
  - Global Epidemiology and Real‐World Evidence CoE, Corporate Medical Affairs, Boehringer Ingelheim International GmbH, Ingelheim‐am‐Rhein, Germany
- Montse Soriano Gabarró
  - Bayer Partnerships and Integrated Evidence Generation Office, Integrated Evidence Generation & Business Innovation, Medical Affairs & Pharmacovigilance, Bayer AG, Berlin, Germany
- Xiaojuan Li
  - Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA
- Wei Liu
  - Division of Epidemiology, Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA
- Jennifer L. Lund
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Daniela C. Moga
  - Department of Pharmacy Practice and Science, College of Pharmacy, University of Kentucky, Lexington, Kentucky, USA
- Mugdha Gokhale
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Epidemiology, Merck, West Point, Pennsylvania, USA
- Christopher T. Rentsch
  - Faculty of Epidemiology and Population Health, Department of Non‐communicable Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, UK
  - Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Xuerong Wen
  - Health Outcomes, Pharmacy Practice, College of Pharmacy, University of Rhode Island, Kinston, Rhode Island, USA
- Yizhou Ye
  - Global Epidemiology, Pharmacovigilance and Patient Safety, AbbVie Inc, North Chicago, Illinois, USA
- Huifeng Yun
  - Department of Epidemiology, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Andrew R. Zullo
  - Department of Health Services, Policy, and Practice, Brown University School of Public Health, Providence, Rhode Island, USA
  - Department of Epidemiology, Brown University School of Public Health, Providence, Rhode Island, USA
  - Center of Innovation in Long‐Term Services and Supports, Providence Veterans Affairs Medical Center, Providence, Rhode Island, USA
  - Department of Pharmacy, Lifespan‐Rhode Island Hospital, Providence, Rhode Island, USA
- Kueiyu Joshua Lin
  - Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
|
46
|
Major Adverse Cardiovascular Events in Coronary Type 2 Diabetic Patients: Identification of Associated Factors Using Electronic Health Records and Natural Language Processing. J Clin Med 2022; 11:jcm11206004. [PMID: 36294325 PMCID: PMC9605132 DOI: 10.3390/jcm11206004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 09/26/2022] [Accepted: 10/06/2022] [Indexed: 11/22/2022] Open
Abstract
Patients with Type 2 diabetes mellitus (T2DM) and coronary artery disease (CAD) are at high risk of developing major adverse cardiovascular events (MACE). This multicenter, retrospective, observational study, performed in Spain, aimed to characterize these patients in a real-world setting. Unstructured data from the Electronic Health Records were extracted by EHRead®, a technology based on Natural Language Processing and machine learning. The associations between new MACE and the variables of interest were investigated by univariable and multivariable analyses. From a source population of 2,184,662 patients, we identified 4072 adults diagnosed with T2DM and CAD (62.2% male, mean age 70 ± 11). The main comorbidities observed included arterial hypertension, hyperlipidemia, and obesity, with metformin and statins being the treatments most frequently prescribed. MACE development was associated with multivessel (Hazard Ratio (HR) = 2.49) and single coronary vessel disease (HR = 1.71), transient ischemic attack (HR = 2.01), heart failure (HR = 1.32), insulin treatment (HR = 1.40), and percutaneous coronary intervention (PCI) (HR = 2.27), whilst statins (HR = 0.73) were associated with a lower risk of MACE occurrence. In conclusion, we found six risk factors associated with the development of MACE, which were related to cardiovascular diseases and T2DM severity, and treatment with statins was identified as a protective factor for new MACE in this study.
|
47
|
Patient journey of individuals tested for HCV in Spain: LiverTAI, a retrospective analysis of EHRs through natural language processing. Gastroenterología y Hepatología 2022:S0210-5705(22)00253-9. [PMID: 36273653 DOI: 10.1016/j.gastrohep.2022.10.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Revised: 09/14/2022] [Accepted: 10/16/2022] [Indexed: 11/27/2022]
|
48
|
Sun G, Dong D, Dong Z, Zhang Q, Fang H, Wang C, Zhang S, Wu S, Dong Y, Wan Y. Drug repositioning: A bibliometric analysis. Front Pharmacol 2022; 13:974849. [PMID: 36225586 PMCID: PMC9549161 DOI: 10.3389/fphar.2022.974849] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/21/2022] [Accepted: 08/12/2022] [Indexed: 11/14/2022] Open
Abstract
Drug repurposing has become an effective approach to drug discovery, as it offers a new way to explore drugs. Based on the Science Citation Index Expanded (SCI-E) and Social Sciences Citation Index (SSCI) databases of the Web of Science core collection, this study presents a bibliometric analysis of drug repurposing publications from 2010 to 2020. Data were cleaned, mined, and visualized using Derwent Data Analyzer (DDA) software. An overview of the history and development trend of the number of publications, major journals, major countries, major institutions, author keywords, major contributors, and major research fields is provided. There were 2,978 publications included in the study. The findings show that the United States leads in this area of research, followed by China, the United Kingdom, and India. The Chinese Academy of Science published the most research studies, and NIH ranked first on the h-index. The Icahn School of Medicine at Mt Sinai leads in the average number of citations per study. Sci Rep, Drug Discov. Today, and Brief. Bioinform. are the three most productive journals evaluated from three separate perspectives, and pharmacology and pharmacy are unquestionably the most commonly used subject categories. Cheng, FX; Mucke, HAM; and Butte, AJ are the top 20 most prolific and influential authors. Keyword analysis shows that in recent years, most research has focused on drug discovery/drug development, COVID-19/SARS-CoV-2/coronavirus, molecular docking, virtual screening, cancer, and other research areas. The hotspots have changed in recent years, with COVID-19/SARS-CoV-2/coronavirus being the most popular topic for current drug repurposing research.
Affiliation(s)
- Guojun Sun
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Dashun Dong
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Zuojun Dong
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Qian Zhang
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Hui Fang
  - Institute of Information Resource, Zhejiang University of Technology, Hangzhou, China
- Chaojun Wang
  - Hangzhou Aeronautical Sanatorium for Special Service of Chinese Air Force, Hangzhou, China
- Shaoya Zhang
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Shuaijun Wu
  - Institute of Pharmaceutical Preparations, Department of Pharmacy, Zhejiang University of Technology, Hangzhou, China
- Yichen Dong
  - Faculty of Chinese Medicine, Macau University of Science and Technology, Macau, China
- Yuehua Wan
  - Institute of Information Resource, Zhejiang University of Technology, Hangzhou, China
|
49
|
Acosta JD, Chandra A, Yeung D, Nelson C, Qureshi N, Blagg T, Martin LT. What Data Should Be Included in a Modern Public Health Data System. Big Data 2022; 10:S9-S14. [PMID: 36070507 PMCID: PMC9508449 DOI: 10.1089/big.2022.0205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The public is inundated with data, both in where data are ubiquitously collected and in how organizations are using data to drive public sector and commercial decisions. The public health data system is no exception to this flood of data, both in growing data volume and variety. However, what are collected and analyzed about the health status of the nation, how particular data and measures are prioritized for parsimony, and how those data provide a signal for where to invest to address health inequities are in dire need of a reboot. As with other articles in this supplement, this article builds from a literature review, an environmental scan, and deliberations from the National Commission to Transform Public Health Data Systems. The article summarizes what data should be included and identifies where the technology and data sectors can contribute to fill current gaps to measure equity, positive health, and well-being.
Affiliation(s)
- Nabeel Qureshi
  - Pardee RAND Graduate School, Santa Monica, California, USA
- Tara Blagg
  - Pardee RAND Graduate School, Santa Monica, California, USA
|
50
|
Abstract
Authors' views on the role of artificial intelligence and machine learning in pharmacovigilance. (MP4 139807 kb).
Affiliation(s)
- Andrew Bate
  - GSK, Brentford, UK
  - LSHTM, London, UK
  - New York University, New York, NY, USA
- Yuan Luo
  - Northwestern University, Evanston, IL, USA
|