1
Hur B, Verspoor KM, Baldwin T, Hardefeldt LY, Pfeiffer C, Mansfield C, Scarborough R, Gilkerson JR. Using natural language processing and patient journey clustering for temporal phenotyping of antimicrobial therapies for cat bite abscesses. Prev Vet Med 2024; 223:106112. [PMID: 38176151 DOI: 10.1016/j.prevetmed.2023.106112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 11/09/2023] [Accepted: 12/18/2023] [Indexed: 01/06/2024]
Abstract
BACKGROUND Temporal phenotyping of patient journeys, which captures the common sequence patterns of interventions in the treatment of a specific condition, is useful to support understanding of antimicrobial usage in veterinary patients. Identifying and describing these phenotypes can inform antimicrobial stewardship programs designed to fight antimicrobial resistance, a major health crisis affecting both humans and animals, in which veterinarians have an important role to play.

OBJECTIVE This research proposes a framework for extracting temporal phenotypes of patient journeys from clinical practice data through the application of natural language processing (NLP) and unsupervised machine learning (ML) techniques, using cat bite abscesses as a model condition. By constructing temporal phenotypes from key events, the relationship between antimicrobial administration and surgical interventions can be described, and similar treatment patterns can be grouped together to describe outcomes associated with specific antimicrobial selection.

METHODS Cases identified as having a cat bite abscess as a diagnosis were extracted from VetCompass Australia, a database of veterinary clinical records. A classifier was trained and used to label the most clinically relevant event features in each record as chosen by a group of veterinarians. The labeled records were processed into coded character strings, where each letter represents a summary of specific types of treatments performed at a given visit. The sequences of letters representing the cases were clustered based on weighted Levenshtein edit distances with KMeans++ to identify the main variations of the patient treatment journeys, including the antimicrobials used and their duration of administration.

RESULTS A total of 13,744 records that met the selection criteria were extracted and grouped into 8,436 cases. There were 9 clinically distinct event sequence patterns (temporal phenotypes) of patient journeys identified, representing the main sequences in which surgery and antimicrobial interventions are performed. Patients receiving amoxicillin and surgery had the shortest duration of antimicrobial administration (median of 3.4 days) and patients receiving cefovecin with no surgical intervention had the longest antimicrobial treatment duration (median of 27 days).

CONCLUSION Our study demonstrates methods to extract and provide an overview of temporal phenotypes of patient journeys, which can be applied to text-based clinical records for multiple species or clinical conditions. We demonstrate the effectiveness of this approach to derive real-world evidence of treatment impacts using cat bite abscesses as a model condition to describe patterns of antimicrobial therapy prescriptions and their outcomes.
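
The weighted edit distance named in the METHODS above can be illustrated with a minimal sketch. The letter codes and the substitution weights below are hypothetical and are not taken from the article; the abstract does not state the actual coding scheme or weights, and the clustering step is not shown.

```python
def weighted_levenshtein(a, b, ins_cost=1.0, del_cost=1.0, sub_cost=None):
    """Edit distance between two coded visit strings with configurable costs.

    sub_cost(x, y) returns the cost of replacing letter x with letter y.
    """
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,              # delete ca
                curr[j - 1] + ins_cost,          # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # substitute ca with cb
            ))
        prev = curr
    return prev[-1]


# Hypothetical visit codes (assumption, not from the article):
# A = antimicrobial only, B = antimicrobial plus surgery, S = surgery only.
def visit_sub_cost(x, y):
    if x == y:
        return 0.0
    if {x, y} == {"A", "B"}:
        return 0.5  # assumed: clinically similar visits are cheap to swap
    return 1.0


print(weighted_levenshtein("AAS", "ABS", sub_cost=visit_sub_cost))  # 0.5
```

A pairwise matrix of such distances is what a clustering step would consume; how the authors combined it with KMeans++ is not described in the abstract.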
Affiliation(s)
- Brian Hur
  - Asia-Pacific Centre for Animal Health, Melbourne Veterinary School, University of Melbourne, Parkville, Victoria, Australia; School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia; Biomedical Informatics and Medical Education, University of Washington School of Medicine, Seattle, WA, USA
- Karin M Verspoor
  - School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia; School of Computing Technologies, RMIT University, Melbourne, Victoria, Australia
- Timothy Baldwin
  - School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia; Department of Natural Language Processing, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Laura Y Hardefeldt
  - Asia-Pacific Centre for Animal Health, Melbourne Veterinary School, University of Melbourne, Parkville, Victoria, Australia
- Caitlin Pfeiffer
  - Asia-Pacific Centre for Animal Health, Melbourne Veterinary School, University of Melbourne, Parkville, Victoria, Australia
- Caroline Mansfield
  - Asia-Pacific Centre for Animal Health, Melbourne Veterinary School, University of Melbourne, Parkville, Victoria, Australia
- Riati Scarborough
  - Asia-Pacific Centre for Animal Health, Melbourne Veterinary School, University of Melbourne, Parkville, Victoria, Australia
- James R Gilkerson
  - Asia-Pacific Centre for Animal Health, Melbourne Veterinary School, University of Melbourne, Parkville, Victoria, Australia
2
Flothow A, Novelli A, Sundmacher L. Analytical methods for identifying sequences of utilization in health data: a scoping review. BMC Med Res Methodol 2023; 23:212. [PMID: 37759162 PMCID: PMC10523647 DOI: 10.1186/s12874-023-02019-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 08/08/2023] [Indexed: 09/29/2023] Open
Abstract
BACKGROUND Healthcare, as with other sectors, has undergone progressive digitalization, generating an ever-increasing wealth of data that enables research and the analysis of patient movement. This can help to evaluate treatment processes and outcomes, and in turn improve the quality of care. This scoping review provides an overview of the algorithms and methods that have been used to identify care pathways from healthcare utilization data.

METHOD This review was conducted according to the methodology of the Joanna Briggs Institute and the Preferred Reporting Items for Systematic Reviews Extension for Scoping Reviews (PRISMA-ScR) Checklist. The PubMed, Web of Science, Scopus, and EconLit databases were searched and studies published in English between 2000 and 2021 considered. The search strategy used keywords divided into three categories: the method of data analysis, the requirement profile for the data, and the intended presentation of results. Criteria for inclusion were that health data were analyzed, the methodology used was described and that the chronology of care events was considered. In a two-stage review process, records were reviewed by two researchers independently for inclusion. Results were synthesized narratively.

RESULTS The literature search yielded 2,865 entries; 51 studies met the inclusion criteria. Health data from different countries ([Formula: see text]) and of different types of disease ([Formula: see text]) were analyzed with respect to different care events. Applied methods can be divided into those identifying subsequences of care and those describing full care trajectories. Variants of pattern mining or Markov models were mostly used to extract subsequences, with clustering often applied to find care trajectories. Statistical algorithms such as rule mining, probability-based machine learning algorithms or a combination of methods were also applied. Clustering methods were sometimes used for data preparation or result compression. Further characteristics of the included studies are presented.

CONCLUSION Various data mining methods are already being applied to gain insight from health data. The great heterogeneity of the methods used shows the need for a scoping review. We performed a narrative review and found that clustering methods currently dominate the literature for identifying complete care trajectories, while variants of pattern mining dominate for identifying subsequences of limited length.
Affiliation(s)
- Amelie Flothow
  - Chair of Health Economics, Technical University of Munich, Georg-Brauchle-Ring, Munich, Bavaria, 80992, Germany
- Anna Novelli
  - Chair of Health Economics, Technical University of Munich, Georg-Brauchle-Ring, Munich, Bavaria, 80992, Germany
- Leonie Sundmacher
  - Chair of Health Economics, Technical University of Munich, Georg-Brauchle-Ring, Munich, Bavaria, 80992, Germany
3
Lovis C, Siebel J, Fuhrmann S, Fischer A, Sedlmayr M, Weidner J, Bathelt F. Assessment and Improvement of Drug Data Structuredness From Electronic Health Records: Algorithm Development and Validation. JMIR Med Inform 2023; 11:e40312. [PMID: 36696159 PMCID: PMC9909518 DOI: 10.2196/40312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 09/27/2022] [Accepted: 11/18/2022] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Digitization offers a multitude of opportunities to gain insights into current diagnostics and therapies from retrospective data. In this context, real-world data and their accessibility are of increasing importance to support unbiased and reliable research on big data. However, routinely collected data are not readily usable for research owing to the unstructured nature of health care systems and a lack of interoperability between these systems. This challenge is evident in drug data.

OBJECTIVE This study aimed to present an approach that identifies and increases the structuredness of drug data while ensuring standardization according to Anatomical Therapeutic Chemical (ATC) classification.

METHODS Our approach was based on available drug prescriptions and a drug catalog and consisted of 4 steps. First, we performed an initial analysis of the structuredness of local drug data to define a point of comparison for the effectiveness of the overall approach. Second, we applied 3 algorithms to unstructured data that translated text into ATC codes based on string comparisons in terms of ingredients and product names and performed similarity comparisons based on Levenshtein distance. Third, we validated the results of the 3 algorithms with expert knowledge based on the 1000 most frequently used prescription texts. Fourth, we performed a final validation to determine the increased degree of structuredness.

RESULTS Initially, 47.73% (n=843,980) of 1,768,153 drug prescriptions were classified as structured. With the application of the 3 algorithms, we were able to increase the degree of structuredness to 85.18% (n=1,506,059) based on the 1000 most frequent medication prescriptions. In this regard, the combination of algorithms 1, 2, and 3 resulted in a correctness level of 100% (with 57,264 ATC codes identified), algorithms 1 and 3 resulted in 99.6% (with 152,404 codes identified), and algorithms 1 and 2 resulted in 95.9% (with 39,472 codes identified).

CONCLUSIONS As shown in the first analysis steps of our approach, the availability of a product catalog to select during the documentation process is not sufficient to generate structured data. Our 4-step approach reduces the problems and reliably increases the structuredness automatically. Similarity matching shows promising results, particularly for entries with no connection to a product catalog. However, further enhancement of the correctness of such a similarity matching algorithm needs to be investigated in future work.
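
As a rough illustration of the Levenshtein-based similarity matching mentioned in the METHODS above, the sketch below maps a free-text prescription entry to the closest catalog name. The catalog contents, the `match_atc` helper, the normalization, and the 0.8 threshold are all assumptions made for this example; the abstract does not describe the three algorithms in this detail.

```python
def levenshtein(a, b):
    """Plain edit distance (unit cost for insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalize the distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Tiny illustrative catalog: ingredient name -> ATC code.
CATALOG = {
    "amoxicillin": "J01CA04",
    "ibuprofen": "M01AE01",
    "paracetamol": "N02BE01",
    "metformin": "A10BA02",
}


def match_atc(text, catalog=CATALOG, threshold=0.8):
    """Return (atc_code, score) for the best match, or None below the threshold."""
    query = text.strip().lower()
    best_name, best_score = None, 0.0
    for name in catalog:
        score = similarity(query, name)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None  # too dissimilar: leave the entry unstructured
    return catalog[best_name], best_score


print(match_atc("Ibuprofin"))  # a misspelled entry still resolves to M01AE01
print(match_atc("xyz"))        # None
```

Returning `None` below the threshold keeps uncertain entries out of the structured set, which matters when correctness is validated against expert review.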
Affiliation(s)
- Joscha Siebel
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Saskia Fuhrmann
  - Center for Evidence-Based Healthcare, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany; Hospital Pharmacy, University Hospital Carl Gustav Carus, Dresden, Germany
- Andreas Fischer
  - Hospital Pharmacy, University Hospital Carl Gustav Carus, Dresden, Germany
- Martin Sedlmayr
  - Center for Evidence-Based Healthcare, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Jens Weidner
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Franziska Bathelt
  - Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
4
Yu SC, Hofford MR, Lai AM, Kollef MH, Payne PRO, Michelson AP. OUP accepted manuscript. J Am Med Inform Assoc 2022; 29:813-821. [PMID: 35092276 PMCID: PMC9006699 DOI: 10.1093/jamia/ocac005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/05/2021] [Revised: 12/09/2021] [Accepted: 01/11/2022] [Indexed: 11/14/2022] Open
Abstract
Objective Materials and Methods Results Conclusion
Affiliation(s)
- Sean C Yu
  - Corresponding Author: Sean C. Yu, MS, Washington University School of Medicine in St. Louis, 4444 Forest Park Avenue, Suite 6318, St. Louis, MO 63108, USA
- Mackenzie R Hofford
  - Institute for Informatics, Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, Missouri, USA
  - Division of General Medicine, Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, Missouri, USA
- Albert M Lai
  - Institute for Informatics, Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, Missouri, USA
- Marin H Kollef
  - Division of Pulmonary and Critical Care, Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, Missouri, USA
- Philip R O Payne
  - Institute for Informatics, Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, Missouri, USA
- Andrew P Michelson
  - Institute for Informatics, Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, Missouri, USA
  - Division of Pulmonary and Critical Care, Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, Missouri, USA
5
Staples LL, Tamayo M, Yockey BD, Rudd JM, Hill N, Fontana SJ, Ray HE, DeMaio J. Characterizing managing physicians by claims sequences in episodes of care. J Biomed Inform 2021; 117:103759. [PMID: 33766779 DOI: 10.1016/j.jbi.2021.103759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2020] [Revised: 03/15/2021] [Accepted: 03/17/2021] [Indexed: 11/24/2022]
Abstract
Value-based healthcare in the US is a payment structure that ties reimbursement to quality rather than volume alone. One model of value-based care is the Tennessee Division of TennCare's Episodes of Care program, which groups common health conditions into episodes using specified time windows, medical code sets and quality metrics as defined in each episode's Detailed Business Requirements [1,2]. Tennessee's program assigns responsibility for an episode to a managing physician, presenting a unique opportunity to study physician variability in cost and quality within these structured episodes. This paper proposes a pipeline for analysis demonstrated using a cohort of 599 Outpatient and Non-Acute Inpatient Cholecystectomy episodes managed by BlueCross BlueShield of Tennessee in 2016. We sorted episode claims by date of service, then calculated the pairwise Levenshtein distance between all episodes. Next, we adjusted the resulting matrix by cost dissimilarity and performed agglomerative clustering. We then examined the lowest and highest average episode cost clusters for patterns in cost and quality. Our results indicate that the facility type where the surgery takes place is important: outpatient ambulatory care center for the lowest cost cluster, and hospital operating room for the highest cost cluster. Average patient risk scores were higher in the highest cost cluster than the lowest cost cluster. Readmission rate (a quality metric tied to managing physician performance) was low for the whole cohort. Lastly, we explain how our analytical pipeline can be generalized and extended to domains beyond Episodes of Care.
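
The pipeline described above (pairwise Levenshtein distances between date-sorted claim sequences, an adjustment for cost dissimilarity, then agglomerative clustering) can be sketched roughly as follows. The claim codes, the costs, the equal-weight blend of sequence and cost distance, and the choice of average linkage are all assumptions for illustration; the abstract does not specify how the matrix was adjusted or which linkage was used.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance over sequences of claim codes (lists or strings)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def combined_distances(episodes, costs, cost_weight=0.5):
    """Blend normalized sequence distance with normalized cost difference."""
    n = len(episodes)
    cost_range = (max(costs) - min(costs)) or 1.0
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        longest = max(len(episodes[i]), len(episodes[j])) or 1
        seq_d = levenshtein(episodes[i], episodes[j]) / longest
        cost_d = abs(costs[i] - costs[j]) / cost_range
        dist[i][j] = dist[j][i] = (1 - cost_weight) * seq_d + cost_weight * cost_d
    return dist


def agglomerative(dist, n_clusters):
    """Naive average-linkage clustering: merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]

    def avg_link(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_link(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


# Hypothetical episodes: claim codes already sorted by date of service.
episodes = [
    ["OFFICE", "LAB", "SURG_ASC"],
    ["OFFICE", "LAB", "SURG_ASC", "FOLLOWUP"],
    ["ER", "IMAGING", "SURG_HOSP", "INPATIENT"],
    ["ER", "IMAGING", "LAB", "SURG_HOSP", "INPATIENT"],
]
costs = [4000, 4500, 15000, 16500]

print(agglomerative(combined_distances(episodes, costs), n_clusters=2))
# [[0, 1], [2, 3]]
```

The naive merge loop is cubic or worse in the number of episodes, which is acceptable for a few hundred episodes but would call for a library implementation on larger cohorts.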
Affiliation(s)
- Lauren L Staples
  - Analytics and Data Science Institute, Kennesaw State University, GA, USA; Provider Performance Analytics, BlueCross BlueShield of Tennessee, TN, USA
- Morgan Tamayo
  - Analytics and Data Science Institute, Kennesaw State University, GA, USA; Provider Performance Analytics, BlueCross BlueShield of Tennessee, TN, USA
- Bryan D Yockey
  - Analytics and Data Science Institute, Kennesaw State University, GA, USA; Provider Performance Analytics, BlueCross BlueShield of Tennessee, TN, USA
- Jessica M Rudd
  - Analytics and Data Science Institute, Kennesaw State University, GA, USA; Provider Performance Analytics, BlueCross BlueShield of Tennessee, TN, USA
- Nicole Hill
  - Provider Performance Analytics, BlueCross BlueShield of Tennessee, TN, USA
- Scott J Fontana
  - Provider Performance Analytics, BlueCross BlueShield of Tennessee, TN, USA
- Herman E Ray
  - Analytics and Data Science Institute, Kennesaw State University, GA, USA
- Joe DeMaio
  - Analytics and Data Science Institute, Kennesaw State University, GA, USA