1. McDaniel CC, Lo-Ciganic WH, Chou C. Diabetes-related complications, glycemic levels, and healthcare utilization outcomes after therapeutic inertia in type 2 diabetes mellitus. Prim Care Diabetes 2024; 18:188-195. [PMID: 38185576 DOI: 10.1016/j.pcd.2023.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 12/21/2023] [Accepted: 12/22/2023] [Indexed: 01/09/2024]
Abstract
AIMS To assess diabetes-related complications, glycemic levels, and healthcare utilization 12 months after exposure to therapeutic inertia among patients with type 2 diabetes mellitus (T2D). METHODS This retrospective cohort study analyzed data from the OneFlorida Clinical Research Consortium (electronic health records from Florida practices/clinics). The cohort included adult patients (≥18 years old) with T2D who had an HbA1c≥7.0% (53 mmol/mol) recorded from January 1, 2014-September 30, 2019. Therapeutic inertia (exposed vs. not exposed) was evaluated during the six months following HbA1c≥7.0% (53 mmol/mol). The outcomes assessed during the 12-month follow-up period included diabetes-related complications (continuous Diabetes Complications and Severity Index (DCSI)), glycemic levels (continuous follow-up HbA1c lab), and healthcare utilization counts. We analyzed data using multivariable regression models, adjusting for covariates. RESULTS The cohort included 26,881 patients with T2D (58.94% White race, 49.72% female, and mean age of 58.82 (SD=13.09)). After adjusting for covariates, therapeutic inertia exposure was associated with lower DCSI (estimate=-0.14 (SE=0.03), p < 0.001), higher follow-up HbA1c (estimate=0.14 (SE=0.04), p < 0.001), and lower rates of ambulatory visits (rate ratio=0.79, 95% CI=0.75-0.82). CONCLUSIONS Findings communicate the clinical practice implications and public health implications for combating therapeutic inertia in diabetes care.
Affiliation(s)
- Cassidi C McDaniel
  - Department of Health Outcomes Research and Policy, Harrison College of Pharmacy, Auburn University, Auburn, AL, USA
- Wei-Hsuan Lo-Ciganic
  - Department of Pharmaceutical Outcomes and Policy, University of Florida, College of Pharmacy, Gainesville, FL, USA; Division of General Internal Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; Center for Pharmaceutical Policy and Prescribing, University of Pittsburgh, Pittsburgh, PA, USA; North Florida/South Georgia Veterans Health System, Geriatric Research Education and Clinical Center, Gainesville, FL, USA
- Chiahung Chou
  - Department of Health Outcomes Research and Policy, Harrison College of Pharmacy, Auburn University, Auburn, AL, USA; Department of Medical Research, China Medical University Hospital, Taichung City, Taiwan
2. Zayed AM, Saegeman V, Delvaux N. Establishing the Reportable Interval for Routine Clinical Laboratory Tests: A Data-Driven Strategy Leveraging Retrospective Electronic Medical Record Data. J Appl Lab Med 2024:jfae021. [PMID: 38642405 DOI: 10.1093/jalm/jfae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 02/05/2024] [Indexed: 04/22/2024]
Abstract
BACKGROUND This paper presents a data-driven strategy for establishing the reportable interval in clinical laboratory testing. The reportable interval defines the range of laboratory result values beyond which reporting should be withheld. The lack of clear guidelines and methodology for determining the reportable interval has led to potential errors in reporting and patient risk. METHODS To address this gap, the study developed an integrated strategy that combines statistical analysis, expert review, and hypothetical outlier calculations. A large data set from an accredited clinical laboratory was utilized, analyzing over 124 million laboratory test records from 916 distinct tests. The Dixon test was applied to identify outliers and establish the highest and lowest non-outlier result values for each test, which were validated by clinical pathology experts. The methodology also included matching the reportable intervals with relevant Logical Observation Identifiers Names and Codes (LOINC) and Unified Code for Units of Measure (UCUM)-valid units for broader applicability. RESULTS Upon establishing the reportable interval for 135 routine laboratory tests (493 LOINC codes), we applied these to a primary care laboratory data set of 23 million records, demonstrating their efficacy with over 1% of result records identified as implausible. CONCLUSIONS We developed and tested a data-driven strategy for establishing reportable intervals utilizing large electronic medical record (EMR) data sets. Implementing the established interval in clinical laboratory settings can improve autoverification systems, enhance data reliability, and reduce errors in patient care. Ongoing refinement and reporting of cases exceeding the reportable limits will contribute to continuous improvement in laboratory result management and patient safety.
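As a rough illustration of how an established reportable interval might be applied to incoming results, the sketch below flags values that fall outside a per-test range. The interval table, the choice of LOINC codes as keys, the limits, and the function names are all placeholders invented for this example; none of them are taken from the study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    low: float
    high: float
    unit: str  # UCUM-style unit string the limits are expressed in


# Hypothetical reportable intervals keyed by LOINC code.
# The limits are placeholders for illustration, not values from the study.
REPORTABLE = {
    "2345-7": Interval(0.0, 2000.0, "mg/dL"),  # glucose, serum/plasma
    "718-7": Interval(1.0, 30.0, "g/dL"),      # hemoglobin, blood
}


def check_result(loinc: str, value: float, unit: str) -> str:
    """Classify one result as 'reportable', 'implausible', or 'unchecked'."""
    interval = REPORTABLE.get(loinc)
    # Without a matching code and unit the value cannot be judged.
    if interval is None or interval.unit != unit:
        return "unchecked"
    if interval.low <= value <= interval.high:
        return "reportable"
    return "implausible"


def implausible_fraction(records) -> float:
    """Share of judged records that fall outside their reportable interval."""
    verdicts = [check_result(*record) for record in records]
    judged = [v for v in verdicts if v != "unchecked"]
    return judged.count("implausible") / len(judged) if judged else 0.0


records = [
    ("2345-7", 95.0, "mg/dL"),
    ("2345-7", 25000.0, "mg/dL"),
    ("718-7", 14.0, "mmol/L"),  # unit mismatch, left unchecked
]
print(implausible_fraction(records))
```

Results whose code or unit does not match the table are left unchecked rather than guessed at, which keeps a unit mismatch from being miscounted as an implausible value.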
Affiliation(s)
- Ahmed M Zayed
  - Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
  - Laboratory Medicine Department, Menoufia University National Liver Institute, Shebin El-Kom, Egypt
- Veroniek Saegeman
  - Vitaz Sint-Niklaas Moerland, Clinical Laboratory, Sint-Niklaas, Oost-Vlaanderen, Belgium
- Nicolas Delvaux
  - Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
3. Ru B, Sillah A, Desai K, Chandwani S, Yao L, Kothari S. Real-World Data Quality Framework for Oncology Time to Treatment Discontinuation Use Case: Implementation and Evaluation Study. JMIR Med Inform 2024; 12:e47744. [PMID: 38446504 PMCID: PMC10955397 DOI: 10.2196/47744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 11/30/2023] [Accepted: 01/14/2024] [Indexed: 03/07/2024]
Abstract
BACKGROUND The importance of real-world evidence is widely recognized in observational oncology studies. However, the lack of interoperable data quality standards in the fragmented health information technology landscape represents an important challenge. Therefore, adopting validated systematic methods for evaluating data quality is important for oncology outcomes research leveraging real-world data (RWD). OBJECTIVE This study aims to implement real-world time to treatment discontinuation (rwTTD) for a systemic anticancer therapy (SACT) as a new use case for the Use Case Specific Relevance and Quality Assessment, a framework linking data quality and relevance in fit-for-purpose RWD assessment. METHODS To define the rwTTD use case, we mapped the operational definition of rwTTD to RWD elements commonly available from oncology electronic health record-derived data sets. We identified 20 tasks to check the completeness and plausibility of data elements concerning SACT use, line of therapy (LOT), death date, and length of follow-up. Using descriptive statistics, we illustrated how to implement the Use Case Specific Relevance and Quality Assessment on 2 oncology databases (Data sets A and B) to estimate the rwTTD of an SACT drug (target SACT) for patients with advanced head and neck cancer diagnosed on or after January 1, 2015. RESULTS A total of 1200 (24.96%) of 4808 patients in Data set A and 237 (5.92%) of 4003 patients in Data set B received the target SACT, suggesting better relevance of the former in estimating the rwTTD of the target SACT. The 2 data sets differed with regard to the terminology used for SACT drugs, LOT format, and target SACT LOT distribution over time. Data set B appeared to have less complete SACT records, longer lags in incorporating the latest data, and incomplete mortality data, suggesting a lack of fitness for estimating rwTTD. 
CONCLUSIONS The fit-for-purpose data quality assessment demonstrated substantial variability in the quality of the 2 real-world data sets. The data quality specifications applied for rwTTD estimation can be expanded to support a broad spectrum of oncology use cases.
Affiliation(s)
- Boshu Ru
  - Center for Observational and Real-world Evidence (CORE), Merck & Co, Inc, West Point, PA, United States
- Arthur Sillah
  - Center for Observational and Real-world Evidence (CORE), Merck & Co, Inc, West Point, PA, United States
- Kaushal Desai
  - Center for Observational and Real-world Evidence (CORE), Merck & Co, Inc, West Point, PA, United States
- Sheenu Chandwani
  - Center for Observational and Real-world Evidence (CORE), Merck & Co, Inc, West Point, PA, United States
- Lixia Yao
  - Center for Observational and Real-world Evidence (CORE), Merck & Co, Inc, West Point, PA, United States
- Smita Kothari
  - Center for Observational and Real-world Evidence (CORE), Merck & Co, Inc, West Point, PA, United States
4. Lewis AE, Weiskopf N, Abrams ZB, Foraker R, Lai AM, Payne PRO, Gupta A. Electronic health record data quality assessment and tools: a systematic review. J Am Med Inform Assoc 2023; 30:1730-1740. [PMID: 37390812 PMCID: PMC10531113 DOI: 10.1093/jamia/ocad120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 05/16/2023] [Accepted: 06/23/2023] [Indexed: 07/02/2023]
Abstract
OBJECTIVE We extended a 2013 literature review on electronic health record (EHR) data quality assessment approaches and tools to determine recent improvements or changes in EHR data quality assessment methodologies. MATERIALS AND METHODS We completed a systematic review of PubMed articles from 2013 to April 2023 that discussed the quality assessment of EHR data. We screened and reviewed papers for the dimensions and methods defined in the original 2013 manuscript. We categorized papers as data quality outcomes of interest, tools, or opinion pieces. We abstracted and defined additional themes and methods through an iterative review process. RESULTS We included 103 papers in the review, of which 73 were data quality outcomes of interest papers, 22 were tools, and 8 were opinion pieces. The most common dimension of data quality assessed was completeness, followed by correctness, concordance, plausibility, and currency. We abstracted conformance and bias as 2 additional dimensions of data quality and structural agreement as an additional methodology. DISCUSSION There has been an increase in EHR data quality assessment publications since the original 2013 review. Consistent dimensions of EHR data quality continue to be assessed across applications. Despite consistent patterns of assessment, there still does not exist a standard approach for assessing EHR data quality. CONCLUSION Guidelines are needed for EHR data quality assessment to improve the efficiency, transparency, comparability, and interoperability of data quality assessment. These guidelines must be both scalable and flexible. Automation could be helpful in generalizing this process.
Affiliation(s)
- Abigail E Lewis
  - Division of Computational and Data Sciences, Washington University in St. Louis, St. Louis, Missouri, USA
  - Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
- Nicole Weiskopf
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Zachary B Abrams
  - Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
- Randi Foraker
  - Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
- Albert M Lai
  - Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
- Philip R O Payne
  - Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
- Aditi Gupta
  - Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
5. de la Iglesia I, Vivó M, Chocrón P, Maeztu GD, Gojenola K, Atutxa A. An open source corpus and automatic tool for section identification in Spanish health records. J Biomed Inform 2023; 145:104461. [PMID: 37536643 DOI: 10.1016/j.jbi.2023.104461] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/04/2023] [Revised: 07/12/2023] [Accepted: 07/25/2023] [Indexed: 08/05/2023]
Abstract
BACKGROUND Electronic Clinical Narratives (ECNs) store valuable health information about individuals. However, few open-source data sets are available. In addition, ECNs can be structurally heterogeneous, ranging from documents with explicit section headings or titles to unstructured notes. This lack of structure complicates building automatic systems and evaluating them. OBJECTIVE The aim of the present work is to provide the scientific community with a Spanish open-source dataset to build and evaluate automatic section identification systems. Together with this dataset, the purpose is to design and implement a suitable evaluation measure and a fine-tuned language model adapted to the task. MATERIALS AND METHODS A corpus of unstructured clinical records, in this case progress notes written in Spanish, was annotated with seven major section types. Existing metrics for the presented task were thoroughly assessed and, based on the most suitable one, we defined a new B2 metric better tailored to the task. RESULTS The annotated corpus, as well as the newly designed evaluation script and a baseline model, are freely available to the community. This model reaches an average B2 score of 71.3 on our open source dataset and an average B2 of 67.0 in data scarcity scenarios where the target corpus and its structure differ from the dataset used for training the LM. CONCLUSION Although section identification in unstructured clinical narratives is challenging, this work shows that it is possible to build competitive automatic systems when both data and the right evaluation metrics are available. The annotated data, the implemented evaluation scripts, and the section identification Language Model are open-sourced in the hope that this contribution will foster the building of more and better systems.
Affiliation(s)
- Iker de la Iglesia
  - HiTZ Basque Center for Language Technology Faculty of Engineering Bilbao University of the Basque Country (UPV/EHU), Spain
- María Vivó
  - IOMED Medical Solutions SL, Barcelona, Spain
- Koldo Gojenola
  - HiTZ Basque Center for Language Technology Faculty of Engineering Bilbao University of the Basque Country (UPV/EHU), Spain
- Aitziber Atutxa
  - HiTZ Basque Center for Language Technology Faculty of Engineering Bilbao University of the Basque Country (UPV/EHU), Spain
6. Baechle C, Lang A, Strassburger K, Kuss O, Burkart V, Szendroedi J, Müssig K, Weber KS, Schrauwen-Hinderling V, Herder C, Roden M, Schlesinger S. Association of a lifestyle score with cardiometabolic markers among individuals with diabetes: a cross-sectional study. BMJ Open Diabetes Res Care 2023; 11:e003469. [PMID: 37433698 DOI: 10.1136/bmjdrc-2023-003469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 06/13/2023] [Indexed: 07/13/2023]
Abstract
INTRODUCTION To investigate the associations of a lifestyle score with various cardiovascular risk markers, indicators for fatty liver disease as well as MRI-determined total, subcutaneous and visceral adipose tissue mass in adults with new-onset diabetes. RESEARCH DESIGN AND METHODS This cross-sectional analysis included 196 individuals with type 1 (median age: 35 years; median body mass index (BMI): 24 kg/m²) and 272 with type 2 diabetes (median age: 53 years; median BMI: 31 kg/m²) from the German Diabetes Study. A healthy lifestyle score was generated based on healthy diet, moderate alcohol consumption, recreational activity, non-smoking and non-obese BMI. These factors were summed to form a score ranging from 0 to 5. Multivariable linear and non-linear regression models were used. RESULTS In total, 8.1% of the individuals adhered to none or one, 17.7% to two, 29.7% to three, 26.7% to four, and 17.7% to all five favorable lifestyle factors. High compared with low adherence to the lifestyle score was associated with more favorable outcome measures, including triglycerides (β (95% CI) -49.1 mg/dL (-76.7; -21.4)), low-density lipoprotein (-16.7 mg/dL (-31.3; -2.0)), and high-density lipoprotein cholesterol (13.5 mg/dL (7.6; 19.4)), glycated hemoglobin (-0.5% (-0.8%; -0.1%)), high-sensitivity C reactive protein (-0.4 mg/dL (-0.6; -0.2)), as well as lower hepatic fat content (-8.3% (-11.9%; -4.7%)), and visceral adipose tissue mass (-1.8 dm³ (-2.9; -0.7)). The dose-response analyses showed that adherence to every additional healthy lifestyle factor was associated with more beneficial risk profiles. CONCLUSIONS Adherence to each additional healthy lifestyle factor was beneficially associated with cardiovascular risk markers, indicators of fatty liver disease and adipose tissue mass. Strongest associations were observed for adherence to all healthy lifestyle factors in combination. TRIAL REGISTRATION NUMBER NCT01055093.
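The score construction described in the abstract (five favorable factors summed to a value from 0 to 5) can be sketched as follows. The class and field names are hypothetical, and each factor is assumed to have been dichotomized already; the abstract does not give the cut-offs used to define, for example, a healthy diet or moderate alcohol consumption.

```python
from dataclasses import dataclass, astuple


@dataclass
class LifestyleFactors:
    # Each flag is True when the favorable behaviour is met.
    # How each flag is derived is an assumption left outside this sketch.
    healthy_diet: bool
    moderate_alcohol: bool
    recreational_activity: bool
    non_smoking: bool
    non_obese_bmi: bool


def lifestyle_score(factors: LifestyleFactors) -> int:
    """Sum the five binary factors into a score from 0 to 5."""
    return sum(bool(flag) for flag in astuple(factors))


participant = LifestyleFactors(
    healthy_diet=True,
    moderate_alcohol=False,
    recreational_activity=True,
    non_smoking=True,
    non_obese_bmi=False,
)
print(lifestyle_score(participant))
```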
Affiliation(s)
- Christina Baechle
  - Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
- Alexander Lang
  - Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
- Klaus Strassburger
  - Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
- Oliver Kuss
  - Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
  - Center for Health and Society, Medical Faculty and University Hospital Duesseldorf, Heinrich Heine University, Duesseldorf, Germany
- Volker Burkart
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
- Julia Szendroedi
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
  - Internal Medicine I and Clinical Chemistry, University Hospital Heidelberg, Heidelberg, Germany
  - Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Duesseldorf, Germany
- Karsten Müssig
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
  - Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Duesseldorf, Germany
  - Department of Internal Medicine and Gastroenterology, Niels Stensen Hospitals, Franziskus Hospital Harderberg, Georgsmarienhutte, Germany
- Katharina Susanne Weber
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
  - Institute for Epidemiology, Kiel University, Kiel, Germany
- Vera Schrauwen-Hinderling
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
- Christian Herder
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
  - Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Duesseldorf, Germany
- Michael Roden
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
  - Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Duesseldorf, Germany
- Sabrina Schlesinger
  - Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
  - German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
7. Röchner P, Rothlauf F. Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries. BMC Med Res Methodol 2023; 23:125. [PMID: 37226114 DOI: 10.1186/s12874-023-01946-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/11/2022] [Accepted: 05/09/2023] [Indexed: 05/26/2023]
Abstract
BACKGROUND Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible. This means that the collected information about a particular patient makes medical sense. METHODS Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. Therefore, this article investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work that analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and a random selection (a total of 785 different records) are evaluated in a real-world scenario by medical domain experts. RESULTS Both anomaly detection methods are good at detecting implausible electronic health records. First, domain experts identified [Formula: see text] of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, [Formula: see text] of the proposed 300 records in each sample were implausible. This corresponds to a precision of [Formula: see text] for FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was [Formula: see text] and the sensitivity of FindFPOF was [Formula: see text].
Both anomaly detection methods had a specificity of [Formula: see text]. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample. CONCLUSIONS Unsupervised anomaly detection can significantly reduce the manual effort of domain experts to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 compared to evaluating a random sample.
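The evaluation measures named in the abstract (precision, sensitivity, specificity) follow from expert-labeled counts. The sketch below shows the arithmetic only; the reported values are elided in the abstract as "[Formula: see text]", so the counts used here are made-up numbers for illustration and the function names are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """Share of records flagged by a detector that experts judged implausible."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly implausible records that the detector flagged."""
    positives = tp + fn
    return tp / positives if positives else 0.0


def specificity(tn: int, fp: int) -> float:
    """Share of plausible records that the detector left unflagged."""
    negatives = tn + fp
    return tn / negatives if negatives else 0.0


# Made-up confusion counts for one detector (not results from the study).
tp, fp, fn, tn = 12, 3, 4, 281

print(f"precision   = {precision(tp, fp):.3f}")
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```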
Affiliation(s)
- Philipp Röchner
  - Information Systems and Business Administration, Johannes Gutenberg University, Jakob-Welder-Weg 9, 55128, Mainz, Germany
- Franz Rothlauf
  - Information Systems and Business Administration, Johannes Gutenberg University, Jakob-Welder-Weg 9, 55128, Mainz, Germany
8. Syed R, Eden R, Makasi T, Chukwudi I, Mamudu A, Kamalpour M, Kapugama Geeganage D, Sadeghianasl S, Leemans SJJ, Goel K, Andrews R, Wynn MT, Ter Hofstede A, Myers T. Digital Health Data Quality Issues: Systematic Review. J Med Internet Res 2023; 25:e42615. [PMID: 37000497 PMCID: PMC10131725 DOI: 10.2196/42615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 12/07/2022] [Accepted: 12/31/2022] [Indexed: 04/01/2023]
Abstract
BACKGROUND The promise of digital health is principally dependent on the ability to electronically capture data that can be analyzed to improve decision-making. However, the ability to effectively harness data has proven elusive, largely because of the quality of the data captured. Despite the importance of data quality (DQ), an agreed-upon DQ taxonomy is still missing from the literature. When consolidated frameworks are developed, the dimensions are often fragmented, without consideration of the interrelationships among the dimensions or their resultant impact. OBJECTIVE The aim of this study was to develop a consolidated digital health DQ dimension and outcome (DQ-DO) framework to provide insights into 3 research questions: What are the dimensions of digital health DQ? How are the dimensions of digital health DQ related? and What are the impacts of digital health DQ? METHODS Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a developmental systematic literature review was conducted of peer-reviewed literature focusing on digital health DQ in predominantly hospital settings. A total of 227 relevant articles were retrieved and inductively analyzed to identify digital health DQ dimensions and outcomes. The inductive analysis was performed through open coding, constant comparison, and card sorting with subject matter experts to identify digital health DQ dimensions and digital health DQ outcomes. Subsequently, a computer-assisted analysis was performed and verified by DQ experts to identify the interrelationships among the DQ dimensions and relationships between DQ dimensions and outcomes. The analysis resulted in the development of the DQ-DO framework.
RESULTS The digital health DQ-DO framework consists of 6 dimensions of DQ, namely accessibility, accuracy, completeness, consistency, contextual validity, and currency; interrelationships among the dimensions of digital health DQ, with consistency being the most influential dimension impacting all other digital health DQ dimensions; 5 digital health DQ outcomes, namely clinical, clinician, research-related, business process, and organizational outcomes; and relationships between the digital health DQ dimensions and DQ outcomes, with the consistency and accessibility dimensions impacting all DQ outcomes. CONCLUSIONS The DQ-DO framework developed in this study demonstrates the complexity of digital health DQ and the necessity for reducing digital health DQ issues. The framework further provides health care executives with holistic insights into DQ issues and resultant outcomes, which can help them prioritize which DQ-related problems to tackle first.
Affiliation(s)
- Rehan Syed
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Rebekah Eden
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Tendai Makasi
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Ignatius Chukwudi
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Azumah Mamudu
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Mostafa Kamalpour
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Dakshi Kapugama Geeganage
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Sareh Sadeghianasl
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Sander J J Leemans
  - Rheinisch-Westfälische Technische Hochschule, Aachen University, Aachen, Germany
- Kanika Goel
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Robert Andrews
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Moe Thandar Wynn
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Arthur Ter Hofstede
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
- Trina Myers
  - School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
9. Ozonze O, Scott PJ, Hopgood AA. Automating Electronic Health Record Data Quality Assessment. J Med Syst 2023; 47:23. [PMID: 36781551 PMCID: PMC9925537 DOI: 10.1007/s10916-022-01892-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/10/2021] [Accepted: 11/15/2022] [Indexed: 02/15/2023]
Abstract
Information systems such as Electronic Health Record (EHR) systems are susceptible to data quality (DQ) issues. Given the growing importance of EHR data, there is an increasing demand for strategies and tools to help ensure that available data are fit for use. However, developing reliable data quality assessment (DQA) tools necessary for guiding and evaluating improvement efforts has remained a fundamental challenge. This review examines the state of research on operationalising EHR DQA, mainly automated tooling, and highlights necessary considerations for future implementations. We reviewed 1841 articles from PubMed, Web of Science, and Scopus published between 2011 and 2021. 23 DQA programs deployed in real-world settings to assess EHR data quality (n = 14), and a few experimental prototypes (n = 9), were identified. Many of these programs investigate completeness (n = 15) and value conformance (n = 12) quality dimensions and are backed by knowledge items gathered from domain experts (n = 9), literature reviews and existing DQ measurements (n = 3). A few DQA programs also explore the feasibility of using data-driven techniques to assess EHR data quality automatically. Overall, the automation of EHR DQA is gaining traction, but current efforts are fragmented and not backed by relevant theory. Existing programs also vary in scope, type of data supported, and how measurements are sourced. There is a need to standardise programs for assessing EHR data quality, as current evidence suggests their quality may be unknown.
Affiliation(s)
- Obinwa Ozonze
- School of Computing, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, UK
| | - Philip J Scott
- Institute of Management and Health, University of Wales Trinity Saint David, Lampeter, SA48 7ED, UK
| | - Adrian A Hopgood
- School of Computing, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, UK.
|
10
|
A Framework for Automatic Clustering of EHR Messages Using a Spatial Clustering Approach. Healthcare (Basel) 2023; 11:healthcare11030390. [PMID: 36766965 PMCID: PMC9914110 DOI: 10.3390/healthcare11030390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 01/12/2023] [Accepted: 01/16/2023] [Indexed: 02/03/2023] Open
Abstract
Although Health Level Seven (HL7) message standards (v2, v3, Clinical Document Architecture (CDA)) have been commonly adopted, there are still issues associated with them, especially semantic interoperability issues and a lack of support for smart devices (e.g., smartphones, fitness trackers, and smartwatches). In addition, healthcare organizations in many countries are still using proprietary electronic health record (EHR) message formats, making it challenging to convert to other data formats, particularly the latest HL7 Fast Healthcare Interoperability Resources (FHIR) data standard. FHIR is based on modern web technologies such as HTTP, XML, and JSON and would be capable of overcoming the shortcomings of the previous standards and supporting modern smart devices. Therefore, the FHIR standard could help the healthcare industry benefit from the latest technologies and improve data interoperability. Data representation and mapping from the legacy data standards (i.e., HL7 v2 and EHR) to FHIR is necessary for the healthcare sector. However, direct data mapping or conversion from the traditional data standards to the FHIR data standard is challenging because of the nature and formats of the data. Therefore, in this article, we propose a framework that aims to convert proprietary EHR messages into the HL7 v2 format and apply an unsupervised clustering approach using the DBSCAN (density-based spatial clustering of applications with noise) algorithm to automatically group a variety of these HL7 v2 messages regardless of their semantic origins. The proposed framework's implementation lays the groundwork to provide a generic mapping model with multi-point and multi-format data conversion input into FHIR. Our experimental results show the proposed framework's ability to automatically cluster various HL7 v2 message formats and provide analytic insight behind them.
|
11
|
Surian D, Wang Y, Coiera E, Magrabi F. Using automated methods to detect safety problems with health information technology: a scoping review. J Am Med Inform Assoc 2023; 30:382-392. [PMID: 36374227 PMCID: PMC9846685 DOI: 10.1093/jamia/ocac220] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/08/2022] [Revised: 10/14/2022] [Accepted: 11/01/2022] [Indexed: 11/16/2022] Open
Abstract
OBJECTIVE To summarize the research literature evaluating automated methods for early detection of safety problems with health information technology (HIT). MATERIALS AND METHODS We searched bibliographic databases including MEDLINE, ACM Digital, Embase, CINAHL Complete, PsycINFO, and Web of Science from January 2010 to June 2021 for studies evaluating the performance of automated methods to detect HIT problems. HIT problems were reviewed using an existing classification for safety concerns. Automated methods were categorized into rule-based, statistical, and machine learning methods, and their performance in detecting HIT problems was assessed. The review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta Analyses extension for Scoping Reviews statement. RESULTS Of the 45 studies identified, the majority (n = 27, 60%) focused on detecting use errors involving electronic health records and order entry systems. Machine learning (n = 22) and statistical modeling (n = 17) were the most common methods. Unsupervised learning was used to detect use errors in laboratory test results, prescriptions, and patient records while supervised learning was used to detect technical errors arising from hardware or software issues. Statistical modeling was used to detect use errors, unauthorized access, and clinical decision support system malfunctions while rule-based methods primarily focused on use errors. CONCLUSIONS A wide variety of rule-based, statistical, and machine learning methods have been applied to automate the detection of safety problems with HIT. Many opportunities remain to systematically study their application and effectiveness in real-world settings.
Affiliation(s)
- Didi Surian
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | - Ying Wang
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | - Enrico Coiera
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | - Farah Magrabi
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
|
12
|
Detecting anomalous sequences in electronic health records using higher-order tensor networks. J Biomed Inform 2022; 135:104219. [DOI: 10.1016/j.jbi.2022.104219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 10/03/2022] [Indexed: 11/23/2022]
|
13
|
Appelbaum L, Kaplan ID, Palchuk MB, Kundrot S, Winer-Jones JP, Rinard M. Development and Experience with Cancer Risk Prediction Models Using Federated Databases and Electronic Health Records. Digit Health 2022. [DOI: 10.36255/exon-publications-digital-health-federated-databases] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
|
14
|
Lee HA, Lin PY, Solomatina AI, Koshevoy IO, Tunik SP, Lin HW, Pan SW, Ho ML. Glucose Sensing in Human Whole Blood Based on Near-Infrared Phosphors and Outlier Treatment with the Programming Language "R". ACS OMEGA 2022; 7:198-206. [PMID: 35036691 PMCID: PMC8757351 DOI: 10.1021/acsomega.1c04344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/13/2021] [Accepted: 12/09/2021] [Indexed: 06/14/2023]
Abstract
A near-infrared paper-based analytical device (NIR-PAD) for glucose detection in whole blood was based on iridium(III) metal complexes embedded in a three-dimensional (3D) enzyme gel. These complexes emit NIR luminescence, can avoid interference from the color of blood, and increase the sensitivity of sensing glucose. The glucose reaction behaviors of another two different iridium(III) and platinum(II) complexes were also tested. When the glucose solution was added to the device, the oxidation of glucose by glucose oxidase caused oxygen consumption and increased the intensity of the phosphorescence emission. To the best of our knowledge, this is the first time that data have been treated with the programming language "R", which uses Tukey's test to identify the outliers in the data and calculate a median for establishing a calibration curve, in order to improve the accuracy of NIR-PADs for sensing glucose. Compared with other published devices, NIR-PADs exhibit a wider linear range (1-30 mM, [relative emission intensity] = 0.0250[glucose] + 0.0451, and R² = 0.9984), a low detection limit (0.7 mM), a short response time (<2 s), and a small sample volume (2 μL). Finally, blood specimens were obtained from 19 patients enrolled in Taipei Veterans General Hospital under an approved IRB protocol (Taiwan; 2017-12-002CC). The sensors exhibited remarkable characteristics for glucose detection in comparison with other methods, including the clinical method in hospitals as well as those without blood sample pretreatment or a dilution factor. The above results confirm that NIR-PAD sensors can be put to practical use for glucose detection.
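The outlier treatment named in the abstract (Tukey's test followed by a median) can be sketched as follows. This is a minimal Python illustration rather than the authors' R script: the 1.5 × IQR multiplier is the conventional choice and is assumed here, the quartile method is one of several in common use, and the replicate readings are hypothetical.

```python
from statistics import median, quantiles

def tukey_filter(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's fences.

    k=1.5 is the conventional multiplier; the settings actually used
    in the paper are not stated in the abstract, so this is an assumption.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

# Hypothetical replicate emission-intensity readings at one glucose level.
readings = [0.29, 0.30, 0.31, 0.30, 0.29, 0.52]

kept, outliers = tukey_filter(readings)
# The median of the retained replicates would serve as one calibration point.
robust_value = median(kept)
```

With these made-up readings the value 0.52 falls above the upper fence and is set aside, and the median of the remaining five replicates is used. Repeating this per concentration would give the points for a calibration curve.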
Affiliation(s)
- Hsia-An Lee
- Department of Chemistry, Soochow University, 70 Linhsi Road, Shihlin, Taipei 111, Taiwan
| | - Peng-Yi Lin
- Department of Chemistry, Soochow University, 70 Linhsi Road, Shihlin, Taipei 111, Taiwan
| | - Anastasia I. Solomatina
- Institute of Chemistry, St. Petersburg State University, Universitetskii pr. 26, St. Petersburg 198504, Russia
| | - Igor O. Koshevoy
- Department of Chemistry, University of Eastern Finland, Joensuu 80101, Finland
| | - Sergey P. Tunik
- Institute of Chemistry, St. Petersburg State University, Universitetskii pr. 26, St. Petersburg 198504, Russia
| | - Hui-Wen Lin
- Department of Mathematics, Soochow University, 70 Linhsi Road, Shihlin, Taipei 111, Taiwan
| | - Sheng-Wei Pan
- Department of Chest Medicine, Taipei Veterans General Hospital, Taipei 11217, Taiwan
- School of Medicine, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan
| | - Mei-Lin Ho
- Department of Chemistry, Soochow University, 70 Linhsi Road, Shihlin, Taipei 111, Taiwan
|
15
|
Razzaghi H, Greenberg J, Bailey LC. Developing a systematic approach to assessing data quality in secondary use of clinical data based on intended use. Learn Health Syst 2022; 6:e10264. [PMID: 35036548 PMCID: PMC8753309 DOI: 10.1002/lrh2.10264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/07/2020] [Revised: 02/24/2021] [Accepted: 03/01/2021] [Indexed: 11/10/2022] Open
Abstract
INTRODUCTION Secondary use of electronic health record (EHR) data for research requires that the data are fit for use. Data quality (DQ) frameworks have traditionally focused on structural conformance and completeness of clinical data extracted from source systems. In this paper, we propose a framework for evaluating semantic DQ that will allow researchers to evaluate fitness for use prior to analyses. METHODS We reviewed current DQ literature, as well as experience from recent multisite network studies, and identified gaps in the literature and current practice. Derived principles were used to construct the conceptual framework with attention to both analytic fitness and informatics practice. RESULTS We developed a systematic framework that guides researchers in assessing whether a data source is fit for use for their intended study or project. It combines tools for evaluating clinical context with DQ principles, as well as factoring in the characteristics of the data source, in order to develop semantic DQ checks. CONCLUSIONS Our framework provides a systematic process for DQ development. Further work is needed to codify practices and metadata around both structural and semantic data quality.
Affiliation(s)
- Hanieh Razzaghi
- Department of Pediatrics and Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, Pennsylvania, USA
| | - Jane Greenberg
- Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, Pennsylvania, USA
| | - L. Charles Bailey
- Department of Pediatrics and Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
16
|
McDaniel CC, Chou C. Clinical risk factors and social needs of 30-day readmission among patients with diabetes: A retrospective study of the Deep South. FRONTIERS IN CLINICAL DIABETES AND HEALTHCARE 2022; 3:1050579. [PMID: 36992731 PMCID: PMC10012098 DOI: 10.3389/fcdhc.2022.1050579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 10/10/2022] [Indexed: 03/31/2023]
Abstract
Introduction Evidence is needed for 30-day readmission risk factors (clinical factors and social needs) among patients with diabetes in the Deep South. To address this need, our objectives were to identify risk factors associated with 30-day readmissions among this population and determine the added predictive value of considering social needs. Methods This retrospective cohort study utilized electronic health records from an urban health system in the Southeastern U.S. The unit of analysis was index hospitalization with a 30-day washout period. The index hospitalizations were preceded by a 6-month pre-index period to capture risk factors (including social needs), and hospitalizations were followed 30 days post-discharge to evaluate all-cause readmissions (1=readmission; 0=no readmission). We performed unadjusted (chi-square and Student's t-test, where applicable) and adjusted analyses (multiple logistic regression) to predict 30-day readmissions. Results A total of 26,332 adults were retained in the study population. Eligible patients contributed a total of 42,126 index hospitalizations, and the readmission rate was 15.21%. Risk factors associated with 30-day readmissions included demographics (e.g., age, race/ethnicity, insurance), characteristics of hospitalizations (e.g., admission type, discharge status, length of stay), labs and vitals (e.g., highest and lowest blood glucose measurements, systolic and diastolic blood pressure), co-existing chronic conditions, and preadmission antihyperglycemic medication use. In univariate analyses of social needs, activities of daily living (p<0.001), alcohol use (p<0.001), substance use (p=0.002), smoking/tobacco use (p<0.001), employment status (p<0.001), housing stability (p<0.001), and social support (p=0.043) were significantly associated with readmission status. In the sensitivity analysis, former alcohol use was significantly associated with higher odds of readmission compared to no alcohol use [aOR (95% CI): 1.121 (1.008-1.247)]. Conclusions Clinical assessment of readmission risk in the Deep South should consider patients' demographics, characteristics of hospitalizations, labs, vitals, co-existing chronic conditions, preadmission antihyperglycemic medication use, and social need (i.e., former alcohol use). Factors associated with readmission risk can help pharmacists and other healthcare providers identify high-risk patient groups for all-cause 30-day readmissions during transitions of care. Further research is needed about the influence of social needs on readmissions among populations with diabetes to understand the potential clinical utility of incorporating social needs into clinical services.
Affiliation(s)
- Cassidi C. McDaniel
- Department of Health Outcomes Research and Policy, Harrison College of Pharmacy, Auburn University, Auburn, AL, United States
| | - Chiahung Chou
- Department of Health Outcomes Research and Policy, Harrison College of Pharmacy, Auburn University, Auburn, AL, United States
- Department of Medical Research, China Medical University Hospital, Taichung, Taiwan
- *Correspondence: Chiahung Chou,
|
17
|
Fränti P, Sieranoja S, Wikström K, Laatikainen T. Clustering Diagnoses from 58M Patient Visits in Finland 2015–2018 (Preprint). JMIR Med Inform 2021; 10:e35422. [PMID: 35507390 PMCID: PMC9118010 DOI: 10.2196/35422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/03/2021] [Revised: 02/25/2022] [Accepted: 03/02/2022] [Indexed: 12/21/2022] Open
Affiliation(s)
- Pasi Fränti
- Machine Learning Group, School of Computing, University of Eastern Finland, Joensuu, Finland
| | - Sami Sieranoja
- Machine Learning Group, School of Computing, University of Eastern Finland, Joensuu, Finland
| | - Katja Wikström
- Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Kuopio, Finland
- The Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
| | - Tiina Laatikainen
- Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Kuopio, Finland
- The Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
|
18
|
Erandathi M, Chung Wang WY, Hsieh CC. Clustering the countries for quantifying the status of Covid-19 through time series analysis. INFORMATION DISCOVERY AND DELIVERY 2021. [DOI: 10.1108/idd-03-2021-0034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose
This study aims to cluster countries by their financial stability and health facilities, creating a more consistent basis for reporting the status of Covid-19 in a justifiable manner. The need arises from the scarcity of common categorisations of the world's countries and from the requirement to report the status of pandemics such as Covid-19 in a justifiable manner. The study mainly focusses on helping to produce a reliable account for assessing the spread of severe acute respiratory syndrome coronavirus-2 infection across the globe.
Design/methodology/approach
Data for this study were gathered from the official websites of the World Bank and the world in data. The Louvain clustering method was used to cluster the countries based on their financial strength and health facilities. The resulting clusters were visualised using Silhouette plots. Anomalies in the clusters were used to quantify the pandemic situation. The status of Covid-19 was presented through time series analysis implemented in Python.
Findings
The countries of the world were grouped into seven clusters: developed countries were divided into three clusters, and countries with transition and developing economies into four. Time series analysis of the anomalies identified in the clusters helps to monitor government responses and analyse the efficiency of the safety measures used against the pandemic.
Originality/value
The resulting clusters are valuable as a division of the world's countries for evaluating health systems and for regional-level analysis. Further, the results of the time series analysis are beneficial in monitoring government responses and analysing the efficiency of the safety measures used against the pandemic.
|
19
|
Churová V, Vyškovský R, Maršálová K, Kudláček D, Schwarz D. Anomaly Detection Algorithm for Real-World Data and Evidence in Clinical Research: Implementation, Evaluation, and Validation Study. JMIR Med Inform 2021; 9:e27172. [PMID: 33851576 PMCID: PMC8140384 DOI: 10.2196/27172] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/14/2021] [Revised: 04/01/2021] [Accepted: 04/12/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Statistical analysis, which has become an integral part of evidence-based medicine, relies heavily on data quality that is of critical importance in modern clinical research. Input data are not only at risk of being falsified or fabricated, but also at risk of being mishandled by investigators. OBJECTIVE The urgent need to assure the highest data quality possible has led to the implementation of various auditing strategies designed to monitor clinical trials and detect errors of different origin that frequently occur in the field. The objective of this study was to describe a machine learning-based algorithm to detect anomalous patterns in data created as a consequence of carelessness, systematic error, or intentionally by entering fabricated values. METHODS A particular electronic data capture (EDC) system, which is used for data management in clinical registries, is presented including its architecture and data structure. This EDC system features an algorithm based on machine learning designed to detect anomalous patterns in quantitative data. The detection algorithm combines clustering with a series of 7 distance metrics that serve to determine the strength of an anomaly. For the detection process, the thresholds and combinations of the metrics were used and the detection performance was evaluated and validated in the experiments involving simulated anomalous data and real-world data. RESULTS Five different clinical registries related to neuroscience were presented, all of them running in the given EDC system. Two of the registries were selected for the evaluation experiments and served also to validate the detection performance on an independent data set. The best performing combination of the distance metrics was that of Canberra, Manhattan, and Mahalanobis, whereas Cosine and Chebyshev metrics were excluded from further analysis due to the lowest performance when used as single distance metric-based classifiers. CONCLUSIONS The experimental results demonstrate that the algorithm is universal in nature, and as such may be implemented in other EDC systems, and is capable of anomalous data detection with a sensitivity exceeding 85%.
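Three of the distance metrics named in the abstract can be written out in a few lines, which makes the idea of scoring a record by its distance from a cluster centre concrete. The Python sketch below is a simplified illustration under stated assumptions, not the published algorithm: it uses a single centroid instead of a clustering step, applies one metric with a hypothetical threshold instead of a tuned combination, and leaves out the Mahalanobis distance because that needs a covariance estimate. The data are made up.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def canberra(a, b):
    """Sum of |x - y| / (|x| + |y|); terms with a zero denominator are skipped."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom:
            total += abs(x - y) / denom
    return total

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def flag_anomalies(points, metric, threshold):
    """Return the points whose distance to the centroid exceeds `threshold`."""
    c = centroid(points)
    return [p for p in points if metric(p, c) > threshold]

# Hypothetical two-variable records; the last one is deliberately far away.
records = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.0], [6.0, 9.0]]

flagged = flag_anomalies(records, manhattan, threshold=5.0)  # assumed threshold
```

Only the last record exceeds the assumed Manhattan threshold here. In the study, thresholds and metric combinations were evaluated on simulated and real-world data, which this sketch does not attempt.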
Affiliation(s)
- Vendula Churová
- Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Institute of Biostatistics and Analyses, Ltd, Brno, Czech Republic
| | - Roman Vyškovský
- Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Institute of Biostatistics and Analyses, Ltd, Brno, Czech Republic
| | | | - David Kudláček
- Institute of Biostatistics and Analyses, Ltd, Brno, Czech Republic
| | - Daniel Schwarz
- Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Institute of Biostatistics and Analyses, Ltd, Brno, Czech Republic
|
20
|
Ronzio L, Cabitza F, Barbaro A, Banfi G. Has the Flood Entered the Basement? A Systematic Literature Review about Machine Learning in Laboratory Medicine. Diagnostics (Basel) 2021; 11:372. [PMID: 33671623 PMCID: PMC7926482 DOI: 10.3390/diagnostics11020372] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/30/2020] [Revised: 02/08/2021] [Accepted: 02/18/2021] [Indexed: 02/08/2023] Open
Abstract
This article presents a systematic literature review that expands and updates a previous review on the application of machine learning to laboratory medicine. We used Scopus and PubMed to collect, select and analyse the papers published from 2017 to the present in order to highlight the main studies that have applied machine learning techniques to haematochemical parameters and to review their diagnostic and prognostic performance. In doing so, we aim to address the question we asked three years ago about the potential of these techniques in laboratory medicine and the need to leverage a tool that was still under-utilised at that time.
Affiliation(s)
- Luca Ronzio
- Department of Informatics, University of Milano-Bicocca, 20126 Milan, Italy
| | - Federico Cabitza
- Department of Informatics, University of Milano-Bicocca, 20126 Milan, Italy
| | - Alessandro Barbaro
- IRCCS Istituto Ortopedico Galeazzi, Via Riccardo Galeazzi, 4, 20161 Milan, Italy
| | - Giuseppe Banfi
- IRCCS Istituto Ortopedico Galeazzi, Via Riccardo Galeazzi, 4, 20161 Milan, Italy
- School of Medicine, University Vita-Salute San Raffaele, Via Olgettina, 58, 20132 Milan, Italy
|
21
|
Dennis JK, Sealock JM, Straub P, Lee YH, Hucks D, Actkins K, Faucon A, Feng YCA, Ge T, Goleva SB, Niarchou M, Singh K, Morley T, Smoller JW, Ruderfer DM, Mosley JD, Chen G, Davis LK. Clinical laboratory test-wide association scan of polygenic scores identifies biomarkers of complex disease. Genome Med 2021; 13:6. [PMID: 33441150 PMCID: PMC7807864 DOI: 10.1186/s13073-020-00820-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 02/25/2020] [Accepted: 12/08/2020] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Clinical laboratory (lab) tests are used in clinical practice to diagnose, treat, and monitor disease conditions. Test results are stored in electronic health records (EHRs), and a growing number of EHRs are linked to patient DNA, offering unprecedented opportunities to query relationships between genetic risk for complex disease and quantitative physiological measurements collected on large populations. METHODS A total of 3075 quantitative lab tests were extracted from Vanderbilt University Medical Center's (VUMC) EHR system and cleaned for population-level analysis according to our QualityLab protocol. Lab values extracted from BioVU were compared with previous population studies using heritability and genetic correlation analyses. We then tested the hypothesis that polygenic risk scores for biomarkers and complex disease are associated with biomarkers of disease extracted from the EHR. In a proof-of-concept analysis, we focused on lipids and coronary artery disease (CAD). We cleaned lab traits extracted from the EHR, performed lab-wide association scans (LabWAS) of the lipids and CAD polygenic risk scores across 315 heritable lab tests, and then replicated the pipeline and analyses in the Massachusetts General Brigham Biobank (MGBB). RESULTS Heritability estimates of lipid values (after cleaning with QualityLab) were comparable to previous reports and polygenic scores for lipids were strongly associated with their referent lipid in a LabWAS. LabWAS of the polygenic score for CAD recapitulated canonical heart disease biomarker profiles including decreased HDL, increased pre-medication LDL, triglycerides, blood glucose, and glycated hemoglobin (HgbA1C) in European and African descent populations. Notably, many of these associations remained even after adjusting for the presence of cardiovascular disease and were replicated in the MGBB. CONCLUSIONS Polygenic risk scores can be used to identify biomarkers of complex disease in large-scale EHR-based genomic analyses, providing new avenues for discovery of novel biomarkers and deeper understanding of disease trajectories in pre-symptomatic individuals. We present two methods and associated software, QualityLab and LabWAS, to clean and analyze EHR labs at scale and perform a Lab-Wide Association Scan.
Affiliation(s)
- Jessica K Dennis
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
| | - Julia M Sealock
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Peter Straub
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Younga H Lee
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Donald Hucks
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Ky'Era Actkins
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Microbiology, Immunology, and Physiology, Meharry Medical College, Nashville, TN, 37232, USA
| | - Annika Faucon
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Yen-Chen Anne Feng
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
| | - Tian Ge
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Slavina B Goleva
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Maria Niarchou
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Kritika Singh
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Theodore Morley
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Jordan W Smoller
- Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
| | - Douglas M Ruderfer
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Jonathan D Mosley
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Lea K Davis
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University, 511-A Light Hall, 2215 Garland Ave, Nashville, TN, 37232, USA
| |
|
22
|
Visual Analytics for Dimension Reduction and Cluster Analysis of High Dimensional Electronic Health Records. INFORMATICS 2020. [DOI: 10.3390/informatics7020017] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 01/08/2023] Open
Abstract
Recent advances in Electronic Health Record (EHR) systems have resulted in data being produced at an unprecedented rate. The complex, growing, and high-dimensional data available in EHRs create great opportunities for machine learning techniques such as clustering. Cluster analysis often requires dimension reduction to achieve efficient processing time and mitigate the curse of dimensionality. Given the wide range of techniques for dimension reduction and cluster analysis, it is not straightforward to identify which combination of techniques from the two families leads to the desired result. The ability to derive useful and precise insights from EHRs requires a deeper understanding of the data, intermediary results, configuration parameters, and analysis processes. Although these tasks are often tackled separately in existing studies, we present a visual analytics (VA) system, called Visual Analytics for Cluster Analysis and Dimension Reduction of High Dimensional Electronic Health Records (VALENCIA), to address the challenges of high-dimensional EHRs in a single system. VALENCIA brings together a wide range of cluster analysis and dimension reduction techniques, integrates them seamlessly, and makes them accessible to users through interactive visualizations. It offers a balanced distribution of processing load between users and the system to facilitate the performance of high-level cognitive tasks in a way that would be difficult without the aid of a VA system. Through a real case study, we have demonstrated how VALENCIA can be used to analyze the healthcare administrative dataset stored at ICES. This research also highlights what needs to be considered in the future when developing VA systems that are designed to derive deep and novel insights into EHRs.
|