Reference Citation Analysis

For: Estiri H, Klann JG, Murphy SN. A clustering approach for detecting implausible observation values in electronic health records data. BMC Med Inform Decis Mak 2019;19:142. [PMID: 31337390 DOI: 10.1186/s12911-019-0852-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/02/2019] [Accepted: 06/26/2019] [Indexed: 12/03/2022] Open

Cited by Other Article(s)

McDaniel CC, Lo-Ciganic WH, Chou C. Diabetes-related complications, glycemic levels, and healthcare utilization outcomes after therapeutic inertia in type 2 diabetes mellitus. Prim Care Diabetes 2024;18:188-195. [PMID: 38185576 DOI: 10.1016/j.pcd.2023.12.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 12/21/2023] [Accepted: 12/22/2023] [Indexed: 01/09/2024]

Zayed AM, Saegeman V, Delvaux N. Establishing the Reportable Interval for Routine Clinical Laboratory Tests: A Data-Driven Strategy Leveraging Retrospective Electronic Medical Record Data. J Appl Lab Med 2024:jfae021. [PMID: 38642405 DOI: 10.1093/jalm/jfae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 02/05/2024] [Indexed: 04/22/2024]

Abstract

BACKGROUND

This paper presents a data-driven strategy for establishing the reportable interval in clinical laboratory testing. The reportable interval defines the range of laboratory result values beyond which reporting should be withheld. The lack of clear guidelines and methodology for determining the reportable interval has led to potential errors in reporting and patient risk.

METHODS

To address this gap, the study developed an integrated strategy that combines statistical analysis, expert review, and hypothetical outlier calculations. A large data set from an accredited clinical laboratory was utilized, analyzing over 124 million laboratory test records from 916 distinct tests. The Dixon test was applied to identify outliers and establish the highest and lowest non-outlier result values for each test, which were validated by clinical pathology experts. The methodology also included matching the reportable intervals with relevant Logical Observation Identifiers Names and Codes (LOINC) and Unified Code for Units of Measure (UCUM)-valid units for broader applicability.

RESULTS

Upon establishing the reportable interval for 135 routine laboratory tests (493 LOINC codes), we applied these to a primary care laboratory data set of 23 million records, demonstrating their efficacy with over 1% of result records identified as implausible.

CONCLUSIONS

We developed and tested a data-driven strategy for establishing reportable intervals utilizing large electronic medical record (EMR) data sets. Implementing the established interval in clinical laboratory settings can improve autoverification systems, enhance data reliability, and reduce errors in patient care. Ongoing refinement and reporting of cases exceeding the reportable limits will contribute to continuous improvement in laboratory result management and patient safety.
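
The METHODS paragraph of this abstract mentions applying the Dixon test to find the highest and lowest non-outlier result values. As a rough illustration only (the abstract does not say which variant or significance level was used, and this is not the authors' implementation), the Python sketch below computes the classic Dixon Q ratio for the two extremes of a small sample. The function names, the sample values, and the critical value supplied by the caller are all assumptions made for this example.

```python
def dixon_q(values):
    """Return (q_low, q_high): Dixon's Q ratio (gap / range) for the
    smallest and the largest value of a small sample."""
    xs = sorted(values)
    if len(xs) < 3:
        raise ValueError("Dixon's Q test needs at least 3 values")
    spread = xs[-1] - xs[0]
    if spread == 0:
        return 0.0, 0.0  # all values identical, nothing can be an outlier
    q_low = (xs[1] - xs[0]) / spread
    q_high = (xs[-1] - xs[-2]) / spread
    return q_low, q_high


def non_outlier_bounds(values, q_crit):
    """Return (lowest, highest) values that are not rejected by the Q test.

    q_crit is supplied by the caller and should come from a published
    Dixon table for the sample size and confidence level in use."""
    xs = sorted(values)
    q_low, q_high = dixon_q(xs)
    low = xs[1] if q_low > q_crit else xs[0]
    high = xs[-2] if q_high > q_crit else xs[-1]
    return low, high


# Hypothetical potassium-like results (mmol/L); 9.8 is an injected outlier.
sample = [4.1, 4.3, 4.2, 4.4, 4.0, 4.5, 4.2, 4.3, 4.1, 9.8]
# 0.466 is a commonly tabulated 95% critical value for n = 10;
# verify it against a published table before relying on it.
print(dixon_q(sample))
print(non_outlier_bounds(sample, q_crit=0.466))
```

The classic Q test checks one extreme value at a time in small samples, so a study working with millions of records would need some additional strategy that the abstract does not describe.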

Ru B, Sillah A, Desai K, Chandwani S, Yao L, Kothari S. Real-World Data Quality Framework for Oncology Time to Treatment Discontinuation Use Case: Implementation and Evaluation Study. JMIR Med Inform 2024;12:e47744. [PMID: 38446504 PMCID: PMC10955397 DOI: 10.2196/47744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 11/30/2023] [Accepted: 01/14/2024] [Indexed: 03/07/2024] Open

Abstract

BACKGROUND

The importance of real-world evidence is widely recognized in observational oncology studies. However, the lack of interoperable data quality standards in the fragmented health information technology landscape represents an important challenge. Therefore, adopting validated systematic methods for evaluating data quality is important for oncology outcomes research leveraging real-world data (RWD).

OBJECTIVE

This study aims to implement real-world time to treatment discontinuation (rwTTD) for a systemic anticancer therapy (SACT) as a new use case for the Use Case Specific Relevance and Quality Assessment, a framework linking data quality and relevance in fit-for-purpose RWD assessment.

METHODS

To define the rwTTD use case, we mapped the operational definition of rwTTD to RWD elements commonly available from oncology electronic health record-derived data sets. We identified 20 tasks to check the completeness and plausibility of data elements concerning SACT use, line of therapy (LOT), death date, and length of follow-up. Using descriptive statistics, we illustrated how to implement the Use Case Specific Relevance and Quality Assessment on 2 oncology databases (Data sets A and B) to estimate the rwTTD of an SACT drug (target SACT) for patients with advanced head and neck cancer diagnosed on or after January 1, 2015.

RESULTS

A total of 1200 (24.96%) of 4808 patients in Data set A and 237 (5.92%) of 4003 patients in Data set B received the target SACT, suggesting better relevance of the former in estimating the rwTTD of the target SACT. The 2 data sets differed with regard to the terminology used for SACT drugs, LOT format, and target SACT LOT distribution over time. Data set B appeared to have less complete SACT records, longer lags in incorporating the latest data, and incomplete mortality data, suggesting a lack of fitness for estimating rwTTD.

CONCLUSIONS

The fit-for-purpose data quality assessment demonstrated substantial variability in the quality of the 2 real-world data sets. The data quality specifications applied for rwTTD estimation can be expanded to support a broad spectrum of oncology use cases.

Lewis AE, Weiskopf N, Abrams ZB, Foraker R, Lai AM, Payne PRO, Gupta A. Electronic health record data quality assessment and tools: a systematic review. J Am Med Inform Assoc 2023;30:1730-1740. [PMID: 37390812 PMCID: PMC10531113 DOI: 10.1093/jamia/ocad120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 05/16/2023] [Accepted: 06/23/2023] [Indexed: 07/02/2023] Open

de la Iglesia I, Vivó M, Chocrón P, Maeztu GD, Gojenola K, Atutxa A. An open source corpus and automatic tool for section identification in Spanish health records. J Biomed Inform 2023;145:104461. [PMID: 37536643 DOI: 10.1016/j.jbi.2023.104461] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/04/2023] [Revised: 07/12/2023] [Accepted: 07/25/2023] [Indexed: 08/05/2023]

Baechle C, Lang A, Strassburger K, Kuss O, Burkart V, Szendroedi J, Müssig K, Weber KS, Schrauwen-Hinderling V, Herder C, Roden M, Schlesinger S. Association of a lifestyle score with cardiometabolic markers among individuals with diabetes: a cross-sectional study. BMJ Open Diabetes Res Care 2023;11:e003469. [PMID: 37433698 DOI: 10.1136/bmjdrc-2023-003469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 06/13/2023] [Indexed: 07/13/2023] Open

Abstract

INTRODUCTION

To investigate the associations of a lifestyle score with various cardiovascular risk markers, indicators for fatty liver disease as well as MRI-determined total, subcutaneous and visceral adipose tissue mass in adults with new-onset diabetes.

RESEARCH DESIGN AND METHODS

This cross-sectional analysis included 196 individuals with type 1 (median age: 35 years; median body mass index (BMI): 24 kg/m²) and 272 with type 2 diabetes (median age: 53 years; median BMI: 31 kg/m²) from the German Diabetes Study. A healthy lifestyle score was generated based on healthy diet, moderate alcohol consumption, recreational activity, non-smoking and non-obese BMI. These factors were summed to form a score ranging from 0 to 5. Multivariable linear and non-linear regression models were used.

RESULTS

In total, 8.1% of the individuals adhered to none or one, 17.7% to two, 29.7% to three, 26.7% to four, and 17.7% to all five favorable lifestyle factors. High compared with low adherence to the lifestyle score was associated with more favorable outcome measures, including triglycerides (β (95% CI) -49.1 mg/dL (-76.7; -21.4)), low-density lipoprotein (-16.7 mg/dL (-31.3; -2.0)), and high-density lipoprotein cholesterol (13.5 mg/dL (7.6; 19.4)), glycated hemoglobin (-0.5% (-0.8%; -0.1%)), high-sensitivity C reactive protein (-0.4 mg/dL (-0.6; -0.2)), as well as lower hepatic fat content (-8.3% (-11.9%; -4.7%)), and visceral adipose tissue mass (-1.8 dm³ (-2.9; -0.7)). The dose-response analyses showed that adherence to every additional healthy lifestyle factor was associated with more beneficial risk profiles.

CONCLUSIONS

Adherence to each additional healthy lifestyle factor was beneficially associated with cardiovascular risk markers, indicators of fatty liver disease and adipose tissue mass. Strongest associations were observed for adherence to all healthy lifestyle factors in combination.

TRIAL REGISTRATION NUMBER

NCT01055093.
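
The RESEARCH DESIGN AND METHODS paragraph of this abstract describes a healthy lifestyle score formed by summing five favorable factors into a value from 0 to 5. The Python sketch below illustrates that arithmetic only. The field names are hypothetical, and the BMI cutoff of 30 kg/m² for "non-obese" is an assumption (the conventional obesity threshold), since the abstract does not state how each factor was operationalized.

```python
from dataclasses import dataclass


@dataclass
class LifestyleFactors:
    """One flag per favorable factor named in the abstract (names are hypothetical)."""
    healthy_diet: bool
    moderate_alcohol: bool
    recreational_activity: bool
    non_smoking: bool
    non_obese_bmi: bool


def is_non_obese(weight_kg: float, height_m: float, cutoff: float = 30.0) -> bool:
    """Assumed definition: BMI below the conventional obesity cutoff of 30 kg/m^2."""
    bmi = weight_kg / (height_m ** 2)
    return bmi < cutoff


def lifestyle_score(f: LifestyleFactors) -> int:
    """Sum the five binary factors into a score from 0 to 5."""
    return sum([
        f.healthy_diet,
        f.moderate_alcohol,
        f.recreational_activity,
        f.non_smoking,
        f.non_obese_bmi,
    ])


# Hypothetical participant: three favorable factors out of five.
person = LifestyleFactors(
    healthy_diet=True,
    moderate_alcohol=False,
    recreational_activity=True,
    non_smoking=True,
    non_obese_bmi=is_non_obese(weight_kg=95.0, height_m=1.75),
)
print(lifestyle_score(person))
```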

Affiliation(s)

Christina Baechle: Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany; German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
Alexander Lang: Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany; German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
Klaus Strassburger: Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany; German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany
Oliver Kuss: Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany; German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany; Center for Health and Society, Medical Faculty and University Hospital Duesseldorf, Heinrich Heine University, Duesseldorf, Germany
Volker Burkart: German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
Julia Szendroedi: German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany; Internal Medicine I and Clinical Chemistry, University Hospital Heidelberg, Heidelberg, Germany; Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Duesseldorf, Germany
Karsten Müssig: German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany; Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Duesseldorf, Germany; Department of Internal Medicine and Gastroenterology, Niels Stensen Hospitals, Franziskus Hospital Harderberg, Georgsmarienhutte, Germany
Katharina Susanne Weber: German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany; Institute for Epidemiology, Kiel University, Kiel, Germany
Vera Schrauwen-Hinderling: German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany
Christian Herder: German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany; Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Duesseldorf, Germany
Michael Roden: German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany; Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany; Department of Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Duesseldorf, Germany
Sabrina Schlesinger: Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Duesseldorf, Germany; German Center for Diabetes Research (DZD), Partner Duesseldorf, Muenchen-Neuherberg, Germany

Röchner P, Rothlauf F. Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries. BMC Med Res Methodol 2023;23:125. [PMID: 37226114 DOI: 10.1186/s12874-023-01946-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/11/2022] [Accepted: 05/09/2023] [Indexed: 05/26/2023] Open

Abstract

BACKGROUND

Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible. This means that the collected information about a particular patient makes medical sense.

METHODS

Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. Therefore, this article investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work that analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and a random selection (a total of 785 different records) are evaluated in a real-world scenario by medical domain experts.

RESULTS

Both anomaly detection methods are good at detecting implausible electronic health records. First, domain experts identified [Formula: see text] of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, [Formula: see text] of the proposed 300 records in each sample were implausible. This corresponds to a precision of [Formula: see text] for FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was [Formula: see text] and the sensitivity of FindFPOF was [Formula: see text]. Both anomaly detection methods had a specificity of [Formula: see text]. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample.

CONCLUSIONS

Unsupervised anomaly detection can significantly reduce the manual effort of domain experts to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 compared to evaluating a random sample.
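
The RESULTS of this abstract report precision, sensitivity, and specificity, and the CONCLUSIONS report an effort reduction factor, but the numeric values were lost in extraction ("[Formula: see text]"). The Python sketch below only shows how such quantities are conventionally computed from expert labels. Reading the effort reduction as the ratio of detector precision to the implausibility rate in a random sample is an assumption about the paper's definition, and every count used here is hypothetical and deliberately not taken from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged records that experts confirmed as implausible."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly implausible records that the detector flagged."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of plausible records that the detector left unflagged."""
    return tn / (tn + fp)


def effort_reduction(detector_precision: float, baseline_precision: float) -> float:
    """Assumed reading: how many times fewer records an expert must review
    to find one implausible record, compared with random selection."""
    return detector_precision / baseline_precision


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
baseline = precision(tp=20, fp=180)   # 20 of 200 random records implausible
detector = precision(tp=60, fp=140)   # 60 of 200 flagged records implausible
print(baseline, detector, effort_reduction(detector, baseline))
print(sensitivity(tp=12, fn=8), specificity(tn=170, fp=10))
```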

Syed R, Eden R, Makasi T, Chukwudi I, Mamudu A, Kamalpour M, Kapugama Geeganage D, Sadeghianasl S, Leemans SJJ, Goel K, Andrews R, Wynn MT, Ter Hofstede A, Myers T. Digital Health Data Quality Issues: Systematic Review. J Med Internet Res 2023;25:e42615. [PMID: 37000497 PMCID: PMC10131725 DOI: 10.2196/42615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2022] [Revised: 12/07/2022] [Accepted: 12/31/2022] [Indexed: 04/01/2023] Open

Abstract

BACKGROUND

The promise of digital health is principally dependent on the ability to electronically capture data that can be analyzed to improve decision-making. However, the ability to effectively harness data has proven elusive, largely because of the quality of the data captured. Despite the importance of data quality (DQ), an agreed-upon DQ taxonomy evades literature. When consolidated frameworks are developed, the dimensions are often fragmented, without consideration of the interrelationships among the dimensions or their resultant impact.

OBJECTIVE

The aim of this study was to develop a consolidated digital health DQ dimension and outcome (DQ-DO) framework to provide insights into 3 research questions: What are the dimensions of digital health DQ? How are the dimensions of digital health DQ related? and What are the impacts of digital health DQ?

METHODS

Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a developmental systematic literature review was conducted of peer-reviewed literature focusing on digital health DQ in predominately hospital settings. A total of 227 relevant articles were retrieved and inductively analyzed to identify digital health DQ dimensions and outcomes. The inductive analysis was performed through open coding, constant comparison, and card sorting with subject matter experts to identify digital health DQ dimensions and digital health DQ outcomes. Subsequently, a computer-assisted analysis was performed and verified by DQ experts to identify the interrelationships among the DQ dimensions and relationships between DQ dimensions and outcomes. The analysis resulted in the development of the DQ-DO framework.

RESULTS

The digital health DQ-DO framework consists of 6 dimensions of DQ, namely accessibility, accuracy, completeness, consistency, contextual validity, and currency; interrelationships among the dimensions of digital health DQ, with consistency being the most influential dimension impacting all other digital health DQ dimensions; 5 digital health DQ outcomes, namely clinical, clinician, research-related, business process, and organizational outcomes; and relationships between the digital health DQ dimensions and DQ outcomes, with the consistency and accessibility dimensions impacting all DQ outcomes.

CONCLUSIONS

The DQ-DO framework developed in this study demonstrates the complexity of digital health DQ and the necessity for reducing digital health DQ issues. The framework further provides health care executives with holistic insights into DQ issues and resultant outcomes, which can help them prioritize which DQ-related problems to tackle first.

Ozonze O, Scott PJ, Hopgood AA. Automating Electronic Health Record Data Quality Assessment. J Med Syst 2023;47:23. [PMID: 36781551 PMCID: PMC9925537 DOI: 10.1007/s10916-022-01892-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/10/2021] [Accepted: 11/15/2022] [Indexed: 02/15/2023]

A Framework for Automatic Clustering of EHR Messages Using a Spatial Clustering Approach. Healthcare (Basel) 2023;11:healthcare11030390. [PMID: 36766965 PMCID: PMC9914110 DOI: 10.3390/healthcare11030390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 01/12/2023] [Accepted: 01/16/2023] [Indexed: 02/03/2023] Open

Abstract

Although Health Level Seven (HL7) message standards (v2, v3, Clinical Document Architecture (CDA)) have been commonly adopted, there are still issues associated with them, especially semantic interoperability issues and a lack of support for smart devices (e.g., smartphones, fitness trackers, and smartwatches). In addition, healthcare organizations in many countries are still using proprietary electronic health record (EHR) message formats, making it challenging to convert to other data formats, particularly the latest HL7 Fast Healthcare Interoperability Resources (FHIR) data standard. The FHIR is based on modern web technologies such as HTTP, XML, and JSON and would be capable of overcoming the shortcomings of the previous standards and supporting modern smart devices. Therefore, the FHIR standard could help the healthcare industry take advantage of the latest technologies and improve data interoperability. The data representation and mapping from the legacy data standards (i.e., HL7 v2 and EHR) to the FHIR is necessary for the healthcare sector. However, direct data mapping or conversion from the traditional data standards to the FHIR data standard is challenging because of the nature and formats of the data. Therefore, in this article, we propose a framework that aims to convert proprietary EHR messages into the HL7 v2 format and apply an unsupervised clustering approach using the DBSCAN (density-based spatial clustering of applications with noise) algorithm to automatically group a variety of these HL7 v2 messages regardless of their semantic origins. The proposed framework's implementation lays the groundwork to provide a generic mapping model with multi-point and multi-format data conversion input into the FHIR. Our experimental results show the proposed framework's ability to automatically cluster various HL7 v2 message formats and provide analytic insight behind them.
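
The abstract above describes grouping HL7 v2 messages with DBSCAN but does not say how messages are turned into features. The Python sketch below is a minimal, dependency-free DBSCAN under stated assumptions: each message is reduced to the set of its segment identifiers (the text before the first `|` in each segment), and messages are compared with Jaccard distance. The feature choice, the `eps` and `min_pts` values, and the sample messages are all hypothetical and are not taken from the paper.

```python
NOISE = -1


def segment_ids(message: str) -> frozenset:
    """Assumed feature: the set of segment IDs (MSH, PID, ...) in an HL7 v2 message."""
    return frozenset(seg.split("|", 1)[0] for seg in message.split("\r") if seg)


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |A & B| / |A | B|; identical sets give 0.0."""
    return 1.0 - len(a & b) / len(a | b)


def dbscan(points, eps, min_pts, dist):
    """Plain DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Includes the point itself, as in the usual formulation.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(j_nbrs)
    return labels


# Hypothetical, heavily shortened messages (segments separated by carriage returns).
messages = [
    "MSH|a\rEVN|A01\rPID|1\rPV1|1",
    "MSH|b\rEVN|A01\rPID|2\rPV1|1\rNK1|1",
    "MSH|c\rPID|3\rOBR|1\rOBX|1",
    "MSH|d\rPID|4\rOBR|1\rOBX|1\rNTE|1",
    "MSH|e\rZZZ|custom",
]
features = [segment_ids(m) for m in messages]
labels = dbscan(features, eps=0.3, min_pts=2, dist=jaccard_distance)
print(labels)
```

With these toy inputs the two admission-style messages and the two result-style messages form separate clusters, and the message with an unfamiliar segment is left as noise.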

Surian D, Wang Y, Coiera E, Magrabi F. Using automated methods to detect safety problems with health information technology: a scoping review. J Am Med Inform Assoc 2023;30:382-392. [PMID: 36374227 PMCID: PMC9846685 DOI: 10.1093/jamia/ocac220] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/08/2022] [Revised: 10/14/2022] [Accepted: 11/01/2022] [Indexed: 11/16/2022] Open

Detecting anomalous sequences in electronic health records using higher-order tensor networks. J Biomed Inform 2022;135:104219. [DOI: 10.1016/j.jbi.2022.104219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 10/03/2022] [Indexed: 11/23/2022]

Appelbaum L, Kaplan ID, Palchuk MB, Kundrot S, Winer-Jones JP, Rinard M. Development and Experience with Cancer Risk Prediction Models Using Federated Databases and Electronic Health Records. Digit Health 2022. [DOI: 10.36255/exon-publications-digital-health-federated-databases] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open

Lee HA, Lin PY, Solomatina AI, Koshevoy IO, Tunik SP, Lin HW, Pan SW, Ho ML. Glucose Sensing in Human Whole Blood Based on Near-Infrared Phosphors and Outlier Treatment with the Programming Language "R". ACS OMEGA 2022;7:198-206. [PMID: 35036691 PMCID: PMC8757351 DOI: 10.1021/acsomega.1c04344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/13/2021] [Accepted: 12/09/2021] [Indexed: 06/14/2023]

Razzaghi H, Greenberg J, Bailey LC. Developing a systematic approach to assessing data quality in secondary use of clinical data based on intended use. Learn Health Syst 2022;6:e10264. [PMID: 35036548 PMCID: PMC8753309 DOI: 10.1002/lrh2.10264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/07/2020] [Revised: 02/24/2021] [Accepted: 03/01/2021] [Indexed: 11/10/2022] Open

McDaniel CC, Chou C. Clinical risk factors and social needs of 30-day readmission among patients with diabetes: A retrospective study of the Deep South. FRONTIERS IN CLINICAL DIABETES AND HEALTHCARE 2022;3:1050579. [PMID: 36992731 PMCID: PMC10012098 DOI: 10.3389/fcdhc.2022.1050579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 10/10/2022] [Indexed: 03/31/2023]

Abstract

Introduction

Evidence is needed for 30-day readmission risk factors (clinical factors and social needs) among patients with diabetes in the Deep South. To address this need, our objectives were to identify risk factors associated with 30-day readmissions among this population and determine the added predictive value of considering social needs.

Methods

This retrospective cohort study utilized electronic health records from an urban health system in the Southeastern U.S. The unit of analysis was index hospitalization with a 30-day washout period. The index hospitalizations were preceded by a 6-month pre-index period to capture risk factors (including social needs), and hospitalizations were followed 30 days post-discharge to evaluate all-cause readmissions (1=readmission; 0=no readmission). We performed unadjusted (chi-square and Student's t-test, where applicable) and adjusted analyses (multiple logistic regression) to predict 30-day readmissions.

Results

A total of 26,332 adults were retained in the study population. Eligible patients contributed a total of 42,126 index hospitalizations, and the readmission rate was 15.21%. Risk factors associated with 30-day readmissions included demographics (e.g., age, race/ethnicity, insurance), characteristics of hospitalizations (e.g., admission type, discharge status, length of stay), labs and vitals (e.g., highest and lowest blood glucose measurements, systolic and diastolic blood pressure), co-existing chronic conditions, and preadmission antihyperglycemic medication use. In univariate analyses of social needs, activities of daily living (p<0.001), alcohol use (p<0.001), substance use (p=0.002), smoking/tobacco use (p<0.001), employment status (p<0.001), housing stability (p<0.001), and social support (p=0.043) were significantly associated with readmission status. In the sensitivity analysis, former alcohol use was significantly associated with higher odds of readmission compared to no alcohol use [aOR (95% CI): 1.121 (1.008-1.247)].

Conclusions

Clinical assessment of readmission risk in the Deep South should consider patients' demographics, characteristics of hospitalizations, labs, vitals, co-existing chronic conditions, preadmission antihyperglycemic medication use, and social need (i.e., former alcohol use). Factors associated with readmission risk can help pharmacists and other healthcare providers identify high-risk patient groups for all-cause 30-day readmissions during transitions of care. Further research is needed about the influence of social needs on readmissions among populations with diabetes to understand the potential clinical utility of incorporating social needs into clinical services.
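
The Methods of this abstract code the outcome as 1 when a hospitalization is followed by an all-cause readmission within 30 days of discharge and 0 otherwise. The Python sketch below illustrates that coding on hypothetical stays. It assumes each stay is a `(patient_id, admit_date, discharge_date)` tuple, and it does not model the study's 30-day washout rule or its 6-month pre-index period, whose exact definitions the abstract does not give.

```python
from datetime import date


def flag_30_day_readmissions(stays, window_days=30):
    """Return [(patient_id, admit_date, flag), ...] sorted by patient and admission.

    flag is 1 when the same patient's next admission starts within
    window_days after this stay's discharge, otherwise 0."""
    ordered = sorted(stays, key=lambda s: (s[0], s[1]))
    flagged = []
    for current, following in zip(ordered, ordered[1:] + [None]):
        patient_id, admit, discharge = current
        flag = 0
        if following is not None and following[0] == patient_id:
            gap_days = (following[1] - discharge).days
            if 0 <= gap_days <= window_days:
                flag = 1
        flagged.append((patient_id, admit, flag))
    return flagged


# Hypothetical stays for two patients.
stays = [
    ("A", date(2021, 1, 1), date(2021, 1, 5)),
    ("A", date(2021, 1, 20), date(2021, 1, 25)),  # 15 days after discharge
    ("A", date(2021, 4, 1), date(2021, 4, 3)),    # 66 days after discharge
    ("B", date(2021, 2, 10), date(2021, 2, 12)),
]
flagged = flag_30_day_readmissions(stays)
flags = [flag for _, _, flag in flagged]
readmission_rate = sum(flags) / len(flags)
print(flags, readmission_rate)
```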

Fränti P, Sieranoja S, Wikström K, Laatikainen T. Clustering Diagnoses from 58M Patient Visits in Finland 2015–2018 (Preprint). JMIR Med Inform 2021;10:e35422. [PMID: 35507390 PMCID: PMC9118010 DOI: 10.2196/35422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/03/2021] [Revised: 02/25/2022] [Accepted: 03/02/2022] [Indexed: 12/21/2022] Open

Erandathi M, Chung Wang WY, Hsieh CC. Clustering the countries for quantifying the status of Covid-19 through time series analysis. INFORMATION DISCOVERY AND DELIVERY 2021. [DOI: 10.1108/idd-03-2021-0034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]

Abstract

Purpose

This study aims to use financial stability and health facilities of countries, to cluster them for making a more consensus environment for manifesting the status of Covid-19 in a justifiable manner. The scarcity of the categorisation of the countries of the world in a common platform, and the requirement of manifesting the pandemic status such as Covid-19 in a justifiable manner create the demanding requirement. This study mainly focusses on assisting to generate a liable manifesto to criticise the span of viral infection of the severe acute respiratory syndrome coronavirus-2 over the globe.

Design/methodology/approach

Data for this study has been gathered from official websites of the World Bank, and the world in data. The Louvain clustering method has been used to cluster the countries based on their financial strength and health facilities. The resulted clusters are visualised using Silhouette plots. The anomalies of the clusters had been used to quantify the pandemic situation. The status of Covid-19 has been manifested with the time series analysis through python programming.

Findings

The countries of the world have been clustered into seven, where developed countries divided into three clusters and the countries with transition economies and developing clustered together into four clusters. The time series analysis of recognised anomalies of the clusters assist to monitor the government responses and analyse the efficiency of used safety measures against the pandemic.

Originality/value

This study’s resulted clusters are highly valuable as a division of countries of the whole world for evaluating the health systems and for the regional levels. Further, the results of time series analysis are beneficial in monitoring the government responses and analysing the efficiency of used safety measures against the pandemic.

Churová V, Vyškovský R, Maršálová K, Kudláček D, Schwarz D. Anomaly Detection Algorithm for Real-World Data and Evidence in Clinical Research: Implementation, Evaluation, and Validation Study. JMIR Med Inform 2021;9:e27172. [PMID: 33851576 PMCID: PMC8140384 DOI: 10.2196/27172] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/14/2021] [Revised: 04/01/2021] [Accepted: 04/12/2021] [Indexed: 01/23/2023] Open

Abstract

BACKGROUND

Statistical analysis, which has become an integral part of evidence-based medicine, relies heavily on data quality that is of critical importance in modern clinical research. Input data are not only at risk of being falsified or fabricated, but also at risk of being mishandled by investigators.

OBJECTIVE

The urgent need to assure the highest data quality possible has led to the implementation of various auditing strategies designed to monitor clinical trials and detect errors of different origin that frequently occur in the field. The objective of this study was to describe a machine learning-based algorithm to detect anomalous patterns in data created as a consequence of carelessness, systematic error, or intentionally by entering fabricated values.

METHODS

A particular electronic data capture (EDC) system, which is used for data management in clinical registries, is presented including its architecture and data structure. This EDC system features an algorithm based on machine learning designed to detect anomalous patterns in quantitative data. The detection algorithm combines clustering with a series of 7 distance metrics that serve to determine the strength of an anomaly. For the detection process, the thresholds and combinations of the metrics were used and the detection performance was evaluated and validated in the experiments involving simulated anomalous data and real-world data.

RESULTS

Five different clinical registries related to neuroscience were presented, all of them running in the given EDC system. Two of the registries were selected for the evaluation experiments and served also to validate the detection performance on an independent data set. The best performing combination of the distance metrics was that of Canberra, Manhattan, and Mahalanobis, whereas Cosine and Chebyshev metrics had been excluded from further analysis due to the lowest performance when used as single distance metric-based classifiers.

CONCLUSIONS

The experimental results demonstrate that the algorithm is universal in nature, and as such may be implemented in other EDC systems, and is capable of anomalous data detection with a sensitivity exceeding 85%.
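As a rough illustration of the idea summarized in this abstract (scoring a record by its distance from a cluster center under several metrics and flagging it only when the metrics agree), the following is a minimal sketch and not the authors' algorithm. The function names, the toy data, and the threshold values are all hypothetical; only Manhattan and Canberra distances are shown, and Mahalanobis is omitted because it requires a covariance estimate.

```python
from statistics import mean


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Canberra distance: per-coordinate |x - y| / (|x| + |y|), skipping 0/0 terms."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [mean(col) for col in zip(*points)]


def flag_anomalies(points, center, thresholds):
    """Return indices of points whose distance to `center` exceeds
    every threshold (hypothetical rule: all metrics must agree)."""
    metrics = {"manhattan": manhattan, "canberra": canberra}
    flagged = []
    for i, p in enumerate(points):
        if all(metrics[name](p, center) > limit for name, limit in thresholds.items()):
            flagged.append(i)
    return flagged


# Toy cluster of two-feature records (values are made up for illustration).
cluster = [[5.0, 120.0], [5.2, 118.0], [4.9, 122.0], [5.1, 119.0]]
center = centroid(cluster)

# Append one record that sits far from the cluster center.
candidates = cluster + [[50.0, 12.0]]
flagged = flag_anomalies(candidates, center, {"manhattan": 10.0, "canberra": 0.5})
print(flagged)
```

Requiring agreement between metrics is one simple way to combine them; a weighted score or a vote across metrics would be equally plausible readings of "combinations of the metrics".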


Ronzio L, Cabitza F, Barbaro A, Banfi G. Has the Flood Entered the Basement? A Systematic Literature Review about Machine Learning in Laboratory Medicine. Diagnostics (Basel) 2021;11:372. [PMID: 33671623 PMCID: PMC7926482 DOI: 10.3390/diagnostics11020372] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/30/2020] [Revised: 02/08/2021] [Accepted: 02/18/2021] [Indexed: 02/08/2023] Open

Dennis JK, Sealock JM, Straub P, Lee YH, Hucks D, Actkins K, Faucon A, Feng YCA, Ge T, Goleva SB, Niarchou M, Singh K, Morley T, Smoller JW, Ruderfer DM, Mosley JD, Chen G, Davis LK. Clinical laboratory test-wide association scan of polygenic scores identifies biomarkers of complex disease. Genome Med 2021;13:6. [PMID: 33441150 PMCID: PMC7807864 DOI: 10.1186/s13073-020-00820-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 02/25/2020] [Accepted: 12/08/2020] [Indexed: 12/27/2022] Open

Abstract

BACKGROUND

Clinical laboratory (lab) tests are used in clinical practice to diagnose, treat, and monitor disease conditions. Test results are stored in electronic health records (EHRs), and a growing number of EHRs are linked to patient DNA, offering unprecedented opportunities to query relationships between genetic risk for complex disease and quantitative physiological measurements collected on large populations.

METHODS

A total of 3075 quantitative lab tests were extracted from Vanderbilt University Medical Center's (VUMC) EHR system and cleaned for population-level analysis according to our QualityLab protocol. Lab values extracted from BioVU were compared with previous population studies using heritability and genetic correlation analyses. We then tested the hypothesis that polygenic risk scores for biomarkers and complex disease are associated with biomarkers of disease extracted from the EHR. In a proof-of-concept analysis, we focused on lipids and coronary artery disease (CAD). We cleaned lab traits extracted from the EHR, performed lab-wide association scans (LabWAS) of the lipids and CAD polygenic risk scores across 315 heritable lab tests, and then replicated the pipeline and analyses in the Massachusetts General Brigham Biobank (MGBB).

RESULTS

Heritability estimates of lipid values (after cleaning with QualityLab) were comparable to previous reports and polygenic scores for lipids were strongly associated with their referent lipid in a LabWAS. LabWAS of the polygenic score for CAD recapitulated canonical heart disease biomarker profiles including decreased HDL, increased pre-medication LDL, triglycerides, blood glucose, and glycated hemoglobin (HgbA1C) in European and African descent populations. Notably, many of these associations remained even after adjusting for the presence of cardiovascular disease and were replicated in the MGBB.

CONCLUSIONS

Polygenic risk scores can be used to identify biomarkers of complex disease in large-scale EHR-based genomic analyses, providing new avenues for discovery of novel biomarkers and deeper understanding of disease trajectories in pre-symptomatic individuals. We present two methods and associated software, QualityLab and LabWAS, to clean and analyze EHR labs at scale and perform a Lab-Wide Association Scan.


Affiliation(s)

Jessica K Dennis Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
Julia M Sealock Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Peter Straub Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Younga H Lee Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
Donald Hucks Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Ky'Era Actkins Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Department of Microbiology, Immunology, and Physiology, Meharry Medical College, Nashville, TN, 37232, USA
Annika Faucon Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Yen-Chen Anne Feng Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
Tian Ge Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
Slavina B Goleva Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Maria Niarchou Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Kritika Singh Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Theodore Morley Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Jordan W Smoller Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA Department of Psychiatry, Harvard Medical School, Boston, MA, 02115, USA Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
Douglas M Ruderfer Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Jonathan D Mosley Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Guanhua Chen Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53706, USA
Lea K Davis Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. Departments of Medicine and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA. Division of Genetic Medicine, Department of Medicine, Vanderbilt Genetics Institute, Vanderbilt University, 511-A Light Hall, 2215 Garland Ave, Nashville, TN, 37232, USA.


Visual Analytics for Dimension Reduction and Cluster Analysis of High Dimensional Electronic Health Records. INFORMATICS 2020. [DOI: 10.3390/informatics7020017] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 01/08/2023] Open