1. Converting OMOP CDM to phenopackets: A model alignment and patient data representation evaluation. J Biomed Inform 2024; 155:104659. PMID: 38777085. DOI: 10.1016/j.jbi.2024.104659.
Abstract
OBJECTIVE This study aims to promote interoperability in precision medicine and translational research by aligning the Observational Medical Outcomes Partnership (OMOP) and Phenopackets data models. Phenopackets is an expert knowledge-driven schema designed to facilitate the storage and exchange of multimodal patient data, and support downstream analysis. The first goal of this paper is to explore model alignment by characterizing the common data models using a newly developed data transformation process and evaluation method. Second, using OMOP normalized clinical data, we evaluate the mapping of real-world patient data to Phenopackets. We evaluate the suitability of Phenopackets as a patient data representation for real-world clinical cases. METHODS We identified mappings between OMOP and Phenopackets and applied them to a real patient dataset to assess the transformation's success. We analyzed gaps between the models and identified key considerations for transforming data between them. Further, to improve ambiguous alignment, we incorporated Unified Medical Language System (UMLS) semantic type-based filtering to direct individual concepts to their most appropriate domain and conducted a domain-expert evaluation of the mapping's clinical utility. RESULTS The OMOP to Phenopacket transformation pipeline was executed for 1,000 Alzheimer's disease patients and successfully mapped all required entities. However, due to missing values in OMOP for required Phenopacket attributes, 10.2% of records were lost. The use of UMLS semantic-type filtering for ambiguous alignment of individual concepts resulted in 96% agreement with clinical thinking, increased from 68% when mapping exclusively by domain correspondence. CONCLUSION This study presents a pipeline to transform data from OMOP to Phenopackets. We identified considerations for the transformation to ensure data quality, handling restrictions for successful Phenopacket validation and discrepant data formats.
We identified unmappable Phenopacket attributes that focus on specialty use cases, such as genomics or oncology, which OMOP does not currently support. We introduce UMLS semantic type filtering to resolve ambiguous alignment to Phenopacket entities to be most appropriate for real-world interpretation. We provide a systematic approach to align OMOP and Phenopackets schemas. Our work facilitates future use of Phenopackets in clinical applications by addressing key barriers to interoperability when deriving a Phenopacket from real-world patient data.
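The semantic-type filtering idea can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual mapping tables: the type-to-entity map, the function name, and the fallback domain correspondence below are all assumptions.

```python
# Hypothetical sketch: route an OMOP concept to a Phenopacket entity using
# UMLS semantic types first, falling back to the OMOP domain. The mappings
# here are illustrative assumptions, not the study's validated tables.
SEMANTIC_TYPE_TO_ENTITY = {
    "Disease or Syndrome": "Disease",
    "Sign or Symptom": "PhenotypicFeature",
    "Finding": "PhenotypicFeature",
    "Therapeutic or Preventive Procedure": "MedicalAction",
    "Pharmacologic Substance": "MedicalAction",
    "Laboratory Procedure": "Measurement",
}

DOMAIN_FALLBACK = {
    "Condition": "Disease",
    "Observation": "PhenotypicFeature",
    "Measurement": "Measurement",
    "Drug": "MedicalAction",
    "Procedure": "MedicalAction",
}

def route_concept(omop_domain: str, semantic_types: list[str]) -> str:
    """Pick the Phenopacket entity for a concept.

    Semantic-type filtering takes precedence; the coarse OMOP
    domain-to-entity correspondence is the fallback.
    """
    for st in semantic_types:
        if st in SEMANTIC_TYPE_TO_ENTITY:
            return SEMANTIC_TYPE_TO_ENTITY[st]
    return DOMAIN_FALLBACK.get(omop_domain, "PhenotypicFeature")

# An OMOP "Condition" that UMLS tags as a sign/symptom is routed to
# PhenotypicFeature rather than Disease, matching clinical intuition.
print(route_concept("Condition", ["Sign or Symptom"]))  # PhenotypicFeature
```

The point of the two-level lookup is that a concept's UMLS semantic type can override an ambiguous domain assignment, which is what raised agreement with clinical judgment in the study.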
2. Why do probabilistic clinical models fail to transport between sites. NPJ Digit Med 2024; 7:53. PMID: 38429353. PMCID: PMC10907678. DOI: 10.1038/s41746-024-01037-4.
Abstract
The rising popularity of artificial intelligence in healthcare is highlighting the problem that a computational model achieving super-human clinical performance at its training sites may perform substantially worse at new sites. In this perspective, we argue that we should typically expect this failure to transport, and we present common sources for it, divided into those under the control of the experimenter and those inherent to the clinical data-generating process. Among the inherent sources, we look more closely at site-specific clinical practices that can affect the data distribution, and we propose a potential solution intended to isolate the imprint of those practices on the data from the patterns of disease cause and effect that are the usual target of probabilistic clinical models.
3. Predicting polycystic ovary syndrome with machine learning algorithms from electronic health records. Front Endocrinol (Lausanne) 2024; 15:1298628. PMID: 38356959. PMCID: PMC10866556. DOI: 10.3389/fendo.2024.1298628.
Abstract
Introduction Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis. Methods This is a retrospective cohort study using a safety-net hospital's electronic health records (EHR) from 2003-2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was PCOS ICD-9 diagnosis, with additional model outcomes of algorithm-defined PCOS. The latter was based on Rotterdam criteria, merging laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound. Results We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved average AUCs of 85%, 81%, 80%, and 82% in Models I, II, III, and IV, respectively. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG. Conclusion Machine learning algorithms were used to predict PCOS based on a large at-risk population.
This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.
4. Around the EQUATOR with Clin-STAR: Prediction modeling opportunities and challenges in aging research. J Am Geriatr Soc 2023. PMID: 38032070. DOI: 10.1111/jgs.18704.
Abstract
The 2015 Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement was published to improve reporting transparency for prediction modeling studies. The objective of this review is to highlight methodologic challenges that aging-focused researchers will encounter when designing and reporting studies involving prediction models for older adults and provide guidance for addressing these challenges. In following the 22-item TRIPOD checklist, researchers must consider the representativeness of cohorts used (e.g., whether older adults with frailty, cognitive impairment, and social isolation were included), strategies for incorporating common geriatric predictors (e.g., age, comorbidities, functional status, and frailty), methods for handling missing data and competing risk of death, and assessment of model performance heterogeneity across important subgroups (e.g., age, sex, race, and ethnicity). We provide guidance to help aging-focused researchers develop, validate, and report models that can inform and improve patient care, which we label "TRIPOD-65."
5. Making the Improbable Possible: Generalizing Models Designed for a Syndrome-Based, Heterogeneous Patient Landscape. Crit Care Clin 2023; 39:751-768. PMID: 37704338. PMCID: PMC10758922. DOI: 10.1016/j.ccc.2023.02.003.
Abstract
Syndromic conditions, such as sepsis, are commonly encountered in the intensive care unit. Although these conditions are easy for clinicians to grasp, they may limit the performance of machine-learning algorithms. Individual hospital practice patterns may limit external generalizability. Data missingness is another barrier to optimal algorithm performance, and various strategies exist to mitigate it. Recent advances in data science, such as transfer learning, conformal prediction, and continual learning, may improve the generalizability of machine-learning algorithms in critically ill patients. At this point, randomized trials of these approaches are needed to demonstrate improvements in patient-centered outcomes.
6. Imputation and missing indicators for handling missing data in the development and deployment of clinical prediction models: A simulation study. Stat Methods Med Res 2023; 32:1461-1477. PMID: 37105540. PMCID: PMC10515473. DOI: 10.1177/09622802231165001.
Abstract
Background: In clinical prediction modelling, missing data can occur at any stage of the model pipeline: development, validation, or deployment. Multiple imputation is often recommended yet challenging to apply at deployment; for example, the outcome cannot be included in the imputation model, as is recommended under multiple imputation. Regression imputation uses a fitted model to impute the predicted value of missing predictors from observed data, and could offer a pragmatic alternative at deployment. Moreover, the use of missing indicators has been proposed to handle informative missingness, but it is currently unknown how well this method performs in the context of clinical prediction models. Methods: We simulated data under various missing data mechanisms to compare the predictive performance of clinical prediction models developed using both imputation methods. We consider deployment scenarios where missing data are permitted or prohibited, imputation models that use or omit the outcome, and clinical prediction models that include or omit missing indicators. We assume that the missingness mechanism remains constant across the model pipeline. We also apply the proposed strategies to critical care data. Results: With complete data available at deployment, our findings were in line with existing recommendations: the outcome should be used to impute development data when using multiple imputation and omitted under regression imputation. When missingness is allowed at deployment, omitting the outcome from the imputation model at development was preferred. Missing indicators improved model performance in many cases but can be harmful under outcome-dependent missingness. Conclusion: We provide evidence that commonly taught principles of handling missing data via multiple imputation may not apply to clinical prediction models, particularly when data can be missing at deployment.
We observed comparable predictive performance under multiple imputation and regression imputation. The performance of the missing data handling method must be evaluated on a study-by-study basis, and the most appropriate strategy for handling missing data at development should consider whether missing data are allowed at deployment. Some guidance is provided.
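A minimal sketch of the kind of pipeline the study evaluates: single imputation combined with a missing indicator inside a deployable model, fitted without using the outcome in the imputation step. This uses scikit-learn's mean imputation with `add_indicator=True` as a stand-in; the simulated data, the outcome-dependent missingness mechanism, and all parameters are illustrative assumptions, not the paper's simulation design.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)   # always-observed predictor
x2 = rng.normal(size=n)   # predictor that can be missing at deployment
y = (rng.random(n) < 1 / (1 + np.exp(-(x1 + x2)))).astype(int)

# Outcome-dependent missingness: x2 is missing more often when y = 1.
miss = rng.random(n) < 1 / (1 + np.exp(-(y - 0.5)))
x2_obs = np.where(miss, np.nan, x2)
X = np.column_stack([x1, x2_obs])

# Mean imputation plus a missing-indicator column; nothing in this
# pipeline needs the outcome at prediction time, so it can be deployed
# to patients with missing x2.
model = make_pipeline(
    SimpleImputer(strategy="mean", add_indicator=True),
    LogisticRegression(),
)
model.fit(X, y)
print(model.predict_proba(X[:3])[:, 1].round(3))
```

The indicator column lets the classifier learn that missingness itself carries information, which is exactly the behavior the study shows can help or, under outcome-dependent missingness, harm.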
7. Analysis of Medico-Legal Complaint Data: A Retrospective Study of Three Large Italian University Hospitals. Healthcare (Basel) 2023; 11:1406. PMID: 37239691. DOI: 10.3390/healthcare11101406.
Abstract
(1) Background: The main goal of this paper is to identify hospital-related critical and excellent areas, in both a national and a local setting. Information on the civil litigation affecting each hospital was collected and organized from internal company reports, so that the results could be related to the phenomenon of medical malpractice on a national scale, with the aim of developing targeted improvement strategies and investing available resources efficiently. (2) Methods: In the present study, data from claims management at Umberto I General Hospital, Agostino Gemelli University Hospital Foundation, and Campus Bio-Medico University Hospital Foundation from 2013 to 2020 were collected. A total of 2,098 files were examined, and a set of 13 outcome indicators for assessing quality of care was proposed. (3) Results: Of the total, only 779 records (37.1%) were attributable to the categories indexable for the present analysis. These data highlight that, after correct and rigorous categorization of hospital events, these medico-legal aspects can be analyzed with a small number of indicators. A substantial share of the remaining events was difficult to index and of limited scientific interest. (4) Conclusions: The proposed indicators do not require reference standards for comparison, but they provide a useful instrument for comparative purposes: in addition to comparisons between different institutions across the territory, outcome indicators allow longitudinal analysis of an individual institution's performance over time.
8. Transparent reporting of multivariable prediction models developed or validated using clustered data (TRIPOD-Cluster): explanation and elaboration. BMJ 2023; 380:e071058. PMID: 36750236. PMCID: PMC9903176. DOI: 10.1136/bmj-2022-071058.
9. Defining measures of kidney function in observational studies using routine health care data: methodological and reporting considerations. Kidney Int 2023; 103:53-69. PMID: 36280224. DOI: 10.1016/j.kint.2022.09.020.
Abstract
The availability of electronic health records and access to a large number of routine measurements of serum creatinine and urinary albumin enhance the possibilities for epidemiologic research in kidney disease. However, the frequency of health care use and laboratory testing is determined by health status and indication, imposing certain challenges when identifying patients with kidney injury or disease, when using markers of kidney function as covariates, or when evaluating kidney outcomes. Depending on the specific research question, this may influence the interpretation, generalizability, and/or validity of study results. This review illustrates the heterogeneity of working definitions of kidney disease in the scientific literature and discusses advantages and limitations of the most commonly used approaches using 3 examples. We summarize ways to identify and overcome possible biases and conclude by proposing a framework for reporting definitions of exposures and outcomes in studies of kidney disease using routinely collected health care data.
10. Randomized Trials With Repeatedly Measured Outcomes: Handling Irregular and Potentially Informative Assessment Times. Epidemiol Rev 2022; 44:121-137. PMID: 36259969. PMCID: PMC10362939. DOI: 10.1093/epirev/mxac010.
Abstract
Randomized trials are often designed to collect outcomes at fixed points in time after randomization. In practice, the number and timing of outcome assessments can vary among participants (i.e., irregular assessment). In fact, the timing of assessments may be associated with the outcome of interest (i.e., informative assessment). For example, in a trial evaluating the effectiveness of treatments for major depressive disorder, not only did the timings of outcome assessments vary among participants but symptom scores were associated with assessment frequency. This type of informative observation requires appropriate statistical analysis. Although analytic methods have been developed, they are rarely used. In this article, we review the literature on irregular assessments with a view toward developing recommendations for analyzing trials with irregular and potentially informative assessment times. We show how the choice of analytic approach hinges on assumptions about the relationship between the assessment and outcome processes. We argue that irregular assessment should be treated with the same care as missing data, and we propose that trialists adopt strategies to minimize the extent of irregularity; describe the extent of irregularity in assessment times; make their assumptions about the relationships between assessment times and outcomes explicit; adopt analytic techniques that are appropriate to their assumptions; and assess the sensitivity of trial results to their assumptions.
11. Development and validation of a dynamic 48-hour in-hospital mortality risk stratification for COVID-19 in a UK teaching hospital: a retrospective cohort study. BMJ Open 2022; 12:e060026. PMID: 36691139. PMCID: PMC9445230. DOI: 10.1136/bmjopen-2021-060026.
Abstract
OBJECTIVES To develop a disease stratification model for COVID-19 that updates according to changes in a patient's condition while in hospital to facilitate patient management and resource allocation. DESIGN In this retrospective cohort study, we adopted a landmarking approach to dynamic prediction of all-cause in-hospital mortality over the next 48 hours. We accounted for informative predictor missingness and selected predictors using penalised regression. SETTING All data used in this study were obtained from a single UK teaching hospital. PARTICIPANTS We developed the model using 473 consecutive patients with COVID-19 presenting to a UK hospital between 1 March 2020 and 12 September 2020; and temporally validated using data on 1119 patients presenting between 13 September 2020 and 17 March 2021. PRIMARY AND SECONDARY OUTCOME MEASURES The primary outcome is all-cause in-hospital mortality within 48 hours of the prediction time. We accounted for the competing risks of discharge from hospital alive and transfer to a tertiary intensive care unit for extracorporeal membrane oxygenation. RESULTS Our final model includes age, Clinical Frailty Scale score, heart rate, respiratory rate, oxygen saturation/fractional inspired oxygen ratio, white cell count, presence of acidosis (pH <7.35) and interleukin-6. Internal validation achieved an area under the receiver operating characteristic (AUROC) of 0.90 (95% CI 0.87 to 0.93) and temporal validation gave an AUROC of 0.86 (95% CI 0.83 to 0.88). CONCLUSIONS Our model incorporates both static risk factors (eg, age) and evolving clinical and laboratory data, to provide a dynamic risk prediction model that adapts to both sudden and gradual changes in an individual patient's clinical condition. On successful external validation, the model has the potential to be a powerful clinical risk assessment tool. TRIAL REGISTRATION The study is registered as 'researchregistry5464' on the Research Registry (www.researchregistry.com).
12. Accommodating heterogeneous missing data patterns for prostate cancer risk prediction. BMC Med Res Methodol 2022; 22:200. PMID: 35864460. PMCID: PMC9306143. DOI: 10.1186/s12874-022-01674-x.
Abstract
BACKGROUND We compared six commonly used logistic regression methods for accommodating missing risk factor data from multiple heterogeneous cohorts, in which some cohorts do not collect some risk factors at all, and developed an online risk prediction tool that accommodates missing risk factors from the end-user. METHODS Ten North American and European cohorts from the Prostate Biopsy Collaborative Group (PBCG) were used for fitting a risk prediction tool for clinically significant prostate cancer, defined as Gleason grade group ≥ 2 on standard TRUS prostate biopsy. One large European PBCG cohort was withheld for external validation, where calibration-in-the-large (CIL), calibration curves, and the area under the receiver operating characteristic curve (AUC) were evaluated. Ten-fold leave-one-cohort-out internal validation further confirmed the optimal missing data approach. RESULTS Among 12,703 biopsies from 10 training cohorts, 3,597 (28%) had clinically significant prostate cancer, compared to 1,757 of 5,540 (32%) in the external validation cohort. In external validation, the available cases method, which pooled individual patient data containing all risk factors input by an end-user, had the best CIL, under-predicting risks as percentages by 2.9% on average, and obtained an AUC of 75.7%. Imputation had the worst CIL (-13.3%). The available cases method was further validated as optimal in internal cross-validation and thus used for development of an online risk tool. For end-users of the risk tool, two risk factors were mandatory: serum prostate-specific antigen (PSA) and age; ten were optional: digital rectal exam, prostate volume, prior negative biopsy, 5-alpha-reductase-inhibitor use, prior PSA screen, African ancestry, Hispanic ethnicity, and first-degree prostate, breast, and second-degree prostate cancer family history.
CONCLUSION Developers of clinical risk prediction tools should optimize use of available data and sources even in the presence of high amounts of missing data and offer options for users with missing risk factors.
13. An Electronic Health Record-Compatible Model to Predict Personalized Treatment Effects From the Diabetes Prevention Program: A Cross-Evidence Synthesis Approach Using Clinical Trial and Real-World Data. Mayo Clin Proc 2022; 97:703-715. PMID: 34782125. DOI: 10.1016/j.mayocp.2021.09.012.
Abstract
OBJECTIVE To develop an electronic health record (EHR)-based risk tool that provides point-of-care estimates of diabetes risk to support targeting interventions to patients most likely to benefit. PATIENTS AND METHODS A risk prediction model was developed and validated in a large observational database of patients with an index visit date between January 1, 2012, and December 31, 2016, with treatment effect estimates from risk-based reanalysis of clinical trial data. The risk model development cohort included 1.1 million patients with prediabetes from the OptumLabs Data Warehouse (OLDW); the validation cohort included a distinct sample of 1.1 million patients in OLDW. The randomly assigned clinical trial cohort included 3081 people from the Diabetes Prevention Program (DPP) study. RESULTS Eleven variables reliably obtainable from the EHR were used to predict diabetes risk. This model validated well in the OLDW (C statistic = 0.76; observed 3-year diabetes rate was 1.8% (95% confidence interval [CI], 1.7 to 1.9) in the lowest-risk quarter and 19.6% (19.4 to 19.8) in the highest-risk quarter). In the DPP, the hazard ratio (HR) for lifestyle modification was constant across all levels of risk (HR, 0.43; 95% CI, 0.35 to 0.53), whereas the HR for metformin was highly risk dependent (HR, 1.1; 95% CI, 0.61 to 2.0 in the lowest-risk quarter vs HR, 0.45; 95% CI, 0.35 to 0.59 in the highest-risk quarter). Fifty-three percent of the benefits of population-wide dissemination of the DPP lifestyle modification and 73% of the benefits of population-wide metformin therapy can be obtained by targeting the highest-risk quarter of patients. CONCLUSION The Tufts-Predictive Analytics and Comparative Effectiveness DPP Risk model is an EHR-compatible tool that might support targeted diabetes prevention to more efficiently realize the benefits of the DPP interventions.
14. Observability and its impact on differential bias for clinical prediction models. J Am Med Inform Assoc 2022; 29:937-943. PMID: 35211742. PMCID: PMC9006687. DOI: 10.1093/jamia/ocac019.
Abstract
OBJECTIVE Electronic health records have incomplete capture of patient outcomes. We consider the case when observability is differential across a predictor. Including such a predictor (sensitive variable) can lead to algorithmic bias, potentially exacerbating health inequities. MATERIALS AND METHODS We define bias for a clinical prediction model (CPM) as the difference between the true and estimated risk, and differential bias as bias that differs across a sensitive variable. We illustrate the genesis of differential bias via a 2-stage process, where, conditional on having the outcome of interest, the outcome is differentially observed. We use simulations and a real-data example to demonstrate the possible impact of including a sensitive variable in a CPM. RESULTS If there is differential observability based on a sensitive variable, including it in a CPM can induce differential bias. However, if the sensitive variable impacts the outcome but not observability, it is better to include it. When a sensitive variable impacts both observability and the outcome, no simple recommendation can be provided. We show that one cannot use observed data to detect differential bias. DISCUSSION Our study furthers the literature on observability, showing that differential observability can lead to algorithmic bias. This highlights the importance of considering whether to include sensitive variables in CPMs. CONCLUSION Whether to include a sensitive variable in a CPM depends on whether it truly affects the outcome or just the observability of the outcome. Since this cannot be distinguished with observed data, observability is an implicit assumption of CPMs.
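The 2-stage process described here is easy to simulate: both groups share the same true risk, but once the outcome occurs it is recorded with a probability that depends on the sensitive variable, so the estimated risk is biased by a different amount in each group. The group sizes and probabilities below are illustrative assumptions, not the paper's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
s = rng.integers(0, 2, size=n)   # sensitive variable (0/1)
true_risk = 0.2                   # identical true risk in both groups
y = rng.random(n) < true_risk     # stage 1: outcome occurs

# Stage 2: among patients with the outcome, it is recorded in the EHR
# with a probability that depends on the sensitive variable.
p_observe = np.where(s == 1, 0.9, 0.5)
y_recorded = y & (rng.random(n) < p_observe)

# Estimated "risk" from recorded outcomes differs by group even though
# the true risk does not: this gap is the differential bias.
for g in (0, 1):
    est = y_recorded[s == g].mean()
    print(f"group {g}: true risk {true_risk:.2f}, estimated {est:.3f}")
```

With these numbers the recorded rate is roughly 0.10 in group 0 and 0.18 in group 1, so a model trained on recorded outcomes underestimates risk by different amounts across the sensitive variable.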
15.
Abstract
Electronic health records (EHRs) offer unprecedented opportunities to answer epidemiologic questions. However, unlike in ordinary cohort studies or randomized trials, EHR data are collected somewhat idiosyncratically. In particular, patients who have more contact with the medical system have more opportunities to receive diagnoses, which are then recorded in their EHRs. The goal of this article is to shed light on the nature and scope of this phenomenon, known as informative presence, which can bias estimates of associations. We show how this can be characterized as an instance of misclassification bias. As a consequence, we show that informative presence bias can occur in a broader range of settings than previously thought, and that simple adjustment for the number of visits as a confounder may not fully correct for bias. Additionally, where previous work has considered only underdiagnosis, investigators are often concerned about overdiagnosis; we show how this changes the settings in which bias manifests. We report on a comprehensive series of simulations to shed light on when to expect informative presence bias, how it can be mitigated in some cases, and cases in which new methods need to be developed.
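A small simulation of the mechanism described: disease prevalence is identical across exposure groups, but exposed patients visit more often and so accumulate more chances to have a true case recorded, yielding a spurious association between exposure and the recorded diagnosis (true cases misclassified as non-cases). All parameters are illustrative assumptions, not the article's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
exposure = rng.integers(0, 2, size=n)
disease = rng.random(n) < 0.10   # truly independent of exposure

# Exposed patients visit more often; each visit gives one chance for a
# true case to be diagnosed and recorded in the EHR.
visits = rng.poisson(np.where(exposure == 1, 6, 2))
p_dx_per_visit = 0.2
recorded = disease & (rng.random(n) < 1 - (1 - p_dx_per_visit) ** visits)

# Recorded "prevalence" differs by exposure even though true prevalence
# does not: informative presence acting as misclassification bias.
for e in (0, 1):
    print(f"exposure {e}: recorded prevalence "
          f"{recorded[exposure == e].mean():.3f}")
```

Conditioning on `visits` shrinks but does not always remove this gap, since within a visit-count stratum the detection probability still depends on the visit process, which matches the article's point that adjusting for the number of visits as a confounder may not fully correct the bias.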
16. Subcategorizing EHR diagnosis codes to improve clinical application of machine learning models. Int J Med Inform 2021; 156:104588. PMID: 34607290. DOI: 10.1016/j.ijmedinf.2021.104588.
Abstract
BACKGROUND Electronic health record (EHR) data is commonly used for secondary purposes such as research and clinical decision support. However, reuse of EHR data presents several challenges including but not limited to identifying all diagnoses associated with a patient's clinical encounter. The purpose of this study was to assess the feasibility of developing a schema to identify and subclassify all structured diagnosis codes for a patient encounter. METHODS To develop a subclassification schema we used EHR data from an interhospital transport data repository that contained complete hospital encounter level data. Eight discrete data sources containing structured diagnosis codes were identified. Diagnosis codes were normalized using the Unified Medical Language System and additional EHR data were combined with standardized terminologies to create and validate the subcategories. We then employed random forest to assess the usefulness of the new subcategorized diagnoses to predict post-interhospital transfer mortality by building 2 models, one using standard diagnosis codes, and one using the new subcategorized diagnosis codes. RESULTS Six subcategories of diagnoses were identified and validated. The subcategories included: primary or admitting diagnoses (10%), past medical, surgical or social history (9%), problem list (20%), comorbidity (24%), discharge diagnoses (6%), and unmapped diagnoses (31%). The subcategorized model outperformed the standard model, achieving a training AUROC of 0.97 versus 0.95 and testing model AUROC of 0.81 versus 0.46. DISCUSSION Our work demonstrates that merging structured diagnosis codes with additional EHR data and secondary data sources provides additional information to understand the role of diagnosis throughout a clinical encounter and improves predictive model performance. 
Further work is needed to assess whether subcategorization improves the interpretation of prognostic model results and/or their operationalization in clinical decision support applications.
17. Evaluating the impact of covariate lookback times on performance of patient-level prediction models. BMC Med Res Methodol 2021; 21:180. PMID: 34454423. PMCID: PMC8403343. DOI: 10.1186/s12874-021-01370-2.
Abstract
Background The goal of our study is to examine the impact of lookback length when engineering features for predictive models developed using observational healthcare data. A longer lookback gives more insight about patients but increases the issue of left-censoring. Methods We used five US observational databases to develop patient-level prediction models. A target cohort of subjects with hypertensive drug exposures and outcome cohorts of subjects with acute outcomes (stroke and gastrointestinal bleeding) and chronic outcomes (diabetes and chronic kidney disease) were developed. Candidate predictors existing on or prior to the target index date were derived within the following lookback periods: 14, 30, 90, 180, 365, and 730 days, and all time prior to index. We predicted the risk of outcomes occurring 1 to 365 days after index. Ten lasso logistic models were generated for each lookback period to create a distribution of area under the curve (AUC) metrics for evaluating the discriminative performance of the models. Calibration intercept and slope were also calculated. Impact on external validation performance was investigated across the five databases. Results The maximum difference in AUCs between models developed using different lookback periods within a database was < 0.04 for diabetes (in MDCR, an AUC of 0.593 with a 14-day lookback vs. 0.631 with an all-time lookback) and 0.012 for renal impairment (in MDCR, 0.675 with a 30-day lookback vs. 0.687 with a 365-day lookback). For the acute outcomes, the maximum difference in AUC across lookbacks within a database was 0.015 for stroke (in MDCD, 0.767 with a 14-day lookback vs. 0.782 with a 365-day lookback) and < 0.03 for gastrointestinal bleeding (in CCAE, 0.631 with a 14-day lookback vs. 0.660 with a 730-day lookback).
Conclusions In general, the choice of covariate lookback had only a small impact on discrimination and calibration, although a short lookback (< 180 days) occasionally decreased discrimination. Based on these results, when training a logistic regression model for prediction, a 365-day covariate lookback appears to be a good trade-off between performance and interpretation. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01370-2.
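The lookback-window covariate construction the study evaluates can be sketched with a toy condition table: flag each condition concept observed within the window ending at the patient's index date. Table layout, column names, and dates are illustrative assumptions, not OMOP's actual schema.

```python
import pandas as pd

# Toy condition-occurrence table and per-patient index dates (illustrative
# column names; a real OMOP CDM table differs).
conditions = pd.DataFrame({
    "person_id":  [1, 1, 2, 2],
    "concept":    ["diabetes", "stroke", "diabetes", "ckd"],
    "start_date": pd.to_datetime(
        ["2019-01-10", "2020-06-01", "2015-03-05", "2020-05-20"]),
})
index_dates = pd.DataFrame({
    "person_id":  [1, 2],
    "index_date": pd.to_datetime(["2020-07-01", "2020-07-01"]),
})

def lookback_features(lookback_days: int) -> pd.DataFrame:
    """One-hot condition covariates observed within the lookback window."""
    df = conditions.merge(index_dates, on="person_id")
    in_window = (
        (df["start_date"] <= df["index_date"])
        & (df["start_date"] > df["index_date"]
           - pd.Timedelta(days=lookback_days))
    )
    return (df[in_window]
            .assign(flag=1)
            .pivot_table(index="person_id", columns="concept",
                         values="flag", fill_value=0, aggfunc="max"))

print(lookback_features(365))     # short lookback left-censors old history
print(lookback_features(10_000))  # "all-time" lookback captures it
```

With a 365-day window, person 1's 2019 diabetes record falls outside the window and is dropped, which is exactly the left-censoring trade-off the study quantifies across lookback lengths.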