1
Safety planning intervention and follow-up: A telehealth service model for suicidal individuals in emergency department settings: Study design and protocol. Contemp Clin Trials 2024; 140:107492. [PMID: 38484793 PMCID: PMC11071175 DOI: 10.1016/j.cct.2024.107492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Revised: 02/29/2024] [Accepted: 03/05/2024] [Indexed: 03/26/2024]
Abstract
BACKGROUND The Safety Planning Intervention with follow-up services (SPI+) is a promising suicide prevention intervention, yet many Emergency Departments (EDs) lack the resources for adequate implementation. Comprehensive strategies addressing structural and organizational barriers are needed to optimize SPI+ implementation and scale-up. This protocol describes a test of one strategy in which ED staff connect at-risk patients to expert clinicians from a Suicide Prevention Consultation Center (SPCC) via telehealth. METHOD This stepped wedge, cluster-randomized trial compares the effectiveness, implementation, cost, and cost offsets of SPI+ delivered by SPCC clinicians versus ED-based clinicians (enhanced usual care; EUC). Eight EDs will start with EUC and cross over to the SPCC phase. Blocks of two EDs will be randomly assigned to start dates 3 months apart. Approximately 13,320 adults discharged following a suicide-related ED visit will be included; EUC and SPCC samples will comprise patients from before and after SPCC crossover, respectively. Effectiveness data sources are electronic health records, administrative claims, and the National Death Index. Primary effectiveness outcomes are presence of suicidal behavior and number/type of mental healthcare visits; secondary outcomes include number/type of suicide-related acute services 6 months post-discharge. We will use the same data sources to assess cost offsets to gauge SPCC scalability and sustainability. We will examine preliminary implementation outcomes (reach, adoption, fidelity, acceptability, and feasibility) through patient, clinician, and health-system leader interviews and surveys. CONCLUSION If the SPCC demonstrates clinical effectiveness and health system cost reduction, it may be a scalable model for evidence-based suicide prevention in the ED.
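The stepped-wedge allocation described in the METHOD section can be sketched roughly as follows. This is a minimal illustration, not the trial's actual randomization procedure: the site labels, the seed, and the month of the first crossover (`first_crossover_month`) are assumptions, since the abstract states only that blocks of two EDs start 3 months apart.

```python
import random

def stepped_wedge_schedule(sites, block_size=2, step_months=3,
                           first_crossover_month=3, seed=0):
    """Randomly order sites, then assign each block a crossover month.

    first_crossover_month and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    schedule = {}
    for i in range(0, len(shuffled), block_size):
        # Each successive block crosses over step_months later.
        month = first_crossover_month + (i // block_size) * step_months
        for site in shuffled[i:i + block_size]:
            schedule[site] = month
    return schedule

def arm(schedule, site, month):
    """Return the study arm a site is in during a given study month."""
    return "SPCC" if month >= schedule[site] else "EUC"

# Hypothetical site labels for the eight EDs.
sites = [f"ED{i}" for i in range(1, 9)]
schedule = stepped_wedge_schedule(sites)
```

Every site begins in EUC and ends in SPCC, so each ED contributes patients to both samples, which is the property the design relies on.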
2
Neurological diagnoses in hospitalized COVID-19 patients associated with adverse outcomes: A multinational cohort study. PLOS DIGITAL HEALTH 2024; 3:e0000484. [PMID: 38620037 PMCID: PMC11018281 DOI: 10.1371/journal.pdig.0000484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 03/06/2024] [Indexed: 04/17/2024]
Abstract
Few studies examining the patient outcomes of concurrent neurological manifestations during acute COVID-19 have leveraged multinational cohorts of adults and children or distinguished between central and peripheral nervous system (CNS vs. PNS) involvement. Using a federated multinational network in which local clinicians and informatics experts curated the electronic health records data, we evaluated the risk of prolonged hospitalization and mortality in hospitalized COVID-19 patients from 21 healthcare systems across 7 countries. For adults, we used a federated learning approach whereby we ran Cox proportional hazard models locally at each healthcare system and performed a meta-analysis on the aggregated results to estimate the overall risk of adverse outcomes across our geographically diverse populations. For children, we reported descriptive statistics separately due to their low frequency of neurological involvement and poor outcomes. Among the 106,229 hospitalized COVID-19 patients (104,031 patients ≥18 years; 2,198 patients <18 years, January 2020-October 2021), 15,101 (14%) had at least one CNS diagnosis, while 2,788 (3%) had at least one PNS diagnosis. After controlling for demographics and pre-existing conditions, adults with CNS involvement had a longer hospital stay (11 versus 6 days), a greater risk of death (hazard ratio = 1.78), and a shorter time to death (12 versus 24 days) than patients with no neurological condition (NNC) during acute COVID-19 hospitalization. Adults with PNS involvement also had a longer hospital stay but a lower risk of mortality than the NNC group. Although children had a low frequency of neurological involvement during COVID-19 hospitalization, a substantially higher proportion of children with CNS involvement died compared to those with NNC (6% vs 1%). Overall, patients with concurrent CNS manifestation during acute COVID-19 hospitalization faced greater risks for adverse clinical outcomes than patients without any neurological diagnosis. Our global informatics framework using a federated approach (versus a centralized data collection approach) has utility for clinical discovery beyond COVID-19.
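The federated step described above (local Cox models, then a meta-analysis of the aggregated results) can be illustrated with a simple pooling sketch. It assumes each site shares only a log hazard ratio and its standard error, and it uses fixed-effect inverse-variance weighting as a simplification; the abstract does not state which meta-analytic model was used, and the per-site numbers below are hypothetical.

```python
import math

def pool_log_hazard_ratios(site_estimates):
    """Fixed-effect inverse-variance pooling of (log_hr, se) pairs.

    Only summary statistics are needed, so no patient-level data
    leaves a site.
    """
    weights = [1.0 / se ** 2 for _, se in site_estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, site_estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

def to_hazard_ratio(log_hr, se, z=1.96):
    """Convert a log hazard ratio to (HR, lower, upper) with a 95% CI."""
    return (math.exp(log_hr),
            math.exp(log_hr - z * se),
            math.exp(log_hr + z * se))

# Hypothetical per-site estimates: (log hazard ratio, standard error).
site_estimates = [(0.60, 0.10), (0.50, 0.20), (0.65, 0.15)]
pooled, pooled_se = pool_log_hazard_ratios(site_estimates)
hr, lower, upper = to_hazard_ratio(pooled, pooled_se)
```

Sites with smaller standard errors (typically larger cohorts) receive more weight, which is why the pooled estimate sits closer to the first site's value in this example.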
3
Realizing the Potential of Social Determinants Data: A Scoping Review of Approaches for Screening, Linkage, Extraction, Analysis and Interventions. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.02.04.24302242. [PMID: 38370703 PMCID: PMC10871446 DOI: 10.1101/2024.02.04.24302242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Background Social determinants of health (SDoH) like socioeconomics and neighborhoods strongly influence outcomes, yet standardized SDoH data is lacking in electronic health records (EHR), limiting research and care quality. Methods We searched PubMed using the keywords "SDOH" and "EHR" and screened the results by title/abstract and then by full text. Included records were analyzed under five domains: 1) SDoH screening and assessment approaches, 2) SDoH data collection and documentation, 3) Use of natural language processing (NLP) for extracting SDoH, 4) SDoH data and health outcomes, and 5) SDoH-driven interventions. Results We identified 685 articles, of which 324 underwent full review. Key findings include tailored screening instruments implemented across settings, census and claims data linkage providing contextual SDoH profiles, rule-based and neural network systems extracting SDoH from notes using NLP, connections found between SDoH data and healthcare utilization/chronic disease control, and integrated care management programs executed. However, considerable variability persists across data sources, tools, and outcomes. Discussion Despite progress in identifying patient social needs, further development of standards, predictive models, and coordinated interventions is critical to fulfill the potential of SDoH-EHR integration. Additional database searches could strengthen this scoping review. Ultimately, widespread capture, analysis, and translation of multidimensional SDoH data into clinical care is essential for promoting health equity.
4
Developing a Framework to Infer Opioid Use Disorder Severity From Clinical Notes to Inform Natural Language Processing Methods: Characterization Study. JMIR Ment Health 2024; 11:e53366. [PMID: 38224481 PMCID: PMC10825772 DOI: 10.2196/53366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/16/2024] Open
Abstract
BACKGROUND Information regarding opioid use disorder (OUD) status and severity is important for patient care. Clinical notes provide valuable information for detecting and characterizing problematic opioid use, necessitating development of natural language processing (NLP) tools, which in turn requires reliably labeled OUD-relevant text and understanding of documentation patterns. OBJECTIVE To inform automated NLP methods, we aimed to develop and evaluate an annotation schema for characterizing OUD and its severity, and to document patterns of OUD-relevant information within clinical notes of heterogeneous patient cohorts. METHODS We developed an annotation schema to characterize OUD severity based on criteria from the Diagnostic and Statistical Manual of Mental Disorders, 5th edition. In total, 2 annotators reviewed clinical notes from key encounters of 100 adult patients with varied evidence of OUD, including patients with and those without chronic pain, with and without medication treatment for OUD, and a control group. We completed annotations at the sentence level. We calculated severity scores based on annotation of note text with 18 classes aligned with criteria for OUD severity and determined positive predictive values for OUD severity. RESULTS The annotation schema contained 27 classes. We annotated 1436 sentences from 82 patients; notes of 18 patients (11 of whom were controls) contained no relevant information. Interannotator agreement was above 70% for 11 of 15 batches of reviewed notes. Severity scores for control group patients were all 0. Among noncontrol patients, the mean severity score was 5.1 (SD 3.2), indicating moderate OUD, and the positive predictive value for detecting moderate or severe OUD was 0.71. Progress notes and notes from emergency department and outpatient settings contained the most and greatest diversity of information. 
Substance misuse and psychiatric classes were most prevalent and highly correlated across note types with high co-occurrence across patients. CONCLUSIONS Implementation of the annotation schema demonstrated strong potential for inferring OUD severity based on key information in a small set of clinical notes and highlighting where such information is documented. These advancements will facilitate NLP tool development to improve OUD prevention, diagnosis, and treatment.
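As a rough illustration of how a severity score can be mapped to a category, the sketch below counts distinct DSM-5 criteria and applies the DSM-5 thresholds (2 to 3 mild, 4 to 5 moderate, 6 or more severe). It assumes the score is a simple count of distinct criteria, which the abstract does not spell out, and the criterion labels are hypothetical stand-ins rather than the study's annotation classes.

```python
def oud_severity(criteria_met):
    """Map annotated DSM-5 criteria to (score, severity category).

    The score is assumed to be the number of distinct criteria found.
    """
    score = len(set(criteria_met))
    if score >= 6:
        return score, "severe"
    if score >= 4:
        return score, "moderate"
    if score >= 2:
        return score, "mild"
    return score, "none"

# Hypothetical criterion labels gathered from one patient's notes;
# the repeated "craving" counts once.
patient_criteria = ["craving", "tolerance", "withdrawal",
                    "craving", "failed_cut_down", "hazardous_use"]
score, category = oud_severity(patient_criteria)
```

Under these thresholds a score of 5 falls in the moderate range, consistent with the abstract's reading of a mean score of 5.1 as moderate OUD.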
5
Patient Phenotyping for Atopic Dermatitis with Transformers and Machine Learning. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.08.25.23294636. [PMID: 37693571 PMCID: PMC10491276 DOI: 10.1101/2023.08.25.23294636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Background Atopic dermatitis (AD) is a chronic skin condition that millions of people around the world live with each day. Research into the causes and treatment of this disease has great potential to benefit these individuals. However, AD clinical trial recruitment is a non-trivial task due to variance in the diagnostic precision and phenotypic definitions used by different clinicians, as well as the time clinicians spend finding, recruiting, and enrolling patients as study subjects. Thus, there is a need for automatic and effective patient phenotyping for cohort recruitment. Objective Our study aims to present an approach for identifying patients whose electronic health records suggest that they may have AD. Methods We created a vectorized representation of each patient and trained various supervised machine learning methods to classify whether a patient has AD. Each patient is represented by a vector of either probabilities or binary values where each value indicates whether they meet a different criterion for AD diagnosis. Results The most accurate AD classifier performed with a class-balanced accuracy of 0.8036, a precision of 0.8400, and a recall of 0.7500 when using XGBoost (Extreme Gradient Boosting). Conclusions Creating an automated approach for identifying patient cohorts has the potential to accelerate, standardize, and automate the process of patient recruitment for AD studies, thereby reducing clinician burden and informing knowledge discovery of better treatment options for AD.
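The three reported metrics are related through the confusion matrix, as the following sketch shows. The counts used here are hypothetical: they were chosen only because they reproduce the reported figures, and the abstract does not give the actual confusion matrix or test-set size.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, specificity, and class-balanced accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity / true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }

# Hypothetical counts (an assumption, not taken from the paper).
metrics = classification_metrics(tp=21, fp=4, fn=7, tn=24)
```

Class-balanced accuracy averages the per-class recall, so it is not inflated when one class dominates the evaluation set.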
6
A voice-based digital assistant for intelligent prompting of evidence-based practices during ICU rounds. J Biomed Inform 2023; 146:104483. [PMID: 37657712 PMCID: PMC10591951 DOI: 10.1016/j.jbi.2023.104483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/21/2023] [Accepted: 08/29/2023] [Indexed: 09/03/2023]
Abstract
OBJECTIVE To evaluate the technical feasibility and potential value of a digital assistant that prompts intensive care unit (ICU) rounding teams to use evidence-based practices based on analysis of their real-time discussions. METHODS We evaluated a novel voice-based digital assistant which audio records and processes the ICU care team's rounding discussions to determine which evidence-based practices are applicable to the patient but have yet to be addressed by the team. The system would then prompt the team to consider indicated but not yet delivered practices, thereby reducing cognitive burden compared to traditional rigid rounding checklists. In a retrospective analysis, we applied automatic transcription, natural language processing, and a rule-based expert system to generate personalized prompts for each patient in 106 audio-recorded ICU rounding discussions. To assess technical feasibility, we compared the system's prompts to those created by experienced critical care nurses who directly observed rounds. To assess potential value, we also compared the system's prompts to a hypothetical paper checklist containing all evidence-based practices. RESULTS The positive predictive value, negative predictive value, true positive rate, and true negative rate of the system's prompts were 0.45 ± 0.06, 0.83 ± 0.04, 0.68 ± 0.07, and 0.66 ± 0.04, respectively. If implemented in lieu of a paper checklist, the system would generate 56% fewer prompts per patient, with 50%±17% greater precision. CONCLUSION A voice-based digital assistant can reduce prompts per patient compared to traditional approaches for improving evidence uptake on ICU rounds. Additional work is needed to evaluate field performance and team acceptance.
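The prompting logic described in the METHODS section (prompt for practices that are applicable but not yet addressed, then compare against nurse-created prompts) can be sketched as set operations. This is a simplified stand-in for the study's transcription, NLP, and expert-system pipeline; the practice names and function names are hypothetical.

```python
def prompts_for_patient(applicable, addressed):
    """Practices that apply to the patient but were not addressed on rounds."""
    return sorted(set(applicable) - set(addressed))

def positive_predictive_value(system_prompts, reference_prompts):
    """Share of system prompts that the reference (nurse) also issued."""
    system = set(system_prompts)
    if not system:
        return None  # PPV is undefined when the system issues no prompts
    return len(system & set(reference_prompts)) / len(system)

def prompt_reduction(checklist_size, prompts):
    """Fractional reduction in prompts relative to a full checklist."""
    return 1 - len(prompts) / checklist_size

# Hypothetical practices for one patient discussion.
applicable = ["spontaneous breathing trial", "delirium screening",
              "early mobility", "pain assessment"]
addressed = ["pain assessment", "early mobility"]
system = prompts_for_patient(applicable, addressed)
nurse = ["delirium screening"]
ppv = positive_predictive_value(system, nurse)
reduction = prompt_reduction(len(applicable), system)
```

A full paper checklist would raise all four items here, whereas the targeted approach raises two, which is the kind of reduction the RESULTS section quantifies.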
7
Identifying Barriers to Post-Acute Care Referral and Characterizing Negative Patient Preferences Among Hospitalized Older Adults Using Natural Language Processing. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2023; 2022:606-615. [PMID: 37128417 PMCID: PMC10148308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Our objective was to detect common barriers to post-acute care (B2PAC) among hospitalized older adults using natural language processing (NLP) of clinical notes from patients discharged home when a clinical decision support system recommended post-acute care. We annotated B2PAC sentences from discharge planning notes and developed an NLP classifier to identify the highest-value B2PAC class (negative patient preferences). Thirteen machine learning models were compared with Amazon's AutoGluon deep learning model. The study included 594 acute care notes from 100 patient encounters (1156 sentences contained 11 B2PAC) in a large academic health system. The most frequent and modifiable B2PAC class was negative patient preferences (18.3%). The best supervised model was Extreme Gradient Boosting (F1: 0.859), but the deep learning model performed better (F1: 0.916). Alerting clinicians of negative patient preferences early in the hospitalization can prompt interventions such as patient education to ensure patients receive the right level of care and avoid negative outcomes.
8
Measuring Performance on the ABCDEF Bundle During Interprofessional Rounds via a Nurse-Based Assessment Tool. Am J Crit Care 2023; 32:92-99. [PMID: 36854912 DOI: 10.4037/ajcc2023755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2023]
Abstract
BACKGROUND Nurse-led rounding checklists are a common strategy for facilitating evidence-based practice in the intensive care unit (ICU). To streamline checklist workflow, some ICUs have the nurse or another individual listen to the conversation and customize the checklist for each patient. Such customizations assume that individuals can reliably assess whether checklist items have been addressed. OBJECTIVE To evaluate whether 1 critical care nurse can reliably assess checklist items on rounds. METHODS Two nurses performed in-person observation of multidisciplinary ICU rounds. Using a standardized paper-based assessment tool, each nurse indicated whether 17 items related to the ABCDEF bundle were discussed during rounds. For each item, generalizability coefficients were used as a measure of reliability, with a single-rater value of 0.70 or greater considered sufficient to support its assessment by 1 nurse. RESULTS The nurse observers assessed 118 patient discussions across 15 observation days. For 11 of 17 items (65%), the generalizability coefficient for a single rater met or exceeded the 0.70 threshold. The generalizability coefficients (95% CIs) of a single rater for key items were as follows: pain, 0.86 (0.74-0.97); delirium score, 0.74 (0.64-0.83); agitation score, 0.72 (0.33-1.00); spontaneous awakening trial, 0.67 (0.49-0.83); spontaneous breathing trial, 0.80 (0.70-0.89); mobility, 0.79 (0.69-0.87); and family (future/past) engagement, 0.82 (0.73-0.90). CONCLUSION Using a paper-based assessment tool, a single trained critical care nurse can reliably assess the discussion of elements of the ABCDEF bundle during multidisciplinary rounds.
9
Informative missingness: What can we learn from patterns in missing laboratory data in the electronic health record? J Biomed Inform 2023; 139:104306. [PMID: 36738870 PMCID: PMC10849195 DOI: 10.1016/j.jbi.2023.104306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 01/21/2023] [Accepted: 01/29/2023] [Indexed: 02/05/2023]
Abstract
BACKGROUND In electronic health records, patterns of missing laboratory test results could capture patients' course of disease as well as reflect clinicians' concerns about possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected across 15 healthcare system sites in three countries for COVID-19 inpatients. METHODS We collected and analyzed demographic, diagnosis, and laboratory data for 69,939 patients with positive COVID-19 PCR tests across three countries from 1 January 2020 through 30 September 2021. We analyzed missing laboratory measurements across sites, missingness stratification by demographic variables, temporal trends of missingness, correlations between labs based on missingness indicators over time, and clustering of groups of labs based on their missingness/ordering pattern. RESULTS With these analyses, we identified mapping issues faced in seven out of 15 sites. We also identified nuances in data collection and variable definition for the various sites. Temporal trend analyses may support the use of laboratory test result missingness patterns in identifying severe COVID-19 patients. Lastly, using missingness patterns, we determined relationships between various labs that reflect clinical behaviors. CONCLUSION In this work, we use computational approaches to relate missingness patterns to hospital treatment capacity and highlight the heterogeneity of looking at COVID-19 over time and at multiple sites, where there might be different phases, policies, etc. Changes in missingness could suggest a change in a patient's condition, and patterns of missingness among laboratory measurements could potentially identify clinical outcomes. This allows sites to consider missing data as informative to analyses and helps researchers identify which sites are better poised to study particular questions.
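One of the analyses listed in METHODS, correlating labs based on missingness indicators, can be sketched in a few lines. This is a minimal illustration under assumptions: the records, lab names, and the use of a plain Pearson correlation on 0/1 indicators are illustrative choices, not the study's documented pipeline.

```python
import math

def missingness_indicators(records, labs):
    """For each lab, build a 0/1 vector: 1 where the result is missing."""
    return {lab: [1 if r.get(lab) is None else 0 for r in records]
            for lab in labs}

def pearson(x, y):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical patient-day records; None marks a lab that was not resulted.
records = [
    {"crp": 5.1, "ferritin": 300, "creatinine": 0.9},
    {"crp": None, "ferritin": None, "creatinine": 1.1},
    {"crp": 12.0, "ferritin": 800, "creatinine": None},
    {"crp": None, "ferritin": None, "creatinine": 1.0},
]
indicators = missingness_indicators(records, ["crp", "ferritin", "creatinine"])
r_crp_ferritin = pearson(indicators["crp"], indicators["ferritin"])
r_crp_creatinine = pearson(indicators["crp"], indicators["creatinine"])
```

Labs that tend to be ordered together show strongly positive correlation between their missingness indicators, which is how ordering behavior can surface from the data.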
10
Toward Predicting 30-Day Readmission Among Oncology Patients: Identifying Timely and Actionable Risk Factors. JCO Clin Cancer Inform 2023; 7:e2200097. [PMID: 36809006 PMCID: PMC10476733 DOI: 10.1200/cci.22.00097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 09/05/2022] [Accepted: 01/13/2023] [Indexed: 02/23/2023] Open
Abstract
PURPOSE Predicting 30-day readmission risk is paramount to improving the quality of patient care. In this study, we compare sets of patient-, provider-, and community-level variables that are available at two different points of a patient's inpatient encounter (first 48 hours and the full encounter) to train readmission prediction models and identify possible targets for appropriate interventions that can potentially reduce avoidable readmissions. METHODS Using electronic health record data from a retrospective cohort of 2,460 oncology patients and a comprehensive machine learning analysis pipeline, we trained and tested models predicting 30-day readmission on the basis of data available within the first 48 hours of admission and from the entire hospital encounter. RESULTS Leveraging all features, the light gradient boosting model produced higher but comparable performance (area under receiver operating characteristic curve [AUROC]: 0.711) relative to the Epic model (AUROC: 0.697). Given features in the first 48 hours, the random forest model produced a higher AUROC (0.684) than the Epic model (AUROC: 0.676). Both models flagged patients with a similar distribution of race and sex; however, our light gradient boosting and random forest models were more inclusive, flagging more patients among younger age groups. The Epic models were more sensitive in identifying patients with a lower average zip income. Our 48-hour models were powered by novel features at various levels: patient (weight change over 365 days, depression symptoms, laboratory values, and cancer type), hospital (winter discharge and hospital admission type), and community (zip income and marital status of partner). CONCLUSION We developed and validated models comparable with the existing Epic 30-day readmission models, with several novel actionable insights that could inform service interventions deployed by the case management or discharge planning teams and may decrease readmission rates over time.
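The AUROC values compared above have a direct probabilistic reading: the chance that a randomly chosen readmitted patient receives a higher risk score than a randomly chosen non-readmitted patient. The sketch below computes it by pairwise comparison; the labels and scores are hypothetical, and this brute-force form is meant for illustration rather than for cohorts of thousands.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical data: 1 = readmitted within 30 days, scores = predicted risk.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
value = auroc(labels, scores)
```

Because the metric depends only on ranking, two models with close AUROCs (such as 0.711 and 0.697) can still flag quite different patient groups, which is what the abstract's subgroup comparison examines.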
11
Acute respiratory distress syndrome after SARS-CoV-2 infection on young adult population: International observational federated study based on electronic health records through the 4CE consortium. PLoS One 2023; 18:e0266985. [PMID: 36598895 DOI: 10.1371/journal.pone.0266985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 11/09/2022] [Indexed: 01/05/2023] Open
Abstract
PURPOSE In young adults (18 to 49 years old), investigation of the acute respiratory distress syndrome (ARDS) after severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has been limited. We evaluated the risk factors and outcomes of ARDS following infection with SARS-CoV-2 in a young adult population. METHODS A retrospective cohort study was conducted between January 1st, 2020 and February 28th, 2021 using patient-level electronic health records (EHR), across 241 United States hospitals and 43 European hospitals participating in the Consortium for Clinical Characterization of COVID-19 by EHR (4CE). To identify the risk factors associated with ARDS, we compared young patients with and without ARDS through a federated analysis. We further compared the outcomes between young and old patients with ARDS. RESULTS Among the 75,377 hospitalized patients with positive SARS-CoV-2 PCR, 1001 young adults presented with ARDS (7.8% of young hospitalized adults). Their mortality rate at 90 days was 16.2% and they presented with a complication rate for infection similar to that of older adults with ARDS. Peptic ulcer disease, paralysis, obesity, congestive heart failure, valvular disease, diabetes, chronic pulmonary disease and liver disease were associated with a higher risk of ARDS. We described a high prevalence of obesity (53%), hypertension (38%, although not significantly associated with ARDS), and diabetes (32%). CONCLUSION Through an innovative method, a large international cohort of young adults developing ARDS after SARS-CoV-2 infection was assembled. The study demonstrated the poor outcomes of this population and the associated risk factors.
12
Long-term kidney function recovery and mortality after COVID-19-associated acute kidney injury: An international multi-centre observational cohort study. EClinicalMedicine 2023; 55:101724. [PMID: 36381999 PMCID: PMC9640184 DOI: 10.1016/j.eclinm.2022.101724] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 09/05/2022] [Revised: 10/12/2022] [Accepted: 10/12/2022] [Indexed: 11/09/2022] Open
Abstract
Background While acute kidney injury (AKI) is a common complication in COVID-19, data on post-AKI kidney function recovery and the clinical factors associated with poor kidney function recovery are lacking. Methods This retrospective multi-centre observational cohort study comprised 12,891 hospitalized patients aged 18 years or older with a diagnosis of SARS-CoV-2 infection confirmed by polymerase chain reaction from 1 January 2020 to 10 September 2020, and with at least one serum creatinine value 1-365 days prior to admission. Mortality and serum creatinine values were obtained up to 10 September 2021. Findings Advanced age (HR 2.77, 95%CI 2.53-3.04, p < 0.0001), severe COVID-19 (HR 2.91, 95%CI 2.03-4.17, p < 0.0001), severe AKI (KDIGO stage 3: HR 4.22, 95%CI 3.55-5.00, p < 0.0001), and ischemic heart disease (HR 1.26, 95%CI 1.14-1.39, p < 0.0001) were associated with worse mortality outcomes. AKI severity (KDIGO stage 3: HR 0.41, 95%CI 0.37-0.46, p < 0.0001) was associated with worse kidney function recovery, whereas remdesivir use (HR 1.34, 95%CI 1.17-1.54, p < 0.0001) was associated with better kidney function recovery. In a subset of patients without chronic kidney disease, advanced age (HR 1.38, 95%CI 1.20-1.58, p < 0.0001), male sex (HR 1.67, 95%CI 1.45-1.93, p < 0.0001), severe AKI (KDIGO stage 3: HR 11.68, 95%CI 9.80-13.91, p < 0.0001), and hypertension (HR 1.22, 95%CI 1.10-1.36, p = 0.0002) were associated with post-AKI kidney function impairment. Furthermore, patients with COVID-19-associated AKI had significant and persistent elevations of baseline serum creatinine 125% or more at 180 days (RR 1.49, 95%CI 1.32-1.67) and 365 days (RR 1.54, 95%CI 1.21-1.96) compared to COVID-19 patients with no AKI. Interpretation COVID-19-associated AKI was associated with higher mortality, and severe COVID-19-associated AKI was associated with worse long-term post-AKI kidney function recovery.
Funding Authors are supported by various funders, with full details stated in the acknowledgement section.
13
SurvMaximin: Robust federated approach to transporting survival risk prediction models. J Biomed Inform 2022; 134:104176. [PMID: 36007785 PMCID: PMC9707637 DOI: 10.1016/j.jbi.2022.104176] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/01/2022] [Revised: 07/18/2022] [Accepted: 08/15/2022] [Indexed: 10/15/2022]
Abstract
OBJECTIVE For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information. MATERIALS AND METHODS For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population. The target population can be an entire cohort comprised of all centers, corresponding to federated learning, or a single center, corresponding to transfer learning. RESULTS Simulation studies and a real-world international electronic health records application study, with 15 participating health care centers across three countries (France, Germany, and the U.S.), show that the proposed SurvMaximin algorithm achieves comparable or higher accuracy compared with the estimator using only the information of the target site and other existing methods. The SurvMaximin estimator is robust to variations in sample sizes and estimated feature coefficients between centers, which amounts to significantly improved estimates for target sites with fewer observations. CONCLUSIONS The SurvMaximin method is well suited for both federated and transfer learning in the high-dimensional survival analysis setting. SurvMaximin only requires a one-time summary information exchange from participating centers. Estimated regression vectors can be very heterogeneous. SurvMaximin provides robust Cox feature coefficient estimates without outcome information in the target population and is privacy-preserving.
14
Changes in laboratory value improvement and mortality rates over the course of the pandemic: an international retrospective cohort study of hospitalised patients infected with SARS-CoV-2. BMJ Open 2022; 12:e057725. [PMID: 35738646 PMCID: PMC9226470 DOI: 10.1136/bmjopen-2021-057725] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/30/2021] [Accepted: 06/12/2022] [Indexed: 01/08/2023] Open
Abstract
OBJECTIVE To assess changes in international mortality rates and laboratory recovery rates during hospitalisation for patients hospitalised with SARS-CoV-2 between the first wave (1 March to 30 June 2020) and the second wave (1 July 2020 to 31 January 2021) of the COVID-19 pandemic. DESIGN, SETTING AND PARTICIPANTS This is a retrospective cohort study of 83 178 hospitalised patients admitted between 7 days before and 14 days after PCR-confirmed SARS-CoV-2 infection within the Consortium for Clinical Characterization of COVID-19 by Electronic Health Record, an international multihealthcare system collaborative of 288 hospitals in the USA and Europe. The laboratory recovery rates and mortality rates over time were compared between the two waves of the pandemic. PRIMARY AND SECONDARY OUTCOME MEASURES The primary outcome was all-cause mortality rate within 28 days after hospitalisation stratified by predicted low, medium and high mortality risk at baseline. The secondary outcome was the average rate of change in laboratory values during the first week of hospitalisation. RESULTS Baseline Charlson Comorbidity Index and laboratory values at admission were not significantly different between the first and second waves. The improvement in laboratory values over time was faster in the second wave compared with the first. The average C-reactive protein rate of change was -4.72 mg/dL vs -4.14 mg/dL per day (p=0.05). The mortality rates within each risk category significantly decreased over time, with the most substantial decrease in the high-risk group (42.3% in March-April 2020 vs 30.8% in November 2020 to January 2021, p<0.001) and a moderate decrease in the intermediate-risk group (21.5% in March-April 2020 vs 14.3% in November 2020 to January 2021, p<0.001).
CONCLUSIONS Admission profiles of patients hospitalised with SARS-CoV-2 infection did not differ greatly between the first and second waves of the pandemic, but there were notable differences in laboratory improvement rates during hospitalisation. Mortality risks among patients with similar risk profiles decreased over the course of the pandemic. The improvement in laboratory values and mortality risk was consistent across multiple countries.
|
15
|
International comparisons of laboratory values from the 4CE collaborative to predict COVID-19 mortality. NPJ Digit Med 2022; 5:74. [PMID: 35697747 PMCID: PMC9192605 DOI: 10.1038/s41746-022-00601-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/04/2021] [Accepted: 03/11/2022] [Indexed: 01/08/2023]
Abstract
Given the growing number of prediction algorithms developed to predict COVID-19 mortality, we evaluated the transportability of a mortality prediction algorithm using a multi-national network of healthcare systems. We predicted COVID-19 mortality using baseline commonly measured laboratory values and standard demographic and clinical covariates across healthcare systems, countries, and continents. Specifically, we trained a Cox regression model with nine measured laboratory test values, standard demographics at admission, and comorbidity burden pre-admission. These models were compared at site, country, and continent level. Of the 39,969 hospitalized patients with COVID-19 (68.6% male), 5717 (14.3%) died. In the Cox model, age, albumin, AST, creatine, CRP, and white blood cell count are most predictive of mortality. The baseline covariates are more predictive of mortality during the early days of COVID-19 hospitalization. Models trained at healthcare systems with larger cohort size largely retain good transportability performance when porting to different sites. The combination of routine laboratory test values at admission along with basic demographic features can predict mortality in patients hospitalized with COVID-19. Importantly, this potentially deployable model differs from prior work by demonstrating not only consistent performance but also reliable transportability across healthcare systems in the US and Europe, highlighting the generalizability of this model and the overall approach.
|
16
|
Determinants of hospital outcomes for patients with COVID-19 in the University of Pennsylvania Health System. PLoS One 2022; 17:e0268528. [PMID: 35588434 PMCID: PMC9119468 DOI: 10.1371/journal.pone.0268528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 05/02/2022] [Indexed: 11/30/2022]
Abstract
There is growing evidence that racial and ethnic minorities bear a disproportionate burden from COVID-19. Temporal changes in the pandemic epidemiology and diversity in the clinical course require careful study to identify determinants of poor outcomes. We analyzed 6255 hospitalized individuals with PCR-confirmed SARS-CoV-2 infection from one of 5 hospitals in the University of Pennsylvania Health System between March 2020 and March 2021, using electronic health records to assess risk factors and outcomes through 8 weeks post-admission. Discharge, readmission and mortality outcomes were analyzed in a multi-state model with multivariable Cox models for each transition. Mortality varied markedly over time, with cumulative incidence (95% CI) 30 days post-admission of 19.1% (16.9, 21.3) in March-April 2020, 5.7% (4.2, 7.5) in July-October 2020 and 10.5% (9.1, 12.0) in January-March 2021; 26% of deaths occurred after discharge. Average age (SD) at admission varied from 62.7 (17.6) to 54.8 (19.9) to 60.5 (18.1); mechanical ventilation use declined from 21.3% to 9-11%. Compared to Caucasian, Black race was associated with more severe disease at admission, higher rates of co-morbidities and residing in a low-income zip code. Between-race risk differences in mortality risk diminished in multivariable models; while admitting hospital, increasing age, admission early in the pandemic, and severe disease and low blood pressure at admission were associated with increased mortality hazard. Hispanic ethnicity was associated with fewer baseline co-morbidities and lower mortality hazard (0.57, 95% CI: 0.37, 0.87). Multi-state modeling allows for a unified framework to analyze multiple outcomes throughout the disease course. Morbidity and mortality for hospitalized COVID-19 patients varied over time but post-discharge mortality remained non-trivial. Black race was associated with more risk factors for morbidity and with treatment at hospitals with lower mortality.
Multivariable models suggest there are not between-race differences in outcomes. Future work is needed to better understand the identified between-hospital differences in mortality.
|
17
|
Classifying Characteristics of Opioid Use Disorder From Hospital Discharge Summaries Using Natural Language Processing. Front Public Health 2022; 10:850619. [PMID: 35615042 PMCID: PMC9124945 DOI: 10.3389/fpubh.2022.850619] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/07/2022] [Accepted: 04/19/2022] [Indexed: 11/25/2022]
Abstract
Background Opioid use disorder (OUD) is underdiagnosed in health system settings, limiting research on OUD using electronic health records (EHRs). Medical encounter notes can enrich structured EHR data with documented signs and symptoms of OUD and social risks and behaviors. To capture this information at scale, natural language processing (NLP) tools must be developed and evaluated. We developed and applied an annotation schema to deeply characterize OUD and related clinical, behavioral, and environmental factors, and automated the annotation schema using machine learning and deep learning-based approaches. Methods Using the MIMIC-III Critical Care Database, we queried hospital discharge summaries of patients with International Classification of Diseases (ICD-9) OUD diagnostic codes. We developed an annotation schema to characterize problematic opioid use, identify individuals with potential OUD, and provide psychosocial context. Two annotators reviewed discharge summaries from 100 patients. We randomly sampled patients with their associated annotated sentences and divided them into training (66 patients; 2,127 annotated sentences) and testing (29 patients; 1,149 annotated sentences) sets. We used the training set to generate features, employing three NLP algorithms/knowledge sources. We trained and tested prediction models for classification with a traditional machine learner (logistic regression) and deep learning approach (Autogluon based on ELECTRA's replaced token detection model). We applied a five-fold cross-validation approach to reduce bias in performance estimates. Results The resulting annotation schema contained 32 classes. We achieved moderate inter-annotator agreement, with F1-scores across all classes increasing from 48 to 66%. 
Five classes had a sufficient number of annotations for automation; of these, we observed consistently high performance (F1-scores) across training and testing sets for drug screening (training: 91-96; testing: 91-94) and opioid type (training: 86-96; testing: 86-99). Performance dropped from training to testing sets for other drug use (training: 52-65; testing: 40-48), pain management (training: 72-78; testing: 61-78) and psychiatric (training: 73-80; testing: 72). Autogluon achieved the highest performance. Conclusion This pilot study demonstrated that rich information regarding problematic opioid use can be manually identified by annotators. However, more training samples and features would improve our ability to reliably identify less common classes from clinical text, including text from outpatient settings.
|
18
|
Performance of polygenic risk scores for cancer prediction in a racially diverse academic biobank. Genet Med 2022; 24:601-609. [PMID: 34906489 PMCID: PMC9680700 DOI: 10.1016/j.gim.2021.10.015] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/13/2021] [Revised: 08/09/2021] [Accepted: 10/22/2021] [Indexed: 01/08/2023]
Abstract
PURPOSE Genome-wide association studies have identified hundreds of single nucleotide variations (formerly single nucleotide polymorphisms) associated with several cancers, but the predictive ability of polygenic risk scores (PRSs) is unclear, especially among non-Whites. METHODS PRSs were derived from genome-wide significant single-nucleotide variations for 15 cancers in 20,079 individuals in an academic biobank. We evaluated the improvement in discriminatory accuracy by including cancer-specific PRS in patients of genetically-determined African and European ancestry. RESULTS Among the individuals of European genetic ancestry, PRSs for breast, colon, melanoma, and prostate were significantly associated with their respective cancers. Among the individuals of African genetic ancestry, PRSs for breast, colon, prostate, and thyroid were significantly associated with their respective cancers. The area under the curve of the model consisting of age, sex, and principal components was 0.621 to 0.710, and it increased by 1% to 4% with the inclusion of PRS in individuals of European genetic ancestry. In individuals of African genetic ancestry, area under the curve was overall higher in the model without the PRS (0.723-0.810) but increased by <1% with the inclusion of PRS for most cancers. CONCLUSION PRS moderately increased the ability to discriminate the cancer status in individuals of European but not African ancestry. Further large-scale studies are needed to identify ancestry-specific genetic factors in non-White populations to incorporate PRS into cancer risk assessment.
|
19
|
Using Natural Language Processing to Classify Serious Illness Communication with Oncology Patients. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2022; 2022:168-177. [PMID: 35854756 PMCID: PMC9285137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
Abstract
One core measure of healthcare quality set forth by the Institute of Medicine is whether care decisions match patient goals. High-quality "serious illness communication" about patient goals and prognosis is required to support patient-centered decision-making; however, current methods are not sensitive enough to measure the quality of this communication or determine whether care delivered matches patient priorities. Natural language processing (NLP) offers an efficient method for identification and evaluation of documented serious illness communication, which could serve as the basis for future quality metrics in oncology and other forms of serious illness. In this study, we trained NLP algorithms to identify and characterize serious illness communication with oncology patients.
|
20
|
Authorship Correction: International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study. J Med Internet Res 2021; 23:e34625. [PMID: 34889759 PMCID: PMC8672293 DOI: 10.2196/34625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/02/2021] [Accepted: 11/10/2021] [Indexed: 11/15/2022]
|
21
|
Multinational characterization of neurological phenotypes in patients hospitalized with COVID-19. Sci Rep 2021; 11:20238. [PMID: 34642371 PMCID: PMC8510999 DOI: 10.1038/s41598-021-99481-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/22/2021] [Accepted: 09/23/2021] [Indexed: 01/08/2023]
Abstract
Neurological complications worsen outcomes in COVID-19. To define the prevalence of neurological conditions among hospitalized patients with a positive SARS-CoV-2 reverse transcription polymerase chain reaction test in geographically diverse multinational populations during the early pandemic, we used electronic health records (EHR) from 338 participating hospitals across 6 countries and 3 continents (January-September 2020) for a cross-sectional analysis. We assessed the frequency of International Classification of Disease codes for neurological conditions by country, healthcare system, time before and after admission for COVID-19, and COVID-19 severity. Among 35,177 hospitalized patients with SARS-CoV-2 infection, there was an increase in the proportion with disorders of consciousness (5.8%, 95% confidence interval [CI] 3.7-7.8%, pFDR < 0.001) and unspecified disorders of the brain (8.1%, 5.7-10.5%, pFDR < 0.001) when compared to the pre-admission proportion. During hospitalization, the relative risk of disorders of consciousness (22%, 19-25%), cerebrovascular diseases (24%, 13-35%), nontraumatic intracranial hemorrhage (34%, 20-50%), encephalitis and/or myelitis (37%, 17-60%) and myopathy (72%, 67-77%) was higher for patients with severe COVID-19 when compared to those who never experienced severe COVID-19. Leveraging a multinational network to capture standardized EHR data, we highlighted the increased prevalence of central and peripheral neurological phenotypes in patients hospitalized with COVID-19, particularly among those with severe disease.
|
22
|
International Changes in COVID-19 Clinical Trajectories Across 315 Hospitals and 6 Countries: Retrospective Cohort Study. J Med Internet Res 2021; 23:e31400. [PMID: 34533459 PMCID: PMC8510151 DOI: 10.2196/31400] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/23/2021] [Revised: 09/02/2021] [Accepted: 09/02/2021] [Indexed: 02/06/2023]
Abstract
Background Many countries have experienced 2 predominant waves of COVID-19–related hospitalizations. Comparing the clinical trajectories of patients hospitalized in separate waves of the pandemic enables further understanding of the evolving epidemiology, pathophysiology, and health care dynamics of the COVID-19 pandemic. Objective In this retrospective cohort study, we analyzed electronic health record (EHR) data from patients with SARS-CoV-2 infections hospitalized in participating health care systems representing 315 hospitals across 6 countries. We compared hospitalization rates, severe COVID-19 risk, and mean laboratory values between patients hospitalized during the first and second waves of the pandemic. Methods Using a federated approach, each participating health care system extracted patient-level clinical data on their first and second wave cohorts and submitted aggregated data to the central site. Data quality control steps were adopted at the central site to correct for implausible values and harmonize units. Statistical analyses were performed by computing individual health care system effect sizes and synthesizing these using random effect meta-analyses to account for heterogeneity. We focused the laboratory analysis on C-reactive protein (CRP), ferritin, fibrinogen, procalcitonin, D-dimer, and creatinine based on their reported associations with severe COVID-19. Results Data were available for 79,613 patients, of which 32,467 were hospitalized in the first wave and 47,146 in the second wave. The prevalence of male patients and patients aged 50 to 69 years decreased significantly between the first and second waves. Patients hospitalized in the second wave had a 9.9% reduction in the risk of severe COVID-19 compared to patients hospitalized in the first wave (95% CI 8.5%-11.3%). 
Demographic subgroup analyses indicated that patients aged 26 to 49 years and 50 to 69 years; male and female patients; and black patients had significantly lower risk for severe disease in the second wave than in the first wave. At admission, the mean values of CRP were significantly lower in the second wave than in the first wave. On the seventh hospital day, the mean values of CRP, ferritin, fibrinogen, and procalcitonin were significantly lower in the second wave than in the first wave. In general, countries exhibited variable changes in laboratory testing rates from the first to the second wave. At admission, there was a significantly higher testing rate for D-dimer in France, Germany, and Spain. Conclusions Patients hospitalized in the second wave were at significantly lower risk for severe COVID-19. This corresponded to mean laboratory values in the second wave that were more likely to be in typical physiological ranges on the seventh hospital day compared to the first wave. Our federated approach demonstrated the feasibility and power of harmonizing heterogeneous EHR data from multiple international health care systems to rapidly conduct large-scale studies to characterize how COVID-19 clinical trajectories evolve.
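The pooling step described in the Methods above (per-system effect sizes combined by random-effects meta-analysis) can be illustrated with a minimal sketch. The abstract does not say which estimator was used; the DerSimonian-Laird estimator below is an assumption chosen because it is a common default, and the function name and inputs are hypothetical.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool per-site effect sizes under a random-effects model.

    Assumption: DerSimonian-Laird estimator (the abstract names no estimator).
    effects   -- one effect size per health care system
    variances -- the within-site variance of each effect size
    Returns (pooled_effect, standard_error, tau_squared).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures between-site heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-site variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to every site's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2
```

When the sites agree closely, the between-site variance is estimated as zero and the result reduces to the ordinary inverse-variance average; when they disagree, smaller sites receive relatively more weight and the standard error widens.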
|
23
|
Validation of an internationally derived patient severity phenotype to support COVID-19 analytics from electronic health record data. J Am Med Inform Assoc 2021; 28:1411-1420. [PMID: 33566082 PMCID: PMC7928835 DOI: 10.1093/jamia/ocab018] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 10/08/2020] [Revised: 01/14/2021] [Accepted: 01/29/2021] [Indexed: 12/21/2022]
Abstract
OBJECTIVE The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity. MATERIALS AND METHODS Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site. RESULTS The full 4CE severity phenotype had pooled sensitivity of 0.73 and specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity had high variability, up to 0.65 across sites. At one pilot site, the expert-derived phenotype had mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with an area under the curve of 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared with chart review. DISCUSSION We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions. CONCLUSIONS We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
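As a rough illustration of the validation metrics reported above, the sketch below computes sensitivity and specificity of a binary phenotype flag against a binary outcome (for example, ICU admission and/or death), then pools across sites. The abstract does not describe how pooling was done; summing confusion-matrix counts across sites is an assumption, and all names are hypothetical.

```python
def confusion_counts(flagged, outcome):
    """Count (tp, fn, tn, fp) for paired binary lists of equal length."""
    tp = fn = tn = fp = 0
    for f, o in zip(flagged, outcome):
        if o and f:
            tp += 1      # outcome occurred, phenotype flagged it
        elif o and not f:
            fn += 1      # outcome occurred, phenotype missed it
        elif not o and not f:
            tn += 1      # no outcome, not flagged
        else:
            fp += 1      # no outcome, but flagged
    return tp, fn, tn, fp

def pooled_sensitivity_specificity(site_counts):
    """Pool by summing counts over sites (an assumed pooling rule)."""
    tp = sum(c[0] for c in site_counts)
    fn = sum(c[1] for c in site_counts)
    tn = sum(c[2] for c in site_counts)
    fp = sum(c[3] for c in site_counts)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Sites would call `confusion_counts` locally and share only the four counts, which fits a federated setting where patient-level data stay at each institution.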
|
24
|
Why Is the Electronic Health Record So Challenging for Research and Clinical Care? Methods Inf Med 2021; 60:32-48. [PMID: 34282602 DOI: 10.1055/s-0041-1731784] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 10/20/2022]
Abstract
BACKGROUND The electronic health record (EHR) has become increasingly ubiquitous. At the same time, health professionals have been turning to this resource for access to data that is needed for the delivery of health care and for clinical research. There is little doubt that the EHR has made both of these functions easier than in earlier days when we relied on paper-based clinical records. Coupled with modern database and data warehouse systems, high-speed networks, and the ability to share clinical data with others are a large number of challenges that arguably limit the optimal use of the EHR. OBJECTIVES Our goal was to provide an exhaustive reference for those who use the EHR in clinical and research contexts, but also for health information systems professionals as they design, implement, and maintain EHR systems. METHODS This study includes a panel of 24 biomedical informatics researchers, information technology professionals, and clinicians, all of whom have extensive experience in design, implementation, and maintenance of EHR systems, or in using the EHR as clinicians or researchers. All members of the panel are affiliated with Penn Medicine at the University of Pennsylvania and have experience with a variety of different EHR platforms and systems and how they have evolved over time. RESULTS Each of the authors has shared their knowledge and experience in using the EHR in a suite of 20 short essays, each representing a specific challenge and classified according to a functional hierarchy of interlocking facets such as usability and usefulness, data quality, standards, governance, data integration, clinical care, and clinical research. CONCLUSION We provide here a set of perspectives on the challenges posed by the EHR to clinical and research users.
|
25
|
Effect of Targeted Messaging on Return to In-Person Visits During the COVID-19 Pandemic: A Randomized Clinical Trial. JAMA Netw Open 2021; 4:e2115211. [PMID: 34190999 PMCID: PMC8246312 DOI: 10.1001/jamanetworkopen.2021.15211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2021] [Accepted: 04/27/2021] [Indexed: 11/14/2022]
|
26
|
Association of Neighborhood-Level Factors and COVID-19 Infection Patterns in Philadelphia Using Spatial Regression. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2021; 2021:545-554. [PMID: 34457170 PMCID: PMC8378638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
As of August 2020, there were ~6 million COVID-19 cases in the United States of America, resulting in ~200,000 deaths. Informatics approaches are needed to better understand the role of individual and community risk factors for COVID-19. We developed an informatics method to integrate SARS-CoV-2 data with multiple neighborhood-level factors from the American Community Survey and opendataphilly.org. We assessed the spatial association between neighborhood-level factors and the frequency of SARS-CoV-2 positivity, separately across all patients and across asymptomatic patients. We found that neighborhoods with higher proportions of individuals with a high-school degree and/or who were identified as Hispanic/Latinx were more likely to have higher SARS-CoV-2 positivity rates, after adjusting for other neighborhood covariates. Patients from neighborhoods with higher proportions of individuals receiving public assistance and/or identified as White were less likely to test positive for SARS-CoV-2. Our approach and its findings could inform future public health efforts.
|
27
|
International Comparisons of Harmonized Laboratory Value Trajectories to Predict Severe COVID-19: Leveraging the 4CE Collaborative Across 342 Hospitals and 6 Countries: A Retrospective Cohort Study. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021:2020.12.16.20247684. [PMID: 33564777 PMCID: PMC7872369 DOI: 10.1101/2020.12.16.20247684] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 01/08/2023]
Abstract
Objectives To perform an international comparison of the trajectory of laboratory values among hospitalized patients with COVID-19 who develop severe disease and identify optimal timing of laboratory value collection to predict severity across hospitals and regions. Design Retrospective cohort study. Setting The Consortium for Clinical Characterization of COVID-19 by EHR (4CE), an international multi-site data-sharing collaborative of 342 hospitals in the US and in Europe. Participants Patients hospitalized with COVID-19, admitted before or after PCR-confirmed result for SARS-CoV-2. Primary and secondary outcome measures Patients were categorized as "ever-severe" or "never-severe" using the validated 4CE severity criteria. Eighteen laboratory tests associated with poor COVID-19-related outcomes were evaluated for predictive accuracy by area under the curve (AUC), compared between the severity categories. Subgroup analysis was performed to validate a subset of laboratory values as predictive of severity against a published algorithm. A subset of laboratory values (CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin) was compared between North American and European sites for severity prediction. Results Of 36,447 patients with COVID-19, 19,953 (43.7%) were categorized as ever-severe. Most patients (78.7%) were 50 years of age or older and male (60.5%). Longitudinal trajectories of CRP, albumin, LDH, neutrophil count, D-dimer, and procalcitonin showed association with disease severity. Significant differences of laboratory values at admission were found between the two groups. With the exception of D-dimer, predictive discrimination of laboratory values did not improve after admission. Sub-group analysis using age, D-dimer, CRP, and lymphocyte count as predictive of severity at admission showed similar discrimination to a published algorithm (AUC=0.88 and 0.91, respectively). Both models deteriorated in predictive accuracy as the disease progressed. 
On average, no difference in severity prediction was found between North American and European sites. Conclusions Laboratory test values at admission can be used to predict severity in patients with COVID-19. Prediction models show consistency across international sites highlighting the potential generalizability of these models.
|
28
|
Multinational Prevalence of Neurological Phenotypes in Patients Hospitalized with COVID-19. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021. [PMID: 33655281 PMCID: PMC7924306 DOI: 10.1101/2021.01.27.21249817] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/08/2023]
Abstract
OBJECTIVE: Neurological complications can worsen outcomes in COVID-19. We defined the prevalence of a wide range of neurological conditions among patients hospitalized with COVID-19 in geographically diverse multinational populations. METHODS: Using electronic health record (EHR) data from 348 participating hospitals across 6 countries and 3 continents between January and September 2020, we performed a cross-sectional study of hospitalized adult and pediatric patients with a positive SARS-CoV-2 reverse transcription polymerase chain reaction test, both with and without severe COVID-19. We assessed the frequency of each disease category and 3-character International Classification of Disease (ICD) code of neurological diseases by countries, sites, time before and after admission for COVID-19, and COVID-19 severity. RESULTS: Among the 35,177 hospitalized patients with SARS-CoV-2 infection, there was increased prevalence of disorders of consciousness (5.8%, 95% confidence interval [CI]: 3.7%−7.8%, pFDR<.001) and unspecified disorders of the brain (8.1%, 95%CI: 5.7%−10.5%, pFDR<.001), compared to pre-admission prevalence. During hospitalization, patients who experienced severe COVID-19 status had 22% (95%CI: 19%−25%) increase in the relative risk (RR) of disorders of consciousness, 24% (95%CI: 13%−35%) increase in other cerebrovascular diseases, 34% (95%CI: 20%−50%) increase in nontraumatic intracranial hemorrhage, 37% (95%CI: 17%−60%) increase in encephalitis and/or myelitis, and 72% (95%CI: 67%−77%) increase in myopathy compared to those who never experienced severe disease. INTERPRETATION: Using an international network and common EHR data elements, we highlight an increase in the prevalence of central and peripheral neurological phenotypes in patients hospitalized with SARS-CoV-2 infection, particularly among those with severe disease.
|
29
|
A Preliminary Characterization of Canonicalized and Non-Canonicalized Section Headers Across Variable Clinical Note Types. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2021; 2020:1268-1276. [PMID: 33936503 PMCID: PMC8075444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
In the electronic health record, the majority of clinically relevant information is stored within clinical notes. Most clinical notes follow a set organizational structure composed of canonicalized section headers that facilitate clinical review and information gathering. Standardized section header terminologies such as the SecTag terminology permit the identification and standardization of headers to a canonicalized form. Although the SecTag terminology has been evaluated extensively for history & physical notes, the coverage of canonical section header terms has not been assessed across other note types. For this pilot study, we conducted a coverage study and characterization of canonical section headers across 5 common, clinical note types and a generalizability study of canonical section headers detected within two types of clinical notes from Penn Medicine.
|
30
|
Annotation and extraction of age and temporally-related events from clinical histories. BMC Med Inform Decis Mak 2020; 20:338. [PMID: 33380319 PMCID: PMC7772895 DOI: 10.1186/s12911-020-01333-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/11/2020] [Accepted: 11/16/2020] [Indexed: 01/13/2023]
Abstract
Background Age and time information stored within the histories of clinical notes can provide valuable insights for assessing a patient’s disease risk, understanding disease progression, and studying therapeutic outcomes. However, details of age and temporally-specified clinical events are not well captured, consistently codified, and readily available to research databases for study. Methods We expanded upon existing annotation schemes to capture additional age and temporal information, conducted an annotation study to validate our expanded schema, and developed a prototypical, rule-based Named Entity Recognizer to extract our novel clinical named entities (NE). The annotation study was conducted on 138 discharge summaries from the pre-annotated 2014 ShARe/CLEF eHealth Challenge corpus. In addition to existing NE classes (TIMEX3, SUBJECT_CLASS, DISEASE_DISORDER), our schema proposes 3 additional NEs (AGE, PROCEDURE, OTHER_EVENTS). We also propose new attributes, e.g., “degree_relation” which captures the degree of biological relation for subjects annotated under SUBJECT_CLASS. As a proof of concept, we applied the schema to 49 H&P notes to encode pertinent history information for a lung cancer cohort study. Results An abundance of information was captured under the new OTHER_EVENTS, PROCEDURE and AGE classes, with 23%, 10% and 8% of all annotated NEs belonging to the above classes, respectively. We observed high inter-annotator agreement of >80% for AGE and TIMEX3; the automated NLP system achieved F1 scores of 86% (AGE) and 86% (TIMEX3). Age and temporally-specified mentions within past medical, family, surgical, and social histories were common in our lung cancer data set; annotation is ongoing to support this translational research study. Conclusions Our annotation schema and NLP system can encode historical events from clinical notes to support clinical and translational research studies.
|
31
|
Patient Interaction Phenotypes With an Automated Remote Hypertension Monitoring Program and Their Association With Blood Pressure Control: Observational Study. J Med Internet Res 2020; 22:e22493. [PMID: 33270032 PMCID: PMC7746494 DOI: 10.2196/22493] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/17/2020] [Revised: 10/12/2020] [Accepted: 10/24/2020] [Indexed: 01/26/2023] Open
Abstract
Background Automated texting platforms have emerged as a tool to facilitate communication between patients and health care providers with variable effects on achieving target blood pressure (BP). Understanding differences in the way patients interact with these communication platforms can inform their use and design for hypertension management. Objective Our primary aim was to explore the unique phenotypes of patient interactions with an automated text messaging platform for BP monitoring. Our secondary aim was to estimate associations between interaction phenotypes and BP control. Methods This study was a secondary analysis of data from a randomized controlled trial for adults with poorly controlled hypertension. A total of 201 patients with established primary care were assigned to the automated texting platform; messages exchanged throughout the 4-month program were analyzed. We used the k-means clustering algorithm to characterize two different interaction phenotypes: program conformity and engagement style. First, we identified unique clusters signifying differences in program conformity based on the frequency over time of error alerts, which were generated to patients when they deviated from the requested text message format (eg, ###/## for BP). Second, we explored overall engagement styles, defined by error alerts and responsiveness to text prompts, unprompted messages, and word count averages. Finally, we applied the chi-square test to identify associations between each interaction phenotype and achieving the target BP. Results We observed 3 categories of program conformity based on their frequency of error alerts: those who immediately and consistently submitted texts without system errors (perfect users, 51/201), those who did so after an initial learning period (adaptive users, 66/201), and those who consistently submitted messages generating errors to the platform (nonadaptive users, 38/201). 
Next, we observed 3 categories of engagement style: the enthusiast, who tended to submit unprompted messages with high word counts (17/155); the student, who inconsistently engaged (35/155); and the minimalist, who engaged only when prompted (103/155). Of all 6 phenotypes, we observed a statistically significant association between patients demonstrating the minimalist communication style (high adherence, few unprompted messages, limited information sharing) and achieving target BP (P<.001). Conclusions We identified unique interaction phenotypes among patients engaging with an automated text message platform for remote BP monitoring. Only the minimalist communication style was associated with achieving target BP. Identifying and understanding interaction phenotypes may be useful for tailoring future automated texting interactions and designing future interventions to achieve better BP control.
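The association step in this abstract (a chi-square test between a phenotype and reaching target BP) can be sketched for a single 2x2 table as follows. The counts are invented for illustration and are not data from the study; the function name is likewise an assumption, and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table[i][j] is the observed count in row i, column j.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = phenotype (yes / no),
# columns = target BP (reached / not reached).
stat, p = chi_square_2x2([[10, 20], [20, 10]])
```

With several phenotypes tested against the same outcome, a multiple-comparison adjustment would normally be considered; the abstract does not say whether one was used.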
|
32
|
Identifying risk factors for suicidal ideation across a large community healthcare system. J Affect Disord 2020; 276:1038-1045. [PMID: 32763588 DOI: 10.1016/j.jad.2020.07.047] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/24/2019] [Revised: 05/14/2020] [Accepted: 07/05/2020] [Indexed: 11/17/2022]
Abstract
BACKGROUND Suicide is the tenth leading cause of death in the United States. Several studies have leveraged electronic health record (EHR) data to predict suicide risk in veteran and military samples; however, few studies have investigated suicide risk factors in a large-scale community health population. METHODS Clinical data were queried for 9,811 patients from the Penn Medicine Health System who had completed a Patient Health Questionnaire-9 (PHQ-9) documented in the EHR between January 2017 and June 2019. Patient demographics, PHQ-9 scores, and psychiatric comorbidities were extracted from the EHR. Univariate and multivariable logistic regressions were applied to determine significant risk factors associated with suicide ideation responses from the PHQ-9. RESULTS One-quarter (25.8%) of patients endorsed suicide ideation. Univariate analysis found 22 risk factors of suicide ideation. Multivariable logistic regression found significant positive associations (Odds Ratio, (95% Confidence Interval)) with the following: younger ages (less than 18 years: 2.1, (1.69, 2.60) and 19-24 years: 1.55, (1.29, 1.87)), single marital status (1.22, (1.08, 1.38)), African American (1.22, (1.08, 1.38)), non-commercial insurance (1.16, (1.03, 1.31)), multiple comorbidities (1 comorbidity (1.65, (1.32, 2.07)); 2 comorbidities (2.07, (1.61, 2.64)); 3+ comorbidities (2.49, (1.87, 3.33))), bipolar disorders (Type I: 1.38, (1.14, 1.67) and Type II: 1.94, (1.52, 2.49)), depressive disorders (1.70, (1.49, 1.94)), obsessive compulsive disorder (OCD) (1.43, (1.08, 1.90)), and stress disorders (1.53, (1.33, 1.76)). CONCLUSION Community EHR information can be used to predict suicidal ideation. This information can be used to design tools for identifying patients at risk for suicide in real-time.
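Odds ratios and 95% confidence intervals of the kind reported here are conventionally obtained by exponentiating a logistic regression coefficient and its Wald interval. The sketch below shows that conversion only; the coefficient and standard error are made-up inputs, and the abstract does not state which interval method the authors used.

```python
import math


def odds_ratio_with_ci(beta, std_error, z=1.96):
    """Convert a logistic regression coefficient into an odds ratio
    with a Wald confidence interval (z = 1.96 gives roughly 95%)."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * std_error)
    upper = math.exp(beta + z * std_error)
    return odds_ratio, lower, upper


# Hypothetical coefficient and standard error, chosen for illustration.
or_value, ci_low, ci_high = odds_ratio_with_ci(beta=math.log(2), std_error=0.1)
```

An interval that excludes 1 corresponds to a coefficient that differs from 0 at the matching significance level.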
|
33
|
Risk of Persistent Opioid Use following Major Surgery in Matched Samples of Patients with and without Cancer. Cancer Epidemiol Biomarkers Prev 2020; 29:2126-2133. [PMID: 32859580 DOI: 10.1158/1055-9965.epi-20-0628] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/27/2020] [Revised: 07/14/2020] [Accepted: 08/20/2020] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND The opioid crisis has reached epidemic proportions, yet risk of persistent opioid use following curative intent surgery for cancer and factors influencing this risk are not well understood. METHODS We used electronic health record data from 3,901 adult patients who received a prescription for an opioid analgesic related to hysterectomy or large bowel surgery from January 1, 2013, through June 30, 2018. Patients with and without a cancer diagnosis were matched on the basis of demographic, clinical, and procedural variables and compared for persistent opioid use. RESULTS Cancer diagnosis was associated with greater risk for persistent opioid use after hysterectomy [18.9% vs. 9.6%; adjusted OR (aOR), 2.26; 95% confidence interval (CI), 1.38-3.69; P = 0.001], but not after large bowel surgery (28.3% vs. 24.1%; aOR 1.25; 95% CI, 0.97-1.59; P = 0.09). In the cancer hysterectomy cohort, persistent opioid use was associated with cancer stage (increased rates among those with stage III cancer compared with stage I) and use of neoadjuvant or adjuvant chemotherapy; however, these factors were not associated with persistent opioid use in the large bowel cohort. CONCLUSIONS Patients with cancer may have an increased risk of persistent opioid use following hysterectomy. IMPACT Risks and benefits of opioid analgesia for surgical pain among patients with cancer undergoing hysterectomy should be carefully considered.
|
34
|
International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium. NPJ Digit Med 2020. [PMID: 32864472 DOI: 10.1101/2020.04.13.20059691v5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022] Open
Abstract
We leveraged the largely untapped resource of electronic health record data to address critical clinical and epidemiological questions about Coronavirus Disease 2019 (COVID-19). To do this, we formed an international consortium (4CE) of 96 hospitals across five countries (www.covidclinical.net). Contributors utilized the Informatics for Integrating Biology and the Bedside (i2b2) or Observational Medical Outcomes Partnership (OMOP) platforms to map to a common data model. The group focused on temporal changes in key laboratory test values. Harmonized data were analyzed locally and converted to a shared aggregate form for rapid analysis and visualization of regional differences and global commonalities. Data covered 27,584 COVID-19 cases with 187,802 laboratory tests. Case counts and laboratory trajectories were concordant with existing literature. Laboratory tests at the time of diagnosis showed hospital-level differences equivalent to country-level variation across the consortium partners. Despite the limitations of decentralized data generation, we established a framework to capture the trajectory of COVID-19 disease in patients and their response to interventions.
|
35
|
International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium. NPJ Digit Med 2020; 3:109. [PMID: 32864472 PMCID: PMC7438496 DOI: 10.1038/s41746-020-00308-0] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Received: 04/15/2020] [Accepted: 06/16/2020] [Indexed: 12/18/2022] Open
Abstract
We leveraged the largely untapped resource of electronic health record data to address critical clinical and epidemiological questions about Coronavirus Disease 2019 (COVID-19). To do this, we formed an international consortium (4CE) of 96 hospitals across five countries (www.covidclinical.net). Contributors utilized the Informatics for Integrating Biology and the Bedside (i2b2) or Observational Medical Outcomes Partnership (OMOP) platforms to map to a common data model. The group focused on temporal changes in key laboratory test values. Harmonized data were analyzed locally and converted to a shared aggregate form for rapid analysis and visualization of regional differences and global commonalities. Data covered 27,584 COVID-19 cases with 187,802 laboratory tests. Case counts and laboratory trajectories were concordant with existing literature. Laboratory tests at the time of diagnosis showed hospital-level differences equivalent to country-level variation across the consortium partners. Despite the limitations of decentralized data generation, we established a framework to capture the trajectory of COVID-19 disease in patients and their response to interventions.
|
36
|
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has had a significant impact on population health and wellbeing. Biomedical informatics is central to COVID-19 research efforts and for the delivery of healthcare for COVID-19 patients. Critical to this effort is the participation of informaticians who typically work on other basic science or clinical problems. The goal of this editorial is to highlight some examples of COVID-19 research areas that could benefit from informatics expertise. Each research idea summarizes the COVID-19 application area, followed by an informatics methodology, approach, or technology that could make a contribution. It is our hope that this piece will motivate and make it easy for some informaticians to adopt COVID-19 research projects.
|
37
|
A novel tool for standardizing clinical data in a semantically rich model. J Biomed Inform 2020; 112S:100086. [PMID: 34417005 DOI: 10.1016/j.yjbinx.2020.100086] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/01/2020] [Revised: 09/08/2020] [Accepted: 09/09/2020] [Indexed: 11/18/2022]
Abstract
Standardizing clinical information in a semantically rich data model is useful for promoting interoperability and facilitating high quality research. Semantic Web technologies such as Resource Description Framework can be utilized to their full potential when a model accurately reflects the semantics of the clinical situation it describes. To this end, ontologies that abide by sound organizational principles can be used as the building blocks of a semantically rich model for the storage of clinical data. However, it is a challenge to programmatically define such a model and load data from disparate sources. The PennTURBO Semantic Engine is a tool developed at the University of Pennsylvania that transforms concise RDF data into a source-independent, semantically rich model. This system sources classes from an application ontology and specifically defines how instances of those classes may relate to each other. Additionally, the system defines and executes RDF data transformations by launching dynamically generated SPARQL update statements. The Semantic Engine was designed as a generalizable data standardization tool, and is able to work with various data models and incoming data sources. Its human-readable configuration files can easily be shared between institutions, providing the basis for collaboration on a standard data model.
|
38
|
Carnival: A Graph-Based Data Integration and Query Tool to Support Patient Cohort Generation for Clinical Research. Stud Health Technol Inform 2019; 264:35-39. [PMID: 31437880 DOI: 10.3233/shti190178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Clinical research studies often leverage various heterogeneous data sources including patient electronic health record, online survey, and genomic data. We introduce a graph-based, data integration and query tool called Carnival. We demonstrate its powerful ability to unify data from these disparate data sources to create datasets for two studies: prevalence and incidence case/control matches for coronary artery disease and controls for Marfan syndrome. We conclude with future directions for Carnival development.
|
39
|
Moonstone: a novel natural language processing system for inferring social risk from clinical narratives. J Biomed Semantics 2019; 10:6. [PMID: 30975223 PMCID: PMC6458709 DOI: 10.1186/s13326-019-0198-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 10/01/2018] [Accepted: 03/18/2019] [Indexed: 11/10/2022] Open
Abstract
Background Social risk factors are important dimensions of health and are linked to access to care, quality of life, health outcomes and life expectancy. However, in the Electronic Health Record, data related to many social risk factors are primarily recorded in free-text clinical notes, rather than as more readily computable structured data, and hence cannot currently be easily incorporated into automated assessments of health. In this paper, we present Moonstone, a new, highly configurable rule-based clinical natural language processing system designed to automatically extract information that requires inferencing from clinical notes. Our initial use case for the tool is focused on the automatic extraction of social risk factor information — in this case, housing situation, living alone, and social support — from clinical notes. Nursing notes, social work notes, emergency room physician notes, primary care notes, hospital admission notes, and discharge summaries, all derived from the Veterans Health Administration, were used for algorithm development and evaluation. Results An evaluation of Moonstone demonstrated that the system is highly accurate in extracting and classifying the three variables of interest (housing situation, living alone, and social support). The system achieved positive predictive value (i.e. precision) scores ranging from 0.66 (homeless/marginally housed) to 0.98 (lives at home/not homeless), accuracy scores ranging from 0.63 (lives in facility) to 0.95 (lives alone), and sensitivity (i.e. recall) scores ranging from 0.75 (lives in facility) to 0.97 (lives alone). Conclusions The Moonstone system is — to the best of our knowledge — the first freely available, open source natural language processing system designed to extract social risk factors from clinical text with good (lives in facility) to excellent (lives alone) performance. 
Although developed with the social risk factor identification task in mind, Moonstone provides a powerful tool to address a range of clinical natural language processing tasks, especially those tasks that require nuanced linguistic processing in conjunction with inference capabilities.
|
40
|
Documentation of ENDS Use in the Veterans Affairs Electronic Health Record (2008-2014). Am J Prev Med 2019; 56:474-475. [PMID: 30777165 DOI: 10.1016/j.amepre.2018.10.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/21/2018] [Revised: 10/16/2018] [Accepted: 10/17/2018] [Indexed: 10/27/2022]
|
41
|
Preparing next-generation scientists for biomedical big data: artificial intelligence approaches. Per Med 2019; 16:247-257. [PMID: 30760118 DOI: 10.2217/pme-2018-0145] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/21/2022]
Abstract
Personalized medicine is being realized by our ability to measure biological and environmental information about patients. Much of these data are being stored in electronic health records yielding big data that presents challenges for its management and analysis. Here, we review several areas of knowledge that are necessary for next-generation scientists to fully realize the potential of biomedical big data. We begin with an overview of big data and its storage and management. We then review statistics and data science as foundational topics followed by a core curriculum of artificial intelligence, machine learning and natural language processing that are needed to develop predictive models for clinical decision making. We end with some specific training recommendations for preparing next-generation scientists for biomedical big data.
|
42
|
Determining Onset for Familial Breast and Colorectal Cancer from Family History Comments in the Electronic Health Record. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2019; 2019:173-181. [PMID: 31258969 PMCID: PMC6568127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Background. Family health history (FHH) can be used to identify individuals at elevated risk for familial cancers. Risk criteria for common cancers rely on age of onset, which is documented inconsistently as structured and unstructured data in electronic health records (EHRs). Objective. To investigate a natural language processing (NLP) approach to extract age of onset and age of death from free-text EHR fields. Methods. Using 474,651 FHH entries from 89,814 patients, we investigated two methods - frequent patterns (baseline) and NLP classifier. Results. For age of onset, the NLP classifier outperformed the baseline in precision (96% vs. 83%; 95% CI [94, 97] and [80, 86]) with equivalent recall (both 93%; 95% CI [91, 95]). When applied to the full dataset, the NLP approach increased the percentage of FHH entries for which cancer risk criteria could be applied from 10% to 15%. Conclusion. NLP combined with structured data may improve the computation of familial cancer risk criteria for various use cases.
|
43
|
Detecting Evidence of Intra-abdominal Surgical Site Infections from Radiology Reports Using Natural Language Processing. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2017:515-524. [PMID: 29854116 PMCID: PMC5977582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Free-text reports in electronic health records (EHRs) contain medically significant information - signs, symptoms, findings, diagnoses - recorded by clinicians during patient encounters. These reports contain rich clinical information which can be leveraged for surveillance of disease and occurrence of adverse events. In order to gain meaningful knowledge from these text reports to support surveillance efforts, information must first be converted into a structured, computable format. Traditional methods rely on manual review of charts, which can be costly and inefficient. Natural language processing (NLP) methods offer an efficient, alternative approach to extracting the information and can achieve a similar level of accuracy. We developed an NLP system to automatically identify mentions of surgical site infections in radiology reports and classify reports containing evidence of surgical site infections leveraging these mentions. We evaluated our system using a reference standard of reports annotated by domain experts, administrative data generated for each patient encounter, and a machine learning-based approach.
|
44
|
Understanding patient satisfaction with received healthcare services: A natural language processing approach. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017; 2016:524-533. [PMID: 28269848 PMCID: PMC5333198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Important information is encoded in free-text patient comments. We determine the most common topics in patient comments, design automatic topic classifiers, identify comments' sentiment, and find new topics in negative comments. Our annotation scheme consisted of 28 topics, with positive and negative sentiment. Within those 28 topics, the seven most frequent accounted for 63% of annotations. For automated topic classification, we developed vocabulary-based and Naive Bayes classifiers. For sentiment analysis, another Naive Bayes classifier was used. Finally, we used topic modeling to search for unexpected topics within negative comments. The seven most common topics were appointment access, appointment wait, empathy, explanation, friendliness, practice environment, and overall experience. The best F-measures from our classifier were 0.52 (NB), 0.57 (NB), 0.36 (Vocab), 0.74 (NB), 0.40 (NB), and 0.44 (Vocab), respectively. F-scores ranged from 0.16 to 0.74. The sentiment classification F-score was 0.84. Negative comment topic modeling revealed complaints about appointment access, appointment wait, and time spent with physician.
|
45
|
Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2. J Biomed Semantics 2016; 7:43. [PMID: 27370271 PMCID: PMC4930590 DOI: 10.1186/s13326-016-0084-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/28/2014] [Accepted: 06/01/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referenced as short forms, can be difficult for patients to understand. For one of three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking to web sources with lay descriptions of annotated short forms or by substituting short forms with a more simplified, lay term. METHODS In this study, we evaluate 1) accuracy of participating systems' normalizing short forms compared to a majority sense baseline approach, 2) performance of participants' systems for short forms with variable majority sense distributions, and 3) report the accuracy of participating systems' normalizing shared normalized concepts between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms. RESULTS The best systems submitted by the five participating teams performed with accuracies ranging from 43 to 72 %. A majority sense baseline approach achieved the second best performance. The performance of participating systems for normalizing short forms with two or more senses with low ambiguity (majority sense greater than 80 %) ranged from 52 to 78 % accuracy, with two or more senses with moderate ambiguity (majority sense between 50 and 80 %) ranged from 23 to 57 % accuracy, and with two or more senses with high ambiguity (majority sense less than 50 %) ranged from 2 to 45 % accuracy. With respect to the ShARe test set, 69 % of short form annotations contained common concept unique identifiers with the Consumer Health Vocabulary. For these 2594 possible annotations, the performance of participating systems ranged from 50 to 75 % accuracy. 
CONCLUSION Short form normalization continues to be a challenging problem. Short form normalization systems perform with moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with missed concept unique identifiers from the ShARe test set to further support patient understanding of unfamiliar medical terms.
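The majority sense baseline mentioned in this abstract assigns every short form the sense it most often carried in the training annotations. A minimal sketch follows, assuming training data arrive as (short form, sense) pairs; the function names, the lowercasing step, and the toy sense labels are placeholders rather than the challenge's actual data format or UMLS identifiers.

```python
from collections import Counter, defaultdict


def train_majority_sense(annotations):
    """Map each short form to its most frequent sense in the training pairs."""
    counts = defaultdict(Counter)
    for short_form, sense in annotations:
        # Lowercasing is an assumption; case can be meaningful in clinical text.
        counts[short_form.lower()][sense] += 1
    return {sf: senses.most_common(1)[0][0] for sf, senses in counts.items()}


def normalize(short_form, model, default=None):
    """Return the majority sense, or default for an unseen short form."""
    return model.get(short_form.lower(), default)


# Toy training pairs with placeholder sense labels.
training = [
    ("pt", "patient"),
    ("pt", "patient"),
    ("pt", "physical therapy"),
    ("ra", "rheumatoid arthritis"),
]
model = train_majority_sense(training)
prediction = normalize("PT", model)
```

Because the baseline ignores context entirely, its accuracy on a short form cannot exceed that form's majority sense share, which is consistent with accuracy falling as ambiguity rises in the results above.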
|
46
|
Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis. J Biomed Semantics 2016; 7:26. [PMID: 27175226 PMCID: PMC4863379 DOI: 10.1186/s13326-016-0065-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 10/21/2015] [Accepted: 04/19/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the United States, 795,000 people suffer strokes each year; 10-15 % of these strokes can be attributed to stenosis caused by plaque in the carotid artery, a major stroke phenotype risk factor. Studies comparing treatments for the management of asymptomatic carotid stenosis are challenging for at least two reasons: 1) administrative billing codes (i.e., Current Procedural Terminology (CPT) codes) that identify carotid images do not denote which neurovascular arteries are affected and 2) the majority of the image reports are negative for carotid stenosis. Studies that rely on manual chart abstraction can be labor-intensive, expensive, and time-consuming. Natural Language Processing (NLP) can expedite the process of manual chart abstraction by automatically filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings, thus potentially reducing effort, costs, and time. METHODS In this pilot study, we conducted an information content analysis of carotid stenosis mentions in terms of their report location (sections), report formats (structures) and linguistic descriptions (expressions) from Veteran Health Administration free-text reports. We assessed the ability of an NLP algorithm, pyConText, to discern reports with significant carotid stenosis findings from reports with no/insignificant carotid stenosis findings given these three document composition factors for two report types: radiology (RAD) and text integration utility (TIU) notes. RESULTS We observed that most carotid mentions are recorded in prose using categorical expressions, within the Findings and Impression sections for RAD reports and within neither of these designated sections for TIU notes. For RAD reports, pyConText performed with high sensitivity (88 %), specificity (84 %), and negative predictive value (95 %) and reasonable positive predictive value (70 %). 
For TIU notes, pyConText performed with high specificity (87 %) and negative predictive value (92 %), reasonable sensitivity (73 %), and moderate positive predictive value (58 %). pyConText performed with the highest sensitivity processing the full report rather than the Findings or Impressions independently. CONCLUSION We conclude that pyConText can reduce chart review efforts by filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings from the Veteran Health Administration electronic health record, and hence has utility for expediting a comparative effectiveness study of treatment strategies for stroke prevention.
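The four measures reported in this abstract all derive from a 2x2 confusion matrix of system output against the reference standard. The sketch below shows the arithmetic with invented counts; the paper's own counts are not given in the abstract, and the function name is an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute sensitivity, specificity, PPV, and NPV from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # flagged among truly significant reports
        "specificity": tn / (tn + fp),  # filtered among truly insignificant reports
        "ppv": tp / (tp + fp),          # truly significant among flagged reports
        "npv": tn / (tn + fn),          # truly insignificant among filtered reports
    }


# Hypothetical counts chosen for illustration.
metrics = diagnostic_metrics(tp=8, fp=4, fn=2, tn=16)
```

For a filtering use case such as chart review, NPV is the figure that indicates how safely the filtered-out reports can be skipped.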
|
47
|
Towards a Generalizable Time Expression Model for Temporal Reasoning in Clinical Notes. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:1252-1259. [PMID: 26958265 PMCID: PMC4765564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Accurate temporal identification and normalization is imperative for many biomedical and clinical tasks such as generating timelines and identifying phenotypes. A major natural language processing challenge is developing and evaluating a generalizable temporal modeling approach that performs well across corpora and institutions. Our long-term goal is to create such a model. We initiate our work on reaching this goal by focusing on temporal expression (TIMEX3) identification. We present a systematic approach to 1) generalize existing solutions for automated TIMEX3 span detection, and 2) assess similarities and differences by various instantiations of TIMEX3 models applied on separate clinical corpora. When evaluated on the 2012 i2b2 and the 2015 Clinical TempEval challenge corpora, our conclusion is that our approach is successful - we achieve competitive results for automated classification, and we identify similarities and differences in TIMEX3 modeling that will be informative in the development of a simplified, general temporal model.
|
48
|
Semantic annotation of clinical events for generating a problem list. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:1032-41. [PMID: 24551392 PMCID: PMC3900128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
We present a pilot study of an annotation schema representing problems and their attributes, along with their relationship to temporal modifiers. We evaluated the ability for humans to annotate clinical reports using the schema and assessed the contribution of semantic annotations in determining the status of a problem mention as active, inactive, proposed, resolved, negated, or other. Our hypothesis is that the schema captures semantic information useful for generating an accurate problem list. Clinical named entities such as reference events, time points, time durations, aspectual phase, ordering words and their relationships including modifications and ordering relations can be annotated by humans with low to moderate recall. Once identified, most attributes can be annotated with low to moderate agreement. Some attributes - Experiencer, Existence, and Certainty - are more informative than other attributes - Intermittency and Generalized/Conditional - for predicting a problem mention's status. Support vector machine outperformed Naïve Bayes and Decision Tree for predicting a problem's status.
|
49
|
|
50
|
Extending the NegEx lexicon for multiple languages. Stud Health Technol Inform 2013; 192:677-681. [PMID: 23920642 PMCID: PMC3923890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
We translated an existing English negation lexicon (NegEx) to Swedish, French, and German and compared the lexicon on corpora from each language. We observed Zipf's law for all languages, i.e., a few phrases occur a large number of times, and a large number of phrases occur fewer times. Negation triggers "no" and "not" were common for all languages; however, other triggers varied considerably. The lexicon is available in OWL and RDF format and can be extended to other languages. We discuss the challenges in translating negation triggers to other languages and issues in representing multilingual lexical knowledge.
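To make the role of a negation trigger lexicon concrete, the sketch below flags a target term as negated when a trigger appears within a few tokens before it. This is a deliberately simplified trigger-window check, not the NegEx algorithm itself; the tokenizer, the window size, and the function names are assumptions, and only the triggers "no" and "not" come from the abstract.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(sentence, target, triggers=("no", "not"), window=5):
    """Return True if a trigger occurs within `window` tokens before the target."""
    tokens = tokenize(sentence)
    target_tokens = tokenize(target)
    n = len(target_tokens)
    if n == 0:
        return False
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == target_tokens:
            preceding = tokens[max(0, start - window):start]
            if any(token in triggers for token in preceding):
                return True
    return False


negated = is_negated("Chest x-ray does not show pleural effusion", "pleural effusion")
```

Supporting another language in this sketch would mean passing a different `triggers` tuple, which is where a translated lexicon would plug in.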
|