1. Fleurence RL, Kent S, Adamson B, Tcheng J, Balicer R, Ross JS, Haynes K, Muller P, Campbell J, Bouée-Benhamiche E, García Martí S, Ramsey S. Assessing Real-World Data From Electronic Health Records for Health Technology Assessment: The SUITABILITY Checklist: A Good Practices Report of an ISPOR Task Force. Value Health 2024; 27:692-701. [PMID: 38871437] [DOI: 10.1016/j.jval.2024.01.019]
Abstract
This ISPOR Good Practices report provides a framework for assessing the suitability of electronic health record (EHR) data for use in health technology assessments (HTAs). Although EHR data can fill evidence gaps and improve decisions, several important limitations can affect their validity and relevance. The ISPOR framework includes 2 components: data delineation and data fitness for purpose. Data delineation provides a complete understanding of the data and an assessment of their trustworthiness by describing (1) data characteristics; (2) data provenance; and (3) data governance. Fitness for purpose comprises (1) data reliability items, i.e., how accurate and complete the estimates are for answering the question at hand, and (2) data relevance items, which assess how well the data are suited to answer the particular question from a decision-making perspective. The report includes a checklist specific to EHR data reporting: the ISPOR SUITABILITY Checklist. It also provides recommendations for HTA agencies and policy makers to improve the use of EHR-derived data over time. The report concludes with a discussion of limitations and future directions in the field, including the potential impact of the substantial and rapid advances in the diffusion and capabilities of large language models and generative artificial intelligence. The report's immediate audiences are HTA evidence developers and users. We anticipate that it will also be useful to other stakeholders, particularly regulators and manufacturers, in the future.
Affiliation(s)
- Seamus Kent
- Erasmus School of Health & Policy Management, Erasmus University, Rotterdam, The Netherlands
- Joseph S Ross
- Yale School of Medicine, Yale University, New Haven, CT, USA
- Kevin Haynes
- Janssen Research and Development, Titusville, NJ, USA
- Patrick Muller
- Centre for Guidelines, National Institute for Health and Care Excellence, Manchester or London, England, UK
- Jon Campbell
- National Pharmaceutical Council, Washington, DC, USA
- Elsa Bouée-Benhamiche
- Public Health and Healthcare Division, Institut National du Cancer, Boulogne-Billancourt, France
- Sebastián García Martí
- Health Technology Assessment and Health Economics Department, Institute for Clinical Effectiveness and Health Policy, Buenos Aires, Argentina
- Scott Ramsey
- Hutchinson Institute for Cancer Outcomes Research, Fred Hutchinson Cancer Center, Seattle, WA, USA
2. Wang Z, Song X, Waitman LR, Hyams JS, Denson LA. Fitness-for-use of Retrospective Multicenter Electronic Health Records to Conduct Outcome Analysis for Pediatric Ulcerative Colitis. Medicine (Baltimore) 2024; 103:e37395. [PMID: 38489703] [PMCID: PMC10939680] [DOI: 10.1097/md.0000000000037395]
Abstract
The use of electronic health records (EHRs) has garnered interest as an approach for conducting innovative outcome research and producing real-world evidence at a reduced cost compared with traditional clinical trials. This study aimed to evaluate the utility of deidentified EHR data from a multicenter research network to identify characteristics associated with treatment escalation (TE) in newly diagnosed pediatric ulcerative colitis patients. EHR data (01/2010-12/2021) from 13 Midwest healthcare systems (Greater Plains Collaborative) were collected for pediatric ulcerative colitis patients. We identified standard treatments, excluded patients missing initial therapy data, and analyzed the TE and time-to-TE outcomes. Clinical and laboratory characteristics at baseline were extracted. Logistic and Cox models were used, and missing risk factors were imputed. Machine-learning Bayesian additive regression trees were also used to create partial dependence plots for assessing the associations between risk factors and clinical outcomes. A total of 502 eligible pediatric patients (aged 4-17 years) who initiated standard treatment were identified. Among them, 205 of 502 (41%) experienced TE, a median (P25, P75) of 63 (9, 237) days after the initial treatment. Additionally, 20 of 509 (4%) patients underwent colectomy, a median (P25, P75) of 80 (3, 205) days after the initial treatment. Both multivariable logistic regression and Cox proportional hazards regression demonstrated moderate discriminative power in predicting TE and time-to-TE, respectively. Common positive predictors for both TE and time-to-TE included a high monocyte proportion and elevated platelet counts. Conversely, BMI z-score, albumin, hemoglobin levels, and lymphocyte proportion were negatively associated with both TE and time-to-TE.
This study demonstrates that multicenter EHR data can be used to identify a trial-comparable study sample of potentially larger size and to identify clinically meaningful endpoints for conducting outcome analysis and generating real-world evidence.
Affiliation(s)
- Zhu Wang
- Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN
- Xing Song
- Biomedical Informatics, Biostatistics and Medical Epidemiology, University of Missouri, Columbia, MO
- Lemuel R Waitman
- Biomedical Informatics, Biostatistics and Medical Epidemiology, University of Missouri, Columbia, MO
- Lee A Denson
- Cincinnati Children's Medical Center, Cincinnati, OH
3. Lee S, Roh GH, Kim JY, Ho Lee Y, Woo H, Lee S. Effective data quality management for electronic medical record data using SMART DATA. Int J Med Inform 2023; 180:105262. [PMID: 37871445] [DOI: 10.1016/j.ijmedinf.2023.105262]
Abstract
OBJECTIVES In the medical field, we face many challenges, including the high cost of data collection and processing, difficult standards issues, and complex preprocessing techniques. It is necessary to establish an objective and systematic data quality management system that ensures data reliability, mitigates risks caused by incorrect data, reduces data management costs, and increases data utilization. We introduce the concept of SMART DATA in a data quality management system and conducted a case study using real-world data on colorectal cancer. METHODS We defined the data quality management system from three aspects (Construction - Operation - Utilization) based on the life cycle of medical data. Based on this, we proposed the "SMART DATA" concept and tested it on real-world colorectal cancer data. RESULTS We define "SMART DATA" as systematized, high-quality data collected based on the life cycle of data construction, operation, and utilization through quality control activities for medical data. In this study, we selected a scenario using data on colorectal cancer patients from a single medical institution provided by the Clinical Oncology Network (CONNECT). As SMART DATA, we curated 1,724 learning data records and 27 Clinically Critical Set (CCS) records for colorectal cancer prediction. These datasets contributed to the development and fine-tuning of the colorectal cancer prediction model, and it was determined that CCS cases had unique characteristics and patterns that warranted additional clinical review and consideration in the context of colorectal cancer prediction. CONCLUSIONS In this study, we conducted primary research to develop a medical data quality management system. This will standardize medical data extraction and quality control methods and increase the utilization of medical data.
Ultimately, we aim to provide an opportunity to develop a medical data quality management methodology and contribute to the establishment of a medical data quality management system.
Affiliation(s)
- Seunghee Lee
- Healthcare Data Science Center, Konyang University Hospital, Daejeon, 35365, Republic of Korea
- Gyun-Ho Roh
- Biomedical Research Institute, Seoul National University Hospital, Seoul, Republic of Korea
- Jong-Yeup Kim
- Healthcare Data Science Center, Konyang University Hospital, Daejeon, 35365, Republic of Korea; Department of Biomedical Informatics, College of Medicine, Konyang University, Daejeon, 35365, Republic of Korea
- Young Ho Lee
- Department of Computer Engineering, Gachon University, Seongnam, Republic of Korea
- Hyekyung Woo
- Department of Health Administration, Kongju National University, Kongju, 32588, Republic of Korea
- Suehyun Lee
- Department of Computer Engineering, Gachon University, Seongnam, Republic of Korea
4. Lewis AE, Weiskopf N, Abrams ZB, Foraker R, Lai AM, Payne PRO, Gupta A. Electronic health record data quality assessment and tools: a systematic review. J Am Med Inform Assoc 2023; 30:1730-1740. [PMID: 37390812] [PMCID: PMC10531113] [DOI: 10.1093/jamia/ocad120]
Abstract
OBJECTIVE We extended a 2013 literature review on electronic health record (EHR) data quality assessment approaches and tools to determine recent improvements or changes in EHR data quality assessment methodologies. MATERIALS AND METHODS We completed a systematic review of PubMed articles from 2013 to April 2023 that discussed the quality assessment of EHR data. We screened and reviewed papers for the dimensions and methods defined in the original 2013 manuscript. We categorized papers as data quality outcomes of interest, tools, or opinion pieces. We abstracted and defined additional themes and methods through an iterative review process. RESULTS We included 103 papers in the review, of which 73 were data quality outcomes of interest papers, 22 were tools, and 8 were opinion pieces. The most common dimension of data quality assessed was completeness, followed by correctness, concordance, plausibility, and currency. We abstracted conformance and bias as 2 additional dimensions of data quality and structural agreement as an additional methodology. DISCUSSION There has been an increase in EHR data quality assessment publications since the original 2013 review. Consistent dimensions of EHR data quality continue to be assessed across applications. Despite consistent patterns of assessment, there still does not exist a standard approach for assessing EHR data quality. CONCLUSION Guidelines are needed for EHR data quality assessment to improve the efficiency, transparency, comparability, and interoperability of data quality assessment. These guidelines must be both scalable and flexible. Automation could be helpful in generalizing this process.
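The data quality dimensions this review tallies (completeness, plausibility, currency, and so on) can be operationalised as simple record-level checks. The following is a minimal Python sketch under assumed conventions; the field names, records, and thresholds are hypothetical and do not come from any of the reviewed papers:

```python
from datetime import date

# Hypothetical EHR extracts; field names are illustrative only.
records = [
    {"id": 1, "birth_date": date(1980, 5, 1), "heart_rate": 72,  "last_update": date(2023, 1, 10)},
    {"id": 2, "birth_date": None,             "heart_rate": 510, "last_update": date(2015, 6, 2)},
]

def completeness(records, field):
    """Fraction of records with a non-missing value for `field`."""
    return sum(r[field] is not None for r in records) / len(records)

def plausibility(records, field, lo, hi):
    """Fraction of non-missing values inside a clinically plausible range."""
    vals = [r[field] for r in records if r[field] is not None]
    return sum(lo <= v <= hi for v in vals) / len(vals)

def currency(records, field, cutoff):
    """Fraction of records updated on or after a cutoff date."""
    return sum(r[field] >= cutoff for r in records) / len(records)

print(completeness(records, "birth_date"))                  # 0.5
print(plausibility(records, "heart_rate", 20, 250))         # 0.5
print(currency(records, "last_update", date(2020, 1, 1)))   # 0.5
```

Each function returns a proportion in [0, 1], which is how such dimensions are typically summarised before site-level comparison.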
Affiliation(s)
- Abigail E Lewis
- Division of Computational and Data Sciences, Washington University in St. Louis, St. Louis, Missouri, USA
- Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
- Nicole Weiskopf
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Zachary B Abrams
- Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
- Randi Foraker
- Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
- Albert M Lai
- Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
- Philip R O Payne
- Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
- Aditi Gupta
- Institute for Informatics, Data Science and Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
5. Mohamed Y, Song X, McMahon TM, Sahil S, Zozus M, Wang Z, Waitman LR. Electronic health record data quality variability across a multistate clinical research network. J Clin Transl Sci 2023; 7:e130. [PMID: 37396818] [PMCID: PMC10308424] [DOI: 10.1017/cts.2023.548]
Abstract
Background Electronic health record (EHR) data have many quality problems that may affect the outcome of research results and decision support systems. Many methods have been used to evaluate EHR data quality; however, there is not yet a consensus on best practice. We used a rule-based approach to assess the variability of EHR data quality across multiple healthcare systems. Methods To quantify data quality concerns across healthcare systems in a PCORnet Clinical Research Network, we used a previously tested rule-based framework tailored to the PCORnet Common Data Model to perform data quality assessment at 13 clinical sites across eight states. Results were compared with the current PCORnet data curation process to explore the differences between the two methods. Additional analyses of testosterone therapy prescribing were used to explore clinical care variability and quality. Results The framework detected discrepancies across sites, revealing evident data quality variability between sites. The detailed requirements encoded in the rules captured additional data errors with a specificity that aids remediation of technical errors, compared with the current PCORnet data curation process. Other rules designed to detect logical and clinical inconsistencies may also support clinical care variability and quality programs. Conclusion Rule-based EHR data quality methods quantified significant discrepancies across all sites. Medication and laboratory sources were causes of data errors.
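A rule-based framework of the kind described in this abstract can be sketched as a set of named predicates evaluated per record, with violation counts compared across sites. This is an illustrative Python sketch only, not the PCORnet framework itself; the rule names and record fields are hypothetical:

```python
# Each rule is (name, predicate over a record); a record failing its predicate
# counts as one violation for that rule.
rules = [
    ("dob_present", lambda r: r.get("birth_date") is not None),
    ("rx_has_code", lambda r: r.get("rx_code") not in (None, "")),
    ("lab_value_numeric", lambda r: isinstance(r.get("lab_value"), (int, float))),
]

def audit(site_records):
    """Return {rule_name: violation_count} for one site's records."""
    return {name: sum(not pred(r) for r in site_records) for name, pred in rules}

# Two toy "sites": one clean, one with errors in every checked field.
site_a = [{"birth_date": "1970-01-01", "rx_code": "197361", "lab_value": 5.4}]
site_b = [{"birth_date": None, "rx_code": "", "lab_value": "5.4"}]

print(audit(site_a))  # {'dob_present': 0, 'rx_has_code': 0, 'lab_value_numeric': 0}
print(audit(site_b))  # {'dob_present': 1, 'rx_has_code': 1, 'lab_value_numeric': 1}
```

Comparing the per-rule counts across sites is what surfaces the between-site variability the study reports.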
Affiliation(s)
- Yahia Mohamed
- University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Xing Song
- University of Missouri School of Medicine, Columbia, MO, USA
- Tamara M. McMahon
- University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Suman Sahil
- University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Meredith Zozus
- University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
- Zhan Wang
- University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
- Lemuel R. Waitman
- University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- University of Missouri School of Medicine, Columbia, MO, USA
6. Ozonze O, Scott PJ, Hopgood AA. Automating Electronic Health Record Data Quality Assessment. J Med Syst 2023; 47:23. [PMID: 36781551] [PMCID: PMC9925537] [DOI: 10.1007/s10916-022-01892-2]
Abstract
Information systems such as Electronic Health Record (EHR) systems are susceptible to data quality (DQ) issues. Given the growing importance of EHR data, there is an increasing demand for strategies and tools to help ensure that available data are fit for use. However, developing reliable data quality assessment (DQA) tools necessary for guiding and evaluating improvement efforts has remained a fundamental challenge. This review examines the state of research on operationalising EHR DQA, mainly automated tooling, and highlights necessary considerations for future implementations. We reviewed 1841 articles from PubMed, Web of Science, and Scopus published between 2011 and 2021, identifying 23 DQA programs: 14 deployed in real-world settings to assess EHR data quality and 9 experimental prototypes. Many of these programs investigate completeness (n = 15) and value conformance (n = 12) quality dimensions and are backed by knowledge items gathered from domain experts (n = 9) or from literature reviews and existing DQ measurements (n = 3). A few DQA programs also explore the feasibility of using data-driven techniques to assess EHR data quality automatically. Overall, the automation of EHR DQA is gaining traction, but current efforts are fragmented and not backed by relevant theory. Existing programs also vary in scope, type of data supported, and how measurements are sourced. There is a need to standardise programs for assessing EHR data quality, as current evidence suggests their quality may be unknown.
Affiliation(s)
- Obinwa Ozonze
- School of Computing, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, UK
- Philip J Scott
- Institute of Management and Health, University of Wales Trinity Saint David, Lampeter, SA48 7ED, UK
- Adrian A Hopgood
- School of Computing, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, UK
7. Rossi L, Butler S, Coakley A, Flanagan J. Nursing knowledge captured in electronic health records. Int J Nurs Knowl 2023; 34:72-84. [PMID: 35570416] [DOI: 10.1111/2047-3095.12365]
Abstract
PURPOSE The purpose of this study was to describe the extent to which nursing assessment data were present in the electronic health record (EHR) and linked to NANDA-I, NIC, and NOC. METHODS This retrospective review used a descriptive approach to examine documentation in the EHRs of 10 hospitalized patients requiring cardiac surgery. A team of experts applied a Delphi consensus-building process to identify the supports and barriers for nursing documentation. FINDINGS Collection of the health history was organized using Gordon's Functional Health Pattern (FHP) Framework. Seventy-five fields were noted for the entry of nursing assessment data, of which 65 focused on health history data and 30 documented physical findings and observations. There were no references to the defining characteristics or etiologies with any of the diagnostic labels used. Care plans included the nursing diagnoses, goals of care, and interventions, although there was a lack of clear alignment between the assessment, NANDA-I, NIC, and NOC and the care plan. Progress note documentation addressed significant events in the patient's clinical course; however, these were not nursing problem- or diagnosis-focused. Four expert reviewers arrived at consensus regarding the supports and challenges impacting nurses' ability to document data depicting nursing's contribution to care using a FHP and standardized nursing language in the EHR. CONCLUSIONS The EHR provides an opportunity to reflect nursing clinical judgment and make nursing care visible. These findings suggest there are challenges to capturing nurse-focused data elements in the EHR. IMPLICATIONS FOR NURSING PRACTICE This work has important implications for clinicians, educators, and administrators alike. EHR systems must accurately capture nurses' contribution to patient care to plan for resource allocation and quality care delivery.
Ultimately, the development of standardized data sources reflecting the outcomes of nursing care will expand the opportunities to advance nursing knowledge.
Affiliation(s)
- Laura Rossi
- Simmons University, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA
- Shawna Butler
- Massachusetts General Hospital, Boston, Massachusetts, USA; University of Massachusetts, Boston, Massachusetts, USA
- Jane Flanagan
- Massachusetts General Hospital, Boston, Massachusetts, USA; Boston College, Chestnut Hill, Massachusetts, USA
8. Tokede B, Yansane A, White J, Bangar S, Mullins J, Brandon R, Gantela S, Kookal K, Rindal D, Lee CT, Lin GH, Spallek H, Kalenderian E, Walji M. Translating periodontal data to knowledge in a learning health system. J Am Dent Assoc 2022; 153:996-1004. [PMID: 35970673] [PMCID: PMC9830777] [DOI: 10.1016/j.adaj.2022.06.007]
Abstract
BACKGROUND A learning health system (LHS) is a health system in which patients and clinicians work together to choose care on the basis of best evidence and to drive discovery as a natural outgrowth of every clinical encounter to ensure the right care at the right time. An LHS for dentistry is now feasible, as an increased number of oral health care encounters are captured in electronic health records (EHRs). METHODS The authors used EHRs data to track periodontal health outcomes at 3 large dental institutions. The 2 outcomes of interest were a new periodontitis case (for patients who had not received a diagnosis of periodontitis previously) and tooth loss due to progression of periodontal disease. RESULTS The authors assessed a total of 494,272 examinations (new periodontitis outcome: n = 168,442; new tooth loss outcome: n = 325,830), representing a total of 194,984 patients. Dynamic dashboards displaying performance on both measures over time allow users to compare demographic and risk factors for patients. The incidence of new periodontitis and tooth loss was 4.3% and 1.2%, respectively. CONCLUSIONS Periodontal disease, diagnosis, prevention, and treatment are particularly well suited for an LHS model. The results showed the feasibility of automated extraction and interpretation of critical data elements from the EHRs. The 2 outcome measures are being implemented as part of a dental LHS. The authors are using this knowledge to target the main drivers of poorer periodontal outcomes in a specific patient population, and they continue to use clinical health data for the purpose of learning and improvement. PRACTICAL IMPLICATIONS Dental institutions of any size can conduct contemporaneous self-evaluation and immediately implement targeted strategies to improve oral health outcomes.
Affiliation(s)
- Bunmi Tokede
- Department of Diagnostic and Biomedical Sciences, The University of Texas Health Science Center at Houston, Houston, TX
- Alfa Yansane
- Preventative and Restorative Dental Sciences, School of Dentistry, University of California, San Francisco, San Francisco, CA
- Joel White
- Preventative and Restorative Dental Sciences, School of Dentistry, University of California, San Francisco, San Francisco, CA
- Suhasini Bangar
- School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX
- Ryan Brandon
- Willamette Dental Group and Skourtes Institute, Hillsboro, OR
- Swaroop Gantela
- School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX
- Krishna Kookal
- School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX
- Donald Rindal
- HealthPartners Institute, Minneapolis, MN; HealthPartners Dental Group, Minneapolis, MN
- Chun-Teh Lee
- Department of Periodontics and Dental Hygiene, School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX
- Guo-Hao Lin
- School of Dentistry, University of California, San Francisco, CA
- Heiko Spallek
- The University of Sydney, Sydney, New South Wales, Australia
- Elsbeth Kalenderian
- Department of Preventive and Restorative Dental Sciences, School of Dentistry, University of California, San Francisco, San Francisco, CA; Academic Centre for Dentistry, Amsterdam, The Netherlands; Harvard School of Dental Medicine, Boston, MA; University of Pretoria School of Dentistry, Pretoria, South Africa
- Muhammad Walji
- Diagnostic and Biomedical Sciences Department, School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX
9. McDonald PL, Phillips J, Harwood K, Maring J, van der Wees PJ. Identifying requisite learning health system competencies: a scoping review. BMJ Open 2022; 12:e061124. [PMID: 35998963] [PMCID: PMC9403130] [DOI: 10.1136/bmjopen-2022-061124]
Abstract
OBJECTIVES Learning health systems (LHS) integrate knowledge and practice through cycles of continuous quality improvement and learning to increase healthcare quality. LHS have been conceptualised through multiple frameworks and models. Our aim is to identify and describe the requisite individual competencies (knowledge, skills and attitudes) and system competencies (capacities, characteristics and capabilities) described in existing literature in relation to operationalising LHS. METHODS A scoping review was conducted with descriptive and thematic analysis to identify and map competencies of LHS for individuals/patients, health system workers and systems. Articles until April 2020 were included based on a systematic literature search and selection process. Themes were developed using a consensus process until agreement was reached among team members. RESULTS Eighty-nine articles were included with most studies conducted in the USA (68 articles). The largest number of publications represented competencies at the system level, followed by health system worker competencies. Themes identified at the individual/patient level were knowledge and skills to understand and share information with an established system and the ability to interact with the technology used to collect data. Themes at the health system worker level were skills in evidence-based practice, leadership and teamwork skills, analytical and technological skills required to use a 'digital ecosystem', data-science knowledge and skill and self-reflective capacity. Researchers embedded within LHS require a specific set of competencies. Themes identified at the system level were data, infrastructure and standardisation; integration of data and workflow; and culture and climate supporting ongoing learning. CONCLUSION The identified individual stakeholder competencies within LHS and the system capabilities of LHS provide a solid base for the further development and evaluation of LHS. 
International collaboration for stimulating LHS will assist in further establishing the knowledge base for LHS.
Affiliation(s)
- Paige L McDonald
- Department of Clinical Research and Leadership, The George Washington University, Washington, District of Columbia, USA
- Jessica Phillips
- Department of Clinical Research and Leadership, The George Washington University, Washington, District of Columbia, USA
- Kenneth Harwood
- College of Health and Education, Marymount University, Arlington, Virginia, USA
- Joyce Maring
- Department of Health, Human Function, The George Washington University, Washington, District of Columbia, USA
- Philip J van der Wees
- Department of Clinical Research and Leadership, The George Washington University, Washington, District of Columbia, USA
- Rehabilitation and IQ Healthcare, Radboudumc, Nijmegen, The Netherlands
10. Fardal Ø, Skau I, Nevland K, Grytten J. Proposing a model for auditing data quality of long-term periodontal outcome studies. Acta Odontol Scand 2022; 80:374-381. [PMID: 34962852] [DOI: 10.1080/00016357.2021.2020895]
Abstract
OBJECTIVE The assessment of the success of conventional periodontal therapy is based on retrospective studies from private practice and university clinics. Due to their marked heterogeneity, it is difficult to assess the data quality of these studies and to rate them. The aim was to test a model for auditing and rating the data quality of periodontal outcome studies. METHODS The method was adapted from the NIH Health Care Systems Collaboratory model, which uses three data quality dimensions: completeness (including all the relevant variables), consistency (ensuring that the same variables are compared), and accuracy (proportion of data in error compared with a gold standard). The model was applied to studies from a Norwegian specialist practice and data from the Norwegian Health database to test whether the auditing process was workable using real-world data. RESULTS Forty-seven risk and prognostic factors were included for completeness. Seven variables were specified for consistency: tooth loss, smoking, systemic conditions, oral hygiene, individual tooth prognosis, maintenance profiles, and timing of extractions. The factors tested showed 95.7% completeness, an average accuracy deviation from the gold standard of -2.3% for each of the risk/prognostic factors, and an overall study score of 93.3%. CONCLUSIONS It was possible to develop a method for auditing and rating the quality of periodontal outcome studies. The model was tested using both real-world data, including risk and prognostic factors from individual outcome studies, and national big data. The application of the model to these sets of data showed a high accuracy of the risk/prognostic factors and a close relationship with national big data.
Affiliation(s)
- Øystein Fardal
- Private Practice, Egersund, Norway
- Institute of Education for Medical and Dental Sciences, University of Aberdeen, Aberdeen, UK
- Institute of Community Dentistry, University of Oslo, Oslo, Norway
- Irene Skau
- Institute of Community Dentistry, University of Oslo, Oslo, Norway
- Jostein Grytten
- Institute of Community Dentistry, University of Oslo, Oslo, Norway
- Department of Obstetrics and Gynecology, Institute of Clinical Medicine, Akershus University Hospital, Lørenskog, Norway
11. Williams BA, Voyce S, Sidney S, Roger VL, Plante TB, Larson S, LaMonte MJ, Labarthe DR, DeBarmore BM, Chang AR, Chamberlain AM, Benziger CP. Establishing a National Cardiovascular Disease Surveillance System in the United States Using Electronic Health Record Data: Key Strengths and Limitations. J Am Heart Assoc 2022; 11:e024409. [PMID: 35411783] [PMCID: PMC9238467] [DOI: 10.1161/jaha.121.024409]
Abstract
Cardiovascular disease surveillance involves quantifying the evolving population-level burden of cardiovascular outcomes and risk factors as a data-driven initial step followed by the implementation of interventional strategies designed to alleviate this burden in the target population. Despite widespread acknowledgement of its potential value, a national surveillance system dedicated specifically to cardiovascular disease does not currently exist in the United States. Routinely collected health care data such as from electronic health records (EHRs) are a possible means of achieving national surveillance. Accordingly, this article elaborates on some key strengths and limitations of using EHR data for establishing a national cardiovascular disease surveillance system. Key strengths discussed include the: (1) ubiquity of EHRs and consequent ability to create a more "national" surveillance system, (2) existence of a common data infrastructure underlying the health care enterprise with respect to data domains and the nomenclature by which these data are expressed, (3) longitudinal length and detail that define EHR data when individuals repeatedly patronize a health care organization, and (4) breadth of outcomes capable of being surveilled with EHRs. Key limitations discussed include the: (1) incomplete ascertainment of health information related to health care-seeking behavior and the disconnect of health care data generated at separate health care organizations, (2) suspect data quality resulting from the default information-gathering processes within the clinical enterprise, (3) questionable ability to surveil patients through EHRs in the absence of documented interactions, and (4) the challenge in interpreting temporal trends in health metrics, which can be obscured by changing clinical and administrative processes.
12
Diaz-Garelli F, Long A, Bancks MP, Bertoni AG, Narayanan A, Wells BJ. Developing a Data Quality Standard Primer for Cardiovascular Risk Assessment from Electronic Health Record Data Using the DataGauge Process. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2022; 2021:388-397. [PMID: 35308992 PMCID: PMC8861746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Learning health systems aim to support the needs of patients with chronic diseases, which requires methods that account for the limitations of electronic health record (EHR) data. EHR data are often used to calculate cardiovascular risk scores, yet it is unclear whether these data are of sufficient quality to provide accurate estimates, and no open standard is currently available for assessing data quality for such applications. We applied the DataGauge process to develop a data quality standard based on expert clinical, analytical, and informatics knowledge, conducting four interviews and one focus group that produced 61 individual data quality requirements. These requirements covered all standard data quality dimensions and uncovered 705 quality issues in EHR data for 456 patients. The requirements will be expanded and further validated in future work. Our work initiates the development of open and explicit data quality standards for specific secondary uses of clinical data.
Affiliation(s)
- Andrew Long
- University of North Carolina at Charlotte. Charlotte NC
13
Applying Machine Learning in Distributed Data Networks for Pharmacoepidemiologic and Pharmacovigilance Studies: Opportunities, Challenges, and Considerations. Drug Saf 2022; 45:493-510. [PMID: 35579813 PMCID: PMC9112258 DOI: 10.1007/s40264-022-01158-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/13/2022] [Indexed: 01/28/2023]
Abstract
Increasing availability of electronic health databases capturing real-world experiences with medical products has garnered much interest in their use for pharmacoepidemiologic and pharmacovigilance studies. The traditional practice of having numerous groups use single databases to accomplish similar tasks and address common questions about medical products can be made more efficient through well-coordinated multi-database studies, greatly facilitated through distributed data network (DDN) architectures. Access to larger amounts of electronic health data within DDNs has created a growing interest in using data-adaptive machine learning (ML) techniques that can automatically model complex associations in high-dimensional data with minimal human guidance. However, the siloed storage and diverse nature of the databases in DDNs create unique challenges for using ML. In this paper, we discuss opportunities, challenges, and considerations for applying ML in DDNs for pharmacoepidemiologic and pharmacovigilance studies. We first discuss major types of activities performed by DDNs and how ML may be used. Next, we discuss practical data-related factors influencing how DDNs work in practice. We then combine these discussions and jointly consider how opportunities for ML are affected by practical data-related factors for DDNs, leading to several challenges. We present different approaches for addressing these challenges and highlight efforts that real-world DDNs have taken or are currently taking to help mitigate them. Despite these challenges, the time is ripe for the emerging interest to use ML in DDNs, and the utility of these data-adaptive modeling techniques in pharmacoepidemiologic and pharmacovigilance studies will likely continue to increase in the coming years.
14
Koscielniak N, Piatt G, Friedman C, Vinson A, Richesson R, Tucker C. Development of a standards-based phenotype model for gross motor function to support learning health systems in pediatric rehabilitation. Learn Health Syst 2022; 6:e10266. [PMID: 35036550 PMCID: PMC8753308 DOI: 10.1002/lrh2.10266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2020] [Revised: 02/19/2021] [Accepted: 03/29/2021] [Indexed: 11/17/2022] Open
Abstract
INTRODUCTION Research and continuous quality improvement in pediatric rehabilitation settings require standardized data and a systematic approach to using these data. METHODS We systematically examined pediatric data concepts from a pediatric learning network to determine capacity for capturing gross motor function (GMF) in children with cerebral palsy (CP), as a demonstration of enabling infrastructure for the research and quality improvement activities of a learning health system (LHS). We used an iterative approach to construct phenotype models of GMF from standardized data element concepts based on case definitions from the Gross Motor Function Classification System (GMFCS). Data concepts were selected using a theory- and expert-informed process, resulting in the construction of four phenotype models of GMF: an overall model and three classes corresponding to deviations in GMF for CP populations. RESULTS Sixty-five data element concepts were identified for the overall GMF phenotype model. The 65 data elements correspond to 20 variables and logic statements that instantiate membership in one of three clinically meaningful classes of GMF. Data element concepts and variables are organized into five domains relevant to modeling GMF: Neurologic Function, Mobility Performance, Activity Performance, Motor Performance, and Device Use. CONCLUSION Our experience provides an approach for organizations to leverage existing data for care improvement and research in other conditions. This is the first consensus-based and theory-driven specification of data elements and logic to support the identification and labeling of GMF in patients for measuring improvements in care or the impact of new treatments. More research is needed to validate this phenotype model and the extent to which these data differentiate between classes of GMF to support various LHS activities.
Affiliation(s)
- Nikolas Koscielniak
- Clinical and Translational Science InstituteWake Forest University School of MedicineWinston‐SalemNorth CarolinaUSA
- Gretchen Piatt
- Department of Learning Health SciencesUniversity of Michigan Medical SchoolAnn ArborMichiganUSA
- Charles Friedman
- Department of Learning Health SciencesUniversity of Michigan Medical SchoolAnn ArborMichiganUSA
- Alexandra Vinson
- Department of Learning Health SciencesUniversity of Michigan Medical SchoolAnn ArborMichiganUSA
- Rachel Richesson
- Department of Learning Health SciencesUniversity of Michigan Medical SchoolAnn ArborMichiganUSA
- Carole Tucker
- Department of Health and Rehabilitation SciencesTemple UniversityPhiladelphiaPennsylvaniaUSA
15
Data validation techniques used in admission discharge and transfer systems: Necessity of use and effect on data quality. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.101122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
16
Constructing Epidemiologic Cohorts from Electronic Health Record Data. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph182413193. [PMID: 34948800 PMCID: PMC8701170 DOI: 10.3390/ijerph182413193] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/28/2021] [Revised: 12/02/2021] [Accepted: 12/03/2021] [Indexed: 11/17/2022]
Abstract
In the United States, electronic health records (EHR) are increasingly being incorporated into healthcare organizations to document patient health and services rendered. EHRs serve as a vast repository of demographic, diagnostic, procedural, therapeutic, and laboratory test data generated during the routine provision of health care. The appeal of using EHR data for epidemiologic research is clear: EHRs generate large datasets on real-world patient populations in an easily retrievable form permitting the cost-efficient execution of epidemiologic studies on a wide array of topics. Constructing epidemiologic cohorts from EHR data involves as a defining feature the development of data machinery, which transforms raw EHR data into an epidemiologic dataset from which appropriate inference can be drawn. Though data machinery includes many features, the current report focuses on three aspects of machinery development of high salience to EHR-based epidemiology: (1) selecting study participants; (2) defining “baseline” and assembly of baseline characteristics; and (3) follow-up for future outcomes. For each, the defining features and unique challenges with respect to EHR-based epidemiology are discussed. An ongoing example illustrates key points. EHR-based epidemiology will become more prominent as EHR data sources continue to proliferate. Epidemiologists must continue to improve the methods of EHR-based epidemiology given the relevance of EHRs in today’s healthcare ecosystem.
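The three machinery steps named above can be sketched on a toy EHR extract (patient IDs, diagnosis codes, and the qualifying criteria below are invented for illustration):

```python
# A minimal, hypothetical sketch of the three machinery steps:
# (1) select study participants, (2) define baseline, (3) follow up for outcomes.
from datetime import date

encounters = [  # toy EHR extract: (patient_id, encounter_date, diagnosis_code)
    (1, date(2019, 1, 5), "I10"),
    (1, date(2020, 6, 1), "I21"),  # outcome event after baseline
    (2, date(2019, 3, 2), "E11"),
    (3, date(2021, 2, 1), "I10"),
]

STUDY_START, OUTCOME_CODE = date(2019, 1, 1), "I21"

# (1) participants: patients with any encounter on/after the study start date
participants = {pid for pid, d, _ in encounters if d >= STUDY_START}

# (2) baseline: each participant's first qualifying encounter date
baseline = {}
for pid, d, _ in encounters:
    if pid in participants:
        baseline[pid] = min(baseline.get(pid, d), d)

# (3) follow-up: does the outcome occur strictly after baseline?
outcome = {pid: any(p == pid and d > baseline[pid] and c == OUTCOME_CODE
                    for p, d, c in encounters)
           for pid in participants}
```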
17
Tute E, Ganapathy N, Wulff A. A data driven learning approach for the assessment of data quality. BMC Med Inform Decis Mak 2021; 21:302. [PMID: 34724930 PMCID: PMC8561935 DOI: 10.1186/s12911-021-01656-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2021] [Accepted: 10/14/2021] [Indexed: 11/16/2022] Open
Abstract
Background Data quality assessment is important but complex and task dependent. Identifying suitable measurement methods and reference ranges for assessing their results is challenging. Manually inspecting the measurement results, as well as current data-driven approaches for learning which results indicate data quality issues, have considerable limitations, e.g. in identifying task-dependent thresholds for measurement results that indicate data quality issues. Objectives To explore the applicability and potential benefits of a data-driven approach to learning task-dependent knowledge about suitable measurement methods and the assessment of their results. Such knowledge could be useful for others to determine whether a local data stock is suitable for a given task. Methods We started by creating artificial data with previously defined data quality issues and applied a set of generic measurement methods to these data (e.g. a method to count the number of values in a certain variable, or the mean of the values). We trained decision trees on exported measurement methods' results and corresponding outcome data (data indicating the data's suitability for a use case). For evaluation, we derived rules for potential measurement methods and reference values from the decision trees and compared them regarding their coverage of the true data quality issues artificially created in the dataset. Three researchers independently derived these rules: one with knowledge of the data quality issues present and two without. Results Our self-trained decision trees were able to indicate rules for 12 of 19 previously defined data quality issues. The learned knowledge about measurement methods and their assessment was complementary to manual interpretation of measurement methods' results. Conclusions Our data-driven approach derives sensible knowledge for task-dependent data quality assessment and complements other current approaches. Based on labeled measurement methods' results as training data, our approach successfully suggested applicable rules for checking data quality characteristics that determine whether a dataset is suitable for a given task. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-021-01656-x.
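The core idea, learning decision rules from labeled measurement-method results, can be sketched with a one-split "decision stump" standing in for the paper's decision trees (the data and the resulting threshold below are invented):

```python
# Sketch of learning a task-dependent threshold from labeled examples: each
# row pairs one measurement-method result (here, a completeness ratio) with an
# outcome label saying whether the dataset proved suitable for the task.

samples = [  # (completeness ratio, suitable-for-task?)
    (0.99, True), (0.95, True), (0.92, True),
    (0.70, False), (0.60, False), (0.81, False),
]

def learn_threshold(samples):
    """Pick the split that best separates suitable from unsuitable datasets."""
    best_t, best_correct = None, -1
    for t in sorted(x for x, _ in samples):       # candidate thresholds
        correct = sum((x >= t) == label for x, label in samples)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

threshold = learn_threshold(samples)
rule = f"dataset suitable if completeness >= {threshold}"
```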
Affiliation(s)
- Erik Tute
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Carl-Neuberg-Str. 1, 30625, Hannover, Germany.
- Nagarajan Ganapathy
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Carl-Neuberg-Str. 1, 30625, Hannover, Germany
- Antje Wulff
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Carl-Neuberg-Str. 1, 30625, Hannover, Germany
18
Simon GE, Bindman AB, Dreyer NA, Platt R, Watanabe JH, Horberg M, Hernandez A, Califf RM. When Can We Trust Real-World Data To Evaluate New Medical Treatments? Clin Pharmacol Ther 2021; 111:24-29. [PMID: 33932030 PMCID: PMC9292968 DOI: 10.1002/cpt.2252] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Accepted: 03/24/2021] [Indexed: 11/15/2022]
Abstract
Concerns regarding both the limited generalizability and the slow pace of traditional randomized trials have led to calls for greater use of real‐world evidence (RWE) in the evaluation of new treatments or products. RWE studies often rely on real‐world data (RWD), including data extracted from healthcare records or data captured by mobile phones or other consumer devices. Global assessments of RWD sources are not helpful in assessing whether any specific RWD element is fit for any specific purpose. Instead, evidence generators and evidence consumers should clearly identify the specific health state or clinical phenomenon of interest and then consider each step between that clinical phenomenon and its representation in a research database. We propose specific questions regarding potential error or bias affecting each of those steps: Would a person experiencing this clinical phenomenon present for care in this setting or interact with this recording device? Would this clinical phenomenon be accurately recognized or assessed? How might the recording environment or tools affect accurate and consistent recording of this clinical phenomenon? Can data elements from different sources be harmonized, both technically (same format) and semantically (same meaning)? Can the original data elements be consistently reduced to a useful clinical phenotype? Addressing these questions requires a range of clinical, organizational, and technical expertise. Transparency regarding each step in the creation of RWD is essential if evidence consumers are to rely on RWE studies.
Affiliation(s)
- Gregory E Simon
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
- Andrew B Bindman
- Kaiser Foundation Health Plan and Hospitals, Redwood City, California, USA
- Richard Platt
- Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
- Jonathan H Watanabe
- University of California Irvine School of Pharmacy and Pharmaceutical Sciences, Irvine, California, USA
- Michael Horberg
- Kaiser Permanente Mid-Atlantic Permanente Research Institute and Mid-Atlantic Permanente Medical Group, Rockville, Maryland, USA
- Robert M Califf
- Verily Life Sciences and Google Health, South San Francisco, California, USA
19
Schmidt CO, Struckmann S, Enzenbach C, Reineke A, Stausberg J, Damerow S, Huebner M, Schmidt B, Sauerbrei W, Richter A. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Med Res Methodol 2021; 21:63. [PMID: 33810787 PMCID: PMC8019177 DOI: 10.1186/s12874-021-01252-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2020] [Accepted: 03/12/2021] [Indexed: 12/21/2022] Open
Abstract
Background No standards exist for the handling and reporting of data quality in health research. This work introduces a data quality framework for observational health research data collections, with supporting software implementations to facilitate harmonized data quality assessments. Methods Developments were guided by the evaluation of an existing data quality framework and by literature reviews. Functions for the computation of data quality indicators were written in R. The concept and implementations are illustrated using data from the population-based Study of Health in Pomerania (SHIP). Results The data quality framework comprises 34 data quality indicators. These target four aspects of data quality: compliance with pre-specified structural and technical requirements (integrity); presence of data values (completeness); inadmissible or uncertain data values and contradictions (consistency); and unexpected distributions and associations (accuracy). R functions calculate data quality metrics based on the provided study data and metadata, and R Markdown reports are generated. Guidance on the concept and tools is available through a dedicated website. Conclusions The presented data quality framework is the first of its kind for observational health research data collections that links a formal concept to implementations in R. The framework and tools facilitate harmonized data quality assessments in pursuit of transparent and reproducible research. Application scenarios comprise data quality monitoring while a study is carried out as well as performing an initial data analysis before starting substantive scientific analyses, but the developments are also of relevance beyond research.
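Three of the four indicator domains can be illustrated with a toy check (the framework itself ships as R functions driven by study metadata; the Python below and its metadata layout are invented for illustration):

```python
# Toy metadata-driven checks for integrity, completeness, and consistency.
# Accuracy (unexpected distributions/associations) would need statistical tests
# and is omitted here.

metadata = {"age": {"type": int, "min": 0, "max": 120}}  # invented metadata layout
values = [34, 57, None, 130, "n/a"]                      # observed "age" values

# integrity: values violating the pre-specified technical type requirement
integrity_violations = [v for v in values
                        if v is not None
                        and not isinstance(v, metadata["age"]["type"])]

# completeness: fraction of values that are present at all
completeness = sum(v is not None for v in values) / len(values)

# consistency: well-typed values outside the admissible range
consistency_violations = [v for v in values
                          if isinstance(v, int)
                          and not metadata["age"]["min"] <= v <= metadata["age"]["max"]]
```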
Affiliation(s)
- Carsten Oliver Schmidt
- Institute for Community Medicine, Department SHIP-KEF, University Medicine Greifswald, Greifswald, Germany.
- Stephan Struckmann
- Institute for Community Medicine, Department SHIP-KEF, University Medicine Greifswald, Greifswald, Germany
- Cornelia Enzenbach
- Institute for Medical Informatics, Statistics, and Epidemiology, University of Leipzig, Leipzig, Germany
- Achim Reineke
- Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Jürgen Stausberg
- Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), Faculty of Medicine, University of Duisburg-Essen, Duisburg, Germany
- Stefan Damerow
- Robert Koch Institute, Department of Epidemiology and Health Monitoring, Berlin, Germany
- Marianne Huebner
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
- Börge Schmidt
- Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), Faculty of Medicine, University of Duisburg-Essen, Duisburg, Germany
- Willi Sauerbrei
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Adrian Richter
- Institute for Community Medicine, Department SHIP-KEF, University Medicine Greifswald, Greifswald, Germany
20
Tute E, Scheffner I, Marschollek M. A method for interoperable knowledge-based data quality assessment. BMC Med Inform Decis Mak 2021; 21:93. [PMID: 33750371 PMCID: PMC7942002 DOI: 10.1186/s12911-021-01458-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2020] [Accepted: 02/26/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Assessing the quality of healthcare data is a complex task including the selection of suitable measurement methods (MM) and adequately assessing their results. OBJECTIVES To present an interoperable data quality (DQ) assessment method that formalizes MMs based on standardized data definitions and intends to support collaborative governance of DQ-assessment knowledge, e.g. which MMs to apply and how to assess their results in different situations. METHODS We describe and explain central concepts of our method using the example of its first real world application in a study on predictive biomarkers for rejection and other injuries of kidney transplants. We applied our open source tool-openCQA-that implements our method utilizing the openEHR specifications. Means to support collaborative governance of DQ-assessment knowledge are the version-control system git and openEHR clinical information models. RESULTS Applying the method on the study's dataset showed satisfactory practicability of the described concepts and produced useful results for DQ-assessment. CONCLUSIONS The main contribution of our work is to provide applicable concepts and a tested exemplary open source implementation for interoperable and knowledge-based DQ-assessment in healthcare that considers the need for flexible task and domain specific requirements.
Affiliation(s)
- Erik Tute
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
- Irina Scheffner
- Department of Nephrology, Hannover Medical School, Hannover, Germany
- Michael Marschollek
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
21
Estiri H, Vasey S, Murphy SN. Generative transfer learning for measuring plausibility of EHR diagnosis records. J Am Med Inform Assoc 2021; 28:559-568. [PMID: 33043366 PMCID: PMC7936395 DOI: 10.1093/jamia/ocaa215] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2020] [Accepted: 08/18/2020] [Indexed: 12/12/2022] Open
Abstract
OBJECTIVE Due to a complex set of processes involved with the recording of health information in the Electronic Health Records (EHRs), the truthfulness of EHR diagnosis records is questionable. We present a computational approach to estimate the probability that a single diagnosis record in the EHR reflects the true disease. MATERIALS AND METHODS Using EHR data on 18 diseases from the Mass General Brigham (MGB) Biobank, we develop generative classifiers on a small set of disease-agnostic features from EHRs that aim to represent Patients, pRoviders, and their Interactions within the healthcare SysteM (PRISM features). RESULTS We demonstrate that PRISM features and the generative PRISM classifiers are potent for estimating disease probabilities and exhibit generalizable and transferable distributional characteristics across diseases and patient populations. The joint probabilities we learn about diseases through the PRISM features via PRISM generative models are transferable and generalizable to multiple diseases. DISCUSSION The Generative Transfer Learning (GTL) approach with PRISM classifiers enables the scalable validation of computable phenotypes in EHRs without the need for domain-specific knowledge about specific disease processes. CONCLUSION Probabilities computed from the generative PRISM classifier can enhance and accelerate applied Machine Learning research and discoveries with EHR data.
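The generative step can be sketched with a single invented binary feature: estimate class-conditional feature probabilities from labeled data, then invert with Bayes' rule to get the probability that a diagnosis record reflects true disease (feature and parameter values below are illustrative, not from the paper):

```python
# Hedged sketch of a generative classifier on one disease-agnostic feature,
# e.g. "the diagnosis code was recorded by more than one provider".
# Parameters would normally be estimated from labeled EHR data.

p_feat_given_true = 0.8   # P(feature | record reflects true disease)
p_feat_given_false = 0.2  # P(feature | record does not)
p_true = 0.5              # prior that a diagnosis record reflects true disease

def posterior_true(feature_present: bool) -> float:
    """P(true disease | feature) via Bayes' rule over the modeled joint."""
    pf_t = p_feat_given_true if feature_present else 1 - p_feat_given_true
    pf_f = p_feat_given_false if feature_present else 1 - p_feat_given_false
    joint_t, joint_f = pf_t * p_true, pf_f * (1 - p_true)
    return joint_t / (joint_t + joint_f)

prob = posterior_true(True)  # higher plausibility when the feature is present
```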
Affiliation(s)
- Hossein Estiri
- Harvard Medical School, Boston, Massachusetts, USA
- Massachusetts General Hospital, Boston, Massachusetts, USA
- Mass General Brigham, Boston, Massachusetts, USA
- Sebastien Vasey
- Department of Mathematics, Harvard University, Cambridge, Massachusetts, USA
- Shawn N Murphy
- Harvard Medical School, Boston, Massachusetts, USA
- Massachusetts General Hospital, Boston, Massachusetts, USA
- Mass General Brigham, Boston, Massachusetts, USA
22
Liaw ST, Guo JGN, Ansari S, Jonnagaddala J, Godinho MA, Borelli AJ, de Lusignan S, Capurro D, Liyanage H, Bhattal N, Bennett V, Chan J, Kahn MG. Quality assessment of real-world data repositories across the data life cycle: A literature review. J Am Med Inform Assoc 2021; 28:1591-1599. [PMID: 33496785 DOI: 10.1093/jamia/ocaa340] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2020] [Accepted: 12/22/2020] [Indexed: 12/12/2022] Open
Abstract
OBJECTIVE Data quality (DQ) must be consistently defined in context. The attributes, metadata, and context of longitudinal real-world data (RWD) have not been formalized for quality improvement across the data production and curation life cycle. We sought to complete a literature review on DQ assessment frameworks, indicators and tools for research, public health, service, and quality improvement across the data life cycle. MATERIALS AND METHODS The review followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Databases from health, physical and social sciences were used: Cinahl, Embase, Scopus, ProQuest, Emcare, PsycINFO, Compendex, and Inspec. Embase was used instead of PubMed (an interface to search MEDLINE) because it includes all MeSH (Medical Subject Headings) terms used and journals in MEDLINE as well as additional unique journals and conference abstracts. A combined data life cycle and quality framework guided the search of published and gray literature for DQ frameworks, indicators, and tools. At least 2 authors independently identified articles for inclusion and extracted and categorized DQ concepts and constructs. All authors discussed findings iteratively until consensus was reached. RESULTS The 120 included articles yielded concepts related to contextual (data source, custodian, and user) and technical (interoperability) factors across the data life cycle. Contextual DQ subcategories included relevance, usability, accessibility, timeliness, and trust. Well-tested computable DQ indicators and assessment tools were also found. CONCLUSIONS A DQ assessment framework that covers intrinsic, technical, and contextual categories across the data life cycle enables assessment and management of RWD repositories to ensure fitness for purpose. Balancing security, privacy, and FAIR principles requires trust and reciprocity, transparent governance, and organizational cultures that value good documentation.
Affiliation(s)
- Siaw-Teng Liaw
- WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
- Jason Guan Nan Guo
- WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
- Sameera Ansari
- WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
- Jitendra Jonnagaddala
- WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
- Myron Anthony Godinho
- WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
- Alder Jose Borelli
- WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
- Simon de Lusignan
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom
- Daniel Capurro
- Faculty of Engineering and Information Technology, University of Melbourne, Melbourne, Victoria, Australia
- Harshana Liyanage
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom
- Navreet Bhattal
- Australian Institute of Health and Welfare, Canberra, Australian Capital Territory, Australia
- Vicki Bennett
- Australian Institute of Health and Welfare, Canberra, Australian Capital Territory, Australia
- Jaclyn Chan
- Australian Institute of Health and Welfare, Canberra, Australian Capital Territory, Australia
- Michael G Kahn
- Department of Pediatrics (Section of Informatics and Data Sciences), University of Colorado Anschutz Medical Campus, Denver, Colorado, USA
23
Diaz-Garelli F, Lenoir KM, Wells BJ. Catch Me if You Can: Acute Events Hidden in Structured Chronic Disease Diagnosis Descriptions Show Detectable Recording Patterns in EHR. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2021; 2020:373-382. [PMID: 33936410 PMCID: PMC8075503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Our previous research showed that the accuracy of structured cancer diagnosis (DX) description data varied across electronic health record (EHR) segments (e.g. encounter DX, problem list). We provide initial evidence corroborating these findings in EHRs from patients with diabetes. We hypothesized that the odds of recording an "uncontrolled diabetes" DX increased after a hemoglobin A1c result above 9% and that this rate would vary across EHR segments. Our statistical models revealed that each DX indicating uncontrolled diabetes was 2.6% more likely to occur post-A1c>9% overall (adj-p=.0005) and 3.9% after controlling for EHR segment (adj-p<.0001). However, odds ratios varied across segments (1.021
24
Estiri H, Klann JG, Weiler SR, Alema-Mensah E, Joseph Applegate R, Lozinski G, Patibandla N, Wei K, Adams WG, Natter MD, Ofili EO, Ostasiewski B, Quarshie A, Rosenthal GE, Bernstam EV, Mandl KD, Murphy SN. A federated EHR network data completeness tracking system. J Am Med Inform Assoc 2020; 26:637-645. [PMID: 30925587 PMCID: PMC6586954 DOI: 10.1093/jamia/ocz014] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2018] [Revised: 01/04/2019] [Accepted: 01/17/2019] [Indexed: 02/03/2023] Open
Abstract
OBJECTIVE The study sought to design, pilot, and evaluate a federated data completeness tracking system (CTX) for assessing completeness in research data extracted from electronic health record data across the Accessible Research Commons for Health (ARCH) Clinical Data Research Network. MATERIALS AND METHODS The CTX applies a systems-based approach to design workflow and technology for assessing completeness across distributed electronic health record data repositories participating in a queryable, federated network. The CTX invokes 2 positive feedback loops that utilize open source tools (DQe-c and Vue) to integrate technology and human actors in a system geared for increasing capacity and taking action. A pilot implementation of the system involved 6 ARCH partner sites between January 2017 and May 2018. RESULTS The ARCH CTX has enabled the network to monitor and, if needed, adjust its data management processes to maintain complete datasets for secondary use. The system allows the network and its partner sites to profile data completeness both at the network and partner site levels. Interactive visualizations presenting the current state of completeness in the context of the entire network as well as changes in completeness across time were valued among the CTX user base. DISCUSSION Distributed clinical data networks are complex systems. Top-down approaches that solely rely on technology to report data completeness may be necessary but not sufficient for improving completeness (and quality) of data in large-scale clinical data networks. Improving and maintaining complete (high-quality) data in such complex environments entails sociotechnical systems that exploit technology and empower human actors to engage in the process of high-quality data curating. 
CONCLUSIONS The CTX has increased the network's capacity to rapidly identify data completeness issues and empowered ARCH partner sites to get involved in improving the completeness of respective data in their repositories.
Affiliation(s)
- Hossein Estiri
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, USA; Research Information Science and Computing, Partners HealthCare, Charlestown, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA
- Jeffrey G Klann
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, USA; Research Information Science and Computing, Partners HealthCare, Charlestown, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA
- R Joseph Applegate
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Galina Lozinski
- Boston University School of Medicine/Boston Medical Center, Boston, Massachusetts, USA
- Nandan Patibandla
- Information Services Department, Boston Children's Hospital, Boston, Massachusetts, USA
- Kun Wei
- Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- William G Adams
- Department of Pediatrics, Boston University School of Medicine/Boston Medical Center, Boston, Massachusetts, USA
- Marc D Natter
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA; Program in Pediatric Rheumatology, Department of Pediatrics, Mass General Hospital for Children, Boston, Massachusetts, USA
- Gary E Rosenthal
- Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Elmer V Bernstam
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA; Division of General Internal Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA; Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Shawn N Murphy
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, Massachusetts, USA; Research Information Science and Computing, Partners HealthCare, Charlestown, Massachusetts, USA; Harvard Medical School, Boston, Massachusetts, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA; Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
25. March S, Andrich S, Drepper J, Horenkamp-Sonntag D, Icks A, Ihle P, Kieschke J, Kollhorst B, Maier B, Meyer I, Müller G, Ohlmeier C, Peschke D, Richter A, Rosenbusch ML, Scholten N, Schulz M, Stallmann C, Swart E, Wobbe-Ribinski S, Wolter A, Zeidler J, Hoffmann F. Good Practice Data Linkage (GPD): A Translation of the German Version. Int J Environ Res Public Health 2020; 17:7852. [PMID: 33120886 PMCID: PMC7663300 DOI: 10.3390/ijerph17217852]
Abstract
Linkage of different data sources for research purposes has been used increasingly in recent years. However, generally accepted methodological guidance is missing. The aim of this article is to provide methodological guidelines and recommendations for research projects that have been agreed upon across different German research societies. Another aim is to provide readers with a checklist for the critical appraisal of research proposals and articles. This Good Practice Data Linkage (GPD) was already published in German in 2019, but the aspects mentioned can easily be transferred to an international context, especially to other European Union (EU) member states. Therefore, it is now also published in English. Since 2016, an expert panel of members of different German scientific societies has worked together and developed seven guidelines with a total of 27 practical recommendations. These recommendations include (1) the research objectives, research questions, data sources, and resources; (2) the data infrastructure and data flow; (3) data protection; (4) ethics; (5) the key variables and linkage methods; (6) data validation/quality assurance; and (7) the long-term use of data for questions still to be determined. The authors provide a rationale for each recommendation. Future revisions will include new developments in science and updates of data privacy regulations.
Affiliation(s)
- Stefanie March
- Institute for Social Medicine and Health Systems Research (ISMHSR), Medical Faculty, Otto von Guericke University Magdeburg, 39120 Magdeburg, Germany
- Department of Social Work, Health and Media, Magdeburg-Stendal University of Applied Sciences, 39114 Magdeburg, Germany
- Silke Andrich
- Institute for Health Services Research and Health Economics, Centre for Health and Society, Faculty of Medicine, Heinrich-Heine-University Düsseldorf, 40225 Dusseldorf, Germany
- Institute for Health Services Research and Health Economics, German Diabetes Center, Leibniz Center for Diabetes Research at the Heinrich-Heine-University Düsseldorf, 40225 Dusseldorf, Germany
- Johannes Drepper
- TMF—Technology, Methods, and Infrastructure for Networked Medical Research, 10117 Berlin, Germany
- Andrea Icks
- Institute for Health Services Research and Health Economics, Centre for Health and Society, Faculty of Medicine, Heinrich-Heine-University Düsseldorf, 40225 Dusseldorf, Germany
- Institute for Health Services Research and Health Economics, German Diabetes Center, Leibniz Center for Diabetes Research at the Heinrich-Heine-University Düsseldorf, 40225 Dusseldorf, Germany
- Peter Ihle
- PMV Research Group, University of Cologne, 50931 Cologne, Germany
- Joachim Kieschke
- Epidemiological Cancer Registry of Lower Saxony, Register Center, 26121 Oldenburg, Germany
- Bianca Kollhorst
- Leibniz Institute for Prevention Research and Epidemiology—BIPS, Department Biometry and Data Management, 28359 Bremen, Germany
- Birga Maier
- Berlin-Brandenburg Myocardial Infarction Registry e. V., 10317 Berlin, Germany
- Ingo Meyer
- PMV Research Group, University of Cologne, 50931 Cologne, Germany
- Gabriele Müller
- Center for Evidence-Based Healthcare (ZEGV), University Hospital and Faculty of Medicine Carl Gustav Carus, Technical University of Dresden, 01307 Dresden, Germany
- Dirk Peschke
- Institute for Public Health and Nursing Research (IPP), University of Bremen, 28359 Bremen, Germany
- Department of Applied Health Sciences, University of Health Bochum, 44801 Bochum, Germany
- Adrian Richter
- Institute for Community Medicine, Department SHIP-KEF, Greifswald University Medical Center, 17475 Greifswald, Germany
- Marie-Luise Rosenbusch
- Central Research Institute for Ambulatory Healthcare in Germany (Zi), Department of Data Science and Healthcare Analyses, 10587 Berlin, Germany
- Nadine Scholten
- Institute of Medical Sociology, Health Services Research and Rehabilitation Science (IMVR), Faculty of Human Sciences and Faculty of Medicine, University of Cologne, 50933 Cologne, Germany
- Mandy Schulz
- Central Research Institute for Ambulatory Healthcare in Germany (Zi), Department of Data Science and Healthcare Analyses, 10587 Berlin, Germany
- Christoph Stallmann
- Institute for Social Medicine and Health Systems Research (ISMHSR), Medical Faculty, Otto von Guericke University Magdeburg, 39120 Magdeburg, Germany
- Enno Swart
- Institute for Social Medicine and Health Systems Research (ISMHSR), Medical Faculty, Otto von Guericke University Magdeburg, 39120 Magdeburg, Germany
- Stefanie Wobbe-Ribinski
- DAK Gesundheit, Health Services Research and Innovation, 20097 Hamburg, Germany
- Antke Wolter
- DAK Gesundheit, Health Services Research and Innovation, 20097 Hamburg, Germany
- Jan Zeidler
- Center for Health Economics Research Hanover (CHERH), Leibniz University Hanover, 30159 Hanover, Germany
- Falk Hoffmann
- Faculty of Medicine and Health Sciences, Department of Healthcare Research, Carl von Ossietzky University Oldenburg, 26129 Oldenburg, Germany
26. Population-level surveillance of congenital heart defects among adolescents and adults in Colorado: Implications of record linkage. Am Heart J 2020; 226:75-84. [PMID: 32526532 DOI: 10.1016/j.ahj.2020.04.008]
Abstract
BACKGROUND The objective was to describe the design of a population-level electronic health record (EHR) and insurance claims-based surveillance system of adolescents and adults with congenital heart defects (CHDs) in Colorado and to evaluate the bias introduced by duplicate cases across data sources. METHODS The Colorado CHD Surveillance System ascertained individuals aged 11-64 years with a CHD based on International Classification of Diseases, Ninth Revision, Clinical Modification diagnostic coding between 2011 and 2013 from a diverse network of health care systems and an All Payer Claims Database (APCD). A probability-based identity reconciliation algorithm identified duplicate cases. Logistic regression was conducted to investigate bias introduced by duplicate cases on the relationship between CHD severity (severe compared to moderate/mild) and adverse outcomes including all-cause mortality, inpatient hospitalization, and major adverse cardiac events (myocardial infarction, congestive heart failure, or cerebrovascular event). Sensitivity analyses were conducted to investigate bias introduced by the sole use or exclusion of APCD data. RESULTS A total of 12,293 unique cases were identified, of which 3,476 had a within or between data source duplicate. Duplicate cases were more likely to be in the youngest age group and have private health insurance, a severe heart defect, a CHD comorbidity, and higher health care utilization. We found that failure to resolve duplicate cases between data sources would inflate the relationship between CHD severity and both morbidity and mortality outcomes by ~15%. Sensitivity analyses indicate that scenarios in which APCD was excluded from case finding or relied upon as the sole source of case finding would also result in an overestimation of the relationship between a CHD severity and major adverse outcomes. 
DISCUSSION Aggregated EHR- and claims-based surveillance systems of adolescents and adults with CHD that fail to account for duplicate records will introduce considerable bias into research findings. CONCLUSION Population-level surveillance systems for rare chronic conditions, such as congenital heart disease, based on aggregation of EHR and claims data require sophisticated identity reconciliation methods to prevent bias introduced by duplicate cases.
27. Savitz ST, Savitz LA, Fleming NS, Shah ND, Go AS. How much can we trust electronic health record data? Healthcare: The Journal of Delivery Science and Innovation 2020; 8:100444. [PMID: 32919583 DOI: 10.1016/j.hjdsi.2020.100444]
Abstract
Trust in EHR data is becoming increasingly important as a greater share of clinical and health services research use EHR data. We discuss reasons for distrust and acknowledge limitations. Researchers continue to use EHR data because of strengths including greater clinical detail than sources like administrative billing claims. Further, many limitations are addressable with existing methods including data quality checks and common data frameworks. We discuss how to build greater trust in the use of EHR data for research, including additional transparency and research priority areas that will both enhance existing strengths of the EHR and mitigate its limitations.
Affiliation(s)
- Samuel T Savitz
- Kaiser Permanente Northern California Division of Research, USA
- Nilay D Shah
- Division of Health Care Policy & Research, The Mayo Clinic, USA
- Alan S Go
- Kaiser Permanente Northern California Division of Research, USA; Department of Epidemiology, Biostatistics and Medicine, University of California, San Francisco, USA; Departments of Medicine, Health Research and Policy, Stanford University School of Medicine, USA
28. Callahan A, Shah NH, Chen JH. Research and Reporting Considerations for Observational Studies Using Electronic Health Record Data. Ann Intern Med 2020; 172:S79-S84. [PMID: 32479175 PMCID: PMC7413106 DOI: 10.7326/m19-0873]
Abstract
Electronic health records (EHRs) are an increasingly important source of real-world health care data for observational research. Analyses of data collected for purposes other than research require careful consideration of data quality as well as the general research and reporting principles relevant to observational studies. The core principles for observational research in general also apply to observational research using EHR data, and these are well addressed in prior literature and guidelines. This article provides additional recommendations for EHR-based research. Considerations unique to EHR-based studies include assessment of the accuracy of computer-executable cohort definitions that can incorporate unstructured data from clinical notes and management of data challenges, such as irregular sampling, missingness, and variation across time and place. Principled application of existing research and reporting guidelines alongside these additional considerations will improve the quality of EHR-based observational studies.
Affiliation(s)
- Alison Callahan
- Center for Biomedical Informatics Research, School of Medicine, Stanford University (A.C., N.H.S.)
- Nigam H Shah
- Center for Biomedical Informatics Research, School of Medicine, Stanford University (A.C., N.H.S.)
- Jonathan H Chen
- Division of Hospital Medicine, School of Medicine, Stanford University (J.H.C.)
29. Diaz-Garelli F, Strowd R, Lawson VL, Mayorga ME, Wells BJ, Lycan TW, Topaloglu U. Workflow Differences Affect Data Accuracy in Oncologic EHRs: A First Step Toward Detangling the Diagnosis Data Babel. JCO Clin Cancer Inform 2020; 4:529-538. [PMID: 32543899 PMCID: PMC7331128 DOI: 10.1200/cci.19.00114]
Abstract
PURPOSE Diagnosis (DX) information is key to clinical data reuse, yet accessible structured DX data often lack accuracy. Previous research hints at workflow differences in cancer DX entry, but their link to clinical data quality is unclear. We hypothesized that there is a statistically significant relationship between workflow-describing variables and DX data quality. METHODS We extracted DX data from encounter and order tables within our electronic health records (EHRs) for a cohort of patients with confirmed brain neoplasms. We built and optimized logistic regressions to predict the odds of fully accurate (ie, correct neoplasm type and anatomic site), inaccurate, and suboptimal (ie, vague) DX entry across clinical workflows. We selected our variables based on correlation strength of each outcome variable. RESULTS Both workflow and personnel variables were predictive of DX data quality. For example, a DX entered in departments other than oncology had up to 2.89 times higher odds of being accurate (P < .0001) compared with an oncology department; an outpatient care location had up to 98% fewer odds of being inaccurate (P < .0001), but had 458 times higher odds of being suboptimal (P < .0001) compared with main campus, including the cancer center; and a DX recorded by a physician assistant had 85% fewer odds of being suboptimal (P = .005) compared with those entered by physicians. CONCLUSION These results suggest that differences across clinical workflows and the clinical personnel producing EHR data affect clinical data quality. They also suggest that the need for specific structured DX data recording varies across clinical workflows and may be dependent on clinical information needs. Clinicians and researchers reusing oncologic data should consider such heterogeneity when conducting secondary analyses of EHR data.
Affiliation(s)
- Franck Diaz-Garelli
- University of North Carolina at Charlotte, Charlotte, NC
- Wake Forest School of Medicine, Winston Salem, NC
- Roy Strowd
- Wake Forest School of Medicine, Winston Salem, NC
- Virginia L. Lawson
- University of North Carolina at Charlotte, Charlotte, NC
- Wake Forest School of Medicine, Winston Salem, NC
30. Kato M, Tanaka K, Kida M, Ryozawa S, Matsuda K, Fujishiro M, Saito Y, Ohtsuka K, Oda I, Katada C, Kobayashi K, Hoteya S, Horimatsu T, Kodashima S, Matsuda T, Muto M, Yamamoto H, Iwakiri R, Kutsumi H, Miyata H, Kato M, Haruma K, Fujimoto K, Uemura N, Kaminishi M, Tajiri H. Multicenter database registry for endoscopic retrograde cholangiopancreatography: Japan Endoscopic Database Project. Dig Endosc 2020; 32:494-502. [PMID: 31361923 DOI: 10.1111/den.13495]
Abstract
BACKGROUND AND AIM Few studies have reported on a national, population-based endoscopic retrograde cholangiopancreatography (ERCP) database. Hence, in 2015, we established a multicenter ERCP database registry, the Japan Endoscopic Database (JED) Project, in preparation for a nationwide endoscopic database. The objective of the present study was to evaluate this registry before the establishment of a nationwide endoscopic database. METHODS From 1 January 2015 to 31 March 2017, we collected and analyzed the ERCP data of all patients who underwent ERCP in four participating centers in the JED Project based on the JED protocol. RESULTS The four centers carried out 4104 ERCPs on 2173 patients. ERCP information (age, 100%; gender, 100%; American Society of Anesthesiologists Physical Status Classification System, 74.5%; scope, 92.7%; time to ERCP, 100%; antithrombotic drug information, 55.0%; primary selective common bile duct [CBD] cannulation methods, 73.0%; number of attempts at primary selective CBD cannulation, 67.6%; overall selective CBD cannulation methods, 68.9%; ERCP procedure time, 66.3%; fluoroscopy time, 65.1%; adverse events, 74.9%; serum amylase levels 1 day post-ERCP, 36.5%) was accurately extracted from the four centers. The success rate of CBD cannulation by level of ERCP difficulty was 98.5%, 99.0%, and 96.4% in grades 1, 2, and 3, respectively. The complication rate by overall selective CBD cannulation method was 5.6%, 7.6%, and 10.5% for the contrast-assisted technique, guidewire-assisted technique, and cross-over method, respectively. CONCLUSION Data from this evaluation of the JED Project, a multicenter ERCP database registry, suggest the feasibility of establishing a nationwide ERCP database and its challenges.
Affiliation(s)
- Masayuki Kato
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Kiyohito Tanaka
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan; Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Mitsuhiro Kida
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Shomei Ryozawa
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Koji Matsuda
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan; Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Mitsuhiro Fujishiro
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan; Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Yutaka Saito
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan; Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Kazuo Ohtsuka
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Ichiro Oda
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Chikatoshi Katada
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Kiyonori Kobayashi
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Shu Hoteya
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Takahiro Horimatsu
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Shinya Kodashima
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Takahisa Matsuda
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Manabu Muto
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Hironori Yamamoto
- Minimal Standard Endoscopic Database (MSED-J) Creation Subcommittee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Ryuichi Iwakiri
- Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Hiromu Kutsumi
- Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Hiroaki Miyata
- Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Mototsugu Kato
- Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Ken Haruma
- Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Kazuma Fujimoto
- Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Naomi Uemura
- Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Michio Kaminishi
- Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
- Hisao Tajiri
- Japan Endoscopy Database (JED) Project Committee, Japan Gastroenterological Endoscopy Society, Tokyo, Japan
31. Patorno E, Schneeweiss S, Wang SV. Transparency in real-world evidence (RWE) studies to build confidence for decision-making: Reporting RWE research in diabetes. Diabetes Obes Metab 2020; 22 Suppl 3:45-59. [PMID: 32250527 PMCID: PMC7472869 DOI: 10.1111/dom.13918]
Abstract
Transparency of real-world evidence (RWE) studies is critical to understanding how findings of a specific study were derived and is a necessary foundation for assessing validity and determining whether decisions should be informed by the findings. In the present paper, we lay out strategies to improve clarity in the reporting of comparative effectiveness studies using real-world data that were generated by the routine operation of a healthcare system. This may include claims data, electronic health records, wearable devices, patient-reported outcomes or patient registries. These recommendations were discussed with multiple stakeholders, including regulators, payers, academics and journal editors, and endorsed by two professional societies that focus on RWE. We remind readers interested in diabetes research of the utility of conceptualizing a target trial that is then emulated by a RWE study when planning and communicating about RWE study implementation. We recommend the use of a graphical representation showcasing temporality of key longitudinal study design choices. We highlight study elements that should be reported to provide the clarity necessary to make a study reproducible. Finally, we suggest registering study protocols to increase process transparency. With these tools the readership of diabetes RWE studies will be able to understand each study more efficiently and to assess a study's validity with reasonably high confidence before making decisions based on its findings.
Affiliation(s)
- Elisabetta Patorno
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Sebastian Schneeweiss
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Shirley V. Wang
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
32. Woods JA, Johnson CE, Allingham SF, Ngo HT, Katzenellenbogen JM, Thompson SC. Collaborative data familiarisation and quality assessment: Reflections from use of a national dataset to investigate palliative care for Indigenous Australians. Health Inf Manag 2020; 50:64-75. [PMID: 32216561 DOI: 10.1177/1833358320908957]
Abstract
BACKGROUND Data quality is fundamental to the integrity of quantitative research. The role of external researchers in data quality assessment (DQA) remains ill-defined in the context of secondary use for research of large, centrally curated health datasets. In order to investigate equity of palliative care provided to Indigenous Australian patients, researchers accessed a now-historical version of a national palliative care dataset developed primarily for the purpose of continuous quality improvement. OBJECTIVES (i) To apply a generic DQA framework to the dataset and (ii) to report the process and results of this assessment and examine the consequences for conducting the research. METHOD The data were systematically examined for completeness, consistency and credibility. Data quality issues relevant to the Indigenous identifier and framing of research questions were of particular interest. RESULTS The dataset comprised 477,518 records of 144,951 patients (Indigenous N = 1515; missing Indigenous identifier N = 4998) collected from participating specialist palliative care services during a period (1 January 2010-30 June 2015) in which data-checking systems underwent substantial upgrades. Progressive improvement in completeness of data over the study period was evident. The data were error-free with respect to many credibility and consistency checks, with anomalies detected reported to data managers. As the proportion of missing values remained substantial for some clinical care variables, multiple imputation procedures were used in subsequent analyses. CONCLUSION AND IMPLICATIONS In secondary use of large curated datasets, DQA by external researchers may both influence proposed analytical methods and contribute to improvement of data curation processes through feedback to data managers.
Affiliation(s)
- John A Woods: The University of Western Australia, Australia
- Claire E Johnson: The University of Western Australia, Australia; Monash University, Australia; Eastern Health, Victoria, Australia
- Hanh T Ngo: The University of Western Australia, Australia
33
Corey KM, Helmkamp J, Simons M, Curtis L, Marsolo K, Balu S, Gao M, Nichols M, Watson J, Mureebe L, Kirk AD, Sendak M. Assessing Quality of Surgical Real-World Data from an Automated Electronic Health Record Pipeline. J Am Coll Surg 2020; 230:295-305.e12. [DOI: 10.1016/j.jamcollsurg.2019.12.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2019] [Revised: 12/19/2019] [Accepted: 12/19/2019] [Indexed: 11/17/2022]
34
Otero Varela L, Le Pogam MA, Metcalfe A, Kristensen PK, Hider P, Patel A, Kim H, Carlini E, Perego R, Gini R. Empowering knowledge generation through international data network: the IMeCCHI-DATANETWORK. Int J Popul Data Sci 2020; 5:1125. [PMID: 32935050 PMCID: PMC7473294 DOI: 10.23889/ijpds.v5i1.1125] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
Introduction The International Methodology Consortium for Coded Health Information (IMeCCHI) is a collaboration of health services researchers who promote methodological advances in coded health information. The IMeCCHI-DATANETWORK initiative focuses on developing a multi-purpose distributed data infrastructure and common data model (CDM) to enable cross-border data sharing and international comparisons. Methods IMeCCHI consortium partners from six countries – Canada, Denmark, Italy, New Zealand, South Korea, and Switzerland – used a questionnaire to describe their original databases, which differ in size, structure, content and coding systems. To standardize these data, they agreed on a CDM and mapped their population-based databases to meet the CDM specifications. At the end of this process, local data had a more homogeneous content and structure, which made them syntactically and semantically interoperable. Data transformation was performed using a common data management software called TheMatrix. Results The CDM encompasses four tables of structured data (person characteristics, hospitalizations, outpatient prescription medications and death), linked at the individual level through a person identifier. It can be used to answer research questions across countries using locally converted databases, which facilitates study replication in a distributed fashion. As a proof-of-concept study, an initial research question was addressed using an agreed protocol. Local data were transformed into CSV files in the CDM structure, and TheMatrix was tested to transform the standardized data from each partner into local analytical datasets. This allowed results to be shared between countries whilst maintaining local control over each region's data. Conclusion The IMeCCHI-DATANETWORK, a model of a distributed data network, demonstrated that it is feasible to analyze international data using standardized analytical methods that enable independent analyses by region without relocating datasets, thereby protecting local confidentiality obligations. The distributed data infrastructure can produce results that generalize to several countries while facilitating cross-border data sharing and international comparisons. Keywords Common data model, international comparison, cross-border data sharing, interoperability, observational data
Affiliation(s)
- L Otero Varela: Department of Community Health Sciences, Cumming School of Medicine, Calgary, Canada
- M-A Le Pogam: Department of Epidemiology and Health Systems, Center for Primary Care and Public Health, University of Lausanne, Lausanne, Switzerland
- A Metcalfe: Department of Community Health Sciences, Cumming School of Medicine, Calgary, Canada
- P K Kristensen: Department of Clinical Epidemiology, Aarhus University, Denmark
- P Hider: Department of Population Health, University of Otago, Christchurch, New Zealand
- A Patel: Department of Community Health Sciences, Cumming School of Medicine, Calgary, Canada
- H Kim: Graduate School of Public Health Dept. of Public Health Sciences; Institute of Aging; and Institute of Health and Environment, Seoul National University, Seoul, Republic of Korea
- E Carlini: Istituto di Scienza e Tecnologie dell'Informazione, Pisa, Italy
- R Perego: Istituto di Scienza e Tecnologie dell'Informazione, Pisa, Italy
- R Gini: Agenzia Regionale di Sanità della Toscana, Firenze, Italy
35
Brossier D, Sauthier M, Mathieu A, Goyer I, Emeriaud G, Jouvet P. Qualitative subjective assessment of a high-resolution database in a paediatric intensive care unit-Elaborating the perpetual patient's ID card. J Eval Clin Pract 2020; 26:86-91. [PMID: 31206940 DOI: 10.1111/jep.13193] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/01/2019] [Revised: 05/09/2019] [Accepted: 05/10/2019] [Indexed: 12/01/2022]
Abstract
OBJECTIVE The main purpose of our study was to subjectively assess the quality of a paediatric intensive care unit (PICU) database according to the Directory of Clinical Databases (DoCDat) criteria. DESIGN AND SETTING A survey was conducted between April 1 and June 15, 2018, among the Sainte Justine PICU research group. POPULATION Every member of this group whose research activity required the use of the database and/or who was involved in the development or validation of the database. INTERVENTIONS None. MEASUREMENTS AND MAIN RESULTS All 10 research team members (one information technology specialist, one junior medical student, and eight clinician researchers) who used the high-resolution database completed the survey (100% response rate). The median quality level of the Sainte Justine PICU database across the 10 criteria was 3 (2-4), rated on a 1 (worst) to 4 (best) numeric scale. When compared with databases previously assessed against the DoCDat criteria, the Sainte Justine PICU database performed similarly. CONCLUSIONS The PICU high-resolution database appeared to be of good quality when subjectively assessed against the DoCDat criteria. Further validation procedures remain mandatory. We suggest that data quality assessment and validation procedures be reported when creating a new database.
Affiliation(s)
- David Brossier: Pediatric Intensive Care Unit, CHU Sainte Justine, University of Montreal, Montreal, Québec, Canada; CHU Sainte Justine Research Institute, Montreal, Québec, Canada; Pediatric Intensive Care Unit, CHU de Caen, Caen, F-14000, France; School of Medicine, Université Caen Normandie, Caen, F-14000, France; Laboratoire de Psychologie Caen Normandie, Université Caen Normandie, Caen, F-14000, France
- Michael Sauthier: Pediatric Intensive Care Unit, CHU Sainte Justine, University of Montreal, Montreal, Québec, Canada; CHU Sainte Justine Research Institute, Montreal, Québec, Canada
- Audrey Mathieu: Pediatric Intensive Care Unit, CHU Sainte Justine, University of Montreal, Montreal, Québec, Canada; CHU Sainte Justine Research Institute, Montreal, Québec, Canada
- Guillaume Emeriaud: Pediatric Intensive Care Unit, CHU Sainte Justine, University of Montreal, Montreal, Québec, Canada; CHU Sainte Justine Research Institute, Montreal, Québec, Canada
- Philippe Jouvet: Pediatric Intensive Care Unit, CHU Sainte Justine, University of Montreal, Montreal, Québec, Canada; CHU Sainte Justine Research Institute, Montreal, Québec, Canada
36
Kwan BM, Dickinson LM, Glasgow RE, Sajatovic M, Gritz M, Holtrop JS, Nease DE, Ritchie N, Nederveld A, Gurfinkel D, Waxmonsky JA. The Invested in Diabetes Study Protocol: a cluster randomized pragmatic trial comparing standardized and patient-driven diabetes shared medical appointments. Trials 2020; 21:65. [PMID: 31924249 PMCID: PMC6954498 DOI: 10.1186/s13063-019-3938-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2019] [Accepted: 11/26/2019] [Indexed: 02/07/2023] Open
Abstract
Background Shared medical appointments (SMAs) have been shown to be an efficient and effective strategy for providing diabetes self-management education and self-management support. SMA features vary, and it is not known which features are most effective for different patients and practice settings. The Invested in Diabetes study tests the comparative effectiveness of SMAs with and without multidisciplinary care teams and patient topic choice for improving patient-centered and clinical outcomes related to diabetes. Methods This study compares the effectiveness of two SMA approaches using the Targeted Training for Illness Management (TTIM) curriculum. Standardized SMAs are led by a health educator following a set order of TTIM topics. Patient-driven SMAs are delivered collaboratively by a multidisciplinary care team (health educator, medical provider, behavioral health provider, and a peer mentor); patients select the order of and emphasis on TTIM topics. Invested in Diabetes is a cluster-randomized pragmatic trial involving approximately 1440 adult patients with type 2 diabetes. Twenty primary care practices will be randomly assigned to either standardized or patient-driven SMAs. A mixed-methods evaluation will include quantitative (practice- and patient-level data) and qualitative (practice and patient interviews, observation) components. The primary patient-centered outcome is diabetes distress. Secondary outcomes include autonomy support, self-management behaviors, clinical outcomes, patient reach, and practice-level value and sustainability. Discussion Practice and patient stakeholder input guided protocol development for this pragmatic trial comparing SMA approaches. Implementation strategies from the enhanced Replicating Effective Programs framework will help ensure practices maintain fidelity to intervention protocols while tailoring workflows to their settings. Invested in Diabetes will contribute to the literature on chronic illness management and implementation science using the RE-AIM model. Trial registration ClinicalTrials.gov, NCT03590041. Registered on 5 July 2018.
Affiliation(s)
- Bethany M Kwan: University of Colorado School of Medicine, 13199 E Montview Blvd Ste 210, Aurora, CO, 80045, USA
- L Miriam Dickinson: University of Colorado School of Medicine, 13199 E Montview Blvd Ste 210, Aurora, CO, 80045, USA
- Russell E Glasgow: University of Colorado School of Medicine, 13199 E Montview Blvd Ste 210, Aurora, CO, 80045, USA; VA Eastern Colorado QUERI and Geriatric Research Centers, 1055 Clermont St, Denver, CO, 80220, USA
- Martha Sajatovic: Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH, 44106, USA
- Mark Gritz: University of Colorado School of Medicine, 13199 E Montview Blvd Ste 210, Aurora, CO, 80045, USA
- Jodi Summers Holtrop: University of Colorado School of Medicine, 13199 E Montview Blvd Ste 210, Aurora, CO, 80045, USA
- Don E Nease: University of Colorado School of Medicine, 13199 E Montview Blvd Ste 210, Aurora, CO, 80045, USA
- Natalie Ritchie: University of Colorado School of Medicine, 13199 E Montview Blvd Ste 210, Aurora, CO, 80045, USA; Denver Health and Hospital Authority, 777 Bannock St, Denver, CO, 80204, USA
- Andrea Nederveld: University of Colorado School of Medicine, 13199 E Montview Blvd Ste 210, Aurora, CO, 80045, USA
- Dennis Gurfinkel: University of Colorado School of Medicine, 13199 E Montview Blvd Ste 210, Aurora, CO, 80045, USA
- Jeanette A Waxmonsky: University of Colorado School of Medicine, 13199 E Montview Blvd Ste 210, Aurora, CO, 80045, USA; VA Eastern Colorado QUERI and Geriatric Research Centers, 1055 Clermont St, Denver, CO, 80220, USA
37
Simon GE, Shortreed SM, Rossom RC, Penfold RB, Sperl-Hillen JAM, O'Connor P. Principles and procedures for data and safety monitoring in pragmatic clinical trials. Trials 2019; 20:690. [PMID: 31815644 PMCID: PMC6902512 DOI: 10.1186/s13063-019-3869-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2019] [Accepted: 10/31/2019] [Indexed: 11/27/2022] Open
Abstract
Background All clinical trial investigators have ethical and regulatory obligations to monitor participant safety and trial integrity. Specific procedures for meeting these obligations, however, may differ substantially between pragmatic trials and traditional explanatory clinical trials. Methods/Results Appropriate monitoring of clinical trials typically includes assessing rate of recruitment or enrollment; monitoring safe and effective delivery of study treatments; assuring that study staff act to minimize risks; monitoring quality and timeliness of study data; and considering interim analyses for early detection of benefit, harm, or futility. Each of these responsibilities applies to pragmatic clinical trials. Just as design of pragmatic trials typically involves specific and necessary departures from methods of explanatory clinical trials, appropriate monitoring of pragmatic trials typically requires specific departures from monitoring procedures used in explanatory clinical trials. We discuss how specific aspects of pragmatic trial design and operations influence selection of monitoring procedures and illustrate those choices using examples from three ongoing pragmatic trials conducted by the Mental Health Research Network. Conclusions Pragmatic trial investigators should not routinely adopt monitoring procedures used in explanatory clinical trials. Instead, investigators should consider core principles of trial monitoring and design monitoring procedures appropriate for each pragmatic trial.
Affiliation(s)
- Gregory E Simon: Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Susan M Shortreed: Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Robert B Penfold: Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
38
Looten V, Kong Win Chang L, Neuraz A, Landau-Loriot MA, Vedie B, Paul JL, Mauge L, Rivet N, Bonifati A, Chatellier G, Burgun A, Rance B. What can millions of laboratory test results tell us about the temporal aspect of data quality? Study of data spanning 17 years in a clinical data warehouse. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019; 181:104825. [PMID: 30612785 DOI: 10.1016/j.cmpb.2018.12.030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/09/2018] [Revised: 12/24/2018] [Accepted: 12/28/2018] [Indexed: 06/09/2023]
Abstract
OBJECTIVE To identify common temporal evolution profiles in biological data and propose a semi-automated method to detect these patterns in a clinical data warehouse (CDW). MATERIALS AND METHODS We leveraged the CDW of the European Hospital Georges Pompidou and tracked the evolution of 192 biological parameters over a period of 17 years (445,000+ patients and 131 million laboratory test results). RESULTS We identified three common profiles of evolution: discretization, breakpoints, and trends. We developed computational and statistical methods to identify these profiles in the CDW. Overall, of the 192 observed biological parameters (87,814,136 values), 135 presented at least one evolution. We identified breakpoints in 30 distinct parameters, discretizations in 32, and trends in 79. DISCUSSION AND CONCLUSION Our method allowed the identification of several temporal events in the data. Considering the distribution of these events over time, we identified probable causes for the observed profiles: instrument or software upgrades and changes in computation formulas. We evaluated the potential impact on data reuse. Finally, we formulated recommendations to enable safe use and sharing of biological data collections and to limit the impact of data evolution in retrospective and federated studies (e.g. the annotation of laboratory parameters presenting breakpoints or trends).
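The breakpoint profile can be illustrated with a deliberately crude change-point score; this is a hedged sketch rather than the authors' actual algorithm, and the series values are invented. It scans a laboratory series for the split that maximizes the jump between segment means, the kind of shift an instrument or software upgrade would produce.

```python
def detect_breakpoint(values, min_seg=3):
    """Return (index, shift): the split point maximizing the absolute
    difference between the means of the two resulting segments."""
    best_i, best_shift = None, 0.0
    for i in range(min_seg, len(values) - min_seg + 1):
        left, right = values[:i], values[i:]
        shift = abs(sum(right) / len(right) - sum(left) / len(left))
        if shift > best_shift:
            best_i, best_shift = i, shift
    return best_i, best_shift

# A series whose scale changes mid-stream, e.g. after an analyzer upgrade
idx, shift = detect_breakpoint([1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0])
```

In practice one would add a significance criterion (and robustness to trends and seasonality), but the scan above conveys the idea of locating where a parameter's distribution abruptly changes.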
Affiliation(s)
- Vincent Looten: INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Hôpital Européen Georges Pompidou, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, 20 rue Leblanc, 75015 Paris, France
- Antoine Neuraz: INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Hôpital Necker - Enfants Malades, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Marie-Anne Landau-Loriot: Hôpital Européen Georges Pompidou, Department of Biochemistry, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Benoit Vedie: Hôpital Européen Georges Pompidou, Department of Biochemistry, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Jean-Louis Paul: Hôpital Européen Georges Pompidou, Department of Biochemistry, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Laëtitia Mauge: Hôpital Européen Georges Pompidou, Department of Hematology, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Nadia Rivet: Hôpital Européen Georges Pompidou, Department of Hematology, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Angela Bonifati: LIRIS UMR CNRS 5205, Université Claude Bernard Lyon 1, Villeurbanne, France
- Gilles Chatellier: Hôpital Européen Georges Pompidou, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, 20 rue Leblanc, 75015 Paris, France
- Anita Burgun: INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Hôpital Européen Georges Pompidou, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, 20 rue Leblanc, 75015 Paris, France; Hôpital Necker - Enfants Malades, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, France
- Bastien Rance: INSERM, Centre de Recherche des Cordeliers, UMRS 1138, Université Paris Descartes, Sorbonne Paris Cité, Paris, France; Hôpital Européen Georges Pompidou, Department of Medical Informatics, Assistance Publique - Hôpitaux de Paris (AP-HP), Université Paris Descartes, 20 rue Leblanc, 75015 Paris, France
39
Daniel C, Serre P, Orlova N, Bréant S, Paris N, Griffon N. Initializing a hospital-wide data quality program. The AP-HP experience. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019; 181:104804. [PMID: 30497872 DOI: 10.1016/j.cmpb.2018.10.016] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/01/2018] [Revised: 08/09/2018] [Accepted: 10/26/2018] [Indexed: 06/09/2023]
Abstract
BACKGROUND AND OBJECTIVES Data quality (DQ) programs are recognized as a critical aspect of new-generation research platforms using electronic health record (EHR) data to build Learning Healthcare Systems. The AP-HP Clinical Data Repository aggregates EHR data from 37 hospitals to enable large-scale research and secondary data analysis. This paper describes the DQ program currently in place at AP-HP and the lessons learned from two DQ campaigns initiated in 2017. MATERIALS AND METHODS As part of the AP-HP DQ program, two domains - patient identification (PI) and healthcare services (HS) - were selected for DQ campaigns consisting of 5 phases: defining the scope, measuring, analyzing, improving and controlling DQ. Semi-automated DQ profiling was conducted on two data sets - the PI data set containing 8.8 M patients and the HS data set containing 13,099 consultation agendas and 2122 care units. Seventeen DQ measures were defined, and DQ issues were classified using a unified DQ reporting framework. For each domain, action plans were defined for improving and monitoring prioritized DQ issues. RESULTS Eleven identified DQ issues (8 for the PI data set and 3 for the HS data set) were categorized as completeness (n = 6), conformance (n = 3) and plausibility (n = 2) issues. DQ issues were caused by errors from data originators, ETL issues or limitations of the EHR data entry tool. The action plans included sixteen actions (9 for the PI domain and 7 for the HS domain). Though only partially implemented, the DQ campaigns have already resulted in significant improvement of DQ measures. CONCLUSION DQ assessments of hospital information systems are largely unpublished. The preliminary results of the two DQ campaigns conducted at AP-HP illustrate the benefit of engagement in a DQ program. The adoption of a unified DQ reporting framework enables the communication of DQ findings in a well-defined manner with a shared vocabulary. Dedicated tooling is needed to automate and extend the scope of the generic DQ program. Specific DQ checks will additionally be defined on a per-study basis to evaluate whether EHR data are fit for specific uses.
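The completeness/conformance/plausibility categories correspond to the unified reporting framework the abstract mentions. The sketch below, with hypothetical field names and rules rather than AP-HP's actual checks, shows how record-level checks can tag each finding with its category.

```python
import re

def check_patient_identity(rec):
    """Return (field, category) pairs for a few illustrative DQ rules."""
    issues = []
    # Completeness: a required field is absent or empty
    if not rec.get("birth_date"):
        issues.append(("birth_date", "completeness"))
    # Conformance: a present value violates its expected format (ISO date here)
    elif not re.fullmatch(r"\d{4}-\d{2}-\d{2}", rec["birth_date"]):
        issues.append(("birth_date", "conformance"))
    # Plausibility: a value is well-formed but clinically implausible
    age = rec.get("age")
    if age is not None and not 0 <= age <= 120:
        issues.append(("age", "plausibility"))
    return issues

found = check_patient_identity({"birth_date": "15/03/1982", "age": 150})
```

Aggregating such per-record findings by category is what makes it possible to report, as above, that a campaign surfaced n completeness, n conformance and n plausibility issues.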
Affiliation(s)
- Christel Daniel: DSI WIND, AP-HP, Paris, France; INSERM, U1142, LIMICS, F-75006, Paris, France; Sorbonne Universités, Paris, France
- Nicolas Griffon: DSI WIND, AP-HP, Paris, France; INSERM, U1142, LIMICS, F-75006, Paris, France; Sorbonne Universités, Paris, France
40
Diaz-Garelli JF, Strowd R, Ahmed T, Wells BJ, Merrill R, Laurini J, Pasche B, Topaloglu U. A tale of three subspecialties: Diagnosis recording patterns are internally consistent but Specialty-Dependent. JAMIA Open 2019; 2:369-377. [PMID: 31984369 PMCID: PMC6951969 DOI: 10.1093/jamiaopen/ooz020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2018] [Revised: 04/22/2019] [Accepted: 05/27/2019] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Structured diagnoses (DX) are crucial for secondary use of electronic health record (EHR) data. However, they are often suboptimally recorded. Our previous work showed initial evidence of variable DX recording patterns in oncology charts even after biopsy records are available. OBJECTIVE We verified this finding's internal and external validity. We hypothesized that this recording pattern would be preserved in a larger cohort of patients for the same disease. We also hypothesized that this effect would vary across subspecialties. METHODS We extracted DX data from EHRs of patients treated for brain, lung, and pancreatic neoplasms, identified through clinician-led chart reviews. We used statistical methods (i.e., binomial and mixed model regressions) to test our hypotheses. RESULTS We found variable recording patterns in brain neoplasm DX (i.e., a larger number of distinct DX (OR = 2.2, P < 0.0001), higher descriptive specificity scores (OR = 1.4, P < 0.0001), and much higher entropy after the BX (OR = 3.8, P = 0.004 and OR = 8.0, P < 0.0001)), confirming our initial findings. We also found strikingly different patterns for lung and pancreas DX: both showed much lower DX sequence entropy after the BX (OR = 0.198, P = 0.015 and OR = 0.099, P = 0.015, respectively, compared with OR = 3.8, P = 0.004 for brain). We also found statistically significant differences between the brain dataset and both the lung (P < 0.0001) and pancreas (P = 0.009) datasets. CONCLUSION Our results suggest that disease-specific DX entry patterns exist and are established differently by clinical subspecialty. These differences should be accounted for during clinical data reuse and data quality assessments, but also during EHR entry system design, to maximize the likelihood of accurate, precise and consistent data entry.
Affiliation(s)
- Roy Strowd: Wake Forest Baptist Medical Center, Winston Salem, North Carolina, USA
- Tamjeed Ahmed: Wake Forest Baptist Medical Center, Winston Salem, North Carolina, USA
- Brian J Wells: Wake Forest Baptist Medical Center, Winston Salem, North Carolina, USA
- Rebecca Merrill: Wake Forest Baptist Medical Center, Winston Salem, North Carolina, USA
- Javier Laurini: Wake Forest Baptist Medical Center, Winston Salem, North Carolina, USA
- Boris Pasche: Wake Forest Baptist Medical Center, Winston Salem, North Carolina, USA
- Umit Topaloglu: Wake Forest Baptist Medical Center, Winston Salem, North Carolina, USA
41
Pratt J, Jeffers D, King EC, Kappelman MD, Collins J, Margolis P, Baron H, Bass JA, Bassett MD, Beasley GL, Benkov KJ, Bornstein JA, Cabrera JM, Crandall W, Dancel LD, Garin-Laflam MP, Grunow JE, Hirsch BZ, Hoffenberg E, Israel E, Jester TW, Kiparissi F, Lakhole A, Lapsia SP, Minar P, Navarro FA, Neef H, Park KT, Pashankar DS, Patel AS, Pineiro VM, Samson CM, Sandberg KC, Steiner SJ, Strople JA, Sudel B, Sullivan JS, Suskind DL, Uppal V, Wali PD. Implementing a Novel Quality Improvement-Based Approach to Data Quality Monitoring and Enhancement in a Multipurpose Clinical Registry. EGEMS (WASHINGTON, DC) 2019; 7:51. [PMID: 31646151 PMCID: PMC6777196 DOI: 10.5334/egems.262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/19/2018] [Accepted: 09/10/2019] [Indexed: 12/02/2022]
Abstract
OBJECTIVE To implement a quality improvement-based system to measure and improve data quality in an observational clinical registry to support a Learning Healthcare System. DATA SOURCE The ImproveCareNow Network registry, which as of September 2019 contained data from 314,250 visits of 43,305 pediatric inflammatory bowel disease (IBD) patients at 109 participating care centers. STUDY DESIGN The impact of data quality improvement support to care centers was evaluated using statistical process control methodology. Data quality measures were defined, performance feedback of those measures using statistical process control charts was implemented, and reports identifying data items that failed data quality checks were developed to enable centers to monitor and improve the quality of their data. PRINCIPAL FINDINGS There was a pattern of improvement across measures of data quality. The proportion of visits with complete critical data increased from 72 percent to 82 percent. The percentage of registered patients improved from 59 percent to 83 percent. Of three additional measures of data consistency and timeliness, one improved from 42 percent to 63 percent. Performance declined on one measure due to changes in network documentation practices and maturation. There was variation among care centers in data quality. CONCLUSIONS A quality improvement-based approach to data quality monitoring and improvement is feasible and effective.
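Performance feedback via statistical process control charts typically rests on three-sigma limits around a centre-line proportion. A minimal sketch of such a p-chart check follows, with invented numbers rather than the registry's.

```python
import math

def p_chart_limits(p_bar, n):
    """Three-sigma control limits for a proportion from samples of size n."""
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    return max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)

def out_of_control(points, p_bar, n):
    """Indices of observed proportions signalling special-cause variation."""
    lcl, ucl = p_chart_limits(p_bar, n)
    return [i for i, p in enumerate(points) if p < lcl or p > ucl]

# Centre line at 72% complete visits, roughly 100 visits per period
signals = out_of_control([0.70, 0.90, 0.50, 0.74], p_bar=0.72, n=100)
```

Points inside the limits (0.70, 0.74 here) are treated as common-cause variation; the excursions (0.90, 0.50) are the ones a centre would investigate.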
Affiliation(s)
- Eileen C. King: Cincinnati Children’s Hospital Medical Center, University of Cincinnati, US
- Peter Margolis: Cincinnati Children’s Hospital Medical Center, University of Cincinnati, US
- Howard Baron: Pediatric Gastroenterology & Nutrition Associates, US
- Genie L. Beasley: UF Health Pediatric Gastroenterology, Hepatology and Nutrition, US
- Phillip Minar: Cincinnati Children’s Hospital Medical Center, University of Cincinnati, US
- Haley Neef: University of Michigan – C.S. Mott Children’s Hospital, US
- Vikas Uppal: Nemours Children’s Health System – Wilmington, US
42
Ta CN, Weng C. Detecting Systemic Data Quality Issues in Electronic Health Records. Stud Health Technol Inform 2019; 264:383-387. [PMID: 31437950 PMCID: PMC6857180 DOI: 10.3233/shti190248] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Secondary analysis of electronic health records for clinical research faces significant challenges due to known data quality issues in health data observationally collected for clinical care and the data biases caused by standard healthcare processes. In this manuscript, we contribute methodology for data quality assessment by plotting domain-level (conditions (diagnoses), drugs, and procedures) aggregate statistics and concept-level temporal frequencies (i.e., annual prevalence rates of clinical concepts). We detect common temporal patterns in concept frequencies by normalizing and clustering annual concept frequencies using K-means clustering. We apply these methods to the Columbia University Irving Medical Center Observational Medical Outcomes Partnership database. The resulting domain-aggregate and cluster plots show a variety of patterns. We review the patterns found in the condition domain and investigate the processes that shape them. We find that these patterns suggest data quality issues influenced by system-wide factors that affect individual concept frequencies.
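The normalize-then-cluster step can be sketched in plain Python with deterministic initialization; the series are illustrative, not the Columbia data, and the paper's exact pipeline is not reproduced here.

```python
def min_max(series):
    """Scale an annual frequency series into [0, 1] so that only its
    shape, not its magnitude, drives clustering."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in series]

def kmeans(points, k, iters=25):
    """Plain k-means with deterministic init (first k points as centers)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: recompute each center as the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Annual counts for four hypothetical concepts: two rising, two falling
series = [[10, 20, 30, 40], [40, 30, 20, 10], [1, 2, 3, 4], [8, 6, 4, 2]]
labels = kmeans([min_max(s) for s in series], k=2)
```

Because the series are normalized first, the two rising concepts land in one cluster and the two falling concepts in the other, regardless of their absolute prevalence.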
Affiliation(s)
- Casey N Ta: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Chunhua Weng: Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
43
Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network. EGEMS 2019; 7:36. [PMID: 31531382 PMCID: PMC6676917 DOI: 10.5334/egems.294] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
Abstract
Background: Clinical data research networks (CDRNs) aggregate electronic health record data from multiple hospitals to enable large-scale research. A critical operation in building a CDRN is conducting continual evaluations to optimize data quality. The key challenges include determining the assessment coverage for big datasets, handling data variability over time, and facilitating communication with data teams. This study presents the evolution of a systematic workflow for data quality assessment in CDRNs. Implementation: Using a specific CDRN as a use case, the workflow was iteratively developed and packaged into a toolkit. The resultant toolkit comprises 685 data quality checks to identify data quality issues, procedures to reconcile findings with a history of known issues, and a GitHub-based reporting mechanism for organized tracking. Results: During the first two years of network development, the toolkit assisted in discovering over 800 data characteristics and resolving over 1400 programming errors. Longitudinal analysis indicated that the variability in time to resolution (15-day mean, 24-day IQR) is driven by the underlying cause of the issue, the perceived importance of the domain, and the complexity of the assessment. Conclusions: In the absence of a formalized data quality framework, CDRNs continue to face challenges in data management and query fulfillment. The proposed data quality toolkit was empirically validated on a particular network and is publicly available for other networks. While the toolkit is user-friendly and effective, usage statistics indicated that the data quality process is very time-intensive, and sufficient resources should be dedicated to investigating problems and optimizing data for research.
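The reconciliation step, comparing each fresh assessment against the history of known issues, can be sketched generically; this is not the toolkit's code, and the issue labels are invented.

```python
def reconcile(current, known):
    """Split freshly detected DQ findings into new, persisting, and
    resolved, relative to the history of known issues."""
    cur, hist = set(current), set(known)
    return {
        "new": sorted(cur - hist),
        "persisting": sorted(cur & hist),
        "resolved": sorted(hist - cur),
    }

report = reconcile(
    current=["demographics: null birth_date", "labs: unmapped unit"],
    known=["labs: unmapped unit", "visits: future admit_date"],
)
```

Only the "new" bucket needs fresh triage; "persisting" items map onto open tracker issues, and "resolved" ones can be closed, which is the bookkeeping an issue-tracking mechanism like the one described automates.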
|
44
|
Diaz-Garelli JF, Bernstam EV, Lee M, Hwang KO, Rahbar MH, Johnson TR. DataGauge: A Practical Process for Systematically Designing and Implementing Quality Assessments of Repurposed Clinical Data. EGEMS (WASHINGTON, DC) 2019; 7:32. [PMID: 31367649 PMCID: PMC6659577 DOI: 10.5334/egems.286] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/03/2018] [Accepted: 02/19/2019] [Indexed: 11/29/2022]
Abstract
The well-known hazards of repurposing data make data quality (DQ) assessment a vital step toward ensuring valid results, regardless of analytical methods. However, there is no systematic process for implementing DQ assessments for secondary uses of clinical data. This paper presents DataGauge, a systematic process for designing and implementing DQ assessments to evaluate repurposed data for a specific secondary use. DataGauge is composed of five steps: (1) define information needs, (2) develop a formal Data Needs Model (DNM), (3) use the DNM and DQ theory to develop goal-specific DQ assessment requirements, (4) extract DNM-specified data, and (5) evaluate according to DQ requirements. DataGauge's main contribution is integrating general DQ theory and DQ assessment methods into a systematic process. This process supports the integration and practical implementation of existing Electronic Health Record-specific DQ assessment guidelines. DataGauge also provides an initial theory-based guidance framework that ties the DNM to DQ testing methods for each DQ dimension to aid the design of DQ assessments. This framework can be augmented with existing DQ guidelines to enable systematic assessment. By defining an assessment process capable of adapting to a broad range of clinical datasets and secondary uses, DataGauge sets the stage for future research directions such as DQ theory integration, DQ requirements portability, DQ assessment tool development, and DQ assessment tool usability.
Affiliation(s)
- Elmer V. Bernstam
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, US
- MinJae Lee
- McGovern Medical School, The University of Texas Health Science Center at Houston, US
- Kevin O. Hwang
- McGovern Medical School, The University of Texas Health Science Center at Houston, US
- Mohammad H. Rahbar
- McGovern Medical School, The University of Texas Health Science Center at Houston, US
- Todd R. Johnson
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, US
|
45
|
A clustering approach for detecting implausible observation values in electronic health records data. BMC Med Inform Decis Mak 2019; 19:142. [PMID: 31337390 PMCID: PMC6652024 DOI: 10.1186/s12911-019-0852-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2019] [Accepted: 06/26/2019] [Indexed: 12/03/2022] Open
Abstract
Background Identifying implausible clinical observations (e.g., laboratory test and vital sign values) in Electronic Health Record (EHR) data using rule-based procedures is challenging. Anomaly/outlier detection methods can be applied as an alternative algorithmic approach to flagging such implausible values in EHRs. Methods The primary objectives of this research were to develop and test an unsupervised, clustering-based anomaly/outlier detection approach for detecting implausible observations in EHR data as an alternative to existing procedures. Our approach is built upon two underlying hypotheses: (i) when there is a large number of observations, implausible records should be sparse, and therefore (ii) if the data are clustered properly, clusters with sparse populations should represent implausible observations. To test these hypotheses, we applied an unsupervised clustering algorithm to EHR observation data on 50 laboratory tests from Partners HealthCare. We tested different specifications of the clustering approach and computed confusion matrix indices against a set of silver-standard plausibility thresholds. We compared the results from the proposed approach with conventional anomaly detection (CAD) approaches, including standard deviation and Mahalanobis distance. Results We found that the clustering approach produced results with exceptional specificity and high sensitivity. Compared with the conventional anomaly detection approaches, the proposed clustering approach resulted in a significantly smaller number of false positives.
Conclusion Our contributions include (i) a clustering approach for identifying implausible EHR observations, (ii) evidence that implausible observations are sparse in EHR laboratory test results, (iii) a parallel implementation of the clustering approach on the i2b2 star schema, and (iv) a set of silver-standard plausibility thresholds for 50 laboratory tests that can be used in other studies for validation. The proposed algorithmic solution can augment human decisions to improve data quality; a complementary workflow is therefore needed to initiate the actions required to act on flagged values and improve the data. Electronic supplementary material The online version of this article (10.1186/s12911-019-0852-6) contains supplementary material, which is available to authorized users.
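The sparse-cluster hypothesis behind this approach can be illustrated with a minimal one-dimensional sketch: split sorted values into clusters at unusually large gaps, then flag members of sparsely populated clusters as implausible. The paper's actual algorithm, parameters, and distance measures differ; `gap_factor` and `min_cluster_frac` here are illustrative assumptions:

```python
def flag_implausible(values, gap_factor=5.0, min_cluster_frac=0.01):
    """Flag likely-implausible 1-D observations via gap-based clustering.

    Simplified illustration of the sparse-cluster hypothesis: implausible
    records should land in small, isolated clusters.
    Returns the set of indices into `values` that were flagged.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    svals = [values[i] for i in order]
    # Typical spacing between consecutive sorted values (median nonzero gap).
    gaps = [svals[i + 1] - svals[i] for i in range(n - 1)]
    nonzero = sorted(g for g in gaps if g > 0)
    typical = nonzero[len(nonzero) // 2] if nonzero else 0.0
    # Start a new cluster wherever the gap greatly exceeds the typical gap.
    clusters, current = [], [order[0]]
    for i in range(n - 1):
        if typical and gaps[i] > gap_factor * typical:
            clusters.append(current)
            current = []
        current.append(order[i + 1])
    clusters.append(current)
    # Members of clusters below the population threshold are flagged.
    flagged = set()
    for cluster in clusters:
        if len(cluster) < max(1, min_cluster_frac * n):
            flagged.update(cluster)
    return flagged
```

For example, a heart-rate series of plausible values plus one entry of 5000 would place the outlier in a singleton cluster and flag it, while a tight unimodal series yields a single large cluster and no flags.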
|
46
|
Diaz-Garelli JF, Strowd R, Wells BJ, Ahmed T, Merrill R, Topaloglu U. Lost in Translation: Diagnosis Records Show More Inaccuracies After Biopsy in Oncology Care EHRs. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2019; 2019:325-334. [PMID: 31258985 PMCID: PMC6568058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
The use of diagnosis (DX) data is crucial to secondary use of electronic health record (EHR) data, yet accessible structured DX data often lack accuracy. DX descriptions associated with structured DX codes vary even after biopsy results are recorded; this may indicate poor data quality. We hypothesized that biopsy reports in cancer care charts do not improve intrinsic DX data quality. We analyzed DX data for a manually well-annotated cohort of patients with brain neoplasms. We built statistical models to predict the number of fully accurate (i.e., correct neoplasm type and anatomical location) and inaccurate (i.e., type or location contradicts cohort data) DX descriptions. We found some evidence of statistically larger numbers of fully accurate DX (RR=3.07, p=0.030) but stronger evidence of much larger numbers of inaccurate DX (RR=12.3, p=0.001 and RR=19.6, p<0.0001) after biopsy result recording. Still, 65.9% of all DX records were neither fully accurate nor fully inaccurate. These results suggest EHRs must be modified to support more reliable DX data recording and secondary use of EHR data.
Affiliation(s)
- Roy Strowd
- Wake Forest Baptist Medical Center, Winston Salem, NC
- Brian J Wells
- Wake Forest Baptist Medical Center, Winston Salem, NC
- Tamjeed Ahmed
- Wake Forest Baptist Medical Center, Winston Salem, NC
|
47
|
Abstract
Introduction: In aggregate, existing data quality (DQ) checks are represented in heterogeneous formats, making it difficult to compare, categorize, and index them. This study contributes a data element-function conceptual model to facilitate the categorization and indexing of DQ checks and explores the feasibility of leveraging natural language processing (NLP) for scalable acquisition of knowledge about common data elements and functions from DQ check narratives. Methods: The model defines a “data element,” the primary focus of the check, and a “function,” the qualitative or quantitative measure over a data element. We applied NLP techniques to extract both from 172 checks for Observational Health Data Sciences and Informatics (OHDSI) and 3,434 checks for Kaiser Permanente’s Center for Effectiveness and Safety Research (CESR). Results: The model was able to classify all checks. A total of 751 unique data elements and 24 unique functions were extracted. The five most frequent data element-function pairings for OHDSI were Person-Count (55 checks), Insurance-Distribution (17), Medication-Count (16), Condition-Count (14), and Observations-Count (13); for CESR, they were Medication-Variable Type (175), Medication-Missing (172), Medication-Existence (152), Medication-Count (127), and Socioeconomic Factors-Variable Type (114). Conclusions: This study shows the efficacy of the data element-function conceptual model for classifying DQ checks, demonstrates the early promise of NLP-assisted knowledge acquisition, and reveals great heterogeneity in the focus of DQ checks, confirming variation between intrinsic checks and use-case-specific “fitness-for-use” checks.
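The data element-function classification can be approximated with a simple vocabulary-lookup sketch. The toy vocabularies and the `classify_check` function below are hypothetical stand-ins for the study's fuller NLP pipeline, which derived its terms from the OHDSI and CESR check sets:

```python
# Toy vocabularies for illustration; the study extracted 751 unique data
# elements and 24 unique functions from real check narratives.
DATA_ELEMENTS = ["person", "medication", "condition", "insurance", "observation"]
FUNCTIONS = ["count", "distribution", "missing", "existence", "variable type"]

def classify_check(narrative):
    """Map a free-text DQ check description to (data element, function)
    pairs by vocabulary lookup -- a keyword-matching stand-in for the
    NLP-based extraction described in the abstract."""
    text = narrative.lower()
    elements = [e for e in DATA_ELEMENTS if e in text]
    functions = [f for f in FUNCTIONS if f in text]
    return [(e, f) for e in elements for f in functions]
```

A narrative such as "Count of persons with at least one condition record" would yield the pairings ("person", "count") and ("condition", "count"); real check narratives need tokenization, normalization, and synonym handling that simple substring matching cannot provide.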
|
48
|
Doria-Rose VP, Greenlee RT, Buist DSM, Miglioretti DL, Corley DA, Brown JS, Clancy HA, Tuzzio L, Moy LM, Hornbrook MC, Brown ML, Ritzwoller DP, Kushi LH, Greene SM. Collaborating on Data, Science, and Infrastructure: The 20-Year Journey of the Cancer Research Network. EGEMS (WASHINGTON, DC) 2019; 7:7. [PMID: 30972356 PMCID: PMC6450242 DOI: 10.5334/egems.273] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/21/2018] [Accepted: 10/16/2018] [Indexed: 12/13/2022]
Abstract
The Cancer Research Network (CRN) is a consortium of 12 research groups, each affiliated with a nonprofit integrated health care delivery system, that was first funded in 1998. The overall goal of the CRN is to support and facilitate collaborative cancer research within its component delivery systems. This paper describes the CRN's 20-year experience and evolution. The network combined its members' scientific capabilities and data resources to create an infrastructure that has ultimately supported over 275 projects. Insights about the strengths and limitations of electronic health data for research, approaches to optimizing multidisciplinary collaboration, and the role of a health services research infrastructure to complement traditional clinical trials and large observational datasets are described, along with recommendations for other research consortia.
Affiliation(s)
- V. Paul Doria-Rose
- Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, US
- Diana S. M. Buist
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, US
- Diana L. Miglioretti
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, US
- University of California Davis School of Medicine, Davis, CA, US
- Douglas A. Corley
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, US
- Jeffrey S. Brown
- Department of Population Medicine, Harvard Medical School, Boston, MA, US
- Harvard Pilgrim Health Care Institute, Boston, MA, US
- Heather A. Clancy
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, US
- Leah Tuzzio
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, US
- Lisa M. Moy
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, US
- Mark C. Hornbrook
- Center for Health Research, Kaiser Permanente Northwest, Portland, OR, US
- Retired
- Martin L. Brown
- Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD, US
- Retired
- Lawrence H. Kushi
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, US
|
49
|
Abstract
Objective: Multi-organizational research requires a multi-organizational data quality assessment (DQA) process that combines and compares data across participating organizations. We demonstrate how such a DQA approach complements traditional checks of internal reliability and validity by allowing assessments of data consistency and the evaluation of data patterns in the absence of an external “gold standard.” Methods: We describe the DQA process employed by the Data Coordinating Center (DCC) for Kaiser Permanente’s (KP) Center for Effectiveness and Safety Research (CESR). We emphasize the CESR DQA reporting system, which compares data summaries from the eight KP organizations in a consistent, standardized manner. Results: We provide examples of multi-organization comparisons from DQA that confirm expectations about different aspects of data quality, including 1) comparison of data extracted directly from electronic health records (EHRs) and 2) comparison of non-EHR data from disparate sources. Discussion: The CESR DCC has developed code and procedures for efficiently implementing and reporting DQA. The CESR DCC approach is to 1) distribute DQA tools that empower data managers at each organization to assess their data quality at any time, 2) summarize and disseminate findings to address data shortfalls or document idiosyncrasies, and 3) engage data managers and end users in an exchange of knowledge about the data’s quality and fitness for use. Conclusion: The KP CESR DQA model is applicable to networks hoping to improve data quality. The multi-organizational reporting system promotes transparency of DQA, adds to network knowledge about data quality, and informs research.
|
50
|
Sketris IS, Carter N, Traynor RL, Watts D, Kelly K. Building a framework for the evaluation of knowledge translation for the Canadian Network for Observational Drug Effect Studies. Pharmacoepidemiol Drug Saf 2019; 29 Suppl 1:8-25. [PMID: 30788900 PMCID: PMC6972643 DOI: 10.1002/pds.4738] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2017] [Revised: 11/28/2018] [Accepted: 12/19/2018] [Indexed: 12/27/2022]
Abstract
Purpose The Canadian Network for Observational Drug Effect Studies (CNODES), a network of pharmacoepidemiologists and other researchers from seven provincial sites, provides evidence on the benefits and risks of drugs used by Canadians. The Knowledge Translation Team, one of CNODES' four main teams, evaluates the impact of its efforts using an iterative and emergent approach. This article shares key lessons from early evaluation phases, including identifying stakeholders and their evaluation needs, choosing evaluation theories and approaches, and developing evaluation questions, designs, and methods appropriate for the CNODES context. Methods Stakeholder analysis was conducted using documentary analysis to determine key contextual factors and research evidence needs of decision maker partners and other stakeholders. Selected theories and frameworks from the evaluation and knowledge translation literature informed decisions about evaluation design and implementation. A developmental approach to evaluation was deemed appropriate due to the innovative, complex, and ever‐changing context. Results A theory of change, logic model, and potential evaluation questions were developed, informed by the stakeholder analysis. Early indicators of program impact (citation metrics, alternative metrics) have been documented; efforts to collect data on additional indicators are ongoing. Conclusion A flexible, iterative, and emergent evaluation approach allows the Knowledge Translation Team to apply lessons learned from completed projects to ongoing research projects, adapt its approaches based on stakeholder needs, document successes, and be accountable to funders/stakeholders. This evaluation approach may be useful for other international pharmacoepidemiology research networks planning and implementing evaluations of similarly complex, multistakeholder initiatives that are subject to constant change.
Affiliation(s)
- Ingrid S Sketris
- Faculty of Health Professions, College of Pharmacy, Dalhousie University, Halifax, Canada
- Nancy Carter
- REAL Evaluation Services, Nova Scotia Health Research Foundation, Halifax, Canada
- Robyn L Traynor
- Department of Community Health & Epidemiology, Dalhousie University, Halifax, Canada
- Dorian Watts
- REAL Evaluation Services, Nova Scotia Health Research Foundation, Halifax, Canada
- Kim Kelly
- Nova Scotia Health Authority, Halifax, Canada
|