1
Chu J, Zhang Y, Huang F, Si L, Huang S, Huang Z. Disentangled representation for sequential treatment effect estimation. Comput Methods Programs Biomed 2022; 226:107175. [PMID: 36242866] [DOI: 10.1016/j.cmpb.2022.107175]
Abstract
BACKGROUND AND OBJECTIVE Treatment effect estimation, a fundamental problem in causal inference, focuses on estimating the outcome difference between different treatments. However, in clinical observational data, some patient covariates (such as gender and age) affect not only the outcomes but also the treatment assignment. Such covariates, known as confounders, produce distribution discrepancies between treatment groups and thereby introduce selection bias into the estimation of treatment effects. The situation is even more complicated in longitudinal data, because the confounders are time-varying: they depend on patient history while also affecting future outcomes and treatment assignments. Existing methods mainly work on cross-sectional data obtained at a specific time point and cannot handle the time-varying confounders hidden in longitudinal data. METHODS In this study, we address this problem for the first time with disentangled representation learning, which treats the observational data as consisting of three components: outcome-specific factors, treatment-specific factors, and time-varying confounders. Building on this, the proposed approach adopts a recurrent neural network-based framework to process sequential information and learn disentangled representations of the components from longitudinal observational sequences, and captures the posterior distributions of the latent factors through a multi-task learning strategy. Moreover, mutual information-based regularization is adopted to eliminate the time-varying confounders. In this way, the association between patient history and treatment assignment is removed and the estimation can be conducted effectively. RESULTS We evaluate our model in a realistic set-up using a model of tumor growth. The proposed model achieves the best performance over benchmark models in most cases for both one-step-ahead prediction (0.70% vs 0.74% for the state-of-the-art model when γ = 3; measured by normalized root mean square error, lower is better) and five-step-ahead prediction (1.47% vs 1.83%). As the effect of the confounders increases, our proposed model consistently outperforms the state-of-the-art model. In addition, we use t-SNE to visualize the disentangled representations and show the effectiveness of the disentanglement explicitly and intuitively. CONCLUSIONS The experimental results indicate our model's capacity to learn disentangled representations from longitudinal observational data and to handle time-varying confounders, and demonstrate its superior performance on dynamic treatment effect estimation.
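For reference, the normalized root mean square error quoted in these results can be computed as below; this is a minimal sketch that normalizes by the range of the observed outcomes and reports a percentage (one common convention; the paper may use a different normalizing constant):

```python
import numpy as np

def nrmse(y_true, y_pred):
    """Root mean square error normalized by the range of the true values,
    reported as a percentage (lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / (np.max(y_true) - np.min(y_true))
```

On this convention, a perfect prediction scores 0% and scores like the 0.70% vs 0.74% above are directly comparable across outcome scales.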
Affiliation(s)
- Jiebin Chu
- Zhejiang University, Hangzhou, Zhejiang Province, China
- Yaoyun Zhang
- Alibaba Group, Hangzhou, Zhejiang Province, China
- Fei Huang
- Alibaba Group, Hangzhou, Zhejiang Province, China
- Luo Si
- Alibaba Group, Hangzhou, Zhejiang Province, China
2
Li R, Wang H, Tu W. Robust estimation of heterogeneous treatment effects using electronic health record data. Stat Med 2021; 40:2713-2752. [PMID: 33738800] [DOI: 10.1002/sim.8926]
Abstract
Estimation of heterogeneous treatment effects is an essential component of precision medicine. Model- and algorithm-based methods have been developed within the causal inference framework to achieve valid estimation and inference. Existing methods such as the A-learner, R-learner, modified covariates method (with and without efficiency augmentation), inverse propensity score weighting, and augmented inverse propensity score weighting have been proposed mostly under the squared error loss function. The performance of these methods in the presence of data irregularity and high dimensionality, such as that encountered in electronic health record (EHR) data analysis, has been less studied. In this research, we describe a general formulation that unifies many of the existing learners through a common score function. The new formulation allows the incorporation of least absolute deviation (LAD) regression and dimension reduction techniques to counter the challenges of EHR data analysis. We show that, under a set of mild regularity conditions, the resulting estimator has an asymptotic normal distribution. Within this framework, we propose two specific estimators for EHR analysis based on weighted LAD with simultaneous penalties for sparsity and smoothness. Our simulation studies show that the proposed methods are more robust to outliers under various circumstances. We use these methods to assess the blood pressure-lowering effects of two commonly used antihypertensive therapies.
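The weighted-LAD building block described in this abstract can be sketched numerically. This is an illustrative toy, not the authors' estimator: it drops the smoothness penalty and the unified score function, and the function name and choice of optimizer are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def weighted_lad(X, y, w, lam=0.0):
    """Minimize sum_i w_i |y_i - x_i' beta| + lam * ||beta||_1.

    A derivative-free optimizer is used because the objective is
    piecewise linear (non-smooth); for large problems a linear-programming
    formulation would be the idiomatic choice.
    """
    n, p = X.shape
    def objective(beta):
        return np.sum(w * np.abs(y - X @ beta)) + lam * np.sum(np.abs(beta))
    result = minimize(objective, np.zeros(p), method="Powell")
    return result.x
```

Because the loss is the absolute deviation rather than the squared error, a handful of extreme outcome values (common in EHR data) pulls the fit far less than it would under least squares.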
Affiliation(s)
- Ruohong Li
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine and Fairbanks School of Public Health, Indianapolis, Indiana, USA
- Honglang Wang
- Department of Mathematical Sciences, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, USA
- Wanzhu Tu
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine and Fairbanks School of Public Health, Indianapolis, Indiana, USA
3
Lindemer E, Jouni M, Nikolaev N, Reidy P, Mattie H, Rogers JK, Giangreco L, Sherman M, Bartels M, Panch T. A pragmatic methodology for the evaluation of digital care management in the context of multimorbidity. J Med Econ 2021; 24:373-385. [PMID: 33588669] [DOI: 10.1080/13696998.2021.1890416]
Abstract
Multimorbidity is a defining challenge for health systems and requires coordination of care delivery and care management. Care management is a clinical service designed to remotely engage patients between visits and after discharge in order to support self-management of chronic and emergent conditions, encourage increased use of scheduled care, and address the use of unscheduled care. Care management can be provided using digital technology: digital care management. A robust methodology to assess digital care management, or any traditional or digital primary care intervention aimed at longitudinal management of multimorbidity, does not exist outside of randomized controlled trials (RCTs). RCTs are not always generalizable and are also not feasible for most healthcare organizations. We describe here a novel and pragmatic methodology for the evaluation of digital care management that is generalizable to any longitudinal intervention for multimorbidity irrespective of its mode of delivery. This methodology implements propensity matching with bootstrapping to address some of the major challenges in evaluation, including identification of robust outcome measures, selection of an appropriate control population, small sample sizes with class imbalances, and the limitations of RCTs. We apply this methodology to the evaluation of digital care management at a U.S. payor and demonstrate a 9% reduction in ER utilization, a 17% reduction in inpatient admissions, and a 29% increase in the utilization of preventive medicine services. From these utilization outcomes, we derive an estimated cost saving of $641 per member per month at 3 months, specific to a single payor's payment structure for the study period. We compare these results to those derived from existing observational approaches, 1:1 and 1:n propensity matching, and discuss the circumstances in which our methodology has advantages over existing techniques. Whilst our methodology focuses on cost and utilization and is applied in the U.S. context, it is applicable to other outcomes such as Patient Reported Outcome Measures (PROMs) or clinical biometrics and can be used in other health system contexts where the challenge of multimorbidity is prevalent.
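The core of the methodology, propensity matching repeated over bootstrap resamples, can be sketched as follows. This is a simplified illustration (logistic propensity model, 1:1 nearest-neighbour matching with replacement, percentile intervals); the paper's actual pipeline includes additional steps:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ps_match_effect(X, t, y):
    """Estimate the average treatment effect on the treated via 1:1
    nearest-neighbour matching on a logistic-regression propensity score."""
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    treated = np.where(t == 1)[0]
    control = np.where(t == 0)[0]
    # Match each treated unit to its closest control on the propensity score.
    matches = control[np.argmin(np.abs(ps[treated][:, None] - ps[control][None, :]), axis=1)]
    return np.mean(y[treated] - y[matches])

def bootstrap_effect(X, t, y, B=200, seed=0):
    """Re-run the matching inside B bootstrap resamples to obtain a point
    estimate and a percentile confidence interval."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = []
    for _ in range(B):
        idx = rng.integers(0, n, n)
        estimates.append(ps_match_effect(X[idx], t[idx], y[idx]))
    return np.mean(estimates), np.percentile(estimates, [2.5, 97.5])
```

Bootstrapping the entire match-then-estimate pipeline, rather than matching once, is what lets the approach quantify uncertainty with small, imbalanced samples.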
Affiliation(s)
- Heather Mattie
- Wellframe Inc, Boston, MA, USA
- Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
4
Garrido MM, Lum J, Pizer SD. Vector-based kernel weighting: A simple estimator for improving precision and bias of average treatment effects in multiple treatment settings. Stat Med 2020; 40:1204-1223. [PMID: 33327037] [DOI: 10.1002/sim.8836]
Abstract
Treatment effect estimation must account for observed confounding, in which factors affect treatment assignment and outcomes simultaneously. Ignoring observed confounding risks concluding that a helpful treatment is not beneficial or that a treatment is safe when it is actually harmful. Propensity score matching or weighting adjusts for observed confounding, but the best way to use propensity scores for multiple treatments is unknown. It is unclear when the choice of a different weighting or matching strategy leads to divergent inferences. We used Monte Carlo simulations (1000 replications) to examine the sensitivity of multivalued treatment inferences to propensity score weighting or matching strategies. We consider five variants of propensity score adjustment: inverse probability of treatment weights, generalized propensity score matching, kernel weights (KW), vector matching, and a new, easily implemented hybrid: vector-based kernel weighting (VBKW). VBKW matches observations with similar propensity score vectors, assigning greater kernel weight to observations with similar probabilities within a given bandwidth. We varied the degree of propensity score model misspecification, sample size, treatment effect heterogeneity, initial covariate imbalance, and sample distribution across treatment groups. We evaluated the sensitivity of results to the propensity score estimation technique (multinomial logit or multinomial probit). Across simulations, VBKW performed as well as or better than the other methods in terms of bias, efficiency, and covariate balance measured via prognostic scores. Our simulations suggest that VBKW is amenable to full automation and is less sensitive to propensity score model misspecification than other methods used to account for observed confounding in multivalued treatment analyses.
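The kernel-weighting ingredient of VBKW can be illustrated with a toy function that assigns kernel weights based on the distance between generalized propensity score vectors within a bandwidth. The Epanechnikov kernel and Euclidean distance used here are illustrative assumptions, not necessarily the authors' exact construction:

```python
import numpy as np

def kernel_weights(gps, ref, bandwidth=0.1):
    """Weight each row of `gps` (one propensity-score vector per unit) by its
    closeness to the reference vector `ref`: units within the bandwidth get
    an Epanechnikov kernel weight, units outside it get zero."""
    d = np.linalg.norm(gps - ref, axis=1) / bandwidth
    return np.where(d < 1, 0.75 * (1 - d**2), 0.0)
```

Because the weight depends on the whole propensity score vector rather than a single component, units that are comparable across all treatment arms dominate the weighted estimate.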
Affiliation(s)
- Melissa M Garrido
- Partnered Evidence-based Policy Resource Center, Boston VA Healthcare System, Boston, Massachusetts, USA
- Department of Health Law, Policy and Management, Boston University School of Public Health, Boston, Massachusetts, USA
- Jessica Lum
- Partnered Evidence-based Policy Resource Center, Boston VA Healthcare System, Boston, Massachusetts, USA
- Steven D Pizer
- Partnered Evidence-based Policy Resource Center, Boston VA Healthcare System, Boston, Massachusetts, USA
- Department of Health Law, Policy and Management, Boston University School of Public Health, Boston, Massachusetts, USA
5
Puli A, Ranganath R. General Control Functions for Causal Effect Estimation from Instrumental Variables. Adv Neural Inf Process Syst 2020; 33:8440-8451. [PMID: 33953525] [PMCID: PMC8096518]
Abstract
Causal effect estimation relies on separating the variation in the outcome into parts due to the treatment and parts due to the confounders. To achieve this separation, practitioners often use external sources of randomness that influence only the treatment, called instrumental variables (IVs). We study variables constructed from the treatment and IV that help estimate effects, called control functions. We characterize general control functions for effect estimation in a meta-identification result. Then, we show that structural assumptions on the treatment process allow the construction of general control functions, thereby guaranteeing identification. To construct general control functions and estimate effects, we develop the general control function method (GCFN). GCFN's first stage, called variational decoupling (VDE), constructs general control functions by recovering the residual variation in the treatment given the IV. Using VDE's control function, GCFN's second stage estimates effects via regression. Further, we develop semi-supervised GCFN to construct general control functions using subsets of data that have both IV and confounders observed as supervision; this requires no structural assumptions on the treatment process. We evaluate GCFN on low- and high-dimensional simulated data and on recovering the causal effect of slave export on modern community trust [30].
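GCFN's variational first stage is beyond a short sketch, but the classical linear control-function recipe it generalizes is compact: regress the treatment on the instrument, then include the stage-one residual as a control in the outcome regression. A minimal sketch for the linear case, for intuition only:

```python
import numpy as np

def control_function_effect(z, t, y):
    """Two-stage control-function estimate of the treatment coefficient.

    Stage 1: the residual variation in the treatment not explained by the
    instrument (this residual carries the confounding).
    Stage 2: regress the outcome on the treatment plus that residual; the
    residual acts as the control function, absorbing the confounding.
    """
    Z = np.column_stack([np.ones_like(z), z])
    resid = t - Z @ np.linalg.lstsq(Z, t, rcond=None)[0]
    W = np.column_stack([np.ones_like(t), t, resid])
    beta = np.linalg.lstsq(W, y, rcond=None)[0]
    return beta[1]  # coefficient on the treatment
```

In the linear-Gaussian setting this recovers the causal coefficient even when a naive regression of outcome on treatment is badly confounded.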
6
Karmakar B, Small DS. Assessment of the extent of corroboration of an elaborate theory of a causal hypothesis using partial conjunctions of evidence factors. Ann Stat 2020. [DOI: 10.1214/19-aos1929]
7
Veridical Causal Inference using Propensity Score Methods for Comparative Effectiveness Research with Medical Claims. Health Serv Outcomes Res Methodol 2020; 21:206-228. [PMID: 34040495] [DOI: 10.1007/s10742-020-00222-8]
Abstract
Medical insurance claims are becoming increasingly common data sources for answering a variety of questions in biomedical research. Although comprehensive in terms of longitudinal characterization of disease development and progression for a potentially large number of patients, population-based inference using these datasets requires thoughtful modifications to sample selection and analytic strategies relative to other types of studies. Along with complex selection bias and missing data issues, claims-based studies are purely observational, which limits effective understanding and characterization of the differences between the treatment groups being compared. All these issues contribute to a crisis in the reproducibility and replication of comparative findings using medical claims. This paper offers practical guidance on the analytical process, demonstrates methods for estimating causal treatment effects with propensity score methods for several types of outcomes common to such studies (binary, count, time-to-event, and longitudinally varying measures), and also aims to increase the transparency and reproducibility of reporting of results from these investigations. We provide an online version of the paper with readily implementable code for the entire analysis pipeline to serve as a guided tutorial for practitioners. The online version can be accessed at https://rydaro.github.io/. The analytic pipeline is illustrated using a sub-cohort of patients with advanced prostate cancer from the large Clinformatics Data Mart Database (OptumInsight, Eden Prairie, Minnesota), consisting of 73 million distinct private payer insurees from 2001-2016.
8
Brown DW, DeSantis SM, Greene TJ, Maroufy V, Yaseen A, Wu H, Williams G, Swartz MD. A novel approach for propensity score matching and stratification for multiple treatments: Application to an electronic health record-derived study. Stat Med 2020; 39:2308-2323. [PMID: 32297677] [PMCID: PMC7334100] [DOI: 10.1002/sim.8540]
Abstract
Currently, methods for conducting multiple-treatment propensity scoring in the presence of the high-dimensional covariate spaces that result from "big data" are lacking; the most prominent method relies on inverse probability of treatment weighting (IPTW). However, IPTW utilizes only one element of the generalized propensity score (GPS) vector, which can lead to a loss of information and inadequate covariate balance in the presence of multiple treatments. This limitation motivates the development of a novel propensity score method that uses the entire GPS vector to establish a scalar balancing score that, when adjusted for, achieves covariate balance in the presence of potentially high-dimensional covariates. Specifically, the generalized propensity score cumulative distribution function (GPS-CDF) method is introduced. A one-parameter power function fits the CDF of the GPS vector, and the resulting scalar balancing score is used for matching and/or stratification. Simulation results show superior performance of the new method compared to IPTW both in achieving covariate balance and in estimating average treatment effects in the presence of multiple treatments. The proposed approach is applied to a study derived from electronic medical records to determine the causal relationship between three different vasopressors and mortality in patients with non-traumatic aneurysmal subarachnoid hemorrhage. Results suggest that the GPS-CDF method performs well when applied to large observational studies with multiple treatments that have large covariate spaces.
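One plausible reading of the GPS-CDF construction, sketched under stated assumptions (the paper's exact fitting procedure may differ): fit a one-parameter power function F(x) = x^α to the empirical CDF of each subject's GPS vector and use the fitted α as the scalar balancing score.

```python
import numpy as np
from scipy.optimize import curve_fit

def gps_cdf_score(gps_vector):
    """Scalar balancing score for one subject: the exponent alpha of the
    power function x**alpha fitted to the empirical CDF of the subject's
    generalized propensity score vector."""
    p = np.sort(np.asarray(gps_vector, dtype=float))
    ecdf = np.arange(1, len(p) + 1) / len(p)
    alpha, _ = curve_fit(lambda x, a: x**a, p, ecdf, p0=[1.0])
    return alpha[0]
```

Subjects with similar α have similarly shaped GPS vectors across all treatment arms, which is what makes a single scalar usable for matching or stratification.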
Affiliation(s)
- Derek W Brown
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
- Cancer Prevention Fellowship Program, Division of Cancer Prevention, National Cancer Institute, Rockville, Maryland, USA
- Stacia M DeSantis
- Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Thomas J Greene
- GlaxoSmithKline, Division of Biostatistics, Philadelphia, Pennsylvania, USA
- Vahed Maroufy
- Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Ashraf Yaseen
- Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Hulin Wu
- Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- University of Texas School of Biomedical Informatics, Houston, Texas, USA
- George Williams
- Department of Anesthesiology, McGovern Medical School at UTHealth, Houston, Texas, USA
- Michael D Swartz
- Department of Biostatistics and Data Science, The University of Texas Health Science Center at Houston, Houston, Texas, USA
9
Hyer JM, Ejaz A, Tsilimigras DI, Paredes AZ, Mehta R, Pawlik TM. Novel Machine Learning Approach to Identify Preoperative Risk Factors Associated With Super-Utilization of Medicare Expenditure Following Surgery. JAMA Surg 2020; 154:1014-1021. [PMID: 31411664] [DOI: 10.1001/jamasurg.2019.2979]
Abstract
Importance Typically defined as the top 5% of health care users, super-utilizers are responsible for an estimated 40% to 55% of all health care costs. Little is known about which factors may be associated with increased risk of long-term postoperative super-utilization. Objective To identify clusters of patients with distinct constellations of clinical and comorbid patterns that may be associated with an elevated risk of super-utilization in the year following elective surgery. Design, Setting, and Participants In this retrospective longitudinal cohort study, 1 049 160 patients who underwent abdominal aortic aneurysm repair, coronary artery bypass graft, colectomy, total hip arthroplasty, total knee arthroplasty, or lung resection were identified from the 100% Medicare inpatient and outpatient Standard Analytic Files at all inpatient facilities performing 1 or more of the evaluated surgical procedures from 2013 to 2015. Data from 2012 to 2016 were used to evaluate expenditures in the years preceding and following surgery. Using a machine learning approach known as Logic Forest, comorbidities and interactions of comorbidities that put patients at increased risk of becoming super-utilizers were identified. All comorbidities, as defined by the Charlson (range, 0-24) and Elixhauser (range, 0-29) comorbidity indices, were used in the analysis; higher scores indicate higher comorbidity burden. Data analysis was completed on November 16, 2018. Main Outcomes and Measures Super-utilization of health care in the year following surgery. Results In total, 1 049 160 patients met inclusion criteria and were included in the analytic cohort. Their median (interquartile range) age was 73 (69-78) years, and approximately 40% were male. Super-utilizers comprised 4.8% of the overall cohort (n = 79 746) yet incurred 31.7% of the expenditures. Although the difference in overall expenditures per person between super-utilizers ($4049) and low users ($2148) was relatively modest prior to surgery, the difference between super-utilizers ($79 698) and low users ($2977) was marked in the year following surgery. Risk factors associated with super-utilization of health care included hemiplegia/paraplegia (odds ratio, 5.2; 95% CI, 4.4-6.2), weight loss (odds ratio, 3.5; 95% CI, 2.9-4.2), and congestive heart failure with chronic kidney disease stages I to IV (odds ratio, 3.4; 95% CI, 3.0-3.9). Conclusions and Relevance Super-utilizers comprised only a small fraction of the surgical population yet were responsible for a disproportionate amount of Medicare expenditure. Certain subpopulations were associated with super-utilization of health care following surgical intervention despite having lower overall use in the preoperative period.
Affiliation(s)
- J Madison Hyer
- Division of Surgical Oncology, Department of Surgery, Solove Research Institute, The Ohio State University, Wexner Medical Center, James Cancer Hospital, Columbus
- Aslam Ejaz
- Division of Surgical Oncology, Department of Surgery, Solove Research Institute, The Ohio State University, Wexner Medical Center, James Cancer Hospital, Columbus
- Diamantis I Tsilimigras
- Division of Surgical Oncology, Department of Surgery, Solove Research Institute, The Ohio State University, Wexner Medical Center, James Cancer Hospital, Columbus
- Anghela Z Paredes
- Division of Surgical Oncology, Department of Surgery, Solove Research Institute, The Ohio State University, Wexner Medical Center, James Cancer Hospital, Columbus
- Rittal Mehta
- Division of Surgical Oncology, Department of Surgery, Solove Research Institute, The Ohio State University, Wexner Medical Center, James Cancer Hospital, Columbus
- Timothy M Pawlik
- Division of Surgical Oncology, Department of Surgery, Solove Research Institute, The Ohio State University, Wexner Medical Center, James Cancer Hospital, Columbus
10
Callahan A, Shah NH, Chen JH. Research and Reporting Considerations for Observational Studies Using Electronic Health Record Data. Ann Intern Med 2020; 172:S79-S84. [PMID: 32479175] [PMCID: PMC7413106] [DOI: 10.7326/m19-0873]
Abstract
Electronic health records (EHRs) are an increasingly important source of real-world health care data for observational research. Analyses of data collected for purposes other than research require careful consideration of data quality as well as the general research and reporting principles relevant to observational studies. The core principles for observational research in general also apply to observational research using EHR data, and these are well addressed in prior literature and guidelines. This article provides additional recommendations for EHR-based research. Considerations unique to EHR-based studies include assessment of the accuracy of computer-executable cohort definitions that can incorporate unstructured data from clinical notes and management of data challenges, such as irregular sampling, missingness, and variation across time and place. Principled application of existing research and reporting guidelines alongside these additional considerations will improve the quality of EHR-based observational studies.
Affiliation(s)
- Alison Callahan
- Center for Biomedical Informatics Research, School of Medicine, Stanford University (A.C., N.H.S.)
- Nigam H Shah
- Center for Biomedical Informatics Research, School of Medicine, Stanford University (A.C., N.H.S.)
- Jonathan H Chen
- Division of Hospital Medicine, School of Medicine, Stanford University (J.H.C.)
11
Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020; 39:773-800. [PMID: 31859414] [PMCID: PMC7983809] [DOI: 10.1002/sim.8445]
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
Affiliation(s)
- Anita Pandit
- University of Michigan, Department of Biostatistics
- Arvind Rao
- University of Michigan, Department of Computational Medicine and Bioinformatics
- Chad Brummett
- University of Michigan, Department of Anesthesiology
- Cristen J. Willer
- University of Michigan, Department of Computational Medicine and Bioinformatics
12
Thompson CA, Jin A, Luft HS, Lichtensztajn DY, Allen L, Liang SY, Schumacher BT, Gomez SL. Population-Based Registry Linkages to Improve Validity of Electronic Health Record-Based Cancer Research. Cancer Epidemiol Biomarkers Prev 2020; 29:796-806. [PMID: 32066621] [DOI: 10.1158/1055-9965.epi-19-0882]
Abstract
BACKGROUND There is tremendous potential to leverage the value gained from integrating electronic health records (EHR) and population-based cancer registry data for research. Registries provide diagnosis details, tumor characteristics, and treatment summaries, while EHRs contain rich clinical detail. A carefully conducted cancer registry linkage may also be used to improve the internal and external validity of inferences made from EHR-based studies. METHODS We linked the EHRs of a large, multispecialty, mixed-payer health care system with the statewide cancer registry and assessed the validity of our linked population. For internal validity, we identify patients who might be "missed" in a linkage, threatening the internal validity of an EHR study population. For generalizability, we compared linked cases with all other cancer patients in the 22-county EHR catchment region. RESULTS From an EHR population of 4.5 million, we identified 306,554 patients with cancer, 26% of the patients with cancer in the catchment region; 22.7% of linked patients were diagnosed with cancer after they migrated away from our health care system, highlighting an advantage of system-wide linkage. We observed demographic differences between EHR patients and non-EHR patients in the surrounding region and demonstrated the use of selection probabilities with model-based standardization to improve generalizability. CONCLUSIONS Our experiences set the foundation to encourage and inform researchers interested in working with EHRs for cancer research, and provide context for leveraging linkages to assess and improve validity and generalizability. IMPACT Researchers conducting linkages may benefit from considering one or more of these approaches to establish and evaluate the validity of their EHR-based populations. See all articles in this CEBP Focus section, "Modernizing Population Science."
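In its simplest form, the "selection probabilities with model-based standardization" mentioned in the Results reduces to inverse-probability-of-selection weighting; a minimal sketch follows (the authors' actual standardization is model-based and more elaborate):

```python
import numpy as np

def standardized_mean(y, selection_prob):
    """Reweight a linked EHR sample toward the full catchment population:
    each observation is weighted by the inverse of its probability of
    having been selected (linked), then a weighted mean is taken."""
    w = 1.0 / np.asarray(selection_prob, dtype=float)
    return np.sum(w * y) / np.sum(w)
```

Units that the linkage under-represents receive larger weights, so the weighted estimate reflects the target population rather than the linked sample.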
Affiliation(s)
- Caroline A Thompson
- School of Public Health, San Diego State University, San Diego, California
- Sutter Health Palo Alto Medical Foundation Research Institute, Palo Alto, California
- University of California San Diego School of Medicine, San Diego, California
- Anqi Jin
- Sutter Health Palo Alto Medical Foundation Research Institute, Palo Alto, California
- Harold S Luft
- Sutter Health Palo Alto Medical Foundation Research Institute, Palo Alto, California
- Daphne Y Lichtensztajn
- Greater Bay Area Cancer Registry, Department of Epidemiology & Biostatistics, University of California San Francisco School of Medicine, San Francisco, California
- Laura Allen
- Greater Bay Area Cancer Registry, Department of Epidemiology & Biostatistics, University of California San Francisco School of Medicine, San Francisco, California
- Su-Ying Liang
- Sutter Health Palo Alto Medical Foundation Research Institute, Palo Alto, California
- Benjamin T Schumacher
- School of Public Health, San Diego State University, San Diego, California
- University of California San Diego School of Medicine, San Diego, California
- Scarlett Lin Gomez
- Greater Bay Area Cancer Registry, Department of Epidemiology & Biostatistics, University of California San Francisco School of Medicine, San Francisco, California
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California
13
Casucci S, Lin L, Hewner S, Nikolaev A. Estimating the causal effects of chronic disease combinations on 30-day hospital readmissions based on observational Medicaid data. J Am Med Inform Assoc 2019; 25:670-678. [PMID: 29202188 DOI: 10.1093/jamia/ocx141] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2017] [Accepted: 11/07/2017] [Indexed: 11/12/2022] Open
Abstract
Objective Demonstrate how observational causal inference methods can generate insights into the impact of chronic disease combinations on patients' 30-day hospital readmissions. Materials and Methods Causal effect estimation was used to quantify the impact of each risk factor scenario (ie, chronic disease combination) associated with chronic kidney disease and heart failure (HF) for adult Medicaid beneficiaries with initial hospitalizations in 2 New York State counties. The experimental protocol: (1) created matched risk factor and comparator groups, (2) assessed covariate balance in the matched groups, and (3) estimated causal effects and their statistical significance. Causality lattices summarized the impact of chronic disease comorbidities on readmissions. Results Chronic disease combinations were ordered with respect to their causal impact on readmissions. Of disease combinations associated with HF, the combination of HF, coronary artery disease, and tobacco abuse (in that order) had the highest causal effect on readmission rate (+22.3%); of disease combinations associated with chronic kidney disease, the combination of chronic kidney disease, coronary artery disease, and diabetes had the highest effect (+9.5%). Discussion Multi-hypothesis causal analysis reveals the effects of chronic disease comorbidities on health outcomes. Understanding these effects will guide the development of health care programs that address unique care needs of different patient subpopulations. Additionally, these insights bring new attention to individuals at high risk for readmission based on chronic disease comorbidities, allowing for more personalized attention and prioritization of care. Conclusion Multi-hypothesis causal analysis, a new methodological tool, generates meaningful insights from health care claims data, guiding the design of care and intervention programs.
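The three-step protocol summarized above (build matched groups, check covariate balance, estimate effects) can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in, not the authors' implementation: a toy cohort, greedy 1:1 nearest-neighbor matching on a single covariate, and a standardized-mean-difference balance check.

```python
# Hypothetical sketch of a three-step matched-comparison protocol:
# (1) build matched risk-factor and comparator groups,
# (2) check covariate balance, (3) estimate the effect. Toy data only.

def nearest_neighbor_match(treated, controls, key):
    """Greedy 1:1 nearest-neighbor matching on a single covariate."""
    pairs, available = [], list(controls)
    for t in treated:
        best = min(available, key=lambda c: abs(c[key] - t[key]))
        available.remove(best)
        pairs.append((t, best))
    return pairs

def standardized_mean_diff(a, b):
    """Balance diagnostic: mean difference over pooled standard deviation."""
    mean = lambda xs: sum(xs) / len(xs)
    var = lambda xs: sum((x - mean(xs)) ** 2 for x in xs) / (len(xs) - 1)
    pooled_sd = ((var(a) + var(b)) / 2) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd

# Toy cohort: age is the confounder, readmit is the outcome (0/1).
treated  = [{"age": 60, "readmit": 1}, {"age": 70, "readmit": 1}, {"age": 50, "readmit": 0}]
controls = [{"age": 61, "readmit": 1}, {"age": 69, "readmit": 0},
            {"age": 49, "readmit": 0}, {"age": 30, "readmit": 0}]

pairs = nearest_neighbor_match(treated, controls, "age")
smd = standardized_mean_diff([t["age"] for t, _ in pairs],
                             [c["age"] for _, c in pairs])
effect = sum(t["readmit"] - c["readmit"] for t, c in pairs) / len(pairs)
print(f"post-match SMD: {smd:.3f}, estimated effect: {effect:+.3f}")
```

In practice matching would be done on an estimated propensity score over many covariates, and a small post-match SMD (commonly below 0.1) is taken as evidence of balance before the effect estimate is interpreted causally.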
Affiliation(s)
- Sabrina Casucci
- Industrial and Systems Engineering, State University of New York at Buffalo, Buffalo, NY, USA
- Li Lin
- Industrial and Systems Engineering, State University of New York at Buffalo, Buffalo, NY, USA
- Sharon Hewner
- School of Nursing, State University of New York at Buffalo, Buffalo, NY, USA
- Alexander Nikolaev
- Industrial and Systems Engineering, State University of New York at Buffalo, Buffalo, NY, USA
14
Chen P, Dong W, Lu X, Kaymak U, He K, Huang Z. Deep representation learning for individualized treatment effect estimation using electronic health records. J Biomed Inform 2019; 100:103303. [PMID: 31610264 DOI: 10.1016/j.jbi.2019.103303] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2019] [Revised: 09/22/2019] [Accepted: 10/07/2019] [Indexed: 12/25/2022]
Abstract
Utilizing clinical observational data to estimate individualized treatment effects (ITE) is a challenging task, as confounding inevitably exists in clinical data. Most of the existing models for ITE estimation tackle this problem by creating unbiased estimators of the treatment effects. Although valuable, learning a balanced representation is sometimes directly opposed to the objective of learning an effective and discriminative model for ITE estimation. We propose a novel hybrid model bridging multi-task deep learning and K-nearest neighbors (KNN) for ITE estimation. In detail, the proposed model firstly adopts multi-task deep learning to extract both outcome-predictive and treatment-specific latent representations from Electronic Health Records (EHR), by jointly performing the outcome prediction and treatment category classification. Thereafter, we estimate counterfactual outcomes by KNN based on the learned hidden representations. We validate the proposed model on a widely used semi-simulated dataset, i.e. IHDP, and a real-world clinical dataset consisting of 736 heart failure (HF) patients. The performance of our model remains robust and reaches 1.7 and 0.23 in terms of Precision in the estimation of heterogeneous effect (PEHE) and average treatment effect (ATE), respectively, on IHDP dataset, and 0.703 and 0.796 in terms of accuracy and F1 score respectively, on HF dataset. The results demonstrate that the proposed model achieves competitive performance over state-of-the-art models. In addition, the results reveal several findings which are consistent with existing medical domain knowledge, and discover certain suggestive hypotheses that could be validated through further investigations in the clinical domain.
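The KNN stage of the hybrid model described above can be sketched as follows. The toy cohort and the use of raw feature vectors in place of the learned latent representations are illustrative assumptions, not the paper's implementation: each patient's counterfactual outcome is estimated as the average outcome of its K nearest neighbors in the opposite treatment arm.

```python
# Hypothetical sketch of KNN-based counterfactual estimation: find each
# unit's K nearest neighbours in the *opposite* treatment arm and average
# their outcomes. Raw feature vectors stand in for learned representations.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_counterfactual(unit, opposite_arm, k=2):
    nearest = sorted(opposite_arm, key=lambda u: euclidean(u["x"], unit["x"]))[:k]
    return sum(u["y"] for u in nearest) / k

# Toy cohort: x = representation, t = treatment, y = observed outcome.
cohort = [
    {"x": (0.0, 1.0), "t": 1, "y": 0.9},
    {"x": (0.1, 0.9), "t": 0, "y": 0.4},
    {"x": (1.0, 0.0), "t": 1, "y": 0.7},
    {"x": (0.9, 0.1), "t": 0, "y": 0.5},
    {"x": (0.5, 0.5), "t": 0, "y": 0.3},
]
for unit in cohort:
    opposite = [u for u in cohort if u["t"] != unit["t"]]
    y_cf = knn_counterfactual(unit, opposite, k=2)
    ite = (unit["y"] - y_cf) if unit["t"] == 1 else (y_cf - unit["y"])
    print(f"t={unit['t']} y={unit['y']:.1f} y_cf={y_cf:.2f} ITE={ite:+.2f}")
```

The individualized treatment effect (ITE) is the signed difference between the factual outcome and the imputed counterfactual; averaging ITEs over the cohort gives an ATE-style summary.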
Affiliation(s)
- Peipei Chen
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, 310008 Hangzhou, China; School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, the Netherlands
- Wei Dong
- Department of Cardiology, Chinese PLA General Hospital, 100853 Beijing, China
- Xudong Lu
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, 310008 Hangzhou, China; School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, the Netherlands
- Uzay Kaymak
- School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, the Netherlands; College of Biomedical Engineering and Instrumental Science, Zhejiang University, 310008 Hangzhou, China
- Kunlun He
- Department of Cardiology, Chinese PLA General Hospital, 100853 Beijing, China
- Zhengxing Huang
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, 310008 Hangzhou, China
15
Karmakar B, French B, Small DS. Integrating the evidence from evidence factors in observational studies. Biometrika 2019. [DOI: 10.1093/biomet/asz003] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
A sensitivity analysis for an observational study assesses how much bias, due to nonrandom assignment of treatment, would be necessary to change the conclusions of an analysis that assumes treatment assignment was effectively random. The evidence for a treatment effect can be strengthened if two different analyses, which could be affected by different types of biases, are both somewhat insensitive to bias. The finding from the observational study is then said to be replicated. Evidence factors allow for two independent analyses to be constructed from the same dataset. When combining the evidence factors, the Type I error rate must be controlled to obtain valid inference. A powerful method is developed for controlling the familywise error rate for sensitivity analyses with evidence factors. It is shown that the Bahadur efficiency of sensitivity analysis for the combined evidence is greater than for either evidence factor alone. The proposed methods are illustrated through a study of the effect of radiation exposure on the risk of cancer. An R package, evidenceFactors, is available from CRAN to implement the methods of the paper.
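As a generic illustration of pooling two independent evidence factors: the paper develops its own, more powerful combining and sensitivity-analysis machinery (implemented in the evidenceFactors R package it mentions), so Fisher's combination below is only a familiar baseline, with the p-values hypothetical.

```python
# Illustrative only: pool two independent p-values with Fisher's
# combination. The combined statistic -2 * sum(log p_i) is chi-squared
# with 2k degrees of freedom under the null; for k = 2 (df = 4) the
# survival function has the closed form exp(-x/2) * (1 + x/2).

import math

def fisher_combine(p_values):
    """Combine two independent p-values; returns the pooled p-value."""
    stat = -2 * sum(math.log(p) for p in p_values)
    assert len(p_values) == 2, "closed form below assumes df = 4"
    return math.exp(-stat / 2) * (1 + stat / 2)

# Two analyses, each only moderately significant, reinforce each other:
print(round(fisher_combine([0.04, 0.03]), 4))  # → 0.0093
```

The pooled p-value is smaller than either input, reflecting the point in the abstract that combined evidence factors can be less sensitive to bias than either factor alone.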
Affiliation(s)
- B Karmakar
- Department of Statistics, The Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, Pennsylvania 19104-6340, U.S.A
- B French
- Department of Statistics, Radiation Effects Research Foundation, 5-2 Hijiyama Park, Minami-ku, Hiroshima 732-0815, Japan
- D S Small
- Department of Statistics, The Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, Pennsylvania 19104-6340, U.S.A
16
Schuler A, Callahan A, Jung K, Shah NH. Performing an Informatics Consult: Methods and Challenges. J Am Coll Radiol 2018; 15:563-568. [PMID: 29396125 PMCID: PMC5901653 DOI: 10.1016/j.jacr.2017.12.023] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2017] [Accepted: 12/15/2017] [Indexed: 12/24/2022]
Abstract
Our health care system is plagued by missed opportunities, waste, and harm. Data generated in the course of care are often underutilized, scientific insight goes untranslated, and evidence is overlooked. To address these problems, we envisioned a system where aggregate patient data can be used at the bedside to provide practice-based evidence. To create that system, we directly connect practicing physicians to clinical researchers and data scientists through an informatics consult. Our team processes and classifies questions posed by clinicians, identifies the appropriate patient data to use, runs the appropriate analyses, and returns an answer, ideally in a 48-hour time window. Here, we discuss the methods that are used for data extraction, processing, and analysis in our consult. We continue to refine our informatics consult service, moving closer to a learning health care system.
Affiliation(s)
- Alejandro Schuler
- Center for Biomedical Informatics Research, Stanford University, Stanford, California
- Alison Callahan
- Center for Biomedical Informatics Research, Stanford University, Stanford, California
- Kenneth Jung
- Center for Biomedical Informatics Research, Stanford University, Stanford, California
- Nigam H Shah
- Center for Biomedical Informatics Research, Stanford University, Stanford, California
17
Abstract
The third paper in a series on how learning health systems can use routinely collected electronic health data (EHD) to advance knowledge and support continuous learning, this review describes how analytical methods for individual-level EHD, including regression approaches, interrupted time series (ITS) analyses, instrumental variables, and propensity score methods, can also be used to address the question of whether the intervention "works." The two major potential sources of bias in non-experimental studies of health care interventions are that the treatment groups compared do not have the same probability of treatment or exposure and the potential for confounding by unmeasured covariates. Although very different, the approaches presented here are all based on assumptions about data, causal relationships, and biases. For instance, regression approaches assume that the relationship between the treatment, outcome, and other variables is properly specified, that all of the variables are available for analysis (i.e., no unobserved confounders) and measured without error, and that the error term is independent and identically distributed. The instrumental variables approach requires identifying an instrument that is related to the assignment of treatment but otherwise has no direct effect on the outcome. Propensity score approaches, on the other hand, assume that there are no unobserved confounders. The epidemiological designs discussed also make assumptions, for instance that individuals can serve as their own control. To properly address these assumptions, analysts should conduct sensitivity analyses within the assumptions of each method to assess the potential impact of what cannot be observed. Researchers also should analyze the same data with different analytical approaches that make alternative assumptions, and apply the same methods to different data sets.
Finally, different analytical methods, each subject to different biases, should be used in combination and together with different designs, to limit the potential for bias in the final results.
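As a concrete illustration of the instrumental-variables idea the review describes, the Wald estimator scales the instrument's effect on the outcome by the instrument's effect on treatment uptake. The data below are hypothetical; real analyses would also need to defend the instrument's validity.

```python
# Hypothetical Wald (instrumental-variables) estimator sketch.
# z is a binary instrument that shifts treatment uptake but, by
# assumption, affects the outcome y only through the treatment t.

def wald_estimate(data):
    """data: list of (z, t, y) tuples with binary z and t."""
    mean = lambda vals: sum(vals) / len(vals)
    y1 = mean([y for z, t, y in data if z == 1])
    y0 = mean([y for z, t, y in data if z == 0])
    t1 = mean([t for z, t, y in data if z == 1])
    t0 = mean([t for z, t, y in data if z == 0])
    # Instrument's effect on outcome, rescaled by first-stage strength.
    return (y1 - y0) / (t1 - t0)

data = [(1, 1, 10), (1, 1, 12), (1, 0, 5), (1, 1, 11),
        (0, 0, 6), (0, 1, 9), (0, 0, 5), (0, 0, 4)]
print(wald_estimate(data))  # → 7.0
```

The naive comparison of treated versus untreated patients in these data would be confounded; the Wald ratio uses only instrument-induced variation in treatment, which is why a weak first stage (t1 close to t0) makes the estimate unstable.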
18
Abstract
The second paper in a series on how learning health systems can use routinely collected electronic health data (EHD) to advance knowledge and support continuous learning, this review summarizes study design approaches, including choosing appropriate data sources, and methods for design and analysis of natural and quasi-experiments. The primary strength of study design approaches described in this section is that they study the impact of a deliberate intervention in real-world settings, which is critical for external validity. These evaluation designs address estimating the counterfactual - what would have happened if the intervention had not been implemented. At the individual level, epidemiologic designs focus on identifying situations in which bias is minimized. Natural and quasi-experiments focus on situations where the change in assignment breaks the usual links that could lead to confounding, reverse causation, and so forth. And because these observational studies typically use data gathered for patient management or administrative purposes, the possibility of observation bias is minimized. The disadvantages are that one cannot necessarily attribute the effect to the intervention (as opposed to other things that might have changed), and the results do not indicate what about the intervention made a difference. Because they cannot rely on randomization to establish causality, program evaluation methods demand a more careful consideration of the "theory" of the intervention and how it is expected to play out. A logic model describing this theory can help to design appropriate comparisons, account for all influential variables in a model, and help to ensure that evaluation studies focus on the critical intermediate and long-term outcomes as well as possible confounders.
19
Gottlieb A, Yanover C, Cahan A, Goldschmidt Y. Estimating the effects of second-line therapy for type 2 diabetes mellitus: retrospective cohort study. BMJ Open Diabetes Res Care 2017; 5:e000435. [PMID: 29299328 PMCID: PMC5730938 DOI: 10.1136/bmjdrc-2017-000435] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/25/2017] [Revised: 10/03/2017] [Accepted: 10/11/2017] [Indexed: 11/03/2022] Open
Abstract
OBJECTIVE Metformin is the recommended initial drug treatment in type 2 diabetes mellitus, but there is no clearly preferred choice for an additional drug when indicated. We compare the counterfactual drug effectiveness in lowering glycated hemoglobin (HbA1c) levels and effect on body mass index (BMI) of four diabetes second-line drug classes using electronic health records. STUDY DESIGN AND SETTING Retrospective analysis of electronic health records of US-based patients in the Explorys database using causal inference methodology to adjust for patient censoring and confounders. PARTICIPANTS AND EXPOSURES Our cohort consisted of more than 40 000 patients with type 2 diabetes, prescribed metformin along with a drug out of four second-line drug classes-sulfonylureas, thiazolidinediones, dipeptidyl peptidase 4 (DPP-4) inhibitors and glucagon-like peptide-1 agonists-during the years 2000-2015. Roughly, 17 000 of these patients were followed for 12 months after being prescribed a second-line drug. MAIN OUTCOME MEASURES HbA1c and BMI of these patients after 6 and 12 months following treatment. RESULTS We demonstrate that all four drug classes reduce HbA1c levels, but the effect of sulfonylureas after 6 and 12 months of treatment is less pronounced compared with other classes. We also estimate that DPP-4 inhibitors decrease body weight significantly more than sulfonylureas and thiazolidinediones. CONCLUSION Our results are in line with current knowledge on second-line drug effectiveness and effect on BMI. They demonstrate that causal inference from electronic health records is an effective way for conducting multitreatment causal inference studies.
Affiliation(s)
- Assaf Gottlieb
- Machine Learning for Healthcare and Life Sciences, IBM Research, Haifa, Israel
- Chen Yanover
- Machine Learning for Healthcare and Life Sciences, IBM Research, Haifa, Israel
- Amos Cahan
- Machine Learning for Healthcare and Life Sciences, IBM Research, Haifa, Israel
- Yaara Goldschmidt
- Machine Learning for Healthcare and Life Sciences, IBM Research, Haifa, Israel
20
Applications of the propensity score weighting method in psychogeriatric research: correcting selection bias and adjusting for confounders. Int Psychogeriatr 2017; 29:703-706. [PMID: 28095944 DOI: 10.1017/s1041610216002490] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
The propensity score (PS) weighting method is an analytic technique that has been applied in multiple fields for a number of purposes. Here, we discuss two common applications, which are (1) to correct for selection bias and (2) to adjust for confounding variables when estimating the effect of an exposure variable on the outcome of interest.
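Application (2), adjusting for confounders, can be sketched with inverse probability of treatment weighting (IPTW): treated units are weighted by 1/PS and controls by 1/(1 - PS). The sketch below assumes the propensity scores have already been estimated; the toy cohort is hypothetical.

```python
# Minimal IPTW sketch, assuming propensity scores (ps) are given rather
# than fit by a model. Weights re-create a pseudo-population in which
# treatment is independent of the measured confounders.

def iptw_effect(cohort):
    """Weighted outcome difference: w = 1/ps (treated), 1/(1-ps) (control)."""
    num_t = den_t = num_c = den_c = 0.0
    for ps, treated, outcome in cohort:
        if treated:
            w = 1.0 / ps
            num_t += w * outcome
            den_t += w
        else:
            w = 1.0 / (1.0 - ps)
            num_c += w * outcome
            den_c += w
    return num_t / den_t - num_c / den_c

# (propensity score, treated?, outcome)
cohort = [(0.8, 1, 1.0), (0.8, 0, 0.8), (0.2, 1, 0.6), (0.2, 0, 0.3)]
print(round(iptw_effect(cohort), 3))  # → -0.02
```

In practice, extreme scores near 0 or 1 produce unstable weights, which is why stabilized or truncated weights are commonly used alongside this basic form.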
21
Lu M, Li J, Rupp LB, Holmberg SD, Moorman AC, Spradling PR, Teshale EH, Zhou Y, Boscarino JA, Schmidt MA, Lamerato LE, Trinacty C, Trudeau S, Gordon SC. Hepatitis C treatment failure is associated with increased risk of hepatocellular carcinoma. J Viral Hepat 2016; 23:718-29. [PMID: 27028626 PMCID: PMC5724043 DOI: 10.1111/jvh.12538] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/04/2015] [Accepted: 02/18/2016] [Indexed: 12/11/2022]
Abstract
Sustained virological response (SVR) to antiviral therapy for hepatitis C (HCV) reduces risk of hepatocellular carcinoma (HCC), but there is little information regarding how treatment failure (TF) compares to lack of treatment. We evaluated the impact of treatment status on risk of HCC using data from the Chronic Hepatitis Cohort Study (CHeCS), an observational study based in four large US health systems with up to 7 years of follow-up on patients. Multivariable analyses were used to adjust for bias in treatment selection, as well as other covariates, followed by sensitivity analyses. Among 10 091 HCV patients, 3681 (36%) received treatment; of these, 2099 (57%) experienced TF and 1582 (43%) achieved SVR. TF patients demonstrated almost twice the risk of HCC compared with untreated patients [adjusted hazard ratio (aHR) = 1.95, 95% confidence interval (CI) 1.50-2.53]; this risk persisted across all stages of fibrosis. Several sensitivity analyses validated these results. Although African Americans were at increased risk of treatment failure, they were at lower risk for HCC and all-cause mortality compared to White patients. SVR patients had lower risk of HCC than TF patients (aHR = 0.48, CI 0.31-0.73), whereas treatment, regardless of outcome, reduced all-cause mortality (aHR = 0.45, CI 0.34-0.60 for SVR patients; aHR = 0.78, CI 0.65-0.93 for TF patients).
Affiliation(s)
- Mei Lu
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI, USA
- Jia Li
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI, USA
- Loralee B. Rupp
- Center for Health Policy and Health Services Research, Henry Ford Health System, Detroit, MI, USA
- Scott D. Holmberg
- Division of Viral Hepatitis, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Anne C. Moorman
- Division of Viral Hepatitis, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Philip R. Spradling
- Division of Viral Hepatitis, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Eyasu H. Teshale
- Division of Viral Hepatitis, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Yueren Zhou
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI, USA
- Mark A. Schmidt
- Center for Health Research, Kaiser Permanente–Northwest, Portland, OR, USA
- Lois E. Lamerato
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI, USA
- Connie Trinacty
- Center for Health Research, Kaiser Permanente–Hawai’i, Waipahu, HI, USA
- Sheri Trudeau
- Department of Public Health Sciences, Henry Ford Health System, Detroit, MI, USA
- Stuart C. Gordon
- Division of Gastroenterology and Hepatology, Henry Ford Health System, Detroit, MI, USA
22
Das-Munshi J, Ashworth M, Gaughran F, Hull S, Morgan C, Nazroo J, Roberts A, Rose D, Schofield P, Stewart R, Thornicroft G, Prince MJ. Ethnicity and cardiovascular health inequalities in people with severe mental illnesses: protocol for the E-CHASM study. Soc Psychiatry Psychiatr Epidemiol 2016; 51:627-38. [PMID: 26846127 PMCID: PMC4823321 DOI: 10.1007/s00127-016-1185-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/17/2015] [Accepted: 01/18/2016] [Indexed: 11/29/2022]
Abstract
PURPOSE People with severe mental illnesses (SMI) experience a 17- to 20-year reduction in life expectancy. One-third of deaths are due to cardiovascular disease. This study will establish the relationship of SMI with cardiovascular disease in ethnic minority groups (Indian, Pakistani, Bangladeshi, black Caribbean, black African and Irish), in the UK. METHODS E-CHASM is a mixed methods study utilising data from 1.25 million electronic patient records. Secondary analysis of routine patient records will establish if differences in cause-specific mortality, cardiovascular disease prevalence and disparities in accessing healthcare for ethnic minority people living with SMI exist. A nested qualitative study will be used to assess barriers to accessing healthcare, both from the perspectives of service users and providers. RESULTS In primary care, 993,116 individuals, aged 18+, provided data from 186/189 (98 %) practices in four inner-city boroughs (local government areas) in London. Prevalence of SMI according to primary care records, ranged from 1.3-1.7 %, across boroughs. The primary care sample included Bangladeshi [n = 94,643 (10 %)], Indian [n = 6086 (6 %)], Pakistani [n = 35,596 (4 %)], black Caribbean [n = 45,013 (5 %)], black African [n = 75,454 (8 %)] and Irish people [n = 13,745 (1 %)]. In the secondary care database, 12,432 individuals with SMI over 2007-2013 contributed information; prevalent diagnoses were schizophrenia [n = 6805 (55 %)], schizoaffective disorders [n = 1438 (12 %)] and bipolar affective disorder [n = 4112 (33 %)]. Largest ethnic minority groups in this sample were black Caribbean [1432 (12 %)] and black African (1393 (11 %)). CONCLUSIONS There is a dearth of research examining cardiovascular disease in minority ethnic groups with severe mental illnesses. The E-CHASM study will address this knowledge gap.
Affiliation(s)
- J Das-Munshi
- Department of Health Service and Population Research, Centre for Epidemiology and Public Health, Institute of Psychiatry, Psychology and Neuroscience, King's College London, De Crespigny Park, PO 33, London, SE5 8AF, UK
- M Ashworth
- Division of Health and Social Care Research, Department of Primary Care and Public Health Sciences, King's College London, 3rd Floor, Addison House, Guy's Campus, London, SE1 1UL, UK
- F Gaughran
- South London and Maudsley Trust and King's College London, London, UK
- S Hull
- Centre for Primary Care and Public Health, Blizard Institute, Queen Mary University of London, Yvonne Carter Building, 58 Turner Street, London, E1 2AB, UK
- C Morgan
- Department of Health Service and Population Research, Centre for Epidemiology and Public Health, Institute of Psychiatry, Psychology and Neuroscience, King's College London, De Crespigny Park, PO 33, London, SE5 8AF, UK
- J Nazroo
- University of Manchester, Manchester, England
- A Roberts
- Natural Language Processing Group, Department of Computer Science, University of Sheffield, Sheffield, England
- D Rose
- Department of Health Service and Population Research, Centre for Epidemiology and Public Health, Institute of Psychiatry, Psychology and Neuroscience, King's College London, De Crespigny Park, PO 33, London, SE5 8AF, UK
- P Schofield
- Division of Health and Social Care Research, Department of Primary Care and Public Health Sciences, King's College London, 3rd Floor, Addison House, Guy's Campus, London, SE1 1UL, UK
- R Stewart
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, Room M1.06, De Crespigny Park, London, SE5 8AF, UK
- G Thornicroft
- Department of Health Service and Population Research, Centre for Epidemiology and Public Health, Institute of Psychiatry, Psychology and Neuroscience, King's College London, De Crespigny Park, PO 33, London, SE5 8AF, UK
- M J Prince
- Department of Health Service and Population Research, Centre for Epidemiology and Public Health, Institute of Psychiatry, Psychology and Neuroscience, King's College London, De Crespigny Park, PO 33, London, SE5 8AF, UK
23
Deeny SR, Steventon A. Making sense of the shadows: priorities for creating a learning healthcare system based on routinely collected data. BMJ Qual Saf 2015; 24:505-15. [PMID: 26065466 PMCID: PMC4515981 DOI: 10.1136/bmjqs-2015-004278] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2015] [Accepted: 04/13/2015] [Indexed: 11/08/2022]
Abstract
Socrates described a group of people chained up inside a cave, who mistook shadows of objects on a wall for reality. This allegory comes to mind when considering 'routinely collected data'-the massive data sets, generated as part of the routine operation of the modern healthcare service. There is keen interest in routine data and the seemingly comprehensive view of healthcare they offer, and we outline a number of examples in which they were used successfully, including the Birmingham OwnHealth study, in which routine data were used with matched control groups to assess the effect of telephone health coaching on hospital utilisation.Routine data differ from data collected primarily for the purposes of research, and this means that analysts cannot assume that they provide the full or accurate clinical picture, let alone a full description of the health of the population. We show that major methodological challenges in using routine data arise from the difficulty of understanding the gap between patient and their 'data shadow'. Strategies to overcome this challenge include more extensive data linkage, developing analytical methods and collecting more data on a routine basis, including from the patient while away from the clinic. In addition, creating a learning health system will require greater alignment between the analysis and the decisions that will be taken; between analysts and people interested in quality improvement; and between the analysis undertaken and public attitudes regarding appropriate use of data.
24
Thompson CA, Kurian AW, Luft HS. Linking electronic health records to better understand breast cancer patient pathways within and between two health systems. EGEMS 2015; 3:1127. [PMID: 25992389 PMCID: PMC4435001 DOI: 10.13063/2327-9214.1127] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
INTRODUCTION In a fragmented health care system, research can be challenging when one seeks to follow cancer patients as they seek care, which can continue for months or years and may reflect many physician and patient decisions. Claims data track patients, but lack clinical detail. Linking routine electronic health record (EHR) data with clinical registry data allows one to gain a more complete picture of the patient journey through a cancer care episode. However, valid analytical approaches to examining care trajectories must be longitudinal and account for the dynamic nature of what is "seen" in the EHR. METHODS The Oncoshare database combines clinical detail from the California Cancer Registry and EHR data from two large health care organizations in the same catchment area-a multisite community practice and an academic medical center-for all women treated in either organization for breast cancer from 2000 to 2012. We classified EHR encounter data according to typical periods of the cancer care episode (screening, diagnosis, treatment) and posttreatment surveillance, as well as by facility used, to better characterize patterns of care for patients seen at both organizations. FINDINGS We identified a "treated" cohort consisting of women receiving interventions for their initial cancer diagnosis, and classified their encounters over time across multiple dimensions (type of care, provider of care, and timing of care with respect to their cancer diagnosis). Forty-three percent of the patients were treated at the academic center only, 42 percent at the community center only, and 16 percent of the patients obtained care at both health care organizations. Compared to women seen at only one organization, the last group had similar-length initial care episodes, but more frequently had multiple episodes and longer observation periods.
DISCUSSION Linking EHR data from neighboring systems can enhance our information on care trajectories, but careful consideration of the complexity of the treatment process and data generating mechanisms is necessary to make valid inferences. CONCLUSION/NEXT STEPS If analyzed as a timeline, and with careful characterization of diagnostic tests, surgical interventions, and type and frequency of physician encounters, the pathways taken by women through their breast cancer episode may lead to better understanding of patient decisions.