1
Libuy N, Harron K, Gilbert R, Caulton R, Cameron E, Blackburn R. Linking education and hospital data in England: linkage process and quality. Int J Popul Data Sci 2021; 6:1671. [PMID: 34568585 PMCID: PMC8445153 DOI: 10.23889/ijpds.v6i1.1671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022] Open
Abstract
INTRODUCTION Linkage of administrative data for universal state education and National Health Service (NHS) hospital care would enable research into the inter-relationships between education and health for all children in England. OBJECTIVES We aim to describe the linkage process and evaluate the quality of linkage of four one-year birth cohorts within the National Pupil Database (NPD) and Hospital Episode Statistics (HES). METHODS We used multi-step deterministic linkage algorithms to link longitudinal records from state schools to the chronology of records in the NHS Personal Demographics Service (PDS; linkage stage 1), and HES (linkage stage 2). We calculated linkage rates and compared pupil characteristics in linked and unlinked samples for each stage of linkage and each cohort (1990/91, 1996/97, 1999/00, and 2004/05). RESULTS Of the 2,287,671 pupil records, 2,174,601 (95%) linked to HES. Linkage rates improved over time (92% in 1990/91 to 99% in 2004/05). Ethnic minority pupils and those living in more deprived areas were less likely to be matched to hospital records, but differences in pupil characteristics between linked and unlinked samples were moderate to small. CONCLUSION We linked nearly all pupils to at least one hospital record. The high coverage of the linkage represents a unique opportunity for wide-scale analyses across the domains of health and education. However, missed links disproportionately affected ethnic minorities or those living in the poorest neighbourhoods: selection bias could be mitigated by increasing the quality and completeness of identifiers recorded in administrative data or the application of statistical methods that account for missed links. 
HIGHLIGHTS
- Longitudinal administrative records for all children attending state school and acute hospital services in England have been used for research for more than two decades, but lack of a shared unique identifier has limited scope for linkage between these databases.
- We applied multi-step deterministic linkage algorithms to 4 one-year cohorts of children born 1 September-31 August in 1990/91, 1996/97, 1999/00 and 2004/05. In stage 1, full names, date of birth, and postcode histories from education data in the National Pupil Database were linked to the NHS Personal Demographics Service. In stage 2, NHS number, postcode, date of birth and sex were linked to hospital records in Hospital Episode Statistics.
- Between 92% and 99% of school pupils linked to at least one hospital record. Ethnic minority pupils and pupils who were living in the most deprived areas were least likely to link. Ethnic minority pupils were less likely than white children to link at the first step in both algorithms.
- Bias due to linkage errors could lead to an underestimate of the health needs in disadvantaged groups. Improved data quality, more sensitive linkage algorithms, and/or statistical methods that account for missed links in analyses, should be considered to reduce linkage bias.
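The multi-step deterministic approach described in this abstract can be illustrated with a minimal sketch. The field names, the three match steps, and the sample records below are all hypothetical; the abstract does not list the study's actual match rules. The sketch tries the strictest rule first and accepts a link only when exactly one candidate agrees on every field in the step.

```python
from collections import defaultdict

# Hypothetical match steps, strictest first (not the study's actual rules).
STEPS = [
    ("forename", "surname", "dob", "postcode"),
    ("forename", "surname", "dob"),
    ("surname", "dob", "postcode"),
]

def deterministic_link(pupils, patients, steps=STEPS):
    """Return {pupil_id: (patient_id, step_number)} for unambiguous exact matches."""
    links = {}
    for step_no, fields in enumerate(steps, start=1):
        # Index the target file by the fields used at this step.
        index = defaultdict(list)
        for patient in patients:
            index[tuple(patient[f] for f in fields)].append(patient["id"])
        for pupil in pupils:
            if pupil["id"] in links:
                continue  # already linked at a stricter step
            candidates = index.get(tuple(pupil[f] for f in fields), [])
            if len(candidates) == 1:  # accept only a unique candidate
                links[pupil["id"]] = (candidates[0], step_no)
    return links

# Invented example records.
pupils = [
    {"id": "P1", "forename": "ann", "surname": "lee", "dob": "2004-10-02", "postcode": "NW1 2DA"},
    {"id": "P2", "forename": "bo", "surname": "khan", "dob": "2005-01-15", "postcode": "WC1N 1EH"},
    {"id": "P3", "forename": "cy", "surname": "roe", "dob": "2005-03-30", "postcode": "SE1 6LH"},
]
patients = [
    {"id": "H1", "forename": "ann", "surname": "lee", "dob": "2004-10-02", "postcode": "NW1 2DA"},
    {"id": "H2", "forename": "bo", "surname": "khan", "dob": "2005-01-15", "postcode": "SW17 0RE"},
]

links = deterministic_link(pupils, patients)
linkage_rate = len(links) / len(pupils)
```

Here P1 links at step 1, P2 links at step 2 because its postcode differs, and P3 stays unlinked. The sketch does not stop one target record from being linked to several pupils; a real pipeline would need a rule for that case.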
Affiliation(s)
- Nicolás Libuy
  - Institute of Health Informatics, University College London, London, NW1 2DA, UK
- Katie Harron
  - Institute of Health Informatics, University College London, London, NW1 2DA, UK
  - UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK
- Ruth Gilbert
  - Institute of Health Informatics, University College London, London, NW1 2DA, UK
  - UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK
- Ruth Blackburn
  - Institute of Health Informatics, University College London, London, NW1 2DA, UK
2
Doidge JC, Morris JK, Harron KL, Stevens S, Gilbert R. Prevalence of Down's Syndrome in England, 1998-2013: Comparison of linked surveillance data and electronic health records. Int J Popul Data Sci 2020; 5:1157. [PMID: 32864476 PMCID: PMC7115985 DOI: 10.23889/ijpds.v5i1.1157] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/06/2023] Open
Abstract
INTRODUCTION Disease registers and electronic health records are valuable resources for disease surveillance and research but can be limited by variation in data quality over time. Quality may be limited in terms of the accuracy of clinical information, of the internal linkage that supports person-based analysis of most administrative datasets, or by errors in linkage between multiple datasets. OBJECTIVES By linking the National Down Syndrome Cytogenetic Register (NDSCR) to Hospital Episode Statistics for England (HES), we aimed to assess the quality of each and establish a consistent approach for analysis of trends in prevalence of Down's syndrome among live births in England. METHODS Probabilistic record linkage of NDSCR to HES for the period 1998-2013 was supported by linkage of babies to mothers within HES. Prevalence estimates in England were compared using NDSCR only, HES data only, and linked data. Capture-recapture analysis and quantitative bias analysis were used to account for potential errors, including false positive diagnostic codes, unrecorded diagnoses, and linkage error. RESULTS Analyses of single-source data indicated increasing live birth prevalence of Down's syndrome, particularly in the analysis of HES. Linked data indicated a contrastingly stable prevalence of 12.3 (plausible range: 11.6-12.7) cases per 10 000 live births. CONCLUSION Case ascertainment in NDSCR improved slightly over time, creating a picture of slowly increasing prevalence. The emerging epidemic suggested by HES primarily reflects improving linkage within HES (assignment of unique patient identifiers to hospital episodes). Administrative data are valuable but trends should be interpreted with caution, and with assessment of data quality over time. Data linkage with quantitative bias analysis can provide more robust estimation and, in this case, stronger evidence that prevalence is not increasing.
Routine linkage of administrative and register data can enhance the value of each.
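As a rough illustration of the capture-recapture idea mentioned in this abstract, the sketch below applies Chapman's two-source estimator. The abstract does not say which estimator the authors used, and the counts are invented, so this is an assumption-laden example rather than a reproduction of their analysis. It also assumes the two sources capture cases independently and that linkage between them is error-free, which are the kinds of assumptions a quantitative bias analysis would relax.

```python
def chapman_estimate(n_source_a, n_source_b, n_both):
    """Chapman's nearly unbiased two-source capture-recapture estimate of total cases.

    n_source_a: cases recorded in source A (e.g. a register)
    n_source_b: cases recorded in source B (e.g. hospital records)
    n_both:     cases found in both sources after linkage
    """
    return (n_source_a + 1) * (n_source_b + 1) / (n_both + 1) - 1

# Invented counts for illustration only.
n_register, n_hospital, n_linked = 90, 80, 72

estimated_total = chapman_estimate(n_register, n_hospital, n_linked)

# Share of the estimated total that each source captured on its own.
register_ascertainment = n_register / estimated_total
hospital_ascertainment = n_hospital / estimated_total
```

Missed links lower `n_both` and therefore inflate the estimated total, which is one reason linkage error needs to be considered alongside the estimate.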
Affiliation(s)
- JC Doidge
  - UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK
  - Intensive Care National Audit and Research Centre, London, WC1V 6AZ, UK
- JK Morris
  - Population Health Research Institute, St George's University of London, London, SW17 0RE, UK
- KL Harron
  - UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK
- S Stevens
  - Public Health England, London, SE1 6LH, UK
- R Gilbert
  - UCL Great Ormond Street Institute of Child Health, University College London, London, WC1N 1EH, UK
  - Health Data Research UK, University College London, London, NW1 2DA, UK
3
Nechuta S, Mukhopadhyay S, Krishnaswami S, Golladay M, McPheeters M. Record Linkage Approaches Using Prescription Drug Monitoring Program and Mortality Data for Public Health Analyses and Epidemiologic Studies. Epidemiology 2020; 31:22-31. [PMID: 31592867 PMCID: PMC6889900 DOI: 10.1097/ede.0000000000001110] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/04/2019] [Accepted: 09/25/2019] [Indexed: 11/25/2022]
Abstract
BACKGROUND The use of Prescription Drug Monitoring Program (PDMP) data has greatly increased in recent years as these data have accumulated as part of the response to the opioid epidemic in the United States. We evaluated the accuracy of record linkage approaches using the Controlled Substance Monitoring Database (Tennessee's [TN] PDMP, 2012-2016) and mortality data on all drug overdose decedents in Tennessee (2013-2016). METHODS We compared total, missed, and false positive (FP) matches (with manual verification of all FPs) across approaches that included a variety of data cleaning and matching methods (probabilistic/fuzzy vs. deterministic) for patient and death linkages, and prescription history. We evaluated the influence of linkage approaches on key prescription measures used in public health analyses. We evaluated characteristics (e.g., age, education, sex) of missed matches and incorrect matches to consider potential bias. RESULTS The most accurate probabilistic/fuzzy matching approach identified 4,714 overdose deaths (vs. the deterministic approach, n = 4,572), with a low FP linkage error (<1%) and high correct match proportion (95% vs. 92% and ~90% for probabilistic approaches not using comprehensive data cleaning). Estimation of all prescription measures improved (vs. deterministic approach). For example, the frequency (%) of decedents filling an oxycodone prescription in the last 60 days differed between approaches (n = 1,371 [32%] vs. n = 1,443 [33%]). Missed overdose decedents were more likely to be younger, male, nonwhite, and of higher education. CONCLUSION Implications of study findings include underreporting, prescribing and outcome misclassification, and reduced generalizability to population risk groups, information of importance to epidemiologists and researchers using PDMP data.
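The comparison of missed, false positive, and correct matches described in this abstract can be sketched as follows, assuming a manually verified set of true record pairs is available. The function name, the metric names, and the toy pairs are hypothetical and are not taken from the study; both inputs are assumed to be non-empty.

```python
def linkage_accuracy(linked_pairs, true_pairs):
    """Summarise linkage error against a verified reference set of true pairs."""
    linked, truth = set(linked_pairs), set(true_pairs)
    return {
        # Links made by the algorithm that are not true matches.
        "false_link_proportion": len(linked - truth) / len(linked),
        # True matches the algorithm failed to link.
        "missed_match_proportion": len(truth - linked) / len(truth),
        # True matches the algorithm linked correctly.
        "correct_match_proportion": len(linked & truth) / len(truth),
    }

# Invented (death record, prescription patient) pairs for illustration.
true_pairs = [("D1", "R1"), ("D2", "R2"), ("D3", "R3"), ("D4", "R4")]
deterministic_pairs = [("D1", "R1"), ("D2", "R2"), ("D3", "R3")]
fuzzy_pairs = [("D1", "R1"), ("D2", "R2"), ("D3", "R3"), ("D4", "R4"), ("D5", "R9")]

deterministic_result = linkage_accuracy(deterministic_pairs, true_pairs)
fuzzy_result = linkage_accuracy(fuzzy_pairs, true_pairs)
```

In this toy example the fuzzy approach recovers the match that the deterministic approach misses, at the cost of one false link, which is the trade-off such comparisons are meant to quantify.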
Affiliation(s)
- Sarah Nechuta
  - Tennessee Department of Health, Office of Informatics and Analytics, 710 James Robertson Parkway, Nashville, TN
- Sutapa Mukhopadhyay
  - Tennessee Department of Health, Office of Informatics and Analytics, 710 James Robertson Parkway, Nashville, TN
- Shanthi Krishnaswami
  - Tennessee Department of Health, Office of Informatics and Analytics, 710 James Robertson Parkway, Nashville, TN
- Molly Golladay
  - Tennessee Department of Health, Office of Informatics and Analytics, 710 James Robertson Parkway, Nashville, TN
- Melissa McPheeters
  - Tennessee Department of Health, Office of Informatics and Analytics, 710 James Robertson Parkway, Nashville, TN
4
Sograte-Idrissi S, Oleksiievets N, Isbaner S, Eggert-Martinez M, Enderlein J, Tsukanov R, Opazo F. Nanobody Detection of Standard Fluorescent Proteins Enables Multi-Target DNA-PAINT with High Resolution and Minimal Displacement Errors. Cells 2019; 8:cells8010048. [PMID: 30646582 PMCID: PMC6357156 DOI: 10.3390/cells8010048] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 12/14/2018] [Revised: 01/10/2019] [Accepted: 01/11/2019] [Indexed: 01/06/2023] Open
Abstract
DNA point accumulation for imaging in nanoscale topography (PAINT) is a rapidly developing fluorescence super-resolution technique, which allows for reaching spatial resolutions below 10 nm. It also enables the imaging of multiple targets in the same sample. However, using DNA-PAINT to observe cellular structures at such resolution remains challenging. Antibodies, which are commonly used for this purpose, lead to a displacement between the target protein and the reporting fluorophore of 20-25 nm, thus limiting the resolving power. Here, we used nanobodies to minimize this linkage error to ~4 nm. We demonstrate multiplexed imaging by using three nanobodies, each able to bind to a different family of fluorescent proteins. We couple the nanobodies with single DNA strands via a straightforward and stoichiometric chemical conjugation. Additionally, we built a versatile computer-controlled microfluidic setup to enable multiplexed DNA-PAINT in an efficient manner. As a proof of principle, we labeled and imaged proteins on mitochondria, the Golgi apparatus, and chromatin. We obtained super-resolved images of the three targets with 20 nm resolution, and within only 35 minutes of acquisition time.
Affiliation(s)
- Shama Sograte-Idrissi
  - Institute of Neuro- and Sensory Physiology, University Medical Center Göttingen, 37073 Göttingen, Germany
  - Center for Biostructural Imaging of Neurodegeneration (BIN), University of Göttingen Medical Center, 37075 Göttingen, Germany
  - International Max Planck Research School for Molecular Biology, Göttingen, Germany
- Nazar Oleksiievets
  - Third Institute of Physics-Biophysics, Georg August University, 37077 Göttingen, Germany
- Sebastian Isbaner
  - Third Institute of Physics-Biophysics, Georg August University, 37077 Göttingen, Germany
- Mariana Eggert-Martinez
  - Institute of Neuro- and Sensory Physiology, University Medical Center Göttingen, 37073 Göttingen, Germany
  - Center for Biostructural Imaging of Neurodegeneration (BIN), University of Göttingen Medical Center, 37075 Göttingen, Germany
  - International Max Planck Research School for Molecular Biology, Göttingen, Germany
- Jörg Enderlein
  - Third Institute of Physics-Biophysics, Georg August University, 37077 Göttingen, Germany
- Roman Tsukanov
  - Third Institute of Physics-Biophysics, Georg August University, 37077 Göttingen, Germany
- Felipe Opazo
  - Institute of Neuro- and Sensory Physiology, University Medical Center Göttingen, 37073 Göttingen, Germany
  - Center for Biostructural Imaging of Neurodegeneration (BIN), University of Göttingen Medical Center, 37075 Göttingen, Germany
5
Harron KL, Doidge JC, Knight HE, Gilbert RE, Goldstein H, Cromwell DA, van der Meulen JH. A guide to evaluating linkage quality for the analysis of linked data. Int J Epidemiol 2018; 46:1699-1710. [PMID: 29025131 PMCID: PMC5837697 DOI: 10.1093/ije/dyx177] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Accepted: 08/08/2017] [Indexed: 11/14/2022] Open
Abstract
Linked datasets are an important resource for epidemiological and clinical studies, but linkage error can lead to biased results. For data security reasons, linkage of personal identifiers is often performed by a third party, making it difficult for researchers to assess the quality of the linked dataset in the context of specific research questions. This is compounded by a lack of guidance on how to determine the potential impact of linkage error. We describe how linkage quality can be evaluated and provide widely applicable guidance for both data providers and researchers. Using an illustrative example of a linked dataset of maternal and baby hospital records, we demonstrate three approaches for evaluating linkage quality: applying the linkage algorithm to a subset of gold standard data to quantify linkage error; comparing characteristics of linked and unlinked data to identify potential sources of bias; and evaluating the sensitivity of results to changes in the linkage procedure. These approaches can inform our understanding of the potential impact of linkage error and provide an opportunity to select the most appropriate linkage procedure for a specific analysis. Evaluating linkage quality in this way will improve the quality and transparency of epidemiological and clinical research using linked data.
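The second approach named in this abstract, comparing characteristics of linked and unlinked data, is often summarised with a standardised difference. The sketch below computes it for a binary characteristic; the abstract does not name this statistic, so treating it as the comparison measure is an assumption, and the counts are invented.

```python
from math import sqrt

def standardised_difference(count_a, n_a, count_b, n_b):
    """Standardised difference in a proportion between two groups of records."""
    p_a, p_b = count_a / n_a, count_b / n_b
    # Pool the two binomial variances rather than the raw counts, so the
    # measure does not depend on how large each group is.
    pooled_sd = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return (p_a - p_b) / pooled_sd

# Invented example: records from the most deprived areas.
# Linked records: 200 of 1000; unlinked records: 30 of 100.
diff = standardised_difference(200, 1000, 30, 100)
```

A negative value here means the characteristic is less common among linked records than among unlinked ones, which would suggest that missed links are concentrated in that group.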
Affiliation(s)
- Katie L Harron
  - Department of Health Services Research and Policy, London School of Hygiene & Tropical Medicine, London, UK
- James C Doidge
  - Administrative Data Research Centre for England, UCL Great Ormond Street Institute of Child Health, UCL, London, UK
  - Centre for Population Health Research, University of South Australia, Adelaide, Australia
- Hannah E Knight
  - Department of Health Services Research and Policy, London School of Hygiene & Tropical Medicine, London, UK
  - Lindsay Stewart Centre for Audit and Clinical Informatics, Royal College of Obstetricians and Gynaecologists, London, UK
- Ruth E Gilbert
  - Administrative Data Research Centre for England, UCL Great Ormond Street Institute of Child Health, UCL, London, UK
- Harvey Goldstein
  - Administrative Data Research Centre for England, UCL Great Ormond Street Institute of Child Health, UCL, London, UK
  - Graduate School of Education, University of Bristol, Bristol, UK
- David A Cromwell
  - Department of Health Services Research and Policy, London School of Hygiene & Tropical Medicine, London, UK
- Jan H van der Meulen
  - Department of Health Services Research and Policy, London School of Hygiene & Tropical Medicine, London, UK