1
Bressler CJ, Malthaner L, Pondel N, Letson MM, Kline D, Leonard JC. Identifying Children at Risk for Maltreatment Using Emergency Medical Services' Data: An Exploratory Study. Child Maltreatment 2024; 29:37-46. [PMID: 36205182 DOI: 10.1177/10775595221127925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The objective of this study was to use natural language processing (NLP) to query Emergency Medical Services (EMS) electronic health records (EHRs) to identify variables associated with child maltreatment. We hypothesized the variables identified would show an association between the EMS encounter and risk of a child maltreatment report. This is a retrospective cohort study of children with an EMS encounter from 1/1/11-12/31/18. NLP of EMS EHRs was conducted to generate single words, bigrams and trigrams. Clinically plausible risk factors for child maltreatment were established, where presence of the word(s) indicated presence of the hypothesized risk factor. The EMS encounters were probabilistically linked to child maltreatment reports. Univariable associations were assessed, and a multivariable logistic regression was conducted to determine a final set of predictors. Eleven variables showed an association in the multivariable modeling. Sexual, abuse, chronic condition, developmental delay, unconscious on arrival, criminal activity/police, ingestion/inhalation/exposure, and <2 years old showed positive associations with child maltreatment reports. Refusal and DOA/PEA/asystole held negative associations. This study demonstrated that risk factors for child maltreatment can be identified through EMS EHRs. A future direction of this work includes developing a tool that screens EMS EHRs for households at risk for maltreatment.
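The n-gram step described in this abstract can be illustrated with a minimal sketch. It assumes simple lowercasing and whitespace tokenization; the function names and the sample narrative are hypothetical and are not taken from the study, which does not state its tokenization rules here.

```python
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=3):
    """Count single words, bigrams and trigrams (n = 1..max_n) in one text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(text, n))
    return counts


# Hypothetical EMS narrative, invented for illustration only.
note = "child found unresponsive police on scene"
counts = ngram_counts(note)
```

A risk-factor flag could then be a simple lookup such as `counts["police"] > 0`, under the abstract's rule that presence of the word(s) indicates presence of the hypothesized risk factor.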
Affiliation(s)
- Colleen J Bressler
  - Division of Child and Family Advocacy, Nationwide Children's Hospital, Columbus, OH, USA
  - Nationwide Children's Hospital Section of Emergency Medicine, Columbus, OH, USA
- Lauren Malthaner
  - Nationwide Children's Hospital Center for Injury Research and Policy at the Research Institute, Columbus, OH, USA
- Nicholas Pondel
  - College of Public Health, The Ohio State University, Columbus, OH, USA
- Megan M Letson
  - Division of Child and Family Advocacy, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University College of Medicine
- David Kline
  - Department of Biomedical Informatics, Center for Biostatistics, The Ohio State University College of Medicine
- Julie C Leonard
  - Nationwide Children's Hospital Section of Emergency Medicine, Columbus, OH, USA
  - Nationwide Children's Hospital Center for Injury Research and Policy at the Research Institute, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University College of Medicine
2
Jiao Y, Lesueur F, Azencott CA, Laurent M, Mebirouk N, Laborde L, Beauvallet J, Dondon MG, Eon-Marchais S, Laugé A, Noguès C, Andrieu N, Stoppa-Lyonnet D, Caputo SM. A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers. BMC Med Res Methodol 2021; 21:155. [PMID: 34325649 PMCID: PMC8320036 DOI: 10.1186/s12874-021-01299-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2020] [Accepted: 04/29/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Linking independent sources of data describing the same individuals enables innovative epidemiological and health studies but requires a robust record linkage approach. We describe a hybrid record linkage process to link databases from two independent ongoing French national studies, GEMO (Genetic Modifiers of BRCA1 and BRCA2), which focuses on the identification of genetic factors modifying cancer risk of BRCA1 and BRCA2 mutation carriers, and GENEPSO (prospective cohort of BRCAx mutation carriers), which focuses on environmental and lifestyle risk factors. METHODS To identify as many as possible of the individuals participating in the two studies but not registered by a shared identifier, we combined probabilistic record linkage (PRL) and supervised machine learning (ML). This approach (named "PRL + ML") combined the candidate matches identified by both approaches. We built the ML model using the gold standard on a first version of the two databases as a training dataset. This gold standard was obtained from PRL-derived matches verified by an exhaustive manual review. RESULTS The Random Forest (RF) algorithm showed the highest recall (0.985) among six widely used ML algorithms: RF, Bagged trees, AdaBoost, Support Vector Machine, Neural Network. Therefore, RF was selected to build the ML model since our goal was to identify the maximum number of true matches. Our combined linkage PRL + ML showed a higher recall (range 0.988-0.992) than either PRL (range 0.916-0.991) or ML (0.981) alone. It identified 1995 individuals participating in both GEMO (6375 participants) and GENEPSO (4925 participants). CONCLUSIONS Our hybrid linkage process represents an efficient tool for linking GEMO and GENEPSO. It may be generalizable to other epidemiological studies involving other databases and registries.
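Why combining the two candidate sets can only raise recall is easy to see in a small sketch. The record pairs below are invented toy data, and the helper `recall` is a hypothetical name; nothing here reproduces the GEMO/GENEPSO pipeline or its reported figures.

```python
def recall(predicted, gold):
    """Share of gold-standard matches that appear in the predicted matches."""
    if not gold:
        raise ValueError("gold standard must not be empty")
    return len(predicted & gold) / len(gold)


# Toy record pairs (left id, right id); all values are made up.
gold = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
prl_matches = {(1, "a"), (2, "b"), (9, "z")}   # probabilistic linkage output
ml_matches = {(2, "b"), (3, "c")}              # classifier output

# "PRL + ML" in the sense of the abstract: the union of both candidate sets.
combined = prl_matches | ml_matches
```

Because the union contains every true match found by either method, its recall is at least as high as that of each method alone; the price is that false candidates from both methods are pooled too, so precision has to be checked separately.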
Affiliation(s)
- Yue Jiao
  - Department of Genetics, Institut Curie, PSL Research University, Paris, France; Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
- Fabienne Lesueur
  - Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
- Chloé-Agathe Azencott
  - Inserm, U900, Paris, France; Mines ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
- Maïté Laurent
  - Department of Genetics, Institut Curie, PSL Research University, Paris, France
- Noura Mebirouk
  - Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
- Lilian Laborde
  - Institut Paoli-Calmettes, Centre de Traitement des Données IPC-PACA, Département de la Recherche Clinique et de l'Innovation, Marseille, France
- Juana Beauvallet
  - Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
- Marie-Gabrielle Dondon
  - Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
- Séverine Eon-Marchais
  - Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
- Anthony Laugé
  - Department of Genetics, Institut Curie, PSL Research University, Paris, France
- Catherine Noguès
  - Institut Paoli-Calmettes, Département d'Anticipation et de Suivi du Cancer, Oncogénétique clinique, Marseille, France; Inserm, U830, Université Paris Descartes, Paris, France; Aix Marseille Univ, INSERM, IRD, SESSTIM, Sciences Economiques et Sociales de la Santé & Traitement de l'Information Médicale, Marseille, France
- Nadine Andrieu
  - Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
- Dominique Stoppa-Lyonnet
  - Department of Genetics, Institut Curie, PSL Research University, Paris, France; Paris University, Paris, France; Inserm, U830, Paris, France
- Sandrine M Caputo
  - Department of Genetics, Institut Curie, PSL Research University, Paris, France
3
Rohde F, Franke M, Sehili Z, Lablans M, Rahm E. Optimization of the Mainzelliste software for fast privacy-preserving record linkage. J Transl Med 2021; 19:33. [PMID: 33451317 PMCID: PMC7809773 DOI: 10.1186/s12967-020-02678-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/05/2020] [Accepted: 12/14/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Data analysis for biomedical research often requires a record linkage step to identify records from multiple data sources referring to the same person. Due to the lack of unique personal identifiers across these sources, record linkage relies on the similarity of personal data such as first and last names or birth dates. However, the exchange of such identifying data with a third party, as is the case in record linkage, is generally subject to strict privacy requirements. This problem is addressed by privacy-preserving record linkage (PPRL) and pseudonymization services. Mainzelliste is an open-source record linkage and pseudonymization service used to carry out PPRL processes in real-world use cases. METHODS We evaluate the linkage quality and performance of the linkage process using several real and near-real datasets with different properties w.r.t. size and error-rate of matching records. We conduct a comparison between (plaintext) record linkage and PPRL based on encoded records (Bloom filters). Furthermore, since the Mainzelliste software offers no blocking mechanism, we extend it by phonetic blocking as well as novel blocking schemes based on locality-sensitive hashing (LSH) to improve runtime for both standard and privacy-preserving record linkage. RESULTS The Mainzelliste achieves high linkage quality for PPRL using field-level Bloom filters due to the use of an error-tolerant matching algorithm that can handle variances in names, in particular missing or transposed name compounds. However, due to the absence of blocking, the runtimes are unacceptable for real use cases with larger datasets. The newly implemented blocking approaches improve runtimes by orders of magnitude while retaining high linkage quality. CONCLUSION We conduct the first comprehensive evaluation of the record linkage facilities of the Mainzelliste software and extend it with blocking methods to improve its runtime. 
We observed very high linkage quality for both plaintext and encoded data, even in the presence of errors. The added blocking methods improve runtime by orders of magnitude, facilitating use in research projects with large datasets and many participants.
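Field-level Bloom filter encoding, as mentioned in this abstract, can be sketched generically as follows. This is not the Mainzelliste implementation: the filter length `m`, the number of hash functions `k`, the padding character, and the use of unkeyed SHA-256 are all assumptions chosen for illustration (a real PPRL deployment would normally use keyed hashing with a secret shared by the data owners).

```python
import hashlib


def bigrams(value):
    """Character bigrams of a padded, lowercased field value."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, m=1000, k=2):
    """Encode one field as the set of bit positions set in an m-bit filter."""
    bits = set()
    for gram in bigrams(value):
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            bits.add(int.from_bytes(digest[:8], "big") % m)
    return bits


def dice(a, b):
    """Dice coefficient of two encoded fields (1.0 means identical bit sets)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Two spellings of a surname share bigrams, hence some bit positions.
similarity = dice(bloom_encode("Smith"), bloom_encode("Smyth"))
```

The error tolerance comes from the bigram overlap: a single typo changes only a few bigrams, so most bit positions still agree and the similarity stays high without either party seeing the plaintext value.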
Affiliation(s)
- Florens Rohde
  - Database Group, University of Leipzig, Leipzig, Germany
- Martin Franke
  - Database Group, University of Leipzig, Leipzig, Germany
- Ziad Sehili
  - Database Group, University of Leipzig, Leipzig, Germany
- Martin Lablans
  - Federated Information Systems, German Cancer Research Center, Heidelberg, Germany; Complex Data Processing in Medical Informatics, University Medical Center Mannheim, Mannheim, Germany
- Erhard Rahm
  - Database Group, University of Leipzig, Leipzig, Germany
4
Stammler S, Kussel T, Schoppmann P, Stampe F, Tremper G, Katzenbeisser S, Hamacher K, Lablans M. Mainzelliste SecureEpiLinker (MainSEL): Privacy-Preserving Record Linkage using Secure Multi-Party Computation. Bioinformatics 2020; 38:1657-1668. [PMID: 32871006 PMCID: PMC8896632 DOI: 10.1093/bioinformatics/btaa764] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/25/2019] [Revised: 07/24/2020] [Accepted: 08/25/2020] [Indexed: 11/17/2022]
Abstract
Motivation Record linkage has versatile applications in real-world data analysis contexts, where several datasets need to be linked on the record level in the absence of any exact identifier connecting related records. One example is medical databases of patients, spread across institutions, that have to be linked on personally identifiable entries like name, date of birth or ZIP code. At the same time, privacy laws may prohibit the exchange of this personally identifiable information (PII) across institutional boundaries, ruling out the outsourcing of the record linkage task to a trusted third party. We propose to employ privacy-preserving record linkage (PPRL) techniques that prevent, to various degrees, the leakage of PII while still allowing for the linkage of related records. Results We develop a framework for fault-tolerant PPRL using secure multi-party computation with the medical record keeping software Mainzelliste as the data source. Our solution does not rely on any trusted third party and all PII is guaranteed not to leak under common cryptographic security assumptions. Benchmarks show the feasibility of our approach in realistic networking settings: linkage of a patient record against a database of 10 000 records can be done in 48 s over a heavily delayed (100 ms) network connection, or 3.9 s with a low-latency connection. Availability and implementation The source code of the sMPC node is freely available on GitHub at https://github.com/medicalinformatics/SecureEpilinker subject to the AGPLv3 license. The source code of the modified Mainzelliste is available at https://github.com/medicalinformatics/MainzellisteSEL. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Martin Lablans
  - German Cancer Research Center, Heidelberg, Germany; University Medical Centre Mannheim, Germany
5
Comparing record linkage software programs and algorithms using real-world data. PLoS One 2019; 14:e0221459. [PMID: 31550255 PMCID: PMC6759179 DOI: 10.1371/journal.pone.0221459] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/26/2018] [Accepted: 08/08/2019] [Indexed: 01/17/2023]
Abstract
Linkage of medical databases, including insurer claims and electronic health records (EHRs), is increasingly common. However, few studies have investigated the behavior and output of linkage software. To determine how linkage quality is affected by different algorithms, blocking variables, methods for string matching and weight determination, and decision rules, we compared the performance of 4 nonproprietary linkage software packages linking patient identifiers from noninteroperable inpatient and outpatient EHRs. We linked datasets using first and last name, gender, and date of birth (DOB). We evaluated DOB and year of birth (YOB) as blocking variables and used exact and inexact matching methods. We compared the weights assigned to record pairs and evaluated how matching weights corresponded to a gold standard, medical record number. Deduplicated datasets contained 69,523 inpatient and 176,154 outpatient records, respectively. Linkage runs blocking on DOB produced weights ranging in number from 8 for exact matching to 64,273 for inexact matching. Linkage runs blocking on YOB produced 8 to 916,806 weights. Exact matching matched record pairs with identical test characteristics (sensitivity 90.48%, specificity 99.78%) for the highest ranked group, but algorithms differentially prioritized certain variables. Inexact matching behaved more variably, leading to dramatic differences in sensitivity (range 0.04–93.36%) and positive predictive value (PPV) (range 86.67–97.35%), even for the most highly ranked record pairs. Blocking on DOB led to higher PPV of highly ranked record pairs. An ensemble approach based on averaging scaled matching weights led to modestly improved accuracy. In summary, we found few differences in the rankings of record pairs with the highest matching weights across 4 linkage packages. Performance was more consistent for exact string matching than for inexact string matching. 
Most methods and software packages performed similarly when comparing matching accuracy with the gold standard. In some settings, an ensemble matching approach may outperform individual linkage algorithms.
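The effect of the blocking variable on the number of compared record pairs can be sketched as below. The records, field names, and the helper `candidate_pairs` are hypothetical and are not taken from any of the four evaluated packages; the sketch only shows why blocking on year of birth yields more candidate pairs than blocking on full date of birth.

```python
from collections import defaultdict


def candidate_pairs(left, right, key):
    """Pair only records whose blocking key (computed by `key`) is equal."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[key(rec)].append(rec)
    return [(l["id"], r["id"]) for l in left for r in blocks.get(key(l), [])]


# Invented toy records; dates are ISO strings (YYYY-MM-DD).
inpatient = [
    {"id": "i1", "dob": "1980-03-14"},
    {"id": "i2", "dob": "1980-07-01"},
]
outpatient = [
    {"id": "o1", "dob": "1980-03-14"},
    {"id": "o2", "dob": "1980-11-30"},
    {"id": "o3", "dob": "1975-01-01"},
]

pairs_dob = candidate_pairs(inpatient, outpatient, key=lambda r: r["dob"])
pairs_yob = candidate_pairs(inpatient, outpatient, key=lambda r: r["dob"][:4])
```

The looser YOB key tolerates errors in day or month at the cost of many more comparisons, which is consistent with the much larger number of distinct weights reported above for YOB blocking.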
6
Bressler CJ, Letson MM, Kline D, McCarthy T, Davis J, Leonard JC. Characteristics of Neighborhoods Where Emergency Medical Services Encounter Children at Risk for Maltreatment. Prehosp Emerg Care 2019; 23:672-682. [PMID: 30703337 DOI: 10.1080/10903127.2019.1573940] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Abstract
Objective: The objective of this study was to determine if neighborhood rates of pediatric Emergency Medical Services (EMS) encounters correlate with rates of child maltreatment reporting and if there are neighborhood-level risk factors for EMS encountering children with maltreatment reports. Methods: We conducted a retrospective cohort study using the electronic medical records of children ages <18 years who had Columbus Division of Fire EMS encounters between 2011 and 2015. We used Nationwide Children's Hospital electronic medical records to identify child maltreatment reports. The EMS scene addresses and home addresses associated with maltreatment reports were geocoded independently and rates for each Census tract were calculated. The maltreatment reports were matched to the EMS encounters using name, gender, and date of birth. Rates of EMS encounters with children that had a maltreatment report were calculated for each Census tract. Census tract demographic information was obtained from the American Community Survey. Bayesian conditional autoregressive Poisson models were used to calculate rate ratios for census tract variables to determine their relationship to EMS encountering children with maltreatment reports. Results: A total of 44,002 EMS encounters and 4,298 maltreatment reports were included in the study. The Spearman correlation coefficient relating rates of EMS encounters to rates of maltreatment reports within census tracts was 0.72 (95% confidence interval, 0.65-0.77). Within the study period, a total of 1,134 EMS encounters were linked to 578 children with maltreatment reports. Poverty was the only independent risk factor for EMS encountering children with maltreatment reports. The multivariate analysis also identified protective factors, which included neighborhoods with higher proportions of residents who had bachelor's degrees, spoke a language other than English, and had the same residence the previous year. 
Conclusion: This study showed that in Franklin County, Ohio, neighborhoods with high EMS utilization had a strong positive correlation with areas that had high rates of child maltreatment reports. We also identified four neighborhood characteristics that were independently associated with EMS encountering children at risk for maltreatment (risk factor: poverty; protective factors: residents with college educations, non-English speaking households, and residents maintaining the same residence as the previous year).
7
Deterministic and Probabilistic Record Linkage: an Application to Primary Care Data. J Med Syst 2018; 42:82. [PMID: 29569065 DOI: 10.1007/s10916-018-0944-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2018] [Accepted: 03/15/2018] [Indexed: 10/17/2022]
8
Smith D. Secure pseudonymisation for privacy-preserving probabilistic record linkage. Journal of Information Security and Applications 2017. [DOI: 10.1016/j.jisa.2017.01.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
9
Benedetto G, Prima AD, Sciacca S, Grosso G. Design, functionality, and validity of the SWInCaRe, a web-based application used to administer cancer registry records. Health Informatics J 2017; 25:149-160. [PMID: 28438105 DOI: 10.1177/1460458217704253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]
Abstract
We described the design of a web-based application (the Software Integrated Cancer Registry-SWInCaRe) used to administer data in a cancer registry and tested its validity and usability. A sample of 11,680 records was considered to compare the manual and automatic procedures. Sensitivity and specificity, the Health IT Usability Evaluation Scale, and a cost-efficiency analysis were tested. Several data sources were used to build data packages through text-mining and record linkage algorithms. The automatic procedure showed small yet measurable improvements in both data linkage process and cancer cases estimation. Users perceived the application as useful to improve the time of coding and difficulty of the process: both time and cost-analysis were in favor of the automatic procedure. The web-based application proved a useful tool for the cancer registry, but some improvements are necessary to overcome the limitations observed and to further automate the process.
Affiliation(s)
- Giovanni Benedetto
  - Integrated Cancer Registry of Catania-Messina-Siracusa-Enna, Azienda Ospedaliero-Universitaria "Policlinico-Vittorio Emanuele", Catania, Italy
- Alessia Di Prima
  - Integrated Cancer Registry of Catania-Messina-Siracusa-Enna, Azienda Ospedaliero-Universitaria "Policlinico-Vittorio Emanuele", Catania, Italy
- Salvatore Sciacca
  - Integrated Cancer Registry of Catania-Messina-Siracusa-Enna, Azienda Ospedaliero-Universitaria "Policlinico-Vittorio Emanuele", Catania, Italy
- Giuseppe Grosso
  - Integrated Cancer Registry of Catania-Messina-Siracusa-Enna, Azienda Ospedaliero-Universitaria "Policlinico-Vittorio Emanuele", Catania, Italy
10
Morgan AS, Marlow N, Costeloe K, Draper ES. Investigating increased admissions to neonatal intensive care in England between 1995 and 2006: data linkage study using Hospital Episode Statistics. BMC Med Res Methodol 2016; 16:57. [PMID: 27206571 PMCID: PMC4875750 DOI: 10.1186/s12874-016-0152-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/21/2015] [Accepted: 04/30/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND A 44 % increase was observed in admissions to neonatal intensive care of babies born ≤26 weeks completed gestational age in England between 1995 and 2006. Hospital Episode Statistics (HES) may provide supplementary information to investigate this. The methods and results of a probabilistic data linkage exercise are reported. METHODS Two data sets were linked for each year (1995 and 2006) using 3 different algorithms (Fellegi and Sunter, Contiero and estimation-maximisation). RESULTS In 1995, linkage was performed between 668 EPICure and 486,705 HES records; 1,820 linked pairs were identified of which 422 (63.17 %) were confirmed. In 2006, from 2,750 EPICure and 631,401 HES records, 8,913 linked pairs were identified with 1,662 (60.40 %) confirmed as true. Reported births in HES at <26 weeks gestation increased 37.0 % from 867 to 1188. CONCLUSIONS Results support the EPICure findings that there was an increase in the birth rate for extremely premature babies between 1995 and 2006. There were insufficient data available for detailed investigation. Routine data sources may not be suitable for investigations at the margins of viability.
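For readers unfamiliar with the Fellegi and Sunter approach named in this abstract, the core idea is a per-field agreement weight built from two probabilities: m (the field agrees given a true match) and u (the field agrees given a non-match). The sketch below shows that standard weight calculation only; the field names and the m and u values are invented, and nothing here reproduces the EPICure/HES linkage itself.

```python
import math


def field_weight(agrees, m, u):
    """Fellegi-Sunter log2 weight for one field, given m and u probabilities."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def pair_weight(agreement, m_probs, u_probs):
    """Total weight of a record pair: the sum of its per-field weights."""
    return sum(
        field_weight(agreement[f], m_probs[f], u_probs[f]) for f in agreement
    )


# Hypothetical probabilities; in practice they are estimated from the data.
m_probs = {"surname": 0.95, "birth_date": 0.98, "sex": 0.99}
u_probs = {"surname": 0.01, "birth_date": 0.003, "sex": 0.5}

# A pair that agrees on surname and birth date but disagrees on sex.
agreement = {"surname": True, "birth_date": True, "sex": False}
score = pair_weight(agreement, m_probs, u_probs)
```

Pairs are then classified by comparing the total weight against upper and lower thresholds, with the band in between left for manual review, which is why linkage studies typically report how many linked pairs were subsequently confirmed.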
Affiliation(s)
- Andrei S. Morgan
  - Institute for Women's Health, UCL, 74 Huntley Street, London, UK
- Neil Marlow
  - Institute for Women's Health, UCL, 74 Huntley Street, London, UK
11
Lablans M, Borg A, Ückert F. A RESTful interface to pseudonymization services in modern web applications. BMC Med Inform Decis Mak 2015; 15:2. [PMID: 25656224 PMCID: PMC4350982 DOI: 10.1186/s12911-014-0123-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 09/19/2014] [Accepted: 12/12/2014] [Indexed: 12/02/2022]
Abstract
Background Medical research networks rely on record linkage and pseudonymization to determine which records from different sources relate to the same patient. To establish informational separation of powers, the required identifying data are redirected to a trusted third party that has, in turn, no access to medical data. This pseudonymization service receives identifying data, compares them with a list of already reported patient records and replies with a (new or existing) pseudonym. We found existing solutions to be technically outdated, complex to implement or not suitable for internet-based research infrastructures. In this article, we propose a new RESTful pseudonymization interface tailored for use in web applications accessed by modern web browsers. Methods The interface is modelled as a resource-oriented architecture, which is based on the representational state transfer (REST) architectural style. We translated typical use-cases into resources to be manipulated with well-known HTTP verbs. Patients can be re-identified in real-time by authorized users’ web browsers using temporary identifiers. We encourage the use of PID strings for pseudonyms and the EpiLink algorithm for record linkage. As a proof of concept, we developed a Java Servlet as reference implementation. Results The following resources have been identified: Sessions allow data associated with a client to be stored beyond a single request while still maintaining statelessness. Tokens authorize for a specified action and thus allow the delegation of authentication. Patients are identified by one or more pseudonyms and carry identifying fields. Relying on HTTP calls alone, the interface is firewall-friendly. The reference implementation has proven to be production stable. Conclusion The RESTful pseudonymization interface fits the requirements of web-based scenarios and allows building applications that make pseudonymization transparent to the user using ordinary web technology. 
The open-source reference implementation implements the web interface as well as a scientifically grounded algorithm to generate non-speaking pseudonyms.
Affiliation(s)
- Martin Lablans
  - Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Straße 69, Mainz, 55131, Germany
- Andreas Borg
  - Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Straße 69, Mainz, 55131, Germany
- Frank Ückert
  - Institute of Medical Biostatistics, Epidemiology and Informatics, University Medical Center of the Johannes Gutenberg University Mainz, Obere Zahlbacher Straße 69, Mainz, 55131, Germany
12
Contiero P, Berrino F, Tagliabue G, Mastroianni A, Di Mauro MG, Fabiano S, Annulli M, Muti P. Fasting blood glucose and long-term prognosis of non-metastatic breast cancer: a cohort study. Breast Cancer Res Treat 2013; 138:951-9. [PMID: 23568483 DOI: 10.1007/s10549-013-2519-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 02/13/2013] [Accepted: 04/01/2013] [Indexed: 10/27/2022]
Abstract
High circulating glucose has been associated with increased risk of breast cancer (BC). There may also be a link between serum glucose and prognosis in women treated for BC. We assessed the effect of peridiagnostic fasting blood glucose and body mass index (BMI) on long-term BC prognosis. We retrospectively investigated 1,261 women diagnosed and treated for stage I-III BC at the National Cancer Institute, Milan, in 1996, 1999 and 2000. Data on blood tests and follow-up were obtained by linking electronic archives, with follow-up to end of 2009. Multivariate Cox modelling estimated hazard ratios (HR) with 95 % confidence intervals (CI) for distant metastasis, recurrence and death (all causes) in relation to categorized peridiagnostic fasting blood glucose and BMI. Mediation analysis investigated whether blood glucose mediated the BMI-breast cancer prognosis association. The risks of distant metastasis were significantly higher for all other quintiles compared to the lowest glucose quintile (reference <87 mg/dL) (respective HRs: 1.99 95 % CI 1.23-3.24, 1.85 95 % CI 1.14-3.0, 1.73 95 % CI 1.07-2.8, and 1.91 95 % CI 1.15-3.17). The risk of recurrence was significantly higher for all other glucose quintiles compared to the first. The risk of death was significantly higher than reference in the second, fourth and fifth quintiles. Women with BMI ≥ 25 kg/m(2) had significantly greater risks of recurrence and distant metastasis than those with BMI < 25 kg/m(2), irrespective of blood glucose. The increased risks remained invariant over a median follow-up of 9.5 years. Mediation analysis indicated that glucose and BMI had independent effects on BC prognosis. Peridiagnostic high fasting glucose and obesity predict worsened short- and long-term outcomes in BC patients. Maintaining healthy blood glucose levels and normal weight may improve prognosis.
Affiliation(s)
- Paolo Contiero
  - Cancer Registry and Environmental Epidemiology Division, Scientific Directorate, Fondazione IRCCS Istituto Nazionale dei Tumori, Via Venezian 1, 20133 Milan, Italy
13
Campbell KM. Impact of record-linkage methodology on performance indicators and multivariate relationships. J Subst Abuse Treat 2008; 36:110-7. [PMID: 18657944 DOI: 10.1016/j.jsat.2008.05.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 02/02/2008] [Revised: 04/15/2008] [Accepted: 05/05/2008] [Indexed: 10/21/2022]
Abstract
Program evaluation often requires the linkage of records from independently maintained data systems (e.g., substance abuse treatment and criminal justice). Data entry errors (e.g., misspelled names, transposed digits) complicate the linkage task. In this investigation, three record-linkage algorithms (match-merge, common patient identifier, and probabilistic) are used to link recipients of publicly funded outpatient substance abuse treatment to statewide arrest and death data. The impact of record-linkage algorithm on performance indicators, prevalence indicators (i.e., arrest rates, and death rates), and hazard ratios derived from a multivariate survival analysis predicting risk of arrest following admission to outpatient substance abuse treatment is evaluated. Choice of algorithm substantially impacted estimates of arrest rates (range: year prior to admission, 39.8%-53.4%; year following admission, 24.7%-33.1%). The hazard ratio associated with "prior arrest" as a predictor of arrest following admission to outpatient substance abuse treatment (hazard ratio range = 0.20-0.37, p < .05) was also influenced by algorithm choice.
Affiliation(s)
- Kevin M Campbell
  - Washington State Division of Alcohol and Substance Abuse, Box 45330, Olympia, WA 98504-5330, USA
14
Contiero P, Tittarelli A, Maghini A, Fabiano S, Frassoldi E, Costa E, Gada D, Codazzi T, Crosignani P, Tessandori R, Tagliabue G. Comparison with manual registration reveals satisfactory completeness and efficiency of a computerized cancer registration system. J Biomed Inform 2008; 41:24-32. [PMID: 17452020 DOI: 10.1016/j.jbi.2007.03.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 07/19/2006] [Revised: 02/02/2007] [Accepted: 03/13/2007] [Indexed: 11/18/2022]
Abstract
Automated software for cancer registration, called Open Registry and developed by ourselves, was adopted by the Varese (population-based) Cancer Registry starting in 1997. Since the use of automated cancer registration is increasing, it is important to assess the quality and completeness of the automated data being produced. In this study, we assessed the completeness of the automatically generated data by comparison with a gold standard of all cases identified by manual and automatic systems for the year 1997, when the automated system was introduced and the manual system was still in operation. We also evaluated the efficiency of the automated system. 5027 cases were generated automatically; 2959 (59%) were accepted automatically and 2068 (41%) were flagged for manual checking. Sixty-nine cases (1.3%) were not recorded automatically, the most common reason (0.8%) being that the incidence record was dated 1998, even though the case was incident in 1997. A total of 98.7% of all cases found were picked up by the automated system. A completeness figure of 98.7% indicates that the automatic procedure is a valid alternative to manual methods for routine case generation. The fact that 59% of cases were registered automatically indicates that the system can speed up data production and enhance registry efficiency.
Affiliation(s)
- Paolo Contiero
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy.
15
Tagliabue G, Tessandori R, Caramaschi F, Fabiano S, Maghini A, Tittarelli A, Vergani D, Bellotti M, Pisani S, Gambino ML, Frassoldi E, Costa E, Gada D, Crosignani P, Contiero P. Descriptive epidemiology of selected birth defects, areas of Lombardy, Italy, 1999. Popul Health Metr 2007; 5:4. [PMID: 17531093 PMCID: PMC1894780 DOI: 10.1186/1478-7954-5-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 09/18/2006] [Accepted: 05/25/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Birth defects are a leading cause of neonatal and infant mortality in Italy, however little is known of the etiology of most defects. Improvements in diagnosis have revealed increasing numbers of clinically insignificant defects, while improvements in treatment have increased the survival of those with more serious and complex defects. For etiological studies, prevention, and management, it is important to have population-based monitoring which provides reliable data on the prevalence at birth of such defects. METHODS We recently initiated population-based birth defect monitoring in the Provinces of Mantova, Sondrio and Varese of the Region of Lombardy, northern Italy, and report data for the first year of operation (1999). The registry uses all-electronic source files (hospital discharge files, death certificates, regional health files, and pathology reports) and a proven case-generation methodology, which is described. The data were checked manually by consulting clinical records in hospitals. Completeness was checked against birth certificates by capture-recapture. Data on cases were coded according to the four-digit malformation codes of the International Classification of Diseases, Ninth Revision (ICD-9). We present data only on selected defects. RESULTS We found 246 selected birth defects in 12,008 live births in 1999, 148 among boys and 98 among girls. Congenital heart defects (particularly septal defects) were the most common (90.8/10,000), followed by defects of the genitourinary tract (34.1/10,000) (particularly hypospadias in boys), digestive system (23.3/10,000) and central nervous system (14.9/10,000), orofacial clefts (10.8/10,000) and Down syndrome (8.3/10,000). Completeness was satisfactory: analysis of birth certificates resulted in the addition of two birth defect cases to the registry. CONCLUSION This is the first population-based analysis on selected major birth defects in the Region. The high birth prevalences for septal heart defect and hypospadias are probably due to the inclusion of minor defects and lack of coding standardization; the latter problem also seems important for other defects. However the data produced are useful for estimating the demands made on the health system by babies with birth defects.
Affiliation(s)
- Giovanna Tagliabue
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy
- Sabrina Fabiano
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy
- Anna Maghini
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy
- Andrea Tittarelli
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy
- Daniele Vergani
- Cardiology Service, Presidio Ospedaliero Vittore Buzzi, Milano, Italy
- Maria Bellotti
- Department of Obstetrics and Gynecology, DMCO San Paolo, University of Milan, Italy
- Emanuela Frassoldi
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy
- Enrica Costa
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy
- Daniela Gada
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy
- Paolo Crosignani
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy
- Paolo Contiero
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan, Italy
16
Tagliabue G, Maghini A, Fabiano S, Tittarelli A, Frassoldi E, Costa E, Nobile S, Codazzi T, Crosignani P, Tessandori R, Contiero P. Consistency and accuracy of diagnostic cancer codes generated by automated registration: comparison with manual registration. Popul Health Metr 2006; 4:10. [PMID: 17007640 PMCID: PMC1592124 DOI: 10.1186/1478-7954-4-10] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 05/22/2006] [Accepted: 09/28/2006] [Indexed: 11/26/2022]
Abstract
BACKGROUND Automated procedures are increasingly used in cancer registration, and it is important that the data produced are systematically checked for consistency and accuracy. We evaluated an automated procedure for cancer registration adopted by the Lombardy Cancer Registry in 1997, comparing automatically-generated diagnostic codes with those produced manually over one year (1997). METHODS The automatically generated cancer cases were produced by Open Registry algorithms. For manual registration, trained staff consulted clinical records, pathology reports and death certificates. The social security code, present and checked in both databases in all cases, was used to match the files in the automatic and manual databases. The cancer cases generated by the two methods were compared by manual revision. RESULTS The automated procedure generated 5027 cases: 2959 (59%) were accepted automatically and 2068 (41%) were flagged for manual checking. Among the cases accepted automatically, discrepancies in data items (surname, first name, sex and date of birth) constituted 8.5% of cases, and discrepancies in the first three digits of the ICD-9 code constituted 1.6%. Among flagged cases, cancers of female genital tract, hematopoietic system, metastatic and ill-defined sites, and oropharynx predominated. The usual reasons were use of specific vs. generic codes, presence of multiple primaries, and use of extranodal vs. nodal codes for lymphomas. The percentage of automatically accepted cases ranged from 83% for breast and thyroid cancers to 13% for metastatic and ill-defined cancer sites. CONCLUSION Since 59% of cases were accepted automatically and contained relatively few, mostly trivial discrepancies, the automatic procedure is efficient for routine case generation, effectively cutting the workload required for routine case checking by this amount. Among cases not accepted automatically, discrepancies were mainly due to variations in coding practice.
Affiliation(s)
- Giovanna Tagliabue
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy
- Anna Maghini
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy
- Sabrina Fabiano
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy
- Andrea Tittarelli
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy
- Emanuela Frassoldi
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy
- Enrica Costa
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy
- Silvia Nobile
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy
- Tiziana Codazzi
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy
- Paolo Crosignani
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy
- Roberto Tessandori
- Province of Sondrio Health Authority, Via Stelvio 35A, 23100, Sondrio, Italy
- Paolo Contiero
- Cancer Registry Division, Istituto Nazionale per lo Studio e la Cura dei Tumori, Via Venezian 1, 20133 Milan, Italy