Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Tittarelli A, Tagliabue G, Maghini A, Fabiano S, Crosignani P, Tessandori R, Contiero P. The EpiLink Record Linkage Software. Methods Inf Med 2018. [DOI: 10.1055/s-0038-1633924] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 10/17/2022]

Cited by Other Article(s)

Bressler CJ, Malthaner L, Pondel N, Letson MM, Kline D, Leonard JC. Identifying Children at Risk for Maltreatment Using Emergency Medical Services' Data: An Exploratory Study. CHILD MALTREATMENT 2024;29:37-46. [PMID: 36205182 DOI: 10.1177/10775595221127925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]

Jiao Y, Lesueur F, Azencott CA, Laurent M, Mebirouk N, Laborde L, Beauvallet J, Dondon MG, Eon-Marchais S, Laugé A, Noguès C, Andrieu N, Stoppa-Lyonnet D, Caputo SM. A new hybrid record linkage process to make epidemiological databases interoperable: application to the GEMO and GENEPSO studies involving BRCA1 and BRCA2 mutation carriers. BMC Med Res Methodol 2021;21:155. [PMID: 34325649 PMCID: PMC8320036 DOI: 10.1186/s12874-021-01299-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2020] [Accepted: 04/29/2021] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Linking independent sources of data describing the same individuals enables innovative epidemiological and health studies but requires a robust record linkage approach. We describe a hybrid record linkage process to link databases from two independent ongoing French national studies, GEMO (Genetic Modifiers of BRCA1 and BRCA2), which focuses on the identification of genetic factors modifying cancer risk of BRCA1 and BRCA2 mutation carriers, and GENEPSO (prospective cohort of BRCAx mutation carriers), which focuses on environmental and lifestyle risk factors.

METHODS

To identify as many as possible of the individuals participating in the two studies but not registered by a shared identifier, we combined probabilistic record linkage (PRL) and supervised machine learning (ML). This approach (named "PRL + ML") pooled the candidate matches identified by both approaches. We built the ML model using the gold standard on a first version of the two databases as a training dataset. This gold standard was obtained from PRL-derived matches verified by an exhaustive manual review.

RESULTS

The Random Forest (RF) algorithm showed the highest recall (0.985) among six widely used ML algorithms: RF, Bagged trees, AdaBoost, Support Vector Machine, Neural Network. Therefore, RF was selected to build the ML model since our goal was to identify the maximum number of true matches. Our combined linkage PRL + ML showed a higher recall (range 0.988-0.992) than either PRL (range 0.916-0.991) or ML (0.981) alone. It identified 1995 individuals participating in both GEMO (6375 participants) and GENEPSO (4925 participants).
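The combination step described above can be read as a set union of candidate pairs, which is one plausible interpretation rather than the authors' published code. The sketch below uses hypothetical record-pair identifiers (not study data) to show why pooling two linkage methods can only keep or raise recall against a gold standard.

```python
def recall(predicted, gold):
    """Fraction of gold-standard matches that appear among the predicted matches."""
    if not gold:
        raise ValueError("gold standard must not be empty")
    return len(predicted & gold) / len(gold)


# Hypothetical (GEMO id, GENEPSO id) pairs, invented for illustration only.
gold = {("g1", "p1"), ("g2", "p2"), ("g3", "p3"), ("g4", "p4")}
prl_matches = {("g1", "p1"), ("g2", "p2"), ("g9", "p9")}  # includes one false match
ml_matches = {("g2", "p2"), ("g3", "p3")}

# Assumed combination rule: keep every pair proposed by either method.
combined = prl_matches | ml_matches

print(recall(prl_matches, gold))  # 0.5
print(recall(ml_matches, gold))   # 0.5
print(recall(combined, gold))     # 0.75
```

A union keeps the false matches of both methods as well, so precision may drop; this is consistent with the abstract's stated goal of maximizing the number of true matches.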

CONCLUSIONS

Our hybrid linkage process represents an efficient tool for linking GEMO and GENEPSO. It may be generalizable to other epidemiological studies involving other databases and registries.

Affiliation(s)

Yue Jiao: Department of Genetics, Institut Curie, PSL Research University, Paris, France; Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
Fabienne Lesueur: Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
Chloé-Agathe Azencott: Inserm, U900, Paris, France; Mines ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, France
Maïté Laurent: Department of Genetics, Institut Curie, PSL Research University, Paris, France
Noura Mebirouk: Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
Lilian Laborde: Institut Paoli-Calmettes, Centre de Traitement des Données IPC-PACA, Département de la Recherche Clinique et de l'Innovation, Marseille, France
Juana Beauvallet: Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
Marie-Gabrielle Dondon: Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
Séverine Eon-Marchais: Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
Anthony Laugé: Department of Genetics, Institut Curie, PSL Research University, Paris, France
Catherine Noguès: Institut Paoli-Calmettes, Département d'Anticipation et de Suivi du Cancer, Oncogénétique clinique, Marseille, France; Inserm, U830, Université Paris Descartes, Paris, France; Aix Marseille Univ, INSERM, IRD, SESSTIM, Sciences Economiques et Sociales de la Santé & Traitement de l'Information Médicale, Marseille, France
Nadine Andrieu: Inserm, U900, Paris, France; Institut Curie, PSL Research University, Mines ParisTech, Paris, France
Dominique Stoppa-Lyonnet: Department of Genetics, Institut Curie, PSL Research University, Paris, France; Paris University, Paris, France; Inserm, U830, Paris, France
Sandrine M Caputo: Department of Genetics, Institut Curie, PSL Research University, Paris, France

Rohde F, Franke M, Sehili Z, Lablans M, Rahm E. Optimization of the Mainzelliste software for fast privacy-preserving record linkage. J Transl Med 2021;19:33. [PMID: 33451317 PMCID: PMC7809773 DOI: 10.1186/s12967-020-02678-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/05/2020] [Accepted: 12/14/2020] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Data analysis for biomedical research often requires a record linkage step to identify records from multiple data sources referring to the same person. Due to the lack of unique personal identifiers across these sources, record linkage relies on the similarity of personal data such as first and last names or birth dates. However, the exchange of such identifying data with a third party, as is the case in record linkage, is generally subject to strict privacy requirements. This problem is addressed by privacy-preserving record linkage (PPRL) and pseudonymization services. Mainzelliste is an open-source record linkage and pseudonymization service used to carry out PPRL processes in real-world use cases.

METHODS

We evaluate the linkage quality and performance of the linkage process using several real and near-real datasets with different properties w.r.t. size and error-rate of matching records. We conduct a comparison between (plaintext) record linkage and PPRL based on encoded records (Bloom filters). Furthermore, since the Mainzelliste software offers no blocking mechanism, we extend it by phonetic blocking as well as novel blocking schemes based on locality-sensitive hashing (LSH) to improve runtime for both standard and privacy-preserving record linkage.

RESULTS

The Mainzelliste achieves high linkage quality for PPRL using field-level Bloom filters due to the use of an error-tolerant matching algorithm that can handle variances in names, in particular missing or transposed name compounds. However, due to the absence of blocking, the runtimes are unacceptable for real use cases with larger datasets. The newly implemented blocking approaches improve runtimes by orders of magnitude while retaining high linkage quality.

CONCLUSION

We conduct the first comprehensive evaluation of the record linkage facilities of the Mainzelliste software and extend it with blocking methods to improve its runtime. We observed a very high linkage quality for both plaintext and encoded data, even in the presence of errors. The new blocking methods improve runtime by orders of magnitude, thus facilitating use in research projects with large datasets and many participants.
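To make the idea of field-level Bloom filter encoding concrete, here is a minimal sketch of the general technique: a name is split into character bigrams, each bigram sets a few positions in a bit array through keyed hashes, and two encodings are compared with the Dice coefficient. This is not the Mainzelliste implementation; the filter size, number of hash functions, the key, and all function names are assumptions chosen for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-shared-key"  # assumption: agreed between data owners


def bigrams(value):
    """Character bigrams of a padded, lower-cased field value."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, size=1024, num_hashes=4):
    """Return the set of bit positions that the field value switches on."""
    bits = set()
    for gram in bigrams(value):
        for seed in range(num_hashes):
            message = f"{seed}:{gram}".encode("utf-8")
            digest = hmac.new(SECRET_KEY, message, hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a, b):
    """Dice coefficient of two sets of bit positions (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


sim_close = dice(bloom_encode("Meyer"), bloom_encode("Meier"))
sim_far = dice(bloom_encode("Meyer"), bloom_encode("Schmidt"))
print(sim_close > sim_far)
```

Because similar spellings share most bigrams, their encodings share most bit positions, which is what allows error-tolerant matching without exchanging the plaintext names.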

Stammler S, Kussel T, Schoppmann P, Stampe F, Tremper G, Katzenbeisser S, Hamacher K, Lablans M. Mainzelliste SecureEpiLinker (MainSEL): Privacy-Preserving Record Linkage using Secure Multi-Party Computation. Bioinformatics 2020;38:1657-1668. [PMID: 32871006 PMCID: PMC8896632 DOI: 10.1093/bioinformatics/btaa764] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/25/2019] [Revised: 07/24/2020] [Accepted: 08/25/2020] [Indexed: 11/17/2022]

Comparing record linkage software programs and algorithms using real-world data. PLoS One 2019;14:e0221459. [PMID: 31550255 PMCID: PMC6759179 DOI: 10.1371/journal.pone.0221459] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/26/2018] [Accepted: 08/08/2019] [Indexed: 01/17/2023]

Abstract

Linkage of medical databases, including insurer claims and electronic health records (EHRs), is increasingly common. However, few studies have investigated the behavior and output of linkage software. To determine how linkage quality is affected by different algorithms, blocking variables, methods for string matching and weight determination, and decision rules, we compared the performance of 4 nonproprietary linkage software packages linking patient identifiers from noninteroperable inpatient and outpatient EHRs. We linked datasets using first and last name, gender, and date of birth (DOB). We evaluated DOB and year of birth (YOB) as blocking variables and used exact and inexact matching methods. We compared the weights assigned to record pairs and evaluated how matching weights corresponded to a gold standard, medical record number. Deduplicated datasets contained 69,523 inpatient and 176,154 outpatient records, respectively. Linkage runs blocking on DOB produced weights ranging in number from 8 for exact matching to 64,273 for inexact matching. Linkage runs blocking on YOB produced 8 to 916,806 weights. Exact matching matched record pairs with identical test characteristics (sensitivity 90.48%, specificity 99.78%) for the highest ranked group, but algorithms differentially prioritized certain variables. Inexact matching behaved more variably, leading to dramatic differences in sensitivity (range 0.04–93.36%) and positive predictive value (PPV) (range 86.67–97.35%), even for the most highly ranked record pairs. Blocking on DOB led to higher PPV of highly ranked record pairs. An ensemble approach based on averaging scaled matching weights led to modestly improved accuracy. In summary, we found few differences in the rankings of record pairs with the highest matching weights across 4 linkage packages. Performance was more consistent for exact string matching than for inexact string matching. 
Most methods and software packages performed similarly when comparing matching accuracy with the gold standard. In some settings, an ensemble matching approach may outperform individual linkage algorithms.
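The effect of the blocking variable on candidate pairs, sensitivity, and PPV can be illustrated with a toy example. The sketch below is not taken from any of the four packages evaluated; the records are invented, matching is exact on first and last name, and the `mrn` field plays the role of the gold standard as in the abstract.

```python
from collections import defaultdict


def candidate_pairs(left, right, block_key):
    """Pair only records that share the same blocking key."""
    index = defaultdict(list)
    for rec in right:
        index[block_key(rec)].append(rec)
    return [(l, r) for l in left for r in index.get(block_key(l), [])]


def evaluate(left, right, block_key):
    """Return (number of candidate pairs, sensitivity, PPV) for exact name matching."""
    pairs = candidate_pairs(left, right, block_key)
    links = [(l, r) for l, r in pairs
             if (l["first"], l["last"]) == (r["first"], r["last"])]
    true_pos = sum(l["mrn"] == r["mrn"] for l, r in links)
    true_total = sum(l["mrn"] == r["mrn"] for l in left for r in right)
    sensitivity = true_pos / true_total
    ppv = true_pos / len(links) if links else float("nan")
    return len(pairs), sensitivity, ppv


# Invented records; mrn is the gold-standard identifier.
inpatient = [
    {"first": "ann", "last": "lee", "dob": "1980-05-17", "mrn": 1},
    {"first": "bob", "last": "ray", "dob": "1975-01-02", "mrn": 2},
    {"first": "cara", "last": "diaz", "dob": "1990-11-30", "mrn": 3},
]
outpatient = [
    {"first": "ann", "last": "lee", "dob": "1980-05-17", "mrn": 1},
    {"first": "bob", "last": "ray", "dob": "1975-02-01", "mrn": 2},    # day/month swapped
    {"first": "cara", "last": "dias", "dob": "1990-11-30", "mrn": 3},  # spelling variant
    {"first": "ann", "last": "lee", "dob": "1980-09-09", "mrn": 4},    # different person
]

by_dob = evaluate(inpatient, outpatient, lambda r: r["dob"])
by_yob = evaluate(inpatient, outpatient, lambda r: r["dob"][:4])
print(by_dob)  # 2 candidate pairs, sensitivity 1/3, PPV 1.0
print(by_yob)  # 4 candidate pairs, sensitivity 2/3, PPV 2/3
```

In this toy data, the stricter DOB block yields fewer candidate pairs and a higher PPV, while the looser YOB block recovers the record with a transposed date at the cost of one false link.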

Bressler CJ, Letson MM, Kline D, McCarthy T, Davis J, Leonard JC. Characteristics of Neighborhoods Where Emergency Medical Services Encounter Children at Risk for Maltreatment. PREHOSP EMERG CARE 2019;23:672-682. [PMID: 30703337 DOI: 10.1080/10903127.2019.1573940] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]

Abstract

Objective: The objective of this study was to determine if neighborhood rates of pediatric Emergency Medical Services (EMS) encounters correlate with rates of child maltreatment reporting and if there are neighborhood-level risk factors for EMS encountering children with maltreatment reports. Methods: We conducted a retrospective cohort study using the electronic medical records of children ages <18 years who had Columbus Division of Fire EMS encounters between 2011 and 2015. We used Nationwide Children's Hospital electronic medical records to identify child maltreatment reports. The EMS scene addresses and home addresses associated with maltreatment reports were geocoded independently and rates for each Census tract were calculated. The maltreatment reports were matched to the EMS encounters using name, gender, and date of birth. Rates of EMS encounters with children that had a maltreatment report were calculated for each Census tract. Census tract demographic information was obtained from the American Community Survey. Bayesian conditional autoregressive Poisson models were used to calculate rate ratios for census tract variables to determine their relationship to EMS encountering children with maltreatment reports. Results: A total of 44,002 EMS encounters and 4,298 maltreatment reports were included in the study. The Spearman correlation coefficient relating rates of EMS encounters to rates of maltreatment reports within census tracts was 0.72 (95% confidence interval, 0.65-0.77). Within the study period, a total of 1,134 EMS encounters were linked to 578 children with maltreatment reports. Poverty was the only independent risk factor for EMS encountering children with maltreatment reports. The multivariate analysis also identified protective factors, which included neighborhoods with higher proportions of residents who had bachelor's degrees, spoke a language other than English, and had the same residence the previous year. 
Conclusion: This study showed that in Franklin County, Ohio, neighborhoods with high EMS utilization had a strong positive correlation with areas that had high rates of child maltreatment reports. We also identified four neighborhood characteristics that were independently associated with EMS encountering children at risk for maltreatment (risk factor: poverty; protective factors: residents with college educations, non-English speaking households, and residents maintaining the same residence as the previous year).

Deterministic and Probabilistic Record Linkage: an Application to Primary Care Data. J Med Syst 2018;42:82. [PMID: 29569065 DOI: 10.1007/s10916-018-0944-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2018] [Accepted: 03/15/2018] [Indexed: 10/17/2022]

Smith D. Secure pseudonymisation for privacy-preserving probabilistic record linkage. JOURNAL OF INFORMATION SECURITY AND APPLICATIONS 2017. [DOI: 10.1016/j.jisa.2017.01.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]

Benedetto G, Prima AD, Sciacca S, Grosso G. Design, functionality, and validity of the SWInCaRe, a web-based application used to administer cancer registry records. Health Informatics J 2017;25:149-160. [PMID: 28438105 DOI: 10.1177/1460458217704253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]

Morgan AS, Marlow N, Costeloe K, Draper ES. Investigating increased admissions to neonatal intensive care in England between 1995 and 2006: data linkage study using Hospital Episode Statistics. BMC Med Res Methodol 2016;16:57. [PMID: 27206571 PMCID: PMC4875750 DOI: 10.1186/s12874-016-0152-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/21/2015] [Accepted: 04/30/2016] [Indexed: 11/10/2022]

Lablans M, Borg A, Ückert F. A RESTful interface to pseudonymization services in modern web applications. BMC Med Inform Decis Mak 2015;15:2. [PMID: 25656224 PMCID: PMC4350982 DOI: 10.1186/s12911-014-0123-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 09/19/2014] [Accepted: 12/12/2014] [Indexed: 12/02/2022]

Abstract

Background

Medical research networks rely on record linkage and pseudonymization to determine which records from different sources relate to the same patient. To establish informational separation of powers, the required identifying data are redirected to a trusted third party that has, in turn, no access to medical data. This pseudonymization service receives identifying data, compares them with a list of already reported patient records and replies with a (new or existing) pseudonym. We found existing solutions to be technically outdated, complex to implement or not suitable for internet-based research infrastructures. In this article, we propose a new RESTful pseudonymization interface tailored for use in web applications accessed by modern web browsers.

Methods

The interface is modelled as a resource-oriented architecture, which is based on the representational state transfer (REST) architectural style. We translated typical use-cases into resources to be manipulated with well-known HTTP verbs. Patients can be re-identified in real-time by authorized users’ web browsers using temporary identifiers. We encourage the use of PID strings for pseudonyms and the EpiLink algorithm for record linkage. As a proof of concept, we developed a Java Servlet as reference implementation.

Results

The following resources have been identified: Sessions allow data associated with a client to be stored beyond a single request while still maintaining statelessness. Tokens authorize for a specified action and thus allow the delegation of authentication. Patients are identified by one or more pseudonyms and carry identifying fields. Relying on HTTP calls alone, the interface is firewall-friendly. The reference implementation has proven to be production stable.

Conclusion

The RESTful pseudonymization interface fits the requirements of web-based scenarios and allows building applications that make pseudonymization transparent to the user using ordinary web technology. The open-source reference implementation implements the web interface as well as a scientifically grounded algorithm to generate non-speaking pseudonyms.
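The core behaviour of such a pseudonymization service (receive identifying data, compare it with the patients already known, reply with a new or existing pseudonym) can be sketched in a few lines. This is an in-memory illustration only: it is not the reference implementation's API, it omits HTTP, sessions, and tokens, and it uses exact matching on normalized fields where the article recommends the error-tolerant EpiLink algorithm. The class and method names are hypothetical.

```python
import secrets


class PseudonymService:
    """Toy trusted third party: holds identifying data, never medical data."""

    def __init__(self):
        self._patients = {}  # normalized identifying fields -> pseudonym

    @staticmethod
    def _normalize(first, last, dob):
        return (first.strip().lower(), last.strip().lower(), dob.strip())

    def get_or_create(self, first, last, dob):
        """Return (pseudonym, created); created is False for a known patient."""
        key = self._normalize(first, last, dob)
        if key in self._patients:
            return self._patients[key], False
        # Random token, so the pseudonym reveals nothing about the identifying data.
        pseudonym = secrets.token_hex(8)
        while pseudonym in self._patients.values():
            pseudonym = secrets.token_hex(8)
        self._patients[key] = pseudonym
        return pseudonym, True


service = PseudonymService()
first_reply = service.get_or_create("Ada", "Lovelace", "1815-12-10")
second_reply = service.get_or_create(" ada ", "LOVELACE", "1815-12-10")
other_reply = service.get_or_create("Alan", "Turing", "1912-06-23")

print(first_reply[0] == second_reply[0])  # True: same patient, same pseudonym
print(first_reply[0] == other_reply[0])   # False: different patient
```

In a resource-oriented design like the one described, a call of this kind would sit behind a patient resource and be authorized by a token; those layers are left out here to keep the matching step visible.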

Contiero P, Berrino F, Tagliabue G, Mastroianni A, Di Mauro MG, Fabiano S, Annulli M, Muti P. Fasting blood glucose and long-term prognosis of non-metastatic breast cancer: a cohort study. Breast Cancer Res Treat 2013;138:951-9. [PMID: 23568483 DOI: 10.1007/s10549-013-2519-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 02/13/2013] [Accepted: 04/01/2013] [Indexed: 10/27/2022]

Abstract

High circulating glucose has been associated with increased risk of breast cancer (BC). There may also be a link between serum glucose and prognosis in women treated for BC. We assessed the effect of peridiagnostic fasting blood glucose and body mass index (BMI) on long-term BC prognosis. We retrospectively investigated 1,261 women diagnosed and treated for stage I-III BC at the National Cancer Institute, Milan, in 1996, 1999 and 2000. Data on blood tests and follow-up were obtained by linking electronic archives, with follow-up to end of 2009. Multivariate Cox modelling estimated hazard ratios (HR) with 95% confidence intervals (CI) for distant metastasis, recurrence and death (all causes) in relation to categorized peridiagnostic fasting blood glucose and BMI. Mediation analysis investigated whether blood glucose mediated the BMI-breast cancer prognosis association. The risks of distant metastasis were significantly higher for all other quintiles compared to the lowest glucose quintile (reference <87 mg/dL) (respective HRs: 1.99, 95% CI 1.23-3.24; 1.85, 95% CI 1.14-3.0; 1.73, 95% CI 1.07-2.8; and 1.91, 95% CI 1.15-3.17). The risk of recurrence was significantly higher for all other glucose quintiles compared to the first. The risk of death was significantly higher than reference in the second, fourth and fifth quintiles. Women with BMI ≥ 25 kg/m² had significantly greater risks of recurrence and distant metastasis than those with BMI < 25 kg/m², irrespective of blood glucose. The increased risks remained invariant over a median follow-up of 9.5 years. Mediation analysis indicated that glucose and BMI had independent effects on BC prognosis. Peridiagnostic high fasting glucose and obesity predict worsened short- and long-term outcomes in BC patients. Maintaining healthy blood glucose levels and normal weight may improve prognosis.

Campbell KM. Impact of record-linkage methodology on performance indicators and multivariate relationships. J Subst Abuse Treat 2008;36:110-7. [PMID: 18657944 DOI: 10.1016/j.jsat.2008.05.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 02/02/2008] [Revised: 04/15/2008] [Accepted: 05/05/2008] [Indexed: 10/21/2022]

Contiero P, Tittarelli A, Maghini A, Fabiano S, Frassoldi E, Costa E, Gada D, Codazzi T, Crosignani P, Tessandori R, Tagliabue G. Comparison with manual registration reveals satisfactory completeness and efficiency of a computerized cancer registration system. J Biomed Inform 2008;41:24-32. [PMID: 17452020 DOI: 10.1016/j.jbi.2007.03.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 07/19/2006] [Revised: 02/02/2007] [Accepted: 03/13/2007] [Indexed: 11/18/2022]

Tagliabue G, Tessandori R, Caramaschi F, Fabiano S, Maghini A, Tittarelli A, Vergani D, Bellotti M, Pisani S, Gambino ML, Frassoldi E, Costa E, Gada D, Crosignani P, Contiero P. Descriptive epidemiology of selected birth defects, areas of Lombardy, Italy, 1999. Popul Health Metr 2007;5:4. [PMID: 17531093 PMCID: PMC1894780 DOI: 10.1186/1478-7954-5-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 09/18/2006] [Accepted: 05/25/2007] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Birth defects are a leading cause of neonatal and infant mortality in Italy; however, little is known of the etiology of most defects. Improvements in diagnosis have revealed increasing numbers of clinically insignificant defects, while improvements in treatment have increased the survival of those with more serious and complex defects. For etiological studies, prevention, and management, it is important to have population-based monitoring which provides reliable data on the prevalence at birth of such defects.

METHODS

We recently initiated population-based birth defect monitoring in the Provinces of Mantova, Sondrio and Varese of the Region of Lombardy, northern Italy, and report data for the first year of operation (1999). The registry uses all-electronic source files (hospital discharge files, death certificates, regional health files, and pathology reports) and a proven case-generation methodology, which is described. The data were checked manually by consulting clinical records in hospitals. Completeness was checked against birth certificates by capture-recapture. Data on cases were coded according to the four-digit malformation codes of the International Classification of Diseases, Ninth Revision (ICD-9). We present data only on selected defects.

RESULTS

We found 246 selected birth defects in 12,008 live births in 1999, 148 among boys and 98 among girls. Congenital heart defects (particularly septal defects) were the most common (90.8/10,000), followed by defects of the genitourinary tract (34.1/10,000) (particularly hypospadias in boys), digestive system (23.3/10,000) and central nervous system (14.9/10,000), orofacial clefts (10.8/10,000) and Down syndrome (8.3/10,000). Completeness was satisfactory: analysis of birth certificates resulted in the addition of two birth defect cases to the registry.

CONCLUSION

This is the first population-based analysis on selected major birth defects in the Region. The high birth prevalences for septal heart defect and hypospadias are probably due to the inclusion of minor defects and lack of coding standardization; the latter problem also seems important for other defects. However, the data produced are useful for estimating the demands made on the health system by babies with birth defects.
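The capture-recapture completeness check mentioned in the Methods can be illustrated with the classic two-source Lincoln-Petersen estimator. The abstract does not say which estimator the authors used, so this choice is an assumption, and the counts below are invented for the example rather than taken from the registry.

```python
def lincoln_petersen(n_source_a, n_source_b, n_both):
    """Estimate the total number of cases from two overlapping sources."""
    if n_both <= 0:
        raise ValueError("the two sources must share at least one case")
    return n_source_a * n_source_b / n_both


# Hypothetical counts: cases in the registry, cases found via birth
# certificates, and cases present in both sources.
registry_cases = 90
certificate_cases = 80
cases_in_both = 72

estimated_total = lincoln_petersen(registry_cases, certificate_cases, cases_in_both)
completeness = registry_cases / estimated_total

print(estimated_total)  # 100.0
print(completeness)     # 0.9
```

The estimator assumes the two sources capture cases independently and that cases can be matched reliably between them, which is itself a record linkage problem.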

Tagliabue G, Maghini A, Fabiano S, Tittarelli A, Frassoldi E, Costa E, Nobile S, Codazzi T, Crosignani P, Tessandori R, Contiero P. Consistency and accuracy of diagnostic cancer codes generated by automated registration: comparison with manual registration. Popul Health Metr 2006;4:10. [PMID: 17007640 PMCID: PMC1592124 DOI: 10.1186/1478-7954-4-10] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 05/22/2006] [Accepted: 09/28/2006] [Indexed: 11/26/2022]

Abstract

Background

Automated procedures are increasingly used in cancer registration, and it is important that the data produced are systematically checked for consistency and accuracy. We evaluated an automated procedure for cancer registration adopted by the Lombardy Cancer Registry in 1997, comparing automatically-generated diagnostic codes with those produced manually over one year (1997).

Methods

The automatically generated cancer cases were produced by Open Registry algorithms. For manual registration, trained staff consulted clinical records, pathology reports and death certificates. The social security code, present and checked in both databases in all cases, was used to match the files in the automatic and manual databases. The cancer cases generated by the two methods were compared by manual revision.

Results

The automated procedure generated 5027 cases: 2959 (59%) were accepted automatically and 2068 (41%) were flagged for manual checking. Among the cases accepted automatically, discrepancies in data items (surname, first name, sex and date of birth) constituted 8.5% of cases, and discrepancies in the first three digits of the ICD-9 code constituted 1.6%. Among flagged cases, cancers of female genital tract, hematopoietic system, metastatic and ill-defined sites, and oropharynx predominated. The usual reasons were use of specific vs. generic codes, presence of multiple primaries, and use of extranodal vs. nodal codes for lymphomas. The percentage of automatically accepted cases ranged from 83% for breast and thyroid cancers to 13% for metastatic and ill-defined cancer sites.

Conclusion

Since 59% of cases were accepted automatically and contained relatively few, mostly trivial discrepancies, the automatic procedure is efficient for routine case generation, effectively cutting the workload required for routine case checking by this amount. Among cases not accepted automatically, discrepancies were mainly due to variations in coding practice.
