1
Pilgram L, Meurers T, Malin B, Schaeffner E, Eckardt KU, Prasser F. The Costs of Anonymization: Case Study Using Clinical Data. J Med Internet Res 2024; 26:e49445. [PMID: 38657232] [DOI: 10.2196/49445]
Abstract
BACKGROUND Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that they are no longer reasonably related to a person. Yet, such alterations have the potential to influence the data set's statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not been broadly adopted in clinical practice. OBJECTIVE The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study. METHODS The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case-specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case-specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results. RESULTS Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics.
For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case-specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy. CONCLUSIONS Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case-specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data. TRIAL REGISTRATION German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR2-10.1093/ndt/gfr456.
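The reproducibility metric used in this study, the overlap of 95% CIs between results computed on original and anonymized data, can be sketched as follows. This is a minimal illustration with invented interval endpoints; normalizing the overlap by the original CI's length is an assumption, as the paper's exact definition may differ.

```python
def ci_overlap(original_ci, anonymized_ci):
    """Fraction of the original 95% CI's length that is covered by the
    CI estimated from the anonymized data (0 = disjoint, 1 = fully covered)."""
    lo = max(original_ci[0], anonymized_ci[0])
    hi = min(original_ci[1], anonymized_ci[1])
    return max(0.0, hi - lo) / (original_ci[1] - original_ci[0])

# Invented intervals: anonymization widened and shifted the estimate slightly.
print(ci_overlap((1.0, 2.0), (1.2, 2.4)))  # 0.8
```

Averaging this quantity over all reported estimates gives a single reproducibility score per anonymization configuration, which is how a value such as "average 95% CI overlap above 90%" can be read.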
Affiliation(s)
- Lisa Pilgram
  - Junior Digital Clinician Scientist Program, Biomedical Innovation Academy, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Department of Nephrology and Medical Intensive Care, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Thierry Meurers
  - Medical Informatics Group, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Bradley Malin
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Elke Schaeffner
  - Institute of Public Health, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Kai-Uwe Eckardt
  - Department of Nephrology and Medical Intensive Care, Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Department of Nephrology and Hypertension, Universitätsklinikum Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Fabian Prasser
  - Medical Informatics Group, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
2
Moffatt C, Leshin J. Best Practices in Evolving Privacy Frameworks for Patient Age Data: Census Data Study. JMIR Form Res 2024; 8:e47248. [PMID: 38526530] [PMCID: PMC11002729] [DOI: 10.2196/47248]
Abstract
BACKGROUND Over the previous 4 decennial censuses, the population of the United States has grown older, with the proportion of individuals aged at least 90 years in the 2010 census being more than 2 and a half times what it was in the 1980 census. This suggests that the threshold for constraining age introduced in the Safe Harbor method of the Health Insurance Portability and Accountability Act (HIPAA) in 1996 may be increased without exceeding the original levels of risk. This is desirable to maintain or even increase the utility of affected data sets without compromising privacy. OBJECTIVE In light of the upcoming release of 2020 census data, this study presents a straightforward recipe for updating age-constrained thresholds in the context of new census data and derives recommendations for new thresholds from the 2010 census. METHODS Using census data dating back to 1980, we used group size considerations to analyze the risk associated with various maximum age thresholds over time. We inferred the level of risk of the age cutoff of 90 years at the time of HIPAA's inception in 1996 and used this as a baseline from which to recommend updated cutoffs. RESULTS The maximum age threshold may be increased by at least 2 years without exceeding the levels of risk conferred in HIPAA's original recommendations. Moreover, in the presence of additional information that restricts the population in question to a known subgroup with increased longevity (for example, restricting to female patients), the threshold may be increased further. CONCLUSIONS Increasing the maximum age threshold would enable the data user to gain more utility from the data without introducing risk beyond what was originally envisioned with the enactment of HIPAA. Going forward, a recurring update of such thresholds is advised, in line with the considerations detailed in the paper.
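The group-size reasoning behind this recommendation can be sketched as follows: everyone at or above the cutoff is collapsed into one top-coded group, and reidentification risk shrinks as that group's population share grows. The age counts below are invented for illustration (with an exaggerated aging trend), not real census figures, and the `max_safe_cutoff` helper is an assumed simplification of the paper's method.

```python
def top_group_share(age_counts, cutoff):
    """Population share at or above the age cutoff, i.e. the relative size
    of the top-coded group that a Safe Harbor-style rule would create."""
    total = sum(age_counts.values())
    return sum(n for age, n in age_counts.items() if age >= cutoff) / total

def max_safe_cutoff(age_counts, baseline_share):
    """Highest cutoff whose top-coded group is still at least as large
    (as a population share) as the baseline group was."""
    safe = [c for c in sorted(age_counts)
            if top_group_share(age_counts, c) >= baseline_share]
    return max(safe) if safe else None

# Invented counts per age at the tail of the distribution (not census data).
counts_baseline = {88: 500, 89: 400, 90: 300, 91: 200, 92: 100}  # older census
counts_current = {88: 400, 89: 400, 90: 450, 91: 500, 92: 550}   # newer census

baseline = top_group_share(counts_baseline, 90)   # risk level at the old cutoff
print(max_safe_cutoff(counts_current, baseline))  # 91
```

With these invented numbers the 91+ group in the newer census is at least as large a share of the population as the 90+ group was in the baseline census, so the cutoff can rise by a year at the same risk level.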
3
Lee YQ, Chen CT, Chen CC, Lee CH, Chen P, Wu CS, Dai HJ. Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in Deidentifying Chinese-English Mixed Clinical Text: Development and Validation Study. J Med Internet Res 2024; 26:e48443. [PMID: 38271060] [PMCID: PMC10853853] [DOI: 10.2196/48443]
Abstract
BACKGROUND The widespread use of electronic health records in the clinical and biomedical fields makes the removal of protected health information (PHI) essential to maintain privacy. However, a significant portion of information is recorded in unstructured textual forms, posing a challenge for deidentification. In multilingual countries, medical records may be written in a mixture of more than one language, referred to as code mixing. Most current clinical natural language processing techniques are designed for monolingual text, and there is a need to address the deidentification of code-mixed text. OBJECTIVE The aim of this study was to investigate the effectiveness and underlying mechanism of fine-tuned pretrained language models (PLMs) in identifying PHI in the code-mixed context. Additionally, we aimed to evaluate the potential of prompting large language models (LLMs) to recognize PHI in a zero-shot manner. METHODS We compiled the first clinical code-mixed deidentification data set, consisting of text written in Chinese and English. We explored the effectiveness of fine-tuned PLMs for recognizing PHI in code-mixed content, with a focus on whether PLMs exploit naming regularity and mention coverage to achieve superior performance, by probing the developed models' outputs to examine their decision-making process. Furthermore, we investigated the potential of prompt-based in-context learning of LLMs for recognizing PHI in code-mixed text. RESULTS The developed methods were evaluated on a code-mixed deidentification corpus of 1700 discharge summaries. We observed that different PHI types tended to occur in different types of language-mixed sentences, and that PLMs could effectively recognize PHI by exploiting the learned naming regularity. However, the models may exhibit suboptimal results when this regularity is weak or when mentions contain unknown words that the representations cannot generate well.
We also found that the availability of code-mixed training instances is essential for the model's performance. Furthermore, the LLM-based deidentification method is a feasible and appealing approach that can be controlled and enhanced through natural language prompts. CONCLUSIONS The study contributes to understanding the underlying mechanism of PLMs in addressing the deidentification process in the code-mixed context and highlights the significance of incorporating code-mixed training instances into the model training phase. To support the advancement of research, we created a manipulated subset of the resynthesized data set, which is available for research purposes. Based on the compiled data set, we found that the LLM-based deidentification method is a feasible approach, but carefully crafted prompts are essential to avoid unwanted output. However, the use of such methods in the hospital setting requires careful consideration of data security and privacy concerns. Further research could explore the augmentation of PLMs and LLMs with external knowledge to improve their strength in recognizing rare PHI.
Affiliation(s)
- You-Qian Lee
- Dialogue System Technical Department, Intelligent Robot, Asustek Computer Inc, Taipei, Taiwan
- Intelligent System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - Ching-Tai Chen
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan
- Center for Precision Health Research, Asia University, Taichung, Taiwan
| | - Chien-Chang Chen
- Electromagnetic Sensing Control and AI Computing System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - Chung-Hong Lee
- Knowledge Discovery and Data Mining Lab, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
| | - Peitsz Chen
- Department of Chemical Engineering, Feng Chia University, Taichung, Taiwan
| | - Chi-Shin Wu
- National Center for Geriatrics and Welfare Research, National Health Research Institutes, Zhunan, Taiwan
| | - Hong-Jie Dai
- Intelligent System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
- National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan
- School of Post-Baccalaureate Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Center for Big Data Research, Kaohsiung Medical University, Kaohsiung, Taiwan
| |
4
Liu J, Gupta S, Chen A, Wang CK, Mishra P, Dai HJ, Wong ZSY, Jonnagaddala J. OpenDeID Pipeline for Unstructured Electronic Health Record Text Notes Based on Rules and Transformers: Deidentification Algorithm Development and Validation Study. J Med Internet Res 2023; 25:e48145. [PMID: 38055317] [PMCID: PMC10733816] [DOI: 10.2196/48145]
Abstract
BACKGROUND Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical domains. However, before such records can be used for research purposes, sensitive health information (SHI) must in many cases be removed to protect patient privacy. Rule-based and machine learning-based methods have been shown to be effective in deidentification. However, very few studies have investigated the combination of transformer-based language models and rules. OBJECTIVE The objective of this study is to develop a hybrid deidentification pipeline for Australian EHR text notes using rules and transformers. The study also aims to investigate the impact of pretrained word embedding and transformer-based language models. METHODS In this study, we present a hybrid deidentification pipeline called OpenDeID, which is developed using an Australian multicenter EHR-based corpus called the OpenDeID Corpus. The OpenDeID corpus consists of 2100 pathology reports with 38,414 SHI entities from 1833 patients. The OpenDeID pipeline incorporates a hybrid approach of associative rules, supervised deep learning, and pretrained language models. RESULTS OpenDeID achieved a best F1-score of 0.9659 by fine-tuning the Discharge Summary BioBERT model and incorporating various preprocessing and postprocessing rules. The OpenDeID pipeline has been deployed at a large tertiary teaching hospital and has processed over 8000 unstructured EHR text notes in real time. CONCLUSIONS The OpenDeID pipeline is a hybrid pipeline for deidentifying SHI entities in unstructured EHR text notes, evaluated on a large multicenter corpus. External validation will be undertaken as part of our future work to evaluate its effectiveness.
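Strict entity-level evaluation, the metric family reported for pipelines such as OpenDeID, counts a predicted entity as correct only when both its span and type exactly match a gold annotation. A minimal sketch with invented annotations (entities as (start, end, type) tuples):

```python
def strict_entity_scores(gold, pred):
    """Strict entity-level precision/recall/F1: a predicted entity counts
    only on an exact (start, end, type) match with a gold annotation."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Invented annotations: (start offset, end offset, SHI/PHI type).
gold = [(0, 10, "NAME"), (25, 35, "DATE"), (50, 58, "ID")]
pred = [(0, 10, "NAME"), (25, 35, "DATE"), (60, 70, "ID")]  # one span off
precision, recall, f1 = strict_entity_scores(gold, pred)
print(round(f1, 4))  # 0.6667
```

Under this strict regime, a prediction with the right type but a boundary off by one character scores zero, which is why strict F1 is a conservative measure of deidentification quality.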
Affiliation(s)
- Jiaxing Liu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Aipeng Chen
  - School of Computer Science and Engineering, UNSW, Sydney, Australia
- Chen-Kai Wang
  - Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Hong-Jie Dai
  - School of Post-Baccalaureate Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Zoie Shui-Yee Wong
  - Graduate School of Public Health, St. Luke's International University, Tokyo, Japan
  - The Kirby Institute, University of New South Wales, Sydney, Australia
- Jitendra Jonnagaddala
  - School of Population Health, UNSW Sydney, Kensington, Australia
  - NMC Royal Hospital, Khalifa City, Abu Dhabi, United Arab Emirates
5
Patel R, Provenzano D, Loew M. Anonymization and validation of three-dimensional volumetric renderings of computed tomography data using commercially available T1-weighted magnetic resonance imaging-based algorithms. J Med Imaging (Bellingham) 2023; 10:066501. [PMID: 38074629] [PMCID: PMC10704182] [DOI: 10.1117/1.jmi.10.6.066501]
Abstract
Purpose Previous studies have demonstrated that three-dimensional (3D) volumetric renderings of magnetic resonance imaging (MRI) brain data can be used to identify patients using facial recognition. We have shown that facial features can be identified on simulation-computed tomography (CT) images for radiation oncology and mapped to face images from a database. We aim to determine whether CT images can be anonymized using anonymization software that was designed for T1-weighted MRI data. Approach Our study examines (1) the ability of off-the-shelf anonymization algorithms to anonymize CT data and (2) the ability of facial recognition algorithms to identify whether faces could be detected from a database of facial images. Our study generated 3D renderings from 57 head CT scans from The Cancer Imaging Archive database. Data were anonymized using AFNI (deface, reface, and 3Dskullstrip) and FSL's BET. Anonymized data were compared to the original renderings and passed through facial recognition algorithms (VGG-Face, FaceNet, DLib, and SFace) using a facial database (labeled faces in the wild) to determine what matches could be found. Results Our study found that all modules were able to process CT data and that AFNI's 3Dskullstrip and FSL's BET data consistently showed lower reidentification rates compared to the original. Conclusions The results from this study highlight the potential usage of anonymization algorithms as a clinical standard for deidentifying brain CT data. Our study demonstrates the importance of continued vigilance for patient privacy in publicly shared datasets and the importance of continued evaluation of anonymization methods for CT data.
Affiliation(s)
- Rahil Patel
  - George Washington University School of Engineering and Applied Science, Department of Biomedical Engineering, Washington, District of Columbia, United States
- Destie Provenzano
  - George Washington University School of Engineering and Applied Science, Department of Biomedical Engineering, Washington, District of Columbia, United States
- Murray Loew
  - George Washington University School of Engineering and Applied Science, Department of Biomedical Engineering, Washington, District of Columbia, United States
6
Liu L, Perez-Concha O, Nguyen A, Bennett V, Blake V, Gallego B, Jorm L. Web-Based Application Based on Human-in-the-Loop Deep Learning for Deidentifying Free-Text Data in Electronic Medical Records: Development and Usability Study. Interact J Med Res 2023; 12:e46322. [PMID: 37624624] [PMCID: PMC10492176] [DOI: 10.2196/46322]
Abstract
BACKGROUND The narrative free-text data in electronic medical records (EMRs) contain valuable clinical information for analysis and research to inform better patient care. However, the release of free text for secondary use is hindered by concerns surrounding personally identifiable information (PII), as protecting individuals' privacy is paramount. Therefore, it is necessary to deidentify free text to remove PII. Manual deidentification is a time-consuming and labor-intensive process. Numerous automated deidentification approaches and systems have been attempted to overcome this challenge over the past decade. OBJECTIVE We sought to develop an accurate, web-based system for deidentifying free text (DEFT), which can be readily and easily adopted in real-world settings for deidentification of free text in EMRs. The system has several key features, including a simple and task-focused web user interface, customized PII types, use of a state-of-the-art deep learning model for tagging PII in free text, preannotation by an interactive learning loop, rapid manual annotation with autosave, support for project management and team collaboration, user access control, and central data storage. METHODS DEFT comprises frontend and backend modules and communicates with central data storage through filesystem path access. The frontend web user interface provides end users with a user-friendly workspace for managing and annotating free text. The backend module processes the requests from the frontend and performs the relevant persistence operations. DEFT manages the deidentification workflow as a project, which can contain one or more data sets. Customized PII types and user access control can also be configured. The deep learning model is based on a Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) with RoBERTa as the word embedding layer.
The interactive learning loop is further integrated into DEFT to speed up the deidentification process and increase its performance over time. RESULTS DEFT has many advantages over existing deidentification systems in terms of its support for project management, user access control, data management, and an interactive learning process. On the 2014 i2b2 data set, DEFT obtained the highest performance compared to 5 benchmark models, with a microaverage strict entity-level recall of 0.9563 and an F1-score of 0.9627. In a real-world use case of deidentifying clinical notes extracted from 1 referral hospital in Sydney, New South Wales, Australia, DEFT achieved a high microaverage strict entity-level F1-score of 0.9507 on a corpus of 600 annotated clinical notes. Moreover, the manual annotation process with preannotation demonstrated a 43% increase in work efficiency compared to the process without preannotation. CONCLUSIONS DEFT is designed for health domain researchers and data custodians to easily deidentify free text in EMRs. DEFT supports an interactive learning loop, and end users with minimal technical knowledge can perform the deidentification work with only a shallow learning curve.
Affiliation(s)
- Leibo Liu
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
- Oscar Perez-Concha
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
- Anthony Nguyen
  - Australian e-Health Research Centre (AEHRC), Commonwealth Scientific and Industrial Research Organisation (CSIRO), Brisbane, Australia
- Vicki Bennett
  - Metadata, Information Management and Classifications Unit (MIMCU), Australian Institute of Health and Welfare, Canberra, Australia
- Victoria Blake
  - Eastern Heart Clinic, Prince of Wales Hospital, Randwick, Australia
- Blanca Gallego
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
- Louisa Jorm
  - Centre for Big Data Research in Health, University of New South Wales, Sydney, Australia
7
Chambon PJ, Wu C, Steinkamp JM, Adleberg J, Cook TS, Langlotz CP. Automated deidentification of radiology reports combining transformer and "hide in plain sight" rule-based methods. J Am Med Inform Assoc 2023; 30:318-328. [PMID: 36416419] [PMCID: PMC9846681] [DOI: 10.1093/jamia/ocac219]
Abstract
OBJECTIVE To develop an automated deidentification pipeline for radiology reports that detects protected health information (PHI) entities and replaces them with realistic surrogates "hiding in plain sight." MATERIALS AND METHODS In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as the i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches, data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall, and F1 score, as well as paired samples Wilcoxon tests. RESULTS Our best PHI detection model achieves an F1 score of 97.9 on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall in detecting the core of each PHI span. DISCUSSION Our model outperforms all deidentifiers it was compared to on all test sets, as well as human labelers on i2b2 2014 data. It enables accurate and automatic deidentification of radiology reports. CONCLUSIONS A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.
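The "hide in plain sight" idea can be sketched as follows: each detected PHI span is replaced with a realistic surrogate, so that any PHI the detector misses blends in with the substitutions rather than standing out as the only real identifier. The surrogate pools, report text, and span offsets below are invented for illustration; real systems typically also keep surrogates consistent within a document and shift dates by a per-patient offset.

```python
import random

# Invented surrogate pools; a real system would draw from large name/date lists.
SURROGATES = {
    "NAME": ["John Carter", "Maria Lopez"],
    "DATE": ["03/14/2021", "11/02/2019"],
}

def hide_in_plain_sight(text, spans, seed=0):
    """Replace each detected PHI span (start, end, type) with a realistic
    surrogate, working right-to-left so earlier offsets stay valid."""
    rng = random.Random(seed)
    for start, end, phi_type in sorted(spans, reverse=True):
        text = text[:start] + rng.choice(SURROGATES[phi_type]) + text[end:]
    return text

report = "Patient Jane Doe was scanned on 01/05/2020."
spans = [(8, 16, "NAME"), (32, 42, "DATE")]  # invented detector output
print(hide_in_plain_sight(report, spans))
```

Because the output still reads like an ordinary report, a leaked span that the detector missed is much harder for an attacker to distinguish from a surrogate.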
Affiliation(s)
- Pierre J Chambon
  - Department of Radiology, Stanford University, Stanford, California, USA
  - Department of Applied Mathematics and Engineering, Paris-Saclay University, Ecole Centrale Paris, Paris, France
- Christopher Wu
  - Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jackson M Steinkamp
  - Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason Adleberg
  - Department of Radiology, Mount Sinai Health System, New York, New York, USA
- Tessa S Cook
  - Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Curtis P Langlotz
  - Department of Radiology, Stanford University, Stanford, California, USA
8
Bruña R, Vaghari D, Greve A, Cooper E, Mada MO, Henson RN. Modified MRI Anonymization (De-Facing) for Improved MEG Coregistration. Bioengineering (Basel) 2022; 9:591. [PMID: 36290559] [PMCID: PMC9598466] [DOI: 10.3390/bioengineering9100591]
Abstract
Localising the sources of MEG/EEG signals often requires a structural MRI to create a head model, while ensuring reproducible scientific results requires sharing data and code. However, sharing structural MRI data often requires the face to be hidden to help protect the identity of the individuals concerned. While automated de-facing methods exist, they tend to remove the whole face, which can impair methods for coregistering the MRI data with the EEG/MEG data. We show that a new, automated de-facing method that retains the nose maintains good MRI-MEG/EEG coregistration. Importantly, behavioural data show that this "face-trimming" method does not increase levels of identification relative to a standard de-facing approach, and has less effect on the automated segmentation and surface extraction sometimes used to create head models for MEG/EEG localisation. We suggest that this trimming approach could be employed for future sharing of structural MRI data, at least for those to be used in forward modelling (source reconstruction) of EEG/MEG data.
Affiliation(s)
- Ricardo Bruña
  - Center for Cognitive and Computational Neuroscience, Universidad Complutense de Madrid, 28040 Madrid, Spain
  - Department of Radiology, Rehabilitation and Physical Therapy, Universidad Complutense de Madrid, IdISSC, 28040 Madrid, Spain
- Delshad Vaghari
  - Department of Electrical & Computer Engineering, Tarbiat Modares University, Tehran P.O. Box 14115-111, Iran
- Andrea Greve
  - Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
- Elisa Cooper
  - Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
- Marius O. Mada
  - Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
- Richard N. Henson
  - Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK
  - Department of Psychiatry, University of Cambridge, Cambridge CB2 OSZ, UK
9
Nanda JK, Marchetti MA. Consent and Deidentification of Patient Images in Dermatology Journals: Observational Study. JMIR Dermatol 2022; 5:e37398. [PMID: 36777646] [PMCID: PMC9910807] [DOI: 10.2196/37398]
Affiliation(s)
- Japbani K Nanda
  - Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, United States
- Michael Armando Marchetti
  - Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, United States
10
Parobek CM, Thorsen MM, Has P, Lorenzi P, Clark MA, Russo ML, Lewkowitz AK. Video education about genetic privacy and patient perspectives about sharing prenatal genetic data: a randomized trial. Am J Obstet Gynecol 2022; 227:87.e1-87.e13. [PMID: 35351406] [DOI: 10.1016/j.ajog.2022.03.047]
Abstract
BACKGROUND Laboratories offering cell-free DNA screening often reserve the right to share prenatal genetic data for research or even commercial purposes, and obtain this permission on the patient consent form. Although it is known that nonpregnant patients are often reluctant to share their genetic data for research, pregnant patients' knowledge of, and opinions about, genetic data privacy are unknown. OBJECTIVE We investigated whether pregnant patients who had already undergone cell-free DNA screening were aware that genetic data derived from cell-free DNA may be shared for research. Furthermore, we examined whether pregnant patients exposed to video education about the Genetic Information Nondiscrimination Act (a federal law that mandates workplace and health insurance protections against genetic discrimination) were more willing to share cell-free DNA-related genetic data for research than pregnant patients who were unexposed. STUDY DESIGN In this randomized controlled trial (ClinicalTrials.gov Identifier: NCT04420858), English-speaking patients with singleton pregnancies who underwent cell-free DNA screening and subsequently presented at 17 0/7 to 23 6/7 weeks of gestation for a detailed anatomy scan were randomized 1:1 to a control or intervention group. Both groups viewed an infographic about cell-free DNA. In addition, the intervention group viewed an educational video about the Genetic Information Nondiscrimination Act. The primary outcomes were knowledge about, and willingness to share, prenatal genetic data from cell-free DNA by commercial laboratories for nonclinical purposes, such as research. The secondary outcomes included knowledge about existing genetic privacy laws, knowledge about the potential for reidentification of anonymized genetic data, and acceptability of various use and sharing scenarios for prenatal genetic data. Eighty-one participants per group were required for 80% power to detect an increase in willingness to share data from 60% to 80% (α=0.05).
RESULTS A total of 747 pregnant patients were screened, and 213 were deemed eligible and approached for potential study participation. Of these, 163 (76.5%) consented and were randomized; one participant discontinued the intervention, and two were excluded from analysis after the intervention when it was discovered that they did not fulfill all eligibility criteria. Overall, 160 (75.1%) of those approached were included in the final analysis. Most patients in the control (72 [90.0%]) and intervention (76 [97.4%]) groups were either unsure about or incorrectly thought that cell-free DNA companies could not share prenatal genetic data for research. Participants in the intervention group were more likely than those in the control group to incorrectly believe that their prenatal genetic data would not be shared for nonclinical purposes (46.2% vs 28.8%; P=.03). However, video education did not increase participants' willingness to share genetic data in multiple scenarios. Non-White participants were less willing than White participants to allow sharing of genetic data specifically for academic research (P<.001). CONCLUSION Most participants were unaware that their prenatal genetic data may be used for nonclinical purposes. Pregnant patients who were educated about the Genetic Information Nondiscrimination Act were not more willing to share genetic data than those who did not receive this education. Surprisingly, video education about the Genetic Information Nondiscrimination Act led patients to falsely believe that their data would not be shared for research, and participants who identified as racial minorities were less willing to share genetic data. New strategies are needed to improve pregnant patients' understanding of genetic privacy.
|
11
|
Lee K, Dobbins NJ, McInnes B, Yetisgen M, Uzuner Ö. Transferability of neural network clinical deidentification systems. J Am Med Inform Assoc 2021; 28:2661-2669. [PMID: 34586386 DOI: 10.1093/jamia/ocab207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2021] [Revised: 07/19/2021] [Accepted: 09/10/2021] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE Neural network deidentification studies have focused on individual datasets. These studies assume the availability of a sufficient amount of human-annotated data to train models that can generalize to corresponding test data. In real-world situations, however, researchers often have limited or no in-house training data. Existing systems and external data can help jump-start deidentification on in-house data; however, the most efficient way of utilizing existing systems and external data is unclear. This article investigates the transferability of a state-of-the-art neural clinical deidentification system, NeuroNER, across a variety of datasets, when it is modified architecturally for domain generalization and when it is trained strategically for domain transfer. MATERIALS AND METHODS We conducted a comparative study of the transferability of NeuroNER using 4 clinical note corpora with multiple note types from 2 institutions. We modified NeuroNER architecturally to integrate 2 types of domain generalization approaches. We evaluated each architecture using 3 training strategies. We measured transferability from external sources; transferability across note types; the contribution of external source data when in-domain training data are available; and transferability across institutions. RESULTS AND CONCLUSIONS Transferability from a single external source gave inconsistent results. Using additional external sources consistently yielded an F1-score of approximately 80%. Fine-tuning emerged as a dominant transfer strategy, with or without domain generalization. We also found that external sources were useful even in cases where in-domain training data were available. Transferability across institutions differed by note type and annotation label but resulted in improved performance.
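The transferability results above are reported as F1-scores over predicted PII spans. As a minimal, self-contained illustration of that metric family (exact-span matching is assumed here; the study's actual scoring criteria may differ):

```python
# Span-level precision, recall, and F1 for deidentification output.
# Each span is a (start, end, label) tuple; exact match is required.
def span_f1(gold: set, pred: set) -> tuple[float, float, float]:
    tp = len(gold & pred)  # true positives: spans predicted exactly right
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Two of three gold PII spans are recovered; one prediction is spurious.
gold = {(0, 4, "NAME"), (10, 20, "DATE"), (25, 30, "ID")}
pred = {(0, 4, "NAME"), (10, 20, "DATE"), (40, 45, "ID")}
print(span_f1(gold, pred))
```

With two true positives out of three predictions and three gold spans, precision, recall, and F1 all come out to 2/3, mirroring how an "approximately 80%" F1 in the study summarizes both missed and spurious identifiers.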
Affiliation(s)
- Kahyun Lee
- Department of Information Science and Technology, George Mason University, Fairfax, Virginia, USA
- Nicholas J Dobbins
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
- Bridget McInnes
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Meliha Yetisgen
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
- Özlem Uzuner
- Department of Information Science and Technology, George Mason University, Fairfax, Virginia, USA
|
12
|
Gupta A, Lai A, Mozersky J, Ma X, Walsh H, DuBois JM. Enabling qualitative research data sharing using a natural language processing pipeline for deidentification: moving beyond HIPAA Safe Harbor identifiers. JAMIA Open 2021; 4:ooab069. [PMID: 34435175 PMCID: PMC8382275 DOI: 10.1093/jamiaopen/ooab069] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2021] [Revised: 07/25/2021] [Accepted: 08/10/2021] [Indexed: 11/20/2022] Open
Abstract
Objective Sharing health research data is essential for accelerating the translation of research into actionable knowledge that can impact health care services and outcomes. Qualitative health research data are rarely shared due to the challenge of deidentifying text and the potential risks of participant reidentification. Here, we establish and evaluate a framework for deidentifying qualitative research data using automated computational techniques including removal of identifiers that are not considered HIPAA Safe Harbor (HSH) identifiers but are likely to be found in unstructured qualitative data. Materials and Methods We developed and validated a pipeline for deidentifying qualitative research data using automated computational techniques. An in-depth analysis and qualitative review of different types of qualitative health research data were conducted to inform and evaluate the development of a natural language processing (NLP) pipeline using named-entity recognition, pattern matching, dictionary, and regular expression methods to deidentify qualitative texts. Results We collected 2 datasets with 1.2 million words derived from over 400 qualitative research data documents. We created a gold-standard dataset with 280K words (70 files) to evaluate our deidentification pipeline. The majority of identifiers in qualitative data are non-HSH and not captured by existing systems. Our NLP deidentification pipeline had a consistent F1-score of ∼0.90 for both datasets. Conclusion The results of this study demonstrate that NLP methods can be used to identify both HSH identifiers and non-HSH identifiers. Automated tools to assist researchers with the deidentification of qualitative data will be increasingly important given the new National Institutes of Health (NIH) data-sharing mandate.
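A toy sketch of the dictionary and regular-expression components of such a pipeline follows. The patterns and name list here are invented for illustration, and the authors' actual system also uses trained named-entity recognition, which is omitted:

```python
import re

# Hypothetical site-specific name dictionary and identifier patterns.
NAME_DICTIONARY = {"alice", "bob"}
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def deidentify(text: str) -> str:
    # Regular-expression pass: replace structured identifiers with tags.
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    # Dictionary pass: replace known names, matching case-insensitively
    # after stripping trailing punctuation.
    words = []
    for token in text.split(" "):
        stripped = token.strip(".,;:").lower()
        words.append("[NAME]" if stripped in NAME_DICTIONARY else token)
    return " ".join(words)

print(deidentify("Alice called 555-123-4567 on 3/14/2021."))
# prints: [NAME] called [PHONE] on [DATE].
```

Real qualitative transcripts also contain the non-Safe-Harbor identifiers the study highlights (employers, place names, organizational affiliations), which is why the full pipeline layers named-entity recognition on top of these simpler passes.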
Affiliation(s)
- Aditi Gupta
- Institute for Informatics, Washington University, St. Louis, Missouri, USA
- Albert Lai
- Institute for Informatics, Washington University, St. Louis, Missouri, USA
- Jessica Mozersky
- Bioethics Research Center, Division of General Medical Sciences, Washington University, St. Louis, Missouri, USA
- Xiaoteng Ma
- Institute for Informatics, Washington University, St. Louis, Missouri, USA
- Heidi Walsh
- Bioethics Research Center, Division of General Medical Sciences, Washington University, St. Louis, Missouri, USA
- James M DuBois
- Bioethics Research Center, Division of General Medical Sciences, Washington University, St. Louis, Missouri, USA
|
13
|
Zhao Z, Yang M, Tang B, Zhao T. Re-examination of Rule-Based Methods in Deidentification of Electronic Health Records: Algorithm Development and Validation. JMIR Med Inform 2020; 8:e17622. [PMID: 32352384 PMCID: PMC7226054 DOI: 10.2196/17622] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2019] [Revised: 02/28/2020] [Accepted: 03/11/2020] [Indexed: 11/28/2022] Open
Abstract
Background Deidentification of clinical records is a critical step before their publication. It is usually treated as a sequence labeling task, and ensemble learning is among the best performing solutions. Within a multi-learner ensemble framework, the value of including a rule-based learner remains an open issue. Objective The aim of this study was to investigate whether a rule-based learner is useful in a hybrid deidentification system and to offer suggestions on how to build and integrate such a learner. Methods We chose a data-driven rule learner named transformation-based error-driven learning (TBED) and integrated it into the best performing hybrid system for this task. Results On the popular Informatics for Integrating Biology and the Bedside (i2b2) deidentification data set, experiments showed that TBED offers high performance with its generated rules, and integrating the rule-based model into the ensemble framework achieved an F1-score of 96.76%, the best performance reported in the community. Conclusions We demonstrated that the rule-based method contributes effectively to the current ensemble learning approach for deidentification of clinical records. Such a rule system can be learned automatically by TBED, avoiding the high cost and low reliability of manual rule composition. In particular, boosting the ensemble model with rules produced the best reported performance on deidentification of clinical records.
Affiliation(s)
- Zhenyu Zhao
- Harbin Institute of Technology, Harbin, China
- Muyun Yang
- Harbin Institute of Technology, Harbin, China
- Buzhou Tang
- Harbin Institute of Technology, Shenzhen, China
- Tiejun Zhao
- Harbin Institute of Technology, Harbin, China
|
14
|
Johnson AEW, Bulgarelli L, Pollard TJ. Deidentification of free-text medical records using pre-trained bidirectional transformers. Proc ACM Conf Health Inference Learn (2020) 2020; 2020:214-221. [PMID: 34350426 PMCID: PMC8330601 DOI: 10.1145/3368555.3384455] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
The ability of caregivers and investigators to share patient data is fundamental to many areas of clinical practice and biomedical research. Prior to sharing, it is often necessary to remove identifiers such as names, contact details, and dates to protect patient privacy. Deidentification, the process of removing identifiers, is challenging, however. High-quality annotated data for developing models are scarce; many target identifiers are highly heterogeneous (for example, there are countless variations of patient names); and in practice anything less than perfect sensitivity may be considered a failure. As a result, patient data are often withheld when sharing would be beneficial, and identifiable patient data are often divulged when a deidentified version would suffice. In recent years, advances in machine learning have led to rapid performance improvements in natural language processing tasks, in particular with the advent of large-scale pretrained language models. In this paper we develop and evaluate an approach for deidentification of clinical notes based on a bidirectional transformer model. We propose human-interpretable evaluation measures and demonstrate state-of-the-art performance against modern baseline models. Finally, we highlight current challenges in deidentification, including the absence of clear annotation guidelines, the lack of portability of models, and the paucity of training data. Code to develop our model is open source, allowing for broad reuse.
Affiliation(s)
- Tom J Pollard
- Massachusetts Institute of Technology, Cambridge, MA, USA
|
15
|
Carrell DS, Cronkite DJ, Li M(R), Nyemba S, Malin BA, Aberdeen JS, Hirschman L. The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight. J Am Med Inform Assoc 2019; 26:1536-1544. [PMID: 31390016 PMCID: PMC6857511 DOI: 10.1093/jamia/ocz114] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2019] [Revised: 05/08/2019] [Accepted: 06/13/2019] [Indexed: 12/12/2022] Open
Abstract
OBJECTIVE Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or "hide in plain sight." We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus. MATERIALS AND METHODS We modeled a scenario where an institution (the defender) externally shared an 800-note corpus of actual outpatient clinical encounter notes from a large, integrated health care delivery system in Washington State. These notes were deidentified by a machine-learned PII tagger and HIPS resynthesis. A malicious attacker obtained and performed a parrot attack intending to expose leaked PII in this corpus. Specifically, the attacker mimicked the defender's process by manually annotating all PII-like content in half of the released corpus, training a PII tagger on these data, and using the trained model to tag the remaining encounter notes. The attacker hypothesized that untagged identifiers would be leaked PII, discoverable by manual review. We evaluated the attacker's success using measures of leak-detection rate and accuracy. RESULTS The attacker correctly hypothesized that 211 (68%) of 310 actual PII leaks in the corpus were leaks, and wrongly hypothesized that 191 resynthesized PII instances were also leaks. One-third of actual leaks remained undetected. DISCUSSION AND CONCLUSION A malicious parrot attack to reveal leaked PII in clinical text deidentified by machine-learned HIPS resynthesis can attenuate but not eliminate the protective effect of HIPS deidentification.
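The HIPS resynthesis step that the parrot attack targets can be sketched as follows. The surrogate pools and the (start, end, label) span format are illustrative assumptions, not the defender's actual system; the point is that missed (leaked) PII becomes hard to distinguish from the random surrogates:

```python
import random

# Hypothetical surrogate pools for each PII category.
SURROGATES = {
    "NAME": ["John Carter", "Maria Lopez", "Wei Zhang"],
    "CITY": ["Spokane", "Tacoma", "Everett"],
}

def resynthesize(text: str, spans: list, seed: int = 0) -> str:
    """Replace each tagged (start, end, label) span with a random surrogate.

    Any PII the tagger missed is left verbatim and 'hides in plain sight'
    among the realistic-looking surrogates.
    """
    rng = random.Random(seed)
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])        # untouched text before span
        out.append(rng.choice(SURROGATES[label]))  # surrogate replacement
        cursor = end
    out.append(text[cursor:])                 # untouched tail
    return "".join(out)

note = "Patient Jane Doe lives in Seattle."
tagged = [(8, 16, "NAME"), (26, 33, "CITY")]  # spans found by a PII tagger
print(resynthesize(note, tagged))
```

A parrot attacker trains their own tagger on the released corpus; tokens that look like PII but were not drawn from surrogate pools are candidate leaks, which is exactly the hypothesis evaluated above.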
Affiliation(s)
- David S Carrell
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
- David J Cronkite
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
- Steve Nyemba
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
|
16
|
Chevrier R, Foufi V, Gaudet-Blavignac C, Robert A, Lovis C. Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review. J Med Internet Res 2019; 21:e13484. [PMID: 31152528 PMCID: PMC6658290 DOI: 10.2196/13484] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2019] [Revised: 03/29/2019] [Accepted: 04/26/2019] [Indexed: 01/19/2023] Open
Abstract
Background The secondary use of health data is central to biomedical research in the era of data science and precision medicine. National and international initiatives, such as the Global Open Findable, Accessible, Interoperable, and Reusable (GO FAIR) initiative, are supporting this approach in different ways (eg, making the sharing of research data mandatory or improving the legal and ethical frameworks). Preserving patients’ privacy is crucial in this context. De-identification and anonymization are the two most common terms used to refer to the technical approaches that protect privacy and facilitate the secondary use of health data. However, it is difficult to find a consensus on the definitions of the concepts or on the reliability of the techniques used to apply them. A comprehensive review is needed to better understand the domain, its capabilities, its challenges, and the ratio of risk between the data subjects’ privacy on one side, and the benefit of scientific advances on the other. Objective This work aims at better understanding how the research community comprehends and defines the concepts of de-identification and anonymization. A rich overview should also provide insights into the use and reliability of the methods. Six aspects will be studied: (1) terminology and definitions, (2) backgrounds and places of work of the researchers, (3) reasons for anonymizing or de-identifying health data, (4) limitations of the techniques, (5) legal and ethical aspects, and (6) recommendations of the researchers. Methods Based on a scoping review protocol designed a priori, MEDLINE was searched for publications discussing de-identification or anonymization and published between 2007 and 2017. The search was restricted to MEDLINE to focus on the life sciences community. The screening process was performed by two reviewers independently. 
Results After searching 7972 records that matched at least one search term, 135 publications were screened and 60 full-text articles were included. (1) Terminology: Definitions of the terms de-identification and anonymization were provided in less than half of the articles (29/60, 48%). When both terms were used (41/60, 68%), their meanings divided the authors into two equal groups (19/60, 32%, each) with opposed views. The remaining articles (3/60, 5%) were equivocal. (2) Backgrounds and locations: Research groups were based predominantly in North America (31/60, 52%) and in the European Union (22/60, 37%). The authors came from 19 different domains; computer science (91/248, 36.7%), biomedical informatics (47/248, 19.0%), and medicine (38/248, 15.3%) were the most prevalent ones. (3) Purpose: The main reason declared for applying these techniques is to facilitate biomedical research. (4) Limitations: Progress is made on specific techniques but, overall, limitations remain numerous. (5) Legal and ethical aspects: Differences exist between nations in the definitions, approaches, and legal practices. (6) Recommendations: The combination of organizational, legal, ethical, and technical approaches is necessary to protect health data. Conclusions Interest is growing for privacy-enhancing techniques in the life sciences community. This interest crosses scientific boundaries, involving primarily computer science, biomedical informatics, and medicine. The variability observed in the use of the terms de-identification and anonymization emphasizes the need for clearer definitions as well as for better education and dissemination of information on the subject. The same observation applies to the methods. Several legislations, such as the American Health Insurance Portability and Accountability Act (HIPAA) and the European General Data Protection Regulation (GDPR), regulate the domain. 
Using the definitions they provide could help address the variable use of these two concepts in the research community.
Affiliation(s)
- Raphaël Chevrier
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland; Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Vasiliki Foufi
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland; Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Christophe Gaudet-Blavignac
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland; Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Arnaud Robert
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland; Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Christian Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland; Faculty of Medicine, University of Geneva, Geneva, Switzerland
|
17
|
Ismail M, Philbin J. Fast processing of digital imaging and communications in medicine (DICOM) metadata using multiseries DICOM format. J Med Imaging (Bellingham) 2015; 2:026501. [PMID: 26158117 DOI: 10.1117/1.jmi.2.2.026501] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2014] [Accepted: 05/08/2015] [Indexed: 11/14/2022] Open
Abstract
The digital imaging and communications in medicine (DICOM) information model combines pixel data and its metadata in a single object. Some user scenarios need only metadata manipulation, such as deidentification and study migration. Most picture archiving and communication systems use a database to store and update the metadata rather than updating the raw DICOM files themselves. The multiseries DICOM (MSD) format separates metadata from pixel data and eliminates duplicate attributes. This work promotes storing DICOM studies in MSD format to reduce metadata processing time. A set of experiments was performed that updated the metadata of a set of DICOM studies for deidentification and migration. The studies were stored in both the traditional single frame DICOM (SFD) format and the MSD format. The results show that it is faster to update studies' metadata in MSD format than in SFD format because the bulk data are separated in MSD and are not retrieved from the storage system. In addition, it is space efficient to store deidentified studies in MSD format, as they share the same bulk data object with the original study. In summary, separation of metadata from pixel data using the MSD format provides fast metadata access and speeds up applications that process only the metadata.
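The space savings reported above come from sharing the bulk data object between the original and deidentified studies. A toy model of that idea (the class and attribute names are illustrative, not DICOM or the MSD format itself):

```python
import copy

class Study:
    """Toy study with metadata kept separate from a reference to bulk data."""

    def __init__(self, metadata: dict, pixel_ref: str):
        self.metadata = metadata      # small, frequently edited
        self.pixel_ref = pixel_ref    # reference to a shared bulk data object

    def deidentified(self) -> "Study":
        # Metadata-only operation: copy and scrub the metadata, but point
        # at the SAME bulk data object so no pixel data is read or copied.
        meta = copy.deepcopy(self.metadata)
        for key in ("PatientName", "PatientID"):
            meta.pop(key, None)
        return Study(meta, self.pixel_ref)

s = Study({"PatientName": "DOE^JANE", "PatientID": "123", "Modality": "CT"},
          pixel_ref="blob://bulk/0001")
d = s.deidentified()
print(d.metadata, d.pixel_ref)
```

In the single-frame layout, by contrast, the equivalent update would have to read and rewrite every file containing the (large) pixel payload, which is the cost the experiments measure.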
Affiliation(s)
- Mahmoud Ismail
- Johns Hopkins University, Department of Computer Science, 3400 N. Charles Street, Baltimore, Maryland 21218, United States
- James Philbin
- Johns Hopkins University, Department of Radiology, 5801 Smith Avenue, McCauley Building, Suite 100, Baltimore, Maryland 21209, United States
|
18
|
Clunie DA, Gebow D. Block selective redaction for minimizing loss during de-identification of burned in text in irreversibly compressed JPEG medical images. J Med Imaging (Bellingham) 2015; 2:016501. [PMID: 26158090 DOI: 10.1117/1.jmi.2.1.016501] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2014] [Accepted: 03/03/2015] [Indexed: 11/14/2022] Open
Abstract
Deidentification of medical images requires attention both to header information and to the pixel data itself, in which burned-in text may be present. If the pixel data to be deidentified are stored in compressed form, traditionally the data are decompressed, identifying text is redacted, and, if necessary, the pixel data are recompressed. Decompression without recompression may result in images of excessive or intractable size. Recompression with an irreversible scheme is undesirable because it may cause additional loss in the diagnostically relevant regions of the images. The irreversible (lossy) JPEG compression scheme works on small blocks of the image independently; hence, redaction can be selectively confined to only those blocks containing identifying text, leaving all other blocks unchanged. An open source implementation of selective redaction and a demonstration of its applicability to multiframe color ultrasound images are described. The process can be applied either to standalone JPEG images or to JPEG bit streams encapsulated in other formats, which, in the case of medical images, usually means DICOM.
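The core arithmetic behind block-selective redaction can be sketched as follows: map a pixel-space text rectangle to the set of 8x8 JPEG coding blocks it touches, so that only those blocks need to be decoded, blanked, and re-encoded. This is an illustration of the block-alignment idea only; a real implementation, like the one described above, must also respect MCU boundaries and chroma subsampling:

```python
BLOCK = 8  # JPEG DCT coding operates on 8x8 sample blocks

def blocks_to_redact(x0: int, y0: int, x1: int, y1: int) -> set:
    """Return (block_col, block_row) indices covered by rect [x0,x1) x [y0,y1)."""
    cols = range(x0 // BLOCK, (x1 - 1) // BLOCK + 1)
    rows = range(y0 // BLOCK, (y1 - 1) // BLOCK + 1)
    return {(c, r) for c in cols for r in rows}

# A 20x10 pixel burned-in text region starting at (12, 4) touches 6 blocks;
# only those 6 are altered, so loss is confined to the redacted area.
print(len(blocks_to_redact(12, 4, 32, 14)))
# prints: 6
```

Because every other block's compressed bit stream is copied through untouched, the diagnostically relevant regions incur no additional generation loss.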
Affiliation(s)
- David A Clunie
- PixelMed, 943 Heiden Road, Bangor, Pennsylvania 18013, United States
- Dan Gebow
- MDDX Research and Informatics, 580 California Street, Fl 16, San Francisco, California 94104, United States
|
19
|
Abstract
OBJECTIVE As the use of medical images in applications other than direct patient care increases, the need for deidentified images grows. Federal regulations govern the requirements for deidentification, and software developers offer several methods for deidentification. CONCLUSION However, there are numerous ways for protected health information to be included in images other than in DICOM headers. Either such information must be obscured or the images containing the information must be deleted to comply with deidentification requirements.
|