1
Pilgram L, Meurers T, Malin B, Schaeffner E, Eckardt KU, Prasser F. The Costs of Anonymization: Case Study Using Clinical Data. J Med Internet Res 2024; 26:e49445. [PMID: 38657232 DOI: 10.2196/49445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 01/14/2024] [Accepted: 02/13/2024] [Indexed: 04/26/2024]
Abstract
BACKGROUND Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that they are no longer reasonably related to a person. Yet, such alterations have the potential to influence the data set's statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not broadly been adopted in clinical practice. OBJECTIVE The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study. METHODS The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case-specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case-specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results. RESULTS Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics. For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case-specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy. CONCLUSIONS Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case-specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data. TRIAL REGISTRATION German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR2-10.1093/ndt/gfr456.
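The 95% CI overlap used as the reproducibility measure above can be illustrated with a small sketch. The abstract does not give the exact formula, so the definition below (the intersection length averaged over the two interval lengths) is an assumption, and the function name `ci_overlap` is hypothetical.

```python
def ci_overlap(original, anonymized):
    """Return the overlap of two confidence intervals as a value in [0, 1].

    Each interval is a (lower, upper) tuple. The overlap is the length of
    the intersection, expressed as a fraction of each interval's length and
    then averaged. This definition is an assumption, not taken from the study.
    """
    lower = max(original[0], anonymized[0])
    upper = min(original[1], anonymized[1])
    intersection = max(0.0, upper - lower)  # 0 when the intervals are disjoint
    return 0.5 * (
        intersection / (original[1] - original[0])
        + intersection / (anonymized[1] - anonymized[0])
    )
```

Under this definition, identical intervals score 1.0 and disjoint intervals score 0.0, so a "nonoverlapping 95% CI" corresponds to a value of 0.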
Affiliation(s)
- Lisa Pilgram
  - Junior Digital Clinician Scientist Program, Biomedical Innovation Academy, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Department of Nephrology and Medical Intensive Care, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Thierry Meurers
  - Medical Informatics Group, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Bradley Malin
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Elke Schaeffner
  - Institute of Public Health, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Kai-Uwe Eckardt
  - Department of Nephrology and Medical Intensive Care, Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Department of Nephrology and Hypertension, Universitätsklinikum Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Fabian Prasser
  - Medical Informatics Group, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
2
Schippers P, Rösch G, Sohn R, Holzapfel M, Junker M, Rapp AE, Jenei-Lanzl Z, Drees P, Zaucke F, Meurer A. A Lightweight Browser-Based Tool for Collaborative and Blinded Image Analysis. J Imaging 2024; 10:33. [PMID: 38392082 PMCID: PMC10889326 DOI: 10.3390/jimaging10020033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Revised: 01/21/2024] [Accepted: 01/25/2024] [Indexed: 02/24/2024]
Abstract
Collaborative manual image analysis by multiple experts in different locations is an essential workflow in biomedical science. However, sharing the images and writing down results by hand or merging results from separate spreadsheets can be error-prone. Moreover, blinding and anonymization are essential to address subjectivity and bias. Here, we propose a new workflow for collaborative image analysis using a lightweight online tool named Tyche. The new workflow allows experts to access images via temporarily valid URLs and analyze them blind in a random order inside a web browser with the means to store the results in the same window. The results are then immediately computed and visible to the project master. The new workflow could be used for multi-center studies, inter- and intraobserver studies, and score validations.
Affiliation(s)
- Philipp Schippers
  - Department of Orthopedics and Traumatology, University Medical Center of the Johannes Gutenberg University Mainz, 55131 Mainz, Germany
  - Department of Orthopedics (Friedrichsheim), University Hospital Frankfurt, Goethe University, 60528 Frankfurt am Main, Germany
- Gundula Rösch
  - Department of Orthopedics (Friedrichsheim), Dr. Rolf M. Schwiete Research Unit for Osteoarthritis, University Hospital Frankfurt, Goethe University, 60528 Frankfurt am Main, Germany
- Rebecca Sohn
  - Department of Orthopedics (Friedrichsheim), Dr. Rolf M. Schwiete Research Unit for Osteoarthritis, University Hospital Frankfurt, Goethe University, 60528 Frankfurt am Main, Germany
- Matthias Holzapfel
  - Department of Orthopedics (Friedrichsheim), Dr. Rolf M. Schwiete Research Unit for Osteoarthritis, University Hospital Frankfurt, Goethe University, 60528 Frankfurt am Main, Germany
- Marius Junker
  - Department of Orthopedics (Friedrichsheim), University Hospital Frankfurt, Goethe University, 60528 Frankfurt am Main, Germany
  - Department of Orthopedics, Tabea Hospital Hamburg, 22587 Hamburg, Germany
- Anna E Rapp
  - Department of Orthopedics (Friedrichsheim), Dr. Rolf M. Schwiete Research Unit for Osteoarthritis, University Hospital Frankfurt, Goethe University, 60528 Frankfurt am Main, Germany
- Zsuzsa Jenei-Lanzl
  - Department of Orthopedics (Friedrichsheim), Dr. Rolf M. Schwiete Research Unit for Osteoarthritis, University Hospital Frankfurt, Goethe University, 60528 Frankfurt am Main, Germany
- Philipp Drees
  - Department of Orthopedics and Traumatology, University Medical Center of the Johannes Gutenberg University Mainz, 55131 Mainz, Germany
- Frank Zaucke
  - Department of Orthopedics (Friedrichsheim), Dr. Rolf M. Schwiete Research Unit for Osteoarthritis, University Hospital Frankfurt, Goethe University, 60528 Frankfurt am Main, Germany
- Andrea Meurer
  - Department of Orthopedics (Friedrichsheim), University Hospital Frankfurt, Goethe University, 60528 Frankfurt am Main, Germany
  - Department of Orthopedics (Friedrichsheim), Dr. Rolf M. Schwiete Research Unit for Osteoarthritis, University Hospital Frankfurt, Goethe University, 60528 Frankfurt am Main, Germany
  - Medical Park St. Hubertus Klinik, 83707 Bad Wiessee, Germany
3
Kokomoto K, Okawa R, Nakano K, Nozaki K. Panoramic Radiograph Generation and Image Reconstruction from Latent Vectors Using a Generative Adversarial Network. Stud Health Technol Inform 2024; 310:1499-1500. [PMID: 38269715 DOI: 10.3233/shti231263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
In this study, StyleGAN2 was trained with panoramic radiographs, and original images were projected into the latent space of StyleGAN2. The resulting latent vectors were input into StyleGAN2, and corresponding images were generated to reconstruct the original images. The original and reconstructed images were evaluated by pediatric dentists and found to be similar. Our results suggest that StyleGAN2 could be applied to the anonymization and data compression of medical images.
Affiliation(s)
- Kazuma Kokomoto
  - Division for Medical Informatics, Osaka University Dental Hospital, Japan
- Rena Okawa
  - Department of Pediatric Dentistry, Osaka University Graduate School of Dentistry, Japan
- Kazuhiko Nakano
  - Department of Pediatric Dentistry, Osaka University Graduate School of Dentistry, Japan
- Kazunori Nozaki
  - Division for Medical Informatics, Osaka University Dental Hospital, Japan
4
Liu J, Gupta S, Chen A, Wang CK, Mishra P, Dai HJ, Wong ZSY, Jonnagaddala J. OpenDeID Pipeline for Unstructured Electronic Health Record Text Notes Based on Rules and Transformers: Deidentification Algorithm Development and Validation Study. J Med Internet Res 2023; 25:e48145. [PMID: 38055317 PMCID: PMC10733816 DOI: 10.2196/48145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 07/26/2023] [Accepted: 11/22/2023] [Indexed: 12/07/2023]
Abstract
BACKGROUND Electronic health records (EHRs) in unstructured formats are valuable sources of information for research in both the clinical and biomedical domains. However, before such records can be used for research purposes, sensitive health information (SHI) must be removed in several cases to protect patient privacy. Rule-based and machine learning-based methods have been shown to be effective in deidentification. However, very few studies investigated the combination of transformer-based language models and rules. OBJECTIVE The objective of this study is to develop a hybrid deidentification pipeline for Australian EHR text notes using rules and transformers. The study also aims to investigate the impact of pretrained word embedding and transformer-based language models. METHODS In this study, we present a hybrid deidentification pipeline called OpenDeID, which is developed using an Australian multicenter EHR-based corpus called OpenDeID Corpus. The OpenDeID corpus consists of 2100 pathology reports with 38,414 SHI entities from 1833 patients. The OpenDeID pipeline incorporates a hybrid approach of associative rules, supervised deep learning, and pretrained language models. RESULTS The OpenDeID achieved a best F1-score of 0.9659 by fine-tuning the Discharge Summary BioBERT model and incorporating various preprocessing and postprocessing rules. The OpenDeID pipeline has been deployed at a large tertiary teaching hospital and has processed over 8000 unstructured EHR text notes in real time. CONCLUSIONS The OpenDeID pipeline is a hybrid deidentification pipeline to deidentify SHI entities in unstructured EHR text notes. The pipeline has been evaluated on a large multicenter corpus. External validation will be undertaken as part of our future work to evaluate the effectiveness of the OpenDeID pipeline.
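The F1-score reported above is the harmonic mean of precision and recall. The sketch below computes an entity-level F1 with exact-match scoring, which is an assumption since the abstract does not state the matching criterion; the `(start, end, label)` tuple layout and the name `entity_f1` are hypothetical.

```python
def entity_f1(gold, predicted):
    """Entity-level F1 with exact matching (an assumed scoring scheme).

    Each entity is a hashable (start, end, label) tuple. A prediction counts
    as a true positive only if all three fields match a gold entity.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, with three gold entities and three predictions of which two match exactly, precision and recall are both 2/3, and so is the F1-score.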
Affiliation(s)
- Jiaxing Liu
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China
- Aipeng Chen
  - School of Computer Science and Engineering, UNSW, Sydney, Australia
- Chen-Kai Wang
  - Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Hong-Jie Dai
  - School of Post-Baccalaureate Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Zoie Shui-Yee Wong
  - Graduate School of Public Health, St. Luke's International University, Tokyo, Japan
  - The Kirby Institute, University of New South Wales, Sydney, Australia
- Jitendra Jonnagaddala
  - School of Population Health, UNSW Sydney, Kensington, Australia
  - NMC Royal Hospital, Khalifa City, Abu Dhabi, United Arab Emirates
5
Patel R, Provenzano D, Loew M. Anonymization and validation of three-dimensional volumetric renderings of computed tomography data using commercially available T1-weighted magnetic resonance imaging-based algorithms. J Med Imaging (Bellingham) 2023; 10:066501. [PMID: 38074629 PMCID: PMC10704182 DOI: 10.1117/1.jmi.10.6.066501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 11/03/2023] [Accepted: 11/07/2023] [Indexed: 02/12/2024]
Abstract
Purpose Previous studies have demonstrated that three-dimensional (3D) volumetric renderings of magnetic resonance imaging (MRI) brain data can be used to identify patients using facial recognition. We have shown that facial features can be identified on simulation-computed tomography (CT) images for radiation oncology and mapped to face images from a database. We aim to determine whether CT images can be anonymized using anonymization software that was designed for T1-weighted MRI data. Approach Our study examines (1) the ability of off-the-shelf anonymization algorithms to anonymize CT data and (2) the ability of facial recognition algorithms to identify whether faces could be detected from a database of facial images. Our study generated 3D renderings from 57 head CT scans from The Cancer Imaging Archive database. Data were anonymized using AFNI (deface, reface, and 3Dskullstrip) and FSL's BET. Anonymized data were compared to the original renderings and passed through facial recognition algorithms (VGG-Face, FaceNet, DLib, and SFace) using a facial database (labeled faces in the wild) to determine what matches could be found. Results Our study found that all modules were able to process CT data and that AFNI's 3Dskullstrip and FSL's BET data consistently showed lower reidentification rates compared to the original. Conclusions The results from this study highlight the potential usage of anonymization algorithms as a clinical standard for deidentifying brain CT data. Our study demonstrates the importance of continued vigilance for patient privacy in publicly shared datasets and the importance of continued evaluation of anonymization methods for CT data.
Affiliation(s)
- Rahil Patel
  - George Washington University School of Engineering and Applied Science, Department of Biomedical Engineering, Washington, District of Columbia, United States
- Destie Provenzano
  - George Washington University School of Engineering and Applied Science, Department of Biomedical Engineering, Washington, District of Columbia, United States
- Murray Loew
  - George Washington University School of Engineering and Applied Science, Department of Biomedical Engineering, Washington, District of Columbia, United States
6
Vranopoulos G, Clarke N, Atkinson S. Big Data Confidentiality: An Approach Toward Corporate Compliance Using a Rule-Based System. Big Data 2023. [PMID: 37906117 DOI: 10.1089/big.2022.0201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Organizations have been investing in analytics relying on internal and external data to gain a competitive advantage. However, the legal and regulatory acts imposed nationally and internationally have become a challenge, especially for highly regulated sectors such as health or finance/banking. Data handlers such as Facebook and Amazon have already sustained considerable fines or are under investigation due to violations of data governance. The era of big data has further intensified the challenges of minimizing the risk of data loss by introducing the dimensions of Volume, Velocity, and Variety into confidentiality. Although Volume and Velocity have been extensively researched, Variety, "the ugly duckling" of big data, is often neglected and difficult to solve, thus increasing the risk of data exposure and data loss. To mitigate the risk of data exposure and data loss, this article proposes a framework that uses algorithmic classification and workflow capabilities to provide a consistent approach toward data evaluations across organizations. A rule-based system, implementing the corporate data classification policy, will minimize the risk of exposure by helping users identify the approved guidelines and enforce them quickly. The framework includes an exception handling process with appropriate approval for extenuating circumstances. The system was implemented in a proof of concept working prototype to showcase the capabilities and provide a hands-on experience. The information system was evaluated and accredited by a diverse audience of academics and senior business executives in the fields of security and data management. The audience had an average experience of ∼25 years and amassed a total experience of almost three centuries (294 years). The results confirmed that the 3Vs are of concern and that Variety, with a majority of 90% of the commentators, is the most troubling. In addition, with an approximate average of 60%, it was confirmed that appropriate policies, procedures, and prerequisites for classification are in place while implementation tools are lagging.
Affiliation(s)
- Georgios Vranopoulos
  - School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, United Kingdom
- Nathan Clarke
  - School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, United Kingdom
- Shirley Atkinson
  - School of Engineering, Computing and Mathematics, University of Plymouth, Plymouth, United Kingdom
7
Pilgram L, Schäffner E, Eckardt KU, Prasser F. Utility-Preserving Anonymization in a Real-World Scenario: Evidence from the German Chronic Kidney Disease (GCKD) Study. Stud Health Technol Inform 2023; 302:28-32. [PMID: 37203603 DOI: 10.3233/shti230058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2023]
Abstract
Data sharing provides benefits in terms of transparency and innovation. Privacy concerns in this context can be addressed by anonymization techniques. In our study, we evaluated anonymization approaches which transform structured data in a real-world scenario of a chronic kidney disease cohort study and checked for replicability of research results via 95% CI overlap in two differently anonymized datasets with different protection degrees. Calculated 95% CI overlapped in both applied anonymization approaches and visual comparison presented similar results. Thus, in our use case scenario, research results were not relevantly impacted by anonymization, which adds to the growing evidence of utility-preserving anonymization techniques.
Affiliation(s)
- Lisa Pilgram
  - Department of Nephrology and Medical Intensive Care, Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, BIH Biomedical Innovation Academy, BIH Charité Junior Digital Clinician Scientist Program, Berlin, Germany
- Elke Schäffner
  - Institute of Public Health, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Kai-Uwe Eckardt
  - Department of Nephrology and Medical Intensive Care, Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Department of Nephrology and Hypertension, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
- Fabian Prasser
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, BIH Biomedical Innovation Academy, BIH Charité Junior Digital Clinician Scientist Program, Berlin, Germany
8
Andrew J, Eunice RJ, Karthikeyan J. An anonymization-based privacy-preserving data collection protocol for digital health data. Front Public Health 2023; 11:1125011. [PMID: 36935661 PMCID: PMC10020182 DOI: 10.3389/fpubh.2023.1125011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 02/06/2023] [Indexed: 03/06/2023]
Abstract
Digital health data collection is vital for healthcare and medical research. However, such data contain sensitive information about patients, which makes collection challenging. To collect health data without privacy breaches, it must be secured between the data owner and the collector. Existing data collection research studies make overly stringent assumptions, such as using a third-party anonymizer or a private channel between the data owner and the collector. These studies are more susceptible to privacy attacks due to third-party involvement, which makes them less applicable for privacy-preserving healthcare data collection. This article proposes a novel privacy-preserving data collection protocol that anonymizes healthcare data without using a third-party anonymizer or a private channel for data transmission. A clustering-based k-anonymity model was adopted to efficiently prevent identity disclosure attacks, and the communication between the data owner and the collector is restricted to some elected representatives of each equivalent group of data owners. We also identified a privacy attack, known as "leader collusion", in which the elected representatives may collaborate to violate an individual's privacy. We propose solutions for such collusion and for sensitive attribute protection. A greedy heuristic method is devised to efficiently handle the data owners who join or depart the anonymization process dynamically. Furthermore, we present the potential privacy attacks on the proposed protocol and theoretical analysis. Extensive experiments are conducted on real-world datasets, and the results suggest that our solution outperforms the state-of-the-art techniques in terms of privacy protection and computational complexity.
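The k-anonymity property mentioned above requires every combination of quasi-identifier values to be shared by at least k records. The sketch below only checks that property on an already generalized table; it is not the clustering-based protocol described in the abstract, and the field names and the function name `is_k_anonymous` are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Check whether a list of dict records satisfies k-anonymity.

    Records are grouped by their quasi-identifier values; the table is
    k-anonymous if every group contains at least k records.
    """
    group_sizes = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return all(size >= k for size in group_sizes.values())


# Illustrative, made-up records with generalized age and ZIP code.
records = [
    {"age": "30-39", "zip": "100**", "diagnosis": "A"},
    {"age": "30-39", "zip": "100**", "diagnosis": "B"},
    {"age": "40-49", "zip": "200**", "diagnosis": "A"},
    {"age": "40-49", "zip": "200**", "diagnosis": "C"},
]
print(is_k_anonymous(records, ["age", "zip"], k=2))
```

The smallest group size also gives a simple worst-case reidentification risk of 1 divided by that size, which is why a larger k means stronger protection.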
Affiliation(s)
- J. Andrew
  - Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
  - *Correspondence: J. Andrew
- R. Jennifer Eunice
  - Electronics and Communication Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India
- J. Karthikeyan
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
9
Tichopád A, Augustynek M, Beneš J, Dlouhý M, Doležal T, Horáková D, Kršek M, Lhotska L, Panzner P, Penhaker M, Petr M, Piťha J, Popesko B, Rožánek M, Táborský M, Vrablík M. The way to data: opinions and recommendations for the provision of health data for secondary use. Cas Lek Cesk 2023; 162:61-66. [PMID: 37474288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/22/2023]
Abstract
Healthcare data held by state-run organisations is a valuable intangible asset for society. Its use should be a priority for its administrators and the state. A completely paternalistic approach by administrators and the state is undesirable, however much it aims to protect the privacy rights of persons registered in databases. In line with European policies and the global trend, these measures should not outweigh the social benefit that arises from the analysis of these data if the technical possibilities exist to sufficiently protect the privacy rights of individuals. Czech society is having an intense discussion on the topic, but according to the authors, it is insufficiently based on facts and lacks clearly articulated opinions of the expert public. The aim of this article is to fill these gaps. Data anonymization techniques provide a solution to protect individuals' privacy rights while preserving the scientific value of the data. The risk of identifying individuals in anonymised data sets is scalable and can be minimised depending on the type and content of the data and its use by the specific applicant. Finding the optimal form and scope of deidentified data requires competence and knowledge on the part of both the applicant and the administrator. It is in the interest of the applicant, the administrator, as well as the protected persons in the databases that both parties show willingness and have the ability and expertise to communicate during the application and its processing.
10
Sahlsten J, Wahid KA, Glerean E, Jaskari J, Naser MA, He R, Kann BH, Mäkitie A, Fuller CD, Kaski K. Segmentation stability of human head and neck cancer medical images for radiotherapy applications under de-identification conditions: Benchmarking data sharing and artificial intelligence use-cases. Front Oncol 2023; 13:1120392. [PMID: 36925936 PMCID: PMC10011442 DOI: 10.3389/fonc.2023.1120392] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/09/2022] [Accepted: 02/13/2023] [Indexed: 03/08/2023]
Abstract
Background Demand for head and neck cancer (HNC) radiotherapy data in algorithmic development has prompted increased image dataset sharing. Medical images must comply with data protection requirements so that re-use is enabled without disclosing patient identifiers. Defacing, i.e., the removal of facial features from images, is often considered a reasonable compromise between data protection and re-usability for neuroimaging data. While defacing tools have been developed by the neuroimaging community, their acceptability for radiotherapy applications has not been explored. Therefore, this study systematically investigated the impact of available defacing algorithms on HNC organs at risk (OARs). Methods A publicly available dataset of magnetic resonance imaging scans for 55 HNC patients with eight segmented OARs (bilateral submandibular glands, parotid glands, level II neck lymph nodes, level III neck lymph nodes) was utilized. Eight publicly available defacing algorithms were investigated: afni_refacer, DeepDefacer, defacer, fsl_deface, mask_face, mri_deface, pydeface, and quickshear. Using a subset of scans where defacing succeeded (N=29), a 5-fold cross-validation 3D U-net based OAR auto-segmentation model was utilized to perform two main experiments: (1) comparing original and defaced data for training when evaluated on original data; (2) using original data for training and comparing the model evaluation on original and defaced data. Models were primarily assessed using the Dice similarity coefficient (DSC). Results Most defacing methods were unable to produce any usable images for evaluation, while mask_face, fsl_deface, and pydeface were unable to remove the face for 29%, 18%, and 24% of subjects, respectively. When using the original data for evaluation, the composite OAR DSC was statistically higher (p ≤ 0.05) for the model trained with the original data with a DSC of 0.760 compared to the mask_face, fsl_deface, and pydeface models with DSCs of 0.742, 0.736, and 0.449, respectively. Moreover, the model trained with original data had decreased performance (p ≤ 0.05) when evaluated on the defaced data with DSCs of 0.673, 0.693, and 0.406 for mask_face, fsl_deface, and pydeface, respectively. Conclusion Defacing algorithms may have a significant impact on HNC OAR auto-segmentation model training and testing. This work highlights the need for further development of HNC-specific image anonymization methods.
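The Dice similarity coefficient used to assess the models above is defined as DSC = 2|A ∩ B| / (|A| + |B|) for two segmentations A and B. The sketch below represents each segmentation as a set of voxel coordinates, which is a simplification chosen for illustration; the name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_coefficient(segmentation_a, segmentation_b):
    """Dice similarity coefficient between two sets of voxel coordinates.

    Returns a value in [0, 1]: 1.0 for identical segmentations and 0.0 for
    disjoint ones. Two empty segmentations are treated as a perfect match,
    which is a convention assumed here.
    """
    a, b = set(segmentation_a), set(segmentation_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

For instance, two segmentations of three voxels each that share two voxels give a DSC of 2 × 2 / 6, or about 0.667.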
Affiliation(s)
- Jaakko Sahlsten
  - Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Kareem A. Wahid
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Enrico Glerean
  - Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
- Joel Jaskari
  - Department of Computer Science, Aalto University School of Science, Espoo, Finland
- Mohamed A. Naser
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Renjie He
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Benjamin H. Kann
  - Artificial Intelligence in Medicine Program, Brigham and Women’s Hospital, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, United States
- Antti Mäkitie
  - Department of Otorhinolaryngology, Head and Neck Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
- Clifton D. Fuller
  - Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
  - *Correspondence: Clifton D. Fuller; Kimmo Kaski
- Kimmo Kaski
  - Department of Computer Science, Aalto University School of Science, Espoo, Finland
  - *Correspondence: Clifton D. Fuller; Kimmo Kaski
11
Kasprzak J, Frey S, Oetlinger H, Benedikt Westphalen C, Erickson N, Heinemann V, Nasseh D. Swapping data: A pragmatic approach for enabling academic-industrial partnerships. Digit Health 2023; 9:20552076231172120. [PMID: 37188076 PMCID: PMC10176540 DOI: 10.1177/20552076231172120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 04/10/2023] [Indexed: 05/17/2023]
Abstract
Objectives Academic institutions have access to comprehensive sets of real-world data. However, their potential for secondary use (for example, in medical outcomes research or health care quality management) is often limited due to data privacy concerns. External partners could help achieve this potential, yet documented frameworks for such cooperation are lacking. Therefore, this work presents a pragmatic approach for enabling academic-industrial data partnerships in a health care environment. Methods We employ a value-swapping strategy to facilitate data sharing. Using tumor documentation and molecular pathology data, we define a data-altering process as well as rules for an organizational pipeline that includes the technical anonymization process. Results The resulting dataset was fully anonymized while still retaining the critical properties of the original data to allow for external development and the training of analytical algorithms. Conclusion Value swapping is a pragmatic, yet powerful method to balance data privacy and requirements for algorithm development; therefore, it is well suited to enable academic-industrial data partnerships.
Affiliation(s)
- Julia Kasprzak
- Comprehensive Cancer Center Munich, University Hospital, LMU Munich, Munich, Germany
- Simon Frey
- Roche Pharma AG, Grenzach-Wyhlen, Germany
- Nicole Erickson
- Comprehensive Cancer Center Munich, University Hospital, LMU Munich, Munich, Germany
- Volker Heinemann
- Comprehensive Cancer Center Munich, University Hospital, LMU Munich, Munich, Germany
- German Cancer Consortium (DKTK, partner site Munich), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Daniel Nasseh
- Comprehensive Cancer Center Munich, University Hospital, LMU Munich, Munich, Germany
12
Sepas A, Bangash AH, Alraoui O, El Emam K, El-Hussuna A. Algorithms to anonymize structured medical and healthcare data: A systematic review. Front Bioinform 2022; 2:984807. [PMID: 36619476 PMCID: PMC9815524 DOI: 10.3389/fbinf.2022.984807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2022] [Accepted: 11/28/2022] [Indexed: 12/24/2022] Open
Abstract
Introduction: With many anonymization algorithms developed for structured medical health data (SMHD) in the last decade, our systematic review provides a comprehensive bird's eye view of algorithms for SMHD anonymization. Methods: This systematic review was conducted according to the recommendations in the Cochrane Handbook for Reviews of Interventions and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Eligible articles from the PubMed, ACM digital library, Medline, IEEE, Embase, Web of Science Collection, Scopus, ProQuest Dissertation, and Theses Global databases were identified through systematic searches. The following parameters were extracted from the eligible studies: author, year of publication, sample size, and relevant algorithms and/or software applied to anonymize SMHD, along with the summary of outcomes. Results: Among 1,804 initial hits, the present study considered 63 records including research articles, reviews, and books. Seventy-five evaluated the anonymization of demographic data, 18 assessed diagnosis codes, and 3 assessed genomic data. One of the most common approaches was k-anonymity, which was utilized mainly for demographic data, often in combination with another algorithm (e.g., l-diversity). No approaches have yet been developed for protection against membership disclosure attacks on diagnosis codes. Conclusion: This study reviewed and categorized different anonymization approaches for MHD according to the anonymized data types (demographics, diagnosis codes, and genomic data). Further research is needed to develop more efficient algorithms for the anonymization of diagnosis codes and genomic data. The risk of reidentification can be minimized with adequate application of the addressed anonymization approaches. Systematic Review Registration: [http://www.crd.york.ac.uk/prospero], identifier [CRD42021228200].
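Illustrative note (not part of the cited review): k-anonymity, the approach the abstract names as one of the most common, requires that every combination of quasi-identifier values be shared by at least k records. The sketch below computes that k for a few made-up records. The field names, the toy data, and the `k_anonymity_level` helper are all hypothetical and serve only to show the definition.

```python
from collections import Counter

def k_anonymity_level(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    quasi-identifier values; the table is k-anonymous for any k up to it."""
    if not records:
        return 0
    # Group records by their tuple of quasi-identifier values.
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())

# Hypothetical toy records, already generalized (age bands, 3-digit ZIP prefix).
records = [
    {"age_group": "30-39", "zip3": "803", "diagnosis": "N18"},
    {"age_group": "30-39", "zip3": "803", "diagnosis": "E11"},
    {"age_group": "40-49", "zip3": "803", "diagnosis": "N18"},
    {"age_group": "40-49", "zip3": "803", "diagnosis": "I10"},
    {"age_group": "40-49", "zip3": "803", "diagnosis": "E11"},
]

k = k_anonymity_level(records, ["age_group", "zip3"])
print(f"smallest equivalence class: {k}")
```

Treating more columns as quasi-identifiers can only shrink the groups, which is one reason the choice of quasi-identifiers matters as much as the choice of k.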
Affiliation(s)
- Ali Sepas
- Open Source Research Collaboration, Aalborg, Denmark
- Department of Materials and Production, Aalborg University, Aalborg, Denmark
- Ali Haider Bangash
- Open Source Research Collaboration, Aalborg, Denmark
- STMU Shifa College of Medicine, Islamabad, Pakistan
- Omar Alraoui
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Khaled El Emam
- Canada Research Chair in Medical AI, University of Ottawa, Ottawa, ON, Canada
13
Kroes SKS, van Leeuwen M, Groenwold RHH, Janssen MP. Generating synthetic mixed discrete-continuous health records with mixed sum-product networks. J Am Med Inform Assoc 2022; 30:16-25. [PMID: 36228120 PMCID: PMC9748584 DOI: 10.1093/jamia/ocac184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Revised: 09/09/2022] [Accepted: 10/01/2022] [Indexed: 12/15/2022] Open
Abstract
OBJECTIVE Privacy is a concern whenever individual patient health data is exchanged for scientific research. We propose using mixed sum-product networks (MSPNs) as private representations of data and take samples from the network to generate synthetic data that can be shared for subsequent statistical analysis. This anonymization method was evaluated with respect to privacy and information loss. MATERIALS AND METHODS Using a simulation study, information loss was quantified by assessing whether synthetic data could reproduce regression parameters obtained from the original data. Predictor variable types were varied between continuous, count, categorical, and mixed discrete-continuous. Additionally, we measured whether the MSPN approach successfully anonymizes the data by removing associations between background and sensitive information for these datasets. RESULTS The synthetic data generated with MSPNs yielded regression results highly similar to those generated with original data, differing less than 5% in most simulation scenarios. Standard errors increased compared to the original data. Particularly for smaller datasets (1000 records), this resulted in a discrepancy between the estimated and empirical standard errors. Sensitive values could no longer be inferred from background information for at least 99% of tested individuals. DISCUSSION The proposed anonymization approach yields very promising results. Further research is required to evaluate its performance with other types of data and analyses, and to predict how user parameter choices affect a bias-privacy trade-off. CONCLUSION Generating synthetic data from MSPNs is a promising, easy-to-use approach for anonymization of sensitive individual health data that yields informative and private data.
Affiliation(s)
- Shannon K S Kroes
- Transfusion Technology Assessment Group, Donor Medicine Research Department, Sanquin Research, Amsterdam, The Netherlands
- Leiden Institute of Advanced Computer Science, Computer Science, Leiden University, Leiden, The Netherlands
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Matthijs van Leeuwen
- Leiden Institute of Advanced Computer Science, Computer Science, Leiden University, Leiden, The Netherlands
- Rolf H H Groenwold
- Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Mart P Janssen
- Transfusion Technology Assessment Group, Donor Medicine Research Department, Sanquin Research, Amsterdam, The Netherlands
- Leiden Institute of Advanced Computer Science, Computer Science, Leiden University, Leiden, The Netherlands
14
Sun S, Ma S, Song JH, Yue WH, Lin XL, Ma T. Experiments and Analyses of Anonymization Mechanisms for Trajectory Data Publishing. J Comput Sci Technol 2022; 37:1026-1048. [PMID: 36281257 PMCID: PMC9581755 DOI: 10.1007/s11390-022-2409-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/13/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Abstract
With the advance of location-detection technologies and the increasing popularity of mobile phones and other location-aware devices, trajectory data is continuously growing. While large-scale trajectories provide opportunities for various applications, the locations in trajectories pose a threat to individual privacy. Recently, there has been an interesting debate on the reidentifiability of individuals in Science magazine. The main finding of Sánchez et al. is exactly opposite to that of De Montjoye et al., which raises the first question: "what is the true situation of the privacy preservation for trajectories in terms of reidentification?" Furthermore, it is known that anonymization typically causes a decline of data utility, and anonymization mechanisms need to consider the trade-off between privacy and utility. This raises the second question: "what is the true situation of the utility of anonymized trajectories?" To answer these two questions, we conduct a systematic experimental study, using three real-life trajectory datasets, five existing anonymization mechanisms (i.e., identifier anonymization, grid-based anonymization, dummy trajectories, k-anonymity and ε-differential privacy), and two practical applications (i.e., travel time estimation and window range queries). Our findings reveal the true situation of the privacy preservation for trajectories in terms of reidentification and the true situation of the utility of anonymized trajectories, and essentially close the debate between De Montjoye et al. and Sánchez et al. To the best of our knowledge, this study is among the first systematic evaluations and analyses of anonymized trajectories on the individual privacy in terms of unicity and on the utility in terms of practical applications. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s11390-022-2409-x.
Affiliation(s)
- She Sun
- State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, 100191 China
- Shuai Ma
- State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, 100191 China
- Jing-He Song
- State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, 100191 China
- Wen-Hai Yue
- State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, 100191 China
- Xue-Lian Lin
- State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing, 100191 China
- Tiejun Ma
- Department of Decision Analytics and Risk, Southampton Business School, University of Southampton, Southampton, SO17 1BJ UK
15
Abstract
Mining health data can lead to faster medical decisions, improvement in the quality of treatment, disease prevention, and reduced cost, and it drives innovative solutions within the healthcare sector. However, health data are highly sensitive and subject to regulations such as the General Data Protection Regulation, which aims to ensure patients' privacy. Anonymization or removal of patient-identifiable information, although the most conventional way, is the first important step to adhere to the regulations and incorporate privacy concerns. In this article, we review the existing anonymization techniques and their applicability to various types (relational and graph based) of health data. We also provide an overview of possible attacks on anonymized data. We illustrate via a reconstruction attack that anonymization, although necessary, is not sufficient to address patient privacy and discuss methods for protecting against such attacks. Finally, we discuss tools that can be used to achieve anonymization.
Affiliation(s)
- Jens Rauch
- Health Informatics Research Group, University of Applied Sciences, Osnabrück, Germany
- Megha Khosla
- L3S Research Center, Leibniz University, Hannover, Germany
16
Kennedy MR, Huxtable R, Birchley G, Ives J, Craddock I. "A Question of Trust" and "a Leap of Faith"-Study Participants' Perspectives on Consent, Privacy, and Trust in Smart Home Research: Qualitative Study. JMIR Mhealth Uhealth 2021; 9:e25227. [PMID: 34842551 PMCID: PMC8665399 DOI: 10.2196/25227] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/23/2020] [Revised: 01/15/2021] [Accepted: 08/01/2021] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Ubiquitous, smart technology has the potential to assist humans in numerous ways, including with health and social care. COVID-19 has notably hastened the move to remotely delivering many health services. A variety of stakeholders are involved in the process of developing technology. Where stakeholders are research participants, this poses practical and ethical challenges, particularly if the research is conducted in people's homes. Researchers must observe prima facie ethical obligations linked to participants' interests in having their autonomy and privacy respected. OBJECTIVE This study aims to explore the ethical considerations around consent, privacy, anonymization, and data sharing with participants involved in SPHERE (Sensor Platform for Healthcare in a Residential Environment), a project for developing smart technology for monitoring health behaviors at home. Participants' unique insights from being part of this unusual experiment offer valuable perspectives on how to properly approach informed consent for similar smart home research in the future. METHODS Semistructured qualitative interviews were conducted with 7 households (16 individual participants) recruited from SPHERE. Purposive sampling was used to invite participants from a range of household types and ages. Interviews were conducted in participants' homes or on-site at the University of Bristol. Interviews were digitally recorded, transcribed verbatim, and analyzed using an inductive thematic approach. RESULTS Four themes were identified: motivation for participating; transparency, understanding, and consent; privacy, anonymity, and data use; and trust in research. Motivations to participate in SPHERE stemmed from an altruistic desire to support research directed toward the public good.
Participants were satisfied with the consent process despite reporting some difficulties: recalling and understanding the information received, the timing and amount of information provision, and sometimes finding the information to be abstract. Participants were satisfied that privacy was assured and judged that the goals of the research compensated for threats to privacy. Participants trusted SPHERE. The factors that were relevant to developing and maintaining this trust were the trustworthiness of the research team, the provision of necessary information, participants' control over their participation, and positive prior experiences of research involvement. CONCLUSIONS This study offers valuable insights into the perspectives of participants in smart home research on important ethical considerations around consent and privacy. The findings may have practical implications for future research regarding the types of information researchers should convey, the extent to which anonymity can be assured, and the long-term duty of care owed to the participants who place trust in researchers not only on the basis of this information but also because of their institutional affiliation. This study highlights important ethical implications. Although autonomy matters, trust appears to matter the most. Therefore, researchers should be alert to the need to foster and maintain trust, particularly as failing to do so might have deleterious effects on future research.
Affiliation(s)
- Mari-Rose Kennedy
- Centre for Ethics in Medicine, University of Bristol, Bristol, United Kingdom
- Richard Huxtable
- Centre for Ethics in Medicine, University of Bristol, Bristol, United Kingdom
- Giles Birchley
- Centre for Ethics in Medicine, University of Bristol, Bristol, United Kingdom
- Jonathan Ives
- Centre for Ethics in Medicine, University of Bristol, Bristol, United Kingdom
- Ian Craddock
- Department of Electrical & Electronic Engineering, University of Bristol, Bristol, United Kingdom
17
Zuo Z, Watson M, Budgen D, Hall R, Kennelly C, Al Moubayed N. Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study. JMIR Med Inform 2021; 9:e29871. [PMID: 34652278 PMCID: PMC8556642 DOI: 10.2196/29871] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/23/2021] [Revised: 06/21/2021] [Accepted: 08/02/2021] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND Data science offers an unparalleled opportunity to identify new insights into many aspects of human life with recent advances in health care. Using data science in digital health raises significant challenges regarding data privacy, transparency, and trustworthiness. Recent regulations enforce the need for a clear legal basis for collecting, processing, and sharing data, for example, the European Union's General Data Protection Regulation (2016) and the United Kingdom's Data Protection Act (2018). For health care providers, legal use of the electronic health record (EHR) is permitted only in clinical care cases. Any other use of the data requires thoughtful considerations of the legal context and direct patient consent. Identifiable personal and sensitive information must be sufficiently anonymized. Raw data are commonly anonymized to be used for research purposes, with risk assessment for reidentification and utility. Although health care organizations have internal policies defined for information governance, there is a significant lack of practical tools and intuitive guidance about the use of data for research and modeling. Off-the-shelf data anonymization tools are developed frequently, but privacy-related functionalities are often incomparable with regard to use in different problem domains. In addition, tools to support measuring the risk of the anonymized data with regard to reidentification against the usefulness of the data exist, but there are question marks over their efficacy. OBJECTIVE In this systematic literature mapping study, we aim to alleviate the aforementioned issues by reviewing the landscape of data anonymization for digital health care. METHODS We used Google Scholar, Web of Science, Elsevier Scopus, and PubMed to retrieve academic studies published in English up to June 2020. Noteworthy gray literature was also used to initialize the search. 
We focused on review questions covering 5 bottom-up aspects: basic anonymization operations, privacy models, reidentification risk and usability metrics, off-the-shelf anonymization tools, and the lawful basis for EHR data anonymization. RESULTS We identified 239 eligible studies, of which 60 were chosen for general background information; 16 were selected for 7 basic anonymization operations; 104 covered 72 conventional and machine learning-based privacy models; four and 19 papers included seven and 15 metrics, respectively, for measuring the reidentification risk and degree of usability; and 36 explored 20 data anonymization software tools. In addition, we also evaluated the practical feasibility of performing anonymization on EHR data with reference to their usability in medical decision-making. Furthermore, we summarized the lawful basis for delivering guidance on practical EHR data anonymization. CONCLUSIONS This systematic literature mapping study indicates that anonymization of EHR data is theoretically achievable; yet, it requires more research efforts in practical implementations to balance privacy preservation and usability to ensure more reliable health care applications.
Affiliation(s)
- Zheming Zuo
- Department of Computer Science, Durham University, Durham, United Kingdom
- Matthew Watson
- Department of Computer Science, Durham University, Durham, United Kingdom
- David Budgen
- Department of Computer Science, Durham University, Durham, United Kingdom
- Robert Hall
- Cievert Ltd, Newcastle upon Tyne, United Kingdom
- Noura Al Moubayed
- Department of Computer Science, Durham University, Durham, United Kingdom
18
Meurers T, Bild R, Do KM, Prasser F. A scalable software solution for anonymizing high-dimensional biomedical data. Gigascience 2021; 10:giab068. [PMID: 34605868 PMCID: PMC8489190 DOI: 10.1093/gigascience/giab068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/30/2020] [Revised: 07/19/2021] [Accepted: 09/09/2021] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Data anonymization is an important building block for ensuring privacy and fosters the reuse of data. However, transforming the data in a way that preserves the privacy of subjects while maintaining a high degree of data quality is challenging and particularly difficult when processing complex datasets that contain a high number of attributes. In this article we present how we extended the open source software ARX to improve its support for high-dimensional, biomedical datasets. FINDINGS For improving ARX's capability to find optimal transformations when processing high-dimensional data, we implement 2 novel search algorithms. The first is a greedy top-down approach and is oriented on a formally implemented bottom-up search. The second is based on a genetic algorithm. We evaluated the algorithms with different datasets, transformation methods, and privacy models. The novel algorithms mostly outperformed the previously implemented bottom-up search. In addition, we extended the GUI to provide a high degree of usability and performance when working with high-dimensional datasets. CONCLUSION With our additions we have significantly enhanced ARX's ability to handle high-dimensional data in terms of processing performance as well as usability and thus can further facilitate data sharing.
Affiliation(s)
- Thierry Meurers
- Berlin Institute of Health at Charité–Universitätsmedizin Berlin, Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany
- Raffael Bild
- School of Medicine, Technical University of Munich, Ismaninger Str. 22, 81675 Munich, Germany
- Kieu-Mi Do
- Faculty of Informatics, Technical University of Munich, Boltzmannstr. 3, 85748 Garching, Germany
- Fabian Prasser
- Berlin Institute of Health at Charité–Universitätsmedizin Berlin, Medical Informatics, Charitéplatz 1, 10117 Berlin, Germany
19
Ingvar M, Blom MC, Winsnes C, Robinson G, Vanfleteren L, Huff S. On the Annotation of Health Care Pathways to Allow the Application of Care-Plans That Generate Data for Multiple Purposes. Front Digit Health 2021; 3:688218. [PMID: 34713160 PMCID: PMC8521921 DOI: 10.3389/fdgth.2021.688218] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/30/2021] [Accepted: 07/28/2021] [Indexed: 11/13/2022] Open
Abstract
Objectives: Procedural interoperability in health care requires information support and monitoring of a common work practice. Our aim was to devise an information model for a complete annotation of actions in clinical pathways that allows use of multiple plans concomitantly, as several partial processes underlie any composite clinical process. Materials and Methods: The development of the information model was based on the integration of a defined protocol for clinical interoperability in the care of patients with chronic obstructive pulmonary disease and an observational study protocol for cohort characterization at the group level. In the clinical process, patient-reported outcome measures were included. Results: The clinical protocol and the observational study protocol were developed on the clinical level and a single plan definition was developed by merging the protocols. The information model and a common data model that had been developed for care pathways were successfully implemented, and data for the medical records and the observational study could be extracted independently. The interprofessional process support improved the communication between the stakeholders (health care professionals, clinical scientists and providers). Discussion: We successfully merged the processes and had a functionally successful pilot demonstrating a seamless appearance for the health care professionals, while at the same time it was possible to generate data that could serve quality registries and clinical research. The adopted data model was initially tested and hereby published to the public domain. Conclusion: The use of a patient-centered information model and data annotation focused on the care pathway simplifies the annotation of data for different purposes and supports sharing of knowledge along the patient care path.
Affiliation(s)
- Martin Ingvar
- Department of Clinical Neuroscience, Karolinska Institutet, Solna, Sweden
- Department of Clinical Neuroradiology, Karolinska University Hospital, Stockholm, Sweden
- Greg Robinson
- International Consortium for Health Outcomes Measurement, Boston, MA, United States
- Lowie Vanfleteren
- University of Gothenburg and Sahlgrenska University Hospital, Gothenburg, Sweden
- Stan Huff
- Department of Biomedical Informatics, Intermountain Health Care, University of Utah, Salt Lake City, UT, United States
20
Abstract
Although data protection is compulsory when personal data is shared, there is no systematic method available to evaluate to what extent each individual is at risk of a privacy breach. We use a collection of measures that quantify how much information is needed to uncover sensitive information. Combined with visualization techniques, our approach can be used to perform a detailed privacy analysis of medical data. Because privacy is evaluated per variable, these adjustments can be made while incorporating how likely it is that these variables will be exploited to uncover sensitive information in practice, as is mandatory in the European Union. Additionally, the analysis of privacy can be used to evaluate to what extent knowledge on specific variables in the data can contribute to privacy breaches, which can subsequently guide the use of anonymization techniques, such as generalization.
Affiliation(s)
- Shannon KS Kroes
- Sanquin Research, the Netherlands; Leiden University, the Netherlands; Leiden University Medical Center, the Netherlands
21
Abstract
AI-based data synthesis has seen rapid progress over the last several years and is increasingly recognized for its promise to enable privacy-respecting high-fidelity data sharing. This is reflected by the growing availability of both commercial and open-sourced software solutions for synthesizing private data. However, despite these recent advances, adequately evaluating the quality of generated synthetic datasets is still an open challenge. We aim to close this gap and introduce a novel holdout-based empirical assessment framework for quantifying the fidelity as well as the privacy risk of synthetic data solutions for mixed-type tabular data. Measuring fidelity is based on statistical distances of lower-dimensional marginal distributions, which provide a model-free and easy-to-communicate empirical metric for the representativeness of a synthetic dataset. Privacy risk is assessed by calculating the individual-level distances to closest record with respect to the training data. By showing that the synthetic samples are just as close to the training as to the holdout data, we yield strong evidence that the synthesizer indeed learned to generalize patterns and is independent of individual training records. We empirically demonstrate the presented framework for seven distinct synthetic data solutions across four mixed-type datasets and then compare these to traditional data perturbation techniques. Both a Python-based implementation of the proposed metrics and the demonstration study setup are made available open-source. The results highlight the need to systematically assess the fidelity as well as the privacy of this emerging class of synthetic data generators.
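Illustrative note (not the authors' published implementation): the holdout comparison described above can be sketched by computing, for each synthetic record, its distance to the closest training record and to the closest holdout record. The sketch assumes purely numeric features and Euclidean distance, and it summarizes the comparison as the share of synthetic records that lie closer to training than to holdout data. That summary statistic, the function names, and the toy points are assumptions made for illustration; mixed-type data would need a different distance.

```python
import math

def distance_to_closest_record(sample, reference):
    """Euclidean distance from one record to its nearest neighbor in `reference`."""
    return min(math.dist(sample, ref) for ref in reference)

def share_closer_to_training(synthetic, training, holdout):
    """Fraction of synthetic records closer to training than to holdout data.
    Ties count as half. A value near 0.5 suggests no special proximity to
    the training records; a value near 1.0 suggests memorization."""
    score = 0.0
    for s in synthetic:
        d_train = distance_to_closest_record(s, training)
        d_hold = distance_to_closest_record(s, holdout)
        if d_train < d_hold:
            score += 1.0
        elif d_train == d_hold:
            score += 0.5
    return score / len(synthetic)

# Hypothetical 2-D toy data: two clusters, split into training and holdout.
training = [(0.0, 0.0), (10.0, 10.0)]
holdout = [(0.0, 1.0), (10.0, 9.0)]
synthetic = [(0.0, 0.1), (10.0, 9.9), (0.0, 0.8), (10.0, 9.2)]

share = share_closer_to_training(synthetic, training, holdout)
print(f"share closer to training: {share:.2f}")
```

The holdout set matters because it provides a baseline: proximity to real records is expected for any faithful synthesizer, so only proximity to training records beyond that baseline points to leakage.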
Affiliation(s)
- Thomas Reutterer
- Department of Marketing, WU Vienna University of Economics and Business, Vienna, Austria
22
Murugadoss K, Rajasekharan A, Malin B, Agarwal V, Bade S, Anderson JR, Ross JL, Faubion WA, Halamka JD, Soundararajan V, Ardhanari S. Building a best-in-class automated de-identification tool for electronic health records through ensemble learning. Patterns (N Y) 2021; 2:100255. [PMID: 34179842 PMCID: PMC8212138 DOI: 10.1016/j.patter.2021.100255] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/06/2021] [Revised: 02/24/2021] [Accepted: 04/07/2021] [Indexed: 10/29/2022]
Abstract
The presence of personally identifiable information (PII) in natural language portions of electronic health records (EHRs) constrains their broad reuse. Despite continuous improvements in automated detection of PII, residual identifiers require manual validation and correction. Here, we describe an automated de-identification system that employs an ensemble architecture, incorporating attention-based deep-learning models and rule-based methods, supported by heuristics for detecting PII in EHR data. Detected identifiers are then transformed into plausible, though fictional, surrogates to further obfuscate any leaked identifier. Our approach outperforms existing tools, with a recall of 0.992 and precision of 0.979 on the i2b2 2014 dataset and a recall of 0.994 and precision of 0.967 on a dataset of 10,000 notes from the Mayo Clinic. The de-identification system presented here enables the generation of de-identified patient data at the scale required for modern machine-learning applications to help accelerate medical discoveries.
Affiliation(s)
- Bradley Malin
- Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jeff R. Anderson
- Mayo Clinic, Rochester, MN 55905, USA
- Mayo Clinic Platform, Rochester, MN 55905, USA
- John D. Halamka
- Mayo Clinic, Rochester, MN 55905, USA
- Mayo Clinic Platform, Rochester, MN 55905, USA
23
Ullah I, Shah MA, Khan A, Maple C, Waheed A. Virtual Pseudonym-Changing and Dynamic Grouping Policy for Privacy Preservation in VANETs. Sensors (Basel) 2021; 21:s21093077. [PMID: 33925131 PMCID: PMC8124586 DOI: 10.3390/s21093077] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/21/2021] [Revised: 04/01/2021] [Accepted: 04/02/2021] [Indexed: 11/29/2022]
Abstract
Location privacy is a critical problem in vehicular communication networks. Vehicles broadcast their road status information to other entities in the network through beacon messages. The beacon message content consists of the vehicle ID, speed, direction, position, and other information. An adversary could use vehicle identity and positioning information to determine vehicle driver behavior and identity at different visited location spots. A pseudonym can be used instead of the vehicle ID to help preserve vehicle location privacy. These pseudonyms should be changed in an appropriate way to produce uncertainty for any adversary attempting to identify a vehicle at different locations. In the existing research literature, pseudonyms are changed during silent mode between neighbors. However, the use of a short silent period and the visibility of pseudonyms of direct neighbors provide a mechanism for an adversary to determine the identity of a target vehicle at specific locations. Moreover, privacy is provided to the driver only within the RSU range; outside it, there is no privacy protection. In this research, we address the problem of location privacy in a highway scenario, where vehicles are traveling at high speeds with diverse traffic density. We propose a Dynamic Grouping and Virtual Pseudonym-Changing (DGVP) scheme for vehicle location privacy. Dynamic groups are formed based on similar status vehicles and cooperatively change pseudonyms. In the case of low traffic density, we use a virtual pseudonym update process. We formally present the model and specify the scheme through High-Level Petri Nets (HLPN). The simulation results indicate that the proposed method improves the anonymity set size and entropy, provides lower traceability, reduces impact on vehicular network applications, and has lower computation cost compared to existing research work.
Affiliation(s)
- Ikram Ullah
  - Department of Computer Science, COMSATS University Islamabad, Islamabad 45550, Pakistan
- Munam Ali Shah
  - Department of Computer Science, COMSATS University Islamabad, Islamabad 45550, Pakistan
- Abid Khan
  - Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK
- Carsten Maple
  - Secure Cyber Systems Research Group, WMG, University of Warwick, Coventry CV4 7AL, UK
- Abdul Waheed
  - Department of Computer Science, COMSATS University Islamabad, Islamabad 45550, Pakistan
|
24
|
Scheibner J, Raisaro JL, Troncoso-Pastoriza JR, Ienca M, Fellay J, Vayena E, Hubaux JP. Revolutionizing Medical Data Sharing Using Advanced Privacy-Enhancing Technologies: Technical, Legal, and Ethical Synthesis. J Med Internet Res 2021; 23:e25120. [PMID: 33629963 PMCID: PMC7952236 DOI: 10.2196/25120] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 10/19/2020] [Revised: 01/06/2021] [Accepted: 01/16/2021] [Indexed: 12/03/2022] Open
Abstract
Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to conduct data sharing that preserves individual privacy and data utility. The shortcomings of traditional privacy-enhancing technologies mean that institutions rely upon bespoke data sharing contracts. The lengthy process and administration induced by these contracts increase the inefficiency of data sharing and may disincentivize important clinical treatment and medical research. This paper provides a synthesis between 2 novel advanced privacy-enhancing technologies: homomorphic encryption and secure multiparty computation (defined together as multiparty homomorphic encryption). These privacy-enhancing technologies provide a mathematical guarantee of privacy, with multiparty homomorphic encryption providing a performance advantage over separately using homomorphic encryption or secure multiparty computation. We argue that multiparty homomorphic encryption fulfills legal requirements for medical data sharing under the European Union's General Data Protection Regulation, which has set a global benchmark for data protection. Specifically, the data processed and shared using multiparty homomorphic encryption can be considered anonymized data. We explain how multiparty homomorphic encryption can reduce the reliance upon customized contractual measures between institutions. The proposed approach can accelerate the pace of medical research while offering additional incentives for health care and research institutes to employ common data interoperability standards.
Affiliation(s)
- James Scheibner
  - Health Ethics and Policy Laboratory, Department of Health Sciences and Technology, Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
  - College of Business, Government and Law, Flinders University, Adelaide, Australia
- Jean Louis Raisaro
  - Precision Medicine Unit, Lausanne University Hospital, Lausanne, Switzerland
  - Data Science Group, Lausanne University Hospital, Lausanne, Switzerland
- Juan Ramón Troncoso-Pastoriza
  - Laboratory for Data Security, School of Computer and Communication Sciences, École polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Marcello Ienca
  - Health Ethics and Policy Laboratory, Department of Health Sciences and Technology, Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
- Jacques Fellay
  - Precision Medicine Unit, Lausanne University Hospital, Lausanne, Switzerland
  - School of Life Sciences, École polytechnique fédérale de Lausanne, Lausanne, Switzerland
  - Host-Pathogen Genomics Laboratory, Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Effy Vayena
  - Health Ethics and Policy Laboratory, Department of Health Sciences and Technology, Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
- Jean-Pierre Hubaux
  - Laboratory for Data Security, School of Computer and Communication Sciences, École polytechnique fédérale de Lausanne, Lausanne, Switzerland
|
25
|
Jeon S, Seo J, Kim S, Lee J, Kim JH, Sohn JW, Moon J, Joo HJ. Proposal and Assessment of a De-Identification Strategy to Enhance Anonymity of the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) in a Public Cloud-Computing Environment: Anonymization of Medical Data Using Privacy Models. J Med Internet Res 2020; 22:e19597. [PMID: 33177037 PMCID: PMC7728527 DOI: 10.2196/19597] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/26/2020] [Revised: 07/29/2020] [Accepted: 11/11/2020] [Indexed: 02/01/2023] Open
Abstract
Background: De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy.
Objective: This study proposes and evaluates a de-identification strategy that is comprised of several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using the actual CDM database.
Methods: The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models.
Results: The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: The highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one “highest risk” value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the “source values” (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification.
Conclusions: Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.
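The "highest risk" of 100% mentioned in the results corresponds to a record that is unique on its quasi-identifiers. The following is a minimal sketch of how k and such a worst-case risk can be computed, assuming the usual model in which a record's risk is 1 divided by the size of its equivalence class. It does not use the ARX API, and the column names and toy records are hypothetical:

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity_summary(records, quasi_identifiers):
    """Return (k, highest_risk, share_of_unique_records) for a table."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    k = min(sizes.values())          # the table is k-anonymous for this k
    highest_risk = 1 / k             # worst-case per-record risk
    unique = sum(1 for size in sizes.values() if size == 1)
    return k, highest_risk, unique / len(records)


# Hypothetical toy table: the last record is unique on (age_band, sex).
records = [
    {"age_band": "40-49", "sex": "F"},
    {"age_band": "40-49", "sex": "F"},
    {"age_band": "50-59", "sex": "M"},
    {"age_band": "50-59", "sex": "M"},
    {"age_band": "60-69", "sex": "F"},
]
summary = k_anonymity_summary(records, ["age_band", "sex"])
# Suppressing the unique record raises k to 2 and halves the highest risk.
summary_suppressed = k_anonymity_summary(records[:-1], ["age_band", "sex"])
print(summary, summary_suppressed)
```

A single unique record is enough to push the highest risk to 100% even when most records are well protected, which is consistent with the abstract reporting a low overall rate alongside a maximal worst case.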
Affiliation(s)
- Seungho Jeon
  - Division of Information Security, Graduate School of Information Security, Korea University, Seoul, Republic of Korea
- Jeongeun Seo
  - Division of Information Security, Graduate School of Information Security, Korea University, Seoul, Republic of Korea
- Sukyoung Kim
  - Division of Information Security, Graduate School of Information Security, Korea University, Seoul, Republic of Korea
- Jeongmoon Lee
  - Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, Republic of Korea
- Jong-Ho Kim
  - Department of Cardiology, Cardiovascular Center, Korea University, Seoul, Republic of Korea
- Jang Wook Sohn
  - Division of Infectious Diseases, Department of Internal Medicine, College of Medicine, Korea University, Seoul, Republic of Korea
- Jongsub Moon
  - Division of Information Security, Graduate School of Information Security, Korea University, Seoul, Republic of Korea
- Hyung Joon Joo
  - Department of Internal Medicine, Korea University College of Medicine, Korea University, Seoul, Republic of Korea
|
26
|
Abstract
Making data Findable, Accessible, Interoperable and Reusable (FAIR) is a good approach when data needs to be shared. However, security and privacy are still critical aspects. In the FAIRification process, there is a need both for de-identification of data and for license attribution. The paper analyses some of the issues related to this process when the objective is sharing genomic information. The main results are the identification of the already existing standards that could be used for this purpose and how to combine them. Nevertheless, the area is quickly evolving and more specific standards could be specified.
Affiliation(s)
- Jaime Delgado
  - Information Modeling and Processing (IMP) group - DMAG, Computer Architecture Dept. (DAC), Universitat Politècnica de Catalunya (UPC BarcelonaTECH)
- Silvia Llorente
  - Information Modeling and Processing (IMP) group - DMAG, Computer Architecture Dept. (DAC), Universitat Politècnica de Catalunya (UPC BarcelonaTECH)
|
27
|
Parker W, Jaremko JL, Cicero M, Azar M, El-Emam K, Gray BG, Hurrell C, Lavoie-Cardinal F, Desjardins B, Lum A, Sheremeta L, Lee E, Reinhold C, Tang A, Bromwich R. Canadian Association of Radiologists White Paper on De-Identification of Medical Imaging: Part 1, General Principles. Can Assoc Radiol J 2020; 72:13-24. [PMID: 33138621 DOI: 10.1177/0846537120967349] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022] Open
Abstract
The application of big data, radiomics, machine learning, and artificial intelligence (AI) algorithms in radiology requires access to large data sets containing personal health information. Because machine learning projects often require collaboration between different sites or data transfer to a third party, precautions are required to safeguard patient privacy. Safety measures are required to prevent inadvertent access to and transfer of identifiable information. The Canadian Association of Radiologists (CAR) is the national voice of radiology committed to promoting the highest standards in patient-centered imaging, lifelong learning, and research. The CAR has created an AI Ethical and Legal standing committee with the mandate to guide the medical imaging community in terms of best practices in data management, access to health care data, de-identification, and accountability practices. Part 1 of this article will inform CAR members on principles of de-identification, pseudonymization, encryption, direct and indirect identifiers, k-anonymization, risks of reidentification, implementations, data set release models, and validation of AI algorithms, with a view to developing appropriate standards to safeguard patient information effectively.
Affiliation(s)
- William Parker
  - Department of Radiology, University of British Columbia, Vancouver, British Columbia, Canada
  - SapienML Corp, Vancouver, British Columbia, Canada
- Jacob L Jaremko
  - Department of Radiology & Diagnostic Imaging, University of Alberta, Edmonton, Canada
- Mark Cicero
  - 16 Bit Inc, Toronto, Ontario, Canada
  - True North Imaging, Thornhill, Ontario, Canada
- Marleine Azar
  - Department of Medicine, Université de Montréal, Montréal, Quebec, Canada
- Khaled El-Emam
  - School of Epidemiology and Public Health, University of Ottawa, Ontario, Canada
- Bruce G Gray
  - Department of Medical Imaging, University of Toronto, Toronto, Canada
- Casey Hurrell
  - Canadian Association of Radiologists, Ottawa, Canada
- Andrea Lum
  - Department of Medical Imaging, Western University, London, Ontario, Canada
- Lori Sheremeta
  - Northern Alberta Institute of Technology, Alberta, Canada
- Emil Lee
  - Fraser Health Authority, Vancouver, British Columbia, Canada
- Caroline Reinhold
  - McGill University Health Center, McGill University, Montreal, Canada
  - Augmented Intelligence & Precision Health Laboratory of the Research Institute, McGill University Health Center, McGill University, Montreal, Canada
- An Tang
  - Department of Radiology, Radio-oncology, and Nuclear Medicine, Universite de Montreal, Montreal, Quebec, Canada
- Rebecca Bromwich
  - Department of Law and Legal Studies, Carleton University, Ottawa, Canada
|
28
|
Parker W, Jaremko JL, Cicero M, Azar M, El-Emam K, Gray BG, Hurrell C, Lavoie-Cardinal F, Desjardins B, Lum A, Sheremeta L, Lee E, Reinhold C, Tang A, Bromwich R. Canadian Association of Radiologists White Paper on De-identification of Medical Imaging: Part 2, Practical Considerations. Can Assoc Radiol J 2020; 72:25-34. [PMID: 33140663 DOI: 10.1177/0846537120967345] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022] Open
Abstract
The application of big data, radiomics, machine learning, and artificial intelligence (AI) algorithms in radiology requires access to large data sets containing personal health information. Because machine learning projects often require collaboration between different sites or data transfer to a third party, precautions are required to safeguard patient privacy. Safety measures are required to prevent inadvertent access to and transfer of identifiable information. The Canadian Association of Radiologists (CAR) is the national voice of radiology committed to promoting the highest standards in patient-centered imaging, lifelong learning, and research. The CAR has created an AI Ethical and Legal standing committee with the mandate to guide the medical imaging community in terms of best practices in data management, access to health care data, de-identification, and accountability practices. Part 2 of this article will inform CAR members on the practical aspects of medical imaging de-identification, strengths and limitations of de-identification approaches, list of de-identification software and tools available, and perspectives on future directions.
Affiliation(s)
- William Parker
  - Department of Radiology, University of British Columbia, Vancouver, British Columbia, Canada
  - SapienML Corp, Vancouver, British Columbia, Canada
- Jacob L Jaremko
  - Department of Radiology & Diagnostic Imaging, University of Alberta, Edmonton, Canada
- Mark Cicero
  - 16 Bit Inc, Toronto, Ontario, Canada
  - True North Imaging, Thornhill, Ontario, Canada
- Marleine Azar
  - Department of Medicine, Université de Montréal, Montréal, Quebec, Canada
- Khaled El-Emam
  - School of Epidemiology and Public Health, University of Ottawa, Ontario, Canada
- Bruce G Gray
  - Department of Medical Imaging, University of Toronto, Toronto, Canada
- Casey Hurrell
  - Canadian Association of Radiologists, Ottawa, Canada
- Andrea Lum
  - Department of Medical Imaging, Western University, London, Ontario, Canada
- Lori Sheremeta
  - Northern Alberta Institute of Technology, Edmonton, Alberta, Canada
- Emil Lee
  - Fraser Health Authority, Vancouver, British Columbia, Canada
- Caroline Reinhold
  - McGill University Health Center, McGill University, Montréal, Canada
  - Augmented Intelligence & Precision Health Laboratory of the Research Institute of McGill University Health Centre, Montréal, Quebec, Canada
- An Tang
  - Department of Radiology, Radio-oncology, and Nuclear Medicine, Universite de Montreal, Montréal, Quebec, Canada
- Rebecca Bromwich
  - Department of Law and Legal Studies, Carleton University, Ottawa, Canada
|
29
|
Bild R, Kuhn KA, Prasser F. Better Safe than Sorry - Implementing Reliable Health Data Anonymization. Stud Health Technol Inform 2020; 270:68-72. [PMID: 32570348 DOI: 10.3233/shti200124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Modern biomedical research is increasingly data-driven. To create the required big datasets, health data needs to be shared or reused, which often leads to privacy challenges. Data anonymization is an important protection method where data is transformed such that privacy guarantees can be provided according to formal models. For applications in practice, anonymization methods need to be integrated into scalable and reliable tools. In this work, we tackle the problem of achieving reliability. Privacy models often involve mathematical definitions using real numbers which are typically approximated using floating-point numbers when implemented as software. We study the effect on the privacy guarantees provided and present a reliable computing framework based on fractional and interval arithmetic for improving the reliability of implementations. Extensive evaluations demonstrate that reliable data anonymization is practical and that it can be achieved with minor impacts on execution times and data utility.
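To illustrate why floating-point approximation can matter for a threshold-based privacy check, here is a minimal sketch using Python's standard `fractions` module. It shows only the rounding problem and an exact-arithmetic alternative; it is not the authors' framework, interval arithmetic is not shown, and the scenario (two risk shares summed and compared with a threshold) is a hypothetical example:

```python
from fractions import Fraction

# Hypothetical check: do two risk shares (1/10 and 2/10) together stay
# within a threshold of 3/10? Mathematically the answer is yes (equality).

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly, so the
# rounded sum ends up slightly above the rounded threshold.
float_check = (0.1 + 0.2) <= 0.3

# Exact rational arithmetic gives the mathematically correct answer.
exact_check = (Fraction(1, 10) + Fraction(2, 10)) <= Fraction(3, 10)

print(float_check, exact_check)
```

The same input is rejected by the floating-point comparison and accepted by the exact one, so an implementation that decides privacy guarantees at a boundary can give a different answer than the formal model it claims to implement.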
Affiliation(s)
- Raffael Bild
  - University hospital rechts der Isar, Technical University of Munich, Germany
- Klaus A Kuhn
  - University hospital rechts der Isar, Technical University of Munich, Germany
- Fabian Prasser
  - Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute of Health (BIH), Berlin, Germany
|
30
|
Mattern H, Knoll M, Lüsebrink F, Speck O. Chemical shift-based prospective k-space anonymization. Magn Reson Med 2020; 85:962-969. [PMID: 32761655 DOI: 10.1002/mrm.28460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/22/2020] [Revised: 06/16/2020] [Accepted: 07/10/2020] [Indexed: 01/07/2023]
Abstract
PURPOSE Publicly available data provision is an essential part of open science. However, open data can conflict with data privacy and data protection regulations. Head scans are particularly vulnerable because the subject's face can be reconstructed from the acquired images. Although defacing can impede subject identification in reconstructed images, this approach is not applicable to k-space raw data. To address this challenge and allow defacing of raw data for publication, we present chemical shift-based prospective k-space anonymization (CHARISMA). METHODS In spin-warp imaging, fat shift occurs along the frequency-encoding direction. By placing an oil-filled mask onto the subject's face, the shifted fat signal can overlap with the face to deface k-space during the acquisition. The CHARISMA approach was tested for gradient-echo sequences in a single subject wearing the oil-filled mask at 7 T. Different fat shifts were compared by varying the readout bandwidth. Furthermore, intensity-based segmentation was used to test whether the images could be unmasked retrospectively. RESULTS To impede subject identification after retrospective unmasking, the signal of face and shifted oil should overlap. In this single-subject study, a shift of 3.3 mm to 4.9 mm resulted in the most efficient masking. Independent of CHARISMA, long TEs induce signal decay and dephasing, which impeded unmasking. CONCLUSION To the best of our knowledge, CHARISMA is the first prospective k-space defacing approach. With proper fat-shift direction and amplitude, this easy-to-build, low-cost solution impaired subject identification in gradient-echo data considerably. Further sequences will be tested with CHARISMA in the future.
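The dependence of the fat shift on readout bandwidth can be made concrete with the standard chemical-shift relation: the spatial shift equals the fat-water frequency offset divided by the bandwidth per pixel, times the pixel size. The sketch below assumes a fat-water offset of about 3.5 ppm and a proton gyromagnetic ratio of 42.577 MHz/T; the bandwidth and pixel size in the example are hypothetical and are not the acquisition parameters of the study:

```python
GAMMA_MHZ_PER_T = 42.577  # proton gyromagnetic ratio divided by 2*pi
FAT_WATER_PPM = 3.5       # assumed fat-water chemical shift (approximate)


def fat_water_shift_hz(field_t):
    """Fat-water frequency offset in Hz at a given field strength in tesla."""
    # ppm multiplied by the Larmor frequency in MHz gives Hz directly.
    return FAT_WATER_PPM * GAMMA_MHZ_PER_T * field_t


def fat_shift_mm(field_t, bandwidth_hz_per_pixel, pixel_size_mm):
    """Spatial fat shift along the frequency-encoding direction in mm."""
    shift_pixels = fat_water_shift_hz(field_t) / bandwidth_hz_per_pixel
    return shift_pixels * pixel_size_mm


# Hypothetical settings at 7 T with 1 mm pixels: halving the readout
# bandwidth per pixel doubles the fat shift.
shift_high_bw = fat_shift_mm(7.0, 500.0, 1.0)
shift_low_bw = fat_shift_mm(7.0, 250.0, 1.0)
print(fat_water_shift_hz(7.0), shift_high_bw, shift_low_bw)
```

Because the shift scales inversely with bandwidth per pixel, lowering the readout bandwidth is the natural knob for moving the oil signal further across the face.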
Affiliation(s)
- Hendrik Mattern
  - Biomedical Magnetic Resonance, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
- Martin Knoll
  - Biomedical Magnetic Resonance, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
- Falk Lüsebrink
  - Biomedical Magnetic Resonance, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
  - Medicine and Digitalization, Department of Neurology, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Oliver Speck
  - Biomedical Magnetic Resonance, Otto-von-Guericke-University Magdeburg, Magdeburg, Germany
  - German Center for Neurodegenerative Disease, Magdeburg, Germany
  - Center for Behavioral Brain Sciences, Magdeburg, Germany
  - Leibniz Institute for Neurobiology, Magdeburg, Germany
|
31
|
Nasseh D. The Mishandling of Anonymity in Terms of Medical Research Within the General Data Protection Regulation. Stud Health Technol Inform 2020; 272:43-46. [PMID: 32604596 DOI: 10.3233/shti200489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
One of the major regulatory factors for health informatics is data privacy protection. In the European Union, a shared set of laws has been implemented: the General Data Protection Regulation (GDPR). While this set of rules aims at harmonizing European data privacy protection standards, it fails to properly detail the handling of anonymized data. This is a problem because, for example, many current research initiatives aim to reuse patient data collected within primary care but lack patient consent and hence might have to rely on anonymized data as the only alternative. In this work, we detail why the concept of anonymity is wrongly handled within the GDPR and suggest how the laws could be adapted.
|
32
|
Mezinska S, Buka A, Bankava A, Barzdins J. Legal and Ethical Issues in Secondary Use of Administrative Health Data: The Case of Latvian Healthcare Monitoring Datalink. Stud Health Technol Inform 2020; 270:1138-1142. [PMID: 32570559 DOI: 10.3233/shti200340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The paper presents an analysis of the legal and ethical issues surrounding the establishment of the Latvian Healthcare Monitoring Datalink. The paper covers three interconnected issues in the context of the use of administrative health data for research purposes: anonymization of data, the concept of 'public interest', and the involvement of research ethics committees. The analysis is placed in the broader context of the interaction between the General Data Protection Regulation (GDPR), national legislative measures, and the practical needs of researchers. Neither the GDPR nor the Latvian legal framework regulates the particularities of the use of potentially identifiable health data in research. Also, the practical use of 'public interest' as a basis for lawful processing of personal data concerning health for research purposes is not clear. More extended involvement of research ethics committees might serve as a useful tool for determining the 'public interest' and for evaluating proportionality when balancing the aims of the research against personal data protection.
|
33
|
Bild R, Eicher J, Prasser F. Efficient Protection of Health Data from Sensitive Attribute Disclosure. Stud Health Technol Inform 2020; 270:193-197. [PMID: 32570373 DOI: 10.3233/shti200149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Biomedical research has become data-driven. To create the required big datasets, health data needs to be shared or reused out of the context of its initial purpose. This leads to significant privacy challenges. Data anonymization is an important protection method where data is transformed such that privacy guarantees can be provided according to formal models. For applications in practice, anonymization methods need to be integrated into scalable and robust tools. In this work, we focus on the problem of scalability. Protecting biomedical data from inference attacks is challenging, in particular for numeric data. An important privacy model in this context is t-closeness, which has also been defined for attribute values which are totally ordered. However, directly implementing a scalable algorithmic representation of the mathematical definition of the model proves difficult. In this paper we therefore present a series of optimizations that can be used to achieve efficiency in production use. An experimental evaluation shows that our approach reduces execution times of anonymization processes involving t-closeness by up to a factor of two.
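For readers unfamiliar with t-closeness on totally ordered attributes, the following is a minimal sketch of the textbook definition using the ordered-distance Earth Mover's Distance. It is a naive reference computation, not the optimized algorithm the paper presents, and the function names and example distributions are hypothetical:

```python
def ordered_emd(p, q):
    """Earth Mover's Distance between two distributions over an ordered domain.

    `p` and `q` list the probability of each attribute value in domain order.
    With ordered distance, the EMD is the sum of the absolute cumulative
    differences, normalised by (number of values - 1).
    """
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("distributions must have equal length of at least 2")
    cumulative = 0.0
    total = 0.0
    for p_i, q_i in zip(p, q):
        cumulative += p_i - q_i
        total += abs(cumulative)
    return total / (len(p) - 1)


def satisfies_t_closeness(class_distribution, global_distribution, t):
    """True if the class distribution is within distance t of the global one."""
    return ordered_emd(class_distribution, global_distribution) <= t


# Hypothetical sensitive attribute with three ordered values (low, mid, high).
global_dist = [1 / 3, 1 / 3, 1 / 3]
class_dist = [0.5, 0.5, 0.0]  # this equivalence class has no "high" values
distance = ordered_emd(class_dist, global_dist)
print(distance, satisfies_t_closeness(class_dist, global_dist, 0.2))
```

A full anonymization run evaluates this distance for every equivalence class of every candidate transformation, which is why the cost of the naive computation becomes relevant at scale.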
Affiliation(s)
- Raffael Bild
  - University hospital rechts der Isar, Technical University of Munich, Germany
- Johanna Eicher
  - University hospital rechts der Isar, Technical University of Munich, Germany
- Fabian Prasser
  - Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute of Health (BIH), Berlin, Germany
|
34
|
Kofler F, Berger C, Waldmannstetter D, Lipkova J, Ezhov I, Tetteh G, Kirschke J, Zimmer C, Wiestler B, Menze BH. BraTS Toolkit: Translating BraTS Brain Tumor Segmentation Algorithms Into Clinical and Scientific Practice. Front Neurosci 2020; 14:125. [PMID: 32410929 PMCID: PMC7201293 DOI: 10.3389/fnins.2020.00125] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 09/30/2019] [Accepted: 01/31/2020] [Indexed: 01/01/2023] Open
Abstract
Despite great advances in brain tumor segmentation and clear clinical need, translation of state-of-the-art computational methods into clinical routine and scientific practice remains a major challenge. Several factors impede successful implementations, including data standardization and preprocessing. However, these steps are pivotal for the deployment of state-of-the-art image segmentation algorithms. To overcome these issues, we present BraTS Toolkit. BraTS Toolkit is a holistic approach to brain tumor segmentation and consists of three components: First, the BraTS Preprocessor facilitates data standardization and preprocessing for researchers and clinicians alike. It covers the entire image analysis workflow prior to tumor segmentation, from image conversion and registration to brain extraction. Second, BraTS Segmentor enables orchestration of BraTS brain tumor segmentation algorithms for generation of fully-automated segmentations. Finally, BraTS Fusionator can combine the resulting candidate segmentations into consensus segmentations using fusion methods such as majority voting and iterative SIMPLE fusion. The capabilities of our tools are illustrated with a practical example to enable easy translation to clinical and scientific practice.
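Majority voting, one of the fusion methods named above, can be sketched in a few lines. The example below is a generic illustration on flattened binary masks and does not use the BraTS Toolkit API; the function name and the toy masks are hypothetical:

```python
def majority_vote(candidate_masks):
    """Fuse binary segmentation masks by strict majority voting.

    `candidate_masks` is a list of equally long sequences of 0/1 labels, one
    sequence per candidate segmentation (voxels flattened into one dimension).
    A voxel is labelled 1 if more than half of the candidates label it 1.
    """
    n = len(candidate_masks)
    if n == 0:
        raise ValueError("at least one candidate mask is required")
    length = len(candidate_masks[0])
    if any(len(mask) != length for mask in candidate_masks):
        raise ValueError("all masks must have the same number of voxels")
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*candidate_masks)]


# Three hypothetical candidate segmentations of a four-voxel image.
candidates = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
consensus = majority_vote(candidates)
print(consensus)
```

With an even number of candidates, ties are resolved to 0 under this strict rule; that tie-breaking choice is an assumption of the sketch.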
Affiliation(s)
- Florian Kofler
  - Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, Munich, Germany
  - Department of Neuroradiology, Klinikum rechts der Isar, Munich, Germany
- Christoph Berger
  - Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, Munich, Germany
- Diana Waldmannstetter
  - Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, Munich, Germany
- Jana Lipkova
  - Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, Munich, Germany
- Ivan Ezhov
  - Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, Munich, Germany
- Giles Tetteh
  - Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, Munich, Germany
- Jan Kirschke
  - Department of Neuroradiology, Klinikum rechts der Isar, Munich, Germany
- Claus Zimmer
  - Department of Neuroradiology, Klinikum rechts der Isar, Munich, Germany
- Benedikt Wiestler
  - Department of Neuroradiology, Klinikum rechts der Isar, Munich, Germany
- Bjoern H Menze
  - Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, Munich, Germany
|
35
|
Iaconisi J, Hasselblatt F, Mayer B, Schoen M, Böckers TM, Böckers A. Effects of an Educational Film About Body Donors on Students' Empathy and Anxiety Levels in Gross Anatomy. Anat Sci Educ 2019; 12:386-398. [PMID: 30925012 DOI: 10.1002/ase.1880] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/06/2018] [Revised: 03/10/2019] [Accepted: 03/11/2019] [Indexed: 06/09/2023]
Abstract
While most German anatomy institutes provide only limited information about body donors and their lives, students have expressed a desire to learn more about these individuals, especially about their motivations to donate their bodies for the sake of medical education. In order to gratify this wish, as well as to further humanize body donors, an educational film was compiled, and a study designed to capture the film's effects on medical students. This is the first study using standardized, validated psychological tools to evaluate the impact of an educational film about body donors on students' empathy and psychological stress levels. The study followed a longitudinal, controlled, and cluster randomized design, including 77 (48 females/29 males) participants who watched the video either before, midway, or after the dissection course. Questionnaires were completed at four points in time applying the Jefferson Scale for Empathy (JSPE-S) and the Interpersonal Reactivity Index (IRI) to measure empathy. Psychological stress levels were recorded by the Brief Symptom Inventory (BSI). Overall, students recommended the film to be shown to all students (median 6.0; maximum on the six-point Likert scale). Viewing the film revealed no significant changes between study groups or over time in JSPE-S sum scores. All groups demonstrated a significant reduction of BSI values before the dissection course actually started and increased values during the course, but both developments appeared not to be associated with the intervention. Overall, the educational film did not correlate with any negative effects on students' empathy and psychological stress levels, and it was strongly approved of by students, as it provided more humanizing personal information about body donors without violating their anonymity.
Affiliation(s)
- Julia Iaconisi
  - Faculty of Medicine, Institute of Anatomy and Cell Biology, Ulm University, Ulm, Germany
- Friederike Hasselblatt
  - Faculty of Medicine, Institute of Anatomy and Cell Biology, Ulm University, Ulm, Germany
- Benjamin Mayer
  - Faculty of Medicine, Institute of Epidemiology and Medical Biometry, Ulm University, Ulm, Germany
- Michael Schoen
  - Faculty of Medicine, Institute of Anatomy and Cell Biology, Ulm University, Ulm, Germany
- Tobias Maria Böckers
  - Faculty of Medicine, Institute of Anatomy and Cell Biology, Ulm University, Ulm, Germany
- Anja Böckers
  - Faculty of Medicine, Institute of Anatomy and Cell Biology, Ulm University, Ulm, Germany
|
36
|
Chevrier R, Foufi V, Gaudet-Blavignac C, Robert A, Lovis C. Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review. J Med Internet Res 2019; 21:e13484. [PMID: 31152528 PMCID: PMC6658290 DOI: 10.2196/13484] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 01/24/2019] [Revised: 03/29/2019] [Accepted: 04/26/2019] [Indexed: 01/19/2023] Open
Abstract
Background The secondary use of health data is central to biomedical research in the era of data science and precision medicine. National and international initiatives, such as the Global Open Findable, Accessible, Interoperable, and Reusable (GO FAIR) initiative, are supporting this approach in different ways (eg, making the sharing of research data mandatory or improving the legal and ethical frameworks). Preserving patients’ privacy is crucial in this context. De-identification and anonymization are the two most common terms used to refer to the technical approaches that protect privacy and facilitate the secondary use of health data. However, it is difficult to find a consensus on the definitions of the concepts or on the reliability of the techniques used to apply them. A comprehensive review is needed to better understand the domain, its capabilities, its challenges, and the ratio of risk between the data subjects’ privacy on one side, and the benefit of scientific advances on the other. Objective This work aims at better understanding how the research community comprehends and defines the concepts of de-identification and anonymization. A rich overview should also provide insights into the use and reliability of the methods. Six aspects will be studied: (1) terminology and definitions, (2) backgrounds and places of work of the researchers, (3) reasons for anonymizing or de-identifying health data, (4) limitations of the techniques, (5) legal and ethical aspects, and (6) recommendations of the researchers. Methods Based on a scoping review protocol designed a priori, MEDLINE was searched for publications discussing de-identification or anonymization and published between 2007 and 2017. The search was restricted to MEDLINE to focus on the life sciences community. The screening process was performed by two reviewers independently. 
Results After searching 7972 records that matched at least one search term, 135 publications were screened and 60 full-text articles were included. (1) Terminology: Definitions of the terms de-identification and anonymization were provided in less than half of the articles (29/60, 48%). When both terms were used (41/60, 68%), their meanings divided the authors into two equal groups (19/60, 32%, each) with opposed views. The remaining articles (3/60, 5%) were equivocal. (2) Backgrounds and locations: Research groups were based predominantly in North America (31/60, 52%) and in the European Union (22/60, 37%). The authors came from 19 different domains; computer science (91/248, 36.7%), biomedical informatics (47/248, 19.0%), and medicine (38/248, 15.3%) were the most prevalent ones. (3) Purpose: The main reason declared for applying these techniques is to facilitate biomedical research. (4) Limitations: Progress is made on specific techniques but, overall, limitations remain numerous. (5) Legal and ethical aspects: Differences exist between nations in the definitions, approaches, and legal practices. (6) Recommendations: The combination of organizational, legal, ethical, and technical approaches is necessary to protect health data. Conclusions Interest is growing for privacy-enhancing techniques in the life sciences community. This interest crosses scientific boundaries, involving primarily computer science, biomedical informatics, and medicine. The variability observed in the use of the terms de-identification and anonymization emphasizes the need for clearer definitions as well as for better education and dissemination of information on the subject. The same observation applies to the methods. Several legislations, such as the American Health Insurance Portability and Accountability Act (HIPAA) and the European General Data Protection Regulation (GDPR), regulate the domain. 
Using the definitions they provide could help address the variable use of these two concepts in the research community.
Affiliation(s)
- Raphaël Chevrier
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Vasiliki Foufi
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Christophe Gaudet-Blavignac
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Arnaud Robert
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Christian Lovis
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
|
37
|
Caetano SJ, Dawe D, Ellis P, Earle CC, Pond GR. Methods to improve the estimation of time-to-event outcomes when data is de-identified. Stat Med 2019; 38:625-635. [PMID: 30311241 DOI: 10.1002/sim.7990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2017] [Revised: 08/30/2018] [Accepted: 09/06/2018] [Indexed: 11/07/2022]
Abstract
Technological advancements in recent years have sparked the use of large databases for research. The availability of these large databases has created a need for anonymization and de-identification techniques, applied prior to publishing the data. This de-identification alters the data, which in turn can affect the results derived after de-identification and potentially lead to false conclusions. The objective of this study is to investigate whether alterations to a de-identified time-to-event data set may improve the accuracy of the estimates. In this data set, a missing time bias was present among censored patients as a means to preserve patient confidentiality. This study investigates five methods intended to reduce the bias of time-to-event estimates. A simulation study was conducted to evaluate the effectiveness of each method in reducing bias. In situations where there was a large number of censored patients, the results of the simulation showed that Method 4 yielded the most accurate estimates. This method adjusted the survival times of censored patients by adding a random uniform component such that the modified survival time would occur within the final year of the study. Alternatively, when there was only a small number of censored patients, the method that did not alter the de-identified data set (Method 1) provided the most accurate estimates.
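For illustration only, the following Python sketch shows one possible reading of the Method 4 adjustment described above. The record layout (`entry`, `time`, `is_event`), the function name `adjust_censored_times`, and the rule for choosing the uniform bounds are assumptions made for this sketch; the abstract does not specify them, and this is not the authors' implementation.

```python
import random


def adjust_censored_times(records, study_length, rng=None):
    """Shift censored follow-up times so they end in the final study year.

    Assumed (hypothetical) record layout: (entry, time, is_event), where
    `entry` is the enrolment time in years since study start and `time`
    is the recorded follow-up time in years.
    """
    rng = rng or random.Random()
    adjusted = []
    for entry, time, is_event in records:
        calendar_end = entry + time
        if is_event or calendar_end >= study_length:
            # Observed events (and records already at study end) stay unchanged.
            adjusted.append((entry, time, is_event))
            continue
        # Uniform shift chosen so that entry + new_time lies in
        # [study_length - 1, study_length]; never shorten follow-up.
        low = max(0.0, (study_length - 1.0) - calendar_end)
        high = study_length - calendar_end
        shift = rng.uniform(low, high)
        adjusted.append((entry, time + shift, is_event))
    return adjusted


if __name__ == "__main__":
    demo = [(0.0, 2.0, False), (1.5, 1.0, True), (3.0, 1.8, False)]
    print(adjust_censored_times(demo, study_length=5.0, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes the adjustment reproducible, which matters when the adjusted data feed a simulation study.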
Affiliation(s)
- Samantha-Jo Caetano
  - Department of Mathematics and Statistics, McMaster University, Hamilton, Canada
- David Dawe
  - Department of Internal Medicine, Faculty of Health Sciences, University of Manitoba, Winnipeg, Canada
  - Department of Hematology and Medical Oncology, Cancer Care Manitoba, Winnipeg, Canada
- Peter Ellis
  - Department of Oncology, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Craig C Earle
  - Cancer Care Ontario, Toronto, Canada
  - Ontario Institute for Cancer Research, Toronto, Canada
  - Institute for Clinical Evaluative Sciences, Toronto, Canada
- Gregory R Pond
  - Department of Oncology, Faculty of Health Sciences, McMaster University, Hamilton, Canada
|
38
|
Hasselblatt F, Messerer DAC, Keis O, Böckers TM, Böckers A. Anonymous body or first patient? A status report and needs assessment regarding the personalization of donors in dissection courses in German, Austrian, and Swiss Medical Schools. Anat Sci Educ 2018; 11:282-293. [PMID: 29742328 DOI: 10.1002/ase.1744] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/17/2017] [Revised: 08/19/2017] [Accepted: 09/17/2017] [Indexed: 06/08/2023]
Abstract
Many Anglo-American universities have undertaken a paradigm shift in how the dissection of human material is approached, such that students are encouraged to learn about the lives of body donors, and to respectfully "personalize" them as human beings, rather than treating the specimens as anonymous cadavers. For the purposes of this study, this provision of limited personal information regarding the life of a body donor will be referred to as "personalization" of body donors. At this time, it is unknown whether this paradigm shift in the personalization of body donors can be translated into the German-speaking world. A shift from donor anonymity to donor personalization could strengthen students' perception of the donor as a "first patient," and thereby reinforce their ability to empathize with their future patients. Therefore, this study aimed to collect data about the current status of donation practices at German-speaking anatomy departments (n = 44) and to describe the opinions of anatomy departments, students (n = 366), and donors (n = 227) about possible donor personalization in medical education. Anatomy departments in Germany, Austria, and Switzerland were invited to participate in an online questionnaire. One-tenth of registered donors at Ulm University were randomly selected and received a questionnaire (20 items, yes-no questions) by mail. Students at the University of Ulm were also surveyed at the end of the dissection course (31 items, six-point Likert-scale). The majority of students were interested in receiving additional information about their donors (78.1%). A majority of donors also supported the anonymous disclosure of information about their medical history (92.5%). However, this information is only available in about 28% of the departments surveyed and is communicated to the students only irregularly. Overall, 78% of anatomy departments were not in favor of undertaking donor personalization. 
The results appear to reflect traditional attitudes among anatomy departments. However, since students clearly preferred receiving additional donor information, and most donors expressed a willingness to provide this information, one could argue that a change in attitudes is necessary. To do so, official recommendations for a limited, anonymous personalization of donated cadaveric specimens might be necessary. Anat Sci Educ 11: 282-293. © 2017 American Association of Anatomists.
Affiliation(s)
- Friederike Hasselblatt
  - Faculty of Medicine, Institute of Anatomy and Cell Biology, Ulm University, Ulm, Germany
- David A C Messerer
  - Department of Evaluation and Quality Management, Ulm University, Faculty of Medicine, Ulm, Germany
- Oliver Keis
  - Department of Evaluation and Quality Management, Ulm University, Faculty of Medicine, Ulm, Germany
- Tobias M Böckers
  - Faculty of Medicine, Institute of Anatomy and Cell Biology, Ulm University, Ulm, Germany
- Anja Böckers
  - Faculty of Medicine, Institute of Anatomy and Cell Biology, Ulm University, Ulm, Germany
|
39
|
Richter-Pechanski P, Riezler S, Dieterich C. De-Identification of German Medical Admission Notes. Stud Health Technol Inform 2018; 253:165-169. [PMID: 30147065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Medical texts are a vast resource for medical and computational research. In contrast to newswire or Wikipedia texts, medical texts need to be de-identified before making them accessible to a wider NLP research community. We created a prototype for German medical text de-identification and named entity recognition using a three-step approach. First, we used well-known rule-based models based on regular expressions and gazetteers; second, we used a spelling variant detector based on Levenshtein distance, exploiting the fact that the medical texts contain semi-structured headers including sensitive personal data; and third, we trained a named entity recognition model on out-of-domain data to add statistical capabilities to our prototype. Compared with a baseline based on regular expressions and gazetteers, we improved the F2-score for de-identification from 78% to 85%. Our prototype is a first step for further research on German medical text de-identification and shows that spelling variant detection and statistical models trained on out-of-domain data can improve de-identification performance significantly.
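The spelling variant step can be sketched as follows. This is a minimal illustration, assuming that names have already been extracted from the semi-structured header and that a maximum edit distance of 1 is a sensible threshold; the function names and the threshold are hypothetical and not taken from the prototype.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def find_spelling_variants(header_names, tokens, max_distance=1):
    """Return body tokens within max_distance edits of any header name."""
    hits = []
    for token in tokens:
        if any(levenshtein(token.lower(), name.lower()) <= max_distance
               for name in header_names):
            hits.append(token)
    return hits


if __name__ == "__main__":
    # Hypothetical example: the header lists "Schmidt", the note misspells it.
    print(find_spelling_variants(["Schmidt"],
                                 ["Herr", "Schmitt", "wurde", "aufgenommen"]))
```

A small threshold keeps false positives low for short tokens; a real system would likely scale the threshold with token length.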
Affiliation(s)
- Stefan Riezler
  - Department of Computational Linguistics, University of Heidelberg, Heidelberg, Germany
- Christoph Dieterich
  - Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology and Department of Internal Medicine III, University Hospital Heidelberg, German Center for Cardiovascular Research (DZHK) - Partner site Heidelberg/Mannheim
|
40
|
Abstract
Health information technology has increased accessibility of health and medical data and benefited medical research and healthcare management. However, there are rising concerns about patient privacy in sharing medical and healthcare data. A large amount of these data are in free text form. Existing techniques for privacy-preserving data sharing deal largely with structured data. Current privacy approaches for medical text data focus on detection and removal of patient identifiers from the data, which may be inadequate for protecting privacy or preserving data quality. We propose a new systematic approach to extract, cluster, and anonymize medical text records. Our approach integrates methods developed in both data privacy and health informatics fields. The key novel elements of our approach include a recursive partitioning method to cluster medical text records based on the similarity of the health and medical information and a value-enumeration method to anonymize potentially identifying information in the text data. An experimental study is conducted using real-world medical documents. The results of the experiments demonstrate the effectiveness of the proposed approach.
Affiliation(s)
- Xiao-Bai Li
  - Department of Operations and Information Systems, Manning School of Business, University of Massachusetts Lowell, Lowell, Massachusetts 01854
- Jialun Qin
  - Department of Operations and Information Systems, Manning School of Business, University of Massachusetts Lowell, Lowell, Massachusetts 01854
|
41
|
Foufi V, Gaudet-Blavignac C, Chevrier R, Lovis C. De-Identification of Medical Narrative Data. Stud Health Technol Inform 2017; 244:23-27. [PMID: 29039370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Maintaining data security and privacy in an era of cybersecurity is a challenge. The enormous and rapidly growing amount of health-related data available today raises numerous questions about data collection, storage, analysis, comparability and interoperability but also about data protection. The US Health Insurance Portability and Accountability Act (HIPAA) of 1996 provides a legal framework and guidance for using and disclosing health data. In practice, the approach proposed by HIPAA is the de-identification of medical documents by removing certain Protected Health Information (PHI). In this work, a rule-based method for the de-identification of French free-text medical data using Natural Language Processing (NLP) tools is presented.
Affiliation(s)
- Vasiliki Foufi
  - Division of Medical Information Sciences, Geneva University Hospitals and University of Geneva
- Raphaël Chevrier
  - Division of Medical Information Sciences, Geneva University Hospitals and University of Geneva
- Christian Lovis
  - Division of Medical Information Sciences, Geneva University Hospitals and University of Geneva
|
42
|
Sariyar M, Schlünder I. Reconsidering Anonymization-Related Concepts and the Term "Identification" Against the Backdrop of the European Legal Framework. Biopreserv Biobank 2016; 14:367-374. [PMID: 27104620 PMCID: PMC5073223 DOI: 10.1089/bio.2015.0100] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023] Open
Abstract
Sharing data in biomedical contexts has become increasingly relevant, but privacy concerns set constraints for free sharing of individual-level data. Data protection law protects only data relating to an identifiable individual, whereas “anonymous” data are free to be used by everybody. Usage of many terms related to anonymization is often not consistent among different domains such as statistics and law. The crucial term “identification” seems especially hard to define, since its definition presupposes the existence of identifying characteristics, leading to some circularity. In this article, we present a discussion of important terms based on a legal perspective that is outlined before we present issues related to the usage of terms such as unique “identifiers,” “quasi-identifiers,” and “sensitive attributes.” Based on these terms, we have tried to circumvent a circular definition for the term “identification” by making two decisions: first, deciding which (natural) identifier should stand for the individual; second, deciding how to recognize the individual. In addition, we provide an overview of anonymization techniques/methods for preventing re-identification. The discussion of basic notions related to anonymization shows that there is some work to be done in order to achieve a mutual understanding between legal and technical experts concerning some of these notions. Using a dialectical definition process in order to merge technical and legal perspectives on terms seems important for enhancing mutual understanding.
Affiliation(s)
- Murat Sariyar
  - Institute of Pathology, Charité-University Medicine Berlin, Berlin, Germany
  - TMF (Technologie- und Methodenplattform e.V.), Berlin, Germany
- Irene Schlünder
  - TMF (Technologie- und Methodenplattform e.V.), Berlin, Germany
|
43
|
Heatherly R, Rasmussen LV, Peissig PL, Pacheco JA, Harris P, Denny JC, Malin BA. A multi-institution evaluation of clinical profile anonymization. J Am Med Inform Assoc 2016; 23:e131-7. [PMID: 26567325 PMCID: PMC4954623 DOI: 10.1093/jamia/ocv154] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 01/29/2015] [Revised: 08/17/2015] [Accepted: 09/09/2015] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND AND OBJECTIVE There is an increasing desire to share de-identified electronic health records (EHRs) for secondary uses, but there are concerns that clinical terms can be exploited to compromise patient identities. Anonymization algorithms mitigate such threats while enabling novel discoveries, but their evaluation has been limited to single institutions. Here, we study how an existing clinical profile anonymization fares at multiple medical centers. METHODS We apply a state-of-the-art k-anonymization algorithm, with k set to the standard value 5, to the International Classification of Diseases, Ninth Revision codes for patients in a hypothyroidism association study at three medical centers: Marshfield Clinic, Northwestern University, and Vanderbilt University. We assess utility when anonymizing at three population levels: all patients in 1) the EHR system; 2) the biorepository; and 3) a hypothyroidism study. We evaluate utility using 1) changes to the number included in the dataset, 2) number of codes included, and 3) regions where generalization and suppression were required. RESULTS Our findings yield several notable results. First, we show that anonymizing in the context of the entire EHR yields a significantly greater quantity of data by reducing the amount of generalized regions from ∼15% to ∼0.5%. Second, ∼70% of codes that needed generalization only generalized two or three codes in the largest anonymization. CONCLUSIONS Sharing large volumes of clinical data in support of phenome-wide association studies is possible while safeguarding the privacy of the underlying individuals.
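To make the k = 5 setting concrete, the sketch below checks k-anonymity in its simplest form, where every exact set of diagnosis codes must be shared by at least k patients, and shows how generalizing codes can remove violations. This is a generic illustration, not the algorithm evaluated in the study; the function names, the example codes, and the generalization mapping are assumptions.

```python
from collections import Counter


def profile_group_sizes(profiles):
    """Count how many patients share each exact set of diagnosis codes."""
    return Counter(frozenset(codes) for codes in profiles)


def k_anonymity_violations(profiles, k=5):
    """Return the code sets shared by fewer than k patients."""
    return {profile: size
            for profile, size in profile_group_sizes(profiles).items()
            if size < k}


def generalize(codes, mapping):
    """Replace each code by its more general group, if one is defined."""
    return {mapping.get(code, code) for code in codes}


if __name__ == "__main__":
    # Hypothetical profiles: 5 + 3 + 2 patients.
    profiles = ([{"244.9"}] * 5
                + [{"244.9", "250.00"}] * 3
                + [{"244.9", "250.01"}] * 2)
    print(k_anonymity_violations(profiles))      # two groups smaller than 5

    # Assumed generalization: merge both diabetes codes into "250".
    mapping = {"250.00": "250", "250.01": "250"}
    generalized = [generalize(p, mapping) for p in profiles]
    print(k_anonymity_violations(generalized))   # no violations remain
```

Anonymizing against a larger population tends to leave more groups at or above k before any generalization, which is consistent with the study's finding that fewer regions needed generalization at the EHR level.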
Affiliation(s)
- Raymond Heatherly
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Luke V Rasmussen
  - Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Peggy L Peissig
  - Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
- Paul Harris
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
  - Department of Biomedical Engineering, Vanderbilt University, Nashville, TN, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
  - Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Bradley A Malin
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
  - Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN, USA
|
44
|
Clunie DA, Gebow D. Block selective redaction for minimizing loss during de-identification of burned in text in irreversibly compressed JPEG medical images. J Med Imaging (Bellingham) 2015; 2:016501. [PMID: 26158090 DOI: 10.1117/1.jmi.2.1.016501] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/17/2014] [Accepted: 03/03/2015] [Indexed: 11/14/2022] Open
Abstract
Deidentification of medical images requires attention to both the header information and the pixel data itself, in which burned-in text may be present. If the pixel data to be deidentified are stored in a compressed form, traditionally they are decompressed, identifying text is redacted, and if necessary, the pixel data are recompressed. Decompression without recompression may result in images of excessive or intractable size. Recompression with an irreversible scheme is undesirable because it may cause additional loss in the diagnostically relevant regions of the images. The irreversible (lossy) JPEG compression scheme works on small blocks of the image independently; hence, redaction can selectively be confined only to those blocks containing identifying text, leaving all other blocks unchanged. An open source implementation of selective redaction and a demonstration of its applicability to multiframe color ultrasound images are described. The process can be applied either to standalone JPEG images or to JPEG bit streams encapsulated in other formats, which in the case of medical images is usually DICOM.
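The block-level idea can be sketched by computing which blocks a redaction rectangle touches, so that only those blocks would need to be rewritten. This is a minimal illustration rather than the open source implementation mentioned above; the function name and the rectangle convention are assumptions. The default of 8 matches the 8x8 DCT blocks of baseline JPEG, and `block_size` is left as a parameter because chroma subsampling can make the minimum coded unit larger (for example 16x16).

```python
def blocks_to_redact(rect, image_width, image_height, block_size=8):
    """Return (row, col) indices of blocks overlapped by a pixel rectangle.

    rect = (left, top, right, bottom) in pixels, with right and bottom
    exclusive (an assumed convention for this sketch).
    """
    left, top, right, bottom = rect
    # Clip the rectangle to the image bounds.
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, image_width), min(bottom, image_height)
    if left >= right or top >= bottom:
        return []  # nothing inside the image to redact
    first_col, last_col = left // block_size, (right - 1) // block_size
    first_row, last_row = top // block_size, (bottom - 1) // block_size
    return [(row, col)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


if __name__ == "__main__":
    touched = blocks_to_redact((10, 4, 20, 9), image_width=64, image_height=64)
    total = (64 // 8) * (64 // 8)
    print(touched, f"{len(touched)}/{total} blocks affected")
```

In this toy case only 4 of 64 blocks are affected; all other blocks could keep their original coefficients, which is the property that avoids additional loss in the rest of the image.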
Affiliation(s)
- David A Clunie
  - PixelMed, 943 Heiden Road, Bangor, Pennsylvania 18013, United States
- Dan Gebow
  - MDDX Research and Informatics, 580 California Street, Fl 16, San Francisco, California 94104, United States
|
45
|
Haselgrove C, Poline JB, Kennedy DN. Comment on "A simple tool for neuroimaging data sharing". Front Neuroinform 2014; 8:82. [PMID: 25400576 PMCID: PMC4214193 DOI: 10.3389/fninf.2014.00082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2014] [Accepted: 09/24/2014] [Indexed: 11/24/2022] Open
Affiliation(s)
- Christian Haselgrove
  - Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA
- Jean-Baptiste Poline
  - Henry H. Wheeler, Jr. Brain Imaging Center, University of California at Berkeley, Berkeley, CA, USA
- David N Kennedy
  - Department of Psychiatry, University of Massachusetts Medical School, Worcester, MA, USA
|
46
|
Hochfellner D, Müller D, Schmucker A. Privacy in confidential administrative micro data: implementing statistical disclosure control in a secure computing environment. J Empir Res Hum Res Ethics 2014; 9:8-15. [PMID: 25747686 DOI: 10.1177/1556264614552799] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
The demand for comprehensive and innovative data is constantly growing in social science. In particular, micro data from various social security agencies become more and more attractive. In contrast to survey data, administrative data offer a census with highly reliable information but are restricted in their usage. To make them accessible for researchers, data or research output either have to be anonymized or released after disclosure review procedures have been used. This article discusses the trade-off between maintaining a high capability of research potential while protecting private information, by exploiting the data disclosure portfolio and the adopted disclosure strategies of the Research Data Center of the German Federal Employment Agency.
Affiliation(s)
- Dana Müller
  - Institute for Employment Research, Nürnberg, Germany
|
47
|
Abstract
According to many scientists and clinicians, genomics is taking on a key role in the field of medicine. Impressive advances in genome sequencing have opened the way to a variety of revolutionary applications in modern healthcare. In particular, the increasing understanding of the human genome, and of its relation to diseases and response to treatments brings promise of improvements in better preventive and personalized medicine. However, this progress raises important privacy and ethical concerns that need to be addressed. Indeed, each genome is the ultimate identifier of its owner and, due to its nature, it contains highly personal and privacy-sensitive data. In this article, after summarizing recent advances in genomics, we discuss some important privacy issues associated with human genomic information and methods put in place to address them.
Affiliation(s)
- Jean Louis Raisaro
  - School of Computer and Communication Sciences, Laboratory for Communications and Applications (LCA1), EPFL Lausanne
- Erman Ayday
  - School of Computer and Communication Sciences, Laboratory for Communications and Applications (LCA1), EPFL Lausanne
- Jean-Pierre Hubaux
  - School of Computer and Communication Sciences, Laboratory for Communications and Applications (LCA1), EPFL Lausanne
|
48
|
Abstract
Regression techniques can be used not only for legitimate data analysis, but also to infer private information about individuals. In this paper, we demonstrate that regression trees, a popular data-analysis and data-mining technique, can be used to effectively reveal individuals' sensitive data. This problem, which we call a "regression attack," has not been addressed in the data privacy literature, and existing privacy-preserving techniques are not appropriate in coping with this problem. We propose a new approach to counter regression attacks. To protect against privacy disclosure, our approach introduces a novel measure, called digression, which assesses the sensitive value disclosure risk in the process of building a regression tree model. Specifically, we develop an algorithm that uses the measure for pruning the tree to limit disclosure of sensitive data. We also propose a dynamic value-concatenation method for anonymizing data, which better preserves data utility than a user-defined generalization scheme commonly used in existing approaches. Our approach can be used for anonymizing both numeric and categorical data. An experimental study is conducted using real-world financial, economic and healthcare data. The results of the experiments demonstrate that the proposed approach is very effective in protecting data privacy while preserving data quality for research and analysis.
Affiliation(s)
- Xiao-Bai Li
  - Department of Operations and Information Systems, Manning School of Business, University of Massachusetts Lowell, Lowell, MA 01854 U.S.A.
- Sumit Sarkar
  - Naveen Jindal School of Management, University of Texas at Dallas, Richardson, TX 75080 U.S.A.
|
49
|
Abstract
Transaction data record various information about individuals, including their purchases and diagnoses, and are increasingly published to support large-scale and low-cost studies in domains such as marketing and medicine. However, the dissemination of transaction data may lead to privacy breaches, as it allows an attacker to link an individual's record to their identity. Approaches that anonymize data by eliminating certain values in an individual's record or by replacing them with more general values have been proposed recently, but they often produce data of limited usefulness. This is because these approaches adopt value transformation strategies that do not guarantee data utility in intended applications and objective measures that may lead to excessive data distortion. In this paper, we propose a novel approach for anonymizing data in a way that satisfies data publishers' utility requirements and incurs low information loss. To achieve this, we introduce an accurate information loss measure and an effective anonymization algorithm that explores a large part of the problem space. An extensive experimental study, using click-stream and medical data, demonstrates that our approach permits many times more accurate query answering than the state-of-the-art methods, while it is comparable to them in terms of efficiency.
|
50
|
El Emam K, Hu J, Mercer J, Peyton L, Kantarcioglu M, Malin B, Buckeridge D, Samet S, Earle C. A secure protocol for protecting the identity of providers when disclosing data for disease surveillance. J Am Med Inform Assoc 2011; 18:212-7. [PMID: 21486880 PMCID: PMC3078664 DOI: 10.1136/amiajnl-2011-000100] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 01/16/2011] [Accepted: 02/03/2011] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Providers have been reluctant to disclose patient data for public-health purposes. Even if patient privacy is ensured, the desire to protect provider confidentiality has been an important driver of this reluctance. METHODS Six requirements for a surveillance protocol were defined that satisfy the confidentiality needs of providers and ensure utility to public health. The authors developed a secure multi-party computation protocol using the Paillier cryptosystem to allow the disclosure of stratified case counts and denominators to meet these requirements. The authors evaluated the protocol in a simulated environment on its computation performance and ability to detect disease outbreak clusters. RESULTS Theoretical and empirical assessments demonstrate that all requirements are met by the protocol. A system implementing the protocol scales linearly in terms of computation time as the number of providers is increased. The absolute time to perform the computations was 12.5 s for data from 3000 practices. This is acceptable performance, given that the reporting would normally be done at 24 h intervals. The accuracy of detecting disease outbreak clusters was unchanged compared with a non-secure distributed surveillance protocol, with an F-score higher than 0.92 for outbreaks involving 500 or more cases. CONCLUSION The protocol and associated software provide a practical method for providers to disclose patient data for sentinel, syndromic or other indicator-based surveillance while protecting patient privacy and the identity of individual providers.
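The property of the Paillier cryptosystem that makes it suitable for aggregating case counts is additive homomorphism: multiplying ciphertexts yields an encryption of the sum of the plaintexts. The toy sketch below demonstrates only that property, not the authors' multi-party protocol. The tiny primes, the function names, and the example counts are assumptions chosen for readability, and the parameters are far too small to be secure.

```python
import math
import random


def generate_keypair(p, q):
    """Build a toy Paillier key from two primes, using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, message, rng=random):
    """Encrypt 0 <= message < n as g^m * r^n mod n^2."""
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n_sq) * pow(r, n, n_sq)) % n_sq


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


if __name__ == "__main__":
    n, private_key = generate_keypair(1789, 1861)  # toy primes, NOT secure
    counts = [12, 7, 30]  # hypothetical case counts from three providers
    total = encrypt(n, 0)
    for count in counts:
        total = add_encrypted(n, total, encrypt(n, count))
    print(decrypt(n, private_key, total))  # sum recovered without seeing any single count
```

Because only the aggregate ciphertext is decrypted, the party holding the private key learns the total but not which provider contributed which count, provided it never sees the individual ciphertexts.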
Affiliation(s)
- Khaled El Emam
  - Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada
|