1
Wirth FN, Kussel T, Müller A, Hamacher K, Prasser F. EasySMPC: a simple but powerful no-code tool for practical secure multiparty computation. BMC Bioinformatics 2022; 23:531. [PMID: 36494612 PMCID: PMC9733077 DOI: 10.1186/s12859-022-05044-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/31/2022] [Accepted: 11/08/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Modern biomedical research is data-driven and relies heavily on the re-use and sharing of data. Biomedical data, however, is subject to strict data protection requirements. Due to the complexity of the data required and the scale of data use, obtaining informed consent is often infeasible. Other methods, such as anonymization or federation, in turn have their own limitations. Secure multi-party computation (SMPC) is a cryptographic technology for distributed calculations, which brings formally provable security and privacy guarantees and can be used to implement a wide range of analytical approaches. As a relatively new technology, SMPC is still rarely used in real-world biomedical data sharing activities due to several barriers, including its technical complexity and lack of usability.
RESULTS To overcome these barriers, we have developed the tool EasySMPC, which is implemented in Java as a cross-platform, stand-alone desktop application provided as open-source software. The tool makes use of the SMPC method Arithmetic Secret Sharing, which allows pre-defined sets of variables to be summed securely across different parties in two rounds of communication (input sharing and output reconstruction), and integrates this method into a graphical user interface. No additional software services need to be set up or configured, as EasySMPC uses the most widespread digital communication channel available: e-mail. No cryptographic keys need to be exchanged between the parties, and e-mails are exchanged automatically by the software. To demonstrate the practicability of our solution, we evaluated its performance in a wide range of data sharing scenarios. The results of our evaluation show that our approach is scalable (summing up 10,000 variables between 20 parties takes less than 300 s) and that the number of participants is the essential factor.
CONCLUSIONS We have developed an easy-to-use "no-code solution" for performing secure joint calculations on biomedical data using SMPC protocols, which is suitable for use by scientists without IT expertise and which has no special infrastructure requirements. We believe that innovative approaches to data sharing with SMPC are needed to foster the translation of complex protocols into practice.
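The two-round summation described in this abstract can be sketched as follows. This is a minimal illustration of additive (arithmetic) secret sharing in Python, not EasySMPC's Java implementation: the modulus, the function names `share` and `secure_sum`, and the in-memory stand-in for message exchange are all assumptions made for the example.

```python
import secrets

# Assumed modulus for illustration only; the abstract does not specify one.
MODULUS = 2**127 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split `value` into n additive shares that sum to it modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the secret.
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_inputs, modulus=MODULUS):
    """Simulate the two communication rounds for one summed variable."""
    n = len(private_inputs)
    # Round 1 (input sharing): party i sends its j-th share to party j.
    outgoing = [share(x, n, modulus) for x in private_inputs]
    # Each party j locally adds the shares it received; a single party's
    # partial sum reveals nothing about any individual input.
    partial = [sum(outgoing[i][j] for i in range(n)) % modulus for j in range(n)]
    # Round 2 (output reconstruction): partial sums are exchanged and added.
    return sum(partial) % modulus


if __name__ == "__main__":
    # Hypothetical per-site counts held by three parties.
    print(secure_sum([12, 30, 7]))
```

The sketch assumes non-negative integer inputs whose total stays below the modulus. In a real deployment each share would travel over a separate channel (e-mail, in the tool described above) rather than sit in one process.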
Affiliation(s)
- Felix Nikolaus Wirth
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany (GRID grid.484013.a, ISNI 0000 0004 6879 971X)
- Tobias Kussel
  - Computational Biology and Simulation, TU Darmstadt, Darmstadt, Germany (GRID grid.6546.1, ISNI 0000 0001 0940 1669)
- Armin Müller
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany (GRID grid.484013.a, ISNI 0000 0004 6879 971X)
- Kay Hamacher
  - Computational Biology and Simulation, TU Darmstadt, Darmstadt, Germany (GRID grid.6546.1, ISNI 0000 0001 0940 1669)
- Fabian Prasser
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany (GRID grid.484013.a, ISNI 0000 0004 6879 971X)
2
Richard L, Hwang SW, Forchuk C, Nisenbaum R, Clemens K, Wiens K, Booth R, Azimaee M, Shariff SZ. Validation study of health administrative data algorithms to identify individuals experiencing homelessness and estimate population prevalence of homelessness in Ontario, Canada. BMJ Open 2019; 9:e030221. [PMID: 31594882 PMCID: PMC6797366 DOI: 10.1136/bmjopen-2019-030221] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/03/2022] Open
Abstract
OBJECTIVES To validate case ascertainment algorithms for identifying individuals experiencing homelessness in health administrative databases between 2007 and 2014; and to estimate homelessness prevalence trends in Ontario, Canada, between 2007 and 2016.
DESIGN A population-based retrospective validation study.
SETTING Ontario, Canada, from 2007 to 2014 (validation) and 2007 to 2016 (estimation).
PARTICIPANTS Our reference standard was the known housing status of a longitudinal cohort of housed (n=137 200) and homeless or vulnerably housed (n=686) individuals. Two reference standard definitions of homelessness were adopted: the housing episode and the annual housing experience (any homelessness within a calendar year).
MAIN OUTCOME MEASURES Sensitivity, specificity, positive and negative predictive values and positive likelihood ratios of 30 case ascertainment algorithms for detecting homelessness using up to eight health service databases.
RESULTS Sensitivity estimates ranged from 10.8% to 28.9% (housing episode definition) and 18.5% to 35.6% (annual housing experience definition). Specificities exceeded 99% and positive likelihood ratios were high using both definitions. The optimal algorithm estimates that 59 974 (95% CI 55 231 to 65 208) Ontarians (0.53% of the adult population) experienced homelessness in 2016, a 67.3% increase from 2007.
CONCLUSIONS In Ontario, case ascertainment algorithms for identifying homelessness had low sensitivity but very high specificity and positive likelihood ratio. The use of health administrative databases may offer opportunities to track individuals experiencing homelessness over time and inform efforts to improve housing and health status in this vulnerable population.
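The outcome measures named in this abstract follow from a 2×2 table comparing an algorithm's classification with the reference standard. The Python sketch below shows the standard formulas. The function name `diagnostic_metrics` and the counts in the usage example are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute validation measures from a 2x2 table of counts.

    tp/fn: reference-positive cases the algorithm flagged / missed.
    fp/tn: reference-negative cases the algorithm flagged / left unflagged.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        # Positive likelihood ratio = sensitivity / (1 - specificity);
        # it is unbounded when there are no false positives.
        "lr_positive": (
            sensitivity / false_positive_rate
            if false_positive_rate > 0
            else float("inf")
        ),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the low-sensitivity,
    # high-specificity pattern; they are not the study's data.
    for name, value in diagnostic_metrics(tp=20, fp=10, fn=80, tn=990).items():
        print(f"{name}: {value:.3f}")
```

With very high specificity, even a modest sensitivity produces a large positive likelihood ratio, which matches the pattern the abstract reports.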
Affiliation(s)
- Cheryl Forchuk
  - Arthur Labatt Family School of Nursing, Western University, London, Ontario, Canada
- Rosane Nisenbaum
  - St Michael's Hospital, Toronto, Ontario, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Kristin Clemens
  - Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Kathryn Wiens
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Richard Booth
  - Arthur Labatt Family School of Nursing, Western University, London, Ontario, Canada
3
Lefaivre S, Behan B, Vaccarino A, Evans K, Dharsee M, Gee T, Dafnas C, Mikkelsen T, Theriault E. Big Data Needs Big Governance: Best Practices From Brain-CODE, the Ontario-Brain Institute's Neuroinformatics Platform. Front Genet 2019; 10:191. [PMID: 30984233 PMCID: PMC6450217 DOI: 10.3389/fgene.2019.00191] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/29/2018] [Accepted: 02/22/2019] [Indexed: 11/13/2022] Open
Abstract
The Ontario Brain Institute (OBI) has begun to catalyze scientific discovery in the field of neuroscience through its large-scale informatics platform, known as Brain-CODE. The platform supports the capture, storage, federation, sharing, and analysis of different data types across several brain disorders. Underlying the platform is a robust and scalable data governance structure which allows for the flexibility to advance scientific understanding, while protecting the privacy of research participants. Recognizing the value of an open science approach to enabling discovery, the governance structure was designed not only to support collaborative research programs, but also to support open science by making all data open and accessible in the future. OBI’s rigorous approach to data sharing maintains the accessibility of research data for big discoveries without compromising privacy and security. Taking a Privacy by Design approach to both data sharing and development of the platform has allowed OBI to establish some best practices related to large-scale data sharing within Canada. The aim of this report is to highlight these best practices and develop a key open resource which may be referenced during the development of similar open science initiatives.
Affiliation(s)
- Anthony Vaccarino
  - Ontario Brain Institute, Toronto, ON, Canada
  - Indoc Research, Toronto, ON, Canada
- Tom Gee
  - Indoc Research, Toronto, ON, Canada
- Costa Dafnas
  - Centre for Advanced Computing, Kingston, ON, Canada
4
Dankar FK, Gergely M, Dankar SK. Informed Consent in Biomedical Research. Comput Struct Biotechnol J 2019; 17:463-474. [PMID: 31007872 PMCID: PMC6458444 DOI: 10.1016/j.csbj.2019.03.010] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 12/19/2018] [Revised: 03/19/2019] [Accepted: 03/21/2019] [Indexed: 12/27/2022] Open
Abstract
Informed consent is the result of tumultuous events in both the clinical and research arenas over the last 100 years. Throughout this time, the notion of informed consent has shifted tremendously, both due to advances in medicine, as well as the type of data being gathered. As such, informed consent has misaligned with the goals of medical research. It is becoming more and more vital to address this chasm, and begin building new frameworks to link this disconnect. Thus, we address three goals in this paper. First, we discuss the history of informed consent and unify the varying definitions of the term. Second, we evaluate the current research on the topic, classify them into themes, and attend to the problems therein. Lastly, we employ these themes of informed consent research mentioned previously to provide guidance and insight for future research in the arena.
Affiliation(s)
- Fida K. Dankar
  - College of IT, UAEU, Al Ain, P.O. Box 15551, United Arab Emirates
- Marton Gergely
  - College of IT, UAEU, Al Ain, P.O. Box 15551, United Arab Emirates
5
Perez S, Zimet GD, Tatar O, Stupiansky NW, Fisher WA, Rosberger Z. Human Papillomavirus Vaccines: Successes and Future Challenges. Drugs 2018; 78:1385-1396. [DOI: 10.1007/s40265-018-0975-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 12/12/2022]
6
Shapiro GK, Guichon J, Kelaher M. Canadian school-based HPV vaccine programs and policy considerations. Vaccine 2017; 35:5700-5707. [DOI: 10.1016/j.vaccine.2017.07.079] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 12/20/2016] [Revised: 07/20/2017] [Accepted: 07/24/2017] [Indexed: 12/28/2022]
7
Jacquez GM, Essex A, Curtis A, Kohler B, Sherman R, Emam KE, Shi C, Kaufmann A, Beale L, Cusick T, Goldberg D, Goovaerts P. Geospatial cryptography: enabling researchers to access private, spatially referenced, human subjects data for cancer control and prevention. Journal of Geographical Systems 2017; 19:197-220. [PMID: 29085255 PMCID: PMC5659297 DOI: 10.1007/s10109-017-0252-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/28/2017] [Accepted: 04/28/2017] [Indexed: 06/07/2023]
Abstract
As the volume, accuracy and precision of digital geographic information have increased, concerns regarding individual privacy and confidentiality have come to the forefront. Not only do these challenge a basic tenet underlying the advancement of science by posing substantial obstacles to the sharing of data to validate research results, but they are obstacles to conducting certain research projects in the first place. Geospatial cryptography involves the specification, design, implementation and application of cryptographic techniques to address privacy, confidentiality and security concerns for geographically referenced data. This article defines geospatial cryptography and demonstrates its application in cancer control and surveillance. Four use cases are considered: (1) national-level de-duplication among state or province-based cancer registries; (2) sharing of confidential data across cancer registries to support case aggregation across administrative geographies; (3) secure data linkage; and (4) cancer cluster investigation and surveillance. A secure multi-party system for geospatial cryptography is developed. Solutions under geospatial cryptography are presented and computation time is calculated. As services provided by cancer registries to the research community, de-duplication, case aggregation across administrative geographies and secure data linkage are often time-consuming and in some instances precluded by confidentiality and security concerns. Geospatial cryptography provides secure solutions that hold significant promise for addressing these concerns and for accelerating the pace of research with human subjects data residing in our nation's cancer registries. Pursuit of the research directions posed herein conceivably would lead to a geospatially encrypted geographic information system (GEGIS) designed specifically to promote the sharing and spatial analysis of confidential data. 
Geospatial cryptography holds substantial promise for accelerating the pace of research with spatially referenced human subjects data.
Affiliation(s)
- Geoffrey M Jacquez
  - Department of Geography, State University of New York at Buffalo, Buffalo, NY, USA
  - BioMedware, Ann Arbor, MI, USA
- Aleksander Essex
  - Department of Electrical and Computer Engineering, Western University, London, ON, Canada
- Andrew Curtis
  - Department of Geography, Kent State University, Kent, OH, USA
- Betsy Kohler
  - North American Association of Central Cancer Registries, Springfield, IL, USA
- Recinda Sherman
  - North American Association of Central Cancer Registries, Springfield, IL, USA
- Khaled El Emam
  - Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Chen Shi
  - Department of Geography, State University of New York at Buffalo, Buffalo, NY, USA
- Thomas Cusick
  - Department of Mathematics, University at Buffalo, Buffalo, NY, USA
- Daniel Goldberg
  - Department of Geography, Texas A&M University, College Station, TX, USA
  - Department of Computer Science & Engineering, Texas A&M University, College Station, TX, USA
8
Arbuckle L, Moher E, Bartlett SJ, Ahmed S, El Emam K. Montreal Accord on Patient-Reported Outcomes (PROs) use series - Paper 9: anonymization and ethics considerations for capturing and sharing patient reported outcomes. J Clin Epidemiol 2017; 89:168-172. [PMID: 28433677 DOI: 10.1016/j.jclinepi.2017.04.016] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 10/02/2015] [Revised: 03/28/2017] [Accepted: 04/09/2017] [Indexed: 11/16/2022]
Abstract
BACKGROUND Patient-reported outcomes (PROs) are collected with consent for care; however, using the data for any other purpose requires consent for that additional purpose, or the anonymization of the data. Collecting explicit consent to use this data for secondary purposes, before the patient completes a PRO, can also bias the responses.
OBJECTIVE We consider the ethical and security issues related to the collection of data at the point of care or in the population and the aggregation and integration of PRO data with administrative databases to facilitate decision making and comparative effectiveness research.
DISCUSSION In this article, we describe risk-based anonymization, taking the context of the data release into account, so that we may consider the degree to which the release is considered anonymized. We also consider the ethical use of anonymized data, the anonymization of free-form text, and the secure linking of data sets without sharing any personal information. Many good standards and best practices exist for the sharing of health data and could be used as a baseline in the development of a national PRO initiative.
Affiliation(s)
- Luk Arbuckle
  - Electronic Health Information Laboratory, Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, Ontario, Canada K1H 8L1
- Ester Moher
  - Electronic Health Information Laboratory, Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, Ontario, Canada K1H 8L1
- Susan J Bartlett
  - Department of Medicine, McGill University/McGill University Health Center, 687 Pine Ave W R4.29, Montreal, Quebec, Canada H3A 1A1
  - Division of Rheumatology, Johns Hopkins School of Medicine, 5200 Eastern Avenue #4100, Baltimore, MD 21224, USA
- Sara Ahmed
  - School of Physical and Occupational Therapy, Faculty of Medicine, McGill University, 3654 Prom Sir-William-Osler, Montreal, Quebec, Canada H3G 1Y5
  - Department of Pediatrics, Faculty of Medicine, University of Ottawa, 401 Smyth Road, Ottawa, Canada K1H 8L1
- Khaled El Emam
  - Electronic Health Information Laboratory, Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, Ontario, Canada K1H 8L1
  - Department of Pediatrics, Faculty of Medicine, University of Ottawa, 401 Smyth Road, Ottawa, Canada K1H 8L1
9
Yigzaw KY, Michalas A, Bellika JG. Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation. BMC Med Inform Decis Mak 2017; 17:1. [PMID: 28049465 PMCID: PMC5209873 DOI: 10.1186/s12911-016-0389-x] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/27/2016] [Accepted: 11/10/2016] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.
METHODS We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.
RESULTS The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.
CONCLUSIONS The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.
ELECTRONIC SUPPLEMENTARY MATERIAL The online version of this article (doi:10.1186/s12911-016-0389-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kassaye Yitbarek Yigzaw
  - Department of Computer Science, UiT The Arctic University of Norway, 9037, Tromsø, Norway
  - Norwegian Centre for E-health Research, University Hospital of North Norway, 9019, Tromsø, Norway
- Antonis Michalas
  - Department of Computer Science, University of Westminster, 115 New Cavendish Street, London, W1W 6UW, UK
- Johan Gustav Bellika
  - Norwegian Centre for E-health Research, University Hospital of North Norway, 9019, Tromsø, Norway
  - Department of Clinical Medicine, UiT The Arctic University of Norway, 9037, Tromsø, Norway
10
Chida K, Morohashi G, Fuji H, Magata F, Fujimura A, Hamada K, Ikarashi D, Yamamoto R. Implementation and evaluation of an efficient secure computation system using 'R' for healthcare statistics. J Am Med Inform Assoc 2014; 21:e326-31. [PMID: 24763677 DOI: 10.1136/amiajnl-2014-002631] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND AND OBJECTIVE While the secondary use of medical data has gained attention, its adoption has been constrained by the need to protect patient privacy. Making medical data secure by de-identification can be problematic, especially when the data concerns rare diseases. We require rigorous security management measures.
MATERIALS AND METHODS Using secure computation, an approach from cryptography, our system can compute various statistics over encrypted medical records without decrypting them. A drawback of secure computation is that the amount of processing time required is immense. We implemented a system that securely computes healthcare statistics from the statistical computing software 'R' by effectively combining secret-sharing-based secure computation with original computation.
RESULTS Testing confirmed that our system could correctly complete computation of average and unbiased variance of approximately 50,000 records of dummy insurance claim data in a little over a second. Computation including conditional expressions and/or comparison of values, for example, t test and median, could also be correctly completed in several tens of seconds to a few minutes.
DISCUSSION If medical records are simply encrypted, the risk of leaks exists because decryption is usually required during statistical analysis. Our system possesses high-level security because medical records remain in an encrypted state even during statistical analysis. Also, our system can securely compute some basic statistics with conditional expressions using 'R' that works interactively, while secure computation protocols generally require a significant amount of processing time.
CONCLUSIONS We propose a secure statistical analysis system using 'R' for medical data that effectively integrates secret-sharing-based secure computation and original computation.
Affiliation(s)
- Koji Chida
  - Secure Platform Laboratories, NTT Corporation, Tokyo, Japan
- Hitoshi Fuji
  - Secure Platform Laboratories, NTT Corporation, Tokyo, Japan
- Akiko Fujimura
  - Secure Platform Laboratories, NTT Corporation, Tokyo, Japan
- Koki Hamada
  - Secure Platform Laboratories, NTT Corporation, Tokyo, Japan
- Dai Ikarashi
  - Secure Platform Laboratories, NTT Corporation, Tokyo, Japan
- Ryuichi Yamamoto
  - Department of Health Management and Policy, Graduate School of Medicine, University of Tokyo, Tokyo, Japan
11
Doiron D, Raina P, Fortier I. Linking Canadian population health data: maximizing the potential of cohort and administrative data. Canadian Journal of Public Health 2013; 104:e258-61. [PMID: 23823892 DOI: 10.17269/cjph.104.3775] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 11/22/2012] [Revised: 03/07/2013] [Accepted: 02/15/2013] [Indexed: 11/17/2022]
Abstract
Linkage of data collected by large Canadian cohort studies with provincially managed administrative health databases can offer very interesting avenues for multidisciplinary and cost-effective health research in Canada. Successfully co-analyzing cohort data and administrative health data (AHD) can lead to research results capable of improving the health and well-being of Canadians and enhancing the delivery of health care services. However, such an endeavour will require strong coordination and long-term commitment between all stakeholders involved. The challenges and opportunities of a pan-Canadian cohort-to-AHD data linkage program have been considered by cohort study investigators and data custodians from each Canadian province. Stakeholders acknowledge the important public health benefits of establishing such a program and have established an action plan to move forward.
Affiliation(s)
- Dany Doiron
  - Public Population Project in Genomics and Society, Montreal, QC, Canada