1
Kragh Jørgensen RR, Jensen JF, El-Galaly T, Bøgsted M, Brøndum RF, Simonsen MR, Jakobsen LH. Development of time to event prediction models using federated learning. BMC Med Res Methodol 2025; 25:143. [PMID: 40419965 PMCID: PMC12105200 DOI: 10.1186/s12874-025-02598-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2025] [Accepted: 05/15/2025] [Indexed: 05/28/2025] Open
Abstract
BACKGROUND In a wide range of diseases, it is necessary to utilize multiple data sources to obtain enough data for model training. However, performing centralized pooling of multiple data sources, while protecting each patient's sensitive data, can require a cumbersome process involving many institutional bodies. Alternatively, federated learning (FL) can be utilized to train models based on data located at multiple sites. METHOD We propose two methods for training time-to-event prediction models on distributed data using FL algorithms. Both approaches incorporate steps to allow prediction of individual-level survival curves without exposing individual-level event times. For Cox proportional hazards models, the latter is accomplished by using a kernel smoother for the baseline hazard function. The other proposed methodology is based on general parametric likelihood theory for right-censored data. We compared these two methods in four simulations and on one real-world dataset, predicting the survival probability in patients with Hodgkin lymphoma (HL). RESULTS The simulations demonstrated that the FL models performed similarly to the non-distributed case in all four experiments, with only slight deviations in predicted survival probabilities compared to the true model. Our findings were similar in the real-world advanced-stage HL example where the FL models were compared to their non-distributed versions, revealing only small deviations in performance. CONCLUSION The proposed procedures enable training of time-to-event models using data distributed across sites, without direct sharing of individual-level data and event times, while retaining a predictive performance on par with undistributed approaches.
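As a rough illustration of how a parametric likelihood for right-censored data can be fitted without moving individual records, the sketch below uses a deliberately simple exponential survival model. That model is an assumption made here for brevity and is not the model of the cited paper. Under it, the log-likelihood depends on each site's data only through its event count and its total follow-up time, so each site can share just those two aggregates. All names (`SiteSummary`, `summarize_site`, `federated_exponential_rate`, `survival_probability`) and the toy data are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregates that leave a site; no individual event times are shared."""
    events: int       # number of observed (uncensored) events
    exposure: float   # total follow-up time over all patients at the site


def summarize_site(times, observed):
    """Run locally at a site on follow-up times and event indicators."""
    return SiteSummary(
        events=sum(1 for flag in observed if flag),
        exposure=float(sum(times)),
    )


def federated_exponential_rate(summaries):
    """Maximum likelihood rate of an exponential model from site aggregates.

    The pooled log-likelihood is d*log(rate) - rate*T, where d is the total
    number of events and T the total follow-up time, so the maximiser is d/T.
    """
    total_events = sum(s.events for s in summaries)
    total_exposure = sum(s.exposure for s in summaries)
    if total_exposure <= 0:
        raise ValueError("total follow-up time must be positive")
    return total_events / total_exposure


def survival_probability(rate, t):
    """Predicted survival at time t under the exponential model."""
    return math.exp(-rate * t)


# Toy data: (follow-up time, event observed?) held at two separate sites.
site_a = summarize_site([2.0, 5.0, 3.0], [True, False, True])
site_b = summarize_site([4.0, 6.0], [True, False])
rate = federated_exponential_rate([site_a, site_b])
print(rate, survival_probability(rate, 5.0))
```

For this model the aggregated estimate coincides with the estimate from pooled data; richer models generally need iterative exchange of gradients or similar summaries.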
Affiliation(s)
- Rasmus Rask Kragh Jørgensen
  - Department of Hematology, Clinical Cancer Research Center, Aalborg University Hospital, Aalborg, Denmark
  - Center for Clinical Data Science, Aalborg University and Aalborg University Hospital, Aalborg, Denmark
  - Department of Clinical Medicine, Aalborg University, Aalborg, Denmark
- Jonas Faartoft Jensen
  - Department of Hematology, Clinical Cancer Research Center, Aalborg University Hospital, Aalborg, Denmark
- Tarec El-Galaly
  - Department of Hematology, Clinical Cancer Research Center, Aalborg University Hospital, Aalborg, Denmark
  - Department of Clinical Medicine, Aalborg University, Aalborg, Denmark
  - Department of Medicine Solna, Clinical Epidemiology Division, Karolinska Institutet, Stockholm, Sweden
  - Department of Hematology, Odense University Hospital, Odense, Denmark
- Martin Bøgsted
  - Center for Clinical Data Science, Aalborg University and Aalborg University Hospital, Aalborg, Denmark
  - Clinical Cancer Research Centre, Aalborg University Hospital, Aalborg, Denmark
- Rasmus Froberg Brøndum
  - Center for Clinical Data Science, Aalborg University and Aalborg University Hospital, Aalborg, Denmark
  - Clinical Cancer Research Centre, Aalborg University Hospital, Aalborg, Denmark
- Mikkel Runason Simonsen
  - Department of Hematology, Clinical Cancer Research Center, Aalborg University Hospital, Aalborg, Denmark
  - Department of Mathematical Sciences, Aalborg University, Aalborg, Denmark
- Lasse Hjort Jakobsen
  - Department of Hematology, Clinical Cancer Research Center, Aalborg University Hospital, Aalborg, Denmark
2
Forero DA, Curioso WH, Wang W. Ten simple rules for successfully carrying out funded research projects. PLoS Comput Biol 2024; 20:e1012431. [PMID: 39298382 PMCID: PMC11412653 DOI: 10.1371/journal.pcbi.1012431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/21/2024] Open
Affiliation(s)
- Diego A. Forero
  - School of Health and Sport Sciences, Fundación Universitaria del Área Andina, Bogotá, Colombia
- Walter H. Curioso
  - Vicerrectorado de Investigación, Universidad Continental, Lima, Peru
- Wei Wang
  - Clinical Research Centre, The First Affiliated Hospital of Shantou University Medical College, Shantou, China
  - School of Public Health, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, Shandong, China
  - Beijing Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing, China
  - Centre for Precision Health, Edith Cowan University, Perth, Australia
3
Hertzog L, Charlson F, Tschakert P, Morgan GG, Norman R, Pereira G, Hanigan IC. Suicide deaths associated with climate change-induced heat anomalies in Australia: a time series regression analysis. BMJ MENTAL HEALTH 2024; 27:1-8. [PMID: 39122479 PMCID: PMC11409306 DOI: 10.1136/bmjment-2024-301131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 06/13/2024] [Indexed: 08/12/2024]
Abstract
BACKGROUND Although environmental determinants play an important role in suicide mortality, the quantitative influence of climate change-induced heat anomalies on suicide deaths remains relatively underexamined. OBJECTIVE The objective is to quantify the impact of climate change-induced heat anomalies on suicide deaths in Australia from 2000 to 2019. METHODS A time series regression analysis using a generalised additive model was employed to explore the potentially non-linear relationship between temperature anomalies and suicide, incorporating structural variables such as sex, age, season and geographic region. Suicide deaths data were obtained from the Australian National Mortality Database, and gridded surface temperature data were sourced from the Australian Gridded Climate Dataset. FINDINGS Heat anomalies in the study period were between 0.02°C and 2.2°C hotter than the historical period due to climate change. Our analysis revealed that approximately 0.5% (264 suicides, 95% CI 257 to 271) of the total 50 733 suicides within the study period were attributable to climate change-induced heat anomalies. Death counts associated with heat anomalies were statistically significant (p value 0.03) among men aged 55 years and older. Seasonality was a significant factor, with increased deaths during spring and summer. The relationship between high heat anomalies and suicide deaths varied across different demographic segments. CONCLUSIONS AND IMPLICATIONS This study highlights the measurable impact of climate change-induced heat anomalies on suicide deaths in Australia, emphasising the need for increased climate change mitigation and adaptation strategies in public health planning and suicide prevention efforts focusing on older adult men. The findings underscore the importance of considering environmental factors in addition to individual-level factors in understanding and reducing suicide mortality.
Affiliation(s)
- Lucas Hertzog
  - School of Population Health, Curtin University, Perth, Western Australia, Australia
  - WHO Collaborating Centre for Climate Change and Health Impact Assessment, Perth, Western Australia, Australia
  - Healthy Environments and Lives (HEAL) National Research Network, Perth, Western Australia, Australia
- Fiona Charlson
  - Queensland Centre of Mental Health Research and School of Public Health, The University of Queensland, Brisbane, Queensland, Australia
- Petra Tschakert
  - School of Media, Creative Arts and Social Inquiry, Curtin University, Perth, Western Australia, Australia
- Geoffrey G Morgan
  - Healthy Environments and Lives (HEAL) National Research Network, Perth, Western Australia, Australia
  - School of Public Health, Faculty of Medicine and Health, University of Sydney, Camperdown, New South Wales, Australia
  - University Centre for Rural Health, Faculty of Medicine and Health, University of Sydney, Lismore, New South Wales, Australia
  - Centre for Safe Air, NHMRC CRE, Sydney, New South Wales, Australia
- Richard Norman
  - Healthy Environments and Lives (HEAL) National Research Network, Perth, Western Australia, Australia
  - School of Population Health, Faculty of Health Sciences, Curtin University, Perth, Western Australia, Australia
- Gavin Pereira
  - School of Population Health, Curtin University, Perth, Western Australia, Australia
  - WHO Collaborating Centre for Climate Change and Health Impact Assessment, Perth, Western Australia, Australia
  - enAble Institute, Faculty of Health Sciences, Curtin University, Perth, Western Australia, Australia
- Ivan C Hanigan
  - School of Population Health, Curtin University, Perth, Western Australia, Australia
  - WHO Collaborating Centre for Climate Change and Health Impact Assessment, Perth, Western Australia, Australia
  - Healthy Environments and Lives (HEAL) National Research Network, Perth, Western Australia, Australia
  - Centre for Safe Air, NHMRC CRE, Sydney, New South Wales, Australia
4
Juwara L, Yang YA, Velly AM, Saha-Chaudhuri P. Privacy-preserving analysis of time-to-event data under nested case-control sampling. Stat Methods Med Res 2024; 33:96-111. [PMID: 38093410 PMCID: PMC10863373 DOI: 10.1177/09622802231215804] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/17/2024]
Abstract
Analyses of distributed data networks of rare diseases are constrained by legitimate privacy and ethical concerns. Analytical centers (e.g. research institutions) are thus confronted with the challenging task of obtaining data from recruiting sites that are often unable or unwilling to share personal records of participants. For time-to-event data, recently popularized disclosure techniques with privacy guarantees (e.g., Differentially Private Generative Adversarial Networks) are generally computationally expensive or inaccessible to applied researchers. To perform the widely used Cox proportional hazards regression, we propose an easy-to-implement privacy-preserving data analysis technique by pooling (i.e. aggregating) individual records of covariates at recruiting sites under the nested case-control sampling framework before sharing the pooled nested case-control subcohort. We show that the pooled hazard ratio estimators, under the pooled nested case-control subsamples from the contributing sites, are maximum likelihood estimators and provide consistent estimates of the individual level full cohort HRs. Furthermore, a sampling technique for generating pseudo-event times for individual subjects that constitute the pooled nested case-control subsamples is proposed. Our method is demonstrated using extensive simulations and analysis of the National Lung Screening Trial data. The utility of our proposed approach is compared to the gold standard (full cohort) and synthetic data generated using classification and regression trees. The proposed pooling technique performs to near-optimal levels comparable to full cohort analysis or synthetic data; the efficiency improves in rare event settings when more controls are matched on during nested case-control subcohort sampling.
Affiliation(s)
- Lamin Juwara
  - Quantitative Life Sciences, McGill University, Montreal, Canada
  - Lady Davis Institute for Medical Research, Montreal, Quebec, Canada
- Yi Archer Yang
  - Quantitative Life Sciences, McGill University, Montreal, Canada
  - Department of Mathematics and Statistics, McGill University, Montreal, Quebec, Canada
- Ana M Velly
  - Lady Davis Institute for Medical Research, Montreal, Quebec, Canada
  - Department of Dentistry, McGill University, Montreal, Quebec, Canada
5
Thompson KA, White JP, Bardone-Cone AM. Associations between pressure to breastfeed and depressive, anxiety, obsessive-compulsive, and eating disorder symptoms among postpartum women. Psychiatry Res 2023; 328:115432. [PMID: 37669578 DOI: 10.1016/j.psychres.2023.115432] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/01/2023] [Revised: 08/04/2023] [Accepted: 08/24/2023] [Indexed: 09/07/2023]
Abstract
BACKGROUND Data from qualitative interviews indicate postpartum individuals feel pressure from their healthcare providers, the media, and their partners to breastfeed their infant(s). However, the link between pressure to breastfeed and maternal mental health symptoms has not been evaluated quantitatively. The goal of the current study was to evaluate the associations between perceived pressure to breastfeed from various sources and depressive, anxiety, obsessive-compulsive, and eating disorder symptoms among postpartum individuals. METHODS Participants were 306 women, ages 18-39, who gave birth in the past 12 months in the United States (primarily in North Carolina). They completed an online survey about their health history (including mental health symptoms) and breastfeeding experiences. RESULTS Results found postpartum women perceived more pressure to breastfeed from healthcare providers and from the media compared to pressure to breastfeed from their partners. Pressure from healthcare providers was associated with depressive, obsessive-compulsive, and eating disorder symptoms, but not with anxiety symptoms. Pressure from the media was associated with only depressive and eating disorder symptoms. Pressure from partners was not significantly associated with mental health symptoms. Above and beyond the other sources of pressure, pressure from healthcare providers explained a unique proportion of variance of obsessive-compulsive and eating disorder symptoms. LIMITATIONS Limitations include the cross-sectional design (which limits causal interpretations), and the homogenous sample (87% identified as White). CONCLUSIONS Messaging and information about breastfeeding (particularly from healthcare providers) should be reviewed to determine if there is language which could be perceived as "pressure." It is important to screen for a variety of mental health symptoms, including eating disorders, in perinatal populations when discussing breastfeeding.
Affiliation(s)
- Katherine A Thompson
  - Military Cardiovascular Outcomes Research (MiCOR) Program, Department of Medicine, Uniformed Services University of the Health Sciences, Bethesda, MD, United States
  - Department of Psychology and Neurosciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- Jennifer P White
  - Department of Psychiatry, UNC School of Medicine, Chapel Hill, NC, United States
- Anna M Bardone-Cone
  - Department of Psychology and Neurosciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
6
Kapumba BM, Nyirenda D, Desmond N, Seeley J. 'Guidance should have been there 15 years ago' research stakeholders' perspectives on ancillary care in the global south: a case study of Malawi. BMC Med Ethics 2023; 24:8. [PMID: 36765406 PMCID: PMC9912595 DOI: 10.1186/s12910-023-00889-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Accepted: 01/31/2023] [Indexed: 02/12/2023] Open
Abstract
BACKGROUND Medical researchers in resource-constrained settings must make difficult moral decisions about the provision of ancillary care to participants where additional healthcare needs fall outside the scope of the research and are not provided for by the local healthcare system. We examined research stakeholder perceptions and experiences of ancillary care in biomedical research projects in Malawi. METHODS We conducted 45 qualitative in-depth interviews with key research stakeholders: researchers, health officials, research ethics committee members, research participants and grants officers from international research funding organisations. Thematic analysis was used to analyse and interpret the findings. FINDINGS All stakeholders perceived the provision of ancillary care to have potential health benefits to study participants in biomedical research. However, they also had concerns, particularly related to the absence of guidance to support it. Some suggested that consideration for ancillary care provision could be possible on a case-by-case basis but that most of the support from research projects should be directed towards strengthening the public health system, emphasising public good above individual or personal benefits. Some researchers and ethics committee members raised concerns about potential tensions in terms of funding, for example balancing study demands with addressing participants' additional health needs. CONCLUSION Our findings highlight the complexities and gaps in the guidance around the provision of ancillary care in Malawi and other resource-constrained settings more generally. To promote the provision of ancillary care, we recommend that national and international guidelines for research ethics include specific recommendations for resource-constrained settings and specific types of research.
Affiliation(s)
- Blessings M Kapumba
  - London School of Hygiene and Tropical Medicine, London, UK
  - Malawi-Liverpool Wellcome Trust Clinical Research Programme, P.O. Box 30096, Chichiri, Blantyre, Malawi
- Deborah Nyirenda
  - Malawi-Liverpool Wellcome Trust Clinical Research Programme, P.O. Box 30096, Chichiri, Blantyre, Malawi
- Janet Seeley
  - London School of Hygiene and Tropical Medicine, London, UK
7
Tak YW, You SC, Han JH, Kim SS, Kim GT, Lee Y. Perceived Risk of Re-Identification in OMOP-CDM Database: A Cross-Sectional Survey. J Korean Med Sci 2022; 37:e205. [PMID: 35790207 PMCID: PMC9259248 DOI: 10.3346/jkms.2022.37.e205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Accepted: 05/30/2022] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND The advancement of information technology has immensely increased the quality and volume of health data. This has led to an increase in observational studies, as well as to the threat of privacy invasion. Recently, a distributed research network based on the common data model (CDM) has emerged, enabling collaborative international medical research without sharing patient-level data. Although the CDM database for each institution is built inside a firewall, the risk of re-identification requires management. Hence, this study aims to elucidate the perceptions CDM users have towards CDM and risk management for re-identification. METHODS The survey, targeted to answer specific in-depth questions on CDM, was conducted from October to November 2020. We targeted well-experienced researchers who actively use CDM. Basic statistics (total number and percent) were computed for all covariates. RESULTS There were 33 valid respondents. Of these, 43.8% suggested additional anonymization was unnecessary beyond the "minimum cell count" policy, which obscures a cell with a value lower than a certain number (usually 5) in shared results to minimize the liability of re-identification due to rare conditions. During extract-transform-load processes, 81.8% of respondents assumed structured data is under control from the risk of re-identification. However, respondents noted that date of birth and death were highly re-identifiable information. The majority of respondents (n = 22, 66.7%) conceded the possibility of identifier-contained unstructured data in the NOTE table. CONCLUSION Overall, CDM users generally attributed high reliability for privacy protection to the intrinsic nature of CDM. There was little demand for additional de-identification methods. However, unstructured data in the CDM were suspected to have risks. The necessity for a coordinating consortium to define and manage the re-identification risk of CDM was urged.
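The "minimum cell count" policy described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `suppress_small_cells`, the use of `None` as the masking value, and the example counts are all hypothetical, and the default threshold of 5 simply follows the "usually 5" mentioned above.

```python
def suppress_small_cells(counts, min_count=5):
    """Mask every cell whose count is below min_count.

    counts maps a cell label to its count; masked cells are returned as None
    so that small, potentially re-identifying values never leave the site.
    """
    return {
        label: (value if value >= min_count else None)
        for label, value in counts.items()
    }


# Hypothetical aggregate result about to be shared with collaborators.
shared = suppress_small_cells({"condition A": 12, "condition B": 3, "condition C": 5})
print(shared)
```

Real deployments may also need to guard against recovering a masked cell from row or column totals, which this sketch does not attempt.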
Affiliation(s)
- Yae Won Tak
  - Department of Information Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Seng Chan You
  - Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea
- Jeong Hyun Han
  - Department of Information Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Soon-Seok Kim
  - Department of Big Data Science, Halla University, Wonju, Korea
- Yura Lee
  - Department of Information Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
8
Kim YG, Kang G. Secure Collaborative Platform for Healthcare Research in an Open Environment: A Perspective on Accountability in Access Control (Preprint). J Med Internet Res 2022; 24:e37978. [DOI: 10.2196/37978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 08/02/2022] [Accepted: 08/30/2022] [Indexed: 11/13/2022] Open
9
Yan MK, Adler NR, Heriot N, Shang C, Zalcberg JR, Evans S, Wolfe R, Mar VJ. Opportunities and barriers for the use of Australian cancer registries as platforms for randomized clinical trials. Asia Pac J Clin Oncol 2021; 18:344-352. [PMID: 34811922 DOI: 10.1111/ajco.13670] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/23/2021] [Accepted: 08/18/2021] [Indexed: 11/30/2022]
Abstract
It is well recognized that randomized controlled trials (RCTs) are a powerful tool to investigate causal relationships, and are considered the gold standard level of research evidence. However, RCTs can be expensive and time-consuming, and when they employ strict eligibility criteria, it results in an unrepresentative population and limited external validity. Recently, the registry-based randomized clinical trial (RRCT) has emerged as an alternative trial design. Utilizing registries to underpin such studies, RRCTs can have advantages including rapid recruitment, and enhanced generalizability. In Australia, legislated mandatory reporting of cancer diagnoses means that jurisdictional cancer registries are a rich source of systematically collected patient details, representing sound platforms for comprehensive data capture that can serve as a key tool for further research. We review the roles of cancer registries in Australia, discuss important considerations relevant to the design of RRCTs, and outline the opportunities provided by cancer registries to strengthen cancer research.
Affiliation(s)
- Mabel K Yan
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
  - Victorian Melanoma Service, The Alfred Hospital, Melbourne, Victoria, Australia
- Nikki R Adler
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- Natalie Heriot
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- Catherine Shang
  - Victorian Cancer Registry, The Cancer Council Victoria, Melbourne, Victoria, Australia
- John R Zalcberg
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- Sue Evans
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
  - Victorian Cancer Registry, The Cancer Council Victoria, Melbourne, Victoria, Australia
- Rory Wolfe
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
- Victoria J Mar
  - Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
  - Victorian Melanoma Service, The Alfred Hospital, Melbourne, Victoria, Australia
10
Ficek J, Wang W, Chen H, Dagne G, Daley E. Differential privacy in health research: A scoping review. J Am Med Inform Assoc 2021; 28:2269-2276. [PMID: 34333623 PMCID: PMC8449619 DOI: 10.1093/jamia/ocab135] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/29/2021] [Revised: 06/11/2021] [Accepted: 06/16/2021] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE Differential privacy is a relatively new method for data privacy that has seen growing use due to its strong protections that rely on added noise. This study assesses the extent of its awareness, development, and usage in health research. MATERIALS AND METHODS A scoping review was conducted by searching for ["differential privacy" AND "health"] in major health science databases, with additional articles obtained via expert consultation. Relevant articles were classified according to subject area and focus. RESULTS A total of 54 articles met the inclusion criteria. Nine articles provided descriptive overviews, 31 focused on algorithm development, 9 presented novel data sharing systems, and 8 discussed appraisals of the privacy-utility tradeoff. The most common areas of health research where differential privacy has been discussed are genomics, neuroimaging studies, and health surveillance with personal devices. Algorithms were most commonly developed for the purposes of data release and predictive modeling. Studies on privacy-utility appraisals have considered economic cost-benefit analysis, low-utility situations, personal attitudes toward sharing health data, and mathematical interpretations of privacy risk. DISCUSSION Differential privacy remains at an early stage of development for applications in health research, and accounts of real-world implementations are scant. There are few algorithms for explanatory modeling and statistical inference, particularly with correlated data. Furthermore, diminished accuracy in small datasets is problematic. Some encouraging work has been done on decision making with regard to epsilon. The dissemination of future case studies can inform successful appraisals of privacy and utility. CONCLUSIONS More development, case studies, and evaluations are needed before differential privacy can see widespread use in health research.
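The abstract characterises differential privacy as relying on added noise, with epsilon as the privacy parameter. One standard instance, shown here only as a minimal sketch and not as a method taken from the reviewed articles, is the Laplace mechanism for a counting query: noise with scale sensitivity/epsilon is added to the true count. The names `laplace_noise` and `private_count` and the example values are hypothetical; the Laplace draw is built from two exponential draws using the standard library.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng=None, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    A counting query changes by at most 1 when one person is added or
    removed, so sensitivity defaults to 1. Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical query: number of patients with a given diagnosis.
noisy = private_count(100, epsilon=0.5, rng=random.Random(42))
print(noisy)
```

The noise scale does not shrink as the dataset gets smaller, which is one way to see why the abstract flags diminished accuracy in small datasets.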
Affiliation(s)
- Joseph Ficek
  - College of Public Health, University of South Florida, Tampa, Florida, USA
- Wei Wang
  - Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Henian Chen
  - College of Public Health, University of South Florida, Tampa, Florida, USA
- Getachew Dagne
  - College of Public Health, University of South Florida, Tampa, Florida, USA
- Ellen Daley
  - College of Public Health, University of South Florida, Tampa, Florida, USA
11
Ficek J, Wang W, Chen H, Dagne G, Daley E. A Survey of Differentially Private Regression for Clinical and Epidemiological Research. Int Stat Rev 2020. [DOI: 10.1111/insr.12391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Affiliation(s)
- Joseph Ficek
  - College of Public Health, University of South Florida (USF), Tampa, FL 33612, USA
- Wei Wang
  - Centre for Addiction and Mental Health (CAMH), Toronto, ON M6J 1H4, Canada
- Henian Chen
  - College of Public Health, University of South Florida (USF), Tampa, FL 33612, USA
- Getachew Dagne
  - College of Public Health, University of South Florida (USF), Tampa, FL 33612, USA
- Ellen Daley
  - College of Public Health, University of South Florida (USF), Tampa, FL 33612, USA
12
Thissen MR, Mason KM. Planning security architecture for health survey data storage and access. Health Syst (Basingstoke) 2019; 9:57-63. [PMID: 32284851 PMCID: PMC7144259 DOI: 10.1080/20476965.2019.1599702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2018] [Accepted: 03/08/2019] [Indexed: 10/27/2022] Open
Abstract
Sensitive data from health research surveys need to be protected from loss, damage or unwanted release, especially when data include personally identifying information, protected health information or other private material. Researchers and practitioners must ensure privacy and confidentiality in the architecture of data systems and in access to the data. Internal and external risks may be deliberate or accidental, involving unintended loss, modification or exposure. Preventing risk while allowing access requires balancing security concerns against the need for an environment that does not impede work. The authors' purpose in this paper is to draw attention to basic data security needs for health survey data from the perspective of both the health researcher/practitioner and infrastructure/programming staff to ensure that data are securely and adequately protected. We describe risk classifications and how they affect system architecture, drawing on recent experience with systems for storage of and access to electronic health survey data.
Affiliation(s)
- M. Rita Thissen
  - Research Computing Division, RTI International, Research Triangle Park, NC, USA
- Katherine M. Mason
  - Research Computing Division, RTI International, Research Triangle Park, NC, USA
13
Nurmi SM, Kangasniemi M, Halkoaho A, Pietilä AM. Privacy of Clinical Research Subjects: An Integrative Literature Review. J Empir Res Hum Res Ethics 2018; 14:33-48. [PMID: 30353779 DOI: 10.1177/1556264618805643] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Abstract
With changes in clinical research practice, the importance of a study-subject's privacy and the confidentiality of their personal data is growing. However, the body of research is fragmented, and a synthesis of work in this area is lacking. Accordingly, an integrative review was performed, guided by Whittemore and Knafl's work. Data from PubMed, Scopus, and CINAHL searches from January 2012 to February 2017 were analyzed via the constant comparison method. From 16 empirical and theoretical studies, six topical aspects were identified: the evolving nature of health data in clinical research, sharing of health data, the challenges of anonymizing data, collaboration among stakeholders, the complexity of regulation, and ethics-related tension between social benefits and privacy. Study subjects' privacy is an increasingly important ethics principle for clinical research, and privacy protection is rendered even more challenging by changing research practice.
Affiliation(s)
- Arja Halkoaho
- Kuopio University Hospital, Finland; Tampere University of Applied Sciences, Finland
- Anna-Maija Pietilä
- University of Eastern Finland, Kuopio, Finland; Social and Health Care Services, Kuopio, Finland
|
14
|
Arellano AM, Dai W, Wang S, Jiang X, Ohno-Machado L. Privacy Policy and Technology in Biomedical Data Science. Annu Rev Biomed Data Sci 2018; 1:115-129. [PMID: 31058261 PMCID: PMC6497413 DOI: 10.1146/annurev-biodatasci-080917-013416] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 02/04/2023]
Abstract
Privacy is an important consideration when sharing clinical data, which often contain sensitive information. Adequate protection to safeguard patient privacy and to increase public trust in biomedical research is paramount. This review covers topics in policy and technology in the context of clinical data sharing. We review policy articles related to (a) the Common Rule, HIPAA privacy and security rules, and governance; (b) patients' viewpoints and consent practices; and (c) research ethics. We identify key features of the revised Common Rule and the most notable changes since its previous version. We address data governance for research in addition to the increasing emphasis on ethical and social implications. Research ethics topics include data sharing best practices, use of data from populations of low socioeconomic status (SES), recent updates to institutional review board (IRB) processes to protect human subjects' data, and important concerns about the limitations of current policies to address data deidentification. In terms of technology, we focus on articles that have applicability in real-world health care applications: deidentification methods that comply with HIPAA, data anonymization approaches to satisfy well-acknowledged issues in deidentified data, encryption methods to safeguard data analyses, and privacy-preserving predictive modeling. The first two technology topics are mostly relevant to methodologies that attempt to sanitize structured or unstructured data. The third topic includes analysis on encrypted data. The last topic includes various mechanisms to build statistical models without sharing raw data.
Affiliation(s)
- April Moreno Arellano
- Department of Biomedical Informatics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
- Wenrui Dai
- Department of Biomedical Informatics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
- Shuang Wang
- Department of Biomedical Informatics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
- Xiaoqian Jiang
- Department of Biomedical Informatics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
- Lucila Ohno-Machado
- Department of Biomedical Informatics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
|
15
|
Taylor L, Zhou XH, Rise P. A tutorial in assessing disclosure risk in microdata. Stat Med 2018; 37:3693-3706. [DOI: 10.1002/sim.7667] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/03/2017] [Revised: 02/07/2018] [Accepted: 02/13/2018] [Indexed: 11/08/2022]
Affiliation(s)
- Leslie Taylor
- Health Services Research & Development; VA Puget Sound Health Care System; Seattle WA 98108 USA
- Xiao-Hua Zhou
- Health Services Research & Development; VA Puget Sound Health Care System; Seattle WA 98108 USA
- International Center for Mathematical Research; Peking University; Beijing 100871 China
- Department of Biostatistics; University of Washington; Seattle WA 98195 USA
- Peter Rise
- Health Services Research & Development; VA Puget Sound Health Care System; Seattle WA 98108 USA
|
16
|
O'Keefe CM, Ickowicz A, Churches T, Westcott M, O'Sullivan M, Khan A. Assessing privacy risks in population health publications using a checklist-based approach. J Am Med Inform Assoc 2018; 25:315-320. [PMID: 29136182 DOI: 10.1093/jamia/ocx129] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/06/2017] [Revised: 09/29/2017] [Accepted: 10/17/2017] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE Recent growth in the number of population health researchers accessing detailed datasets, either on their own computers or through virtual data centers, has the potential to increase privacy risks. In response, a checklist for identifying and reducing privacy risks in population health analysis outputs has been proposed for use by researchers themselves. In this study we explore the usability and reliability of such an approach by investigating whether different users identify the same privacy risks on applying the checklist to a sample of publications. METHODS The checklist was applied to a sample of 100 academic population health publications distributed among 5 readers. Cohen's κ was used to measure interrater agreement. RESULTS Of the 566 instances of statistical output types found in the 100 publications, the most frequently occurring were counts, summary statistics, plots, and model outputs. Application of the checklist identified 128 outputs (22.6%) with potential privacy concerns. Most of these were associated with the reporting of small counts. Among these identified outputs, the readers found no substantial actual privacy concerns when context was taken into account. Interrater agreement for identifying potential privacy concerns was generally good. CONCLUSION This study has demonstrated that a checklist can be a reliable tool to assist researchers with anonymizing analysis outputs in population health research. This further suggests that such an approach may have the potential to be developed into a broadly applicable standard providing consistent confidentiality protection across multiple analyses of the same data.
|
17
|
Wang M, Ji Z, Kim HE, Wang S, Xiong L, Jiang X. Selecting Optimal Subset to release under Differentially Private M-estimators from Hybrid Datasets. IEEE Transactions on Knowledge and Data Engineering 2018; 30:573-584. [PMID: 30034201 PMCID: PMC6051552 DOI: 10.1109/tkde.2017.2773545] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
Privacy concerns in data sharing, especially for health data, are receiving increasing attention. Some patients now agree to open their information for research use, which gives rise to a new question of how to effectively use the public information to better understand the private dataset without breaching privacy. In this paper, we specialize this question as selecting an optimal subset of the public dataset for M-estimators in the framework of differential privacy (DP) in [1]. From a perspective of non-interactive learning, we first construct the weighted private density estimation from the hybrid datasets under DP. Along the same line as [2], we analyze the accuracy of the DP M-estimators based on the hybrid datasets. Our main contributions are (i) we find that the bias-variance tradeoff in the performance of our M-estimators can be characterized in the sample size of the released dataset; (ii) based on this finding, we develop an algorithm to select the optimal subset of the public dataset to release under DP. Our simulation studies and application to real datasets confirm our findings and provide guidance for real applications.
Affiliation(s)
- Meng Wang
- Department of Biomedical Informatics, University of California at San Diego, CA 92093, U.S.; now with the Department of Genetics, Stanford University, CA 94305, U.S.
- Zhanglong Ji
- Department of Biomedical Informatics, University of California at San Diego, CA 92093, U.S.
- Hyeon-Eui Kim
- Department of Biomedical Informatics, University of California at San Diego, CA 92093, U.S.
- Shuang Wang
- Department of Biomedical Informatics, University of California at San Diego, CA 92093, U.S.
- Li Xiong
- Department of Computer Science, Emory University, GA 30322, U.S.
- Xiaoqian Jiang
- Department of Biomedical Informatics, University of California at San Diego, CA 92093, U.S.
|
18
|
McGrail KM, Jones K, Akbari A, Bennett TD, Boyd A, Carinci F, Cui X, Denaxas S, Dougall N, Ford D, Kirby R, Kum HC, Moorin R, Moran R, O’Keefe CM, Preen D, Quan H, Sanmartin C, Schull M, Smith M, Williams C, Williamson T, Wyper GMA, Kotelchuck M. A Position Statement on Population Data Science: The Science of Data about People. Int J Popul Data Sci 2018; 3:415. [PMID: 34095517 PMCID: PMC8142960 DOI: 10.23889/ijpds.v3i1.415] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 10/31/2022] Open
Abstract
Information is increasingly digital, creating opportunities to respond to pressing issues about human populations using linked datasets that are large, complex, and diverse. The potential social and individual benefits that can come from data-intensive science are large, but raise challenges of balancing individual privacy and the public good, building appropriate socio-technical systems to support data-intensive science, and determining whether defining a new field of inquiry might help move those collective interests and activities forward. A combination of expert engagement, literature review, and iterative conversations led to our conclusion that defining the field of Population Data Science (challenge 3) will help address the other two challenges as well. We define Population Data Science succinctly as the science of data about people and note that it is related to but distinct from the fields of data science and informatics. A broader definition names four characteristics: data use for positive impact on citizens and society; bringing together and analyzing data from multiple sources; finding population-level insights; and developing safe, privacy-sensitive and ethical infrastructure to support research. One implication of these characteristics is that few people possess all of the requisite knowledge and skills of Population Data Science, so this is by nature a multi-disciplinary field. Other implications include the need to advance various aspects of science, such as data linkage technology, various forms of analytics, and methods of public engagement. These implications are the beginnings of a research agenda for Population Data Science, which, if approached as a collective field, can catalyze significant advances in our understanding of trends in society, health, and human behavior.
Affiliation(s)
- Kimberlyn M McGrail
- The University of British Columbia, School of Population and Public Health, 2206 East Mall, Vancouver, BC Canada V6T 1Z3
- Kerina Jones
- Population Data Science, Swansea University Medical School, Singleton Park, Swansea SA2 8PP
- Ashley Akbari
- Population Data Science, Swansea University Medical School, Singleton Park, Swansea SA2 8PP
- Tellen D Bennett
- University of Colorado School of Medicine, 13001 E 17th Pl, Aurora, CO 80045, USA
- Andy Boyd
- Bristol Medical School: Population Health Sciences, Office OF3 Oakfield House, Oakfield Grove, Clifton BS8 2BN
- Fabrizio Carinci
- Department of Statistical Sciences "Paolo Fortunati", University of Bologna, Via Belle Arti 41, Bologna, Italy
- Xinjie Cui
- PolicyWise for Children & Families, 9925 109 St NW, Edmonton, AB T5K 2J8, Canada
- Nadine Dougall
- School of Health & Social Care, Edinburgh Napier University, Sighthill Campus Sighthill Court Edinburgh EH11 4BN
- David Ford
- Population Data Science, Swansea University Medical School, Singleton Park, Swansea SA2 8PP
- Russell Kirby
- Dept of Pediatrics, College of Medicine Obstetrics & Gynecology, University of South Florida, 13201 Bruce B Downs Blvd, MDC56 Tampa FL 33612
- Hye-Chung Kum
- Texas A&M School of Public Health 212 Adriance Lab Road College Station, TX
- Christine M O’Keefe
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), GPO Box 1700 Canberra ACT 2601 Australia
- David Preen
- University of Western Australia, School of Population and Global Health, 35 Stirling Highway, Perth WA 6009 Australia
- Hude Quan
- Department of Community Health Sciences, Faculty of Medicine, University of Calgary, TRW Building, 3rd Floor, 3280 Hospital Drive NW, Calgary, Alberta CANADA T2N 4Z6
- Claudia Sanmartin
- Statistics Canada 150 Tunney's Pasture Driveway Ottawa, Ontario K1A 0T6
- Michael Schull
- ICES Central, G1 06, 2075 Bayview Avenue Toronto, ON M4N 3M5 Canada
- Mark Smith
- University of Manitoba, Manitoba Centre for Health Policy
- Christine Williams
- Australian Bureau of Statistics, ABS House 45 Benjamin Way, Belconnen ACT 2617. Australia
- Tyler Williamson
- Department of Community Health Sciences, Faculty of Medicine, University of Calgary, TRW Building, 3rd Floor, 3280 Hospital Drive NW, Calgary, Alberta CANADA T2N 4Z6
- Grant MA Wyper
- Public Health and Intelligence, NHS National Services Scotland
|
19
|
O'Keefe CM, Westcott M, O'Sullivan M, Ickowicz A, Churches T. Anonymization for outputs of population health and health services research conducted via an online data center. J Am Med Inform Assoc 2017; 24:544-549. [PMID: 28011594 DOI: 10.1093/jamia/ocw152] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/01/2016] [Accepted: 09/27/2016] [Indexed: 11/13/2022] Open
Abstract
Objective Online data centers (ODCs) are becoming increasingly popular for making health-related data available for research. Such centers provide good privacy protection during analysis by trusted researchers, but privacy concerns may still remain if the system outputs are not sufficiently anonymized. In this article, we propose a method for anonymizing analysis outputs from ODCs for publication in academic literature. Methods We use as a model system the Secure Unified Research Environment, an online computing system that allows researchers to access and analyze linked health-related data for approved studies in Australia. This model system suggests realistic assumptions for an ODC that, together with literature and practice reviews, inform our solution design. Results We propose a two-step approach to anonymizing analysis outputs from an ODC. A data preparation stage requires data custodians to apply some basic treatments to the dataset before making it available. A subsequent output anonymization stage requires researchers to use a checklist at the point of downloading analysis output. The checklist assists researchers with highlighting potential privacy concerns, then applying appropriate anonymization treatments. Conclusion The checklist can be used more broadly in health care research, not just in ODCs. Ease of online publication as well as encouragement from journals to submit supplementary material are likely to increase both the volume and detail of analysis results publicly available, which in turn will increase the need for approaches such as the one suggested in this paper.
|
20
|
|
21
|
A Review of Statistical Disclosure Control Techniques Employed by Web-Based Data Query Systems. Journal of Public Health Management and Practice 2016; 23:e1-e4. [PMID: 27798533 DOI: 10.1097/phh.0000000000000473] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
We systematically reviewed the statistical disclosure control techniques employed for releasing aggregate data in Web-based data query systems listed in the National Association for Public Health Statistics and Information Systems (NAPHSIS). Each Web-based data query system was examined to see whether (1) it employed any type of cell suppression, (2) it used secondary cell suppression, and (3) suppressed cell counts could be calculated. No more than 30 minutes was spent on each system. Of the 35 systems reviewed, no suppression was observed in more than half (n = 18); counts below the threshold were observed in 2 sites; and suppressed values were recoverable in 9 sites. Six sites effectively suppressed small counts. This inquiry has revealed substantial weaknesses in the protective measures used in data query systems containing sensitive public health data. Many systems utilized no disclosure control whatsoever, and the vast majority of those that did deployed it inconsistently or inadequately.
|
22
|
Privacy protection and aggregate health data: a review of tabular cell suppression methods (not) employed in public health data systems. Health Services and Outcomes Research Methodology 2016. [DOI: 10.1007/s10742-016-0162-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/20/2022]
|
23
|
Lea NC, Nicholls J, Dobbs C, Sethi N, Cunningham J, Ainsworth J, Heaven M, Peacock T, Peacock A, Jones K, Laurie G, Kalra D. Data Safe Havens and Trust: Toward a Common Understanding of Trusted Research Platforms for Governing Secure and Ethical Health Research. JMIR Med Inform 2016; 4:e22. [PMID: 27329087 PMCID: PMC4933798 DOI: 10.2196/medinform.5571] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 01/27/2016] [Revised: 05/19/2016] [Accepted: 06/04/2016] [Indexed: 01/23/2023] Open
Abstract
In parallel with the advances in big data-driven clinical research, the data safe haven concept has evolved over the last decade. It has led to the development of a framework to support the secure handling of health care information used for clinical research that balances compliance with legal and regulatory controls and ethical requirements while engaging with the public as a partner in its governance. We describe the evolution of 4 separately developed clinical research platforms into services throughout the United Kingdom-wide Farr Institute and their common deployment features in practice. The Farr Institute is a case study from which we propose a common definition of data safe havens as trusted platforms for clinical academic research. We use this common definition to discuss the challenges and dilemmas faced by the clinical academic research community, to help promote a consistent understanding of them and how they might best be handled in practice. We conclude by questioning whether the common definition represents a safe and trustworthy model for conducting clinical research that can stand the test of time and ongoing technical advances while paying heed to evolving public and professional concerns.
|
24
|
Matthews GJ, Harel O. Examining statistical disclosure issues involving digital images of ROC curves. Stat (Int Stat Inst) 2015. [DOI: 10.1002/sta4.93] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Gregory J. Matthews
- Department of Mathematics and Statistics; Loyola University Chicago; 1032 W. Sheridan Road Chicago IL 60660 USA
- Ofer Harel
- Department of Statistics; University of Connecticut; Room 323, Philip E. Austin Building, 215 Glenbrook Rd. U-4120 Storrs CT 06269-4120 USA
|