1
Dervishi L, Li W, Halimi A, Jiang X, Vaidya J, Ayday E. Privacy preserving identification of population stratification for collaborative genomic research. Bioinformatics 2023; 39:i168-i176. [PMID: 37387172 PMCID: PMC10311306 DOI: 10.1093/bioinformatics/btad274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
The rapid improvements in genomic sequencing technology have led to the proliferation of locally collected genomic datasets. Given the sensitivity of genomic data, it is crucial to conduct collaborative studies while preserving the privacy of the individuals. However, before starting any collaborative research effort, the quality of the data needs to be assessed. One of the essential steps of the quality control process is population stratification: identifying the presence of genetic difference in individuals due to subpopulations. One of the common methods used to group genomes of individuals based on ancestry is principal component analysis (PCA). In this article, we propose a privacy-preserving framework which utilizes PCA to assign individuals to populations across multiple collaborators as part of the population stratification step. In our proposed client-server-based scheme, we initially let the server train a global PCA model on a publicly available genomic dataset which contains individuals from multiple populations. The global PCA model is later used to reduce the dimensionality of the local data by each collaborator (client). After adding noise to achieve local differential privacy (LDP), the collaborators send metadata (in the form of their local PCA outputs) about their research datasets to the server, which then aligns the local PCA results to identify the genetic differences among collaborators' datasets. Our results on real genomic data show that the proposed framework can perform population stratification analysis with high accuracy while preserving the privacy of the research participants.
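The client-side step described in this abstract (reduce local genotypes with the server's global PCA model, then perturb the result before sending it) can be sketched roughly as follows. This is an illustration under stated assumptions only: the abstract does not say which noise distribution or sensitivity bound the authors use, so the Laplace mechanism, the `sensitivity` argument, and the function names `project` and `privatize` are hypothetical choices, and the tiny model below is made up.

```python
import random


def project(sample, mean, components):
    """Center a genotype vector and project it onto the global PCA components."""
    centered = [x - m for x, m in zip(sample, mean)]
    return [sum(c * v for c, v in zip(component, centered))
            for component in components]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(projection, epsilon, sensitivity, rng=None):
    """Add Laplace noise with scale sensitivity / epsilon to every coordinate.

    ASSUMPTION: `sensitivity` is an L1 bound on the projected vector supplied
    by the caller; the abstract does not specify how such a bound is derived.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [value + laplace_noise(scale, rng) for value in projection]


# Hypothetical global model: two components over four SNP positions.
mean = [1.0, 1.0, 1.0, 1.0]
components = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
local_output = project([2, 0, 2, 0], mean, components)
noisy_output = privatize(local_output, epsilon=1.0, sensitivity=2.0,
                         rng=random.Random(0))
```

Only `noisy_output` would leave the client in this sketch; a smaller `epsilon` means a larger noise scale and therefore stronger privacy at the cost of accuracy.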
Affiliation(s)
- Leonard Dervishi
- Computer and Data Sciences, Case Western Reserve University, OH 44106, United States
- Wenbiao Li
- Computer and Data Sciences, Case Western Reserve University, OH 44106, United States
- Xiaoqian Jiang
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, TX 77030, United States
- Jaideep Vaidya
- Management Science and Information Systems Department, Rutgers University, NJ 07102, USA
- Erman Ayday
- Computer and Data Sciences, Case Western Reserve University, OH 44106, United States
2
Yamamoto A, Shibuya T. Privacy-Preserving Statistical Analysis of Genomic Data Using Compressive Mechanism with Haar Wavelet Transform. J Comput Biol 2023; 30:176-188. [PMID: 36374238 DOI: 10.1089/cmb.2022.0246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
To promote the use of personal genome information in medicine, it is important to analyze the relationship between diseases and the human genomes. Therefore, statistical analysis using genomic data is often conducted, but there is a privacy concern with respect to releasing the statistics as they are. Existing methods to address this problem using the concept of differential privacy cannot provide accurate outputs under strong privacy guarantees, making them less practical. In this study, for the first time, we investigate the application of a compressive mechanism to genomic statistical data and propose two approaches. The first is to apply the normal compressive mechanism to the statistics vector along with an algorithm to determine the number of nonzero entries in a sparse representation. The second is to alter the mechanism based on the data, aiming to release significant single nucleotide polymorphisms with a high probability. In this algorithm, we apply the compressive mechanism with the input as a sparse vector for significant data and the Laplace mechanism for nonsignificant data. By using the Haar wavelet transform for the compressive mechanism, we can determine the number of nonzero elements and the amount of noise. In addition, we give theoretical guarantees that our proposed methods achieve ϵ-differential privacy. We evaluated our methods in terms of accuracy and rank error compared with the Laplace and exponential mechanisms. The results show that our second method in particular can guarantee high privacy assurance as well as utility.
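To make the role of the Haar wavelet transform more concrete, the sketch below shows an orthonormal Haar transform and its inverse, and counts the nonzero coefficients of a transformed vector. It is not the compressive mechanism from the paper (the random projection, sparse recovery and noise calibration are not reproduced); the function names and the example vector are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(values):
    """Orthonormal Haar transform of a vector whose length is a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:length] = averages + details
        length = half
    return out


def haar_inverse(coeffs):
    """Invert haar_forward."""
    n = len(coeffs)
    out = [float(c) for c in coeffs]
    length = 1
    while length < n:
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.append((a + d) / SQRT2)
            merged.append((a - d) / SQRT2)
        out[:2 * length] = merged
        length *= 2
    return out


def count_nonzero(coeffs, tol=1e-9):
    """Number of coefficients whose magnitude exceeds a small tolerance."""
    return sum(1 for c in coeffs if abs(c) > tol)


# A piecewise-constant statistics vector has only a few nonzero Haar coefficients.
stats = [4, 4, 4, 4, 1, 1, 1, 1]
coeffs = haar_forward(stats)
sparsity = count_nonzero(coeffs)
restored = haar_inverse(coeffs)
```

The point of the example is that a vector with long flat runs concentrates into very few wavelet coefficients, which is the kind of sparsity a compressive approach relies on.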
Affiliation(s)
- Akito Yamamoto
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Tetsuo Shibuya
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
3
Samlali K, Thornbury M, Venter A. Community-led risk analysis of direct-to-consumer whole-genome sequencing. Biochem Cell Biol 2022; 100:499-509. [PMID: 35939839 DOI: 10.1139/bcb-2021-0506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
Direct-to-consumer (DTC) genetic testing is cheaper and more accessible than ever before; however, the intention to combine, reuse, and resell this genetic information as powerful data sets is generally hidden from the consumer. This financial gain is creating a competitive DTC market, reducing the price of whole-genome sequencing (WGS) to under 300 USD. Entering this transition from single-nucleotide polymorphism-based DTC testing to WGS DTC testing, individuals looking for access to their whole-genomic information face new privacy and security risks. Differences between WGS and other methods of consumer genetic tests are left unexplored by regulation, leading to the application of legal data anonymization methods on whole-genome data, and questionable consent methods. Large representative genomic data sets are important for research and improve the standard of medicine and personalized care. However, these data can also be used by market players, law enforcement, and governments for surveillance, population analyses, marketing purposes, and discrimination. Here, we present a summary of the state of WGS DTC genetic testing and its current regulation, through a community-based lens to expose dual-use risks in consumer-facing biotechnologies.
Affiliation(s)
- Kenza Samlali
- BricoBio Community Biology Lab, Montréal, QC, Canada; Centre for Applied Synthetic Biology, Concordia University, Montréal, QC, Canada; Department of Electrical and Computer Engineering, Concordia University, Montréal, QC, Canada
- Mackenzie Thornbury
- BricoBio Community Biology Lab, Montréal, QC, Canada; Centre for Applied Synthetic Biology, Concordia University, Montréal, QC, Canada; Department of Biology, Concordia University, Montréal, QC, Canada
- Andrei Venter
- BricoBio Community Biology Lab, Montréal, QC, Canada
4
Wan Z, Hazel JW, Clayton EW, Vorobeychik Y, Kantarcioglu M, Malin BA. Sociotechnical safeguards for genomic data privacy. Nat Rev Genet 2022; 23:429-445. [PMID: 35246669 PMCID: PMC8896074 DOI: 10.1038/s41576-022-00455-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Accepted: 01/24/2022] [Indexed: 12/21/2022]
Abstract
Recent developments in a variety of sectors, including health care, research and the direct-to-consumer industry, have led to a dramatic increase in the amount of genomic data that are collected, used and shared. This state of affairs raises new and challenging concerns for personal privacy, both legally and technically. This Review appraises existing and emerging threats to genomic data privacy and discusses how well current legal frameworks and technical safeguards mitigate these concerns. It concludes with a discussion of remaining and emerging challenges and illustrates possible solutions that can balance protecting privacy and realizing the benefits that result from the sharing of genetic information. In this Review, the authors describe technical and legal protection mechanisms for mitigating vulnerabilities in genomic data privacy. They also discuss how these protections are dependent on the context of data use such as in research, health care, direct-to-consumer testing or forensic investigations.
Affiliation(s)
- Zhiyu Wan
- Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Computer Science, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- James W Hazel
- Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, TN, USA; Center for Biomedical Ethics and Society, Vanderbilt University, Nashville, TN, USA
- Ellen Wright Clayton
- Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, TN, USA; Center for Biomedical Ethics and Society, Vanderbilt University, Nashville, TN, USA; Vanderbilt University Law School, Nashville, TN, USA
- Yevgeniy Vorobeychik
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA
- Murat Kantarcioglu
- Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA
- Bradley A Malin
- Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Computer Science, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
5
Torkzadehmahani R, Nasirigerdeh R, Blumenthal DB, Kacprowski T, List M, Matschinske J, Spaeth J, Wenke NK, Baumbach J. Privacy-Preserving Artificial Intelligence Techniques in Biomedicine. Methods Inf Med 2022; 61:e12-e27. [PMID: 35062032 PMCID: PMC9246509 DOI: 10.1055/s-0041-1740630] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
Abstract
Background
Artificial intelligence (AI) has been successfully applied in numerous scientific domains. In biomedicine, AI has already shown tremendous potential, e.g., in the interpretation of next-generation sequencing data and in the design of clinical decision support systems.
Objectives
However, training an AI model on sensitive data raises concerns about the privacy of individual participants. For example, summary statistics of a genome-wide association study can be used to determine the presence or absence of an individual in a given dataset. This considerable privacy risk has led to restrictions in accessing genomic and other biomedical data, which is detrimental for collaborative research and impedes scientific progress. Hence, there has been a substantial effort to develop AI methods that can learn from sensitive data while protecting individuals' privacy.
Method
This paper provides a structured overview of recent advances in privacy-preserving AI techniques in biomedicine. It places the most important state-of-the-art approaches within a unified taxonomy and discusses their strengths, limitations, and open problems.
Conclusion
As the most promising direction, we suggest combining federated machine learning, as a more scalable approach, with additional privacy-preserving techniques. This would merge their advantages and provide privacy guarantees in a distributed way for biomedical applications. Nonetheless, more research is necessary, as hybrid approaches pose new challenges such as additional network or computation overhead.
Affiliation(s)
- Reihaneh Torkzadehmahani
- Institute for Artificial Intelligence in Medicine and Healthcare, Technical University of Munich, Munich, Germany
- Reza Nasirigerdeh
- Institute for Artificial Intelligence in Medicine and Healthcare, Technical University of Munich, Munich, Germany; Klinikum Rechts der Isar, Technical University of Munich, Munich, Germany
- David B Blumenthal
- Department of Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany
- Tim Kacprowski
- Division of Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Medical School Hannover, Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Markus List
- Chair of Experimental Bioinformatics, Technical University of Munich, Munich, Germany
- Julian Matschinske
- E.U. Horizon2020 FeatureCloud Project Consortium; Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Julian Spaeth
- E.U. Horizon2020 FeatureCloud Project Consortium; Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Nina Kerstin Wenke
- E.U. Horizon2020 FeatureCloud Project Consortium; Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
- E.U. Horizon2020 FeatureCloud Project Consortium; Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany; Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
6
Venkatesaramani R, Malin BA, Vorobeychik Y. Re-identification of individuals in genomic datasets using public face images. Science Advances 2021; 7:eabg3296. [PMID: 34788101 PMCID: PMC8597988 DOI: 10.1126/sciadv.abg3296] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 05/14/2023]
Abstract
Recent studies suggest that genomic data can be matched to images of human faces, raising the concern that genomic data can be re-identified with relative ease. However, such investigations assume access to well-curated images, which are rarely available in practice and challenging to derive from photos not generated in a controlled laboratory setting. In this study, we reconsider re-identification risk and find that, for most individuals, the actual risk posed by linkage attacks to typical face images is substantially smaller than claimed in prior investigations. Moreover, we show that only a small amount of well-calibrated noise, imperceptible to humans, can be added to images to markedly reduce such risk. The results of this investigation create an opportunity to create image filters that enable individuals to have better control over re-identification risk based on linkage.
Affiliation(s)
- Rajagopal Venkatesaramani
- Department of Computer Science and Engineering, Washington University in St. Louis, 1 Brookings Dr., St. Louis, MO 63108, USA
- Corresponding author.
- Bradley A. Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Suite 1475, 2525 West End Avenue, Nashville, TN 37203, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Suite 1475, 2525 West End Avenue, Nashville, TN 37203, USA
- Department of Electrical Engineering and Computer Science, Vanderbilt University, 2201 West End Ave, Nashville, TN 37235, USA
- Yevgeniy Vorobeychik
- Department of Computer Science and Engineering, Washington University in St. Louis, 1 Brookings Dr., St. Louis, MO 63108, USA
7
Yamamoto A, Shibuya T. More practical differentially private publication of key statistics in GWAS. Bioinformatics Advances 2021; 1:vbab004. [PMID: 36700105 PMCID: PMC9710635 DOI: 10.1093/bioadv/vbab004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Revised: 05/07/2021] [Accepted: 05/15/2021] [Indexed: 01/28/2023]
Abstract
Motivation: Analyses of datasets that contain personal genomic information are very important for revealing associations between diseases and genomes. Genome-wide association studies, which are large-scale genetic statistical analyses, often involve tests with contingency tables. However, if the statistics obtained by these tests are made public as they are, sensitive information of individuals could be leaked. Existing studies have proposed privacy-preserving methods for statistics in the χ² test with a 3 × 2 contingency table, but they do not cover all the tests used in association studies. In addition, existing methods for releasing differentially private P-values are not practical. Results: In this work, we propose methods for releasing statistics in the χ² test, Fisher's exact test and the Cochran-Armitage trend test while preserving both personal privacy and utility. Our methods for releasing P-values are the first to achieve practicality under the concept of differential privacy by considering their base 10 logarithms. We make theoretical guarantees by showing the sensitivity of the above statistics. From our experimental results, we evaluate the utility of the proposed methods and show appropriate thresholds with high accuracy for using the private statistics in actual tests. Availability and implementation: A Python implementation of our experiments is available at https://github.com/ay0408/DP-statistics-GWAS. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
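As a rough illustration of the quantities this abstract refers to, the sketch below computes the Pearson χ² statistic for a 3 × 2 genotype-by-status table, converts it to the base 10 logarithm of the P-value (for 2 degrees of freedom the P-value is exp(-χ²/2)), and perturbs that value with Laplace noise. It is not the authors' implementation (that lives in the repository linked above): the function names are hypothetical, and the `sensitivity` passed to `noisy_release` is a placeholder, since the sensitivities the paper derives are not given in the abstract.

```python
import math
import random


def chi_square_3x2(table):
    """Pearson chi-square statistic for a non-empty 3 x 2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def log10_p_value(stat):
    """log10 of the P-value for 2 degrees of freedom, where P = exp(-stat / 2)."""
    return -stat / (2.0 * math.log(10.0))


def noisy_release(value, epsilon, sensitivity, rng):
    """Laplace mechanism; ASSUMPTION: the caller supplies a valid sensitivity."""
    scale = sensitivity / epsilon
    return value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


# Made-up counts: rows are genotypes (AA, Aa, aa), columns are cases and controls.
table = [[30, 10], [20, 20], [10, 30]]
stat = chi_square_3x2(table)
log_p = log10_p_value(stat)
# sensitivity=1.0 is a placeholder value, not a bound taken from the paper.
private_log_p = noisy_release(log_p, epsilon=1.0, sensitivity=1.0,
                              rng=random.Random(0))
```

Working on the logarithm keeps very small P-values on a scale where additive noise does not swamp them, which appears to be the motivation the abstract gives for releasing base 10 logarithms.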
Affiliation(s)
- Akito Yamamoto
- Division of Medical Data Informatics, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan. To whom correspondence should be addressed.
- Tetsuo Shibuya
- Division of Medical Data Informatics, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
8
Abstract
Genome-Wide Association Studies (GWAS) identify the genomic variations that are statistically associated with a particular phenotype (e.g., a disease). The confidence in GWAS results increases with the number of genomes analyzed, which encourages federated computations where biocenters would periodically share the genomes they have sequenced. However, for economic and legal reasons, this collaboration will only happen if biocenters cannot learn each other's data. In addition, GWAS releases should not jeopardize the privacy of the individuals whose genomes are used. We introduce DyPS, a novel framework to conduct dynamic privacy-preserving federated GWAS. DyPS leverages a Trusted Execution Environment to secure dynamic GWAS computations. Moreover, DyPS uses a scaling mechanism to speed up the releases of GWAS results according to the evolving number of genomes used in the study, even if individuals retract their participation consent. Lastly, DyPS also tolerates up to all-but-one colluding biocenters without privacy leaks. We implemented and extensively evaluated DyPS through several scenarios involving more than 6 million simulated genomes and up to 35,000 real genomes. Our evaluation shows that DyPS updates test statistics with a reasonable additional request processing delay (11% longer) compared to an approach that would update them with minimal delay but would lead to 8% of the genomes not being protected. In addition, DyPS can result in the same amount of aggregate statistics as a static release (i.e., at the end of the study), but can produce up to 2.6 times more statistics information during earlier dynamic releases. Besides, we show that DyPS can support a larger number of genomes and SNP positions without any significant performance penalty.
9
Katsanis SH. Pedigrees and Perpetrators: Uses of DNA and Genealogy in Forensic Investigations. Annu Rev Genomics Hum Genet 2020; 21:535-564. [DOI: 10.1146/annurev-genom-111819-084213] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 11/09/2022]
Abstract
In the past few years, cases with DNA evidence that could not be solved with direct matches in DNA databases have benefited from comparing single-nucleotide polymorphism data with private and public genomic databases. Using a combination of genome comparisons and traditional genealogical research, investigators can triangulate distant relatives to the contributor of DNA data from a crime scene, ultimately identifying perpetrators of violent crimes. This approach has also been successful in identifying unknown deceased persons and perpetrators of lesser crimes. Such advances are bringing into focus ethical questions on how much access to DNA databases should be granted to law enforcement and how best to empower public genome contributors with control over their data. The necessary policies will take time to develop but can be informed by reflection on the familial searching policies developed for searches of the federal DNA database and considerations of the anonymity and privacy interests of civilians.
Affiliation(s)
- Sara H. Katsanis
- Mary Ann & J. Milburn Smith Child Health Research, Outreach, and Advocacy Center, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, Illinois 60611, USA
- Department of Pediatrics, Northwestern University, Chicago, Illinois 60611, USA
10
Sariyar M, Schlünder I. Challenges and Legal Gaps of Genetic Profiling in the Era of Big Data. Front Big Data 2019; 2:40. [PMID: 33693363 PMCID: PMC7931923 DOI: 10.3389/fdata.2019.00040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/06/2018] [Accepted: 10/28/2019] [Indexed: 11/30/2022]
Abstract
Profiling of individuals based on inborn, acquired, and assigned characteristics is central for decision making in health care. In the era of omics and big smart data, it becomes urgent to differentiate between different data governance affordances for different profiling activities. Typically, diagnostic profiling is in the focus of researchers and physicians, and other types are regarded as undesired side-effects; for example, in the connection of health care insurance risk calculations. Profiling in a legal sense is addressed, for example, by the EU data protection law. It is defined in the General Data Protection Regulation as automated decision making. This term does not correspond fully with profiling in biomedical research and healthcare, and the impact on privacy has hardly ever been examined. But profiling is also an issue concerning the fundamental right of non-discrimination, whenever profiles are used in a way that has a discriminatory effect on individuals. Here, we will focus on genetic profiling, define related notions as legal and subject-matter definitions frequently differ, and discuss the ethical and legal challenges.
Affiliation(s)
- Murat Sariyar
- Institute of Medical Informatics, Bern University of Applied Sciences, Bienne, Switzerland
- Irene Schlünder
- TMF - Technologie- und Methodenplattform e.V., Berlin, Germany; BBMRI-ERIC, Graz, Austria
11
Mohammed Yakubu A, Chen YPP. Ensuring privacy and security of genomic data and functionalities. Brief Bioinform 2019; 21:511-526. [DOI: 10.1093/bib/bbz013] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/07/2018] [Revised: 01/01/2019] [Accepted: 01/14/2019] [Indexed: 12/14/2022]
Abstract
In recent times, the reduced cost of DNA sequencing has resulted in a plethora of genomic data that is being used to advance biomedical research and improve clinical procedures and healthcare delivery. These advances are revolutionizing areas in genome-wide association studies (GWASs), diagnostic testing, personalized medicine and drug discovery. This, however, comes with security and privacy challenges as the human genome is sensitive in nature and uniquely identifies an individual. In this article, we discuss the genome privacy problem and review relevant privacy attacks, classified into identity tracing, attribute disclosure and completion attacks, which have been used to breach the privacy of an individual. We then classify state-of-the-art genomic privacy-preserving solutions based on their application and computational domains (genomic aggregation, GWASs and statistical analysis, sequence comparison and genetic testing) that have been proposed to mitigate these attacks and compare them in terms of their underlying cryptographic primitives, security goals and complexities (computation and transmission overheads). Finally, we identify and discuss the open issues, research challenges and future directions in the field of genomic privacy. We believe this article will provide researchers with the current trends and insights on the importance and challenges of privacy and security issues in the area of genomics.
Affiliation(s)
- Abukari Mohammed Yakubu
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC, Australia
- Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC, Australia
12
Abstract
The rising demand to use genetic data for research goes hand in hand with an increased awareness of privacy issues related to its use. Using human genetic data in a legally compliant way requires an examination of the legal basis as well as an assessment of potential disclosure risks. Focusing on the relevant legal framework in the European Union, we discuss open questions and uncertainties around the handling of genetic data in research, which can result in the introduction of unnecessary hurdles for data sharing. First, we discuss defining features and relative disclosure risks of some DNA-related biomarkers, distinguishing between the risk for disclosure of (1) the identity of an individual, (2) information about an individual's health and behavior, including previously unknown phenotypes, and (3) information about an individual's blood relatives. Second, we discuss the European legal framework applicable to the use of DNA-related biomarkers in research, the implications of including both inherited and acquired traits in the legal definition, as well as the issue of “genetic exceptionalism”—the notion that genetic information has inherent characteristics that require different considerations than other health and medical information. Finally, by mapping the legal to specific technical definitions, we draw some initial conclusions concerning how sensitive different types of “genetic data” may actually be. We argue that whole genome sequences may justifiably be considered “exceptional” and require special protection, whereas other genetic data that do not fulfill the same criteria should be treated in a similar manner to other clinical data. This kind of differentiation should be reflected by the law and/or other governance frameworks as well as agreed Codes of Conduct when using the term “genetic data.”
Affiliation(s)
- Murat Sariyar
- Institute of Medical Informatics, Bern University of Applied Sciences, Bienne, Switzerland
- Irene Schlünder
- TMF-Technologie- und Methodenplattform e.V., Berlin, Germany; BBMRI-ERIC, Graz, Austria
13
Wan Z, Vorobeychik Y, Kantarcioglu M, Malin B. Controlling the signal: Practical privacy protection of genomic data sharing through Beacon services. BMC Med Genomics 2017; 10:39. [PMID: 28786360 PMCID: PMC5547445 DOI: 10.1186/s12920-017-0282-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 12/15/2022]
Abstract
Background: Genomic data is increasingly collected by a wide array of organizations. As such, there is a growing demand to make summary information about such collections available more widely. However, over the past decade, a series of investigations have shown that attacks, rooted in statistical inference methods, can be applied to discern the presence of a known individual's DNA sequence in the pool of subjects. Recently, it was shown that the Beacon Project of the Global Alliance for Genomics and Health, a web service for querying about the presence (or absence) of a specific allele, was vulnerable. The Integrating Data for Analysis, Anonymization, and Sharing (iDASH) Center modeled a track in their third Privacy Protection Challenge on how to mitigate the Beacon vulnerability. We developed the winning solution for this track. Methods: This paper describes our computational method to optimize the tradeoff between the utility and the privacy of the Beacon service. We generalize the genomic data sharing problem beyond that which was introduced in the iDASH Challenge to be more representative of real-world scenarios and to allow for a more comprehensive evaluation. We then conduct a sensitivity analysis of our method with respect to several state-of-the-art methods using a dataset of 400,000 positions in Chromosome 10 for 500 individuals from Phase 3 of the 1000 Genomes Project. All methods are evaluated for utility, privacy and efficiency. Results: Our method achieves better performance than all state-of-the-art methods, irrespective of how key factors (e.g., the allele frequency in the population, the size of the pool and utility weights) change from the original parameters of the problem. We further illustrate that it is possible for our method to exhibit subpar performance under special cases of allele query sequences. However, we show our method can be extended to address this issue when the query sequence is fixed and known a priori to the data custodian, so that they may stage their responses accordingly. Conclusions: This research shows that it is possible to thwart the attack on Beacon services, without substantially altering the utility of the system, using computational methods. The method we initially developed is limited by the design of the scenario and evaluation protocol for the iDASH Challenge; however, it can be improved by allowing the data custodian to act in a staged manner.
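The utility and privacy tradeoff described in this abstract can be pictured with a toy Beacon that answers presence queries and lets the data custodian suppress selected answers. This is only a sketch: deciding which answers to suppress is precisely what the paper's optimization does, whereas here the `hidden` set is chosen by hand, and the class, its methods and the sample data are all hypothetical.

```python
class Beacon:
    """Toy Beacon: answers whether any pooled genome carries an allele."""

    def __init__(self, genomes, hidden=()):
        # genomes: one collection of (position, allele) pairs per individual.
        self.genomes = [set(g) for g in genomes]
        # hidden: queries the custodian always answers "no" to, whatever the data say.
        self.hidden = set(hidden)

    def true_answer(self, position, allele):
        """Ground truth: is the allele present in at least one pooled genome?"""
        return any((position, allele) in g for g in self.genomes)

    def query(self, position, allele):
        """Public answer, with suppressed queries reported as absent."""
        if (position, allele) in self.hidden:
            return False
        return self.true_answer(position, allele)

    def utility(self, queries):
        """Fraction of the given queries that are answered truthfully."""
        truthful = sum(self.query(p, a) == self.true_answer(p, a)
                       for p, a in queries)
        return truthful / len(queries)


# The allele at position 200 is carried by a single individual, so the
# custodian in this example chooses to hide it.
pool = [{(100, "A"), (200, "T")}, {(100, "A")}]
beacon = Beacon(pool, hidden={(200, "T")})
queries = [(100, "A"), (200, "T"), (300, "G")]
answers = [beacon.query(p, a) for p, a in queries]
score = beacon.utility(queries)
```

Hiding more answers lowers `utility` but gives an attacker fewer rare-allele signals to combine, which is the balance the abstract says the method optimizes.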
Collapse
Affiliation(s)
- Zhiyu Wan
- Department of Electrical Engineering and Computer Science, Vanderbilt University, 2525 West End Avenue, Suite 800, 37203, Nashville, TN, USA.
| | - Yevgeniy Vorobeychik
- Department of Electrical Engineering and Computer Science, Vanderbilt University, 2525 West End Avenue, Suite 800, 37203, Nashville, TN, USA.,Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - Murat Kantarcioglu
- Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA
| | - Bradley Malin
- Department of Electrical Engineering and Computer Science, Vanderbilt University, 2525 West End Avenue, Suite 800, 37203, Nashville, TN, USA.,Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.,Department of Biostatistics, Vanderbilt University, Nashville, TN, USA
|
14
|
|
15
|
Huang Z, Lin H, Fellay J, Kutalik Z, Hubaux JP. SQC: secure quality control for meta-analysis of genome-wide association studies. Bioinformatics 2017; 33:2273-2280. [DOI: 10.1093/bioinformatics/btx193] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2016] [Accepted: 03/31/2017] [Indexed: 11/13/2022] Open
Affiliation(s)
- Zhicong Huang
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Huang Lin
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Jacques Fellay
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Zoltán Kutalik
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Social and Preventive Medicine, University Hospital Lausanne (CHUV), Lausanne, Switzerland
| | - Jean-Pierre Hubaux
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
|
16
|
Abstract
The past 15 years have seen considerable changes in the research environment. These changes include the development of new sophisticated genetic and genomic technologies, a proliferation of databases containing large amounts of genotypic and phenotypic data, and widespread data sharing among many institutions, nationally and internationally. These changes have raised new questions regarding how best to protect the participants of biobanking research. In response to these questions, best practices for addressing the legal, ethical, and social issues of biobanking have been developed. In addition, new ethical guidelines related to biobanking have been established, as well as new regulations regarding privacy and human subject protections. Finally, changes in the science and the research environment have raised complex ethical issues related to biobanking, such as questions about the most appropriate consent models to use for biobanking research, commercial use and ownership issues, and whether and how to return individual research results to biobank participants. This article reviews some of the developments over the past 15 years related to the ELSI of biobanking with a look toward the future.
|
17
|
Expanding Access to Large-Scale Genomic Data While Promoting Privacy: A Game Theoretic Approach. Am J Hum Genet 2017; 100:316-322. [PMID: 28065469 DOI: 10.1016/j.ajhg.2016.12.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2016] [Accepted: 12/07/2016] [Indexed: 12/11/2022] Open
Abstract
Emerging scientific endeavors are creating big data repositories of data from millions of individuals. Sharing data in a privacy-respecting manner could lead to important discoveries, but high-profile demonstrations show that links between de-identified genomic data and named persons can sometimes be reestablished. Such re-identification attacks have focused on worst-case scenarios and spurred the adoption of data-sharing practices that unnecessarily impede research. To mitigate concerns, organizations have traditionally relied upon legal deterrents, like data use agreements, and are considering suppressing or adding noise to genomic variants. In this report, we use a game theoretic lens to develop more effective, quantifiable protections for genomic data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources, not adversaries with unlimited means. We demonstrate this approach via a new public resource with genomic summary data from over 8,000 individuals-the Sequence and Phenotype Integration Exchange (SPHINX)-and show that risks can be balanced against utility more effectively than with traditional approaches. We further show the generalizability of this framework by applying it to other genomic data collection and sharing endeavors. Recognizing that such models are dependent on a variety of parameters, we perform extensive sensitivity analyses to show that our findings are robust to their fluctuations.
|
18
|
Wang S, Jiang X, Singh S, Marmor R, Bonomi L, Fox D, Dow M, Ohno-Machado L. Genome privacy: challenges, technical approaches to mitigate risk, and ethical considerations in the United States. Ann N Y Acad Sci 2017; 1387:73-83. [PMID: 27681358 PMCID: PMC5266631 DOI: 10.1111/nyas.13259] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2016] [Revised: 08/18/2016] [Accepted: 08/22/2016] [Indexed: 12/28/2022]
Abstract
Accessing and integrating human genomic data with phenotypes are important for biomedical research. Making genomic data accessible for research purposes, however, must be handled carefully to avoid leakage of sensitive individual information to unauthorized parties and improper use of data. In this article, we focus on data sharing within the scope of data accessibility for research. Current common practices to gain biomedical data access are strictly rule based, without a clear and quantitative measurement of the risk of privacy breaches. In addition, several types of studies require privacy-preserving linkage of genotype and phenotype information across different locations (e.g., genotypes stored in a sequencing facility and phenotypes stored in an electronic health record) to accelerate discoveries. The computer science community has developed a spectrum of techniques for data privacy and confidentiality protection, many of which have yet to be tested on real-world problems. In this article, we discuss clinical, technical, and ethical aspects of genome data privacy and confidentiality in the United States, as well as potential solutions for privacy-preserving genotype-phenotype linkage in biomedical research.
Affiliation(s)
- Shuang Wang
- Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Xiaoqian Jiang
- Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Siddharth Singh
- Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Rebecca Marmor
- Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Luca Bonomi
- Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Dov Fox
- School of Law, University of San Diego, San Diego, California
| | - Michelle Dow
- Department of Biomedical Informatics, University of California San Diego, La Jolla, California
| | - Lucila Ohno-Machado
- Department of Biomedical Informatics, University of California San Diego, La Jolla, California
|
19
|
Shringarpure SS, Bustamante CD. Privacy Risks from Genomic Data-Sharing Beacons. Am J Hum Genet 2015; 97:631-46. [PMID: 26522470 PMCID: PMC4667107 DOI: 10.1016/j.ajhg.2015.09.010] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2015] [Accepted: 09/23/2015] [Indexed: 12/21/2022] Open
Abstract
The human genetics community needs robust protocols that enable secure sharing of genomic data from participants in genetic research. Beacons are web servers that answer allele-presence queries—such as “Do you have a genome that has a specific nucleotide (e.g., A) at a specific genomic position (e.g., position 11,272 on chromosome 1)?”—with either “yes” or “no.” Here, we show that individuals in a beacon are susceptible to re-identification even if the only data shared include presence or absence information about alleles in a beacon. Specifically, we propose a likelihood-ratio test of whether a given individual is present in a given genetic beacon. Our test is not dependent on allele frequencies and is the most powerful test for a specified false-positive rate. Through simulations, we showed that in a beacon with 1,000 individuals, re-identification is possible with just 5,000 queries. Relatives can also be identified in the beacon. Re-identification is possible even in the presence of sequencing errors and variant-calling differences. In a beacon constructed with 65 European individuals from the 1000 Genomes Project, we demonstrated that it is possible to detect membership in the beacon with just 250 SNPs. With just 1,000 SNP queries, we were able to detect the presence of an individual genome from the Personal Genome Project in an existing beacon. Our results show that beacons can disclose membership and implied phenotypic information about participants and do not protect privacy a priori. We discuss risk mitigation through policies and standards such as not allowing anonymous pings of genetic beacons and requiring minimum beacon sizes.
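The likelihood-ratio test described in this abstract can be illustrated with a minimal Python sketch. It assumes a simplified formulation: every query targets an allele the individual carries, the beacon answers 1 (yes) or 0 (no), and the probabilities that none of N (or N-1) beacon genomes carries the queried allele are supplied by the caller. The function name `beacon_llr` and the parameters `d_n`, `d_n_minus_1`, and `delta` (an assumed sequencing-error mismatch rate) are illustrative choices that do not come from the abstract, and the sketch should not be read as the authors' exact test.

```python
import math

def beacon_llr(responses, d_n, d_n_minus_1, delta=1e-6):
    """Return log L(H0) - log L(H1) for a list of beacon answers.

    H0: the queried individual is NOT in the beacon.
    H1: the queried individual IS in the beacon.
    d_n         -- assumed probability that none of N genomes carries the allele
    d_n_minus_1 -- same probability for N-1 genomes
    delta       -- assumed mismatch rate due to sequencing error (must be > 0)
    """
    p_yes_h0 = 1.0 - d_n                  # a "yes" must come from the N genomes
    p_yes_h1 = 1.0 - delta * d_n_minus_1  # "no" needs an error and no other carrier
    llr = 0.0
    for x in responses:
        if x:
            llr += math.log(p_yes_h0) - math.log(p_yes_h1)
        else:
            llr += math.log(1.0 - p_yes_h0) - math.log(1.0 - p_yes_h1)
    return llr
```

Under these assumptions, a run of "yes" answers drives the statistic negative (favoring membership), while a single "no" answer pushes it strongly toward the null hypothesis.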
|
20
|
Simmons S, Berger B. One Size Doesn't Fit All: Measuring Individual Privacy in Aggregate Genomic Data. PROCEEDINGS. IEEE SYMPOSIUM ON SECURITY AND PRIVACY. WORKSHOPS 2015; 2015:41-49. [PMID: 29202050 DOI: 10.1109/spw.2015.25] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Even in the aggregate, genomic data can reveal sensitive information about individuals. We present a new model-based measure, PrivMAF, that provides provable privacy guarantees for aggregate data (namely minor allele frequencies) obtained from genomic studies. Unlike many previous measures that have been designed to measure the total privacy lost by all participants in a study, PrivMAF gives an individual privacy measure for each participant in the study, not just an average measure. These individual measures can then be combined to measure the worst case privacy loss in the study. Our measure also allows us to quantify the privacy gains achieved by perturbing the data, either by adding noise or binning. Our findings demonstrate that both perturbation approaches offer significant privacy gains. Moreover, we see that these privacy gains can be achieved while minimizing perturbation (and thus maximizing the utility) relative to stricter notions of privacy, such as differential privacy. We test PrivMAF using genotype data from the Wellcome Trust Case Control Consortium, providing a more nuanced understanding of the privacy risks involved in an actual genome-wide association study. Interestingly, our analysis demonstrates that the privacy implications of releasing MAFs from a study can differ greatly from individual to individual. An implementation of our method is available at http://privmaf.csail.mit.edu.
Affiliation(s)
- Sean Simmons
- Department of Mathematics and CSAIL, Massachusetts Institute of Technology
| | - Bonnie Berger
- Department of Mathematics and CSAIL, Massachusetts Institute of Technology
|
21
|
Xie W, Kantarcioglu M, Bush WS, Crawford D, Denny JC, Heatherly R, Malin BA. SecureMA: protecting participant privacy in genetic association meta-analysis. Bioinformatics 2014; 30:3334-41. [PMID: 25147357 DOI: 10.1093/bioinformatics/btu561] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION Sharing genomic data is crucial to support scientific investigation such as genome-wide association studies. However, recent investigations suggest the privacy of the individual participants in these studies can be compromised, leading to serious concerns and consequences, such as overly restricted access to data. RESULTS We introduce a novel cryptographic strategy to securely perform meta-analysis for genetic association studies in large consortia. Our methodology is useful for supporting joint studies among disparate data sites, where privacy or confidentiality is of concern. We validate our method using three multisite association studies. Our research shows that genetic associations can be analyzed efficiently and accurately across substudy sites, without leaking information on individual participants and site-level association summaries. AVAILABILITY AND IMPLEMENTATION Our software for secure meta-analysis of genetic association studies, SecureMA, is publicly available at http://github.com/XieConnect/SecureMA. Our customized secure computation framework is also publicly available at http://github.com/XieConnect/CircuitService.
Affiliation(s)
- Wei Xie
- Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232, USA, Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA, Department of Biomedical Informatics, Center for Human Genetics Research, Department of Molecular Physiology and Biophysics and Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA
| | - Murat Kantarcioglu
- Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232, USA, Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA, Department of Biomedical Informatics, Center for Human Genetics Research, Department of Molecular Physiology and Biophysics and Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA
| | - William S Bush
- Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232, USA, Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA, Department of Biomedical Informatics, Center for Human Genetics Research, Department of Molecular Physiology and Biophysics and Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA
| | - Dana Crawford
- Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232, USA, Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA, Department of Biomedical Informatics, Center for Human Genetics Research, Department of Molecular Physiology and Biophysics and Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA
| | - Joshua C Denny
- Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232, USA, Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA, Department of Biomedical Informatics, Center for Human Genetics Research, Department of Molecular Physiology and Biophysics and Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA
| | - Raymond Heatherly
- Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232, USA, Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA, Department of Biomedical Informatics, Center for Human Genetics Research, Department of Molecular Physiology and Biophysics and Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA
| | - Bradley A Malin
- Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232, USA, Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA, Department of Biomedical Informatics, Center for Human Genetics Research, Department of Molecular Physiology and Biophysics and Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA
|
22
|
Abstract
We are entering an era of ubiquitous genetic information for research, clinical care and personal curiosity. Sharing these data sets is vital for progress in biomedical research. However, a growing concern is the ability to protect the genetic privacy of the data originators. Here, we present an overview of genetic privacy breaching strategies. We outline the principles of each technique, indicate the underlying assumptions, and assess their technological complexity and maturation. We then review potential mitigation methods for privacy-preserving dissemination of sensitive data and highlight different cases that are relevant to genetic applications.
Affiliation(s)
- Yaniv Erlich
- Whitehead Institute for Biomedical Research, Nine Cambridge Center, Cambridge, MA USA 02142
| | - Arvind Narayanan
- Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ USA 08540
|
23
|
Abstract
Next-generation sequencing (NGS) has enabled whole-exome and whole-genome sequencing of tumors for causative mutations, allowing for more accurate targeting of therapies. In the process of sequencing the tumor, comparisons to the germline genome may identify variants associated with susceptibility to cancer as well as other hereditary diseases. Already, the combination of massively parallel sequencing and selective capture approaches has facilitated efficient simultaneous genetic analysis (multiplex testing) of large numbers of candidate genes. As the field of oncology incorporates NGS approaches into tumor and germline analyses, it has become clear that the ability to achieve high-throughput genotyping surpasses our current ability to interpret and appropriately apply the vast amounts of data generated from such technologies. A review of the current state of knowledge of rare and common genetic variants associated with cancer risk or treatment outcome reveals significant progress, as well as a number of challenges associated with the clinical translation of these discoveries. The combined efforts of oncologists, genetic counselors, and cancer geneticists will be required to drive the paradigm shift toward personalized or precision medicine and to ensure the incorporation of NGS technologies into the practice of preventive oncology.
Affiliation(s)
- Zsofia K. Stadler
- All authors: Memorial Sloan-Kettering Cancer Center; Zsofia K. Stadler, Mark E. Robson, and Kenneth Offit, Weill Cornell Medical College, New York, NY
| | - Kasmintan A. Schrader
- All authors: Memorial Sloan-Kettering Cancer Center; Zsofia K. Stadler, Mark E. Robson, and Kenneth Offit, Weill Cornell Medical College, New York, NY
| | - Joseph Vijai
- All authors: Memorial Sloan-Kettering Cancer Center; Zsofia K. Stadler, Mark E. Robson, and Kenneth Offit, Weill Cornell Medical College, New York, NY
| | - Mark E. Robson
- All authors: Memorial Sloan-Kettering Cancer Center; Zsofia K. Stadler, Mark E. Robson, and Kenneth Offit, Weill Cornell Medical College, New York, NY
| | - Kenneth Offit
- All authors: Memorial Sloan-Kettering Cancer Center; Zsofia K. Stadler, Mark E. Robson, and Kenneth Offit, Weill Cornell Medical College, New York, NY
|
24
|
Yu F, Fienberg SE, Slavković AB, Uhler C. Scalable privacy-preserving data sharing methodology for genome-wide association studies. J Biomed Inform 2014; 50:133-41. [PMID: 24509073 DOI: 10.1016/j.jbi.2014.01.008] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2013] [Revised: 01/21/2014] [Accepted: 01/23/2014] [Indexed: 11/28/2022]
Abstract
The protection of privacy of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers following the publication of "an attack" on GWAS data by Homer et al. (2008). Traditional statistical methods for confidentiality and privacy protection of statistical databases do not scale well to deal with GWAS data, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach that provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, Uhler et al. (2013) proposed new methods to release aggregate GWAS data without compromising an individual's privacy. We extend the methods developed in Uhler et al. (2013) for releasing differentially-private χ²-statistics by allowing for arbitrary number of cases and controls, and for releasing differentially-private allelic test statistics. We also provide a new interpretation by assuming the controls' data are known, which is a realistic assumption because some GWAS use publicly available data as controls. We assess the performance of the proposed methods through a risk-utility analysis on a real data set consisting of DNA samples collected by the Wellcome Trust Case Control Consortium and compare the methods with the differentially-private release mechanism proposed by Johnson and Shmatikov (2013).
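As a rough illustration of what releasing a differentially private test statistic can look like, the Python sketch below computes an allelic chi-square statistic from a 2x2 table and perturbs it with the Laplace mechanism, a standard differential privacy technique. The abstract does not state which mechanism or sensitivity bound the authors use, so the function names `allelic_chi2` and `laplace_release` and the caller-supplied `sensitivity` are assumptions made only for this example.

```python
import random

def allelic_chi2(case_counts, control_counts):
    """Pearson chi-square for a 2x2 table of allele counts.

    case_counts and control_counts are (minor, major) allele counts.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: an empty row or column
    return n * (a * d - b * c) ** 2 / denom

def laplace_release(value, sensitivity, epsilon, rng=random):
    """Add Laplace(0, sensitivity / epsilon) noise to value.

    sensitivity is an assumed, caller-supplied bound on how much the
    statistic can change when one individual's data changes.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise
```

A smaller `epsilon` gives a stronger privacy guarantee but a noisier released statistic, which is the risk-utility tradeoff the abstract refers to.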
Affiliation(s)
- Fei Yu
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA.
| | - Stephen E Fienberg
- Department of Statistics, Heinz College, Machine Learning Department, and Cylab, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA.
| | - Aleksandra B Slavković
- Department of Statistics, Department of Public Health Sciences, Penn State University, University Park, PA 16802, USA.
| | - Caroline Uhler
- Institute of Science and Technology Austria, Am Campus 1, 3400 Klosterneuburg, Austria.
|
25
|
Differentially-Private Logistic Regression for Detecting Multiple-SNP Association in GWAS Databases. PRIVACY IN STATISTICAL DATABASES 2014. [DOI: 10.1007/978-3-319-11257-2_14] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
|
26
|
Bledsoe MJ, Grizzle WE. Use of human specimens in research: the evolving United States regulatory, policy, and scientific landscape. DIAGNOSTIC HISTOPATHOLOGY (OXFORD, ENGLAND) 2013; 19:322-330. [PMID: 24639889 PMCID: PMC3954467 DOI: 10.1016/j.mpdhp.2013.06.015] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Abstract
The use of human specimens in research has contributed to significant scientific and medical advancements. However, the development of sophisticated whole genome and informatics technologies and the increase in specimen and data sharing have raised new questions about the identifiability of specimens and the protection of participants in human specimen research. In the US, new regulations and policies are being considered to address these changes. This review discusses the current and proposed regulations as they apply to specimen research, as well as relevant policy discussions. It summarizes the ways that researchers and other stakeholders can provide their input to these discussions and policy development efforts. Input from all the stakeholders in specimen research will be essential for the development of policies that facilitate such research while at the same time protecting the rights and welfare of research participants.
Affiliation(s)
| | - William E Grizzle
- Division of Anatomic Pathology, Department of Pathology, The University of Alabama at Birmingham, Birmingham, AL, USA
|
27
|
Affiliation(s)
- Jeantine E Lunshof
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02215, USA
- Section Molecular Cell Physiology, VU University Amsterdam, De Boelelaan 1085, 1081HV Amsterdam, The Netherlands
| | - Madeleine P Ball
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02215, USA
|
28
|
Walker L, Starks H, West KM, Fullerton SM. dbGaP data access requests: a call for greater transparency. Sci Transl Med 2012; 3:113cm34. [PMID: 22174311 DOI: 10.1126/scitranslmed.3002788] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
The scientific and public health benefits of mandatory data-sharing mechanisms must be actively demonstrated. To this end, we manually reviewed 2724 data access requests approved between June 2007 and August 2010 through the U.S. National Center for Biotechnology Information database of genotypes and phenotypes (dbGaP). Our analysis demonstrates that dbGaP enables a wide range of secondary research by investigators from academic, governmental, and nonprofit and for-profit institutions in the United States and abroad. However, limitations in public reporting preclude the tracing of outcomes from secondary research to longer-term translational benefit.
Affiliation(s)
- Lorelei Walker
- Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, USA
|
29
|
Craig DW, Goor RM, Wang Z, Paschall J, Ostell J, Feolo M, Sherry ST, Manolio TA. Assessing and managing risk when sharing aggregate genetic variant data. Nat Rev Genet 2011; 12:730-6. [PMID: 21921928 DOI: 10.1038/nrg3067] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
Access to genetic data across studies is an important aspect of identifying new genetic associations through genome-wide association studies (GWASs). Meta-analysis across multiple GWASs with combined cohort sizes of tens of thousands of individuals often uncovers many more genome-wide associated loci than the original individual studies; this emphasizes the importance of tools and mechanisms for data sharing. However, even sharing summary-level data, such as allele frequencies, inherently carries some degree of privacy risk to study participants. Here we discuss mechanisms and resources for sharing data from GWASs, particularly focusing on approaches for assessing and quantifying the privacy risks to participants that result from the sharing of summary-level data.
Affiliation(s)
- David W Craig
- Translational Genomics Research Institute (TGen), Phoenix, Arizona 85004, USA.
|
30
|
Malin B, Loukides G, Benitez K, Clayton EW. Identifiability in biobanks: models, measures, and mitigation strategies. Hum Genet 2011; 130:383-92. [PMID: 21739176 PMCID: PMC3621020 DOI: 10.1007/s00439-011-1042-5] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2011] [Accepted: 06/12/2011] [Indexed: 12/29/2022]
Abstract
The collection and sharing of person-specific biospecimens has raised significant questions regarding privacy. In particular, the question of identifiability, or the degree to which materials stored in biobanks can be linked to the name of the individuals from which they were derived, is under scrutiny. The goal of this paper is to review the extent to which biospecimens and affiliated data can be designated as identifiable. To achieve this goal, we summarize recent research in identifiability assessment for DNA sequence data, as well as associated demographic and clinical data, shared via biobanks. We demonstrate the variability of the degree of risk, the factors that contribute to this variation, and potential ways to mitigate and manage such risk. Finally, we discuss the policy implications of these findings, particularly as they pertain to biobank security and access policies. We situate our review in the context of real data sharing scenarios and biorepositories.
Affiliation(s)
- Bradley Malin
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 600, Nashville, TN 37203, USA. Department of Electrical Engineering and Computer Science, School of Engineering, Vanderbilt University, Nashville, USA
| | - Grigorios Loukides
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 600, Nashville, TN 37203, USA
| | - Kathleen Benitez
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 600, Nashville, TN 37203, USA
| | - Ellen Wright Clayton
- Department of Pediatrics, School of Medicine, Vanderbilt, USA. Center for Biomedical Ethics and Society, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 400, Nashville, TN 37203, USA. School of Law, Vanderbilt University, Nashville, USA
|
31
|
Little J, Higgins JPT, Ioannidis JPA, Moher D, Gagnon F, von Elm E, Khoury MJ, Cohen B, Davey-Smith G, Grimshaw J, Scheet P, Gwinn M, Williamson RE, Zou GY, Hutchings K, Johnson CY, Tait V, Wiens M, Golding J, van Duijn C, McLaughlin J, Paterson A, Wells G, Fortier I, Freedman M, Zecevic M, King R, Infante-Rivard C, Stewart A, Birkett N. STrengthening the REporting of Genetic Association Studies (STREGA)--an extension of the STROBE statement. Genet Epidemiol 2010; 33:581-98. [PMID: 19278015 DOI: 10.1002/gepi.20410] [Citation(s) in RCA: 177] [Impact Index Per Article: 12.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
Abstract
Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the STrengthening the Reporting of OBservational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modelling haplotype variation, Hardy-Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and the volume of data issues that are important to consider in genetic association studies. The STREGA recommendations do not prescribe or dictate how a genetic association study should be designed but seek to enhance the transparency of its reporting, regardless of choices made during design, conduct, or analysis.
Affiliation(s)
- Julian Little
- Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario, Canada.
|
32
|
Abstract
The advent of both population genomic studies and direct-to-consumer personal genetic testing raises ethical challenges for researchers and physicians alike. Quality and solidarity can now be added to traditional ethical principles, such as autonomy and privacy. There is no doubt that genetic information is going 'public'. Informatic technologies allow for greater accessibility and integration, but can researchers and physicians handle the challenges? Are ethics committees equipped to handle this shift towards greater openness and towards a conflation of research and traditional medical ethics?
Affiliation(s)
- Bartha Maria Knoppers
- Centre of Genomics and Policy, McGill University and Genome Quebec Innovation Centre, 740 Dr. Penfield Ave, Room 5210, Montreal H3A 1A4, Quebec, Canada.
- Denise Avard
- Centre of Genomics and Policy, McGill University and Genome Quebec Innovation Centre, 740 Dr. Penfield Ave, Room 5210, Montreal H3A 1A4, Quebec, Canada.
|
33
|
Angrist M. Eyes wide open: the personal genome project, citizen science and veracity in informed consent. Per Med 2009; 6:691-699. [PMID: 22328898 DOI: 10.2217/pme.09.48] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Indexed: 11/21/2022]
Abstract
I am a close observer of the Personal Genome Project (PGP) and one of the original ten participants. The PGP was originally conceived as a way to test novel DNA sequencing technologies on human samples and to begin to build a database of human genomes and traits. However, its founder, Harvard geneticist George Church, was concerned about the fact that DNA is the ultimate digital identifier - individuals and many of their traits can be identified. Therefore, he believed that promising participants privacy and confidentiality would be impractical and disingenuous. Moreover, deidentification of samples would impoverish both genotypic and phenotypic data. As a result, the PGP has arguably become best known for its unprecedented approach to informed consent. All participants must pass an exam testing their knowledge of genomic science and privacy issues and agree to forgo the privacy and confidentiality of their genomic data and personal health records. Church aims to scale up to 100,000 participants. This special report discusses the impetus for the project, its early history and its potential to have a lasting impact on the treatment of human subjects in biomedical research.
Affiliation(s)
- Misha Angrist
- Duke University Institute for Genome Sciences & Policy, 450 Research Drive, Room B321A, Durham, NC 27708-21009, USA, Tel.: +1 919 684 2872
|
34
|
|
35
|
Little J, Higgins JP, Ioannidis JP, Moher D, Gagnon F, von Elm E, Khoury MJ, Cohen B, Davey-Smith G, Grimshaw J, Scheet P, Gwinn M, Williamson RE, Zou GY, Hutchings K, Johnson CY, Tait V, Wiens M, Golding J, van Duijn C, McLaughlin J, Paterson A, Wells G, Fortier I, Freedman M, Zecevic M, King R, Infante-Rivard C, Stewart AF, Birkett N. Strengthening the reporting of genetic association studies (STREGA)--an extension of the strengthening the reporting of observational studies in epidemiology (STROBE) statement. J Clin Epidemiol 2009; 62:597-608.e4. [DOI: 10.1016/j.jclinepi.2008.12.004] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.7] [Received: 04/09/2008] [Revised: 12/29/2008] [Accepted: 12/29/2008] [Indexed: 01/15/2023]
|
36
|
Little J, Higgins JPT, Ioannidis JPA, Moher D, Gagnon F, von Elm E, Khoury MJ, Cohen B, Davey-Smith G, Grimshaw J, Scheet P, Gwinn M, Williamson RE, Zou GY, Hutchings K, Johnson CY, Tait V, Wiens M, Golding J, van Duijn C, McLaughlin J, Paterson A, Wells G, Fortier I, Freedman M, Zecevic M, King R, Infante-Rivard C, Stewart A, Birkett N. STrengthening the REporting of Genetic Association studies (STREGA)--an extension of the STROBE statement. Eur J Clin Invest 2009; 39:247-66. [PMID: 19297801 PMCID: PMC2730482 DOI: 10.1111/j.1365-2362.2009.02125.x] [Citation(s) in RCA: 181] [Impact Index Per Article: 12.1] [Indexed: 12/12/2022]
Abstract
Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the STrengthening the Reporting of OBservational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modelling haplotype variation, Hardy-Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data and the volume of data issues that are important to consider in genetic association studies. The STREGA recommendations do not prescribe or dictate how a genetic association study should be designed, but seek to enhance the transparency of its reporting, regardless of choices made during design, conduct or analysis.
Affiliation(s)
- Julian Little
- Canada Research Chair in Human Genome Epidemiology, Ottawa, ON, Canada.
|
37
|
Little J, Higgins JPT, Ioannidis JPA, Moher D, Gagnon F, von Elm E, Khoury MJ, Cohen B, Davey-Smith G, Grimshaw J, Scheet P, Gwinn M, Williamson RE, Zou GY, Hutchings K, Johnson CY, Tait V, Wiens M, Golding J, van Duijn C, McLaughlin J, Paterson A, Wells G, Fortier I, Freedman M, Zecevic M, King R, Infante-Rivard C, Stewart A, Birkett N. STrengthening the REporting of Genetic Association Studies (STREGA): an extension of the STROBE statement. PLoS Med 2009; 6:e22. [PMID: 19192942 PMCID: PMC2634792 DOI: 10.1371/journal.pmed.1000022] [Citation(s) in RCA: 310] [Impact Index Per Article: 20.7] [Indexed: 12/15/2022] Open
Abstract
Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modelling haplotype variation, Hardy-Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and the volume of data issues that are important to consider in genetic association studies. The STREGA recommendations do not prescribe or dictate how a genetic association study should be designed but seek to enhance the transparency of its reporting, regardless of choices made during design, conduct, or analysis.
Affiliation(s)
- Julian Little
- Canada Research Chair in Human Genome Epidemiology, University of Ottawa, Ottawa, Ontario, Canada.
|
38
|
Little J, Higgins JPT, Ioannidis JPA, Moher D, Gagnon F, von Elm E, Khoury MJ, Cohen B, Davey-Smith G, Grimshaw J, Scheet P, Gwinn M, Williamson RE, Zou GY, Hutchings K, Johnson CY, Tait V, Wiens M, Golding J, van Duijn C, McLaughlin J, Paterson A, Wells G, Fortier I, Freedman M, Zecevic M, King R, Infante-Rivard C, Stewart A, Birkett N. Strengthening the reporting of genetic association studies (STREGA): an extension of the STROBE statement. Eur J Epidemiol 2009; 24:37-55. [PMID: 19189221 PMCID: PMC2764094 DOI: 10.1007/s10654-008-9302-y] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 03/08/2008] [Accepted: 11/04/2008] [Indexed: 02/02/2023]
Abstract
Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modeling haplotype variation, Hardy–Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and the volume of data issues that are important to consider in genetic association studies. The STREGA recommendations do not prescribe or dictate how a genetic association study should be designed but seek to enhance the transparency of its reporting, regardless of choices made during design, conduct, or analysis.
Affiliation(s)
- Julian Little
- Canada Research Chair in Human Genome Epidemiology, Ottawa, ON, Canada.
|
39
|
Little J, Higgins JPT, Ioannidis JPA, Moher D, Gagnon F, von Elm E, Khoury MJ, Cohen B, Davey-Smith G, Grimshaw J, Scheet P, Gwinn M, Williamson RE, Zou GY, Hutchings K, Johnson CY, Tait V, Wiens M, Golding J, van Duijn C, McLaughlin J, Paterson A, Wells G, Fortier I, Freedman M, Zecevic M, King R, Infante-Rivard C, Stewart A, Birkett N. Strengthening the reporting of genetic association studies (STREGA): an extension of the STROBE Statement. Hum Genet 2009; 125:131-51. [DOI: 10.1007/s00439-008-0592-7] [Citation(s) in RCA: 129] [Impact Index Per Article: 8.6] [Received: 03/20/2008] [Accepted: 11/09/2008] [Indexed: 12/21/2022]
|
40
|
Genotype-phenotype databases: challenges and solutions for the post-genomic era. Nat Rev Genet 2009; 10:9-18. [PMID: 19065136 DOI: 10.1038/nrg2483] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Indexed: 01/23/2023]
Abstract
The flow of research data concerning the genetic basis of health and disease is rapidly increasing in speed and complexity. In response, many projects are seeking to ensure that there are appropriate informatics tools, systems and databases available to manage and exploit this flood of information. Previous solutions, such as central databases, journal-based publication and manually intensive data curation, are now being enhanced with new systems for federated databases, database publication, and more automated management of data flows and quality control. Along with emerging technologies that enhance connectivity and data retrieval, these advances should help to create a powerful knowledge environment for genotype-phenotype information.
|
41
|
Resnik DB. Direct-to-consumer genomics, social networking, and confidentiality. Am J Bioeth 2009; 9:45-6. [PMID: 19998113 PMCID: PMC2792564 DOI: 10.1080/15265160902893924] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 05/28/2023]
Affiliation(s)
- David B Resnik
- National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA.
|
42
|
Wolf LE, Catania JA, Dolcini MM, Pollack LM, Lo B. IRB Chairs' Perspectives on Genomics Research Involving Stored Biological Materials: Ethical Concerns and Proposed Solutions. J Empir Res Hum Res Ethics 2008; 3:99-111. [DOI: 10.1525/jer.2008.3.4.99] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/17/2022]
Abstract
We evaluated 55 IRB chairs' perspectives on ethical issues in a hypothetical study involving mental health–related genomics research using stored specimens to identify potential barriers and solutions to such research. Most chairs identified the ethical issues of consent and confidentiality as important. The majority of chairs expressed concern about using materials in new research, especially concerning a mental health condition, that was not discussed in the original consent. Few chairs considered permissible strategies, such as de-identification and waiver of consent, which could allow the proposed research to go forward without consent. Chairs who reviewed more protocols and had attended conferences on human subjects protection identified more of the salient ethical issues in the scenario. Our study could not determine whether chairs were not familiar with the strategies of de-identification and waiver of consent, or believed that they did not adequately protect participants who had provided specimens for research. Thus, our findings suggest that investigators and IRBs should consider future use of specimens and obtain appropriate consent before collection of specimens. Furthermore, our findings suggest that IRBs can improve review of genomics research involving stored specimens by redesigning forms to prompt IRB members to consider some strategies, such as de-identification and Certificates of Confidentiality, that are recommended for this type of research and by sending members to conferences on human subjects protections and research ethics.
|