1
Abstract
Genome-Wide Association Studies (GWAS) identify the genomic variations that are statistically associated with a particular phenotype (e.g., a disease). The confidence in GWAS results increases with the number of genomes analyzed, which encourages federated computations where biocenters would periodically share the genomes they have sequenced. However, for economic and legal reasons, this collaboration will only happen if biocenters cannot learn each other's data. In addition, GWAS releases should not jeopardize the privacy of the individuals whose genomes are used. We introduce DyPS, a novel framework to conduct dynamic privacy-preserving federated GWAS. DyPS leverages a Trusted Execution Environment to secure dynamic GWAS computations. Moreover, DyPS uses a scaling mechanism to speed up the releases of GWAS results according to the evolving number of genomes used in the study, even if individuals retract their participation consent. Lastly, DyPS also tolerates up to all-but-one colluding biocenters without privacy leaks. We implemented and extensively evaluated DyPS through several scenarios involving more than 6 million simulated genomes and up to 35,000 real genomes. Our evaluation shows that DyPS updates test statistics with a reasonable additional request processing delay (11% longer) compared to an approach that would update them with minimal delay but would lead to 8% of the genomes not being protected. In addition, DyPS can result in the same amount of aggregate statistics as a static release (i.e., at the end of the study), but can produce up to 2.6 times more statistics information during earlier dynamic releases. Besides, we show that DyPS can support a larger number of genomes and SNP positions without any significant performance penalty.
2
Zhang W, Zhang H, Yang H, Li M, Xie Z, Li W. Computational resources associating diseases with genotypes, phenotypes and exposures. Brief Bioinform 2019; 20:2098-2115. [PMID: 30102366 PMCID: PMC6954426 DOI: 10.1093/bib/bby071] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/31/2018] [Revised: 07/01/2018] [Indexed: 12/16/2022] Open
Abstract
The causes of a disease and its therapies are not only related to genotypes, but also associated with other factors, including phenotypes, environmental exposures, drugs and chemical molecules. Distinguishing disease-related factors from many neutral factors is critical as well as difficult. Over the past two decades, bioinformaticians have developed many computational resources to integrate the omics data and discover associations among these factors. However, researchers and clinicians are experiencing difficulties in choosing appropriate resources from hundreds of relevant databases and software tools. Here, in order to assist the researchers and clinicians, we systematically review the public computational resources of human diseases related to genotypes, phenotypes, environment factors, drugs and chemical exposures. We briefly describe the development history of these computational resources, followed by the details of the relevant databases and software tools. We finally conclude with a discussion of current challenges and future opportunities as well as prospects on this topic.
Affiliation(s)
- Wenliang Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Haiyue Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Huan Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Miaoxin Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Zhi Xie
- State Key Lab of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 500040, China
- Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
3
Coady SA, Mensah GA, Wagner EL, Goldfarb ME, Hitchcock DM, Giffen CA. Use of the National Heart, Lung, and Blood Institute Data Repository. N Engl J Med 2017; 376:1849-1858. [PMID: 28402243 PMCID: PMC5665376 DOI: 10.1056/nejmsa1603542] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Indexed: 12/19/2022]
Abstract
BACKGROUND Research on data sharing from clinical trials has focused on elucidating perceptions, barriers, and attitudes among trialists and study participants with respect to sharing data. However, little information exists regarding utilization or associated publication of articles once clinical trial data have been widely shared. METHODS We analyzed administrative records of investigator requests for data access, linked publications, and bibliometrics to describe the use of the National Heart, Lung, and Blood Institute data repository. RESULTS From January 2000 through May 2016, a total of 370 investigators requested data from 1 or more clinical trials. Requests for trial data have been increasing, with 195 investigators (53%) initiating requests during the last 4.4 years of the study period. The predominant reason for requesting data was post hoc secondary analysis of new questions (72%), followed by analytic or statistical approaches to clinical trials (9%) and meta-analyses or pooled study research (7%). Of 172 requests with online project descriptions, only 2 requests were initiated for reanalysis of primary-outcome findings. Data from 88 of 100 available clinical trials were requested at least once, and the median time from repository availability to first request was 235 days. A total of 277 articles were published on the basis of data from 47 trials. Citation metrics from 224 articles indicated that half of the publications have cumulative citations that rank in the top 34% normalized for subject category and year of publication. CONCLUSIONS Demand for trial data for secondary analysis has been increasing. Requesting data for the a priori purpose of reanalysis or verification of original findings was rare.
Affiliation(s)
- Sean A Coady
- From the Division of Cardiovascular Sciences (S.A.C.), the Center for Translation Research and Implementation Science (G.A.M.), and the Division of Blood Diseases and Resources (E.L.W.), National Heart, Lung, and Blood Institute, Bethesda, and Information Management Services, Calverton (M.E.G., D.M.H., C.A.G.) - both in Maryland
- George A Mensah
- From the Division of Cardiovascular Sciences (S.A.C.), the Center for Translation Research and Implementation Science (G.A.M.), and the Division of Blood Diseases and Resources (E.L.W.), National Heart, Lung, and Blood Institute, Bethesda, and Information Management Services, Calverton (M.E.G., D.M.H., C.A.G.) - both in Maryland
- Elizabeth L Wagner
- From the Division of Cardiovascular Sciences (S.A.C.), the Center for Translation Research and Implementation Science (G.A.M.), and the Division of Blood Diseases and Resources (E.L.W.), National Heart, Lung, and Blood Institute, Bethesda, and Information Management Services, Calverton (M.E.G., D.M.H., C.A.G.) - both in Maryland
- Miriam E Goldfarb
- From the Division of Cardiovascular Sciences (S.A.C.), the Center for Translation Research and Implementation Science (G.A.M.), and the Division of Blood Diseases and Resources (E.L.W.), National Heart, Lung, and Blood Institute, Bethesda, and Information Management Services, Calverton (M.E.G., D.M.H., C.A.G.) - both in Maryland
- Denise M Hitchcock
- From the Division of Cardiovascular Sciences (S.A.C.), the Center for Translation Research and Implementation Science (G.A.M.), and the Division of Blood Diseases and Resources (E.L.W.), National Heart, Lung, and Blood Institute, Bethesda, and Information Management Services, Calverton (M.E.G., D.M.H., C.A.G.) - both in Maryland
- Carol A Giffen
- From the Division of Cardiovascular Sciences (S.A.C.), the Center for Translation Research and Implementation Science (G.A.M.), and the Division of Blood Diseases and Resources (E.L.W.), National Heart, Lung, and Blood Institute, Bethesda, and Information Management Services, Calverton (M.E.G., D.M.H., C.A.G.) - both in Maryland
4
dbAARD & AGP: A computational pipeline for the prediction of genes associated with age related disorders. J Biomed Inform 2016; 60:153-61. [PMID: 26836976 DOI: 10.1016/j.jbi.2016.01.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/09/2015] [Revised: 11/11/2015] [Accepted: 01/12/2016] [Indexed: 01/01/2023]
Abstract
The adverse behavioral and physiological shifts that come with aging accelerate the occurrence of deleterious disorders. Contemporary research is focused on uncovering the role of genetic associations in age-related disorders (ARDs). The completion of the Human Genome Project and the HapMap project has generated a huge amount of data on genetic variations, and Genome-Wide Association Studies (GWAS) have identified genetic variations, essentially SNPs, associated with several disorders including ARDs. However, a repository that houses all such ARD associations is lacking. The present work is aimed at filling this void. A database, dbAARD (database of Aging and Age Related Disorders), has been developed which hosts information on more than 3000 genetic variations significantly (p-value <0.05) associated with 51 ARDs. Furthermore, a machine learning based gene prediction tool, AGP (Age Related Disorders Gene Prediction), has been constructed by employing the rotation forest algorithm to prioritize genes associated with ARDs. The tool achieved a precision of 75%, recall of 76%, F-measure of 76% and AUC of 0.85. Both web resources have been made available online at http://genomeinformatics.dce.edu/dbAARD/ and http://genomeinformatics.dce.edu/AGP/, respectively, for easy retrieval and usage by the scientific community. We believe that this work may facilitate the analysis of the plethora of variants associated with ARDs and provide cues for deciphering the biology of aging.
5
Shabani M, Dyke SOM, Joly Y, Borry P. Controlled Access under Review: Improving the Governance of Genomic Data Access. PLoS Biol 2015; 13:e1002339. [PMID: 26720729 PMCID: PMC4697814 DOI: 10.1371/journal.pbio.1002339] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/19/2022] Open
Abstract
In parallel with massive genomic data production, data sharing practices have rapidly expanded over the last decade. To ensure authorized access to data, access review by data access committees (DACs) has been utilized as one potential solution. Here we discuss core elements to be integrated into the fabric of access review by both established and emerging DACs in order to foster fair, efficient, and responsible access to datasets. We particularly highlight the fact that the access review process could be adversely influenced by the potential conflicts of interest of data producers, particularly when they are directly involved in DACs management. Therefore, in structuring DACs and access procedures, possible data withholding by data producers should receive thorough attention.
Affiliation(s)
- Mahsa Shabani
- Centre for Biomedical Ethics and Law, Department of Public Health and Primary Care, University of Leuven, Leuven, Belgium
- Stephanie O. M. Dyke
- Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
- Yann Joly
- Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
- Pascal Borry
- Centre for Biomedical Ethics and Law, Department of Public Health and Primary Care, University of Leuven, Leuven, Belgium
6
Simmons S, Berger B. One Size Doesn't Fit All: Measuring Individual Privacy in Aggregate Genomic Data. Proceedings. IEEE Symposium on Security and Privacy. Workshops 2015; 2015:41-49. [PMID: 29202050 DOI: 10.1109/spw.2015.25] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Abstract
Even in the aggregate, genomic data can reveal sensitive information about individuals. We present a new model-based measure, PrivMAF, that provides provable privacy guarantees for aggregate data (namely minor allele frequencies) obtained from genomic studies. Unlike many previous measures that have been designed to measure the total privacy lost by all participants in a study, PrivMAF gives an individual privacy measure for each participant in the study, not just an average measure. These individual measures can then be combined to measure the worst case privacy loss in the study. Our measure also allows us to quantify the privacy gains achieved by perturbing the data, either by adding noise or binning. Our findings demonstrate that both perturbation approaches offer significant privacy gains. Moreover, we see that these privacy gains can be achieved while minimizing perturbation (and thus maximizing the utility) relative to stricter notions of privacy, such as differential privacy. We test PrivMAF using genotype data from the Wellcome Trust Case Control Consortium, providing a more nuanced understanding of the privacy risks involved in actual genome-wide association studies. Interestingly, our analysis demonstrates that the privacy implications of releasing MAFs from a study can differ greatly from individual to individual. An implementation of our method is available at http://privmaf.csail.mit.edu.
Affiliation(s)
- Sean Simmons
- Department of Mathematics and CSAIL, Massachusetts Institute of Technology
- Bonnie Berger
- Department of Mathematics and CSAIL, Massachusetts Institute of Technology
7
Satterthwaite TD, Connolly JJ, Ruparel K, Calkins ME, Jackson C, Elliott MA, Roalf DR, Hopson R, Prabhakaran K, Behr M, Qiu H, Mentch FD, Chiavacci R, Sleiman PMA, Gur RC, Hakonarson H, Gur RE. The Philadelphia Neurodevelopmental Cohort: A publicly available resource for the study of normal and abnormal brain development in youth. Neuroimage 2015; 124:1115-1119. [PMID: 25840117 DOI: 10.1016/j.neuroimage.2015.03.056] [Citation(s) in RCA: 223] [Impact Index Per Article: 22.3] [Received: 01/28/2015] [Revised: 03/16/2015] [Accepted: 03/16/2015] [Indexed: 01/31/2023] Open
Abstract
The Philadelphia Neurodevelopmental Cohort (PNC) is a large-scale study of child development that combines neuroimaging, diverse clinical and cognitive phenotypes, and genomics. Data from this rich resource is now publicly available through the Database of Genotypes and Phenotypes (dbGaP). Here we focus on the data from the PNC that is available through dbGaP and describe how users can access this data, which is evolving to be a significant resource for the broader neuroscience community for studies of normal and abnormal neurodevelopment.
Affiliation(s)
- Theodore D Satterthwaite
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
- John J Connolly
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kosha Ruparel
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Monica E Calkins
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Chad Jackson
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Mark A Elliott
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- David R Roalf
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ryan Hopson
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Karthik Prabhakaran
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Meckenzie Behr
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Haijun Qiu
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Frank D Mentch
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Rosetta Chiavacci
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Patrick M A Sleiman
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ruben C Gur
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Philadelphia Veterans Administration Medical Center, Philadelphia, PA 19104, USA
- Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Raquel E Gur
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
8
James R, Tsosie R, Sahota P, Parker M, Dillard D, Sylvester I, Lewis J, Klejka J, Muzquiz L, Olsen P, Whitener R, Burke W. Exploring pathways to trust: a tribal perspective on data sharing. Genet Med 2014; 16:820-6. [PMID: 24830328 DOI: 10.1038/gim.2014.47] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 10/23/2013] [Accepted: 04/07/2014] [Indexed: 11/09/2022] Open
Abstract
The data-sharing policies of the National Institutes of Health aim to maximize public benefit derived from genetic studies by increasing research efficiency and use of a pooled data resource for future studies. Although broad access to data may lead to benefits for populations underrepresented in genetic studies, such as indigenous groups, tribes have ownership interest in their data. The Northwest-Alaska Pharmacogenetic Research Network, a partnership involving tribal organizations and universities conducting basic and translational pharmacogenetic research, convened a meeting to discuss the collection, management, and secondary use of research data, and of the processes surrounding access to data stored in federal repositories. This article reports the tribal perspectives that emerged from the dialogue and discusses the implications of tribal government sovereign status on research agreements and data-sharing negotiations. There is strong tribal support for efficient research processes that expedite the benefits from collaborative research, but there is also a need for data-sharing procedures that take into account tribal sovereignty and appropriate oversight of research--such as tribally based research review processes and review of draft manuscripts. We also note specific ways in which accountability could be encouraged by the National Institutes of Health as part of the research process.
Affiliation(s)
- Rosalina James
- Department of Bioethics and Humanities, University of Washington, Seattle, Washington, USA
- Rebecca Tsosie
- Indian Legal Program, Sandra Day O'Connor College of Law, Arizona State University, Tempe, Arizona, USA
- Puneet Sahota
- Department of Psychiatry, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Myra Parker
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington, USA
- John Lewis
- Inter Tribal Council of Arizona, Phoenix, Arizona, USA
- Joseph Klejka
- Yukon-Kuskokwim Health Corporation, Bethel, Alaska, USA
- LeeAnna Muzquiz
- Confederated Salish and Kootenai Tribes, Pablo, Montana, USA
- Polly Olsen
- Indigenous Wellness Research Institute, School of Social Work, University of Washington, Seattle, Washington, USA
- Ron Whitener
- Native American Law Center, University of Washington, Seattle, Washington, USA
- Wylie Burke
- Department of Bioethics and Humanities, University of Washington, Seattle, Washington, USA
9
Cimino JJ, Ayres EJ, Remennik L, Rath S, Freedman R, Beri A, Chen Y, Huser V. The National Institutes of Health's Biomedical Translational Research Information System (BTRIS): design, contents, functionality and experience to date. J Biomed Inform 2013; 52:11-27. [PMID: 24262893 DOI: 10.1016/j.jbi.2013.11.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 07/31/2013] [Revised: 09/29/2013] [Accepted: 11/03/2013] [Indexed: 11/24/2022]
Abstract
The US National Institutes of Health (NIH) has developed the Biomedical Translational Research Information System (BTRIS) to support researchers' access to translational and clinical data. BTRIS includes a data repository, a set of programs for loading data from NIH electronic health records and research data management systems, an ontology for coding the disparate data with a single terminology, and a set of user interface tools that provide access to identified data from individual research studies and data across all studies from which individually identifiable data have been removed. This paper reports on unique design elements of the system, progress to date and user experience after five years of development and operation.
Affiliation(s)
- James J Cimino
- Laboratory for Informatics Development, NIH Clinical Center, Bethesda, MD, United States.
- Elaine J Ayres
- Laboratory for Informatics Development, NIH Clinical Center, Bethesda, MD, United States
- Lyubov Remennik
- Laboratory for Informatics Development, NIH Clinical Center, Bethesda, MD, United States
- Sachi Rath
- Computer Sciences Corporation, Falls Church, VA, United States
- Robert Freedman
- Computer Sciences Corporation, Falls Church, VA, United States
- Andrea Beri
- Computer Sciences Corporation, Falls Church, VA, United States
- Yang Chen
- Computer Sciences Corporation, Falls Church, VA, United States
- Vojtech Huser
- Laboratory for Informatics Development, NIH Clinical Center, Bethesda, MD, United States
10
Mennes M, Biswal B, Castellanos FX, Milham MP. Making data sharing work: the FCP/INDI experience. Neuroimage 2013; 82:683-91. [PMID: 23123682 PMCID: PMC3959872 DOI: 10.1016/j.neuroimage.2012.10.064] [Citation(s) in RCA: 172] [Impact Index Per Article: 14.3] [Received: 08/10/2012] [Accepted: 10/22/2012] [Indexed: 11/26/2022] Open
Abstract
Over a decade ago, the fMRI Data Center (fMRIDC) pioneered open-access data sharing in the task-based functional neuroimaging community. Well ahead of its time, the fMRIDC effort encountered logistical, sociocultural and funding barriers that impeded the field-wise instantiation of open-access data sharing. In 2009, ambitions for open-access data sharing were revived in the resting state functional MRI community in the form of two grassroots initiatives: the 1000 Functional Connectomes Project (FCP) and its successor, the International Neuroimaging Datasharing Initiative (INDI). Beyond providing open access to thousands of clinical and non-clinical imaging datasets, the FCP and INDI have demonstrated the feasibility of large-scale data aggregation for hypothesis generation and testing. Yet, the success of the FCP and INDI should not be confused with widespread embracement of open-access data sharing. Reminiscent of the challenges faced by fMRIDC, key controversies persist and include participant privacy, the role of informatics, and the logistical and cultural challenges of establishing an open science ethos. We discuss the FCP and INDI in the context of these challenges, highlighting the promise of current initiatives and suggesting solutions for possible pitfalls.
Affiliation(s)
- Maarten Mennes
- Donders Institute for Brain, Cognition and Behaviour, Dept. of Cognitive Neuroscience, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands
- Phyllis Green and Randolph Cowen Institute for Pediatric Neuroscience, NYU Child Study Center, New York, NY, USA
- Bharat Biswal
- Department of Radiology, University of Medicine & Dentistry in New Jersey, Newark, NJ, USA
- Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- F. Xavier Castellanos
- Phyllis Green and Randolph Cowen Institute for Pediatric Neuroscience, NYU Child Study Center, New York, NY, USA
- Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- Michael P. Milham
- Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- Center for the Developing Brain, Child Mind Institute, New York, NY, USA
11
Doan S, Lin KW, Conway M, Ohno-Machado L, Hsieh A, Feupe SF, Garland A, Ross MK, Jiang X, Farzaneh S, Walker R, Alipanah N, Zhang J, Xu H, Kim HE. PhenDisco: phenotype discovery system for the database of genotypes and phenotypes. J Am Med Inform Assoc 2013; 21:31-6. [PMID: 23989082 PMCID: PMC3912702 DOI: 10.1136/amiajnl-2013-001882] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/04/2022] Open
Abstract
The database of genotypes and phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) is a resource that contains information on various genome-wide association studies (GWAS) and is currently available via NCBI's dbGaP Entrez interface. The database is an important resource, providing GWAS data that can be used for new exploratory research or cross-study validation by authorized users. However, finding studies relevant to a particular phenotype of interest is challenging, as phenotype information is presented in a non-standardized way. To address this issue, we developed PhenDisco (phenotype discoverer), a new information retrieval system for dbGaP. PhenDisco consists of two main components: (1) text processing tools that standardize phenotype variables and study metadata, and (2) information retrieval tools that support queries from users and return ranked results. In a preliminary comparison involving 18 search scenarios, PhenDisco showed promising performance for both unranked and ranked search comparisons with dbGaP's search engine Entrez. The system can be accessed at http://pfindr.net.
Affiliation(s)
- Son Doan
- Division of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
12
McEwen JE, Boyer JT, Sun KY. Evolving approaches to the ethical management of genomic data. Trends Genet 2013; 29:375-82. [PMID: 23453621 PMCID: PMC3665610 DOI: 10.1016/j.tig.2013.02.001] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 11/15/2012] [Revised: 01/22/2013] [Accepted: 02/05/2013] [Indexed: 10/27/2022]
Abstract
The ethical landscape in the field of genomics is rapidly shifting. Plummeting sequencing costs, along with ongoing advances in bioinformatics, now make it possible to generate an enormous volume of genomic data about vast numbers of people. The informational richness, complexity, and frequently uncertain meaning of these data, coupled with evolving norms surrounding the sharing of data and samples and persistent privacy concerns, have generated a range of approaches to the ethical management of genomic information. As calls increase for the expanded use of broad or even open consent, and as controversy grows about how best to handle incidental genomic findings, these approaches, informed by normative analysis and empirical data, will continue to evolve alongside the science.
Affiliation(s)
- Jean E McEwen
- Ethical, Legal, and Social Implications Program, Division of Genomics and Society, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-9305, USA.
13
Ramos EM, Din-Lovinescu C, Bookman EB, McNeil LJ, Baker CC, Godynskiy G, Harris EL, Lehner T, McKeon C, Moss J, Starks VL, Sherry ST, Manolio TA, Rodriguez LL. A mechanism for controlled access to GWAS data: experience of the GAIN Data Access Committee. Am J Hum Genet 2013; 92:479-88. [PMID: 23561843 PMCID: PMC3617375 DOI: 10.1016/j.ajhg.2012.08.034] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 08/06/2012] [Revised: 08/06/2012] [Accepted: 08/06/2012] [Indexed: 11/29/2022] Open
Abstract
The Genetic Association Information Network (GAIN) Data Access Committee was established in June 2007 to provide prompt and fair access to data from six genome-wide association studies through the database of Genotypes and Phenotypes (dbGaP). Of 945 project requests received through 2011, 749 (79%) have been approved; median receipt-to-approval time decreased from 14 days in 2007 to 8 days in 2011. Over half (54%) of the proposed research uses were for GAIN-specific phenotypes; other uses were for method development (26%) and adding controls to other studies (17%). Eight data-management incidents, defined as compromises of any of the data-use conditions, occurred among nine approved users; most were procedural violations, and none violated participant confidentiality. Over 5 years of experience with GAIN data access has demonstrated substantial use of GAIN data by investigators from academic, nonprofit, and for-profit institutions with relatively few and contained policy violations. The availability of GAIN data has allowed for advances in both the understanding of the genetic underpinnings of mental-health disorders, diabetes, and psoriasis and the development and refinement of statistical methods for identifying genetic and environmental factors related to complex common diseases.
Affiliation(s)
- Erin M Ramos
- Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
14
Enabling genomic-phenomic association discovery without sacrificing anonymity. PLoS One 2013; 8:e53875. [PMID: 23405076 PMCID: PMC3566194 DOI: 10.1371/journal.pone.0053875] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 11/15/2012] [Accepted: 12/03/2012] [Indexed: 01/08/2023] Open
Abstract
Health information technologies facilitate the collection of massive quantities of patient-level data. A growing body of research demonstrates that such information can support novel, large-scale biomedical investigations at a fraction of the cost of traditional prospective studies. While healthcare organizations are being encouraged to share these data in a de-identified form, there is hesitation over concerns that it will allow corresponding patients to be re-identified. Currently proposed technologies to anonymize clinical data may make unrealistic assumptions with respect to the capabilities of a recipient to ascertain a patient's identity. We show that more pragmatic assumptions enable the design of anonymization algorithms that permit the dissemination of detailed clinical profiles with provable guarantees of protection. We demonstrate this strategy with a dataset of over one million medical records and show that 192 genotype-phenotype associations can be discovered with fidelity equivalent to non-anonymized clinical data.
15
Contreras JL, Floratos A, Holden AL. The International Serious Adverse Events Consortium's data sharing model. Nat Biotechnol 2013; 31:17-9. [DOI: 10.1038/nbt.2470] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]