701
Wang Y, Chen R, Ghosh J, Denny JC, Kho A, Chen Y, Malin BA, Sun J. Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics. KDD : PROCEEDINGS. INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING 2015; 2015:1265-1274. [PMID: 31452969 DOI: 10.1145/2783258.2783395] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 01/27/2023]
Abstract
Computational phenotyping is the process of converting heterogeneous electronic health records (EHRs) into meaningful clinical concepts. Unsupervised phenotyping methods have the potential to leverage a vast amount of unlabeled EHR data for phenotype discovery. However, existing unsupervised phenotyping methods do not incorporate current medical knowledge and cannot directly handle missing or noisy data. We propose Rubik, a constrained non-negative tensor factorization and completion method for phenotyping. Rubik incorporates 1) guidance constraints to align with existing medical knowledge, and 2) pairwise constraints for obtaining distinct, non-overlapping phenotypes. Rubik also has built-in tensor completion that can significantly alleviate the impact of noisy and missing data. We apply the Alternating Direction Method of Multipliers (ADMM) framework to tensor factorization and completion, which can be easily scaled through parallel computing. We evaluate Rubik on two EHR datasets, one of which contains 647,118 records for 7,744 patients from an outpatient clinic, the other of which is a public dataset containing 1,018,614 CMS claims records for 472,645 patients. Our results show that Rubik can discover more meaningful and distinct phenotypes than the baselines. In particular, by using knowledge guidance constraints, Rubik can also discover sub-phenotypes for several major diseases. Rubik also runs around seven times faster than current state-of-the-art tensor methods. Finally, Rubik is scalable to large datasets containing millions of EHR records.
702
Pendergrass SA, Verma A, Okula A, Hall MA, Crawford DC, Ritchie MD. Phenome-Wide Association Studies: Embracing Complexity for Discovery. Hum Hered 2015. [PMID: 26201697 DOI: 10.1159/000381851] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
Abstract
The inherent complexity of biological systems can be leveraged for a greater understanding of the impact of genetic architecture on outcomes, traits, and pharmacological response. The genome-wide association study (GWAS) approach has well-developed methods and relatively straight-forward methodologies; however, the bigger picture of the impact of genetic architecture on phenotypic outcome still remains to be elucidated even with an ever-growing number of GWAS performed. Greater consideration of the complexity of biological processes, using more data from the phenome, exposome, and diverse -omic resources, including considering the interplay of pleiotropy and genetic interactions, may provide additional leverage for making the most of the incredible wealth of information available for study. Here, we describe how incorporating greater complexity into analyses through the use of additional phenotypic data and widespread deployment of phenome-wide association studies may provide new insights into genetic factors influencing diseases, traits, and pharmacological response.
Affiliation(s)
- Sarah A Pendergrass
- Biomedical and Translational Informatics Program, Geisinger Health System, Danville, Pa., USA
703
Griswold AJ, Dueker ND, Van Booven D, Rantus JA, Jaworski JM, Slifer SH, Schmidt MA, Hulme W, Konidari I, Whitehead PL, Cuccaro ML, Martin ER, Haines JL, Gilbert JR, Hussman JP, Pericak-Vance MA. Targeted massively parallel sequencing of autism spectrum disorder-associated genes in a case control cohort reveals rare loss-of-function risk variants. Mol Autism 2015; 6:43. [PMID: 26185613 PMCID: PMC4504419 DOI: 10.1186/s13229-015-0034-z] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 02/09/2015] [Accepted: 06/16/2015] [Indexed: 12/31/2022]
Abstract
Background Autism spectrum disorder (ASD) is highly heritable, yet genome-wide association studies (GWAS), copy number variation screens, and candidate gene association studies have found no single factor accounting for a large percentage of genetic risk. ASD trio exome sequencing studies have revealed genes with recurrent de novo loss-of-function variants as strong risk factors, but there are relatively few recurrently affected genes while as many as 1000 genes are predicted to play a role. As such, it is critical to identify the remaining rare and low-frequency variants contributing to ASD. Methods We prioritized genes by GWAS and followed up with massively parallel sequencing in a case-control cohort. Using a previously reported ASD noise reduction GWAS analysis, we prioritized 837 RefSeq genes for custom targeting and sequencing. We sequenced the coding regions of those genes in 2071 ASD cases and 904 controls of European white ancestry. We applied comprehensive annotation to identify single variants that could confer ASD risk, and gene-based association analysis to identify sets of rare variants associated with ASD. Results We identified a significant over-representation of rare loss-of-function variants in genes previously associated with ASD, including a de novo premature stop variant in the well-established ASD candidate gene RBFOX1. Furthermore, ASD cases were more likely than controls to have two damaging missense variants in candidate genes. Finally, gene-based rare variant association implicates genes functioning in excitatory neurotransmission and neurite outgrowth and guidance pathways including CACNAD2, KCNH7, and NRXN1.
Conclusions We find suggestive evidence that rare variants in synaptic genes are associated with ASD and that loss-of-function mutations in ASD candidate genes are a major risk factor, and we implicate damaging mutations in glutamate signaling receptors and neuronal adhesion and guidance molecules. Furthermore, the role of de novo mutations in ASD remains to be fully investigated as we identified the first reported protein-truncating variant in RBFOX1 in ASD. Overall, this work, combined with others in the field, suggests a convergence of genes and molecular pathways underlying ASD etiology.
Affiliation(s)
- Anthony J Griswold
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Nicole D Dueker
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Derek Van Booven
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Joseph A Rantus
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- James M Jaworski
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Susan H Slifer
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Michael A Schmidt
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- William Hulme
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Ioanna Konidari
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Patrice L Whitehead
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Michael L Cuccaro
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA ; Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Eden R Martin
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA ; Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Jonathan L Haines
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106 USA
- John R Gilbert
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA ; Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
- Margaret A Pericak-Vance
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA ; Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
704
Groza T, Köhler S, Moldenhauer D, Vasilevsky N, Baynam G, Zemojtel T, Schriml LM, Kibbe WA, Schofield PN, Beck T, Vasant D, Brookes AJ, Zankl A, Washington NL, Mungall CJ, Lewis SE, Haendel MA, Parkinson H, Robinson PN. The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease. Am J Hum Genet 2015; 97:111-24. [PMID: 26119816 PMCID: PMC4572507 DOI: 10.1016/j.ajhg.2015.05.020] [Citation(s) in RCA: 157] [Impact Index Per Article: 15.7] [Received: 03/20/2015] [Accepted: 05/22/2015] [Indexed: 12/24/2022]
Abstract
The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.
Affiliation(s)
- Tudor Groza
- School of Information Technology and Electrical Engineering, University of Queensland, St. Lucia, QLD 4072, Australia; Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
- Dawid Moldenhauer
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; University of Applied Sciences, Wiesenstrasse 14, 35390 Giessen, Germany
- Nicole Vasilevsky
- Library, Oregon Health & Science University, Portland, OR 97239, USA
- Gareth Baynam
- School of Paediatrics and Child Health, University of Western Australia, Perth, WA 6840, Australia; Institute for Immunology and Infectious Diseases, Murdoch University, Perth, WA 6150, Australia; Office of Population Health Genomics, Public Health and Clinical Services Division, Department of Health, Perth, WA 6004, Australia; Genetic Services of Western Australia, King Edward Memorial Hospital, Perth, WA 6008, Australia; Telethon Kids Institute, Perth, WA 6008, Australia
- Tomasz Zemojtel
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznań, Poland
- Lynn Marie Schriml
- Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore, MD 21201, USA; Institute for Genome Sciences, School of Medicine, University of Maryland, Baltimore, MD 21201, USA
- Warren Alden Kibbe
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK; The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Tim Beck
- Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
- Drashtti Vasant
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
- Anthony J Brookes
- Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
- Andreas Zankl
- Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia; Academic Department of Medical Genetics, The Children's Hospital at Westmead, Sydney, NSW 2145, Australia; Discipline of Genetic Medicine, Sydney Medical School, University of Sydney, Sydney, NSW 2145, Australia
- Nicole L Washington
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Christopher J Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Suzanna E Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Melissa A Haendel
- Library, Oregon Health & Science University, Portland, OR 97239, USA
- Helen Parkinson
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; Institute of Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany.
705
Syed-Abdul S, Moldovan M, Nguyen PA, Enikeev R, Jian WS, Iqbal U, Hsu MH, Li YC. Profiling phenome-wide associations: a population-based observational study. J Am Med Inform Assoc 2015; 22:896-9. [PMID: 25656518 PMCID: PMC11737641 DOI: 10.1093/jamia/ocu019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 09/02/2014] [Revised: 10/23/2014] [Accepted: 11/02/2014] [Indexed: 12/31/2022]
Abstract
OBJECTIVES To objectively characterize phenome-wide associations observed in the entire Taiwanese population and represent them in a meaningful, interpretable way. STUDY DESIGN In this population-based observational study, we analyzed 782 million outpatient visits and 15 394 unique phenotypes that were observed in the entire Taiwanese population of over 22 million individuals. Our data were obtained from Taiwan's National Health Insurance Research Database. RESULTS We stratified the population into 20 gender-age groups and generated 28.8 million and 31.8 million pairwise odds ratios from male and female subpopulations, respectively. These associations can be accessed online at http://associations.phr.tmu.edu.tw. To demonstrate the database and validate the association estimates obtained, we used correlation analysis to analyze 100 phenotypes that were observed to have the strongest positive association estimates with respect to essential hypertension. The results indicated that association patterns tended to have a strong positive correlation between adjacent age groups, while correlation estimates tended to decline as groups became more distant in age, and they diverged when assessed across gender groups. CONCLUSIONS The correlation analysis of pairwise disease association patterns across different age and gender groups led to outcomes that were broadly predicted before the analysis, thus confirming the validity of the information contained in the presented database. More diverse individual disease-specific analyses would lead to a better understanding of phenome-wide associations and empower physicians to provide personalized care in terms of predicting, preventing, or initiating an early management of concomitant diseases.
Affiliation(s)
- Shabbir Syed-Abdul
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Max Moldovan
- Centre for Clinical Governance Research, Australian Institute of Health Innovation, Faculty of Medicine, University of New South Wales, Sydney, Australia; School of Population Health, Sansom Institute for Health Research, University of South Australia, South Australian Health & Medical Research Institute (SAHMRI)
- Phung-Anh Nguyen
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Wen-Shan Jian
- School of Health Care Administration, Taipei Medical University, Taipei, Taiwan
- Usman Iqbal
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Min-Huei Hsu
- Bureau of International Cooperation, Department of Health, Taipei, Taiwan
- Yu-Chuan Li
- Graduate Institute of Biomedical Informatics, College of Medicine Science and Technology; Department of Dermatology, Wan Fang Hospital, Taiwan. Taipei Medical University, Taipei, Taiwan
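The pairwise odds ratios mentioned in the abstract of entry 705 can be illustrated with a small sketch. This is only an assumption about the general form of such a calculation, not the authors' pipeline: the function names `odds_ratio` and `odds_ratio_from_sets`, and the idea of representing each phenotype as a set of patient identifiers within one stratum, are hypothetical.

```python
def odds_ratio(both, only_a, only_b, neither):
    """Odds ratio from the four cells of a 2x2 co-occurrence table.

    both    : patients with phenotype A and phenotype B
    only_a  : patients with A but not B
    only_b  : patients with B but not A
    neither : patients with neither phenotype
    """
    if min(both, only_a, only_b, neither) < 0:
        raise ValueError("counts must be non-negative")
    if only_a == 0 or only_b == 0:
        raise ValueError("odds ratio is undefined when an off-diagonal cell is zero")
    return (both * neither) / (only_a * only_b)


def odds_ratio_from_sets(patients_a, patients_b, population_size):
    """Build the 2x2 table from two sets of patient IDs in one stratum (hypothetical layout)."""
    both = len(patients_a & patients_b)
    only_a = len(patients_a - patients_b)
    only_b = len(patients_b - patients_a)
    neither = population_size - both - only_a - only_b
    return odds_ratio(both, only_a, only_b, neither)


if __name__ == "__main__":
    # Toy counts, not taken from the study.
    print(odds_ratio(20, 80, 10, 890))
```

In a stratified analysis of the kind the abstract describes, a function like this would be evaluated once per phenotype pair within each gender-age group.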
706
Alnazzawi N, Thompson P, Batista-Navarro R, Ananiadou S. Using text mining techniques to extract phenotypic information from the PhenoCHF corpus. BMC Med Inform Decis Mak 2015; 15 Suppl 2:S3. [PMID: 26099853 PMCID: PMC4474585 DOI: 10.1186/1472-6947-15-s2-s3] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 11/21/2022]
Abstract
Background Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text. Methods To stimulate the development of TM systems that are able to extract phenotypic information from text, we have created a new corpus (PhenoCHF) that is annotated by domain experts with several types of phenotypic information relating to congestive heart failure. To ensure that systems developed using the corpus are robust to multiple text types, it integrates text from heterogeneous sources, i.e., electronic health records (EHRs) and scientific articles from the literature. We have developed several different phenotype extraction methods to demonstrate the utility of the corpus, and tested these methods on a further corpus, i.e., ShARe/CLEF 2013. Results Evaluation of our automated methods showed that PhenoCHF can facilitate the training of reliable phenotype extraction systems, which are robust to variations in text type. These results have been reinforced by evaluating our trained systems on the ShARe/CLEF corpus, which contains clinical records of various types. Like other studies within the biomedical domain, we found that solutions based on conditional random fields produced the best results, when coupled with a rich feature set. Conclusions PhenoCHF is the first annotated corpus aimed at encoding detailed phenotypic information. The unique heterogeneous composition of the corpus has been shown to be advantageous in the training of systems that can accurately extract phenotypic information from a range of different text types. 
Although the scope of our annotation is currently limited to a single disease, the promising results achieved can stimulate further work into the extraction of phenotypic information for other diseases. The PhenoCHF annotation guidelines and annotations are publicly available at https://code.google.com/p/phenochf-corpus.
707
Aggarwal S, Gheware A, Agrawal A, Ghosh S, Prasher B, Mukerji M. Combined genetic effects of EGLN1 and VWF modulate thrombotic outcome in hypoxia revealed by Ayurgenomics approach. J Transl Med 2015; 13:184. [PMID: 26047609 PMCID: PMC4457985 DOI: 10.1186/s12967-015-0542-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 05/15/2015] [Accepted: 05/18/2015] [Indexed: 02/04/2023]
Abstract
BACKGROUND Extreme constitution "Prakriti" types of Ayurveda exhibit systemic physiological attributes. Our earlier genetic study revealed differences between Prakriti types in EGLN1, a key modulator of the hypoxia axis. This was associated with differences in high altitude adaptation and susceptibility to high altitude pulmonary edema (HAPE). In this study we investigate other molecular differences that contribute to systemic attributes of Prakriti and that would be relevant in predictive marker discovery. METHODS Genotyping of 96 individuals of the earlier cohort was carried out in a panel of 2,800 common genic SNPs represented in the Indian Genomic Variation Consortium (IGVC) panel from 24 diverse populations. Frequency distribution patterns of Prakriti differentiating variations (FDR correction P < 0.05) were studied in the IGVC and 55 global populations (HGDP-CEPH) panels. Genotypic interactions between VWF, identified from the present analysis, and EGLN1 were analyzed using multinomial logistic regression in Prakriti and Indian populations from contrasting altitudes. Spearman's rank correlation was used to study this genotypic interaction with respect to altitude in the HGDP-CEPH panel. Validation of the functional link between EGLN1 and VWF was carried out in a mouse model using chemical inhibition and siRNA studies. RESULTS Significant differences in allele frequencies were observed in seven genes (SPTA1, VWF, OLR1, UCP2, OR6K3, LEPR, and OR10Z1) after FDR correction (P < 0.05). A non-synonymous variation (C/T, rs1063856) associated with thrombosis/bleeding susceptibility respectively, differed significantly between Kapha (C-allele) and Pitta (T-allele) constitution types. A combination of derived EGLN1 allele (HAPE associated) and ancestral VWF allele (thrombosis associated) was significantly high in the Kapha group compared to Pitta (p < 10^-5). The combination of risk-associated Kapha alleles was nearly absent in natives of high altitude.
Inhibition of EGLN1 using DHB and an EGLN1 specific siRNA in a mouse model led to a marked increase in vWF levels as well as a pro-thrombotic phenotype, viz. reduced bleeding time and enhanced platelet count and activation. CONCLUSION We demonstrate for the first time a genetic link between EGLN1 and VWF in a constitution specific manner which could modulate thrombosis/bleeding susceptibility and outcomes of hypoxia. Integration of Prakriti in population stratification may help assemble common variations in key physiological axes that confer differences in disease occurrence and patho-phenotypic outcomes.
Affiliation(s)
- Shilpi Aggarwal
- Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, Sukhdev Vihar, Mathura Road, New Delhi, India.
- Atish Gheware
- CSIR's Ayurgenomics Unit-TRISUTRA (Translational Research and Innovative Science ThRough Ayurgenomics), CSIR-Institute of Genomics and Integrative Biology, Sukhdev Vihar, Mathura Road, New Delhi, 110 020, India; Academy of Scientific and Innovative Research (AcSIR), New Delhi, India.
- Anurag Agrawal
- Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, Sukhdev Vihar, Mathura Road, New Delhi, India; Academy of Scientific and Innovative Research (AcSIR), New Delhi, India.
- Bhavana Prasher
- CSIR's Ayurgenomics Unit-TRISUTRA (Translational Research and Innovative Science ThRough Ayurgenomics), CSIR-Institute of Genomics and Integrative Biology, Sukhdev Vihar, Mathura Road, New Delhi, 110 020, India; Academy of Scientific and Innovative Research (AcSIR), New Delhi, India.
- Mitali Mukerji
- Genomics and Molecular Medicine, CSIR-Institute of Genomics and Integrative Biology, Sukhdev Vihar, Mathura Road, New Delhi, India; CSIR's Ayurgenomics Unit-TRISUTRA (Translational Research and Innovative Science ThRough Ayurgenomics), CSIR-Institute of Genomics and Integrative Biology, Sukhdev Vihar, Mathura Road, New Delhi, 110 020, India; Academy of Scientific and Innovative Research (AcSIR), New Delhi, India.
708
Seok J, Seon Kang Y. Mutual Information between Discrete Variables with Many Categories using Recursive Adaptive Partitioning. Sci Rep 2015; 5:10981. [PMID: 26046461 PMCID: PMC4456943 DOI: 10.1038/srep10981] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/04/2014] [Accepted: 05/11/2015] [Indexed: 11/09/2022]
Abstract
Mutual information, a general measure of the relatedness between two random variables, has been actively used in the analysis of biomedical data. The mutual information between two discrete variables is conventionally calculated from their joint probabilities, estimated from the frequency of observed samples in each combination of variable categories. However, this conventional approach is no longer efficient for discrete variables with many categories, which can be easily found in large-scale biomedical data such as diagnosis codes, drug compounds, and genotypes. Here, we propose a method to provide stable estimations for the mutual information between discrete variables with many categories. Simulation studies showed that the proposed method reduced the estimation errors 45-fold and improved the correlation coefficients with true values 99-fold, compared with the conventional calculation of mutual information. The proposed method was also demonstrated through a case study for diagnostic data in electronic health records. This method is expected to be useful in the analysis of various biomedical data with discrete variables.
Affiliation(s)
- Junhee Seok
- School of Electrical Engineering, Korea University, Seoul, South Korea
- Yeong Seon Kang
- Department of Business Administration, University of Seoul, Seoul, South Korea
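The conventional calculation that the abstract of entry 708 refers to, in which joint probabilities are estimated from observed category frequencies, can be sketched as a plug-in estimator. This is a minimal illustration of that baseline only, not the recursive adaptive partitioning method the paper proposes; the function name `mutual_information` and the choice of bits (log base 2) are assumptions made for the example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired categorical observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("at least one observation is required")

    joint = Counter(zip(xs, ys))  # counts for each (x, y) combination
    count_x = Counter(xs)         # marginal counts for X
    count_y = Counter(ys)         # marginal counts for Y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    # Toy data: identical variables share one bit; independent ones share none.
    print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

With many categories and few samples, most cells of the joint table are empty or hold a single observation, which is the instability the abstract attributes to this conventional approach.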
709
Boland MR, Shahn Z, Madigan D, Hripcsak G, Tatonetti NP. Birth month affects lifetime disease risk: a phenome-wide method. J Am Med Inform Assoc 2015; 22:1042-53. [PMID: 26041386 PMCID: PMC4986668 DOI: 10.1093/jamia/ocv046] [Citation(s) in RCA: 85] [Impact Index Per Article: 8.5] [Received: 01/07/2015] [Accepted: 04/18/2015] [Indexed: 11/14/2022]
Abstract
OBJECTIVE An individual's birth month has a significant impact on the diseases they develop during their lifetime. Previous studies reveal relationships between birth month and several diseases including atherothrombosis, asthma, attention deficit hyperactivity disorder, and myopia, leaving most diseases completely unexplored. This retrospective population study systematically explores the relationship between seasonal effects at birth and lifetime disease risk for 1688 conditions. METHODS We developed a hypothesis-free method that minimizes publication and disease selection biases by systematically investigating disease-birth month patterns across all conditions. Our dataset includes 1 749 400 individuals with records at New York-Presbyterian/Columbia University Medical Center born between 1900 and 2000 inclusive. We modeled associations between birth month and 1688 diseases using logistic regression. Significance was tested using a chi-squared test with multiplicity correction. RESULTS We found 55 diseases that were significantly dependent on birth month. Of these, 19 were previously reported in the literature (P < .001), 20 were for conditions with close relationships to those reported, and 16 were previously unreported. We found distinct incidence patterns across disease categories. CONCLUSIONS Lifetime disease risk is affected by birth month. Seasonally dependent early developmental mechanisms may play a role in increasing lifetime risk of disease.
Affiliation(s)
- Mary Regina Boland
- Department of Biomedical Informatics; Observational Health Data Sciences and Informatics (OHDSI)
- David Madigan
- Observational Health Data Sciences and Informatics (OHDSI); Department of Statistics
- George Hripcsak
- Department of Biomedical Informatics; Observational Health Data Sciences and Informatics (OHDSI)
- Nicholas P Tatonetti
- Department of Biomedical Informatics; Observational Health Data Sciences and Informatics (OHDSI); Department of Systems Biology; Department of Medicine, Columbia University, New York, NY, USA
710
Pendergrass SA, Ritchie MD. Phenome-Wide Association Studies: Leveraging Comprehensive Phenotypic and Genotypic Data for Discovery. CURRENT GENETIC MEDICINE REPORTS 2015; 3:92-100. [PMID: 26146598 PMCID: PMC4489156 DOI: 10.1007/s40142-015-0067-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 12/31/2022]
Abstract
With the large volume of clinical and epidemiological data being collected, increasingly linked to extensive genotypic data, coupled with expanding high-performance computational resources, there are considerable opportunities for comprehensively exploring the networks of connections that exist between the phenome and the genome. These networks can be identified through Phenome-Wide Association Studies (PheWAS) where the association between a collection of genetic variants, or in some cases a particular clinical lab variable, and a wide and diverse range of phenotypes, diagnoses, traits, and/or outcomes are evaluated. This is a departure from the more familiar genome-wide association study (GWAS) approach, which has been used to identify single nucleotide polymorphisms (SNPs) associated with one outcome or a very limited phenotypic domain. In addition to highlighting novel connections between multiple phenotypes and elucidating more of the phenotype-genotype landscape, PheWAS can generate new hypotheses for further exploration, and can also be used to narrow the search space for research using comprehensive data collections. The complex results of PheWAS also have the potential for uncovering new mechanistic insights. We review here how the PheWAS approach has been used with data from epidemiological studies, clinical trials, and de-identified electronic health record data. We also review methodologies for the analyses underlying PheWAS, and emerging methods developed for evaluating the comprehensive results of PheWAS including genotype-phenotype networks. This review also highlights PheWAS as an important tool for identifying new biomarkers, elucidating the genetic architecture of complex traits, and uncovering pleiotropy. There are many directions and new methodologies for the future of PheWAS analyses, from the phenotypic data to the genetic data, and herein we also discuss some of these important future PheWAS developments.
|
711
|
Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome Med 2015; 7:41. [PMID: 25937834 PMCID: PMC4416392 DOI: 10.1186/s13073-015-0166-y] [Citation(s) in RCA: 158] [Impact Index Per Article: 15.8] [Indexed: 12/16/2022]
Abstract
The convergence of two rapidly developing technologies - high-throughput genotyping and electronic health records (EHRs) - gives scientists an unprecedented opportunity to utilize routine healthcare data to accelerate genomic discovery. Institutions and healthcare systems have been building EHR-linked DNA biobanks to enable such a vision. However, the precise extraction of detailed disease and drug-response phenotype information hidden in EHRs is not an easy task. EHR-based studies have successfully replicated known associations, made new discoveries for diseases and drug response traits, rapidly contributed cases and controls to large meta-analyses, and demonstrated the potential of EHRs for broad-based phenome-wide association studies. In this review, we summarize the advantages and challenges of repurposing EHR data for genetic research. We also highlight recent notable studies and novel approaches to provide an overview of advanced EHR-based phenotyping.
|
712
|
Yu S, Liao KP, Shaw SY, Gainer VS, Churchill SE, Szolovits P, Murphy SN, Kohane IS, Cai T. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources. J Am Med Inform Assoc 2015; 22:993-1000. [PMID: 25929596 DOI: 10.1093/jamia/ocv034] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Received: 10/24/2014] [Accepted: 03/24/2015] [Indexed: 01/09/2023]
Abstract
OBJECTIVE: Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy. MATERIALS AND METHODS: Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. Using these features combined with additional codified features, a penalized logistic regression model was trained to classify the target phenotype. RESULTS: The authors applied the method to develop algorithms to identify patients with rheumatoid arthritis (RA), and coronary artery disease (CAD) cases among those with rheumatoid arthritis, from a large multi-institutional EHR. The areas under the receiver operating characteristic curve (AUC) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to AUCs of 0.938 and 0.929 for models trained with expert-curated features. DISCUSSION: Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable. CONCLUSION: The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping.
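The AUC figures quoted in this abstract can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen non-case. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made purely for illustration.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: fraction of positive/negative pairs
    in which the positive example has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = has the phenotype, 0 = does not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The quadratic loop is fine for small examples; for real cohorts a rank-based formulation or an established library routine would be the more practical choice.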
Affiliation(s)
- Sheng Yu
  - Partners HealthCare Personalized Medicine, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Katherine P Liao
  - Brigham and Women's Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Vivian S Gainer
  - Research Computing, Partners HealthCare, Charlestown, MA, USA
- Shawn N Murphy
  - Massachusetts General Hospital, Boston, MA; Research Computing, Partners HealthCare, Charlestown, MA, USA
- Isaac S Kohane
  - Harvard Medical School, Boston, MA, USA; Boston Children's Hospital, Boston, MA, USA
- Tianxi Cai
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
|
713
|
Understanding multicellular function and disease with human tissue-specific networks. Nat Genet 2015; 47:569-76. [PMID: 25915600 PMCID: PMC4828725 DOI: 10.1038/ng.3259] [Citation(s) in RCA: 594] [Impact Index Per Article: 59.4] [Received: 12/10/2014] [Accepted: 03/06/2015] [Indexed: 12/17/2022]
Abstract
Tissue and cell-type identity lie at the core of human physiology and disease. Understanding the genetic underpinnings of complex tissues and individual cell lineages is crucial for developing improved diagnostics and therapeutics. We present genome-wide functional interaction networks for 144 human tissues and cell types developed using a data-driven Bayesian methodology that integrates thousands of diverse experiments spanning tissue and disease states. Tissue-specific networks predict lineage-specific responses to perturbation, reveal genes’ changing functional roles across tissues, and illuminate disease-disease relationships. We introduce NetWAS, which combines genes with nominally significant GWAS p-values and tissue-specific networks to identify disease-gene associations more accurately than GWAS alone. Our webserver, GIANT, provides an interface to human tissue networks through multi-gene queries, network visualization, analysis tools including NetWAS, and downloadable networks. GIANT enables systematic exploration of the landscape of interacting genes that shape specialized cellular functions across more than one hundred human tissues and cell types.
|
714
|
Liao KP, Cai T, Savova GK, Murphy SN, Karlson EW, Ananthakrishnan AN, Gainer VS, Shaw SY, Xia Z, Szolovits P, Churchill S, Kohane I. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ 2015; 350:h1885. [PMID: 25911572 PMCID: PMC4707569 DOI: 10.1136/bmj.h1885] [Citation(s) in RCA: 195] [Impact Index Per Article: 19.5] [Indexed: 01/04/2023]
Abstract
Electronic medical records are emerging as a major source of data for clinical and translational research studies, although phenotypes of interest need to be accurately defined first. This article provides an overview of how to develop a phenotype algorithm from electronic medical records, incorporating modern informatics and biostatistics methods.
Affiliation(s)
- Katherine P Liao
  - Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston
- Tianxi Cai
  - Department of Biostatistics, Harvard School of Public Health, Boston
- Shawn N Murphy
  - Department of Neurology, Massachusetts General Hospital, Boston
- Elizabeth W Karlson
  - Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston
- Ashwin N Ananthakrishnan
  - Department of Gastroenterology, Massachusetts General Hospital, MGH Crohn's and Colitis Center, Boston
- Vivian S Gainer
  - Partners Research Computing, Partners HealthCare System, Boston
- Stanley Y Shaw
  - Harvard Medical School, Boston; Center for Systems Biology, Massachusetts General Hospital, Boston
- Zongqi Xia
  - Harvard Medical School, Boston; Department of Neurology, Harvard Medical School, Boston
- Peter Szolovits
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
- Isaac Kohane
  - Harvard Medical School, Boston; Department of Neurology, Massachusetts General Hospital, Boston
|
715
|
Croen LA, Zerbo O, Qian Y, Massolo ML, Rich S, Sidney S, Kripke C. The health status of adults on the autism spectrum. AUTISM : THE INTERNATIONAL JOURNAL OF RESEARCH AND PRACTICE 2015; 19:814-23. [DOI: 10.1177/1362361315577517] [Citation(s) in RCA: 534] [Impact Index Per Article: 53.4] [Indexed: 12/28/2022]
Abstract
Compared to the general pediatric population, children with autism have higher rates of co-occurring medical and psychiatric illnesses, yet very little is known about the general health status of adults with autism. The objective of this study was to describe the frequency of psychiatric and medical conditions among a large, diverse, insured population of adults with autism in the United States. Participants were adult members of Kaiser Permanente Northern California enrolled from 2008 to 2012. Autism spectrum disorder cases (N = 1507) were adults with autism spectrum disorder diagnoses (International Classification of Diseases-9-Clinical Modification codes 299.0, 299.8, 299.9) recorded in medical records on at least two separate occasions. Controls (N = 15,070) were adults without any autism spectrum disorder diagnoses sampled at a 10:1 ratio and frequency matched to cases on sex and age. Adults with autism had significantly increased rates of all major psychiatric disorders including depression, anxiety, bipolar disorder, obsessive–compulsive disorder, schizophrenia, and suicide attempts. Nearly all medical conditions were significantly more common in adults with autism, including immune conditions, gastrointestinal and sleep disorders, seizure, obesity, dyslipidemia, hypertension, and diabetes. Rarer conditions, such as stroke and Parkinson’s disease, were also significantly more common among adults with autism. Future research is needed to understand the social, healthcare access, and biological factors underlying these observations.
Affiliation(s)
- Lisa A Croen
  - Kaiser Permanente Northern California, Oakland, USA
- Yinge Qian
  - Kaiser Permanente Northern California, Oakland, USA
- Steve Rich
  - Kaiser Permanente Northern California, Santa Rosa, USA
|
716
|
Fox CS, Hall JL, Arnett DK, Ashley EA, Delles C, Engler MB, Freeman MW, Johnson JA, Lanfear DE, Liggett SB, Lusis AJ, Loscalzo J, MacRae CA, Musunuru K, Newby LK, O'Donnell CJ, Rich SS, Terzic A. Future translational applications from the contemporary genomics era: a scientific statement from the American Heart Association. Circulation 2015; 131:1715-36. [PMID: 25882488 DOI: 10.1161/cir.0000000000000211] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 12/20/2022]
Abstract
The field of genetics and genomics has advanced considerably with the achievement of recent milestones encompassing the identification of many loci for cardiovascular disease and variable drug responses. Despite this achievement, a gap exists in the understanding and advancement to meaningful translation that directly affects disease prevention and clinical care. The purpose of this scientific statement is to address the gap between genetic discoveries and their practical application to cardiovascular clinical care. In brief, this scientific statement assesses the current timeline for effective translation of basic discoveries to clinical advances, highlighting past successes. Current discoveries in the area of genetics and genomics are covered next, followed by future expectations, tools, and competencies for achieving the goal of improving clinical care.
|
717
|
Diogo D, Bastarache L, Liao KP, Graham RR, Fulton RS, Greenberg JD, Eyre S, Bowes J, Cui J, Lee A, Pappas DA, Kremer JM, Barton A, Coenen MJH, Franke B, Kiemeney LA, Mariette X, Richard-Miceli C, Canhão H, Fonseca JE, de Vries N, Tak PP, Crusius JBA, Nurmohamed MT, Kurreeman F, Mikuls TR, Okada Y, Stahl EA, Larson DE, Deluca TL, O'Laughlin M, Fronick CC, Fulton LL, Kosoy R, Ransom M, Bhangale TR, Ortmann W, Cagan A, Gainer V, Karlson EW, Kohane I, Murphy SN, Martin J, Zhernakova A, Klareskog L, Padyukov L, Worthington J, Mardis ER, Seldin MF, Gregersen PK, Behrens T, Raychaudhuri S, Denny JC, Plenge RM. TYK2 protein-coding variants protect against rheumatoid arthritis and autoimmunity, with no evidence of major pleiotropic effects on non-autoimmune complex traits. PLoS One 2015; 10:e0122271. [PMID: 25849893 PMCID: PMC4388675 DOI: 10.1371/journal.pone.0122271] [Citation(s) in RCA: 106] [Impact Index Per Article: 10.6] [Received: 07/16/2014] [Accepted: 02/17/2015] [Indexed: 02/06/2023]
Abstract
Despite the success of genome-wide association studies (GWAS) in detecting a large number of loci for complex phenotypes such as rheumatoid arthritis (RA) susceptibility, the lack of information on the causal genes makes it challenging to interpret GWAS results in the context of the disease biology. Here, we genetically fine-map the RA risk locus at 19p13 to define causal variants, and explore the pleiotropic effects of these same variants in other complex traits. First, we combined Immunochip dense genotyping (n = 23,092 case/control samples), Exomechip genotyping (n = 18,409 case/control samples) and targeted exon-sequencing (n = 2,236 case/control samples) to demonstrate that three protein-coding variants in TYK2 (tyrosine kinase 2) independently protect against RA: P1104A (rs34536443, OR = 0.66, P = 2.3 × 10⁻²¹), A928V (rs35018800, OR = 0.53, P = 1.2 × 10⁻⁹), and I684S (rs12720356, OR = 0.86, P = 4.6 × 10⁻⁷). Second, we show that the same three TYK2 variants protect against systemic lupus erythematosus (SLE, P_omnibus = 6 × 10⁻¹⁸), and provide suggestive evidence that two of the TYK2 variants (P1104A and A928V) may also protect against inflammatory bowel disease (IBD; P_omnibus = 0.005). Finally, in a phenome-wide association study (PheWAS) assessing >500 phenotypes using electronic medical records (EMR) in >29,000 subjects, we found no convincing evidence for association of P1104A and A928V with complex phenotypes other than autoimmune diseases such as RA, SLE and IBD. Together, our results demonstrate the role of TYK2 in the pathogenesis of RA, SLE and IBD, and provide supporting evidence for TYK2 as a promising drug target for the treatment of autoimmune diseases.
Affiliation(s)
- Dorothée Diogo
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America
  - Partners HealthCare Center for Personalized Genetic Medicine, Boston, Massachusetts, United States of America
- Lisa Bastarache
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, United States of America
- Katherine P. Liao
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Robert R. Graham
  - ITGR Human Genetics Group, Genentech Inc, San Francisco, California, United States of America
- Robert S. Fulton
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Jeffrey D. Greenberg
  - New York University Hospital for Joint Diseases, New York, New York, United States of America
- Steve Eyre
  - Arthritis Research UK Epidemiology Unit, University of Manchester, Manchester Academic Health Sciences Centre, Manchester, United Kingdom
- John Bowes
  - Arthritis Research UK Epidemiology Unit, University of Manchester, Manchester Academic Health Sciences Centre, Manchester, United Kingdom
- Jing Cui
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Annette Lee
  - The Feinstein Institute for Medical Research, North Shore-Long Island Jewish Health System, Manhasset, New York, United States of America
- Dimitrios A. Pappas
  - Columbia University, College of Physicians and Surgeons, New York, New York, United States of America
- Joel M. Kremer
  - The Albany Medical College and The Center for Rheumatology, Albany, New York, United States of America
- Anne Barton
  - Arthritis Research UK Epidemiology Unit, University of Manchester, Manchester Academic Health Sciences Centre, Manchester, United Kingdom
- Marieke J. H. Coenen
  - Radboud University Medical Center, Radboud Institute for Health Sciences, Department of Human Genetics, Nijmegen, The Netherlands
- Barbara Franke
  - Radboud University Medical Center, Donders Centre for Neurosciences, Department of Psychiatry and Human Genetics, Nijmegen, The Netherlands
- Lambertus A. Kiemeney
  - Radboud University Medical Center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands
- Xavier Mariette
  - Université Paris-Sud, Orsay, France
  - APHP–Hôpital Bicêtre, INSERM U1012, Le Kremlin Bicêtre, Paris, France
- Corrine Richard-Miceli
  - Université Paris-Sud, Orsay, France
  - APHP–Hôpital Bicêtre, INSERM U1012, Le Kremlin Bicêtre, Paris, France
- Helena Canhão
  - Rheumatology Research Unit, Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisbon, Portugal
  - Rheumatology Department, Santa Maria Hospital–CHLN, Lisbon, Portugal
- João E. Fonseca
  - Rheumatology Research Unit, Instituto de Medicina Molecular, Faculdade de Medicina da Universidade de Lisboa, Lisbon, Portugal
  - Rheumatology Department, Santa Maria Hospital–CHLN, Lisbon, Portugal
- Niek de Vries
  - Amsterdam Rheumatology and Immunology Center, Department of Clinical Immunology & Rheumatology, Academic Medical Center/University of Amsterdam, Amsterdam, The Netherlands
- Paul P. Tak
  - Amsterdam Rheumatology and Immunology Center, Department of Clinical Immunology & Rheumatology, Academic Medical Center/University of Amsterdam, Amsterdam, The Netherlands
- J. Bart A. Crusius
  - Laboratory of Immunogenetics, Department of Medical Microbiology and Infection Control, VU University Medical Center, Amsterdam, The Netherlands
- Michael T. Nurmohamed
  - Amsterdam Rheumatology and Immunology Center, Department of Rheumatology, Reade, Amsterdam, The Netherlands
- Fina Kurreeman
  - Department of Rheumatology, Leiden University Medical Centre, Leiden, The Netherlands
- Ted R. Mikuls
  - Division of Rheumatology and Immunology, Omaha VA and University of Nebraska Medical Center, Omaha, Nebraska, United States of America
- Yukinori Okada
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America
- Eli A. Stahl
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America
- David E. Larson
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Tracie L. Deluca
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Michelle O'Laughlin
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Catrina C. Fronick
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Lucinda L. Fulton
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Roman Kosoy
  - Department of Biochemistry and Molecular Medicine, University of California Davis, Davis, California, United States of America
- Michael Ransom
  - Department of Biochemistry and Molecular Medicine, University of California Davis, Davis, California, United States of America
- Tushar R. Bhangale
  - ITGR Human Genetics Group, Genentech Inc, San Francisco, California, United States of America
- Ward Ortmann
  - ITGR Human Genetics Group, Genentech Inc, San Francisco, California, United States of America
- Andrew Cagan
  - Information Systems, Partners Healthcare, Charlestown, Massachusetts, United States of America
- Vivian Gainer
  - Information Systems, Partners Healthcare, Charlestown, Massachusetts, United States of America
- Elizabeth W. Karlson
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Isaac Kohane
  - Information Systems, Partners Healthcare, Charlestown, Massachusetts, United States of America
- Shawn N. Murphy
  - Information Systems, Partners Healthcare, Charlestown, Massachusetts, United States of America
- Javier Martin
  - Instituto de Parasitologia y Biomedicina Lopez-Neyra, CSIC, Granada, 18100, Spain
- Alexandra Zhernakova
  - Department of Rheumatology, Leiden University Medical Centre, Leiden, The Netherlands
  - Genetics Department, University Medical Center and Groningen University, Groningen, The Netherlands
- Lars Klareskog
  - Rheumatology Unit, Department of Medicine, Karolinska Institutet and Karolinska University Hospital Solna, Stockholm, Sweden
- Leonid Padyukov
  - Rheumatology Unit, Department of Medicine, Karolinska Institutet and Karolinska University Hospital Solna, Stockholm, Sweden
- Jane Worthington
  - Arthritis Research UK Epidemiology Unit, University of Manchester, Manchester Academic Health Sciences Centre, Manchester, United Kingdom
- Elaine R. Mardis
  - The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Michael F. Seldin
  - Division of Rheumatology and Immunology, Omaha VA and University of Nebraska Medical Center, Omaha, Nebraska, United States of America
- Peter K. Gregersen
  - The Feinstein Institute for Medical Research, North Shore-Long Island Jewish Health System, Manhasset, New York, United States of America
- Timothy Behrens
  - ITGR Human Genetics Group, Genentech Inc, San Francisco, California, United States of America
- Soumya Raychaudhuri
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America
  - Partners HealthCare Center for Personalized Genetic Medicine, Boston, Massachusetts, United States of America
  - Arthritis Research UK Epidemiology Unit, University of Manchester, Manchester Academic Health Sciences Centre, Manchester, United Kingdom
- Joshua C. Denny
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, United States of America
  - Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
- Robert M. Plenge
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America
|
718
|
Boland MR, Tatonetti NP, Hripcsak G. Development and validation of a classification approach for extracting severity automatically from electronic health records. J Biomed Semantics 2015; 6:14. [PMID: 25848530 PMCID: PMC4386082 DOI: 10.1186/s13326-015-0010-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 11/03/2014] [Accepted: 03/03/2015] [Indexed: 12/29/2022]
Abstract
Background: Electronic Health Records (EHRs) contain a wealth of information useful for studying clinical phenotype-genotype relationships. Severity is important for distinguishing among phenotypes; however, other severity indices classify patient-level severity (e.g., mild vs. acute dermatitis) rather than phenotype-level severity (e.g., acne vs. myocardial infarction). Phenotype-level severity is independent of the individual patient’s state and is relative to other phenotypes. Further, phenotype-level severity does not change based on the individual patient. For example, acne is mild at the phenotype level relative to other phenotypes. Therefore, a given patient may have a severe form of acne (this is the patient-level severity), but this does not affect its overall designation as a mild phenotype at the phenotype level. Methods: We present a method for classifying severity at the phenotype level that uses the Systematized Nomenclature of Medicine – Clinical Terms. Our method is called the Classification Approach for Extracting Severity Automatically from Electronic Health Records (CAESAR). CAESAR combines multiple severity measures: number of comorbidities, medications, procedures, cost, treatment time, and a proportional index term. CAESAR employs a random forest algorithm and these severity measures to discriminate between severe and mild phenotypes. Results: Using a random forest algorithm and these severity measures as input, CAESAR differentiates between severe and mild phenotypes (sensitivity = 91.67, specificity = 77.78) when compared to a manually evaluated reference standard (k = 0.716). Conclusions: CAESAR enables researchers to measure phenotype severity from EHRs to identify phenotypes that are important for comparative effectiveness research.
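For readers unfamiliar with the evaluation metrics named in this abstract, the sketch below shows how sensitivity, specificity, and Cohen's kappa are derived from a 2×2 table of "severe" versus "mild" calls. It is an illustration under stated assumptions, not the authors' code: the function name `binary_agreement` is hypothetical, and the counts are invented so that the first two ratios round to the reported 91.67 and 77.78; they are not taken from the paper. The kappa computed here is only a generic agreement statistic on the same toy table and is unrelated to the reported k = 0.716, which describes the reference standard.

```python
def binary_agreement(tp, fn, fp, tn):
    """Return (sensitivity, specificity, Cohen's kappa) for a 2x2 table.

    tp/fn: reference-severe phenotypes called severe/mild by the classifier.
    fp/tn: reference-mild phenotypes called severe/mild by the classifier.
    """
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / total
    # Chance agreement from the row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts chosen only to reproduce the reported percentages.
sens, spec, kappa = binary_agreement(tp=11, fn=1, fp=4, tn=14)
print(f"sensitivity = {100 * sens:.2f}, specificity = {100 * spec:.2f}, kappa = {kappa:.3f}")
```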
Affiliation(s)
- Mary Regina Boland
  - Department of Biomedical Informatics, Columbia University, New York, NY USA; Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA
- Nicholas P Tatonetti
  - Department of Biomedical Informatics, Columbia University, New York, NY USA; Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA; Department of Systems Biology, Columbia University, New York, NY USA; Department of Medicine, Columbia University, New York, NY USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University, New York, NY USA; Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY USA
|
719
|
Chen Y, Ghosh J, Bejan CA, Gunter CA, Gupta S, Kho A, Liebovitz D, Sun J, Denny J, Malin B. Building bridges across electronic health record systems through inferred phenotypic topics. J Biomed Inform 2015; 55:82-93. [PMID: 25841328 DOI: 10.1016/j.jbi.2015.03.011] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 12/02/2014] [Revised: 03/24/2015] [Accepted: 03/25/2015] [Indexed: 10/23/2022]
Abstract
OBJECTIVE: Data in electronic health records (EHRs) is being increasingly leveraged for secondary uses, ranging from biomedical association studies to comparative effectiveness. To perform studies at scale and transfer knowledge from one institution to another in a meaningful way, we need to harmonize the phenotypes in such systems. Traditionally, this has been accomplished through expert specification of phenotypes via standardized terminologies, such as billing codes. However, this approach may be biased by the experience and expectations of the experts, as well as the vocabulary used to describe such patients. The goal of this work is to develop a data-driven strategy to (1) infer phenotypic topics within patient populations and (2) assess the degree to which such topics facilitate a mapping across populations in disparate healthcare systems. METHODS: We adapt a generative topic modeling strategy, based on latent Dirichlet allocation, to infer phenotypic topics. We utilize a variance analysis to assess the projection of a patient population from one healthcare system onto the topics learned from another system. The consistency of learned phenotypic topics was evaluated using (1) the similarity of topics, (2) the stability of a patient population across topics, and (3) the transferability of a topic across sites. We evaluated our approaches using four months of inpatient data from two geographically distinct healthcare systems: (1) Northwestern Memorial Hospital (NMH) and (2) Vanderbilt University Medical Center (VUMC). RESULTS: The method learned 25 phenotypic topics from each healthcare system. The average cosine similarity between matched topics across the two sites was 0.39, a remarkably high value given the very high dimensionality of the feature space. The average stability of VUMC and NMH patients across the topics of two sites was 0.988 and 0.812, respectively, as measured by the Pearson correlation coefficient. In addition, the VUMC and NMH topics showed smaller variance in characterizing the patient populations of the two sites than standard clinical terminologies (e.g., ICD9), suggesting they may be more reliably transferred across hospital systems. CONCLUSIONS: Phenotypic topics learned from EHR data can be more stable and transferable than billing codes for characterizing the general status of a patient population. This suggests that EHR-based research may be able to leverage such phenotypic topics as variables when pooling patient populations in predictive models.
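The topic-similarity measure mentioned in this abstract can be illustrated with a small sketch. It assumes each topic is a probability vector over a shared vocabulary of clinical concepts and pairs each topic from one site with its most similar topic at the other site. The greedy pairing rule, the function names, and the four-term toy vocabulary are all assumptions for illustration; the abstract does not say how topics were matched.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_matches(topics_a, topics_b):
    """For each topic in A, return (index, similarity) of its closest topic in B."""
    matches = []
    for topic_a in topics_a:
        sims = [cosine_similarity(topic_a, topic_b) for topic_b in topics_b]
        j = max(range(len(sims)), key=sims.__getitem__)
        matches.append((j, sims[j]))
    return matches


# Hypothetical topic-term distributions from two sites (rows sum to 1).
site_a = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.3, 0.6]]
site_b = [[0.1, 0.1, 0.2, 0.6], [0.6, 0.3, 0.1, 0.0]]

matches = best_matches(site_a, site_b)
mean_similarity = sum(sim for _, sim in matches) / len(matches)
print(matches, round(mean_similarity, 3))
```

With only four terms the toy similarities are close to 1; in a vocabulary of thousands of codes, most topic pairs are nearly orthogonal, which is why the abstract treats 0.39 as a high value.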
Affiliation(s)
- You Chen
- Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.
| | - Joydeep Ghosh
- Dept. of Electrical & Computer Engineering, University of Texas, Austin, TX, USA
| | - Cosmin Adrian Bejan
- Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
| | - Carl A Gunter
- Dept. of Computer Science, University of Illinois at Urbana-Champagne, Champaign, IL, USA
| | - Siddharth Gupta
- Dept. of Computer Science, University of Illinois at Urbana-Champagne, Champaign, IL, USA
| | - Abel Kho
- School of Medicine, Northwestern University, Chicago, IL, USA
| | - David Liebovitz
- School of Medicine, Northwestern University, Chicago, IL, USA
| | - Jimeng Sun
- School of Computational Science & Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | - Joshua Denny
- Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA
| | - Bradley Malin
- Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA; Dept. of Electrical Engineering & Computer Science, School of Engineering, Vanderbilt University, Nashville, TN, USA
| |
|
720
|
Boland MR, Tatonetti NP. Are All Vaccines Created Equal? Using Electronic Health Records to Discover Vaccines Associated With Clinician-Coded Adverse Events. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2015; 2015:196-200. [PMID: 26306268 PMCID: PMC4525221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Adverse drug events (ADEs) are responsible for unnecessary patient deaths, making them a major public health issue. Literature estimates that 1% of ADEs recorded in Electronic Health Records (EHRs) are reported to federal databases, making EHRs a vital source of ADE-related information. Using Columbia University Medical Center (CUMC)'s EHRs, we developed an algorithm to mine for vaccine-related ADEs occurring within 3 months of vaccination. In phase one, we measured the association between vaccinated patients with an ADE (cases) against those vaccinated without an ADE. To adjust for healthcare-process effects, phase two compared cases against those who returned to CUMC within 3 months without an ADE. We report 7 results passing multiplicity correction after demographic confounder adjustment. We observed an association, having some literature support, between swine flu vaccination and ADEs (H1N1v-like, OR=9.469, p<0.001; H1N1/H3N2, OR=3.207, p<0.001). Our algorithm could inform clinicians of the risks/benefits of vaccinations towards improving clinical care.
Affiliation(s)
- Mary Regina Boland
- Department of Biomedical Informatics, Columbia University ; Observational Health Data Sciences and Informatics, Columbia University
| | - Nicholas P Tatonetti
- Department of Biomedical Informatics, Columbia University ; Department of Medicine, Columbia University ; Department of Systems Biology, Columbia University ; Observational Health Data Sciences and Informatics, Columbia University
| |
|
721
|
Rubbo B, Fitzpatrick NK, Denaxas S, Daskalopoulou M, Yu N, Patel RS, Hemingway H. Use of electronic health records to ascertain, validate and phenotype acute myocardial infarction: A systematic review and recommendations. Int J Cardiol 2015; 187:705-11. [PMID: 25966015 DOI: 10.1016/j.ijcard.2015.03.075] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 12/15/2014] [Revised: 02/16/2015] [Accepted: 03/03/2015] [Indexed: 01/13/2023]
Abstract
Electronic health records (EHRs) offer the opportunity to ascertain clinical outcomes at large scale and low cost, thus facilitating cohort studies, quality of care research and clinical trials. For acute myocardial infarction (AMI) the extent to which different EHR sources are accessible and accurate remains uncertain. Using MEDLINE and EMBASE we identified thirty-three studies, reporting a total of 128658 patients, published between January 2000 and July 2014 that permitted assessment of the validity of AMI diagnosis drawn from EHR sources against a reference such as manual chart review. In contrast to clinical practice, only one study used EHR-derived markers of myocardial necrosis to identify possible AMI cases, none used electrocardiogram findings and one used symptoms in the form of free text combined with coded diagnosis. The remaining studies relied mostly on coded diagnosis. Thirty-one studies reported positive predictive value (PPV) ≥ 70% between AMI diagnosis from both secondary care and primary care EHRs and the reference. Among fifteen studies reporting EHR-derived AMI phenotypes, three cross-referenced ST-segment elevation AMI diagnosis (PPV range 71-100%), two non-ST-segment elevation AMI (PPV 91.0, 92.1%), three non-fatal AMI (PPV range 82-92.2%) and six fatal AMI (PPV range 64-91.7%). Clinical coding of EHR-derived AMI diagnosis in primary care and secondary care was found to be accurate in different clinical settings and for different phenotypes. However, markers of myocardial necrosis, ECG and symptoms, the cornerstones of a clinical diagnosis, are underutilised and remain a challenge to retrieve from EHRs.
Affiliation(s)
- Bruna Rubbo
- Farr Institute of Health Informatics Research, University College London, UK.
| | | | - Spiros Denaxas
- Farr Institute of Health Informatics Research, University College London, UK
| | - Marina Daskalopoulou
- Department of Infection & Population Health, The Royal Free Hospital NHS Trust, London, UK
| | - Ning Yu
- Farr Institute of Health Informatics Research, University College London, UK
| | - Riyaz S Patel
- Farr Institute of Health Informatics Research, University College London, UK; The Heart Hospital, University College London NHS Trust, London, UK
| | | | - Harry Hemingway
- Farr Institute of Health Informatics Research, University College London, UK
| |
|
722
|
Carroll RJ, Eyler AE, Denny JC. Intelligent use and clinical benefits of electronic health records in rheumatoid arthritis. Expert Rev Clin Immunol 2015; 11:329-37. [PMID: 25660652 DOI: 10.1586/1744666x.2015.1009895] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/31/2022]
Abstract
In the past 10 years, electronic health records (EHRs) have had growing impact in clinical care. EHRs efficiently capture and reuse clinical information, which can directly benefit patient care by guiding treatments and providing effective reminders for best practices. The increased adoption has also led to more complex implementations, including robust, disease-specific tools, such as for rheumatoid arthritis (RA). In addition, the data collected through normal clinical care is also used in secondary research, helping to refine patient treatment for the future. Although few studies have directly demonstrated benefits for direct clinical care of RA, the opposite is true for EHR-based research: RA has been a particularly fertile ground for clinical and genomic research that has typically leveraged advanced informatics methods to accurately define RA populations. We discuss the clinical impact of EHRs in RA treatment and their impact on secondary research, and provide recommendations for improved utility in future EHR installations.
Affiliation(s)
- Robert J Carroll
- Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | | | | |
|
723
|
Cabrera CP, Ng FL, Warren HR, Barnes MR, Munroe PB, Caulfield MJ. Exploring hypertension genome-wide association studies findings and impact on pathophysiology, pathways, and pharmacogenetics. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2015; 7:73-90. [DOI: 10.1002/wsbm.1290] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 09/24/2014] [Revised: 11/25/2014] [Accepted: 01/05/2015] [Indexed: 01/11/2023]
Affiliation(s)
- Claudia P Cabrera
- Department of Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry; Queen Mary University of London; London UK
- NIHR Barts Cardiovascular Biomedical Research Unit; Queen Mary University of London; London UK
| | - Fu Liang Ng
- Department of Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry; Queen Mary University of London; London UK
| | - Helen R Warren
- Department of Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry; Queen Mary University of London; London UK
- NIHR Barts Cardiovascular Biomedical Research Unit; Queen Mary University of London; London UK
| | - Michael R Barnes
- Department of Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry; Queen Mary University of London; London UK
- NIHR Barts Cardiovascular Biomedical Research Unit; Queen Mary University of London; London UK
| | - Patricia B Munroe
- Department of Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry; Queen Mary University of London; London UK
- NIHR Barts Cardiovascular Biomedical Research Unit; Queen Mary University of London; London UK
| | - Mark J Caulfield
- Department of Clinical Pharmacology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry; Queen Mary University of London; London UK
- NIHR Barts Cardiovascular Biomedical Research Unit; Queen Mary University of London; London UK
| |
|
724
|
Hebbring SJ, Rastegar-Mojarad M, Ye Z, Mayer J, Jacobson C, Lin S. Application of clinical text data for phenome-wide association studies (PheWASs). Bioinformatics 2015; 31:1981-7. [PMID: 25657332 DOI: 10.1093/bioinformatics/btv076] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 06/10/2014] [Accepted: 02/02/2015] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION Genome-wide association studies (GWASs) are effective for describing genetic complexities of common diseases. Phenome-wide association studies (PheWASs) offer an alternative and complementary approach to GWAS using data embedded in the electronic health record (EHR) to define the phenome. International Classification of Disease version 9 (ICD9) codes are used frequently to define the phenome, but using ICD9 codes alone misses other clinically relevant information from the EHR that can be used for PheWAS analyses and discovery. RESULTS As an alternative to ICD9 coding, a text-based phenome was defined by 23 384 clinically relevant terms extracted from Marshfield Clinic's EHR. Five single nucleotide polymorphisms (SNPs) with known phenotypic associations were genotyped in 4235 individuals and associated across the text-based phenome. All five SNPs genotyped were associated with expected terms (P<0.02), most at or near the top of their respective PheWAS ranking. Raw association results indicate that text data performed equivalently to ICD9 coding and demonstrate the utility of information beyond ICD9 coding for application in PheWAS.
Affiliation(s)
- Scott J Hebbring
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA and Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA
| | - Majid Rastegar-Mojarad
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA and Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA
| | - Zhan Ye
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA and Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA
| | - John Mayer
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA and Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA
| | - Crystal Jacobson
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA and Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA
| | - Simon Lin
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA and Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA
| |
|
725
|
An Active Learning Framework for Efficient Condition Severity Classification. Artif Intell Med 2015. [DOI: 10.1007/978-3-319-19551-3_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
726
|
Moore CB, Verma A, Pendergrass S, Verma SS, Johnson DH, Daar ES, Gulick RM, Haubrich R, Robbins GK, Ritchie MD, Haas DW. Phenome-wide Association Study Relating Pretreatment Laboratory Parameters With Human Genetic Variants in AIDS Clinical Trials Group Protocols. Open Forum Infect Dis 2015; 2:ofu113. [PMID: 25884002 PMCID: PMC4396430 DOI: 10.1093/ofid/ofu113] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 08/19/2014] [Accepted: 12/02/2014] [Indexed: 01/11/2023] Open
Abstract
Background. Phenome-Wide Association Studies (PheWAS) identify genetic associations across multiple phenotypes. Clinical trials offer opportunities for PheWAS to identify pharmacogenomic associations. We describe the first PheWAS to use genome-wide genotypic data and to utilize human immunodeficiency virus (HIV) clinical trials data. As proof-of-concept, we focused on baseline laboratory phenotypes from antiretroviral therapy-naive individuals. Methods. Data from 4 AIDS Clinical Trials Group (ACTG) studies were split into 2 datasets: Dataset I (1181 individuals from protocol A5202) and Dataset II (1366 from protocols A5095, ACTG 384, and A5142). Final analyses involved 2547 individuals and 5 954 294 imputed polymorphisms. We calculated comprehensive associations between these polymorphisms and 27 baseline laboratory phenotypes. Results. A total of 10 584 (0.17%) polymorphisms had associations with P < .01 in both datasets and with the same direction of association. Twenty polymorphisms replicated associations with identical or related phenotypes reported in the Catalog of Published Genome-Wide Association Studies, including several not previously reported in HIV-positive cohorts. We also identified several possibly novel associations. Conclusions. These analyses define PheWAS properties and principles with baseline laboratory data from HIV clinical trials. This approach may be useful for evaluating on-treatment HIV clinical trials data for associations with various clinical phenotypes.
Affiliation(s)
- Carrie B. Moore
- Vanderbilt University School of Medicine, Nashville, Tennessee
- The Center for Systems Genomics, The Pennsylvania State University, University Park
| | - Anurag Verma
- The Center for Systems Genomics, The Pennsylvania State University, University Park
| | - Sarah Pendergrass
- The Center for Systems Genomics, The Pennsylvania State University, University Park
| | - Shefali S. Verma
- The Center for Systems Genomics, The Pennsylvania State University, University Park
| | | | - Eric S. Daar
- Los Angeles Biomed Research Institute at Harbor-UCLA Medical Center, Torrance, California
| | | | | | | | - Marylyn D. Ritchie
- The Center for Systems Genomics, The Pennsylvania State University, University Park
| | - David W. Haas
- Vanderbilt University School of Medicine, Nashville, Tennessee
| |
|
727
|
Barrie ES, Weinshenker D, Verma A, Pendergrass SA, Lange LA, Ritchie MD, Wilson JG, Kuivaniemi H, Tromp G, Carey DJ, Gerhard GS, Brilliant MH, Hebbring SJ, Cubells JF, Pinsonneault JK, Norman GJ, Sadee W. Regulatory polymorphisms in human DBH affect peripheral gene expression and sympathetic activity. Circ Res 2014; 115:1017-1025. [PMID: 25326128 PMCID: PMC4258174 DOI: 10.1161/circresaha.116.304398] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 06/03/2014] [Accepted: 10/16/2014] [Indexed: 01/20/2023]
Abstract
RATIONALE Dopamine β-hydroxylase (DBH) catalyzes the conversion of dopamine to norepinephrine in the central nervous system and peripherally. DBH variants are associated with large changes in circulating DBH and implicated in multiple disorders; yet causal relationships and tissue-specific effects remain unresolved. OBJECTIVE To characterize regulatory variants in DBH, effect on mRNA expression, and role in modulating sympathetic tone and disease risk. METHODS AND RESULTS Analysis of DBH mRNA in human tissues confirmed high expression in the locus coeruleus and adrenal gland, but also in sympathetically innervated organs (liver>lung>heart). Allele-specific mRNA assays revealed pronounced allelic expression differences in the liver (2- to 11-fold) attributable to promoter rs1611115 and exon 2 rs1108580, but only small differences in locus coeruleus and adrenals. These alleles were also associated with significantly reduced mRNA expression in liver and lung. Although DBH protein is expressed in other sympathetically innervated organs, mRNA levels were too low for analysis. In mice, hepatic Dbh mRNA levels correlated with cardiovascular risk phenotypes. The minor alleles of rs1611115 and rs1108580 were associated with sympathetic phenotypes, including angina pectoris. Testing combined effects of these variants suggested protection against myocardial infarction in 3 separate clinical cohorts. CONCLUSIONS We demonstrate profound effects of DBH variants on expression in 2 sympathetically innervated organs, liver and lung, but not in adrenals and brain. Preliminary results demonstrate an association of these variants with clinical phenotypes responsive to peripheral sympathetic tone. We hypothesize that in addition to endocrine effects via circulating DBH and norepinephrine, the variants act in sympathetically innervated target organs.
Affiliation(s)
- Elizabeth S Barrie
- From the Center for Pharmacogenomics, College of Medicine, The Ohio State University, Columbus (E.S.B., J.K.P., W.S.); Department of Human Genetics, Emory University School of Medicine, Atlanta, GA (D.W., J.F.C.); Center for Systems Genomics, Pennsylvania State University, University Park (A.V., S.A.P., M.D.R.); Department of Genetics, University of North Carolina School of Medicine, Chapel Hill (L.A.L.); Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson (J.G.W.); The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA (H.K., G.T., D.J.C.); Institute for Personalized Medicine, The Pennsylvania State University College of Medicine, Hershey (G.S.G.); Center for Human Genetics, Marshfield Clinic Research Foundation, WI (M.H.B., S.J.H.); and Department of Psychology, The University of Chicago, IL (G.J.N.)
| | - David Weinshenker
| | - Anurag Verma
| | - Sarah A Pendergrass
| | - Leslie A Lange
| | - Marylyn D Ritchie
| | - James G Wilson
| | - Helena Kuivaniemi
| | - Gerard Tromp
| | - David J Carey
| | - Glenn S Gerhard
| | - Murray H Brilliant
| | - Scott J Hebbring
| | - Joseph F Cubells
| | - Julia K Pinsonneault
| | - Greg J Norman
| | - Wolfgang Sadee
| |
|
728
|
Heatherly R, Denny JC, Haines JL, Roden DM, Malin BA. Size matters: how population size influences genotype-phenotype association studies in anonymized data. J Biomed Inform 2014; 52:243-50. [PMID: 25038554 PMCID: PMC4260994 DOI: 10.1016/j.jbi.2014.07.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/25/2013] [Revised: 05/21/2014] [Accepted: 07/07/2014] [Indexed: 12/29/2022]
Abstract
OBJECTIVE Electronic medical record (EMR) data are increasingly incorporated into genome-phenome association studies. Investigators hope to share data, but there are concerns that it may be "re-identified" through the exploitation of various features, such as combinations of standardized clinical codes. Formal anonymization algorithms (e.g., k-anonymization) can prevent such violations, but prior studies suggest that the size of the population available for anonymization may influence the utility of the resulting data. We systematically investigate this issue using a large-scale biorepository and EMR system through which we evaluate the ability of researchers to learn from anonymized data for genome-phenome association studies under various conditions.
METHODS We use a k-anonymization strategy to simulate a data protection process (on data sets containing clinical codes) for resources of similar size to those found at nine academic medical institutions within the United States. Following the protection process, we replicate an existing genome-phenome association study and compare the discoveries using the protected data and the original data through the correlation (r²) of the p-values of association significance.
RESULTS Our investigation shows that anonymizing an entire dataset with respect to the population from which it is derived yields significantly more utility than small study-specific datasets anonymized unto themselves. When evaluated using the correlation of genome-phenome association strengths on anonymized data versus original data across all nine simulated sites, results from the largest-scale anonymizations (population ∼100,000) retained better utility than those from smaller ones (population ∼6000-75,000). We observed a general trend of increasing r² for larger data set sizes: r² = 0.9481 for small-sized datasets, r² = 0.9493 for moderately-sized datasets, and r² = 0.9934 for large-sized datasets.
CONCLUSIONS This research implies that regardless of the overall size of an institution's data, there may be significant benefits to anonymization of the entire EMR, even if the institution is planning on releasing only data about a specific cohort of patients.
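The two quantities this abstract relies on, the k-anonymity property and the r² between p-values from protected and original data, can be illustrated with a minimal sketch. It assumes each patient record is reduced to a set of clinical codes; the function names, codes, and p-values below are hypothetical and are not taken from the study, which does not describe its implementation here.

```python
from collections import Counter


def is_k_anonymous(records, k):
    """Return True if every distinct code combination is shared by at least k records."""
    counts = Counter(frozenset(record) for record in records)
    return all(count >= k for count in counts.values())


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical records: each is the set of diagnosis codes seen for one patient.
records = [{"250.00", "401.9"}, {"250.00", "401.9"}, {"272.4"}]
print(is_k_anonymous(records, 2))       # the singleton {"272.4"} breaks 2-anonymity
print(is_k_anonymous(records[:2], 2))

# Hypothetical p-values for the same associations on original vs. protected data.
p_original = [0.01, 0.20, 0.50]
p_protected = [0.02, 0.40, 1.00]
print(r_squared(p_original, p_protected))
```

A real anonymization pipeline would also have to generalize or suppress codes until the check passes; this sketch only tests the property and scores agreement afterwards.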
Affiliation(s)
- Raymond Heatherly
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA.
| | - Joshua C Denny
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA; Department of Medicine, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA
| | - Jonathan L Haines
- Department of Epidemiology and Biostatistics, University School of Medicine, Case Western Reserve University, USA
| | - Dan M Roden
- Department of Medicine, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA; Department of Pharmacology, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA
| | - Bradley A Malin
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA; Department of Electrical Engineering and Computer Science, School of Engineering, Vanderbilt University, 2525 West End Avenue, Suite 1030, Nashville, TN 37203, USA
| |
|
729
|
Hall MA, Verma A, Brown-Gentry KD, Goodloe R, Boston J, Wilson S, McClellan B, Sutcliffe C, Dilks HH, Gillani NB, Jin H, Mayo P, Allen M, Schnetz-Boutaud N, Crawford DC, Ritchie MD, Pendergrass SA. Detection of pleiotropy through a Phenome-wide association study (PheWAS) of epidemiologic data as part of the Environmental Architecture for Genes Linked to Environment (EAGLE) study. PLoS Genet 2014; 10:e1004678. [PMID: 25474351 PMCID: PMC4256091 DOI: 10.1371/journal.pgen.1004678] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 03/28/2014] [Accepted: 08/16/2014] [Indexed: 12/19/2022] Open
Abstract
We performed a Phenome-wide association study (PheWAS) utilizing diverse genotypic and phenotypic data existing across multiple populations in the National Health and Nutrition Examination Surveys (NHANES), conducted by the Centers for Disease Control and Prevention (CDC), and accessed by the Epidemiological Architecture for Genes Linked to Environment (EAGLE) study. We calculated comprehensive tests of association in Genetic NHANES using 80 SNPs and 1,008 phenotypes (grouped into 184 phenotype classes), stratified by race-ethnicity. Genetic NHANES includes three surveys (NHANES III, 1999-2000, and 2001-2002) and three race-ethnicities: non-Hispanic whites (n = 6,634), non-Hispanic blacks (n = 3,458), and Mexican Americans (n = 3,950). We identified 69 PheWAS associations replicating across surveys for the same SNP, phenotype-class, direction of effect, and race-ethnicity at p<0.01, allele frequency >0.01, and sample size >200. Of these 69 PheWAS associations, 39 replicated previously reported SNP-phenotype associations, 9 were related to previously reported associations, and 21 were novel associations. Fourteen results had the same direction of effect across more than one race-ethnicity: one result was novel, 11 replicated previously reported associations, and two were related to previously reported results. Thirteen SNPs showed evidence of pleiotropy. We further explored results with gene-based biological networks, contrasting the direction of effect for pleiotropic associations across phenotypes. One PheWAS result was ABCG2 missense SNP rs2231142, associated with uric acid levels in both non-Hispanic whites and Mexican Americans, protoporphyrin levels in non-Hispanic whites and Mexican Americans, and blood pressure levels in Mexican Americans. 
Another example was SNP rs1800588 near LIPC, significantly associated with the novel phenotypes of folate levels (Mexican Americans), vitamin E levels (non-Hispanic whites) and triglyceride levels (non-Hispanic whites), and replication for cholesterol levels. The results of this PheWAS show the utility of this approach for exposing more of the complex genetic architecture underlying multiple traits, through generating novel hypotheses for future research.
Affiliation(s)
- Molly A. Hall
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Anurag Verma
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Kristin D. Brown-Gentry
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Robert Goodloe
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Jonathan Boston
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Sarah Wilson
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Bob McClellan
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Cara Sutcliffe
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Holly H. Dilks
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Nila B. Gillani
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Hailing Jin
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Ping Mayo
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Melissa Allen
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Nathalie Schnetz-Boutaud
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Dana C. Crawford
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Marylyn D. Ritchie
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Sarah A. Pendergrass
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
| |
|
730
|
Namjou B, Marsolo K, Caroll RJ, Denny JC, Ritchie MD, Verma SS, Lingren T, Porollo A, Cobb BL, Perry C, Kottyan LC, Rothenberg ME, Thompson SD, Holm IA, Kohane IS, Harley JB. Phenome-wide association study (PheWAS) in EMR-linked pediatric cohorts, genetically links PLCL1 to speech language development and IL5-IL13 to Eosinophilic Esophagitis. Front Genet 2014; 5:401. [PMID: 25477900 PMCID: PMC4235428 DOI: 10.3389/fgene.2014.00401] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 05/29/2014] [Accepted: 10/31/2014] [Indexed: 02/06/2023] Open
Abstract
Objective: We report the first pediatric specific Phenome-Wide Association Study (PheWAS) using electronic medical records (EMRs). Given the early success of PheWAS in adult populations, we investigated the feasibility of this approach in pediatric cohorts in which associations between a previously known genetic variant and a wide range of clinical or physiological traits were evaluated. Although computationally intensive, this approach has potential to reveal disease mechanistic relationships between a variant and a network of phenotypes. Method: Data on 5049 samples of European ancestry were obtained from the EMRs of two large academic centers in five different genotyped cohorts. Recently, these samples have undergone whole genome imputation. After standard quality controls, removing missing data and outliers based on principal components analyses (PCA), 4268 samples were used for the PheWAS study. We scanned for associations between 2476 single-nucleotide polymorphisms (SNP) with available genotyping data from previously published GWAS studies and 539 EMR-derived phenotypes. The false discovery rate was calculated and, for any new PheWAS findings, a permutation approach (with up to 1,000,000 trials) was implemented. Results: This PheWAS found a variety of common variants (MAF > 10%) with prior GWAS associations in our pediatric cohorts including Juvenile Rheumatoid Arthritis (JRA), Asthma, Autism and Pervasive Developmental Disorder (PDD) and Type 1 Diabetes with a false discovery rate < 0.05 and power of study above 80%. 
In addition, several new PheWAS findings were identified, including a cluster of associations near the NDFIP1 gene for mental retardation (best SNP rs10057309, p = 4.33 × 10⁻⁷, OR = 1.70, 95% CI = 1.38-2.09); an association near the PLCL1 gene for developmental delays and speech disorder [best SNP rs1595825, p = 1.13 × 10⁻⁸, OR = 0.65 (0.57-0.76)]; a cluster of associations in the IL5-IL13 region with Eosinophilic Esophagitis (EoE) (best at rs12653750, p = 3.03 × 10⁻⁹, OR = 1.73, 95% CI = 1.44-2.07), previously implicated in asthma, allergy, and eosinophilia; and an association of variants in GCKR and JAZF1 with allergic rhinitis in our pediatric cohorts (best SNP rs780093, p = 2.18 × 10⁻⁵, OR = 1.39, 95% CI = 1.19-1.61), previously demonstrated in metabolic disease and diabetes in adults. Conclusion: The PheWAS approach with re-mapping of ICD-9 structured codes for our European-origin pediatric cohorts, as with the previous adult studies, finds many previously reported associations and also reveals associations with potentially important clinical implications.
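For readers unfamiliar with how an odds ratio and its 95% confidence interval relate, the following is a minimal sketch using the standard Wald interval on a 2×2 table. It is an illustration only: the abstract does not state how its intervals were computed (a regression model is likely), and the counts below are hypothetical, not data from the study.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_controls,
                  unexposed_cases, unexposed_controls, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    z = 1.96 gives an approximate 95% interval.
    """
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: carriers vs. non-carriers among cases and controls.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1, as in each finding reported above, indicates that the association is unlikely to be explained by sampling variation alone at the chosen level.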
Affiliation(s)
- Bahram Namjou
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; College of Medicine, University of Cincinnati, Cincinnati, OH, USA
| | - Keith Marsolo
- College of Medicine, University of Cincinnati, Cincinnati, OH, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Robert J Caroll
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA; Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Marylyn D Ritchie
- Center for Systems Genomics, The Pennsylvania State University, Philadelphia, PA, USA
| | - Shefali S Verma
- Center for Systems Genomics, The Pennsylvania State University, Philadelphia, PA, USA
| | - Todd Lingren
- College of Medicine, University of Cincinnati, Cincinnati, OH, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Aleksey Porollo
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; College of Medicine, University of Cincinnati, Cincinnati, OH, USA; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Beth L Cobb
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Cassandra Perry
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
| | - Leah C Kottyan
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; College of Medicine, University of Cincinnati, Cincinnati, OH, USA; Division of Allergy and Immunology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Marc E Rothenberg
- Division of Allergy and Immunology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Susan D Thompson
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; College of Medicine, University of Cincinnati, Cincinnati, OH, USA
| | - Ingrid A Holm
- Division of Genetics and Genomics, Department of Pediatrics, The Manton Center for Orphan Disease Research, Harvard Medical School, Boston Children's Hospital, Boston, MA, USA
| | - Isaac S Kohane
- Children's Hospital Informatics Program, Center for Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - John B Harley
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; College of Medicine, University of Cincinnati, Cincinnati, OH, USA; U.S. Department of Veterans Affairs Medical Center, Cincinnati, OH, USA
| |
|
731
|
Nadkarni GN, Gottesman O, Linneman JG, Chase H, Berg RL, Farouk S, Nadukuru R, Lotay V, Ellis S, Hripcsak G, Peissig P, Weng C, Bottinger EP. Development and validation of an electronic phenotyping algorithm for chronic kidney disease. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2014; 2014:907-916. [PMID: 25954398 PMCID: PMC4419875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Twenty-six million Americans are estimated to have chronic kidney disease (CKD) with increased risk for cardiovascular disease and end stage renal disease. CKD is frequently undiagnosed and patients are unaware, hampering intervention. A tool for accurate and timely identification of CKD from electronic medical records (EMR) could improve healthcare quality and identify patients for research. As members of the eMERGE (electronic medical records and genomics) Network, we developed an automated phenotyping algorithm that can be deployed to rapidly identify diabetic and/or hypertensive CKD cases and controls in health systems with EMRs. It uses diagnostic codes, laboratory results, medication and blood pressure records, and textual information culled from notes. Validation statistics demonstrated a positive predictive value of 96% and a negative predictive value of 93.3%. Similar results were obtained on implementation by two independent eMERGE member institutions. The algorithm dramatically outperformed identification by ICD-9-CM codes, which had positive and negative predictive values of 63% and 54%, respectively.
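The validation statistics quoted in this abstract are ratios over a chart-review confusion table. As a minimal sketch of the definitions, assuming the algorithm's calls have been compared against a manual gold standard, the snippet below computes both values; the counts are hypothetical and are not the study's data.

```python
def predictive_values(true_pos, false_pos, true_neg, false_neg):
    """Return (PPV, NPV) from confusion-table counts.

    PPV: fraction of algorithm-flagged cases confirmed on review.
    NPV: fraction of algorithm-flagged controls confirmed on review.
    """
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical chart-review counts, for illustration only.
ppv, npv = predictive_values(true_pos=45, false_pos=5, true_neg=90, false_neg=10)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

Both values depend on disease prevalence in the reviewed sample, which is one reason replication at independent sites, as reported above, is informative.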
Affiliation(s)
| | | | | | - Herbert Chase
- Marshfield Clinic Research Foundation, Marshfield, WI
| | | | - Samira Farouk
- Icahn School Of Medicine at Mount Sinai, New York, NY
| | | | - Vaneet Lotay
- Icahn School Of Medicine at Mount Sinai, New York, NY
| | - Steve Ellis
- Icahn School Of Medicine at Mount Sinai, New York, NY
| | | | - Peggy Peissig
- Marshfield Clinic Research Foundation, Marshfield, WI
| | - Chunhua Weng
- Columbia University Medical Center, New York, NY
| | | |
|
732
|
Liao SG, Lin Y, Kang DD, Chandra D, Bon J, Kaminski N, Sciurba FC, Tseng GC. Missing value imputation in high-dimensional phenomic data: imputable or not, and how? BMC Bioinformatics 2014; 15:346. [PMID: 25371041 PMCID: PMC4228077 DOI: 10.1186/s12859-014-0346-6] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Received: 03/06/2014] [Accepted: 10/06/2014] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND In modern biomedical research of complex diseases, a large number of demographic and clinical variables, herein called phenomic data, are often collected, and missing values (MVs) are inevitable in the data collection process. Since many downstream statistical and bioinformatics methods require a complete data matrix, imputation is a common and practical solution. In high-throughput experiments such as microarray experiments, continuous intensities are measured, and many mature missing value imputation methods have been developed and widely applied. Large phenomic data, however, contain continuous, nominal, binary and ordinal data types, which precludes application of most of these methods. Though several methods have been developed in the past few years, no complete guideline has been proposed for phenomic missing data imputation.
RESULTS In this paper, we investigated existing imputation methods for phenomic data, proposed a self-training selection (STS) scheme to select the best imputation method, and provided a practical guideline for general applications. We introduced a novel concept of "imputability measure" (IM) to identify missing values that are fundamentally inadequate to impute. In addition, we developed four variations of K-nearest-neighbor (KNN) methods and compared them with two existing methods, multivariate imputation by chained equations (MICE) and missForest. The four variations are imputation by variables (KNN-V), by subjects (KNN-S), their weighted hybrid (KNN-H) and an adaptively weighted hybrid (KNN-A). We performed simulations and applied different imputation methods and the STS scheme to three lung disease phenomic datasets to evaluate the methods. An R package "phenomeImpute" is made publicly available.
CONCLUSIONS Simulations and applications to real datasets showed that MICE often did not perform well; KNN-A, KNN-H and random forest were among the top performers, although no method universally performed the best. Imputation of missing values with low imputability measures increased imputation errors greatly and could potentially deteriorate downstream analyses. The STS scheme was accurate in selecting the optimal method by evaluating methods in a second layer of missingness simulation. All source files for the simulation and the real data analyses are available on the author's publication website.
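To make the idea of imputation "by subjects" concrete, here is a minimal sketch of nearest-neighbour imputation across rows for purely continuous data. It is not the phenomeImpute implementation: the distance (root mean squared difference over jointly observed columns), the unweighted mean of donors, and the function name are all assumptions made for illustration, and the mixed data types handled in the paper are not covered.

```python
import math


def knn_impute_by_subject(rows, k=3):
    """Fill NaN entries in each row with the mean of the k nearest rows' values.

    Distance between two rows is the root mean squared difference over the
    columns observed in both rows (an illustrative choice, not the paper's).
    """
    filled = [list(row) for row in rows]
    for i, target in enumerate(rows):
        missing_cols = [c for c, value in enumerate(target) if math.isnan(value)]
        if not missing_cols:
            continue
        neighbours = []
        for j, other in enumerate(rows):
            if j == i:
                continue
            shared = [(a, b) for a, b in zip(target, other)
                      if not math.isnan(a) and not math.isnan(b)]
            if not shared:
                continue  # no overlap, so no usable distance
            dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
            neighbours.append((dist, j))
        neighbours.sort()
        for col in missing_cols:
            # Take the k closest rows that actually observed this column.
            donors = [rows[j][col] for _, j in neighbours
                      if not math.isnan(rows[j][col])][:k]
            if donors:
                filled[i][col] = sum(donors) / len(donors)
    return filled


# Hypothetical subjects-by-variables matrix with one missing value.
data = [
    [1.0, 2.0, math.nan],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 5.0],
    [10.0, 20.0, 100.0],
]
result = knn_impute_by_subject(data, k=2)
print(result[0])
```

Imputation by variables would apply the same procedure to the transposed matrix, and a hybrid would combine the two estimates with some weighting.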
|
733
|
Sinnott JA, Dai W, Liao KP, Shaw SY, Ananthakrishnan AN, Gainer VS, Karlson EW, Churchill S, Szolovits P, Murphy S, Kohane I, Plenge R, Cai T. Improving the power of genetic association tests with imperfect phenotype derived from electronic medical records. Hum Genet 2014; 133:1369-82. [PMID: 25062868 PMCID: PMC4185241 DOI: 10.1007/s00439-014-1466-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 02/18/2014] [Accepted: 06/29/2014] [Indexed: 01/04/2023]
Abstract
To reduce costs and improve clinical relevance of genetic studies, there has been increasing interest in performing such studies in hospital-based cohorts by linking phenotypes extracted from electronic medical records (EMRs) to genotypes assessed in routinely collected medical samples. A fundamental difficulty in implementing such studies is extracting accurate information about disease outcomes and important clinical covariates from large numbers of EMRs. Recently, numerous algorithms have been developed to infer phenotypes by combining information from multiple structured and unstructured variables extracted from EMRs. Although these algorithms are quite accurate, they typically do not provide perfect classification due to the difficulty in inferring meaning from the text. Some algorithms can produce for each patient a probability that the patient is a disease case. This probability can be thresholded to define case-control status, and this estimated case-control status has been used to replicate known genetic associations in EMR-based studies. However, using the estimated disease status in place of true disease status results in outcome misclassification, which can diminish test power and bias odds ratio estimates. We propose to instead directly model the algorithm-derived probability of being a case. We demonstrate how our approach improves test power and effect estimation in simulation studies, and we describe its performance in a study of rheumatoid arthritis. Our work provides an easily implemented solution to a major practical challenge that arises in the use of EMR data, which can facilitate the use of EMR infrastructure for more powerful, cost-effective, and diverse genetic studies.
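The contrast drawn in this abstract, thresholding a phenotype probability versus using the probability itself as the outcome, can be shown with a small sketch. The abstract does not specify the authors' model, so an ordinary least-squares slope on genotype dosage is used here purely as a stand-in; the genotypes, probabilities, and the 0.5 cut-off are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Hypothetical data: minor-allele count and algorithm-derived case probability.
genotype = [0, 0, 0, 1, 1, 1, 2, 2, 2]
prob_case = [0.10, 0.20, 0.30, 0.30, 0.40, 0.45, 0.45, 0.60, 0.70]

# Conventional approach: dichotomize at an assumed threshold of 0.5.
status = [1 if p >= 0.5 else 0 for p in prob_case]

slope_status = ols_slope(genotype, status)   # uses only the 0/1 labels
slope_prob = ols_slope(genotype, prob_case)  # uses the graded probabilities
print(status)
print(round(slope_status, 4), round(slope_prob, 4))
```

After thresholding, seven of the nine patients are indistinguishable controls, whereas the probabilities still rise steadily with genotype; that retained gradation is the information a probability-based analysis can exploit.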
Affiliation(s)
- Jennifer A Sinnott
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, 02115, USA
| | | | | | | | | | | | | | | | | | | | | | | | | |
|
734
|
Hong S, Huang Y, Cao Y, Chen X, Han JDJ. Approaches to uncovering cancer diagnostic and prognostic molecular signatures. Mol Cell Oncol 2014; 1:e957981. [PMID: 27308330 PMCID: PMC4905187 DOI: 10.4161/23723548.2014.957981] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/06/2014] [Revised: 07/21/2014] [Accepted: 07/22/2014] [Indexed: 12/14/2022]
Abstract
The recent rapid development of high-throughput technology enables the study of molecular signatures for cancer diagnosis and prognosis at multiple levels, from genomic and epigenomic to transcriptomic. These unbiased large-scale scans provide important insights into the detection of cancer-related signatures. In addition to single-layer signatures, such as gene expression and somatic mutations, integrating data from multiple heterogeneous platforms using a systematic approach has been proven to be particularly effective for the identification of classification markers. This approach not only helps to uncover essential driver genes and pathways in the cancer network that are responsible for the mechanisms of cancer development, but will also lead us closer to the ultimate goal of personalized cancer therapy.
Affiliation(s)
- Shengjun Hong
- Chinese Academy of Sciences Key Laboratory of Computational Biology; Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences; Shanghai, China
| | - Yi Huang
- Chinese Academy of Sciences Key Laboratory of Computational Biology; Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences; Shanghai, China
| | - Yaqiang Cao
- Chinese Academy of Sciences Key Laboratory of Computational Biology; Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences; Shanghai, China
| | - Xingwei Chen
- Chinese Academy of Sciences Key Laboratory of Computational Biology; Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences; Shanghai, China
| | - Jing-Dong J Han
- Chinese Academy of Sciences Key Laboratory of Computational Biology; Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences; Shanghai, China
| |
|
735
|
Warner JL, Denny JC, Kreda DA, Alterovitz G. Seeing the forest through the trees: uncovering phenomic complexity through interactive network visualization. J Am Med Inform Assoc 2014; 22:324-9. [PMID: 25336590 DOI: 10.1136/amiajnl-2014-002965] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 01/23/2023] Open
Abstract
Our aim was to uncover unrecognized phenomic relationships using force-based network visualization methods, based on observed electronic medical record data. A primary phenotype was defined from actual patient profiles in the Multiparameter Intelligent Monitoring in Intensive Care II database. Network visualizations depicting primary relationships were compared to those incorporating secondary adjacencies. Interactivity was enabled through a phenotype visualization software concept: the Phenomics Advisor. Subendocardial infarction with cardiac arrest was demonstrated as a sample phenotype; there were 332 primarily adjacent diagnoses, with 5423 relationships. Primary network visualization suggested a treatment-related complication phenotype and several rare diagnoses; re-clustering by secondary relationships revealed an emergent cluster of smokers with the metabolic syndrome. Network visualization reveals phenotypic patterns that may have remained occult in pairwise correlation analysis. Visualization of complex data, potentially offered as point-of-care tools on mobile devices, may allow clinicians and researchers to quickly generate hypotheses and gain deeper understanding of patient subpopulations.
Affiliation(s)
- Jeremy L Warner
- Division of Hematology/Oncology, Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA; Division of General Internal Medicine, Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA
| | - David A Kreda
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Gil Alterovitz
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA; Children's Hospital Informatics Program at Harvard-MIT Division of Health Science, Boston, Massachusetts, USA; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| |
|
736
|
Crosslin DR, Carrell DS, Burt A, Kim DS, Underwood JG, Hanna DS, Comstock BA, Baldwin E, de Andrade M, Kullo IJ, Tromp G, Kuivaniemi H, Borthwick KM, McCarty CA, Peissig PL, Doheny KF, Pugh E, Kho A, Pacheco J, Hayes MG, Ritchie MD, Verma SS, Armstrong G, Stallings S, Denny JC, Carroll RJ, Crawford DC, Crane PK, Mukherjee S, Bottinger E, Li R, Keating B, Mirel DB, Carlson CS, Harley JB, Larson EB, Jarvik GP. Genetic variation in the HLA region is associated with susceptibility to herpes zoster. Genes Immun 2014; 16:1-7. [PMID: 25297839 PMCID: PMC4308645 DOI: 10.1038/gene.2014.51] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 05/14/2014] [Revised: 07/22/2014] [Accepted: 07/24/2014] [Indexed: 01/25/2023]
Abstract
Herpes zoster, commonly referred to as shingles, is caused by the varicella zoster virus (VZV). VZV initially manifests as chicken pox, most commonly in childhood, can remain asymptomatically latent in nerve tissues for many years and often re-emerges as shingles. Although reactivation may be related to immune suppression, aging and female sex, most inter-individual variability in re-emergence risk has not been explained to date. We performed genome-wide association analyses in 22 981 participants (2280 shingles cases) from the electronic Medical Records and Genomics Network. Using Cox survival and logistic regression, we identified a genomic region in the combined and European ancestry groups that has an age of onset effect reaching genome-wide significance (P < 1.0 × 10⁻⁸). This region tags the non-coding gene HCP5 (HLA Complex P5) in the major histocompatibility complex. This gene is an endogenous retrovirus and likely influences viral activity through regulatory functions. Variants in this genetic region are known to be associated with delay in development of AIDS in people infected by HIV. Our study provides further suggestion that this region may have a critical role in viral suppression and could potentially harbor a clinically actionable variant for the shingles vaccine.
Collapse
Affiliation(s)
- D R Crosslin
- [1] Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA; [2] Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - D S Carrell
- Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
| | - A Burt
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
| | - D S Kim
- [1] Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA; [2] Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - J G Underwood
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - D S Hanna
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - B A Comstock
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - E Baldwin
- Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
| | - M de Andrade
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
| | - I J Kullo
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
| | - G Tromp
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
| | - H Kuivaniemi
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
| | - K M Borthwick
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
| | - C A McCarty
- [1] Essentia Institute of Rural Health, Duluth, MN, USA; [2] Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - P L Peissig
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - K F Doheny
- Center for Inherited Disease Research, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - E Pugh
- Center for Inherited Disease Research, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - A Kho
- Divisions of General Internal Medicine and Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - J Pacheco
- Divisions of General Internal Medicine and Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - M G Hayes
- Division of Endocrinology, Metabolism, and Molecular Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - M D Ritchie
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, Pennsylvania, PA, USA
| | - S S Verma
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, Pennsylvania, PA, USA
| | - G Armstrong
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, Pennsylvania, PA, USA
| | - S Stallings
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - J C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - R J Carroll
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - D C Crawford
- [1] Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA; [2] Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
| | - P K Crane
- Division of General Internal Medicine, University of Washington, Seattle, WA, USA
| | - S Mukherjee
- Division of General Internal Medicine, University of Washington, Seattle, WA, USA
| | - E Bottinger
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine, New York, NY, USA
| | - R Li
- Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - B Keating
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - D B Mirel
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - C S Carlson
- Fred Hutchinson Cancer Research Center, Public Health Sciences Division, Seattle, WA, USA
| | - J B Harley
- Cincinnati Children's Hospital Medical Center/Boston's Children's Hospital (CCHMC/BCH), Boston, MA, USA
| | - E B Larson
- Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
| | - G P Jarvik
- [1] Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA; [2] Department of Genome Sciences, University of Washington, Seattle, WA, USA
|
737
|
Mooney SD. Progress towards the integration of pharmacogenomics in practice. Hum Genet 2014; 134:459-65. [PMID: 25238897 DOI: 10.1007/s00439-014-1484-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 04/15/2014] [Accepted: 08/20/2014] [Indexed: 12/12/2022]
Abstract
Understanding the role genes and genetic variants play in clinical treatment response continues to be an active area of research with the goal of common clinical use. This goal has developed into today's industry of pharmacogenomics, where new drug-gene relationships are discovered and further characterized, published and then curated into national and international resources for use by researchers and clinicians. These efforts have given us insight into what a pharmacogenomic variant is, and how it differs from human disease variants and common polymorphisms. While publications continue to reveal pharmacogenomic relationships between genes and specific classes of drugs, many challenges remain toward the goal of widespread clinical use. First, the clinical guidelines for pharmacogenomic testing are still in their infancy. Second, sequencing technologies are changing rapidly, making it somewhat unclear what genetic data will be available to the clinician at the time of care. Finally, what data to return to a patient, and when, is an area under constant debate. New innovations such as PheWAS approaches and whole genome sequencing studies are enabling a tsunami of new findings. In this review, pharmacogenomic variants, pharmacogenomic resources, interpretation clinical guidelines and challenges, such as WGS approaches, and the impact of pharmacogenomics on drug development and regulatory approval are reviewed.
Affiliation(s)
- Sean D Mooney
- Buck Institute for Research on Aging, 8001 Redwood Blvd, Novato, CA, 94945, USA
|
738
|
Radder JE, Shapiro SD, Berndt A. Personalized medicine for chronic, complex diseases: chronic obstructive pulmonary disease as an example. Per Med 2014; 11:669-679. [PMID: 29764057 DOI: 10.2217/pme.14.51] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/23/2023]
Abstract
Chronic, complex diseases represent the majority of healthcare utilization and spending in the USA today. Despite this, therapeutics that account for the heterogeneity of these diseases are lacking, begging for more personalized approaches. Improving our understanding of disease phenotypes through retrospective trials of electronic health record data will enable us to better categorize patients. Increased usage of next-generation sequencing will further our understanding of the genetic variants involved in chronic disease. Utilization of data warehousing will be necessary in order to securely handle, integrate and analyze the large sets of data produced with these methods. Finally, increased use of clinical decision support will enable the return of clinically actionable results that physicians can use to apply these personalized approaches.
Affiliation(s)
- Josiah E Radder
- Division of Pulmonary, Allergy & Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
| | - Steven D Shapiro
- Division of Pulmonary, Allergy & Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Medical Center, Pittsburgh, PA, USA
| | - Annerose Berndt
- Division of Pulmonary, Allergy & Critical Care Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Medical Center, Pittsburgh, PA, USA
|
739
|
Genetic Dissection of the Physiological Role of Skeletal Muscle in Metabolic Syndrome. ACTA ACUST UNITED AC 2014. [DOI: 10.1155/2014/635146] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
The primary deficiency underlying metabolic syndrome is insulin resistance, in which insulin-responsive peripheral tissues fail to maintain glucose homeostasis. Because skeletal muscle is the major site for insulin-induced glucose uptake, impairments in skeletal muscle’s insulin responsiveness play a major role in the development of insulin resistance and type 2 diabetes. For example, skeletal muscle of type 2 diabetes patients and their offspring exhibits reduced ratios of slow oxidative muscle. These observations suggest the possibility of applying muscle remodeling to recover insulin sensitivity in metabolic syndrome. Skeletal muscle is highly adaptive to external stimuli such as exercise; however, it is often not practical or possible to enforce the intensity necessary to obtain measurable benefits in the metabolic syndrome patient population. Therefore, identifying molecular targets for inducing muscle remodeling would provide new approaches to treat metabolic syndrome. In this review, the physiological properties of skeletal muscle, genetic analysis of metabolic syndrome in human populations and model organisms, and genetically engineered mouse models will be discussed in regard to the prospect of applying skeletal muscle remodeling as possible therapy for metabolic syndrome.
|
740
|
Denny JC. Surveying Recent Themes in Translational Bioinformatics: Big Data in EHRs, Omics for Drugs, and Personal Genomics. Yearb Med Inform 2014; 9:199-205. [PMID: 25123743 PMCID: PMC4287076 DOI: 10.15265/iy-2014-0015] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/23/2023]
Abstract
OBJECTIVE To provide a survey of recent progress in the use of large-scale biologic data to impact clinical care, and the impact the reuse of electronic health record data has made in genomic discovery. METHOD Survey of key themes in translational bioinformatics, primarily from 2012 and 2013. RESULT This survey focuses on four major themes: the growing use of Electronic Health Records (EHRs) as a source for genomic discovery, adoption of genomics and pharmacogenomics in clinical practice, the possible use of genomic technologies for drug repurposing, and the use of personal genomics to guide care. CONCLUSION Reuse of abundant clinical data for research is speeding discovery, and implementation of genomic data into clinical medicine is impacting care with new classes of data rarely used previously in medicine.
Affiliation(s)
- J C Denny
- Joshua C. Denny, MD, MS, 2525 West End Ave - Suite 672, Nashville, TN 37213, USA
|
741
|
Abstract
OBJECTIVES Implementation of Electronic Health Record (EHR) systems continues to expand. The massive number of patient encounters results in large amounts of stored data. Transforming clinical data into knowledge to improve patient care has been the goal of biomedical informatics professionals for many decades, and this work is now increasingly recognized outside our field. In reviewing the literature for the past three years, we focus on "big data" in the context of EHR systems and we report on some examples of how secondary use of data has been put into practice. METHODS We searched the PubMed database for articles from January 1, 2011 to November 1, 2013. We initiated the search with keywords related to "big data" and EHR. We identified relevant articles and added further keywords drawn from the retrieved articles. Based on the new keywords, more articles were retrieved and we manually narrowed down the set using predefined inclusion and exclusion criteria. RESULTS Our final review includes articles categorized into the themes of data mining (pharmacovigilance, phenotyping, natural language processing), data application and integration (clinical decision support, personal monitoring, social media), and privacy and security. CONCLUSION The increasing adoption of EHR systems worldwide makes it possible to capture large amounts of clinical data. There is an increasing number of articles addressing the theme of "big data", and the concepts associated with these articles vary. The next step is to transform healthcare big data into actionable knowledge.
Affiliation(s)
- M K Ross
- Lucila Ohno-Machado, Division of Biomedical Informatics, 9500 Gilman Drive, MC 0505, La Jolla, California, 92037-0505, USA, Tel: +1 858 822 4931
|
742
|
Seo HJ, Kim HH, Im JS, Kim JH. Standard based deposit guideline for distribution of human biological materials in cancer patients. Asian Pac J Cancer Prev 2014; 15:5545-50. [PMID: 25081662 DOI: 10.7314/apjcp.2014.15.14.5545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
BACKGROUND Human biological materials from cancer patients are linked directly with public health issues in medical science research as foundational resources, so securing "human biological material" is truly important in bio-industry. However, because South Korea's national R&D project lacks a proper managing system for establishing a national standard for the outputs of certain processes, high-value-added human biological material produced by the national R&D project could be lost or neglected. As a result, it is necessary to develop a managing process, which can be started by establishing operating guidelines to handle the output of human biological materials. MATERIALS AND METHODS The current laws and regulations related to submitting research outcome resources were reviewed, and the process of data 'acquisition' and data 'distribution' from the point of view of big data and health 2.0 was examined in order to arrive at a method for switching paradigms to better utilize human biological materials. RESULTS For the deposit of biological research resources, the original process was modified and a standard process with relative forms was developed. With deposit forms, research information, researchers, and deposit type are submitted. The checklist's 26 items are provided for publishing. This is a checklist of items that should be addressed in deposit reports. Lastly, XML-based deposit procedure forms were designed and developed to collect data in a structured form, to help researchers distribute their data electronically. CONCLUSIONS Through guidelines included with the plan for profit sharing between depositor and user, it is possible to manage the material effectively and safely, so high-quality human biological material can be supplied and utilized by researchers from universities, industry and institutes. Furthermore, this will improve national competitiveness by leading to development in the national bio-science industry.
Affiliation(s)
- Hwa Jeong Seo
- Medical Informatics and health Technology (MIT), Department of Healthcare Management, College of Social Science, Gachon University, Seongnam, Korea
|
743
|
Ye Z, Mayer J, Ivacic L, Zhou Z, He M, Schrodi SJ, Page D, Brilliant MH, Hebbring SJ. Phenome-wide association studies (PheWASs) for functional variants. Eur J Hum Genet 2014; 23:523-9. [PMID: 25074467 DOI: 10.1038/ejhg.2014.123] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 01/03/2014] [Revised: 05/27/2014] [Accepted: 05/30/2014] [Indexed: 01/08/2023]
Abstract
The genome-wide association study (GWAS) is a powerful approach for studying the genetic complexities of human disease. Unfortunately, GWASs often fail to identify clinically significant associations and describing function can be a challenge. GWAS is a phenotype-to-genotype approach. It is now possible to conduct a converse genotype-to-phenotype approach using extensive electronic medical records to define a phenome. This approach associates a single genetic variant with many phenotypes across the phenome and is called a phenome-wide association study (PheWAS). The majority of PheWASs conducted have focused on variants identified previously by GWASs. This approach has been efficient for rediscovering gene-disease associations while also identifying pleiotropic effects for some single-nucleotide polymorphisms (SNPs). However, the use of SNPs identified by GWAS in a PheWAS is limited by the inherent properties of the GWAS SNPs, including weak effect sizes and difficulty when translating discoveries to function. To address these challenges, we conducted a PheWAS on 105 presumed functional stop-gain and stop-loss variants genotyped on 4235 Marshfield Clinic patients. Associations were validated on an additional 10 640 Marshfield Clinic patients. PheWAS results indicate that a nonsense variant in ARMS2 (rs2736911) is associated with age-related macular degeneration (AMD). These results demonstrate that focusing on functional variants may be an effective approach when conducting a PheWAS.
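The genotype-to-phenotype scan described in this abstract can be illustrated with a minimal Python sketch. It assumes a two-sided Fisher exact test per phenotype and a Bonferroni correction across phenotypes; neither choice is stated in the abstract, and the function names, patient IDs, and phenotype labels are hypothetical toy values, not the study's data or pipeline.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: variant carriers / non-carriers. Columns: has phenotype / does not.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the col1 cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def phewas(carriers, phenotypes, all_patients, alpha=0.05):
    """Test one variant against every phenotype; Bonferroni-adjust the threshold."""
    threshold = alpha / len(phenotypes)  # assumed correction, for illustration
    non_carriers = all_patients - carriers
    results = []
    for name, cases in phenotypes.items():
        a = len(carriers & cases)
        b = len(carriers - cases)
        c = len(non_carriers & cases)
        d = len(non_carriers - cases)
        p = fisher_exact_two_sided(a, b, c, d)
        results.append((name, p, p < threshold))
    return sorted(results, key=lambda r: r[1])


# Toy cohort: 20 hypothetical patients, 5 of whom carry the variant.
patients = set(range(20))
carriers = {0, 1, 2, 3, 4}
phenotypes = {
    "pheno_A": {0, 1, 2, 3},                # concentrated in carriers
    "pheno_B": {0, 5, 6, 7, 8, 9, 10, 11},  # spread across the cohort
}
for name, p, significant in phewas(carriers, phenotypes, patients):
    print(f"{name}: p={p:.4g} significant={significant}")
```

A real PheWAS would typically use regression models with covariates rather than a bare 2x2 test; the sketch only shows the one-variant, many-phenotypes loop.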
Affiliation(s)
- Zhan Ye
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - John Mayer
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - Lynn Ivacic
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - Zhiyi Zhou
- Parkland Center for Clinical Innovation, Parkland Health and Hospital System, Dallas, TX, USA
| | - Min He
- [1] Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA; [2] Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - Steven J Schrodi
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - David Page
- Computation and Informatics in Biology and Medicine, University of Wisconsin Madison, Madison, WI, USA
| | - Murray H Brilliant
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - Scott J Hebbring
- [1] Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA; [2] Computation and Informatics in Biology and Medicine, University of Wisconsin Madison, Madison, WI, USA
|
744
|
Xu H, Aldrich MC, Chen Q, Liu H, Peterson NB, Dai Q, Levy M, Shah A, Han X, Ruan X, Jiang M, Li Y, Julien JS, Warner J, Friedman C, Roden DM, Denny JC. Validating drug repurposing signals using electronic health records: a case study of metformin associated with reduced cancer mortality. J Am Med Inform Assoc 2014; 22:179-91. [PMID: 25053577 PMCID: PMC4433365 DOI: 10.1136/amiajnl-2014-002649] [Citation(s) in RCA: 148] [Impact Index Per Article: 13.5] [Indexed: 01/09/2023]
Abstract
Objectives Drug repurposing, which finds new indications for existing drugs, has received great attention recently. The goal of our work is to assess the feasibility of using electronic health records (EHRs) and automated informatics methods to efficiently validate a recent drug repurposing association of metformin with reduced cancer mortality. Methods By linking two large EHRs from Vanderbilt University Medical Center and Mayo Clinic to their tumor registries, we constructed a cohort including 32 415 adults with a cancer diagnosis at Vanderbilt and 79 258 cancer patients at Mayo from 1995 to 2010. Using automated informatics methods, we further identified type 2 diabetes patients within the cancer cohort and determined their drug exposure information, as well as other covariates such as smoking status. We then estimated HRs for all-cause mortality and their associated 95% CIs using stratified Cox proportional hazard models. HRs were estimated according to metformin exposure, adjusted for age at diagnosis, sex, race, body mass index, tobacco use, insulin use, cancer type, and non-cancer Charlson comorbidity index. Results Among all Vanderbilt cancer patients, metformin was associated with a 22% decrease in overall mortality compared to other oral hypoglycemic medications (HR 0.78; 95% CI 0.69 to 0.88) and with a 39% decrease compared to type 2 diabetes patients on insulin only (HR 0.61; 95% CI 0.50 to 0.73). Diabetic patients on metformin also had a 23% improved survival compared with non-diabetic patients (HR 0.77; 95% CI 0.71 to 0.85). These associations were replicated using the Mayo Clinic EHR data. Many site-specific cancers including breast, colorectal, lung, and prostate demonstrated reduced mortality with metformin use in at least one EHR. 
Conclusions EHR data suggested that the use of metformin was associated with decreased mortality after a cancer diagnosis compared with diabetic and non-diabetic cancer patients not on metformin, indicating its potential as a chemotherapeutic regimen. This study serves as a model for robust and inexpensive validation studies for drug repurposing signals using EHR data.
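The percentage decreases quoted in the Results follow directly from the hazard ratios: a hazard ratio HR below 1 corresponds to a relative reduction of (1 − HR) × 100%. The short Python sketch below reproduces that arithmetic from the three HRs reported in the abstract; the function name and the comparison labels are illustrative only.

```python
def percent_reduction(hazard_ratio):
    """Relative reduction in hazard, in percent, implied by a hazard ratio below 1."""
    return (1.0 - hazard_ratio) * 100.0


# Hazard ratios as reported in the abstract (Vanderbilt cohort).
reported = [
    ("metformin vs. other oral hypoglycemics", 0.78),
    ("metformin vs. insulin only", 0.61),
    ("diabetic on metformin vs. non-diabetic", 0.77),
]
for label, hr in reported:
    print(f"{label}: HR {hr:.2f} -> {percent_reduction(hr):.0f}% lower hazard")
```

The same conversion applies to the confidence limits, so a 95% CI of 0.69 to 0.88 corresponds to a reduction of roughly 12% to 31%.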
Affiliation(s)
- Hua Xu
- The University of Texas School of Biomedical Informatics at Houston, Houston, Texas, USA
| | - Melinda C Aldrich
- Department of Thoracic Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Qingxia Chen
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Hongfang Liu
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
| | - Neeraja B Peterson
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Qi Dai
- Division of Epidemiology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Mia Levy
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Anushi Shah
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Xue Han
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Xiaoyang Ruan
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
| | - Min Jiang
- The University of Texas School of Biomedical Informatics at Houston, Houston, Texas, USA
| | - Ying Li
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Jamii St Julien
- Department of Thoracic Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Jeremy Warner
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Carol Friedman
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Dan M Roden
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA; Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
|
745
|
Ho JC, Ghosh J, Steinhubl SR, Stewart WF, Denny JC, Malin BA, Sun J. Limestone: high-throughput candidate phenotype generation via tensor factorization. J Biomed Inform 2014; 52:199-211. [PMID: 25038555 DOI: 10.1016/j.jbi.2014.07.001] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.5] [Received: 11/13/2013] [Revised: 05/14/2014] [Accepted: 07/02/2014] [Indexed: 12/22/2022]
Abstract
The rapidly increasing availability of electronic health records (EHRs) from multiple heterogeneous sources has spearheaded the adoption of data-driven approaches for improved clinical research, decision making, prognosis, and patient management. Unfortunately, EHR data do not always directly and reliably map to medical concepts that clinical researchers need or use. Some recent studies have focused on EHR-derived phenotyping, which aims at mapping the EHR data to specific medical concepts; however, most of these approaches require labor intensive supervision from experienced clinical professionals. Furthermore, existing approaches are often disease-centric and specialized to the idiosyncrasies of the information technology and/or business practices of a single healthcare organization. In this paper, we propose Limestone, a nonnegative tensor factorization method to derive phenotype candidates with virtually no human supervision. Limestone represents the data source interactions naturally using tensors (a generalization of matrices). In particular, we investigate the interaction of diagnoses and medications among patients. The resulting tensor factors are reported as phenotype candidates that automatically reveal patient clusters on specific diagnoses and medications. Using the proposed method, multiple phenotypes can be identified simultaneously from data. We demonstrate the capability of Limestone on a cohort of 31,815 patient records from the Geisinger Health System. The dataset spans 7 years of longitudinal patient records and was initially constructed for a heart failure onset prediction study. Our experiments demonstrate the robustness, stability, and the conciseness of Limestone-derived phenotypes. Our results show that using only 40 phenotypes, we can outperform the original 640 features (169 diagnosis categories and 471 medication types) to achieve an area under the receiver operator characteristic curve (AUC) of 0.720 (95% CI 0.715 to 0.725). Moreover, in consultation with a medical expert, we confirmed 82% of the top 50 candidates automatically extracted by Limestone are clinically meaningful.
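To make the idea of nonnegative tensor factorization over a patient × diagnosis × medication tensor concrete, here is a minimal NumPy sketch. It is not Limestone's algorithm, which the abstract does not specify; it assumes a generic rank-R CP model fitted with least-squares multiplicative updates, and the tensor sizes, planted factors, and function names are hypothetical.

```python
import numpy as np


def nonnegative_cp(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] with A, B, C >= 0.

    Uses multiplicative updates for the squared-error loss, which keep
    every factor entry nonnegative when X and the starting point are.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank)) + 0.1
    B = rng.random((J, rank)) + 0.1
    C = rng.random((K, rank)) + 0.1
    for _ in range(n_iter):
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C


def relative_error(X, A, B, C):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)


# Toy tensor built from two planted "phenotypes" (all values are made up).
rng = np.random.default_rng(42)
patients = rng.random((30, 2))     # patient membership in each phenotype
diagnoses = rng.random((8, 2))     # diagnosis loading per phenotype
medications = rng.random((5, 2))   # medication loading per phenotype
X = np.einsum("ir,jr,kr->ijk", patients, diagnoses, medications)

A, B, C = nonnegative_cp(X, rank=2)
print("relative reconstruction error:", relative_error(X, A, B, C))
```

Each column r of the fitted factors can be read as one candidate phenotype: the largest entries of B[:, r] and C[:, r] name its diagnoses and medications, and A[:, r] scores each patient's membership.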
Affiliation(s)
- Joyce C Ho
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, United States
| | - Joydeep Ghosh
- Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, United States
| | - Steve R Steinhubl
- Scripps Translational Science Institute, Scripps Health, La Jolla, CA 92037, United States
| | - Walter F Stewart
- Sutter Health Research, Development, and Dissemination Team, Sutter Health, Walnut Creek, CA 94598, United States
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, United States; Department of Medicine, Vanderbilt University, Nashville, TN 37232, United States
| | - Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, United States; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37232, United States
| | - Jimeng Sun
- School of Computational Science and Engineering at College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, United States
|
746
|
Mosley JD, Van Driest SL, Weeke PE, Delaney JT, Wells QS, Bastarache L, Roden DM, Denny JC. Integrating EMR-linked and in vivo functional genetic data to identify new genotype-phenotype associations. PLoS One 2014; 9:e100322. [PMID: 24949630 PMCID: PMC4065041 DOI: 10.1371/journal.pone.0100322] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/17/2014] [Accepted: 05/25/2014] [Indexed: 12/31/2022]
Abstract
The coupling of electronic medical records (EMR) with genetic data has created the potential for implementing reverse genetic approaches in humans, whereby the function of a gene is inferred from the shared pattern of morbidity among homozygotes of a genetic variant. We explored the feasibility of this approach to identify phenotypes associated with low frequency variants using Vanderbilt's EMR-based BioVU resource. We analyzed 1,658 low frequency non-synonymous SNPs (nsSNPs) with a minor allele frequency (MAF)<10% collected on 8,546 subjects. For each nsSNP, we identified diagnoses shared by at least 2 minor allele homozygotes and with an association p<0.05. The diagnoses were reviewed by a clinician to ascertain whether they may share a common mechanistic basis. While a number of biologically compelling clinical patterns of association were observed, the frequency of these associations was identical to that observed using genotype-permuted data sets, indicating that the associations were likely due to chance. To refine our analysis, we then restricted it to 711 nsSNPs in genes with phenotypes in the Online Mendelian Inheritance in Man (OMIM) or knock-out mouse phenotype databases. An initial comparison of the EMR diagnoses to the known in vivo functions of the gene identified 25 candidate nsSNPs, 19 of which had significant genotype-phenotype associations when tested using matched controls. Twelve of the 19 nsSNP associations were confirmed by a detailed record review. Four of 12 nsSNP-phenotype associations were successfully replicated in an independent data set: thrombosis (F5,rs6031), seizures/convulsions (GPR98,rs13157270), macular degeneration (CNGB3,rs3735972), and GI bleeding (HGFAC,rs16844401). These analyses demonstrate the feasibility and challenges of using reverse genetics approaches to identify novel gene-phenotype associations in human subjects using low frequency variants. As increasing amounts of rare variant data are generated from modern genotyping and sequence platforms, model organism data may be an important tool to enable discovery.
Affiliation(s)
- Jonathan D. Mosley
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
- Sara L. Van Driest
- Department of Pediatrics, Vanderbilt University, Nashville, Tennessee, United States of America
- Peter E. Weeke
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
- Jessica T. Delaney
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
- Quinn S. Wells
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
- Lisa Bastarache
- Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, United States of America
- Dan M. Roden
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
- Josh C. Denny
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
- Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, United States of America
|
747
|
Crawford DC, Crosslin DR, Tromp G, Kullo IJ, Kuivaniemi H, Hayes MG, Denny JC, Bush WS, Haines JL, Roden DM, McCarty CA, Jarvik GP, Ritchie MD. eMERGEing progress in genomics-the first seven years. Front Genet 2014; 5:184. [PMID: 24987407 PMCID: PMC4060012 DOI: 10.3389/fgene.2014.00184] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.1] [Received: 03/31/2014] [Accepted: 05/30/2014] [Indexed: 12/15/2022]
Abstract
The electronic MEdical Records & GEnomics (eMERGE) network was established in 2007 by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) in part to explore the utility of electronic medical records (EMRs) in genome science. The initial focus was on discovery primarily using the genome-wide association paradigm, but more recently, the network has begun evaluating mechanisms to implement new genomic information coupled to clinical decision support into EMRs. Herein, we describe this evolution including the development of the individual and merged eMERGE genomic datasets, the contribution the network has made toward genomic discovery and human health, and the steps taken toward the next generation genotype-phenotype association studies and clinical implementation.
Affiliation(s)
- Dana C Crawford
- Center for Human Genetics Research, Vanderbilt University Nashville, TN, USA; Department of Molecular Physiology and Biophysics, Vanderbilt University Nashville, TN, USA
- David R Crosslin
- Medical Genetics, Department of Medicine, School of Medicine, University of Washington Seattle, WA, USA; Department of Genome Sciences, University of Washington Seattle, WA, USA
- Gerard Tromp
- The Sigfried and Janet Weis Center for Research, Geisinger Health System Danville, PA, USA
- Iftikhar J Kullo
- Division of Cardiovascular Diseases and the Gonda Vascular Center, Mayo Clinic Rochester, MN, USA
- Helena Kuivaniemi
- The Sigfried and Janet Weis Center for Research, Geisinger Health System Danville, PA, USA
- M Geoffrey Hayes
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Feinberg School of Medicine, Northwestern University Chicago, IL, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Nashville, TN, USA; Department of Medicine, Vanderbilt University Nashville, TN, USA
- William S Bush
- Center for Human Genetics Research, Vanderbilt University Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University Nashville, TN, USA
- Jonathan L Haines
- Department of Epidemiology and Biostatistics, Case Western Reserve University Cleveland, OH, USA; Institute for Computational Biology, Case Western Reserve University Cleveland, OH, USA
- Dan M Roden
- Department of Medicine, Vanderbilt University Nashville, TN, USA; Department of Pharmacology, Vanderbilt University Nashville, TN, USA
- Gail P Jarvik
- Medical Genetics, Department of Medicine, School of Medicine, University of Washington Seattle, WA, USA; Department of Genome Sciences, University of Washington Seattle, WA, USA
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Pennsylvania State University University Park, PA, USA; Center for Systems Genomics, Pennsylvania State University University Park, PA, USA
|
748
|
Ried JS, Shin SY, Krumsiek J, Illig T, Theis FJ, Spector TD, Adamski J, Wichmann HE, Strauch K, Soranzo N, Suhre K, Gieger C. Novel genetic associations with serum level metabolites identified by phenotype set enrichment analyses. Hum Mol Genet 2014; 23:5847-57. [PMID: 24927737 DOI: 10.1093/hmg/ddu301] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 01/17/2023]
Abstract
Availability of standardized metabolite panels and genome-wide single-nucleotide polymorphism data endorses the comprehensive analysis of gene-metabolite association. Currently, many studies use genome-wide association analysis to investigate the genetic effects on single metabolites (mGWAS) separately. Such studies have identified several loci that are associated not only with one but with multiple metabolites, facilitated by the fact that metabolite panels often include metabolites of the same or related pathways. Strategies that analyse several phenotypes in a combined way were shown to be able to detect additional genetic loci. One of those methods is the phenotype set enrichment analysis (PSEA), which tests sets of metabolites for enrichment at genes. Here we applied PSEA to two different panels of serum metabolites together with genome-wide data. All analyses were performed as a two-step identification-validation approach, using data from the population-based KORA cohort and the TwinsUK study. In addition to confirming genes that were already known from mGWAS, we were able to identify and validate 12 new genes. Knowledge about gene function was supported by the enriched metabolite sets. For loci with unknown gene functions, the results suggest a function that is interrelated with the metabolites, and hint at the underlying pathways.
Affiliation(s)
- So-Youn Shin
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1HH Hinxton, UK, MRC Integrative Epidemiology Unit, University of Bristol, BS8 2BN Bristol, UK
- Thomas Illig
- Research Unit of Molecular Epidemiology, Hannover Unified Biobank, Hannover Medical School, 30625 Hannover, Germany
- Fabian J Theis
- Institute of Computational Biology, Department of Mathematics, Technische Universität München, 85748 Garching, Germany
- Tim D Spector
- Department of Twin Research and Genetic Epidemiology, King's College London School of Medicine, St Thomas' Hospital, SE1 7EH London, UK
- Jerzy Adamski
- Institute of Experimental Genetics, Genome Analysis Center, Institute of Experimental Genetics, Life and Food Science Center Weihenstephan, Technische Universität München, 85354 Freising-Weihenstephan, Germany, German Center for Diabetes Research, 85764 Neuherberg, Germany
- H-Erich Wichmann
- Institute of Epidemiology I and Institute of Medical Informatics, Biometry and Epidemiology, Chair of Epidemiology and , Klinikum Grosshadern, 81377 Munich, Germany
- Konstantin Strauch
- Institute of Genetic Epidemiology, Institute of Medical Informatics, Biometry and Epidemiology, Chair of Genetic Epidemiology, Ludwig-Maximilians-Universität, 81377 Munich, Germany
- Nicole Soranzo
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1HH Hinxton, UK
- Karsten Suhre
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, 85764 Neuherberg, Germany, Department of Physiology and Biophysics, Weill Cornell Medical College, PO Box 24144 Doha, Qatar
|
749
|
Xing EP, Curtis RE, Schoenherr G, Lee S, Yin J, Puniyani K, Wu W, Kinnaird P. GWAS in a box: statistical and visual analytics of structured associations via GenAMap. PLoS One 2014; 9:e97524. [PMID: 24905018 PMCID: PMC4048179 DOI: 10.1371/journal.pone.0097524] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/14/2013] [Accepted: 04/18/2014] [Indexed: 01/05/2023]
Abstract
With the continuous improvement in genotyping and molecular phenotyping technology and the decreasing typing cost, it is expected that in a few years, more and more clinical studies of complex diseases will recruit thousands of individuals for pan-omic genetic association analyses. Hence, there is a great need for algorithms and software tools that could scale up to the whole omic level, integrate different omic data, leverage rich structure information, and be easily accessible to non-technical users. We present GenAMap, an interactive analytics software platform that 1) automates the execution of principled machine learning methods that detect genome- and phenome-wide associations among genotypes, gene expression data, and clinical or other macroscopic traits, and 2) provides new visualization tools specifically designed to aid in the exploration of association mapping results. Algorithmically, GenAMap is based on a new paradigm for GWAS and PheWAS analysis, termed structured association mapping, which leverages various structures in the omic data. We demonstrate the function of GenAMap via a case study of the Brem and Kruglyak yeast dataset, and then apply it on a comprehensive eQTL analysis of the NIH heterogeneous stock mice dataset and report some interesting findings. GenAMap is available from http://sailing.cs.cmu.edu/genamap.
Affiliation(s)
- Eric P. Xing
- Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Ross E. Curtis
- Joint Carnegie Mellon – University of Pittsburgh PhD Program in Computational Biology, Pittsburgh, Pennsylvania, United States of America
- Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Georg Schoenherr
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Seunghak Lee
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Junming Yin
- Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Kriti Puniyani
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Wei Wu
- Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Peter Kinnaird
- Human Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
|
750
|
Hall JB, Dumitrescu L, Dilks HH, Crawford DC, Bush WS. Accuracy of administratively-assigned ancestry for diverse populations in an electronic medical record-linked biobank. PLoS One 2014; 9:e99161. [PMID: 24896101 PMCID: PMC4045967 DOI: 10.1371/journal.pone.0099161] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/12/2014] [Accepted: 05/12/2014] [Indexed: 11/19/2022]
Abstract
Recently, the development of biobanks linked to electronic medical records has presented new opportunities for genetic and epidemiological research. Studies based on these resources, however, present unique challenges, including the accurate assignment of individual-level population ancestry. In this work we examine the accuracy of administratively-assigned race in diverse populations by comparing assigned races to genetically-defined ancestry estimates. Using 220 ancestry informative markers, we generated principal components for patients in our dataset, which were used to cluster patients into groups based on genetic ancestry. Consistent with other studies, we find a strong overall agreement (Kappa = 0.872) between genetic ancestry and assigned race, with higher rates of agreement for African-descent and European-descent assignments, and reduced agreement for Hispanic, East Asian-descent, and South Asian-descent assignments. These results suggest caution when selecting study samples of non-African and non-European backgrounds when administratively-assigned race from biobanks is used.
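The agreement statistic quoted in this abstract can be illustrated with a small sketch. It assumes the reported Kappa is Cohen's kappa, which the abstract does not state, and it uses made-up toy labels rather than any data from the study. The function name `cohens_kappa` and the label values are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Assumption: this is the standard unweighted Cohen's kappa; the abstract
    above does not say which kappa variant was used.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both sources.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of marginal frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both sources use a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy illustration only (not study data): assigned label vs. genetic cluster.
assigned = ["A", "A", "B", "B"]
genetic = ["A", "A", "B", "A"]
kappa = cohens_kappa(assigned, genetic)
```

In this toy case the observed agreement is 3/4 and the chance agreement is 1/2, so kappa is 0.5. Kappa discounts the agreement expected by chance, which is why it is lower than the raw agreement rate.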
Affiliation(s)
- Jacob B. Hall
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Logan Dumitrescu
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Holli H. Dilks
- Vanderbilt Technologies for Advanced Genomics (VANTAGE), Vanderbilt University, Nashville, Tennessee, United States of America
- Dana C. Crawford
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- William S. Bush
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
|