801
Ritchie MD, Denny JC, Zuvich RL, Crawford DC, Schildcrout JS, Bastarache L, Ramirez AH, Mosley JD, Pulley JM, Basford MA, Bradford Y, Rasmussen LV, Pathak J, Chute CG, Kullo IJ, McCarty CA, Chisholm RL, Kho AN, Carlson CS, Larson EB, Jarvik GP, Sotoodehnia N, Manolio TA, Li R, Masys DR, Haines JL, Roden DM. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation 2013; 127:1377-85. [PMID: 23463857 DOI: 10.1161/circulationaha.112.000604]
Abstract
BACKGROUND ECG QRS duration, a measure of cardiac intraventricular conduction, varies ≈2-fold in individuals without cardiac disease. Slow conduction may promote re-entrant arrhythmias. METHODS AND RESULTS We performed a genome-wide association study to identify genomic markers of QRS duration in 5272 individuals without cardiac disease selected from electronic medical record algorithms at 5 sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium QRS genome-wide association study meta-analysis. Twenty-three single-nucleotide polymorphisms in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 single-nucleotide polymorphisms were in the chromosome 3 SCN5A and SCN10A loci, where the most significant single-nucleotide polymorphisms were rs1805126 in SCN5A with P=1.2×10⁻⁸ (eMERGE) and P=2.5×10⁻²⁰ (CHARGE) and rs6795970 in SCN10A with P=6×10⁻⁶ (eMERGE) and P=5×10⁻²⁷ (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies on variants in these 5 loci in 13859 European Americans to search for diagnoses associated with these markers. Phenome-wide association study identified atrial fibrillation and cardiac arrhythmias as the most common associated diagnoses with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5272 "heart-healthy" study population. CONCLUSIONS We conclude that DNA biobanks coupled to electronic medical records not only provide a platform for genome-wide association study but also may allow broad interrogation of the longitudinal incidence of disease associated with genetic variants.
The phenome-wide association study approach implicated sodium channel variants modulating QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.
Affiliation(s)
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
802
Hebbring SJ, Schrodi SJ, Ye Z, Zhou Z, Page D, Brilliant MH. A PheWAS approach in studying HLA-DRB1*1501. Genes Immun 2013; 14:187-91. [PMID: 23392276 DOI: 10.1038/gene.2013.2]
Abstract
HLA-DRB1 codes for a major histocompatibility complex class II cell surface receptor. Genetic variants in and around this gene have been linked to numerous autoimmune diseases; most notably, an association between the HLA-DRB1*1501 haplotype and multiple sclerosis (MS) has been established. Using electronic health records for 4235 individuals in Marshfield Clinic's Personalized Medicine Research Project, a reverse genetic screen termed a phenome-wide association study (PheWAS) tested the association of rs3135388 genotype (tagging HLA-DRB1*1501) with 4841 phenotypes. As expected, HLA-DRB1*1501 was associated with MS (International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9) code 340, P=0.023), whereas the strongest association was with alcohol-induced cirrhosis of the liver (ICD9 571.2, P=0.00011). HLA-DRB1*1501 was also associated with erythematous conditions (ICD9 695, P=0.0054) and benign neoplasms of the respiratory and intrathoracic organs (ICD9 212, P=0.042), replicating previous findings. This study builds on the feasibility and utility of the PheWAS approach and represents the first external validation of a PheWAS; it may also demonstrate the complex etiologies associated with the HLA-DRB1*1501 locus.
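The abstract does not state which test statistic was used, so what follows is only a minimal sketch of the PheWAS idea of scanning one genotype against every recorded diagnosis code. It assumes a dominant model (carrier of at least one risk allele versus non-carrier) and a Pearson chi-square test per code; published PheWAS analyses more commonly fit logistic regression with covariates. The function names `phewas` and `chi2_2x2` and the `min_cases` threshold are hypothetical.

```python
import math
from collections import defaultdict


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) on the table
        [[a, b],    rows: cases, non-cases
         [c, d]]    cols: carriers, non-carriers
    Returns (statistic, p_value); for 1 df, P(X > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty margin means no test is possible
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))


def phewas(genotypes, diagnoses, min_cases=20):
    """genotypes: patient_id -> number of risk alleles (0, 1 or 2).
    diagnoses: patient_id -> set of phenotype codes (e.g. ICD9 codes).
    Tests every code with at least `min_cases` genotyped cases and returns
    (code, statistic, p_value) tuples sorted from most to least significant."""
    carriers = {pid for pid, g in genotypes.items() if g >= 1}
    n_total, n_carriers = len(genotypes), len(carriers)

    cases_by_code = defaultdict(set)
    for pid, codes in diagnoses.items():
        if pid in genotypes:  # ignore patients without genotype data
            for code in codes:
                cases_by_code[code].add(pid)

    results = []
    for code, cases in cases_by_code.items():
        if len(cases) < min_cases:
            continue  # too few cases for a meaningful test
        a = len(cases & carriers)      # carrier cases
        b = len(cases) - a             # non-carrier cases
        c = n_carriers - a             # carrier non-cases
        d = n_total - n_carriers - b   # non-carrier non-cases
        stat, p = chi2_2x2(a, b, c, d)
        results.append((code, stat, p))
    return sorted(results, key=lambda r: r[2])
```

Because thousands of codes are tested at once, a multiple-testing correction (for example, Bonferroni over the number of codes actually tested) would normally be applied before any result is reported.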
Affiliation(s)
- S J Hebbring
- Department of Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA.
803
Enabling genomic-phenomic association discovery without sacrificing anonymity. PLoS One 2013; 8:e53875. [PMID: 23405076 PMCID: PMC3566194 DOI: 10.1371/journal.pone.0053875] [Received: 11/15/2012] [Accepted: 12/03/2012]
Abstract
Health information technologies facilitate the collection of massive quantities of patient-level data. A growing body of research demonstrates that such information can support novel, large-scale biomedical investigations at a fraction of the cost of traditional prospective studies. While healthcare organizations are being encouraged to share these data in a de-identified form, there is hesitation over concerns that doing so will allow the corresponding patients to be re-identified. Currently proposed technologies to anonymize clinical data may make unrealistic assumptions about the capabilities of a recipient to ascertain a patient's identity. We show that more pragmatic assumptions enable the design of anonymization algorithms that permit the dissemination of detailed clinical profiles with provable guarantees of protection. We demonstrate this strategy with a dataset of over one million medical records and show that 192 genotype-phenotype associations can be discovered with fidelity equivalent to non-anonymized clinical data.
804
Pendergrass SA, Brown-Gentry K, Dudek S, Frase A, Torstenson ES, Goodloe R, Ambite JL, Avery CL, Buyske S, Bůžková P, Deelman E, Fesinmeyer MD, Haiman CA, Heiss G, Hindorff LA, Hsu CN, Jackson RD, Kooperberg C, Le Marchand L, Lin Y, Matise TC, Monroe KR, Moreland L, Park SL, Reiner A, Wallace R, Wilkens LR, Crawford DC, Ritchie MD. Phenome-wide association study (PheWAS) for detection of pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) Network. PLoS Genet 2013; 9:e1003087. [PMID: 23382687 PMCID: PMC3561060 DOI: 10.1371/journal.pgen.1003087] [Received: 06/05/2012] [Accepted: 09/12/2012]
Abstract
Using a phenome-wide association study (PheWAS) approach, we comprehensively tested genetic variants for association with phenotypes available for 70,061 study participants in the Population Architecture using Genomics and Epidemiology (PAGE) network. Our aim was to better characterize the genetic architecture of complex traits and identify novel pleiotropic relationships. This PheWAS drew on five population-based studies representing four major racial/ethnic groups (European Americans (EA), African Americans (AA), Hispanics/Mexican-Americans, and Asian/Pacific Islanders) in PAGE, each site with measurements for multiple traits, associated laboratory measures, and intermediate biomarkers. A total of 83 single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) were genotyped across two or more PAGE study sites. Comprehensive tests of association, stratified by race/ethnicity, were performed, encompassing 4,706 phenotypes mapped to 105 phenotype-classes, and association results were compared across study sites. A total of 111 PheWAS results had significant associations for two or more PAGE study sites with consistent direction of effect with a significance threshold of p<0.01 for the same racial/ethnic group, SNP, and phenotype-class. Among results identified for SNPs previously associated with phenotypes such as lipid traits, type 2 diabetes, and body mass index, 52 replicated previously published genotype-phenotype associations, 26 represented phenotypes closely related to previously known genotype-phenotype associations, and 33 represented potentially novel genotype-phenotype associations with pleiotropic effects. The majority of the potentially novel results were for single PheWAS phenotype-classes, for example, for CDKN2A/B rs1333049 (previously associated with type 2 diabetes in EA) a PheWAS association was identified for hemoglobin levels in AA. 
Of note, however, GALNT2 rs2144300 (previously associated with high-density lipoprotein cholesterol levels in EA) had multiple potentially novel PheWAS associations, with hypertension-related phenotypes in AA and with serum calcium levels and coronary artery disease phenotypes in EA. PheWAS identifies associations for hypothesis generation and for exploring the genetic architecture of complex traits.
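The replication rule described in this abstract (p<0.01 at two or more study sites, for the same racial/ethnic group, SNP and phenotype-class, with a consistent direction of effect) can be sketched as a filter over per-site results. This is a hypothetical illustration, not the PAGE pipeline: the `SiteResult` record, its field names, and the choice to reject a group when any significant site disagrees in sign are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteResult:
    site: str
    group: str             # racial/ethnic group the analysis was stratified by
    snp: str
    phenotype_class: str
    beta: float            # effect estimate; only its sign is used here
    p: float


def replicated(results, alpha=0.01, min_sites=2):
    """Return sorted (group, snp, phenotype_class) keys that reach p < alpha at
    >= min_sites distinct sites, with every significant result pointing in
    the same direction of effect."""
    signs_by_key = defaultdict(lambda: defaultdict(set))  # key -> site -> {+1, -1}
    for r in results:
        if r.p < alpha and r.beta != 0:
            key = (r.group, r.snp, r.phenotype_class)
            signs_by_key[key][r.site].add(1 if r.beta > 0 else -1)

    kept = []
    for key, sites in signs_by_key.items():
        all_signs = set().union(*sites.values())
        if len(sites) >= min_sites and len(all_signs) == 1:
            kept.append(key)
    return sorted(kept)
```

Grouping by racial/ethnic group before counting sites mirrors the stratified design described above: agreement between two sites only counts when both analyzed the same group.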
Affiliation(s)
- Sarah A. Pendergrass
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, United States of America
- Kristin Brown-Gentry
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Scott Dudek
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Alex Frase
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, United States of America
- Eric S. Torstenson
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Robert Goodloe
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Jose Luis Ambite
- Information Sciences Institute, University of Southern California, Marina del Rey, California, United States of America
- Christy L. Avery
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Steve Buyske
- Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
- Department of Statistics, Rutgers University, Piscataway, New Jersey, United States of America
- Petra Bůžková
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Ewa Deelman
- Information Sciences Institute, University of Southern California, Marina del Rey, California, United States of America
- Megan D. Fesinmeyer
- Division of Public Health, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Christopher A. Haiman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
- Gerardo Heiss
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Lucia A. Hindorff
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Chu-Nan Hsu
- Information Sciences Institute, University of Southern California, Marina del Rey, California, United States of America
- Charles Kooperberg
- Division of Public Health, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Yi Lin
- Division of Public Health, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Tara C. Matise
- Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
- Kristine R. Monroe
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
- Larry Moreland
- University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Sungshim L. Park
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Alex Reiner
- Division of Public Health, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
- Robert Wallace
- Departments of Epidemiology and Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
- Lynn R. Wilkens
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Dana C. Crawford
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
- Marylyn D. Ritchie
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, Pennsylvania, United States of America
805
Bush WS, Boston J, Pendergrass SA, Dumitrescu L, Goodloe R, Brown-Gentry K, Wilson S, McClellan B, Torstenson E, Basford MA, Spencer KL, Ritchie MD, Crawford DC. Enabling high-throughput genotype-phenotype associations in the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) project as part of the Population Architecture using Genomics and Epidemiology (PAGE) study. Pac Symp Biocomput 2013:373-84. [PMID: 23424142 PMCID: PMC3579641]
Abstract
Genetic association studies have rapidly become a major tool for identifying the genetic basis of common human diseases. The advent of cost-effective genotyping, coupled with large collections of samples linked to clinical outcomes and quantitative traits, now makes it possible to systematically characterize genotype-phenotype relationships in diverse populations and extensive datasets. To capitalize on these advancements, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) project, as part of the collaborative Population Architecture using Genomics and Epidemiology (PAGE) study, accesses two collections: the National Health and Nutrition Examination Surveys (NHANES) and BioVU, Vanderbilt University's biorepository linked to de-identified electronic medical records. We describe herein the workflows for accessing and using the epidemiologic (NHANES) and clinical (BioVU) collections, where each workflow has been customized to reflect the content and data access limitations of each respective source. We also describe the process by which these data are generated, standardized, and shared for meta-analysis among the PAGE study sites. As a specific example of the use of BioVU, we describe the data mining efforts to define cases and controls for genetic association studies of common cancers in PAGE. Collectively, the efforts described here are a generalized outline of many successful approaches that can be used in the era of high-throughput genotype-phenotype associations to move biomedical discovery forward to new frontiers of data generation and analysis.
Affiliation(s)
- WILLIAM S. BUSH
- Department of Biomedical Informatics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232, USA
- JONATHAN BOSTON
- Center for Human Genetics Research, Vanderbilt University, 1207 17th Avenue, Suite 300, Nashville, TN 37232, USA
- SARAH A. PENDERGRASS
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 503 Wartik Lab, University Park, PA 16802, USA
- LOGAN DUMITRESCU
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232, USA
- ROBERT GOODLOE
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232, USA
- KRISTIN BROWN-GENTRY
- Center for Human Genetics Research, Vanderbilt University, 1207 17th Avenue, Suite 300, Nashville, TN 37232, USA
- SARAH WILSON
- Center for Human Genetics Research, Vanderbilt University, 1207 17th Avenue, Suite 300, Nashville, TN 37232, USA
- BOB MCCLELLAN
- Center for Human Genetics Research, Vanderbilt University, 1207 17th Avenue, Suite 300, Nashville, TN 37232, USA
- ERIC TORSTENSON
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232, USA
- MELISSA A. BASFORD
- Office of Research, Office of Personalized Medicine, Vanderbilt University, 2525 West End Avenue, Nashville, TN 37203, USA
- KYLEE L. SPENCER
- Biology and Environmental Science, Heidelberg University, Bareis Hall 131, 310 East Market Street, Tiffin, OH 44883, USA
- MARYLYN D. RITCHIE
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, 512 Wartik Lab, University Park, PA 16802, USA
- DANA C. CRAWFORD
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, TN 37232, USA
806
Abstract
Technology continues to lead the field of personalized medicine as interpretation of the human genome progresses. The cost and duration of genomic sequencing continue to decrease sharply, and intensive research aims to understand how changes within the genome can alter its function and which genomic variations underlie individual susceptibility to disease and response to therapy. Overlaying a personal genome on a patient's personal medical record has the potential to improve prediction and prevention and to allow a more proactive therapeutic strategy. It is evident that pharmacogenomics and individualized drug therapy are the building blocks of personalized medicine. A growing number of drugs are now used to treat cancer in subjects selected by a companion genetic test. Personalized medicine, while based on genomic knowledge of the individual, equally requires personalized environmental information and an understanding of each subject's capacity for health-promoting behaviour.
Affiliation(s)
- Johanne Tremblay
- Research Centre, Centre hospitalier de l'Université de Montréal-Technopôle Angus, 2901 Rachel Street East, Room 314, H1W 4A4, Montreal, Quebec, Canada
807
Abstract
The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information. However, EHRs are designed primarily for clinical care, not research, so reusing clinical EHR data for research purposes can be challenging. Difficulties in using EHR data include data availability, missing data, incorrect data, and vast quantities of unstructured narrative text. Structured information includes billing codes, most laboratory reports, and other variables such as physiologic measurements and demographic information. Significant information, however, remains locked within EHR narrative text documents, including clinical notes and certain categories of test results, such as pathology and radiology reports. For relatively rare observations, combinations of simple free-text searches and billing codes may prove adequate when followed by manual chart review. However, extracting the large cohorts necessary for genome-wide association studies may require natural language processing methods to process narrative text. Combinations of structured and unstructured textual data can be mined to generate high-validity collections of cases and controls for a given condition. Once high-quality cases and controls are identified, EHR-derived cases can be used for genomic discovery and validation.
Since EHR data includes a broad sampling of clinically-relevant phenotypic information, it may enable multiple genomic investigations upon a single set of genotyped individuals. This chapter reviews several examples of phenotype extraction and their application to genetic research, demonstrating a viable future for genomic discovery using EHR-linked data.
Affiliation(s)
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America.
808
Pathak J, Kiefer RC, Bielinski SJ, Chute CG. Applying semantic web technologies for phenome-wide scan using an electronic health record linked Biobank. J Biomed Semantics 2012; 3:10. [PMID: 23244446 PMCID: PMC3554594 DOI: 10.1186/2041-1480-3-10] [Received: 04/02/2012] [Accepted: 08/22/2012]
Abstract
Background The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. Results In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries. Conclusions This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and identify genotype-phenotype associations.
Affiliation(s)
- Jyotishman Pathak
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
809
Thompson WK, Rasmussen LV, Pacheco JA, Peissig PL, Denny JC, Kho AN, Miller A, Pathak J. An evaluation of the NQF Quality Data Model for representing Electronic Health Record driven phenotyping algorithms. AMIA Annu Symp Proc 2012; 2012:911-20. [PMID: 23304366 PMCID: PMC3540514]
Abstract
The development of Electronic Health Record (EHR)-based phenotype selection algorithms is a non-trivial and highly iterative process involving domain experts and informaticians. To make it easier to port algorithms across institutions, it is desirable to represent them using an unambiguous formal specification language. For this purpose we evaluated the recently developed National Quality Forum (NQF) information model designed for EHR-based quality measures: the Quality Data Model (QDM). We selected 9 phenotyping algorithms that had been previously developed as part of the eMERGE consortium and translated them into QDM format. Our study concluded that the QDM contains several core elements that make it a promising format for EHR-driven phenotyping algorithms for clinical research. However, we also found areas in which the QDM could be usefully extended, such as representing information extracted from clinical text, and the ability to handle algorithms that do not consist of Boolean combinations of criteria.
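To illustrate what a phenotyping algorithm built from "Boolean combinations of criteria" looks like, here is a hypothetical sketch in which each criterion is a predicate over a patient record and criteria are combined with AND, OR and NOT. It is neither QDM syntax nor an eMERGE algorithm; the `Patient` fields, the code prefixes, the drug name and the lab threshold are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Patient:
    icd9_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    labs: dict = field(default_factory=dict)  # lab name -> list of numeric results


Criterion = Callable[[Patient], bool]


def has_code(prefix: str) -> Criterion:
    """True if any recorded ICD9 code starts with `prefix`."""
    return lambda p: any(code.startswith(prefix) for code in p.icd9_codes)


def takes(drug: str) -> Criterion:
    return lambda p: drug in p.medications


def lab_above(name: str, threshold: float) -> Criterion:
    """True if any recorded result for lab `name` exceeds `threshold`."""
    return lambda p: any(v > threshold for v in p.labs.get(name, []))


def all_of(*criteria: Criterion) -> Criterion:
    return lambda p: all(c(p) for c in criteria)


def any_of(*criteria: Criterion) -> Criterion:
    return lambda p: any(c(p) for c in criteria)


def not_(criterion: Criterion) -> Criterion:
    return lambda p: not criterion(p)


# Hypothetical case definition: a 250.x code, plus either a medication or an
# elevated lab value, and no record of an illustrative exclusion code.
is_case = all_of(
    has_code("250"),
    any_of(takes("metformin"), lab_above("HbA1c", 6.5)),
    not_(has_code("648.8")),
)
```

Rules that depend on information extracted from clinical text, or that are not Boolean in form (for example, weighted scores), do not fit this predicate pattern directly, which mirrors the extensions the abstract identifies.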
810
Warner JL, Alterovitz G. Phenome based analysis as a means for discovering context dependent clinical reference ranges. AMIA Annu Symp Proc 2012; 2012:1441-9. [PMID: 23304424 PMCID: PMC3540498]
Abstract
Robust electronic medical records (EMRs) have made large-scale phenome-based analysis feasible. The context-dependent phenome of a large ICU-based EMR database (MIMIC II) was explored as a function of a clinical feature: white blood cell count (WBC). Phenome visualization led to the discovery that peak WBC in the range 15-45 K/μl was highly associated with the diagnoses of Clostridium difficile and bacterial sepsis; thus, it is conceivable that clinicians might delay ordering antimicrobials targeting C. difficile for patients with peak WBC in this range. This hypothesis was confirmed, with significant delays in this group (median 135 vs. 85 hours, p = 0.002). These delays could be associated with adverse effects on patient health and high hospitalization costs (e.g. an additional $3,000,000 for the MIMIC II cohort). In conclusion, context-dependent clinical reference ranges are critical to clinical decision making; furthermore, important findings can be discovered through EMR-driven phenome association studies.
Affiliation(s)
- Jeremy L Warner
- Division of Hematology/Oncology, Vanderbilt University, Nashville, TN, USA
811
Pathak J, Kiefer RC, Bielinski SJ, Chute CG. Mining the human phenome using semantic web technologies: a case study for Type 2 Diabetes. AMIA Annu Symp Proc 2012; 2012:699-708. [PMID: 23304343 PMCID: PMC3540447]
Abstract
The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form "biobanks" where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped with Type 2 Diabetes for discovering gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.
Affiliation(s)
- Jyotishman Pathak
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
812
Gottesman O, Drill E, Lotay V, Bottinger E, Peter I. Can genetic pleiotropy replicate common clinical constellations of cardiovascular disease and risk? PLoS One 2012; 7:e46419. [PMID: 23029515 PMCID: PMC3460880 DOI: 10.1371/journal.pone.0046419] [Received: 06/14/2012] [Accepted: 08/29/2012]
Abstract
The relationship between obesity, diabetes, hyperlipidemia, hypertension, kidney disease and cardiovascular disease (CVD) is established when looked at from a clinical, epidemiological or pathophysiological perspective. Yet, when viewed from a genetic perspective, comparatively little synthesized evidence shows that these conditions share an underlying relationship. We sought to investigate the overlap of genetic variants independently associated with each of these commonly co-existing conditions in the NHGRI genome-wide association study (GWAS) catalog, in an attempt to replicate the established notion of shared pathophysiology and risk. We used pathway-based analyses to detect subsets of pleiotropic genes involved in similar biological processes. We identified 107 eligible GWAS related to CVD and its established comorbidities and risk factors and assigned genes that correspond to the associated signals based on their position. We found 44 positional genes shared across at least two CVD-related phenotypes that independently recreated the established relationship between the six phenotypes, but only if studies representing non-European populations were included. Seven genes revealed pleiotropy across three or more phenotypes, mostly related to lipid transport and metabolism. Yet, many genes had no relationship to each other or to genes with established functional connections. Whilst we successfully reproduced established relationships between CVD risk factors using GWAS findings, interpretation of the biological pathways involved in the observed pleiotropy was limited. Further studies linking genetic variation to gene expression, as well as describing novel biological pathways, will be needed to take full advantage of GWAS results.
Affiliation(s)
- Omri Gottesman
- The Charles Bronfman Institute for Personalized Medicine, Mount Sinai School of Medicine, New York, New York, United States of America.
813
Hanauer DA, Ramakrishnan N. Modeling temporal relationships in large scale clinical associations. J Am Med Inform Assoc 2012; 20:332-41. [PMID: 23019240 DOI: 10.1136/amiajnl-2012-001117] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 12/30/2022]
Abstract
OBJECTIVE We describe an approach for modeling temporal relationships in a large scale association analysis of electronic health record data. The addition of temporal information can inform hypothesis generation and help to explain the relationships. We applied this approach to a dataset containing 41.2 million time-stamped International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients. METHODS We performed two independent analyses: a pairwise association analysis using a χ² test and a temporal analysis using a binomial test. Data were visualized using network diagrams and reviewed for clinical significance. RESULTS We found nearly 400 000 highly associated pairs of ICD-9 codes with varying numbers of strong temporal associations ranging from ≥1 day to ≥10 years apart. Most of the findings were not considered clinically novel, although some, such as an association between Helicobacter pylori infection and diabetes, have recently been reported in the literature. The temporal analysis in our large cohort, however, revealed that diabetes usually preceded the diagnoses of H pylori, raising questions about possible cause and effect. DISCUSSION Such analyses have significant limitations, some of which are due to known problems with ICD-9 codes and others to potentially incomplete data even at a health system level. Nevertheless, large scale association analyses with temporal modeling can help provide a mechanism for novel discovery in support of hypothesis generation. CONCLUSIONS Temporal relationships can provide an additional layer of meaning in identifying and interpreting clinical associations.
Affiliation(s)
- David A Hanauer
- Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI 48109-5940, USA.
814
Roden DM, Xu H, Denny JC, Wilke RA. Electronic medical records as a tool in clinical pharmacology: opportunities and challenges. Clin Pharmacol Ther 2012; 91:1083-86. [PMID: 22534870 DOI: 10.1038/clpt.2012.42] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 11/09/2022]
Abstract
The development and increasing sophistication of electronic medical record (EMR) systems hold the promise of not only improving patient care but also providing unprecedented opportunities for discovery in the fields of basic, translational, and implementation sciences. Clinical pharmacology research in the EMR environment has only recently started to become a reality, with EMRs becoming increasingly populated, methods to mine drug response and other phenotypes becoming more sophisticated, and links being established with DNA repositories.
Affiliation(s)
- D M Roden
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA.
815
Altman RB. Translational bioinformatics: linking the molecular world to the clinical world. Clin Pharmacol Ther 2012; 91:994-1000. [PMID: 22549287 DOI: 10.1038/clpt.2012.49] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 01/10/2023]
Abstract
Translational bioinformatics represents the union of translational medicine and bioinformatics. Translational medicine moves basic biological discoveries from the research bench into the patient-care setting and uses clinical observations to inform basic biology. It focuses on patient care, including the creation of new diagnostics, prognostics, prevention strategies, and therapies based on biological discoveries. Bioinformatics involves algorithms to represent, store, and analyze basic biological data, including DNA sequence, RNA expression, and protein and small-molecule abundance within cells. Translational bioinformatics spans these two fields; it involves the development of algorithms to analyze basic molecular and cellular data with an explicit goal of affecting clinical care.
Affiliation(s)
- R B Altman
- Department of Bioengineering, Stanford University, Stanford, California, USA.
816
Lonergan DF, Ehrenfeld JM. Advancement of information technology in outpatient and perioperative settings to support patient care and translational research. Pain Manag 2012; 2:445-9. [DOI: 10.2217/pmt.12.43] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
SUMMARY Information systems assist in documentation and clinical decision support in settings ranging from an outpatient clinical encounter to the monitoring in an operating room. Such information, if stored and categorized well in a centralized database, offers a wealth of material for translational researchers. At Vanderbilt University Medical Center (TN, USA), there is an ongoing effort to advance information systems in all areas and couple these data with a robust genetic repository. It is hoped that such an effort will achieve improvements in quality of care and decreases in costs, while simultaneously providing a fertile ground for translational research.
Affiliation(s)
- Daniel F Lonergan
- Division of Pain Medicine, Department of Anesthesiology, Vanderbilt University Medical Center, TN, USA
- Jesse M Ehrenfeld
- Assistant Professor of Anesthesiology & Biomedical Informatics; Director, Center for Evidence-Based Anesthesia; Director, Perioperative Data Systems Research; Medical Director, Perioperative Quality; Department of Anesthesiology, Vanderbilt University Medical Center, TN, USA
817
Abstract
The growing need for performing large-scale and low-cost biomedical studies has led organizations to promote the reuse of patient data. For instance, the National Institutes of Health in the US requires patient-specific data collected and analyzed in the context of Genome-Wide Association Studies (GWAS) to be deposited into a biorepository and broadly disseminated. While dissemination is essential for regulatory compliance, it risks privacy breaches, because patients' genomic sequences can be linked to their identities through diagnosis codes. This work proposes a novel approach that prevents this type of data linkage by modifying diagnosis codes to limit the probability of associating a patient's identity with their genomic sequence. Our approach employs an effective algorithm that uses generalization and suppression of diagnosis codes to preserve privacy and takes into account the intended uses of the disseminated data to guarantee utility. We also present extensive experiments using several datasets derived from the Electronic Medical Record (EMR) system of the Vanderbilt University Medical Center, as well as a large-scale case study using the EMRs of 79K patients, which are linked to DNA contained in the Vanderbilt University biobank. Our results verify that our approach generates anonymized data that permit accurate biomedical analysis in tasks including case count studies and GWAS.
818
Pendergrass SA, Dudek SM, Crawford DC, Ritchie MD. Visually integrating and exploring high throughput Phenome-Wide Association Study (PheWAS) results using PheWAS-View. BioData Min 2012; 5:5. [PMID: 22682510 PMCID: PMC3476448 DOI: 10.1186/1756-0381-5-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 03/05/2012] [Accepted: 06/08/2012] [Indexed: 11/10/2022]
Abstract
Background Phenome-Wide Association Studies (PheWAS) can be used to investigate the association between single nucleotide polymorphisms (SNPs) and a wide spectrum of phenotypes. This is a complementary approach to Genome Wide Association studies (GWAS), which calculate the association between hundreds of thousands of SNPs and one or a limited range of phenotypes. The extensive exploration of the association between phenotypic structure and genotypic variation through PheWAS produces a set of complex and comprehensive results. Integral to fully inspecting, analysing, and interpreting PheWAS results is visualization of the data. Results We have developed the software PheWAS-View for visually integrating PheWAS results, including information about the SNPs, relevant genes, phenotypes, and the interrelationships between phenotypes that exist in PheWAS. As a result, both the fine-grained detail and the larger trends that exist within PheWAS results can be elucidated. Conclusions PheWAS can be used to discover novel relationships between SNPs, phenotypes, and networks of interrelated phenotypes; identify pleiotropy; provide novel mechanistic insights; and foster hypothesis generation. These results can be both explored and presented with PheWAS-View. PheWAS-View is freely available for non-commercial research institutions; for full details, see http://ritchielab.psu.edu/ritchielab/software.
Affiliation(s)
- Sarah A Pendergrass
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, University Park, PA, USA.
819
Ashley EA, Hershberger RE, Caleshu C, Ellinor PT, Garcia JGN, Herrington DM, Ho CY, Johnson JA, Kittner SJ, Macrae CA, Mudd-Martin G, Rader DJ, Roden DM, Scholes D, Sellke FW, Towbin JA, Van Eyk J, Worrall BB. Genetics and cardiovascular disease: a policy statement from the American Heart Association. Circulation 2012; 126:142-57. [PMID: 22645291 DOI: 10.1161/cir.0b013e31825b07f8] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.1] [Indexed: 12/25/2022]
820
Wason J, Dudbridge F. A general framework for two-stage analysis of genome-wide association studies and its application to case-control studies. Am J Hum Genet 2012; 90:760-73. [PMID: 22560088 DOI: 10.1016/j.ajhg.2012.03.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 01/10/2012] [Revised: 02/17/2012] [Accepted: 03/09/2012] [Indexed: 02/03/2023]
Abstract
Two-stage analyses of genome-wide association studies have been proposed as a means of improving power for designs including family-based association and gene-environment interaction testing. In these analyses, all markers are first screened via a statistic that may not be robust to an underlying assumption, and the markers thus selected are then analyzed in a second stage with a test that is independent from the first stage and is robust to the assumption in question. We give a general formulation of two-stage designs and show how one can use this formulation both to derive existing methods and to improve upon them, opening up a range of possible further applications. We show how using simple regression models in conjunction with external data such as average trait values can improve the power of genome-wide association studies. We focus on case-control studies and show how it is possible to use allele frequencies derived from an external reference to derive a powerful two-stage analysis. An illustration involving the Wellcome Trust Case-Control Consortium data shows several genome-wide-significant associations, subsequently validated, that were not significant in the standard analysis. We give some analytic properties of the methods and discuss some underlying principles.
821
Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet 2012; 13:395-405. [PMID: 22549152 DOI: 10.1038/nrg3208] [Citation(s) in RCA: 745] [Impact Index Per Article: 57.3] [Indexed: 02/07/2023]
Abstract
Clinical data describing the phenotypes and treatment of patients represent an underused data source that has much greater research potential than is currently realized. Mining of electronic health records (EHRs) has the potential for establishing new patient-stratification principles and for revealing unknown disease correlations. Integrating EHR data with genetic data will also give a finer understanding of genotype-phenotype relationships. However, a broad range of ethical, legal and technical reasons currently hinder the systematic deposition of these data in EHRs and their mining. Here, we consider the potential for furthering medical research and clinical care using EHR data and the challenges that must be overcome before this is a reality.
822
Manolio TA, Weis BK, Cowie CC, Hoover RN, Hudson K, Kramer BS, Berg C, Collins R, Ewart W, Gaziano JM, Hirschfeld S, Marcus PM, Masys D, McCarty CA, McLaughlin J, Patel AV, Peakman T, Pedersen NL, Schaefer C, Scott JA, Sprosen T, Walport M, Collins FS. New models for large prospective studies: is there a better way? Am J Epidemiol 2012; 175:859-66. [PMID: 22411865 PMCID: PMC3339313 DOI: 10.1093/aje/kwr453] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.9] [Indexed: 12/23/2022]
Abstract
Large prospective cohort studies are critical for identifying etiologic factors for disease, but they require substantial long-term research investment. Such studies can be conducted as multisite consortia of academic medical centers, combinations of smaller ongoing studies, or a single large site such as a dominant regional health-care provider. Still another strategy relies upon centralized conduct of most or all aspects, recruiting through multiple temporary assessment centers. This is the approach used by a large-scale national resource in the United Kingdom known as the “UK Biobank,” which completed recruitment/examination of 503,000 participants between 2007 and 2010 within budget and ahead of schedule. A key lesson from UK Biobank and similar studies is that large studies are not simply small studies made large but, rather, require fundamentally different approaches in which “process” expertise is as important as scientific rigor. Embedding recruitment in a structure that facilitates outcome determination, utilizing comprehensive and flexible information technology, automating biospecimen processing, ensuring broad consent, and establishing essentially autonomous leadership with appropriate oversight are all critical to success. Whether and how these approaches may be transportable to the United States remain to be explored, but their success in studies such as UK Biobank makes a compelling case for such explorations to begin.
Affiliation(s)
- Teri A Manolio
- Office of Population Genomics, National Human Genome Research Institute, Bethesda, Maryland, USA.
823
Canim M, Kantarcioglu M, Malin B. Secure management of biomedical data with cryptographic hardware. IEEE Trans Inf Technol Biomed 2012; 16:166-75. [PMID: 22010157 PMCID: PMC4156282 DOI: 10.1109/titb.2011.2171701] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/09/2022]
Abstract
The biomedical community is increasingly migrating toward research endeavors that are dependent on large quantities of genomic and clinical data. At the same time, various regulations require that such data be shared beyond the initial collecting organization (e.g., an academic medical center). It is of critical importance to ensure that when such data are shared, as well as managed, this is done in a manner that upholds the privacy of the corresponding individuals and the overall security of the system. In general, organizations have attempted to achieve these goals through deidentification methods that remove explicitly, and potentially, identifying features (e.g., names, dates, and geocodes). However, a growing number of studies demonstrate that deidentified data can be reidentified to named individuals using simple automated methods. As an alternative, it was shown that biomedical data could be shared, managed, and analyzed through practical cryptographic protocols without revealing the contents of any particular record. Yet, such protocols required the inclusion of multiple third parties, which may not always be feasible in the context of trust or bandwidth constraints. Thus, in this paper, we introduce a framework that removes the need for multiple third parties by collocating services to store and to process sensitive biomedical data through the integration of cryptographic hardware. Within this framework, we define a secure protocol to process genomic data and perform a series of experiments to demonstrate that such an approach can be run in an efficient manner for typical biomedical investigations.
Affiliation(s)
- Mustafa Canim
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083 USA
- Murat Kantarcioglu
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083 USA
- Bradley Malin
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203 USA
824
Delaney JT, Ramirez AH, Bowton E, Pulley JM, Basford MA, Schildcrout JS, Shi Y, Zink R, Oetjens M, Xu H, Cleator JH, Jahangir E, Ritchie MD, Masys DR, Roden DM, Crawford DC, Denny JC. Predicting clopidogrel response using DNA samples linked to an electronic health record. Clin Pharmacol Ther 2011; 91:257-63. [PMID: 22190063 DOI: 10.1038/clpt.2011.221] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Indexed: 11/09/2022]
Abstract
Variants in ABCB1 and CYP2C19 have been identified as predictors of cardiac events during clopidogrel therapy initiated after myocardial infarction (MI) or percutaneous coronary intervention (PCI). In addition, PON1 has recently been associated with stent thrombosis. The reported effects of these variants have not yet been replicated in a real-world setting. We used BioVU, the Vanderbilt DNA repository linked to de-identified electronic health records (EHRs), to find data on patients who were on clopidogrel treatment after an MI and/or a PCI; among these, we identified those who had experienced one or more recurrent cardiac events while on treatment (cases, n = 225) and those who had not experienced any cardiac event while on treatment (controls, n = 468). We found that CYP2C19*2 (hazard ratio (HR) 1.54, 95% confidence interval (CI) 1.16-2.06, P = 0.003) and ABCB1 (HR 1.28, 95% CI 1.04-1.57, P = 0.018), but not PON1 (HR 0.91, 95% CI 0.73-1.12, P = 0.370), were associated with recurrent events. In this population, genetic signals for clopidogrel resistance in ABCB1 and CYP2C19 were replicated, supporting the use of EHRs for pharmacogenomic studies. Our data do not show an association between PON1 and recurrent cardiovascular events.
Affiliation(s)
- J T Delaney
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, USA
825
Sivakumaran S, Agakov F, Theodoratou E, Prendergast JG, Zgaga L, Manolio T, Rudan I, McKeigue P, Wilson JF, Campbell H. Abundant pleiotropy in human complex diseases and traits. Am J Hum Genet 2011; 89:607-18. [PMID: 22077970 DOI: 10.1016/j.ajhg.2011.10.004] [Citation(s) in RCA: 383] [Impact Index Per Article: 27.4] [Received: 07/05/2011] [Revised: 08/25/2011] [Accepted: 10/07/2011] [Indexed: 11/19/2022]
Abstract
We present a systematic review of pleiotropy among SNPs and genes reported to show genome-wide association with common complex diseases and traits. We find abundant evidence of pleiotropy; 233 (16.9%) genes and 77 (4.6%) SNPs show pleiotropic effects. SNP pleiotropic status was associated with gene location (p = 0.024; pleiotropic SNPs more often exonic [14.5% versus 4.9% for nonpleiotropic, trait-associated SNPs] and less often intergenic [15.8% versus 23.6%]), "predicted transcript consequence" (p = 0.001; pleiotropic SNPs more often predicted to be structurally deleterious [5% versus 0.4%] but not more often in regulatory sequences), and certain disease classes. We develop a method to calculate the likelihood that pleiotropic links between traits occurred more often than expected and demonstrate that this approach can identify etiological links that are already known (such as between fetal hemoglobin and malaria risk) and those that are not yet established (e.g., between plasma campesterol levels and gallstones risk; and between immunoglobulin A and juvenile idiopathic arthritis). Examples of pleiotropy will accumulate over time, but it is already clear that pleiotropy is a common property of genes and SNPs associated with disease traits, and this will have implications for identification of molecular targets for drug development, future genetic risk-profiling, and classification of diseases.
Affiliation(s)
- Shanya Sivakumaran
- Centre for Population Health Sciences, The University of Edinburgh, Edinburgh EH8 9AG, UK
826
Kohane IS, Churchill SE, Murphy SN. A translational engine at the national scale: informatics for integrating biology and the bedside. J Am Med Inform Assoc 2011; 19:181-5. [PMID: 22081225 DOI: 10.1136/amiajnl-2011-000492] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Indexed: 11/03/2022]
Abstract
Informatics for integrating biology and the bedside (i2b2) seeks to provide the instrumentation for using the informational by-products of health care and the biological materials accumulated through the delivery of health care to conduct discovery research and to study the healthcare system in vivo. This complements existing efforts such as prospective cohort studies or trials outside the delivery of routine health care. i2b2 has been used to generate genome-wide studies at less than one tenth the cost and one tenth the time of conventionally performed studies, as well as to identify important risks from commonly used medications. i2b2 has been adopted by over 60 academic health centers internationally.
Affiliation(s)
- Isaac S Kohane
- Harvard Medical School Center for Biomedical Informatics, Boston, Massachusetts 02115, USA.
827
Abstract
A review of 2010 research in translational bioinformatics provides much to marvel at. We have seen notable advances in personal genomics, pharmacogenetics, and sequencing. At the same time, the infrastructure for the field has burgeoned. While acknowledging that, according to researchers, the members of this field tend to be overly optimistic, the authors predict a bright future.
Affiliation(s)
- Russ B Altman
- Department of Bioengineering, Stanford University School of Medicine, Stanford, California 94305-5444, USA.
828
Wright A, Pang J, Feblowitz JC, Maloney FL, Wilcox AR, Ramelson HZ, Schneider LI, Bates DW. A method and knowledge base for automated inference of patient problems from structured data in an electronic medical record. J Am Med Inform Assoc 2011; 18:859-67. [PMID: 21613643 PMCID: PMC3197992 DOI: 10.1136/amiajnl-2011-000121] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 10/22/2010] [Accepted: 04/25/2011] [Indexed: 11/03/2022]
Abstract
BACKGROUND Accurate knowledge of a patient's medical problems is critical for clinical decision making, quality measurement, research, billing and clinical decision support. Common structured sources of problem information include the patient problem list and billing data; however, these sources are often inaccurate or incomplete. OBJECTIVE To develop and validate methods of automatically inferring patient problems from clinical and billing data, and to provide a knowledge base for inferring problems. STUDY DESIGN AND METHODS We identified 17 target conditions and designed and validated a set of rules for identifying patient problems based on medications, laboratory results, billing codes, and vital signs. A panel of physicians provided input on a preliminary set of rules. Based on this input, we tested candidate rules on a sample of 100,000 patient records to assess their performance compared to gold standard manual chart review. The physician panel selected a final rule for each condition, which was validated on an independent sample of 100,000 records to assess its accuracy. RESULTS Seventeen rules were developed for inferring patient problems. Analysis using a validation set of 100,000 randomly selected patients showed high sensitivity (range: 62.8-100.0%) and positive predictive value (range: 79.8-99.6%) for most rules. Overall, the inference rules performed better than using either the problem list or billing data alone. CONCLUSION We developed and validated a set of rules for inferring patient problems. These rules have a variety of applications, including clinical decision support, care improvement, augmentation of the problem list, and identification of patients for research cohorts.
Affiliation(s)
- Adam Wright
- Department of General Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA.
829
Carroll RJ, Eyler AE, Denny JC. Naïve Electronic Health Record phenotype identification for Rheumatoid arthritis. AMIA Annu Symp Proc 2011; 2011:189-96. [PMID: 22195070 PMCID: PMC3243261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Electronic Health Records (EHRs) provide a real-world patient cohort for clinical and genomic research. Phenotype identification using informatics algorithms has been shown to replicate known genetic associations found in clinical trials and observational cohorts. However, development of accurate phenotype identification methods can be challenging, requiring significant time and effort. We applied Support Vector Machines (SVMs) to both naïve (i.e., non-curated) and expert-defined collections of EHR features to identify Rheumatoid Arthritis cases using billing codes, medication exposures, and natural language processing-derived concepts. SVMs trained on naïve and expert-defined data outperformed an existing deterministic algorithm; the best performing naïve system had precision of 0.94 and recall of 0.87, compared to precision of 0.75 and recall of 0.51 for the deterministic algorithm. We show that with an expert defined feature set as few as 50-100 training samples are required. This study demonstrates that SVMs operating on non-curated sets of attributes can accurately identify cases from an EHR.
Affiliation(s)
- Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, USA
830
Denny JC, Crawford DC, Ritchie MD, Bielinski SJ, Basford MA, Bradford Y, Chai HS, Bastarache L, Zuvich R, Peissig P, Carrell D, Ramirez AH, Pathak J, Wilke RA, Rasmussen L, Wang X, Pacheco JA, Kho AN, Hayes MG, Weston N, Matsumoto M, Kopp PA, Newton KM, Jarvik GP, Li R, Manolio TA, Kullo IJ, Chute CG, Chisholm RL, Larson EB, McCarty CA, Masys DR, Roden DM, de Andrade M. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet 2011; 89:529-42. [PMID: 21981779 PMCID: PMC3188836 DOI: 10.1016/j.ajhg.2011.09.008] [Citation(s) in RCA: 201] [Impact Index Per Article: 14.4] [Received: 07/28/2011] [Revised: 09/15/2011] [Accepted: 09/15/2011] [Indexed: 12/20/2022]
Abstract
We repurposed existing genotypes in DNA biobanks across the Electronic Medical Records and Genomics network to perform a genome-wide association study for primary hypothyroidism, the most common thyroid disease. Electronic selection algorithms incorporating billing codes, laboratory values, text queries, and medication records identified 1317 cases and 5053 controls of European ancestry within five electronic medical records (EMRs); the algorithms' positive predictive values were 92.4% and 98.5% for cases and controls, respectively. Four single-nucleotide polymorphisms (SNPs) in linkage disequilibrium at 9q22 near FOXE1 were associated with hypothyroidism at genome-wide significance, the strongest being rs7850258 (odds ratio [OR] 0.74, p = 3.96 × 10⁻⁹). This association was replicated in a set of 263 cases and 1616 controls (OR = 0.60, p = 5.7 × 10⁻⁶). A phenome-wide association study (PheWAS) that was performed on this locus with 13,617 individuals and more than 200,000 patient-years of billing data identified associations with additional phenotypes: thyroiditis (OR = 0.58, p = 1.4 × 10⁻⁵), nodular (OR = 0.76, p = 3.1 × 10⁻⁵) and multinodular (OR = 0.69, p = 3.9 × 10⁻⁵) goiters, and thyrotoxicosis (OR = 0.76, p = 1.5 × 10⁻³), but not Graves disease (OR = 1.03, p = 0.82). Thyroid cancer, previously associated with this locus, was not significantly associated in the PheWAS (OR = 1.29, p = 0.09). The strongest association in the PheWAS was hypothyroidism (OR = 0.76, p = 2.7 × 10⁻¹³), which had an odds ratio that was nearly identical to that of the curated case-control population in the primary analysis, providing further validation of the PheWAS method. Our findings indicate that EMR-linked genomic data could allow discovery of genes associated with many diseases without additional genotyping cost.
Affiliation(s)
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA.
831
Matise TC, Ambite JL, Buyske S, Carlson CS, Cole SA, Crawford DC, Haiman CA, Heiss G, Kooperberg C, Marchand LL, Manolio TA, North KE, Peters U, Ritchie MD, Hindorff LA, Haines JL. The Next PAGE in understanding complex traits: design for the analysis of Population Architecture Using Genetics and Epidemiology (PAGE) Study. Am J Epidemiol 2011; 174:849-59. [PMID: 21836165 PMCID: PMC3176830 DOI: 10.1093/aje/kwr160] [Citation(s) in RCA: 138] [Impact Index Per Article: 9.9] [Indexed: 12/12/2022]
Abstract
Genetic studies have identified thousands of variants associated with complex traits. However, most association studies are limited to populations of European descent and a single phenotype. The Population Architecture using Genomics and Epidemiology (PAGE) Study was initiated in 2008 by the National Human Genome Research Institute to investigate the epidemiologic architecture of well-replicated genetic variants associated with complex diseases in several large, ethnically diverse population-based studies. Combining DNA samples and hundreds of phenotypes from multiple cohorts, PAGE is well-suited to address generalization of associations and variability of effects in diverse populations; identify genetic and environmental modifiers; evaluate disease subtypes, intermediate phenotypes, and biomarkers; and investigate associations with novel phenotypes. PAGE investigators harmonize phenotypes across studies where possible and perform coordinated cohort-specific analyses and meta-analyses. PAGE researchers are genotyping thousands of genetic variants in up to 121,000 DNA samples from African-American, white, Hispanic/Latino, Asian/Pacific Islander, and American Indian participants. Initial analyses will focus on single nucleotide polymorphisms (SNPs) associated with obesity, lipids, cardiovascular disease, type 2 diabetes, inflammation, various cancers, and related biomarkers. PAGE SNPs are also assessed for pleiotropy using the “phenome-wide association study” approach, testing each SNP for associations with hundreds of phenotypes. PAGE data will be deposited into the National Center for Biotechnology Information's Database of Genotypes and Phenotypes and made available via a custom browser.
Affiliation(s)
- Tara C Matise
- Department of Genetics, School of Arts and Sciences, Rutgers University, Piscataway, New Jersey, USA.
832
CYP4A11 variant is associated with high-density lipoprotein cholesterol in women. Pharmacogenomics J 2011; 13:44-51. [PMID: 21912424 PMCID: PMC3380161 DOI: 10.1038/tpj.2011.40] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Abstract
The ω-hydroxylase CYP4A11 catalyzes the transformation of epoxyeicosatrienoic acids to omega-hydroxylated epoxyeicosatrienoic acids, which are endogenous peroxisome proliferator-activated receptor α (PPARα) agonists. PPARα activation increases high-density lipoprotein cholesterol (HDL-C). A cytosine-for-thymine (T8590C) variant of CYP4A11 encodes an ω-hydroxylase with reduced activity. This study examined the relationship between CYP4A11 T8590C genotype and metabolic parameters in the Framingham Offspring Study and in a clinical practice-based biobank, BioVU. In women in the Framingham Offspring Study, the CYP4A11 8590C allele was associated with reduced HDL-C concentrations (54.2 ± 0.9 mg/dL in women with the CC or CT genotype versus 56.7 ± 0.5 mg/dL in TT women, p = 0.02) and with an increased prevalence of low HDL-C, defined categorically as ≤50 mg/dL [odds ratio 1.39 (95% CI 1.02-1.90), p = 0.04]. In the BioVU cohort, the CYP4A11 8590C allele was also associated with low HDL-C in women [odds ratio 1.69 (95% CI 1.03-2.77), p = 0.04]. There was no relationship between genotype and HDL-C in men in either cohort.
833
Zöller B, Li X, Ohlsson H, Sundquist J, Sundquist K. Venous thromboembolism does not share strong familial susceptibility with ischemic stroke: a nationwide family study in Sweden. Circ Cardiovasc Genet 2011; 4:484-90. [PMID: 21880672 DOI: 10.1161/circgenetics.111.959882] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]
Abstract
BACKGROUND Coagulation allelic variants associated with venous thromboembolism (VTE) have been suggested to be involved in the pathogenesis of ischemic stroke. This nationwide study aimed to determine whether VTE shares familial susceptibility with ischemic stroke. METHODS AND RESULTS The Swedish Multigeneration Register of 0- to 75-year-old subjects was linked to the Swedish Hospital Discharge Register and the Cause of Death Register for the period 1987 to 2007. Odds ratios (ORs) for VTE and ischemic stroke were determined in 2 ways: odds of ischemic stroke in offspring whose parents had been diagnosed with VTE, and odds of VTE in offspring whose parents had been diagnosed with ischemic stroke. The analyses were repeated for siblings and spouses. Offspring of parents with VTE (n=25,929) were at increased risk for ischemic stroke (n=5595): OR, 1.10 (95% confidence interval [CI], 1.06-1.14). Siblings of probands with VTE (n=45,132) had no increased risk of ischemic stroke (n=1716): OR, 1.05 (95% CI, 1.00-1.11). Spouses of probands with VTE (n=24,106) were at increased risk for ischemic stroke (n=940): OR, 1.18 (95% CI, 1.10-1.27). The risks for VTE in relatives of probands with ischemic stroke were OR, 1.15; 95% CI, 1.10-1.21 (offspring); OR, 1.07; 95% CI, 1.02-1.12 (siblings); and OR, 1.21; 95% CI, 1.11-1.32 (spouses). CONCLUSIONS VTE does not share strong familial susceptibility with ischemic stroke in the Swedish population. Moreover, familial nongenetic factors contribute to the observed weak familial associations. The present study suggests that it is unlikely that strong shared disease-causing mutations exist to a large extent in the Swedish population.
Affiliation(s)
- Bengt Zöller
- Center for Primary Health Care Research, Region Skåne and Lund University, Malmö, Sweden.
834
Roque FS, Jensen PB, Schmock H, Dalgaard M, Andreatta M, Hansen T, Søeby K, Bredkjær S, Juul A, Werge T, Jensen LJ, Brunak S. Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Comput Biol 2011; 7:e1002141. [PMID: 21901084 PMCID: PMC3161904 DOI: 10.1371/journal.pcbi.1002141] [Citation(s) in RCA: 173] [Impact Index Per Article: 12.4] [Received: 02/11/2011] [Accepted: 06/13/2011] [Indexed: 12/15/2022]
Abstract
Electronic patient records remain a rather unexplored but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic and non-cohort-dependent manner. By extracting phenotype information from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Disease ontology and is therefore, in principle, language independent. As a use case, we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which can subsequently be mapped to systems biology frameworks.
Text mining and information extraction can be seen as the challenge of converting information hidden in text into manageable data. We have used text mining to automatically extract clinically relevant terms from 5543 psychiatric patient records and map these to disease codes in the International Classification of Disease ontology (ICD10). Mined codes were supplemented by existing coded data. For each patient we constructed a phenotypic profile of associated ICD10 codes. This allowed us to cluster patients based on the similarity of their profiles. The result is a patient stratification based on more complete profiles than the primary diagnosis, which is typically used. Similarly, we investigated comorbidities by looking for pairs of disease codes co-occurring in patients more often than expected. Our high-ranking pairs were manually curated by a medical doctor, who flagged 93 candidates as interesting. For a number of these we were able to find genes/proteins known to be associated with the diseases using the OMIM database. The disease-associated proteins allowed us to construct protein networks suspected to be involved in each of the phenotypes. Proteins shared between two associated diseases might provide insight into the disease comorbidity.
Affiliation(s)
- Francisco S. Roque
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Peter B. Jensen
- NNF Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Henriette Schmock
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Roskilde, Denmark
- Marlene Dalgaard
- Department of Growth and Reproduction GR, Rigshospitalet, Copenhagen, Denmark
- Massimo Andreatta
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Thomas Hansen
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Roskilde, Denmark
- Karen Søeby
- Department of Clinical Biochemistry, Hvidovre Hospital, Copenhagen University Hospital, Hvidovre, Denmark
- Søren Bredkjær
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Roskilde, Denmark
- Psychiatry Region Sealand, Ringsted, Denmark
- Anders Juul
- Department of Growth and Reproduction GR, Rigshospitalet, Copenhagen, Denmark
- Thomas Werge
- Institute of Biological Psychiatry, Mental Health Center Sct. Hans, Copenhagen University Hospital, Roskilde, Denmark
- Lars J. Jensen
- NNF Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Søren Brunak
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- NNF Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
835
Davis DA, Chawla NV. Exploring and exploiting disease interactions from multi-relational gene and phenotype networks. PLoS One 2011; 6:e22670. [PMID: 21829475 PMCID: PMC3146471 DOI: 10.1371/journal.pone.0022670] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Received: 01/12/2011] [Accepted: 07/01/2011] [Indexed: 11/19/2022]
Abstract
The availability of electronic health care records is unlocking the potential for novel studies on understanding and modeling disease co-morbidities based on both phenotypic and genetic data. Moreover, the emergence of increasingly reliable phenotypic data can aid further studies investigating potential genetic links among diseases. The goal is to create a feedback loop where computational tools guide and facilitate research, leading to improved biological knowledge and clinical standards, which in turn should generate better data. We build and analyze disease interaction networks based on data collected from previous genetic association studies and patient medical histories, spanning more than 12 years, acquired from a regional hospital. By exploring both individual and combined interactions among these two levels of disease data, we provide novel insight into the interplay between genetics and clinical realities. Our results show a marked difference between the well-defined structure of genetic relationships and the chaotic co-morbidity network, but also highlight clear interdependencies. We demonstrate the power of these dependencies by proposing a novel multi-relational link prediction method, showing that disease co-morbidity can enhance our currently limited knowledge of genetic association. Furthermore, our methods for integrated networks of diverse data are widely applicable and can provide novel advances for many problems in systems biology and personalized medicine.
Affiliation(s)
- Darcy A. Davis
- Interdisciplinary Center for Network Science and Applications, Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana, United States of America
- Nitesh V. Chawla
- Interdisciplinary Center for Network Science and Applications, Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana, United States of America
836
Langanke M, Brothers KB, Erdmann P, Weinert J, Krafczyk-Korth J, Dörr M, Hoffmann W, Kroemer HK, Assel H. Comparing different scientific approaches to personalized medicine: research ethics and privacy protection. Per Med 2011; 8:437-444. [PMID: 21892358 DOI: 10.2217/pme.11.34] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/21/2022]
Abstract
In this article, two different scientific approaches to personalized medicine are compared. The Biorepository at Vanderbilt University (BioVU) is a genomic biorepository at Vanderbilt University Medical Center in Nashville, TN, USA. Genetic biosamples are collected from leftover clinical blood samples, and medical information is derived from electronic medical records. The Greifswald Approach to Individualized Medicine (GANI_MED) is a research resource at the University of Greifswald, Germany, composed of clinical records combined with biosamples collected for research. We demonstrate that although both approaches are based on the collection of clinical data and biosamples, the different legal milieus in the USA and Germany, as well as slight differences in scientific goals, have led to different 'ethical designs'. While BioVU can operate successfully with an 'opt-out' mechanism, an informed consent-based 'opt-in' model is indispensable for GANI_MED to reach its scientific goals.
Affiliation(s)
- Martin Langanke
- Faculty of Theology, University of Greifswald, Greifswald, Germany
837
Fernald GH, Capriotti E, Daneshjou R, Karczewski KJ, Altman RB. Bioinformatics challenges for personalized medicine. Bioinformatics 2011; 27:1741-8. [PMID: 21596790 PMCID: PMC3117361 DOI: 10.1093/bioinformatics/btr295] [Citation(s) in RCA: 134] [Impact Index Per Article: 9.6] [Indexed: 02/02/2023]
Abstract
MOTIVATION Widespread availability of low-cost, full genome sequencing will introduce new challenges for bioinformatics. RESULTS This review outlines recent developments in sequencing technologies and genome analysis methods for application in personalized medicine. New methods are needed in four areas to realize the potential of personalized medicine: (i) processing large-scale robust genomic data; (ii) interpreting the functional effect and the impact of genomic variation; (iii) integrating systems data to relate complex genetic interactions with phenotypes; and (iv) translating these discoveries into medical practice. CONTACT russ.altman@stanford.edu
Affiliation(s)
- Guy Haskin Fernald
- Biomedical Informatics Training Program, Stanford University School of Medicine, Department of Bioengineering, Stanford University, Stanford, CA, USA
838
Pendergrass SA, Brown-Gentry K, Dudek SM, Torstenson ES, Ambite JL, Avery CL, Buyske S, Cai C, Fesinmeyer MD, Haiman C, Heiss G, Hindorff LA, Hsu CN, Jackson RD, Kooperberg C, Le Marchand L, Lin Y, Matise TC, Moreland L, Monroe K, Reiner AP, Wallace R, Wilkens LR, Crawford DC, Ritchie MD. The use of phenome-wide association studies (PheWAS) for exploration of novel genotype-phenotype relationships and pleiotropy discovery. Genet Epidemiol 2011; 35:410-22. [PMID: 21594894 DOI: 10.1002/gepi.20589] [Citation(s) in RCA: 132] [Impact Index Per Article: 9.4] [Received: 02/11/2011] [Revised: 04/01/2011] [Accepted: 04/03/2011] [Indexed: 01/09/2023]
Abstract
The field of phenomics has been investigating network structure among large arrays of phenotypes, and genome-wide association studies (GWAS) have been used to investigate the relationship between genetic variation and single diseases/outcomes. A novel approach has emerged combining both the exploration of phenotypic structure and genotypic variation, known as the phenome-wide association study (PheWAS). The Population Architecture using Genomics and Epidemiology (PAGE) network is a National Human Genome Research Institute (NHGRI)-supported collaboration of four groups accessing eight extensively characterized epidemiologic studies. The primary focus of PAGE is deep characterization of well-replicated GWAS variants and their relationships to various phenotypes and traits in diverse epidemiologic studies that include European Americans, African Americans, Mexican Americans/Hispanics, Asians/Pacific Islanders, and Native Americans. The rich phenotypic resources of PAGE studies provide a unique opportunity for PheWAS as each genotyped variant can be tested for an association with the wide array of phenotypic measurements available within the studies of PAGE, including prevalent and incident status for multiple common clinical conditions and risk factors, as well as clinical parameters and intermediate biomarkers. The results of PheWAS can be used to discover novel relationships between SNPs, phenotypes, and networks of interrelated phenotypes; identify pleiotropy; provide novel mechanistic insights; and foster hypothesis generation. The PAGE network has developed infrastructure to support and perform PheWAS in a high-throughput manner. As implementing the PheWAS approach has presented several challenges, the infrastructure and methodology, as well as insights gained in this project, are presented herein to benefit the larger scientific community.
Affiliation(s)
- S A Pendergrass
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232-0700, USA
839
Kohane IS. Using electronic health records to drive discovery in disease genomics. Nat Rev Genet 2011; 12:417-28. [PMID: 21587298 DOI: 10.1038/nrg2999] [Citation(s) in RCA: 208] [Impact Index Per Article: 14.9] [Indexed: 01/03/2023]
Abstract
If genomic studies are to be a clinically relevant and timely reflection of the relationship between genetics and health status--whether for common or rare variants--cost-effective ways must be found to measure both the genetic variation and the phenotypic characteristics of large populations, including the comprehensive and up-to-date record of their medical treatment. The adoption of electronic health records, used by clinicians to document clinical care, is becoming widespread and recent studies demonstrate that they can be effectively employed for genetic studies using the informational and biological 'by-products' of health-care delivery while maintaining patient privacy.
Affiliation(s)
- Isaac S Kohane
- Harvard Medical School, 10 Shattuck Street, Boston, Massachusetts 02115, USA.
840
Turner SD, Berg RL, Linneman JG, Peissig PL, Crawford DC, Denny JC, Roden DM, McCarty CA, Ritchie MD, Wilke RA. Knowledge-driven multi-locus analysis reveals gene-gene interactions influencing HDL cholesterol level in two independent EMR-linked biobanks. PLoS One 2011; 6:e19586. [PMID: 21589926 PMCID: PMC3092760 DOI: 10.1371/journal.pone.0019586] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Received: 12/09/2010] [Accepted: 04/01/2011] [Indexed: 11/18/2022]
Abstract
Genome-wide association studies (GWAS) are routinely being used to examine the genetic contribution to complex human traits, such as high-density lipoprotein cholesterol (HDL-C). Although HDL-C levels are highly heritable (h² ≈ 0.7), the genetic determinants identified through GWAS contribute only a small fraction of the variance in this trait. Reasons for this discrepancy may include rare variants, structural variants, gene-environment (GxE) interactions, and gene-gene (GxG) interactions. Clinical practice-based biobanks now allow investigators to address these challenges by conducting GWAS in the context of comprehensive electronic medical records (EMRs). Here we apply an EMR-based phenotyping approach, within the context of routine care, to replicate several known associations between HDL-C and previously characterized genetic variants: CETP (rs3764261, p = 1.22e-25), LIPC (rs11855284, p = 3.92e-14), LPL (rs12678919, p = 1.99e-7), and the APOA1/C3/A4/A5 locus (rs964184, p = 1.06e-5), all adjusted for age, gender, body mass index (BMI), and smoking status. By using a novel approach that censors data based on relevant co-morbidities and lipid-modifying medications to construct a more rigorous HDL-C phenotype, we identified an association between HDL-C and TRIB1, a gene that had previously resisted identification in studies with larger sample sizes. Through the application of additional analytical strategies incorporating biological knowledge, we further identified 11 significant GxG interaction models in our discovery cohort, 8 of which show evidence of replication in a second biobank cohort. The strongest predictive model included a pairwise interaction between LPL (which modulates the incorporation of triglyceride into HDL) and ABCA1 (which modulates the incorporation of free cholesterol into HDL). These results demonstrate that gene-gene interactions modulate complex human traits, including HDL cholesterol.
Affiliation(s)
- Stephen D. Turner
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Richard L. Berg
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, United States of America
- James G. Linneman
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, United States of America
- Peggy L. Peissig
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, United States of America
- Dana C. Crawford
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Joshua C. Denny
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Dan M. Roden
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Pharmacology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Catherine A. McCarty
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, United States of America
- Marylyn D. Ritchie
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Russell A. Wilke
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
841
Abstract
There has been much progress in genomics in the ten years since a draft sequence of the human genome was published. Opportunities for understanding health and disease are now unprecedented, as advances in genomics are harnessed to obtain robust foundational knowledge about the structure and function of the human genome and about the genetic contributions to human health and disease. Here we articulate a 2011 vision for the future of genomics research and describe the path towards an era of genomic medicine.
Affiliation(s)
- Eric D Green
- National Human Genome Research Institute, National Institutes of Health, 31 Center Dr., Bethesda, Maryland 20892-2152, USA.
842
The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics 2011; 4:13. [PMID: 21269473 PMCID: PMC3038887 DOI: 10.1186/1755-8794-4-13] [Citation(s) in RCA: 535] [Impact Index Per Article: 38.2] [Received: 02/05/2010] [Accepted: 01/26/2011] [Indexed: 11/23/2022]
Abstract
Introduction The eMERGE (electronic MEdical Records and GEnomics) Network is an NHGRI-supported consortium of five institutions to explore the utility of DNA repositories coupled to Electronic Medical Record (EMR) systems for advancing discovery in genome science. eMERGE also includes a special emphasis on the ethical, legal and social issues related to these endeavors. Organization The five sites are supported by an Administrative Coordinating Center. Setting of network goals is initiated by working groups: (1) Genomics, (2) Informatics, and (3) Consent & Community Consultation, which also includes active participation by investigators outside the eMERGE funded sites, and (4) Return of Results Oversight Committee. The Steering Committee, composed of site PIs and representatives and NHGRI staff, meets three times per year, once per year with the External Scientific Panel. Current progress The primary site-specific phenotypes for which samples have undergone genome-wide association study (GWAS) genotyping are cataract and HDL, dementia, electrocardiographic QRS duration, peripheral arterial disease, and type 2 diabetes. A GWAS is also being undertaken for resistant hypertension in ≈2,000 additional samples identified across the network sites, to be added to data available for samples already genotyped. With funding from ARRA supplements, secondary phenotypes have been added at all sites to leverage the genotyping data, and hypothyroidism is being analyzed as a cross-network phenotype. Results are being posted in dbGaP. Other key eMERGE activities include evaluation of the issues associated with cross-site deployment of common algorithms to identify cases and controls in EMRs, data privacy of genomic and clinically derived data, development of approaches for large-scale meta-analysis of GWAS data across five sites, and a community consultation and consent initiative at each site.
Future activities Plans are underway to expand the network in diversity of populations and incorporation of GWAS findings into clinical care. Summary By combining advanced clinical informatics, genome science, and community consultation, eMERGE represents a first step in the development of data-driven approaches to incorporate genomic information into routine healthcare delivery.
843
Genetic basis of autoantibody positive and negative rheumatoid arthritis risk in a multi-ethnic cohort derived from electronic health records. Am J Hum Genet 2011; 88:57-69. [PMID: 21211616 DOI: 10.1016/j.ajhg.2010.12.007] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.7] [Received: 09/04/2010] [Revised: 12/03/2010] [Accepted: 12/16/2010] [Indexed: 12/17/2022]
Abstract
Discovering and following up on genetic associations with complex phenotypes require large patient cohorts. This is particularly true for patient cohorts of diverse ancestry and clinically relevant subsets of disease. The ability to mine the electronic health records (EHRs) of patients followed as part of routine clinical care provides a potential opportunity to efficiently identify affected cases and unaffected controls for appropriately sized genetic studies. Here, we demonstrate proof of concept that it is possible to use EHR data linked with biospecimens to establish a multi-ethnic case-control cohort for genetic research of a complex disease, rheumatoid arthritis (RA). In 1,515 EHR-derived RA cases and 1,480 controls matched for both genetic ancestry and disease-specific autoantibodies (anti-citrullinated protein antibodies [ACPA]), we demonstrate that the odds ratios and aggregate genetic risk score (GRS) of known RA risk alleles measured in individuals of European ancestry within our EHR cohort are nearly identical to those derived from a genome-wide association study (GWAS) of 5,539 autoantibody-positive RA cases and 20,169 controls. We extend this approach to other ethnic groups and identify a large overlap in the GRS among individuals of European, African, East Asian, and Hispanic ancestry. We also demonstrate that the distribution of a GRS based on 28 non-HLA risk alleles in ACPA+ cases partially overlaps with that of the ACPA- subgroup of RA cases. Our study demonstrates that the genetic basis of rheumatoid arthritis risk is similar among cases of diverse ancestry divided into subsets based on ACPA status and emphasizes the utility of linking EHR clinical data with biospecimens for genetic studies.
844
845
Congdon E, Poldrack RA, Freimer NB. Neurocognitive phenotypes and genetic dissection of disorders of brain and behavior. Neuron 2010; 68:218-30. [PMID: 20955930 DOI: 10.1016/j.neuron.2010.10.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Accepted: 10/05/2010] [Indexed: 01/10/2023]
Abstract
Elucidating the molecular mechanisms underlying quantitative neurocognitive phenotypes will further our understanding of the brain's structural and functional architecture and advance the diagnosis and treatment of the psychiatric disorders that these traits underlie. Although many neurocognitive traits are highly heritable, little progress has been made in identifying genetic variants unequivocally associated with these phenotypes. A major obstacle to such progress is the difficulty in identifying heritable neurocognitive measures that are precisely defined and systematically assessed and represent unambiguous mental constructs, yet are also amenable to the high-throughput phenotyping necessary to obtain adequate power for genetic association studies. In this perspective we compare the current status of genetic investigations of neurocognitive phenotypes to that of other categories of biomedically relevant traits and suggest strategies for genetically dissecting traits that may underlie disorders of brain and behavior.
Affiliation(s)
- Eliza Congdon
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA 90095, USA
846
Stearns SC, Byars SG, Govindaraju DR, Ewbank D. Measuring selection in contemporary human populations. Nat Rev Genet 2010; 11:611-22. [PMID: 20680024 DOI: 10.1038/nrg2831] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Indexed: 12/25/2022]
Abstract
Are humans currently evolving? This question can be answered using data on lifetime reproductive success, multiple traits and genetic variation and covariation in those traits. Such data are available in existing long-term, multigeneration studies (both clinical and epidemiological), but they have not yet been widely used to address contemporary human evolution. Here we review methods to predict evolutionary change and attempts to measure selection and inheritance in humans. We also assemble examples of long-term studies in which additional measurements of evolution could be made. The evidence strongly suggests that we are evolving and that our nature is dynamic, not static.
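As general background (not quoted from the review; the notation here is ours), the standard quantitative-genetic way to predict evolutionary change from such data starts from relative fitness w, for example lifetime reproductive success divided by its population mean. The selection differential S on a trait z is its covariance with w, and the breeder's equation predicts the per-generation response R from the narrow-sense heritability h². With several traits, the selection gradient β removes indirect selection acting through the phenotypic covariance matrix P, and the additive genetic covariance matrix G converts it into the predicted change in trait means:

```latex
S = \operatorname{Cov}(w, z), \qquad R = h^{2} S,
\qquad
\boldsymbol{\beta} = \mathbf{P}^{-1}\mathbf{s}, \qquad
\Delta\bar{\mathbf{z}} = \mathbf{G}\,\boldsymbol{\beta}
```

Here s is the vector of per-trait selection differentials. This is why the abstract stresses genetic covariation among multiple traits: off-diagonal entries of G let selection on one trait shift the mean of another.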
Affiliation(s)
- Stephen C Stearns
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06520-8102, USA.
847
Schildcrout JS, Basford MA, Pulley JM, Masys DR, Roden DM, Wang D, Chute CG, Kullo IJ, Carrell D, Peissig P, Kho A, Denny JC. An analytical approach to characterize morbidity profile dissimilarity between distinct cohorts using electronic medical records. J Biomed Inform 2010; 43:914-23. [PMID: 20688191 DOI: 10.1016/j.jbi.2010.07.011] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 11/06/2009] [Revised: 07/19/2010] [Accepted: 07/27/2010] [Indexed: 10/19/2022]
Abstract
We describe a two-stage analytical approach for characterizing morbidity profile dissimilarity among patient cohorts using electronic medical records. We capture morbidities using the International Statistical Classification of Diseases and Related Health Problems (ICD-9) codes. In the first stage of the approach, separate logistic regression analyses are conducted for ICD-9 sections (e.g., "hypertensive disease" or "appendicitis"), and the odds ratios that describe adjusted differences in prevalence between two cohorts are displayed graphically. In the second stage, the results from the ICD-9 section analyses are combined into a general morbidity dissimilarity index (MDI). For illustration, we examine nine cohorts of patients representing six phenotypes (or controls) derived from five institutions, each a participant in the Electronic Medical Records and Genomics (eMERGE) network. The phenotypes studied include type II diabetes and type II diabetes controls, peripheral arterial disease and peripheral arterial disease controls, normal cardiac conduction as measured by electrocardiography, and senile cataracts.
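A minimal sketch of the two-stage idea, under stated assumptions: each patient is represented as the set of ICD-9 section labels recorded for them; stage 1 uses unadjusted odds ratios (equal to the exponentiated coefficient of a logistic regression on a single binary cohort indicator, whereas the paper's regressions are covariate-adjusted); and stage 2 uses a hypothetical index, the mean absolute log odds ratio, because the abstract does not define the MDI. The names `section_odds_ratio`, `stage_one` and `dissimilarity_index` are illustrative and not taken from the paper.

```python
import math
from typing import Dict, List, Set


def section_odds_ratio(a_has: int, a_n: int, b_has: int, b_n: int) -> float:
    """Unadjusted odds ratio of carrying a section code in cohort A vs cohort B.

    Equals exp(beta) from a logistic regression on a single binary cohort
    indicator; covariate adjustment is deliberately omitted in this sketch.
    If any cell of the 2x2 table is zero, 0.5 is added to every cell
    (Haldane-Anscombe correction) so the ratio stays finite.
    """
    a_no, b_no = a_n - a_has, b_n - b_has
    cells = (a_has, a_no, b_has, b_no)
    if 0 in cells:
        a_has, a_no, b_has, b_no = (c + 0.5 for c in cells)
    return (a_has / a_no) / (b_has / b_no)


def stage_one(cohort_a: List[Set[str]], cohort_b: List[Set[str]]) -> Dict[str, float]:
    """Stage 1: one odds ratio per ICD-9 section seen in either cohort.

    Each patient is a set of (hypothetical) section labels.
    """
    sections = set().union(*cohort_a, *cohort_b)
    return {
        s: section_odds_ratio(
            sum(s in p for p in cohort_a), len(cohort_a),
            sum(s in p for p in cohort_b), len(cohort_b),
        )
        for s in sorted(sections)
    }


def dissimilarity_index(ors: Dict[str, float]) -> float:
    """Stage 2, hypothetical stand-in for the MDI: mean |log OR| over sections.

    Returns 0 when every section's odds ratio is 1; larger means more dissimilar.
    """
    if not ors:
        return 0.0
    return sum(abs(math.log(v)) for v in ors.values()) / len(ors)


if __name__ == "__main__":
    cohort_a = [{"hypertensive disease"}, {"hypertensive disease", "appendicitis"},
                set(), {"hypertensive disease"}]
    cohort_b = [{"appendicitis"}, set(), {"hypertensive disease"}, set()]
    ors = stage_one(cohort_a, cohort_b)
    print(ors)  # {'appendicitis': 1.0, 'hypertensive disease': 9.0}
    print(round(dissimilarity_index(ors), 4))  # 1.0986
```

Averaging on the log scale treats an odds ratio of 9 and one of 1/9 as equally dissimilar, which matches the symmetric notion of "difference in prevalence between two cohorts"; a weighted average (for example by section prevalence) would be an equally plausible alternative.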
Affiliation(s)
- Jonathan S Schildcrout
- Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37232-2156, USA.