751
Liao KP, Diogo D, Cui J, Cai T, Okada Y, Gainer VS, Murphy SN, Gupta N, Mirel D, Ananthakrishan AN, Szolovits P, Shaw SY, Raychaudhuri S, Churchill S, Kohane I, Karlson EW, Plenge RM. Association between low density lipoprotein and rheumatoid arthritis genetic factors with low density lipoprotein levels in rheumatoid arthritis and non-rheumatoid arthritis controls. Ann Rheum Dis 2014; 73:1170-5. [PMID: 23716066 PMCID: PMC3815491 DOI: 10.1136/annrheumdis-2012-203202] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/21/2022]
Abstract
OBJECTIVES While genetic determinants of low density lipoprotein (LDL) cholesterol levels are well characterised in the general population, they are understudied in rheumatoid arthritis (RA). Our objective was to determine the association of established LDL and RA genetic alleles with LDL levels in RA cases compared with non-RA controls.
METHODS Using data from electronic medical records, we linked validated RA cases and non-RA controls to discarded blood samples. For each individual, we extracted data on: first LDL measurement, age, gender and year of LDL measurement. We genotyped subjects for 11 LDL and 44 non-HLA RA alleles, and calculated RA and LDL genetic risk scores (GRS). We tested the association between each GRS and LDL level using multivariate linear regression models adjusted for age, gender, year of LDL measurement and RA status.
RESULTS Among 567 RA cases and 979 controls, 80% were female and mean age at the first LDL measurement was 55 years. RA cases had significantly lower mean LDL levels than controls (117.2 vs 125.6 mg/dl, respectively, p<0.0001). Each unit increase in LDL GRS was associated with 0.8 mg/dl higher LDL levels in both RA cases and controls (p=3.0×10⁻⁷). Each unit increase in RA GRS was associated with 4.3 mg/dl lower LDL levels in both groups (p=0.01).
CONCLUSIONS LDL alleles were associated with higher LDL levels in RA. RA alleles were associated with lower LDL levels in both RA cases and controls. As RA cases carry more RA alleles, these findings suggest a genetic basis for epidemiological observations of lower LDL levels in RA.
Affiliation(s)
- Katherine P. Liao
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Boston, MA
- Dorothée Diogo
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Boston, MA
  - Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
  - Medical and Population Genetics Program, The Broad Institute, Cambridge, MA
- Jing Cui
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Boston, MA
- Tianxi Cai
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA
- Yukinori Okada
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Boston, MA
  - Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
  - Medical and Population Genetics Program, The Broad Institute, Cambridge, MA
- Shawn N. Murphy
  - Research Computing, Partners Healthcare, Charlestown, MA
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA
- Namrata Gupta
  - Medical and Population Genetics Program, The Broad Institute, Cambridge, MA
- Daniel Mirel
  - Medical and Population Genetics Program, The Broad Institute, Cambridge, MA
- Peter Szolovits
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA
- Soumya Raychaudhuri
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Boston, MA
  - Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
  - Partners Center for Personalized Genetic Medicine, Boston, MA
  - Faculty of Medical and Human Sciences, University of Manchester, Manchester, UK
- Isaac Kohane
  - Research Computing, Partners Healthcare, Charlestown, MA
  - Center for Biomedical Informatics, Harvard Medical School, Boston, MA
- Elizabeth W. Karlson
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Boston, MA
- Robert M. Plenge
  - Division of Rheumatology, Immunology, and Allergy, Brigham and Women’s Hospital, Boston, MA
  - Division of Genetics, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
  - Medical and Population Genetics Program, The Broad Institute, Cambridge, MA
752
Abstract
PURPOSE OF REVIEW The purpose is to review the arguments for increasing use of existing data in health research.
RECENT FINDINGS The reuse of data in observational, exploratory and outcome studies, as well as in confirming other findings, is being justified on epistemological grounds as the major path to new knowledge and to the generalizing of findings to 'real world' populations. It is also justified on the grounds of cost, power, and efficiency, even though data reuse comes with real informatics, scientific culture, societal, and regulatory issues.
SUMMARY Data reuse is becoming more compelling. When contemplating new research for any purpose other than RCTs for efficacy, it is prudent to consider whether partnership with existing data holders should be part of the optimal research plan.
Affiliation(s)
- Ted D. Wade
  - National Jewish Health, Denver, Colorado, USA
753
Denny JC, Bastarache L, Ritchie MD, Carroll RJ, Zink R, Mosley JD, Field JR, Pulley JM, Ramirez AH, Bowton E, Basford MA, Carrell DS, Peissig PL, Kho AN, Pacheco JA, Rasmussen LV, Crosslin DR, Crane PK, Pathak J, Bielinski SJ, Pendergrass SA, Xu H, Hindorff LA, Li R, Manolio TA, Chute CG, Chisholm RL, Larson EB, Jarvik GP, Brilliant MH, McCarty CA, Kullo IJ, Haines JL, Crawford DC, Masys DR, Roden DM. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol 2014; 31:1102-10. [PMID: 24270849 DOI: 10.1038/nbt.2749] [Citation(s) in RCA: 727] [Impact Index Per Article: 66.1] [Received: 07/18/2013] [Accepted: 10/21/2013] [Indexed: 02/06/2023]
Abstract
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
754
Rosenbloom ST, Harris P, Pulley J, Basford M, Grant J, DuBuisson A, Rothman RL. The Mid-South clinical Data Research Network. J Am Med Inform Assoc 2014; 21:627-32. [PMID: 24821742 PMCID: PMC4078290 DOI: 10.1136/amiajnl-2014-002745] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 11/04/2022]
Abstract
The Mid-South Clinical Data Research Network (CDRN) encompasses three large health systems: (1) Vanderbilt Health System (VU) with electronic medical records for over 2 million patients, (2) the Vanderbilt Healthcare Affiliated Network (VHAN) which currently includes over 40 hospitals, hundreds of ambulatory practices, and over 3 million patients in the Mid-South, and (3) Greenway Medical Technologies, with access to 24 million patients nationally. Initial goals of the Mid-South CDRN include: (1) expansion of our VU data network to include the VHAN and Greenway systems, (2) developing data integration/interoperability across the three systems, (3) improving our current tools for extracting clinical data, (4) optimization of tools for collection of patient-reported data, and (5) expansion of clinical decision support. By 18 months, we anticipate our CDRN will robustly support projects in comparative effectiveness research, pragmatic clinical trials, and other key research areas and have the capacity to share data and health information technology tools nationally.
Affiliation(s)
- S Trent Rosenbloom
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Paul Harris
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jill Pulley
  - Office of Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Office of Personalized Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Melissa Basford
  - Office of Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jason Grant
  - Vanderbilt Health Affiliated Network, Nashville, Tennessee, USA
- Russell L Rothman
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Center for Health Services Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
755
Sleiman P, Bradfield J, Mentch F, Almoguera B, Connolly J, Hakonarson H. Assessing the functional consequence of loss of function variants using electronic medical record and large-scale genomics consortium efforts. Front Genet 2014; 5:105. [PMID: 24808909 PMCID: PMC4010747 DOI: 10.3389/fgene.2014.00105] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/14/2014] [Accepted: 04/10/2014] [Indexed: 01/25/2023]
Abstract
Estimates from large-scale genome sequencing studies indicate that each human carries up to 20 genetic variants that are predicted to result in loss of function (LOF) of protein-coding genes. While some are known disease-causing variants or common, tolerated LOFs in non-essential genes, the majority remain of unknown consequence. We explore the possibility of using imputed GWAS data from large biorepositories such as the electronic medical record and genomics (eMERGE) consortium to determine the effects of rare LOFs. Here, we show that two hypocholesterolemia-associated LOF mutations in the PCSK9 gene can be accurately imputed into large-scale GWAS datasets, which raises the possibility of assessing LOFs through genomics-linked medical records.
Affiliation(s)
- Patrick Sleiman
  - Center for Applied Genomics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jonathan Bradfield
  - Center for Applied Genomics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Frank Mentch
  - Center for Applied Genomics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Berta Almoguera
  - Center for Applied Genomics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- John Connolly
  - Center for Applied Genomics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Hakon Hakonarson
  - Center for Applied Genomics, Abramson Research Center, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
756
Mitchell SL, Hall JB, Goodloe RJ, Boston J, Farber-Eger E, Pendergrass SA, Bush WS, Crawford DC. Investigating the relationship between mitochondrial genetic variation and cardiovascular-related traits to develop a framework for mitochondrial phenome-wide association studies. BioData Min 2014; 7:6. [PMID: 24731735 PMCID: PMC4021623 DOI: 10.1186/1756-0381-7-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 10/08/2013] [Accepted: 04/05/2014] [Indexed: 11/12/2022]
Abstract
Background Mitochondria play a critical role in the cell and have DNA independent of the nuclear genome. There is much evidence that mitochondrial DNA (mtDNA) variation plays a role in human health and disease; however, this area of investigation has lagged behind research into the role of nuclear genetic variation in complex traits and phenotypic outcomes. Phenome-wide association studies (PheWAS) investigate the association between a wide range of traits and genetic variation. To date, this approach has not been used to investigate the relationship between mtDNA variants and phenotypic variation. Herein, we describe the development of a PheWAS framework for mtDNA variants (mt-PheWAS). Using the Metabochip custom genotyping array, nuclear and mitochondrial DNA variants were genotyped in 11,519 African Americans from the Vanderbilt University biorepository, BioVU. We employed both polygenic modeling and association testing with mitochondrial single nucleotide polymorphisms (mtSNPs) to explore the relationship between mtDNA variants and a group of eight cardiovascular-related traits obtained from de-identified electronic medical records within BioVU.
Results Using polygenic modeling we found evidence for an effect of mtDNA variation on total cholesterol and type 2 diabetes (T2D). After performing comprehensive mitochondrial single SNP associations, we identified an increased number of single mtSNP associations with total cholesterol and T2D compared to the other phenotypes examined, which did not have more significantly associated SNPs than would be expected by chance. Among the mtSNPs significantly associated with T2D we identified variant mt16189, an association previously reported only in Asian and European-descent populations.
Conclusions Our replication of previous findings and identification of novel associations from this initial study suggest that our mt-PheWAS approach is robust for investigating the relationship between mitochondrial genetic variation and a range of phenotypes, providing a framework for future mt-PheWAS.
Affiliation(s)
- Sabrina L Mitchell
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jacob B Hall
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Robert J Goodloe
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jonathan Boston
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Eric Farber-Eger
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Sarah A Pendergrass
  - Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- William S Bush
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Dana C Crawford
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
757
Carroll RJ, Bastarache L, Denny JC. R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics 2014; 30:2375-6. [PMID: 24733291 DOI: 10.1093/bioinformatics/btu197] [Citation(s) in RCA: 309] [Impact Index Per Article: 28.1] [Indexed: 11/14/2022]
Abstract
Phenome-wide association studies (PheWAS) have been used to replicate known genetic associations and discover new phenotype associations for genetic variants. This PheWAS implementation allows users to translate ICD-9 codes to PheWAS case and control groups, perform analyses using these and/or other phenotypes with covariate adjustments and plot the results. We demonstrate the methods by replicating a PheWAS on rs3135388 (near HLA-DRB, associated with multiple sclerosis) and performing a novel PheWAS using an individual's maximum white blood cell count (WBC) as a continuous measure. Our results for rs3135388 replicate known associations with more significant results than the original study on the same dataset. Our PheWAS of WBC found expected results, including associations with infections, myeloproliferative diseases and associated conditions, such as anemia. These results demonstrate the performance of the improved classification scheme and the flexibility of PheWAS encapsulated in this package.
AVAILABILITY AND IMPLEMENTATION This R package is freely available under the GNU General Public License (GPL-3) from http://phewascatalog.org. It is implemented in native R and is platform independent.
Affiliation(s)
- Robert J Carroll
  - Department of Biomedical Informatics and Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN 37212, USA
- Lisa Bastarache
  - Department of Biomedical Informatics and Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN 37212, USA
- Joshua C Denny
  - Department of Biomedical Informatics and Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN 37212, USA
758
Wei WQ, Feng Q, Weeke P, Bush W, Waitara MS, Iwuchukwu OF, Roden DM, Wilke RA, Stein CM, Denny JC. Creation and Validation of an EMR-based Algorithm for Identifying Major Adverse Cardiac Events while on Statins. AMIA Jt Summits Transl Sci Proc 2014; 2014:112-9. [PMID: 25717410 PMCID: PMC4333709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Statin medications are often prescribed to reduce a patient's risk of cardiovascular events, due in part to cholesterol reduction. We developed and evaluated an algorithm that can accurately identify subjects with major adverse cardiac events (MACE) while on statins using electronic medical record (EMR) data. The algorithm also identifies subjects experiencing their first MACE while on statins for primary prevention. The algorithm achieved positive predictive values (PPVs) of 90% to 97% in identifying MACE cases when compared against physician review. By applying the algorithm to EMR data in BioVU, cases and controls were identified and subsequently used to replicate known associations with eight genetic variants. We replicated 6/8 previously reported genetic associations with cardiovascular diseases or lipid metabolism disorders. Our results demonstrated that the algorithm can be used to accurately identify subjects with MACE and MACE while on statins. Consequently, future studies can be conducted to investigate and validate the relationship between statins and MACE using real-world clinical data.
Affiliation(s)
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
- Qiping Feng
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
- Peter Weeke
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
- William Bush
  - Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN
- Magarya S. Waitara
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
- Otito F. Iwuchukwu
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
- Dan M. Roden
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
  - Oates Institute for Experimental Therapeutics, Vanderbilt University, Nashville, TN
  - Office of Personalized Medicine, Vanderbilt University, Nashville, TN
- Charles M Stein
  - Division of Clinical Pharmacology, Vanderbilt University School of Medicine, Nashville, TN
- Joshua C. Denny
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
759
Adamusiak T, Shimoyama M. EHR-based phenome wide association study in pancreatic cancer. AMIA Jt Summits Transl Sci Proc 2014; 2014:9-15. [PMID: 25717392 PMCID: PMC4333703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
BACKGROUND Pancreatic cancer is one of the most common causes of cancer-related deaths in the United States; it is difficult to detect early and typically has a very poor prognosis. We present a novel method of large-scale clinical hypothesis generation based on a phenome wide association study performed using Electronic Health Records (EHR) in a pancreatic cancer cohort.
METHODS The study population consisted of 1,154 patients diagnosed with malignant neoplasm of pancreas seen at The Froedtert & The Medical College of Wisconsin academic medical center between the years 2004 and 2013. We evaluated death of a patient as the primary clinical outcome and tested its association with the phenome, which consisted of over 2.5 million structured clinical observations extracted from the EHR including labs, medications, phenotypes, diseases and procedures. The individual observations were encoded in the EHR using 6,617 unique ICD-9, CPT-4, LOINC, and RxNorm codes. We remapped this initial code set into UMLS concepts and then hierarchically expanded it to support generalization into the final set of 10,164 clinical concepts, which formed the final phenome. We then tested all possible pairwise associations between any of the original 10,164 concepts and death as the primary outcome.
RESULTS After correcting for multiple testing and folding back (generalizing) child concepts where appropriate, we found 231 concepts to be significantly associated with death in the study population.
CONCLUSIONS With the abundance of structured EHR data, phenome wide association studies combined with knowledge engineering can be a viable method of rapid hypothesis generation.
Affiliation(s)
- Tomasz Adamusiak
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI
- Mary Shimoyama
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI
  - Department of Surgery, Medical College of Wisconsin, Milwaukee, WI
760
Masanz J, Pakhomov SV, Xu H, Wu ST, Chute CG, Liu H. Open Source Clinical NLP - More than Any Single System. AMIA Jt Summits Transl Sci Proc 2014; 2014:76-82. [PMID: 25954581 PMCID: PMC4419764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
The number of Natural Language Processing (NLP) tools and systems for processing clinical free-text has grown as interest and processing capability have surged. Unfortunately, any two systems typically cannot simply interoperate, even when both are built upon a framework designed to facilitate the creation of pluggable components. We present two ongoing activities promoting open source clinical NLP. The Open Health Natural Language Processing (OHNLP) Consortium was originally founded to foster a collaborative community around clinical NLP, releasing UIMA-based open source software. OHNLP's mission currently includes maintaining a catalog of clinical NLP software and providing interfaces to simplify the interaction of NLP systems. Meanwhile, Apache cTAKES aims to integrate best-of-breed annotators, providing a world-class NLP system for accessing clinical information within free-text. These two activities are complementary: OHNLP promotes open source clinical NLP activities in the research community, and Apache cTAKES bridges research to health information technology (HIT) practice.
Affiliation(s)
- Serguei V. Pakhomov
  - College of Pharmacy and Institute for Health Informatics, University of Minnesota
- Hua Xu
  - School of Biomedical Informatics in The University of Texas Health Science Center at Houston
761
Hancock-Cerutti W, Rader DJ. Opposing effects of ABCG5/8 function on myocardial infarction and gallstone disease. J Am Coll Cardiol 2014; 63:2129-2130. [PMID: 24657684 DOI: 10.1016/j.jacc.2014.02.553] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/15/2014] [Revised: 02/12/2014] [Accepted: 02/17/2014] [Indexed: 11/26/2022]
Affiliation(s)
- William Hancock-Cerutti
  - Department of Medicine and Cardiovascular Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
- Daniel J Rader
  - Department of Medicine and Cardiovascular Institute, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania
762
Manolio TA, Green ED. Leading the way to genomic medicine. Am J Med Genet C Semin Med Genet 2014; 166C:1-7. [PMID: 24619573 DOI: 10.1002/ajmg.c.31384] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 01/26/2023]
Abstract
The National Human Genome Research Institute, in close collaboration with its research community, is pursuing an ambitious research agenda to facilitate and promote the implementation of genomics in clinical care. Since 2011, research programs utilizing next-generation sequencing in the management of cancer and other multigenic conditions, workup of undiagnosed conditions, and evaluation of disorders of the newborn period have been initiated, along with projects identifying clinically actionable variants and exploring the ethical and social implications of reporting these findings. Several genomic medicine symposia and other consultations have helped to shape these research initiatives and develop educational materials for physicians and others working to implement the use of genomic findings in clinical care. These efforts provide a valuable complement to the highly successful basic genomics research enterprise that has at last enabled the transition of genomics from the bench to the bedside.
763
Hebbring SJ. The challenges, advantages and future of phenome-wide association studies. Immunology 2014; 141:157-65. [PMID: 24147732 PMCID: PMC3904236 DOI: 10.1111/imm.12195] [Citation(s) in RCA: 106] [Impact Index Per Article: 9.6] [Received: 10/03/2013] [Revised: 10/16/2013] [Accepted: 10/17/2013] [Indexed: 12/11/2022]
Abstract
Over the last decade, significant technological breakthroughs have revolutionized human genomic research in the form of genome-wide association studies (GWASs). GWASs have identified thousands of statistically significant genetic variants associated with hundreds of human conditions including many with immunological aetiologies (e.g. multiple sclerosis, ankylosing spondylitis and rheumatoid arthritis). Unfortunately, most GWASs fail to identify clinically significant associations. Identifying biologically significant variants by GWAS also presents a challenge. The GWAS is a phenotype-to-genotype approach. As a complementary/alternative approach to the GWAS, investigators have begun to exploit extensive electronic medical record systems to conduct a genotype-to-phenotype approach when studying human disease – specifically, the phenome-wide association study (PheWAS). Although the PheWAS approach is in its infancy, this method has already demonstrated its capacity to rediscover important genetic associations related to immunological diseases/conditions. Furthermore, PheWAS has the advantage of identifying genetic variants with pleiotropic properties. This is particularly relevant for HLA variants. For example, PheWAS results have demonstrated that the HLA-DRB1 variant associated with multiple sclerosis may also be associated with erythematous conditions including rosacea. Likewise, PheWAS has demonstrated that the HLA-B genotype is not only associated with spondylopathies, uveitis, and variability in platelet count, but may also play an important role in other conditions, such as mastoiditis. This review will discuss and compare general PheWAS methodologies, describe both the challenges and advantages of the PheWAS, and provide insight into the potential directions in which PheWAS may lead.
Affiliation(s)
- Scott J Hebbring
  - Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA
764
Perotte A, Pivovarov R, Natarajan K, Weiskopf N, Wood F, Elhadad N. Diagnosis code assignment: models and evaluation metrics. J Am Med Inform Assoc 2014; 21:231-7. [PMID: 24296907 PMCID: PMC3932472 DOI: 10.1136/amiajnl-2013-002159] [Citation(s) in RCA: 82] [Impact Index Per Article: 7.5] [Received: 07/01/2013] [Revised: 11/11/2013] [Accepted: 11/12/2013] [Indexed: 12/04/2022]
Abstract
BACKGROUND AND OBJECTIVE The volume of healthcare data is growing rapidly with the adoption of health information technology. We focus on automated ICD9 code assignment from discharge summary content and methods for evaluating such assignments.
METHODS We study ICD9 diagnosis codes and discharge summaries from the publicly available Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC II) repository. We experiment with two coding approaches: one that treats each ICD9 code independently of the others (flat classifier), and one that incorporates the hierarchical nature of ICD9 codes into its modeling (hierarchy-based classifier). We propose novel evaluation metrics, which reflect the distances among gold-standard and predicted codes and their locations in the ICD9 tree. Experimental setup, code for modeling, and evaluation scripts are made available to the research community.
RESULTS The hierarchy-based classifier outperforms the flat classifier with F-measures of 39.5% and 27.6%, respectively, when trained on 20,533 documents and tested on 2,282 documents. While recall is improved at the expense of precision, our novel evaluation metrics show a more refined assessment: for instance, the hierarchy-based classifier identifies the correct sub-tree of gold-standard codes more often than the flat classifier. Error analysis reveals that gold-standard codes are not perfect, and as such the recall and precision are likely underestimated.
CONCLUSIONS Hierarchy-based classification yields better ICD9 coding than flat classification for MIMIC patients. Automated ICD9 coding is an example of a task for which data and tools can be shared and for which the research community can work together to build on shared models and advance the state of the art.
Affiliation(s)
- Adler Perotte
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Rimma Pivovarov
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Karthik Natarajan
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- NewYork Presbyterian Hospital, New York, New York, USA
| | - Nicole Weiskopf
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Frank Wood
- Department of Engineering, University of Oxford, Oxford, UK
| | - Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| |
|
765
|
Abstract
Psychopathology research has focused either on the analysis of the mental state in the here and now or on the synthesis of mental status abnormalities with biological markers and outcome data. These two schools of psychopathology, the analytic and the synthetic, make contrasting assumptions, take different approaches, and pursue divergent goals. Analytic psychopathology favors the individual person and unique biography, whereas synthetic psychopathology abstracts from the single case and generalizes to the population level. The dimension of time, especially the prediction of future outcomes, is viewed differently by these two schools. Here I outline how Carpenter's proposal of strong inference and theory testing in psychopathology research can be used to test the value of analytic and synthetic psychopathology. The emerging field of personalized psychiatry can clarify the relevance of psychopathology for contemporary research in psychiatry.
Affiliation(s)
- Stephan Heckers
- *To whom correspondence should be addressed; Department of Psychiatry, Vanderbilt Psychiatric Hospital, 1601 23rd Avenue South, Room 3060, Nashville, TN 37212, US; tel: 615-322-2665, fax: 615-343-8400, e-mail:
| |
|
766
|
Secondary use of clinical data: the Vanderbilt approach. J Biomed Inform 2014; 52:28-35. [PMID: 24534443 DOI: 10.1016/j.jbi.2014.02.003] [Citation(s) in RCA: 196] [Impact Index Per Article: 17.8] [Received: 07/21/2013] [Revised: 12/21/2013] [Accepted: 02/04/2014] [Indexed: 01/04/2023]
Abstract
The last decade has seen an exponential growth in the quantity of clinical data collected nationwide, triggering an increase in opportunities to reuse the data for biomedical research. The Vanderbilt research data warehouse framework consists of identified and de-identified clinical data repositories, fee-for-service custom services, and tools built atop the data layer to assist researchers across the enterprise. Providing resources dedicated to research initiatives benefits not only the research community, but also clinicians, patients and institutional leadership. This work provides a summary of our approach in the secondary use of clinical data for research domain, including a description of key components and a list of lessons learned, designed to assist others assembling similar services and infrastructure.
|
767
|
Affiliation(s)
- Brahim Aissani
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| |
|
768
|
Doshi-Velez F, Ge Y, Kohane I. Comorbidity clusters in autism spectrum disorders: an electronic health record time-series analysis. Pediatrics 2014; 133:e54-63. [PMID: 24323995 PMCID: PMC3876178 DOI: 10.1542/peds.2013-0819] [Citation(s) in RCA: 248] [Impact Index Per Article: 22.5] [Indexed: 11/24/2022] Open
Abstract
OBJECTIVE The distinct trajectories of patients with autism spectrum disorders (ASDs) have not been extensively studied, particularly regarding clinical manifestations beyond the neurobehavioral criteria from the Diagnostic and Statistical Manual of Mental Disorders. The objective of this study was to investigate the patterns of co-occurrence of medical comorbidities in ASDs. METHODS International Classification of Diseases, Ninth Revision codes from patients aged at least 15 years with a diagnosis of ASD were obtained from electronic medical records. These codes were aggregated by using phenotype-wide association studies categories and processed into 1350-dimensional vectors describing the counts of the most common categories in 6-month blocks between the ages of 0 and 15. Hierarchical clustering was used to identify subgroups with distinct courses. RESULTS Four subgroups were identified. The first was characterized by seizures (n = 120, subgroup prevalence 77.5%). The second (n = 197) was characterized by multisystem disorders including gastrointestinal disorders (prevalence 24.3%) and auditory disorders and infections (prevalence 87.8%), and the third was characterized by psychiatric disorders (n = 212, prevalence 33.0%). The last group (n = 4316) could not be further resolved. The prevalence of psychiatric disorders was uncorrelated with seizure activity (P = .17), but a significant correlation existed between gastrointestinal disorders and seizures (P < .001). The correlation results were replicated by using a second sample of 496 individuals from a different geographic region. CONCLUSIONS Three distinct patterns of medical trajectories were identified by unsupervised clustering of electronic health record diagnoses. These may point to distinct etiologies with different genetic and environmental contributions. Additional clinical and molecular characterizations will be required to further delineate these subgroups.
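The vector construction described in the methods can be sketched as follows. The abstract gives only the total dimension (1350) and the 6-month blocks over ages 0 to 15. Reading that as 45 categories × 30 blocks is an assumption made here for illustration, as are the function name `count_vector` and the category-major layout.

```python
from typing import Iterable, List, Tuple

YEARS = 15            # ages 0 to 15, from the abstract
BLOCKS_PER_YEAR = 2   # 6-month blocks, from the abstract
N_BLOCKS = YEARS * BLOCKS_PER_YEAR  # 30 blocks per category


def count_vector(events: Iterable[Tuple[float, str]],
                 categories: List[str]) -> List[int]:
    """Count diagnosis events per (category, 6-month block).

    events: (age_in_years, category) pairs for one patient.
    categories: the retained categories; 45 of them would give 1350 dimensions
    (an assumed reading of the abstract, not a stated fact).
    """
    index = {category: i for i, category in enumerate(categories)}
    vector = [0] * (len(categories) * N_BLOCKS)
    for age, category in events:
        # Skip categories that were not retained and ages outside the window.
        if category not in index or not (0 <= age < YEARS):
            continue
        block = int(age * BLOCKS_PER_YEAR)
        vector[index[category] * N_BLOCKS + block] += 1
    return vector
```

Vectors built this way, one per patient, are the sort of input a hierarchical clustering routine would take. The clustering step itself is left out because the abstract does not specify the linkage or distance used.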
Affiliation(s)
- Finale Doshi-Velez
- Center for Biomedical Informatics, Harvard Medical School, 10 Shattuck St, Boston, MA 02115.
| | - Yaorong Ge
- Center for Biomedical Informatics, Wake Forest University, Winston-Salem, North Carolina
| | - Isaac Kohane
- Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts; and
| |
|
769
|
Abstract
The growing amount and availability of electronic health record (EHR) data present enhanced opportunities for discovering new knowledge about diseases. In the past decade, there has been an increasing number of data and text mining studies focused on the identification of disease associations (e.g., disease-disease, disease-drug, and disease-gene) in structured and unstructured EHR data. This chapter presents a knowledge discovery framework for mining the EHR for disease knowledge and describes each step for data selection, preprocessing, transformation, data mining, and interpretation/validation. Topics including natural language processing, standards, and data privacy and security are also discussed in the context of this framework.
Affiliation(s)
- Elizabeth S Chen
- Center for Clinical and Translational Science, University of Vermont, Burlington, VT, USA,
| | | |
|
770
|
Hall MA, Dudek SM, Goodloe R, Crawford DC, Pendergrass SA, Peissig P, Brilliant M, McCarty CA, Ritchie MD. Environment-wide association study (EWAS) for type 2 diabetes in the Marshfield Personalized Medicine Research Project Biobank. Pacific Symposium on Biocomputing 2014:200-211. [PMID: 24297547 PMCID: PMC4037237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Environment-wide association studies (EWAS) provide a way to uncover the environmental mechanisms involved in complex traits in a high-throughput manner. Genome-wide association studies have led to the discovery of genetic variants associated with many common diseases but do not take into account the environmental component of complex phenotypes. This EWAS assesses the comprehensive association between environmental variables and the outcome of type 2 diabetes (T2D) in the Marshfield Personalized Medicine Research Project Biobank (Marshfield PMRP). We sought replication in two National Health and Nutrition Examination Surveys (NHANES). The Marshfield PMRP currently uses four tools for measuring environmental exposures and outcome traits: 1) the PhenX Toolkit, which includes standardized exposure and phenotypic measures across several domains; 2) the Diet History Questionnaire (DHQ), a food frequency questionnaire; 3) the Measurement of a Person's Habitual Physical Activity, which scores the level of an individual's physical activity; and 4) electronic health records (EHR), to which validated algorithms are applied to establish T2D case-control status. Using PLATO software, 314 environmental variables were tested for association with T2D using logistic regression, adjusting for sex, age, and BMI in over 2,200 European Americans. When available, similar variables were tested with the same methods and adjustment in samples from NHANES III and NHANES 1999-2002. Twelve and 31 associations were identified in the Marshfield samples at p<0.01 and p<0.05, respectively. Seven and 13 measures replicated in at least one of the NHANES surveys at p<0.01 and p<0.05, respectively, with the same direction of effect. The most significant environmental exposures associated with T2D status included decreased alcohol use as well as increased smoking exposure in childhood and adulthood.
The results demonstrate the utility of the EWAS method and survey tools for identifying environmental components of complex diseases like type 2 diabetes. These high-throughput and comprehensive investigation methods can easily be applied to investigate the relation between environmental exposures and multiple phenotypes in future analyses.
Affiliation(s)
- MOLLY A. HALL
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 512 Wartik Lab, University Park, PA 16802, USA
| | - SCOTT M. DUDEK
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 512 Wartik Lab, University Park, PA 16802, USA
| | - ROBERT GOODLOE
- Center for Human Genetics Research, Vanderbilt University, Nashville TN, 37232, USA
| | - DANA C. CRAWFORD
- Center for Human Genetics Research, Vanderbilt University, Nashville TN, 37232, USA
| | - SARAH A. PENDERGRASS
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 503 Wartik Lab, University Park, PA 16802, USA
| | | | | | | | - MARYLYN D. RITCHIE
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 512 Wartik Lab, University Park, PA 16802, USA
| |
|
771
|
Cronin RM, Field JR, Bradford Y, Shaffer CM, Carroll RJ, Mosley JD, Bastarache L, Edwards TL, Hebbring SJ, Lin S, Hindorff LA, Crane PK, Pendergrass SA, Ritchie MD, Crawford DC, Pathak J, Bielinski SJ, Carrell DS, Crosslin DR, Ledbetter DH, Carey DJ, Tromp G, Williams MS, Larson EB, Jarvik GP, Peissig PL, Brilliant MH, McCarty CA, Chute CG, Kullo IJ, Bottinger E, Chisholm R, Smith ME, Roden DM, Denny JC. Phenome-wide association studies demonstrating pleiotropy of genetic variants within FTO with and without adjustment for body mass index. Front Genet 2014; 5:250. [PMID: 25177340 PMCID: PMC4134007 DOI: 10.3389/fgene.2014.00250] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 05/15/2014] [Accepted: 07/10/2014] [Indexed: 01/29/2023] Open
Abstract
Phenome-wide association studies (PheWAS) have demonstrated utility in validating genetic associations derived from traditional genetic studies as well as identifying novel genetic associations. Here we used an electronic health record (EHR)-based PheWAS to explore pleiotropy of genetic variants in the fat mass and obesity associated gene (FTO), some of which have been previously associated with obesity and type 2 diabetes (T2D). We used a population of 10,487 individuals of European ancestry with genome-wide genotyping from the Electronic Medical Records and Genomics (eMERGE) Network and another population of 13,711 individuals of European ancestry from the BioVU DNA biobank at Vanderbilt genotyped using Illumina HumanExome BeadChip. A meta-analysis of the two study populations replicated the well-described associations between FTO variants and obesity (odds ratio [OR] = 1.25, 95% Confidence Interval = 1.11-1.24, p = 2.10 × 10⁻⁹) and FTO variants and T2D (OR = 1.14, 95% CI = 1.08-1.21, p = 2.34 × 10⁻⁶). The meta-analysis also demonstrated that FTO variant rs8050136 was significantly associated with sleep apnea (OR = 1.14, 95% CI = 1.07-1.22, p = 3.33 × 10⁻⁵); however, the association was attenuated after adjustment for body mass index (BMI). Novel phenotype associations with obesity-associated FTO variants included fibrocystic breast disease (rs9941349, OR = 0.81, 95% CI = 0.74-0.91, p = 5.41 × 10⁻⁵) and trends toward associations with non-alcoholic liver disease and gram-positive bacterial infections. FTO variants not associated with obesity demonstrated other potential disease associations including non-inflammatory disorders of the cervix and chronic periodontitis. These results suggest that genetic variants in FTO may have pleiotropic associations, some of which are not mediated by obesity.
Affiliation(s)
- Robert M. Cronin
- Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- *Correspondence: Robert M. Cronin, Department of Biomedical Informatics, Vanderbilt University Medical Center, 220 Garland 440 EBL, Nashville, TN 37232, USA e-mail:
| | - Julie R. Field
- Office of Research, Vanderbilt University, Nashville, TN, USA
| | - Yuki Bradford
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
| | - Christian M. Shaffer
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
| | | | - Jonathan D. Mosley
- Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - Todd L. Edwards
- Vanderbilt Epidemiology Center, Vanderbilt University, Nashville, TN, USA
| | - Scott J. Hebbring
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - Simon Lin
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - Lucia A. Hindorff
- Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD, USA
| | - Paul K. Crane
- Department of Medicine, University of Washington, Seattle, WA, USA
| | - Sarah A. Pendergrass
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, PA, USA
| | - Marylyn D. Ritchie
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, PA, USA
| | - Dana C. Crawford
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
| | - Jyotishman Pathak
- Divisions of Biomedical Informatics and Statistics, Mayo Clinic, Rochester, MN, USA
| | | | | | - David R. Crosslin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | | | - David J. Carey
- Weis Center for Research, Geisinger Health System, Danville, PA, USA
| | - Gerard Tromp
- Weis Center for Research, Geisinger Health System, Danville, PA, USA
| | - Marc S. Williams
- Genomic Medicine Institute, Geisinger Health System, Danville, PA, USA
| | | | - Gail P. Jarvik
- Department of Medicine, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Peggy L. Peissig
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - Murray H. Brilliant
- Center for Human Genetics, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | | | - Christopher G. Chute
- Divisions of Biomedical Informatics and Statistics, Mayo Clinic, Rochester, MN, USA
| | | | - Erwin Bottinger
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Rex Chisholm
- Department of Cell and Molecular Biology, Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Maureen E. Smith
- Department of Cell and Molecular Biology, Feinberg School of Medicine, Northwestern University, Evanston, IL, USA
| | - Dan M. Roden
- Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Joshua C. Denny
- Department of Medicine, Vanderbilt University, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Joshua C. Denny, Department of Biomedical Informatics and Department of Medicine, Vanderbilt University Medical Center, 2525 West End Avenue, Suite 600, Nashville, TN 37203-8820, USA e-mail:
| |
|
772
|
Tyler AL, Crawford DC, Pendergrass SA. Detecting and Characterizing Pleiotropy: New Methods for Uncovering the Connection Between the Complexity of Genomic Architecture and Multiple Phenotypes. Pacific Symposium on Biocomputing 2014:183-187. [PMID: 25072629 PMCID: PMC4108263 DOI: 10.1142/9789814583220_0018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/17/2023]
Affiliation(s)
| | - Dana C. Crawford
- Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN 37240, USA
| | - Sarah A. Pendergrass
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
| |
|
773
|
Shameer K, Denny JC, Ding K, Jouni H, Crosslin DR, de Andrade M, Chute CG, Peissig P, Pacheco JA, Li R, Bastarache L, Kho AN, Ritchie MD, Masys DR, Chisholm RL, Larson EB, McCarty CA, Roden DM, Jarvik GP, Kullo IJ. A genome- and phenome-wide association study to identify genetic variants influencing platelet count and volume and their pleiotropic effects. Hum Genet 2014; 133:95-109. [PMID: 24026423 PMCID: PMC3880605 DOI: 10.1007/s00439-013-1355-7] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.9] [Received: 05/08/2013] [Accepted: 08/22/2013] [Indexed: 12/21/2022]
Abstract
Platelets are enucleated cell fragments derived from megakaryocytes that play key roles in hemostasis and in the pathogenesis of atherothrombosis and cancer. Platelet traits are highly heritable, and identifying genetic variants associated with platelet traits and assessing their pleiotropic effects may help to understand the role of underlying biological pathways. We conducted an electronic medical record (EMR)-based study to identify common variants that influence inter-individual variation in the number of circulating platelets (PLT) and mean platelet volume (MPV), by performing a genome-wide association study (GWAS). We characterized genetic variants associated with MPV and PLT using functional, pathway and disease enrichment analyses; we assessed pleiotropic effects of such variants by performing a phenome-wide association study (PheWAS) with a wide range of EMR-derived phenotypes. A total of 13,582 participants in the electronic MEdical Records and GEnomics (eMERGE) network had data for PLT and 6,291 participants had data for MPV. We identified five chromosomal regions associated with PLT and eight associated with MPV at genome-wide significance (P < 5E-8). In addition, we replicated 20 SNPs [out of 56 SNPs (α: 0.05/56 = 9E-4)] influencing PLT and 22 SNPs [out of 29 SNPs (α: 0.05/29 = 2E-3)] influencing MPV in a published meta-analysis of GWAS of PLT and MPV. While our GWAS did not find any new associations, our functional analyses revealed that genes in these regions influence thrombopoiesis and encode kinases, membrane proteins, proteins involved in cellular trafficking, transcription factors, proteasome complex subunits, proteins of signal transduction pathways, proteins involved in megakaryocyte development, and platelet production and hemostasis.
PheWAS using a single-SNP Bonferroni correction for 1,368 diagnoses (0.05/1368 = 3.6E-5) revealed that several variants in these genes have pleiotropic associations with myocardial infarction, autoimmune, and hematologic disorders. We conclude that multiple genetic loci influence interindividual variation in platelet traits and also have significant pleiotropic effects; the related genes are in multiple functional pathways including those relevant to thrombopoiesis.
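The significance thresholds quoted in this abstract are Bonferroni corrections, α divided by the number of tests. A minimal sketch of that arithmetic follows; the function name `bonferroni_threshold` and the dictionary labels are hypothetical, and the test counts are taken from the abstract.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Test counts as reported in the abstract.
THRESHOLDS = {
    "PLT replication (56 SNPs)": bonferroni_threshold(0.05, 56),
    "MPV replication (29 SNPs)": bonferroni_threshold(0.05, 29),
    "PheWAS (1,368 diagnoses)": bonferroni_threshold(0.05, 1368),
}
```

The unrounded values are approximately 8.9E-4, 1.7E-3, and 3.65E-5. The abstract reports them to one or two significant figures, as 9E-4, 2E-3, and 3.6E-5.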
Affiliation(s)
- Khader Shameer
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN 55905, USA
| | - Joshua C. Denny
- Departments of Medicine and Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA
| | - Keyue Ding
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN 55905, USA
| | - Hayan Jouni
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN 55905, USA
| | - David R. Crosslin
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Mariza de Andrade
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN 55905, USA
| | - Christopher G. Chute
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN 55905, USA
| | - Peggy Peissig
- Biomedical Informatics Research Center, Marshfield Clinic, Marshfield, WI, 54449, USA
| | - Jennifer A. Pacheco
- Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Rongling Li
- Office of Population Genomics, National Human Genome Research Institute, 5635 Fishers Lane, Suite 3058, MSC 9307, Bethesda, MD, 20892, USA
| | - Lisa Bastarache
- Departments of Medicine and Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA
| | - Abel N. Kho
- Department of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Marylyn D Ritchie
- Center for Systems Genomics, Pennsylvania State University, Eberly College of Science, The Huck Institutes of the Life Sciences, 512 Wartik Laboratory, University Park, PA 16802 USA
| | - Daniel R. Masys
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Room 416 Eskind Medical Library, Nashville, TN, 37232, USA
| | - Rex L. Chisholm
- Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Eric B. Larson
- Group Health Research Institute, 1730 Minor Avenue, Suite 1600, Seattle, WA, 98101, USA
| | | | - Dan M. Roden
- Department of Pharmacology, Vanderbilt University School of Medicine, 1285 Medical Research Building IV, Nashville, TN, 37232, USA
| | - Gail P. Jarvik
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle WA 98195, USA
| | - Iftikhar J. Kullo
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN 55905, USA
| |
|
774
|
Feng Q, Vickers KC, Anderson MP, Levin MG, Chen W, Harrison DG, Wilke RA. A common functional promoter variant links CNR1 gene expression to HDL cholesterol level. Nat Commun 2013; 4:1973. [PMID: 23748922 PMCID: PMC3873874 DOI: 10.1038/ncomms2973] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 10/10/2012] [Accepted: 05/08/2013] [Indexed: 12/17/2022] Open
Abstract
CB1 receptor blockers increase HDL-C levels. Although genetic variation in the CB1 receptor – encoded by the CNR1 gene – is known to influence HDL-C level as well, human studies conducted to date have been limited to genetic markers such as haplotype tagging SNPs. Here we identify rs806371 in the CNR1 promoter as the causal variant. We resequenced the CNR1 gene and genotyped all variants in a DNA biobank linked to comprehensive electronic medical records. By testing each variant for association with HDL-C level in a clinical practice-based setting, we localize a putative functional allele to a 100 bp window in the 5′-flanking region. Assessment of variants in this window for functional impact on electrophoretic mobility shift assay identified rs806371 as a novel regulatory binding element. Reporter gene assays confirm that rs806371 reduces gene expression, thereby linking CNR1 gene variation to HDL-C level in humans.
Affiliation(s)
- Q Feng
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA.
| | | | | | | | | | | | | |
|
775
|
Neuraz A, Chouchana L, Malamut G, Le Beller C, Roche D, Beaune P, Degoulet P, Burgun A, Loriot MA, Avillach P. Phenome-wide association studies on a quantitative trait: application to TPMT enzyme activity and thiopurine therapy in pharmacogenomics. PLoS Comput Biol 2013; 9:e1003405. [PMID: 24385893 PMCID: PMC3873228 DOI: 10.1371/journal.pcbi.1003405] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 05/31/2013] [Accepted: 11/08/2013] [Indexed: 02/04/2023] Open
Abstract
Phenome-Wide Association Studies (PheWAS) investigate whether genetic polymorphisms associated with a phenotype are also associated with other diagnoses. In this study, we have developed new methods to perform a PheWAS based on ICD-10 codes and biological test results, and to use a quantitative trait as the selection criterion. We tested our approach on thiopurine S-methyltransferase (TPMT) activity in patients treated by thiopurine drugs. We developed 2 aggregation methods for the ICD-10 codes: an ICD-10 hierarchy and a mapping to existing ICD-9-CM based PheWAS codes. Eleven biological test results were also analyzed using discretization algorithms. We applied these methods in patients having a TPMT activity assessment from the clinical data warehouse of a French academic hospital between January 2000 and July 2013. Data after initiation of thiopurine treatment were analyzed and patient groups were compared according to their TPMT activity level. A total of 442 patient records were analyzed representing 10,252 ICD-10 codes and 72,711 biological test results. The results from the ICD-9-CM based PheWAS codes and ICD-10 hierarchy codes were concordant. Cross-validation with the biological test results allowed us to validate the ICD phenotypes. Iron-deficiency anemia and diabetes mellitus were associated with a very high TPMT activity (p = 0.0004 and p = 0.0015, respectively). We describe here an original method to perform PheWAS on a quantitative trait using both ICD-10 diagnosis codes and biological test results to identify associated phenotypes. In the field of pharmacogenomics, PheWAS allow for the identification of new subgroups of patients who require personalized clinical and therapeutic management. The use of underlying molecular mechanisms and other factors to describe and classify diseases is a major challenge for future treatment strategies. New methods are needed to achieve this goal. 
The phenome-wide association study (PheWAS) methodology was initially developed to unveil unknown associations between a specific genetic status and phenotypic features (e.g. diagnoses from electronic health records). We propose to extend this method to assess the relationships between the levels of a quantitative trait and diagnosis codes. We also assess the relationships between this quantitative trait and the biological test results. We tested this method using the levels of enzymatic activity of thiopurine S-methyltransferase (TPMT), which is involved in the metabolism of thiopurine drugs used, for example, in inflammatory bowel diseases. We discovered an association between a very high TPMT activity and nutritional anemia and diabetes. These results could be used to describe a new subgroup of patients in order to optimize drug treatments.
Affiliation(s)
- Antoine Neuraz
- Biomedical Informatics and Public Health Department, University Hospital HEGP, AP-HP, Paris, France
- INSERM UMR_S 872 Team 22: Information Sciences to support Personalized Medicine, Université Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine, Paris, France
| | - Laurent Chouchana
- INSERM UMR-S 775, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - Georgia Malamut
- Gastroenterology Department, University Hospital HEGP, AP-HP, Paris, France
| | | | - Denis Roche
- Biochemistry, Pharmacogenetics and Molecular Oncology Unit, University Hospital HEGP, AP-HP, Paris, France
| | - Philippe Beaune
- INSERM UMR-S 775, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Biochemistry, Pharmacogenetics and Molecular Oncology Unit, University Hospital HEGP, AP-HP, Paris, France
| | - Patrice Degoulet
- Biomedical Informatics and Public Health Department, University Hospital HEGP, AP-HP, Paris, France
- INSERM UMR_S 872 Team 22: Information Sciences to support Personalized Medicine, Université Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine, Paris, France
| | - Anita Burgun
- Biomedical Informatics and Public Health Department, University Hospital HEGP, AP-HP, Paris, France
- INSERM UMR_S 872 Team 22: Information Sciences to support Personalized Medicine, Université Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine, Paris, France
| | - Marie-Anne Loriot
- INSERM UMR-S 775, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
- Biochemistry, Pharmacogenetics and Molecular Oncology Unit, University Hospital HEGP, AP-HP, Paris, France
| | - Paul Avillach
- Biomedical Informatics and Public Health Department, University Hospital HEGP, AP-HP, Paris, France
- INSERM UMR_S 872 Team 22: Information Sciences to support Personalized Medicine, Université Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine, Paris, France
- * E-mail:
| |
|
776
|
Mechanistic phenotypes: an aggregative phenotyping strategy to identify disease mechanisms using GWAS data. PLoS One 2013; 8:e81503. [PMID: 24349080 PMCID: PMC3861317 DOI: 10.1371/journal.pone.0081503] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 07/19/2013] [Accepted: 10/23/2013] [Indexed: 11/19/2022] Open
Abstract
A single mutation can alter cellular and global homeostatic mechanisms and give rise to multiple clinical diseases. We hypothesized that these disease mechanisms could be identified using low minor allele frequency (MAF<0.1) non-synonymous SNPs (nsSNPs) associated with “mechanistic phenotypes”, comprised of collections of related diagnoses. We studied two mechanistic phenotypes: (1) thrombosis, evaluated in a population of 1,655 African Americans; and (2) four groupings of cancer diagnoses, evaluated in 3,009 white European Americans. We tested associations between nsSNPs represented on GWAS platforms and mechanistic phenotypes ascertained from electronic medical records (EMRs), and sought enrichment in functional ontologies across the top-ranked associations. We used a two-step analytic approach whereby nsSNPs were first sorted by the strength of their association with a phenotype. We tested associations using two reverse genetic models and standard additive and recessive models. In the second step, we employed a hypothesis-free ontological enrichment analysis using the sorted nsSNPs to identify functional mechanisms underlying the diagnoses comprising the mechanistic phenotypes. The thrombosis phenotype was solely associated with ontologies related to blood coagulation (Fisher's p = 0.0001, FDR p = 0.03), driven by the F5, P2RY12 and F2RL2 genes. For the cancer phenotypes, the reverse genetics models were enriched in DNA repair functions (p = 2×10⁻⁵, FDR p = 0.03) (POLG/FANCI, SLX4/FANCP, XRCC1, BRCA1, FANCA, CHD1L) while the additive model showed enrichment related to chromatid segregation (p = 4×10⁻⁶, FDR p = 0.005) (KIF25, PINX1). We were able to replicate nsSNP associations for POLG/FANCI, BRCA1, FANCA and CHD1L in independent data sets. Mechanism-oriented phenotyping using collections of EMR-derived diagnoses can elucidate fundamental disease mechanisms.
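The second step of the approach, testing whether an ontology term is over-represented among the top-ranked nsSNPs, can be illustrated with a one-sided Fisher's exact test, which is the upper tail of a hypergeometric distribution. This is a generic sketch rather than the authors' pipeline: the function name `enrichment_p` and its parameters are hypothetical, and the abstract does not say how the top-ranked set was cut off.

```python
from math import comb


def enrichment_p(n_total: int, n_annotated: int, n_top: int, n_hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_hits).

    n_total: genes (or nsSNPs) in the background set
    n_annotated: background members carrying the ontology term
    n_top: size of the top-ranked set drawn from the background
    n_hits: top-ranked members carrying the term
    """
    largest_possible = min(n_annotated, n_top)
    # Sum the probability of every outcome at least as extreme as n_hits.
    favourable = sum(
        comb(n_annotated, k) * comb(n_total - n_annotated, n_top - k)
        for k in range(n_hits, largest_possible + 1)
    )
    return favourable / comb(n_total, n_top)
```

A call such as `enrichment_p(10, 3, 3, 3)` asks how likely it is that all three top-ranked items carry a term that only three of ten background items carry. The false discovery rate adjustment reported alongside the Fisher's p-values would be applied across all tested terms and is not shown.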
|
777
|
Abstract
Combining genotyping and the data locked in medical records yields a large number of known genotype-phenotype associations.
Affiliation(s)
- Nigam H Shah
- Nigam H. Shah is at the Center for Biomedical Informatics Research, Stanford, California, USA.
|
778
|
Boland MR, Hripcsak G, Shen Y, Chung WK, Weng C. Defining a comprehensive verotype using electronic health records for personalized medicine. J Am Med Inform Assoc 2013; 20:e232-8. [PMID: 24001516 PMCID: PMC3861934 DOI: 10.1136/amiajnl-2013-001932] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 04/15/2013] [Accepted: 08/12/2013] [Indexed: 11/04/2022] Open
Abstract
The burgeoning adoption of electronic health records (EHR) introduces a golden opportunity for studying individual manifestations of myriad diseases, which is called 'EHR phenotyping'. In this paper, we break down this concept by: relating it to phenotype definitions from Johannsen; comparing it to cohort identification and disease subtyping; introducing a new concept called 'verotype' (Latin: vere = true, actually) to represent the 'true' population of similar patients for treatment purposes through the integration of genotype, phenotype, and disease subtype (eg, specific glucose value pattern in patients with diabetes) information; analyzing the value of the 'verotype' concept for personalized medicine; and outlining the potential for using network-based approaches to reverse engineer clinical disease subtypes.
Affiliation(s)
- Mary Regina Boland
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Yufeng Shen
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Department of Systems Biology, Columbia University, New York, New York, USA
- Wendy K Chung
- Department of Pediatrics, Columbia University, New York, New York, USA
- Department of Medicine, Columbia University, New York, New York, USA
- The Irving Institute for Clinical and Translational Research, Columbia University, New York, New York, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- The Irving Institute for Clinical and Translational Research, Columbia University, New York, New York, USA
|
779
|
McPeek Hinz ER, Bastarache L, Denny JC. A natural language processing algorithm to define a venous thromboembolism phenotype. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:975-983. [PMID: 24551388 PMCID: PMC3900229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Deep venous thrombosis and pulmonary embolism are diseases associated with significant morbidity and mortality. Known risk factors account for only a slight majority of venous thromboembolic disease (VTE), with the remainder of the risk presumably related to unidentified genetic factors. We designed a general-purpose natural language processing (NLP) algorithm to retrospectively capture both acute and historical cases of thromboembolic disease in a de-identified electronic health record. Applying the NLP algorithm to a separate evaluation set yielded a positive predictive value of 84.7% and a sensitivity of 95.3%, for an F-measure of 0.897, which was similar to the training-set F-measure of 0.925. Applying the same algorithm to problem lists only, in patients without VTE ICD-9 codes, was found to be the best means of capturing historical cases, with a PPV of 83%. NLP of VTE ICD-9 positive cases and non-ICD-9 positive problem lists provides an effective means of capturing both acute and historical cases of venous thromboembolic disease.
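The reported F-measure can be checked from the other two figures, assuming it is the usual balanced F1 score, i.e. the harmonic mean of positive predictive value (precision) and sensitivity (recall). The abstract does not state the formula, so that is an assumption, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Evaluation-set figures quoted in the abstract: PPV 84.7%, sensitivity 95.3%.
evaluation_f = f_measure(0.847, 0.953)
print(round(evaluation_f, 3))  # 0.897, matching the value reported in the abstract
```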
Affiliation(s)
- Joshua C Denny
- Departments of Biomedical Informatics and Medicine, Vanderbilt University School of Medicine, Nashville, TN
|
780
|
Shivade C, Raghavan P, Fosler-Lussier E, Embi PJ, Elhadad N, Johnson SB, Lai AM. A review of approaches to identifying patient phenotype cohorts using electronic health records. J Am Med Inform Assoc 2013; 21:221-30. [PMID: 24201027 PMCID: PMC3932460 DOI: 10.1136/amiajnl-2013-001935] [Citation(s) in RCA: 316] [Impact Index Per Article: 26.3] [Indexed: 01/11/2023] Open
Abstract
Objective To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses.
Affiliation(s)
- Chaitanya Shivade
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA
|
781
|
Ye Z, Kalloo FS, Dalenberg AK, Kullo IJ. An electronic medical record-linked biorepository to identify novel biomarkers for atherosclerotic cardiovascular disease. Glob Cardiol Sci Pract 2013; 2013:82-90. [PMID: 24689004 PMCID: PMC3963733 DOI: 10.5339/gcsp.2013.10] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/12/2012] [Accepted: 03/06/2013] [Indexed: 12/04/2022] Open
Abstract
Background: Atherosclerotic vascular disease (AVD), a leading cause of morbidity and mortality, is increasing in prevalence in the developing world. We describe an approach to establish a biorepository linked to medical records with the eventual goal of facilitating discovery of biomarkers for AVD. Methods: The Vascular Disease Biorepository at Mayo Clinic was established to archive DNA, plasma, and serum from patients with suspected AVD. AVD phenotypes, relevant risk factors and comorbid conditions were ascertained by electronic medical record (EMR)-based electronic algorithms that included diagnosis and procedure codes, laboratory data and text searches to ascertain medication use. Results: Up to December 2012, 8800 patients referred for vascular ultrasound examination and non-invasive lower extremity arterial evaluation were approached, of whom 5268 consented. The mean age of the initial 2182 patients recruited was 70.4 ± 11.2 years, 62.6% were men and 97.6% were whites. The prevalences of AVD phenotypes were: carotid artery stenosis 48%, abdominal aortic aneurysm 21% and peripheral arterial disease 38%. Positive predictive values for electronic phenotyping algorithms were >0.90 for cases (and >0.95 for controls) for each AVD phenotype, using manual review of the EMR as the gold standard. The prevalences of risk factors and comorbidities were as follows: hypertension 78%, diabetes 29%, dyslipidemia 73%, smoking 70%, coronary heart disease 37%, heart failure 12%, cerebrovascular disease 20% and chronic kidney disease 19%. Conclusions: Our study demonstrates the feasibility of establishing a biorepository of plasma, serum and DNA, with relatively rapid annotation of clinical variables using EMR-based algorithms.
Affiliation(s)
- Fara S Kalloo
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Iftikhar J Kullo
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
|
782
|
Mullane K, Winquist RJ, Williams M. Translational paradigms in pharmacology and drug discovery. Biochem Pharmacol 2013; 87:189-210. [PMID: 24184503 DOI: 10.1016/j.bcp.2013.10.019] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 10/16/2013] [Accepted: 10/16/2013] [Indexed: 12/15/2022]
Abstract
The translational sciences represent the core element in enabling and utilizing the output from the biomedical sciences and to improving drug discovery metrics by reducing the attrition rate as compounds move from preclinical research to clinical proof of concept. Key to understanding the basis of disease causality and to developing therapeutics is an ability to accurately diagnose the disease and to identify and develop safe and effective therapeutics for its treatment. The former requires validated biomarkers and the latter, qualified targets. Progress has been hampered by semantic issues, specifically those that define the end product, and by scientific issues that include data reliability, an overt reductionistic cultural focus and a lack of hierarchically integrated data gathering and systematic analysis. A necessary framework for these activities is represented by the discipline of pharmacology, efforts and training in which require recognition and revitalization.
Affiliation(s)
- Kevin Mullane
- Profectus Pharma Consulting Inc., San Jose, CA, United States.
- Raymond J Winquist
- Department of Pharmacology, Vertex Pharmaceuticals Inc., Cambridge, MA, United States
- Michael Williams
- Department of Molecular Pharmacology and Biological Chemistry, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
|
783
|
Gundlapalli AV, Redd A, Carter M, Divita G, Shen S, Palmer M, Samore MH. Validating a strategy for psychosocial phenotyping using a large corpus of clinical text. J Am Med Inform Assoc 2013; 20:e355-64. [PMID: 24169276 DOI: 10.1136/amiajnl-2013-001946] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 12/16/2022] Open
Abstract
OBJECTIVE To develop algorithms to improve efficiency of patient phenotyping using natural language processing (NLP) on text data. Of a large number of note titles available in our database, we sought to determine those with highest yield and precision for psychosocial concepts. MATERIALS AND METHODS From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents from each of 218 enterprise note titles were chosen. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP), using a lexicon of relevant concepts with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized. RESULTS A total of 58 707 psychosocial concepts were identified from 316 355 documents for an overall hit rate of 0.2 concepts per document (median 0.1, range 1.6-0). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision for all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meaning of words. The sensitivity of the NLP system was noted to be 49% (95% CI 43% to 55%). CONCLUSIONS Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free text corpora with complex linguistic annotations in attempts to identify patients of a certain phenotype.
Affiliation(s)
- Adi V Gundlapalli
- IDEAS Center, VA Salt Lake City Health Care System, Salt Lake City, Utah, USA
|
784
|
Davis MF, Sriram S, Bush WS, Denny JC, Haines JL. Automated extraction of clinical traits of multiple sclerosis in electronic medical records. J Am Med Inform Assoc 2013; 20:e334-40. [PMID: 24148554 PMCID: PMC3861927 DOI: 10.1136/amiajnl-2013-001999] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 12/16/2022] Open
Abstract
Objectives The clinical course of multiple sclerosis (MS) is highly variable, and research data collection is costly and time consuming. We evaluated natural language processing techniques applied to electronic medical records (EMR) to identify MS patients and the key clinical traits of their disease course. Materials and methods We used four algorithms based on ICD-9 codes, text keywords, and medications to identify individuals with MS from a de-identified, research version of the EMR at Vanderbilt University. Using a training dataset of the records of 899 individuals, algorithms were constructed to identify and extract detailed information regarding the clinical course of MS from the text of the medical records, including clinical subtype, presence of oligoclonal bands, year of diagnosis, year and origin of first symptom, Expanded Disability Status Scale (EDSS) scores, timed 25-foot walk scores, and MS medications. Algorithms were evaluated on a test set validated by two independent reviewers. Results We identified 5789 individuals with MS. For all clinical traits extracted, precision was at least 87% and specificity was greater than 80%. Recall values for clinical subtype, EDSS scores, and timed 25-foot walk scores were greater than 80%. Discussion and conclusion This collection of clinical data represents one of the largest databases of detailed, clinical traits available for research on MS. This work demonstrates that detailed clinical information is recorded in the EMR and can be extracted for research purposes with high reliability.
Affiliation(s)
- Mary F Davis
- Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
785
|
Smoller JW. Disorders and borders: psychiatric genetics and nosology. Am J Med Genet B Neuropsychiatr Genet 2013; 162B:559-78. [PMID: 24132891 DOI: 10.1002/ajmg.b.32174] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 04/11/2013] [Accepted: 05/07/2013] [Indexed: 01/10/2023]
Abstract
Over the past century, the definition and classification of psychiatric disorders has evolved through a combination of historical trends, clinical observations, and empirical research. The current nosology, instantiated in the DSM-5 and ICD-10, rests on descriptive criteria agreed upon by a consensus of experts. While the development of explicit criteria has enhanced the reliability of diagnosis, the validity of the current diagnostic categories has been the subject of debate and controversy. Genetic studies have long been regarded as a key resource for validating the boundaries among diagnostic categories. Genetic epidemiologic studies have documented the familiality and heritability of clinically defined psychiatric disorders and molecular genetic studies have begun to identify specific susceptibility variants. At the same time, there is growing evidence from family, twin and genomic studies that genetic influences on psychiatric disorders transcend clinical boundaries. Here I review this evidence for cross-disorder genetic effects and discuss the implications of these findings for psychiatric nosology. Psychiatric genetic research can inform a bottom-up reappraisal of psychopathology that may help the field move beyond a purely descriptive classification and toward an etiology-based nosology.
Affiliation(s)
- Jordan W Smoller
- Psychiatric and Neurodevelopmental Genetics Unit and Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts
|
786
|
Sun YV, Davis RL. Rapid collection of biospecimens by automated identification of patients eligible for pharmacoepigenetic studies. J Pers Med 2013; 3:263-74. [PMID: 25562727 PMCID: PMC4251387 DOI: 10.3390/jpm3040263] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/30/2013] [Revised: 09/04/2013] [Accepted: 09/10/2013] [Indexed: 01/12/2023] Open
Abstract
Epigenetics plays an important role in regulating gene expression, and can be modified by environmental factors and physiological conditions. Studying epigenetics is a promising approach to potentially improving the diagnosis, prevention and treatment of human diseases, and to providing personalized medical care. However, the role of epigenetics in the development of diseases is not clear because epigenetic markers may be both mediators and outcomes of human diseases. It is particularly complicated to study pharmacoepigenetics, as medication use may modify the epigenetic profile. To address the challenges facing pharmacoepigenetic research of human diseases, we developed a novel design to rapidly identify, contact, and recruit participants and collect specimens for longitudinal studies of pharmacoepigenetics. Using real-time data from electronic medical record systems, we can identify patients who have recently started new medications and who also have a blood test. Prior to disposal of the leftover blood by the clinical laboratory, we are able to contact and recruit these patients, enabling us to use both their leftover baseline blood sample and leftover specimens from future tests. With treatment-naïve and follow-up specimens, this system is able to study both epigenetic markers associated with disease without treatment effect and treatment-related epigenetic changes.
Affiliation(s)
- Yan V Sun
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA.
- Robert L Davis
- Center for Health Research, Kaiser Permanente Georgia, Atlanta, GA 30305, USA.
|
787
|
Medication-wide association studies. CPT-PHARMACOMETRICS & SYSTEMS PHARMACOLOGY 2013; 2:e76. [PMID: 24448022 PMCID: PMC4026636 DOI: 10.1038/psp.2013.52] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 04/29/2013] [Accepted: 08/09/2013] [Indexed: 11/23/2022]
Abstract
Undiscovered side effects of drugs can have a profound effect on the health of the nation, and electronic health-care databases offer opportunities to speed up the discovery of these side effects. We applied a “medication-wide association study” approach that combined multivariate analysis with exploratory visualization to study four health outcomes of interest in an administrative claims database of 46 million patients and a clinical database of 11 million patients. The technique had good predictive value, but there was no threshold high enough to eliminate false-positive findings. The visualization not only highlighted the class effects that strengthened the review of specific products but also underscored the challenges in confounding. These findings suggest that observational databases are useful for identifying potential associations that warrant further consideration but are unlikely to provide definitive evidence of causal effects.
|
788
|
Wei WQ, Cronin RM, Xu H, Lasko TA, Bastarache L, Denny JC. Development and evaluation of an ensemble resource linking medications to their indications. J Am Med Inform Assoc 2013; 20:954-61. [PMID: 23576672 PMCID: PMC3756263 DOI: 10.1136/amiajnl-2012-001431] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Received: 10/21/2012] [Revised: 02/25/2013] [Accepted: 03/18/2013] [Indexed: 11/09/2022] Open
Abstract
OBJECTIVE To create a computable MEDication Indication resource (MEDI) to support primary and secondary use of electronic medical records (EMRs). MATERIALS AND METHODS We processed four public medication resources, RxNorm, Side Effect Resource (SIDER) 2, MedlinePlus, and Wikipedia, to create MEDI. We applied natural language processing and ontology relationships to extract indications for prescribable, single-ingredient medication concepts and all ingredient concepts as defined by RxNorm. Indications were coded as Unified Medical Language System (UMLS) concepts and International Classification of Diseases, 9th edition (ICD9) codes. A total of 689 extracted indications were randomly selected for manual review for accuracy using dual-physician review. We identified a subset of medication-indication pairs that optimizes recall while maintaining high precision. RESULTS MEDI contains 3112 medications and 63 343 medication-indication pairs. Wikipedia was the largest resource, with 2608 medications and 34 911 pairs. For each resource, estimated precision and recall, respectively, were 94% and 20% for RxNorm, 75% and 33% for MedlinePlus, 67% and 31% for SIDER 2, and 56% and 51% for Wikipedia. The MEDI high-precision subset (MEDI-HPS) includes indications found within either RxNorm or at least two of the three other resources. MEDI-HPS contains 13 304 unique indication pairs regarding 2136 medications. The mean±SD number of indications for each medication in MEDI-HPS is 6.22 ± 6.09. The estimated precision of MEDI-HPS is 92%. CONCLUSIONS MEDI is a publicly available, computable resource that links medications with their indications as represented by concepts and billing codes. MEDI may benefit clinical EMR applications and reuse of EMR data for research.
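The inclusion rule for the high-precision subset (an indication found in RxNorm, or in at least two of the three other resources) is simple enough to sketch in a few lines. The code below is a minimal illustration of that rule only: the function name, the data layout, and the drug and indication labels are hypothetical and do not reflect MEDI's actual file format or implementation.

```python
# The three non-RxNorm resources named in the abstract.
OTHER_RESOURCES = {"SIDER 2", "MedlinePlus", "Wikipedia"}


def in_high_precision_subset(sources):
    """Apply the MEDI-HPS rule: RxNorm, or at least two of the three other resources."""
    sources = set(sources)
    if "RxNorm" in sources:
        return True
    return len(sources & OTHER_RESOURCES) >= 2


# Hypothetical medication-indication pairs mapped to the resources listing them.
pairs = {
    ("drug_a", "indication_x"): {"RxNorm"},
    ("drug_b", "indication_y"): {"Wikipedia", "MedlinePlus"},
    ("drug_c", "indication_z"): {"Wikipedia"},
}

high_precision = sorted(
    pair for pair, sources in pairs.items() if in_high_precision_subset(sources)
)
print(high_precision)
```

Requiring agreement between two of the lower-precision resources trades recall for precision, which is consistent with the 92% estimated precision the abstract reports for MEDI-HPS.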
Affiliation(s)
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
|
789
|
Anand V, Rosenman MB, Downs SM. Translating genome wide association study results to associations among common diseases: In silico study with an electronic medical record. Int J Med Inform 2013; 82:864-74. [DOI: 10.1016/j.ijmedinf.2013.05.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/15/2012] [Revised: 05/03/2013] [Accepted: 05/06/2013] [Indexed: 10/26/2022]
|
790
|
Lyalina S, Percha B, LePendu P, Iyer SV, Altman RB, Shah NH. Identifying phenotypic signatures of neuropsychiatric disorders from electronic medical records. J Am Med Inform Assoc 2013; 20:e297-305. [PMID: 23956017 DOI: 10.1136/amiajnl-2013-001933] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 01/18/2023] Open
Abstract
OBJECTIVE Mental illness is the leading cause of disability in the USA, but boundaries between different mental illnesses are notoriously difficult to define. Electronic medical records (EMRs) have recently emerged as a powerful new source of information for defining the phenotypic signatures of specific diseases. We investigated how EMR-based text mining and statistical analysis could elucidate the phenotypic boundaries of three important neuropsychiatric illnesses: autism, bipolar disorder, and schizophrenia. METHODS We analyzed the medical records of over 7000 patients at two facilities using an automated text-processing pipeline to annotate the clinical notes with Unified Medical Language System codes and then searching for enriched codes, and associations among codes, that were representative of the three disorders. We used dimensionality-reduction techniques on individual patient records to understand individual-level phenotypic variation within each disorder, as well as the degree of overlap among disorders. RESULTS We demonstrate that automated EMR mining can be used to extract relevant drugs and phenotypes associated with neuropsychiatric disorders and characteristic patterns of associations among them. Patient-level analyses suggest a clear separation between autism and the other disorders, while revealing significant overlap between schizophrenia and bipolar disorder. They also enable localization of individual patients within the phenotypic 'landscape' of each disorder. CONCLUSIONS Because EMRs reflect the realities of patient care rather than idealized conceptualizations of disease states, we argue that automated EMR mining can help define the boundaries between different mental illnesses, facilitate cohort building for clinical and genomic studies, and reveal how clear expert-defined disease boundaries are in practice.
Affiliation(s)
- Svetlana Lyalina
- Department of Bioengineering, Stanford University, Stanford, California, USA
|
791
|
Warner JL, Zollanvari A, Ding Q, Zhang P, Snyder GM, Alterovitz G. Temporal phenome analysis of a large electronic health record cohort enables identification of hospital-acquired complications. J Am Med Inform Assoc 2013; 20:e281-7. [PMID: 23907284 DOI: 10.1136/amiajnl-2013-001861] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 12/14/2022] Open
Abstract
OBJECTIVE To develop methods for visual analysis of temporal phenotype data available through electronic health records (EHR). MATERIALS AND METHODS 24 580 adults from the multiparameter intelligent monitoring in intensive care V.6 (MIMIC II) EHR database of critically ill patients were analyzed, with significant temporal associations visualized as a map of associations between hospital length of stay (LOS) and ICD-9-CM codes. An expanded phenotype, using ICD-9-CM, microbiology, and computerized physician order entry data, was defined for hospital-acquired Clostridium difficile (HA-CDI). LOS, estimated costs, 30-day post-discharge mortality, and antecedent medication provider order entry were evaluated for HA-CDI cases compared to randomly selected controls. RESULTS Temporal phenome analysis revealed 191 significant codes (p value, adjusted for false discovery rate, ≤0.05). HA-CDI was identified in 414 cases, and was associated with longer median LOS, 20 versus 9 days, and adjusted HR 0.33 (95% CI 0.28 to 0.39). This prolongation carries an estimated annual incremental cost increase of US$1.2-2.0 billion in the USA alone. DISCUSSION Comprehensive EHR data have made large-scale phenome-based analysis feasible. Time-dependent pathological disease states have dynamic phenomic evolution, which may be captured through visual analytical approaches. Although MIMIC II is a single institutional retrospective database, our approach should be portable to other EHR data sources, including prospective 'learning healthcare systems'. For example, interventions to prevent HA-CDI could be dynamically evaluated using the same techniques. CONCLUSIONS The new visual analytical method described in this paper led directly to the identification of numerous hospital-acquired conditions, which could be further explored through an expanded phenotype definition.
Affiliation(s)
- Jeremy L Warner
- Department of Medicine, Division of Hematology and Oncology, Vanderbilt University, Nashville, Tennessee, USA
|
792
|
Hanauer DA, Ramakrishnan N, Seyfried LS. Describing the relationship between cat bites and human depression using data from an electronic health record. PLoS One 2013; 8:e70585. [PMID: 23936453 PMCID: PMC3731284 DOI: 10.1371/journal.pone.0070585] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 02/14/2013] [Accepted: 06/20/2013] [Indexed: 01/09/2023] Open
Abstract
Data mining approaches have been increasingly applied to the electronic health record and have led to the discovery of numerous clinical associations. Recent data mining studies have suggested a potential association between cat bites and human depression. To explore this possible association in more detail we first used administrative diagnosis codes to identify patients with either depression or bites, drawn from a population of 1.3 million patients. We then conducted a manual chart review in the electronic health record of all patients with a code for a bite to accurately determine which were from cats or dogs. Overall there were 750 patients with cat bites, 1,108 with dog bites, and approximately 117,000 patients with depression. Depression was found in 41.3% of patients with cat bites and 28.7% of those with dog bites. Furthermore, 85.5% of those with both cat bites and depression were women, compared to 64.5% of those with dog bites and depression. The probability of a woman being diagnosed with depression at some point in her life if she presented to our health system with a cat bite was 47.0%, compared to 24.2% of men presenting with a similar bite. The high proportion of depression in patients who had cat bites, especially among women, suggests that screening for depression could be appropriate in patients who present to a clinical provider with a cat bite. Additionally, while no causative link is known to explain this association, there is growing evidence to suggest that the relationship between cats and human mental illness, such as depression, warrants further investigation.
Affiliation(s)
- David A Hanauer
- Department of Pediatrics, University of Michigan, Ann Arbor, Michigan, USA.
|
793
|
Rosenbloom ST, Madison JL, Brothers KB, Bowton EA, Clayton EW, Malin BA, Roden DM, Pulley J. Ethical and practical challenges to studying patients who opt out of large-scale biorepository research. J Am Med Inform Assoc 2013; 20:e221-5. [PMID: 23886923 DOI: 10.1136/amiajnl-2013-001937] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023] Open
Abstract
Large-scale biorepositories that couple biologic specimens with electronic health records containing documentation of phenotypic expression can accelerate scientific research and discovery. However, differences between those subjects who participate in biorepository-based research and the population from which they are drawn may influence research validity. While an opt-out approach to biorepository-based research enhances inclusiveness, empirical research evaluating voluntariness, risk, and the feasibility of an opt-out approach is sparse, and factors influencing patients' decisions to opt out are understudied. Determining why patients choose to opt out may help to improve voluntariness; however, there may be ethical and logistical challenges to studying those who opt out. In this perspective paper, the authors explore what is known about research based on the opt-out model, describe a large-scale biorepository that leverages the opt-out model, and review specific ethical and logistical challenges to bridging the research gaps that remain.
Affiliation(s)
- S Trent Rosenbloom
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
794
|
Solovieff N, Cotsapas C, Lee PH, Purcell SM, Smoller JW. Pleiotropy in complex traits: challenges and strategies. Nat Rev Genet 2013; 14:483-95. [PMID: 23752797 DOI: 10.1038/nrg3461] [Citation(s) in RCA: 748] [Impact Index Per Article: 62.3] [Indexed: 12/12/2022]
Abstract
Genome-wide association studies have identified many variants that each affects multiple traits, particularly across autoimmune diseases, cancers and neuropsychiatric disorders, suggesting that pleiotropic effects on human complex traits may be widespread. However, systematic detection of such effects is challenging and requires new methodologies and frameworks for interpreting cross-phenotype results. In this Review, we discuss the evidence for pleiotropy in contemporary genetic mapping studies, new and established analytical approaches to identifying pleiotropic effects, sources of spurious cross-phenotype effects and study design considerations. We also outline the molecular and clinical implications of such findings and discuss future directions of research.
Affiliation(s)
- Nadia Solovieff
- Center for Human Genetics Research, Massachusetts General Hospital, 185 Cambridge Street, Boston, Massachusetts 02114, USA
795
|
Gottesman O, Kuivaniemi H, Tromp G, Faucett WA, Li R, Manolio TA, Sanderson SC, Kannry J, Zinberg R, Basford MA, Brilliant M, Carey DJ, Chisholm RL, Chute CG, Connolly JJ, Crosslin D, Denny JC, Gallego CJ, Haines JL, Hakonarson H, Harley J, Jarvik GP, Kohane I, Kullo IJ, Larson EB, McCarty C, Ritchie MD, Roden DM, Smith ME, Böttinger EP, Williams MS. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future. Genet Med 2013; 15:761-71. [PMID: 23743551 PMCID: PMC3795928 DOI: 10.1038/gim.2013.72] [Citation(s) in RCA: 541] [Impact Index Per Article: 45.1] [Received: 01/30/2013] [Accepted: 04/18/2013] [Indexed: 12/13/2022]
Abstract
The Electronic Medical Records and Genomics Network is a National Human Genome Research Institute–funded consortium engaged in the development of methods and best practices for using the electronic medical record as a tool for genomic research. Now in its sixth year and second funding cycle, and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from electronic medical records can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and health-care informatics, particularly for electronic phenotyping, genome-wide association studies, genomic medicine implementation, and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here, we describe the evolution, accomplishments, opportunities, and challenges of the network from its inception as a five-group consortium focused on genotype–phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting toward the implementation of genomic medicine. Genet Med 15(10):761–771.
796
|
Liao KP, Kurreeman F, Li G, Duclos G, Murphy S, Guzman R, Cai T, Gupta N, Gainer V, Schur P, Cui J, Denny JC, Szolovits P, Churchill S, Kohane I, Karlson EW, Plenge RM. Associations of autoantibodies, autoimmune risk alleles, and clinical diagnoses from the electronic medical records in rheumatoid arthritis cases and non-rheumatoid arthritis controls. Arthritis Rheum 2013; 65:571-81. [PMID: 23233247 DOI: 10.1002/art.37801] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 09/06/2012] [Accepted: 11/15/2012] [Indexed: 12/29/2022]
Abstract
OBJECTIVE The significance of non-rheumatoid arthritis (RA) autoantibodies in patients with RA is unclear. The aim of this study was to assess associations of autoantibodies with autoimmune risk alleles and with clinical diagnoses from the electronic medical records (EMRs) among RA cases and non-RA controls. METHODS Data on 1,290 RA cases and 1,236 non-RA controls of European genetic ancestry were obtained from the EMRs of 2 large academic centers. The levels of anti-citrullinated protein antibodies (ACPAs), antinuclear antibodies (ANAs), anti-tissue transglutaminase antibodies (AGTAs), and anti-thyroid peroxidase (anti-TPO) antibodies were measured. All subjects were genotyped for autoimmune risk alleles, and the association between number of autoimmune risk alleles present and number of types of autoantibodies present was studied. A phenome-wide association study (PheWAS) was conducted to study potential associations between autoantibodies and clinical diagnoses among RA cases and non-RA controls. RESULTS The mean ages were 60.7 years in RA cases and 64.6 years in non-RA controls. The proportion of female subjects was 79% in each group. The prevalence of ACPAs and ANAs was higher in RA cases compared to controls (each P < 0.0001); there were no differences in the prevalence of anti-TPO antibodies and AGTAs. Carriage of higher numbers of autoimmune risk alleles was associated with increasing numbers of autoantibody types in RA cases (P = 2.1 × 10(-5)) and non-RA controls (P = 5.0 × 10(-3)). From the PheWAS, the presence of ANAs was significantly associated with a diagnosis of Sjögren's/sicca syndrome in RA cases. CONCLUSION The increased frequency of autoantibodies in RA cases and non-RA controls was associated with the number of autoimmune risk alleles carried by an individual. PheWAS of EMR data, with linkage to laboratory data obtained from blood samples, provide a novel method to test for the clinical significance of biomarkers in disease.
Affiliation(s)
- Katherine P Liao
- Brigham and Women's Hospital, 75 Francis Street, PBB-B3, Boston, MA 02115, USA.
797
|
Warner JL, Alterovitz G, Bodio K, Joyce RM. External phenome analysis enables a rational federated query strategy to detect changing rates of treatment-related complications associated with multiple myeloma. J Am Med Inform Assoc 2013; 20:696-9. [PMID: 23515788 DOI: 10.1136/amiajnl-2012-001355] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 12/22/2022]
Abstract
Electronic health records (EHRs) are increasingly useful for health services research. For relatively uncommon conditions, such as multiple myeloma (MM) and its treatment-related complications, a combination of multiple EHR sources is essential for such research. The Shared Health Research Information Network (SHRINE) enables queries for aggregate results across participating institutions. Development of a rational search strategy in SHRINE may be augmented through analysis of pre-existing databases. We developed a SHRINE query for likely non-infectious treatment-related complications of MM, based upon an analysis of the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II) database. Using this query strategy, we found that the rate of likely treatment-related complications significantly increased from 2001 to 2007, by an average of 6% a year (p=0.01), across the participating SHRINE institutions. This finding is in keeping with increasingly aggressive strategies in the treatment of MM. This proof of concept demonstrates that a staged approach to federated queries, using external EHR data, can yield potentially clinically meaningful results.
Affiliation(s)
- Jeremy L Warner
- Department of Medicine, Division of Hematology and Oncology, Vanderbilt University, Nashville, Tennessee 37232, USA.
798
|
Boland MR, Hripcsak G, Albers DJ, Wei Y, Wilcox AB, Wei J, Li J, Lin S, Breene M, Myers R, Zimmerman J, Papapanou PN, Weng C. Discovering medical conditions associated with periodontitis using linked electronic health records. J Clin Periodontol 2013; 40:474-82. [PMID: 23495669 DOI: 10.1111/jcpe.12086] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Accepted: 01/20/2013] [Indexed: 12/12/2022]
Abstract
AIM To use linked electronic medical and dental records to discover associations between periodontitis and medical conditions independent of a priori hypotheses. MATERIALS AND METHODS This case-control study included 2475 patients who underwent dental treatment at the College of Dental Medicine at Columbia University and medical treatment at NewYork-Presbyterian Hospital. Our cases are patients who received periodontal treatment and our controls are patients who received dental maintenance but no periodontal treatment. Chi-square analysis was performed for medical treatment codes and logistic regression was used to adjust for confounders. RESULTS Our method replicated several important periodontitis associations in a largely Hispanic population, including diabetes mellitus type I (OR = 1.6, 95% CI 1.30-1.99, p < 0.001) and type II (OR = 1.4, 95% CI 1.22-1.67, p < 0.001), hypertension (OR = 1.2, 95% CI 1.10-1.37, p < 0.001), hypercholesterolaemia (OR = 1.2, 95% CI 1.07-1.38, p = 0.004), hyperlipidaemia (OR = 1.2, 95% CI 1.06-1.43, p = 0.008) and conditions pertaining to pregnancy and childbirth (OR = 2.9, 95% CI: 1.32-7.21, p = 0.014). We also found a previously unreported association with benign prostatic hyperplasia (OR = 1.5, 95% CI 1.05-2.10, p = 0.026) after adjusting for age, gender, ethnicity, hypertension, diabetes, obesity, lipid and circulatory system conditions, alcohol and tobacco abuse. CONCLUSIONS This study contributes a high-throughput method for associating periodontitis with systemic diseases using linked electronic records.
Affiliation(s)
- Mary Regina Boland
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
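The kind of association measure reported in the abstract above (an odds ratio with a 95% confidence interval) can be illustrated with a minimal sketch. The counts below are hypothetical placeholders, not data from the study, and the function name `odds_ratio_wald_ci` is invented for illustration. The Wald interval shown is a standard textbook method for an unadjusted 2x2 table; the abstract reports chi-square tests and covariate-adjusted logistic regression, so this sketch should not be read as the authors' procedure.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: cases with the condition      b: cases without the condition
    c: controls with the condition   d: controls without the condition
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to exercise the function
or_est, lo, hi = odds_ratio_wald_ci(a=120, b=380, c=200, d=1000)
print(f"OR = {or_est:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 corresponds to a nominally significant unadjusted association. Adjusting for confounders such as age or diabetes, as the study describes, would require fitting a regression model rather than working from a single table.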
800
|
Ritchie MD, Denny JC, Zuvich RL, Crawford DC, Schildcrout JS, Bastarache L, Ramirez AH, Mosley JD, Pulley JM, Basford MA, Bradford Y, Rasmussen LV, Pathak J, Chute CG, Kullo IJ, McCarty CA, Chisholm RL, Kho AN, Carlson CS, Larson EB, Jarvik GP, Sotoodehnia N, Manolio TA, Li R, Masys DR, Haines JL, Roden DM. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation 2013; 127:1377-85. [PMID: 23463857 DOI: 10.1161/circulationaha.112.000604] [Citation(s) in RCA: 150] [Impact Index Per Article: 12.5] [Indexed: 12/16/2022]
Abstract
BACKGROUND ECG QRS duration, a measure of cardiac intraventricular conduction, varies ≈2-fold in individuals without cardiac disease. Slow conduction may promote re-entrant arrhythmias. METHODS AND RESULTS We performed a genome-wide association study to identify genomic markers of QRS duration in 5272 individuals without cardiac disease selected from electronic medical record algorithms at 5 sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium QRS genome-wide association study meta-analysis. Twenty-three single-nucleotide polymorphisms in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 single-nucleotide polymorphisms were in the chromosome 3 SCN5A and SCN10A loci, where the most significant single-nucleotide polymorphisms were rs1805126 in SCN5A with P=1.2×10(-8) (eMERGE) and P=2.5×10(-20) (CHARGE) and rs6795970 in SCN10A with P=6×10(-6) (eMERGE) and P=5×10(-27) (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies on variants in these 5 loci in 13859 European Americans to search for diagnoses associated with these markers. Phenome-wide association study identified atrial fibrillation and cardiac arrhythmias as the most common associated diagnoses with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5272 "heart-healthy" study population. CONCLUSIONS We conclude that DNA biobanks coupled to electronic medical records not only provide a platform for genome-wide association study but also may allow broad interrogation of the longitudinal incidence of disease associated with genetic variants. The phenome-wide association study approach implicated sodium channel variants modulating QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.
Affiliation(s)
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA