1. Danis D, Bamshad MJ, Bridges Y, Cacheiro P, Carmody LC, Chong JX, Coleman B, Dalgleish R, Freeman PJ, Graefe ASL, Groza T, Jacobsen JOB, Klocperk A, Kusters M, Ladewig MS, Marcello AJ, Mattina T, Mungall CJ, Munoz-Torres MC, Reese JT, Rehburg F, Reis BCS, Schuetz C, Smedley D, Strauss T, Sundaramurthi JC, Thun S, Wissink K, Wagstaff JF, Zocche D, Haendel MA, Robinson PN. A corpus of GA4GH Phenopackets: case-level phenotyping for genomic diagnostics and discovery. medRxiv 2024:2024.05.29.24308104. [PMID: 38854034 PMCID: PMC11160806 DOI: 10.1101/2024.05.29.24308104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
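To make the data structure described in this abstract concrete, here is a minimal sketch of how a single phenopacket in JSON form might be read to separate observed from excluded phenotypic features. The record is hand-written for illustration and is not taken from phenopacket-store. The field names (`phenotypicFeatures`, `type`, `excluded`, `diseases`, `term`) follow our understanding of version 2 of the Phenopacket Schema, and the helper `summarize` is a hypothetical name introduced here.

```python
import json

# Hand-written illustrative record (an assumption, not a phenopacket-store entry).
EXAMPLE = """
{
  "id": "example-case-1",
  "subject": {"id": "proband", "sex": "FEMALE"},
  "phenotypicFeatures": [
    {"type": {"id": "HP:0001166", "label": "Arachnodactyly"}},
    {"type": {"id": "HP:0001250", "label": "Seizure"}, "excluded": true}
  ],
  "diseases": [
    {"term": {"id": "OMIM:154700", "label": "Marfan syndrome"}}
  ]
}
"""


def summarize(packet):
    """Split phenotypic features into observed and explicitly excluded terms."""
    observed, excluded = [], []
    for feature in packet.get("phenotypicFeatures", []):
        term = feature["type"]
        # A missing "excluded" flag is treated as "observed".
        target = excluded if feature.get("excluded", False) else observed
        target.append((term["id"], term["label"]))
    diseases = [d["term"]["id"] for d in packet.get("diseases", [])]
    return {
        "id": packet["id"],
        "observed": observed,
        "excluded": excluded,
        "diseases": diseases,
    }


if __name__ == "__main__":
    print(summarize(json.loads(EXAMPLE)))
```

Keeping excluded features separate matters for phenotype-driven prioritization, since an explicitly absent finding carries different evidence than an unrecorded one.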
Affiliation(s)
- Daniel Danis
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Michael J Bamshad
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
  - Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle WA 98195, USA
  - Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98195, USA
- Yasemin Bridges
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Pilar Cacheiro
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Leigh C Carmody
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Jessica X Chong
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
  - Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle WA 98195, USA
- Ben Coleman
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Raymond Dalgleish
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Peter J Freeman
  - Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
- Adam S L Graefe
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Tudor Groza
  - Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia
  - SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore 169609, Singapore
  - Telethon Kids Institute, Nedlands, WA 6009, Australia
- Julius O B Jacobsen
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Adam Klocperk
  - Department of Immunology, 2nd Faculty of Medicine, Charles University and University Hospital in Motol, Prague, Czech Republic
- Maaike Kusters
  - Department of Paediatric Immunology, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
  - University College London Institute of Child Health, London, United Kingdom
- Markus S Ladewig
  - Department of Ophthalmology, University Clinic Marburg - Campus Fulda, Fulda, Germany
- Anthony J Marcello
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
- Teresa Mattina
  - Medical Genetics, University of Catania, Italy
  - Morgagni foundation and Clinic, Catania, Italy
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Monica C Munoz-Torres
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
- Justin T Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Filip Rehburg
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Bárbara C S Reis
  - Department of Immunology, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
  - High Complexity Laboratory, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
- Catharina Schuetz
  - Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Timmy Strauss
  - Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Sylvia Thun
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Kyran Wissink
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Utrecht University, Utrecht, the Netherlands
- David Zocche
  - North West Thames Regional Genetics Service, Northwick Park & St Mark's Hospitals, London, UK
- Peter N Robinson
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
  - ELLIS-European Laboratory for Learning and Intelligent Systems
2. Chishtie J, Sapiro N, Wiebe N, Rabatach L, Lorenzetti D, Leung AA, Rabi D, Quan H, Eastwood CA. Use of Epic Electronic Health Record System for Health Care Research: Scoping Review. J Med Internet Res 2023; 25:e51003. [PMID: 38100185 PMCID: PMC10757236 DOI: 10.2196/51003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 10/29/2023] [Accepted: 11/05/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND: Electronic health records (EHRs) enable health data exchange across interconnected systems from varied settings. Epic is among the 5 leading EHR providers and is the most adopted EHR system across the globe. Despite its global reach, there is a gap in the literature detailing how EHR systems such as Epic have been used for health care research. OBJECTIVE: The objective of this scoping review is to synthesize the available literature on use cases of the Epic EHR for research in various areas of clinical and health sciences. METHODS: We used established scoping review methods and searched 9 major information repositories, including databases and gray literature sources. To categorize the research data, we developed detailed criteria for 5 major research domains to present the results. RESULTS: We present a comprehensive picture of the method types in 5 research domains. A total of 4669 articles were screened by 2 independent reviewers at each stage, while 206 articles were abstracted. Most studies were from the United States, with a sharp increase in volume from the year 2015 onwards. Most articles focused on clinical care, health services research and clinical decision support. Among research designs, most studies used longitudinal designs, followed by interventional studies implemented at single sites in adult populations. Important facilitators and barriers to the use of Epic and EHRs in general were identified. Important lessons to the use of Epic and other EHRs for research purposes were also synthesized. CONCLUSIONS: The Epic EHR provides a wide variety of functions that are helpful toward research in several domains, including clinical and population health, quality improvement, and the development of clinical decision support tools. As Epic is reported to be the most globally adopted EHR, researchers can take advantage of its various system features, including pooled data, integration of modules and developing decision support tools. Such research opportunities afforded by the system can contribute to improving quality of care, building health system efficiencies, and conducting population-level studies. Although this review is limited to the Epic EHR system, the larger lessons are generalizable to other EHRs.
Affiliation(s)
- Jawad Chishtie
  - Center for Health Informatics, University of Calgary, Calgary, AB, Canada
  - Alberta Health Services, Calgary, AB, Canada
- Natalie Sapiro
  - Center for Health Informatics, University of Calgary, Calgary, AB, Canada
- Natalie Wiebe
  - Center for Health Informatics, University of Calgary, Calgary, AB, Canada
  - Alberta Health Services, Calgary, AB, Canada
- Diane Lorenzetti
  - Community Health Sciences, University of Calgary, Calgary, AB, Canada
  - Health Sciences Library, University of Calgary, Calgary, AB, Canada
- Alexander A Leung
  - Community Health Sciences, University of Calgary, Calgary, AB, Canada
  - Department of Medicine, University of Calgary, Calgary, AB, Canada
- Doreen Rabi
  - Community Health Sciences, University of Calgary, Calgary, AB, Canada
  - Department of Medicine, University of Calgary, Calgary, AB, Canada
- Hude Quan
  - Center for Health Informatics, University of Calgary, Calgary, AB, Canada
  - Community Health Sciences, University of Calgary, Calgary, AB, Canada
- Cathy A Eastwood
  - Center for Health Informatics, University of Calgary, Calgary, AB, Canada
  - Community Health Sciences, University of Calgary, Calgary, AB, Canada
3. Boßelmann CM, Hedrich UBS, Lerche H, Pfeifer N. Predicting functional effects of ion channel variants using new phenotypic machine learning methods. PLoS Comput Biol 2023; 19:e1010959. [PMID: 36877742 PMCID: PMC10019634 DOI: 10.1371/journal.pcbi.1010959] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/06/2022] [Revised: 03/16/2023] [Accepted: 02/19/2023] [Indexed: 03/07/2023]
Abstract
Missense variants in genes encoding ion channels are associated with a spectrum of severe diseases. Variant effects on biophysical function correlate with clinical features and can be categorized as gain- or loss-of-function. This information enables a timely diagnosis, facilitates precision therapy, and guides prognosis. Functional characterization presents a bottleneck in translational medicine. Machine learning models may be able to rapidly generate supporting evidence by predicting variant functional effects. Here, we describe a multi-task multi-kernel learning framework capable of harmonizing functional results and structural information with clinical phenotypes. This novel approach extends the human phenotype ontology towards kernel-based supervised machine learning. Our gain- or loss-of-function classifier achieves high performance (mean accuracy 0.853 SD 0.016, mean AU-ROC 0.912 SD 0.025), outperforming both conventional baseline and state-of-the-art methods. Performance is robust across different phenotypic similarity measures and largely insensitive to phenotypic noise or sparsity. Localized multi-kernel learning offered biological insight and interpretability by highlighting channels with implicit genotype-phenotype correlations or latent task similarity for downstream analysis.
Affiliation(s)
- Christian Malte Boßelmann
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany
  - Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Tuebingen, Germany
- Ulrike B. S. Hedrich
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany
- Holger Lerche
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Tuebingen, Germany
  - * E-mail: (HL); (NP)
- Nico Pfeifer
  - Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Tuebingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tuebingen, Tuebingen, Germany
  - * E-mail: (HL); (NP)
4. Nixon A, Fang L, Havrilla JM, Wang K. Termviewer - A Web Application for Streamlined Human Phenotype Ontology (HPO) Tagging and Document Annotation. Chem Biodivers 2022; 19:e202200805. [PMID: 36328766 DOI: 10.1002/cbdv.202200805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 10/13/2022] [Indexed: 11/06/2022]
Abstract
Clinical notes from electronic health records (EHRs) contain a large amount of clinical phenotype data on patients that can provide insights into the phenotypic presentation of various diseases. A number of Natural Language Processing (NLP) algorithms have been utilized in the past few years to annotate medical concepts, such as Human Phenotype Ontology (HPO) terms, from clinical notes. However, efficient use of NLP algorithms requires the use of high-quality clinical notes with phenotype descriptions, and erroneous annotations often exist in results from these NLP algorithms. Manual review by human experts is often needed to compile the correct phenotype information on individual patients. Here we develop TermViewer, a web application that allows multi-party collaborative annotation and quality assessment of clinical notes that have already been processed and tagged by NLP algorithms. TermViewer allows users to view clinical notes with HPO terms highlighted, and to easily classify high-quality notes and revise incorrect tagging of HPO terms. Currently, TermViewer combines MetaMap and cTAKES, two of the most widely used NLP tools for tagging medical terms, and identifies where these two tools agree and disagree, allowing users to perform collaborative manual reviews of computationally generated HPO annotations. TermViewer can be a stand-alone tool for analyzing notes or become part of a machine-learning pipeline where tagged HPO terms can be used as additional input data. TermViewer is available at https://github.com/WGLab/TermViewer.
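The agree/disagree comparison mentioned in this abstract can be illustrated with plain set operations. The sketch below is not TermViewer's implementation. It assumes, purely for illustration, that each tool's output has already been reduced to (start offset, end offset, HPO identifier) triples, and the names `Tag` and `compare_tags` as well as the sample offsets are hypothetical.

```python
from typing import Iterable, NamedTuple


class Tag(NamedTuple):
    """One HPO annotation on a note: character span plus term identifier."""
    start: int
    end: int
    hpo_id: str


def compare_tags(tool_a: Iterable[Tag], tool_b: Iterable[Tag]) -> dict:
    """Partition annotations into those both tools made and those only one made."""
    a, b = set(tool_a), set(tool_b)
    return {
        "agree": sorted(a & b),   # identical span and term from both tools
        "only_a": sorted(a - b),  # candidates for manual review
        "only_b": sorted(b - a),
    }


if __name__ == "__main__":
    # Made-up annotations standing in for the output of two NLP taggers.
    first_tool = [Tag(10, 17, "HP:0001250"), Tag(42, 56, "HP:0001166")]
    second_tool = [Tag(10, 17, "HP:0001250"), Tag(80, 85, "HP:0001945")]
    print(compare_tags(first_tool, second_tool))
```

Exact-match comparison is the simplest possible choice. A real reviewing tool would likely also need to handle overlapping spans and parent/child terms in the ontology.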
Affiliation(s)
- Anna Nixon
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- James M Havrilla
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA