1
Chen JS, Copado IA, Vallejos C, Kalaw FGP, Soe P, Cai CX, Toy BC, Borkar D, Sun CQ, Shantha JG, Baxter SL. Variations in Electronic Health Record-Based Definitions of Diabetic Retinopathy Cohorts: A Literature Review and Quantitative Analysis. Ophthalmology Science 2024; 4:100468. [PMID: 38560278 PMCID: PMC10973665 DOI: 10.1016/j.xops.2024.100468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 01/04/2024] [Accepted: 01/11/2024] [Indexed: 04/04/2024]
Abstract
Purpose: Use of the electronic health record (EHR) has motivated the need for data standardization. A gap in knowledge exists regarding variations in existing terminologies for defining diabetic retinopathy (DR) cohorts. This study aimed to review the literature and analyze variations regarding codified definitions of DR. Design: Literature review and quantitative analysis. Subjects: Published manuscripts. Methods: Four graders reviewed PubMed and Google Scholar for peer-reviewed studies. Studies were included if they used codified definitions of DR (e.g., billing codes). Data elements such as author names, publication year, purpose, data set type, and DR definitions were manually extracted. Each study was reviewed by ≥ 2 authors to validate inclusion eligibility. Quantitative analyses of the codified definitions were then performed to characterize the variation between DR cohort definitions. Main Outcome Measures: Number of studies included and numeric counts of billing codes used to define codified cohorts. Results: In total, 43 studies met the inclusion criteria. Half of the included studies used datasets based on structured EHR data (i.e., data registries, institutional EHR review), and half used claims data. All but 1 of the studies used billing codes such as the International Classification of Diseases 9th or 10th edition (ICD-9 or ICD-10), either alone or in addition to another terminology for defining disease. Of the 27 included studies that used ICD-9 and the 20 studies that used ICD-10 codes, the most common codes used pertained to the full spectrum of DR severity. Diabetic retinopathy complications (e.g., vitreous hemorrhage) were also used to define some DR cohorts. Conclusions: Substantial variations exist among codified definitions for DR cohorts within retrospective studies. Variable definitions may limit generalizability and reproducibility of retrospective studies. More work is needed to standardize disease cohorts.
Financial Disclosures Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Affiliation(s)
Jimmy S Chen
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
Ivan A Copado
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
Cecilia Vallejos
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
Fritz Gerald P Kalaw
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
Priyanka Soe
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
Cindy X Cai
- Wilmer Eye Institute, Johns Hopkins School of Medicine, Baltimore, Maryland
Brian C Toy
- Department of Ophthalmology, Roski Eye Institute, Keck School of Medicine, University of Southern California, Los Angeles, California
Durga Borkar
- Department of Ophthalmology, Duke Eye Center, Duke University, Durham, North Carolina
Catherine Q Sun
- F.I. Proctor Foundation, University of California San Francisco, San Francisco, California
- Department of Ophthalmology, University of California San Francisco, San Francisco, California
Jessica G Shantha
- F.I. Proctor Foundation, University of California San Francisco, San Francisco, California
- Department of Ophthalmology, University of California San Francisco, San Francisco, California
Sally L Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, California
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California
2
Silva L, Pacheco T, Araújo E, Duarte RJ, Ribeiro-Vaz I, Ferreira-da-Silva R. Unveiling the future: precision pharmacovigilance in the era of personalized medicine. Int J Clin Pharm 2024; 46:755-760. [PMID: 38416349 PMCID: PMC11133017 DOI: 10.1007/s11096-024-01709-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 01/30/2024] [Indexed: 02/29/2024]
Abstract
In the era of personalized medicine, pharmacovigilance faces new challenges and opportunities, demanding a shift from traditional approaches. This article delves into the evolving landscape of drug safety monitoring in the context of personalized treatments. We aim to provide a succinct reflection on the intersection of tailored therapeutic strategies and vigilant pharmacovigilance practices. We discuss the integration of pharmacogenetics in enhancing drug safety, illustrating how genetic profiling aids in predicting drug responses and adverse reactions. Emphasizing the importance of phase IV-post-marketing surveillance, we explore the limitations of pre-marketing trials and the necessity for a comprehensive approach to drug safety. The article discusses the pivotal role of pharmacogenetics in pre-exposure risk management and the redefinition of pharmacoepidemiological methods for post-exposure surveillance. We highlight the significance of integrating patient-specific genetic profiles in creating personalized medication leaflets and the use of advanced computational methods in data analysis. Additionally, we examine the ethical, privacy, and data security challenges inherent in precision medicine, emphasizing their implications for patient consent and data management.
Affiliation(s)
Lurdes Silva
- Faculty of Pharmacy of the University of Porto, Porto, Portugal
Teresa Pacheco
- Faculty of Pharmacy of the University of Porto, Porto, Portugal
Emília Araújo
- Palliative Care Service, Portuguese Oncology Institute of Porto (IPO Porto), Porto, Portugal
- Center for Health Technology and Services Research, Associate Laboratory RISE - Health Research Network (CINTESIS@RISE), Porto, Portugal
Inês Ribeiro-Vaz
- Center for Health Technology and Services Research, Associate Laboratory RISE - Health Research Network (CINTESIS@RISE), Porto, Portugal
- Porto Pharmacovigilance Centre, Faculty of Medicine of the University of Porto, Porto, Portugal
- Department of Community Medicine, Health Information and Decision, Faculty of Medicine of the University of Porto, Porto, Portugal
Renato Ferreira-da-Silva
- Center for Health Technology and Services Research, Associate Laboratory RISE - Health Research Network (CINTESIS@RISE), Porto, Portugal.
- Porto Pharmacovigilance Centre, Faculty of Medicine of the University of Porto, Porto, Portugal.
- Department of Community Medicine, Health Information and Decision, Faculty of Medicine of the University of Porto, Porto, Portugal.
3
Wei WQ, Rowley R, Wood A, MacArthur J, Embi PJ, Denaxas S. Improving reporting standards for phenotyping algorithm in biomedical research: 5 fundamental dimensions. J Am Med Inform Assoc 2024; 31:1036-1041. [PMID: 38269642 PMCID: PMC10990558 DOI: 10.1093/jamia/ocae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/12/2023] [Accepted: 01/08/2024] [Indexed: 01/26/2024] Open
Abstract
INTRODUCTION Phenotyping algorithms enable the interpretation of complex health data and definition of clinically relevant phenotypes; they have become crucial in biomedical research. However, the lack of standardization and transparency inhibits the cross-comparison of findings among different studies, limits large-scale meta-analyses, confuses the research community, and prevents the reuse of algorithms, which results in duplication of efforts and the waste of valuable resources. RECOMMENDATIONS Here, we propose five independent fundamental dimensions of phenotyping algorithms (complexity, performance, efficiency, implementability, and maintenance) through which researchers can describe, measure, and deploy any algorithm efficiently and effectively. These dimensions must be considered in the context of explicit use cases and transparent methods to ensure that they do not reflect unexpected biases or exacerbate inequities.
Affiliation(s)
Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Robb Rowley
- National Human Genome Research Institute, Bethesda, MD 20892, United States
Angela Wood
- Department of Public Health and Primary Care, University of Cambridge, Cambridge, CB2 1TN, United Kingdom
Jacqueline MacArthur
- British Heart Foundation Data Science Center, Health Data Research, London, NW1 2BE, United Kingdom
Peter J Embi
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, United States
Spiros Denaxas
- British Heart Foundation Data Science Center, Health Data Research, London, NW1 2BE, United Kingdom
- Institute of Health Informatics, University College London, London, WC1E 6BT, United Kingdom
4
Chen C, An G, Yu X, Wang S, Lin P, Yuan J, Zhuang Y, Lu X, Bai Y, Zhang G, Su J, Qu J, Xu L, Wang H. Screening Mutations of the Monogenic Syndromic High Myopia by Whole Exome Sequencing From MAGIC Project. Invest Ophthalmol Vis Sci 2024; 65:9. [PMID: 38315492 PMCID: PMC10851780 DOI: 10.1167/iovs.65.2.9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 01/17/2024] [Indexed: 02/07/2024] Open
Abstract
Purpose: This observational study aimed to identify mutations in monogenic syndromic high myopia (msHM) using data from reported samples (n = 9370) of the Myopia Associated Genetics and Intervention Consortium (MAGIC) project. Methods: The targeted panel containing 298 msHM-related genes was constructed and screening of clinically actionable variants was performed based on whole exome sequencing. Capillary sequencing was used to verify the identified gene mutations in the probands and perform segregation analysis with their relatives. Results: A total of 381 candidate variants in 84 genes and 85 eye diseases were found to contribute to msHM in 3.6% (335/9370) of patients with HM. Among them, the 22 genes with the most variations accounted for 62.7% of the diagnostic cases. In the genotype-phenotype association analysis, 60% (201/335) of suspected msHM cases were recalled and 25 patients (12.4%) received a definitive genetic diagnosis. Pathogenic variants were distributed in 18 msHM-related diseases, mainly involving retinal dystrophy genes (e.g., TRPM1, CACNA1F, and FZD4), connective tissue disease genes (e.g., FBN1 and COL2A1), corneal or lens development genes (HSF4, GJA8, and MIP), and other genes (TEK). The msHM gene mutation types were allocated to four categories: nonsense mutations (36%), missense mutations (36%), frameshift mutations (20%), and splice site mutations (8%). Conclusions: This study highlights the importance of thorough molecular subtyping of msHM to provide appropriate genetic counselling and multispecialty care for children and adolescents with HM.
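The percentages reported in this abstract can be re-derived from the stated counts. The short check below is only an arithmetic sketch: the helper name `pct` is hypothetical, and the denominator of 201 for the 12.4% figure is an assumption (the abstract does not name it, but 25/201 matches the reported value).

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)

# Counts taken from the abstract.
mshm_rate = pct(335, 9370)      # patients with HM carrying candidate msHM variants
recalled_rate = pct(201, 335)   # suspected msHM cases that were recalled
# Assumption: the 12.4% is computed over the 201 recalled cases.
diagnosed_rate = pct(25, 201)

print(mshm_rate, recalled_rate, diagnosed_rate)
```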
Affiliation(s)
Chong Chen
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou, China
Gang An
- Institute of PSI Genomics Co., Ltd., Wenzhou, China
Xiaoguang Yu
- Institute of PSI Genomics Co., Ltd., Wenzhou, China
Siyu Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
Peng Lin
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
Jian Yuan
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
Youyuan Zhuang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
Xiaoyan Lu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
Yu Bai
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou, China
Guosi Zhang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
Jianzhong Su
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
Jia Qu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou, China
Liangde Xu
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou, China
Hong Wang
- National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
- Center of Optometry International Innovation of Wenzhou, Eye Valley, Wenzhou, China
5
Groza T, Caufield H, Gration D, Baynam G, Haendel MA, Robinson PN, Mungall CJ, Reese JT. An evaluation of GPT models for phenotype concept recognition. BMC Med Inform Decis Mak 2024; 24:30. [PMID: 38297371 PMCID: PMC10829255 DOI: 10.1186/s12911-024-02439-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 01/24/2024] [Indexed: 02/02/2024] Open
Abstract
OBJECTIVE Clinical deep phenotyping and phenotype annotation play a critical role in both the diagnosis of patients with rare disorders as well as in building computationally-tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (supported usually by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift in the use of large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. MATERIALS AND METHODS The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other clinical observations. RESULTS The best run, using in-context learning, achieved 0.58 document-level F1 score on publication abstracts and 0.75 document-level F1 score on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best in class tool. Without in-context learning, however, performance is significantly below the existing approaches. CONCLUSION Our experiments show that gpt-4.0 surpasses the state of the art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.
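For readers unfamiliar with the metric, a document-level F1 score for concept recognition can be sketched as follows. This is a minimal illustration, assuming micro-averaging over per-document sets of ontology identifiers; the abstract does not specify the exact evaluation protocol, and the function name, document keys, and example HPO identifiers are illustrative rather than taken from the study.

```python
from typing import Dict, Set

def document_level_f1(gold: Dict[str, Set[str]],
                      predicted: Dict[str, Set[str]]) -> float:
    """Micro-averaged F1 over the set of concept IDs found in each document."""
    tp = fp = fn = 0
    # Only documents present in the gold standard are scored.
    for doc_id, gold_terms in gold.items():
        pred_terms = predicted.get(doc_id, set())
        tp += len(gold_terms & pred_terms)   # concepts correctly recognized
        fp += len(pred_terms - gold_terms)   # spurious concepts
        fn += len(gold_terms - pred_terms)   # missed concepts
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative annotations (identifiers chosen only as examples).
gold = {"doc1": {"HP:0001250", "HP:0001263"}, "doc2": {"HP:0000252"}}
pred = {"doc1": {"HP:0001250"}, "doc2": {"HP:0000252", "HP:0004322"}}
f1 = document_level_f1(gold, pred)
```

A mention-level score would instead compare individual text spans, so the same system can score differently on the two measures.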
Affiliation(s)
Tudor Groza
- Rare Care Centre, Perth Children's Hospital, 15 Hospital Avenue, Nedlands, WA, 6009, Australia.
- Telethon Kids Institute, 15 Hospital Avenue, Nedlands, WA, 6009, Australia.
- School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Kent St, Bentley, WA, 6102, Australia.
- SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore, 169609, Singapore.
Harry Caufield
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Dylan Gration
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA, 6008, Australia
Gareth Baynam
- Rare Care Centre, Perth Children's Hospital, 15 Hospital Avenue, Nedlands, WA, 6009, Australia
- Telethon Kids Institute, 15 Hospital Avenue, Nedlands, WA, 6009, Australia
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA, 6008, Australia
- Faculty of Health and Medical Sciences, University of Western Australia, 35 Stirling Hwy, Crawley, WA, 6009, Australia
Melissa A Haendel
- University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT, 06032, USA
Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Justin T Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
6
Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT. Patterns (New York, N.Y.) 2024; 5:100887. [PMID: 38264716 PMCID: PMC10801236 DOI: 10.1016/j.patter.2023.100887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 10/25/2023] [Accepted: 11/06/2023] [Indexed: 01/25/2024]
Abstract
To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models, PhenoBCBERT and PhenoGPT, for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models to automate the detection of phenotype terms, including those not in the current HPO. We compared these models with PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also show strong performance in case studies on biomedical literature. We evaluate the strengths and weaknesses of BERT- and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.
Affiliation(s)
Jingye Yang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
Wendy Deng
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
Da Wu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
Yunyun Zhou
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
7
Groza T, Wu H, Dinger ME, Danis D, Hilton C, Bagley A, Davids JR, Luo L, Lu Z, Robinson PN. Term-BLAST-like alignment tool for concept recognition in noisy clinical texts. Bioinformatics 2023; 39:btad716. [PMID: 38001031 PMCID: PMC10710372 DOI: 10.1093/bioinformatics/btad716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/20/2023] [Accepted: 11/23/2023] [Indexed: 11/26/2023] Open
Abstract
MOTIVATION Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT), leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show a 10% increase in recall on scientific publications and a 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.
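The k-mer screening idea mentioned in this abstract can be illustrated with a toy example. The sketch below is not the TBLAT or Fenominal implementation: it assumes character 3-mers, a simple shared-count ratio, and an arbitrary threshold, and it omits the spelling-error scoring step entirely. All function names and parameters are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

def kmer_counts(text: str, k: int = 3) -> Counter:
    """Count overlapping character k-mers in a lower-cased string."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def kmer_overlap(query: str, term: str, k: int = 3) -> float:
    """Fraction of the term's k-mers that also occur in the query."""
    q, t = kmer_counts(query, k), kmer_counts(term, k)
    total = sum(t.values())
    shared = sum((q & t).values())  # multiset intersection keeps minimum counts
    return shared / total if total else 0.0

def screen(query: str, vocabulary: List[str], k: int = 3,
           threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return vocabulary terms whose k-mer overlap meets the threshold, best first."""
    scored = [(term, kmer_overlap(query, term, k)) for term in vocabulary]
    return sorted((hit for hit in scored if hit[1] >= threshold),
                  key=lambda hit: -hit[1])

# A misspelled mention still shares most 3-mers with the intended term.
candidates = screen("seizurs", ["seizures", "scoliosis"])
```

In a fuller pipeline, candidates that pass such a screen would then be scored by a more expensive alignment step, which is the part this sketch leaves out.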
Affiliation(s)
Tudor Groza
- Rare Care Centre, Perth Children’s Hospital, Nedlands, WA 6009, Australia
- Genetics and Rare Diseases Program, Telethon Kids Institute, Nedlands, WA 6009, Australia
Honghan Wu
- Institute of Health Informatics, University College London, London WC1E 6BT, United Kingdom
Marcel E Dinger
- Pryzm Health, Sydney, NSW 2089, Australia
- School of Life and Environmental Sciences, Faculty of Science, University of Sydney, NSW 2006, Australia
Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
Coleman Hilton
- Shriners Children’s Corporate Headquarters, Tampa, FL 33607, United States
Anita Bagley
- Shriners Children's Northern California, Sacramento, CA 95817, United States
Jon R Davids
- Shriners Children's Northern California, Sacramento, CA 95817, United States
Ling Luo
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, United States
8
Rehm HL, Alaimo JT, Aradhya S, Bayrak-Toydemir P, Best H, Brandon R, Buchan JG, Chao EC, Chen E, Clifford J, Cohen ASA, Conlin LK, Das S, Davis KW, Del Gaudio D, Del Viso F, DiVincenzo C, Eisenberg M, Guidugli L, Hammer MB, Harrison SM, Hatchell KE, Dyer LH, Hoang LU, Holt JM, Jobanputra V, Karbassi ID, Kearney HM, Kelly MA, Kelly JM, Kluge ML, Komala T, Kruszka P, Lau L, Lebo MS, Marshall CR, McKnight D, McWalter K, Meng Y, Nagan N, Neckelmann CS, Neerman N, Niu Z, Paolillo VK, Paolucci SA, Perry D, Pesaran T, Radtke K, Rasmussen KJ, Retterer K, Saunders CJ, Spiteri E, Stanley C, Szuto A, Taft RJ, Thiffault I, Thomas BC, Thomas-Wilson A, Thorpe E, Tidwell TJ, Towne MC, Zouk H. The landscape of reported VUS in multi-gene panel and genomic testing: Time for a change. Genet Med 2023; 25:100947. [PMID: 37534744 PMCID: PMC10825061 DOI: 10.1016/j.gim.2023.100947] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 04/24/2023] [Revised: 07/20/2023] [Accepted: 07/26/2023] [Indexed: 08/04/2023] Open
Abstract
PURPOSE Variants of uncertain significance (VUS) are a common result of diagnostic genetic testing and can be difficult to manage with potential misinterpretation and downstream costs, including time investment by clinicians. We investigated the rate of VUS reported on diagnostic testing via multi-gene panels (MGPs) and exome and genome sequencing (ES/GS) to measure the magnitude of uncertain results and explore ways to reduce their potentially detrimental impact. METHODS Rates of inconclusive results due to VUS were collected from over 1.5 million sequencing test results from 19 clinical laboratories in North America from 2020 to 2021. RESULTS We found a lower rate of inconclusive test results due to VUSs from ES/GS (22.5%) compared with MGPs (32.6%; P < .0001). For MGPs, the rate of inconclusive results correlated with panel size. The use of trios reduced inconclusive rates (18.9% vs 27.6%; P < .0001), whereas the use of GS compared with ES had no impact (22.2% vs 22.6%; P = ns). CONCLUSION The high rate of VUS observed in diagnostic MGP testing warrants examining current variant reporting practices. We propose several approaches to reduce reported VUS rates, while directing clinician resources toward important VUS follow-up.
Affiliation(s)
Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA; Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; Pathology, Harvard Medical School, Boston, MA.
Joseph T Alaimo
- Department of Pathology and Laboratory Medicine, Children's Mercy Hospital, Kansas City, MO; Department of Pediatrics, School of Medicine, University of Missouri, Kansas City, MO; Genomic Medicine Center, Children's Mercy Hospital, Kansas City, MO
Swaroop Aradhya
- Invitae, San Francisco, CA; Department of Pathology, Stanford University School of Medicine, Palo Alto, CA
Pinar Bayrak-Toydemir
- ARUP Laboratories, Salt Lake City, UT; Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT
Hunter Best
- ARUP Laboratories, Salt Lake City, UT; Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT
Jillian G Buchan
- Genetics Division, Laboratory Medicine and Pathology, University of Washington, Seattle, WA
| | | | | | | | - Ana S A Cohen
- Department of Pathology and Laboratory Medicine, Children's Mercy Hospital, Kansas City, MO; Department of Pediatrics, School of Medicine, University of Missouri, Kansas City, MO; Genomic Medicine Center, Children's Mercy Hospital, Kansas City, MO
| | - Laura K Conlin
- Division of Genomic Diagnostics, Children's Hospital of Philadelphia, Philadelphia, PA; Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA
| | - Soma Das
- Human Genetics, University of Chicago, Chicago, IL
| | | | | | - Florencia Del Viso
- Department of Pathology and Laboratory Medicine, Children's Mercy Hospital, Kansas City, MO
| | | | - Marcia Eisenberg
- Women's Health and Genetics, Labcorp, Research Triangle Park, NC
| | - Lucia Guidugli
- Rady Children's Institute for Genomic Medicine, San Diego, CA
| | - Monia B Hammer
- Rady Children's Institute for Genomic Medicine, San Diego, CA
| | | | | | | | | | - James M Holt
- HudsonAlpha Clinical Services Lab, LLC, Huntsville, AL
| | - Vaidehi Jobanputra
- Molecular Diagnostics, New York Genome Center, New York, NY; Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY
| | | | - Hutton M Kearney
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN
| | | | - Jacob M Kelly
- HudsonAlpha Clinical Services Lab, LLC, Huntsville, AL
| | - Michelle L Kluge
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN
| | | | | | - Lynette Lau
- Division of Genome Diagnostics, Department of Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, ON, Canada
| | - Matthew S Lebo
- Pathology, Harvard Medical School, Boston, MA; Laboratory for Molecular Medicine, Mass General Brigham, Cambridge, MA
| | - Christian R Marshall
- Division of Genome Diagnostics, Department of Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, ON, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
| | | | | | - Yan Meng
- Fulgent Genetics, Temple City, CA
| | | | | | | | - Zhiyv Niu
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN
| | - Vitoria K Paolillo
- Department of Pathology and Laboratory Medicine, Children's Mercy Hospital, Kansas City, MO
| | - Sarah A Paolucci
- Genetics Division, Laboratory Medicine and Pathology, University of Washington, Seattle, WA
| | | | | | | | - Kristen J Rasmussen
- Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN
| | | | - Carol J Saunders
- Department of Pathology and Laboratory Medicine, Children's Mercy Hospital, Kansas City, MO; Genomic Medicine Center, Children's Mercy Hospital, Kansas City, MO; Department of Pediatrics and Pathology, School of Medicine, University of Missouri, Kansas City, MO
| | | | | | - Anna Szuto
- Clinical and Metabolic Genetics, The Hospital for Sick Children, Toronto, ON, Canada
| | | | - Isabelle Thiffault
- Department of Pathology and Laboratory Medicine, Children's Mercy Hospital, Kansas City, MO; Department of Pediatrics, School of Medicine, University of Missouri, Kansas City, MO; Genomic Medicine Center, Children's Mercy Hospital, Kansas City, MO
| | | | | | | | | | | | - Hana Zouk
- Pathology, Harvard Medical School, Boston, MA; Laboratory for Molecular Medicine, Mass General Brigham, Cambridge, MA
|
9
|
Fayos De Arizón L, Viera ER, Pilco M, Perera A, De Maeztu G, Nicolau A, Furlano M, Torra R. Artificial intelligence: a new field of knowledge for nephrologists? Clin Kidney J 2023; 16:2314-2326. [PMID: 38046016 PMCID: PMC10689169 DOI: 10.1093/ckj/sfad182] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/23/2023] [Indexed: 12/05/2023] Open
Abstract
Artificial intelligence (AI) is a science that involves creating machines that can imitate human intelligence and learn. AI is ubiquitous in our daily lives, from search engines like Google to home assistants like Alexa and, more recently, OpenAI with its chatbot. AI can improve clinical care and research, but its use requires a solid understanding of its fundamentals, the promises and perils of algorithmic fairness, the barriers and solutions to its clinical implementation, and the pathways to developing an AI-competent workforce. The potential of AI in the field of nephrology is vast, particularly in the areas of diagnosis, treatment and prediction. One of the most significant advantages of AI is the ability to improve diagnostic accuracy. Machine learning algorithms can be trained to recognize patterns in patient data, including lab results, imaging and medical history, in order to identify early signs of kidney disease and thereby allow timely diagnoses and prompt initiation of treatment plans that can improve outcomes for patients. In short, AI holds the promise of advancing personalized medicine to new levels. While AI has tremendous potential, there are also significant challenges to its implementation, including data access and quality, data privacy and security, bias, trustworthiness, computing power, AI integration and legal issues. The European Commission's proposed regulatory framework for AI technology will play a significant role in ensuring the safe and ethical implementation of these technologies in the healthcare industry. Training nephrologists in the fundamentals of AI is imperative because traditionally, decision-making pertaining to the diagnosis, prognosis and treatment of renal patients has relied on ingrained practices, whereas AI serves as a powerful tool for swiftly and confidently synthesizing this information.
Affiliation(s)
- Leonor Fayos De Arizón
- Nephrology Department, Fundació Puigvert; Institut d'Investigacions Biomèdiques Sant Pau (IIB-Sant Pau); Departament de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Elizabeth R Viera
- Nephrology Department, Fundació Puigvert; Institut d'Investigacions Biomèdiques Sant Pau (IIB-Sant Pau); Departament de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Melissa Pilco
- Nephrology Department, Fundació Puigvert; Institut d'Investigacions Biomèdiques Sant Pau (IIB-Sant Pau); Departament de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Alexandre Perera
- Center for Biomedical Engineering Research (CREB), Universitat Politècnica de Barcelona (UPC), Barcelona, Spain; Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain; Institut de Recerca Sant Joan de Déu, Esplugues de Llobregat, Barcelona, Spain
- Monica Furlano
- Nephrology Department, Fundació Puigvert; Institut d'Investigacions Biomèdiques Sant Pau (IIB-Sant Pau); Departament de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain
- Roser Torra
- Nephrology Department, Fundació Puigvert; Institut d'Investigacions Biomèdiques Sant Pau (IIB-Sant Pau); Departament de Medicina, Universitat Autònoma de Barcelona, Barcelona, Spain
|
10
|
Chen F, Ahimaz P, Wang K, Chung WK, Ta C, Weng C, Liu C. Phenotype-Driven Molecular Genetic Test Recommendation for Diagnosing Pediatric Rare Disorders. RESEARCH SQUARE 2023:rs.3.rs-3593490. [PMID: 38045411 PMCID: PMC10690317 DOI: 10.21203/rs.3.rs-3593490/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Rare disease patients often endure prolonged diagnostic odysseys and may still remain undiagnosed for years. Selecting the appropriate genetic tests is crucial to lead to timely diagnosis. Phenotypic features offer great potential for aiding genomic diagnosis in rare disease cases. We see great promise in effective integration of phenotypic information into genetic test selection workflow. In this study, we present a phenotype-driven molecular genetic test recommendation (Phen2Test) for pediatric rare disease diagnosis. Phen2Test was constructed using frequency matrix of phecodes and demographic data from the EHR before ordering genetic tests, with the objective to streamline the selection of molecular genetic tests (whole-exome / whole-genome sequencing, or gene panels) for clinicians with minimum genetic training expertise. We developed and evaluated binary classifiers based on 1,005 individuals referred to genetic counselors for potential genetic evaluation. In the evaluation using the gold standard cohort, the model achieved strong performance with an AUROC of 0.82 and an AUPRC of 0.92. Furthermore, we tested the model on another silver standard cohort (n=6,458), achieving an overall AUROC of 0.72 and an AUPRC of 0.671. Phen2Test was adjusted to align with current clinical guidelines, showing superior performance with more recent data, demonstrating its potential for use within a learning healthcare system as a genomic medicine intervention that adapts to guideline updates. This study showcases the practical utility of phenotypic features in recommending molecular genetic tests with performance comparable to clinical geneticists. Phen2Test could assist clinicians with limited genetic training and knowledge to order appropriate genetic tests.
Affiliation(s)
- Fangyi Chen
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Priyanka Ahimaz
- Department of Pediatrics, Columbia University, New York, NY, USA
- Institute of Genomic Medicine, Columbia University, New York, NY, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Wendy K. Chung
- Department of Pediatrics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Casey Ta
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
|
11
|
Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT. ARXIV 2023:arXiv:2308.06294v2. [PMID: 37986722 PMCID: PMC10659449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models - PhenoBCBERT and PhenoGPT - for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes, due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models (LLMs) to automate the detection of phenotype terms, including those not in the current HPO. We compared these models to PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also showed strong performance in case studies on biomedical literature. We evaluated the strengths and weaknesses of BERT-based and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.
Affiliation(s)
- Jingye Yang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Wendy Deng
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Da Wu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Yunyun Zhou
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Biostatistics and Bioinformatics facility, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
|
12
|
Sebro RA, Kahn CE. Automated detection of causal relationships among diseases and imaging findings in textual radiology reports. J Am Med Inform Assoc 2023; 30:1701-1706. [PMID: 37381076 PMCID: PMC10531499 DOI: 10.1093/jamia/ocad119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Revised: 06/10/2023] [Accepted: 06/16/2023] [Indexed: 06/30/2023] Open
Abstract
OBJECTIVE Textual radiology reports contain a wealth of information that may help understand associations among diseases and imaging observations. This study evaluated the ability to detect causal associations among diseases and imaging findings from their co-occurrence in radiology reports. MATERIALS AND METHODS This IRB-approved and HIPAA-compliant study analyzed 1 702 462 consecutive reports of 1 396 293 patients; patient consent was waived. Reports were analyzed for positive mention of 16 839 entities (disorders and imaging findings) of the Radiology Gamuts Ontology (RGO). Entities that occurred in fewer than 25 patients were excluded. A Bayesian network structure-learning algorithm was applied at P < 0.05 threshold: edges were evaluated as possible causal relationships. RGO and/or physician consensus served as ground truth. RESULTS 2742 of 16 839 RGO entities were included, 53 849 patients (3.9%) had at least one included entity. The algorithm identified 725 pairs of entities as causally related; 634 were confirmed by reference to RGO or physician review (87% precision). As shown by its positive likelihood ratio, the algorithm increased detection of causally associated entities 6876-fold. DISCUSSION Causal relationships among diseases and imaging findings can be detected with high precision from textual radiology reports. CONCLUSION This approach finds causal relationships among diseases and imaging findings with high precision from textual radiology reports, despite the fact that causally related entities represent only 0.039% of all pairs of entities. Applying this approach to larger report text corpora may help detect unspecified or heretofore unrecognized associations.
Affiliation(s)
- Ronnie A Sebro
- Department of Radiology, Department of Orthopedic Surgery, and Center for Augmented Intelligence, Mayo Clinic, Jacksonville, Florida, USA
- Charles E Kahn
- Department of Radiology and Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
13
|
Aradhya S, Facio FM, Metz H, Manders T, Colavin A, Kobayashi Y, Nykamp K, Johnson B, Nussbaum RL. Applications of artificial intelligence in clinical laboratory genomics. AMERICAN JOURNAL OF MEDICAL GENETICS. PART C, SEMINARS IN MEDICAL GENETICS 2023; 193:e32057. [PMID: 37507620 DOI: 10.1002/ajmg.c.32057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of "big data" in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre-test and post-test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.
Affiliation(s)
- Swaroop Aradhya
- Invitae Corporation, San Francisco, California, USA
- Adjunct Clinical Faculty, Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Hillery Metz
- Invitae Corporation, San Francisco, California, USA
- Toby Manders
- Invitae Corporation, San Francisco, California, USA
- Keith Nykamp
- Invitae Corporation, San Francisco, California, USA
- Robert L Nussbaum
- Invitae Corporation, San Francisco, California, USA
- Volunteer Faculty, School of Medicine, University of California San Francisco, San Francisco, California, USA
|
14
|
Labbe T, Castel P, Sanner JM, Saleh M. ChatGPT for phenotypes extraction: one model to rule them all? ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2023; 2023:1-4. [PMID: 38082605 DOI: 10.1109/embc40787.2023.10340611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Information Extraction (IE) is a core task in Natural Language Processing (NLP) where the objective is to identify factual knowledge in textual documents (often unstructured) and feed downstream use cases with the resulting output. In genomic medicine, for instance, extracting the most precise list of phenotypes associated with a patient improves genetic disease diagnosis, a vital step in the modern deep phenotyping approach. As most of the phenotypic information lies in clinical reports, the challenge is to build an IE pipeline that automatically recognizes phenotype concepts in free-text notes. A new machine learning paradigm around large language models (LLMs) has recently given rise to an increasing number of academic works on this topic, in which sophisticated combinations of different techniques have been employed to improve phenotype extraction accuracy. The even more recently released ChatGPT application nevertheless raises the question of the relevance of these approaches compared with this new generic one based on an instruction-oriented LLM. In this paper, we propose a rigorous evaluation of ChatGPT and the current state-of-the-art solutions on this specific task, and discuss the possible impacts and the technical evolutions to consider in the medical domain. Clinical relevance: Deep phenotyping on electronic health records has proven its ability to improve genetic diagnosis by clinical exomes [10]. Thus, comparing state-of-the-art solutions in order to derive insights and improve research paths is essential.
|
15
|
Venkat V, Abdelhalim H, DeGroat W, Zeeshan S, Ahmed Z. Investigating genes associated with heart failure, atrial fibrillation, and other cardiovascular diseases, and predicting disease using machine learning techniques for translational research and precision medicine. Genomics 2023; 115:110584. [PMID: 36813091 DOI: 10.1016/j.ygeno.2023.110584] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/19/2022] [Revised: 02/06/2023] [Accepted: 02/11/2023] [Indexed: 02/22/2023]
Abstract
Cardiovascular disease (CVD) is the leading cause of mortality and loss of disability adjusted life years (DALYs) globally. CVDs like Heart Failure (HF) and Atrial Fibrillation (AF) are associated with physical effects on the heart muscles. As a result of the complex nature, progression, inherent genetic makeup, and heterogeneity of CVDs, personalized treatments are believed to be critical. Rightful application of artificial intelligence (AI) and machine learning (ML) approaches can lead to new insights into CVDs for providing better personalized treatments with predictive analysis and deep phenotyping. In this study we focused on implementing AI/ML techniques on RNA-seq driven gene-expression data to investigate genes associated with HF, AF, and other CVDs, and predict disease with high accuracy. The study involved generating RNA-seq data derived from the serum of consented CVD patients. Next, we processed the sequenced data using our RNA-seq pipeline and applied GVViZ for gene-disease data annotation and expression analysis. To achieve our research objectives, we developed a new Findable, Accessible, Intelligent, and Reproducible (FAIR) approach that includes a five-level biostatistical evaluation, primarily based on the Random Forest (RF) algorithm. During our AI/ML analysis, we have fitted, trained, and implemented our model to classify and distinguish high-risk CVD patients based on their age, gender, and race. With the successful execution of our model, we predicted the association of highly significant HF, AF, and other CVDs genes with demographic variables.
Affiliation(s)
- Vignesh Venkat
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
- Habiba Abdelhalim
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
- William DeGroat
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
- Saman Zeeshan
- Rutgers Cancer Institute of New Jersey, Rutgers University, 195 Little Albany St, New Brunswick, NJ, USA
- Zeeshan Ahmed
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA; Department of Medicine, Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, 125 Paterson St, New Brunswick, NJ, USA
|
16
|
Pacheco JA, Rasmussen LV, Wiley K, Person TN, Cronkite DJ, Sohn S, Murphy S, Gundelach JH, Gainer V, Castro VM, Liu C, Mentch F, Lingren T, Sundaresan AS, Eickelberg G, Willis V, Furmanchuk A, Patel R, Carrell DS, Deng Y, Walton N, Satterfield BA, Kullo IJ, Dikilitas O, Smith JC, Peterson JF, Shang N, Kiryluk K, Ni Y, Li Y, Nadkarni GN, Rosenthal EA, Walunas TL, Williams MS, Karlson EW, Linder JE, Luo Y, Weng C, Wei W. Evaluation of the portability of computable phenotypes with natural language processing in the eMERGE network. Sci Rep 2023; 13:1971. [PMID: 36737471 PMCID: PMC9898520 DOI: 10.1038/s41598-023-27481-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/15/2022] [Accepted: 01/03/2023] [Indexed: 02/05/2023] Open
Abstract
The electronic Medical Records and Genomics (eMERGE) Network assessed the feasibility of deploying portable phenotype rule-based algorithms with natural language processing (NLP) components added to improve performance of existing algorithms using electronic health records (EHRs). Based on scientific merit and predicted difficulty, eMERGE selected six existing phenotypes to enhance with NLP. We assessed performance, portability, and ease of use. We summarized lessons learned by: (1) challenges; (2) best practices to address challenges based on existing evidence and/or eMERGE experience; and (3) opportunities for future research. Adding NLP resulted in improved, or the same, precision and/or recall for all but one algorithm. Portability, phenotyping workflow/process, and technology were major themes. With NLP, development and validation took longer. Besides portability of NLP technology and algorithm replicability, factors to ensure success include privacy protection, technical infrastructure setup, intellectual property agreement, and efficient communication. Workflow improvements can improve communication and reduce implementation time. NLP performance varied mainly due to clinical document heterogeneity; therefore, we suggest using semi-structured notes, comprehensive documentation, and customization options. NLP portability is possible with improved phenotype algorithm performance, but careful planning and architecture of the algorithms is essential to support local customizations.
Affiliation(s)
- Ken Wiley
- National Human Genome Research Institute, Bethesda, USA
- David J Cronkite
- Kaiser Permanente Washington Health Research Institute, Seattle, USA
- Cong Liu
- Columbia University, New York, USA
- Frank Mentch
- Children's Hospital of Philadelphia, Philadelphia, USA
- Todd Lingren
- Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- David S Carrell
- Kaiser Permanente Washington Health Research Institute, Seattle, USA
- Yu Deng
- Northwestern University, Evanston, USA
- Yizhao Ni
- Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Yikuan Li
- Northwestern University, Evanston, USA
- Yuan Luo
- Northwestern University, Evanston, USA
- WeiQi Wei
- Vanderbilt University Medical Center, Nashville, USA
|
17
|
Abstract
Hundreds of different genetic causes of chronic kidney disease are now recognized, and while individually rare, taken together they are significant contributors to both adult and pediatric diseases. Traditional genetics approaches relied heavily on the identification of large families with multiple affected members and have been fundamental to the identification of genetic kidney diseases. With the increased utilization of massively parallel sequencing and improvements to genotype imputation, we can analyze rare variants in large cohorts of unrelated individuals, leading to personalized care for patients and significant research advancements. This review evaluates the contribution of rare disorders to patient care and the study of genetic kidney diseases and highlights key advancements that utilize new techniques to improve our ability to identify new gene-disease associations.
Affiliation(s)
- Mark D Elliott
- Division of Nephrology, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Center for Precision Medicine and Genomics, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Institute for Genomic Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Hila Milo Rasouly
- Division of Nephrology, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Center for Precision Medicine and Genomics, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Ali G Gharavi
- Division of Nephrology, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Center for Precision Medicine and Genomics, Department of Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
- Institute for Genomic Medicine, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA
|
18
|
Phenotype-aware prioritisation of rare Mendelian disease variants. Trends Genet 2022; 38:1271-1283. [PMID: 35934592 PMCID: PMC9950798 DOI: 10.1016/j.tig.2022.07.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/31/2022] [Revised: 06/06/2022] [Accepted: 07/05/2022] [Indexed: 01/24/2023]
Abstract
A molecular diagnosis from the analysis of sequencing data in rare Mendelian diseases has a huge impact on the management of patients and their families. Numerous patient phenotype-aware variant prioritisation (VP) tools have been developed to help automate this process, and shorten the diagnostic odyssey, but performance statistics on real patient data are limited. Here we identify, assess, and compare the performance of all up-to-date, freely available, and programmatically accessible tools using a whole-exome, retinal disease dataset from 134 individuals with a molecular diagnosis. All tools were able to identify around two-thirds of the genetic diagnoses as the top-ranked candidate, with LIRICAL performing best overall. Finally, we discuss the challenges to overcome most cases remaining undiagnosed after current, state-of-the-art practices.
|
19
|
Nixon A, Fang L, Havrilla JM, Wang K. Termviewer - A Web Application for Streamlined Human Phenotype Ontology (HPO) Tagging and Document Annotation. Chem Biodivers 2022; 19:e202200805. [PMID: 36328766 DOI: 10.1002/cbdv.202200805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 10/13/2022] [Indexed: 11/06/2022]
Abstract
Clinical notes from electronic health records (EHRs) contain a large amount of clinical phenotype data on patients that can provide insights into the phenotypic presentation of various diseases. A number of Natural Language Processing (NLP) algorithms have been utilized in the past few years to annotate medical concepts, such as Human Phenotype Ontology (HPO) terms, from clinical notes. However, efficient use of NLP algorithms requires the use of high-quality clinical notes with phenotype descriptions, and erroneous annotations often exist in results from these NLP algorithms. Manual review by human experts is often needed to compile the correct phenotype information on individual patients. Here we develop TermViewer, a web application that allows multi-party collaborative annotation and quality assessment of clinical notes that have already been processed and tagged by NLP algorithms. TermViewer allows users to view clinical notes with HPO terms highlighted, and to easily classify high-quality notes and revise incorrect tagging of HPO terms. Currently, TermViewer combines MetaMap and cTAKES, two of the most widely used NLP tools for tagging medical terms, and identifies where these two tools agree and disagree, allowing users to perform collaborative manual reviews of computationally generated HPO annotations. TermViewer can be a stand-alone tool for analyzing notes or become part of a machine-learning pipeline where tagged HPO terms can be used as additional input data. TermViewer is available at https://github.com/WGLab/TermViewer.
Affiliation(s)
- Anna Nixon
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- James M Havrilla
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
|
20
|
Early illustrations of the importance of systematic phenotyping. Eur J Hum Genet 2022; 30:1102. [PMID: 36221027 PMCID: PMC9554047 DOI: 10.1038/s41431-022-01165-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 07/21/2022] [Indexed: 11/05/2022]
|
21
|
Johnson B, Ouyang K, Frank L, Truty R, Rojahn S, Morales A, Aradhya S, Nykamp K. Systematic use of phenotype evidence in clinical genetic testing reduces the frequency of variants of uncertain significance. Am J Med Genet A 2022; 188:2642-2651. [PMID: 35570716 PMCID: PMC9544038 DOI: 10.1002/ajmg.a.62779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 04/23/2022] [Indexed: 01/24/2023]
Abstract
Guidelines for variant interpretation include criteria for incorporating phenotype evidence, but this evidence is inconsistently applied. Systematic approaches to using phenotype evidence are needed. We developed a method for curating disease phenotypes as highly or moderately predictive of variant pathogenicity based on the frequency of their association with disease-causing variants. To evaluate this method's accuracy, we retrospectively reviewed variants with clinical classifications that had evolved from uncertain to definitive in genes associated with curated predictive phenotypes. To demonstrate the clinical validity and utility of this approach, we compared variant classifications determined with and without predictive phenotype evidence. The curation method was accurate for 93%-98% of eligible variants. Among variants interpreted using highly predictive phenotype evidence, the percentage classified as pathogenic or likely pathogenic was 80%, compared with 46%-54% had the evidence not been used. Positive results among individuals harboring variants with highly predictive phenotype-guided interpretations would have been missed in 25%-37% of diagnostic tests and 39%-50% of carrier screens had other approaches to phenotype evidence been used. In summary, predictive phenotype evidence associated with specific curated genes can be systematically incorporated into variant interpretation to reduce uncertainty and increase the clinical utility of genetic testing.
Affiliation(s)
- Ana Morales
  - Invitae Corporation, San Francisco, California, USA
|
22
|
Liu C, Ta CN, Havrilla JM, Nestor JG, Spotnitz ME, Geneslaw AS, Hu Y, Chung WK, Wang K, Weng C. OARD: Open annotations for rare diseases and their phenotypes based on real-world data. Am J Hum Genet 2022; 109:1591-1604. [PMID: 35998640 PMCID: PMC9502051 DOI: 10.1016/j.ajhg.2022.08.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Accepted: 08/01/2022] [Indexed: 11/23/2022]
Abstract
Diagnosis of rare genetic diseases often relies on phenotype-driven methods, which hinge on the accuracy and completeness of the rare disease phenotypes in the underlying annotation knowledgebase. Existing knowledgebases are often manually curated with additional annotations found in published case reports. Despite their potential, real-world data such as electronic health records (EHRs) have not been fully exploited to derive rare disease annotations. Here, we present open annotation for rare diseases (OARD), a real-world-data-derived resource with annotation for rare-disease-related phenotypes. This resource is derived from the EHRs of two academic health institutions containing more than 10 million individuals spanning wide age ranges and different disease subgroups. By leveraging ontology mapping and advanced natural-language-processing (NLP) methods, OARD automatically and efficiently extracts concepts for both rare diseases and their phenotypic traits from billing codes and lab tests as well as over 100 million clinical narratives. The rare disease prevalence derived by OARD is highly correlated with that annotated in the original rare disease knowledgebase. By performing association analysis, we identified more than 1 million novel disease-phenotype association pairs that were previously missed by human annotation, and >60% were confirmed true associations via manual review of a list of sampled pairs. Compared to the manually curated annotation, OARD is 100% data driven and its pipeline can be shared across different institutions. By supporting privacy-preserving sharing of aggregated summary statistics, such as term frequencies and disease-phenotype associations, it fills an important gap to facilitate data-driven research in the rare disease community.
Affiliation(s)
- Cong Liu
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Casey N Ta
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Jim M Havrilla
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Jordan G Nestor
  - Division of Nephrology, Department of Medicine, Columbia University, New York, NY 10032, USA
- Matthew E Spotnitz
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Andrew S Geneslaw
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yu Hu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Wendy K Chung
  - Department of Pediatrics, Columbia University Irving Medical Center, New York, NY 10032, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
|
23
|
Havrilla JM, Singaravelu A, Driscoll DM, Minkovsky L, Helbig I, Medne L, Wang K, Krantz I, Desai BR. PheNominal: an EHR-integrated web application for structured deep phenotyping at the point of care. BMC Med Inform Decis Mak 2022; 22:198. [PMID: 35902925 PMCID: PMC9335954 DOI: 10.1186/s12911-022-01927-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/13/2022] [Accepted: 07/06/2022] [Indexed: 01/18/2023]
Abstract
BACKGROUND Clinical phenotype information greatly facilitates genetic diagnostic interpretation pipelines. While post-hoc extraction using natural language processing on unstructured clinical notes continues to improve, there is a need to improve point-of-care collection of patient phenotypes. Therefore, we developed "PheNominal", a point-of-care web application, embedded within Epic electronic health record (EHR) workflows, to permit capture of standardized phenotype data. METHODS Using bi-directional web services available within commercial EHRs, we developed a lightweight web application that allows users to rapidly browse and identify relevant terms from the Human Phenotype Ontology (HPO). Selected terms are saved discretely within the patient's EHR, permitting reuse both in clinical notes and in downstream diagnostic and research pipelines. RESULTS In the 16 months since implementation, PheNominal was used to capture discrete phenotype data for over 1500 individuals and 11,000 HPO terms during clinic and inpatient encounters for a genetic diagnostic consultation service within a quaternary-care pediatric academic medical center. An average of 7 HPO terms were captured per patient. Compared to a manual workflow, the average time to enter terms was reduced from 15 to 5 min per patient, and there were fewer annotation errors. CONCLUSIONS Modern EHRs support integration of external applications using application programming interfaces. We describe a practical application of these interfaces to facilitate deep phenotype capture in a discrete, structured format within a busy clinical workflow. Future versions will include a vendor-agnostic implementation using FHIR. We describe pilot efforts to integrate structured phenotyping through controlled dictionaries into diagnostic and research pipelines, reducing manual effort for phenotype documentation and reducing errors in data entry.
Affiliation(s)
- James M. Havrilla
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Anbumalar Singaravelu
  - Emerging Technology and Transformation Team, Information Services, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Dennis M. Driscoll
  - Emerging Technology and Transformation Team, Information Services, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Leonard Minkovsky
  - Emerging Technology and Transformation Team, Information Services, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Ingo Helbig
  - Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children’s Hospital of Philadelphia, Philadelphia, USA; Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA; Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104 USA
- Livija Medne
  - Roberts Individualized Medical Genetics Center, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Kai Wang
  - Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA; Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104 USA
- Ian Krantz
  - Roberts Individualized Medical Genetics Center, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Bimal R. Desai
  - Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104 USA
|
24
|
A Formative Study of the Implementation of Whole Genome Sequencing in Northern Ireland. Genes (Basel) 2022; 13:genes13071104. [PMID: 35885887 PMCID: PMC9316942 DOI: 10.3390/genes13071104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2022] [Revised: 06/13/2022] [Accepted: 06/14/2022] [Indexed: 02/05/2023]
Abstract
Background: The UK 100,000 Genomes Project was a transformational research project which facilitated whole genome sequencing (WGS) diagnostics for rare diseases. We evaluated experiences of introducing WGS in Northern Ireland, providing recommendations for future projects. Methods: This formative evaluation included (1) an appraisal of the logistics of implementing and delivering WGS, (2) a survey of participant self-reported views and experiences, (3) semi-structured interviews with healthcare staff as key informants who were involved in the delivery of WGS and (4) a workshop discussion about interprofessional collaboration with respect to molecular diagnostics. Results: We engaged with >400 participants, with detailed reflections obtained from 74 participants including patients, caregivers, key National Health Service (NHS) informants, and researchers (patient survey n = 42; semi-structured interviews n = 19; attendees of the discussion workshop n = 13). Overarching themes included the need to improve rare disease awareness, education, and support services, as well as interprofessional collaboration being central to an effective, mainstreamed molecular diagnostic service. Conclusions: Recommendations for streamlining precision medicine for patients with rare diseases include administrative improvements (e.g., streamlining of the consent process), educational improvements (e.g., rare disease training provided from undergraduate to postgraduate education alongside genomics training for non-genetic specialists) and analytical improvements (e.g., multidisciplinary collaboration and improved computational infrastructure).
|
25
|
Peng J, Xu D, Lee R, Xu S, Zhou Y, Wang K. Expediting knowledge acquisition by a web framework for Knowledge Graph Exploration and Visualization (KGEV): case studies on COVID-19 and Human Phenotype Ontology. BMC Med Inform Decis Mak 2022; 22:147. [PMID: 35655307 PMCID: PMC9161770 DOI: 10.1186/s12911-022-01848-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 11/10/2022]
Abstract
Background
Knowledge graphs (KGs) serve as a convenient framework for structuring knowledge. A number of computational methods have been developed to generate KGs from biomedical literature and use them for downstream tasks such as link prediction and question answering. However, there is a lack of computational tools or web frameworks to support the exploration and visualization of the KGs themselves, which would facilitate interactive knowledge discovery and formulation of novel biological hypotheses.
Method
We developed a web framework for Knowledge Graph Exploration and Visualization (KGEV), to construct and visualize KGs in five stages: triple extraction, triple filtration, metadata preparation, knowledge integration, and graph database preparation. The application has convenient user interface tools, such as node and edge search and filtering, data source filtering, neighborhood retrieval, and shortest path calculation, that work by querying a backend graph database. Unlike other KGs, our framework allows fast retrieval of relevant texts supporting the relationships in the KG, thus allowing human reviewers to judge the reliability of the knowledge extracted.
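Two of the interface operations named above, neighborhood retrieval and shortest path calculation, can be sketched in memory over a small list of triples. This is only an illustration under stated assumptions: KGEV itself queries a backend graph database, whereas the sketch uses a plain adjacency dictionary and breadth-first search, treats edges as undirected, and uses hypothetical node names and triples.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_adjacency(triples: List[Triple]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (subject, relation, object) triples."""
    adjacency: Dict[str, Set[str]] = {}
    for subject, _relation, obj in triples:
        adjacency.setdefault(subject, set()).add(obj)
        adjacency.setdefault(obj, set()).add(subject)
    return adjacency


def neighborhood(adjacency: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return the direct neighbors of a node (empty set if the node is unknown)."""
    return set(adjacency.get(node, set()))


def shortest_path(
    adjacency: Dict[str, Set[str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search for a path with the fewest edges; None if unreachable."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor chain back to the start, then reverse it.
            path: List[str] = []
            step: Optional[str] = current
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbor in sorted(adjacency[current]):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


# Hypothetical triples, for illustration only.
example_triples: List[Triple] = [
    ("drug_A", "targets", "gene_X"),
    ("gene_X", "associated_with", "phenotype_P"),
    ("phenotype_P", "observed_in", "disease_D"),
]
graph = build_adjacency(example_triples)
print(sorted(neighborhood(graph, "gene_X")))
print(shortest_path(graph, "drug_A", "disease_D"))
```

In a graph database these operations would typically be expressed as queries rather than Python loops; the sketch only shows what the two operations compute.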
Results
We demonstrated a case study of using the KGEV framework to perform research on COVID-19. The COVID-19 pandemic resulted in an explosion of relevant literature, making it challenging to make full use of the vast and heterogeneous sources of information. We generated a COVID-19 KG with heterogeneous information, including literature information from the CORD-19 dataset, as well as other existing knowledge from eight data sources. We showed the utility of KGEV in three intuitive case studies to explore and query knowledge on COVID-19. A demo of this web application can be accessed at http://covid19nlp.wglab.org. Finally, we also demonstrated a turn-key adaptation of the KGEV framework to study clinical phenotypic presentation of human diseases by Human Phenotype Ontology (HPO), illustrating the versatility of the framework.
Conclusion
In an era of literature explosion, the KGEV framework can be applied to many emerging diseases to support structured navigation of the vast amount of newly published biomedical literature and other existing biological knowledge in various databases. It can also be used as a general-purpose tool to explore and query gene-phenotype-disease-drug relationships interactively.
|
26
|
Austin-Tse CA, Jobanputra V, Perry DL, Bick D, Taft RJ, Venner E, Gibbs RA, Young T, Barnett S, Belmont JW, Boczek N, Chowdhury S, Ellsworth KA, Guha S, Kulkarni S, Marcou C, Meng L, Murdock DR, Rehman AU, Spiteri E, Thomas-Wilson A, Kearney HM, Rehm HL. Best practices for the interpretation and reporting of clinical whole genome sequencing. NPJ Genom Med 2022; 7:27. [PMID: 35395838 PMCID: PMC8993917 DOI: 10.1038/s41525-022-00295-z] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 08/04/2021] [Accepted: 02/17/2022] [Indexed: 01/19/2023]
Abstract
Whole genome sequencing (WGS) shows promise as a first-tier diagnostic test for patients with rare genetic disorders. However, standards addressing the definition and deployment practice of a best-in-class test are lacking. To address these gaps, the Medical Genome Initiative, a consortium of leading health care and research organizations in the US and Canada, was formed to expand access to high quality clinical WGS by convening experts and publishing best practices. Here, we present best practice recommendations for the interpretation and reporting of clinical diagnostic WGS, including discussion of challenges and emerging approaches that will be critical to harness the full potential of this comprehensive test.
Affiliation(s)
- Christina A Austin-Tse
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Laboratory for Molecular Medicine, Mass General Brigham Personalized Medicine, Cambridge, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Vaidehi Jobanputra
  - Molecular Diagnostics Laboratory, New York Genome Center, New York, NY, USA; Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY, USA
- David Bick
  - HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Eric Venner
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Richard A Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ted Young
  - Genome Diagnostics, Department of Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, ON, Canada
- Sarah Barnett
  - Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Nicole Boczek
  - Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA; Center for Individualized Medicine, College of Medicine, Mayo Clinic, Rochester, MN, USA
- Shimul Chowdhury
  - Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- Saurav Guha
  - Molecular Diagnostics Laboratory, New York Genome Center, New York, NY, USA
- Shashikant Kulkarni
  - Baylor Genetics and Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Cherisse Marcou
  - Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Linyan Meng
  - Baylor Genetics and Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- David R Murdock
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Atteeq U Rehman
  - Molecular Diagnostics Laboratory, New York Genome Center, New York, NY, USA
- Elizabeth Spiteri
  - Department of Pathology, Stanford Medicine, Stanford University, Stanford, CA, USA
- Hutton M Kearney
  - Division of Laboratory Genetics and Genomics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Heidi L Rehm
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
27
|
Link NB, Huang S, Cai T, Sun J, Dahal K, Costa L, Cho K, Liao K, Cai T, Hong C. Binary acronym disambiguation in clinical notes from electronic health records with an application in computational phenotyping. Int J Med Inform 2022; 162:104753. [PMID: 35405530 DOI: 10.1016/j.ijmedinf.2022.104753] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/18/2022] [Revised: 03/11/2022] [Accepted: 03/27/2022] [Indexed: 01/05/2023]
Abstract
OBJECTIVE The use of electronic health records (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings) and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce a semi-supervised method for binary acronym disambiguation, the task of classifying a target sense for acronyms in the clinical EHR notes. METHODS We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym means a target sense by leveraging semantic embeddings, visit-level text and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline metric selecting the most frequent acronym sense. Along with evaluating the performance of these methods for specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis. RESULTS CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than a standard baseline metric and (on average) higher than a state-of-the-art semi-supervised method. As well, we demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis. CONCLUSION CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
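The "baseline metric selecting the most frequent acronym sense" mentioned in this abstract can be read as a majority-class classifier. The sketch below shows that reading only; it is not the CASEml algorithm, and the function names, the boolean label encoding (True meaning the acronym carries the target sense), and the example labels are all hypothetical.

```python
from collections import Counter
from typing import List


def most_frequent_sense_baseline(train_labels: List[bool]) -> bool:
    """Return the label that occurs most often in the training data.

    True means "the acronym carries the target sense" (e.g. RA = rheumatoid
    arthritis); ties are resolved in favor of True.
    """
    counts = Counter(train_labels)
    return counts[True] >= counts[False]


def accuracy(predictions: List[bool], gold: List[bool]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Hypothetical labels, for illustration only.
train_labels = [True, True, False, True]
test_gold = [True, False, True, True, False]

majority = most_frequent_sense_baseline(train_labels)
baseline_predictions = [majority] * len(test_gold)
print(accuracy(baseline_predictions, test_gold))
```

A baseline of this kind gives every instance the same answer, so its accuracy equals the share of the majority sense in the evaluation data; any disambiguation method has to beat that share to be useful.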
Affiliation(s)
- Nicholas B Link
  - VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Sicong Huang
  - VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
- Tianrun Cai
  - VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
- Jiehuan Sun
  - VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Kumar Dahal
  - VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
- Lauren Costa
  - VA Boston Healthcare System, Boston, MA, United States
- Kelly Cho
  - VA Boston Healthcare System, Boston, MA, United States
- Katherine Liao
  - VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
- Tianxi Cai
  - VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Chuan Hong
  - VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
|
28
|
Slavotinek A, Prasad H, Yip T, Rego S, Hoban H, Kvale M. Predicting genes from phenotypes using human phenotype ontology (HPO) terms. Hum Genet 2022; 141:1749-1760. [PMID: 35357580 DOI: 10.1007/s00439-022-02449-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Accepted: 03/16/2022] [Indexed: 11/28/2022]
Abstract
The interpretation of genomic variants following whole exome sequencing (WES) can be aided using human phenotype ontology (HPO) terms to standardize clinical features and predict causative genes. We performed WES on 453 patients diagnosed prior to 18 years of age and identified 114 pathogenic (P) or likely pathogenic (LP) variants in 112 patients. We utilized PhenoDB to extract HPO terms from provider notes and then used Phen2Gene to generate a gene score and gene ranking from each list of HPO terms. We assigned Phen2Gene gene rankings to 6 rank classes, with class 1 covering raw gene rankings of 1 to 10 and class 2 covering rankings from 11 to 50 out of a total of 17,126 possible gene rankings. Phen2Gene ranked causative genes into rank class 1 or 2 in 27.7% of cases and the genes in rank class 1 were all associated with well-characterized phenotypes. We found significant associations between the gene score and the number of years since the gene was first published, the number of HPO terms with a hierarchical depth greater than or equal to 11, and the number of Online Mendelian Inheritance in Man terms associated with the phenotype and gene. We conclude that genes associated with recognizable phenotypes and terms deep in the HPO hierarchy have the best chance of producing a high gene score and ranking in class 1 to 2 using Phen2Gene software with HPO terms. Clinicians and laboratory staff should consider these results when HPO terms are employed to prioritize candidate genes.
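The rank-class binning described in this abstract can be sketched as a small lookup. Only the boundaries for class 1 (ranks 1 to 10) and class 2 (ranks 11 to 50) are stated, so the sketch returns `None` for every rank above 50 rather than guessing the remaining four classes. The function names and the example ranks are hypothetical and are not taken from the study.

```python
from typing import List, Optional


def rank_class(rank: int) -> Optional[int]:
    """Map a raw gene rank to rank class 1 or 2; None for unspecified classes."""
    if rank < 1:
        raise ValueError("gene ranks start at 1")
    if rank <= 10:
        return 1
    if rank <= 50:
        return 2
    return None  # boundaries for classes 3 to 6 are not given in the abstract


def fraction_in_top_classes(ranks: List[int]) -> float:
    """Share of cases whose causative gene falls into rank class 1 or 2."""
    if not ranks:
        raise ValueError("need at least one rank")
    hits = sum(1 for rank in ranks if rank_class(rank) in (1, 2))
    return hits / len(ranks)


# Hypothetical causative-gene ranks for four cases, for illustration only.
example_ranks = [3, 42, 51, 900]
print([rank_class(rank) for rank in example_ranks])
print(fraction_in_top_classes(example_ranks))
```

Applied to the real per-case ranks, the second function would yield the kind of summary the abstract reports as 27.7% of cases in rank class 1 or 2.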
Affiliation(s)
- Anne Slavotinek
  - Division of Genetics, Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Hannah Prasad
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Tiffany Yip
  - Division of Genetics, Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Shannon Rego
  - Division of Genetics, Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Hannah Hoban
  - Division of Genetics, Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
- Mark Kvale
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
|
29
|
Yuan X, Wang J, Dai B, Sun Y, Zhang K, Chen F, Peng Q, Huang Y, Zhang X, Chen J, Xu X, Chuan J, Mu W, Li H, Fang P, Gong Q, Zhang P. Evaluation of phenotype-driven gene prioritization methods for Mendelian diseases. Brief Bioinform 2022; 23:6521702. [PMID: 35134823 PMCID: PMC8921623 DOI: 10.1093/bib/bbac019] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 11/15/2021] [Revised: 01/10/2022] [Accepted: 01/13/2022] [Indexed: 12/31/2022]
Abstract
Identifying disease-causing genes from the next-generation sequencing (NGS) data of patients with Mendelian disorders is challenging. To address this, researchers have developed many phenotype-driven gene prioritization methods that use a patient’s genotype and phenotype information, or phenotype information only, as input to rank candidate pathogenic genes. Evaluations of these ranking methods help practitioners choose an appropriate tool for their workflows, but retrospective benchmarks are underpowered to provide statistically significant results when differentiating between methods. In this research, the performance of ten recognized causal-gene prioritization methods was benchmarked using 305 cases from the Deciphering Developmental Disorders (DDD) project and 209 in-house cases via a relatively unbiased methodology. The evaluation results show that methods using Human Phenotype Ontology (HPO) terms and Variant Call Format (VCF) files as input achieved better overall performance than those using phenotypic data alone. In addition, LIRICAL and AMELIE, two of the best methods in our benchmark experiments, complement each other in cases with the causal genes ranked highly, suggesting a possible integrative approach to further enhance diagnostic efficiency. Our benchmarking provides valuable reference information for computer-assisted rapid diagnosis of Mendelian diseases and sheds some light on the potential direction of future improvement in disease-causing gene prioritization methods.
Collapse
Affiliation(s)
- Xiao Yuan
- Changsha KingMed Center for Clinical Laboratory, Changsha, China.,Guangzhou Kingmed Center for Clinical Laboratory, Guangzhou, China.,Genetalks Biotech. Co., Ltd., Changsha, China
| | - Jing Wang
- Changsha KingMed Center for Clinical Laboratory, Changsha, China
| | - Bing Dai
- Changsha KingMed Center for Clinical Laboratory, Changsha, China
| | - Yanfang Sun
- Changsha KingMed Center for Clinical Laboratory, Changsha, China
| | - Keke Zhang
- Changsha KingMed Center for Clinical Laboratory, Changsha, China
| | - Fangfang Chen
- Changsha KingMed Center for Clinical Laboratory, Changsha, China
| | - Qian Peng
- Changsha KingMed Center for Clinical Laboratory, Changsha, China
| | - Yixuan Huang
- Beijing Geneworks Technology Co., Ltd., Beijing, China
| | - Xinlei Zhang
- Reproductive & Genetics Hospital of Citic & Xiangya, Changsha, China
| | - Junru Chen
- Genetalks Biotech. Co., Ltd., Changsha, China
| | - Xilin Xu
- Guangzhou Kingmed Center for Clinical Laboratory, Guangzhou, China
| | - Jun Chuan
- Changsha KingMed Center for Clinical Laboratory, Changsha, China.,Guangzhou Kingmed Center for Clinical Laboratory, Guangzhou, China
| | - Wenbo Mu
- Guangzhou Kingmed Center for Clinical Laboratory, Guangzhou, China
| | - Huiyuan Li
- Changsha KingMed Center for Clinical Laboratory, Changsha, China.,Guangzhou Kingmed Center for Clinical Laboratory, Guangzhou, China
| | - Ping Fang
- Guangzhou Kingmed Center for Clinical Laboratory, Guangzhou, China
| | - Qiang Gong
- Changsha KingMed Center for Clinical Laboratory, Changsha, China.,Guangzhou Kingmed Center for Clinical Laboratory, Guangzhou, China
| | - Peng Zhang
- Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
| |
|
30
|
He Q, Shen H, Shao X, Chen W, Wu Y, Liu R, Li S, Zhou Z. Cardiovascular Phenotypes Profiling for L-Transposition of the Great Arteries and Prognosis Analysis. Front Cardiovasc Med 2022; 8:781041. [PMID: 35127856 PMCID: PMC8814104 DOI: 10.3389/fcvm.2021.781041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 12/23/2021] [Indexed: 11/24/2022] Open
Abstract
Objectives Congenitally corrected transposition of the great arteries (ccTGA) is a rare and complex congenital heart disease characterized by double discordance. Numerous coexisting anomalies complicate prognosis evaluation and clinical decision-making. We aimed to delineate a novel ccTGA clustering modality guided by the Human Phenotype Ontology (HPO) and to elucidate the relationship between phenotypes and prognosis in patients with ccTGA. Methods A retrospective review of 270 patients diagnosed with ccTGA at Fuwai Hospital from 2009 to 2020 and cross-sectional follow-up were performed. An HPO-instructed clustering method was applied for ccTGA risk stratification. Kaplan-Meier survival analysis, landmark analysis, and Cox regression analysis were used to investigate differences in outcomes among clusters. Results The median follow-up time was 4.29 (2.07–7.37) years. Three distinct phenotypic clusters were obtained after HPO-instructed clustering, with 21 patients in cluster 1, 136 in cluster 2, and 113 in cluster 3. Landmark analysis revealed significantly worse mid-term outcomes in all-cause mortality (p = 0.021) and composite endpoints (p = 0.004) for cluster 3 compared with cluster 1 and cluster 2. Multivariate analysis indicated that pulmonary arterial hypertension (PAH), atrioventricular septal defect (AVSD), and arrhythmia were risk factors for composite endpoints. Moreover, surgical treatment differed significantly among the three groups (p < 0.001), and surgical strategies had different effects on the prognosis of the different phenotypic clusters. Conclusions Human Phenotype Ontology-instructed clustering can be a potentially powerful tool for phenotypic risk stratification in patients with complex congenital heart diseases, which may improve prognosis prediction and clinical decision-making.
Affiliation(s)
- Qiyu He
- Pediatric Cardiac Center, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Huayan Shen
- Department of Laboratory Medicine, National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Xinyang Shao
- Department of Laboratory Medicine, National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Wen Chen
- Department of Laboratory Medicine, National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yafeng Wu
- Center for Applied Statistics, School of Statistics, Renmin University of China, Beijing, China
| | - Rui Liu
- Pediatric Cardiac Center, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Shoujun Li
- Pediatric Cardiac Center, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- *Correspondence: Shoujun Li
| | - Zhou Zhou
- Department of Laboratory Medicine, National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zhou Zhou
| |
|
31
|
Ahmed Z. Precision medicine with multi-omics strategies, deep phenotyping, and predictive analysis. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2022; 190:101-125. [DOI: 10.1016/bs.pmbts.2022.02.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
|
32
|
Artificial Intelligence in Clinical Immunology. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_83] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
33
|
Slater LT, Karwath A, Hoehndorf R, Gkoutos GV. Effects of Negation and Uncertainty Stratification on Text-Derived Patient Profile Similarity. Front Digit Health 2021; 3:781227. [PMID: 34939069 PMCID: PMC8685209 DOI: 10.3389/fdgth.2021.781227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 11/12/2021] [Indexed: 11/13/2022] Open
Abstract
Semantic similarity is a useful approach for comparing patient phenotypes, and has potential as an effective method for exploiting text-derived phenotypes for differential diagnosis, text and document classification, and outcome prediction. While approaches for context disambiguation are commonly used in text mining applications, forming a standard component of information extraction pipelines, their effects on semantic similarity calculations have not been widely explored. In this work, we evaluate how inclusion and exclusion of negated and uncertain mentions of concepts from text-derived phenotypes affect the similarity of patients, and the use of those profiles to predict diagnosis. We report on the effectiveness of these approaches and find a very small, yet significant, improvement in performance when classifying primary diagnosis over MIMIC-III patient visits.
Affiliation(s)
- Luke T Slater
- Centre for Computational Biology, College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.,Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom.,University Hospitals Birmingham National Health Service Foundation Trust, Birmingham, United Kingdom.,MRC Health Data Research UK (HDR UK) Midlands, Birmingham, United Kingdom
| | - Andreas Karwath
- Centre for Computational Biology, College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.,Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom.,University Hospitals Birmingham National Health Service Foundation Trust, Birmingham, United Kingdom.,MRC Health Data Research UK (HDR UK) Midlands, Birmingham, United Kingdom
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Georgios V Gkoutos
- Centre for Computational Biology, College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.,Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom.,University Hospitals Birmingham National Health Service Foundation Trust, Birmingham, United Kingdom.,MRC Health Data Research UK (HDR UK) Midlands, Birmingham, United Kingdom.,National Institute for Health Research Experimental Cancer Medicine Centre, Birmingham, United Kingdom.,National Institute for Health Research Surgical Reconstruction and Microbiology Research Centre, Birmingham, United Kingdom.,National Institute for Health Research Biomedical Research Centre, Birmingham, United Kingdom
| |
|
34
|
De La Vega FM, Chowdhury S, Moore B, Frise E, McCarthy J, Hernandez EJ, Wong T, James K, Guidugli L, Agrawal PB, Genetti CA, Brownstein CA, Beggs AH, Löscher BS, Franke A, Boone B, Levy SE, Õunap K, Pajusalu S, Huentelman M, Ramsey K, Naymik M, Narayanan V, Veeraraghavan N, Billings P, Reese MG, Yandell M, Kingsmore SF. Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases. Genome Med 2021; 13:153. [PMID: 34645491 PMCID: PMC8515723 DOI: 10.1186/s13073-021-00965-0] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 03/22/2021] [Accepted: 08/27/2021] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Clinical interpretation of genetic variants in the context of the patient's phenotype is becoming the largest component of cost and time expenditure for genome-based diagnosis of rare genetic diseases. Artificial intelligence (AI) holds promise to greatly simplify and speed genome interpretation by integrating predictive methods with the growing knowledge of genetic disease. Here we assess the diagnostic performance of Fabric GEM, a new, AI-based, clinical decision support tool for expediting genome interpretation. METHODS We benchmarked GEM in a retrospective cohort of 119 probands, mostly NICU infants, diagnosed with rare genetic diseases, who received whole-genome or whole-exome sequencing (WGS, WES). We replicated our analyses in a separate cohort of 60 cases collected from five academic medical centers. For comparison, we also analyzed these cases with current state-of-the-art variant prioritization tools. Included in the comparisons were trio, duo, and singleton cases. Variants underpinning diagnoses spanned diverse modes of inheritance and types, including structural variants (SVs). Patient phenotypes were extracted from clinical notes by two means: manually and using an automated clinical natural language processing (CNLP) tool. Finally, 14 previously unsolved cases were reanalyzed. RESULTS GEM ranked over 90% of the causal genes among the top or second candidate and prioritized for review a median of 3 candidate genes per case, using either manually curated or CNLP-derived phenotype descriptions. Ranking of trios and duos was unchanged when analyzed as singletons. In 17 of 20 cases with diagnostic SVs, GEM identified the causal SVs as the top candidate and in 19/20 within the top five, irrespective of whether SV calls were provided or inferred ab initio by GEM using its own internal SV detection algorithm. GEM showed similar performance in absence of parental genotypes. Analysis of 14 previously unsolved cases resulted in a novel finding for one case, candidates ultimately not advanced upon manual review for 3 cases, and no new findings for 10 cases. CONCLUSIONS GEM enabled diagnostic interpretation inclusive of all variant types through automated nomination of a very short list of candidate genes and disorders for final review and reporting. In combination with deep phenotyping by CNLP, GEM enables substantial automation of genetic disease diagnosis, potentially decreasing cost and expediting case review.
Affiliation(s)
- Francisco M. De La Vega
- Fabric Genomics Inc., Oakland, CA USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA USA
- Current Address: Tempus Labs Inc., Redwood City, CA 94065 USA
| | - Shimul Chowdhury
- Rady Children’s Institute for Genomic Medicine, San Diego, CA USA
| | - Barry Moore
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT USA
| | | | | | - Edgar Javier Hernandez
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT USA
| | - Terence Wong
- Rady Children’s Institute for Genomic Medicine, San Diego, CA USA
| | - Kiely James
- Rady Children’s Institute for Genomic Medicine, San Diego, CA USA
| | - Lucia Guidugli
- Rady Children’s Institute for Genomic Medicine, San Diego, CA USA
| | - Pankaj B. Agrawal
- Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Harvard Medical School, Boston, MA USA
- Division of Newborn Medicine, Boston Children’s Hospital, Boston, MA USA
| | - Casie A. Genetti
- Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Harvard Medical School, Boston, MA USA
| | - Catherine A. Brownstein
- Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Harvard Medical School, Boston, MA USA
| | - Alan H. Beggs
- Division of Genetics and Genomics, The Manton Center for Orphan Disease Research, Boston Children’s Hospital, Harvard Medical School, Boston, MA USA
| | - Britt-Sabina Löscher
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel & University Hospital Schleswig-Holstein, Kiel, Germany
| | - Andre Franke
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel & University Hospital Schleswig-Holstein, Kiel, Germany
| | - Braden Boone
- HudsonAlpha Institute for Biotechnology, Huntsville, AL USA
| | - Shawn E. Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, AL USA
| | - Katrin Õunap
- Department of Clinical Genetics, United Laboratories, Tartu University Hospital, Tartu, Estonia
- Department of Clinical Genetics, Institute of Clinical Medicine, University of Tartu, Tartu, Estonia
| | - Sander Pajusalu
- Department of Clinical Genetics, United Laboratories, Tartu University Hospital, Tartu, Estonia
- Department of Clinical Genetics, Institute of Clinical Medicine, University of Tartu, Tartu, Estonia
| | - Matt Huentelman
- Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, AZ USA
| | - Keri Ramsey
- Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, AZ USA
| | - Marcus Naymik
- Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, AZ USA
| | - Vinodh Narayanan
- Center for Rare Childhood Disorders, Translational Genomics Research Institute, Phoenix, AZ USA
| | | | | | | | - Mark Yandell
- Fabric Genomics Inc., Oakland, CA USA
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT USA
| | | |
|
35
|
A data-driven architecture using natural language processing to improve phenotyping efficiency and accelerate genetic diagnoses of rare disorders. HGG ADVANCES 2021; 2. [PMID: 34514437 PMCID: PMC8432593 DOI: 10.1016/j.xhgg.2021.100035] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/02/2022] Open
Abstract
Effective genetic diagnosis requires the correlation of genetic variant data with detailed phenotypic information. However, manual encoding of clinical data into machine-readable forms is laborious and subject to observer bias. Natural language processing (NLP) of electronic health records has great potential to enhance reproducibility at scale but suffers from idiosyncrasies in physician notes and other medical records. We developed methods to optimize NLP outputs for automated diagnosis. We filtered NLP-extracted Human Phenotype Ontology (HPO) terms to more closely resemble manually extracted terms and identified filter parameters across a three-dimensional space for optimal gene prioritization. We then developed a tiered pipeline that reduces manual effort by prioritizing smaller subsets of genes to consider for genetic diagnosis. Our filtering pipeline enabled NLP-based extraction of HPO terms to serve as a sufficient replacement for manual extraction in 92% of prospectively evaluated cases. In 75% of cases, the correct causal gene was ranked higher with our applied filters than without any filters. We describe a framework that can maximize the utility of NLP-based phenotype extraction for gene prioritization and diagnosis. The framework is implemented within a cloud-based modular architecture that can be deployed across health and research institutions.
|
36
|
Barbosa-Gouveia S, Vázquez-Mosquera ME, González-Vioque E, Álvarez JV, Chans R, Laranjeira F, Martins E, Ferreira AC, Avila-Alvarez A, Couce ML. Utility of Gene Panels for the Diagnosis of Inborn Errors of Metabolism in a Metabolic Reference Center. Genes (Basel) 2021; 12:1262. [PMID: 34440436 PMCID: PMC8391361 DOI: 10.3390/genes12081262] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/15/2021] [Revised: 08/04/2021] [Accepted: 08/16/2021] [Indexed: 11/28/2022] Open
Abstract
Next-generation sequencing (NGS) technologies have been proposed as a first-line test for the diagnosis of inborn errors of metabolism (IEM), a group of genetically heterogeneous disorders with overlapping or nonspecific phenotypes. Over a 3-year period, we prospectively analyzed 311 pediatric patients with a suspected IEM using four targeted gene panels. The rate of positive diagnosis was 61.86% for intermediary metabolism defects, 32.84% for complex molecular defects, 19% for hypoglycemic/hyperglycemic events, and 17% for mitochondrial diseases, and a conclusive molecular diagnosis was established in 2-4 weeks. Forty-one patients for whom negative results were obtained with the mitochondrial diseases panel underwent subsequent analyses using the NeuroSeq panel, which groups all genes from the individual panels together with genes associated with neurological disorders (1870 genes in total). This achieved a diagnostic rate of 32%. We next evaluated the utility of a tool, Phenomizer, for differential diagnosis, and established a correlation between phenotype and molecular findings in 39.3% of patients. Finally, we evaluated the mutational architecture of the genes analyzed by determining z-scores, loss-of-function observed/expected upper bound fraction (LOEUF), and haploinsufficiency (HI) scores. In summary, targeted gene panels for specific groups of IEMs enabled rapid and effective diagnosis, which is critical for the therapeutic management of IEM patients.
Affiliation(s)
- Sofia Barbosa-Gouveia
- Unit of Diagnosis and Treatment of Congenital Metabolic Diseases, Department of Paediatrics, IDIS-Health Research Institute of Santiago de Compostela, Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), European Reference Network for Hereditary Metabolic Disorders (MetabERN), Santiago de Compostela University Clinical Hospital, 15704 Santiago de Compostela, Spain; (S.B.-G.); (M.E.V.-M.); (J.V.Á.); (R.C.)
| | - María E. Vázquez-Mosquera
- Unit of Diagnosis and Treatment of Congenital Metabolic Diseases, Department of Paediatrics, IDIS-Health Research Institute of Santiago de Compostela, Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), European Reference Network for Hereditary Metabolic Disorders (MetabERN), Santiago de Compostela University Clinical Hospital, 15704 Santiago de Compostela, Spain; (S.B.-G.); (M.E.V.-M.); (J.V.Á.); (R.C.)
| | - Emiliano González-Vioque
- Department of Clinical Biochemistry, Puerta de Hierro-Majadahonda University Hospital, 28222 Majadahonda, Spain;
| | - José V. Álvarez
- Unit of Diagnosis and Treatment of Congenital Metabolic Diseases, Department of Paediatrics, IDIS-Health Research Institute of Santiago de Compostela, Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), European Reference Network for Hereditary Metabolic Disorders (MetabERN), Santiago de Compostela University Clinical Hospital, 15704 Santiago de Compostela, Spain; (S.B.-G.); (M.E.V.-M.); (J.V.Á.); (R.C.)
| | - Roi Chans
- Unit of Diagnosis and Treatment of Congenital Metabolic Diseases, Department of Paediatrics, IDIS-Health Research Institute of Santiago de Compostela, Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), European Reference Network for Hereditary Metabolic Disorders (MetabERN), Santiago de Compostela University Clinical Hospital, 15704 Santiago de Compostela, Spain; (S.B.-G.); (M.E.V.-M.); (J.V.Á.); (R.C.)
| | - Francisco Laranjeira
- Biochemical Genetics Unit, Centro de Genética Médica Doutor Jacinto Magalhães, 4050-466 Porto, Portugal;
| | - Esmeralda Martins
- Centro Materno-Infantil do Norte, Centro Hospitalar Universitário do Porto (CHUP), Coordinator of the Centro de Referência de Doenças Hereditárias do Metabolismo do CHUP, 4050-466 Porto, Portugal;
| | - Ana Cristina Ferreira
- Hospital D. Estefânia, Centro Hospitalar de Lisboa Central (CHLC), Coordinator of the Centro de Referência de Doenças Hereditárias do Metabolismo do CHLC, 1169-050 Lisboa, Portugal;
| | - Alejandro Avila-Alvarez
- Neonatology Unit, Pediatrics Department, Complexo Hospitalario Universitario de A Coruña, SERGAS, 15006 A Coruña, Spain;
| | - María L. Couce
- Unit of Diagnosis and Treatment of Congenital Metabolic Diseases, Department of Paediatrics, IDIS-Health Research Institute of Santiago de Compostela, Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), European Reference Network for Hereditary Metabolic Disorders (MetabERN), Santiago de Compostela University Clinical Hospital, 15704 Santiago de Compostela, Spain; (S.B.-G.); (M.E.V.-M.); (J.V.Á.); (R.C.)
| |
|
37
|
Forrest IS, Chaudhary K, Vy HMT, Bafna S, Kim S, Won HH, Loos RJ, Cho J, Pasquale LR, Nadkarni GN, Rocheleau G, Do R. Genetic pleiotropy of ERCC6 loss-of-function and deleterious missense variants links retinal dystrophy, arrhythmia, and immunodeficiency in diverse ancestries. Hum Mutat 2021; 42:969-977. [PMID: 34005834 PMCID: PMC8295228 DOI: 10.1002/humu.24220] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/16/2020] [Revised: 04/27/2021] [Accepted: 05/15/2021] [Indexed: 11/08/2022]
Abstract
Biobanks with exomes linked to electronic health records (EHRs) enable the study of genetic pleiotropy between rare variants and seemingly disparate diseases. We performed robust clinical phenotyping of rare, putatively deleterious variants (loss-of-function [LoF] and deleterious missense variants) in ERCC6, a gene implicated in inherited retinal disease. We analyzed 213,084 exomes, along with a targeted set of retinal, cardiac, and immune phenotypes from two large-scale EHR-linked biobanks. In the primary analysis, a burden of deleterious variants in ERCC6 was strongly associated with (1) retinal disorders; (2) cardiac and electrocardiogram perturbations; and (3) immunodeficiency and decreased immunoglobulin levels. Meta-analysis of results from the BioMe Biobank and UK Biobank showed a significant association of deleterious ERCC6 burden with retinal dystrophy (odds ratio [OR] = 2.6, 95% confidence interval [CI]: 1.5-4.6; p = 8.7 × 10⁻⁴), atypical atrial flutter (OR = 3.5, 95% CI: 1.9-6.5; p = 6.2 × 10⁻⁵), arrhythmia (OR = 1.5, 95% CI: 1.2-2.0; p = 2.7 × 10⁻³), and lymphocyte immunodeficiency (OR = 3.8, 95% CI: 2.1-6.8; p = 5.0 × 10⁻⁶). Carriers of ERCC6 LoF variants who lacked a diagnosis of these conditions exhibited increased symptoms, indicating underdiagnosis. These results reveal a unique genetic link among retinal, cardiac, and immune disorders and underscore the value of EHR-linked biobanks in assessing the full clinical profile of carriers of rare variants.
Affiliation(s)
- Iain S. Forrest
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Kumardeep Chaudhary
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ha My T. Vy
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Shantanu Bafna
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Soyeon Kim
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea
| | - Hong-Hee Won
- Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea
| | - Ruth J.F. Loos
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Judy Cho
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Louis R. Pasquale
- Department of Ophthalmology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Eye and Vision Research Institute, New York Eye and Ear Infirmary of Mount Sinai, New York, NY, USA
| | - Girish N. Nadkarni
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ghislain Rocheleau
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ron Do
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- BioMe Phenomics Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
|
38
|
Liu L, Zhu S. Computational Methods for Prediction of Human Protein-Phenotype Associations: A Review. PHENOMICS (CHAM, SWITZERLAND) 2021; 1:171-185. [PMID: 36939789 PMCID: PMC9590544 DOI: 10.1007/s43657-021-00019-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 06/05/2021] [Accepted: 06/16/2021] [Indexed: 12/01/2022]
Abstract
Deciphering the relationship between human proteins (genes) and phenotypes is one of the fundamental tasks in phenomics research. The Human Phenotype Ontology (HPO) builds upon a standardized logical vocabulary to describe the abnormal phenotypes encountered in human diseases and paves the way towards the computational analysis of their genetic causes. To date, many computational methods have been proposed to predict the HPO annotations of proteins. In this paper, we conduct a comprehensive review of the existing approaches to predicting HPO annotations of novel proteins, identifying missing HPO annotations, and prioritizing candidate proteins with respect to a certain HPO term. For each topic, we first give a formalized description of the problem, then systematically revisit the published literature, highlighting advantages and disadvantages, and finally discuss the challenges and promising future directions. In addition, we point out several topics worthy of exploration, including the selection of negative HPO annotations and the detection of HPO misannotations. We believe that this review will help researchers in the field of computational phenotype analysis to understand existing prediction algorithms and develop novel ones.
Affiliation(s)
- Lizhi Liu
- School of Computer Science, Fudan University, Shanghai, 200433 China
| | - Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433 China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, 200433 China
- MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433 China
- Zhangjiang Fudan International Innovation Center, Shanghai, 200433 China
- Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, 200433 China
| |
|
39
|
Clinical Phenotypic Spectrum of 4095 Individuals with Down Syndrome from Text Mining of Electronic Health Records. Genes (Basel) 2021; 12:1159. [PMID: 34440331 PMCID: PMC8393657 DOI: 10.3390/genes12081159] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/21/2021] [Revised: 07/25/2021] [Accepted: 07/26/2021] [Indexed: 12/30/2022] Open
Abstract
Human genetic disorders, such as Down syndrome, have a wide variety of clinical phenotypic presentations, and characterizing each nuanced phenotype and subtype can be difficult. In this study, we examined the electronic health records of 4095 individuals with Down syndrome at the Children’s Hospital of Philadelphia to create a method to characterize the phenotypic spectrum digitally. We extracted Human Phenotype Ontology (HPO) terms from quality-filtered patient notes using a natural language processing (NLP) approach, MetaMap. We catalogued the most common HPO terms related to Down syndrome patients and compared the terms with those from a baseline population. We characterized the top 100 HPO terms by their frequencies at different ages of clinical visits and highlighted selected terms that have time-dependent distributions. We also discovered phenotypic terms that have not been significantly associated with Down syndrome, such as “Proptosis”, “Downslanted palpebral fissures”, and “Microtia”. In summary, our study demonstrated that the clinical phenotypic spectrum of individuals with Mendelian diseases can be characterized through NLP-based digital phenotyping on population-scale electronic health records (EHRs).
40
Quinonez SC, Terefework Z. The introduction of clinical genetic testing in Ethiopia: Experiences and lessons learned. Am J Med Genet A 2021; 185:2995-3004. [PMID: 34169623 DOI: 10.1002/ajmg.a.62396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Revised: 05/22/2021] [Accepted: 05/31/2021] [Indexed: 11/08/2022]
Abstract
Limited data are available on genetic testing laboratories in low- and middle-income countries including those in sub-Saharan Africa (SSA). To characterize the need for genetic testing in SSA we describe the experience of MRC-ET Advanced Laboratory, a genetic testing laboratory in Ethiopia. Test results were analyzed based on indication(s) for testing, referral category, and diagnostic yield. A total of 1311 tests were run using the full MRC-Holland catalogue of Multiplex-Ligation Probe Amplification assays. Of all samples, 77% were postnatal samples, 15% products of conception (POC), and 8% amniotic samples. Of postnatal samples, the most common testing categories were multiple congenital anomalies (32%), disorders of sex development (17%), and Obstetrics/Gynecology (16%). Forty-three percent of postnatal samples were diagnostic, 11% were variants of uncertain significance (VUS), and 46% were normal with Trisomy 21 the most common diagnosis. Of POC samples, 10% were diagnostic, 34% revealed VUSs, and 55% were normal with Trisomy 18 the most common diagnosis. Of amniotic samples 17.5% were diagnostic, 3% revealed VUSs, and 79% were normal with Trisomy 18 the most common diagnosis. There is increasing demand for genetic testing in Ethiopia. Diagnostic genetic testing in SSA deserves increased attention as testing platforms become more affordable.
Affiliation(s)
- Shane C Quinonez
- Division of Pediatric Genetics, Metabolism, and Genomic Medicine, Department of Pediatrics, Michigan Medicine, Ann Arbor, Michigan, USA.,Division of Genetic Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
41
Birgmeier J, Haeussler M, Deisseroth CA, Steinberg EH, Jagadeesh KA, Ratner AJ, Guturu H, Wenger AM, Diekhans ME, Stenson PD, Cooper DN, Ré C, Beggs AH, Bernstein JA, Bejerano G. AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature. Sci Transl Med 2021; 12:12/544/eaau9113. [PMID: 32434849 DOI: 10.1126/scitranslmed.aau9113] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 07/30/2018] [Revised: 08/14/2019] [Accepted: 04/22/2020] [Indexed: 12/21/2022]
Abstract
The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient's disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient's given set of phenotypes. Diagnosis of singleton patients (without relatives' exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database-based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children's Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.
Affiliation(s)
- Johannes Birgmeier
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Maximilian Haeussler
- Santa Cruz Genomics Institute, MS CBSE, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Cole A Deisseroth
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Ethan H Steinberg
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Karthik A Jagadeesh
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Alexander J Ratner
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Harendra Guturu
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA 94305, USA
| | - Aaron M Wenger
- Department of Pediatrics, Stanford School of Medicine, Stanford, CA 94305, USA
| | - Mark E Diekhans
- Santa Cruz Genomics Institute, MS CBSE, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
| | - David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, UK
| | - Christopher Ré
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Alan H Beggs
- Manton Center for Orphan Disease Research, Division of Genetics and Genomics, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | | | - Gill Bejerano
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA.,Department of Pediatrics, Stanford School of Medicine, Stanford, CA 94305, USA.,Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA.,Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
42
Slater K, Karwath A, Williams JA, Russell S, Makepeace S, Carberry A, Hoehndorf R, Gkoutos GV. Towards similarity-based differential diagnostics for common diseases. Comput Biol Med 2021; 133:104360. [PMID: 33836447 PMCID: PMC8204262 DOI: 10.1016/j.compbiomed.2021.104360] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/26/2021] [Revised: 03/22/2021] [Accepted: 03/24/2021] [Indexed: 11/30/2022]
Abstract
Ontology-based phenotype profiles have been utilised for the purpose of differential diagnosis of rare genetic diseases, and for decision support in specific disease domains. Particularly, semantic similarity facilitates diagnostic hypothesis generation through comparison with disease phenotype profiles. However, the approach has not been applied for differential diagnosis of common diseases, or generalised clinical diagnostics from uncurated text-derived phenotypes. In this work, we describe the development of an approach for deriving patient phenotype profiles from clinical narrative text, and apply this to text associated with MIMIC-III patient visits. We then explore the use of semantic similarity with those text-derived phenotypes to classify primary patient diagnosis, comparing the use of patient-patient similarity and patient-disease similarity using phenotype-disease profiles previously mined from literature. We also consider a combined approach, in which literature-derived phenotypes are extended with the content of text-derived phenotypes we mined from 500 patients. The results reveal a powerful approach, showing that in one setting, uncurated text phenotypes can be used for differential diagnosis of common diseases, making use of information both inside and outside the setting. While the methods themselves should be explored for further optimisation, they could be applied to a variety of clinical tasks, such as differential diagnosis, cohort discovery, document and text classification, and outcome prediction.
Affiliation(s)
- Karin Slater
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK.
| | - Andreas Karwath
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - John A Williams
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
| | - Sophie Russell
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
| | - Silver Makepeace
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
| | - Alexander Carberry
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Saudi Arabia
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, University of Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, UK; NIHR Experimental Cancer Medicine Centre, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, UK; NIHR Biomedical Research Centre, UK; MRC Health Data Research UK (HDR UK) Midlands, UK; University Hospitals Birmingham NHS Foundation Trust, Edgbaston, Birmingham, UK
43
Havrilla JM, Liu C, Dong X, Weng C, Wang K. PhenCards: a data resource linking human phenotype information to biomedical knowledge. Genome Med 2021; 13:91. [PMID: 34034817 PMCID: PMC8147460 DOI: 10.1186/s13073-021-00909-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/03/2021] [Accepted: 05/13/2021] [Indexed: 02/07/2023] Open
Abstract
We present PhenCards ( https://phencards.org ), a database and web server intended as a one-stop shop for previously disconnected biomedical knowledge related to human clinical phenotypes. Users can query human phenotype terms or clinical notes. PhenCards obtains relevant disease/phenotype prevalence and co-occurrence, drug, procedural, pathway, literature, grant, and collaborator data. PhenCards recommends the most probable genetic diseases and candidate genes based on phenotype terms from clinical notes. PhenCards facilitates exploration of phenotype, e.g., which drugs cause or are prescribed for patient symptoms, which genes likely cause specific symptoms, and which comorbidities co-occur with phenotypes.
Affiliation(s)
- James M Havrilla
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
| | - Xiangchen Dong
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.,Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
44
Lewis-Smith D, Galer PD, Balagura G, Kearney H, Ganesan S, Cosico M, O'Brien M, Vaidiswaran P, Krause R, Ellis CA, Thomas RH, Robinson PN, Helbig I. Modeling seizures in the Human Phenotype Ontology according to contemporary ILAE concepts makes big phenotypic data tractable. Epilepsia 2021; 62:1293-1305. [PMID: 33949685 PMCID: PMC8272408 DOI: 10.1111/epi.16908] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/16/2020] [Revised: 02/19/2021] [Accepted: 04/01/2021] [Indexed: 01/08/2023]
Abstract
Objective: The clinical features of epilepsy determine how it is defined, which in turn guides management. Therefore, consideration of the fundamental clinical entities that comprise an epilepsy is essential in the study of causes, trajectories, and treatment responses. The Human Phenotype Ontology (HPO) is used widely in clinical and research genetics for concise communication and modeling of clinical features, allowing extracted data to be harmonized using logical inference. We sought to redesign the HPO seizure subontology to improve its consistency with current epileptological concepts, supporting the use of large clinical data sets in high-throughput clinical and research genomics. Methods: We created a new HPO seizure subontology based on the 2017 International League Against Epilepsy (ILAE) Operational Classification of Seizure Types, and integrated concepts of status epilepticus, febrile, reflex, and neonatal seizures at different levels of detail. We compared the HPO seizure subontology prior to, and following, our revision, according to the information that could be inferred about the seizures of 791 individuals from three independent cohorts: 2 previously published and 150 newly recruited individuals. Each cohort’s data were provided in a different format and harmonized using the two versions of the HPO. Results: The new seizure subontology increased the number of descriptive concepts for seizures 5-fold. The number of seizure descriptors that could be annotated to the cohort increased by 40% and the total amount of information about individuals’ seizures increased by 38%. The most important qualitative difference was the relationship of focal to bilateral tonic-clonic seizure to generalized-onset and focal-onset seizures.
Affiliation(s)
- David Lewis-Smith
- Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne, UK.,Department of Clinical Neurosciences, Royal Victoria Infirmary, Newcastle-upon-Tyne, UK
| | - Peter D Galer
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
| | - Ganna Balagura
- Medical Genetics Unit, IRCCS Giannina Gaslini Institute, Genoa, Italy
| | - Hugh Kearney
- FutureNeuro the SFI Research Centre for Chronic and Rare Neurological Diseases, Royal College of Surgeons in Ireland, Dublin, Ireland.,Department of Neurology, Beaumont Hospital, Dublin, Ireland
| | - Shiva Ganesan
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Mahgenn Cosico
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Margaret O'Brien
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Priya Vaidiswaran
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Roland Krause
- Luxembourg Centre for Systems Biomedicine, Université du Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Colin A Ellis
- The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
| | - Rhys H Thomas
- Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne, UK.,Department of Clinical Neurosciences, Royal Victoria Infirmary, Newcastle-upon-Tyne, UK
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.,Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
| | - Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
45
Luo F, Li HM. [Application of the artificial intelligence-rapid whole-genome sequencing diagnostic system in the neonatal/pediatric intensive care unit]. ZHONGGUO DANG DAI ER KE ZA ZHI = CHINESE JOURNAL OF CONTEMPORARY PEDIATRICS 2021; 23:433-437. [PMID: 34020729 PMCID: PMC8140348 DOI: 10.7499/j.issn.1008-8830.2012143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2020] [Accepted: 03/10/2021] [Indexed: 06/12/2023]
Abstract
Pediatric patients in the neonatal intensive care unit (NICU) and the pediatric intensive care unit (PICU) have a high incidence rate of genetic diseases, and early rapid etiological diagnosis and targeted interventions can help to reduce mortality or improve prognosis. Whole-genome sequencing covers more comprehensive information including point mutation, copy number, and structural and rearrangement variations in the intron region and has become one of the powerful diagnostic tools for genetic diseases. Sequencing data require highly professional judgment and interpretation and are returned for clinical application after several weeks, which cannot meet the need for the diagnosis and treatment of genetic diseases in children. This article introduces the clinical application of rapid whole-genome sequencing in the NICU/PICU and briefly describes related techniques of artificial intelligence-rapid whole-genome sequencing diagnostic system, a rapid high-throughput automated platform for the diagnosis of genetic diseases. The diagnostic system introduces artificial intelligence into the processing of data after whole-genome sequencing and can solve the problems of long time and professional interpretation required for routine genome sequencing and provide a rapid diagnostic regimen for critically ill children suspected of genetic diseases within 24 hours, and therefore, it holds promise for clinical application.
Affiliation(s)
- Fang Luo
- Department of Pediatrics, First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, China
| | - Haomin Li
- Children's Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310052
46
Abstract
PURPOSE OF REVIEW The current review seeks to provide a comprehensive update on the revolutionary technology of whole exome sequencing (WES), which has been used to interrogate abnormal foetal phenotypes over the last few years and is changing the paradigms of prenatal diagnosis, facilitating accurate genetic diagnosis and optimal management of pregnancies affected with foetal abnormalities, as well as enabling delineation of novel Mendelian disorders. RECENT FINDINGS WES has contributed to identification of more than 1000 Mendelian genes and made rapid strides into clinical diagnostics in recent years. Diagnostic yield of WES in postnatal cohorts has ranged from 25 to 50%, and this test is now a first-tier investigation for various clinical presentations. Various abnormal perinatal phenotypes have also been investigated using WES since 2014, with diagnostic yields ranging from 8.5 to 80%. Studies in foetal phenotypes have been challenging and guidelines in this cohort are still evolving. SUMMARY WES has proven to be a disruptive technology, enabling genetic diagnosis for pregnancies complicated by previously unexplained foetal abnormalities, and revealing a significant contribution of single gene disorders in these, thereby changing clinical diagnostic paradigms. The application of this technology in perinatal cohorts is also providing interesting insights into single gene defects presenting as previously unknown genetic syndromes, hence contributing to expansion of Mendelian genetics to encompass various foetal phenotypes.
47
Crawford K, Xian J, Helbig KL, Galer PD, Parthasarathy S, Lewis-Smith D, Kaufman MC, Fitch E, Ganesan S, O'Brien M, Codoni V, Ellis CA, Conway LJ, Taylor D, Krause R, Helbig I. Computational analysis of 10,860 phenotypic annotations in individuals with SCN2A-related disorders. Genet Med 2021; 23:1263-1272. [PMID: 33731876 PMCID: PMC8257493 DOI: 10.1038/s41436-021-01120-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 10/22/2020] [Revised: 02/04/2021] [Accepted: 02/05/2021] [Indexed: 11/10/2022] Open
Abstract
Purpose Pathogenic variants in SCN2A cause a wide range of neurodevelopmental phenotypes. Reports of genotype–phenotype correlations are often anecdotal, and the available phenotypic data have not been systematically analyzed. Methods We extracted phenotypic information from primary descriptions of SCN2A-related disorders in the literature between 2001 and 2019, which we coded in Human Phenotype Ontology (HPO) terms. With higher-level phenotype terms inferred by the HPO structure, we assessed the frequencies of clinical features and investigated the association of these features with variant classes and locations within the NaV1.2 protein. Results We identified 413 unrelated individuals and derived a total of 10,860 HPO terms with 562 unique terms. Protein-truncating variants were associated with autism and behavioral abnormalities. Missense variants were associated with neonatal onset, epileptic spasms, and seizures, regardless of type. Phenotypic similarity was identified in 8/62 recurrent SCN2A variants. Three independent principal components accounted for 33% of the phenotypic variance, allowing for separation of gain-of-function versus loss-of-function variants with good performance. Conclusion Our work shows that translating clinical features into a computable format using a standardized language allows for quantitative phenotype analysis, mapping the phenotypic landscape of SCN2A-related disorders in unprecedented detail and revealing genotype–phenotype correlations along a multidimensional spectrum.
Affiliation(s)
- Katherine Crawford
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Genetic Counseling, Arcadia University, Glenside, PA, USA
| | - Julie Xian
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Neuroscience Program, University of Pennsylvania, Philadelphia, PA, USA
| | - Katherine L Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Peter D Galer
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Shridhar Parthasarathy
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biology, The College of New Jersey, Ewing Township, NJ, USA
| | - David Lewis-Smith
- Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne, UK.,Royal Victoria Infirmary, Newcastle-upon-Tyne, UK
| | - Michael C Kaufman
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Eryn Fitch
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Shiva Ganesan
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Margaret O'Brien
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Veronica Codoni
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Colin A Ellis
- The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
| | - Laura J Conway
- Genetic Counseling, Arcadia University, Glenside, PA, USA
| | - Deanne Taylor
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Roland Krause
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
| | - Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.,The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
48
Rostam Niakan Kalhori S, Tanhapour M, Gholamzadeh M. Enhanced childhood diseases treatment using computational models: Systematic review of intelligent experiments heading to precision medicine. J Biomed Inform 2021; 115:103687. [PMID: 33497811 DOI: 10.1016/j.jbi.2021.103687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2020] [Revised: 12/05/2020] [Accepted: 01/18/2021] [Indexed: 10/22/2022]
Abstract
INTRODUCTION Precision or personalized medicine (PM) is used for the prevention and treatment of diseases by considering a huge amount of information about individuals' variables. Due to the high volume of information, AI-based computational models are required. A large set of studies has been conducted to examine the PM approach to improving childhood clinical outcomes. Thus, the main goal of this study was to review the application of health information technology, and especially artificial intelligence (AI) methods, for the treatment of childhood disease using PM. METHODS PubMed, Scopus, Web of Science, and EMBASE databases were searched up to December 18, 2019. Articles that focused on informatics applications for childhood disease PM were included in this study. Included papers were classified for qualitative analysis and interpretation of results. The results were analyzed using Microsoft Excel 2019. RESULTS From 341 citations, 62 papers met our inclusion criteria. The number of published papers that used AI methods to apply PM in childhood diseases increased from 2010 to 2019. Our results showed that most applied methods were related to the machine learning discipline. In terms of clinical scope, the largest number of clinical articles were devoted to oncology. In addition, the analysis showed that genomics was the most used PM approach in childhood disease. CONCLUSION This systematic review examined papers that used AI methods for applying PM approaches in childhood diseases from a medical informatics perspective. Thus, it provides new insight to researchers who are interested in knowing research needs in this field.
Affiliation(s)
- Sharareh Rostam Niakan Kalhori
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
| | - Mozhgan Tanhapour
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
| | - Marsa Gholamzadeh
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran.
49
Köhler S, Gargano M, Matentzoglu N, Carmody LC, Lewis-Smith D, Vasilevsky NA, Danis D, Balagura G, Baynam G, Brower AM, Callahan TJ, Chute CG, Est JL, Galer PD, Ganesan S, Griese M, Haimel M, Pazmandi J, Hanauer M, Harris NL, Hartnett M, Hastreiter M, Hauck F, He Y, Jeske T, Kearney H, Kindle G, Klein C, Knoflach K, Krause R, Lagorce D, McMurry JA, Miller JA, Munoz-Torres M, Peters RL, Rapp CK, Rath AM, Rind SA, Rosenberg A, Segal MM, Seidel MG, Smedley D, Talmy T, Thomas Y, Wiafe SA, Xian J, Yüksel Z, Helbig I, Mungall CJ, Haendel MA, Robinson PN. The Human Phenotype Ontology in 2021. Nucleic Acids Res 2021; 49:D1207-D1217. [PMID: 33264411 PMCID: PMC7778952 DOI: 10.1093/nar/gkaa1043] [Citation(s) in RCA: 501] [Impact Index Per Article: 167.0] [Received: 09/17/2020] [Revised: 10/11/2020] [Accepted: 11/16/2020] [Indexed: 12/21/2022] Open
Abstract
The Human Phenotype Ontology (HPO, https://hpo.jax.org) was launched in 2008 to provide a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human disease. The HPO is now a worldwide standard for phenotype exchange. The HPO has grown steadily since its inception due to considerable contributions from clinical experts and researchers from a diverse range of disciplines. Here, we present recent major extensions of the HPO for neurology, nephrology, immunology, pulmonology, newborn screening, and other areas. For example, the seizure subontology now reflects the International League Against Epilepsy (ILAE) guidelines and these enhancements have already shown clinical validity. We present new efforts to harmonize computational definitions of phenotypic abnormalities across the HPO and multiple phenotype ontologies used for animal models of disease. These efforts will benefit software such as Exomiser by improving the accuracy and scope of cross-species phenotype matching. The computational modeling strategy used by the HPO to define disease entities and phenotypic features and distinguish between them is explained in detail. We also report on recent efforts to translate the HPO into indigenous languages. Finally, we summarize recent advances in the use of HPO in electronic health record systems.
Affiliation(s)
- Michael Gargano
  - Monarch Initiative
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Nicolas Matentzoglu
  - Monarch Initiative
  - Semanticly Ltd, London, UK
  - European Bioinformatics Institute (EMBL-EBI)
- Leigh C Carmody
  - Monarch Initiative
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- David Lewis-Smith
  - Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
  - Clinical Neurosciences, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
- Nicole A Vasilevsky
  - Monarch Initiative
  - Oregon Clinical & Translational Research Institute, Oregon Health & Science University
- Ganna Balagura
  - Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, and Maternal and Child Health, University of Genoa, Genoa, Italy
  - Pediatric Neurology and Muscular Diseases Unit, IRCCS ‘G. Gaslini’ Institute, Genoa, Italy
- Gareth Baynam
  - Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Perth, Australia
  - Telethon Kids Institute and the Division of Paediatrics, Faculty of Health and Medical Sciences, University of Western Australia, Perth, Australia
- Amy M Brower
  - American College of Medical Genetics and Genomics (ACMG), Bethesda, MD, USA
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Colorado, USA
- Johanna L Est
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Peter D Galer
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Shiva Ganesan
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Matthias Griese
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
  - Ludwig-Maximilians University, German Center for Lung Research (DZL), Munich, Germany
- Matthias Haimel
  - Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Julia Pazmandi
  - Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
- Marc Hanauer
  - INSERM, US14-Orphanet, Plateforme Maladies Rares, Paris, France
- Nomi L Harris
  - Monarch Initiative
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Michael J Hartnett
  - American College of Medical Genetics and Genomics (ACMG), Bethesda, MD, USA
- Maximilian Hastreiter
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Fabian Hauck
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
  - German Centre for Infection Research (DZIF), Munich, Germany
- Yongqun He
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Tim Jeske
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Hugh Kearney
  - FutureNeuro, SFI Research Centre for Chronic and Rare Neurological Diseases, Ireland
- Gerhard Kindle
  - Institute for Immunodeficiency, Center for Chronic Immunodeficiency (CCI), Faculty of Medicine, Medical Center - University of Freiburg, Freiburg, Germany
  - Centre for Biobanking FREEZE, Faculty of Medicine, Medical Center - University of Freiburg, Freiburg, Germany
- Christoph Klein
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Katrin Knoflach
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
  - Ludwig-Maximilians University, German Center for Lung Research (DZL), Munich, Germany
- Roland Krause
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4367 Belvaux, Luxembourg
- David Lagorce
  - INSERM, US14-Orphanet, Plateforme Maladies Rares, Paris, France
- Julie A McMurry
  - Monarch Initiative
  - Translational and Integrative Sciences Center, Department of Environmental and Molecular Toxicology, Oregon State University, OR, USA
- Jillian A Miller
  - American College of Medical Genetics and Genomics (ACMG), Bethesda, MD, USA
- Monica C Munoz-Torres
  - Monarch Initiative
  - Translational and Integrative Sciences Center, Department of Environmental and Molecular Toxicology, Oregon State University, OR, USA
- Rebecca L Peters
  - American College of Medical Genetics and Genomics (ACMG), Bethesda, MD, USA
- Christina K Rapp
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
  - Ludwig-Maximilians University, German Center for Lung Research (DZL), Munich, Germany
- Ana M Rath
  - INSERM, US14-Orphanet, Plateforme Maladies Rares, Paris, France
- Shahmir A Rind
  - WA Register of Developmental Anomalies
  - Curtin University, Western Australia, Australia
- Avi Z Rosenberg
  - Division of Kidney-Urologic Pathology, Johns Hopkins University, Baltimore, MD 21205, USA
- Markus G Seidel
  - Research Unit for Pediatric Hematology and Immunology, Division of Pediatric Hemato-Oncology, Department of Pediatrics and Adolescent Medicine, Medical University of Graz, Graz, Austria
- Damian Smedley
  - The William Harvey Research Institute, Charterhouse Square, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
- Tomer Talmy
  - Genomic Research Department, Emedgene Technologies, Tel Aviv, Israel
  - Faculty of Medicine, Hebrew University Hadassah Medical School, Jerusalem, Israel
- Yarlalu Thomas
  - West Australian Register of Developmental Anomalies, East Perth, WA, Australia
- Julie Xian
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, PA, USA
- Zafer Yüksel
  - Human Genetics, Bioscientia GmbH, Ingelheim, Germany
- Ingo Helbig
  - Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
  - The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Christopher J Mungall
  - Monarch Initiative
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Melissa A Haendel
  - Monarch Initiative
  - Oregon Clinical & Translational Research Institute, Oregon Health & Science University
  - Translational and Integrative Sciences Center, Department of Environmental and Molecular Toxicology, Oregon State University, OR, USA
- Peter N Robinson
  - Monarch Initiative
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
|
50
|
Zhao Y, Weroha SJ, Goode EL, Liu H, Wang C. Generating real-world evidence from unstructured clinical notes to examine clinical utility of genetic tests: use case in BRCAness. BMC Med Inform Decis Mak 2021; 21:3. [PMID: 33407429 PMCID: PMC7789545 DOI: 10.1186/s12911-020-01364-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/15/2020] [Accepted: 12/06/2020] [Indexed: 11/25/2022]
Abstract
Background Next-generation sequencing provides comprehensive information about individuals’ genetic makeup and is commonplace in oncology clinical practice. However, the utility of genetic information in the clinical decision-making process has not been examined extensively from a real-world, data-driven perspective. Through mining real-world data (RWD) from clinical notes, we could extract patients’ genetic information and further associate treatment decisions with genetic information. Methods We proposed a real-world evidence (RWE) study framework that incorporates context-based natural language processing (NLP) methods and data quality examination before final association analysis. The framework was demonstrated in a Foundation-tested women cancer cohort (N = 196). Upon retrieval of patients’ genetic information using NLP system, we assessed the completeness of genetic data captured in unstructured clinical notes according to a genetic data-model. We examined the distribution of different topics regarding BRCA1/2 throughout patients’ treatment process, and then analyzed the association between BRCA1/2 mutation status and the discussion/prescription of targeted therapy. Results We identified seven topics in the clinical context of genetic mentions including: Information, Evaluation, Insurance, Order, Negative, Positive, and Variants of unknown significance. Our rule-based system achieved a precision of 0.87, recall of 0.93 and F-measure of 0.91. Our machine learning system achieved a precision of 0.901, recall of 0.899 and F-measure of 0.9 for four-topic classification and a precision of 0.833, recall of 0.823 and F-measure of 0.82 for seven-topic classification. We found in result-containing sentences, the capture of BRCA1/2 mutation information was 75%, but detailed variant information (e.g. variant types) is largely missing. Using cleaned RWD, significant associations were found between BRCA1/2 positive mutation and targeted therapies. 
Conclusions In conclusion, we demonstrated a framework to generate RWE using RWD from different clinical sources. Rule-based NLP system achieved the best performance for resolving contextual variability when extracting RWD from unstructured clinical notes. Data quality issues such as incompleteness and discrepancies exist thus manual data cleaning is needed before further analysis can be performed. Finally, we were able to use cleaned RWD to evaluate the real-world utility of genetic information to initiate a prescription of targeted therapy.
Affiliation(s)
- Yiqing Zhao
  - Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
- Saravut J Weroha
  - Division of Medical Oncology, Department of Oncology, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA
- Ellen L Goode
  - Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
- Hongfang Liu
  - Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
- Chen Wang
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 205 3rd Ave SW, Rochester, MN, 55905, USA
|