1
Škorjanc A, Smrkolj V, Umek N. GOReverseLookup: A gene ontology reverse lookup tool. Comput Biol Med 2025; 191:110185. [PMID: 40239235 DOI: 10.1016/j.compbiomed.2025.110185] [Received: 06/15/2024] [Revised: 03/27/2025] [Accepted: 04/08/2025] [Indexed: 04/18/2025]
Abstract
BACKGROUND AND OBJECTIVE The Gene Ontology (GO) project has been pivotal in providing a structured framework for characterizing genes and annotating them to specific biological concepts. While traditional gene annotation primarily focuses on mapping genes to GO terms (descriptors of biological concepts), there is a growing need for tools that support reverse querying. This paper introduces GOReverseLookup, a novel tool designed to identify over- or underrepresented genes in researcher-defined states of interest (phenotypes), each described by a set of GO terms. GOReverseLookup extends Gene Ontology analysis with orthologous gene querying across several databases, such as Ensembl and UniProtKB. This combination allows more nuanced identification of significant genes across a range of cross-species research contexts. METHODS GOReverseLookup queries genes associated with input GO terms. Bundles of GO terms encapsulate user-defined states of interest, e.g., angiogenesis. In the second stage of the analysis, all GO terms associated with each gene are fetched, and finally, the statistical relevance of each gene's involvement in one (or all) of the defined states of interest is computed. RESULTS Two use cases illustrate its utility: discovering genes related to rheumatoid arthritis and genes linked with chronic inflammation and tumorigenesis. In both cases, GOReverseLookup discovered a substantial number of genes significantly associated with these states of interest. CONCLUSIONS GOReverseLookup proves to be a valuable resource for unraveling the genetic basis of phenotypes, with diverse practical applications in functional genomics, systems biology, and drug discovery. We anticipate that GOReverseLookup will significantly aid in identifying potential gene targets during the initial research phases.
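The three-stage workflow described under METHODS can be illustrated with a small sketch. It is not GOReverseLookup's implementation: the annotation table and gene names are hypothetical, orthologue querying across Ensembl and UniProtKB is omitted, and the one-sided hypergeometric over-representation test is an assumption, since the abstract does not name the statistic used.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n draws)."""
    top = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)


def reverse_lookup(annotations: dict[str, set[str]], soi_terms: set[str]) -> dict[str, float]:
    """Score genes for over-representation of state-of-interest (SOI) GO terms.

    annotations: gene -> set of all GO term IDs annotated to it (hypothetical input).
    soi_terms:   GO term IDs bundled together to describe one state of interest.
    Returns gene -> one-sided p-value (smaller = SOI terms more over-represented).
    """
    # Stage 1 (reverse query): genes annotated to at least one SOI term.
    candidates = [g for g, terms in annotations.items() if terms & soi_terms]

    # Background: every (gene, GO term) annotation pair in the table.
    N = sum(len(terms) for terms in annotations.values())
    K = sum(len(terms & soi_terms) for terms in annotations.values())

    scores = {}
    for gene in candidates:
        terms = annotations[gene]  # Stage 2: all GO terms of this gene
        n = len(terms)
        k = len(terms & soi_terms)
        # Stage 3: how surprising is it that k of the gene's n terms are SOI terms?
        scores[gene] = hypergeom_sf(k, N, K, n)
    return scores


# Toy, hypothetical annotation table.
annotations = {
    "GENE_A": {"GO:0001525", "GO:0045766", "GO:0008150"},
    "GENE_B": {"GO:0001525", "GO:0006955", "GO:0007165", "GO:0005515"},
    "GENE_C": {"GO:0006955", "GO:0007165"},
}
# Angiogenesis; positive regulation of angiogenesis.
angiogenesis = {"GO:0001525", "GO:0045766"}
print(reverse_lookup(annotations, angiogenesis))
```

With this toy table, GENE_C is never scored because none of its terms belong to the angiogenesis bundle, while GENE_A receives p = 19/84 (about 0.23) and GENE_B p = 37/42 (about 0.88). A real run over thousands of genes would also need a multiple-testing correction, for example Benjamini-Hochberg.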
Affiliation(s)
- Aljoša Škorjanc
- Institute of Anatomy, Faculty of Medicine, University of Ljubljana, Korytkova 2, Ljubljana, Slovenia
- Vladimir Smrkolj
- Institute of Anatomy, Faculty of Medicine, University of Ljubljana, Korytkova 2, Ljubljana, Slovenia; National Institute of Chemistry, Hajdrihova ulica 19, Ljubljana, Slovenia
- Nejc Umek
- Institute of Anatomy, Faculty of Medicine, University of Ljubljana, Korytkova 2, Ljubljana, Slovenia.
2
Zemet R, Parobek CM, Adams AD, Maktabi MA, Shay L, Meng L, Liu P, Dai H, Xia F, Eng C, Van den Veyver IB, Vossaert L. Diagnostic Yield of Exome Sequencing for Pregnancies With and Without Fetal Anomalies and for Stillbirth. Prenat Diagn 2025. [PMID: 40423626 DOI: 10.1002/pd.6817] [Received: 12/25/2024] [Revised: 05/01/2025] [Accepted: 05/06/2025] [Indexed: 05/28/2025]
Abstract
OBJECTIVE Exome sequencing (ES) benefits the genetic work-up for fetuses with structural anomalies, but data on its utility for fetuses without anomalies and stillbirths is more limited. We report our experience with prenatal ES for all three indications. METHOD We retrospectively reviewed results from 344 trio-ES performed for fetuses with structural anomalies (N = 262), stillbirths (N = 39), and fetuses without anomalies (N = 43), many of which had a relevant family history. We classified pathogenic variants (P), likely pathogenic variants (LP), or variants of uncertain significance (VUS) favoring pathogenicity in a gene consistent with the fetal phenotype as diagnostic results. We used Fisher's exact test for statistical analysis. RESULTS Trio-ES provided a diagnosis for 93/262 (35.5%) fetuses with structural anomalies, with comparable yields for multiple and single anomalies (p = 0.81). A molecular diagnosis was made for 10/39 stillbirths (25.6%), of which all but one had structural anomalies, and 66.6% had multiple anomalies. In the absence of structural anomalies, one of 43 fetuses (2.3%) was found to have compound heterozygous pathogenic variants in ORC6 associated with Meier-Gorlin syndrome. CONCLUSION Prenatal trio-ES yields molecular diagnoses across a spectrum of indications. Larger studies are needed to further define the added benefits and challenges of diagnostic ES for fetuses without anomalies.
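The abstract reports a Fisher's exact test comparing diagnostic yield for multiple versus single anomalies (p = 0.81) but not the per-group counts, so the sketch below uses a made-up 2×2 table. It computes the standard two-sided p-value: with all row and column totals fixed, it sums the hypergeometric probabilities of every possible table that is no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: the two groups compared (e.g. multiple vs single anomalies).
    Columns: diagnosed vs not diagnosed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = table_prob(x)
        # Count every table no more probable than the observed one; the tiny
        # relative tolerance keeps floating-point ties from being dropped.
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


# Illustrative counts only (not from the study).
print(fisher_exact_two_sided(8, 2, 1, 5))
```

For this illustrative table the function returns 400/11440, about 0.035. In practice, `scipy.stats.fisher_exact` provides the same two-sided test.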
Affiliation(s)
- Roni Zemet
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Department of Obstetrics and Gynecology, Division of Reproductive Endocrinology and Infertility, Baylor College of Medicine, Houston, Texas, USA
- Christian M Parobek
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine, Houston, Texas, USA
- April D Adams
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine and Reproductive and Prenatal Genetics, Baylor College of Medicine, Houston, Texas, USA
- Mohamad Ali Maktabi
- Department of Obstetrics and Gynecology, Baylor College of Medicine, Houston, Texas, USA
- Lena Shay
- Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine, Baylor College of Medicine, Houston, Texas, USA
- Linyan Meng
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Baylor Genetics Laboratories, Houston, Texas, USA
- Pengfei Liu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Baylor Genetics Laboratories, Houston, Texas, USA
- Hongzheng Dai
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Baylor Genetics Laboratories, Houston, Texas, USA
- Fan Xia
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Baylor Genetics Laboratories, Houston, Texas, USA
- Christine Eng
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Baylor Genetics Laboratories, Houston, Texas, USA
- Ignatia B Van den Veyver
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Department of Obstetrics and Gynecology, Division of Maternal-Fetal Medicine and Reproductive and Prenatal Genetics, Baylor College of Medicine, Houston, Texas, USA
- Liesbeth Vossaert
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Baylor Genetics Laboratories, Houston, Texas, USA
3
Groza T, Rayabsri W, Gration D, Hariram H, Jamuar SS, Baynam G. First steps toward building natural history of diseases computationally: Lessons learned from the Noonan syndrome use case. Am J Hum Genet 2025; 112:1158-1172. [PMID: 40245863 PMCID: PMC12120186 DOI: 10.1016/j.ajhg.2025.03.014] [Received: 10/26/2024] [Revised: 03/20/2025] [Accepted: 03/21/2025] [Indexed: 04/19/2025]
Abstract
Rare diseases (RDs) are conditions affecting fewer than 1 in 2,000 people, with over 7,000 identified, primarily genetic in nature, and more than half impacting children. Although each RD affects a small population, collectively, between 3.5% and 5.9% of the global population, or 262.9-446.2 million people, live with an RD. Most RDs lack established treatment protocols, highlighting the need for proper care pathways addressing prognosis, diagnosis, and management. Advances in generative AI and large language models (LLMs) offer new opportunities to document the temporal progression of phenotypic features, addressing gaps in current knowledge bases. This study proposes an LLM-based framework to capture the natural history of diseases, specifically focusing on Noonan syndrome. The framework aims to document phenotypic trajectories, validate against RD knowledge bases, and integrate insights into care coordination using electronic health record (EHR) data from the Undiagnosed Diseases Program Singapore.
Affiliation(s)
- Tudor Groza
- Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia; Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street #07-01 Matrix, Singapore 138671, Singapore; SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore 169609, Singapore; School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Kent Street, Bentley, WA 6102, Australia.
- Warittha Rayabsri
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA 6008, Australia
- Dylan Gration
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA 6008, Australia
- Harshini Hariram
- Medical Student, Division of Medical Education, School of Medical Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, UK
- Saumya Shekhar Jamuar
- SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore 169609, Singapore; Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore 229899, Singapore; SingHealth Duke-NUS Genomic Medicine Centre, 100 Bukit Timah Road, Singapore 229899, Singapore
- Gareth Baynam
- Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia; Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA 6008, Australia; Faculty of Health and Medical Sciences, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
4
Nieto-Patlán A, Ross J, Mohan S, Paczosa MK, Soliman R, Sarmento O, Aliu E, Thiyagarajan L, Chandra A, Picard C, Warnatz K, Jolles S, Lesmana H, Maglione PJ, Platt CD, Sediva A, Sullivan KE, Zhang K, Raval F, Tangye SG, Abraham RS. Curation of gene-disease relationships in primary antibody deficiencies using the ClinGen validation framework. J Allergy Clin Immunol 2025; 155:1647-1663. [PMID: 39826876 DOI: 10.1016/j.jaci.2025.01.005] [Received: 09/06/2024] [Revised: 01/01/2025] [Accepted: 01/06/2025] [Indexed: 01/22/2025]
Abstract
BACKGROUND The Clinical Genome Resource (ClinGen) is an international collaborative effort among scientists and clinicians, diagnostic and research laboratories, and the patient community. Using a standardized framework, ClinGen has established guidelines to classify gene-disease relationships as definitive, strong, moderate, or limited on the basis of available scientific and clinical evidence. When the genetic and functional evidence for a gene-disease relationship is subject to conflicting interpretations or is contradictory, the relationship can be classified as disputed or refuted. OBJECTIVE We assessed genes related to primary antibody deficiencies. METHODS The ClinGen Antibody Deficiencies Gene Curation Expert Panel, using the ClinGen framework, classified genes related to primary antibody deficiency that primarily affect B-cell development and/or function and that account for the largest proportion of inborn errors of immunity or primary immunodeficiencies. RESULTS The expert panel curated a total of 65 genes associated with humoral immune defects to validate 74 gene-disease relationships. Of these, 40 were classified as definitive, 1 as strong, 16 as moderate, 15 as limited, and 2 as disputed. The curation process involved reviewing 490 patient records and 3546 associated Human Phenotype Ontology entries. The 3 most frequently observed terms related to primary antibody deficiency were decreased circulating antibody level, pneumonia, and lymphadenopathy. CONCLUSIONS These curations (publicly available at ClinicalGenome.org) represent the first effort to provide a comprehensive genetic and phenotypic reassessment of genetic disorders affecting humoral immunity, reviewed and approved by experts in the field.
Affiliation(s)
- Alejandro Nieto-Patlán
- Department of Pediatrics, Baylor College of Medicine, Houston, Tex; Department of Allergy, Immunology and Rheumatology, Center for Human Immunobiology, Texas Children's Hospital, Houston, Tex; Departamento de Genética, Hospital Infantil de México Federico Gómez, Mexico City, Mexico
- Justyne Ross
- Department of Genetics, University of North Carolina School of Medicine, Chapel Hill, NC
- Shruthi Mohan
- Department of Genetics, University of North Carolina School of Medicine, Chapel Hill, NC
- Rasha Soliman
- Queen Mary University of London, London, United Kingdom
- Ermal Aliu
- Milton S. Hershey Medical Center, Hershey, Pa
- Lavvina Thiyagarajan
- Sydney Children's Hospitals Network, Sydney, Australia; School of Clinical Medicine, University of New South Wales, Sydney, Australia
- Anita Chandra
- Department of Clinical Immunology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom; Department of Medicine, University of Cambridge, Cambridge, United Kingdom
- Capucine Picard
- Université Paris Cité, Paris, France; Study Center for Primary Immunodeficiencies, Necker-Enfants Malades Hospital, Assistance Publique Hôpitaux de Paris (APHP), Paris, France; Laboratory of Lymphocyte Activation and Susceptibility to EBV infection, Inserm UMR 1163, Institut Imagine, Paris, France
- Klaus Warnatz
- Department of Rheumatology and Clinical Immunology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; Center for Chronic Immunodeficiency, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; Department of Immunology, University Hospital Zurich, Zurich, Switzerland
- Stephen Jolles
- Immunodeficiency Centre for Wales, University Hospital of Wales, Cardiff, United Kingdom
- Harry Lesmana
- Department of Medical Genetics and Genomics, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio; Department of Pediatric Hematology, Oncology and BMT, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio
- Paul J Maglione
- Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, Boston, Mass
- Anna Sediva
- Motol University Hospital and the 2nd Faculty of Medicine, Charles University, Prague, Czech Republic
- Kejian Zhang
- GoBroad Healthcare Group, GoBroad Clinical Research Center, Boren Hospital, Beijing, China
- Stuart G Tangye
- Garvan Institute of Medical Research, Darlinghurst, Australia; School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Sydney, Australia
5
Malone Jenkins S, Palmquist RN, Moore B, Boyden SE, Nicholas TJ, Bayrak-Toydemir P, Mao R, Farrell JAR, Holt CH, Rynearson SG, Solorzano CM, Ward A, Best DH, Al-Sweel N, Bentley DL, Brunelli L, Chow CY, Close DW, Cormier MJ, Deshotel MJ, Durtschi J, Eide EJ, Floyd L, Fredrickson EK, Fulmer ML, Hernandez EJ, Kapron AL, Karren MA, Lewis RG, Miller CE, Murtaugh LC, Nicholson KE, Noble K, O'Fallon BD, O'Shea JM, Pattison DC, Pedersen BS, Petersen BJ, Peterson BD, Pizzo L, Reynolds HM, Rindler P, Torr CB, Wen T, Yost HJ, Zhao J, Yandell M, Marth GT, Quinlan AR, Carey JC, Shayota BJ, Tristani-Firouzi M, Bonkowsky JL. The Utah NeoSeq Project: a collaborative multidisciplinary program to facilitate genomic diagnostics in the neonatal intensive care unit. NPJ Genom Med 2025; 10:26. [PMID: 40121231 PMCID: PMC11929918 DOI: 10.1038/s41525-025-00483-7] [Received: 08/02/2024] [Accepted: 02/28/2025] [Indexed: 03/25/2025]
Abstract
Rapid genomic diagnostics in the Neonatal Intensive Care Unit represents a paradigm shift in medicine, with increasing evidence that early diagnosis influences management. The goal of the Utah NeoSeq Project was to implement and evaluate a multidisciplinary and longitudinal rapid sequencing program while transitioning to CLIA-certified sequencing. Of the 65 enrolled infants, 26 (40%) had one or more diagnostic variants and 7 (11%) harbored a strong candidate variant. These figures include four additional diagnoses obtained through re-analysis. Parental surveys indicated that 7% (4/59) of parents experienced decisional conflict after consent, and 3% (2/59) experienced decisional regret after the results. Fifty-two provider surveys were conducted. Seventy-nine percent (41/52) of results and 86% (19/22) of diagnostic results were rated "very useful" or "useful" and were associated with management changes. The NeoSeq Project demonstrates that a multidisciplinary collaborative approach to diagnosis is feasible. We have developed a generalizable, collaborative protocol that addresses the need for expedited genetic evaluation with emerging technologies.
Affiliation(s)
- Sabrina Malone Jenkins
- Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA.
- Center for Personalized Medicine, Primary Children's Hospital, Intermountain Healthcare, Salt Lake City, UT, USA.
- Rachel N Palmquist
- Center for Personalized Medicine, Primary Children's Hospital, Intermountain Healthcare, Salt Lake City, UT, USA
- Division of Pediatric Neurology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Barry Moore
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Steven E Boyden
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Thomas J Nicholas
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Pinar Bayrak-Toydemir
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Rong Mao
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- J Andrew R Farrell
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Carson H Holt
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Shawn G Rynearson
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Chelsea M Solorzano
- Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Alistair Ward
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- D Hunter Best
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Najla Al-Sweel
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Dawn L Bentley
- Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Luca Brunelli
- Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Clement Y Chow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Devin W Close
- Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, UT, USA
- Michael J Cormier
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Malia J Deshotel
- Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, UT, USA
- Jacob Durtschi
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Erik J Eide
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Luaiva Floyd
- Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Eric K Fredrickson
- Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, UT, USA
- Makenzie L Fulmer
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- University of Utah School of Medicine, Salt Lake City, UT, USA
- Edgar J Hernandez
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Ashley L Kapron
- Center for Genomic Medicine, University of Utah, Salt Lake City, UT, USA
- Mary Anne Karren
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Robert G Lewis
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Christine E Miller
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- L Charles Murtaugh
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Kelsey E Nicholson
- Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, UT, USA
- Katherine Noble
- Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, UT, USA
- Brendan D O'Fallon
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- John M O'Shea
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- David C Pattison
- Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, UT, USA
- Brent S Pedersen
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Brandy J Petersen
- Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Bennet D Peterson
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Lucilla Pizzo
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Paul Rindler
- Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, UT, USA
- Carrie B Torr
- Division of Neonatology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Grant Scott Bonham Fetal Center at Primary Children's Hospital, Salt Lake City, UT, USA
- Center for Health Ethics, Arts and Humanities, University of Utah, Salt Lake City, UT, USA
- Ting Wen
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- H Joseph Yost
- Molecular Medicine Program, Department of Pediatrics and Department of Neurobiology, University of Utah, Salt Lake City, UT, USA
- Jian Zhao
- Institute for Research and Innovation, ARUP Laboratories, Salt Lake City, UT, USA
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Mark Yandell
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Gabor T Marth
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Aaron R Quinlan
- Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- John C Carey
- Division of Medical Genetics, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Brian J Shayota
- Division of Medical Genetics, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Martin Tristani-Firouzi
- Division of Cardiology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
- Joshua L Bonkowsky
- Center for Personalized Medicine, Primary Children's Hospital, Intermountain Healthcare, Salt Lake City, UT, USA
- Division of Pediatric Neurology, Department of Pediatrics, University of Utah School of Medicine, Salt Lake City, UT, USA
6
Lambert J, Leutenegger AL, Baudot A, Jannot AS. Improving patient clustering by incorporating structured variable label relationships in similarity measures. BMC Med Res Methodol 2025; 25:72. [PMID: 40089699 PMCID: PMC11910865 DOI: 10.1186/s12874-025-02459-8] [Received: 06/16/2023] [Accepted: 01/03/2025] [Indexed: 03/17/2025]
Abstract
BACKGROUND Patient stratification is the cornerstone of numerous health investigations, serving to enhance the estimation of treatment efficacy and facilitating patient matching. To stratify patients, similarity measures between patients can be computed from clinical variables contained in medical health records. These variables have both values and labels, the labels being structured in ontologies or other classification systems. The relevance of considering variable label relationships in the computation of patient similarity measures has been poorly studied. OBJECTIVE We adapt and evaluate several weighted versions of the cosine similarity that take structured label relationships into account when computing patient similarities from a medico-administrative database. MATERIALS AND METHODS As a use case, we clustered patients aged 60 years from their annual medicine reimbursements contained in the Échantillon Généraliste des Bénéficiaires, a random sample of a French medico-administrative database. We used four patient similarity measures: the standard cosine similarity, a weighted cosine similarity measure that includes variable frequencies, and two weighted cosine similarity measures that consider variable label relationships. We constructed patient networks from each similarity measure and identified clusters of patients using the Markov Cluster algorithm. We evaluated the performance of the different similarity measures with enrichment tests based on patient diagnoses. RESULTS The weighted similarity measures that include structured variable label relationships perform better at identifying similar patients. Indeed, using these weighted measures, we identify more clusters with distinct diagnosis enrichments. Importantly, the enrichment tests provide clinically interpretable insights into these patient clusters. CONCLUSION Considering label relationships when computing patient similarities improves the stratification of patients according to their health status.
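The abstract does not give the exact weighting formulas, so the sketch below substitutes a well-known relative, the soft cosine measure, to show how label relationships can enter a cosine similarity: sim(x, y) = xᵀSy / (√(xᵀSx) · √(yᵀSy)), where the label-similarity matrix S lets related but non-identical codes contribute. It assumes, hypothetically, that medicines are labeled with ATC codes and that two codes are as similar as the fraction of ATC levels they share; the codes and counts are illustrative, not the authors' data or method.

```python
import math

# Character positions at which the five ATC levels end (C, C09, C09A, C09AA, C09AA02).
ATC_LEVEL_ENDS = (1, 3, 4, 5, 7)


def label_similarity(code_a: str, code_b: str) -> float:
    """Fraction of ATC hierarchy levels two drug codes share (1.0 if identical)."""
    shared = 0
    for end in ATC_LEVEL_ENDS:
        if code_a[:end] != code_b[:end]:
            break
        shared += 1
    return shared / len(ATC_LEVEL_ENDS)


def cosine(x: dict[str, float], y: dict[str, float]) -> float:
    """Standard cosine similarity: only identical labels contribute."""
    dot = sum(v * y.get(k, 0.0) for k, v in x.items())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def soft_inner(x: dict[str, float], y: dict[str, float]) -> float:
    """x^T S y, where S[i][j] = label_similarity(i, j) links related labels."""
    return sum(xv * yv * label_similarity(i, j)
               for i, xv in x.items() for j, yv in y.items())


def soft_cosine(x: dict[str, float], y: dict[str, float]) -> float:
    """Cosine similarity in which related (not only identical) labels contribute."""
    denom = math.sqrt(soft_inner(x, x)) * math.sqrt(soft_inner(y, y))
    return soft_inner(x, y) / denom if denom else 0.0


# Hypothetical yearly reimbursement counts per ATC code for two patients.
p1 = {"C09AA02": 4, "A10BA02": 4}  # enalapril, metformin
p2 = {"C09AA05": 4, "A10BA02": 4}  # ramipril, metformin
print(cosine(p1, p2), soft_cosine(p1, p2))
```

Here the standard cosine is about 0.5, because the two ACE inhibitors count as unrelated labels, whereas the soft cosine is about 0.9. Because this prefix-based S is a sum of block-diagonal all-ones matrices (one per ATC level), it is positive semidefinite, so the square roots are always defined.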
Affiliation(s)
- Judith Lambert
- Sorbonne Université, Université Paris Cité, INSERM, Centre de Recherche des Cordeliers, Paris, F-75006, France.
- HeKA, Inria Paris, Paris, F-75015, France.
- Aix Marseille Univ, INSERM, MMG, Marseille, UMR1251, France.
- Anaïs Baudot
- Aix Marseille Univ, INSERM, MMG, Marseille, UMR1251, France
- CNRS, Marseille, France
- Barcelona Supercomputing Center, Barcelona, Spain
- Anne-Sophie Jannot
- HeKA, Inria Paris, Paris, F-75015, France
- Université Paris Cité, Sorbonne Université, INSERM, Centre de Recherche des Cordeliers, F-75006, Paris, France
- French National Rare Disease Registry (BNDMR), Greater Paris University Hospitals (AP-HP), Paris, France
7
Khang R, Lee H, Kim J, Moon D, Jang S, Lee E, Song Y, Ryu SW, Lee S, Han H, Kim S, Jang S, Sohn YB, Kim WS, Lee JE, Kim J, Cho Y, Lee BL, Lim HH, Kook H, Kang KS, Kwon S, Lee J, Seo GH, Oh SH, Cheon CK. Genome Sequencing of Rare Disease Patients Through the Korean Regional Rare Disease Diagnostic Support Program. Hum Mutat 2025; 2025:6096758. [PMID: 40226308 PMCID: PMC11987077 DOI: 10.1155/humu/6096758] [Received: 08/26/2024] [Accepted: 01/22/2025] [Indexed: 04/15/2025]
Abstract
Rare diseases, defined in South Korea as conditions affecting fewer than 20,000 people, pose significant diagnostic challenges due to their diverse manifestations and genetic heterogeneity. Genome sequencing (GS) offers a promising solution by enabling simultaneous screening for thousands of rare genetic disorders. This study explores the diagnostic utility and necessity of GS within the government-funded Korean Regional Rare Disease Diagnostic Support Program (KR-RDSP), a collaborative initiative involving 11 regional rare disease centers across Korea. The program was launched as a proof-of-concept study in 2023 to equip genetic clinics with a diagnostic tool that expedites diagnosis for rare disease patients living outside the urban Seoul region, in areas where diagnostic resources are limited. The study used GS to evaluate a cohort of 400 patients exhibiting a wide spectrum of symptoms. The overall diagnostic yield was 36.3% (145/400), and 4.8% (7/145) of the diagnosed patients carried variants that could not have been identified by chromosomal microarray or exome sequencing (ES), highlighting the added value of comprehensive genomic analysis. A centralized GS analysis system streamlined the diagnostic process, enabling reporting within a turnaround time of ≤35 days. Segregation analysis by Sanger sequencing played a crucial role in confirming or reclassifying variant pathogenicity by elucidating inheritance patterns. Here, we summarize diagnostic statistics for the 400 patients sequenced from June 2023 to December 2023 and present informative case examples that illustrate the diagnostic efficacy of GS, highlighting its ability to uncover elusive genetic etiologies and provide personalized treatment insights.
The study also highlights the successful implementation of the program across the 11 regional rare disease centers, with a practical workflow, comprehensive testing, a diagnostic yield comparable to previous reports, and, most importantly, a reasonable turnaround time.
Affiliation(s)
- Rin Khang
- Medical Genetics Division, 3billion Inc., Seoul, Republic of Korea
- Hane Lee
- Medical Genetics Division, 3billion Inc., Seoul, Republic of Korea
- Jihye Kim
- Medical Genetics Division, 3billion Inc., Seoul, Republic of Korea
- Dongseok Moon
- Medical Genetics Division, 3billion Inc., Seoul, Republic of Korea
- Seokhui Jang
- Medical Genetics Division, 3billion Inc., Seoul, Republic of Korea
- Eugene Lee
- Medical Genetics Division, 3billion Inc., Seoul, Republic of Korea
- Yongjun Song
- Medical Genetics Division, 3billion Inc., Seoul, Republic of Korea
- Seung Woo Ryu
- Medical Genetics Division, 3billion Inc., Seoul, Republic of Korea
- Sohyun Lee
- Medical Genetics Division, 3billion Inc., Seoul, Republic of Korea
- Heonjong Han
- Research and Development Center, 3billion Inc., Seoul, Republic of Korea
- Sukwon Kim
- Research and Development Center, 3billion Inc., Seoul, Republic of Korea
- Sohyun Jang
- Research and Development Center, 3billion Inc., Seoul, Republic of Korea
- Young Bae Sohn
- Rare Disease Center of Southern Gyeonggi Region, Department of Medical Genetics, Ajou University Hospital, Ajou University School of Medicine, Suwon, Republic of Korea
- Won Seop Kim
- Rare Disease Center of Chungbuk Region, Department of Pediatrics, Chungbuk National University Hospital, Cheongju, Republic of Korea
- Ji-Eun Lee
- Rare Disease Center of Northwestern Gyeonggi Province, Department of Pediatrics, Inha University Hospital, Incheon, Republic of Korea
- Juwon Kim
- Rare Disease Center of Gangwon Region, Yonsei University Wonju College of Medicine, Wonju Severance Christian Hospital, Wonju, Republic of Korea
- Yonggon Cho
- Jeonbuk Regional Center for Rare Diseases, Department of Laboratory Medicine, Jeonbuk National University Hospital, Jeonju, Republic of Korea
- Bo Lyun Lee
- Rare Disease Center of Busan Region, Department of Pediatrics, Busan Paik Hospital, Inje University College of Medicine, Busan, Republic of Korea
- Han Hyuk Lim
- Rare Disease Center of Chungnam Region, Department of Pediatrics, Chungnam National University Hospital, Chungnam National University College of Medicine, Daejeon, Republic of Korea
- Hoon Kook
- Rare Disease Center of Chonnam Region, Department of Pediatrics, Chonnam National University Hwasun Hospital, Gwangju, Republic of Korea
- Ki-Soo Kang
- Rare Disease Center of Jeju Region, Department of Pediatrics, Jeju National University Hospital, Jeju National University College of Medicine, Jeju, Republic of Korea
- Soonhak Kwon
- Rare Disease Center for Daegu/Gyeongbuk Region and Department of Pediatrics, Kyungpook National University Children's Hospital and School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Jiwon Lee
- Division of Rare Disease Management, Korea Disease Control and Prevention Agency, Cheongju, Republic of Korea
- Go Hun Seo
- Medical Genetics Division, 3billion Inc., Seoul, Republic of Korea
- Seung Hwan Oh
- Department of Laboratory Medicine, Pusan National University Yangsan Hospital, Pusan National University School of Medicine, Yangsan, Republic of Korea
- Chong Kun Cheon
- Rare Disease Center of Gyeongnam Region, Department of Pediatrics, Pusan National University Children's Hospital, Pusan National University School of Medicine, Yangsan, Republic of Korea
8
Shear MA, Robinson PN, Sparks TN. Fetal imaging, phenotyping, and genomic testing in modern prenatal diagnosis. Best Pract Res Clin Obstet Gynaecol 2025; 98:102575. [PMID: 39740319 DOI: 10.1016/j.bpobgyn.2024.102575] [Received: 06/02/2024] [Revised: 08/31/2024] [Accepted: 12/01/2024] [Indexed: 01/02/2025]
Abstract
Genetic tests available in the prenatal setting have expanded rapidly with next-generation sequencing, and fetal imaging can detect a broad range of structural and functional abnormalities. To identify a fetal genetic disease, deep phenotyping is increasingly important for generating a differential diagnosis, choosing the most appropriate genetic tests, and interpreting the results of those tests. The Human Phenotype Ontology (HPO) organizes and defines the features of human disease to support deep phenotyping, and ongoing efforts aim to expand the HPO to comprehensively include fetal phenotypes. Fetal phenotyping has important limitations, including ongoing structural development and limited knowledge of how many genetic diseases present uniquely in utero. This article provides an overview of the use of HPO terms and artificial intelligence in fetal phenotyping and genetic testing.
Affiliation(s)
- Matthew A Shear
- Division of Maternal-Fetal Medicine, Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, California, USA; Division of Medical Genetics, Department of Pediatrics, University of California, San Francisco, California, USA.
- Teresa N Sparks
- Division of Maternal-Fetal Medicine, Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, California, USA.
9
Bonde LD, Abdelrazek IM, Seif L, Alawi M, Matrawy K, Nabil K, Abdalla E, Kutsche K, Harms FL. Homozygous synonymous FAM111A variant underlies an autosomal recessive form of Kenny-Caffey syndrome. J Hum Genet 2025; 70:87-97. [PMID: 39501122 PMCID: PMC11762410 DOI: 10.1038/s10038-024-01301-1] [Received: 09/04/2024] [Revised: 10/11/2024] [Accepted: 10/16/2024] [Indexed: 11/08/2024]
Abstract
FAM111A (family with sequence similarity 111 member A) is a serine protease that removes covalent DNA-protein cross-links during DNA replication. Heterozygous gain-of-function variants in FAM111A cause skeletal dysplasias, such as the perinatal lethal osteocraniostenosis and the milder Kenny-Caffey syndrome (KCS). We report two siblings born to consanguineous parents with dysmorphic craniofacial features, postnatal growth retardation, ophthalmologic manifestations, hair and nail anomalies, and skeletal abnormalities suggestive of KCS, such as a thickened cortex and stenosis of the medullary cavity of the long bones. Exome sequencing identified, in both siblings, a homozygous synonymous FAM111A variant, NM_001312909.2:c.81G>A; p.Pro27=, which affects the last base of the exon and is predicted to alter FAM111A pre-mRNA splicing. In fibroblasts from both patients, we detected aberrantly spliced FAM111A transcripts, reduced FAM111A mRNA levels, and near-complete absence of FAM111A protein. After treating patient and control fibroblasts with different concentrations of camptothecin, which induces covalent DNA-protein cross-links, we observed a tendency towards a reduced proportion of metabolically active cells in patient compared with control fibroblasts. However, under these culture conditions, we found no consistent and statistically significant differences in cell cycle progression or apoptotic cell death between patient and control cells. Our findings show that FAM111A deficiency underlies an autosomal recessive form of FAM111A-related KCS. Based on our results and published data, we hypothesize that loss of FAM111A and FAM111A protease hyperactivity, as observed for gain-of-function patient-variant proteins, may converge on a similar pathomechanism underlying skeletal dysplasias.
Affiliation(s)
- Loisa Dana Bonde
- Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Ibrahim M Abdelrazek
- Department of Human Genetics, Medical Research Institute, Alexandria University, Alexandria, Egypt
- Lara Seif
- Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Malik Alawi
- Bioinformatics Core, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Khaled Matrawy
- Diagnostic Radiology and Medical Imaging Department, Medical Research Institute, Alexandria University, Alexandria, Egypt
- Karim Nabil
- Department of Ophthalmology, Faculty of Medicine, Alexandria University, Alexandria, Egypt
- Ebtesam Abdalla
- Department of Human Genetics, Medical Research Institute, Alexandria University, Alexandria, Egypt
- Kerstin Kutsche
- Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Frederike Leonie Harms
- Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
10
Mao X, Huang Y, Jin Y, Wang L, Chen X, Liu H, Yang X, Xu H, Luan X, Xiao Y, Feng S, Zhu J, Zhang X, Jiang R, Zhang S, Chen T. A phenotype-based AI pipeline outperforms human experts in differentially diagnosing rare diseases using EHRs. NPJ Digit Med 2025; 8:68. [PMID: 39875532 PMCID: PMC11775211 DOI: 10.1038/s41746-025-01452-1] [Received: 09/06/2023] [Accepted: 01/15/2025] [Indexed: 01/30/2025]
Abstract
Rare diseases, affecting ~350 million people worldwide, pose significant challenges in clinical diagnosis due to the lack of experienced physicians and the complexity of differentiating between numerous rare diseases. To address these challenges, we introduce PhenoBrain, a fully automated artificial intelligence pipeline. PhenoBrain uses a BERT-based natural language processing model to extract phenotypes from clinical texts in electronic health records (EHRs) and employs five new diagnostic models for the differential diagnosis of rare diseases. The AI system was developed and evaluated on diverse, multi-country rare disease datasets comprising 2271 cases with 431 rare diseases. In 1936 test cases, PhenoBrain achieved an average top-3 recall of 0.513 and a top-10 recall of 0.654, surpassing 13 leading prediction methods. In a human-computer comparison with 75 cases, PhenoBrain achieved a top-3 recall of 0.613 and a top-10 recall of 0.813, outperforming 50 specialist physicians and large language models such as ChatGPT and GPT-4. Combining PhenoBrain's predictions with those of specialists increased the top-3 recall to 0.768, demonstrating its potential to enhance diagnostic accuracy in clinical workflows.
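The recall figures above are top-k recall: the fraction of test cases whose confirmed diagnosis appears among a method's k highest-ranked candidate diseases. A minimal Python sketch of the metric follows, assuming exactly one confirmed diagnosis per case; the function name `top_k_recall` and the disease labels are hypothetical placeholders, not taken from the PhenoBrain code.

```python
from typing import Sequence


def top_k_recall(rankings: Sequence[Sequence[str]],
                 true_diagnoses: Sequence[str], k: int) -> float:
    """Fraction of cases whose confirmed diagnosis is in the top k predictions.

    Assumes one confirmed diagnosis per case (a simplification) and that each
    ranking lists candidate diseases from most to least likely.
    """
    if len(rankings) != len(true_diagnoses):
        raise ValueError("need exactly one ranked list per case")
    if not true_diagnoses:
        raise ValueError("no cases to evaluate")
    # A case counts as a hit if its diagnosis appears in the first k entries.
    hits = sum(truth in ranking[:k]
               for ranking, truth in zip(rankings, true_diagnoses))
    return hits / len(true_diagnoses)


# Hypothetical ranked outputs for four cases (labels are placeholders).
rankings = [["A", "B", "C", "D"],
            ["C", "A", "B", "D"],
            ["D", "C", "B", "A"],
            ["B", "D", "A", "C"]]
truths = ["A", "B", "A", "C"]

for k in (1, 3):
    print(f"top-{k} recall: {top_k_recall(rankings, truths, k):.2f}")
```

Because a larger k can only keep or add hits, top-10 recall is never lower than top-3 recall for the same method and cases. Cases with several confirmed diagnoses would need an extra rule, for example counting a hit when any of them appears in the top k.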
Affiliation(s)
- Xiaohao Mao
- Department of Computer Science and Technology & Institute for Artificial Intelligence & BNRist, Tsinghua University, Beijing, China
- Yu Huang
- Department of Computer Science and Technology & Institute for Artificial Intelligence & BNRist, Tsinghua University, Beijing, China.
- Tencent Jarvis Lab, Shenzhen, China.
- Ye Jin
- Medical Research Center, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Lun Wang
- Department of Internal Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Xuanzhong Chen
- Department of Computer Science and Technology & Institute for Artificial Intelligence & BNRist, Tsinghua University, Beijing, China
- Honghong Liu
- Department of Internal Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Xinglin Yang
- Department of Cardiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Haopeng Xu
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Xiaodong Luan
- State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Ying Xiao
- Department of Geriatrics, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Siqin Feng
- Department of Cardiology, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Jiahao Zhu
- Department of Computer Science and Technology & Institute for Artificial Intelligence & BNRist, Tsinghua University, Beijing, China
- Xuegong Zhang
- Department of Automation & BNRist, Tsinghua University, Beijing, China
- Rui Jiang
- Department of Automation & BNRist, Tsinghua University, Beijing, China
- Shuyang Zhang
- Department of Cardiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.
- Ting Chen
- Department of Computer Science and Technology & Institute for Artificial Intelligence & BNRist, Tsinghua University, Beijing, China.
11
Han H, Seo GH, Hyun SI, Kwon K, Ryu SW, Khang R, Lee E, Kim J, Song Y, Jeong WC, Han J, Kim DW, Yang S, Lee S, Jang S, Lee J, Lee H. Exome sequencing of 18,994 ethnically diverse patients with suspected rare Mendelian disorders. NPJ Genom Med 2025; 10:6. [PMID: 39843441 PMCID: PMC11754811 DOI: 10.1038/s41525-024-00455-3] [Received: 06/12/2024] [Accepted: 12/04/2024] [Indexed: 01/24/2025]
Abstract
We investigated the effectiveness of exome sequencing (ES) in diagnosing ethnically diverse patients with rare genetic disorders. A total of 18,994 patients referred to a single reference laboratory for ES between 2020 and 2022 were studied to determine the diagnostic rate and the factors influencing it. The overall diagnostic rate was 31.8%. The dermatological, skeletal, and neurodevelopmental disease categories, early age of onset, consanguinity, and the availability of parental sequencing data were correlated with a higher diagnostic rate. Nearly 68,000 variants were observed in our dataset at a higher frequency than in gnomAD 4.0. Of these, 507 variants could be classified as likely benign, representing 0.04% of non-benign variants in ClinVar (507/1,433,904) and 0.20% of the non-benign ClinVar variants observed at least once in our cohort (507/276,777). The overall diagnostic rate is comparable to those reported in other large cohort studies with less ethnically diverse populations.
Affiliation(s)
- Seong-In Hyun
- 3billion, Inc., Seoul, South Korea
- Center for RNA Research, Institute for Basic Science, Seoul, South Korea
- Won Chan Jeong
- 3billion, Inc., Seoul, South Korea
- AI Research Center, Seegene Medical Foundation, Seoul, South Korea
- Dong-Wook Kim
- 3billion, Inc., Seoul, South Korea
- Graduate School of Science and Technology Policy, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Sohyun Jang
- 3billion, Inc., Seoul, South Korea
- Genolution, Seoul, South Korea
- Hane Lee
- 3billion, Inc., Seoul, South Korea.
12
Leist IC, Rivas-Torrubia M, Alarcón-Riquelme ME, Barturen G, Consortium PC, Gut IG, Rueda M. Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond. BMC Bioinformatics 2024; 25:373. [PMID: 39633268 PMCID: PMC11616229 DOI: 10.1186/s12859-024-05993-2] [Received: 03/28/2024] [Accepted: 11/19/2024] [Indexed: 12/07/2024]
Abstract
BACKGROUND Phenotypic data comparison is essential for disease association studies, patient stratification, and genotype-phenotype correlation analysis. To support these efforts, the Global Alliance for Genomics and Health (GA4GH) established Phenopackets v2 and Beacon v2 standards for storing, sharing, and discovering genomic and phenotypic data. These standards provide a consistent framework for organizing biological data, simplifying their transformation into computer-friendly formats. However, matching participants using GA4GH-based formats remains challenging, as current methods are not fully compatible, limiting their effectiveness. RESULTS Here, we introduce Pheno-Ranker, an open-source software toolkit for individual-level comparison of phenotypic data. As input, it accepts JSON/YAML data exchange formats from Beacon v2 and Phenopackets v2 data models, as well as any data structure encoded in JSON, YAML, or CSV formats. Internally, the hierarchical data structure is flattened to one dimension and then transformed through one-hot encoding. This allows for efficient pairwise (all-to-all) comparisons within cohorts or for matching of a patient's profile in cohorts. Users have the flexibility to refine their comparisons by including or excluding terms, applying weights to variables, and obtaining statistical significance through Z-scores and p-values. The output consists of text files, which can be further analyzed using unsupervised learning techniques, such as clustering or multidimensional scaling (MDS), and with graph analytics. Pheno-Ranker's performance has been validated with simulated and synthetic data, showing its accuracy, robustness, and efficiency across various health data scenarios. A real data use case from the PRECISESADS study highlights its practical utility in clinical research. 
CONCLUSIONS Pheno-Ranker is a user-friendly, lightweight software for semantic similarity analysis of phenotypic data in Beacon v2 and Phenopackets v2 formats, extendable to other data types. It enables the comparison of a wide range of variables beyond HPO or OMIM terms while preserving full context. The software is designed as a command-line tool with additional utilities for CSV import, data simulation, summary statistics plotting, and QR code generation. For interactive analysis, it also includes a web-based user interface built with R Shiny. Links to the online documentation, including a Google Colab tutorial, and the tool's source code are available on the project home page: https://github.com/CNAG-Biomedical-Informatics/pheno-ranker .
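The encoding step described above (flattening each hierarchical record to one dimension, one-hot encoding it, then comparing all pairs) can be illustrated with a short Python sketch. This is not Pheno-Ranker's implementation: the helper names, the choice to drop list indices while flattening, the use of Hamming distance as the pairwise metric, and the simple cohort-wide Z-score are all assumptions made here, and the toy records only loosely resemble Phenopackets fields.

```python
from itertools import combinations
from statistics import mean, pstdev


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a set of 'path=value' terms (assumed scheme)."""
    terms = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            terms |= flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for element in obj:
            # List indices are dropped so that element order does not matter.
            terms |= flatten(element, prefix)
    else:
        terms.add(f"{prefix}={obj}")
    return terms


def one_hot(individuals):
    """Build a shared vocabulary and a 0/1 vector per individual."""
    flat = {name: flatten(record) for name, record in individuals.items()}
    vocabulary = sorted(set().union(*flat.values()))
    vectors = {name: [1 if term in terms else 0 for term in vocabulary]
               for name, terms in flat.items()}
    return vocabulary, vectors


def hamming(u, v):
    """Number of vector positions where two one-hot encodings differ."""
    return sum(a != b for a, b in zip(u, v))


def pairwise(vectors):
    """All-to-all Hamming distances within a cohort."""
    return {(a, b): hamming(vectors[a], vectors[b])
            for a, b in combinations(sorted(vectors), 2)}


def z_scores(distances):
    """Standardize each pairwise distance against all distances in the cohort."""
    values = list(distances.values())
    mu, sigma = mean(values), pstdev(values)
    return {pair: (d - mu) / sigma if sigma else 0.0
            for pair, d in distances.items()}


# Toy cohort; the HPO IDs are example values only.
cohort = {
    "P1": {"sex": "female",
           "phenotypicFeatures": [{"id": "HP:0001250"}, {"id": "HP:0001263"}]},
    "P2": {"sex": "female", "phenotypicFeatures": [{"id": "HP:0001250"}]},
    "P3": {"sex": "male", "phenotypicFeatures": [{"id": "HP:0000707"}]},
}

vocabulary, vectors = one_hot(cohort)
distances = pairwise(vectors)
scores = z_scores(distances)
for pair, d in distances.items():
    print(pair, d, round(scores[pair], 2))
```

Lower distances, and therefore more negative Z-scores, mark more similar individuals; in this toy cohort P1 and P2 share the most encoded terms. Term exclusion and variable weights, which the abstract says users can apply, could be layered on by masking or scaling vector positions, but that is not shown here.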
Affiliation(s)
- Ivo C Leist
- Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain
- Universitat de Barcelona (UB), Barcelona, Spain
- María Rivas-Torrubia
- Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research, Granada, Spain
- Marta E Alarcón-Riquelme
- Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research, Granada, Spain
- Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
- Guillermo Barturen
- Pfizer-University of Granada-Junta de Andalucía Centre for Genomics and Oncological Research, Granada, Spain
- Department of Genetics, Faculty of Science, University of Granada, 18071, Granada, Spain
- Bioinformatics Laboratory, Centro de Investigación Biomédica, Biotechnology Institute, PTS, Avda del Conocimiento S/N, 18100, Granada, Spain
- Ivo G Gut
- Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain
- Universitat de Barcelona (UB), Barcelona, Spain
- Manuel Rueda
- Centro Nacional de Análisis Genómico, C/Baldiri Reixac 4, 08028, Barcelona, Spain.
- Universitat de Barcelona (UB), Barcelona, Spain.
13
Tammen I, Mather M, Leeb T, Nicholas FW. Online Mendelian Inheritance in Animals (OMIA): a genetic resource for vertebrate animals. Mamm Genome 2024; 35:556-564. [PMID: 39143381 PMCID: PMC11522177 DOI: 10.1007/s00335-024-10059-y] [Received: 06/23/2024] [Accepted: 08/01/2024] [Indexed: 08/16/2024]
Abstract
Online Mendelian Inheritance in Animals (OMIA) is a freely available, curated knowledgebase that contains information on, and facilitates research into, inherited traits and diseases in animals. For the past 29 years, OMIA has been used by animal geneticists, breeders, and veterinarians worldwide as a definitive source of information. Recent increases in curation capacity and funding for software engineering support have led to software upgrades and the launch of several initiatives, including the enhancement of variant information and links to human data resources, and the introduction of ontology-based breed information and categories. We provide an overview of current information and recent enhancements to OMIA and discuss how we are expanding the integration of OMIA into other resources and databases through the use of ontologies and the adaptation of tools used in human genetics.
Affiliation(s)
- Imke Tammen
- Sydney School of Veterinary Science, The University of Sydney, Sydney, NSW, 2006, Australia.
- Marius Mather
- Sydney Informatics Hub, The University of Sydney, Sydney, NSW, 2006, Australia
- Tosso Leeb
- Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, 3001, Switzerland
- Frank W Nicholas
- Sydney School of Veterinary Science, The University of Sydney, Sydney, NSW, 2006, Australia
14
Vela-Amieva M, Alcántara-Ortigoza MA, González-del Angel A, Fernández-Hernández L, Reyna-Fabián ME, Estandía-Ortega B, Guillén-López S, López-Mejía L, Belmont-Martínez L, Carrillo-Nieto RI, Ibarra-González I, Ryu SW, Lee H, Fernández-Lainez C. Concordance Between Biochemical and Molecular Diagnosis Obtained by WES in Mexican Patients with Inborn Errors of Intermediary Metabolism: Utility for Therapeutic Management. Int J Mol Sci 2024; 25:11722. [PMID: 39519275 PMCID: PMC11546494 DOI: 10.3390/ijms252111722] [Received: 08/12/2024] [Revised: 10/18/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
Biochemical phenotyping has been the cornerstone of diagnosing and managing patients affected by inborn errors of intermediary metabolism (IEiM); however, identifying the genotype responsible for these monogenic disorders contributes greatly to both goals. Herein, whole-exome sequencing (WES) was used to determine the genotypes of 95 unrelated Mexican pediatric patients suspected of having IEiM. Patients were classified into those with specific biochemical abnormalities (Group 1) and those with nonspecific biochemical profiles (Group 2). The overall concordance between the initial biochemical diagnosis and the final genotypic diagnosis was 72.6% (N = 69/95 patients), with the highest concordance in Group 1 (91.3%, N = 63/69), whereas concordance was limited in Group 2 (23.07%). This finding suggests that prior biochemical phenotyping was associated with the high diagnostic success of WES. Concordance was high for urea cycle disorders (94.1%) and organic acid disorders (77.4%). The identified mutational spectrum comprised 83 IEiM-relevant variants (pathogenic, likely pathogenic, and variants of uncertain significance, or VUS), including three novel ones, distributed among 29 genes responsible for amino acid, organic acid, urea cycle, carbohydrate, and lipid disorders. Inconclusive WES results (7.3%, N = 7/95) arose from monoallelic pathogenic genotypes or from genotypes involving two VUS for autosomal-recessive IEiMs. A second monogenic disease was observed in 10.5% (N = 10/95) of the patients. Based on the WES results, treatment had to be modified in 33.6% (N = 32/95) of patients, mainly because of a second monogenic disease or an actionable trait. This study includes the largest cohort to date of Mexican patients with biochemically suspected IEiM who were genetically diagnosed through WES, underscoring the importance of WES in medical management.
Affiliation(s)
- Marcela Vela-Amieva
- Laboratorio de Errores Innatos del Metabolismo y Tamiz, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City C.P. 04530, Mexico
- Ariadna González-del Angel
- Laboratorio de Biología Molecular, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City C.P. 04530, Mexico
- Liliana Fernández-Hernández
- Laboratorio de Biología Molecular, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City C.P. 04530, Mexico
- Miriam Erandi Reyna-Fabián
- Laboratorio de Biología Molecular, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City C.P. 04530, Mexico
- Bernardette Estandía-Ortega
- Laboratorio de Biología Molecular, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City C.P. 04530, Mexico
- Sara Guillén-López
- Laboratorio de Errores Innatos del Metabolismo y Tamiz, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City C.P. 04530, Mexico
- Lizbeth López-Mejía
- Laboratorio de Errores Innatos del Metabolismo y Tamiz, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City C.P. 04530, Mexico
- Leticia Belmont-Martínez
- Laboratorio de Errores Innatos del Metabolismo y Tamiz, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City C.P. 04530, Mexico
- Rosa Itzel Carrillo-Nieto
- Laboratorio de Errores Innatos del Metabolismo y Tamiz, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City C.P. 04530, Mexico
- Isabel Ibarra-González
- Unidad de Genética de la Nutrición, Instituto de Investigaciones Biomédicas, UNAM, Mexico City C.P. 04530, Mexico
- Hane Lee
- 3billion, Inc., Seoul 03161, Republic of Korea
- Cynthia Fernández-Lainez
- Laboratorio de Errores Innatos del Metabolismo y Tamiz, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City C.P. 04530, Mexico
15
Kim J, Wang K, Weng C, Liu C. Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease. Am J Hum Genet 2024; 111:2190-2202. [PMID: 39255797 PMCID: PMC11480789 DOI: 10.1016/j.ajhg.2024.08.010] [Received: 04/17/2024] [Revised: 08/08/2024] [Accepted: 08/13/2024] [Indexed: 09/12/2024]
Abstract
Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs of phenotype-gene relations, recent advances in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, two from the generative pre-trained transformer (GPT) series and three from the Llama 2 series, assessing their task completeness, gene prediction accuracy, and adherence to required output structures. We conducted experiments exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. The best-performing LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within its top 50 predictions, which still falls behind traditional tools. However, accuracy increased with model size. Results were consistent over time, as shown on a dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve accuracy. Sophisticated prompts were more likely to enhance task completeness, especially in smaller models, whereas complicated prompts tended to decrease the output structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their use in clinical workflows.
Affiliation(s)
- Junyoung Kim
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
16
Badwal AK, Singh S. A comprehensive review on the current status of CRISPR based clinical trials for rare diseases. Int J Biol Macromol 2024; 277:134097. [PMID: 39059527 DOI: 10.1016/j.ijbiomac.2024.134097] [Received: 02/11/2024] [Revised: 07/03/2024] [Accepted: 07/20/2024] [Indexed: 07/28/2024]
Abstract
A considerable fraction of the world's population has a rare disease. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and the associated Cas proteins offer a modern form of curative gene therapy for rare diseases. Hereditary transthyretin amyloidosis, hereditary angioedema, Duchenne muscular dystrophy, and Rett syndrome are a few examples of such rare diseases. CRISPR/Cas9, for example, has been used in the treatment of β-thalassemia and sickle cell disease (Frangoul et al., 2021; Pavani et al., 2021) [1,2]. Neurological diseases such as Huntington's disease have also been the focus of some studies involving CRISPR/Cas (Yang et al., 2017; Yan et al., 2023) [3,4]. Whether these biologicals are delivered by vector- or non-vector-mediated methods depends on the type of target cell, the characteristics and duration of expression, the size of the foreign genetic material, and other factors. For instance, retroviruses are suited to ex vivo delivery into somatic cells because they can integrate into the host genome. They have been used successfully in gene therapy for X-SCID patients, although cases of inappropriate activation have been reported. By contrast, ex vivo gene therapy for β-thalassemia used the BB305 lentiviral vector for high-level expression of the CRISPR biological in hematopoietic stem cells (HSCs). As these biologicals advance through later stages of human clinical trials, their efficacy and safety will determine their future as efficient genome editing tools. This review focuses on CRISPR/Cas-based therapies at various stages of clinical trials for the treatment of rare diseases, and on the constraints and ethical issues associated with them.
Affiliation(s)
- Amneet Kaur Badwal
- Department of Biotechnology, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Mohali 160062, Punjab, India
- Sushma Singh
- Department of Biotechnology, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Mohali 160062, Punjab, India.
17
van Karnebeek CDM, O'Donnell-Luria A, Baynam G, Baudot A, Groza T, Jans JJM, Lassmann T, Letinturier MCV, Montgomery SB, Robinson PN, Sansen S, Mehrian-Shai R, Steward C, Kosaki K, Durao P, Sadikovic B. Leaving no patient behind! Expert recommendation in the use of innovative technologies for diagnosing rare diseases. Orphanet J Rare Dis 2024; 19:357. [PMID: 39334316 PMCID: PMC11438178 DOI: 10.1186/s13023-024-03361-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Accepted: 09/11/2024] [Indexed: 09/30/2024]
Abstract
Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed. In response, IRDiRC proposes the establishment of "a globally coordinated diagnostic and research pipeline". To help facilitate this, IRDiRC formed the Task Force on Integrating New Technologies for Rare Disease Diagnosis. This multi-stakeholder Task Force aims to provide an overview of the current state of innovative diagnostic technologies for clinicians and researchers, focusing on the patient's diagnostic journey. Herein, we provide an overview of a broad spectrum of emerging diagnostic technologies involving genomics, epigenomics and multi-omics, functional testing and model systems, data sharing, bioinformatics, and Artificial Intelligence (AI), highlighting their advantages, limitations, and the current state of clinical adoption. We provide expert recommendations outlining the stepwise application of these innovative technologies in the diagnostic pathways while considering global differences in accessibility. The importance of FAIR (Findability, Accessibility, Interoperability, and Reusability) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) data management is emphasized, along with the need for enhanced and continuing education in medical genomics. We provide a perspective on future technological developments in genome diagnostics and their integration into clinical practice.
Lastly, we summarize the challenges related to genomic diversity and accessibility, highlighting the significance of innovative diagnostic technologies, global collaboration, and equitable access to diagnosis and treatment for people living with rare disease.
Affiliation(s)
- Clara D M van Karnebeek
- Departments of Pediatrics and Human Genetics, Emma Center for Personalized Medicine, Amsterdam Gastro-Enterology Endocrinology Metabolism, Amsterdam University Medical Centers, Amsterdam, The Netherlands.
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, USA
- Gareth Baynam
- Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Anaïs Baudot
- Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Tudor Groza
- Rare Care Centre, Perth Children's Hospital and Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Perth, Australia
- European Molecular Biology Laboratory (EMBL-EBI), European Bioinformatics Institute, Hinxton, UK
- Judith J M Jans
- Department of Genetics, Section Metabolic Diagnostics, University Medical Center Utrecht, Utrecht, The Netherlands
- Ruty Mehrian-Shai
- Pediatric Brain Cancer Molecular Lab, Sheba Medical Center, Ramat Gan, Israel
- Patricia Durao
- The Cure and Action for Tay-Sachs (CATS) Foundation, Altrincham, UK
- Bekim Sadikovic
- Verspeeten Clinical Genome Centre, London Health Sciences, London, Canada
- Department of Pathology and Laboratory Medicine, Western University, London, Canada
18
Swietlik EM, Fay M, Morrell NW. Exploring Diagnostic and Therapeutic Odyssey in Pulmonary Arterial Hypertension: Insights from In-Depth Semi-Structured Interviews. Respiration 2024; 104:26-39. [PMID: 39250896 DOI: 10.1159/000540556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Accepted: 07/20/2024] [Indexed: 09/11/2024]
Abstract
INTRODUCTION Establishing a diagnosis is paramount in medical practice as it shapes patients' experiences and guides treatment. Patients grappling with rare diseases face a triple challenge: prolonged diagnostic journeys, limited responses to existing therapies, and the absence of effective monitoring tools. Genetic diagnosis often provides crucial diagnostic and prognostic information, opening up possibilities for genotype-targeted treatments and facilitating counselling and the testing of relatives. The NIHR BioResource - Rare Diseases (NBR) Study and the Cohort Study in Idiopathic and Hereditary Pulmonary Arterial Hypertension (PAH Cohort study) aimed to enhance diagnosis and treatment for PAH, successfully identifying the genetic cause in 25% of idiopathic cases. However, the diagnostic and therapeutic odyssey in patients with PAH remains largely unexplored. METHODS Stakeholders from the NBR and PAH Cohort studies were recruited using purposive sampling. In-depth interviews and focus groups were recorded, transcribed, anonymised, and analysed thematically using MAXQDA software. RESULTS The study involved 53 interviews and focus groups with 63 participants, revealing key themes across five stages of the diagnostic odyssey: initial health concerns and interactions with general practitioners, experiences of misdiagnosis, relief upon receiving the correct diagnosis, mixed emotions regarding genetic results, and the challenges of living with the disease. Following the diagnosis, participants embarked on a therapeutic journey, facing various challenges, including the disease's impact on professional and social lives, the learning curve associated with understanding the disease, shifts in communication dynamics with healthcare providers, therapeutic hurdles, and insurance-related issues.
Building on these insights, we identified areas of unmet needs, such as improved collaboration with primary care providers and local hospitals, the provision of psychological support and counselling, and the necessity for ongoing patient education in the ever-evolving realms of research and therapy. CONCLUSIONS The study highlights the significant challenges encountered throughout the diagnostic and therapeutic journey in PAH. To enhance patient outcomes, it is crucial to raise awareness of the disease, establish clear diagnostic pathways, and seamlessly integrate genetic diagnostics into clinical practice. Streamlining the diagnostic process can be achieved by utilising existing clinical infrastructure to support research and fostering better communication within the NHS. Moreover, there is an urgent need for more effective therapies alongside less burdensome drug delivery methods.
Affiliation(s)
- Emilia M Swietlik
- Department of Medicine, The Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Department of Pulmonology, Collegium Medicum, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Respiratory Medicine Department, Addenbrooke's Hospital, Cambridge, UK
- Nicholas W Morrell
- Department of Medicine, The Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
19
Flaharty KA, Hu P, Hanchard SL, Ripper ME, Duong D, Waikel RL, Solomon BD. Evaluating large language models on medical, lay-language, and self-reported descriptions of genetic conditions. Am J Hum Genet 2024; 111:1819-1833. [PMID: 39146935 PMCID: PMC11393706 DOI: 10.1016/j.ajhg.2024.07.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 07/15/2024] [Accepted: 07/16/2024] [Indexed: 08/17/2024]
Abstract
Large language models (LLMs) are generating interest in medical settings. For example, LLMs can respond coherently to medical queries by providing plausible differential diagnoses based on clinical notes. However, there are many questions to explore, such as differences between open- and closed-source LLMs and LLM performance on queries from both medical and non-medical users. In this study, we assessed multiple LLMs, including Llama-2-chat, Vicuna, Medllama2, Bard/Gemini, Claude, ChatGPT3.5, and ChatGPT-4, as well as non-LLM approaches (Google search and Phenomizer), on their ability to identify genetic conditions from textbook-like clinician questions and their corresponding layperson translations related to 63 genetic conditions. For open-source LLMs, larger models were more accurate than smaller ones: models with 7b, 13b, and more than 33b parameters achieved accuracies of 21%-49%, 41%-51%, and 54%-68%, respectively. Closed-source LLMs outperformed open-source LLMs, with ChatGPT-4 performing best (89%-90%). Three of 11 LLMs and Google search had significant performance gaps between clinician and layperson prompts. We also evaluated how in-context prompting and keyword removal affected open-source LLM performance. Models were provided with 2 types of in-context prompts: list-type prompts, which improved LLM performance, and definition-type prompts, which did not. We further analyzed the removal of rare terms from descriptions, which decreased accuracy for 5 of 7 evaluated LLMs. Finally, we observed much lower performance with real individuals' descriptions; LLMs answered these questions with a maximum accuracy of 21%.
Affiliation(s)
- Kendall A Flaharty
- Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA.
- Ping Hu
- Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Suzanna Ledgister Hanchard
- Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Molly E Ripper
- Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Dat Duong
- Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Rebekah L Waikel
- Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA
- Benjamin D Solomon
- Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Dr, Bethesda, MD 20892, USA.
20
Wang A, Liu C, Yang J, Weng C. Fine-tuning large language models for rare disease concept normalization. J Am Med Inform Assoc 2024; 31:2076-2083. [PMID: 38829731 PMCID: PMC11339522 DOI: 10.1093/jamia/ocae133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 05/20/2024] [Accepted: 05/22/2024] [Indexed: 06/05/2024]
Abstract
OBJECTIVE We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). METHODS We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of the concept's synonyms as well as identifiers. Subsequently, we fine-tuned Llama 2 (Llama2-7B) for each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. RESULTS When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 had only ∼20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced in the phenotype terms, the accuracy of NAME and NAME+SYN was 10.2% and 36.1%, respectively, but increased to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. CONCLUSION Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for the use of LLMs to identify named medical entities from clinical narratives, while successfully normalizing them to standard concepts in a controlled vocabulary.
Affiliation(s)
- Andy Wang
- Peddie School, Hightstown, NJ 08520, United States
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Jingye Yang
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, United States
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
21
Greene D, Thys C, Berry IR, Jarvis J, Ortibus E, Mumford AD, Freson K, Turro E. Mutations in the U4 snRNA gene RNU4-2 cause one of the most prevalent monogenic neurodevelopmental disorders. Nat Med 2024; 30:2165-2169. [PMID: 38821540 PMCID: PMC11333284 DOI: 10.1038/s41591-024-03085-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Accepted: 05/23/2024] [Indexed: 06/02/2024]
Abstract
Most people with intellectual disability (ID) do not receive a molecular diagnosis following genetic testing. To identify new etiologies of ID, we performed a genetic association analysis comparing the burden of rare variants in 41,132 noncoding genes between 5,529 unrelated cases and 46,401 unrelated controls. RNU4-2, which encodes U4 small nuclear RNA, a critical component of the spliceosome, was the most strongly associated gene. We implicated de novo variants among 47 cases in two regions of RNU4-2 in the etiology of a syndrome characterized by ID, microcephaly, short stature, hypotonia, seizures and motor delay. We replicated this finding in three collections, bringing the number of unrelated cases to 73. Analysis of national genomic diagnostic data showed RNU4-2 to be a more common etiological gene for neurodevelopmental abnormality than any previously reported autosomal gene. Our findings add to the growing evidence of spliceosome dysfunction in the etiologies of neurological disorders.
Affiliation(s)
- Daniel Greene
- Department of Medicine, University of Cambridge, Cambridge, UK
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chantal Thys
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, KU Leuven, Leuven, Belgium
- Ian R Berry
- NHS South West Genomic Laboratory Hub, Southmead Hospital, Bristol, UK
- NHS South West Genomic Medicine Service Alliance, Bristol, UK
- Joanna Jarvis
- Clinical Genetics Unit, Birmingham Women's Hospital, Birmingham, UK
- Els Ortibus
- Department of Development and Regeneration, KU Leuven, Leuven, Belgium
- Paediatric Neurology Department, University Hospitals of KU Leuven, Leuven, Belgium
- Andrew D Mumford
- NHS South West Genomic Medicine Service Alliance, Bristol, UK
- Bristol Medical School, University of Bristol, Bristol, UK
- Kathleen Freson
- Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, KU Leuven, Leuven, Belgium
- Ernest Turro
- Department of Medicine, University of Cambridge, Cambridge, UK.
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
22
Gresky J, Frotscher M, Dorn J, Scheelen-Nováček K, Ahlbrecht Y, Jakob T, Schönbuchner T, Canalejo J, Ducke B, Petiti E. The Digital Atlas of Ancient Rare Diseases (DAARD) and its relevance for current research. Orphanet J Rare Dis 2024; 19:277. [PMID: 39044201 PMCID: PMC11267669 DOI: 10.1186/s13023-024-03280-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Accepted: 07/03/2024] [Indexed: 07/25/2024]
Abstract
BACKGROUND The history of rare diseases is largely unknown. Research on this topic has focused on individual cases of prominent (historical) individuals and artistic (e.g., iconographic) representations. Medical collections include large numbers of specimens that exhibit signs of rare diseases, but most of them date to relatively recent periods. However, cases of rare diseases detected in mummies and skeletal remains derived from archaeological excavations have also been recorded. Nevertheless, this direct evidence from historical and archaeological contexts is mainly absent from academic discourse and generally not consulted in medical research on rare diseases. RESULTS This desideratum is addressed by the Digital Atlas of Ancient Rare Diseases (DAARD: https://daard.dainst.org), which is an open access/open data database and web-based mapping tool that collects evidence of different rare diseases found in skeletons and mummies globally and throughout all historic and prehistoric time periods. This easily searchable database allows queries by diagnosis, the preservation level of human remains, research methodology, place of curation and publications. In this manuscript, the design and functionality of the DAARD are illustrated using examples of achondroplasia and other types of stunted growth. CONCLUSIONS As an open, collaborative repository for collecting, mapping and querying well-structured medical data on individuals from ancient times, the DAARD opens new avenues of research. Over time, the number of rare diseases documented will increase through the addition of new cases from varied backgrounds such as museum collections and archaeological excavations. Depending on the research question, phenotypic or genetic information can be retrieved, as well as information on the general occurrence of a rare disease in selected space-time intervals.
Furthermore, this approach can help individuals diagnosed with a rare disease build a sense of identity and reveal an aspect of their condition they might not have been aware of. Thus, the DAARD contributes to the understanding of rare diseases from a long-term perspective and adds to the latest medical research.
Affiliation(s)
- Julia Gresky
- Division of Natural Sciences, German Archaeological Institute, Berlin, Germany.
- Melina Frotscher
- Division of Natural Sciences, German Archaeological Institute, Berlin, Germany
- Juliane Dorn
- Division of Natural Sciences, German Archaeological Institute, Berlin, Germany
- Yannick Ahlbrecht
- Division of Natural Sciences, German Archaeological Institute, Berlin, Germany
- Tina Jakob
- Department of Archaeology, Durham University, Durham, UK
- Benjamin Ducke
- Central Research Services/IT, German Archaeological Institute, Berlin, Germany
- Emmanuele Petiti
- Division of Natural Sciences, German Archaeological Institute, Berlin, Germany
23
Hussain SI, Muhammad N, Shah SA, Rehman AU, Khan SA, Saleha S, Khan YM, Muhammad N, Khan S, Wasif N. Variants in HCFC1 and MN1 genes causing intellectual disability in two Pakistani families. BMC Med Genomics 2024; 17:176. [PMID: 38956580 PMCID: PMC11221130 DOI: 10.1186/s12920-024-01943-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 06/21/2024] [Indexed: 07/04/2024]
Abstract
BACKGROUND Intellectual disability (ID) is a neurodevelopmental condition affecting around 2% of children and young adults worldwide, characterized by deficits in intellectual functioning and adaptive behavior. Genetic factors, including mutations and structural chromosomal changes, contribute to the development of ID phenotypes. Pathogenic variants in the HCFC1 gene cause X-linked mental retardation syndrome, also known as Siderius type X-linked mental retardation. The MN1 gene is necessary for palate development, and mutations in this gene result in a genetic condition called CEBALID syndrome. METHODS Exome sequencing was used to identify the disease-causing variants in two affected families, A and B, from different regions of Pakistan. Affected individuals in these two families presented with ID, developmental delay, and behavioral abnormalities. Validation and co-segregation analysis of the filtered variants were carried out using Sanger sequencing. RESULTS In X-linked family A, a novel hemizygous missense variant (c.5705G>A; p.Ser1902Asn) in the HCFC1 gene (NM_005334.3) was identified, while in family B, exome sequencing revealed a heterozygous nonsense variant (c.3680G>A; p.Trp1227Ter) in exon 1 of the MN1 gene (NM_032581.4). Sanger sequencing confirmed the segregation of these variants with ID in each family. CONCLUSIONS The investigation of two Pakistani families revealed pathogenic genetic variants in the HCFC1 and MN1 genes, which cause ID and expand the mutational spectrum of these genes.
Affiliation(s)
- Syeda Iqra Hussain
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Nazif Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Shahbaz Ali Shah
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Adil U Rehman
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Sher Alam Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Department of Computer Science and Bioinformatics, Khushal Khan Khatak University, Karak, Pakistan
- Shamim Saleha
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Yar Muhammad Khan
- Department of Biotechnology, University of Science and Technology, Bannu, Pakistan
- Noor Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Saadullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan.
- Naveed Wasif
- Institute of Human Genetics, Ulm University and Ulm University Medical Center, 89081, Ulm, Germany.
- Institute of Human Genetics, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.
24
Brankovic M, Ivanovic V, Basta I, Khang R, Lee E, Stevic Z, Ralic B, Tubic R, Seo G, Markovic V, Bozovic I, Svetel M, Marjanovic A, Veselinovic N, Mesaros S, Jankovic M, Savic-Pavicevic D, Jovin Z, Novakovic I, Lee H, Peric S. Whole exome sequencing in Serbian patients with hereditary spastic paraplegia. Neurogenetics 2024; 25:165-177. [PMID: 38499745 DOI: 10.1007/s10048-024-00755-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 03/08/2024] [Indexed: 03/20/2024]
Abstract
Hereditary spastic paraplegia (HSP) is a group of neurodegenerative diseases with high genetic and clinical heterogeneity. Numerous HSP patients remain genetically undiagnosed despite screening for known genetic causes of HSP. Therefore, identification of novel variants and genes is needed. Our previous study analyzed 74 adult Serbian HSP patients from 65 families using a panel of the 13 most common HSP genes in combination with copy number variation analysis. Conclusive genetic findings were established in 23 patients from 19 families (29%). In the present study, nine patients from nine families previously negative on the HSP gene panel were selected for whole exome sequencing (WES). Further, 44 newly diagnosed adult HSP patients from 44 families were referred directly for WES, since many studies have shown that WES may be used as the first step in HSP diagnosis. WES analysis of cohort 1 revealed a likely genetic cause in five (56%) of nine HSP families, including variants in the ETHE1, ZFYVE26, RNF170, CAPN1, and WASHC5 genes. In cohort 2, possible causative variants were found in seven (16%) of 44 patients (later updated to 27% when other diagnoses were excluded), comprising six different genes: SPAST, SPG11, WASHC5, KIF1A, KIF5A, and ABCD1. These results expand the genetic spectrum of HSP patients in Serbia and the region, with implications for molecular genetic diagnosis and future causative therapies. A wide HSP panel, alongside copy number variation (CNV) analysis, can be the first step in diagnosis, with WES performed afterwards.
Affiliation(s)
- Marija Brankovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia.
- Vukan Ivanovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Ivana Basta
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Zorica Stevic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Radoje Tubic
- Institute for Oncology and Radiology of Serbia, Belgrade, Serbia
- Vladana Markovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Ivo Bozovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Marina Svetel
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Ana Marjanovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Nikola Veselinovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Sarlota Mesaros
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Milena Jankovic
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
- Dusanka Savic-Pavicevic
- Center for Human Molecular Genetics, Faculty of Biology, University of Belgrade, Belgrade, Serbia
- Zita Jovin
- Neurology Clinic, University Clinical Center of Vojvodina, Novi Sad, Serbia
- Ivana Novakovic
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Hane Lee
- 3Billion, Inc., Seoul, South Korea
- Stojan Peric
- Faculty of Medicine, University of Belgrade, Dr Subotica 6, Belgrade, Serbia
- Neurology Clinic, University Clinical Center of Serbia, Belgrade, Serbia
25
Wang A, Liu C, Yang J, Weng C. Fine-tuning Large Language Models for Rare Disease Concept Normalization. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.12.28.573586. [PMID: 38234802 PMCID: PMC10793431 DOI: 10.1101/2023.12.28.573586] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 01/19/2024]
Abstract
Objective We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). Methods We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of the concept's synonyms as well as identifiers. Subsequently, we fine-tuned Llama2 (Llama2-7B) for each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. Results When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 had only ~20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced in the phenotype terms, the accuracy of NAME and NAME+SYN was 10.2% and 36.1%, respectively, but increased to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. Conclusion Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for the use of LLMs to identify named medical entities from clinical narratives, while successfully normalizing them to standard concepts in a controlled vocabulary.
Affiliation(s)
- Andy Wang
- Peddie School, Hightstown, NJ, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Jingye Yang
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| |
Collapse
|
26
|
Hanukoglu A, Banne E, Lev D, Wainstein J. Autosomal Dominant, Long-Standing Dysglycemia in 2 Families with Unique Phenotypic Features. Clin Med Insights Endocrinol Diabetes 2024; 17:11795514241259740. [PMID: 38854748 PMCID: PMC11159530 DOI: 10.1177/11795514241259740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 05/10/2024] [Indexed: 06/11/2024]
Abstract
We describe 2 families with 5 members from 2 generations whose clinical and laboratory characteristics over up to 15 years were consistent with dysglycemia/impaired glucose tolerance. In both families (2 probands and 3 family members), long-term follow-up excluded type 1 and type 2 diabetes: type 1 diabetes autoantibodies were persistently negative and C-peptide levels were normal. In Family 1, the proband, during a follow-up of 7 years (10.3-17.5 years of age), exhibited persistently high HbA1c (>5.7%), with fasting blood glucose levels mostly above 100 mg/dl and postprandial glucose levels up to 180 mg/dl. She eventually required oral antidiabetic drugs, which improved her glycemic balance. The father and sister also had persistent mild hyperglycemia with borderline high HbA1c levels (mostly >5.7%) over 15 and 6.2 years, respectively. In Family 2, the proband exhibited borderline fasting hyperglycemia (>100 mg/dl) at age 16.2 years, with HbA1c levels increasing from 5.6% to 5.9%, and impaired glucose tolerance at age 18.3 years (2 h blood glucose 156 mg/dl after a 75 g glucose load). His sister also exhibited borderline hyperglycemia with borderline high HbA1c over 2 years (13.6-15.4 years of age). These subjects shared a distinctive phenotype: they were tall and slim, with low BMI. Three subjects from Generation II failed to thrive during infancy. Because the data from 2 generations suggested maturity-onset diabetes of the young (MODY) with autosomal dominant inheritance, we analyzed the MODY genes. In Family 1, neither an 11-gene MODY panel nor whole-exome sequencing detected a mutation in the proband. In Family 2, the MODY panel was also negative in the proband's sister. These families may represent a hitherto unidentified syndrome. The distinctive features described in this report may help identify additional families with similar characteristics and decipher the molecular basis of this syndrome. In selected cases, oral antidiabetic drugs may improve glycemic balance in adolescents.
Affiliation(s)
- Aaron Hanukoglu
  - Division of Pediatric Endocrinology, Holon, Israel
  - E. Wolfson Medical Center, Holon, Israel
  - Maccabi Healthcare Services, Holon, Israel
  - Tel-Aviv University, Sackler School of Medicine, Tel Aviv, Israel
- Ehud Banne
  - E. Wolfson Medical Center, Holon, Israel
  - Rina Mor Institute of Medical Genetics, Holon, Israel
- Dorit Lev
  - E. Wolfson Medical Center, Holon, Israel
  - Maccabi Healthcare Services, Holon, Israel
  - Tel-Aviv University, Sackler School of Medicine, Tel Aviv, Israel
  - Rina Mor Institute of Medical Genetics, Holon, Israel
- Julio Wainstein
  - E. Wolfson Medical Center, Holon, Israel
  - Tel-Aviv University, Sackler School of Medicine, Tel Aviv, Israel
  - Diabetes Unit, Holon, Israel

27
Prawitt D, Eggermann T. Molecular mechanisms of human overgrowth and use of omics in its diagnostics: chances and challenges. Front Genet 2024; 15:1382371. [PMID: 38894719 PMCID: PMC11183334 DOI: 10.3389/fgene.2024.1382371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 05/14/2024] [Indexed: 06/21/2024]
Abstract
Overgrowth disorders comprise a group of entities with a variable phenotypic spectrum ranging from tall stature to isolated or lateralized overgrowth of body parts and/or organs. Depending on the physiological pathway affected by pathogenic genetic alterations, overgrowth syndromes are associated with a broad spectrum of neoplasia predisposition, (cardio)vascular and neurodevelopmental anomalies, and dysmorphisms. Pathologic overgrowth may be of prenatal or postnatal onset. It results from an increased number of cells (intrinsic cellular hyperplasia), hypertrophy of a normal number of cells, an increase in interstitial spaces, or a combination of these. The underlying molecular causes comprise a growing number of genetic alterations affecting skeletal growth and growth-relevant signaling cascades as major effectors, and they can affect the whole body or parts of it (mosaicism). Furthermore, epigenetic modifications play a critical role in the manifestation of some overgrowth diseases. Diagnosing overgrowth syndromes, the prerequisite for personalized clinical management, can be challenging because of their clinical and molecular heterogeneity. Physicians should consider molecular genetic testing as a first diagnostic step in overgrowth syndromes. In particular, the urgent need for a precise diagnosis in tumor predisposition syndromes must be taken into account as the basis for early monitoring and therapy. With the (future) implementation of next-generation sequencing approaches and further omic technologies, clinical diagnoses can not only be verified; these approaches also help delineate the clinical and molecular spectrum of overgrowth disorders, including unexpected findings and the identification of atypical cases.
However, the limitations of the applied assays must be considered: for each disorder of interest, the spectrum of possible genomic variant types has to be taken into account, as different variant types may require different methodological strategies. Additionally, the integration of artificial intelligence (AI) into diagnostic workflows significantly contributes to the phenotype-driven selection and interpretation of molecular and physiological data.
Affiliation(s)
- Dirk Prawitt
  - Center for Pediatrics and Adolescent Medicine, University Medical Center, Mainz, Germany
- Thomas Eggermann
  - Institute for Human Genetics and Genome Medicine, Medical Faculty, RWTH Aachen, Aachen, Germany

28
Faviez C, Chen X, Garcelon N, Zaidan M, Billot K, Petzold F, Faour H, Douillet M, Rozet JM, Cormier-Daire V, Attié-Bitach T, Lyonnet S, Saunier S, Burgun A. Objectivizing issues in the diagnosis of complex rare diseases: lessons learned from testing existing diagnosis support systems on ciliopathies. BMC Med Inform Decis Mak 2024; 24:134. [PMID: 38789985 PMCID: PMC11127295 DOI: 10.1186/s12911-024-02538-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 05/17/2024] [Indexed: 05/26/2024]
Abstract
BACKGROUND There are approximately 8,000 different rare diseases, affecting roughly 400 million people worldwide, and many of these patients experience delayed diagnosis. Ciliopathies are rare monogenic disorders characterized by marked phenotypic and genetic heterogeneity, which poses a major challenge for clinical diagnosis. Diagnosis support systems (DSSs) applied to electronic health record (EHR) data may help identify undiagnosed patients, which is of paramount importance for improving patient care. Our objective was to evaluate three online-accessible rare disease DSSs for the diagnosis of ciliopathies, using phenotypes derived from EHRs. METHODS Two datasets of ciliopathy cases, either proven or suspected, and two datasets of controls were used to evaluate the DSSs. Patient phenotypes were automatically extracted from their EHRs and converted to Human Phenotype Ontology terms. We tested the ability of the DSSs to distinguish cases from controls, based on the Orphanet ontology. RESULTS A total of 79 cases and 38 controls were selected. The DSSs performed worse on real-world ciliopathy data (best DSS: area under the ROC curve = 0.72) than the published performances on the test sets used during their development. None of the systems obtained results that could be described as "expert-level". Patients with multisystemic symptoms were generally easier to diagnose than patients with isolated symptoms. Diseases easily confused with ciliopathies generally affected multiple organs and had overlapping phenotypes. Four challenges need to be addressed to improve performance: making the DSSs interoperable with EHR systems, validating their performance in real-life settings, dealing with data quality, and leveraging methods and resources for rare and complex diseases.
CONCLUSION Our study provides insights into the complexities of diagnosing highly heterogeneous rare diseases and offers lessons derived from evaluating existing DSSs in real-world settings. These insights are relevant not only to ciliopathy diagnosis but also to improving DSSs for other complex rare disorders, by guiding the development of more clinically relevant rare disease DSSs that could support early diagnosis and ultimately make more patients eligible for treatment.
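The headline metric in this evaluation is the area under the ROC curve (0.72 for the best DSS). The AUC equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. This Python sketch assumes each DSS yields one numeric ciliopathy score per patient, which the abstract does not state; `roc_auc` is a hypothetical helper, not part of any of the evaluated systems, and the scores are made up.

```python
from typing import Sequence


def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation).

    Each (case, control) pair contributes 1 if the case scores higher,
    0.5 on a tie, and 0 otherwise; the AUC is the mean over all pairs.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores, for illustration only (not data from the study).
cases = [0.9, 0.4, 0.7]
controls = [0.3, 0.5]
print(roc_auc(cases, controls))  # 5 of 6 case-control pairs ordered correctly -> 5/6
```

The pairwise loop is O(n·m), which is negligible for 79 cases and 38 controls (3,002 pairs); for large cohorts a rank-based implementation such as scikit-learn's `roc_auc_score` computes the same quantity.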
Affiliation(s)
- Carole Faviez
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Université Paris Cité, Paris, France
- Xiaoyi Chen
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Nicolas Garcelon
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Mohamad Zaidan
  - Service de Néphrologie, Dialyse et Transplantation, Hôpital Universitaire Bicêtre, Assistance Publique-Hôpitaux de Paris (AP-HP), Kremlin Bicêtre, F-94270, France
- Katy Billot
  - Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Friederike Petzold
  - Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
  - Division of Nephrology, University of Leipzig Medical Center, Leipzig, Germany
- Hassan Faour
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Maxime Douillet
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Jean-Michel Rozet
  - Laboratory of Genetics in Ophthalmology, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Valérie Cormier-Daire
  - Reference Centre for Constitutional Bone Diseases, Laboratory of Osteochondrodysplasia, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
  - Service de médecine génomique des maladies rares, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
- Tania Attié-Bitach
  - Service d'Histologie-Embryologie-Cytogénétique, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
- Stanislas Lyonnet
  - Service de médecine génomique des maladies rares, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
  - Laboratory of Embryology and Genetics of Congenital Malformations, INSERM UMR 1163, Imagine Institute, Paris Cité, Paris, F-75015, France
- Sophie Saunier
  - Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Anita Burgun
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Department of Medical Informatics, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France

29
Althagafi A, Zhapa-Camacho F, Hoehndorf R. Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning. Bioinformatics 2024; 40:btae301. [PMID: 38696757 PMCID: PMC11132820 DOI: 10.1093/bioinformatics/btae301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 04/05/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024]
Abstract
MOTIVATION Whole-exome and genome sequencing have become common tools for diagnosing patients with rare diseases. Despite their success, this approach leaves many patients undiagnosed. A common explanation is that more disease variants still await discovery, or that novel disease phenotypes result from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods for identifying variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been successfully applied to prioritizing variants, they rely on known gene-disease or gene-phenotype associations as training data and are applicable only to genes with associated phenotypes, which limits their scope. In addition, phenotypes are not assigned uniformly by different clinicians, and phenotype-based methods need to account for this variability. RESULTS We developed the Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method that prioritizes variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from humans and model organisms about the molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical sites of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and on genomes matched with clinical information.
AVAILABILITY AND IMPLEMENTATION EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.
Affiliation(s)
- Azza Althagafi
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
  - Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
  - Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia
- Fernando Zhapa-Camacho
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
  - Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
  - Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
  - SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia

30
Sippelli F, Briuglia S, Ferraloro C, Capra AP, Agolini E, Abbate T, Pepe G, Aversa T, Wasniewska M, Corica D. Identification of a novel GNAS mutation in a family with pseudohypoparathyroidism type 1A. BMC Pediatr 2024; 24:271. [PMID: 38664677 PMCID: PMC11044326 DOI: 10.1186/s12887-024-04761-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 04/12/2024] [Indexed: 04/29/2024]
Abstract
BACKGROUND Pseudohypoparathyroidism (PHP) is caused by loss-of-function mutations in the GNAS gene (as in PHP type 1A, PHP1A), occurring de novo or inherited in the heterozygous state, or by epigenetic alterations at the GNAS locus (as in PHP1B). PHP refers to a heterogeneous group of disorders that share the clinical and biological features of PTH resistance. Many patients with PHP also show resistance to other hormones, together with the Albright hereditary osteodystrophy phenotype, characterized by short stature, round facies, subcutaneous ossifications, brachydactyly, intellectual disability and, in some subtypes, obesity. The purpose of our study is to report a new mutation in the GNAS gene and to describe the marked phenotypic variability of three sisters with PHP1A carrying the same mutation. CASE PRESENTATION We describe three sisters with PHP1A who carry the same mutation but show markedly different clinical features, auxological patterns, and biochemical changes at onset and during follow-up. Clinical exome sequencing revealed a previously undescribed heterozygous mutation in the GNAS gene (NM_000516.5 c.118_139 + 51del), maternally transmitted in an autosomal dominant pattern, in all three siblings, confirming the diagnosis of PHP1A. CONCLUSIONS This study reports a novel GNAS mutation and highlights the clinical heterogeneity of PHP1A, which is characterized by wide genotype-phenotype variability. An appropriate diagnosis has crucial implications for patient care and long-term multidisciplinary follow-up.
Affiliation(s)
- Fabio Sippelli
  - Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
- Silvana Briuglia
  - Department of Biomedical and Dental Sciences and Morphofunctional Imaging, University of Messina, Messina, Italy
- Chiara Ferraloro
  - Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
- Anna Paola Capra
  - Department of Chemical, Biological, Pharmaceutical, and Environmental Sciences, University of Messina, Messina, Italy
- Emanuele Agolini
  - Translational Cytogenomics Research Unit, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Tiziana Abbate
  - Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
- Giorgia Pepe
  - Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
- Tommaso Aversa
  - Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
- Malgorzata Wasniewska
  - Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy
- Domenico Corica
  - Department of Human Pathology of Adulthood and Childhood, University of Messina, Messina, Italy

31
Margiotti K, Fabiani M, Cima A, Libotte F, Mesoraca A, Giorlandino C. Prenatal Diagnosis by Trio Clinical Exome Sequencing: Single Center Experience. Curr Issues Mol Biol 2024; 46:3209-3217. [PMID: 38666931 PMCID: PMC11048976 DOI: 10.3390/cimb46040201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 03/28/2024] [Accepted: 04/03/2024] [Indexed: 04/28/2024]
Abstract
Fetal anomalies, structural or functional abnormalities arising during intrauterine life, pose a significant medical challenge: they affect approximately 2-3% of live births and 20% of spontaneous miscarriages. This study aims to identify the genetic causes of ultrasound anomalies through clinical exome sequencing (CES) performed in a trio setting, involving the fetus and both parents. To this end, prenatal trio clinical exome sequencing was conducted in 51 fetuses with ultrasound anomalies and previously negative chromosomal microarray (CMA) results. Pathogenic variants were identified in 24% of the analyzed cases (12 out of 51). Among these positive cases, 50% carried de novo variants and 50% carried causative variants transmitted from asymptomatic parents. Trio clinical exome sequencing stands out as a crucial tool for advancing prenatal diagnostics, outperforming chromosomal microarray analysis alone. This underscores its potential to become a routine diagnostic standard in prenatal care, particularly for cases involving ultrasound anomalies.
Affiliation(s)
- Katia Margiotti
  - Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy
- Marco Fabiani
  - Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy
- Antonella Cima
  - Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy
- Francesco Libotte
  - Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy
- Alvaro Mesoraca
  - Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy
- Claudio Giorlandino
  - Human Genetics Lab, Altamedica Main Centre, Viale Liegi 45, 00198 Rome, Italy
  - Fetal-Maternal Medical Centre, Altamedica, Viale Liegi 45, 00198 Rome, Italy

32
Kim HH, Kim DW, Woo J, Lee K. Explicable prioritization of genetic variants by integration of rule-based and machine learning algorithms for diagnosis of rare Mendelian disorders. Hum Genomics 2024; 18:28. [PMID: 38509596 PMCID: PMC10956189 DOI: 10.1186/s40246-024-00595-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 03/03/2024] [Indexed: 03/22/2024]
Abstract
BACKGROUND In the process of finding the causative variant of a rare disease, accurate assessment and prioritization of genetic variants is essential. Previous variant prioritization tools depend mainly on in silico prediction of variant pathogenicity, which results in low sensitivity and makes the prioritization results difficult to interpret. In this study, we propose an explainable variant prioritization algorithm, named 3ASC, with higher sensitivity and the ability to annotate the evidence used for prioritization. 3ASC annotates each variant with the 28 criteria defined by the ACMG/AMP genome interpretation guidelines and with features related to the clinical interpretation of the variants. The system can explain its results based on the annotated evidence and feature contributions. RESULTS We trained various machine learning algorithms using in-house patient data. Variant ranking performance was assessed as the recall of causative variants among the top-ranked variants. The best-performing model was a random forest classifier, with a top-1 recall of 85.6% and a top-3 recall of 94.4%. 3ASC annotates the ACMG/AMP criteria for each genetic variant of a patient so that clinical geneticists can interpret the results, as in the CAGI6 SickKids challenge. In the challenge, 3ASC identified causal genes for 10 of 14 patient cases, with evidence of decreased gene expression for 6 cases. Among them, two genes (HDAC8 and CASK) showed decreased expression confirmed by transcriptome data. CONCLUSIONS 3ASC can prioritize genetic variants with higher sensitivity than previous methods by integrating various features related to clinical interpretation, including features related to false-positive risk such as quality control and disease inheritance pattern. The system allows each variant to be interpreted based on the ACMG/AMP criteria and on feature contributions assessed using explainable AI techniques.
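The ranking metric reported for 3ASC, top-k recall, is the share of patients whose causative variant appears within the first k positions of the tool's ranked list. A minimal Python sketch, assuming exactly one known causative variant per patient; the function name `top_k_recall` and the toy variant IDs are illustrative and not taken from 3ASC.

```python
from typing import Sequence


def top_k_recall(rankings: Sequence[Sequence[str]], causal: Sequence[str], k: int) -> float:
    """Fraction of patients whose causative variant is among their top-k ranked variants."""
    if k < 1:
        raise ValueError("k must be >= 1")
    if not rankings or len(rankings) != len(causal):
        raise ValueError("rankings and causal labels must be non-empty and of equal length")
    hits = sum(1 for ranked, truth in zip(rankings, causal) if truth in ranked[:k])
    return hits / len(rankings)


# Toy example with made-up variant IDs (not study data):
# one ranked list per patient, plus that patient's known causative variant.
rankings = [["v1", "v2", "v3"], ["v5", "v4", "v6"], ["v9", "v8", "v7"]]
causal = ["v1", "v4", "v7"]
print(top_k_recall(rankings, causal, 1))  # only patient 1 is a hit at rank 1 -> 1/3
print(top_k_recall(rankings, causal, 3))  # all three within the top 3 -> 1.0
```

Top-k recall ignores where within the first k the causative variant sits; a rank-aware measure such as mean reciprocal rank would distinguish a hit at rank 1 from one at rank 3.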
Affiliation(s)
- Ho Heon Kim
  - Research and Development Center, 3billion, 14th floor, 416 Teheran-ro, Gangnam-gu, Seoul, 06193, Republic of Korea
- Dong-Wook Kim
  - Research and Development Center, 3billion, 14th floor, 416 Teheran-ro, Gangnam-gu, Seoul, 06193, Republic of Korea
- Junwoo Woo
  - Research and Development Center, 3billion, 14th floor, 416 Teheran-ro, Gangnam-gu, Seoul, 06193, Republic of Korea
- Kyoungyeul Lee
  - Research and Development Center, 3billion, 14th floor, 416 Teheran-ro, Gangnam-gu, Seoul, 06193, Republic of Korea

33
Bhasin MA, Knaus A, Incardona P, Schmid A, Holtgrewe M, Elbracht M, Krawitz PM, Hsieh TC. Enhancing Variant Prioritization in VarFish through On-Premise Computational Facial Analysis. Genes (Basel) 2024; 15:370. [PMID: 38540429 PMCID: PMC10969976 DOI: 10.3390/genes15030370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/03/2024] [Accepted: 03/13/2024] [Indexed: 06/14/2024]
Abstract
Genomic variant prioritization is crucial for identifying disease-associated genetic variations, and integrating facial and clinical feature analyses into this process improves performance. This study demonstrates the integration of facial analysis (GestaltMatcher) and Human Phenotype Ontology analysis (CADA) into VarFish, an open-source variant analysis framework. Challenges posed by non-open-source components were addressed by providing an open-source version of GestaltMatcher, enabling on-premise facial analysis that addresses data privacy concerns. Performance evaluation on 163 patients recruited from a German multi-center study of rare diseases showed that the combined PEDIA score prioritized variants more accurately than the individual scores. This study highlights the importance of further benchmarking and of future integration of advanced facial analysis approaches, aligned with ACMG guidelines, to enhance variant classification.
Affiliation(s)
- Meghna Ahuja Bhasin
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Alexej Knaus
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Pietro Incardona
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
  - Core Unit for Bioinformatics Data Analysis, Medical Faculty, University of Bonn, 53127 Bonn, Germany
- Alexander Schmid
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Manuel Holtgrewe
  - CUBI—Core Unit Bioinformatics, Berlin Institute of Health, 10117 Berlin, Germany
- Miriam Elbracht
  - Institute for Human Genetics and Genomic Medicine, Medical Faculty, RWTH Aachen University, 52062 Aachen, Germany
- Peter M. Krawitz
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany
- Tzung-Chien Hsieh
  - Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn, 53127 Bonn, Germany

34
Carrer A, Romaniello MG, Calderara ML, Mariani M, Biondi A, Selicorni A. Application of the Face2Gene tool in an Italian dysmorphological pediatric clinic: Retrospective validation and future perspectives. Am J Med Genet A 2024; 194:e63459. [PMID: 37927205 DOI: 10.1002/ajmg.a.63459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/15/2023] [Accepted: 10/16/2023] [Indexed: 11/07/2023]
Abstract
Neurodevelopmental disorders exhibit recurrent facial features that can suggest the genetic diagnosis at a glance, but recognizing subtle dysmorphisms is a specialized skill that requires extensive training. Face2Gene (FDNA Inc) is an innovative computer-aided phenotyping tool that analyzes patients' portraits and suggests, in a prioritized list, 30 candidate syndromes with similar morphology. We hypothesized that the software could support even expert physicians in the diagnostic workup of genetic conditions. In this study, we assessed the performance of Face2Gene in an Italian dysmorphological pediatric clinic. We uploaded two-dimensional facial photographs of 145 children affected by genetic conditions with typical phenotypic traits; all diagnoses had previously been confirmed by cytogenetic or molecular tests. Overall, the software's differential included the correct syndrome in most cases (98%). We also evaluated the algorithm's performance according to the rarity of the genetic conditions. All "common" diagnoses were correctly identified, most of them with high diagnostic accuracy (93% within the top-3 matches). Finally, the performance for the most common pediatric syndromes was calculated. Face2Gene performed well even for ultra-rare genetic conditions (75% within the top-3 matches and 83% within the top-10 matches). Expert geneticists may not need computer support to recognize common syndromes, but our results show that the tool can be useful not only for general pediatricians but also in dysmorphology clinics for ultra-rare genetic conditions.
Affiliation(s)
- Alessia Carrer
  - Department of Health Sciences, University of Milan, Milan, Italy
  - Mariani Foundation Center for Fragile Child, Pediatric Unit ASST Lariana, Como, Italy
- Maria Giovanna Romaniello
  - Mariani Foundation Center for Fragile Child, Pediatric Unit ASST Lariana, Como, Italy
  - School of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Maria Letizia Calderara
  - Mariani Foundation Center for Fragile Child, Pediatric Unit ASST Lariana, Como, Italy
  - Department of Medicine and Surgery, University of Insubria, Varese, Italy
- Milena Mariani
  - Mariani Foundation Center for Fragile Child, Pediatric Unit ASST Lariana, Como, Italy
- Andrea Biondi
  - Department of Medicine and Surgery, University of Insubria, Varese, Italy
  - Paediatrics, Fondazione IRCCS San Gerardo dei Tintori, Monza, Italy
- Angelo Selicorni
  - Mariani Foundation Center for Fragile Child, Pediatric Unit ASST Lariana, Como, Italy

35
Lee JY, Oh SH, Keum C, Lee BL, Chung WY. Clinical application of prospective whole-exome sequencing in the diagnosis of genetic disease: Experience of a regional disease center in South Korea. Ann Hum Genet 2024; 88:101-112. [PMID: 37795942 DOI: 10.1111/ahg.12530] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/26/2022] [Revised: 08/29/2023] [Accepted: 09/11/2023] [Indexed: 10/06/2023]
Abstract
INTRODUCTION Next-generation sequencing helps clinicians diagnose patients with suspected genetic disorders. The current study aimed to investigate the diagnostic yield and clinical utility of prospective whole-exome sequencing (WES) in rare diseases. METHODS WES was performed in 92 patients who presented with clinical symptoms suggestive of genetic disorders. The WES data were analyzed using in-house software. The patients' phenotypic characteristics were classified according to the Human Phenotype Ontology. RESULTS WES detected 64 variants: 13 were classified as pathogenic, 26 as likely pathogenic, and 25 as variants of uncertain significance. Among the 57 patients carrying these variants, 30 variants were identified as causal. The diagnostic yield was higher in patients with abnormalities of joint mobility and skin morphology than in those with cerebellar hypoplasia/atrophy, epilepsy, global developmental delay, dysmorphic features/facial dysmorphisms, or chronic kidney disease/abnormal renal morphology. CONCLUSION In this study, a WES-based variant interpretation system provided a definitive diagnosis for 28.3% of the patients suspected of having genetic disorders. WES is particularly useful for diagnosing rare diseases with multisystem symptoms, for which targeted gene panels are difficult to employ.
Affiliation(s)
- Ja Young Lee
- Department of Laboratory Medicine, Inje University College of Medicine, Busan, South Korea
- Seung-Hwan Oh
- Department of Laboratory Medicine, Pusan National University School of Medicine, Yangsan, South Korea
- Bo Lyun Lee
- Department of Pediatrics, Inje University College of Medicine, Busan, South Korea
- Woo Yeong Chung
- Department of Pediatrics, Inje University College of Medicine, Busan, South Korea
36
Latif M, Hashmi JA, Alayoubi AM, Ayub A, Basit S. Identification of Novel and Recurrent Variants in BTD, GBE1, AGL and ASL Genes in Families with Metabolic Disorders in Saudi Arabia. J Clin Med 2024; 13:1193. [PMID: 38592052 PMCID: PMC10932034 DOI: 10.3390/jcm13051193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Revised: 11/07/2023] [Accepted: 11/14/2023] [Indexed: 04/10/2024]
Abstract
Background and Objectives: Inherited metabolic disorders (IMDs) are a group of genetic disorders characterized by defects in enzymes or transport proteins involved in metabolic processes. These defects result in an abnormal accumulation of metabolites and thus interfere with the body's metabolism. A variety of IMDs exist, and differential diagnosis is often challenging. Our objective was to gain insight into the genetic basis of IMDs and the correlations between specific genetic mutations and clinical presentations in patients admitted to various hospitals in the Madinah region of the Kingdom of Saudi Arabia. Material and Methods: Whole exome sequencing (WES) has emerged as a powerful tool for diagnosing IMDs, allowing the identification of disease-causing genetic mutations in individuals suspected of having IMDs and thereby ensuring accurate diagnosis and appropriate management. WES was performed in four families with multiple individuals showing clinical presentations of IMDs. Variants identified through WES were validated by Sanger sequencing. Furthermore, various computational analyses were employed to uncover disease gene co-expression and metabolic pathways. Results: Analysis of the exome variant data revealed missense variants in the BTD (c.1270G > C), ASL (c.1300G > T), GBE1 (c.985T > G) and AGL (c.113C > G) genes. Mutations in these genes are known to cause IMDs. Conclusions: Our data show that exome sequencing, in conjunction with clinical and biochemical characteristics and pathological hallmarks, can deliver an accurate and high-throughput outcome for the diagnosis and sub-typing of IMDs. Overall, our findings emphasize that integrating WES with clinical and pathological information has the potential to improve the diagnosis and understanding of IMDs and related disorders, ultimately benefiting patients and the medical community.
Affiliation(s)
- Muhammad Latif
- Department of Basic Medical Sciences, College of Medicine, Taibah University, Madinah 42353, Saudi Arabia
- Center for Genetics and Inherited Diseases, Taibah University, Madinah 42353, Saudi Arabia
- Jamil Amjad Hashmi
- Department of Basic Medical Sciences, College of Medicine, Taibah University, Madinah 42353, Saudi Arabia
- Center for Genetics and Inherited Diseases, Taibah University, Madinah 42353, Saudi Arabia
- Abdulfatah M. Alayoubi
- Department of Basic Medical Sciences, College of Medicine, Taibah University, Madinah 42353, Saudi Arabia
- Arusha Ayub
- Department of Medicine, School of Health Sciences, University of Georgia, Tbilisi, P. O. Box-0171, Georgia
- Sulman Basit
- Department of Basic Medical Sciences, College of Medicine, Taibah University, Madinah 42353, Saudi Arabia
- Center for Genetics and Inherited Diseases, Taibah University, Madinah 42353, Saudi Arabia
37
Yang J, Shu L, Han M, Pan J, Chen L, Yuan T, Tan L, Shu Q, Duan H, Li H. RDmaster: A novel phenotype-oriented dialogue system supporting differential diagnosis of rare disease. Comput Biol Med 2024; 169:107924. [PMID: 38181610 DOI: 10.1016/j.compbiomed.2024.107924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 12/18/2023] [Accepted: 01/01/2024] [Indexed: 01/07/2024]
Abstract
BACKGROUND Clinicians often lack the expertise needed to differentiate among multiple candidate rare diseases (RDs), whose clinical features are complex and overlapping, leading to misdiagnoses and delayed treatment. The aim of this study was to develop a novel electronic differential diagnostic support system for RDs. METHOD By integrating two Bayesian diagnostic methods, a candidate list with enhanced clinical interpretability was generated for subsequent Q&A-based differential diagnosis (DDX). To achieve an efficient Q&A dialogue strategy, we introduce a novel metric, the adaptive information gain and Gini index (AIGGI), to evaluate the expected gain of interrogated phenotypes within real-time diagnostic states. RESULTS This DDX tool, called RDmaster, has been implemented as a web-based platform (http://rdmaster.nbscn.org/). A diagnostic trial involving 238 published RD patients revealed that RDmaster outperformed existing RD diagnostic tools, as well as ChatGPT, and that its Q&A system enhanced diagnostic accuracy. CONCLUSIONS RDmaster offers an effective multi-omics differential diagnostic technique and outperforms existing tools and popular large language models, particularly by enhancing differential diagnosis through the collection of diagnostically informative phenotypes.
Affiliation(s)
- Jian Yang
- Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China; The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
- Liqi Shu
- Rhode Island Hospital, Warren Alpert Medical School of Brown University, Rhode Island, USA
- Mingyu Han
- Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Jiarong Pan
- Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Lihua Chen
- Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Tianming Yuan
- Neonatal Department, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Linhua Tan
- Surgical Intensive Care Unit, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Qiang Shu
- Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
- Huilong Duan
- The College of Biomedical Engineering and Instrument Science, Zhejiang University, Zhejiang, China
- Haomin Li
- Clinical Data Center, The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Zhejiang, China
38
Lagorce D, Lebreton E, Matalonga L, Hongnat O, Chahdil M, Piscia D, Paramonov I, Ellwanger K, Köhler S, Robinson P, Graessner H, Beltran S, Lucano C, Hanauer M, Rath A. Phenotypic similarity-based approach for variant prioritization for unsolved rare disease: a preliminary methodological report. Eur J Hum Genet 2024; 32:182-189. [PMID: 37926714 PMCID: PMC10853199 DOI: 10.1038/s41431-023-01486-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/17/2023] [Revised: 09/13/2023] [Accepted: 10/05/2023] [Indexed: 11/07/2023]
Abstract
Rare diseases (RD) have a prevalence of no more than 1 in 2000 persons in the European population and are characterised by the difficulty of obtaining a correct and timely diagnosis. According to Orphanet, 72.5% of RD have a genetic origin, although 35% of these do not yet have an identified causative gene. A significant proportion of patients suspected to have a genetic RD receive inconclusive exome/genome sequencing results. Working towards the International Rare Diseases Research Consortium (IRDiRC)'s goal for 2027, to ensure that all people living with a RD receive a diagnosis within one year of coming to medical attention, the Solve-RD project aims to identify the molecular causes underlying undiagnosed RD. As part of this strategy, we developed a phenotypic similarity-based variant prioritization methodology that compares submitted cases with other submitted cases and with known RD in Orphanet. Three complementary approaches based on phenotypic similarity calculations using the Human Phenotype Ontology (HPO), the Orphanet Rare Diseases Ontology (ORDO) and the HPO-ORDO Ontological Module (HOOM) were developed; genomic data reanalysis was performed with the RD-Connect Genome-Phenome Analysis Platform (GPAP). The methodology was tested in 4 exemplary cases discussed with experts from European Reference Networks. Variants of interest (pathogenic or likely pathogenic) were detected in 8.8% of the 725 cases clustered by similarity calculations. Diagnostic hypotheses were validated in 42.1% of these and needed further exploration in another 10.9%. Based on these promising results, we are devising an automated, standardized phenotype-based re-analysis pipeline to be applied to the entire cohort of unsolved cases.
Affiliation(s)
- David Lagorce
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Emeline Lebreton
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Leslie Matalonga
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona, 08028, Spain
- Oscar Hongnat
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Maroua Chahdil
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Davide Piscia
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona, 08028, Spain
- Ida Paramonov
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona, 08028, Spain
- Kornelia Ellwanger
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Centre for Rare Diseases, University of Tübingen, Tübingen, Germany
- Peter Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Holm Graessner
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Centre for Rare Diseases, University of Tübingen, Tübingen, Germany
- Sergi Beltran
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona, 08028, Spain
- Caterina Lucano
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Marc Hanauer
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
- Ana Rath
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014, Paris, France
39
Nawaz H, Parveen A, Khan SA, Zalan AK, Khan MA, Muhammad N, Hassib NF, Mostafa MI, Elhossini RM, Roshdy NN, Ullah A, Arif A, Khan S, Ammerpohl O, Wasif N. Brachyolmia, dental anomalies and short stature (DASS): Phenotype and genotype analyses of Egyptian and Pakistani patients. Heliyon 2024; 10:e23688. [PMID: 38192829 PMCID: PMC10772639 DOI: 10.1016/j.heliyon.2023.e23688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 11/29/2023] [Accepted: 12/09/2023] [Indexed: 01/10/2024]
Abstract
Brachyolmia is a heterogeneous group of developmental disorders characterized by a short trunk, short stature, scoliosis, and generalized platyspondyly without significant deformities of the long bones. DASS (Dental Abnormalities and Short Stature), caused by alterations in the LTBP3 gene, was previously considered a subtype of brachyolmia. The present study investigated three unrelated consanguineous families (A, B, C) with brachyolmia and DASS from Egypt and Pakistan. In our Egyptian patients, we also observed hearing impairment. Exome sequencing was performed to determine the genetic causes of the diverse clinical conditions in the patients. Exome sequencing identified a novel homozygous splice acceptor site variant (LTBP3:c.3629-1G > T; p.?) responsible for the DASS phenotype and a known homozygous missense variant (CABP2:c.590T > C; p.Ile197Thr) causing hearing impairment in the Egyptian patients. In addition, two previously reported homozygous frameshift variants (LTBP3:c.132delG; p.Pro45Argfs*25 and LTBP3:c.2216delG; p.Gly739Alafs*7) were identified in Pakistani patients. This study emphasizes the vital role of LTBP3 in the axial skeleton and tooth morphogenesis and expands the mutational spectrum of LTBP3. We report LTBP3 variants in seven patients from three families, mainly causing brachyolmia with dental and cardiac anomalies. Skeletal assessment documented a short webbed neck, broad chest, evidence of mild long-bone involvement, short distal phalanges, pes planus and osteopenic bone texture as additional associated findings, expanding the clinical phenotype of DASS. The current study reveals that the hearing impairment in the Egyptian patients of family A has a separate transmission mechanism independent of LTBP3.
Affiliation(s)
- Hamed Nawaz
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Pakistan
- Asia Parveen
- Department of Biochemistry, Faculty of Life Sciences, Gulab Devi Educational Complex, Gulab Devi Hospital, 54000, Lahore, Pakistan
- Faculty of Science and Technology, University of Central Punjab (UCP), Lahore, Pakistan
- Sher Alam Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Pakistan
- Department of Computer Science and Bioinformatics, Khushal Khan Khatak University, Karak, Pakistan
- Abul Khair Zalan
- BDS, MDS Registrar Pediatric Dentistry, Department of Pediatric Dentistry, School of Dentistry, PIMS, Islamabad, Pakistan
- Muhammad Adnan Khan
- Dental Material, Institute of Basic Medical Sciences, Khyber Medical University Peshawar, Peshawar, Pakistan
- Noor Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Pakistan
- Nehal F. Hassib
- Orodental Genetics Department, Human Genetics and Genome Research Institute, National Research Centre, Cairo, 12622, Egypt
- School of Dentistry, New Giza University, Giza, Egypt
- Mostafa I. Mostafa
- Orodental Genetics Department, Human Genetics and Genome Research Institute, National Research Centre, Cairo, 12622, Egypt
- Rasha M. Elhossini
- Clinical Genetics Department, Human Genetics and Genome Research Institute, National Research Centre, Cairo, 12622, Egypt
- Nehal Nabil Roshdy
- Endodontics, Faculty of Dentistry, Cairo University, Cairo, 11553, Egypt
- Asmat Ullah
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Amina Arif
- Faculty of Science and Technology, University of Central Punjab (UCP), Lahore, Pakistan
- Saadullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Pakistan
- Ole Ammerpohl
- Institute of Human Genetics, Ulm University and Ulm University Medical Center, 89081, Ulm, Germany
- Naveed Wasif
- Institute of Human Genetics, Ulm University and Ulm University Medical Center, 89081, Ulm, Germany
- Institute of Human Genetics, University Hospital Schleswig-Holstein, Campus Kiel, D-24105, Kiel, Germany
40
Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT. Patterns (N Y) 2024; 5:100887. [PMID: 38264716 PMCID: PMC10801236 DOI: 10.1016/j.patter.2023.100887] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 08/14/2023] [Revised: 10/25/2023] [Accepted: 11/06/2023] [Indexed: 01/25/2024]
Abstract
To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models, PhenoBCBERT and PhenoGPT, for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes because of the limitations of traditional heuristic or rule-based approaches. Our models leverage large language models to automate the detection of phenotype terms, including those not in the current HPO. We compared these models with PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also show strong performance in case studies on biomedical literature. We evaluate the strengths and weaknesses of BERT- and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses of human diseases.
Affiliation(s)
- Jingye Yang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Wendy Deng
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Da Wu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Yunyun Zhou
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Biostatistics and Bioinformatics Facility, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
41
Hussain SI, Muhammad N, Khan N, Khan M, Fardous F, Tahir R, Yasin M, Khan SA, Saleha S, Muhammad N, Wasif N, Khan S. Molecular insight into CREBBP and TANGO2 variants causing intellectual disability. J Gene Med 2024; 26:e3591. [PMID: 37721116 DOI: 10.1002/jgm.3591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 08/07/2023] [Accepted: 08/24/2023] [Indexed: 09/19/2023]
Abstract
BACKGROUND Intellectual disability (ID) can be associated with syndromes such as Rubinstein-Taybi syndrome (RSTS) and with conditions such as metabolic encephalomyopathic crises, recurrent, with rhabdomyolysis, cardiac arrhythmias, and neurodegeneration. Rare congenital RSTS1 (OMIM 180849) is characterized by mental and growth retardation, significant and duplicated distal phalanges of the thumbs and halluces, facial dysmorphisms, and an elevated risk of malignancies. Microdeletions and point mutations in the CREB-binding protein (CREBBP) gene, located at 16p13.3, have been reported to cause RSTS. By contrast, TANGO2-related metabolic encephalopathy and arrhythmia (TRMEA) is a rare metabolic condition that causes repeated metabolic crises, hypoglycemia, lactic acidosis, rhabdomyolysis, arrhythmias and encephalopathy with cognitive decline. Clinicians need more clinical and genetic evidence to detect and comprehend the phenotypic spectrum of this disorder. METHODS Exome sequencing was used to identify the disease-causing variants in two affected families, A and B, from District Kohat and District Karak, Khyber Pakhtunkhwa. Affected individuals from both families presented with ID, developmental delay and behavioral abnormalities. Validation and co-segregation analysis of the filtered variants were carried out using Sanger sequencing. RESULTS In the present study, two families (A and B) exhibiting various forms of ID were enrolled. In Family A, exome sequencing revealed a novel missense variant (NM_004380.3: c.4571A>G; NP_004371.2: p.Lys1524Arg) in the CREBBP gene, whereas in Family B, a splice site variant (NM_152906.7: c.605+1G>A) in the TANGO2 gene was identified. Sanger sequencing confirmed that both variants segregated with ID in their respective families. In silico tools predicted aberrant changes in the CREBBP protein structure. Wild-type and mutant CREBBP protein structures were superimposed, and the observed conformational changes are likely to alter protein function. CONCLUSIONS RSTS and TRMEA are exceedingly rare disorders whose specific clinical characteristics have been clearly established, but further investigations are underway and needed. Multicenter studies are required to increase our understanding of the clinical phenotypes, particularly genotype-phenotype associations.
Affiliation(s)
- Syeda Iqra Hussain
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Nazif Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Niamatullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Mobeen Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Fardous Fardous
- Department of Medical Lab Technology, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Raheel Tahir
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Muhammad Yasin
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Sher Alam Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Shamim Saleha
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Noor Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Naveed Wasif
- Institute of Human Genetics, Ulm University and Ulm University Medical Center, Ulm, Germany
- Institute of Human Genetics, University Hospital Schleswig-Holstein, Kiel, Germany
- Saadullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science and Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
42
Meyer C, Romero NB, Evangelista T, Cadot B, Laporte J, Jeannin-Girardon A, Collet P, Ayadi A, Chennen K, Poch O. IMPatienT: An Integrated Web Application to Digitize, Process and Explore Multimodal PATIENt daTa. J Neuromuscul Dis 2024; 11:855-870. [PMID: 38701156 PMCID: PMC11307071 DOI: 10.3233/jnd-230085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/23/2024] [Indexed: 05/05/2024]
Abstract
Medical acts, such as imaging, lead to the production of various medical text reports that describe the relevant findings. This induces multimodality in patient data by combining image data with free text; consequently, multimodal data have become central to driving research and improving diagnoses. However, exploiting patient data is problematic because the ecosystem of analysis tools is fragmented according to the type of data (images, text, genetics), the task (processing, exploration) and the domain of interest (clinical phenotype, histology). To address these challenges, we developed IMPatienT (Integrated digital Multimodal PATIENt daTa), a simple, flexible and open-source web application to digitize, process and explore multimodal patient data. IMPatienT has a modular architecture that allows users to: (i) create a standard vocabulary for a domain, (ii) digitize and process free-text data, (iii) annotate images and perform image segmentation, and (iv) generate a visualization dashboard and provide diagnostic decision support. To demonstrate the advantages of IMPatienT, we present a use case on a corpus of 40 simulated muscle biopsy reports of congenital myopathy patients. Because IMPatienT lets users design their own vocabulary, it can be adapted to any research domain and used as a patient registry for exploratory data analysis. A demo instance of the application is available at https://impatient.lbgi.fr/.
Affiliation(s)
- Corentin Meyer
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
- Norma Beatriz Romero
- Neuromuscular Morphology Unit, Myology Institute, Reference Center of Neuromuscular Diseases Nord-Est-IDF, GHU Pitié-Salpêtrière, Paris, France
- Teresinha Evangelista
- Neuromuscular Morphology Unit, Myology Institute, Reference Center of Neuromuscular Diseases Nord-Est-IDF, GHU Pitié-Salpêtrière, Paris, France
- Brunot Cadot
- Sorbonne Université, INSERM, Center for Research in Myology, Myology Institute, GHU Pitié-Salpêtrière, Paris, France
- Jocelyn Laporte
- Department of Translational Medicine, IGBMC, CNRS UMR 7104, Illkirch, France
- Anne Jeannin-Girardon
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
- Pierre Collet
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
- Ali Ayadi
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
- Kirsley Chennen
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
- Olivier Poch
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR 7357, University of Strasbourg, Strasbourg, France
43
Kouri C, Sommer G, Martinez de Lapiscina I, Elzenaty RN, Tack LJW, Cools M, Ahmed SF, Flück CE. Clinical and genetic characteristics of a large international cohort of individuals with rare NR5A1/SF-1 variants of sex development. EBioMedicine 2024; 99:104941. [PMID: 38168586 PMCID: PMC10797150 DOI: 10.1016/j.ebiom.2023.104941] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/08/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND Steroidogenic factor 1 (SF-1/NR5A1) is essential for human sex development. Heterozygous NR5A1/SF-1 variants manifest with a broad range of phenotypes of differences of sex development (DSD), which remain unexplained. METHODS We conducted a retrospective analysis of the largest international cohort to date of individuals with NR5A1/SF-1 variants, identified through the I-DSD registry and a research network. FINDINGS Among 197 individuals with NR5A1/SF-1 variants, we confirmed diverse phenotypes. Over 70% of 46,XY individuals had a severe DSD phenotype, while 90% of 46,XX individuals had female-typical sex development. Close to 100 different novel and known NR5A1/SF-1 variants were identified, without specific hot spots. Additionally, likely disease-associated variants in other genes were reported in 32 of 128 individuals tested (25%), particularly in those with severe or opposite-sex DSD phenotypes. Interestingly, 48% of these variants were found in known DSD or SF-1-interacting genes, but no frequent gene clusters were identified. Sex registration at birth varied, with <10% undergoing reassignment. Gonadectomy was performed in 30% and genital surgery in 58%. Associated organ anomalies were observed in 27% of individuals with a DSD, mainly involving the spleen. Intrafamilial phenotypes also varied considerably. INTERPRETATION The observed phenotypic variability in individuals and families with NR5A1/SF-1 variants is large and remains unpredictable. It often may not be explained solely by the monogenic pathogenicity of the NR5A1/SF-1 variants but is likely influenced by additional genetic variants and as-yet-unknown factors. FUNDING Swiss National Science Foundation (320030-197725) and Boveri Foundation Zürich, Switzerland.
Affiliation(s)
- Chrysanthi Kouri
- Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland; Department for BioMedical Research, University of Bern, Bern 3008, Switzerland; Graduate School for Cellular and Biomedical Sciences, University of Bern, Bern 3012, Switzerland
- Grit Sommer
- Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland; Department for BioMedical Research, University of Bern, Bern 3008, Switzerland; Institute of Social and Preventive Medicine, University of Bern, Bern 3012, Switzerland
- Idoia Martinez de Lapiscina
- Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland; Department for BioMedical Research, University of Bern, Bern 3008, Switzerland; Research into the Genetics and Control of Diabetes and Other Endocrine Disorders, Biobizkaia Health Research Institute, Cruces University Hospital, Barakaldo 48903, Spain; CIBER de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Instituto de Salud Carlos III, Madrid 28029, Spain; CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid 28029, Spain; Endo-ERN, Amsterdam 1081 HV, the Netherlands
- Rawda Naamneh Elzenaty
- Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland; Department for BioMedical Research, University of Bern, Bern 3008, Switzerland; Graduate School for Cellular and Biomedical Sciences, University of Bern, Bern 3012, Switzerland
- Lloyd J W Tack
- Department of Paediatric Endocrinology, Department of Paediatrics and Internal Medicine, Ghent University Hospital, Ghent University, Ghent 9000, Belgium
- Martine Cools
- Department of Paediatric Endocrinology, Department of Paediatrics and Internal Medicine, Ghent University Hospital, Ghent University, Ghent 9000, Belgium
- S Faisal Ahmed
- Developmental Endocrinology Research Group, University of Glasgow, Royal Hospital for Sick Children, Glasgow G51 4TF, UK
- Christa E Flück
- Pediatric Endocrinology, Diabetology and Metabolism, Department of Pediatrics, Inselspital, Bern University Hospital, University of Bern, Bern 3010, Switzerland; Department for BioMedical Research, University of Bern, Bern 3008, Switzerland.
44
Karafyllis I, Nuoffer JM, Michelis JP, Chilver-Stainer L. Untreated Classic Galactosemia: A Rare Cause of Adult-Onset Progressive Cerebellar Ataxia - A Case Report. Case Rep Neurol 2024; 16:55-62. [PMID: 38444718 PMCID: PMC10914380 DOI: 10.1159/000536679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Accepted: 01/24/2024] [Indexed: 03/07/2024]
Abstract
Introduction Identifying the underlying etiology of nonfamilial adult-onset progressive cerebellar ataxia is often challenging because neurologists must consider almost all nongenetic and genetic causes of ataxia. Case Presentation A 39-year-old woman was hospitalized for progressive ataxia with pyramidal and cognitive dysfunction after right-arm shaking and coordination problems had deteriorated progressively over 1.5 years. The patient's medical history included amenorrhea, cataracts, developmental delays, parental consanguinity, motor coordination issues, and diarrhea and vomiting in infancy. A key finding that enabled us to solve the diagnostic conundrum was an elevated carbohydrate-deficient transferrin level in the absence of alcohol-related symptoms; this finding also occurs in untreated carbohydrate metabolism disorders, which sometimes present with ataxia as the leading symptom. Decreased erythrocyte galactose-1-phosphate uridyltransferase (GALT) enzyme activity and an elevated erythrocyte galactose-1-phosphate (Gal-1P) concentration led to the final diagnosis of galactosemia, a rare metabolic disorder. The patient's condition remained stable with strict adherence to a lactose-free, galactose-restricted diet, regular physiotherapy, and speech therapy, despite attempts to control the crippling tremor. Conclusion This case highlights the importance of considering rare diseases when clinical and laboratory findings remain unexplained. Newborn screening does not change the long-term complications of early-treated classical galactosemia. A small percentage of these patients develop ataxia tremor syndrome.
Affiliation(s)
- Ioannis Karafyllis
- Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department of Neurology, Cantonal Hospital Olten, Olten, Switzerland
- Jean-Marc Nuoffer
- Department of Pediatric Endocrinology, Diabetology and Metabolism, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- University Institute of Clinical Chemistry, University of Bern, Bern, Switzerland
- Joan-Philipp Michelis
- Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Lara Chilver-Stainer
- Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
45
Carmody LC, Gargano MA, Toro S, Vasilevsky NA, Adam MP, Blau H, Chan LE, Gomez-Andres D, Horvath R, Kraus ML, Ladewig MS, Lewis-Smith D, Lochmüller H, Matentzoglu NA, Munoz-Torres MC, Schuetz C, Seitz B, Similuk MN, Sparks TN, Strauss T, Swietlik EM, Thompson R, Zhang XA, Mungall CJ, Haendel MA, Robinson PN. The Medical Action Ontology: A tool for annotating and analyzing treatments and clinical management of human disease. MED 2023; 4:913-927.e3. [PMID: 37963467 PMCID: PMC10842845 DOI: 10.1016/j.medj.2023.10.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/13/2023] [Revised: 08/31/2023] [Accepted: 10/14/2023] [Indexed: 11/16/2023]
Abstract
BACKGROUND Navigating the clinical literature to determine the optimal clinical management for rare diseases presents significant challenges. We introduce the Medical Action Ontology (MAxO), an ontology specifically designed to organize medical procedures, therapies, and interventions. METHODS MAxO incorporates logical structures that link MAxO terms to numerous other ontologies within the OBO Foundry. Term development involves a blend of manual and semi-automated processes. Additionally, we have generated annotations detailing diagnostic modalities for specific phenotypic abnormalities defined by the Human Phenotype Ontology (HPO). We introduce a web application, POET, that facilitates MAxO annotations for specific medical actions for diseases using the Mondo Disease Ontology. FINDINGS MAxO encompasses 1,757 terms spanning a wide range of biomedical domains, from human anatomy and investigations to the chemical and protein entities involved in biological processes. These terms annotate phenotypic features associated with specific diseases (using HPO and Mondo). Presently, there are over 16,000 MAxO diagnostic annotations that target HPO terms. Through POET, we have created 413 MAxO annotations specifying treatments for 189 rare diseases. CONCLUSIONS MAxO offers a computational representation of treatments and other actions taken for the clinical management of patients. Its development is closely coupled to Mondo and HPO, broadening the scope of our computational modeling of diseases and phenotypic features. We invite the community to contribute disease annotations using POET (https://poet.jax.org/). MAxO is available under the open-source CC-BY 4.0 license (https://github.com/monarch-initiative/MAxO). FUNDING NHGRI 1U24HG011449-01A1 and NHGRI 5RM1HG010860-04.
Affiliation(s)
- Leigh C Carmody
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Sabrina Toro
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Margaret P Adam
- University of Washington School of Medicine, Seattle, WA, USA
- Hannah Blau
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- David Gomez-Andres
- Pediatric Neurology, Vall d'Hebron Institut de Recerca (VHIR), Hospital Universitari Vall d'Hebron, Vall d'Hebron Barcelona Hospital Campus, Passeig Vall d'Hebron 119-129, 08035 Barcelona, Spain
- Rita Horvath
- Department of Clinical Neurosciences, University of Cambridge, Robinson Way, Cambridge CB2 0PY, UK
- Megan L Kraus
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Markus S Ladewig
- Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Germany
- David Lewis-Smith
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Hanns Lochmüller
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada; Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada; Brain and Mind Research Institute, University of Ottawa, Ottawa, Canada; Department of Neuropediatrics and Muscle Disorders, Medical Center - University of Freiburg, Faculty of Medicine, Freiburg, Germany; Centro Nacional de Análisis Genómico, Barcelona, Spain
- Catharina Schuetz
- Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
- Berthold Seitz
- Department of Ophthalmology, Saarland University Medical Center UKS, Homburg, Saar, Germany
- Morgan N Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Teresa N Sparks
- Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Timmy Strauss
- Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
- Emilia M Swietlik
- Department of Medicine, University of Cambridge, Heart and Lung Research Institute, Cambridge CB2 0BB, UK
- Rachel Thompson
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
46
Groza T, Wu H, Dinger ME, Danis D, Hilton C, Bagley A, Davids JR, Luo L, Lu Z, Robinson PN. Term-BLAST-like alignment tool for concept recognition in noisy clinical texts. Bioinformatics 2023; 39:btad716. [PMID: 38001031 PMCID: PMC10710372 DOI: 10.1093/bioinformatics/btad716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/20/2023] [Accepted: 11/23/2023] [Indexed: 11/26/2023]
Abstract
MOTIVATION Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT), leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show a 10% increase in recall on scientific publications and a 20% increase in recall on EHRs (compared with the next-best method), supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.
Affiliation(s)
- Tudor Groza
- Rare Care Centre, Perth Children’s Hospital, Nedlands, WA 6009, Australia
- Genetics and Rare Diseases Program, Telethon Kids Institute, Nedlands, WA 6009, Australia
- Honghan Wu
- Institute of Health Informatics, University College London, London WC1E 6BT, United Kingdom
- Marcel E Dinger
- Pryzm Health, Sydney, NSW 2089, Australia
- School of Life and Environmental Sciences, Faculty of Science, University of Sydney, NSW 2006, Australia
- Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Coleman Hilton
- Shriners Children’s Corporate Headquarters, Tampa, FL 33607, United States
- Anita Bagley
- Shriners Children's Northern California, Sacramento, CA 95817, United States
- Jon R Davids
- Shriners Children's Northern California, Sacramento, CA 95817, United States
- Ling Luo
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, United States
47
Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y, Wang K. Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT. ARXIV 2023:arXiv:2308.06294v2. [PMID: 37986722 PMCID: PMC10659449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models - PhenoBCBERT and PhenoGPT - for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes, due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models (LLMs) to automate the detection of phenotype terms, including those not in the current HPO. We compared these models to PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also showed strong performance in case studies on biomedical literature. We evaluated the strengths and weaknesses of BERT-based and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.
Affiliation(s)
- Jingye Yang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Wendy Deng
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Da Wu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Yunyun Zhou
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Biostatistics and Bioinformatics facility, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
48
Bi X, Liang W, Zhao Q, Wang J. SSLpheno: a self-supervised learning approach for gene-phenotype association prediction using protein-protein interactions and gene ontology data. Bioinformatics 2023; 39:btad662. [PMID: 37941450 PMCID: PMC10666204 DOI: 10.1093/bioinformatics/btad662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 10/17/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023]
Abstract
MOTIVATION Medical genomics faces significant challenges in interpreting disease phenotype and genetic heterogeneity. Despite the establishment of standardized disease phenotype databases, computational methods for predicting gene-phenotype associations still suffer from imbalanced category distribution and a lack of labeled data in small categories. RESULTS To address the problem of labeled-data scarcity, we propose a self-supervised learning strategy for gene-phenotype association prediction, called SSLpheno. Our approach utilizes an attributed network that integrates protein-protein interactions and gene ontology data. We apply a Laplacian-based filter to ensure feature smoothness and use self-supervised training to optimize node feature representation. Specifically, we calculate the cosine similarity of feature vectors and select positive and negative sample nodes for reconstruction training labels. We employ a deep neural network for multi-label classification of phenotypes in the downstream task. Our experimental results demonstrate that SSLpheno outperforms state-of-the-art methods, especially in categories with fewer annotations. Moreover, our case studies illustrate the potential of SSLpheno as an effective prescreening tool for gene-phenotype association identification. AVAILABILITY AND IMPLEMENTATION https://github.com/bixuehua/SSLpheno.
Affiliation(s)
- Xuehua Bi
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi 830017, China
- Weiyang Liang
- College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
49
Alsentzer E, Finlayson SG, Li MM, Kobren SN, Kohane IS. Simulation of undiagnosed patients with novel genetic conditions. Nat Commun 2023; 14:6403. [PMID: 37828001 PMCID: PMC10570269 DOI: 10.1038/s41467-023-41980-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2022] [Accepted: 09/26/2023] [Indexed: 10/14/2023]
Abstract
Rare Mendelian disorders pose a major diagnostic challenge and collectively affect 300-400 million patients worldwide. Many automated tools aim to uncover causal genes in patients with suspected genetic disorders, but evaluation of these tools is limited due to the lack of comprehensive benchmark datasets that include previously unpublished conditions. Here, we present a computational pipeline that simulates realistic clinical datasets to address this deficit. Our framework jointly simulates complex phenotypes and challenging candidate genes and produces patients with novel genetic conditions. We demonstrate the similarity of our simulated patients to real patients from the Undiagnosed Diseases Network and evaluate common gene prioritization methods on the simulated cohort. These prioritization methods recover known gene-disease associations but perform poorly on diagnosing patients with novel genetic disorders. Our publicly-available dataset and codebase can be utilized by medical genetics researchers to evaluate, compare, and improve tools that aid in the diagnostic process.
Grants
- U01 HG007690 NHGRI NIH HHS
- U54 NS108251 NINDS NIH HHS
- U01 HG010219 NHGRI NIH HHS
- U01 HG007672 NHGRI NIH HHS
- U01 HG010233 NHGRI NIH HHS
- U01 HG010230 NHGRI NIH HHS
- U01 HG007943 NHGRI NIH HHS
- U01 HG010217 NHGRI NIH HHS
- U01 HG007942 NHGRI NIH HHS
- U01 HG010215 NHGRI NIH HHS
- U01 HG007708 NHGRI NIH HHS
- T32 HG002295 NHGRI NIH HHS
- T32 GM007753 NIGMS NIH HHS
- U01 HG007674 NHGRI NIH HHS
- U01 TR001395 NCATS NIH HHS
- U01 HG007709 NHGRI NIH HHS
- U54 NS093793 NINDS NIH HHS
- U01 HG007530 NHGRI NIH HHS
- U01 TR002471 NCATS NIH HHS
- U01 HG007703 NHGRI NIH HHS
- UDN research reported in this manuscript was supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under Award Number(s) U01HG007709, U01HG010219, U01HG010230, U01HG010217, U01HG010233, U01HG010215, U01HG007672, U01HG007690, U01HG007708, U01HG007703, U01HG007674, U01HG007530, U01HG007942, U01HG007943, U01TR001395, U01TR002471, U54NS108251, and U54NS093793.
- E.A. is supported by a Microsoft Research PhD Fellowship.
- S.F. is supported by award Number T32GM007753 from the National Institute of General Medical Sciences.
- M.L. is supported by T32HG002295 from the National Human Genome Research Institute and a National Science Foundation Graduate Research Fellowship.
Affiliation(s)
- Emily Alsentzer
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Program in Health Sciences and Technology, MIT, Cambridge, MA, 02139, USA
- Samuel G Finlayson
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Program in Health Sciences and Technology, MIT, Cambridge, MA, 02139, USA
- Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA, 98105, USA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, 98105, USA
- Michelle M Li
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, 02115, USA
- Shilpa N Kobren
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA.
- Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA.
50
Hussain SI, Muhammad N, Shah SUD, Fardous F, Khan SA, Khan N, Rehman AU, Siddique M, Wasan SA, Niaz R, Ullah H, Khan N, Muhammad N, Mirza MU, Wasif N, Khan S. Structural and functional implications of SLC13A3 and SLC9A6 mutations: an in silico approach to understanding intellectual disability. BMC Neurol 2023; 23:353. [PMID: 37794328 PMCID: PMC10548666 DOI: 10.1186/s12883-023-03397-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 09/20/2023] [Indexed: 10/06/2023]
Abstract
BACKGROUND Intellectual disability (ID) is a condition that varies widely in both its clinical presentation and its genetic underpinnings. It significantly impairs patients' learning capacity and lowers their IQ below 70. The solute carrier (SLC) family is the most abundant class of transmembrane transporters and is responsible for the translocation of various substances across cell membranes, including nutrients, ions, metabolites, and medicines. The SLC13A3 gene encodes a plasma membrane-localized Na+/dicarboxylate cotransporter 3 (NaDC3) primarily expressed in the kidney, astrocytes, and the choroid plexus. Together with three Na+ ions, it transports four- to six-carbon dicarboxylates into the cytosol. Recently, it was discovered that patients with acute reversible leukoencephalopathy and α-ketoglutarate accumulation (ARLIAK) carry pathogenic mutations in the SLC13A3 gene, and that the X-linked neurodevelopmental condition Christianson syndrome is caused by mutations in the SLC9A6 gene, which encodes the recycling endosomal alkali cation/proton exchanger NHE6, also called sodium-hydrogen exchanger 6. As a result, patients have severe impairments in mental capacity, physical skills, and adaptive behavior. METHODS AND RESULTS Two Pakistani families (A and B) with autosomal recessive and X-linked intellectual disorders were clinically evaluated, and two novel disease-causing variants in the SLC13A3 gene (NM_022829.5) and the SLC9A6 gene (NM_001042537.2) were identified using whole exome sequencing. Family A segregated a novel homozygous missense variant (c.1478C>T; p.Pro493Leu) in exon 11 of the SLC13A3 gene, whereas family B segregated a novel missense variant (c.1342G>A; p.Gly448Arg) in exon 10 of the SLC9A6 gene. By integrating computational approaches, our findings provided insights into the molecular mechanisms underlying the development of ID in individuals with SLC13A3 and SLC9A6 mutations.
CONCLUSION In the current study, we used in silico tools to examine the deleterious effects of the identified variants, which have the potential to improve understanding of genotype-phenotype relationships in neurodevelopmental disorders.
Affiliation(s)
- Syeda Iqra Hussain
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Nazif Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Salah Ud Din Shah
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Fardous Fardous
- Department of Medical Lab Technology, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Sher Alam Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Niamatullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Adil U Rehman
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Mehwish Siddique
- Department of Zoology, Government Post Graduate College for Women, Satellite Town, Gujranwala, Pakistan
- Shoukat Ali Wasan
- Department of Botany, Faculty of Natural Sciences, Shah Abdul Latif University, Khairpur, Sindh, Pakistan
- Rooh Niaz
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Hafiz Ullah
- Gomal Center of Biochemistry and Biotechnology (GCBB), Gomal University D. I. Khan, D. I. Khan, Pakistan
- Niamat Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Noor Muhammad
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan
- Muhammad Usman Mirza
- Department of Chemistry and Biochemistry, University of Windsor, Windsor, ON, N9B 1C4, Canada
- Naveed Wasif
- Institute of Human Genetics, Ulm University and Ulm University Medical Center, 89081, Ulm, Germany.
- Institute of Human Genetics, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.
- Saadullah Khan
- Department of Biotechnology and Genetic Engineering, Kohat University of Science & Technology (KUST), Kohat, Khyber Pakhtunkhwa, Pakistan.