1
Kafkas Ş, Abdelhakim M, Althagafi A, Toonsi S, Alghamdi M, Schofield PN, Hoehndorf R. The application of Large Language Models to the phenotype-based prioritization of causative genes in rare disease patients. Sci Rep 2025; 15:15093. [PMID: 40301638 PMCID: PMC12041562 DOI: 10.1038/s41598-025-99539-y] [Received: 06/25/2024] [Accepted: 04/21/2025] [Indexed: 05/01/2025]
Abstract
Computational methods for identifying gene-disease associations can use both genomic and phenotypic information to prioritize genes and variants that may be associated with genetic diseases. Phenotype-based methods commonly rely on comparing phenotypes observed in a patient with databases of genotype-to-phenotype associations using measures of semantic similarity. They are constrained by the quality and completeness of these resources as well as the quality and completeness of patient phenotype annotation. Genotype-to-phenotype associations used by these methods are largely derived from the literature and coded using phenotype ontologies. Large Language Models (LLMs) have been trained on large amounts of text and data and have shown their potential to answer complex questions across multiple domains. Here, we evaluate the effectiveness of LLMs in prioritizing disease-associated genes compared to existing bioinformatics methods. We show that LLMs can prioritize disease-associated genes as well as, or better than, dedicated bioinformatics methods relying on pre-defined phenotype similarity, when gene sets range from 5 to 100 candidates. We apply our approach to a cohort of undiagnosed patients with rare diseases and show that LLMs can be used to provide diagnostic support that helps in identifying plausible candidate genes. Our results show that LLMs may offer an alternative to traditional bioinformatics methods to prioritize disease-associated genes based on disease phenotypes. They may, therefore, potentially enhance diagnostic accuracy and simplify the process for rare genetic diseases.
Affiliation(s)
- Şenay Kafkas
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia.
- SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia.
- Marwa Abdelhakim
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
- KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
- Azza Althagafi
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
- Computer Science Department, College of Computers and Information Technology, Taif University, 26571, Taif, Saudi Arabia
- Sumyyah Toonsi
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
- SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
- KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
- KAUST Center of Excellence for Generative AI, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia
- Malak Alghamdi
- Medical Genetic Division, Department of Pediatrics, College of Medicine, King Saud University, 11461, Riyadh, Saudi Arabia
- Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, CB2 3EG, UK
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia.
- SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia.
- KAUST Center of Excellence for Smart Health (KCSH), King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia.
- KAUST Center of Excellence for Generative AI, King Abdullah University of Science and Technology, 23955, Thuwal, Saudi Arabia.
2
Dingemans AJM, Jansen S, van Reeuwijk J, de Leeuw N, Pfundt R, Schuurs-Hoeijmakers J, van Bon BW, Marcelis C, Ockeloen CW, Willemsen M, van der Sluijs PJ, Santen GWE, Kooy RF, Vulto-van Silfhout AT, Kleefstra T, Koolen DA, Vissers LELM, de Vries BBA. Prevalence of comorbidities in individuals with neurodevelopmental disorders from the aggregated phenomics data of 51,227 pediatric individuals. Nat Med 2024; 30:1994-2003. [PMID: 38745008 DOI: 10.1038/s41591-024-03005-7] [Received: 05/26/2023] [Accepted: 04/16/2024] [Indexed: 05/16/2024]
Abstract
The prevalence of comorbidities in individuals with neurodevelopmental disorders (NDDs) is not well understood, yet these are important for accurate diagnosis and prognosis in routine care and for characterizing the clinical spectrum of NDD syndromes. We thus developed PhenomAD-NDD, an aggregated database containing the comorbid phenotypic data of 51,227 individuals with NDD, all harmonized into Human Phenotype Ontology (HPO), with in total 3,054 unique HPO terms. We demonstrate that almost all congenital anomalies are more prevalent in the NDD population than in the general population, and the NDD baseline prevalence allows for an approximation of the enrichment of symptoms. For example, such analyses of 33 genetic NDDs show that 32% of enriched phenotypes are currently not reported in the clinical synopsis in the Online Mendelian Inheritance in Man (OMIM). PhenomAD-NDD is open to all via an online visualization tool and allows us to determine the enrichment of symptoms in NDD.
Affiliation(s)
- Alexander J M Dingemans
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Sandra Jansen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Jeroen van Reeuwijk
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Nicole de Leeuw
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Rolph Pfundt
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Janneke Schuurs-Hoeijmakers
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Bregje W van Bon
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Carlo Marcelis
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Charlotte W Ockeloen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Marjolein Willemsen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Gijs W E Santen
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, the Netherlands
- R Frank Kooy
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium
- Anneke T Vulto-van Silfhout
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Tjitske Kleefstra
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- David A Koolen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Lisenka E L M Vissers
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
- Bert B A de Vries
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands.
3
Kafkas Ş, Abdelhakim M, Uludag M, Althagafi A, Alghamdi M, Hoehndorf R. Starvar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes. BMC Bioinformatics 2023; 24:294. [PMID: 37479972 PMCID: PMC10362560 DOI: 10.1186/s12859-023-05406-w] [Received: 05/10/2022] [Accepted: 07/10/2023] [Indexed: 07/23/2023]
Abstract
BACKGROUND: Identifying variants associated with diseases is a challenging task in medical genetics research. Current studies that prioritize variants within individual genomes generally rely on known variants, evidence from literature and genomes, and patient symptoms and clinical signs. The functionalities of the existing tools, which rank variants based on given patient symptoms and clinical signs, are restricted to the coverage of ontologies such as the Human Phenotype Ontology (HPO). However, most clinicians do not limit themselves to HPO while describing patient symptoms/signs and their associated variants/genes. There is thus a need for an automated tool that can prioritize variants based on freely expressed patient symptoms and clinical signs.
RESULTS: STARVar is a Symptom-based Tool for Automatic Ranking of Variants using evidence from literature and genomes. STARVar uses patient symptoms and clinical signs, either linked to HPO or expressed in free text format. It returns a ranked list of variants based on a combined score from two classifiers utilizing evidence from genomics and literature. STARVar improves over related tools on a set of synthetic patients. In addition, we demonstrated its distinct contribution to the domain on another synthetic dataset covering publicly available clinical genotype-phenotype associations by using symptoms and clinical signs expressed in free text format.
CONCLUSIONS: STARVar stands as a unique and efficient tool that has the advantage of ranking variants with flexibly expressed patient symptoms in free-form text. Therefore, STARVar can be easily integrated into bioinformatics workflows designed to analyze disease-associated genomes.
AVAILABILITY: STARVar is freely available from https://github.com/bio-ontology-research-group/STARVar.
Affiliation(s)
- Şenay Kafkas
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
- Marwa Abdelhakim
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
- Mahmut Uludag
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
- Azza Althagafi
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
- Computer Science Department, College of Computers and Information Technology, Taif University, 21655 Taif, Saudi Arabia
- Malak Alghamdi
- Medical Genetic Division, Department of Pediatrics, College of Medicine, King Saud University, 2925 Riyadh, Saudi Arabia
- Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, 23955 Thuwal, Saudi Arabia
4
Truong TTT, Panizzutti B, Kim JH, Walder K. Repurposing Drugs via Network Analysis: Opportunities for Psychiatric Disorders. Pharmaceutics 2022; 14:1464. [PMID: 35890359 PMCID: PMC9319329 DOI: 10.3390/pharmaceutics14071464] [Received: 06/01/2022] [Revised: 06/30/2022] [Accepted: 07/12/2022] [Indexed: 02/04/2023]
Abstract
Despite advances in pharmacology and neuroscience, the path to new medications for psychiatric disorders remains largely stagnant. Drug repurposing offers a more efficient pathway compared with de novo drug discovery, with lower cost and less risk. Various computational approaches have been applied to mine the vast amount of biomedical data generated over recent decades. Among these methods, network-based drug repurposing stands out as a potent tool for the comprehension of multiple domains of knowledge considering the interactions or associations of various factors. Aligned well with the poly-pharmacology paradigm shift in drug discovery, network-based approaches offer great opportunities to discover repurposing candidates for complex psychiatric disorders. In this review, we present the potential of network-based drug repurposing in psychiatry, focusing on the incentives for using network-centric repurposing, major network-based repurposing strategies and data resources, applications in psychiatry, and challenges of network-based drug repurposing. This review aims to provide readers with an update on network-based drug repurposing in psychiatry. We expect the repurposing approach to become a pivotal tool in the coming years to battle debilitating psychiatric disorders.
Affiliation(s)
- Trang T. T. Truong
- IMPACT, The Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Geelong 3220, Australia
- Bruna Panizzutti
- IMPACT, The Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Geelong 3220, Australia
- Jee Hyun Kim
- IMPACT, The Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Geelong 3220, Australia
- Mental Health Theme, The Florey Institute of Neuroscience and Mental Health, Parkville 3010, Australia
- Ken Walder
- IMPACT, The Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Geelong 3220, Australia