1
Golnari P, Prantzalos K, Hood V, Meskis MA, Isom LL, Wilcox K, Parent JM, Lal D, Lhatoo SD, Goodkin HP, Wirrell EC, Knupp KG, Patel M, Loeb JA, Sullivan JE, Harte-Hargrove L, Fureman BE, Buchhalter J, Sahoo SS. Ontology accelerates few-shot learning capability of large language model: A study in extraction of drug efficacy in a rare pediatric epilepsy. Int J Med Inform 2025; 201:105942. [PMID: 40311258 DOI: 10.1016/j.ijmedinf.2025.105942] [Received: 10/03/2024] [Revised: 03/11/2025] [Accepted: 04/20/2025] [Indexed: 05/03/2025]
Abstract
OBJECTIVE Dravet Syndrome (DS) is a developmental and epileptic encephalopathy that is characterized by severe, prolonged motor seizures and high resistance to multiple antiseizure medications (ASMs) with multiple comorbidities. Evaluating the efficacy of new drugs in DS preclinical models and mapping them to human phenotypes of DS through analysis of published literature is an important goal for improving outcomes in this rare pediatric epilepsy.
MATERIALS AND METHODS Large language models (LLMs) have demonstrated great promise in parsing published literature; however, the performance of LLMs falls short in medical applications. In this study, we investigate the effectiveness of a domain ontology developed by human experts to optimize LLMs for medical text processing in a rare disease. Utilizing a benchmark dataset that describes the efficacy of 17 ASMs tested in preclinical models and DS patients, we define a new ontology-augmented phased in-context learning (PCL) approach to process 4935 full-text DS articles. We expand this analysis to 7 new drugs that demonstrate efficacy in reducing seizures to identify gaps in current knowledge for designing new experimental studies for drug discovery in DS.
RESULTS Few-shot or in-context learning is a foundational capability of LLMs, and the few-shot learning capability of the Gemini 1.0 Pro version LLM dramatically increases when we augment prompts with the DS epilepsy ontology. The DS epilepsy ontology is the largest epilepsy and seizure ontology in clinical use that was developed by DS basic scientists and clinical neurologists. The ontology-augmented PCL prompt achieves 100% accuracy in reproducing the benchmark drug efficacy dataset for 17 ASMs with only two examples for in-context learning.
CONCLUSION The new ontology-augmented PCL approach significantly accelerates the few-shot learning capabilities of the Gemini LLM, thereby reducing the number of required examples and time needed to optimize LLMs for medical applications.
Affiliation(s)
- Pedram Golnari
  - Department of Population and Quantitative Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Katrina Prantzalos
  - Department of Population and Quantitative Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Veronica Hood
  - Dravet Syndrome Foundation, Inc, PO Box 3026, Cherry Hill, NJ 08034, USA
- Mary Anne Meskis
  - Dravet Syndrome Foundation, Inc, PO Box 3026, Cherry Hill, NJ 08034, USA
- Lori L Isom
  - Department of Pharmacology, University of Michigan Medical School, Ann Arbor, MI 48109-5632, USA
- Karen Wilcox
  - Department of Pharmacology and Toxicology, University of Utah, Salt Lake City, UT 84112, USA
- Jack M Parent
  - Department of Neurology and Michigan Neuroscience Institute, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Dennis Lal
  - Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Center for Neurogenetics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and M.I.T., Cambridge, MA 02142, USA
- Samden D Lhatoo
  - Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Howard P Goodkin
  - Department of Neurology and Pediatrics, University of Virginia, Charlottesville VA 22903, USA
- Elaine C Wirrell
  - Divisions of Child & Adolescent Neurology and Epilepsy, Department of Neurology, Mayo Clinic, Rochester, MN 55905, USA
- Kelly G Knupp
  - Department of Pediatrics and Neurology, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Manisha Patel
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus School of Medicine, Aurora, CO 80045, USA
- Jeffrey A Loeb
  - Department of Neurology and Rehabilitation, University of Illinois Chicago, Chicago, IL 60612, USA
- Joseph E Sullivan
  - Departments of Neurology and Pediatrics, Benioff Children's Hospital, University of California, San Francisco, San Francisco, CA 94158, USA
- Brandy E Fureman
  - Epilepsy Foundation, 3540 Crain Highway, Suite 675, Bowie, MD 20716, USA
- Jeffrey Buchhalter
  - Department of Pediatrics, University of Calgary School of Medicine, Calgary AB T2N 4N1, Canada
- Satya S Sahoo
  - Department of Population and Quantitative Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
2
Shin J, Wu J, Kim HJ, Xi W. Neighborhood-level social determinants of suicidality in youth with schizophrenia: An EHR-based study. Schizophr Res 2025; 281:74-81. [PMID: 40318312 DOI: 10.1016/j.schres.2025.04.035] [Received: 02/02/2025] [Revised: 04/18/2025] [Accepted: 04/29/2025] [Indexed: 05/07/2025]
Abstract
BACKGROUND Suicidal thoughts and behaviors (STB) among youth with schizophrenia represent a significant public health concern. It is well-established that neighborhood-level social determinants of health (SDoHs) can impact health outcomes in individuals with schizophrenia. We aimed to investigate the effects of neighborhood-level social determinants on developing future STB in youth with schizophrenia.
METHODS We conducted a retrospective cohort study using electronic health records from the INSIGHT Clinical Research Network, which contains >22 million unique patients across five healthcare systems in New York City. Patients' neighborhood-level SDoHs were measured at their residential ZIP Code Tabulation Area using a composite measure, Social Deprivation Index (SDI), as well as specific components derived from the American Community Survey. Survival analysis was used to study the association between neighborhood-level SDoHs and time to STB since the first schizophrenia diagnosis.
RESULTS Between 10/1/2015 and 10/1/2022, we identified 1209 youth aged between 10 and 25 years with a schizophrenia diagnosis and no prior STB, among whom 176 developed STB during follow-up. SDI quintiles were not associated with the risk of future STB, whereas two specific neighborhood characteristics, Gini index and percentage of residents commuting by car/truck/van, were associated with a decreased risk of STB, after controlling for patients' demographic characteristics.
CONCLUSIONS Although the overall neighborhood deprivation level was not associated with the risk of STB among youth with schizophrenia, specific neighborhood characteristics were. These findings underscore the need for more targeted community-based suicide prevention strategies. Further research is essential to better understand the underlying mechanism of these associations.
Affiliation(s)
- Jeonghyun Shin
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jialin Wu
  - Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Hyun Jung Kim
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA; Division of Psychotic Disorders, McLean Hospital, Belmont, MA, USA
- Wenna Xi
  - Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
3
Faviez C, Chen X, Garcelon N, Zaidan M, Billot K, Petzold F, Faour H, Douillet M, Rozet JM, Cormier-Daire V, Attié-Bitach T, Lyonnet S, Saunier S, Burgun A. Objectivizing issues in the diagnosis of complex rare diseases: lessons learned from testing existing diagnosis support systems on ciliopathies. BMC Med Inform Decis Mak 2024; 24:134. [PMID: 38789985 PMCID: PMC11127295 DOI: 10.1186/s12911-024-02538-8] [Received: 01/30/2024] [Accepted: 05/17/2024] [Indexed: 05/26/2024] [Open access]
Abstract
BACKGROUND There are approximately 8,000 different rare diseases that affect roughly 400 million people worldwide. Many of them suffer from delayed diagnosis. Ciliopathies are rare monogenic disorders characterized by a significant phenotypic and genetic heterogeneity that raises an important challenge for clinical diagnosis. Diagnosis support systems (DSS) applied to electronic health record (EHR) data may help identify undiagnosed patients, which is of paramount importance to improve patients' care. Our objective was to evaluate three online-accessible rare disease DSSs using phenotypes derived from EHRs for the diagnosis of ciliopathies.
METHODS Two datasets of ciliopathy cases, either proven or suspected, and two datasets of controls were used to evaluate the DSSs. Patient phenotypes were automatically extracted from their EHRs and converted to Human Phenotype Ontology terms. We tested the ability of the DSSs to diagnose cases in contrast to controls based on Orphanet ontology.
RESULTS A total of 79 cases and 38 controls were selected. Performances of the DSSs on ciliopathy real-world data (best DSS with area under the ROC curve = 0.72) were not as good as published performances on the test set used in the DSS development phase. None of these systems obtained results which could be described as "expert-level". Patients with multisystemic symptoms were generally easier to diagnose than patients with isolated symptoms. Diseases easily confused with ciliopathy generally affected multiple organs and had overlapping phenotypes. Four challenges need to be considered to improve the performances: to make the DSSs interoperable with EHR systems, to validate the performances in real-life settings, to deal with data quality, and to leverage methods and resources for rare and complex diseases.
CONCLUSION Our study provides insights into the complexities of diagnosing highly heterogeneous rare diseases and offers lessons derived from evaluating existing DSSs in real-world settings. These insights are not only beneficial for ciliopathy diagnosis but also hold relevance for the enhancement of DSSs for various complex rare disorders, by guiding the development of more clinically relevant rare disease DSSs that could support early diagnosis and finally make more patients eligible for treatment.
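NOTE The area under the ROC curve quoted in the abstract above can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The following is a minimal Python sketch of that pairwise definition. The function name `pairwise_auc` and the scores are hypothetical, chosen only for illustration; they are not data or code from the study.

```python
def pairwise_auc(case_scores, control_scores):
    """AUC as the share of (case, control) pairs ranked correctly.

    A pair counts 1 when the case outscores the control and 0.5 on a tie.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical DSS scores, for illustration only (not from the study).
cases = [0.9, 0.7, 0.4]
controls = [0.8, 0.3, 0.2]
example_auc = pairwise_auc(cases, controls)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.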
Affiliation(s)
- Carole Faviez
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Université Paris Cité, Paris, France
- Xiaoyi Chen
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Nicolas Garcelon
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Mohamad Zaidan
  - Service de Néphrologie, Dialyse et Transplantation, Hôpital Universitaire Bicêtre, Assistance Publique-Hôpitaux de Paris (AP-HP), Kremlin Bicêtre, F-94270, France
- Katy Billot
  - Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Friederike Petzold
  - Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
  - Division of Nephrology, University of Leipzig Medical Center, Leipzig, Germany
- Hassan Faour
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Maxime Douillet
  - Data Science Platform, Université Paris Cité, Imagine Institute, INSERM UMR 1163, Paris, F-75015, France
- Jean-Michel Rozet
  - Laboratory of Genetics in Ophthalmology, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Valérie Cormier-Daire
  - Reference Centre for Constitutional Bone Diseases, laboratory of Osteochondrodysplasia, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
  - Service de médecine génomique des maladies rares, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
- Tania Attié-Bitach
  - Service d'Histologie-Embryologie-Cytogénétique, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
- Stanislas Lyonnet
  - Service de médecine génomique des maladies rares, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
  - Laboratory of Embryology and Genetics of Congenital Malformations, INSERM UMR 1163, Imagine Institute, Paris Cité, Paris, F-75015, France
- Sophie Saunier
  - Laboratory of Renal Hereditary Diseases, Imagine Institute, INSERM UMR 1163, Université Paris Cité, Paris, F-75015, France
- Anita Burgun
  - Centre de Recherche des Cordeliers, Sorbonne Université, INSERM, Université Paris Cité, Paris, F-75006, France
  - HeKA, Inria Paris, Paris, F-75012, France
  - Department of Medical Informatics, Hôpital Necker-Enfants Malades, AP-HP, Paris, F-75015, France
4
Faviez C, Vincent M, Garcelon N, Boyer O, Knebelmann B, Heidet L, Saunier S, Chen X, Burgun A. Performance and clinical utility of a new supervised machine-learning pipeline in detecting rare ciliopathy patients based on deep phenotyping from electronic health records and semantic similarity. Orphanet J Rare Dis 2024; 19:55. [PMID: 38336713 PMCID: PMC10858490 DOI: 10.1186/s13023-024-03063-7] [Received: 08/29/2023] [Accepted: 02/03/2024] [Indexed: 02/12/2024] [Open access]
Abstract
BACKGROUND Rare diseases affect approximately 400 million people worldwide. Many of them suffer from delayed diagnosis. Among them, NPHP1-related renal ciliopathies need to be diagnosed as early as possible as potential treatments have been recently investigated with promising results. Our objective was to develop a supervised machine learning pipeline for the detection of NPHP1 ciliopathy patients from a large number of nephrology patients using electronic health records (EHRs).
METHODS AND RESULTS We designed a pipeline combining a phenotyping module re-using unstructured EHR data, a semantic similarity module to address the phenotype dependence, a feature selection step to deal with high dimensionality, an undersampling step to address the class imbalance, and a classification step with multiple train-test splits for the small number of rare cases. The pipeline was applied to thirty NPHP1 patients and 7231 controls and achieved good performance (sensitivity 86% with specificity 90%). A qualitative review of the EHRs of 40 misclassified controls showed that 25% had phenotypes belonging to the ciliopathy spectrum, which demonstrates the ability of our system to detect patients with similar conditions.
CONCLUSIONS Our pipeline reached very encouraging performance scores for pre-diagnosing ciliopathy patients. The identified patients could then undergo genetic testing. The same data-driven approach can be adapted to other rare diseases facing underdiagnosis challenges.
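NOTE The sensitivity and specificity quoted in the abstract above can be turned into expected counts. The following minimal Python sketch does that arithmetic under one loudly stated assumption: that the reported rates apply unchanged to the full cohort of 30 cases and 7231 controls, which the abstract does not confirm. The function name `expected_confusion` is hypothetical.

```python
def expected_confusion(n_cases, n_controls, sensitivity, specificity):
    """Expected confusion-matrix counts implied by sensitivity and specificity."""
    tp = n_cases * sensitivity       # cases correctly flagged
    fn = n_cases - tp                # cases missed
    tn = n_controls * specificity    # controls correctly cleared
    fp = n_controls - tn             # controls flagged in error
    ppv = tp / (tp + fp)             # share of flagged patients who are cases
    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp, "ppv": ppv}


# Cohort sizes and rates as reported in the abstract; applying the rates to
# the whole cohort is an assumption made here for illustration.
counts = expected_confusion(30, 7231, 0.86, 0.90)
```

Under that assumption, roughly 26 flagged cases would sit among about 723 flagged controls. This is consistent with the abstract's framing of the pipeline as a pre-diagnosis step followed by genetic testing, and with the authors' qualitative review of misclassified controls.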
Affiliation(s)
- Carole Faviez
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
  - Inria, 75012, Paris, France
- Marc Vincent
  - Université Paris Cité, Imagine Institute, Data Science Platform, INSERM UMR 1163, 75015, Paris, France
- Nicolas Garcelon
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
  - Inria, 75012, Paris, France
  - Université Paris Cité, Imagine Institute, Data Science Platform, INSERM UMR 1163, 75015, Paris, France
- Olivia Boyer
  - Department of Pediatric Nephrology, APHP-Centre, Reference Center for Inherited Renal Diseases (MARHEA), Imagine Institute, Hôpital Necker-Enfants Malades, Université Paris Cité, 75015, Paris, France
  - Laboratory of Renal Hereditary Diseases, INSERM UMR 1163, Imagine Institute, Université Paris Cité, 75015, Paris, France
- Bertrand Knebelmann
  - Nephrology and Transplantation Department, MARHEA, Hôpital Necker-Enfants Malades, AP-HP, Université Paris Cité, 75015, Paris, France
- Laurence Heidet
  - Department of Pediatric Nephrology, APHP-Centre, Reference Center for Inherited Renal Diseases (MARHEA), Imagine Institute, Hôpital Necker-Enfants Malades, Université Paris Cité, 75015, Paris, France
- Sophie Saunier
  - Laboratory of Renal Hereditary Diseases, INSERM UMR 1163, Imagine Institute, Université Paris Cité, 75015, Paris, France
- Xiaoyi Chen
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
  - Inria, 75012, Paris, France
  - Université Paris Cité, Imagine Institute, Data Science Platform, INSERM UMR 1163, 75015, Paris, France
- Anita Burgun
  - Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, INSERM UMR 1138, 75006, Paris, France
  - Inria, 75012, Paris, France
  - Département d'informatique Médicale, Hôpital Necker-Enfants Malades, AP-HP, 75015, Paris, France