1
Abeysinghe R, Tao S, Lhatoo SD, Zhang GQ, Cui L. Leveraging pretrained language models for seizure frequency extraction from epilepsy evaluation reports. NPJ Digit Med 2025; 8:208. [PMID: 40229513 PMCID: PMC11997153 DOI: 10.1038/s41746-025-01592-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Accepted: 03/28/2025] [Indexed: 04/16/2025] [Open Access]
Abstract
Seizure frequency is essential for evaluating epilepsy treatment, ensuring patient safety, and reducing the risk of sudden unexpected death in epilepsy (SUDEP). Because this information is often recorded in clinical narratives, this study presents an approach to extracting structured seizure frequency details from such unstructured text. We investigated two tasks: (1) extracting phrases that describe seizure frequency, and (2) extracting seizure frequency attributes. For both tasks, we fine-tuned three BERT-based models (bert-large-cased, biobert-large-cased, and Bio_ClinicalBERT) and three generative large language models (GPT-4, GPT-3.5 Turbo, and Llama-2-70b-hf). The final structured output integrated the results of both tasks. GPT-4 achieved the best performance across all tasks, with precision, recall, and F1-score of 86.61%, 85.04%, and 85.79%, respectively, for frequency phrase extraction; 90.23%, 93.51%, and 91.84% for seizure frequency attribute extraction; and 86.64%, 85.06%, and 85.82% for the final structured output. These findings highlight the potential of fine-tuned generative models for extractive tasks on limited text strings.
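The abstract reports precision, recall, and F1-score for phrase extraction. As a rough illustration of how such scores can be computed, the sketch below micro-averages exact-match counts over reports. The function name `micro_prf1`, the exact-match criterion, and the example phrases are assumptions for illustration only, not the evaluation protocol used in the study.

```python
from collections import Counter


def micro_prf1(gold: list[list[str]], pred: list[list[str]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 for extracted phrases (hypothetical helper).

    gold[i] and pred[i] hold the annotated and extracted phrases for report i.
    A prediction counts as correct only if it matches an annotated phrase
    exactly (an assumption; overlap-based matching would change the counts).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same reports")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_counts, p_counts = Counter(g), Counter(p)
        matched = sum((g_counts & p_counts).values())  # multiset intersection
        tp += matched
        fp += sum(p_counts.values()) - matched  # extracted but not annotated
        fn += sum(g_counts.values()) - matched  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical annotations for two reports.
gold = [["2 seizures per month"], ["seizure free for 6 months", "one seizure last week"]]
pred = [["2 seizures per month"], ["one seizure last week", "daily headaches"]]
print(micro_prf1(gold, pred))  # tp=2, fp=1, fn=1
```

Micro-averaging pools counts across reports before dividing, so reports with many phrases weigh more; a macro-averaged variant would instead average per-report scores.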
Affiliation(s)
- Rashmie Abeysinghe
- Department of Neurology, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Shiqiang Tao
- Department of Neurology, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Samden D Lhatoo
- Department of Neurology, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Guo-Qiang Zhang
- Department of Neurology, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, USA
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Licong Cui
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, USA
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
2
Vakilna YS, Li X, Hampson JS, Huang Y, Mosher JC, Dabaghian Y, Luo X, Talavera B, Pati S, Masel T, Hays R, Szabo CA, Zhang GQ, Lhatoo SD. Reliable detection of generalized convulsive seizures using an off-the-shelf digital watch: A multisite phase 2 study. Epilepsia 2024; 65:2054-2068. [PMID: 38738972 PMCID: PMC11251850 DOI: 10.1111/epi.17974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 03/19/2024] [Accepted: 03/25/2024] [Indexed: 05/14/2024]
Abstract
OBJECTIVE The aim of this study was to develop a machine learning algorithm using an off-the-shelf digital watch, the Samsung watch (SM-R800), and evaluate its effectiveness for the detection of generalized convulsive seizures (GCS) in persons with epilepsy. METHODS This multisite epilepsy monitoring unit (EMU) phase 2 study included 36 adult patients. Each patient wore a Samsung watch that contained accelerometer, gyroscope, and photoplethysmographic sensors. Sixty-eight time and frequency domain features were extracted from the sensor data and were used to train a random forest algorithm. A testing framework was developed that would better reflect the EMU setting, consisting of (1) leave-one-patient-out cross-validation (LOPO CV) on GCS patients, (2) false alarm rate (FAR) testing on nonseizure patients, and (3) "fixed-and-frozen" prospective testing on a prospective patient cohort. Balanced accuracy, precision, sensitivity, and FAR were used to quantify the performance of the algorithm. Seizure onsets and offsets were determined by using video-electroencephalographic (EEG) monitoring. Feature importance was calculated as the mean decrease in Gini impurity during the LOPO CV testing. RESULTS LOPO CV results showed balanced accuracy of .93 (95% confidence interval [CI] = .8-.98), precision of .68 (95% CI = .46-.85), sensitivity of .87 (95% CI = .62-.96), and FAR of .21/24 h (interquartile range [IQR] = 0-.90). Testing the algorithm on patients without seizure resulted in an FAR of .28/24 h (IQR = 0-.61). During the "fixed-and-frozen" prospective testing, two patients had three GCS, which were detected by the algorithm, while generating an FAR of .25/24 h (IQR = 0-.89). Feature importance showed that heart rate-based features outperformed accelerometer/gyroscope-based features. 
SIGNIFICANCE Commercially available wearable digital watches that reliably detect GCS with minimal false alarm rates may overcome the adoption barriers and other limitations of custom-built devices. Contingent on the outcomes of a prospective phase 3 study, such devices have the potential to provide non-EEG-based seizure surveillance and forecasting in the clinical setting.
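The testing framework above combines leave-one-patient-out cross-validation, balanced accuracy, and a false alarm rate expressed per 24 h. The sketch below illustrates those three ingredients in plain Python under stated assumptions: samples (for example, feature windows) are tagged with a patient ID, balanced accuracy is taken as the mean of sensitivity and specificity, and FAR is a false-alarm count scaled to 24 h of recording. The helper names are hypothetical; this is not the study's pipeline, and scikit-learn's `LeaveOneGroupOut` provides an equivalent splitter.

```python
from collections.abc import Iterator, Sequence


def leave_one_patient_out(
    patient_ids: Sequence[str],
) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held-out patient, train indices, test indices), one fold per patient.

    Every sample from the held-out patient goes to the test set, so no
    patient contributes to both training and testing within a fold.
    """
    for held_out in sorted(set(patient_ids)):
        train = [i for i, pid in enumerate(patient_ids) if pid != held_out]
        test = [i for i, pid in enumerate(patient_ids) if pid == held_out]
        yield held_out, train, test


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (tp / (tp + fn)) and specificity (tn / (tn + fp))."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def false_alarms_per_24h(false_alarms: int, recording_hours: float) -> float:
    """Scale a false-alarm count to alarms per 24 h of recording."""
    if recording_hours <= 0:
        raise ValueError("recording_hours must be positive")
    return false_alarms * 24.0 / recording_hours


# Hypothetical data: five feature windows from three patients.
window_patients = ["p1", "p1", "p2", "p3", "p3"]
for patient, train_idx, test_idx in leave_one_patient_out(window_patients):
    print(patient, train_idx, test_idx)
print(false_alarms_per_24h(1, 96.0))  # 0.25
print(balanced_accuracy(tp=8, fn=2, tn=90, fp=10))
```

Splitting by patient rather than by window keeps highly correlated windows from the same person out of both sides of a fold, which is what makes the cross-validated estimate closer to performance on an unseen patient.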
Affiliation(s)
- Yash Shashank Vakilna
- The University of Texas Health Science Center at Houston, Department of Neurology, Houston, TX, USA
- Texas Institute of Restorative Neurotechnologies (TIRN), The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA
- Xiaojin Li
- The University of Texas Health Science Center at Houston, Department of Neurology, Houston, TX, USA
- Texas Institute of Restorative Neurotechnologies (TIRN), The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA
- Jaison S. Hampson
- The University of Texas Health Science Center at Houston, Department of Neurology, Houston, TX, USA
- Texas Institute of Restorative Neurotechnologies (TIRN), The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA
- Yan Huang
- The University of Texas Health Science Center at Houston, Department of Neurology, Houston, TX, USA
- Texas Institute of Restorative Neurotechnologies (TIRN), The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA
- John C. Mosher
- The University of Texas Health Science Center at Houston, Department of Neurology, Houston, TX, USA
- Texas Institute of Restorative Neurotechnologies (TIRN), The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA
- Yuri Dabaghian
- The University of Texas Health Science Center at Houston, Department of Neurology, Houston, TX, USA
- Texas Institute of Restorative Neurotechnologies (TIRN), The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA
- Xi Luo
- The University of Texas Health Science Center at Houston, Department of Biostatistics and Data Science, Houston, Texas, USA
- Blanca Talavera
- The University of Texas Health Science Center at Houston, Department of Neurology, Houston, TX, USA
- Texas Institute of Restorative Neurotechnologies (TIRN), The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA
- Sandipan Pati
- The University of Texas Health Science Center at Houston, Department of Neurology, Houston, TX, USA
- Texas Institute of Restorative Neurotechnologies (TIRN), The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA
- Todd Masel
- The University of Texas Medical Branch, Department of Neurology, Galveston, Texas, USA
- Ryan Hays
- The University of Texas Southwestern Medical Center, Department of Neurology, Dallas, Texas, USA
- Charles Akos Szabo
- The University of Texas Health Science Center at San Antonio, Department of Neurology, Texas, USA
- Guo-Qiang Zhang
- The University of Texas Health Science Center at Houston, Department of Neurology, Houston, TX, USA
- Texas Institute of Restorative Neurotechnologies (TIRN), The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA
- Samden D. Lhatoo
- The University of Texas Health Science Center at Houston, Department of Neurology, Houston, TX, USA
- Texas Institute of Restorative Neurotechnologies (TIRN), The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, USA
3
Zhang GQ, Li X, Huang Y, Cui L. Temporal Cohort Logic. AMIA Annu Symp Proc 2023; 2022:1237-1246. [PMID: 37128360 PMCID: PMC10148298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
We introduce a new logic, called Temporal Cohort Logic (TCL), for cohort specification and discovery in clinical and population health research. TCL is designed to fill a conceptual gap in formalizing temporal reasoning in biomedicine, playing a role similar to the one temporal logics play in computer science and its applications. We provide formal syntax and semantics for TCL and illustrate its logical constructs with examples related to human health. Relationships to, and distinctions from, existing temporal logical frameworks are discussed. Applications to electronic health records (EHRs) and to a neurophysiological data resource are provided. Our approach differs from existing temporal logics in that we explicitly capture Allen's interval algebra as modal operators in the language of temporal logic (rather than addressing it in the semantic structure). This has two major implications. First, it provides a formal logical framework for reasoning about time in biomedicine, allowing general (i.e., higher-level, more abstract) investigation of the properties of this approach (such as proof systems, completeness, expressiveness, and decidability) independent of any specific query language or database system. Second, it places our approach in the context of logical developments in computer science, allowing existing results to be translated into the setting of TCL and its variants or subsystems, which may illuminate the opportunities and computational challenges involved in temporal reasoning for biomedicine.
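For readers unfamiliar with Allen's interval algebra, the sketch below classifies two concrete time intervals into one of its thirteen basic relations. It does not implement TCL, which treats these relations as modal operators in its syntax; the `Interval` type, the `allen_relation` function, and the EMU example are hypothetical illustrations.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: float
    end: float


def allen_relation(a: Interval, b: Interval) -> str:
    """Return which of Allen's 13 basic relations holds from a to b."""
    if not (a.start < a.end and b.start < b.end):
        raise ValueError("intervals must satisfy start < end")
    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on, the intervals share some interior point.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Remaining case: partial overlap with distinct endpoints.
    return "overlaps" if a.start < b.start else "overlapped-by"


admission = Interval(0, 72)  # hypothetical EMU stay, in hours
seizure = Interval(30.0, 30.05)
print(allen_relation(seizure, admission))  # during
```

The thirteen relations are jointly exhaustive and pairwise disjoint, so exactly one label is returned for any pair of valid intervals; a cohort query such as "seizures during admission" then reduces to filtering pairs by that label.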
Affiliation(s)
- Guo-Qiang Zhang
- McGovern Medical School
- School of Biomedical Informatics
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Xiaojin Li
- McGovern Medical School
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Yan Huang
- McGovern Medical School
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
- Licong Cui
- School of Biomedical Informatics
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, Texas 77030, USA
4
Li X, Huang Y, Lhatoo SD, Tao S, Vilella Bertran L, Zhang GQ, Cui L. A hybrid unsupervised and supervised learning approach for postictal generalized EEG suppression detection. Front Neuroinform 2022; 16:1040084. [PMID: 36601382 PMCID: PMC9806125 DOI: 10.3389/fninf.2022.1040084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 11/07/2022] [Indexed: 12/23/2022] [Open Access]
Abstract
Sudden unexpected death in epilepsy (SUDEP) is a catastrophic and fatal complication of epilepsy and the primary cause of mortality in people with uncontrolled seizures. Although several multifactorial processes have been implicated, including cardiac, respiratory, and autonomic dysfunction leading to arrhythmia, hypoxia, and cessation of cerebral and brainstem function, the mechanisms underlying SUDEP are not completely understood. Postictal generalized electroencephalogram (EEG) suppression (PGES) is a potential risk marker for SUDEP, as studies have shown that prolonged PGES is significantly associated with a higher risk of SUDEP. Automated PGES detection techniques have been developed to obtain PGES durations efficiently for SUDEP risk assessment. However, real-world data recorded in epilepsy monitoring units (EMUs) may contain high-amplitude signals caused by physiological artifacts, such as breathing, muscle, and movement artifacts, making it difficult to determine the end of PGES. In this paper, we present a hybrid approach that combines the benefits of unsupervised and supervised learning for PGES detection using multi-channel EEG recordings. A K-means clustering model is used to group EEG recordings with similar artifact features. We introduce a new learning strategy for training a set of random forest (RF) models based on the clustering results to improve PGES detection performance. Using leave-one-out (LOO) cross-validation on 286 EEG recordings, our approach achieved a 5-second tolerance-based detection accuracy of 64.92%, a 10-second tolerance-based detection accuracy of 79.85%, and an average predicted time distance of 8.26 seconds. These results demonstrate that our hybrid approach outperformed other existing approaches.
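The evaluation above reports tolerance-based detection accuracy and an average predicted time distance. Assuming one annotated and one predicted PGES end time (in seconds) per recording, and counting a prediction as correct when it lies within plus or minus the tolerance of the annotation, these metrics can be sketched as follows. The function name and example times are hypothetical, the exact definitions used in the study may differ, and the sketch does not reproduce the K-means plus random forest detector itself.

```python
def pges_end_metrics(
    true_ends: list[float],
    predicted_ends: list[float],
    tolerances: tuple[float, ...] = (5.0, 10.0),
) -> tuple[dict[float, float], float]:
    """Tolerance-based accuracy and mean absolute time distance for PGES end times.

    Assumes one annotated and one predicted end time (seconds) per recording;
    a prediction is correct when it falls within +/- tolerance of the annotation.
    """
    if not true_ends or len(true_ends) != len(predicted_ends):
        raise ValueError("need one prediction per annotated recording")
    distances = [abs(p - t) for t, p in zip(true_ends, predicted_ends)]
    # Fraction of recordings whose prediction lands inside each tolerance window.
    accuracy = {tol: sum(d <= tol for d in distances) / len(distances) for tol in tolerances}
    mean_distance = sum(distances) / len(distances)
    return accuracy, mean_distance


# Hypothetical annotated vs. predicted PGES end times for four recordings.
acc, mean_dist = pges_end_metrics([10, 20, 30, 40], [12, 27, 30, 55])
print(acc, mean_dist)  # {5.0: 0.5, 10.0: 0.75} 6.0
```

Reporting accuracy at two tolerances alongside the mean distance separates "how often the end is nearly right" from "how far off it is on average", since a few large misses can inflate the mean while leaving the tolerance accuracy unchanged.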
Affiliation(s)
- Xiaojin Li
- Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Yan Huang
- Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Samden D. Lhatoo
- Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Shiqiang Tao
- Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Laura Vilella Bertran
- Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Guo-Qiang Zhang
- Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, United States
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
- *Correspondence: Guo-Qiang Zhang
- Licong Cui
- Texas Institute for Restorative Neurotechnologies, The University of Texas Health Science Center at Houston, Houston, TX, United States
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States
5
Sahoo SS, Kobow K, Zhang J, Buchhalter J, Dayyani M, Upadhyaya DP, Prantzalos K, Bhattacharjee M, Blumcke I, Wiebe S, Lhatoo SD. Ontology-based feature engineering in machine learning workflows for heterogeneous epilepsy patient records. Sci Rep 2022; 12:19430. [PMID: 36371527 PMCID: PMC9653502 DOI: 10.1038/s41598-022-23101-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/22/2022] [Accepted: 10/25/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Biomedical ontologies are widely used to harmonize heterogeneous data and integrate large volumes of clinical data from multiple sources. This study analyzed the utility of ontologies beyond their traditional roles, namely in addressing feature engineering in machine learning workflows, a challenging and currently underserved field. Machine learning workflows are increasingly used to analyze medical records containing heterogeneous phenotypic, genotypic, and related medical terms to improve patient care. We performed a retrospective study using neuropathology reports from the German Neuropathology Reference Center for Epilepsy Surgery in Erlangen, Germany. This cohort included 312 patients who underwent epilepsy surgery and were labeled with one or more diagnoses, including dual pathology, hippocampal sclerosis, malformation of cortical dysplasia, tumor, encephalitis, and gliosis. We modeled the diagnosis terms together with their microscopy, immunohistochemistry, anatomy, etiology, and imaging findings using the description logic-based Web Ontology Language (OWL) in the Epilepsy and Seizure Ontology (EpSO). Three tree-based machine learning models were used to classify the neuropathology reports into one or more diagnosis classes with and without ontology-based feature engineering. To avoid overfitting, we used five-fold cross-validation with a fixed number of repetitions, leaving out one subset of data for testing, and we used recall, balanced accuracy, and Hamming loss as performance metrics for the multi-label classification task. The epilepsy ontology-based feature engineering approach improved the performance of all three learning models, with improvements of 35.7%, 54.5%, and 33.3% for the logistic regression, random forest, and gradient tree boosting models, respectively.
The run-time performance of all three models improved significantly with ontology-based feature engineering, with the gradient tree boosting model showing a 93.8% reduction in the time required for training and testing. Although all three models showed improved overall performance across the three performance metrics with ontology-based feature engineering, the rate of improvement was not consistent across all input features. To analyze this variation in performance, we computed feature importance scores and found that microscopy had the highest importance score across the three models, followed by imaging, immunohistochemistry, and anatomy, in decreasing order. This study showed that ontologies have an important role in feature engineering, making heterogeneous clinical data accessible to machine learning models and improving the performance of those models in multi-label, multiclass classification tasks.
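Two of the metrics named above, Hamming loss and recall, behave differently from their single-label counterparts when each report can carry several diagnoses. The sketch below computes Hamming loss as the fraction of wrong (report, label) decisions and recall as a micro-averaged share of true labels recovered. The helper names and example data are hypothetical (the label names are taken from the diagnoses listed in the abstract), and the study may have averaged recall differently; scikit-learn's `hamming_loss` computes the same quantity from binary indicator matrices.

```python
def hamming_loss(y_true: list[set[str]], y_pred: list[set[str]], labels: list[str]) -> float:
    """Fraction of (report, label) decisions that are wrong."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need one prediction per report")
    # Symmetric difference = labels predicted but absent, plus labels missed.
    mismatches = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return mismatches / (len(y_true) * len(labels))


def micro_recall(y_true: list[set[str]], y_pred: list[set[str]]) -> float:
    """Share of all true (report, label) pairs that were predicted."""
    hits = sum(len(t & p) for t, p in zip(y_true, y_pred))
    total = sum(len(t) for t in y_true)
    return hits / total if total else 0.0


labels = ["hippocampal sclerosis", "tumor", "gliosis", "encephalitis"]
y_true = [{"hippocampal sclerosis"}, {"tumor", "gliosis"}]
y_pred = [{"hippocampal sclerosis", "gliosis"}, {"tumor"}]
print(hamming_loss(y_true, y_pred, labels))  # 2 wrong of 8 decisions -> 0.25
print(micro_recall(y_true, y_pred))  # 2 of 3 true labels recovered
```

Lower Hamming loss is better, and because its denominator grows with the number of possible labels, it can look small even when most reports have at least one mistake; pairing it with recall and balanced accuracy guards against that.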
Affiliation(s)
- Satya S Sahoo
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Katja Kobow
- Institute of Neuropathology, Erlangen, Germany
- Jianzhe Zhang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Jeffrey Buchhalter
- Department of Pediatrics, University of Calgary School of Medicine, Calgary, Canada
- Mojtaba Dayyani
- Department of Neurology, University of Texas Health Sciences Center, Texas, USA
- Dipak P Upadhyaya
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Katrina Prantzalos
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Samuel Wiebe
- Department of Pediatrics, University of Calgary School of Medicine, Calgary, Canada
- Samden D Lhatoo
- Department of Neurology, University of Texas Health Sciences Center, Texas, USA