1. Di Basilio D, King L, Lloyd S, Michael P, Shardlow M. Asking questions that are "close to the bone": integrating thematic analysis and natural language processing to explore the experiences of people with traumatic brain injuries engaging with patient-reported outcome measures. Front Digit Health 2024; 6:1387139. PMID: 38983792; PMCID: PMC11231399; DOI: 10.3389/fdgth.2024.1387139. Received 02/16/2024; accepted 05/13/2024. Open access.
Abstract
Introduction: Patient-reported outcome measures (PROMs) are valuable tools for assessing health-related quality of life and treatment effectiveness in individuals with traumatic brain injuries (TBIs). Understanding how individuals with TBIs experience completing PROMs is crucial for improving the measures' utility and relevance in clinical practice. Methods: Sixteen semi-structured interviews were conducted with individuals with TBIs. The interviews were transcribed verbatim and analysed using thematic analysis (TA) and natural language processing (NLP) techniques to identify themes and emotional connotations in participants' accounts of completing PROMs. Results: The TA revealed six key themes. Participants expressed varying levels of understanding and engagement with PROMs, with factors such as cognitive impairments and communication difficulties shaping their experiences. Insightful suggestions also emerged regarding barriers to completing PROMs, factors that facilitate completion, and ways to improve PROM content and delivery. Sentiment analyses performed with NLP techniques captured the overall sentiment and emotional "tones" of participants' narratives, which were characterised mainly by low positive sentiment; although mostly neutral, the narratives also revealed emotions such as fear and, to a lesser extent, anger. Combining semantic and sentiment analysis yielded valuable information on participants' views of, and emotional responses to, different aspects of PROMs. Discussion: The findings highlight the complexities of administering PROMs to individuals with TBIs and underscore the need for tailored approaches that accommodate their unique challenges. Integrating TA and NLP techniques can offer valuable insights into the experiences of individuals with TBIs and enhance the interpretation of qualitative data in this population.
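As a concrete (and deliberately simplified) illustration of the kind of sentiment pass described above, the sketch below scores interview text against a small polarity lexicon. The lexicon, weights, and thresholds are invented for demonstration; the study's actual NLP tooling is not specified here.

```python
# Minimal lexicon-based sentiment scoring, illustrating a sentiment pass
# over interview transcripts. The lexicon and weights are toy values for
# demonstration only, not the study's actual resources.

POSITIVE = {"helpful": 1.0, "clear": 0.5, "supported": 1.0}
NEGATIVE = {"confusing": -1.0, "fear": -1.5, "frustrating": -1.0, "angry": -1.5}
LEXICON = {**POSITIVE, **NEGATIVE}

def sentiment_score(text):
    """Mean polarity of lexicon words found in the text (0.0 if none)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def tone(text):
    """Collapse the score into the coarse tones reported in the study."""
    s = sentiment_score(text)
    if s > 0.1:
        return "positive"
    if s < -0.1:
        return "negative"
    return "neutral"

print(tone("The questions were clear and the nurse was helpful"))  # positive
print(tone("Some items felt confusing and triggered fear"))        # negative
```

Real analyses would use a validated sentiment model rather than a hand-built lexicon, but the aggregation idea (token-level polarity averaged into a narrative-level tone) is the same.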
Affiliation(s)
- Daniela Di Basilio: Division of Health Research, School of Health and Medicine, Lancaster University, Lancaster, United Kingdom
- Lorraine King: Department of Neuropsychology, North Staffordshire Combined Healthcare NHS Trust, Stoke-on-Trent, United Kingdom
- Sarah Lloyd: Department of Psychology, Manchester Metropolitan University, Manchester, United Kingdom
- Panayiotis Michael: Department of Psychology, Manchester Metropolitan University, Manchester, United Kingdom
- Matthew Shardlow: Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom
2. Fu S, Wang L, He H, Wen A, Zong N, Kumari A, Liu F, Zhou S, Zhang R, Li C, Wang Y, St Sauver J, Liu H, Sohn S. A taxonomy for advancing systematic error analysis in multi-site electronic health record-based clinical concept extraction. J Am Med Inform Assoc 2024; 31:1493-1502. PMID: 38742455; PMCID: PMC11187420; DOI: 10.1093/jamia/ocae101. Received 01/15/2024; accepted 04/19/2024. Open access.
Abstract
BACKGROUND Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, including the contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Because electronic health record (EHR) settings are highly heterogeneous across institutions, standardizing and reproducing the error analysis process is challenging. OBJECTIVES This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks. MATERIALS AND METHODS We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multi-site case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium, and is compatible with several open-source annotation tools, including MAE, Brat, and MedTator. RESULTS The resulting taxonomy comprises 43 distinct error classes organized into 6 error dimensions and 4 properties, including model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variation in error types across methodological approaches, tasks, and EHR settings. Key points from community feedback included the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies. CONCLUSION The proposed taxonomy can accelerate and standardize the error analysis process in multi-site settings, improving the provenance, interpretability, and portability of NLP models. Future research could develop automated or semi-automated methods to assist in classifying and standardizing error analysis.
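To make the taxonomy's shape concrete, here is a minimal sketch of how an error annotation with a dimension, class, and properties might be represented in code. The dimension and class names are illustrative stand-ins, not the released taxonomy's actual 43 classes or 6 dimensions.

```python
# A sketch of an error-taxonomy annotation as a data structure. The
# dimension/class names below are hypothetical examples, not entries
# from the released .dtd/.owl taxonomy.
from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorClass:
    dimension: str   # e.g. "contextual", "linguistic" (illustrative)
    name: str        # specific error class within that dimension

@dataclass
class ErrorInstance:
    error_class: ErrorClass
    model_type: str        # property: "symbolic" or "statistical"
    evaluation_level: str  # property: "patient" | "document" | "sentence" | "concept"
    snippet: str           # the text span the model got wrong

# Annotate one (invented) extraction error for a statistical model.
negation_miss = ErrorInstance(
    error_class=ErrorClass("contextual", "missed_negation"),
    model_type="statistical",
    evaluation_level="concept",
    snippet="no evidence of pneumonia",
)
print(negation_miss.error_class.dimension)  # contextual
```

Structuring annotations this way is what makes cross-site error counts comparable: every instance carries the same dimensions and properties regardless of which institution produced it.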
Affiliation(s)
- Sunyang Fu: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Liwei Wang: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Huan He: Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
- Andrew Wen: Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Nansu Zong: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
- Anamika Kumari: Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
- Feifan Liu: Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
- Sicheng Zhou: Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
- Rui Zhang: Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
- Chenyu Li: Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Yanshan Wang: Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Jennifer St Sauver: Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, United States
- Hongfang Liu: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Sunghwan Sohn: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
3. Sushil M, Zack T, Mandair D, Zheng Z, Wali A, Yu YN, Quan Y, Lituiev D, Butte AJ. A comparative study of large language model-based zero-shot inference and task-specific supervised classification of breast cancer pathology reports. J Am Med Inform Assoc 2024:ocae146. PMID: 38900207; DOI: 10.1093/jamia/ocae146. Received 02/07/2024; accepted 06/03/2024. Open access.
Abstract
OBJECTIVE Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs could reduce the need for large-scale data annotation. MATERIALS AND METHODS We curated a dataset of 769 breast cancer pathology reports, manually labeled with 12 categories, to compare the zero-shot classification capability of four LLMs (GPT-4, GPT-3.5, Starling, and ClinicalCamel) with the task-specific supervised classification performance of three models: random forests, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model. RESULTS Across all 12 tasks, GPT-4 performed either significantly better than or as well as the best supervised model, LSTM-Att (average macro F1-score of 0.86 vs 0.75), with a particular advantage on tasks with high label imbalance. The other LLMs performed poorly. Frequent GPT-4 error categories included incorrect inferences from multiple samples and from patient history, as well as complex task design; several LSTM-Att errors reflected poor generalization to the test set. DISCUSSION On tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of data labeling. However, where the use of LLMs is prohibitive, simpler models trained on large annotated datasets can provide comparable results. CONCLUSIONS GPT-4 demonstrated the potential to speed up clinical NLP studies by reducing the need for large annotated datasets, which may increase the utilization of NLP-based variables and outcomes in clinical studies.
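The headline comparison rests on the macro-averaged F1 score, which weights rare classes equally with common ones and is therefore informative on the imbalanced tasks the abstract highlights. A from-scratch sketch of that metric on toy labels (the data below are invented, not from the study):

```python
# Macro-averaged F1: compute per-class F1 and average without weighting
# by class frequency, so rare classes count as much as common ones.

def macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy example: per-class F1 is 0.8 ("pos") and 0.667 ("neg").
y_true = ["pos", "pos", "pos", "neg"]
y_pred = ["pos", "pos", "neg", "neg"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

With a micro-average, the majority class would dominate; the macro average is what exposes a model that ignores minority labels, which is the failure mode the imbalanced tasks test.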
Affiliation(s)
- Madhumita Sushil: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Travis Zack: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States
- Divneet Mandair: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States
- Zhiwei Zheng: University of California, Berkeley, Berkeley, CA 94720, United States
- Ahmed Wali: University of California, Berkeley, Berkeley, CA 94720, United States
- Yan-Ning Yu: University of California, Berkeley, Berkeley, CA 94720, United States
- Yuwei Quan: University of California, Berkeley, Berkeley, CA 94720, United States
- Dmytro Lituiev: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Atul J Butte: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States; Center for Data-driven Insights and Innovation, University of California, Office of the President, Oakland, CA 94607, United States; Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94158, United States
4. Assié G, Allassonnière S. Artificial Intelligence in Endocrinology: On Track Toward Great Opportunities. J Clin Endocrinol Metab 2024; 109:e1462-e1467. PMID: 38466742; DOI: 10.1210/clinem/dgae154. Received 11/06/2023; accepted 03/08/2024.
Abstract
In endocrinology, the types and quantity of digital data are increasing rapidly. Computing capabilities are also developing at an incredible rate, as illustrated by the recent expansion in the use of popular generative artificial intelligence (AI) applications. Numerous diagnostic and therapeutic devices using AI have already entered routine endocrine practice, and developments in this field are expected to continue to accelerate. Endocrinologists will need to be supported in managing AI applications. Beyond technological training, interdisciplinary vision is needed to encompass the ethical and legal aspects of AI, to manage the profound impact of AI on patient/provider relationships, and to maintain an optimal balance between human input and AI in endocrinology.
Affiliation(s)
- Guillaume Assié: Université Paris Cité, CNRS UMR8104, INSERM U1016, Institut Cochin, F-75014 Paris, France; Service d'endocrinologie, Center for Rare Adrenal Diseases, Assistance Publique-Hôpitaux de Paris, Hôpital Cochin, 75014 Paris, France
- Stéphanie Allassonnière: Université Paris Cité, UFR Medecine, 75006 Paris, France; HeKA INSERM, INRIA Paris, Centre de Recherche des Cordeliers Paris, Université Paris Cité, 75006 Paris, France
5. Fu S, Jia H, Vassilaki M, Keloth VK, Dang Y, Zhou Y, Garg M, Petersen RC, St Sauver J, Moon S, Wang L, Wen A, Li F, Xu H, Tao C, Fan J, Liu H, Sohn S. FedFSA: Hybrid and federated framework for functional status ascertainment across institutions. J Biomed Inform 2024; 152:104623. PMID: 38458578; PMCID: PMC11005095; DOI: 10.1016/j.jbi.2024.104623. Received 10/12/2023; accepted 03/04/2024.
Abstract
INTRODUCTION Patients' functional status assesses their independence in performing activities of daily living (ADLs), including basic ADLs (bADL) and more complex instrumental activities (iADL). Existing studies have shown that functional status is a strong predictor of health outcomes, particularly in older adults. Despite its usefulness, much functional status information is stored in electronic health records (EHRs) in semi-structured or free-text formats, indicating a pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate its curation. In this study, we introduce FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and the open-source Flower and PyTorch libraries for the federated BERT components. For gold standard data generation, we annotated a corpus to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated category- and institution-specific ADL extraction performance across different experimental designs. RESULTS ADL extraction performance ranged from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. Performance for ADL extraction with impairment ranged from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL. For category-specific ADL extraction, laundry and transferring yielded relatively high performance; dressing, medication, bathing, and continence achieved moderate-to-high performance; and food preparation and toileting showed low performance. CONCLUSION NLP performance varied across ADL categories and healthcare sites. Federated learning with the FedFSA framework outperformed non-federated learning for impaired-ADL extraction at all healthcare sites. Our study demonstrates the potential of federated learning for functional status extraction and impairment classification in EHRs, exemplifying the importance of large-scale, multi-institutional collaborative development.
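The federated component aggregates locally trained model parameters rather than sharing patient text. A minimal sketch of one federated-averaging (FedAvg) round, with plain Python lists standing in for BERT parameters; the study's actual implementation uses the Flower and PyTorch libraries, which this sketch only imitates.

```python
# One FedAvg aggregation round: each site trains locally, then the
# server averages the parameter vectors weighted by local data size.
# Lists of floats stand in for real model weight tensors.

def fedavg(site_weights, site_sizes):
    """Average per-site parameter vectors, weighted by local data size."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]

# Three sites with different amounts of local data; the site with 300
# records pulls the global model toward its parameters.
weights = [[0.25, 1.0], [0.5, 0.0], [0.75, 2.0]]
sizes = [100, 300, 100]
print(fedavg(weights, sizes))  # [0.5, 0.6]
```

The key privacy property is that only the weight vectors cross institutional boundaries; the clinical notes that produced them never leave each site.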
Affiliation(s)
- Sunyang Fu: Mayo Clinic, Rochester, MN, United States; University of Texas Health Science Center, Houston, TX, United States
- Heling Jia: Mayo Clinic, Rochester, MN, United States
- Yifang Dang: University of Texas Health Science Center, Houston, TX, United States
- Yujia Zhou: University of Texas Health Science Center, Houston, TX, United States
- Liwei Wang: Mayo Clinic, Rochester, MN, United States
- Andrew Wen: University of Texas Health Science Center, Houston, TX, United States
- Fang Li: University of Texas Health Science Center, Houston, TX, United States
- Hua Xu: Yale University, New Haven, CT, United States
- Cui Tao: University of Texas Health Science Center, Houston, TX, United States
- Hongfang Liu: Mayo Clinic, Rochester, MN, United States; University of Texas Health Science Center, Houston, TX, United States
6. Grotenhuis Z, Mosteiro PJ, Leeuwenberg AM. Modest performance of text mining to extract health outcomes may be almost sufficient for high-quality prognostic model development. Comput Biol Med 2024; 170:108014. PMID: 38301515; DOI: 10.1016/j.compbiomed.2024.108014. Received 07/26/2023; accepted 01/19/2024.
Abstract
BACKGROUND Across medicine, prognostic models are used to estimate a patient's risk of future health outcomes (e.g., cardiovascular or mortality risk). To develop (or train) prognostic models, historical patient-level training data are needed, containing both the predictive factors (i.e., features) and the relevant health outcomes (i.e., labels). When the health outcomes are not recorded in structured data, they are sometimes first extracted from textual notes using text mining techniques. Because many studies use text mining to obtain outcome data for prognostic model development, our aim was to study the impact of text mining quality on downstream prognostic model performance. METHODS We conducted a simulation study charting the relationship between text mining quality and prognostic model performance, using an illustrative case study of in-hospital mortality prediction in intensive care unit patients. We repeatedly developed and evaluated a prognostic model for in-hospital mortality, using outcome data extracted by multiple text mining models of varying quality. RESULTS Interestingly, we found that a relatively low-quality text mining model (F1 score ≈ 0.50) could already be used to train a prognostic model with quite good discrimination (area under the receiver operating characteristic curve of around 0.80). However, the calibration of the risks estimated by the prognostic model was unreliable across the majority of settings, even when text mining models were of relatively high quality (F1 ≈ 0.80). DISCUSSION Developing prognostic models on text-extracted outcomes using imperfect text mining models seems promising. However, such models are likely to produce poorly calibrated risk estimates and may require recalibration on (a possibly smaller amount of) manually extracted outcome data.
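A back-of-the-envelope illustration of why imperfect text-mined outcome labels can preserve discrimination while breaking calibration: random extraction errors shift the apparent event rate away from the true one, so risks fitted to the noisy labels are systematically off even when the ranking of patients survives. The sensitivity, specificity, and event rate below are assumed values for demonstration, not figures from the study.

```python
# Toy demonstration of the calibration problem with noisy outcome labels.
# True labels pass through an imperfect "text mining" extractor with an
# assumed sensitivity/specificity; the apparent event rate then differs
# from the true rate, which is exactly a calibration-in-the-large error.
import random

random.seed(0)

true_rate = 0.10          # assumed true in-hospital mortality rate
sens, spec = 0.70, 0.95   # assumed text-mining extractor quality

# Closed-form apparent event rate after the noisy extraction step:
apparent_rate = true_rate * sens + (1 - true_rate) * (1 - spec)
print(round(apparent_rate, 3))  # 0.115

# Simulate the same thing empirically: the noisy label rate drifts to
# ~0.115, so a model calibrated to these labels overestimates true risk.
n = 100_000
noisy_events = sum(
    (random.random() < sens) if (random.random() < true_rate)
    else (random.random() < 1 - spec)
    for _ in range(n)
)
print(abs(noisy_events / n - true_rate) > 0.005)  # True: calibration gap
```

Discrimination (ranking patients by risk) can survive this shift because the noise here is independent of the features, whereas calibration depends on the absolute event rate, which the noise has moved.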
Affiliation(s)
- Zwierd Grotenhuis: Department of Information and Computing Sciences, Utrecht University, The Netherlands; Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, The Netherlands
- Pablo J Mosteiro: Department of Information and Computing Sciences, Utrecht University, The Netherlands
- Artuur M Leeuwenberg: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, The Netherlands
7. Sushil M, Butte AJ, Schuit E, van Smeden M, Leeuwenberg AM. Cross-institution natural language processing for reliable clinical association studies: a methodological exploration. J Clin Epidemiol 2024; 167:111258. PMID: 38219811; DOI: 10.1016/j.jclinepi.2024.111258. Received 06/20/2023; accepted 01/08/2024.
Abstract
OBJECTIVES Natural language processing (NLP) of clinical notes in electronic medical records is increasingly used to extract otherwise sparsely available patient characteristics and to assess their association with relevant health outcomes. Manual data curation is resource intensive, and NLP methods make such studies more feasible. However, the methodology for using NLP reliably in clinical research is understudied. The objective of this study was to investigate how NLP models could be used to extract study variables (specifically exposures) to reliably conduct exposure-outcome association studies. STUDY DESIGN AND SETTING In a convenience sample of patients admitted to the intensive care unit of a US academic health system, multiple association studies were conducted, comparing association estimates based on NLP-extracted vs. manually extracted exposure variables. The association studies varied in NLP model architecture (Bidirectional Encoder Representations from Transformers (BERT), long short-term memory (LSTM)), training paradigm (training a new model, fine-tuning an existing external model), extracted exposures (employment status, living status, and substance use), health outcomes (having a do-not-resuscitate/intubate code, length of stay, and in-hospital mortality), missing data handling (multiple imputation vs. complete case analysis), and the application of measurement error correction (via regression calibration). RESULTS The study was conducted on 1,174 participants (median [interquartile range] age, 61 [50, 73] years; 60.6% male). Additionally, up to 500 discharge reports of participants from the same health system and 2,528 reports of participants from an external health system were used to train the NLP models. Substantial differences were found between the associations based on NLP-extracted and manually extracted exposures under all settings. The error in the association estimates was only weakly correlated with the overall F1 score of the NLP models. CONCLUSION Associations estimated using NLP-extracted exposures should be interpreted with caution. Further research is needed to establish conditions for the reliable use of NLP in medical association studies.
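The measurement error correction studied above can be illustrated with a simpler, closely related technique: the matrix method for a misclassified binary exposure, which recovers the true exposure counts from the observed (NLP-extracted) ones given the extractor's sensitivity and specificity. All numbers below are illustrative; the paper itself applies regression calibration, of which this is only a discrete-exposure cousin.

```python
# Matrix-method correction for a misclassified binary exposure. Given
# observed counts and an extractor's sensitivity/specificity, solve
#   obs_exposed = sens * true_e + (1 - spec) * (total - true_e)
# for the true number of exposed patients, true_e.

def correct_counts(n_exposed_obs, n_unexposed_obs, sens, spec):
    """Return the implied true (exposed, unexposed) counts."""
    total = n_exposed_obs + n_unexposed_obs
    true_e = (n_exposed_obs - (1 - spec) * total) / (sens + spec - 1)
    return true_e, total - true_e

# Hypothetical: an NLP extractor with 80% sensitivity and 95% specificity
# flags 230 of 1000 patients as "exposed"; the corrected count follows.
true_e, true_u = correct_counts(230, 770, sens=0.80, spec=0.95)
print(round(true_e))  # 240
```

The correction blows up as sensitivity + specificity approaches 1 (an uninformative extractor), which echoes the paper's finding that overall accuracy metrics alone do not guarantee reliable downstream association estimates.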
Affiliation(s)
- Madhumita Sushil: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, USA
- Atul J Butte: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, USA
- Ewoud Schuit: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Maarten van Smeden: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Artuur M Leeuwenberg: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
8. Sushil M, Zack T, Mandair D, Zheng Z, Wali A, Yu YN, Quan Y, Butte AJ. A comparative study of zero-shot inference with large language models and supervised modeling in breast cancer pathology classification. Research Square (preprint) 2024: rs.3.rs-3914899. PMID: 38405831; PMCID: PMC10889046; DOI: 10.21203/rs.3.rs-3914899/v1. Open access.
Abstract
Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs can reduce the need for large-scale data annotation. We curated a dataset of 769 breast cancer pathology reports, manually labeled with 13 categories, to compare the zero-shot classification capability of the GPT-4 and GPT-3.5 models with the supervised classification performance of three model architectures: a random forests classifier, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model. Across all 13 tasks, GPT-4 performed either significantly better than or as well as the best supervised model, LSTM-Att (average macro F1 score of 0.83 vs. 0.75); on tasks with a high imbalance between labels, the differences were more pronounced. Frequent sources of GPT-4 errors included inferences from multiple samples and complex task design. On complex tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of large-scale data labeling. However, where the use of LLMs is prohibitive, simpler supervised models trained on large annotated datasets can provide comparable results. LLMs demonstrated the potential to speed up clinical NLP studies by reducing the need to curate large annotated datasets, which may increase the utilization of NLP-based variables and outcomes in observational clinical studies.
Affiliation(s)
- Madhumita Sushil: Bakar Computational Health Sciences Institute, University of California, San Francisco, USA
- Travis Zack: Bakar Computational Health Sciences Institute, University of California, San Francisco, USA; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, USA
- Divneet Mandair: Bakar Computational Health Sciences Institute, University of California, San Francisco, USA; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, USA
- Atul J. Butte: Bakar Computational Health Sciences Institute, University of California, San Francisco, USA; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, USA; Center for Data-driven Insights and Innovation, University of California, Office of the President, Oakland, CA, USA; Department of Pediatrics, University of California, San Francisco, CA, USA
9. Rijk MH, Platteel TN, Mulder MMM, Geersing GJ, Rutten FH, van Smeden M, Venekamp RP, Leeuwenberg TM. Incomplete and possibly selective recording of signs, symptoms, and measurements in free text fields of primary care electronic health records of adults with lower respiratory tract infections. J Clin Epidemiol 2024; 166:111240. PMID: 38072176; DOI: 10.1016/j.jclinepi.2023.111240. Received 09/18/2023; accepted 12/05/2023.
Abstract
OBJECTIVES To assess the completeness of recording of relevant signs, symptoms, and measurements in Dutch free-text fields of primary care electronic health records (EHRs) of adults with lower respiratory tract infections (LRTI). STUDY DESIGN AND SETTING Retrospective cohort study embedded in a prediction modeling project using routine health care data of the Julius General Practitioners' Network on adult patients with LRTI. Free-text fields of 1,000 primary care consultations for LRTI episodes between 2016 and 2019 were manually annotated to retrieve data on the recording of sixteen relevant signs, symptoms, and measurements. RESULTS For 12/16 (75%) of the relevant signs, symptoms, and measurements, more than 50% of the values were not recorded. The patterns of recorded values indicated selective recording of positive or abnormal values. Recording rates varied by consultation type (physical consultation vs. home visit), diagnosis (acute bronchitis vs. pneumonia), whether an antibiotic prescription was issued, and between practices. CONCLUSION In EHRs of primary care LRTI patients, the recording of signs, symptoms, and measurements in free-text fields is incomplete and possibly selective. When using free-text EHR data in research, careful consideration of these recording patterns and appropriate missing-data handling techniques are therefore required.
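The completeness audit at the heart of the study can be sketched as a per-item recording-rate computation over annotated consultations, plus a check for the selective-recording pattern (positive values recorded more often than negatives). The records and item names below are fabricated examples, not study data.

```python
# Per-item recording rates from annotated free-text fields, plus a
# simple selectivity check. Consultations and items are invented.

consultations = [
    {"fever": True, "cough": True, "crp": 23},
    {"cough": True},
    {"fever": True, "crp": None},  # item mentioned but value missing
    {"cough": False},
]

def recording_rate(records, item):
    """Share of consultations in which the item has a recorded value."""
    recorded = sum(1 for r in records if item in r and r[item] is not None)
    return recorded / len(records)

def positive_share(records, item):
    """Among recorded yes/no values, the share that are positive; a
    share near 1.0 suggests selective recording of positive findings."""
    vals = [r[item] for r in records if r.get(item) is True or r.get(item) is False]
    return sum(v is True for v in vals) / len(vals)

for item in ["fever", "cough", "crp"]:
    print(item, recording_rate(consultations, item))
print(positive_share(consultations, "fever"))  # 1.0: only positives recorded
```

Note that `fever` is only ever recorded when present: exactly the kind of pattern that makes "not recorded" ambiguous between "absent" and "not assessed", and why the paper calls for careful missing-data handling.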
Affiliation(s)
- Merijn H Rijk: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Tamara N Platteel: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Marissa M M Mulder: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Geert-Jan Geersing: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Frans H Rutten: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Maarten van Smeden: Department of Epidemiology & Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Roderick P Venekamp: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Tuur M Leeuwenberg: Department of Epidemiology & Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
10
|
Wang L, He H, Wen A, Moon S, Fu S, Peterson KJ, Ai X, Liu S, Kavuluru R, Liu H. Acquisition of a Lexicon for Family History Information: Bidirectional Encoder Representations From Transformers-Assisted Sublanguage Analysis. JMIR Med Inform 2023; 11:e48072. [PMID: 37368483 PMCID: PMC10337517 DOI: 10.2196/48072] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2023] [Revised: 05/25/2023] [Accepted: 06/01/2023] [Indexed: 06/28/2023] Open
Abstract
BACKGROUND A patient's family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records, and a substantial portion of FH information is embedded in clinical notes. This renders FH information difficult to use in downstream data analytics or clinical decision support applications. To address this issue, a natural language processing system capable of extracting and normalizing FH information can be used. OBJECTIVE In this study, we aimed to construct an FH lexical resource for information extraction and normalization. METHODS We used a transformer-based method to construct an FH lexical resource, leveraging a corpus of clinical notes generated as part of primary care. The usability of the lexicon was demonstrated through the development of a rule-based FH system that extracts FH entities and relations as specified in previous FH challenges. We also experimented with a deep learning-based FH system for FH information extraction. Previous FH challenge data sets were used for evaluation. RESULTS The resulting lexicon contains 33,603 entries normalized to 6408 concept unique identifiers of the Unified Medical Language System and 15,126 codes of the Systematized Nomenclature of Medicine Clinical Terms, with an average of 5.4 variants per concept. The performance evaluation demonstrated that the rule-based FH system achieved reasonable performance. Combining the rule-based FH system with a state-of-the-art deep learning-based FH system improved the recall of FH information on the BioCreative/N2C2 FH challenge data set, with varied but comparable F1 scores. CONCLUSIONS The resulting lexicon and rule-based FH system are freely available through the Open Health Natural Language Processing GitHub.
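A lexicon-backed, rule-based extraction step of the kind evaluated here can be sketched as follows; the lexicon entries, relative list, and sentence-level pairing rule are simplified stand-ins, not the released Open Health NLP resource:

```python
import re

# Toy stand-in for the FH lexicon: surface variants mapped to UMLS CUIs.
# Entries are illustrative, not taken from the published 33,603-entry lexicon.
lexicon = {
    "diabetes": "C0011849",
    "diabetes mellitus": "C0011849",
    "breast cancer": "C0678222",
    "heart attack": "C0027051",
}

relatives = ["mother", "father", "sister", "brother", "grandmother", "grandfather"]

def extract_fh(sentence):
    """Naive rule: pair every relative mention with every lexicon term in the sentence."""
    text = sentence.lower()
    found_relatives = [r for r in relatives if re.search(rf"\b{r}\b", text)]
    found_concepts = []
    for term in sorted(lexicon, key=len, reverse=True):
        # Longest match wins: skip "diabetes" once "diabetes mellitus" has matched.
        if term in text and not any(term in longer for longer, _ in found_concepts):
            found_concepts.append((term, lexicon[term]))
    return [(rel, term, cui) for rel in found_relatives for term, cui in found_concepts]
```

The published system goes further, normalizing to SNOMED CT as well as UMLS and resolving relations at the entity rather than the sentence level; the abstract's recall gain comes from combining such rules with a deep learning extractor.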
Affiliation(s)
- Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Huan He
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Andrew Wen
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Sungrim Moon
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Kevin J Peterson
- Center for Digital Health, Mayo Clinic, Rochester, MN, United States
- Xuguang Ai
- Department of Computer Science, University of Kentucky, Lexington, KY, United States
- Sijia Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Ramakanth Kavuluru
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, United States
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
|