1. Di Basilio D, King L, Lloyd S, Michael P, Shardlow M. Asking questions that are "close to the bone": integrating thematic analysis and natural language processing to explore the experiences of people with traumatic brain injuries engaging with patient-reported outcome measures. Front Digit Health 2024; 6:1387139. PMID: 38983792; PMCID: PMC11231399; DOI: 10.3389/fdgth.2024.1387139. Received 02/16/2024; accepted 05/13/2024. Open access.
Abstract
Introduction: Patient-reported outcome measures (PROMs) are valuable tools for assessing health-related quality of life and treatment effectiveness in individuals with traumatic brain injuries (TBIs). Understanding how individuals with TBIs experience completing PROMs is crucial for improving the measures' utility and relevance in clinical practice. Methods: Sixteen semi-structured interviews were conducted with individuals with TBIs. The interviews were transcribed verbatim and analysed using thematic analysis (TA) and natural language processing (NLP) techniques to identify themes and emotional connotations in participants' accounts of completing PROMs. Results: The TA revealed six key themes. Participants expressed varying levels of understanding and engagement with PROMs, with factors such as cognitive impairments and communication difficulties shaping their experiences. Insightful suggestions also emerged regarding barriers to completing PROMs, factors that facilitate completion, and ways to improve PROM content and delivery. Sentiment analyses performed with NLP techniques captured the overall sentiment and emotional "tones" of participants' narratives, which were characterised mainly by low positive sentiment; although mostly neutral, the narratives also revealed emotions such as fear and, to a lesser extent, anger. Combining semantic and sentiment analysis yielded valuable information on participants' views of, and emotional responses to, different aspects of PROMs. Discussion: The findings highlight the complexities of administering PROMs to individuals with TBIs and underscore the need for tailored approaches that accommodate their unique challenges. Integrating TA and NLP techniques can offer valuable insights into the experiences of individuals with TBIs and enhance the interpretation of qualitative data in this population.
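As a concrete (and deliberately simplified) illustration of the kind of sentiment pass described above, the sketch below scores interview text against a small polarity lexicon. The lexicon, weights, and thresholds are invented for demonstration; the study's actual NLP tooling is not specified here.

```python
# Minimal lexicon-based sentiment scoring, illustrating a sentiment pass
# over interview transcripts. The lexicon and weights are toy values for
# demonstration only, not the study's actual resources.

POSITIVE = {"helpful": 1.0, "clear": 0.5, "supported": 1.0}
NEGATIVE = {"confusing": -1.0, "fear": -1.5, "frustrating": -1.0, "angry": -1.5}
LEXICON = {**POSITIVE, **NEGATIVE}

def sentiment_score(text):
    """Mean polarity of lexicon words found in the text (0.0 if none)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def tone(text):
    """Collapse the score into the coarse tones reported in the study."""
    s = sentiment_score(text)
    if s > 0.1:
        return "positive"
    if s < -0.1:
        return "negative"
    return "neutral"

print(tone("The questions were clear and the nurse was helpful"))  # positive
print(tone("Some items felt confusing and triggered fear"))        # negative
```

Real analyses would use a validated sentiment model rather than a hand-built lexicon, but the aggregation idea (token-level polarity averaged into a narrative-level tone) is the same.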
Affiliation(s)
- Daniela Di Basilio: Division of Health Research, School of Health and Medicine, Lancaster University, Lancaster, United Kingdom
- Lorraine King: Department of Neuropsychology, North Staffordshire Combined Healthcare NHS Trust, Stoke-on-Trent, United Kingdom
- Sarah Lloyd: Department of Psychology, Manchester Metropolitan University, Manchester, United Kingdom
- Panayiotis Michael: Department of Psychology, Manchester Metropolitan University, Manchester, United Kingdom
- Matthew Shardlow: Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom
2. Fu S, Wang L, He H, Wen A, Zong N, Kumari A, Liu F, Zhou S, Zhang R, Li C, Wang Y, St Sauver J, Liu H, Sohn S. A taxonomy for advancing systematic error analysis in multi-site electronic health record-based clinical concept extraction. J Am Med Inform Assoc 2024; 31:1493-1502. PMID: 38742455; PMCID: PMC11187420; DOI: 10.1093/jamia/ocae101. Received 01/15/2024; accepted 04/19/2024. Open access.
Abstract
BACKGROUND Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, including the contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Because electronic health record (EHR) settings are highly heterogeneous across institutions, standardizing and reproducing the error analysis process is challenging. OBJECTIVES This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks. MATERIALS AND METHODS We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multi-site case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium, and is compatible with several open-source annotation tools, including MAE, Brat, and MedTator. RESULTS The resulting taxonomy comprises 43 distinct error classes organized into 6 error dimensions and 4 properties, including model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variation in error types across methodological approaches, tasks, and EHR settings. Key points from community feedback included the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies. CONCLUSION The proposed taxonomy can accelerate and standardize the error analysis process in multi-site settings, improving the provenance, interpretability, and portability of NLP models. Future research could develop automated or semi-automated methods to assist in classifying and standardizing error analysis.
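To make the taxonomy's shape concrete, here is a minimal sketch of how an error annotation with a dimension, class, and properties might be represented in code. The dimension and class names are illustrative stand-ins, not the released taxonomy's actual 43 classes or 6 dimensions.

```python
# A sketch of an error-taxonomy annotation as a data structure. The
# dimension/class names below are hypothetical examples, not entries
# from the released .dtd/.owl taxonomy.
from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorClass:
    dimension: str   # e.g. "contextual", "linguistic" (illustrative)
    name: str        # specific error class within that dimension

@dataclass
class ErrorInstance:
    error_class: ErrorClass
    model_type: str        # property: "symbolic" or "statistical"
    evaluation_level: str  # property: "patient" | "document" | "sentence" | "concept"
    snippet: str           # the text span the model got wrong

# Annotate one (invented) extraction error for a statistical model.
negation_miss = ErrorInstance(
    error_class=ErrorClass("contextual", "missed_negation"),
    model_type="statistical",
    evaluation_level="concept",
    snippet="no evidence of pneumonia",
)
print(negation_miss.error_class.dimension)  # contextual
```

Structuring annotations this way is what makes cross-site error counts comparable: every instance carries the same dimensions and properties regardless of which institution produced it.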
Affiliation(s)
- Sunyang Fu: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Liwei Wang: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Huan He: Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
- Andrew Wen: Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Nansu Zong: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
- Anamika Kumari: Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
- Feifan Liu: Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
- Sicheng Zhou: Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
- Rui Zhang: Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
- Chenyu Li: Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Yanshan Wang: Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
- Jennifer St Sauver: Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, United States
- Hongfang Liu: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Sunghwan Sohn: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
3. Sushil M, Zack T, Mandair D, Zheng Z, Wali A, Yu YN, Quan Y, Lituiev D, Butte AJ. A comparative study of large language model-based zero-shot inference and task-specific supervised classification of breast cancer pathology reports. J Am Med Inform Assoc 2024:ocae146. PMID: 38900207; DOI: 10.1093/jamia/ocae146. Received 02/07/2024; accepted 06/03/2024. Open access.
Abstract
OBJECTIVE Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs could reduce the need for large-scale data annotation. MATERIALS AND METHODS We curated a dataset of 769 breast cancer pathology reports, manually labeled with 12 categories, to compare the zero-shot classification capability of four LLMs (GPT-4, GPT-3.5, Starling, and ClinicalCamel) with the task-specific supervised classification performance of three models: random forests, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model. RESULTS Across all 12 tasks, GPT-4 performed either significantly better than or as well as the best supervised model, LSTM-Att (average macro F1-score of 0.86 vs 0.75), with a particular advantage on tasks with high label imbalance. The other LLMs performed poorly. Frequent GPT-4 error categories included incorrect inferences from multiple samples and from patient history, as well as complex task design; several LSTM-Att errors reflected poor generalization to the test set. DISCUSSION On tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of data labeling. However, where the use of LLMs is prohibitive, simpler models trained on large annotated datasets can provide comparable results. CONCLUSIONS GPT-4 demonstrated the potential to speed up clinical NLP studies by reducing the need for large annotated datasets, which may increase the utilization of NLP-based variables and outcomes in clinical studies.
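The headline comparison rests on the macro-averaged F1 score, which weights rare classes equally with common ones and is therefore informative on the imbalanced tasks the abstract highlights. A from-scratch sketch of that metric on toy labels (the data below are invented, not from the study):

```python
# Macro-averaged F1: compute per-class F1 and average without weighting
# by class frequency, so rare classes count as much as common ones.

def macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy example: per-class F1 is 0.8 ("pos") and 0.667 ("neg").
y_true = ["pos", "pos", "pos", "neg"]
y_pred = ["pos", "pos", "neg", "neg"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

With a micro-average, the majority class would dominate; the macro average is what exposes a model that ignores minority labels, which is the failure mode the imbalanced tasks test.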
Affiliation(s)
- Madhumita Sushil: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Travis Zack: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States
- Divneet Mandair: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States
- Zhiwei Zheng: University of California, Berkeley, Berkeley, CA 94720, United States
- Ahmed Wali: University of California, Berkeley, Berkeley, CA 94720, United States
- Yan-Ning Yu: University of California, Berkeley, Berkeley, CA 94720, United States
- Yuwei Quan: University of California, Berkeley, Berkeley, CA 94720, United States
- Dmytro Lituiev: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Atul J Butte: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA 94158, United States; Center for Data-driven Insights and Innovation, University of California, Office of the President, Oakland, CA 94607, United States; Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94158, United States
4. Assié G, Allassonnière S. Artificial Intelligence in Endocrinology: On Track Toward Great Opportunities. J Clin Endocrinol Metab 2024; 109:e1462-e1467. PMID: 38466742; DOI: 10.1210/clinem/dgae154. Received 11/06/2023; accepted 03/08/2024.
Abstract
In endocrinology, the types and quantity of digital data are increasing rapidly. Computing capabilities are also developing at an incredible rate, as illustrated by the recent expansion in the use of popular generative artificial intelligence (AI) applications. Numerous diagnostic and therapeutic devices using AI have already entered routine endocrine practice, and developments in this field are expected to continue to accelerate. Endocrinologists will need to be supported in managing AI applications. Beyond technological training, interdisciplinary vision is needed to encompass the ethical and legal aspects of AI, to manage the profound impact of AI on patient/provider relationships, and to maintain an optimal balance between human input and AI in endocrinology.
Affiliation(s)
- Guillaume Assié: Université Paris Cité, CNRS UMR8104, INSERM U1016, Institut Cochin, F-75014 Paris, France; Service d'endocrinologie, Center for Rare Adrenal Diseases, Assistance Publique-Hôpitaux de Paris, Hôpital Cochin, 75014 Paris, France
- Stéphanie Allassonnière: Université Paris Cité, UFR Medecine, 75006 Paris, France; HeKA INSERM, INRIA Paris, Centre de Recherche des Cordeliers Paris, Université Paris Cité, 75006 Paris, France
5. Fu S, Jia H, Vassilaki M, Keloth VK, Dang Y, Zhou Y, Garg M, Petersen RC, St Sauver J, Moon S, Wang L, Wen A, Li F, Xu H, Tao C, Fan J, Liu H, Sohn S. FedFSA: Hybrid and federated framework for functional status ascertainment across institutions. J Biomed Inform 2024; 152:104623. PMID: 38458578; PMCID: PMC11005095; DOI: 10.1016/j.jbi.2024.104623. Received 10/12/2023; accepted 03/04/2024.
Abstract
INTRODUCTION Patients' functional status assesses their independence in performing activities of daily living (ADLs), including basic ADLs (bADL) and more complex instrumental activities (iADL). Existing studies have shown that functional status is a strong predictor of health outcomes, particularly in older adults. Despite its usefulness, much functional status information is stored in electronic health records (EHRs) in semi-structured or free-text formats, indicating a pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate its curation. In this study, we introduce FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and the open-source Flower and PyTorch libraries for the federated BERT components. For gold standard data generation, we annotated a corpus to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated category- and institution-specific ADL extraction performance across different experimental designs. RESULTS ADL extraction performance ranged from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. Performance for ADL extraction with impairment ranged from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL. For category-specific ADL extraction, laundry and transferring yielded relatively high performance; dressing, medication, bathing, and continence achieved moderate-to-high performance; and food preparation and toileting showed low performance. CONCLUSION NLP performance varied across ADL categories and healthcare sites. Federated learning with the FedFSA framework outperformed non-federated learning for impaired-ADL extraction at all healthcare sites. Our study demonstrates the potential of federated learning for functional status extraction and impairment classification in EHRs, exemplifying the importance of large-scale, multi-institutional collaborative development.
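The federated component aggregates locally trained model parameters rather than sharing patient text. A minimal sketch of one federated-averaging (FedAvg) round, with plain Python lists standing in for BERT parameters; the study's actual implementation uses the Flower and PyTorch libraries, which this sketch only imitates.

```python
# One FedAvg aggregation round: each site trains locally, then the
# server averages the parameter vectors weighted by local data size.
# Lists of floats stand in for real model weight tensors.

def fedavg(site_weights, site_sizes):
    """Average per-site parameter vectors, weighted by local data size."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]

# Three sites with different amounts of local data; the site with 300
# records pulls the global model toward its parameters.
weights = [[0.25, 1.0], [0.5, 0.0], [0.75, 2.0]]
sizes = [100, 300, 100]
print(fedavg(weights, sizes))  # [0.5, 0.6]
```

The key privacy property is that only the weight vectors cross institutional boundaries; the clinical notes that produced them never leave each site.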
Affiliation(s)
- Sunyang Fu: Mayo Clinic, Rochester, MN, United States; University of Texas Health Science Center, Houston, TX, United States
- Heling Jia: Mayo Clinic, Rochester, MN, United States
- Yifang Dang: University of Texas Health Science Center, Houston, TX, United States
- Yujia Zhou: University of Texas Health Science Center, Houston, TX, United States
- Liwei Wang: Mayo Clinic, Rochester, MN, United States
- Andrew Wen: University of Texas Health Science Center, Houston, TX, United States
- Fang Li: University of Texas Health Science Center, Houston, TX, United States
- Hua Xu: Yale University, New Haven, CT, United States
- Cui Tao: University of Texas Health Science Center, Houston, TX, United States
- Hongfang Liu: Mayo Clinic, Rochester, MN, United States; University of Texas Health Science Center, Houston, TX, United States
6. Grotenhuis Z, Mosteiro PJ, Leeuwenberg AM. Modest performance of text mining to extract health outcomes may be almost sufficient for high-quality prognostic model development. Comput Biol Med 2024; 170:108014. PMID: 38301515; DOI: 10.1016/j.compbiomed.2024.108014. Received 07/26/2023; accepted 01/19/2024.
Abstract
BACKGROUND Across medicine, prognostic models are used to estimate a patient's risk of future health outcomes (e.g., cardiovascular or mortality risk). To develop (or train) prognostic models, historical patient-level training data are needed, containing both the predictive factors (i.e., features) and the relevant health outcomes (i.e., labels). When the health outcomes are not recorded in structured data, they are sometimes first extracted from textual notes using text mining techniques. Because many studies use text mining to obtain outcome data for prognostic model development, our aim was to study the impact of text mining quality on downstream prognostic model performance. METHODS We conducted a simulation study charting the relationship between text mining quality and prognostic model performance, using an illustrative case study of in-hospital mortality prediction in intensive care unit patients. We repeatedly developed and evaluated a prognostic model for in-hospital mortality, using outcome data extracted by multiple text mining models of varying quality. RESULTS Interestingly, we found that a relatively low-quality text mining model (F1 score ≈ 0.50) could already be used to train a prognostic model with quite good discrimination (area under the receiver operating characteristic curve of around 0.80). However, the calibration of the risks estimated by the prognostic model was unreliable across the majority of settings, even when text mining models were of relatively high quality (F1 ≈ 0.80). DISCUSSION Developing prognostic models on text-extracted outcomes using imperfect text mining models seems promising. However, such models are likely to produce poorly calibrated risk estimates and may require recalibration on (a possibly smaller amount of) manually extracted outcome data.
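A back-of-the-envelope illustration of why imperfect text-mined outcome labels can preserve discrimination while breaking calibration: random extraction errors shift the apparent event rate away from the true one, so risks fitted to the noisy labels are systematically off even when the ranking of patients survives. The sensitivity, specificity, and event rate below are assumed values for demonstration, not figures from the study.

```python
# Toy demonstration of the calibration problem with noisy outcome labels.
# True labels pass through an imperfect "text mining" extractor with an
# assumed sensitivity/specificity; the apparent event rate then differs
# from the true rate, which is exactly a calibration-in-the-large error.
import random

random.seed(0)

true_rate = 0.10          # assumed true in-hospital mortality rate
sens, spec = 0.70, 0.95   # assumed text-mining extractor quality

# Closed-form apparent event rate after the noisy extraction step:
apparent_rate = true_rate * sens + (1 - true_rate) * (1 - spec)
print(round(apparent_rate, 3))  # 0.115

# Simulate the same thing empirically: the noisy label rate drifts to
# ~0.115, so a model calibrated to these labels overestimates true risk.
n = 100_000
noisy_events = sum(
    (random.random() < sens) if (random.random() < true_rate)
    else (random.random() < 1 - spec)
    for _ in range(n)
)
print(abs(noisy_events / n - true_rate) > 0.005)  # True: calibration gap
```

Discrimination (ranking patients by risk) can survive this shift because the noise here is independent of the features, whereas calibration depends on the absolute event rate, which the noise has moved.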
Affiliation(s)
- Zwierd Grotenhuis: Department of Information and Computing Sciences, Utrecht University, The Netherlands; Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, The Netherlands
- Pablo J Mosteiro: Department of Information and Computing Sciences, Utrecht University, The Netherlands
- Artuur M Leeuwenberg: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, The Netherlands
7. Sushil M, Butte AJ, Schuit E, van Smeden M, Leeuwenberg AM. Cross-institution natural language processing for reliable clinical association studies: a methodological exploration. J Clin Epidemiol 2024; 167:111258. PMID: 38219811; DOI: 10.1016/j.jclinepi.2024.111258. Received 06/20/2023; accepted 01/08/2024.
Abstract
OBJECTIVES Natural language processing (NLP) of clinical notes in electronic medical records is increasingly used to extract otherwise sparsely available patient characteristics and to assess their association with relevant health outcomes. Manual data curation is resource intensive, and NLP methods make such studies more feasible. However, the methodology for using NLP reliably in clinical research is understudied. The objective of this study was to investigate how NLP models could be used to extract study variables (specifically exposures) to reliably conduct exposure-outcome association studies. STUDY DESIGN AND SETTING In a convenience sample of patients admitted to the intensive care unit of a US academic health system, multiple association studies were conducted, comparing association estimates based on NLP-extracted vs. manually extracted exposure variables. The association studies varied in NLP model architecture (Bidirectional Encoder Representations from Transformers (BERT), long short-term memory (LSTM)), training paradigm (training a new model, fine-tuning an existing external model), extracted exposures (employment status, living status, and substance use), health outcomes (having a do-not-resuscitate/intubate code, length of stay, and in-hospital mortality), missing data handling (multiple imputation vs. complete case analysis), and the application of measurement error correction (via regression calibration). RESULTS The study was conducted on 1,174 participants (median [interquartile range] age, 61 [50, 73] years; 60.6% male). Additionally, up to 500 discharge reports of participants from the same health system and 2,528 reports of participants from an external health system were used to train the NLP models. Substantial differences were found between the associations based on NLP-extracted and manually extracted exposures under all settings. The error in the association estimates was only weakly correlated with the overall F1 score of the NLP models. CONCLUSION Associations estimated using NLP-extracted exposures should be interpreted with caution. Further research is needed to establish conditions for the reliable use of NLP in medical association studies.
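The measurement error correction studied above can be illustrated with a simpler, closely related technique: the matrix method for a misclassified binary exposure, which recovers the true exposure counts from the observed (NLP-extracted) ones given the extractor's sensitivity and specificity. All numbers below are illustrative; the paper itself applies regression calibration, of which this is only a discrete-exposure cousin.

```python
# Matrix-method correction for a misclassified binary exposure. Given
# observed counts and an extractor's sensitivity/specificity, solve
#   obs_exposed = sens * true_e + (1 - spec) * (total - true_e)
# for the true number of exposed patients, true_e.

def correct_counts(n_exposed_obs, n_unexposed_obs, sens, spec):
    """Return the implied true (exposed, unexposed) counts."""
    total = n_exposed_obs + n_unexposed_obs
    true_e = (n_exposed_obs - (1 - spec) * total) / (sens + spec - 1)
    return true_e, total - true_e

# Hypothetical: an NLP extractor with 80% sensitivity and 95% specificity
# flags 230 of 1000 patients as "exposed"; the corrected count follows.
true_e, true_u = correct_counts(230, 770, sens=0.80, spec=0.95)
print(round(true_e))  # 240
```

The correction blows up as sensitivity + specificity approaches 1 (an uninformative extractor), which echoes the paper's finding that overall accuracy metrics alone do not guarantee reliable downstream association estimates.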
Affiliation(s)
- Madhumita Sushil: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, USA
- Atul J Butte: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, USA
- Ewoud Schuit: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Maarten van Smeden: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Artuur M Leeuwenberg: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
8. Sushil M, Zack T, Mandair D, Zheng Z, Wali A, Yu YN, Quan Y, Butte AJ. A comparative study of zero-shot inference with large language models and supervised modeling in breast cancer pathology classification. Research Square (preprint) 2024: rs.3.rs-3914899. PMID: 38405831; PMCID: PMC10889046; DOI: 10.21203/rs.3.rs-3914899/v1. Open access.
Abstract
Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs can reduce the need for large-scale data annotation. We curated a dataset of 769 breast cancer pathology reports, manually labeled with 13 categories, to compare the zero-shot classification capability of the GPT-4 and GPT-3.5 models with the supervised classification performance of three model architectures: a random forests classifier, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model. Across all 13 tasks, GPT-4 performed either significantly better than or as well as the best supervised model, LSTM-Att (average macro F1 score of 0.83 vs. 0.75); on tasks with a high imbalance between labels, the differences were more pronounced. Frequent sources of GPT-4 errors included inferences from multiple samples and complex task design. On complex tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of large-scale data labeling. However, where the use of LLMs is prohibitive, simpler supervised models trained on large annotated datasets can provide comparable results. LLMs demonstrated the potential to speed up clinical NLP studies by reducing the need to curate large annotated datasets, which may increase the utilization of NLP-based variables and outcomes in observational clinical studies.
Affiliation(s)
- Madhumita Sushil: Bakar Computational Health Sciences Institute, University of California, San Francisco, USA
- Travis Zack: Bakar Computational Health Sciences Institute, University of California, San Francisco, USA; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, USA
- Divneet Mandair: Bakar Computational Health Sciences Institute, University of California, San Francisco, USA; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, USA
- Atul J. Butte: Bakar Computational Health Sciences Institute, University of California, San Francisco, USA; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, USA; Center for Data-driven Insights and Innovation, University of California, Office of the President, Oakland, CA, USA; Department of Pediatrics, University of California, San Francisco, CA, USA
9. Rijk MH, Platteel TN, Mulder MMM, Geersing GJ, Rutten FH, van Smeden M, Venekamp RP, Leeuwenberg TM. Incomplete and possibly selective recording of signs, symptoms, and measurements in free text fields of primary care electronic health records of adults with lower respiratory tract infections. J Clin Epidemiol 2024; 166:111240. PMID: 38072176; DOI: 10.1016/j.jclinepi.2023.111240. Received 09/18/2023; accepted 12/05/2023.
Abstract
OBJECTIVES To assess the completeness of recording of relevant signs, symptoms, and measurements in Dutch free-text fields of primary care electronic health records (EHRs) of adults with lower respiratory tract infections (LRTI). STUDY DESIGN AND SETTING Retrospective cohort study embedded in a prediction modeling project using routine health care data of the Julius General Practitioners' Network on adult patients with LRTI. Free-text fields of 1,000 primary care consultations for LRTI episodes between 2016 and 2019 were manually annotated to retrieve data on the recording of sixteen relevant signs, symptoms, and measurements. RESULTS For 12/16 (75%) of the relevant signs, symptoms, and measurements, more than 50% of the values were not recorded. The patterns of recorded values indicated selective recording of positive or abnormal values. Recording rates varied by consultation type (physical consultation vs. home visit), diagnosis (acute bronchitis vs. pneumonia), whether an antibiotic prescription was issued, and between practices. CONCLUSION In EHRs of primary care LRTI patients, the recording of signs, symptoms, and measurements in free-text fields is incomplete and possibly selective. When using free-text EHR data in research, careful consideration of these recording patterns and appropriate missing-data handling techniques are therefore required.
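The completeness audit at the heart of the study can be sketched as a per-item recording-rate computation over annotated consultations, plus a check for the selective-recording pattern (positive values recorded more often than negatives). The records and item names below are fabricated examples, not study data.

```python
# Per-item recording rates from annotated free-text fields, plus a
# simple selectivity check. Consultations and items are invented.

consultations = [
    {"fever": True, "cough": True, "crp": 23},
    {"cough": True},
    {"fever": True, "crp": None},  # item mentioned but value missing
    {"cough": False},
]

def recording_rate(records, item):
    """Share of consultations in which the item has a recorded value."""
    recorded = sum(1 for r in records if item in r and r[item] is not None)
    return recorded / len(records)

def positive_share(records, item):
    """Among recorded yes/no values, the share that are positive; a
    share near 1.0 suggests selective recording of positive findings."""
    vals = [r[item] for r in records if r.get(item) is True or r.get(item) is False]
    return sum(v is True for v in vals) / len(vals)

for item in ["fever", "cough", "crp"]:
    print(item, recording_rate(consultations, item))
print(positive_share(consultations, "fever"))  # 1.0: only positives recorded
```

Note that `fever` is only ever recorded when present: exactly the kind of pattern that makes "not recorded" ambiguous between "absent" and "not assessed", and why the paper calls for careful missing-data handling.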
Affiliation(s)
- Merijn H Rijk: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Tamara N Platteel: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Marissa M M Mulder: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Geert-Jan Geersing: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Frans H Rutten: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Maarten van Smeden: Department of Epidemiology & Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Roderick P Venekamp: Department of General Practice & Nursing Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Tuur M Leeuwenberg: Department of Epidemiology & Health Economics, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
10
|
Wang L, He H, Wen A, Moon S, Fu S, Peterson KJ, Ai X, Liu S, Kavuluru R, Liu H. Acquisition of a Lexicon for Family History Information: Bidirectional Encoder Representations From Transformers-Assisted Sublanguage Analysis. JMIR Med Inform 2023; 11:e48072. [PMID: 37368483 PMCID: PMC10337517 DOI: 10.2196/48072] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2023] [Revised: 05/25/2023] [Accepted: 06/01/2023] [Indexed: 06/28/2023] Open
Abstract
BACKGROUND A patient's family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records, and a substantial portion of FH information is embedded in clinical notes. This renders FH information difficult to use in downstream data analytics or clinical decision support applications. To address this issue, a natural language processing system capable of extracting and normalizing FH information can be used. OBJECTIVE In this study, we aimed to construct an FH lexical resource for information extraction and normalization. METHODS We used a transformer-based method to construct an FH lexical resource, leveraging a corpus of clinical notes generated as part of primary care. The usability of the lexicon was demonstrated through the development of a rule-based FH system that extracts FH entities and relations as specified in previous FH challenges. We also experimented with a deep learning-based FH system for FH information extraction. Previous FH challenge data sets were used for evaluation. RESULTS The resulting lexicon contains 33,603 entries normalized to 6408 concept unique identifiers of the Unified Medical Language System and 15,126 codes of the Systematized Nomenclature of Medicine Clinical Terms, with an average of 5.4 variants per concept. The performance evaluation demonstrated that the rule-based FH system achieved reasonable performance. Combining the rule-based FH system with a state-of-the-art deep learning-based FH system improved the recall of FH information on the BioCreative/N2C2 FH challenge data set, with varied but comparable F1 scores. CONCLUSIONS The resulting lexicon and rule-based FH system are freely available through the Open Health Natural Language Processing GitHub.
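A lexicon-backed, rule-based extraction step of the kind evaluated here can be sketched as follows; the lexicon entries, relative list, and sentence-level pairing rule are simplified stand-ins, not the released Open Health NLP resource:

```python
import re

# Toy stand-in for the FH lexicon: surface variants mapped to UMLS CUIs.
# Entries are illustrative, not taken from the published 33,603-entry lexicon.
lexicon = {
    "diabetes": "C0011849",
    "diabetes mellitus": "C0011849",
    "breast cancer": "C0678222",
    "heart attack": "C0027051",
}

relatives = ["mother", "father", "sister", "brother", "grandmother", "grandfather"]

def extract_fh(sentence):
    """Naive rule: pair every relative mention with every lexicon term in the sentence."""
    text = sentence.lower()
    found_relatives = [r for r in relatives if re.search(rf"\b{r}\b", text)]
    found_concepts = []
    for term in sorted(lexicon, key=len, reverse=True):
        # Longest match wins: skip "diabetes" once "diabetes mellitus" has matched.
        if term in text and not any(term in longer for longer, _ in found_concepts):
            found_concepts.append((term, lexicon[term]))
    return [(rel, term, cui) for rel in found_relatives for term, cui in found_concepts]
```

The published system goes further, normalizing to SNOMED CT as well as UMLS and resolving relations at the entity rather than the sentence level; the abstract's recall gain comes from combining such rules with a deep learning extractor.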
Affiliation(s)
- Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Huan He
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Andrew Wen
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Sungrim Moon
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Kevin J Peterson
- Center for Digital Health, Mayo Clinic, Rochester, MN, United States
- Xuguang Ai
- Department of Computer Science, University of Kentucky, Lexington, KY, United States
- Sijia Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Ramakanth Kavuluru
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, KY, United States
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
|