1
Chen Q, Hu Y, Peng X, Xie Q, Jin Q, Gilson A, Singer MB, Ai X, Lai PT, Wang Z, Keloth VK, Raja K, Huang J, He H, Lin F, Du J, Zhang R, Zheng WJ, Adelman RA, Lu Z, Xu H. Benchmarking large language models for biomedical natural language processing applications and recommendations. Nat Commun 2025; 16:3280. [PMID: 40188094 PMCID: PMC11972378 DOI: 10.1038/s41467-025-56989-2] [Received: 11/17/2023] [Accepted: 02/07/2025] [Indexed: 04/07/2025] Open
Abstract
The rapid growth of biomedical literature poses challenges for manual knowledge curation and synthesis. Biomedical Natural Language Processing (BioNLP) automates the process. While Large Language Models (LLMs) have shown promise in general domains, their effectiveness in BioNLP tasks remains unclear due to limited benchmarks and practical guidelines. We perform a systematic evaluation of four LLMs (GPT and LLaMA representatives) on 12 BioNLP benchmarks across six applications. We compare their zero-shot, few-shot, and fine-tuning performance with the traditional fine-tuning of BERT or BART models. We examine inconsistencies, missing information, and hallucinations, and perform a cost analysis. Here, we show that traditional fine-tuning outperforms zero- or few-shot LLMs in most tasks. However, closed-source LLMs like GPT-4 excel in reasoning-related tasks such as medical question answering. Open-source LLMs still require fine-tuning to close performance gaps. We find issues like missing information and hallucinations in LLM outputs. These results offer practical insights for applying LLMs in BioNLP.
Affiliation(s)
- Qingyu Chen
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Yan Hu
  - McWilliams School of Biomedical Informatics, University of Texas Health Science at Houston, Houston, TX, USA
- Xueqing Peng
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Qianqian Xie
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Qiao Jin
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Aidan Gilson
  - Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Maxwell B Singer
  - Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Xuguang Ai
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Po-Ting Lai
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Zhizheng Wang
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Vipina K Keloth
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Kalpana Raja
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Jimin Huang
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Huan He
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Fongci Lin
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Jingcheng Du
  - McWilliams School of Biomedical Informatics, University of Texas Health Science at Houston, Houston, TX, USA
- Rui Zhang
  - Division of Computational Health Sciences, Department of Surgery, Medical School, University of Minnesota, Minneapolis, MN, USA
  - Center for Learning Health System Sciences, University of Minnesota, Minneapolis, MN, 55455, USA
- W Jim Zheng
  - McWilliams School of Biomedical Informatics, University of Texas Health Science at Houston, Houston, TX, USA
- Ron A Adelman
  - Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Zhiyong Lu
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Hua Xu
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
2
Ahmad I, Amelio A, Merla A, Scozzari F. A survey on the role of artificial intelligence in managing Long COVID. Front Artif Intell 2024; 6:1292466. [PMID: 38274052 PMCID: PMC10808521 DOI: 10.3389/frai.2023.1292466] [Received: 09/11/2023] [Accepted: 12/26/2023] [Indexed: 01/27/2024] Open
Abstract
In recent years, several artificial intelligence techniques have been applied to COVID-19 data. In addition to the symptoms related to COVID-19, many individuals with SARS-CoV-2 infection have described various long-lasting symptoms, now termed Long COVID. In this context, artificial intelligence techniques have been utilized to analyze data from Long COVID patients in order to assist doctors and alleviate the considerable strain on care and rehabilitation facilities. In this paper, we explore the impact of the machine learning methodologies that have been applied to analyze the many aspects of Long COVID syndrome, from clinical presentation through diagnosis. We also include the text mining techniques used to extract insights and trends from large amounts of text data related to Long COVID. Finally, we critically compare the various approaches and outline the work that remains to be done to create a robust artificial intelligence approach for efficient diagnosis and treatment of Long COVID.
Affiliation(s)
- Ijaz Ahmad
  - Department of Human, Legal and Economic Sciences, Telematic University “Leonardo da Vinci”, Chieti, Italy
- Alessia Amelio
  - Department of Engineering and Geology, University “G. d'Annunzio” Chieti-Pescara, Pescara, Italy
- Arcangelo Merla
  - Department of Engineering and Geology, University “G. d'Annunzio” Chieti-Pescara, Pescara, Italy
- Francesca Scozzari
  - Laboratory of Computational Logic and Artificial Intelligence, Department of Economic Studies, University “G. d'Annunzio” Chieti-Pescara, Pescara, Italy
3
Somayajula SA, Litake O, Liang Y, Hosseini R, Nemati S, Wilson DO, Weinreb RN, Malhotra A, Xie P. Improving long COVID-related text classification: a novel end-to-end domain-adaptive paraphrasing framework. Sci Rep 2024; 14:85. [PMID: 38168099 PMCID: PMC10761882 DOI: 10.1038/s41598-023-48594-4] [Received: 05/23/2023] [Accepted: 11/28/2023] [Indexed: 01/05/2024] Open
Abstract
The emergence of long COVID during the ongoing COVID-19 pandemic has presented considerable challenges for healthcare professionals and researchers. The task of identifying relevant literature is particularly daunting due to the rapidly evolving scientific landscape, inconsistent definitions, and a lack of standardized nomenclature. This paper proposes a novel solution to this challenge by employing machine learning techniques to classify long COVID literature. However, the scarcity of annotated data for machine learning poses a significant obstacle. To overcome this, we introduce a strategy called medical paraphrasing, which diversifies the training data while maintaining the original content. Additionally, we propose a Data-Reweighting-Based Multi-Level Optimization Framework for Domain Adaptive Paraphrasing, supported by a Meta-Weight-Network (MWN). This innovative approach incorporates feedback from the downstream text classification model to influence the training of the paraphrasing model. During the training process, the framework assigns higher weights to the training examples that contribute more effectively to the downstream task of long COVID text classification. Our findings demonstrate that this method substantially improves the accuracy and efficiency of long COVID literature classification, offering a valuable tool for physicians and researchers navigating this complex and ever-evolving field.
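The reweighting idea in the abstract above, where training examples that help the downstream classifier more receive higher weights, can be illustrated with a weighted loss. This is only a minimal sketch under stated assumptions: the function names are hypothetical, and the weights are fixed by hand, whereas the paper describes them as produced by a Meta-Weight-Network trained with feedback from the downstream model, which is not reproduced here.

```python
import math
from typing import Sequence


def cross_entropy(prob_true_class: float) -> float:
    """Per-example loss: negative log-probability of the correct class."""
    return -math.log(prob_true_class)


def weighted_loss(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weight-normalized average of per-example cross-entropy losses.

    probs[i] is the model's probability for the true class of example i;
    weights[i] is a non-negative importance weight (hand-set here, an
    assumption; in the paper these would come from a learned network).
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * cross_entropy(p) for p, w in zip(probs, weights)) / total


# Two hypothetical paraphrased training examples.
probs = [0.9, 0.5]
uniform = weighted_loss(probs, [1.0, 1.0])     # plain mean of the two losses
reweighted = weighted_loss(probs, [1.0, 3.0])  # second example counts three times as much
print(f"uniform: {uniform:.4f}, reweighted: {reweighted:.4f}")
```

With uniform weights this reduces to the ordinary mean loss; raising the weight of an example shifts the training signal toward it.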
Affiliation(s)
- Sai Ashish Somayajula
  - Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
- Onkar Litake
  - Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
- Youwei Liang
  - Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
- Ramtin Hosseini
  - Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
- Shamim Nemati
  - Division of Biomedical Informatics, University of California, La Jolla, San Diego, USA
- David O Wilson
  - Department of Medicine, University of Pittsburgh Medical Center, Pittsburgh, USA
- Robert N Weinreb
  - Hamilton Glaucoma Center, Shiley Eye Center and Department of Ophthalmology, University of California, La Jolla, San Diego, USA
- Atul Malhotra
  - UC San Diego Health, Department of Medicine, La Jolla, San Diego, USA
- Pengtao Xie
  - Department of Electrical and Computer Engineering, University of California, La Jolla, San Diego, USA
4
Leaman R, Islamaj R, Adams V, Alliheedi MA, Almeida JR, Antunes R, Bevan R, Chang YC, Erdengasileng A, Hodgskiss M, Ida R, Kim H, Li K, Mercer RE, Mertová L, Mobasher G, Shin HC, Sung M, Tsujimura T, Yeh WC, Lu Z. Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII. Database (Oxford) 2023; 2023:7071696. [PMID: 36882099 PMCID: PMC9991492 DOI: 10.1093/database/baad005] [Received: 08/19/2022] [Revised: 01/06/2023] [Accepted: 02/15/2023] [Indexed: 03/09/2023]
Abstract
The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and, as highlighted during the coronavirus disease 2019 pandemic, their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We, therefore, organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both span [i.e. named entity recognition (NER)] and normalization (i.e. entity linking), using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall).
This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/.
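The F-scores quoted in the abstract above can be related to their precision and recall values with a short calculation. The sketch below assumes the balanced F-score (F1, the harmonic mean of precision and recall); the abstract does not state the exact definition, so treat this as an illustration and not as the track's official scoring code. The function and variable names are hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, F-score as quoted in the abstract)
reported = {
    "strict NER": (0.8759, 0.8587, 0.8672),
    "strict normalization": (0.8621, 0.7702, 0.8136),
    "chemical indexing": (0.7417, 0.5141, 0.6073),
}

for task, (p, r, f) in reported.items():
    print(f"{task}: computed {f_score(p, r):.4f}, reported {f:.4f}")
```

Under the F1 assumption, each computed value agrees with the quoted figure to within rounding. The indexing task shows how a low recall (0.5141) pulls the harmonic mean well below the precision (0.7417).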
Affiliation(s)
- Virginia Adams
  - NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
- Mohammed A Alliheedi
  - Department of Computer Science, Al Baha University, 4781 King Fahd Rd, Al Aqiq 65779, Saudi Arabia
- João Rafael Almeida
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
  - Department of Information and Communications Technologies, University of A Coruña, Camiño do Lagar de Castro, A Coruña 15008, Spain
- Rui Antunes
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
- Robert Bevan
  - Informatics Department, Medicines Discovery Catapult, Alderley Park, Block 35, Mereside, Macclesfield SK10 4ZF, UK
- Yung-Chun Chang
  - Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, Da’an District, Taipei City, Taipei 106, Taiwan
- Arslan Erdengasileng
  - Department of Statistics, Florida State University, 117 N. Woodward Ave, Tallahassee, FL 32306, USA
- Matthew Hodgskiss
  - Informatics Department, Medicines Discovery Catapult, Alderley Park, Block 35, Mereside, Macclesfield SK10 4ZF, UK
- Ryuki Ida
  - Computational Intelligence Laboratory, Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan
- Hyunjae Kim
  - Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea
- Keqiao Li
  - Department of Statistics, Florida State University, 117 N. Woodward Ave, Tallahassee, FL 32306, USA
- Robert E Mercer
  - Department of Computer Science, The University of Western Ontario, Room 355, Middlesex College, Ontario, London N6A 5B7, Canada
- Lukrécia Mertová
  - Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS gGmbH), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
- Ghadeer Mobasher
  - Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS gGmbH), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
  - Institute of Computer Science, Heidelberg University, Im Neuenheimer Feld 205, Heidelberg 69120, Germany
- Hoo-Chang Shin
  - NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
- Mujeen Sung
  - Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea
- Tomoki Tsujimura
  - Computational Intelligence Laboratory, Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan
- Wen-Chao Yeh
  - Institute of Information Systems and Applications, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 30013, Taiwan
- Zhiyong Lu
  - *Corresponding author: Tel: +1-301-594-7089; Fax: +1-301-480-2290;