1
Yeganova L, Kim W, Tian S, Comeau DC, Wilbur WJ, Lu Z. LitSense 2.0: AI-powered biomedical information retrieval with sentence and passage level knowledge discovery. Nucleic Acids Res 2025:gkaf417. [PMID: 40377097 DOI: 10.1093/nar/gkaf417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2025] [Revised: 04/24/2025] [Accepted: 05/02/2025] [Indexed: 05/18/2025]
Abstract
LitSense 2.0 (https://www.ncbi.nlm.nih.gov/research/litsense2/) is an advanced biomedical search system enhanced with dense vector semantic retrieval, designed for accessing literature on sentence and paragraph levels. It provides unified access to 38 million PubMed abstracts and 6.6 million full-length articles in the PubMed Central (PMC) Open Access subset, encompassing 1.4 billion sentences and ∼300 million paragraphs, and is updated weekly. Compared to PubMed and PMC, the primary platforms for biomedical information search, LitSense offers cross-platform functionality by searching seamlessly across both PubMed and PMC and returning relevant results at a more granular level. Building on the success of the original LitSense launched in 2018, LitSense 2.0 introduces two major enhancements. The first is the addition of paragraph-level search: users can now choose to search either against sentences or against paragraphs. The second is improved retrieval accuracy via a state-of-the-art biomedical text encoder, ensuring more reliable identification of relevant results across the entire biomedical literature.
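The dense vector retrieval mentioned in this abstract can be illustrated with a minimal sketch. It assumes each sentence or paragraph has already been mapped to an embedding vector by some text encoder; the identifiers and vectors below are made up for illustration, and the code is not LitSense's implementation or API.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_vec: List[float],
         index: Dict[str, List[float]],
         top_k: int = 2) -> List[Tuple[str, float]]:
    """Score every indexed text unit against the query and return the top_k by similarity."""
    scored = [(unit_id, cosine(query_vec, vec)) for unit_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: in practice the vectors would come from a text encoder.
toy_index = {
    "sent-A": [1.0, 0.0, 0.0],
    "sent-B": [0.6, 0.8, 0.0],
    "para-C": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.2, 0.0]
top_hits = rank(toy_query, toy_index)
```

A system at the scale described (over a billion sentences) would presumably rely on an approximate nearest-neighbor index rather than the exhaustive scan shown here.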
Affiliation(s)
- Lana Yeganova
  - Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
- Won Kim
  - Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
- Shubo Tian
  - Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
- Donald C Comeau
  - Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
- W John Wilbur
  - Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
- Zhiyong Lu
  - Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
2
Chen Q, Hu Y, Peng X, Xie Q, Jin Q, Gilson A, Singer MB, Ai X, Lai PT, Wang Z, Keloth VK, Raja K, Huang J, He H, Lin F, Du J, Zhang R, Zheng WJ, Adelman RA, Lu Z, Xu H. Benchmarking large language models for biomedical natural language processing applications and recommendations. Nat Commun 2025; 16:3280. [PMID: 40188094 PMCID: PMC11972378 DOI: 10.1038/s41467-025-56989-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 02/07/2025] [Indexed: 04/07/2025]
Abstract
The rapid growth of biomedical literature poses challenges for manual knowledge curation and synthesis. Biomedical Natural Language Processing (BioNLP) automates the process. While Large Language Models (LLMs) have shown promise in general domains, their effectiveness in BioNLP tasks remains unclear due to limited benchmarks and practical guidelines. We perform a systematic evaluation of four LLMs (GPT and LLaMA representatives) on 12 BioNLP benchmarks across six applications. We compare their zero-shot, few-shot, and fine-tuning performance with the traditional fine-tuning of BERT or BART models. We examine inconsistencies, missing information, hallucinations, and perform cost analysis. Here, we show that traditional fine-tuning outperforms zero- or few-shot LLMs in most tasks. However, closed-source LLMs like GPT-4 excel in reasoning-related tasks such as medical question answering. Open-source LLMs still require fine-tuning to close performance gaps. We find issues like missing information and hallucinations in LLM outputs. These results offer practical insights for applying LLMs in BioNLP.
Affiliation(s)
- Qingyu Chen
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Yan Hu
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Xueqing Peng
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Qianqian Xie
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Qiao Jin
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Aidan Gilson
  - Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Maxwell B Singer
  - Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Xuguang Ai
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Po-Ting Lai
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Zhizheng Wang
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Vipina K Keloth
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Kalpana Raja
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Jimin Huang
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Huan He
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Fongci Lin
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Jingcheng Du
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Rui Zhang
  - Division of Computational Health Sciences, Department of Surgery, Medical School, University of Minnesota, Minneapolis, MN, USA
  - Center for Learning Health System Sciences, University of Minnesota, Minneapolis, MN, 55455, USA
- W Jim Zheng
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Ron A Adelman
  - Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Zhiyong Lu
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Hua Xu
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
3
Ma S, Jiang S, Yang O, Zhang X, Fu Y, Zhang Y, Kaareen A, Ling M, Chen J, Shang C. Use of Machine Learning Tools in Evidence Synthesis of Tobacco Use Among Sexual and Gender Diverse Populations: Algorithm Development and Validation. JMIR Form Res 2024; 8:e49031. [PMID: 38265858 PMCID: PMC10851114 DOI: 10.2196/49031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 12/06/2023] [Accepted: 12/29/2023] [Indexed: 01/25/2024]
Abstract
BACKGROUND: From 2016 to 2021, the volume of peer-reviewed publications related to tobacco has experienced a significant increase. This presents a considerable challenge in efficiently summarizing, synthesizing, and disseminating research findings, especially when it comes to addressing specific target populations, such as the LGBTQ+ (lesbian, gay, bisexual, transgender, queer, intersex, asexual, Two Spirit, and other persons who identify as part of this community) populations.
OBJECTIVE: In order to expedite evidence synthesis and research gap discoveries, this pilot study has the following three aims: (1) to compile a specialized semantic database for tobacco policy research to extract information from journal article abstracts, (2) to develop natural language processing (NLP) algorithms that comprehend the literature on nicotine and tobacco product use among sexual and gender diverse populations, and (3) to compare the discoveries of the NLP algorithms with an ongoing systematic review of tobacco policy research among LGBTQ+ populations.
METHODS: We built a tobacco research domain-specific semantic database using data from 2993 paper abstracts from 4 leading tobacco-specific journals, with enrichment from other publicly available sources. We then trained an NLP model to extract named entities after learning patterns and relationships between words and their context in text, which further enriched the semantic database. Using this iterative process, we extracted and assessed studies relevant to LGBTQ+ tobacco control issues, further comparing our findings with an ongoing systematic review that also focuses on evidence synthesis for this demographic group.
RESULTS: In total, 33 studies were identified as relevant to sexual and gender diverse individuals' nicotine and tobacco product use. Consistent with the ongoing systematic review, the NLP results showed that there is a scarcity of studies assessing policy impact on this demographic using causal inference methods. In addition, the literature is dominated by US data. We found that the product drawing the most attention in the body of existing research is cigarettes or cigarette smoking and that the number of studies of various age groups is almost evenly distributed between youth or young adults and adults, consistent with the research needs identified by the US health agencies.
CONCLUSIONS: Our pilot study serves as a compelling demonstration of the capabilities of NLP tools in expediting the processes of evidence synthesis and the identification of research gaps. While future research is needed to statistically test the NLP tool's performance, there is potential for NLP tools to fundamentally transform the approach to evidence synthesis.
Affiliation(s)
- Shaoying Ma
  - Center for Tobacco Research, The Ohio State University Comprehensive Cancer Center, Columbus, OH, United States
- Shuning Jiang
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
- Olivia Yang
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
- Xuanzhi Zhang
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
- Yu Fu
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
- Yusen Zhang
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
- Aadeeba Kaareen
  - Center for Tobacco Research, The Ohio State University Comprehensive Cancer Center, Columbus, OH, United States
- Meng Ling
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
- Jian Chen
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
- Ce Shang
  - Center for Tobacco Research, The Ohio State University Comprehensive Cancer Center, Columbus, OH, United States
4
Lara-Clares A, Lastra-Díaz JJ, Garcia-Serrano A. A reproducible experimental survey on biomedical sentence similarity: A string-based method sets the state of the art. PLoS One 2022; 17:e0276539. [PMID: 36409715 PMCID: PMC9678326 DOI: 10.1371/journal.pone.0276539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 10/08/2022] [Indexed: 11/22/2022]
Abstract
This registered report introduces the largest and, for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most current methods; (3) to evaluate several unexplored sentence similarity methods; (4) to evaluate for the first time an unexplored benchmark, called Corpus-Transcriptional-Regulation (CTR); (5) to carry out a study on the impact of the pre-processing stages and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (6) to bridge the lack of software and data reproducibility resources for methods and experiments in this line of research. Our reproducible experimental survey is based on a single software platform, which is provided with a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results. In addition, we introduce a new aggregated string-based sentence similarity method, called LiBlock, together with eight variants of current ontology-based methods, and a new pre-trained word embedding model trained on the full-text articles in the PMC-BioC corpus. Our experiments show that our novel string-based measure establishes the new state of the art in sentence similarity analysis in the biomedical domain and significantly outperforms all the methods evaluated herein, with the only exception of one ontology-based method. Likewise, our experiments confirm that the pre-processing stages, and the choice of the NER tool for ontology-based methods, have a very significant impact on the performance of the sentence similarity methods. We also detail some drawbacks and limitations of current methods, and highlight the need to refine the current benchmarks. Finally, a notable finding is that our new string-based method significantly outperforms all state-of-the-art Machine Learning (ML) models evaluated herein.
Affiliation(s)
- Alicia Lara-Clares
  - NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Juan J. Lastra-Díaz
  - NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Ana Garcia-Serrano
  - NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
5
Chen Q, Du J, Allot A, Lu Z. LitMC-BERT: Transformer-Based Multi-Label Classification of Biomedical Literature With An Application on COVID-19 Literature Curation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2584-2595. [PMID: 35536809 PMCID: PMC9647722 DOI: 10.1109/tcbb.2022.3173562] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/11/2021] [Revised: 04/19/2022] [Accepted: 04/22/2022] [Indexed: 05/20/2023]
Abstract
The rapid growth of biomedical literature poses a significant challenge for curation and interpretation. This has become more evident during the COVID-19 pandemic. LitCovid, a literature database of COVID-19 related papers in PubMed, has accumulated over 200,000 articles with millions of accesses. Approximately 10,000 new articles are added to LitCovid every month. A main curation task in LitCovid is topic annotation, where an article is assigned up to eight topics, e.g., Treatment and Diagnosis. The annotated topics have been widely used both in LitCovid (e.g., accounting for ∼18% of total uses) and downstream studies such as network generation. However, topic annotation has been a primary curation bottleneck due to the nature of the task and the rapid literature growth. This study proposes LitMC-BERT, a transformer-based multi-label classification method for biomedical literature. It uses a shared transformer backbone for all the labels while also capturing label-specific features and the correlations between label pairs. We compare LitMC-BERT with three baseline models on two datasets. Its micro-F1 and instance-based F1 are 5% and 4% higher than the current best results, respectively, and it requires only ∼18% of the inference time of the Binary BERT baseline. The related datasets and models are available via https://github.com/ncbi/ml-transformer.
6
Chen Q, Allot A, Leaman R, Islamaj R, Du J, Fang L, Wang K, Xu S, Zhang Y, Bagherzadeh P, Bergler S, Bhatnagar A, Bhavsar N, Chang YC, Lin SJ, Tang W, Zhang H, Tavchioski I, Pollak S, Tian S, Zhang J, Otmakhova Y, Yepes AJ, Dong H, Wu H, Dufour R, Labrak Y, Chatterjee N, Tandon K, Laleye FAA, Rakotoson L, Chersoni E, Gu J, Friedrich A, Pujari SC, Chizhikova M, Sivadasan N, Vg S, Lu Z. Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. Database (Oxford) 2022; 2022:baac069. [PMID: 36043400 PMCID: PMC9428574 DOI: 10.1093/database/baac069] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/20/2022] [Revised: 08/02/2022] [Accepted: 08/13/2022] [Indexed: 05/03/2023]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has been severely impacting global society since December 2019. The related findings such as vaccine and drug development have been reported in biomedical literature at a rate of about 10 000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200 000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g. Diagnosis and Treatment) to the articles in LitCovid. The annotated topics have been widely used for navigating the COVID literature, rapidly locating articles of interest and other downstream studies. However, annotating the topics has been the bottleneck of manual curation. Despite the continuing advances in biomedical text-mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30 000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multi-label classification datasets in biomedical scientific literature. Nineteen teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181 and 0.9394 for macro-F1-score, micro-F1-score and instance-based F1-score, respectively. Notably, these scores are substantially higher (e.g. 12% higher for macro F1-score) than the corresponding scores of the state-of-the-art multi-label classification method. The level of participation and results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available for benchmarking and further development. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/
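The three scores reported in this abstract differ only in how counts are aggregated, which a small sketch may make concrete. It uses the standard definitions (macro-F1 averages per-label F1, micro-F1 pools counts over all labels, instance-based F1 averages per-article F1); the function names and toy annotations are hypothetical, and scoring an empty comparison as 0.0 is an assumed convention rather than the track's official evaluation script.

```python
from typing import Dict, List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(gold: List[Set[str]],
                  pred: List[Set[str]],
                  labels: List[str]) -> Tuple[float, float, float]:
    """Return (macro-F1, micro-F1, instance-based F1) for multi-label predictions."""
    # Per-label [tp, fp, fn] counts accumulated over all articles.
    counts: Dict[str, List[int]] = {lab: [0, 0, 0] for lab in labels}
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in g and lab in p:
                counts[lab][0] += 1
            elif lab in p:
                counts[lab][1] += 1
            elif lab in g:
                counts[lab][2] += 1

    # Macro: average the F1 of each label, so rare labels weigh as much as common ones.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)

    # Micro: pool the counts over all labels, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)

    # Instance-based: compute F1 per article over its label set, then average.
    instance = sum(f1(len(g & p), len(p - g), len(g - p))
                   for g, p in zip(gold, pred)) / len(gold)
    return macro, micro, instance


# Hypothetical toy annotations for three articles.
toy_labels = ["Treatment", "Diagnosis", "Prevention"]
toy_gold = [{"Treatment", "Diagnosis"}, {"Prevention"}, {"Treatment"}]
toy_pred = [{"Treatment"}, {"Prevention", "Diagnosis"}, {"Treatment"}]
macro_f1, micro_f1, instance_f1 = multilabel_f1(toy_gold, toy_pred, toy_labels)
```

On this toy input the single confused label (Diagnosis) lowers macro-F1 the most, since it counts as one of only three labels, whereas micro-F1 and instance-based F1 dilute the error across all label decisions and articles.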
Affiliation(s)
- Qingyu Chen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Alexis Allot
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Robert Leaman
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Rezarta Islamaj
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
- Jingcheng Du
  - School of Biomedical Informatics, UT Health, Houston, TX 77030, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Shuo Xu
  - College of Economics and Management, Beijing University of Technology, Beijing, QC, China
- Yuefu Zhang
  - College of Economics and Management, Beijing University of Technology, Beijing, QC, China
- Yung-Chun Chang
  - Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
- Sheng-Jie Lin
  - Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
- Wentai Tang
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Hongtong Zhang
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, China
- Ilija Tavchioski
  - Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
  - Jožef Stefan Institute, Ljubljana, Slovenia
- Shubo Tian
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Jinfeng Zhang
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Yulia Otmakhova
  - School of Computing and Information Systems, University of Melbourne, Melbourne, AU-VIC, Australia
- Hang Dong
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, UK
- Honghan Wu
  - Institute of Health Informatics, University College London, London, UK
- Niladri Chatterjee
  - Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, India
- Kushagri Tandon
  - Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, India
- Emmanuele Chersoni
  - Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
- Jinghang Gu
  - Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
- Subhash Chandra Pujari
  - Institute of Computer Science, Heidelberg University, Heidelberg, Germany
  - Bosch Center for Artificial Intelligence, Renningen, Germany
- Mariia Chizhikova
  - SINAI Group, Department of Computer Science, Advanced Studies Center in ICT (CEATIC), Universidad de Jaén, Jaén, Spain
- Zhiyong Lu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA