1. Yeganova L, Kim W, Tian S, Comeau DC, Wilbur WJ, Lu Z. LitSense 2.0: AI-powered biomedical information retrieval with sentence and passage level knowledge discovery. Nucleic Acids Res 2025:gkaf417. [PMID: 40377097 DOI: 10.1093/nar/gkaf417]
Abstract
LitSense 2.0 (https://www.ncbi.nlm.nih.gov/research/litsense2/) is an advanced biomedical search system enhanced with dense vector semantic retrieval, designed for accessing literature at the sentence and paragraph level. It provides unified access to 38 million PubMed abstracts and 6.6 million full-length articles in the PubMed Central (PMC) Open Access subset, encompassing 1.4 billion sentences and ∼300 million paragraphs, and is updated weekly. Compared to PubMed and PMC, the primary platforms for biomedical information search, LitSense offers cross-platform functionality by searching seamlessly across both resources and returning relevant results at a more granular level. Building on the success of the original LitSense launched in 2018, LitSense 2.0 introduces two major enhancements. The first is the addition of paragraph-level search: users can now choose to search either against sentences or against paragraphs. The second is improved retrieval accuracy via a state-of-the-art biomedical text encoder, ensuring more reliable identification of relevant results across the entire biomedical literature.
Affiliation(s)
- Lana Yeganova
- Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
- Won Kim
- Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
- Shubo Tian
- Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
- Donald C Comeau
- Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
- W John Wilbur
- Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
- Zhiyong Lu
- Division of Intramural Research (DIR), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, United States
2. Bonfigli A, Bacco L, Pecchia L, Merone M, Dell'Orletta F. Efficient multi-task learning with instance selection for biomedical NLP. Comput Biol Med 2025;190:110050. [PMID: 40168806 DOI: 10.1016/j.compbiomed.2025.110050]
Abstract
BACKGROUND Biomedical natural language processing (NLP) increasingly relies on large language models and extensive datasets, presenting significant computational challenges. METHODS We propose Blue5, a multi-task model based on SciFive that incorporates instance selection (IS) to enable efficient multi-task learning (MTL) on biomedical data. We adapt the E2SC-IS framework for the biomedical domain, integrating a calibrated SVM classifier to reduce computational costs. RESULTS Our approach achieves an average data reduction of 26.6% across the tasks of the BLUE (Biomedical Language Understanding Evaluation) benchmark, while maintaining performance comparable to state-of-the-art models. The multi-task SVM configuration emerges as the most effective, demonstrating the power of combining IS with MTL for biomedical NLP. As a result of the unified framework, Blue5 effectively selects the most informative instances across tasks, ensuring model generalization while efficiently handling multiple NLP tasks. CONCLUSION Our work offers a practical solution to address growing computational demands, enabling more scalable and accessible applications of advanced NLP techniques in biomedical research and healthcare.
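The instance-selection idea in this abstract can be pictured with a toy sketch: score each training example by the uncertainty (entropy) of a calibrated classifier's predicted class probabilities and keep the most informative fraction. This is a generic, dependency-free illustration, not the E2SC-IS algorithm itself; all names and numbers below are invented.

```python
# Toy sketch of confidence-based instance selection (generic idea only):
# keep the examples a calibrated classifier is least certain about.
import math

def entropy(probs):
    """Shannon entropy of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_instances(instances, probs, keep_frac=0.75):
    """Keep the `keep_frac` most uncertain (most informative) instances.

    instances : list of training examples
    probs     : per-instance class-probability vectors, e.g. from an
                SVM calibrated with Platt scaling
    """
    scored = sorted(zip(instances, probs),
                    key=lambda pair: entropy(pair[1]),
                    reverse=True)
    k = max(1, int(len(instances) * keep_frac))
    return [inst for inst, _ in scored[:k]]

examples = ["s1", "s2", "s3", "s4"]
confidences = [[0.99, 0.01], [0.55, 0.45], [0.90, 0.10], [0.60, 0.40]]
kept = select_instances(examples, confidences, keep_frac=0.5)
# "s2" and "s4" survive: the classifier is least sure about them
```

Dropping the high-confidence (low-entropy) examples is what yields the data reduction the abstract reports while retaining the instances that carry the most training signal.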
Affiliation(s)
- Agnese Bonfigli
- ItaliaNLP Lab, Institute of Computational Linguistics "Antonio Zampolli", National Research Council, Via Giuseppe Moruzzi 1, 56124 Pisa, Italy; Research Unit of Intelligent Technology for Health and Wellbeing, Department of Engineering, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo 21, 00128 Rome, Italy
- Luca Bacco
- ItaliaNLP Lab, Institute of Computational Linguistics "Antonio Zampolli", National Research Council, Via Giuseppe Moruzzi 1, 56124 Pisa, Italy; Research Unit of Computer Systems and Bioinformatics, Department of Engineering, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo 21, 00128 Rome, Italy
- Leandro Pecchia
- Research Unit of Intelligent Technology for Health and Wellbeing, Department of Engineering, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo 21, 00128 Rome, Italy; Fondazione Policlinico Universitario Campus Bio-Medico di Roma, Via Alvaro del Portillo 200, 00128 Rome, Italy
- Mario Merone
- Research Unit of Intelligent Technology for Health and Wellbeing, Department of Engineering, Università Campus Bio-Medico di Roma, Via Alvaro del Portillo 21, 00128 Rome, Italy
- Felice Dell'Orletta
- ItaliaNLP Lab, Institute of Computational Linguistics "Antonio Zampolli", National Research Council, Via Giuseppe Moruzzi 1, 56124 Pisa, Italy
3. Neveditsin N, Lingras P, Mago V. Clinical insights: A comprehensive review of language models in medicine. PLOS Digit Health 2025;4:e0000800. [PMID: 40338967 PMCID: PMC12061104 DOI: 10.1371/journal.pdig.0000800]
Abstract
This paper explores the advancements and applications of language models in healthcare, focusing on their clinical use cases. It examines the evolution from early encoder-based systems requiring extensive fine-tuning to state-of-the-art large language and multimodal models capable of integrating text and visual data through in-context learning. The analysis emphasizes locally deployable models, which enhance data privacy and operational autonomy, and their applications in tasks such as text generation, classification, information extraction, and conversational systems. The paper also highlights a structured organization of tasks and a tiered ethical approach, providing a valuable resource for researchers and practitioners, while discussing key challenges related to ethics, evaluation, and implementation.
Affiliation(s)
- Nikita Neveditsin
- Department of Mathematics and Computing Science, Saint Mary’s University, Halifax, Nova Scotia, Canada
- Pawan Lingras
- Department of Mathematics and Computing Science, Saint Mary’s University, Halifax, Nova Scotia, Canada
- Vijay Mago
- School of Health Policy and Management, York University, Toronto, Ontario, Canada
4. Chen Q, Hu Y, Peng X, Xie Q, Jin Q, Gilson A, Singer MB, Ai X, Lai PT, Wang Z, Keloth VK, Raja K, Huang J, He H, Lin F, Du J, Zhang R, Zheng WJ, Adelman RA, Lu Z, Xu H. Benchmarking large language models for biomedical natural language processing applications and recommendations. Nat Commun 2025;16:3280. [PMID: 40188094 PMCID: PMC11972378 DOI: 10.1038/s41467-025-56989-2]
Abstract
The rapid growth of biomedical literature poses challenges for manual knowledge curation and synthesis. Biomedical Natural Language Processing (BioNLP) automates the process. While Large Language Models (LLMs) have shown promise in general domains, their effectiveness in BioNLP tasks remains unclear due to limited benchmarks and practical guidelines. We perform a systematic evaluation of four LLMs (GPT and LLaMA representatives) on 12 BioNLP benchmarks across six applications. We compare their zero-shot, few-shot, and fine-tuning performance with the traditional fine-tuning of BERT or BART models. We examine inconsistencies, missing information, and hallucinations, and perform a cost analysis. Here, we show that traditional fine-tuning outperforms zero- or few-shot LLMs in most tasks. However, closed-source LLMs like GPT-4 excel in reasoning-related tasks such as medical question answering. Open-source LLMs still require fine-tuning to close performance gaps. We find issues like missing information and hallucinations in LLM outputs. These results offer practical insights for applying LLMs in BioNLP.
Affiliation(s)
- Qingyu Chen
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Yan Hu
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Xueqing Peng
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Qianqian Xie
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Qiao Jin
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Aidan Gilson
- Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Maxwell B Singer
- Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Xuguang Ai
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Po-Ting Lai
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Zhizheng Wang
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Vipina K Keloth
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Kalpana Raja
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Jimin Huang
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Huan He
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Fongci Lin
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Jingcheng Du
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Rui Zhang
- Division of Computational Health Sciences, Department of Surgery, Medical School, University of Minnesota, Minneapolis, MN, USA
- Center for Learning Health System Sciences, University of Minnesota, Minneapolis, MN 55455, USA
- W Jim Zheng
- McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Ron A Adelman
- Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, CT, USA
- Zhiyong Lu
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Hua Xu
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA
5. Remy F, Demuynck K, Demeester T. BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights. J Am Med Inform Assoc 2024;31:1844-1855. [PMID: 38412333 PMCID: PMC11339519 DOI: 10.1093/jamia/ocae029]
Abstract
OBJECTIVE In this study, we investigate the potential of large language models (LLMs) to complement biomedical knowledge graphs in the training of semantic models for the biomedical and clinical domains. MATERIALS AND METHODS Drawing on the wealth of the Unified Medical Language System knowledge graph and harnessing cutting-edge LLMs, we propose a new state-of-the-art approach for obtaining high-fidelity representations of biomedical concepts and sentences, consisting of 3 steps: an improved contrastive learning phase, a novel self-distillation phase, and a weight averaging phase. RESULTS Through rigorous evaluations of diverse downstream tasks, we demonstrate consistent and substantial improvements over the previous state of the art for semantic textual similarity (STS), biomedical concept representation (BCR), and clinical named entity linking, across 15+ datasets. Besides our new state-of-the-art biomedical model for English, we also distill and release a multilingual model compatible with 50+ languages and fine-tuned on 7 European languages. DISCUSSION Many clinical pipelines can benefit from our latest models. Our new multilingual model enables a range of languages to benefit from our advancements in biomedical semantic representation learning, opening a new avenue for bioinformatics researchers around the world. As a result, we hope to see BioLORD-2023 becoming a precious tool for future biomedical applications. CONCLUSION In this article, we introduced BioLORD-2023, a state-of-the-art model for STS and BCR designed for the clinical domain.
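The weight averaging phase mentioned in this abstract can be pictured as an element-wise mean over the parameters of several checkpoints ("model soup" style). The sketch below is a dependency-free illustration of that generic operation, not the BioLORD-2023 training code; the checkpoint names and values are invented, and tensors are plain Python lists.

```python
# Minimal sketch of checkpoint weight averaging: given several state
# dicts mapping parameter name -> list of floats, return their
# element-wise mean.
def average_weights(checkpoints):
    """Element-wise mean of several state dicts with identical shapes."""
    n = len(checkpoints)
    return {
        name: [sum(ckpt[name][i] for ckpt in checkpoints) / n
               for i in range(len(checkpoints[0][name]))]
        for name in checkpoints[0]
    }

# Two hypothetical fine-tuned checkpoints of the same model
ckpt_contrastive = {"layer.weight": [1.0, 2.0]}
ckpt_distilled   = {"layer.weight": [3.0, 4.0]}
merged = average_weights([ckpt_contrastive, ckpt_distilled])
# merged["layer.weight"] == [2.0, 3.0]
```

Averaging checkpoints from different training phases is a cheap way to combine their strengths without further gradient updates, which is the role this step plays in the three-step recipe above.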
Affiliation(s)
- François Remy
- Internet and Data Science Lab, imec, Ghent University, Ghent, Belgium
- Kris Demuynck
- Internet and Data Science Lab, imec, Ghent University, Ghent, Belgium
- Thomas Demeester
- Internet and Data Science Lab, imec, Ghent University, Ghent, Belgium
6. Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024;154:102900. [PMID: 38878555 PMCID: PMC11638972 DOI: 10.1016/j.artmed.2024.102900]
Abstract
With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we also include articles that used the transformer architecture for generating surgical instructions and predicting adverse outcomes after surgery under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
Affiliation(s)
- Subhash Nerella
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Jiaqing Zhang
- Department of Electrical and Computer Engineering, University of Florida, Gainesville, United States
- Miguel Contreras
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Scott Siegel
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Aysegul Bumin
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Brandon Silva
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, United States
- Jessica Sena
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Benjamin Shickel
- Department of Medicine, University of Florida, Gainesville, United States
- Azra Bihorac
- Department of Medicine, University of Florida, Gainesville, United States
- Kia Khezeli
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
- Parisa Rashidi
- Department of Biomedical Engineering, University of Florida, Gainesville, United States
7. Wada S, Takeda T, Okada K, Manabe S, Konishi S, Kamohara J, Matsumura Y. Oversampling effect in pretraining for bidirectional encoder representations from transformers (BERT) to localize medical BERT and enhance biomedical BERT. Artif Intell Med 2024;153:102889. [PMID: 38728811 DOI: 10.1016/j.artmed.2024.102889]
Abstract
BACKGROUND Pretraining large-scale neural language models on raw texts has made a significant contribution to improving transfer learning in natural language processing. With the introduction of transformer-based language models, such as bidirectional encoder representations from transformers (BERT), the performance of information extraction from free text has improved significantly in both the general and medical domains. However, it is difficult to train specific BERT models to perform well in domains for which few databases of a high quality and large size are publicly available. OBJECTIVE We hypothesized that this problem could be addressed by oversampling a domain-specific corpus and using it for pretraining with a larger corpus in a balanced manner. In the present study, we verified our hypothesis by developing pretraining models using our method and evaluating their performance. METHODS Our proposed method was based on the simultaneous pretraining of models with knowledge from distinct domains after oversampling. We conducted three experiments in which we generated (1) English biomedical BERT from a small biomedical corpus, (2) Japanese medical BERT from a small medical corpus, and (3) enhanced biomedical BERT pretrained with complete PubMed abstracts in a balanced manner. We then compared their performance with those of conventional models. RESULTS Our English BERT pretrained using both general and small medical domain corpora performed sufficiently well for practical use on the biomedical language understanding evaluation (BLUE) benchmark. Moreover, our proposed method was more effective than the conventional methods for each biomedical corpus of the same corpus size in the general domain. Our Japanese medical BERT outperformed the other BERT models built using a conventional method for almost all the medical tasks. The model demonstrated the same trend as that of the first experiment in English. Further, our enhanced biomedical BERT model, which was not pretrained on clinical notes, achieved superior clinical and biomedical scores on the BLUE benchmark with an increase of 0.3 points in the clinical score and 0.5 points in the biomedical score. These scores were above those of the models trained without our proposed method. CONCLUSIONS Well-balanced pretraining using oversampling instances derived from a corpus appropriate for the target task allowed us to construct a high-performance BERT model.
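The balanced oversampling strategy this abstract describes amounts to cycling a small domain corpus alongside a large general corpus so both contribute to pretraining in comparable proportion. Below is a minimal sketch of that interleaving idea only; it is illustrative, not the paper's pipeline, and real pretraining samplers shuffle and operate at far larger scale.

```python
# Rough sketch of balanced corpus mixing: the small domain corpus is
# cycled (oversampled) so every general-domain document is paired with
# a domain-specific one.
import itertools

def balanced_stream(general_corpus, domain_corpus):
    """Interleave a large general corpus with a small domain corpus,
    repeating the small one so both are seen equally often."""
    domain_cycle = itertools.cycle(domain_corpus)
    for doc in general_corpus:
        yield doc
        yield next(domain_cycle)  # domain docs repeat -> oversampling

general = ["g1", "g2", "g3", "g4"]   # stand-in for a large general corpus
domain = ["d1", "d2"]                # stand-in for a small medical corpus
mixed = list(balanced_stream(general, domain))
# ['g1', 'd1', 'g2', 'd2', 'g3', 'd1', 'g4', 'd2']
```

The resulting stream gives the model equal exposure to both domains, which is the "balanced manner" the authors hypothesize compensates for the small size of the domain corpus.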
Affiliation(s)
- Shoya Wada
- Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
- Toshihiro Takeda
- Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
- Katsuki Okada
- Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
- Shirou Manabe
- Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
- Shozo Konishi
- Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
- Yasushi Matsumura
- Department of Medical Informatics, Osaka University Graduate School of Medicine, Japan
8. Livne M, Miftahutdinov Z, Tutubalina E, Kuznetsov M, Polykovskiy D, Brundyn A, Jhunjhunwala A, Costa A, Aliper A, Aspuru-Guzik A, Zhavoronkov A. nach0: multimodal natural and chemical languages foundation model. Chem Sci 2024;15:8380-8389. [PMID: 38846388 PMCID: PMC11151847 DOI: 10.1039/d4sc00966e]
Abstract
Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthesis, attributes prediction, and others. nach0 is a multi-domain and multi-task encoder-decoder LLM pre-trained on unlabeled text from scientific literature, patents, and molecule strings to incorporate a range of chemical and linguistic knowledge. We employed instruction tuning, where specific task-related instructions are utilized to fine-tune nach0 for the final set of tasks. To train nach0 effectively, we leverage the NeMo framework, enabling efficient parallel optimization of both base and large model versions. Extensive experiments demonstrate that our model outperforms state-of-the-art baselines on single-domain and cross-domain tasks. Furthermore, it can generate high-quality outputs in molecular and textual formats, showcasing its effectiveness in multi-domain setups.
Affiliation(s)
- Micha Livne
- NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
- Zulfat Miftahutdinov
- Insilico Medicine Canada Inc., 3710-1250 René-Lévesque West, Montreal, Quebec, Canada
- Elena Tutubalina
- Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong
- Maksim Kuznetsov
- Insilico Medicine Canada Inc., 3710-1250 René-Lévesque West, Montreal, Quebec, Canada
- Daniil Polykovskiy
- Insilico Medicine Canada Inc., 3710-1250 René-Lévesque West, Montreal, Quebec, Canada
- Annika Brundyn
- NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
- Anthony Costa
- NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
- Alex Aliper
- Insilico Medicine AI Ltd., Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, United Arab Emirates
- Alán Aspuru-Guzik
- University of Toronto, Lash Miller Building, 80 St. George Street, Toronto, Ontario, Canada
- Alex Zhavoronkov
- Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W, Phase 2, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong
9. Huang X, Gong H. A Dual-Attention Learning Network With Word and Sentence Embedding for Medical Visual Question Answering. IEEE Trans Med Imaging 2024;43:832-845. [PMID: 37812550 DOI: 10.1109/tmi.2023.3322868]
Abstract
Research in medical visual question answering (MVQA) can contribute to the development of computer-aided diagnosis. MVQA is a task that aims to predict accurate and convincing answers based on given medical images and associated natural language questions. This task requires extracting medical knowledge-rich feature content and developing a fine-grained understanding of it. Therefore, constructing an effective feature extraction and understanding scheme is key to modeling. Existing MVQA question extraction schemes mainly focus on word information, ignoring medical information in the text, such as medical concepts and domain-specific terms. Meanwhile, some visual and textual feature understanding schemes cannot effectively capture the correlation between regions and keywords for reasonable visual reasoning. In this study, a dual-attention learning network with word and sentence embedding (DALNet-WSE) is proposed. We design a module, transformer with sentence embedding (TSE), to extract a double embedding representation of questions containing keywords and medical information. A dual-attention learning (DAL) module consisting of self-attention and guided attention is proposed to model intensive intramodal and intermodal interactions. With multiple DAL modules (DALs), learning visual and textual co-attention can increase the granularity of understanding and improve visual reasoning. Experimental results on the ImageCLEF 2019 VQA-MED (VQA-MED 2019) and VQA-RAD datasets demonstrate that our proposed method outperforms previous state-of-the-art methods. According to the ablation studies and Grad-CAM maps, DALNet-WSE can extract rich textual information and has strong visual reasoning ability.
10. Zuo X, Zhou Y, Duke J, Hripcsak G, Shah N, Banda JM, Reeves R, Miller T, Waitman LR, Natarajan K, Xu H. Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach. AMIA Annu Symp Proc 2024;2023:834-843. [PMID: 38222429 PMCID: PMC10785935]
Abstract
The types of clinical notes in electronic health records (EHRs) are diverse, and standardizing them is necessary to ensure unified data retrieval, exchange, and integration. The LOINC Document Ontology (DO) is a subset of LOINC that is created specifically for naming and describing clinical documents. Despite the efforts of promoting and improving this ontology, how to efficiently deploy it in real-world clinical settings has yet to be explored. In this study we evaluated the utility of LOINC DO by mapping clinical note titles collected from five institutions to the LOINC DO and classifying the mapping into three classes based on semantic similarity between note titles and LOINC DO codes. Additionally, we developed a standardization pipeline that automatically maps clinical note titles from multiple sites to suitable LOINC DO codes, without accessing the content of clinical notes. The pipeline can be initialized with different large language models, and we compared the performances between them. The results showed that our automated pipeline achieved an accuracy of 0.90. By comparing the manual and automated mapping results, we analyzed the coverage of LOINC DO in describing multi-site clinical note titles and summarized the potential scope for extension.
Affiliation(s)
- Xu Zuo
- University of Texas Health Science Center at Houston, Houston, TX, USA
- Yujia Zhou
- University of Texas Health Science Center at Houston, Houston, TX, USA
- Jon Duke
- Georgia Institute of Technology, Atlanta, GA, USA
- OHDSI Consortium, Natural Language Processing Working Group
- George Hripcsak
- Columbia University, New York City, NY, USA
- OHDSI Consortium, Natural Language Processing Working Group
- Nigam Shah
- Stanford University, Stanford, CA, USA
- OHDSI Consortium, Natural Language Processing Working Group
- Juan M Banda
- Georgia State University, Atlanta, GA, USA
- OHDSI Consortium, Natural Language Processing Working Group
- Ruth Reeves
- Vanderbilt University Medical Center, Nashville, TN, USA
- OHDSI Consortium, Natural Language Processing Working Group
- Timothy Miller
- Boston Children's Hospital, Boston, MA, USA
- OHDSI Consortium, Natural Language Processing Working Group
- Karthik Natarajan
- Columbia University, New York City, NY, USA
- OHDSI Consortium, Natural Language Processing Working Group
- Hua Xu
- Yale University, New Haven, CT, USA
- OHDSI Consortium, Natural Language Processing Working Group
11. Jin Q, Kim W, Chen Q, Comeau DC, Yeganova L, Wilbur WJ, Lu Z. MedCPT: Contrastive Pre-trained Transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval. Bioinformatics 2023;39:btad651. [PMID: 37930897 PMCID: PMC10627406 DOI: 10.1093/bioinformatics/btad651]
Abstract
MOTIVATION Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching. In response, we introduce MedCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot semantic IR in biomedicine. RESULTS To train MedCPT, we collected an unprecedented scale of 255 million user click logs from PubMed. With such data, we use contrastive learning to train a pair of closely integrated retriever and re-ranker. Experimental results show that MedCPT sets new state-of-the-art performance on six biomedical IR tasks, outperforming various baselines including much larger models, such as GPT-3-sized cpt-text-XL. In addition, MedCPT also generates better biomedical article and sentence representations for semantic evaluations. As such, MedCPT can be readily applied to various real-world biomedical IR tasks. AVAILABILITY AND IMPLEMENTATION The MedCPT code and model are available at https://github.com/ncbi/MedCPT.
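Bi-encoder (dense) semantic retrieval of the kind this abstract describes ranks articles by the similarity of their embeddings to a query embedding. The toy sketch below uses hand-made vectors in place of encoder outputs; the identifiers and values are illustrative stand-ins, not MedCPT's actual models or API (the released code is at the repository linked above).

```python
# Toy sketch of dense bi-encoder retrieval: embed query and articles
# into one vector space, then rank articles by cosine similarity.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def retrieve(query_vec, article_vecs, top_k=2):
    """Return article ids ranked by cosine similarity to the query."""
    ranked = sorted(article_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Hand-made stand-ins for encoder outputs
articles = {
    "pmid:1": [0.9, 0.1, 0.0],   # about topic A
    "pmid:2": [0.1, 0.9, 0.1],   # about topic B
    "pmid:3": [0.8, 0.2, 0.1],   # also about topic A
}
query = [1.0, 0.0, 0.0]          # a query on topic A
top = retrieve(query, articles)
# top == ["pmid:1", "pmid:3"]
```

In a full system such as the one described here, a fast retriever of this kind produces candidates and a cross-encoder re-ranker then re-scores them; the contrastive training signal comes from which articles users clicked for a query.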
Affiliation(s)
- Qiao Jin
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, United States
- Won Kim
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, United States
- Qingyu Chen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, United States
- Donald C Comeau
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, United States
- Lana Yeganova
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, United States
- W John Wilbur
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, United States
- Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, United States
Collapse
12
Sun Z, Lin M, Zhu Q, Xie Q, Wang F, Lu Z, Peng Y. A scoping review on multimodal deep learning in biomedical images and texts. J Biomed Inform 2023; 146:104482. [PMID: 37652343] [PMCID: PMC10591890] [DOI: 10.1016/j.jbi.2023.104482] [Citations in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 07/18/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
OBJECTIVE Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which integrates multiple sources of data such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data, yet it has only recently attracted researchers' attention. To this end, there is a critical need to conduct a systematic review of this topic, identify the limitations of current work, and explore future directions. METHODS In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps, focusing on joint learning over biomedical images and texts, mainly because these were the two most commonly available data types in MDL research. RESULTS This study reviewed the current uses of multimodal deep learning on five tasks: (1) report generation, (2) visual question answering, (3) cross-modal retrieval, (4) computer-aided diagnosis, and (5) semantic segmentation. CONCLUSION Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate collaboration between the natural language processing (NLP) and medical imaging communities and support the development of the next generation of decision-making and computer-assisted diagnostic systems.
Affiliation(s)
- Zhaoyi Sun: Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA
- Mingquan Lin: Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA
- Qingqing Zhu: National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
- Qianqian Xie: Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA
- Fei Wang: Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA
- Zhiyong Lu: National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
- Yifan Peng: Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA
13
Chen Q, Sun H, Liu H, Jiang Y, Ran T, Jin X, Xiao X, Lin Z, Chen H, Niu Z. An extensive benchmark study on biomedical text generation and mining with ChatGPT. Bioinformatics 2023; 39:btad557. [PMID: 37682111] [PMCID: PMC10562950] [DOI: 10.1093/bioinformatics/btad557] [Citations in RCA: 33] [Impact Index Per Article: 16.5] [Received: 04/24/2023] [Revised: 08/09/2023] [Accepted: 09/06/2023] [Indexed: 09/09/2023]
Abstract
MOTIVATION In recent years, advances in natural language processing (NLP) and deep learning hardware have led to significant improvements in large language models (LLMs). ChatGPT, the state-of-the-art LLM built on GPT-3.5 and GPT-4, shows excellent capabilities in general language understanding and reasoning, and GPT models have achieved excellent results on a variety of NLP tasks and benchmarks. Encouraged by this performance in everyday conversation, researchers have begun probing ChatGPT's capacity in areas that require professional expertise; here we focus on the biomedical domain. RESULTS To evaluate the performance of ChatGPT on biomedical tasks, this article presents a comprehensive benchmark study of ChatGPT on biomedical corpora, including article abstracts, clinical trial descriptions, and biomedical questions. Typical NLP tasks such as named entity recognition, relation extraction, sentence similarity, question answering, and document classification are included. Overall, ChatGPT obtained a BLURB score of 58.50, while the state-of-the-art model scored 84.30. Through a series of experiments, we demonstrate the effectiveness and versatility of ChatGPT in biomedical text understanding, reasoning, and generation, as well as the limitations of the ChatGPT version built on GPT-3.5. AVAILABILITY AND IMPLEMENTATION All the datasets are available from the BLURB benchmark https://microsoft.github.io/BLURB/index.html. The prompts are described in the article.
Affiliation(s)
- Qijie Chen: AIDD, Mindrank AI Ltd, Zhejiang 310000, China
- Haotong Sun: AIDD, Mindrank AI Ltd, Zhejiang 310000, China
- Haoyang Liu: College of Life Sciences, Nankai University, Tianjin 300071, China; Guangzhou Laboratory, GuangDong 510005, China
- Ting Ran: Guangzhou Laboratory, GuangDong 510005, China
- Xurui Jin: AIDD, Mindrank AI Ltd, Zhejiang 310000, China
- Zhimin Lin: AIDD, Mindrank AI Ltd, Zhejiang 310000, China
- Zhangmin Niu: AIDD, Mindrank AI Ltd, Zhejiang 310000, China; National Heart and Lung Institute, Imperial College London, London, United Kingdom
14
Giancani S, Albertoni R, Catalano CE. Quality of word and concept embeddings in targetted biomedical domains. Heliyon 2023; 9:e16818. [PMID: 37332929] [PMCID: PMC10272317] [DOI: 10.1016/j.heliyon.2023.e16818] [Citations in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 05/29/2023] [Accepted: 05/30/2023] [Indexed: 06/20/2023]
Abstract
Embeddings are fundamental resources often reused for building intelligent systems in the biomedical context. As a result, evaluating the quality of previously trained embeddings and ensuring they cover the desired information is critical for the success of applications. This paper proposes a new evaluation methodology to test the coverage of embeddings against a targeted domain of interest. It defines measures to assess terminology, similarity, and analogy coverage, which are core aspects of embeddings. It then discusses experimentation carried out on existing biomedical embeddings in the specific context of pulmonary diseases. The proposed methodology and measures are general and may be applied to any application domain.
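The terminology-coverage idea described above can be illustrated with a toy sketch: the fraction of target-domain terms that an embedding vocabulary covers. The vocabulary and term list below are invented for the example, not the paper's data:

```python
# Toy illustration of a terminology-coverage measure: the share of
# domain terms that have an entry in an embedding's vocabulary.
# Vocabulary and term list are invented examples.

def terminology_coverage(domain_terms, embedding_vocab):
    """Fraction of domain terms present (case-insensitively) in the vocabulary."""
    covered = sum(1 for t in domain_terms if t.lower() in embedding_vocab)
    return covered / len(domain_terms)

vocab = {"asthma", "bronchitis", "copd", "inhaler"}      # hypothetical embedding vocab
terms = ["Asthma", "COPD", "spirometry", "bronchitis"]   # hypothetical pulmonary terms
print(terminology_coverage(terms, vocab))  # 3 of 4 terms covered -> 0.75
```

Similarity and analogy coverage would be measured analogously, but over pairs and quadruples of terms rather than single vocabulary lookups.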
Affiliation(s)
- Salvatore Giancani: Institut de Neurosciences de la Timone, Unité Mixte de Recherche 7289 Centre National de la Recherche Scientifique and Aix-Marseille Université, Faculty of Medicine, 27, Boulevard Jean Moulin, 13385 Marseille Cedex 05, France; Istituto di Matematica Applicata e Tecnologie Informatiche, Consiglio Nazionale delle Ricerche, Via De Marini 16, 16149 Genova, Italy
- Riccardo Albertoni: Istituto di Matematica Applicata e Tecnologie Informatiche, Consiglio Nazionale delle Ricerche, Via De Marini 16, 16149 Genova, Italy
- Chiara Eva Catalano: Istituto di Matematica Applicata e Tecnologie Informatiche, Consiglio Nazionale delle Ricerche, Via De Marini 16, 16149 Genova, Italy
15
Tinn R, Cheng H, Gu Y, Usuyama N, Liu X, Naumann T, Gao J, Poon H. Fine-tuning large neural language models for biomedical natural language processing. Patterns (New York, N.Y.) 2023; 4:100729. [PMID: 37123444] [PMCID: PMC10140607] [DOI: 10.1016/j.patter.2023.100729] [Citations in RCA: 29] [Impact Index Per Article: 14.5] [Received: 10/10/2022] [Revised: 12/12/2022] [Accepted: 03/17/2023] [Indexed: 05/02/2023]
Abstract
Large neural language models have transformed modern natural language processing (NLP) applications. However, fine-tuning such models for specific tasks remains challenging as model size increases, especially with small labeled datasets, which are common in biomedical NLP. We conduct a systematic study on fine-tuning stability in biomedical NLP. We show that fine-tuning performance may be sensitive to pretraining settings and conduct an exploration of techniques for addressing fine-tuning instability. We show that these techniques can substantially improve fine-tuning performance for low-resource biomedical NLP applications. Specifically, freezing lower layers is helpful for standard BERT-BASE models, while layerwise decay is more effective for BERT-LARGE and ELECTRA models. For low-resource text similarity tasks, such as BIOSSES, reinitializing the top layers is the optimal strategy. Overall, domain-specific vocabulary and pretraining facilitate robust models for fine-tuning. Based on these findings, we establish a new state of the art on a wide range of biomedical NLP applications.
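Two of the stabilization techniques named above, layerwise learning-rate decay and freezing lower layers, can be sketched as a per-layer learning-rate schedule. The hyperparameter values here are illustrative, not the paper's settings:

```python
# Sketch of layerwise learning-rate decay: deeper (earlier) layers get
# geometrically smaller learning rates than the top layer.
# base_lr, num_layers, and decay are illustrative values.

def layerwise_lrs(base_lr, num_layers, decay):
    """Layer 0 is the bottom; the top layer keeps base_lr, and each layer
    below it is scaled by an extra factor of `decay`."""
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

lrs = layerwise_lrs(base_lr=2e-5, num_layers=4, decay=0.9)
# Freezing lower layers is the limiting case decay -> 0: only the top layer trains.
frozen = layerwise_lrs(base_lr=2e-5, num_layers=4, decay=0.0)
print(lrs)
print(frozen)
```

The intuition is that lower layers hold general pretrained features worth preserving, while upper layers adapt to the task; decay interpolates smoothly between full fine-tuning and freezing.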
Affiliation(s)
- Hao Cheng: Microsoft Research, Redmond, WA, USA
- Yu Gu: Microsoft Research, Redmond, WA, USA
- Hoifung Poon: Microsoft Research, Redmond, WA, USA (corresponding author)
16
Lara-Clares A, Lastra-Díaz JJ, Garcia-Serrano A. A reproducible experimental survey on biomedical sentence similarity: A string-based method sets the state of the art. PLoS One 2022; 17:e0276539. [PMID: 36409715] [PMCID: PMC9678326] [DOI: 10.1371/journal.pone.0276539] [Citations in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 10/08/2022] [Indexed: 11/22/2022]
Abstract
This registered report introduces the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most current methods; (3) to evaluate several unexplored sentence similarity methods; (4) to evaluate for the first time an unexplored benchmark, called Corpus-Transcriptional-Regulation (CTR); (5) to carry out a study on the impact of the pre-processing stages and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (6) to bridge the lack of software and data reproducibility resources for methods and experiments in this line of research. Our reproducible experimental survey is based on a single software platform, which is provided with a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results. In addition, we introduce a new aggregated string-based sentence similarity method, called LiBlock, together with eight variants of current ontology-based methods, and a new pre-trained word embedding model trained on the full-text articles in the PMC-BioC corpus. Our experiments show that our novel string-based measure establishes the new state of the art in sentence similarity analysis in the biomedical domain and significantly outperforms all the methods evaluated herein, with the only exception of one ontology-based method. Likewise, our experiments confirm that the pre-processing stages, and the choice of the NER tool for ontology-based methods, have a very significant impact on the performance of the sentence similarity methods. We also detail some drawbacks and limitations of current methods, and highlight the need to refine the current benchmarks. 
Finally, a notable finding is that our new string-based method significantly outperforms all state-of-the-art Machine Learning (ML) models evaluated herein.
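For a flavor of the string-based family this survey evaluates, here is a minimal token-overlap baseline. This is a generic illustration only; it is not the authors' LiBlock measure:

```python
# Minimal string-based sentence-similarity baseline: Jaccard overlap
# of lowercased word sets. Illustrative only, not the LiBlock method.

def token_jaccard(s1, s2):
    """Jaccard similarity over lowercased word sets."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

sim = token_jaccard("TP53 regulates apoptosis", "apoptosis is regulated by TP53")
print(round(sim, 3))  # 2 shared tokens out of 6 distinct -> 0.333
```

Such measures need no training data, which is one reason strong string-based baselines remain competitive in biomedical sentence similarity.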
Affiliation(s)
- Alicia Lara-Clares: NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Juan J. Lastra-Díaz: NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Ana Garcia-Serrano: NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
17
Hall K, Chang V, Jayne C. A review on Natural Language Processing Models for COVID-19 research. Healthcare Analytics (New York, N.Y.) 2022; 2:100078. [PMID: 37520621] [PMCID: PMC9295335] [DOI: 10.1016/j.health.2022.100078] [Citations in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/22/2022] [Revised: 07/08/2022] [Accepted: 07/12/2022] [Indexed: 11/22/2022]
Abstract
This survey paper reviews Natural Language Processing Models and their use in COVID-19 research in two main areas. Firstly, a range of transformer-based biomedical pretrained language models are evaluated using the BLURB benchmark. Secondly, models used in sentiment analysis surrounding COVID-19 vaccination are evaluated. We filtered literature curated from various repositories such as PubMed and Scopus and reviewed 27 papers. When evaluated using the BLURB benchmark, the novel T-BPLM BioLinkBERT gives groundbreaking results by incorporating document link knowledge and hyperlinking into its pretraining. Sentiment analysis of COVID-19 vaccination through various Twitter API tools has shown the public's sentiment towards vaccination to be mostly positive. Finally, we outline some limitations and potential solutions to drive the research community to improve the models used for NLP tasks.
Affiliation(s)
- Victor Chang: Operations Information Management, ABS, Aston University, UK
18
Semantic similarity-based credit attribution on citation paths: a method for allocating residual citation to and investigating depth of influence of scientific communications. Scientometrics 2022. [DOI: 10.1007/s11192-022-04522-3] [Citations in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
19
Tiwary P, M A, Gautam A. No means 'No': a non-improper modeling approach, with embedded speculative context. Bioinformatics 2022; 38:4790-4796. [PMID: 36040145] [PMCID: PMC9563701] [DOI: 10.1093/bioinformatics/btac593] [Citations in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 07/15/2022] [Accepted: 08/29/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION Medical data are complex in nature, as terms that appear in records usually appear in different contexts. In this paper, we investigate the embeddings of several biomedical models (BioBERT, BioELECTRA, PubMedBERT) for their understanding of negation and speculation context, and we find that these models are unable to differentiate negated from non-negated contexts. To measure the models' understanding, we used cosine similarity scores between the embeddings of negated and non-negated sentence pairs. To improve these models, we introduce a generic super-tuning approach that enhances the embeddings on negation and speculation context using a synthesized dataset. RESULTS After super-tuning, the models' embeddings capture negative and speculative contexts much better. Furthermore, we fine-tuned the super-tuned models on various tasks and found that they outperform previous models, achieving state-of-the-art (SOTA) results on negation, speculation cue, and scope detection tasks on BioScope abstracts and the Sherlock dataset. We also confirmed that our approach incurs only a minimal trade-off in performance on other tasks, such as Natural Language Inference, after super-tuning. AVAILABILITY The source code and the models are available at: https://github.com/comprehend/engg-airesearch/tree/uncertainty-super-tuning. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
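The diagnostic described in this abstract, comparing cosine similarity between a sentence embedding and its negated counterpart, can be sketched with toy vectors standing in for model embeddings:

```python
# Sketch of the negation diagnostic: cosine similarity between sentence
# embeddings. Vectors are invented stand-ins for model outputs; a model that
# handles negation should score the negated pair LOWER than a paraphrase pair.
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

emb_pos = [0.8, 0.6, 0.0]    # "patient has pneumonia"      (toy vector)
emb_para = [0.7, 0.7, 0.1]   # a paraphrase                 (toy vector)
emb_neg = [0.8, 0.55, 0.05]  # "patient has no pneumonia" -- nearly identical
                             # vector, mimicking the failure mode reported above
print(cosine(emb_pos, emb_para), cosine(emb_pos, emb_neg))
```

In this toy setup the negated pair actually scores higher than the paraphrase pair, which is exactly the kind of failure the cosine-similarity probe is meant to expose.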
20
Chen J, Goudey B, Zobel J, Geard N, Verspoor K. Exploring automatic inconsistency detection for literature-based gene ontology annotation. Bioinformatics 2022; 38:i273-i281. [PMID: 35758780] [PMCID: PMC9235499] [DOI: 10.1093/bioinformatics/btac230] [Citations in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/08/2022] [Indexed: 11/12/2022]
Abstract
Motivation Literature-based gene ontology annotations (GOA) are biological database records that use a controlled vocabulary to uniformly represent gene function information described in the primary literature. Assurance of the quality of GOA is crucial for supporting biological research. However, a range of different kinds of inconsistencies between the literature used as evidence and the annotated GO terms can be identified; these have not been systematically studied at the record level. The existing manual-curation approach to GOA consistency assurance is inefficient and unable to keep pace with the rate of updates to gene function knowledge. Automatic tools are therefore needed to assist with GOA consistency assurance. This article presents an exploration of different GOA inconsistencies and an early feasibility study of automatic inconsistency detection. Results We have created a reliable synthetic dataset to simulate four realistic types of GOA inconsistency in biological databases. Three automatic approaches are proposed. They provide reasonable performance on the task of distinguishing the four types of inconsistency and are directly applicable to detecting inconsistencies in real-world GOA database records. Major challenges resulting from such inconsistencies in the context of several specific application settings are reported. This is the first study to introduce automatic approaches designed to address the challenges in current GOA quality assurance workflows. The data underlying this article are available in GitHub at https://github.com/jiyuc/AutoGOAConsistency.
Affiliation(s)
- Jiyu Chen: School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- Benjamin Goudey: School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- Justin Zobel: School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- Nicholas Geard: School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia
- Karin Verspoor: School of Computing and Information Systems, The University of Melbourne, Parkville, VIC 3010, Australia; School of Computer Technologies, RMIT University, Melbourne, VIC 3000, Australia
21
Naseem U, Dunn AG, Khushi M, Kim J. Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT. BMC Bioinformatics 2022; 23:144. [PMID: 35448946] [PMCID: PMC9022356] [DOI: 10.1186/s12859-022-04688-w] [Citations in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/12/2021] [Accepted: 03/31/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND The abundance of biomedical text data coupled with advances in natural language processing (NLP) is resulting in novel biomedical NLP (BioNLP) applications. These NLP applications, or tasks, are reliant on the availability of domain-specific language models (LMs) that are trained on a massive amount of data. Most of the existing domain-specific LMs adopted bidirectional encoder representations from transformers (BERT) architecture which has limitations, and their generalizability is unproven as there is an absence of baseline results among common BioNLP tasks. RESULTS We present 8 variants of BioALBERT, a domain-specific adaptation of a lite bidirectional encoder representations from transformers (ALBERT), trained on biomedical (PubMed and PubMed Central) and clinical (MIMIC-III) corpora and fine-tuned for 6 different tasks across 20 benchmark datasets. Experiments show that a large variant of BioALBERT trained on PubMed outperforms the state-of-the-art on named-entity recognition (+ 11.09% BLURB score improvement), relation extraction (+ 0.80% BLURB score), sentence similarity (+ 1.05% BLURB score), document classification (+ 0.62% F1-score), and question answering (+ 2.83% BLURB score). It represents a new state-of-the-art in 5 out of 6 benchmark BioNLP tasks. CONCLUSIONS The large variant of BioALBERT trained on PubMed achieved a higher BLURB score than previous state-of-the-art models on 5 of the 6 benchmark BioNLP tasks. Depending on the task, 5 different variants of BioALBERT outperformed previous state-of-the-art models on 17 of the 20 benchmark datasets, showing that our model is robust and generalizable in the common BioNLP tasks. We have made BioALBERT freely available which will help the BioNLP community avoid computational cost of training and establish a new set of baselines for future efforts across a broad range of BioNLP tasks.
Affiliation(s)
- Usman Naseem: School of Computer Science, The University of Sydney, Sydney, Australia
- Adam G Dunn: Biomedical Informatics and Digital Health and Faculty of Medicine and Health, School of Medical Sciences, The University of Sydney, Sydney, Australia
- Matloob Khushi: School of Computer Science, The University of Sydney, Sydney, Australia; School of EAST, University of Suffolk, Ipswich, UK
- Jinman Kim: School of Computer Science, The University of Sydney, Sydney, Australia
22
Text Similarity Measurement Method and Application of Online Medical Community Based on Density Peak Clustering. J Organ End User Comput 2022. [DOI: 10.4018/joeuc.302893] [Citations in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Text similarity measurement links basic research, such as text modeling, with application-level research on the latent information in text. To improve the accuracy of text similarity measurement, this paper proposes a semantic similarity calculation method integrating the word2vec model and TF-IDF, and applies it to density peak clustering of Chinese text data consulted by patients in an online medical community. Experimental results show that the proposed similarity measurement method is superior to the traditional method. Furthermore, the study is among the first to apply the density peak clustering algorithm to an online medical community, offering a reference for discovering user demands from medical text data in a big-data environment.
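The word2vec + TF-IDF combination described above amounts to representing a sentence as a TF-IDF-weighted average of its word vectors. A minimal sketch, with an invented two-word vocabulary and invented IDF values:

```python
# Sketch of a TF-IDF-weighted average of word vectors as a sentence vector.
# The vocabulary, vectors, and IDF values are invented for the example.

def sentence_vector(tokens, word_vecs, idf):
    """TF-IDF-weighted mean of word vectors (unknown words are skipped)."""
    dim = len(next(iter(word_vecs.values())))
    total, weight = [0.0] * dim, 0.0
    for t in tokens:
        if t in word_vecs:
            w = idf.get(t, 1.0)  # term frequency is 1 per occurrence here
            total = [s + w * x for s, x in zip(total, word_vecs[t])]
            weight += w
    return [s / weight for s in total] if weight else total

vecs = {"fever": [1.0, 0.0], "cough": [0.0, 1.0]}
idf = {"fever": 2.0, "cough": 1.0}
print(sentence_vector(["fever", "cough"], vecs, idf))
```

The IDF weights pull the sentence vector toward rarer, more discriminative words, which is the motivation for combining the two models rather than taking a plain average.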
23
Lastra-Díaz JJ, Lara-Clares A, Garcia-Serrano A. HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey. BMC Bioinformatics 2022; 23:23. [PMID: 34991460] [PMCID: PMC8734250] [DOI: 10.1186/s12859-021-04539-0] [Citations in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Accepted: 12/15/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are extensively used in many applications in biomedical text mining and genomics, which has encouraged the development of semantic measures libraries based on these ontologies. However, current state-of-the-art semantic measures libraries have performance and scalability drawbacks derived from their ontology representations, which are based on relational databases or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure that integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for its real-time computation prevents its practical use in applications, as well as the use of any other path-based semantic similarity measure. RESULTS To bridge these two gaps, this work introduces for the first time an updated version of the HESML Java software library, especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new approximation of Dijkstra's algorithm for taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. CONCLUSIONS We introduce a set of reproducible benchmarks showing that HESML outperforms the current state-of-the-art libraries by several orders of magnitude on the three aforementioned biomedical ontologies, and we report the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL scales linearly with the dimension of the common-ancestor subgraph, regardless of ontology size.
Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
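A path-based similarity of the kind AncSPL accelerates can be illustrated on a toy taxonomy. The hierarchy below is invented for intuition only; the real library operates over large ontologies such as SNOMED-CT, MeSH, and GO:

```python
# Toy path-based ontology similarity: BFS shortest-path length between two
# concepts in a small is-a taxonomy, converted into a similarity score.
# The taxonomy is invented; this is an intuition sketch, not AncSPL itself.
from collections import deque

def shortest_path_len(graph, a, b):
    """Unweighted shortest-path length between nodes a and b (BFS)."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, d = queue.popleft()
        if node == b:
            return d
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def path_similarity(graph, a, b):
    d = shortest_path_len(graph, a, b)
    return 1.0 / (1.0 + d) if d is not None else 0.0

# Tiny undirected is-a hierarchy: disease -> {infection, neoplasm} -> pneumonia
taxonomy = {
    "disease": ["infection", "neoplasm"],
    "infection": ["disease", "pneumonia"],
    "neoplasm": ["disease"],
    "pneumonia": ["infection"],
}
print(path_similarity(taxonomy, "pneumonia", "neoplasm"))  # path length 3 -> 0.25
```

On a million-node ontology this naive BFS is exactly the cost that makes exact path-based measures impractical, which is the gap the AncSPL approximation addresses.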
Affiliation(s)
- Juan J. Lastra-Díaz: NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Alicia Lara-Clares: NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Ana Garcia-Serrano: NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
24
Biomedical Semantic Textual Similarity: Evaluation of Sentence Representations Enhanced with Principal Component Reduction and Word Frequency Weighting. Artif Intell Med 2022. [DOI: 10.1007/978-3-031-09342-5_39] [Citations in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
25
Kalyan KS, Rajasekharan A, Sangeetha S. AMMU: A survey of transformer-based biomedical pretrained language models. J Biomed Inform 2021; 126:103982. [PMID: 34974190] [DOI: 10.1016/j.jbi.2021.103982] [Citations in RCA: 42] [Impact Index Per Article: 10.5] [Received: 05/14/2021] [Revised: 12/12/2021] [Accepted: 12/20/2021] [Indexed: 01/04/2023]
Abstract
Transformer-based pretrained language models (PLMs) have started a new era in modern natural language processing (NLP). These models combine the power of transformers, transfer learning, and self-supervised learning (SSL). Following the success of these models in the general domain, the biomedical research community has developed various in-domain PLMs starting from BioBERT to the latest BioELECTRA and BioALBERT models. We strongly believe there is a need for a survey paper that can provide a comprehensive survey of various transformer-based biomedical pretrained language models (BPLMs). In this survey, we start with a brief overview of foundational concepts like self-supervised learning, embedding layer and transformer encoder layers. We discuss core concepts of transformer-based PLMs like pretraining methods, pretraining tasks, fine-tuning methods, and various embedding types specific to biomedical domain. We introduce a taxonomy for transformer-based BPLMs and then discuss all the models. We discuss various challenges and present possible solutions. We conclude by highlighting some of the open issues which will drive the research community to further improve transformer-based BPLMs. The list of all the publicly available transformer-based BPLMs along with their links is provided at https://mr-nlp.github.io/posts/2021/05/transformer-based-biomedical-pretrained-language-models-list/.
26
Chen Q, Rankine A, Peng Y, Aghaarabi E, Lu Z. Benchmarking Effectiveness and Efficiency of Deep Learning Models for Semantic Textual Similarity in the Clinical Domain: Validation Study. JMIR Med Inform 2021; 9:e27386. [PMID: 34967748] [PMCID: PMC8759018] [DOI: 10.2196/27386] [Citations in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/22/2021] [Revised: 08/06/2021] [Accepted: 08/06/2021] [Indexed: 01/23/2023]
Abstract
Background Semantic textual similarity (STS) measures the degree of relatedness between sentence pairs. The Open Health Natural Language Processing (OHNLP) Consortium released an expertly annotated STS data set and called for the National Natural Language Processing Clinical Challenges. This work describes our entry, an ensemble model that leverages a range of deep learning (DL) models. Our team from the National Library of Medicine obtained a Pearson correlation of 0.8967 in an official test set during 2019 National Natural Language Processing Clinical Challenges/Open Health Natural Language Processing shared task and achieved a second rank. Objective Although our models strongly correlate with manual annotations, annotator-level correlation was only moderate (weighted Cohen κ=0.60). We are cautious of the potential use of DL models in production systems and argue that it is more critical to evaluate the models in-depth, especially those with extremely high correlations. In this study, we benchmark the effectiveness and efficiency of top-ranked DL models. We quantify their robustness and inference times to validate their usefulness in real-time applications. Methods We benchmarked five DL models, which are the top-ranked systems for STS tasks: Convolutional Neural Network, BioSentVec, BioBERT, BlueBERT, and ClinicalBERT. We evaluated a random forest model as an additional baseline. For each model, we repeated the experiment 10 times, using the official training and testing sets. We reported 95% CI of the Wilcoxon rank-sum test on the average Pearson correlation (official evaluation metric) and running time. We further evaluated Spearman correlation, R², and mean squared error as additional measures. Results Using only the official training set, all models obtained highly effective results. BioSentVec and BioBERT achieved the highest average Pearson correlations (0.8497 and 0.8481, respectively). 
BioSentVec also had the highest results in 3 of 4 effectiveness measures, followed by BioBERT. However, their robustness to sentence pairs of different similarity levels varies significantly. A particular observation is that BERT models made the most errors (a mean squared error of over 2.5) on highly similar sentence pairs. They cannot capture highly similar sentence pairs effectively when the pairs differ in negation terms or word order. In addition, time efficiency is dramatically different from the effectiveness results. On average, the BERT models were approximately 20 times and 50 times slower than the Convolutional Neural Network and BioSentVec models, respectively. This results in challenges for real-time applications. Conclusions Despite the excitement of further improving Pearson correlations in this data set, our results highlight that evaluations of the effectiveness and efficiency of STS models are critical. In the future, we suggest more evaluations on the generalization capability and user-level testing of the models. We call for community efforts to create more biomedical and clinical STS data sets from different perspectives to reflect the multifaceted notion of sentence-relatedness.
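The evaluation loop described above, averaging Pearson correlation and inspecting mean squared error against gold similarity scores, can be sketched in plain Python. The score lists below are invented for illustration; they are not the study's data:

```python
import math

def pearson(xs, ys):
    # Pearson correlation between predicted and gold similarity scores
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mse(xs, ys):
    # mean squared error, useful for exposing failures on highly similar pairs
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

gold = [4.5, 1.0, 3.0, 0.5]   # hypothetical annotator scores (0-5 scale)
pred = [4.2, 1.3, 2.8, 0.9]   # hypothetical model outputs
print(pearson(gold, pred), mse(gold, pred))
```

Reporting both metrics matters here: a model can reach a high Pearson correlation overall while still making large squared errors on one similarity band, which is exactly the BERT failure mode the study reports.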
Affiliation(s)
- Qingyu Chen: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
- Alex Rankine: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States; Harvard College, Cambridge, MA, United States
- Yifan Peng: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States; Weill Cornell Medicine, New York, NY, United States
- Elaheh Aghaarabi: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States; Towson University, Towson, MD, United States
- Zhiyong Lu: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
27
Frisoni G, Moro G, Carlassare G, Carbonaro A. Unsupervised Event Graph Representation and Similarity Learning on Biomedical Literature. Sensors (Basel) 2021;22:3. [PMID: 35009544] [PMCID: PMC8747118] [DOI: 10.3390/s22010003]
Abstract
The automatic extraction of biomedical events from the scientific literature has drawn keen interest in the last several years, recognizing complex and semantically rich graphical interactions otherwise buried in texts. However, very few works revolve around learning embeddings or similarity metrics for event graphs. This gap leaves biological relations unlinked and prevents the application of machine learning techniques to promote discoveries. Taking advantage of recent deep graph kernel solutions and pre-trained language models, we propose Deep Divergence Event Graph Kernels (DDEGK), an unsupervised inductive method to map events into low-dimensional vectors, preserving their structural and semantic similarities. Unlike most other systems, DDEGK operates at a graph level and does not require task-specific labels, feature engineering, or known correspondences between nodes. To this end, our solution compares events against a small set of anchor ones, trains cross-graph attention networks for drawing pairwise alignments (bolstering interpretability), and employs transformer-based models to encode continuous attributes. Extensive experiments have been done on nine biomedical datasets. We show that our learned event representations can be effectively employed in tasks such as graph classification, clustering, and visualization, also facilitating downstream semantic textual similarity. Empirical results demonstrate that DDEGK significantly outperforms other state-of-the-art methods.
Affiliation(s)
- Giacomo Frisoni: Department of Computer Science and Engineering (DISI), University of Bologna, 40126 Bologna, Italy
- Gianluca Moro: Department of Computer Science and Engineering (DISI), University of Bologna, 40126 Bologna, Italy
- Antonella Carbonaro: Department of Computer Science and Engineering (DISI), University of Bologna, 40126 Bologna, Italy
28
Chen J, Geard N, Zobel J, Verspoor K. Automatic consistency assurance for literature-based gene ontology annotation. BMC Bioinformatics 2021;22:565. [PMID: 34823464] [PMCID: PMC8620237] [DOI: 10.1186/s12859-021-04479-9]
Abstract
BACKGROUND Literature-based gene ontology (GO) annotation is a process where expert curators use uniform expressions to describe gene functions reported in research papers, creating computable representations of information about biological systems. Manual assurance of consistency between GO annotations and the associated evidence texts identified by expert curators is reliable but time-consuming, and is infeasible in the context of rapidly growing biological literature. A key challenge is maintaining consistency of existing GO annotations as new studies are published and the GO vocabulary is updated. RESULTS In this work, we introduce a formalisation of biological database annotation inconsistencies, identifying four distinct types of inconsistency. We propose a novel and efficient method using state-of-the-art text mining models to automatically distinguish between consistent GO annotation and the different types of inconsistent GO annotation. We evaluate this method using a synthetic dataset generated by directed manipulation of instances in an existing corpus, BC4GO. We provide a detailed error analysis demonstrating that the method achieves high precision on its more confident predictions. CONCLUSIONS Two models built using our method for distinct annotation consistency identification tasks achieved high precision and were robust to updates in the GO vocabulary. Our approach demonstrates clear value for human-in-the-loop curation scenarios.
Affiliation(s)
- Jiyu Chen: School of Computing and Information Systems, University of Melbourne, Melbourne, 3010, Australia
- Nicholas Geard: School of Computing and Information Systems, University of Melbourne, Melbourne, 3010, Australia
- Justin Zobel: School of Computing and Information Systems, University of Melbourne, Melbourne, 3010, Australia
- Karin Verspoor: School of Computing and Information Systems, University of Melbourne, Melbourne, 3010, Australia; School of Computing Technologies, RMIT University, Melbourne, VIC, 3000, Australia
29
Martinez-Gil J, Mokadem R, Morvan F, Küng J, Hameurlain A. Interpretable entity meta-alignment in knowledge graphs using penalized regression: a case study in the biomedical domain. Progress in Artificial Intelligence 2021. [DOI: 10.1007/s13748-021-00263-1]
30
Transformers-sklearn: a toolkit for medical language understanding with transformer-based models. BMC Med Inform Decis Mak 2021;21:90. [PMID: 34330244] [PMCID: PMC8323195] [DOI: 10.1186/s12911-021-01459-0]
Abstract
BACKGROUND The transformer is an attention-based architecture proven to be the state-of-the-art model in natural language processing (NLP). To reduce the difficulty of beginning to use transformer-based models in medical language understanding and to expand the capability of the scikit-learn toolkit in deep learning, we proposed an easy-to-learn Python toolkit named transformers-sklearn. By wrapping the interfaces of transformers in only three functions (i.e., fit, score, and predict), transformers-sklearn combines the advantages of the transformers and scikit-learn toolkits. METHODS In transformers-sklearn, three Python classes were implemented, namely, BERTologyClassifier for the classification task, BERTologyNERClassifier for the named entity recognition (NER) task, and BERTologyRegressor for the regression task. Each class contains three methods, i.e., fit for fine-tuning transformer-based models with the training dataset, score for evaluating the performance of the fine-tuned model, and predict for predicting the labels of the test dataset. transformers-sklearn is a user-friendly toolkit that (1) is customizable via a few parameters (e.g., model_name_or_path and model_type), (2) supports multilingual NLP tasks, and (3) requires less coding. The input data format is automatically generated by transformers-sklearn from the annotated corpus. Newcomers only need to prepare the dataset. The model framework and training methods are predefined in transformers-sklearn. RESULTS We collected four open-source medical language datasets, including TrialClassification for Chinese medical trial text multi-label classification, BC5CDR for English biomedical text named entity recognition, DiabetesNER for Chinese diabetes entity recognition, and BIOSSES for English biomedical sentence similarity estimation. In the four medical NLP tasks, the average code size of our script is 45 lines/task, which is one-sixth the size of the transformers script.
The experimental results show that transformers-sklearn based on pretrained BERT models achieved macro F1 scores of 0.8225, 0.8703 and 0.6908, respectively, on the TrialClassification, BC5CDR and DiabetesNER tasks and a Pearson correlation of 0.8260 on the BIOSSES task, which is consistent with the results of transformers. CONCLUSIONS The proposed toolkit could help newcomers easily address medical language understanding tasks using the scikit-learn coding style. The code and tutorials of transformers-sklearn are available at https://doi.org/10.5281/zenodo.4453803. In the future, more medical language understanding tasks will be supported to improve the applications of transformers-sklearn.
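The class and method names below (BERTologyClassifier, fit/score/predict, model_name_or_path, model_type) come from the abstract, but the body is a deliberately trivial stand-in, a majority-label stub rather than the toolkit's actual transformer fine-tuning, just to illustrate the scikit-learn-style interface the toolkit exposes:

```python
class BERTologyClassifier:
    """Illustrative skeleton of a scikit-learn-style wrapper.
    NOT the real transformers-sklearn implementation: fit() here
    merely memorizes the majority training label."""

    def __init__(self, model_name_or_path="bert-base-uncased", model_type="bert"):
        self.model_name_or_path = model_name_or_path
        self.model_type = model_type
        self._majority = None

    def fit(self, X, y):
        # the real toolkit fine-tunes a transformer here
        self._majority = max(set(y), key=y.count)
        return self

    def predict(self, X):
        # the real toolkit runs inference; we return the memorized label
        return [self._majority for _ in X]

    def score(self, X, y):
        # simple accuracy, mirroring the evaluate-on-test-set role of score()
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)
```

The point of the three-method contract is that a newcomer's script reduces to `clf.fit(train_X, train_y)`, `clf.score(test_X, test_y)`, and `clf.predict(new_X)`, regardless of which transformer backs the class.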
31
Luo M, Cohen AM, Addepalli S, Smalheiser NR. Identifying main finding sentences in clinical case reports. Database (Oxford) 2020;2020:baaa041. [PMID: 32525207] [PMCID: PMC7287507] [DOI: 10.1093/database/baaa041]
Abstract
Clinical case reports are the ‘eyewitness reports’ of medicine and provide a valuable, unique, albeit noisy and underutilized type of evidence. Generally, a case report has a single main finding that represents the reason for writing up the report in the first place. However, no one has previously created an automatic way of identifying main finding sentences in case reports. We previously created a manual corpus of main finding sentences extracted from the abstracts and full text of clinical case reports. Here, we have utilized the corpus to create a machine learning-based model that automatically predicts which sentence(s) from abstracts state the main finding. The model has been evaluated on a separate manual corpus of clinical case reports and found to have good performance. This is a step toward setting up a retrieval system in which, given one case report, one can find other case reports that report the same or very similar main findings. The code and necessary files to run the main finding model can be downloaded from https://github.com/qi29/main_finding_recognition, released under the Apache License, Version 2.0.
Affiliation(s)
- Mengqi Luo: Department of Psychiatry and Psychiatric Institute, University of Illinois College of Medicine, Chicago, IL 60612, USA; School of Information Management, Wuhan University, Wuhan, Hubei 430072, China
- Aaron M Cohen: Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Sidharth Addepalli: Department of Psychiatry and Psychiatric Institute, University of Illinois College of Medicine, Chicago, IL 60612, USA
- Neil R Smalheiser: Department of Psychiatry and Psychiatric Institute, University of Illinois College of Medicine, Chicago, IL 60612, USA
32

33
Lara-Clares A, Lastra-Díaz JJ, Garcia-Serrano A. Protocol for a reproducible experimental survey on biomedical sentence similarity. PLoS One 2021;16:e0248663. [PMID: 33760855] [PMCID: PMC7990182] [DOI: 10.1371/journal.pone.0248663]
Abstract
Measuring semantic similarity between sentences is a significant task in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining. For this reason, the proposal of sentence similarity methods for the biomedical domain has attracted a lot of attention in recent years. However, most sentence similarity methods and experimental results reported in the biomedical domain cannot be reproduced, for multiple reasons: the copying of previous results without confirmation, the lack of source code and data to replicate both methods and experiments, and the lack of a detailed definition of the experimental setup, among others. As a consequence of this reproducibility gap, the state of the problem cannot be elucidated, nor can new lines of research be soundly set. There are also other significant gaps in the literature on biomedical sentence similarity: (1) the evaluation of several unexplored sentence similarity methods which deserve to be studied; (2) the evaluation of an unexplored benchmark on biomedical sentence similarity, called Corpus-Transcriptional-Regulation (CTR); (3) a study on the impact of the pre-processing stage and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (4) the lack of software and data resources for the reproducibility of methods and experiments in this line of research. Having identified these open problems, this registered report introduces a detailed experimental setup, together with a categorization of the literature, to develop the largest, most up-to-date, and, for the first time, reproducible experimental survey on biomedical sentence similarity.
Our aforementioned experimental survey will be based on our own software replication and the evaluation of all methods being studied on the same software platform, which will be specially developed for this work, and it will become the first publicly available software library for biomedical sentence similarity. Finally, we will provide a very detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
Affiliation(s)
- Alicia Lara-Clares: NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Juan J. Lastra-Díaz: NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
- Ana Garcia-Serrano: NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
34

35
Yang X, He X, Zhang H, Ma Y, Bian J, Wu Y. Measurement of Semantic Textual Similarity in Clinical Texts: Comparison of Transformer-Based Models. JMIR Med Inform 2020;8:e19735. [PMID: 33226350] [PMCID: PMC7721552] [DOI: 10.2196/19735]
Abstract
BACKGROUND Semantic textual similarity (STS) is one of the fundamental tasks in natural language processing (NLP). Many shared tasks and corpora for STS have been organized and curated in the general English domain; however, such resources are limited in the biomedical domain. In 2019, the National NLP Clinical Challenges (n2c2) challenge developed a comprehensive clinical STS dataset and organized a community effort to solicit state-of-the-art solutions for clinical STS. OBJECTIVE This study presents our transformer-based clinical STS models developed during this challenge as well as new models we explored after the challenge. This project is part of the 2019 n2c2/Open Health NLP shared task on clinical STS. METHODS In this study, we explored 3 transformer-based models for clinical STS: Bidirectional Encoder Representations from Transformers (BERT), XLNet, and Robustly optimized BERT approach (RoBERTa). We examined transformer models pretrained using both general English text and clinical text. We also explored using a general English STS dataset as a supplementary corpus in addition to the clinical training set developed in this challenge. Furthermore, we investigated various ensemble methods to combine different transformer models. RESULTS Our best submission based on the XLNet model achieved the third-best performance (Pearson correlation of 0.8864) in this challenge. After the challenge, we further explored other transformer models and improved the performance to 0.9065 using a RoBERTa model, which outperformed the best-performing system developed in this challenge (Pearson correlation of 0.9010). CONCLUSIONS This study demonstrated the efficiency of utilizing transformer-based models to measure semantic similarity for clinical text. Our models can be applied to clinical applications such as clinical text deduplication and summarization.
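The simplest of the ensemble methods mentioned above, combining per-pair similarity predictions from different transformer models by (weighted) averaging, can be sketched as follows; the scores and model names are hypothetical, not the study's actual outputs:

```python
def ensemble_average(model_preds, weights=None):
    """Combine each model's per-pair similarity predictions by weighted
    averaging; equal weights by default."""
    n_models = len(model_preds)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_pairs = len(model_preds[0])
    return [sum(w * preds[i] for w, preds in zip(weights, model_preds))
            for i in range(n_pairs)]

# e.g. BERT-, XLNet-, and RoBERTa-style scores for three sentence pairs
combined = ensemble_average([[4.1, 0.9, 2.7],
                             [4.4, 1.2, 3.1],
                             [4.2, 1.0, 2.9]])
```

Averaging tends to cancel uncorrelated errors across models, which is why even this trivial combiner often beats the best single model on STS leaderboards.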
Affiliation(s)
- Xi Yang: Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
- Xing He: Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
- Hansi Zhang: Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
- Yinghan Ma: Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
- Jiang Bian: Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
- Yonghui Wu: Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States
36
Afzal M, Alam F, Malik KM, Malik GM. Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation. J Med Internet Res 2020;22:e19810. [PMID: 33095174] [PMCID: PMC7647812] [DOI: 10.2196/19810]
Abstract
Background Automatic text summarization (ATS) enables users to retrieve meaningful evidence from big data of biomedical repositories to make complex clinical decisions. Deep neural and recurrent networks outperform traditional machine-learning techniques in areas of natural language processing and computer vision; however, they are yet to be explored in the ATS domain, particularly for medical text summarization. Objective Traditional approaches in ATS for biomedical text suffer from fundamental issues such as an inability to capture clinical context, quality of evidence, and purpose-driven selection of passages for the summary. We aimed to circumvent these limitations through achieving precise, succinct, and coherent information extraction from credible published biomedical resources, and to construct a simplified summary containing the most informative content that can offer a review particular to clinical needs. Methods In our proposed approach, we introduce a novel framework, termed Biomed-Summarizer, that provides quality-aware Patient/Problem, Intervention, Comparison, and Outcome (PICO)-based intelligent and context-enabled summarization of biomedical text. Biomed-Summarizer integrates the prognosis quality recognition model with a clinical context–aware model to locate text sequences in the body of a biomedical article for use in the final summary. First, we developed a deep neural network binary classifier for quality recognition to acquire scientifically sound studies and filter out others. Second, we developed a bidirectional long short-term memory recurrent neural network as a clinical context–aware classifier, which was trained on semantically enriched features generated using a word-embedding tokenizer for identification of meaningful sentences representing PICO text sequences.
Third, we calculated the similarity between query and PICO text sequences using Jaccard similarity with semantic enrichments, where the semantic enrichments are obtained using medical ontologies. Last, we generated a representative summary from the high-scoring PICO sequences aggregated by study type, publication credibility, and freshness score. Results Evaluation of the prognosis quality recognition model using a large dataset of biomedical literature related to intracranial aneurysm showed an accuracy of 95.41% (2562/2686) in terms of recognizing quality articles. The clinical context–aware multiclass classifier outperformed the traditional machine-learning algorithms, including support vector machine, gradient boosted tree, linear regression, K-nearest neighbor, and naïve Bayes, by achieving 93% (16127/17341) accuracy for classifying five categories: aim, population, intervention, results, and outcome. The semantic similarity algorithm achieved a significant Pearson correlation coefficient of 0.61 (0-1 scale) on a well-known BIOSSES dataset (with 100 pair sentences) after semantic enrichment, representing an improvement of 8.9% over baseline Jaccard similarity. Finally, we found a highly positive correlation among the evaluations performed by three domain experts concerning different metrics, suggesting that the automated summarization is satisfactory. Conclusions By employing the proposed method Biomed-Summarizer, high accuracy in ATS was achieved, enabling seamless curation of research evidence from the biomedical literature to use for clinical decision-making.
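The similarity step above, Jaccard similarity with semantic enrichment, can be sketched in plain Python. The toy synonym dictionary below stands in for the medical ontologies the paper uses; it is illustrative only:

```python
def jaccard_enriched(s1, s2, synonyms=None):
    """Jaccard similarity over token sets, optionally enriched by expanding
    each token with its synonyms (the paper derives these from medical
    ontologies; here a toy dictionary plays that role)."""
    synonyms = synonyms or {}

    def expand(sentence):
        tokens = set(sentence.lower().split())
        for t in list(tokens):
            tokens.update(synonyms.get(t, []))
        return tokens

    a, b = expand(s1), expand(s2)
    return len(a & b) / len(a | b)
```

Without enrichment, "heart attack" and "myocardial infarction" share no tokens and score 0; with ontology-backed synonym expansion, the overlap becomes nonzero, which is exactly the improvement over baseline Jaccard the paper reports.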
Affiliation(s)
- Muhammad Afzal: Department of Software, Sejong University, Seoul, Republic of Korea; Department of Computer Science & Engineering, School of Engineering and Computer Science, Oakland University, Rochester, MI, United States
- Fakhare Alam: Department of Computer Science & Engineering, School of Engineering and Computer Science, Oakland University, Rochester, MI, United States
- Khalid Mahmood Malik: Department of Computer Science & Engineering, School of Engineering and Computer Science, Oakland University, Rochester, MI, United States
- Ghaus M Malik: Department of Neurosurgery, Henry Ford Hospital, Detroit, MI, United States
37
Tawfik NS, Spruit MR. Evaluating sentence representations for biomedical text: Methods and experimental results. J Biomed Inform 2020;104:103396. [PMID: 32147441] [DOI: 10.1016/j.jbi.2020.103396]
Abstract
Text representations are one of the main inputs to various Natural Language Processing (NLP) methods. Given the fast developmental pace of new sentence embedding methods, we argue that there is a need for a unified methodology to assess these different techniques in the biomedical domain. This work introduces a comprehensive evaluation of novel methods across ten medical classification tasks. The tasks cover a variety of BioNLP problems such as semantic similarity, question answering, citation sentiment analysis and others with binary and multi-class datasets. Our goal is to assess the transferability of different sentence representation schemes to the medical and clinical domain. Our analysis shows that embeddings based on Language Models, which account for the context-dependent nature of words, usually outperform others in terms of performance. Nonetheless, there is no single embedding model that perfectly represents biomedical and clinical texts with consistent performance across all tasks. This illustrates the need for a more suitable bio-encoder. Our MedSentEval source code, pre-trained embeddings and examples have been made available on GitHub.
Affiliation(s)
- Noha S Tawfik: Computer Engineering Department, College of Engineering, Arab Academy for Science, Technology, and Maritime Transport (AAST), 1029 Alexandria, Egypt; Department of Information and Computing Sciences, Utrecht University, 3584 CC Utrecht, the Netherlands
- Marco R Spruit: Department of Information and Computing Sciences, Utrecht University, 3584 CC Utrecht, the Netherlands
38
Islamaj R, Wilbur WJ, Xie N, Gonzales NR, Thanki N, Yamashita R, Zheng C, Marchler-Bauer A, Lu Z. PubMed Text Similarity Model and its application to curation efforts in the Conserved Domain Database. Database (Oxford) 2019;2019:baz064. [PMID: 31267135] [DOI: 10.1093/database/baz064]
Abstract
This study proposes a text similarity model to help biocuration efforts of the Conserved Domain Database (CDD). CDD is a curated resource that catalogs annotated multiple sequence alignment models for ancient domains and full-length proteins. These models allow for fast searching and quick identification of conserved motifs in protein sequences via Reverse PSI-BLAST. In addition, CDD curators prepare summaries detailing the function of these conserved domains and specific protein families, based on published peer-reviewed articles. To facilitate information access for database users, it is desirable to specifically identify the referenced articles that support the assertions of curator-composed sentences. Moreover, CDD curators desire an alert system that scans the newly published literature and proposes related articles of relevance to the existing CDD records. Our approach to address these needs is a text similarity method that automatically maps a curator-written statement to candidate sentences extracted from the list of referenced articles, as well as the articles in the PubMed Central database. To evaluate this proposal, we paired CDD description sentences with the top 10 matching sentences from the literature, which were given to curators for review. Through this exercise, we discovered that we were able to map the articles in the reference list to the CDD description statements with an accuracy of 77%. In the dataset that was reviewed by curators, we were able to successfully provide references for 86% of the curator statements. In addition, we suggested new articles for curator review, which were accepted by curators to be added into the reference list at an acceptance rate of 50%. Through this process, we developed a substantial corpus of similar sentences from biomedical articles on protein sequence, structure and function research, which constitute the CDD text similarity corpus. 
This corpus contains 5159 sentence pairs judged for their similarity on a scale from 1 (low) to 5 (high) doubly annotated by four CDD curators. Curator-assigned similarity scores have a Pearson correlation coefficient of 0.70 and an inter-annotator agreement of 85%. To date, this is the largest biomedical text similarity resource that has been manually judged, evaluated and made publicly available to the community to foster research and development of text similarity algorithms.
Affiliation(s)
- Rezarta Islamaj: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- W John Wilbur: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Natalie Xie: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Noreen R Gonzales: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Narmada Thanki: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Roxanne Yamashita: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Chanjuan Zheng: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Aron Marchler-Bauer: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Zhiyong Lu: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
39
Hassanzadeh H, Nguyen A, Verspoor K. Quantifying semantic similarity of clinical evidence in the biomedical literature to facilitate related evidence synthesis. J Biomed Inform 2019;100:103321. [PMID: 31676460] [DOI: 10.1016/j.jbi.2019.103321]
Abstract
OBJECTIVE Published clinical trials and high-quality, peer-reviewed medical publications are considered the main sources of evidence used for synthesizing systematic reviews or practicing Evidence Based Medicine (EBM). Finding all relevant published evidence for a particular medical case is a time- and labour-intensive task, given the breadth of the biomedical literature. Automatic quantification of conceptual relationships between key clinical evidence within and across publications, despite variations in the expression of clinically-relevant concepts, can help facilitate the synthesis of evidence. In this study, we aim to provide an approach towards expediting evidence synthesis by quantifying the semantic similarity of key evidence as expressed in individual sentences. Such semantic textual similarity can serve as a key approach for supporting the selection of related studies. MATERIAL AND METHODS We propose a generalisable approach for quantifying the semantic similarity of clinical evidence in the biomedical literature, specifically considering the similarity of sentences corresponding to a given type of evidence, such as clinical interventions, population information, or clinical findings. We develop three sets of generic, ontology-based, and vector-space similarity measures that make use of a variety of lexical, conceptual, and contextual information to quantify the similarity of full sentences containing clinical evidence. To understand the impact of different similarity measures on the overall evidence semantic similarity quantification, we provide a comparative analysis of these measures when used as input to an unsupervised linear interpolation and a supervised regression ensemble. To provide a reliable test-bed for this experiment, we generate a dataset of 1000 pairs of sentences from biomedical publications, annotated by ten human experts.
We also extend the experiments to an external dataset for further generalisability testing. RESULTS The combination of all diverse similarity measures showed stronger correlations with the gold standard similarity scores in the dataset than any individual kind of measure. Our approach reached an average Pearson correlation of nearly 0.80 across different clinical evidence types using the devised similarity measures. Although more effective when combined, individual generic and vector-space measures also yielded strong similarity quantification when used in both unsupervised and supervised models. On the external dataset, our similarity measures were highly competitive with the state-of-the-art approaches developed and trained specifically on that dataset for predicting semantic similarity. CONCLUSION Experimental results showed that the proposed semantic similarity quantification approach can effectively identify related clinical evidence that is reported in the literature. The comparison with a state-of-the-art method demonstrated the effectiveness of the approach, and experiments with an external dataset support its generalisability.
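The unsupervised linear interpolation of diverse similarity measures described above can be sketched in a few lines. The sentences, gold ratings, and the two toy measures below are hypothetical stand-ins for the paper's richer generic, ontology-based, and vector-space measures:

```python
import difflib
import math

def token_jaccard(a, b):
    """Lexical overlap of lowercased token sets (a generic measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def char_ratio(a, b):
    """Character-level similarity via difflib's SequenceMatcher."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def interpolate(measures, weights=None):
    """Unsupervised linear interpolation: a weighted mean of the measures."""
    weights = weights or [1.0 / len(measures)] * len(measures)
    return sum(w * m for w, m in zip(weights, measures))

def pearson(xs, ys):
    """Pearson correlation between predicted and gold similarity scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical evidence-sentence pairs with invented expert ratings (0-5 scale):
pairs = [
    ("aspirin reduced the risk of stroke", "aspirin lowered stroke risk"),
    ("patients received a placebo", "the control group was given a placebo"),
    ("mean age was 64 years", "adverse events were rare"),
]
gold = [4.5, 4.0, 0.5]

preds = [interpolate([token_jaccard(a, b), char_ratio(a, b)]) for a, b in pairs]
print(pearson(preds, gold))
```

A supervised regression ensemble would replace the fixed interpolation weights with weights learned against annotated gold scores.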
Collapse
Affiliation(s)
- Hamed Hassanzadeh
- The Australian e-Health Research Centre, CSIRO, Brisbane, QLD, Australia.
| | - Anthony Nguyen
- The Australian e-Health Research Centre, CSIRO, Brisbane, QLD, Australia.
| | - Karin Verspoor
- School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia.
| |
Collapse
|
40
|
Lithgow-Serrano O, Gama-Castro S, Ishida-Gutiérrez C, Mejía-Almonte C, Tierrafría VH, Martínez-Luna S, Santos-Zavaleta A, Velázquez-Ramírez D, Collado-Vides J. Similarity corpus on microbial transcriptional regulation. J Biomed Semantics 2019; 10:8. [PMID: 31118102 PMCID: PMC6532127 DOI: 10.1186/s13326-019-0200-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2018] [Accepted: 04/16/2019] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND The ability to express the same meaning in different ways is a well-known property of natural language. This property is a major source of difficulty in natural language processing. Given the constant increase in published literature, its curation and information extraction would strongly benefit from efficient automatic processes, for which corpora of sentences evaluated by experts are a valuable resource. RESULTS Given our interest in applying such approaches to the curation of the biomedical literature, specifically the literature on gene regulation in microbial organisms, we built a corpus of sentence pairs with graded textual similarity, evaluated by curators and designed specifically for our purposes. Based on the predefined statistical power of future analyses, we defined the features of the design, including sampling, selection criteria, balance, and size, among others. A non-fully crossed study design was applied: each pair of sentences was evaluated by 3 annotators out of a total of 7. The scale used in the semantic similarity assessment task of the Semantic Evaluation workshop (SEMEVAL) was adapted to our goals over four successive iterative sessions, with clear improvements in the agreed guidelines and interrater reliability results. Alternatives for evaluating such a corpus have been widely discussed. CONCLUSIONS To the best of our knowledge, this is the first similarity corpus (a dataset of sentence pairs for which human experts rate the semantic similarity of each pair) in this domain of knowledge. We have begun incorporating it into our research towards high-throughput curation strategies based on natural language processing.
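Aggregating graded ratings from a non-fully crossed design (each pair rated by a subset of annotators) can be illustrated as follows. The pair IDs, scores, and the mean-deviation agreement proxy are invented for illustration; they are not the corpus's actual annotations or its reliability statistic:

```python
from statistics import mean

# Hypothetical annotations: each sentence pair is rated by 3 of 7 curators
# on a SEMEVAL-style graded similarity scale (here 0-4).
annotations = {
    "pair-001": [4, 4, 3],
    "pair-002": [1, 0, 1],
    "pair-003": [2, 3, 2],
}

def consensus(scores):
    """Aggregate a pair's ratings into a single graded similarity score."""
    return mean(scores)

def mean_deviation(scores):
    """Per-pair agreement proxy: average distance of raters from the consensus."""
    c = mean(scores)
    return mean(abs(s - c) for s in scores)

gold = {pid: consensus(s) for pid, s in annotations.items()}
agreement = mean(mean_deviation(s) for s in annotations.values())
print(gold, agreement)
```

A real interrater reliability analysis for this design would use a chance-corrected coefficient (e.g. Krippendorff's alpha), which handles missing rater-pair combinations.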
Collapse
Affiliation(s)
- Oscar Lithgow-Serrano
- Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM). A.P., 565-A Cuernavaca, Morelos, 62100 México
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS), Universidad Nacional Autónoma de México (UNAM), Mexico City, México
| | - Socorro Gama-Castro
- Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM). A.P., 565-A Cuernavaca, Morelos, 62100 México
| | - Cecilia Ishida-Gutiérrez
- Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM). A.P., 565-A Cuernavaca, Morelos, 62100 México
| | - Citlalli Mejía-Almonte
- Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM). A.P., 565-A Cuernavaca, Morelos, 62100 México
| | - Víctor H. Tierrafría
- Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM). A.P., 565-A Cuernavaca, Morelos, 62100 México
| | - Sara Martínez-Luna
- Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM). A.P., 565-A Cuernavaca, Morelos, 62100 México
| | - Alberto Santos-Zavaleta
- Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM). A.P., 565-A Cuernavaca, Morelos, 62100 México
| | - David Velázquez-Ramírez
- Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM). A.P., 565-A Cuernavaca, Morelos, 62100 México
| | - Julio Collado-Vides
- Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM). A.P., 565-A Cuernavaca, Morelos, 62100 México
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| |
Collapse
|
41
|
Keith Norambuena B, Meneses Villegas C. An extension to association rules using a similarity-based approach in semantic vector spaces. INTELL DATA ANAL 2019. [DOI: 10.3233/ida-184085] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
42
|
Blagec K, Xu H, Agibetov A, Samwald M. Neural sentence embedding models for semantic similarity estimation in the biomedical domain. BMC Bioinformatics 2019; 20:178. [PMID: 30975071 PMCID: PMC6460644 DOI: 10.1186/s12859-019-2789-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2018] [Accepted: 04/02/2019] [Indexed: 11/10/2022] Open
Abstract
Background Neural network based embedding models are receiving significant attention in the field of natural language processing due to their capability to effectively capture semantic information representing words, sentences or even larger text elements in low-dimensional vector space. While current state-of-the-art models for assessing the semantic similarity of textual statements from biomedical publications depend on the availability of laboriously curated ontologies, unsupervised neural embedding models only require large text corpora as input and do not need manual curation. In this study, we investigated the efficacy of current state-of-the-art neural sentence embedding models for semantic similarity estimation of sentences from biomedical literature. We trained different neural embedding models on 1.7 million articles from the PubMed Open Access dataset, and evaluated them based on a biomedical benchmark set containing 100 sentence pairs annotated by human experts and a smaller contradiction subset derived from the original benchmark set. Results Experimental results showed that, with a Pearson correlation of 0.819, our best unsupervised model based on the Paragraph Vector Distributed Memory algorithm outperforms previous state-of-the-art results achieved on the BIOSSES biomedical benchmark set. Moreover, our proposed supervised model that combines different string-based similarity metrics with a neural embedding model surpasses previous ontology-dependent supervised state-of-the-art approaches in terms of Pearson’s r (r = 0.871) on the biomedical benchmark set. In contrast to the promising results for the original benchmark, we found our best models’ performance on the smaller contradiction subset to be poor. 
Conclusions In this study, we have highlighted the value of neural network-based models for semantic similarity estimation in the biomedical domain by showing that they can keep up with and even surpass previous state-of-the-art approaches for semantic similarity estimation that depend on the availability of laboriously curated ontologies, when evaluated on a biomedical benchmark set. Capturing contradictions and negations in biomedical sentences, however, emerged as an essential area for further work. Electronic supplementary material The online version of this article (10.1186/s12859-019-2789-2) contains supplementary material, which is available to authorized users.
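The contradiction problem the authors highlight can be seen even with a trivial surface-similarity measure: a contradictory pair shares almost as many tokens as a paraphrase pair, so a similarity score alone cannot separate them. The sentence pairs below are hypothetical:

```python
def token_jaccard(a, b):
    """Lexical overlap of lowercased token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# A paraphrase pair and a contradiction pair (invented examples):
paraphrase = ("the drug reduced tumour growth",
              "tumour growth was reduced by the drug")
contradiction = ("the drug increased mortality",
                 "the drug decreased mortality")

p = token_jaccard(*paraphrase)      # high, as expected for a paraphrase
c = token_jaccard(*contradiction)   # nearly as high, despite opposite meaning
print(p, c)
```

Dense embedding models mitigate some of this but, as the abstract reports, still struggle on the contradiction subset, since negations and antonyms occupy very similar contexts.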
Collapse
Affiliation(s)
- Kathrin Blagec
- Section for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Währinger Straße 25a, 1090, Vienna, Austria
| | - Hong Xu
- Section for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Währinger Straße 25a, 1090, Vienna, Austria
| | - Asan Agibetov
- Section for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Währinger Straße 25a, 1090, Vienna, Austria
| | - Matthias Samwald
- Section for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Währinger Straße 25a, 1090, Vienna, Austria.
| |
Collapse
|
43
|
Koroleva A, Kamath S, Paroubek P. Measuring semantic similarity of clinical trial outcomes using deep pre-trained language representations. J Biomed Inform 2019; 100S:100058. [PMID: 34384580 DOI: 10.1016/j.yjbinx.2019.100058] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2019] [Revised: 09/30/2019] [Accepted: 10/03/2019] [Indexed: 10/25/2022]
Abstract
BACKGROUND Outcomes are variables monitored during a clinical trial to assess the impact of an intervention on human health. Automatic assessment of the semantic similarity of trial outcomes is required for a number of tasks, such as detection of outcome switching (unjustified changes to the pre-defined outcomes of a trial) and implementation of Core Outcome Sets (minimal sets of outcomes that should be reported in a particular medical domain). OBJECTIVE We aimed to build an algorithm for assessing the semantic similarity of pairs of primary and reported outcomes. We focused on approaches that do not require manually curated domain-specific resources such as ontologies and thesauri. METHODS We tested several approaches, including single measures of similarity (based on strings, stems and lemmas, paths and distances in an ontology, and vector representations of phrases), classifiers using a combination of single measures as features, and a deep learning approach that consists in fine-tuning pre-trained deep language representations. We tested language models provided by BERT (trained on general-domain texts), BioBERT, and SciBERT (trained on biomedical and scientific texts, respectively). We explored the possibility of improving the results by taking into account variants for referring to an outcome (e.g. the use of a measurement tool name instead of the outcome name, or the use of abbreviations). We release an open corpus annotated for the similarity of pairs of outcomes. RESULTS Classifiers using a combination of single measures as features outperformed the single measures, while deep learning algorithms using BioBERT and SciBERT models outperformed the classifiers. BioBERT reached the best F-measure of 89.75%. The addition of variants of outcomes did not improve the results for the best-performing single measures or for the classifiers, but it did improve the performance of the deep learning algorithms: BioBERT achieved an F-measure of 93.38%.
CONCLUSIONS Deep learning approaches using pre-trained language representations outperformed other approaches for similarity assessment of trial outcomes, without relying on any manually curated domain-specific resources (ontologies and other lexical resources). Addition of variants of outcomes further improved the performance of deep learning algorithms.
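One family of the "single measures" above (string similarity over stems) can be sketched as a baseline for matching a primary outcome against a reported one. The crude suffix-stripper, threshold, and outcome strings below are illustrative assumptions, not the paper's actual implementation:

```python
import difflib

def crude_stem(token):
    """Very crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def stem_similarity(a, b):
    """String similarity computed over stemmed, lowercased tokens."""
    sa = " ".join(crude_stem(t) for t in a.lower().split())
    sb = " ".join(crude_stem(t) for t in b.lower().split())
    return difflib.SequenceMatcher(None, sa, sb).ratio()

# Hypothetical primary vs reported outcome from a trial publication:
primary = "overall survival at 12 months"
reported = "12-month overall survival"
score = stem_similarity(primary, reported)
print(score)  # above a chosen threshold, the pair is treated as the same outcome
```

The fine-tuned BioBERT approach that outperformed such baselines instead feeds both outcome phrases to the model as a sentence pair and classifies them as same/different, capturing reorderings and paraphrases that string measures miss.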
Collapse
Affiliation(s)
- Anna Koroleva
- LIMSI, CNRS, Université Paris-Saclay, F-91405 Orsay, France; Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands.
| | - Sanjay Kamath
- LIMSI, CNRS, Université Paris-Saclay, F-91405 Orsay, France; LRI Univ. Paris-Sud, CNRS, Université Paris-Saclay, F-91405 Orsay, France
| | | |
Collapse
|
44
|
Schuler T, Kipritidis J, Eade T, Hruby G, Kneebone A, Perez M, Grimberg K, Richardson K, Evill S, Evans B, Gallego B. Big Data Readiness in Radiation Oncology: An Efficient Approach for Relabeling Radiation Therapy Structures With Their TG-263 Standard Name in Real-World Data Sets. Adv Radiat Oncol 2018; 4:191-200. [PMID: 30706028 PMCID: PMC6349627 DOI: 10.1016/j.adro.2018.09.013] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2018] [Accepted: 09/28/2018] [Indexed: 12/17/2022] Open
Abstract
Purpose To prepare for big data analyses on radiation therapy data, we developed Stature, a tool-supported approach for standardization of structure names in existing radiation therapy plans. We applied the widely endorsed nomenclature standard TG-263 as the mapping target and quantified the structure name inconsistency in 2 real-world data sets. Methods and Materials The clinically relevant structures in the radiation therapy plans were identified by reference to randomized controlled trials. The Stature approach was used by clinicians to identify the synonyms for each relevant structure, which was then mapped to the corresponding TG-263 name. We applied Stature to standardize the structure names for 654 patients with prostate cancer (PCa) and 224 patients with head and neck squamous cell carcinoma (HNSCC) who received curative radiation therapy at our institution between 2007 and 2017. The accuracy of the Stature process was manually validated in a random sample from each cohort. For the HNSCC cohort we measured the resource requirements for Stature, and for the PCa cohort we demonstrated its impact on an example clinical analytics scenario. Results All synonym groups except one (“Hydrogel”) were mapped to the corresponding TG-263 name, resulting in a TG-263 relabel rate of 99% (8837 of 8925 structures). For the PCa cohort, Stature matched a total of 5969 structures. Of these, 5682 structures were exact matches (ie, following local naming convention), 284 were matched via a synonym, and 3 required manual matching. The original radiation therapy structure names therefore had a naming inconsistency rate of 4.81%. For the HNSCC cohort, Stature mapped a total of 2956 structures (2638 exact, 304 synonym, 14 manual; 10.76% inconsistency rate) and required 7.5 clinician hours. The clinician hours required were one-fifth of those that would be required for manual relabeling. The accuracy of Stature was 99.97% (PCa) and 99.61% (HNSCC).
Conclusions The Stature approach was highly accurate and had significant resource efficiencies compared with manual curation.
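The exact-match / synonym / manual cascade behind an approach like Stature can be sketched as follows. The structure names, synonym table, and rate computation are invented miniatures, not the Stature tool's actual tables or its exact definition of the inconsistency rate:

```python
# Hypothetical mini-mapping: exact match against the local naming convention
# first, then a clinician-curated synonym table, and manual review last.
# TG-263 target names shown are illustrative examples only.
tg263_exact = {"Bladder": "Bladder", "Rectum": "Rectum", "PTV": "PTV"}
synonyms = {"bladder_new": "Bladder", "RECT": "Rectum", "ptv 78": "PTV"}

def relabel(name):
    """Return (TG-263 name or None, how the match was made)."""
    if name in tg263_exact:
        return tg263_exact[name], "exact"
    if name in synonyms:
        return synonyms[name], "synonym"
    return None, "manual"  # falls through to clinician review

plans = ["Bladder", "RECT", "Rectum", "ptv 78", "Hydrogel"]
results = [relabel(n) for n in plans]

# Names not matching the local convention exactly are "inconsistent":
inconsistent = sum(1 for _, how in results if how != "exact")
rate = inconsistent / len(plans)
print(rate)
```

The efficiency gain reported above comes from the synonym table: clinicians curate synonyms once per group rather than relabeling each of the thousands of structures individually.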
Collapse
Affiliation(s)
- Thilo Schuler
- Department of Radiation Oncology, Northern Sydney Cancer Centre, Royal North Shore Hospital, Sydney, Australia.,Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| | - John Kipritidis
- Department of Radiation Oncology, Northern Sydney Cancer Centre, Royal North Shore Hospital, Sydney, Australia
| | - Thomas Eade
- Department of Radiation Oncology, Northern Sydney Cancer Centre, Royal North Shore Hospital, Sydney, Australia.,Northern Clinical School, University of Sydney, Sydney, Australia
| | - George Hruby
- Department of Radiation Oncology, Northern Sydney Cancer Centre, Royal North Shore Hospital, Sydney, Australia.,Northern Clinical School, University of Sydney, Sydney, Australia
| | - Andrew Kneebone
- Department of Radiation Oncology, Northern Sydney Cancer Centre, Royal North Shore Hospital, Sydney, Australia.,Northern Clinical School, University of Sydney, Sydney, Australia
| | - Mario Perez
- Department of Radiation Oncology, Northern Sydney Cancer Centre, Royal North Shore Hospital, Sydney, Australia
| | - Kylie Grimberg
- Department of Radiation Oncology, Northern Sydney Cancer Centre, Royal North Shore Hospital, Sydney, Australia
| | - Kylie Richardson
- Department of Radiation Oncology, Northern Sydney Cancer Centre, Royal North Shore Hospital, Sydney, Australia
| | - Sally Evill
- Department of Radiation Oncology, Northern Sydney Cancer Centre, Royal North Shore Hospital, Sydney, Australia
| | - Brooke Evans
- Department of Radiation Oncology, Northern Sydney Cancer Centre, Royal North Shore Hospital, Sydney, Australia
| | - Blanca Gallego
- Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
| |
Collapse
|
45
|
Overview of ImageCLEF 2018: Challenges, Datasets and Evaluation. LECTURE NOTES IN COMPUTER SCIENCE 2018. [DOI: 10.1007/978-3-319-98932-7_28] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
|