1. Du H, Xu J, Du Z, Chen L, Ma S, Wei D, Wang X. MF-MNER: Multi-models Fusion for MNER in Chinese Clinical Electronic Medical Records. Interdiscip Sci 2024;16:489-502. doi: 10.1007/s12539-024-00624-z. PMID: 38578388; PMCID: PMC11289171.
Abstract
To address the poor entity recognition performance caused by the lack of Chinese annotation in clinical electronic medical records, this paper proposes a multi-medical entity recognition method, MF-MNER, that fuses BART, Bi-LSTM, and CRF. First, after cleaning, encoding, and segmenting the electronic medical records, the resulting semantic representations are dynamically fused using a bidirectional autoregressive transformer (BART) model. Then, sequential information is captured with a bidirectional long short-term memory (Bi-LSTM) network. Finally, a conditional random field (CRF) decodes and outputs the multi-task entity labels. On the CCKS2019 dataset, micro-average Precision, macro-average Recall, and weighted-average Precision reach 0.880, 0.887, and 0.883, and the micro-, macro-, and weighted-average F1-scores reach 0.875, 0.876, and 0.876, respectively. Under the same dataset conditions, the method outperforms the existing literature on all three evaluation schemes (micro, macro, and weighted average); for the weighted average, Precision, Recall, and F1-score are 19.64%, 15.67%, and 17.58% higher, respectively, than the existing BERT-BiLSTM-CRF model. On an actual clinical dataset, MF-MNER achieves Precision, Recall, and F1-scores of 0.638, 0.825, and 0.719 (micro average), 0.685, 0.800, and 0.733 (macro average), and 0.647, 0.825, and 0.722 (weighted average). These results show that MF-MNER integrates the advantages of the BART, Bi-LSTM, and CRF layers, significantly improving downstream named entity recognition with a small amount of annotation and achieving particularly strong recall, which has practical significance. Source code and datasets to reproduce the results are available at https://github.com/xfwang1969/MF-MNER.
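The pipeline this abstract describes (a BART encoder producing fused representations, a Bi-LSTM for sequential context, and a CRF decoder) follows a standard tagging pattern. Purely as a hedged sketch of that pattern, and not the authors' released code (which is linked above), the snippet below wires the three pieces together with PyTorch, HuggingFace transformers, and the pytorch-crf package; the checkpoint name, tag count, and hidden size are assumptions.

```python
# Hedged sketch of a BART -> Bi-LSTM -> CRF tagger; not the MF-MNER implementation.
import torch.nn as nn
from transformers import AutoModel  # pip install transformers
from torchcrf import CRF            # pip install pytorch-crf

class BartBiLstmCrfTagger(nn.Module):
    def __init__(self, checkpoint="fnlp/bart-base-chinese",  # assumed public Chinese BART checkpoint
                 num_tags=23, lstm_hidden=256):               # assumed BIO tag-set size and LSTM width
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint).get_encoder()  # use BART encoder states only
        self.bilstm = nn.LSTM(self.encoder.config.d_model, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.to_emissions = nn.Linear(2 * lstm_hidden, num_tags)  # per-token tag scores
        self.crf = CRF(num_tags, batch_first=True)                # models tag-transition constraints

    def forward(self, input_ids, attention_mask, tags=None):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        states, _ = self.bilstm(states)               # capture left and right sequential context
        emissions = self.to_emissions(states)
        mask = attention_mask.bool()
        if tags is not None:                          # training: negative CRF log-likelihood
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # inference: best tag sequence per sentence
```

Keeping the CRF on top of the Bi-LSTM is what enforces consistent tag transitions at decoding time, which is the usual motivation for this layering.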
Affiliation(s)
- Haoze Du: Department of Computer Science, North Carolina State University, Raleigh, NC, 27695, USA
- Jiahao Xu: School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Zhiyong Du: School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China
- Lihui Chen: School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Shaohui Ma: School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China
- Dongqing Wei: State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, 200240, China; Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education, Shanghai Jiaotong University, Shanghai, 200240, China; Zhongjing Research and Industrialization, Institute of Chinese Medicine, Zhongguancun Scientific Park, Nanyang, 473000, China
- Xianfang Wang: School of Computer Science and Technology, Henan Institute of Technology, Xinxiang, 453003, China
2. Invernici F, Bernasconi A, Ceri S. Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation. J Med Internet Res 2024;26:e52655. doi: 10.2196/52655. PMID: 38814687; PMCID: PMC11176882.
Abstract
BACKGROUND: Since the beginning of the COVID-19 pandemic, >1 million studies have been collected within the COVID-19 Open Research Dataset, a corpus of manuscripts created to accelerate research against the disease. Their abstracts hold a wealth of information that remains largely unexplored and difficult to search due to its unstructured nature. Keyword-based search is the standard approach, which allows users to retrieve the documents of a corpus that contain (all or some of) the words in a target list. This type of search, however, does not provide visual support for the task and is not suited to expressing complex queries or compensating for missing specifications.
OBJECTIVE: This study aims to consider small graphs of concepts and exploit them for expressing graph searches over the existing COVID-19-related literature, leveraging the increasing use of graphs to represent and query scientific knowledge and providing a user-friendly search and exploration experience.
METHODS: We considered the COVID-19 Open Research Dataset corpus and summarized its content by annotating the publications' abstracts using terms selected from the Unified Medical Language System and the Ontology of Coronavirus Infectious Disease. Then, we built a co-occurrence network that includes all relevant concepts mentioned in the corpus, establishing connections when their mutual information is relevant. A sophisticated graph query engine was built to identify the best matches of graph queries on the network; it also supports partial matches and suggests potential query completions using shortest paths.
RESULTS: We built a large co-occurrence network consisting of 128,249 entities and 47,198,965 relationships. The GRAPH-SEARCH interface allows users to explore the network by formulating or adapting graph queries; it produces a globally ranked bibliography of publications, and each publication is further associated with the specific parts of the query that it explains, allowing the user to understand each aspect of the matching.
CONCLUSIONS: Our approach supports the process of query formulation and evidence search over a large text corpus, and it can be reapplied to any scientific domain where document corpora and curated ontologies are made available.
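To make the network-construction step concrete, here is a small, hedged sketch (not the authors' pipeline) that builds a concept co-occurrence graph with networkx, keeps an edge only when a simple pointwise mutual information score is positive, and completes a partial query along shortest paths; the function names, the PMI formulation, and the threshold are illustrative assumptions.

```python
# Illustrative co-occurrence network with a mutual-information edge filter (not the paper's code).
import math
from collections import Counter
from itertools import combinations
import networkx as nx

def build_cooccurrence_graph(annotated_abstracts, pmi_threshold=0.0):
    """annotated_abstracts: list of sets of concept IDs annotated in each abstract."""
    n_docs = len(annotated_abstracts)
    concept_freq, pair_freq = Counter(), Counter()
    for concepts in annotated_abstracts:
        concept_freq.update(concepts)
        pair_freq.update(combinations(sorted(concepts), 2))

    graph = nx.Graph()
    for (a, b), n_ab in pair_freq.items():
        # PMI = log( P(a, b) / (P(a) * P(b)) ), estimated from document counts
        pmi = math.log((n_ab * n_docs) / (concept_freq[a] * concept_freq[b]))
        if pmi > pmi_threshold:  # keep only co-occurrences judged informative
            graph.add_edge(a, b, weight=pmi, count=n_ab)
    return graph

def suggest_completion(graph, concept_a, concept_b):
    # A partial query can be "completed" by connecting two concepts along a shortest path.
    return nx.shortest_path(graph, concept_a, concept_b)
```

Document-level counts with a PMI cut-off are only one simple way to approximate the "mutual information is relevant" criterion; the system described in the abstract builds its network from UMLS and CIDO annotations at a much larger scale.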
Affiliation(s)
- Francesco Invernici: Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
- Anna Bernasconi: Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
- Stefano Ceri: Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
3. Tian D, Jiang S, Zhang L, Lu X, Xu Y. The role of large language models in medical image processing: a narrative review. Quant Imaging Med Surg 2024;14:1108-1121. doi: 10.21037/qims-23-892. PMID: 38223123; PMCID: PMC10784029.
Abstract
Background and Objective: The rapid advancement of artificial intelligence (AI) has ushered in a new era in natural language processing (NLP), with large language models (LLMs) like ChatGPT leading the way. This paper explores the profound impact of AI, particularly LLMs, on medical image processing. The objective is to provide insights into the transformative potential of AI in improving healthcare by addressing the historical challenges associated with manual image interpretation.
Methods: A comprehensive literature search was conducted in the Web of Science and PubMed databases from 2013 to 2023, focusing on the role of LLMs in medical image processing; recent publications in the arXiv database were also reviewed. The search included all article types, including abstracts, review articles, letters, and editorials, and was restricted to English-language publications to facilitate content analysis.
Key Content and Findings: The review reveals that AI, driven by LLMs, has revolutionized medical image processing by streamlining an interpretation process traditionally characterized by time-intensive manual effort. AI's impact on the quality of medical care and patient well-being is substantial. With their robust interactivity and multimodal learning capabilities, LLMs offer immense potential for enhancing various aspects of medical image processing, and the Transformer architecture underlying LLMs is gaining prominence in this domain.
Conclusions: This review underscores the pivotal role of AI, especially LLMs, in advancing medical image processing. These technologies have the capacity to enhance transfer learning efficiency, integrate multimodal data, facilitate clinical interactivity, and optimize cost-efficiency in healthcare. The potential applications of LLMs in clinical settings are promising, with far-reaching implications for future research, clinical practice, and healthcare policy; their continued development and implementation are poised to reshape the healthcare landscape for the better.
Affiliation(s)
- Dianzhe Tian: Department of Liver Surgery, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Shitao Jiang: Department of Liver Surgery, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Lei Zhang: Department of Liver Surgery, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xin Lu: Department of Liver Surgery, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yiyao Xu: Department of Liver Surgery, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
4. Cai L, Li J, Lv H, Liu W, Niu H, Wang Z. Integrating domain knowledge for biomedical text analysis into deep learning: A survey. J Biomed Inform 2023;143:104418. doi: 10.1016/j.jbi.2023.104418. PMID: 37290540.
Abstract
The past decade has witnessed an explosion of textual information in the biomedical field. Biomedical texts provide a basis for healthcare delivery, knowledge discovery, and decision-making. Over the same period, deep learning has achieved remarkable performance in biomedical natural language processing; however, its development has been limited by the scarcity of well-annotated datasets and by poor interpretability. To address this, researchers have considered combining domain knowledge (such as biomedical knowledge graphs) with biomedical data, which has become a promising means of introducing more information into biomedical datasets and of following evidence-based medicine. This paper comprehensively reviews more than 150 recent studies on incorporating domain knowledge into deep learning models to facilitate typical biomedical text analysis tasks, including information extraction, text classification, and text generation. We conclude by discussing various challenges and future directions.
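One recurring integration pattern in this literature is feature-level fusion, where embeddings of linked knowledge-graph entities are concatenated with contextual token representations before a task head. The toy sketch below illustrates only that pattern; the dimensions, vocabulary size, and entity-linking step are assumptions rather than any specific surveyed model.

```python
# Toy feature-level fusion of knowledge-graph entity embeddings with text representations.
import torch
import torch.nn as nn

class KnowledgeAugmentedTagger(nn.Module):
    def __init__(self, text_dim=768, kg_dim=128, num_entities=50000, num_labels=2):
        super().__init__()
        self.entity_emb = nn.Embedding(num_entities, kg_dim)   # e.g. pretrained TransE vectors (assumed)
        self.head = nn.Linear(text_dim + kg_dim, num_labels)

    def forward(self, token_states, entity_ids):
        # token_states: (batch, seq, text_dim) from any biomedical text encoder
        # entity_ids:   (batch, seq) knowledge-graph entity linked to each token (0 = no entity)
        fused = torch.cat([token_states, self.entity_emb(entity_ids)], dim=-1)
        return self.head(fused)                                 # per-token label scores
```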
Affiliation(s)
- Linkun Cai: School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
- Jia Li: Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
- Han Lv: Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
- Wenjuan Liu: Aerospace Center Hospital, 100049 Beijing, China
- Haijun Niu: School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China
- Zhenchang Wang: School of Biological Science and Medical Engineering, Beihang University, 100191 Beijing, China; Department of Radiology, Beijing Friendship Hospital, Capital Medical University, 100050 Beijing, China
5. Li M, Yang H, Liu Y. Biomedical named entity recognition based on fusion multi-features embedding. Technol Health Care 2023;31:111-121. doi: 10.3233/thc-236011. PMID: 37038786.
Abstract
BACKGROUND: With the exponential increase in the volume of biomedical literature, text mining tasks are becoming increasingly important in the medical domain. Named entity recognition is the primary identification task in text mining and a prerequisite for, and critical part of, building medical domain knowledge graphs, medical question answering systems, and medical text classification.
OBJECTIVE: The goal of this study is to recognize biomedical entities effectively by fusing multi-feature embeddings, since multiple features provide more comprehensive information and allow better predictions.
METHODS: First, three kinds of features are generated at the word representation layer: deep contextual word-level features, local character-level features, and part-of-speech features. The word representation vectors are then fed into a BiLSTM to capture dependency information. Finally, a CRF layer learns the features of the state sequences to obtain the globally optimal tagging sequence.
RESULTS: The experimental results showed that the model outperformed other state-of-the-art methods in all-around performance on six of eight datasets covering four biomedical entity types.
CONCLUSION: The proposed method has a positive effect on prediction because the fused multi-feature embedding enhances the semantic information and comprehensively accounts for the factors relevant to named entity recognition.
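As a hedged sketch of just the fusion step outlined in METHODS, the module below concatenates contextual word vectors, a character-level CNN feature, and a part-of-speech embedding into one word representation before the BiLSTM-CRF; the vocabulary sizes, dimensions, and CNN design are illustrative assumptions, not the authors' configuration.

```python
# Illustrative multi-feature fusion layer (word + char + POS); not the paper's implementation.
import torch
import torch.nn as nn

class FusionEmbedding(nn.Module):
    def __init__(self, word_dim=768, char_vocab=100, char_dim=30, char_filters=50,
                 pos_vocab=50, pos_dim=25):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.pos_emb = nn.Embedding(pos_vocab, pos_dim, padding_idx=0)
        self.out_dim = word_dim + char_filters + pos_dim  # width of the vector fed to the BiLSTM

    def forward(self, word_vecs, char_ids, pos_ids):
        # word_vecs: (batch, seq, word_dim) contextual vectors from a pretrained encoder
        # char_ids:  (batch, seq, max_chars) character indices for each token
        # pos_ids:   (batch, seq) part-of-speech tag indices
        b, s, c = char_ids.shape
        chars = self.char_emb(char_ids).view(b * s, c, -1).transpose(1, 2)
        char_feat = torch.max(self.char_cnn(chars), dim=-1).values.view(b, s, -1)  # max over positions
        return torch.cat([word_vecs, char_feat, self.pos_emb(pos_ids)], dim=-1)
```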
6. Raza S, Schwartz B. Entity and relation extraction from clinical case reports of COVID-19: a natural language processing approach. BMC Med Inform Decis Mak 2023;23:20. doi: 10.1186/s12911-023-02117-3. PMID: 36703154; PMCID: PMC9879259.
Abstract
BACKGROUND: Extracting relevant information about infectious diseases is an essential task. However, a significant obstacle to supporting public health research is the lack of methods for effectively mining large amounts of health data.
OBJECTIVE: This study aims to use natural language processing (NLP) to extract key information (clinical factors and social determinants of health) from cases published in the literature.
METHODS: The proposed framework integrates a data layer for preparing a data cohort from clinical case reports, an NLP layer to find the clinical and demographic named entities and relations in the texts, and an evaluation layer for benchmarking performance and analysis. The focus of this study is to extract valuable information from COVID-19 case reports.
RESULTS: The named entity recognition implementation in the NLP layer achieves a performance gain of about 1-3% over benchmark methods. Furthermore, even without extensive data labeling, the relation extraction method outperforms benchmark methods in accuracy by 1-8%. A thorough examination reveals the prevalence of the disease and its symptoms in the patients.
CONCLUSIONS: A similar approach can be generalized to other infectious diseases, and it is worthwhile to reuse prior knowledge acquired through transfer learning when researching other infectious diseases.
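For a concrete sense of the entity-extraction step, the snippet below runs an off-the-shelf biomedical token-classification model over an invented case-report sentence via the HuggingFace pipeline API; the checkpoint name is a public example and not the model trained in this study.

```python
# Example NER pass with a public biomedical checkpoint (not the study's trained model).
from transformers import pipeline

ner = pipeline("token-classification",
               model="d4data/biomedical-ner-all",   # assumed public checkpoint, for illustration
               aggregation_strategy="simple")        # merge word pieces into entity spans

report = ("A 45-year-old male presented with fever, dry cough and fatigue; "
          "RT-PCR confirmed SARS-CoV-2 infection.")
for entity in ner(report):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```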
Affiliation(s)
- Shaina Raza: Public Health Ontario (PHO), Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Brian Schwartz: Public Health Ontario (PHO), Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
7. Li M, Yang H, Liu Y. Biomedical named entity recognition based on fusion multi-features embedding. Technol Health Care 2023;31:111-121. doi: 10.3233/thc-236011. PMID: 37038786.
Abstract
BACKGROUND: With the exponential increase in the volume of biomedical literature, text mining tasks are becoming increasingly important in the medical domain. Named entity recognition is the primary identification task in text mining and a prerequisite for, and critical part of, building medical domain knowledge graphs, medical question answering systems, and medical text classification.
OBJECTIVE: The goal of this study is to recognize biomedical entities effectively by fusing multi-feature embeddings, since multiple features provide more comprehensive information and allow better predictions.
METHODS: First, three kinds of features are generated at the word representation layer: deep contextual word-level features, local character-level features, and part-of-speech features. The word representation vectors are then fed into a BiLSTM to capture dependency information. Finally, a CRF layer learns the features of the state sequences to obtain the globally optimal tagging sequence.
RESULTS: The experimental results showed that the model outperformed other state-of-the-art methods in all-around performance on six of eight datasets covering four biomedical entity types.
CONCLUSION: The proposed method has a positive effect on prediction because the fused multi-feature embedding enhances the semantic information and comprehensively accounts for the factors relevant to named entity recognition.
8. Hall K, Chang V, Jayne C. A review on Natural Language Processing Models for COVID-19 research. Healthcare Analytics (New York, N.Y.) 2022;2:100078. doi: 10.1016/j.health.2022.100078. PMID: 37520621; PMCID: PMC9295335.
Abstract
This survey paper reviews Natural Language Processing Models and their use in COVID-19 research in two main areas. Firstly, a range of transformer-based biomedical pretrained language models are evaluated using the BLURB benchmark. Secondly, models used in sentiment analysis surrounding COVID-19 vaccination are evaluated. We filtered literature curated from various repositories such as PubMed and Scopus and reviewed 27 papers. When evaluated using the BLURB benchmark, the novel T-BPLM BioLinkBERT gives groundbreaking results by incorporating document link knowledge and hyperlinking into its pretraining. Sentiment analysis of COVID-19 vaccination through various Twitter API tools has shown the public's sentiment towards vaccination to be mostly positive. Finally, we outline some limitations and potential solutions to drive the research community to improve the models used for NLP tasks.
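On the sentiment-analysis side of the review, the kind of off-the-shelf model such studies apply to vaccination tweets can be called in a few lines; the checkpoint below is a public example rather than one of the 27 reviewed systems, and the tweet is invented.

```python
# Illustrative tweet-sentiment call with a public checkpoint; not a model from this review.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis",
                     model="cardiffnlp/twitter-roberta-base-sentiment-latest")
print(sentiment("Got my COVID-19 booster today, feeling relieved and grateful."))
# Expected output shape: [{'label': 'positive', 'score': ...}]
```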
Affiliation(s)
- Victor Chang: Operations Information Management, ABS, Aston University, UK
9. Wang Q, Liao J, Lapata M, Macleod M. PICO entity extraction for preclinical animal literature. Syst Rev 2022;11:209. doi: 10.1186/s13643-022-02074-4. PMID: 36180888; PMCID: PMC9524079.
Abstract
BACKGROUND: Natural language processing could assist with multiple tasks in systematic reviews to reduce workload, including the extraction of PICO elements such as study populations, interventions, comparators, and outcomes. The PICO framework provides a basis for retrieving and selecting evidence relevant to a specific systematic review question, and automatic approaches to PICO extraction have been developed, particularly for reviews of clinical trial findings. Given the differences between preclinical animal studies and clinical trials, separate approaches are necessary; facilitating preclinical systematic reviews will inform the translation from preclinical to clinical research.
METHODS: We randomly selected 400 abstracts from the PubMed Central Open Access database that described in vivo animal research and manually annotated them with PICO phrases for Species, Strain, method of Induction of the disease model, Intervention, Comparator, and Outcome. We developed a two-stage workflow for preclinical PICO extraction: first, we fine-tuned BERT with different pre-trained modules for PICO sentence classification; then, after removing text irrelevant to PICO features, we explored LSTM-, CRF-, and BERT-based models for PICO entity recognition. We also explored a self-training approach because of the small training corpus.
RESULTS: For PICO sentence classification, BERT models using all pre-trained modules achieved an F1 score of over 80%, and models pre-trained on PubMed abstracts achieved the highest F1 of 85%. For PICO entity recognition, fine-tuning BERT pre-trained on PubMed abstracts achieved an overall F1 of 71% and satisfactory F1 scores for Species (98%), Strain (70%), Intervention (70%), and Outcome (67%). The scores for Induction and Comparator were less satisfactory, but the F1 for Comparator could be improved to 50% by applying self-training.
CONCLUSIONS: Of the approaches tested, BERT pre-trained on PubMed abstracts is the best for both PICO sentence classification and PICO entity recognition in preclinical abstracts. Self-training yields better performance for identifying comparators and strains.
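The self-training used for the scarce Comparator and Strain labels can be summarized as a generic pseudo-labeling loop. The sketch below is schematic rather than the authors' code: train and predict_with_confidence are placeholder callables, and the confidence threshold and round count are assumptions.

```python
# Schematic self-training (pseudo-labeling) loop; placeholders, not the study's implementation.
def self_train(model, labeled, unlabeled, train, predict_with_confidence,
               rounds=3, threshold=0.9):
    for _ in range(rounds):
        model = train(model, labeled)                       # fine-tune on the current labeled set
        newly_labeled, still_unlabeled = [], []
        for text in unlabeled:
            tags, confidence = predict_with_confidence(model, text)
            if confidence >= threshold:                     # keep only confident pseudo-labels
                newly_labeled.append((text, tags))
            else:
                still_unlabeled.append(text)
        if not newly_labeled:                               # nothing confident enough: stop early
            break
        labeled = labeled + newly_labeled                   # grow the training set
        unlabeled = still_unlabeled
    return model
```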
Affiliation(s)
- Qianying Wang: CCBS, Edinburgh Medical School, University of Edinburgh, Edinburgh, UK
- Jing Liao: CCBS, Edinburgh Medical School, University of Edinburgh, Edinburgh, UK
- Mirella Lapata: ILCC, School of Informatics, University of Edinburgh, Edinburgh, UK
- Malcolm Macleod: CCBS, Edinburgh Medical School, University of Edinburgh, Edinburgh, UK
10. Mallick R, Arnaboldi V, Davis P, Diamantakis S, Zarowiecki M, Howe K. Accelerated variant curation from scientific literature using biomedical text mining. microPublication Biology 2022;2022. doi: 10.17912/micropub.biology.000578. PMID: 35663412; PMCID: PMC9160977.
Abstract
Biological databases collect and standardize data through biocuration. Even though major model organism databases have adopted some automation of curation methods, a large portion of biocuration is still performed manually. To speed up the extraction of the genomic positions of variants, we have developed a hybrid approach that combines regular expressions, Named Entity Recognition based on BERT (Bidirectional Encoder Representations from Transformers) and bag-of-words to extract variant genomic locations from C. elegans papers for WormBase. Our model has a precision of 82.59% for the gene-mutation matches tested on extracted text from 100 papers, and even recovers some data not discovered during manual curation. Code at: https://github.com/WormBase/genomic-info-from-papers.
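To illustrate the regular-expression component of such a hybrid pipeline, the toy patterns below pull out C. elegans-style allele names and simple chromosome:position coordinates from free text; they are illustrative only, not the patterns used by the WormBase workflow (the real code is at the linked repository).

```python
# Toy variant-mention patterns; illustrative, not WormBase's production regexes.
import re

ALLELE = re.compile(r"\b[a-z]{1,3}\d+\b")                                        # e.g. e1370, ok1234
COORDINATE = re.compile(r"\b(?:chr)?([IVX]+|\d+):(\d[\d,]*)(?:-(\d[\d,]*))?\b")  # e.g. III:1,234,567

text = "The daf-2(e1370) lesion maps to III:1,234,567 in the reference assembly."
print(ALLELE.findall(text))       # ['e1370']
print(COORDINATE.findall(text))   # [('III', '1,234,567', '')]
```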
Affiliation(s)
- Rishab Mallick: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Valerio Arnaboldi: Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Paul Davis: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stavros Diamantakis: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Magdalena Zarowiecki: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kevin Howe (corresponding author): European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
11. Kalyan KS, Rajasekharan A, Sangeetha S. AMMU: A survey of transformer-based biomedical pretrained language models. J Biomed Inform 2021;126:103982. doi: 10.1016/j.jbi.2021.103982. PMID: 34974190.
Abstract
Transformer-based pretrained language models (PLMs) have started a new era in modern natural language processing (NLP). These models combine the power of transformers, transfer learning, and self-supervised learning (SSL). Following their success in the general domain, the biomedical research community has developed various in-domain PLMs, from BioBERT to the latest BioELECTRA and BioALBERT models. We believe there is a need for a paper that comprehensively surveys the various transformer-based biomedical pretrained language models (BPLMs). In this survey, we start with a brief overview of foundational concepts such as self-supervised learning, the embedding layer, and transformer encoder layers. We discuss core concepts of transformer-based PLMs, including pretraining methods, pretraining tasks, fine-tuning methods, and the embedding types specific to the biomedical domain. We introduce a taxonomy for transformer-based BPLMs and then discuss all the models. We discuss various challenges and present possible solutions. We conclude by highlighting some open issues that will drive the research community to further improve transformer-based BPLMs. A list of all publicly available transformer-based BPLMs, with links, is provided at https://mr-nlp.github.io/posts/2021/05/transformer-based-biomedical-pretrained-language-models-list/.
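As a minimal illustration of putting one of the surveyed BPLMs to work, the snippet below loads a public BioBERT checkpoint with a token-classification head ready for fine-tuning; the checkpoint name and label count are assumptions for a generic downstream NER task, not recommendations from the survey.

```python
# Loading a public biomedical PLM for downstream fine-tuning; illustrative choices only.
from transformers import AutoTokenizer, AutoModelForTokenClassification

checkpoint = "dmis-lab/biobert-base-cased-v1.1"   # one publicly available BPLM (assumed example)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=5)

batch = tokenizer("Metformin reduces hepatic glucose production.", return_tensors="pt")
logits = model(**batch).logits   # (1, sequence_length, num_labels) per-token scores before fine-tuning
```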