1
Grouin C, Grabar N. Year 2022 in Medical Natural Language Processing: Availability of Language Models as a Step in the Democratization of NLP in the Biomedical Area. Yearb Med Inform 2023;32:244-252. PMID: 38147866; PMCID: PMC10751107; DOI: 10.1055/s-0043-1768752.
Abstract
OBJECTIVES To analyse the content of publications within the medical Natural Language Processing (NLP) domain in 2022. METHODS Automatic and manual preselection of publications to be reviewed, and selection of the best NLP papers of the year. Analysis of the important issues. RESULTS Three best papers were selected. We also propose an analysis of the content of the NLP publications in 2022, emphasizing some of the topics. CONCLUSION The main trend in 2022 is certainly related to the availability of large language models, especially those based on Transformers, and to their use by non-NLP researchers. This leads to the democratization of NLP methods. We also observe renewed interest in languages other than English, the continuation of research on information extraction and prediction, the massive use of data from social media, and the consideration of the needs and interests of patients.
Affiliation(s)
- Cyril Grouin
- Université Paris Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique, 91400 Orsay, France
- Natalia Grabar
- UMR8163 STL, CNRS, Université de Lille, Domaine du Pont-de-bois, 59653 Villeneuve-d'Ascq cedex, France
2
Zaman S, Vimalesvaran K, Howard JP, Chappell D, Varela M, Peters NS, Francis DP, Bharath AA, Linton NWF, Cole GD. Efficient labelling for efficient deep learning: the benefit of a multiple-image-ranking method to generate high volume training data applied to ventricular slice level classification in cardiac MRI. Journal of Medical Artificial Intelligence 2023;6:4. PMID: 37346802; PMCID: PMC7614685; DOI: 10.21037/jmai-22-55.
Abstract
Background Getting the most value from expert clinicians' limited labelling time is a major challenge for artificial intelligence (AI) development in clinical imaging. We present a novel method for ground-truth labelling of cardiac magnetic resonance imaging (CMR) image data by leveraging multiple clinician experts ranking multiple images on a single ordinal axis, rather than manual labelling of one image at a time. We apply this strategy to train a deep learning (DL) model to classify the anatomical position of CMR images. This allows the automated removal of slices that do not contain the left ventricular (LV) myocardium. Methods Anonymised LV short-axis slices from 300 random scans (3,552 individual images) were extracted. Each image's anatomical position relative to the LV was labelled using two different strategies performed for 5 hours each: (I) 'one-image-at-a-time': each image labelled according to its position: 'too basal', 'LV', or 'too apical' individually by one of three experts; and (II) 'multiple-image-ranking': three independent experts ordered slices according to their relative position from 'most basal' to 'most apical' in batches of eight until each image had been viewed at least 3 times. Two convolutional neural networks were trained for a three-way classification task (each model using data from one labelling strategy). The models' performance was evaluated by accuracy, F1-score, and area under the receiver operating characteristic curve (ROC AUC). Results After excluding images with artefacts, 3,323 images were labelled by both strategies. The model trained using labels from the 'multiple-image-ranking' strategy performed better than the model using the 'one-image-at-a-time' labelling strategy (accuracy 86% vs. 72%, P=0.02; F1-score 0.86 vs. 0.75; ROC AUC 0.95 vs. 0.86). For expert clinicians performing this task manually the intra-observer variability was low (Cohen's κ=0.90), but the inter-observer variability was higher (Cohen's κ=0.77).
Conclusions We present proof of concept that, given the same clinician labelling effort, comparing multiple images side-by-side using a 'multiple-image-ranking' strategy achieves ground truth labels for DL more accurately than classifying images individually. We demonstrate a potential clinical application: the automatic removal of CMR images that are not required. This leads to increased efficiency by focussing human and machine attention on images which are needed to answer clinical questions.
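The core of the ranking strategy — turning repeated batch orderings into three-way position labels — can be sketched as follows. This is a minimal illustration, not the paper's actual aggregation rule: the normalised-rank thresholds and the function name are assumptions.

```python
from collections import defaultdict

def labels_from_rankings(batches, basal_cut=0.33, apical_cut=0.67):
    """Convert repeated base-to-apex rankings into three-way position labels.

    batches: lists of image ids, each ordered 'most basal' -> 'most apical'.
    The normalised-rank thresholds are illustrative only.
    """
    positions = defaultdict(list)
    for batch in batches:
        last = len(batch) - 1
        for i, img in enumerate(batch):
            positions[img].append(i / last)  # normalised rank in [0, 1]
    labels = {}
    for img, ranks in positions.items():
        mean_rank = sum(ranks) / len(ranks)
        if mean_rank < basal_cut:
            labels[img] = "too basal"
        elif mean_rank > apical_cut:
            labels[img] = "too apical"
        else:
            labels[img] = "LV"
    return labels
```

Averaging each image's position over multiple independent rankings is what lets several experts' partial orderings combine into a single consensus label per image.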
Affiliation(s)
- Sameer Zaman
- National Heart and Lung Institute, Imperial College London, London, UK
- Imperial College Healthcare NHS Trust, London, UK
- AI for Healthcare Centre for Doctoral Training, Imperial College London, London, UK
- Kavitha Vimalesvaran
- National Heart and Lung Institute, Imperial College London, London, UK
- Imperial College Healthcare NHS Trust, London, UK
- AI for Healthcare Centre for Doctoral Training, Imperial College London, London, UK
- James P. Howard
- National Heart and Lung Institute, Imperial College London, London, UK
- Imperial College Healthcare NHS Trust, London, UK
- Digby Chappell
- AI for Healthcare Centre for Doctoral Training, Imperial College London, London, UK
- Marta Varela
- National Heart and Lung Institute, Imperial College London, London, UK
- Darrel P. Francis
- National Heart and Lung Institute, Imperial College London, London, UK
- Anil A. Bharath
- Department of Bioengineering, Imperial College London, London, UK
- Nick W. F. Linton
- Imperial College Healthcare NHS Trust, London, UK
- Department of Bioengineering, Imperial College London, London, UK
- Graham D. Cole
- Imperial College Healthcare NHS Trust, London, UK
- Department of Bioengineering, Imperial College London, London, UK
3
Singh P, Haimovich J, Reeder C, Khurshid S, Lau ES, Cunningham JW, Philippakis A, Anderson CD, Ho JE, Lubitz SA, Batra P. One Clinician Is All You Need-Cardiac Magnetic Resonance Imaging Measurement Extraction: Deep Learning Algorithm Development. JMIR Med Inform 2022;10:e38178. PMID: 35960155; PMCID: PMC9526125; DOI: 10.2196/38178.
Abstract
BACKGROUND Cardiac magnetic resonance imaging (CMR) is a powerful diagnostic modality that provides detailed quantitative assessment of cardiac anatomy and function. Automated extraction of CMR measurements from clinical reports that are typically stored as unstructured text in electronic health record systems would facilitate their use in research. Existing machine learning approaches either rely on large quantities of expert annotation or require the development of engineered rules that are time-consuming and are specific to the setting in which they were developed. OBJECTIVE We hypothesize that the use of pretrained transformer-based language models may enable label-efficient numerical extraction from clinical text without the need for heuristics or large quantities of expert annotations. Here, we fine-tuned pretrained transformer-based language models on a small quantity of CMR annotations to extract 21 CMR measurements. We assessed the effect of clinical pretraining to reduce labeling needs and explored alternative representations of numerical inputs to improve performance. METHODS Our study sample comprised 99,252 patients who received longitudinal cardiology care in a multi-institutional health care system. There were 12,720 available CMR reports from 9280 patients. We adapted PRAnCER (Platform Enabling Rapid Annotation for Clinical Entity Recognition), an annotation tool for clinical text, to collect annotations from a study clinician on 370 reports. We experimented with 5 different representations of numerical quantities and several model weight initializations. We evaluated extraction performance using macroaveraged F1-scores across the measurements of interest. We applied the best-performing model to extract measurements from the remaining CMR reports in the study sample and evaluated established associations between selected extracted measures and clinical outcomes to demonstrate validity.
RESULTS All combinations of weight initializations and numerical representations obtained excellent performance on the gold-standard test set, suggesting that transformer models fine-tuned on a small set of annotations can effectively extract numerical quantities. Our results further indicate that custom numerical representations did not appear to have a significant impact on extraction performance. The best-performing model achieved a macroaveraged F1-score of 0.957 across the evaluated CMR measurements (range 0.92 for the lowest-performing measure of left atrial anterior-posterior dimension to 1.0 for the highest-performing measures of left ventricular end systolic volume index and left ventricular end systolic diameter). Application of the best-performing model to the study cohort yielded 136,407 measurements from all available reports in the study sample. We observed expected associations between extracted left ventricular mass index, left ventricular ejection fraction, and right ventricular ejection fraction and clinical outcomes such as atrial fibrillation, heart failure, and mortality. CONCLUSIONS This study demonstrated that a domain-agnostic pretrained transformer model is able to effectively extract quantitative clinical measurements from diagnostic reports with a relatively small number of gold-standard annotations. The proposed workflow may serve as a roadmap for other quantitative entity extraction tasks.
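Measurement extraction of this kind is typically framed as token classification. A hypothetical preprocessing step is sketched below: aligning a character-level measurement annotation to whitespace tokens as BIO tags. This is illustrative only — real pipelines align against the transformer tokenizer's offset mappings, and the measurement name used here is invented.

```python
def bio_labels(text, spans):
    """Align character-span annotations (start, end, name) to whitespace
    tokens as BIO tags, for training a token-classification model."""
    tokens, offsets, pos = [], [], 0
    for tok in text.split():
        start = text.index(tok, pos)       # character offset of this token
        tokens.append(tok)
        offsets.append((start, start + len(tok)))
        pos = start + len(tok)
    labels = []
    for start, end in offsets:
        tag = "O"                          # default: outside any annotation
        for s, e, name in spans:
            if start >= s and end <= e:    # token fully inside the span
                tag = ("B-" if start == s else "I-") + name
                break
        labels.append(tag)
    return tokens, labels
```

For example, annotating the value in "LVEF is 55 %" with a span covering "55 %" yields the tags O, O, B-LVEF, I-LVEF.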
Affiliation(s)
- Pulkit Singh
- Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Julian Haimovich
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Christopher Reeder
- Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Shaan Khurshid
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
- Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, United States
- Emily S Lau
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Jonathan W Cunningham
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Division of Cardiology, Brigham and Women's Hospital, Boston, MA, United States
- Anthony Philippakis
- Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Eric and Wendy Schmidt Center, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Christopher D Anderson
- Department of Neurology, Brigham and Women's Hospital, Boston, MA, United States
- Henry and Allison McCance Center for Brain Health, Massachusetts General Hospital, Boston, MA, United States
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, United States
- Jennifer E Ho
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- CardioVascular Institute and Division of Cardiology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, United States
- Steven A Lubitz
- Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, United States
- Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
- Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, United States
- Puneet Batra
- Data Sciences Platform, The Broad Institute of Harvard and MIT, Cambridge, MA, United States
4
Li J, Lin Y, Zhao P, Liu W, Cai L, Sun J, Zhao L, Yang Z, Song H, Lv H, Wang Z. Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT). BMC Med Inform Decis Mak 2022;22:200. PMID: 35907966; PMCID: PMC9338483; DOI: 10.1186/s12911-022-01946-y.
Abstract
Background Given the increasing number of people suffering from tinnitus, the accurate categorization of patients with actionable reports is attractive in assisting clinical decision making. However, this process requires experienced physicians and significant human labor. Natural language processing (NLP) has shown great potential in big data analytics of medical texts; yet, its application to domain-specific analysis of radiology reports is limited. Objective The aim of this study is to propose a novel approach to classifying actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformers (BERT)-based models and to evaluate the benefits of in-domain pre-training (IDPT) along with a sequence adaptation strategy. Methods A total of 5864 temporal bone computed tomography (CT) reports were labeled by two experienced radiologists as follows: (1) normal findings without notable lesions; (2) notable lesions but uncorrelated to tinnitus; and (3) at least one lesion considered a potential cause of tinnitus. We then constructed a framework consisting of deep learning (DL) neural networks and self-supervised BERT models. A tinnitus domain-specific corpus was used to pre-train the BERT model to further improve its embedding weights. In addition, we conducted an experiment to evaluate multiple max sequence length settings in BERT to reduce the excessive quantity of calculations. After a comprehensive comparison of all metrics, we determined the most promising approach through the performance comparison of F1-scores and AUC values. Results In the first experiment, the BERT fine-tuned model achieved a more promising result (AUC 0.868, F1 0.760) than the Word2Vec-based models (AUC 0.767, F1 0.733) on validation data. In the second experiment, the BERT in-domain pre-training model (AUC 0.948, F1 0.841) performed significantly better than the base BERT model (AUC 0.868, F1 0.760).
Additionally, among the variants of BERT fine-tuning models, Mengzi achieved the highest AUC of 0.878 (F1 0.764). Finally, we found that a BERT max sequence length of 128 tokens achieved an AUC of 0.866 (F1 0.736), almost equal to that of 512 tokens (AUC 0.868, F1 0.760). Conclusion We developed a reliable BERT-based framework for tinnitus diagnosis from Chinese radiology reports, along with a sequence adaptation strategy that reduces computational resources while maintaining accuracy. The findings could provide a reference for NLP development in Chinese radiology reports. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-022-01946-y.
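The sequence adaptation evaluated here amounts to fixing each tokenised report at a chosen length (128 vs. 512 tokens). A minimal sketch, assuming the usual truncate-and-pad layout with an attention mask; the pad id and function name are illustrative:

```python
def adapt_sequence(token_ids, max_len=128, pad_id=0):
    """Truncate or right-pad a tokenised report to a fixed length.

    Returns (ids, mask): mask is 1 for real tokens, 0 for padding,
    so the model's attention ignores the padded positions.
    """
    ids = token_ids[:max_len]                      # truncate long reports
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))    # pad short reports
    return ids, mask
```

Because self-attention cost grows quadratically with sequence length, dropping from 512 to 128 tokens cuts computation substantially, which is why the near-identical AUC at 128 tokens matters.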
Affiliation(s)
- Jia Li
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Yucong Lin
- School of Medical Technology, Beijing Institute of Technology, No.5 Zhongguancun East Road, Beijing, 100050, People's Republic of China
- Pengfei Zhao
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Wenjuan Liu
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Linkun Cai
- School of Biological Science and Medical Engineering, Beihang University, No.37 XueYuan Road, Beijing, 100191, People's Republic of China
- Jing Sun
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Lei Zhao
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Zhenghan Yang
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, No. 5, South Street, Zhongguancun, Haidian District, Beijing, 100050, People's Republic of China.
- Han Lv
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China.
- Zhenchang Wang
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- School of Biological Science and Medical Engineering, Beihang University, No.37 XueYuan Road, Beijing, 100191, People's Republic of China
5
Tejani AS, Ng YS, Xi Y, Fielding JR, Browning TG, Rayan JC. Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets. Radiol Artif Intell 2022;4:e220007. PMID: 35923377; PMCID: PMC9344209; DOI: 10.1148/ryai.220007.
Abstract
PURPOSE To develop and evaluate domain-specific and pretrained bidirectional encoder representations from transformers (BERT) models in a transfer learning task on varying training dataset sizes to annotate a larger overall dataset. MATERIALS AND METHODS The authors retrospectively reviewed 69,095 anonymized adult chest radiograph reports (reports dated April 2020-March 2021). From the overall cohort, 1004 reports were randomly selected and labeled for the presence or absence of each of the following devices: endotracheal tube (ETT), enterogastric tube (NGT, or Dobhoff tube), central venous catheter (CVC), and Swan-Ganz catheter (SGC). Pretrained transformer models (BERT, PubMedBERT, DistilBERT, RoBERTa, and DeBERTa) were trained, validated, and tested on 60%, 20%, and 20%, respectively, of these reports through fivefold cross-validation. Additional training involved varying dataset sizes with 5%, 10%, 15%, 20%, and 40% of the 1004 reports. The best-performing epochs were used to assess area under the receiver operating characteristic curve (AUC) and determine run time on the overall dataset. RESULTS The highest average AUCs from fivefold cross-validation were 0.996 for ETT (RoBERTa), 0.994 for NGT (RoBERTa), 0.991 for CVC (PubMedBERT), and 0.98 for SGC (PubMedBERT). DeBERTa demonstrated the highest AUC for each support device when trained on 5% of the training set. PubMedBERT showed a higher AUC than BERT as the training set size decreased. Training and validation time was shortest for DistilBERT at 3 minutes 39 seconds on the annotated cohort. CONCLUSION Pretrained and domain-specific transformer models required small training datasets and short training times to create a highly accurate final model that expedites autonomous annotation of large datasets. Keywords: Informatics, Named Entity Recognition, Transfer Learning. Supplemental material is available for this article. ©RSNA, 2022. See also the commentary by Zech in this issue.
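The fivefold cross-validated AUC comparison can be sketched with scikit-learn. Here a logistic regression over synthetic features stands in for the transformer models, which would instead consume report embeddings; everything about the data is a placeholder:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in features; a real run would embed each report
# with the transformer under evaluation and score per device label.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Fivefold cross-validation, averaging per-fold AUCs as in the study.
aucs = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                       cv=5, scoring="roc_auc")
mean_auc = aucs.mean()
```

Reporting the mean of per-fold AUCs, rather than a single split's AUC, is what makes the small-training-set comparisons across the five models stable.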
6
Tran QT, Alom MZ, Orr BA. Comprehensive study of semi-supervised learning for DNA methylation-based supervised classification of central nervous system tumors. BMC Bioinformatics 2022;23:223. PMID: 35676649; PMCID: PMC9178802; DOI: 10.1186/s12859-022-04764-1.
Abstract
BACKGROUND Precision medicine for cancer treatment relies on an accurate pathological diagnosis. The number of known tumor classes has increased rapidly, and reliance on traditional methods of histopathologic classification alone has become unfeasible. To help reduce variability, reduce validation costs, and standardize the histopathological diagnostic process, supervised machine learning models using DNA-methylation data have been developed for tumor classification. These methods require large labeled training data sets to obtain clinically acceptable classification accuracy. While there is abundant unlabeled epigenetic data across multiple databases, labeling pathology data for machine learning models is time-consuming and resource-intensive, especially for rare tumor types. Semi-supervised learning (SSL) approaches have been used to maximize the utility of labeled and unlabeled data for classification tasks and have been effectively applied in genomics. SSL methods have not yet been explored with epigenetic data, nor demonstrated to be beneficial for central nervous system (CNS) tumor classification. RESULTS This paper explores the application of semi-supervised machine learning on methylation data to improve the accuracy of supervised learning models in classifying CNS tumors. We comprehensively evaluated 11 SSL methods and developed a novel combination approach that included a self-training with editing using support vector machine (SETRED-SVM) model and an L2-penalized, multinomial logistic regression model to obtain high-confidence labels from a few labeled instances. Results across eight random forest and neural network models show that the pseudo-labels derived from our SSL method can significantly increase prediction accuracy for 82 CNS tumors and 9 normal controls. CONCLUSIONS The proposed combination of semi-supervised technique and multinomial logistic regression holds the potential to leverage the abundant publicly available unlabeled methylation data effectively.
Such an approach is highly beneficial in providing additional training examples, especially for scarce tumor types, to boost the prediction accuracy of supervised models.
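The general pseudo-labelling idea — an SVM-based self-trainer promoting its high-confidence predictions to labels — can be sketched with scikit-learn's SelfTrainingClassifier. This is a generic analogue on synthetic data, not the SETRED-SVM implementation, which additionally edits out suspect pseudo-labels:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Mostly unlabeled data: hide 80% of the labels (-1 marks "unlabeled").
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.8] = -1

# Self-training: the SVM's predictions above the confidence threshold
# are added to the training set as pseudo-labels and the model refits.
model = SelfTrainingClassifier(SVC(probability=True), threshold=0.9)
model.fit(X, y_partial)
acc = model.score(X, y)
```

The confidence threshold controls how aggressively unlabeled points are recruited; too low a threshold lets early mistakes propagate, which is the failure mode SETRED's editing step is designed to catch.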
Affiliation(s)
- Quynh T Tran
- Department of Pathology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, MS 250, Memphis, TN, 38105-3678, USA
- Md Zahangir Alom
- Department of Pathology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, MS 250, Memphis, TN, 38105-3678, USA
- Brent A Orr
- Department of Pathology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, MS 250, Memphis, TN, 38105-3678, USA.