1. Use of GPT-4 With Single-Shot Learning to Identify Incidental Findings in Radiology Reports. AJR Am J Roentgenol 2024; 222:e2330651. [PMID: 38197759] [DOI: 10.2214/ajr.23.30651]
2. Imaging Analytics using Artificial Intelligence in Oncology: A Comprehensive Review. Clin Oncol (R Coll Radiol) 2023:S0936-6555(23)00334-5. [PMID: 37806795] [DOI: 10.1016/j.clon.2023.09.013]
Abstract
The present era has seen a surge in artificial intelligence-related research in oncology, mainly using deep learning, because of powerful computer hardware, improved algorithms, the availability of large amounts of data from open-source domains, and the use of transfer learning. Here we discuss the multifaceted role of deep learning in cancer care, ranging from risk stratification and the screening and diagnosis of cancer to the prediction of genomic mutations, treatment response, and survival outcomes, through the use of convolutional neural networks. Another role of artificial intelligence is in the generation of automated radiology reports, a boon in high-volume centres for minimising report turnaround time. Although validated and deployable deep-learning models for clinical use are still in their infancy, there is ongoing research to overcome the barriers to universal implementation, and we delve into this aspect as well. We also briefly describe the role of radiomics in oncoimaging. Artificial intelligence can provide answers pertaining to cancer management at baseline imaging, saving cost and time. Imaging biobanks, which are repositories of anonymised images, are also briefly described, as are the commercialisation and ethical issues pertaining to artificial intelligence. The latest generation of generalist artificial intelligence models is briefly described at the end of the article. We believe this article will not only enrich knowledge but also promote research acumen in readers, to take oncoimaging to another level using artificial intelligence and to work towards clinical translation of such research.
3. Automated detection of causal relationships among diseases and imaging findings in textual radiology reports. J Am Med Inform Assoc 2023; 30:1701-1706. [PMID: 37381076] [PMCID: PMC10531499] [DOI: 10.1093/jamia/ocad119]
Abstract
OBJECTIVE
Textual radiology reports contain a wealth of information that may help understand associations among diseases and imaging observations. This study evaluated the ability to detect causal associations among diseases and imaging findings from their co-occurrence in radiology reports.
MATERIALS AND METHODS
This IRB-approved and HIPAA-compliant study analyzed 1,702,462 consecutive reports of 1,396,293 patients; patient consent was waived. Reports were analyzed for positive mention of 16,839 entities (disorders and imaging findings) of the Radiology Gamuts Ontology (RGO). Entities that occurred in fewer than 25 patients were excluded. A Bayesian network structure-learning algorithm was applied at a P < 0.05 threshold; edges were evaluated as possible causal relationships. RGO and/or physician consensus served as ground truth.
RESULTS
2742 of 16,839 RGO entities were included; 53,849 patients (3.9%) had at least one included entity. The algorithm identified 725 pairs of entities as causally related; 634 were confirmed by reference to RGO or physician review (87% precision). As shown by its positive likelihood ratio, the algorithm increased detection of causally associated entities 6876-fold.
DISCUSSION
Causal relationships among diseases and imaging findings can be detected with high precision from textual radiology reports.
CONCLUSION
This approach finds causal relationships among diseases and imaging findings with high precision from textual radiology reports, even though causally related entities represent only 0.039% of all pairs of entities. Applying this approach to larger report text corpora may help detect unspecified or heretofore unrecognized associations.
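The first step of such a pipeline can be sketched as a pairwise co-occurrence screen: test each entity pair's 2x2 contingency table against the chi-square critical value for p < 0.05 and keep significant pairs as candidate edges. This is a simplified stand-in for the study's Bayesian structure-learning algorithm, and the entity names are invented:

```python
from itertools import combinations

def cooccurrence_edges(patient_entities, chi2_crit=3.841):
    """Screen entity pairs for association with a 2x2 chi-square test.
    Pairs whose statistic exceeds the p < 0.05 critical value (3.841, 1 df)
    become candidate edges for a later causal structure-learning step."""
    n = len(patient_entities)
    counts, pair_counts = {}, {}
    for ents in patient_entities:
        for e in ents:
            counts[e] = counts.get(e, 0) + 1
        for a, b in combinations(sorted(ents), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1
    edges = []
    for (a, b), both in pair_counts.items():
        only_a = counts[a] - both
        only_b = counts[b] - both
        neither = n - both - only_a - only_b
        # chi-square statistic for the 2x2 contingency table
        num = n * (both * neither - only_a * only_b) ** 2
        den = counts[a] * counts[b] * (n - counts[a]) * (n - counts[b])
        if den and num / den > chi2_crit:
            edges.append((a, b))
    return edges
```

With 40 of 100 synthetic patients carrying both "cirrhosis" and "ascites", the pair passes the screen; independent entities would not.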
4
|
Staying Ahead of the Game: How SARS-CoV-2 has Accelerated the Application of Machine Learning in Pandemic Management. BioDrugs 2023; 37:649-674. [PMID: 37464099 DOI: 10.1007/s40259-023-00611-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/28/2023] [Indexed: 07/20/2023]
Abstract
In recent years, machine learning (ML) techniques have garnered considerable interest for their potential use in accelerating the rate of drug discovery. With the emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, the utilization of ML has become even more crucial in the search for effective antiviral medications. The pandemic has presented the scientific community with a unique challenge, and the rapid identification of potential treatments has become an urgent priority. Researchers have been able to accelerate the process of identifying drug candidates, repurposing existing drugs, and designing new compounds with desirable properties using machine learning in drug discovery. To train predictive models, ML techniques in drug discovery rely on the analysis of large datasets, including both experimental and clinical data. These models can be used to predict the biological activities, potential side effects, and interactions with specific target proteins of drug candidates. This strategy has proven to be an effective method for identifying potential coronavirus disease 2019 (COVID-19) and other disease treatments. This paper offers a thorough analysis of the various ML techniques implemented to combat COVID-19, including supervised and unsupervised learning, deep learning, and natural language processing. The paper discusses the impact of these techniques on pandemic drug development, including the identification of potential treatments, the understanding of the disease mechanism, and the creation of effective and safe therapeutics. The lessons learned can be applied to future outbreaks and drug discovery initiatives.
5. Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique. J Digit Imaging 2023; 36:80-90. [PMID: 36002778] [PMCID: PMC9984654] [DOI: 10.1007/s10278-022-00692-x]
Abstract
Since radiology reports needed for clinical practice and research are written and stored as free-text narratives, extraction of relevant information for further analysis is difficult. In these circumstances, natural language processing (NLP) techniques can facilitate automatic information extraction and the transformation of free-text formats into structured data. In recent years, deep learning (DL)-based models have been adapted for NLP experiments with promising results. Despite the significant potential of DL models based on artificial neural networks (ANN) and convolutional neural networks (CNN), these models face limitations for implementation in clinical practice. Transformers, a newer DL architecture, have been increasingly applied to improve the process. Therefore, in this study, we propose a transformer-based fine-grained named entity recognition (NER) architecture for clinical information extraction. We collected 88 abdominopelvic sonography reports in free-text format and annotated them based on our developed information schema. The text-to-text transfer transformer (T5) model and SciFive, a pre-trained domain-specific adaptation of the T5 model, were fine-tuned to extract entities and relations and transform the input into a structured format. Our transformer-based model outperformed previously applied approaches such as ANN and CNN models, with ROUGE-1, ROUGE-2, ROUGE-L, and BLEU scores of 0.816, 0.668, 0.528, and 0.743, respectively, while providing an interpretable structured report.
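The ROUGE-1 score reported above is a unigram-overlap measure between a generated structured report and the reference. A minimal, dependency-free version (a standard definition, not code from the study):

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1 precision, recall, and F1 between two token lists,
    based on clipped unigram overlap."""
    if not candidate or not reference:
        return 0.0, 0.0, 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    p = overlap / len(candidate)
    r = overlap / len(reference)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

ROUGE-2 and ROUGE-L follow the same pattern with bigrams and longest common subsequences, respectively.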
6. Semiautomated pelvic lymph node treatment response evaluation for patients with advanced prostate cancer: based on MET-RADS-P guidelines. Cancer Imaging 2023; 23:7. [PMID: 36650584] [PMCID: PMC9847043] [DOI: 10.1186/s40644-023-00523-4]
Abstract
BACKGROUND
The evaluation of treatment response according to the METastasis Reporting and Data System for Prostate Cancer (MET-RADS-P) criteria is an important but time-consuming task for patients with advanced prostate cancer (APC). A deep learning-based algorithm has the potential to assist with this assessment.
OBJECTIVE
To develop and evaluate a deep learning-based algorithm for semiautomated treatment response assessment of pelvic lymph nodes.
METHODS
A total of 162 patients who had undergone at least two scans for follow-up assessment after APC metastasis treatment were enrolled. A previously reported deep learning model was used to perform automated segmentation of pelvic lymph nodes. The performance of the deep learning algorithm was evaluated using the Dice similarity coefficient (DSC) and volumetric similarity (VS). The consistency of the short-diameter measurement with the radiologist was evaluated using Bland-Altman plotting. Based on the segmentation of lymph nodes, the treatment response was assessed automatically with a rule-based program according to the MET-RADS-P criteria. Kappa statistics were used to assess the accuracy and consistency of the treatment response assessment by the deep learning model and two radiologists [an attending radiologist (R1) and a fellow radiologist (R2)].
RESULTS
The mean DSC and VS of the pelvic lymph node segmentation were 0.82 ± 0.09 and 0.88 ± 0.12, respectively. Bland-Altman plotting showed that most of the lymph node measurements were within the upper and lower limits of agreement (LOA). The accuracies of automated segmentation-based assessment were 0.92 (95% CI: 0.85-0.96), 0.91 (95% CI: 0.86-0.95) and 0.75 (95% CI: 0.46-0.92) for target lesions, nontarget lesions and nonpathological lesions, respectively. The consistency of treatment response assessment based on automated segmentation and manual segmentation was excellent for target lesions [kappa: 0.92 (0.86-0.98)], good for nontarget lesions [0.82 (0.74-0.90)] and moderate for nonpathological lesions [0.71 (0.50-0.92)].
CONCLUSION
The deep learning-based semiautomated algorithm showed high accuracy for treatment response assessment of pelvic lymph nodes and demonstrated performance comparable with that of radiologists.
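The two segmentation metrics reported here, DSC and VS, have compact standard definitions. A minimal sketch on flattened binary masks (standard formulas, not the study's implementation):

```python
def dice_similarity(pred, truth):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)
    for two binary masks given as flat 0/1 sequences."""
    inter = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

def volumetric_similarity(pred, truth):
    """Volumetric similarity: 1 - |V_pred - V_truth| / (V_pred + V_truth),
    which ignores overlap and compares only segmented volumes."""
    vp, vt = sum(pred), sum(truth)
    return 1 - abs(vp - vt) / (vp + vt) if vp + vt else 1.0
```

Note that VS can be 1.0 even when the masks barely overlap, which is why it is reported alongside, not instead of, the DSC.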
7
|
IKAR: An Interdisciplinary Knowledge-Based Automatic Retrieval Method from Chinese Electronic Medical Record. INFORMATION 2023. [DOI: 10.3390/info14010049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023] Open
Abstract
To date, information retrieval methods in the medical field have mainly focused on English medical reports; little work has studied Chinese electronic medical reports, especially in the field of obstetrics and gynecology. In this paper, a dataset of 180,000 complete Chinese ultrasound reports in obstetrics and gynecology was established and made publicly available. Based on the ultrasound reports in the dataset, a new information retrieval method (IKAR) is proposed to extract key information from the ultrasound reports and automatically generate the corresponding ultrasound diagnostic results. The model can both extract what is already in the report and infer what is not. Applied to the dataset, the method achieved 89.38% accuracy, 91.09% recall, and a 90.23% F-score, and it achieved an F-score of over 90% on five of the report's ten components. This study provides a quality dataset for the field of electronic medical records and offers a reference for information retrieval methods in obstetrics and gynecology and other fields.
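The accuracy, recall, and F-score figures above follow directly from confusion counts; for reference, the standard definitions (not code from the study):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F-score from true-positive, false-positive,
    and false-negative counts of an extraction system."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```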
8
|
Integrated Damage Location Diagnosis of Frame Structure Based on Convolutional Neural Network with Inception Module. SENSORS (BASEL, SWITZERLAND) 2022; 23:418. [PMID: 36617014 PMCID: PMC9824787 DOI: 10.3390/s23010418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/07/2022] [Revised: 12/25/2022] [Accepted: 12/27/2022] [Indexed: 06/17/2023]
Abstract
Accurate damage location diagnosis of frame structures is of great significance for judging damage severity and for subsequent maintenance. However, the similarity of vibration data at different damage locations and noise interference pose great challenges. To overcome these problems and realize accurate damage location diagnosis of frame structures, this paper improves the existing convolutional neural network with training interference (TICNN) by adding an Inception module, yielding a high-precision, strongly noise-resistant fault diagnosis model (BICNN). To avoid the overall misjudgment problem caused by using single-sensor data for damage location diagnosis, an integrated damage location diagnosis method is also proposed. Taking the four-story steel frame model of the University of British Columbia as the research object, the proposed method is tested and compared with other methods. The experimental results show a diagnosis accuracy of 97.38%, higher than the other methods, along with greater noise resistance. The proposed method therefore offers both high accuracy and strong anti-noise ability, and can solve the problem of accurate damage location diagnosis of complex frame structures in strongly noisy environments.
9. Natural Language Processing and Graph Theory: Making Sense of Imaging Records in a Novel Representation Frame. JMIR Med Inform 2022; 10:e40534. [PMID: 36542426] [PMCID: PMC9813822] [DOI: 10.2196/40534]
Abstract
BACKGROUND
A concise visualization framework of related reports would increase readability and improve patient management. To this end, temporal referrals to prior comparative exams are an essential connection to previous exams in written reports. Due to the variable structure and content of unstructured narrative texts, their extraction is hampered by poor computer readability. Natural language processing (NLP) permits the automatic extraction of structured information from unstructured texts and can serve as an essential input for such a novel visualization framework.
OBJECTIVE
This study proposes and evaluates an NLP-based algorithm capable of extracting temporal referrals from written radiology reports, applies it to all radiology reports generated over 10 years, introduces a graphical representation of imaging reports, and investigates its benefits for clinical and research purposes.
METHODS
In this single-center, university hospital, retrospective study, we developed a convolutional neural network capable of extracting the date of referrals from imaging reports. The model's performance was assessed by calculating precision, recall, and F1-score on an independent test set of 149 reports. Next, the algorithm was applied to our department's radiology reports generated from 2011 to 2021. Finally, the reports and their metadata were represented in a modulable graph.
RESULTS
For extracting the date of referrals, the named-entity recognition (NER) model had a high precision of 0.93, a recall of 0.95, and an F1-score of 0.94. A total of 1,684,635 reports were included in the analysis. A temporal reference was mentioned in 53.3% (656,852/1,684,635), explicitly stated as not available in 21.0% (258,386/1,684,635), and omitted in 25.7% (317,059/1,684,635) of the reports. Imaging records can be visualized in a directed and modulable graph, in which the referring links represent the connecting arrows.
CONCLUSIONS
Automatically extracting the date of referrals from unstructured radiology reports using deep learning NLP algorithms is feasible. Graphs refined the selection of distinct pathology pathways, facilitated the revelation of missing comparisons, and enabled the query of specific referring exam sequences. Further work is needed to evaluate its benefits in clinics, research, and resource planning.
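Once referral dates are extracted, the directed graph can be assembled by linking each report to the prior exams whose date it cites. A minimal sketch; the `reports` mapping (report id to exam date and optional referred-to date) is an assumed shape for illustration, not the study's data model:

```python
def build_report_graph(reports):
    """Build directed edges report -> prior report, added whenever a
    report's extracted temporal referral matches a prior exam's date.
    `reports` maps report_id -> (exam_date, referral_date_or_None)."""
    by_date = {}
    for rid, (exam_date, _) in reports.items():
        by_date.setdefault(exam_date, []).append(rid)
    edges = []
    for rid, (_, ref_date) in reports.items():
        for prior in by_date.get(ref_date, []):
            edges.append((rid, prior))  # referring link as a directed arrow
    return edges
```

Traversing such edges is what enables queries over specific referring exam sequences and the detection of missing comparisons.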
10. Implementation of an AI model to triage paediatric brain magnetic resonance imaging orders. Ann Acad Med Singap 2022. [DOI: 10.47102/annals-acadmedsg.2022104]
11
|
Comparisons of deep learning and machine learning while using text mining methods to identify suicide attempts of patients with mood disorders. J Affect Disord 2022; 317:107-113. [PMID: 36029873 DOI: 10.1016/j.jad.2022.08.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Revised: 08/05/2022] [Accepted: 08/20/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND
Suicide attempt is one of the most severe consequences for patients with mood disorders. This study aimed to apply deep learning and machine learning with text mining to identify patients with suicide attempts and to compare their effectiveness.
METHODS
A total of 13,100 patients with mood disorders were selected. Two traditional text mining methods, logistic regression and the support vector machine (SVM), and one deep learning model (a convolutional neural network, CNN) were adopted to perform an overall analysis and gender-specific subgroup analyses to identify suicide attempts. The classification effectiveness of these models was evaluated by accuracy, F1-score, precision, recall, and the area under the receiver operating characteristic curve (AUROC).
RESULTS
The CNN's results were greater than those of the other two models for all indicators except recall, which was slightly smaller than that of the SVM in the male subgroup analysis. The accuracy values of the CNN were 98.4%, 98.2%, and 98.5% in the overall analysis and the subgroup analyses for males and females, respectively. McNemar's test showed that the CNN and SVM models' predictions were statistically different from the logistic regression model's predictions in the overall analysis and the subgroup analysis for females (P < 0.050).
LIMITATIONS
A fixed number of features were selected based on document frequency to train the models; this was a single-site study.
CONCLUSIONS
The CNN model was a better way to detect suicide attempts in patients with mood disorders prior to hospital admission, saving time and resources in recognizing high-risk patients and preventing suicide.
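McNemar's test, used above to compare classifiers, looks only at cases where the two models disagree. A minimal sketch of the statistic with the usual continuity correction (a textbook formulation, not the study's code):

```python
def mcnemar_statistic(y_true, pred_a, pred_b):
    """McNemar chi-square (continuity-corrected) comparing two classifiers
    on the same cases; b and c count the discordant pairs."""
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)
```

The statistic is compared against the chi-square distribution with 1 degree of freedom (critical value 3.841 for P < 0.05).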
12
|
Recognizing the aeroacoustic information of noise radiated by an unflanged duct based on convolutional neural networks. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:2531. [PMID: 36456274 DOI: 10.1121/10.0015003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/23/2022] [Accepted: 10/10/2022] [Indexed: 06/17/2023]
Abstract
Accurately recognizing the aeroacoustic information of noise propagating into and radiating out of an aero-engine duct is of both fundamental and practical interest. The aeroacoustic information includes (1) the acoustic properties of the noise source, such as the frequency (f) and the circumferential and radial mode numbers (m, n), and (2) the flight conditions, including the ambient flow speed (M0) and the jet flow speed (M1). In this study, a data-driven model is developed to predict the aeroacoustic information of a simplified aero-engine duct noise from the far-field sound pressure level directivity. The model is constructed by the integration of one-dimensional convolutional layers and fully connected layers. The training and validation datasets are calculated from the analytical model for noise radiation from a semi-infinite unflanged duct based on the Wiener-Hopf method. For a single-spinning mode source, a regression model is established for f, M0, and M1 prediction, and a classification model is built up for m and n prediction. Additionally, for a multi-spinning mode source, the regression model is used to predict the coefficient of each mode. Results show that the proposed data-driven model can effectively and robustly predict the acoustic characteristics of noise propagation in and radiation out of an aero-engine bypass duct.
13. Multi-objective data enhancement for deep learning-based ultrasound analysis. BMC Bioinformatics 2022; 23:438. [PMID: 36266626] [PMCID: PMC9583467] [DOI: 10.1186/s12859-022-04985-4]
Abstract
Recently, deep learning-based automatic generation of treatment recommendations has been attracting much attention. However, medical datasets are usually small, which may lead to over-fitting and inferior performance of deep learning models. In this paper, we propose a multi-objective data enhancement method to indirectly scale up the medical data, avoid over-fitting, and generate high-quality treatment recommendations. Specifically, we define a main task and several auxiliary tasks on the same dataset and train a specific model for each of these tasks to learn different aspects of knowledge at a limited data scale. Meanwhile, a soft parameter sharing method is exploited to share learned knowledge among the models. By sharing the knowledge learned by the auxiliary tasks with the main task, the proposed method can take different semantic distributions into account during the training of the main task. We collected an ultrasound dataset of thyroid nodules that contains findings, impressions, and treatment recommendations labeled by professional doctors. We conducted various experiments on the dataset to validate the proposed method and demonstrate its better performance over existing methods.
14
|
The natural language processing of radiology requests and reports of chest imaging: Comparing five transformer models’ multilabel classification and a proof-of-concept study. Health Informatics J 2022; 28:14604582221131198. [DOI: 10.1177/14604582221131198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Background
Radiology requests and reports contain valuable information about diagnostic findings and indications, and transformer-based language models are promising for more accurate text classification.
Methods
In a retrospective study, 2256 radiologist-annotated radiology requests (8 classes) and reports (10 classes) were divided into training and testing datasets (90% and 10%, respectively) and used to train 32 models. Performance metrics were compared by model type (LSTM, Bertje, RobBERT, BERT-clinical, BERT-multilingual, BERT-base), text length, data prevalence, and training strategy. The best models were used to predict the categories of the remaining 40,873 cases in the request and report datasets.
Results
The RobBERT model performed best after 4000 training iterations, with AUC values ranging from 0.808 [95% CI (0.757–0.859)] to 0.976 [95% CI (0.956–0.996)] for the requests and from 0.746 [95% CI (0.689–0.802)] to 1.0 [95% CI (1.0–1.0)] for the reports. The AUC for the classification of normal reports was 0.95 [95% CI (0.922–0.979)]. The predicted data demonstrated variability in both the diagnostic yield for various request classes and the request patterns related to COVID-19 hospital admission data.
Conclusion
Transformer-based natural language processing is feasible for the multilabel classification of chest imaging request and report items. Diagnostic yield varies with the information in the requests.
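AUC values like those above depend only on how the model ranks its scores. A dependency-free AUROC via the rank-sum (Mann-Whitney U) formulation, with tied scores given their average rank (a standard computation, not the study's code):

```python
def auroc(labels, scores):
    """Area under the ROC curve from binary labels and continuous scores,
    computed from the rank-sum of the positive class."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1  # extend over a block of tied scores
        avg = (i + j) / 2 + 1  # average 1-based rank across the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For multilabel problems such as the one above, this is computed once per label and the per-label AUCs are reported or averaged.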
15
|
Transfer learning in diagnosis of maxillary sinusitis using panoramic radiography and conventional radiography. Oral Radiol 2022:10.1007/s11282-022-00658-3. [DOI: 10.1007/s11282-022-00658-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2022] [Accepted: 09/19/2022] [Indexed: 10/14/2022]
16. A gradient boosting tree model for multi-department venous thromboembolism risk assessment with imbalanced data. J Biomed Inform 2022; 134:104210. [PMID: 36122879] [DOI: 10.1016/j.jbi.2022.104210]
Abstract
Venous thromboembolism (VTE) is the world's third most common cause of vascular mortality and a serious complication encountered across multiple departments. Risk assessment of VTE guides timely clinical intervention and is of great importance to in-hospital patients. Traditional VTE risk assessment methods based on scoring tools, which require rules carefully designed by human experts, are difficult to apply to large-population scenarios, since the manually designed rules are not guaranteed to be accurate for all populations. In contrast, with the development of electronic health record (EHR) datasets, data-driven machine-learning-based risk assessment methods have shown superior predictive ability in many recent studies. This paper uses a gradient boosting tree model to study the VTE risk assessment problem with multi-department data. VTE data collected at the level of an entire hospital have two distinct characteristics: wide distribution and heterogeneity across multiple departments. To this end, we treat prediction over multiple departments as a multi-task learning process and introduce a task-aware tree-based method, TSGB, to tackle the multi-task prediction problem. Although the introduction of multi-task learning improves overall across-department performance, we reveal a problem of task-wise performance decline when dealing with imbalanced VTE data volumes. Based on this analysis, we propose two variants of TSGB to alleviate the problem and further boost prediction performance. Compared with state-of-the-art rule-based and multi-task tree-based methods, the experimental results show that the proposed methods not only effectively improve overall across-department AUC performance but also ensure improved performance for every single department.
17
|
Automated extraction of information of lung cancer staging from unstructured reports of PET-CT interpretation: natural language processing with deep-learning. BMC Med Inform Decis Mak 2022; 22:229. [PMID: 36050674 PMCID: PMC9438247 DOI: 10.1186/s12911-022-01975-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2022] [Accepted: 08/24/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND
Extracting metastatic information from previous radiologic text reports is important; however, laborious annotation has limited the usability of these texts. We developed a deep learning model for extracting primary lung cancer sites, metastatic lymph nodes, and distant metastasis information from PET-CT reports for determining lung cancer stages.
METHODS
PET-CT reports, fully written in English, were acquired from two cohorts of patients with lung cancer who were diagnosed at a tertiary hospital between January 2004 and March 2020. One cohort of 20,466 PET-CT reports was used for the training and validation sets, and the other cohort of 4190 PET-CT reports was used as an additional test set. A pre-processing model (Lung Cancer Spell Checker) was applied to correct typographical errors, and pseudo-labelling was used for training the model. The deep learning model was constructed using a convolutional-recurrent neural network. The performance metrics for the prediction model were accuracy, precision, sensitivity, micro-AUROC, and AUPRC.
RESULTS
For the extraction of primary lung cancer location, the model showed a micro-AUROC of 0.913 and 0.946 in the validation set and the additional test set, respectively. For metastatic lymph nodes, the model showed a sensitivity of 0.827 and a specificity of 0.960. In predicting distant metastasis, the model showed a micro-AUROC of 0.944 and 0.950 in the validation and additional test sets, respectively.
CONCLUSION
Our deep learning method could be used for extracting lung cancer stage information from PET-CT reports and may facilitate lung cancer studies by alleviating laborious annotation by clinicians.
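The internals of the Lung Cancer Spell Checker are not described in the abstract; the generic core of such a pre-processing step is edit-distance matching against a domain vocabulary. A minimal sketch under that assumption (the vocabulary and threshold are illustrative):

```python
def levenshtein(a, b):
    """Edit distance between two strings via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def correct(word, vocabulary, max_dist=2):
    """Replace `word` with its closest vocabulary term if within max_dist edits."""
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_dist else word
```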
18
|
Accurately Identifying Cerebroarterial Stenosis from Angiography Reports Using Natural Language Processing Approaches. Diagnostics (Basel) 2022; 12:diagnostics12081882. [PMID: 36010232 PMCID: PMC9406429 DOI: 10.3390/diagnostics12081882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Revised: 07/31/2022] [Accepted: 08/02/2022] [Indexed: 11/16/2022] Open
Abstract
Patients with intracranial artery stenosis have a high incidence of stroke. Angiography reports contain rich but underutilized information that can enable the detection of cerebrovascular diseases. This study evaluated various natural language processing (NLP) techniques to accurately identify stenosis of eleven intracranial arteries from angiography reports. Three NLP models, a rule-based model, a recurrent neural network (RNN), and a contextualized language model (XLNet), were developed and evaluated by internal-external cross-validation. Angiography reports from two independent medical centers (9614 for training and internal validation testing and 315 for external validation) were assessed. The internal testing results showed that XLNet had the best performance, with an area under the receiver operating characteristic curve (AUROC) ranging from 0.97 to 0.99 across the eleven targeted arteries. The rule-based model attained an AUROC from 0.92 to 0.96, and the RNN long short-term memory model attained an AUROC from 0.95 to 0.97. The study shows the potential of NLP techniques such as XLNet for the routine, automatic screening of patients at high risk of intracranial artery stenosis using angiography reports. However, the NLP models were investigated on relatively small samples with very different report-writing styles and stenosis case prevalences, revealing challenges for model generalization.
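A rule-based baseline of the kind compared above can be imagined as keyword matching at the sentence level; the artery list and patterns below are hypothetical illustrations, not the study's actual rules:

```python
import re

# Hypothetical artery names and stenosis keywords for illustration only.
ARTERIES = ["middle cerebral artery", "basilar artery", "internal carotid artery"]
STENOSIS = re.compile(r"\b(stenosis|stenotic|narrowing|occlusion)\b")

def find_stenotic_arteries(report):
    """Toy rule-based screen: flag an artery when a stenosis keyword
    appears in the same sentence of the angiography report."""
    hits = set()
    for sentence in re.split(r"[.!?]", report.lower()):
        if STENOSIS.search(sentence):
            for artery in ARTERIES:
                if artery in sentence:
                    hits.add(artery)
    return sorted(hits)
```

Real rule sets also need negation handling ("no significant stenosis"), which is one reason contextualized models such as XLNet can outperform keyword rules.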
|
19
|
Development of lumbar spine MRI referrals vetting models using machine learning and deep learning algorithms: Comparison models vs healthcare professionals. Radiography (Lond) 2022; 28:674-683. [PMID: 35700654 DOI: 10.1016/j.radi.2022.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2022] [Revised: 04/28/2022] [Accepted: 05/24/2022] [Indexed: 11/28/2022]
Abstract
INTRODUCTION Referrals vetting is a necessary daily task to ensure the appropriateness of radiology referrals. Vetting requires extensive clinical knowledge and may challenge those responsible. This study aims to develop AI models to automate the vetting process and to compare their performance with healthcare professionals. METHODS 1020 lumbar spine MRI referrals were collected retrospectively from two Irish hospitals. Three expert MRI radiographers classified the referrals into indicated or not indicated for scanning based on iRefer guidelines. The reference label for each referral was assigned based on majority voting. The corpus was divided into two datasets: one of 920 referrals for model development, and one of 100 referrals held out for the final comparison of the AI models versus national and international MRI radiographers. Three traditional models were developed (SVM, LR, and RF) along with two deep neural models (CNN and Bi-LSTM). For the traditional models, four vectorisation techniques were applied: BoW, bigrams, trigrams, and TF-IDF. A textual data augmentation technique was applied to investigate the influence of data augmentation on the models' performance. RESULTS RF with BoW achieved the highest AUC, reaching 0.99. The CNN model outperformed the Bi-LSTM, with an AUC of 0.98. With the augmented dataset, performance significantly improved, with increases in F1 scores ranging from 1% to 7%. All models outperformed the national and international radiographers when compared on the held-out dataset. CONCLUSION The models assigned the referrals' appropriateness with higher accuracy than the national and international radiographers. Applying data augmentation significantly improved the models' performance. IMPLICATIONS FOR PRACTICE The outcomes suggest that the use of AI for checking referrals' eligibility could serve as a supporting tool to improve referrals management in radiology departments.
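As a rough illustration of the bag-of-words and n-gram vectorisation described in the entry above (a generic sketch, not the authors' code; the vocabulary and referral text here are hypothetical):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def vectorise(text, vocab, n=1):
    """Map a referral text to a count vector over a fixed n-gram vocabulary."""
    counts = Counter(ngrams(text.lower().split(), n))
    return [counts[g] for g in vocab]

# Hypothetical unigram vocabulary and referral text.
vocab = [("low",), ("back",), ("pain",), ("sciatica",)]
vector = vectorise("Low back pain with sciatica", vocab)  # -> [1, 1, 1, 1]
```

Vectors like these would then feed a traditional classifier (SVM, LR, or RF); TF-IDF weighting replaces raw counts with frequency-scaled values but follows the same shape.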
|
20
|
Development and Validation of a Model to Identify Critical Brain Injuries Using Natural Language Processing of Text Computed Tomography Reports. JAMA Netw Open 2022; 5:e2227109. [PMID: 35972739 PMCID: PMC9382443 DOI: 10.1001/jamanetworkopen.2022.27109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/02/2022] [Accepted: 06/20/2022] [Indexed: 12/17/2022] Open
Abstract
Importance Clinical text reports from head computed tomography (CT) represent rich, incompletely utilized information regarding acute brain injuries and neurologic outcomes. CT reports are unstructured; thus, extracting information at scale requires automated natural language processing (NLP). However, designing new NLP algorithms for each individual injury category is an unwieldy proposition. An NLP tool that summarizes all injuries in head CT reports would facilitate exploration of large data sets for the clinical significance of neuroradiological findings. Objective To automatically extract acute brain pathological data and their features from head CT reports. Design, Setting, and Participants This diagnostic study developed a 2-part named entity recognition (NER) NLP model to extract and summarize data on acute brain injuries from head CT reports. The model, termed BrainNERD, extracts and summarizes detailed brain injury information for research applications. Model development included building and comparing 2 NER models using a custom dictionary of terms, including lesion type, location, size, and age, then designing a rule-based decoder using NER outputs to evaluate for the presence or absence of injury subtypes. BrainNERD was evaluated against independent test data sets of manually classified reports, including 2 external validation sets. The model was trained on head CT reports from 1152 patients generated by neuroradiologists at the Yale Acute Brain Injury Biorepository. External validation was conducted using reports from 2 outside institutions. Analyses were conducted from May 2020 to December 2021. Main Outcomes and Measures Performance of the BrainNERD model was evaluated using precision, recall, and F1 scores based on manually labeled independent test data sets. Results A total of 1152 patients (mean [SD] age, 67.6 [16.1] years; 586 [52%] men) were included in the training set.
NER training using a transformer architecture (bidirectional encoder representations from transformers) was significantly faster than with spaCy. For all metrics, the 10-fold cross-validation performance was 93% to 99%. The final test performance metrics for the NER test data set were 98.82% (95% CI, 98.37%-98.93%) for precision, 98.81% (95% CI, 98.46%-99.06%) for recall, and 98.81% (95% CI, 98.40%-98.94%) for the F score. The expert review comparison metrics were 99.06% (95% CI, 97.89%-99.13%) for precision, 98.10% (95% CI, 97.93%-98.77%) for recall, and 98.57% (95% CI, 97.78%-99.10%) for the F score. The decoder test set metrics were 96.06% (95% CI, 95.01%-97.16%) for precision, 96.42% (95% CI, 94.50%-97.87%) for recall, and 96.18% (95% CI, 95.15%-97.16%) for the F score. Performance in external institution report validation, including 1053 head CT reports, was greater than 96%. Conclusions and Relevance These findings suggest that the BrainNERD model accurately extracted acute brain injury terms and their properties from head CT text reports. This freely available new tool could advance clinical research by integrating information in easily gathered head CT reports to expand knowledge of acute brain injury radiographic phenotypes.
|
21
|
Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets. Radiol Artif Intell 2022; 4:e220007. [PMID: 35923377 PMCID: PMC9344209 DOI: 10.1148/ryai.220007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Revised: 06/08/2022] [Accepted: 06/14/2022] [Indexed: 06/15/2023]
Abstract
PURPOSE To develop and evaluate domain-specific and pretrained bidirectional encoder representations from transformers (BERT) models in a transfer learning task on varying training dataset sizes to annotate a larger overall dataset. MATERIALS AND METHODS The authors retrospectively reviewed 69 095 anonymized adult chest radiograph reports (reports dated April 2020-March 2021). From the overall cohort, 1004 reports were randomly selected and labeled for the presence or absence of each of the following devices: endotracheal tube (ETT), enterogastric tube (NGT, or Dobhoff tube), central venous catheter (CVC), and Swan-Ganz catheter (SGC). Pretrained transformer models (BERT, PubMedBERT, DistilBERT, RoBERTa, and DeBERTa) were trained, validated, and tested on 60%, 20%, and 20%, respectively, of these reports through fivefold cross-validation. Additional training involved varying dataset sizes with 5%, 10%, 15%, 20%, and 40% of the 1004 reports. The best-performing epochs were used to assess area under the receiver operating characteristic curve (AUC) and determine run time on the overall dataset. RESULTS The highest average AUCs from fivefold cross-validation were 0.996 for ETT (RoBERTa), 0.994 for NGT (RoBERTa), 0.991 for CVC (PubMedBERT), and 0.98 for SGC (PubMedBERT). DeBERTa demonstrated the highest AUC for each support device trained on 5% of the training set. PubMedBERT showed a higher AUC with a decreasing training set size compared with BERT. Training and validation time was shortest for DistilBERT at 3 minutes 39 seconds on the annotated cohort. CONCLUSION Pretrained and domain-specific transformer models required small training datasets and short training times to create a highly accurate final model that expedites autonomous annotation of large datasets. Keywords: Informatics, Named Entity Recognition, Transfer Learning. Supplemental material is available for this article. ©RSNA, 2022. See also the commentary by Zech in this issue.
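The AUC reported throughout this entry can be computed directly from labels and model scores as the probability that a randomly chosen positive is ranked above a randomly chosen negative. A minimal sketch (generic, not the authors' code; the toy labels and scores are hypothetical):

```python
def auc(labels, scores):
    """AUROC as the fraction of positive-negative pairs in which the
    positive receives the higher score (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy device-presence labels and classifier scores: a perfect ranking.
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]))  # -> 1.0
```

An AUC of 0.5 corresponds to random ranking, which is why values such as 0.996 indicate near-perfect separation.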
|
22
|
RadBERT: Adapting Transformer-based Language Models to Radiology. Radiol Artif Intell 2022; 4:e210258. [PMID: 35923376 PMCID: PMC9344353 DOI: 10.1148/ryai.210258] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Revised: 04/28/2022] [Accepted: 06/03/2022] [Indexed: 06/15/2023]
Abstract
PURPOSE To investigate if tailoring a transformer-based language model to radiology is beneficial for radiology natural language processing (NLP) applications. MATERIALS AND METHODS This retrospective study presents a family of bidirectional encoder representations from transformers (BERT)-based language models adapted for radiology, named RadBERT. Transformers were pretrained with either 2.16 or 4.42 million radiology reports from U.S. Department of Veterans Affairs health care systems nationwide on top of four different initializations (BERT-base, Clinical-BERT, robustly optimized BERT pretraining approach [RoBERTa], and BioMed-RoBERTa) to create six variants of RadBERT. Each variant was fine-tuned for three representative NLP tasks in radiology: (a) abnormal sentence classification: models classified sentences in radiology reports as reporting abnormal or normal findings; (b) report coding: models assigned a diagnostic code to a given radiology report for five coding systems; and (c) report summarization: given the findings section of a radiology report, models selected key sentences that summarized the findings. Model performance was compared by bootstrap resampling with five intensively studied transformer language models as baselines: BERT-base, BioBERT, Clinical-BERT, BlueBERT, and BioMed-RoBERTa. RESULTS For abnormal sentence classification, all models performed well (accuracies above 97.5 and F1 scores above 95.0). RadBERT variants achieved significantly higher scores than corresponding baselines when given only 10% or less of 12 458 annotated training sentences. For report coding, all variants outperformed baselines significantly for all five coding systems. The variant RadBERT-BioMed-RoBERTa performed the best among all models for report summarization, achieving a Recall-Oriented Understudy for Gisting Evaluation-1 score of 16.18 compared with 15.27 by the corresponding baseline (BioMed-RoBERTa, P < .004). 
CONCLUSION Transformer-based language models tailored to radiology showed improved performance on radiology NLP tasks compared with baseline transformer language models. Keywords: Translation, Unsupervised Learning, Transfer Learning, Neural Networks, Informatics. Supplemental material is available for this article. © RSNA, 2022. See also the commentary by Wiggins and Tejani in this issue.
|
23
|
Automatic detection of actionable findings and communication mentions in radiology reports using natural language processing. Eur Radiol 2022; 32:3996-4002. [PMID: 34989840 DOI: 10.1007/s00330-021-08467-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2021] [Revised: 10/25/2021] [Accepted: 11/15/2021] [Indexed: 11/04/2022]
Abstract
OBJECTIVES To develop and validate classifiers for automatic detection of actionable findings and documentation of nonroutine communication in routinely delivered radiology reports. METHODS Two radiologists annotated all actionable findings and communication mentions in a training set of 1,306 radiology reports and a test set of 1,000 reports randomly selected from the electronic health record system of a large tertiary hospital. Various feature sets were constructed based on the impression section of the reports using different preprocessing steps (stemming, removal of stop words, negations, and previously known or stable findings) and n-grams. Random forest classifiers were trained to detect actionable findings, and a decision-rule classifier was trained to find communication mentions. Classifier performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. RESULTS On the training set, the actionable finding classifier with the highest cross-validated performance was obtained for a feature set of unigrams, after stemming and removal of negated, known, and stable findings. On the test set, this classifier achieved an AUC of 0.876 (95% CI 0.854-0.898). The classifier for communication detection was trained after negation removal, using unigrams as features. The resultant decision rule had a sensitivity of 0.841 (95% CI 0.706-0.921) and specificity of 0.990 (95% CI 0.981-0.994) on the test set. CONCLUSIONS Automatic detection of actionable findings and subsequent communication in routinely delivered radiology reports is possible. This can serve quality control purposes and may alert radiologists to the presence of actionable findings during reporting. KEY POINTS • Classifiers were developed for automatic detection of the broad spectrum of actionable findings and subsequent communication mentions in routinely delivered radiology reports. 
• Straightforward report preprocessing and simple feature sets can produce well-performing classifiers. • The resultant classifiers show good performance for detection of actionable findings and excellent performance for detection of communication mentions.
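The sensitivity and specificity figures quoted above come from standard confusion-matrix arithmetic. A minimal sketch (generic, not the authors' code; the toy labels are hypothetical):

```python
def sens_spec(y_true, y_pred):
    """Sensitivity (recall on positives) and specificity (recall on
    negatives) from binary ground-truth and predicted labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Toy communication-mention labels: 1 of 3 true mentions is missed.
sens, spec = sens_spec([1, 1, 1, 0, 0], [1, 1, 0, 0, 0])
```

Here sensitivity is 2/3 and specificity is 1.0; the entry's decision-rule classifier reports exactly these two quantities with bootstrap confidence intervals.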
|
24
|
Use of artificial intelligence in emergency radiology: An overview of current applications, challenges, and opportunities. Clin Imaging 2022; 89:61-67. [PMID: 35716432 DOI: 10.1016/j.clinimag.2022.05.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2022] [Revised: 05/04/2022] [Accepted: 05/23/2022] [Indexed: 11/16/2022]
Abstract
The value of artificial intelligence (AI) in healthcare has become evident, especially in the field of medical imaging. The accelerated pace and acuity of care in the Emergency Department (ED) has made it a popular target for artificial intelligence-driven solutions. Software that helps better detect, report, and appropriately guide management can ensure high quality patient care while enabling emergency radiologists to better meet the demands of quick turnaround times. Beyond diagnostic applications, AI-based algorithms also have the potential to optimize other important steps within the ED imaging workflow. This review will highlight the different types of AI-based applications currently available for use in the ED, as well as the challenges and opportunities associated with their implementation.
|
25
|
Induced Pluripotent Stem Cell-Based Drug Screening by Use of Artificial Intelligence. Pharmaceuticals (Basel) 2022; 15:562. [PMID: 35631387 PMCID: PMC9145330 DOI: 10.3390/ph15050562] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2022] [Revised: 04/25/2022] [Accepted: 04/27/2022] [Indexed: 12/10/2022] Open
Abstract
Induced pluripotent stem cells (iPSCs) are reprogrammed from terminally differentiated somatic cells and can differentiate into various cell types. iPSCs are expected to be used for disease modeling and for developing novel treatments because differentiated cells from iPSCs can recapitulate the cellular pathology of patients with genetic mutations. However, a barrier to using iPSCs for comprehensive drug screening is the difficulty of evaluating their pathophysiology. Recently, the accuracy of image analysis has dramatically improved with the development of artificial intelligence (AI) technology. In the field of cell biology, it has become possible to estimate cell types and states by examining cellular morphology in simple microscopic images. AI can evaluate disease-specific phenotypes of iPSC-derived cells from label-free microscopic images; thus, AI can be utilized for disease-specific drug screening using iPSCs. In addition to image analysis, various AI-based methods can be applied to drug development, including phenotype prediction by analyzing genomic data and virtual screening by analyzing the structural formulas and protein-protein interactions of compounds. In the future, combining AI methods may rapidly accelerate drug discovery using iPSCs. In this review, we explain the details of AI technology and the application of AI to iPSC-based drug screening.
|
26
|
Applications of Natural Language Processing in Radiology: A Systematic Review. Int J Med Inform 2022; 163:104779. [DOI: 10.1016/j.ijmedinf.2022.104779] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2022] [Revised: 03/28/2022] [Accepted: 04/21/2022] [Indexed: 12/27/2022]
|
27
|
Management of Incidental Thyroid Nodules on Chest CT: Using Natural Language Processing to Assess White Paper Adherence and Track Patient Outcomes. Acad Radiol 2022; 29:e18-e24. [PMID: 33757722 DOI: 10.1016/j.acra.2021.02.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2021] [Revised: 02/17/2021] [Accepted: 02/21/2021] [Indexed: 12/20/2022]
Abstract
OBJECTIVE The purpose of this study was to develop a natural language processing (NLP) pipeline to identify incidental thyroid nodules (ITNs) meeting criteria for sonographic follow-up and to assess both adherence rates to white paper recommendations and downstream outcomes related to these incidental findings. METHODS 21583 non-contrast chest CT reports from 2017 and 2018 were retrospectively evaluated to identify reports which included either an explicit recommendation for thyroid ultrasound, a description of a nodule ≥ 1.5 cm, or description of a nodule with suspicious features. Reports from 2018 were used to train an NLP algorithm called fastText for automated identification of such reports. Algorithm performance was then evaluated on the 2017 reports. Next, any patient from 2017 with a report meeting criteria for ultrasound follow-up was further evaluated with manual chart review to determine follow-up adherence rates and nodule-related outcomes. RESULTS NLP identified reports with ITNs meeting criteria for sonographic follow-up with an accuracy of 96.5% (95% CI 96.2-96.7) and sensitivity of 92.1% (95% CI 89.8-94.3). In 10006 chest CTs from 2017, ITN follow-up ultrasound was indicated according to white paper criteria in 81 patients (0.8%), explicitly recommended in 46.9% (38/81) of patients, and obtained in less than half of patients in which it was appropriately recommended (17/35, 48.6%). DISCUSSION NLP accurately identified chest CT reports meeting criteria for ITN ultrasound follow-up. Radiologist adherence to white paper guidelines and subsequent referrer adherence to radiologist recommendations showed room for improvement.
|
28
|
Handling of derived imbalanced dataset using XGBoost for identification of pulmonary embolism-a non-cardiac cause of cardiac arrest. Med Biol Eng Comput 2022; 60:551-558. [PMID: 35023074 DOI: 10.1007/s11517-021-02455-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2021] [Accepted: 10/07/2021] [Indexed: 10/19/2022]
Abstract
The relationship between pulmonary embolism and heart failure is presented in this paper. The proposed research is divided into two phases. The first phase establishes a novel database, derived from the Cleveland cardiology database, to link pulmonary embolism and heart failure. The connection is based on the relationship between stroke volume and pulse pressure (Pp < 25% (ap_hi)). The second phase applies machine learning to the novel database. The novel database formed in this work is imbalanced, which leads to overfitting; XGBoost was used to address this problem. Efficiency was further increased by an ensemble technique combining extreme learning machines, the IB3 tree, logistic regression, and averaged neural network (avNNet) models.
|
29
|
Artificial Intelligence in Diagnostic Radiology: Where Do We Stand, Challenges, and Opportunities. J Comput Assist Tomogr 2022; 46:78-90. [PMID: 35027520 DOI: 10.1097/rct.0000000000001247] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
Artificial intelligence (AI) is the most revolutionary development in the health care industry in the current decade, with diagnostic imaging having the greatest share in such development. Machine learning and deep learning (DL) are subclasses of AI that show breakthrough performance in image analysis. They have become the state of the art in the field of image classification and recognition. Machine learning deals with the extraction of important characteristic features from images, whereas DL uses neural networks to solve such problems with better performance. In this review, we discuss the current applications of machine learning and DL in the field of diagnostic radiology. Deep learning applications can be divided into medical imaging analysis and applications beyond analysis. In the field of medical imaging analysis, deep convolutional neural networks are used for image classification, lesion detection, and segmentation. Recurrent neural networks are also used when extracting information from electronic medical records and to augment the use of convolutional neural networks in image classification. Generative adversarial networks have been used to generate high-resolution computed tomography and magnetic resonance images and to map computed tomography images to the corresponding magnetic resonance images. Beyond image analysis, DL can be used for quality control, workflow organization, and reporting. In this article, we review the most current AI models used in medical imaging research, providing a brief explanation of the various models described in the literature within the past 5 years. Emphasis is placed on the various DL models, as they represent the state of the art in imaging analysis.
|
30
|
Predicting pulmonary embolism among hospitalized patients with machine learning algorithms. Pulm Circ 2022; 12:e12013. [PMID: 35506114 PMCID: PMC9052977 DOI: 10.1002/pul2.12013] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/18/2021] [Revised: 11/24/2021] [Accepted: 11/28/2021] [Indexed: 01/15/2023] Open
|
31
|
Classifying Social Determinants of Health from Unstructured Electronic Health Records Using Deep Learning-based Natural Language Processing. J Biomed Inform 2022; 127:103984. [PMID: 35007754 DOI: 10.1016/j.jbi.2021.103984] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2021] [Revised: 12/28/2021] [Accepted: 12/29/2021] [Indexed: 12/23/2022]
Abstract
OBJECTIVE Social determinants of health (SDOH) are non-medical factors that can profoundly impact patient health outcomes. However, SDOH are rarely available in structured electronic health record (EHR) data such as diagnosis codes, and more commonly found in unstructured narrative clinical notes. Hence, identifying social context from unstructured EHR data has become increasingly important. Yet, previous work on using natural language processing to automate extraction of SDOH from text (a) usually focuses on an ad hoc selection of SDOH, and (b) does not use the latest advances in deep learning. Our objective was to advance automatic extraction of SDOH from clinical text by (a) systematically creating a set of SDOH based on standard biomedical and psychiatric ontologies, and (b) training state-of-the-art deep neural networks to extract mentions of these SDOH from clinical notes. DESIGN A retrospective cohort study. SETTING AND PARTICIPANTS Data were extracted from the Medical Information Mart for Intensive Care (MIMIC-III) database. The corpus comprised 3,504 social related sentences from 2,670 clinical notes. METHODS We developed a framework for automated classification of multiple SDOH categories. Our dataset comprised narrative clinical notes under the "Social Work" category in the MIMIC-III Clinical Database. Using standard terminologies, SNOMED-CT and DSM-IV, we systematically curated a set of 13 SDOH categories and created annotation guidelines for these. After manually annotating the 3,504 sentences, we developed and tested three deep neural network (DNN) architectures - convolutional neural network (CNN), long short-term memory (LSTM) network, and the Bidirectional Encoder Representations from Transformers (BERT) - for automated detection of eight SDOH categories. We also compared these DNNs to three baseline models: (1) cTAKES, as well as (2) L2-regularized logistic regression and (3) random forests on bags-of-words.
Model evaluation metrics included micro- and macro-F1 and the area under the receiver operating characteristic curve (AUC). RESULTS All three DNN models accurately classified all SDOH categories (minimum micro-F1 = .632, minimum macro-AUC = .854). Compared to the CNN and LSTM, BERT performed best in most key metrics (micro-F1 = .690, macro-AUC = .907). The BERT model most effectively identified the "occupational" category (F1 = .774, AUC = .965) and least effectively identified the "non-SDOH" category (F1 = .491, AUC = .788). BERT outperformed cTAKES in distinguishing social vs non-social sentences (BERT F1 = .87 vs. cTAKES F1 = .06), and outperformed logistic regression (micro-F1 = .649, macro-AUC = .696) and random forest (micro-F1 = .502, macro-AUC = .523) trained on bags-of-words. CONCLUSIONS Our study framework with DNN models demonstrated improved performance for efficiently identifying a systematic range of SDOH categories from clinical notes in the EHR. Improved identification of patient SDOH may further improve healthcare outcomes.
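The micro- versus macro-F1 distinction used above matters for multi-category tasks like this one: macro averages per-category F1 equally, while micro pools counts so frequent categories dominate. A minimal sketch (generic, not the authors' code; the per-category counts are hypothetical):

```python
def f1(tp, fp, fn):
    """F1 from confusion counts (0.0 when there are no true positives)."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def micro_macro_f1(per_class):
    """per_class: list of (tp, fp, fn) tuples, one per SDOH category.
    Macro averages per-category F1; micro pools the counts first."""
    macro = sum(f1(*c) for c in per_class) / len(per_class)
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    return f1(tp, fp, fn), macro

# A well-detected category (8,2,2) vs. a poorly detected one (1,0,9).
micro, macro = micro_macro_f1([(8, 2, 2), (1, 0, 9)])
```

Because the rare, poorly detected category drags the macro score down more than the micro score, reporting both (as the entry does) exposes uneven per-category performance.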
|
32
|
Machine learning and deep learning-based Natural Language Processing for auto-vetting the appropriateness of Lumbar Spine Magnetic Resonance Imaging Referrals. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.100961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
|
33
|
Deep-Learning-Based Natural Language Processing of Serial Free-Text Radiological Reports for Predicting Rectal Cancer Patient Survival. Front Oncol 2021; 11:747250. [PMID: 34868947 PMCID: PMC8635726 DOI: 10.3389/fonc.2021.747250] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2021] [Accepted: 10/28/2021] [Indexed: 01/04/2023] Open
Abstract
Most electronic medical records, such as free-text radiological reports, are unstructured; however, the methodological approaches to analyzing these accumulating unstructured records are limited. This article proposes a deep-transfer-learning-based natural language processing model that analyzes serial magnetic resonance imaging reports of rectal cancer patients and predicts their overall survival. To evaluate the model, a retrospective cohort study of 4,338 rectal cancer patients was conducted. The experimental results revealed that the proposed model utilizing pre-trained clinical linguistic knowledge could predict the overall survival of patients without any structured information and was superior to the carcinoembryonic antigen in predicting survival. The deep-transfer-learning model using free-text radiological reports can predict the survival of patients with rectal cancer, thereby increasing the utility of unstructured medical big data.
|
34
|
Identifying cardiomegaly in chest X-rays: a cross-sectional study of evaluation and comparison between different transfer learning methods. Acta Radiol 2021; 62:1601-1609. [PMID: 33203215 DOI: 10.1177/0284185120973630] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
BACKGROUND Cardiomegaly is a relatively common incidental finding on chest X-rays; if left untreated, it can result in significant complications. Using artificial intelligence for diagnosing cardiomegaly could be beneficial, as this pathology may be underreported, or overlooked, especially in busy or under-staffed settings. PURPOSE To explore the feasibility of applying four different transfer learning methods to identify the presence of cardiomegaly in chest X-rays and to compare their diagnostic performance using the radiologists' report as the gold standard. MATERIAL AND METHODS Two thousand chest X-rays were utilized in the current study: 1000 were normal and 1000 had confirmed cardiomegaly. Of these exams, 80% were used for training and 20% as a holdout test dataset. A total of 2048 deep features were extracted using Google's Inception V3, VGG16, VGG19, and SqueezeNet networks. A logistic regression algorithm optimized in regularization terms was used to classify chest X-rays into those with presence or absence of cardiomegaly. RESULTS Diagnostic accuracy is reported by means of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), with the VGG19 network providing the best values of sensitivity (84%), specificity (83%), PPV (83%), NPV (84%), and overall accuracy (84.5%). The other networks presented sensitivity at 64.1%-82%, specificity at 77.1%-81.1%, PPV at 74%-81.4%, NPV at 68%-82%, and overall accuracy at 71%-81.3%. CONCLUSION Deep learning using transfer learning methods based on the VGG19 network can be used for the automatic detection of cardiomegaly on chest X-ray images. However, further validation and training of each method is required before application to clinical cases.
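The transfer-learning recipe in this entry, extracting fixed deep features from a pretrained network and fitting a regularised logistic regression on top, can be sketched with a hand-rolled classifier (generic, not the authors' code; the 2-dimensional "deep features" below are hypothetical stand-ins for the 2048-dimensional vectors):

```python
import math

def train_logreg(X, y, lr=0.5, epochs=200, l2=0.01):
    """L2-regularised logistic regression fitted by gradient descent,
    standing in for the classifier applied to extracted deep features."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))     # predicted probability
            g = p - yi                          # gradient of the log-loss
            w = [wj - lr * (g * xj + l2 * wj) for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Hypothetical features: cardiomegaly cases cluster high on axis 0.
X = [[2.0, 0.1], [1.8, 0.3], [0.2, 1.9], [0.1, 1.7]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
```

In practice the feature extraction would use a pretrained VGG19 (or similar) with its classification head removed, and the regularisation strength would be tuned, which is what "optimized in regularization terms" refers to.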
|
35
|
Natural language processing of head CT reports to identify intracranial mass effect: CTIME algorithm. Am J Emerg Med 2021; 51:388-392. [PMID: 34839182 DOI: 10.1016/j.ajem.2021.11.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Revised: 10/30/2021] [Accepted: 11/02/2021] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND The Mortality Probability Model (MPM) is used in research and quality improvement to adjust for severity of illness and can also inform triage decisions. However, a limitation for its automated use is that it includes the variable "intracranial mass effect" (IME), which requires human engagement with the electronic health record (EHR). We developed and tested a natural language processing (NLP) algorithm to identify IME from head CT reports. METHODS We obtained initial head CT reports from adult patients who were admitted to the ICU from our ED between 10/2013 and 9/2016. Each head CT report was labeled yes/no for IME by at least two of five independent labelers. The reports were then randomly divided 80/20 into training and test sets. All reports were preprocessed to remove linguistic and style variability, and a dictionary was created to map similar common terms. We tested three vectorization strategies, Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Universal Sentence Encoder, to convert the report text to a numerical vector. This vector served as the input to a tree-based ensemble machine learning algorithm (XGBoost). After training, model performance was assessed in the test set using the area under the receiver operating characteristic curve (AUROC). We also divided the continuous range of scores into positive/inconclusive/negative categories for IME. RESULTS Of the 1202 CT reports in the training set, 308 (25.6%) were manually labeled as "yes" for IME. Of the 355 reports in the test set, 108 (30.4%) were labeled as "yes" for IME. The TF-IDF vectorization strategy as input to the XGBoost model had the best AUROC: 0.9625 (95% CI 0.9443-0.9807). TF-IDF score categories were defined with the following likelihood ratios: "positive" (TF-IDF score > 0.5) LR = 24.59; "inconclusive" (TF-IDF 0.05-0.5) LR = 0.99; and "negative" (TF-IDF < 0.05) LR = 0.05.
82% of reports were classified as either "positive" or "negative". In the test set, only 4 of 199 (2.0%) reports with a "negative" classification were false negatives and only 8 of 93 (8.6%) reports classified as "positive" were false positives. CONCLUSION NLP can accurately identify IME from free-text reports of head CTs in approximately 80% of records, adequate to allow automatic calculation of MPM based on EHR data for many applications.
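The three-way score banding and per-category likelihood ratios reported above can be computed mechanically. A small sketch using the cutoffs from the abstract (0.05 and 0.5); the `category_lr` helper and its example counts are hypothetical, not taken from the study data.

```python
def categorize(score, lo=0.05, hi=0.5):
    """Map a continuous classifier score onto the three-way categories
    used in the abstract (cutoffs 0.05 and 0.5)."""
    if score > hi:
        return "positive"
    if score < lo:
        return "negative"
    return "inconclusive"

def category_lr(n_cat_pos, n_pos, n_cat_neg, n_neg):
    """Likelihood ratio of a category:
    P(category | IME present) / P(category | IME absent)."""
    return (n_cat_pos / n_pos) / (n_cat_neg / n_neg)

# Hypothetical counts: 90 of 100 IME-positive reports vs 10 of 100
# IME-negative reports fall in the "positive" score band.
lr_positive = category_lr(90, 100, 10, 100)
```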
|
36
|
Deep learning driven colorectal lesion detection in gastrointestinal endoscopic and pathological imaging. World J Clin Cases 2021; 9:9376-9385. [PMID: 34877273 PMCID: PMC8610875 DOI: 10.12998/wjcc.v9.i31.9376] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/15/2021] [Revised: 07/26/2021] [Accepted: 08/13/2021] [Indexed: 02/06/2023] Open
Abstract
Colorectal cancer has the second highest incidence of malignant tumors and is the fourth leading cause of cancer deaths in China. Early diagnosis and treatment of colorectal cancer will lead to an improvement in the 5-year survival rate, which will reduce medical costs. The current diagnostic methods for early colorectal cancer include stool-based tests, blood tests, endoscopy, and computer-aided endoscopy. In this paper, research on image analysis and prediction of colorectal cancer lesions based on deep learning is reviewed with the goal of providing a reference for the early diagnosis of colorectal cancer lesions by combining computer technology, 3D modeling, 5G remote technology, endoscopic robot technology, and surgical navigation technology. The findings will supplement the research and provide insights to improve the cure rate and reduce the mortality of colorectal cancer.
|
37
|
Training Strategies for Radiology Deep Learning Models in Data-limited Scenarios. Radiol Artif Intell 2021; 3:e210014. [PMID: 34870217 PMCID: PMC8637222 DOI: 10.1148/ryai.2021210014] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2021] [Revised: 09/08/2021] [Accepted: 09/16/2021] [Indexed: 12/22/2022]
Abstract
Data-driven approaches have great potential to shape future practices in radiology. The most straightforward strategy to obtain clinically accurate models is to use large, well-curated, annotated datasets. However, patient privacy constraints, tedious annotation processes, and the limited availability of radiologists pose challenges to building such datasets. This review details model training strategies in scenarios with limited data, insufficiently labeled data, and/or limited expert resources. It discusses strategies to enlarge the data sample, decrease the time burden of manual supervised labeling, adjust the neural network architecture to improve model performance, apply semisupervised approaches, and leverage efficiencies from pretrained models. Keywords: Computer-aided Detection/Diagnosis, Transfer Learning, Limited Annotated Data, Augmentation, Synthetic Data, Semisupervised Learning, Federated Learning, Few-Shot Learning, Class Imbalance.
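One of the strategies the review lists, enlarging the data sample through augmentation, can be illustrated in a few lines. A minimal sketch with NumPy; the specific transforms (flip, noise, shift) are generic choices for illustration, not ones prescribed by the review.

```python
import numpy as np

def augment(image, rng):
    """Return simple augmented variants of a 2-D image array: a horizontal
    flip, an additive-Gaussian-noise copy, and a small vertical shift."""
    flipped = image[:, ::-1]
    noisy = image + rng.normal(0.0, 0.05, size=image.shape)
    shifted = np.roll(image, shift=2, axis=0)
    return [flipped, noisy, shifted]

rng = np.random.default_rng(42)
img = rng.random((64, 64))          # stand-in for a grayscale radiograph
variants = augment(img, rng)        # one image -> three extra training samples
```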
|
38
|
Basic Artificial Intelligence Techniques: Natural Language Processing of Radiology Reports. Radiol Clin North Am 2021; 59:919-931. [PMID: 34689877 DOI: 10.1016/j.rcl.2021.06.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
Natural language processing (NLP) is a subfield of computer science and linguistics that can be applied to extract meaningful information from radiology reports. Symbolic NLP is rule based and well suited to problems that can be explicitly defined by a set of rules. Statistical NLP is better suited to problems that cannot be well defined and requires annotated or labeled examples from which machine learning algorithms can infer the rules. Both symbolic and statistical NLP have found success in a variety of radiology use cases. More recently, deep learning approaches, including transformers, have gained traction and demonstrated good performance.
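A toy example of the symbolic (rule-based) approach described above: a finding counts as present only if it is mentioned and no negation cue fires. The cue list and `finding_present` helper are illustrative assumptions, not from any published system.

```python
import re

# Negation cues for a toy symbolic (rule-based) classifier; illustrative only.
NEGATIONS = re.compile(r"\b(no|without|negative for|absence of)\b", re.IGNORECASE)

def finding_present(sentence, finding):
    """Rule: the finding counts as present if it is mentioned
    and no negation cue appears in the same sentence."""
    if finding.lower() not in sentence.lower():
        return False
    return NEGATIONS.search(sentence) is None
```

Statistical NLP would instead learn such patterns from labeled report sentences rather than encoding them by hand.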
|
39
|
[Structured reporting and artificial intelligence]. Radiologe 2021; 61:999-1004. [PMID: 34605945 DOI: 10.1007/s00117-021-00920-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/03/2021] [Indexed: 11/27/2022]
Abstract
BACKGROUND There is a multitude of possible applications of artificial intelligence (AI) and structured reporting (SR) in radiology, and the number of scientific publications has increased continuously for many years. There is an extensive portfolio of available AI algorithms, e.g. for the automatic detection and preselection of pathologic patterns in images or for facilitating reporting workflows. Imaging machines themselves already use AI algorithms to improve ease of operation. METHOD The use of SR is essential for the extraction of automatically evaluable semantic data from radiology reports. Regarding certification processes, the use of SR is mandatory for accreditation as an oncological center by the German Cancer Society or, outside Germany, by bodies such as the European Cancer Centre. RESULTS The data from SR can be automatically evaluated for patient care, research, education, and quality assurance. Missing information and a high degree of variability often hamper the extraction of valid information from free-text reports using natural language processing (NLP). For supervised training, AI algorithms such as artificial neural networks require a considerable amount of validated data. The semantic data from SR can also be processed by AI and used for training. CONCLUSION AI and SR are separate entities within the field of radiology with mutual dependencies and significant added value. Both have high potential for profound changes and further developments in radiology.
|
40
|
Deep Learning and Risk Assessment in Acute Pulmonary Embolism. Radiology 2021; 302:185-186. [PMID: 34581632 DOI: 10.1148/radiol.2021211897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
|
41
|
CT Angiography Clot Burden Score from Data Mining of Structured Reports for Pulmonary Embolism. Radiology 2021; 302:175-184. [PMID: 34581626 DOI: 10.1148/radiol.2021211013] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Background Many studies emphasize the role of structured reports (SRs) because they are readily accessible for further automated analyses. However, using SR data obtained in clinical routine for research purposes is not yet well represented in the literature. Purpose To compare the performance of the Qanadli scoring system with a clot burden score mined from structured pulmonary embolism (PE) reports from CT angiography. Materials and Methods In this retrospective study, a rule-based text mining pipeline was developed to extract descriptors of PE and right heart strain from SRs of patients with suspected PE between March 2017 and February 2020. From standardized PE reporting, a pulmonary artery obstruction index (PAOI) clot burden score (PAOI-CBS) was derived and compared with the Qanadli score (PAOI-Q). Scoring time and confidence from two independent readings were compared. Interobserver and interscore agreement were tested by using the intraclass correlation coefficient (ICC) and Bland-Altman analysis. To assess conformity and diagnostic performance of both scores, areas under the receiver operating characteristic curve (AUCs) were calculated to predict right heart strain incidence, as were optimal cutoff values for maximum sensitivity and specificity. Results SR content authored by 67 residents and signed off by 32 consultants from 1248 patients (mean age, 63 years ± 17 [standard deviation]; 639 men) was extracted accurately and allowed for PAOI-CBS calculation in 304 of 357 (85.2%) PE-positive reports. The PAOI-CBS strongly correlated with the PAOI-Q (r = 0.94; P < .001). Use of PAOI-CBS yielded overall time savings (1.3 minutes ± 0.5 vs 3.0 minutes ± 1.7), higher confidence levels (4.2 ± 0.6 vs 3.6 ± 1.0), and a higher ICC (0.99 vs 0.95) compared with PAOI-Q (each, P < .001). AUCs were similar for PAOI-CBS (AUC, 0.75; 95% CI: 0.70, 0.81) and PAOI-Q (AUC, 0.77; 95% CI: 0.72, 0.83; P = .68), with cutoff values of 27.5% for both scores.
Conclusion Data mining of structured reports enabled the development of a CT angiography scoring system that simplified the Qanadli score as a semiquantitative estimate of thrombus burden in patients with pulmonary embolism. © RSNA, 2021. Online supplemental material is available for this article. See also the editorial by Hunsaker in this issue.
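A rule-based text-mining step like the one described above can be sketched as follows. The regular expressions and per-level weights here are hypothetical stand-ins; the actual PAOI-CBS definition and the report templates it parses are not reproduced in the abstract.

```python
import re

# Hypothetical per-level weights; NOT the published PAOI-CBS definition.
WEIGHTS = {"main": 9, "lobar": 3, "segmental": 1}
MAX_SCORE = 18

def clot_burden(report_text):
    """Count embolus mentions at each pulmonary-artery level in a report
    and normalize the weighted sum to a percentage, PAOI-style."""
    score = 0
    for level, weight in WEIGHTS.items():
        # e.g. matches "embolus in the right main ..." or "emboli in segmental ..."
        pattern = rf"embol\w*\s+in\s+(?:the\s+)?(?:\w+\s+)?{level}"
        score += weight * len(re.findall(pattern, report_text.lower()))
    return 100.0 * min(score, MAX_SCORE) / MAX_SCORE
```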
|
42
|
Noninterpretive Uses of Artificial Intelligence in Radiology. Acad Radiol 2021; 28:1225-1235. [PMID: 32059956 DOI: 10.1016/j.acra.2020.01.012] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2019] [Revised: 01/08/2020] [Accepted: 01/09/2020] [Indexed: 12/12/2022]
Abstract
We deem a computer to exhibit artificial intelligence (AI) when it performs a task that would normally require intelligent action by a human. Much of the recent excitement about AI in the medical literature has revolved around the ability of AI models to recognize anatomy and detect pathology on medical images, sometimes at the level of expert physicians. However, AI can also be used to solve a wide range of noninterpretive problems that are relevant to radiologists and their patients. This review summarizes some of the newer noninterpretive uses of AI in radiology.
|
43
|
A novel hierarchical machine learning model for hospital-acquired venous thromboembolism risk assessment among multiple-departments. J Biomed Inform 2021; 122:103892. [PMID: 34454079 DOI: 10.1016/j.jbi.2021.103892] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2021] [Revised: 08/10/2021] [Accepted: 08/22/2021] [Indexed: 12/19/2022]
Abstract
Venous thromboembolism (VTE) is a common vascular disease and potentially fatal complication during hospitalization, so the early identification of VTE risk is of significant importance. Compared with traditional scale assessments, machine learning methods provide new opportunities for precise early warning of VTE from clinical medical records. This research aimed to propose a two-stage hierarchical machine learning model for VTE risk prediction in patients from multiple departments. First, we built a machine learning prediction model that covered the entire hospital, based on all cohorts and common risk factors. Then, we took the prediction output of the first stage as an initial assessment score and built specific models for each department. Over the duration of the study, a total of 9213 inpatients, including 1165 VTE-positive samples, were collected from four departments and split into development and test datasets. The proposed model achieved an AUC of 0.879 in the department of oncology, which outperformed the first-stage model (0.730) and the department model (0.787). This was attributed to the full use of both the large sample size at the hospital level and the variable abundance at the department level. Experimental results show that our model could effectively improve the prediction of hospital-acquired VTE risk before image diagnosis and provide decision support for further nursing and medical intervention.
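The two-stage hierarchy described above, a hospital-wide model whose output score feeds department-specific models as an extra feature, can be sketched as follows. Least-squares linear scorers stand in for the paper's machine learning models, and all data are synthetic.

```python
import numpy as np

def fit_linear_scorer(X, y):
    """Least-squares linear scorer: a simple stand-in for the
    logistic-regression-style models used in the paper."""
    Xb = np.hstack([X, np.ones((len(X), 1))])          # add intercept column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w

rng = np.random.default_rng(1)

# Stage 1: hospital-wide model trained on all cohorts and common risk factors.
X_common = rng.normal(size=(300, 5))
dept = rng.integers(0, 3, size=300)                    # department id per patient
y = (X_common[:, 0] + 0.5 * (dept == 2) + rng.normal(0, 0.5, 300) > 0).astype(int)
stage1 = fit_linear_scorer(X_common, y)
s1 = stage1(X_common)                                  # initial assessment score

# Stage 2: one model per department, taking the stage-1 score as an extra feature.
stage2 = {}
for d in range(3):
    idx = dept == d
    stage2[d] = fit_linear_scorer(np.hstack([X_common[idx], s1[idx, None]]), y[idx])

# Route each patient through their department's model and threshold at 0.5.
pred = np.empty(300)
for d in range(3):
    idx = dept == d
    pred[idx] = stage2[d](np.hstack([X_common[idx], s1[idx, None]]))
acc = float(np.mean((pred > 0.5) == y))
```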
|
44
|
Performance Comparison of Individual and Ensemble CNN Models for the Classification of Brain 18F-FDG-PET Scans. J Digit Imaging 2021; 33:447-455. [PMID: 31659587 DOI: 10.1007/s10278-019-00289-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open
Abstract
The high-background glucose metabolism of normal gray matter on [18F]-fluoro-2-D-deoxyglucose (FDG) positron emission tomography (PET) of the brain results in a low signal-to-background ratio, potentially increasing the possibility of missing important findings in patients with intracranial malignancies. To explore the strategy of using a deep learning classifier to aid in distinguishing normal versus abnormal findings on PET brain images, this study evaluated the performance of a two-dimensional convolutional neural network (2D-CNN) to classify FDG PET brain scans as normal (N) or abnormal (A). METHODS Two hundred eighty-nine brain FDG-PET scans (N, n = 150; A, n = 139), comprising a total of 68,260 images, were included. Nine individual 2D-CNN models with three different window settings for the axial, coronal, and sagittal axes were trained and validated. The performance of these individual and ensemble models was evaluated and compared using a test dataset. Odds ratio, Akaike's information criterion (AIC), area under the receiver operating characteristic curve (AUC), accuracy, and standard deviation (SD) were calculated. RESULTS The optimal window setting to classify normal and abnormal scans differed for each axis of the individual models. An ensemble model using different axes with an optimized window setting (window-triad) showed better performance than ensemble models using the same axis and different window settings (axis-triad). Increases in odds ratio and decreases in SD were observed in both axis-triad and window-triad models compared with individual models, whereas improvements in AUC and AIC were seen in window-triad models. An overall model averaging the probabilities of all individual models showed the best accuracy of 82.0%. CONCLUSIONS Data ensembling using different window settings and axes was effective in improving 2D-CNN performance parameters for the classification of brain FDG-PET scans.
If prospectively validated with a larger cohort of patients, similar models could provide decision support in a clinical setting.
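The final "overall model averaging the probabilities of all individual models" amounts to simple probability averaging across the nine axis/window models. A sketch with hypothetical per-model outputs for a single scan:

```python
# Hypothetical abnormality probabilities for one FDG-PET scan, one entry per
# individual 2D-CNN (three axes x three window settings).
probs = {
    "axial_w1": 0.91, "axial_w2": 0.85, "axial_w3": 0.88,
    "coronal_w1": 0.40, "coronal_w2": 0.75, "coronal_w3": 0.80,
    "sagittal_w1": 0.66, "sagittal_w2": 0.72, "sagittal_w3": 0.95,
}

def triad_mean(keys):
    """Average the probabilities of a three-model subset (axis- or window-triad)."""
    return sum(probs[k] for k in keys) / len(keys)

# Window-triad: same window setting, all three axes.
window_triad_w1 = triad_mean(["axial_w1", "coronal_w1", "sagittal_w1"])

# Overall model: average all nine individual probabilities, then threshold.
overall = sum(probs.values()) / len(probs)
label = "abnormal" if overall >= 0.5 else "normal"
```

Averaging reduces the variance of any single model's output, which is consistent with the lower SD the authors report for the ensembles.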
|
45
|
Bag-of-Words Technique in Natural Language Processing: A Primer for Radiologists. Radiographics 2021; 41:1420-1426. [PMID: 34388050 DOI: 10.1148/rg.2021210025] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Natural language processing (NLP) is a methodology designed to extract concepts and meaning from human-generated unstructured (free-form) text. It is intended to be implemented by using computer algorithms so that it can be run on a corpus of documents quickly and reliably. To enable machine learning (ML) techniques in NLP, free-form text must be converted to a numerical representation. After several stages of preprocessing including tokenization, removal of stop words, token normalization, and creation of a master dictionary, the bag-of-words (BOW) technique can be used to represent each remaining word as a feature of the document. The preprocessing steps simplify the documents but also potentially degrade meaning. The values of the features in BOW can be modified by using techniques such as term count, term frequency, and term frequency-inverse document frequency. Experience and experimentation will guide decisions on which specific techniques will optimize ML performance. These and other NLP techniques are being applied in radiology. Radiologists' understanding of the strengths and limitations of these techniques will help in communication with data scientists and in implementation for specific tasks. Online supplemental material is available for this article. ©RSNA, 2021.
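The preprocessing chain described above (tokenization, stop-word removal, normalization, master dictionary, then TF-IDF-weighted bag-of-words) can be written out compactly. A minimal pure-Python sketch; the stop-word list is an illustrative assumption.

```python
import math
import re

STOP_WORDS = {"the", "is", "of", "a", "and", "in", "with"}   # illustrative list

def tokenize(text):
    """Tokenization + normalization: lowercase and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bow_tfidf(documents):
    """Build a master dictionary, then represent each document as a
    term frequency-inverse document frequency (TF-IDF) vector."""
    docs = [[t for t in tokenize(d) if t not in STOP_WORDS] for d in documents]
    vocab = sorted({t for doc in docs for t in doc})          # master dictionary
    n = len(docs)
    idf = {t: math.log(n / sum(t in doc for doc in docs)) for t in vocab}
    vectors = []
    for doc in docs:
        tf = {t: doc.count(t) / max(len(doc), 1) for t in set(doc)}
        vectors.append([tf.get(t, 0.0) * idf[t] for t in vocab])
    return vocab, vectors

vocab, vectors = bow_tfidf(["Heart size is normal.", "The heart is enlarged."])
```

Note how "heart", which appears in every document, receives weight zero: TF-IDF downweights terms that carry no discriminative information, one example of how preprocessing can also discard meaning.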
|
46
|
Abstract
Natural language processing (NLP) is an interdisciplinary field, combining linguistics, computer science, and artificial intelligence to enable machines to read and understand human language for meaningful purposes. Recent advancements in deep learning have begun to offer significant improvements in NLP task performance. These techniques have the potential to create new automated tools that could improve clinical workflows and unlock unstructured textual information contained in radiology and clinical reports for the development of radiology and clinical artificial intelligence applications. These applications will combine classic linguistic and NLP preprocessing techniques with modern NLP and deep learning techniques.
|
47
|
Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports. Bioinformatics 2021; 36:5255-5261. [PMID: 32702106 DOI: 10.1093/bioinformatics/btaa668] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2020] [Revised: 06/25/2020] [Accepted: 07/17/2020] [Indexed: 01/19/2023] Open
Abstract
MOTIVATION The development of deep, bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) has led to marked improvements on several Natural Language Processing (NLP) benchmarks. Especially in radiology, large amounts of free-text data are generated in the daily clinical workflow. These report texts could be of particular use for the generation of labels in machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed to enable accurate text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then require fine-tuning on only a small amount of manually labelled data to achieve even better results. RESULTS Using BERT to identify the most important findings in intensive care chest radiograph reports, we achieve areas under the receiver operating characteristic curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports. AVAILABILITY AND IMPLEMENTATION We make the source code for fine-tuning the BERT models freely available at https://github.com/fast-raidiology/bert-for-radiology. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
48
|
Automated Detection of Radiology Reports that Require Follow-up Imaging Using Natural Language Processing Feature Engineering and Machine Learning Classification. J Digit Imaging 2021; 33:131-136. [PMID: 31482317 DOI: 10.1007/s10278-019-00271-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022] Open
Abstract
While radiologists regularly issue follow-up recommendations, our preliminary research has shown that anywhere from 35 to 50% of patients who receive follow-up recommendations for findings of possible cancer on abdominopelvic imaging do not return for follow-up. As such, they remain at risk for adverse outcomes related to missed or delayed cancer diagnosis. In this study, we develop an algorithm to automatically detect free text radiology reports that have a follow-up recommendation using natural language processing (NLP) techniques and machine learning models. The data set used in this study consists of 6000 free text reports from the author's institution. NLP techniques are used to engineer 1500 features, which include the most informative unigrams, bigrams, and trigrams in the training corpus after performing tokenization and Porter stemming. On this data set, we train naive Bayes, decision tree, and maximum entropy models. The decision tree model, with an F1 score of 0.458 and accuracy of 0.862, outperforms both the naive Bayes (F1 score of 0.381) and maximum entropy (F1 score of 0.387) models. The models were analyzed to determine predictive features, with term frequency of n-grams such as "renal neoplasm" and "evalu with enhanc" being most predictive of a follow-up recommendation. Key to maximizing performance was feature engineering that extracts predictive information and appropriate selection of machine learning algorithms based on the feature set.
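The feature engineering described above, stemmed unigram/bigram/trigram counts, can be sketched as follows. The suffix-stripping `stem` function is a crude stand-in for the Porter stemmer; for the abstract's example terms it yields the same stems ("evalu", "enhanc").

```python
import re

def stem(token):
    """Crude suffix stripper standing in for the Porter stemmer;
    e.g. 'evaluation' -> 'evalu', 'enhancement' -> 'enhanc'."""
    for suffix in ("ation", "ement", "ment", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def ngram_features(report, max_n=3):
    """Count stemmed unigrams, bigrams, and trigrams in one report."""
    tokens = [stem(t) for t in re.findall(r"[a-z]+", report.lower())]
    feats = {}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i : i + n])
            feats[gram] = feats.get(gram, 0) + 1
    return feats

feats = ngram_features("Recommend follow-up evaluation with enhancement for renal neoplasm.")
```

In the full pipeline, the 1500 most informative of these n-gram counts would form the feature vector fed to the naive Bayes, decision tree, or maximum entropy classifiers.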
|
49
|
Abstract
Primary liver cancer is the fourth leading cause of cancer-related deaths worldwide, with hepatocellular carcinoma (HCC) comprising the vast majority of primary liver malignancies. Imaging plays a central role in HCC diagnosis and management. As a result, the content and structure of radiology reports are of utmost importance in guiding clinical management. The Liver Imaging Reporting and Data System (LI-RADS) provides guidance for standardized reporting of liver observations in patients who are at risk for HCC. LI-RADS standardized reporting intends to inform patient treatment and facilitate multidisciplinary communication and decisions, taking into consideration individual clinical factors. Depending on the context, observations may be reported individually, in aggregate, or as a combination of both. LI-RADS provides two templates for reporting liver observations: in a single continuous paragraph or in a structured format with keywords and imaging findings. The authors clarify terminology that is pertinent to reporting, highlight the benefits of structured reports, discuss the applicability of LI-RADS for liver CT and MRI, review the elements of a standardized LI-RADS report, provide guidance on the description of LI-RADS observations exemplified with two case-based reporting templates, illustrate relevant imaging findings and components to be included when reporting specific clinical scenarios, and discuss future directions. An invited commentary by Yano is available online. Online supplemental material is available for this article. Work of the U.S. Government published under an exclusive license with the RSNA.
|
50
|
Deep learning to automate the labelling of head MRI datasets for computer vision applications. Eur Radiol 2021; 32:725-736. [PMID: 34286375 PMCID: PMC8660736 DOI: 10.1007/s00330-021-08132-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2021] [Revised: 06/02/2021] [Accepted: 06/14/2021] [Indexed: 02/07/2023]
Abstract
Objectives The purpose of this study was to build a deep learning model to derive labels from neuroradiology reports and assign these to the corresponding examinations, overcoming a bottleneck to computer vision model development. Methods Reference-standard labels were generated by a team of neuroradiologists for model training and evaluation. Three thousand examinations were labelled for the presence or absence of any abnormality by manually scrutinising the corresponding radiology reports (‘reference-standard report labels’); a subset of these examinations (n = 250) were assigned ‘reference-standard image labels’ by interrogating the actual images. Separately, 2000 reports were labelled for the presence or absence of 7 specialised categories of abnormality (acute stroke, mass, atrophy, vascular abnormality, small vessel disease, white matter inflammation, encephalomalacia), with a subset of these examinations (n = 700) also assigned reference-standard image labels. A deep learning model was trained using labelled reports and validated in two ways: comparing predicted labels to (i) reference-standard report labels and (ii) reference-standard image labels. The area under the receiver operating characteristic curve (AUC-ROC) was used to quantify model performance. Accuracy, sensitivity, specificity, and F1 score were also calculated. Results Accurate classification (AUC-ROC > 0.95) was achieved for all categories when tested against reference-standard report labels. A drop in performance (ΔAUC-ROC > 0.02) was seen for three categories (atrophy, encephalomalacia, vascular) when tested against reference-standard image labels, highlighting discrepancies in the original reports. Once trained, the model assigned labels to 121,556 examinations in under 30 min. Conclusions Our model accurately classifies head MRI examinations, enabling automated dataset labelling for downstream computer vision applications. 
Key Points
• Deep learning is poised to revolutionise image recognition tasks in radiology; however, a barrier to clinical adoption is the difficulty of obtaining large labelled datasets for model training.
• We demonstrate a deep learning model which can derive labels from neuroradiology reports and assign these to the corresponding examinations at scale, facilitating the development of downstream computer vision models.
• We rigorously tested our model by comparing labels predicted on the basis of neuroradiology reports with two sets of reference-standard labels: (1) labels derived by manually scrutinising each radiology report and (2) labels derived by interrogating the actual images.
Supplementary Information The online version contains supplementary material available at 10.1007/s00330-021-08132-0.
|