1. Ibragimov B, Mello-Thoms C. The Use of Machine Learning in Eye Tracking Studies in Medical Imaging: A Review. IEEE J Biomed Health Inform 2024;28:3597-3612. [PMID: 38421842] [PMCID: PMC11262011] [DOI: 10.1109/jbhi.2024.3371893]
Abstract
Machine learning (ML) has revolutionized medical image-based diagnostics. In this review, we cover a rapidly emerging field that ML could significantly impact: eye tracking in medical imaging. The review investigates the clinical, algorithmic, and hardware properties of the existing studies. In particular, it evaluates 1) the type of eye-tracking equipment used and how the equipment aligns with study aims; 2) the software required to record and process eye-tracking data, which often requires user interface development, and controller command and voice recording; 3) the ML methodology utilized depending on the anatomy of interest, gaze data representation, and target clinical application. The review concludes with a summary of recommendations for future studies, and confirms that the inclusion of gaze data broadens the applicability of ML in radiology from computer-aided diagnosis (CAD) to gaze-based image annotation, physicians' error detection, fatigue recognition, and other areas of potentially high research and clinical impact.
2. Neves J, Hsieh C, Nobre IB, Sousa SC, Ouyang C, Maciel A, Duchowski A, Jorge J, Moreira C. Shedding light on AI in radiology: A systematic review and taxonomy of eye gaze-driven interpretability in deep learning. Eur J Radiol 2024;172:111341. [PMID: 38340426] [DOI: 10.1016/j.ejrad.2024.111341]
Abstract
X-ray imaging plays a crucial role in diagnostic medicine. Yet, a significant portion of the global population lacks access to this essential technology due to a shortage of trained radiologists. Eye-tracking data and deep learning models can enhance X-ray analysis by mapping expert focus areas, guiding automated anomaly detection, optimizing workflow efficiency, and bolstering training methods for novice radiologists. However, the literature shows contradictory results regarding the usefulness of eye-tracking data in deep-learning architectures for abnormality detection. We argue that these discrepancies between studies are due to (a) the way eye-tracking data is (or is not) processed, (b) the types of deep learning architectures chosen, and (c) the type of application these architectures will have. We conducted a systematic literature review using PRISMA to address these contradictory results. We analyzed 60 studies that incorporated eye-tracking data in a deep-learning approach for different application goals in radiology. We performed a comparative analysis to understand whether eye gaze data contains feature maps that can be useful under a deep learning approach and whether they can promote more interpretable predictions. To the best of our knowledge, this is the first survey in the area that performs a thorough investigation of eye gaze data processing techniques and their impacts in different deep learning architectures for applications such as error detection, classification, object detection, expertise level analysis, fatigue estimation, and human attention prediction in medical imaging data.
Our analysis resulted in two main contributions: (1) a taxonomy that first divides the literature by task, enabling us to analyze the value eye movement can bring to each case and to build guidelines on the architectures and gaze processing techniques adequate for each application, and (2) an overall analysis of how eye gaze data can promote explainability in radiology.
Affiliation(s)
- José Neves
- Instituto Superior Técnico / INESC-ID, University of Lisbon, Portugal.
- Chihcheng Hsieh
- School of Information Systems, Queensland University of Technology, Australia.
- Chun Ouyang
- School of Information Systems, Queensland University of Technology, Australia.
- Anderson Maciel
- Instituto Superior Técnico / INESC-ID, University of Lisbon, Portugal.
- Joaquim Jorge
- Instituto Superior Técnico / INESC-ID, University of Lisbon, Portugal.
- Catarina Moreira
- Human Technology Institute, University of Technology Sydney, Australia.
3. Ma C, Zhao L, Chen Y, Wang S, Guo L, Zhang T, Shen D, Jiang X, Liu T. Eye-Gaze-Guided Vision Transformer for Rectifying Shortcut Learning. IEEE Trans Med Imaging 2023;42:3384-3394. [PMID: 37335796] [DOI: 10.1109/tmi.2023.3287572]
Abstract
Learning harmful shortcuts such as spurious correlations and biases prevents deep neural networks from learning meaningful and useful representations, thus jeopardizing the generalizability and interpretability of the learned representation. The situation becomes even more serious in medical image analysis, where clinical data are limited and scarce while the reliability, generalizability, and transparency of the learned model are highly required. To rectify harmful shortcuts in medical imaging applications, in this paper we propose a novel eye-gaze-guided vision transformer (EG-ViT) model, which infuses the visual attention of radiologists to proactively guide the vision transformer (ViT) model to focus on regions with potential pathology rather than spurious correlations. To do so, the EG-ViT model takes as input the masked image patches that fall within the radiologists' interest, while an additional residual connection to the last encoder layer maintains the interactions of all patches. The experiments on two medical imaging datasets demonstrate that the proposed EG-ViT model can effectively rectify harmful shortcut learning and improve the interpretability of the model. Meanwhile, infusing the experts' domain knowledge also improves the large-scale ViT model's performance over all compared baseline methods when limited samples are available. In general, EG-ViT takes advantage of powerful deep neural networks while rectifying harmful shortcut learning with human experts' prior knowledge. This work also opens new avenues for advancing current artificial intelligence paradigms by infusing human intelligence.
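For reference, the patch-masking idea described in this abstract can be sketched as follows: patches that received enough radiologist fixations are kept as ViT input, and the rest are masked out. This is an illustrative stdlib-Python sketch, not the authors' code; the function name, threshold, and patch geometry are assumptions.

```python
# Hypothetical sketch of gaze-driven patch masking (EG-ViT-style idea):
# count fixations per ViT patch and keep only patches with enough hits.

def gaze_patch_mask(fixations, img_size=224, patch=16, min_hits=1):
    """Map (x, y) fixation points to a boolean keep-mask over ViT patches."""
    n = img_size // patch                      # patches per side (14 for 224/16)
    hits = [[0] * n for _ in range(n)]
    for x, y in fixations:
        if 0 <= x < img_size and 0 <= y < img_size:
            hits[y // patch][x // patch] += 1  # accumulate fixations per patch
    return [[h >= min_hits for h in row] for row in hits]

# Two fixations land in the top-left patch, one in a patch further away:
mask = gaze_patch_mask([(3, 5), (10, 2), (100, 120)])
kept = sum(cell for row in mask for cell in row)
```

In the paper's actual pipeline the masked patches feed a transformer encoder; here only the masking step is shown.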
4. Park HB, Azer L, Ahn S, Dinh TD, Macias G, Zhang G, Chen BB, Ma H, Botejue M, Choi EH, Zhang W. Contributions of global and local processing on medical image perception. J Med Imaging (Bellingham) 2023;10:S11911. [PMID: 37168693] [PMCID: PMC10166588] [DOI: 10.1117/1.jmi.10.s1.s11911]
Abstract
Purpose: The influential holistic processing hypothesis attributes expertise in medical image perception to cognitive processing of global gist information. However, it has remained unclear whether or how experts use a rapid global impression of images for their subsequent diagnostic decisions based on the focal sign of cancer. We hypothesized that continuous-global and discrete-local processes jointly contribute to radiological experts' detection of abnormalities in mammograms, with different weights and temporal dynamics. Approach: We examined experienced versus inexperienced observers' performance at first (500 ms) versus second (2500 ms) mammogram image presentation in an abnormality detection task. We applied a dual-trace signal detection (DTSD) model of the receiver operating characteristic (ROC) to assess the time-varying contributions of global and focal cancer signals to mammogram reading and medical expertise. Results: The hierarchical Bayesian DTSD modeling of empirical ROCs revealed that mammogram expertise (experienced versus inexperienced observers) manifests largely in a continuous-global component for the detection of the gist of abnormality at the early phase of mammogram reading. For the second presentation of the same mammogram images, the experienced participants showed increased task performance that was largely driven by better processing of discrete-local information, whereas the global processing of abnormality remained saturated from the first exposure. Modeling of the mouse trajectory of the confidence rating responses further revealed the temporal dynamics of global and focal processing. Conclusions: These results suggest a joint contribution of continuous-global and discrete-local processes to medical expertise, and that these processes can be analytically dissociated.
Affiliation(s)
- Hyung-Bum Park
- The University of Chicago, Institute for Mind and Biology, Chicago, Illinois, United States
- University of California, Riverside, Department of Psychology, Riverside, California, United States
- Lilian Azer
- The University of Chicago, Institute for Mind and Biology, Chicago, Illinois, United States
- Shinhae Ahn
- Washington University in St. Louis, Department of Psychological and Brain Sciences, St. Louis, Missouri, United States
- Tam-Dan Dinh
- The University of Chicago, Institute for Mind and Biology, Chicago, Illinois, United States
- Gabriela Macias
- The University of Chicago, Institute for Mind and Biology, Chicago, Illinois, United States
- Gavin Zhang
- University of California, Berkeley, Department of Computer Science, Berkeley, California, United States
- Bihong Beth Chen
- City of Hope National Medical Center, Department of Diagnostic Radiology, Duarte, California, United States
- Huiyan Ma
- City of Hope National Medical Center, Beckman Research Institute, Department of Population Sciences, Duarte, California, United States
- Mahesh Botejue
- University of California Riverside School of Medicine, Riverside Community Hospital, Internal Medicine, Riverside, California, United States
- Eric H. Choi
- University of California Riverside School of Medicine, Riverside Community Hospital, Internal Medicine, Riverside, California, United States
- Weiwei Zhang
- The University of Chicago, Institute for Mind and Biology, Chicago, Illinois, United States
5. Wang S, Ouyang X, Liu T, Wang Q, Shen D. Follow My Eye: Using Gaze to Supervise Computer-Aided Diagnosis. IEEE Trans Med Imaging 2022;41:1688-1698. [PMID: 35085074] [DOI: 10.1109/tmi.2022.3146973]
Abstract
When deep neural networks (DNNs) were first introduced to the medical image analysis community, researchers were impressed by their performance. However, it is now evident that a large amount of manually labeled data is often required to train a properly functioning DNN. This demand for supervision data and labels is a major bottleneck in current medical image analysis, since collecting a large number of annotations from experienced experts can be time-consuming and expensive. In this paper, we demonstrate that the eye movements of radiologists reading medical images can serve as a new form of supervision for training DNN-based computer-aided diagnosis (CAD) systems. In particular, we record the tracks of the radiologists' gaze while they read images. The gaze information is processed and then used to supervise the DNN's attention via an Attention Consistency module. To the best of our knowledge, this pipeline is among the earliest efforts to leverage expert eye movement for deep-learning-based CAD. We have conducted extensive experiments on knee X-ray images for osteoarthritis assessment. The results show that our method achieves considerable improvement in diagnostic performance with the help of gaze supervision.
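The core idea of gaze supervision can be sketched as a consistency loss that penalizes the distance between the network's attention map and a gaze-derived map, both normalized to sum to one. This minimal stdlib-Python sketch illustrates the general pattern only; the paper's actual Attention Consistency module may differ, and all names here are assumptions.

```python
# Illustrative attention-consistency-style loss between two 2D maps:
# normalize each map to a distribution, then take the mean squared error.

def normalize(m):
    """Scale a 2D map so its entries sum to 1 (no-op guard for all-zero maps)."""
    s = sum(sum(row) for row in m) or 1.0
    return [[v / s for v in row] for row in m]

def attention_consistency_loss(model_attn, gaze_map):
    """Mean squared error between two equally sized, normalized 2D maps."""
    a, g = normalize(model_attn), normalize(gaze_map)
    n = sum(len(row) for row in a)
    return sum((av - gv) ** 2
               for ar, gr in zip(a, g)
               for av, gv in zip(ar, gr)) / n

# Maps that agree up to scale give zero loss:
loss = attention_consistency_loss([[1, 0], [0, 1]], [[2, 0], [0, 2]])
```

In a real training loop this term would be added to the classification loss with a weighting coefficient.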
6. Tomographic Ultrasound Imaging in the Diagnosis of Breast Tumors under the Guidance of Deep Learning Algorithms. Comput Intell Neurosci 2022;2022:9227440. [PMID: 35265119] [PMCID: PMC8901319] [DOI: 10.1155/2022/9227440]
Abstract
This study aimed to assess the feasibility of distinguishing benign from malignant breast tumors with tomographic ultrasound imaging (TUI) guided by a deep learning algorithm. The deep learning algorithm was used to segment the images, and 120 patients with breast tumors were included in the study, all of whom underwent routine ultrasound examinations. Subsequently, TUI was used to assist in guiding the positioning, and the light scattering tomography system was used to further measure the lesions. A deep learning model was established to process the imaging results, and the pathological test results were taken as the gold standard for the diagnostic efficiency of the different imaging methods. Among the 120 patients, 56 had benign lesions and 64 had malignant lesions. The average total amount of hemoglobin (HBT) of malignant lesions was significantly higher than that of benign lesions (P < 0.05). The sensitivity, specificity, accuracy, positive predictive value, and negative predictive value of TUI in the diagnosis of breast cancer were 90.4%, 75.6%, 81.4%, 84.7%, and 80.6%, respectively; for ultrasound alone, they were 81.7%, 64.9%, 70.5%, 75.9%, and 80.6%, respectively. In addition, for suspected malignant breast lesions, the combined application of ultrasound and tomography increased the diagnostic specificity to 82.1% and the accuracy to 83.8%. Based on these results, it was concluded that TUI combined with ultrasound significantly improves the specificity and accuracy of benign/malignant diagnosis of breast cancer, and that deep learning technology plays a useful auxiliary role in disease examination and is worth promoting in clinical application.
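The five metrics reported in this abstract all derive from a 2x2 confusion matrix. A minimal stdlib-Python sketch of the definitions (the counts below are illustrative and not the study's data):

```python
# Standard diagnostic metrics from true/false positive/negative counts.

def diagnostic_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),            # recall on diseased cases
        "specificity": tn / (tn + fp),            # recall on healthy cases
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "ppv":         tp / (tp + fp),            # positive predictive value
        "npv":         tn / (tn + fn),            # negative predictive value
    }

# Hypothetical counts for a 120-case cohort (64 malignant, 56 benign):
m = diagnostic_metrics(tp=58, fp=14, tn=42, fn=6)
```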
7. Panetta K, Rajendran R, Ramesh A, Rao S, Agaian S. Tufts Dental Database: A Multimodal Panoramic X-ray Dataset for Benchmarking Diagnostic Systems. IEEE J Biomed Health Inform 2021;26:1650-1659. [PMID: 34606466] [DOI: 10.1109/jbhi.2021.3117575]
Abstract
The application of artificial intelligence in dental healthcare has a very promising role due to the abundance of imagery- and non-imagery-based clinical data. Expert analysis of dental radiographs can provide crucial information for clinical diagnosis and treatment. In recent years, convolutional neural networks have achieved the highest accuracy in various benchmarks, including analyzing dental X-ray images to improve the quality of clinical care. This paper presents the Tufts Dental Database, a new panoramic dental radiography dataset. The dataset consists of 1000 panoramic dental radiography images with expert labeling of abnormalities and teeth. The classification of radiography images was performed at five different levels: anatomical location, peripheral characteristics, radiodensity, effects on the surrounding structure, and the abnormality category. This first-of-its-kind multimodal dataset also includes the radiologist's expertise captured in the form of eye tracking and a think-aloud protocol. The contributions of this work are 1) a publicly available dataset that can help researchers incorporate human expertise into AI and achieve more robust and accurate abnormality detection; 2) a benchmark performance analysis of various state-of-the-art systems for dental radiograph image enhancement and image segmentation using deep learning; 3) an in-depth review of various panoramic dental image datasets, along with segmentation and detection systems. The release of this dataset aims to propel the development of AI-powered automated abnormality detection and classification in dental panoramic radiographs, to enhance tooth segmentation algorithms, and to support distilling the radiologist's expertise into AI.
8. Karargyris A, Kashyap S, Lourentzou I, Wu JT, Sharma A, Tong M, Abedin S, Beymer D, Mukherjee V, Krupinski EA, Moradi M. Creation and validation of a chest X-ray dataset with eye-tracking and report dictation for AI development. Sci Data 2021;8:92. [PMID: 33767191] [PMCID: PMC7994908] [DOI: 10.1038/s41597-021-00863-5]
Abstract
We developed a rich dataset of chest X-ray (CXR) images to assist investigators in artificial intelligence research. The data were collected using an eye-tracking system while a radiologist reviewed and reported on 1,083 CXR images. The dataset contains the following aligned data: CXR image, transcribed radiology report text, radiologist's dictation audio, and eye gaze coordinates. We hope this dataset can contribute to various areas of research, particularly explainable and multimodal deep learning/machine learning methods. Furthermore, investigators in disease classification and localization, automated radiology report generation, and human-machine interaction can benefit from these data. We report deep learning experiments that utilize the attention maps produced by the eye gaze data to show the potential utility of this dataset.
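One common way to turn raw gaze coordinates like those in this dataset into dense attention maps is to place a Gaussian at each fixation and sum. This is a small stdlib-Python sketch of that generic technique, not the dataset authors' processing code; the grid size and sigma are arbitrary choices.

```python
import math

def gaze_heatmap(fixations, width, height, sigma=2.0):
    """Sum an isotropic Gaussian centered at each (x, y) fixation point."""
    heat = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                heat[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    return heat

# A single fixation at the grid center peaks exactly there:
heat = gaze_heatmap([(4, 4)], width=9, height=9)
```

In practice the heatmap would be normalized and resized to match the image or feature-map resolution before being used as an attention target.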
Affiliation(s)
- Ismini Lourentzou
- IBM Research, Almaden Research Center, San Jose, CA, 95120, USA
- Department of Computer Science, Virginia Tech, Blacksburg, VA, 24061, USA
- Joy T Wu
- IBM Research, Almaden Research Center, San Jose, CA, 95120, USA
- Arjun Sharma
- IBM Research, Almaden Research Center, San Jose, CA, 95120, USA
- Matthew Tong
- IBM Research, Almaden Research Center, San Jose, CA, 95120, USA
- Shafiq Abedin
- IBM Research, Almaden Research Center, San Jose, CA, 95120, USA
- David Beymer
- IBM Research, Almaden Research Center, San Jose, CA, 95120, USA
- Elizabeth A Krupinski
- Department of Radiology and Imaging Sciences, Emory University, Atlanta, GA, 30322, USA
- Mehdi Moradi
- IBM Research, Almaden Research Center, San Jose, CA, 95120, USA.
9. Stember JN, Celik H, Krupinski E, Chang PD, Mutasa S, Wood BJ, Lignelli A, Moonis G, Schwartz LH, Jambawalikar S, Bagci U. Eye Tracking for Deep Learning Segmentation Using Convolutional Neural Networks. J Digit Imaging 2020;32:597-604. [PMID: 31044392] [PMCID: PMC6646645] [DOI: 10.1007/s10278-019-00220-4]
Abstract
Deep learning with convolutional neural networks (CNNs) has experienced tremendous growth in multiple healthcare applications and has been shown to have high accuracy in semantic segmentation of medical (e.g., radiology and pathology) images. However, a key barrier in the required training of CNNs is obtaining large-scale and precisely annotated imaging data. We sought to address the lack of annotated data with eye-tracking technology. As a proof of principle, our hypothesis was that segmentation masks generated with the help of eye tracking (ET) would be very similar to those rendered by hand annotation (HA). Additionally, our goal was to show that a CNN trained on ET masks would be equivalent to one trained on HA masks, the latter being the current standard approach. Step 1: Screen captures of 19 publicly available radiologic images of assorted structures within various modalities were analyzed. ET and HA masks for all regions of interest (ROIs) were generated from these image datasets. Step 2: Utilizing a similar approach, ET and HA masks for 356 publicly available T1-weighted postcontrast meningioma images were generated. Three hundred six of these image + mask pairs were used to train a CNN with U-net-based architecture. The remaining 50 images were used as the independent test set. In Step 1, ET and HA masks for the nonneurological images had an average Dice similarity coefficient (DSC) of 0.86 between each other. In Step 2, meningioma ET and HA masks had an average DSC of 0.85 between each other. After separate training using both approaches, the ET approach performed virtually identically to HA on the test set of 50 images. The former had an area under the curve (AUC) of 0.88, while the latter had an AUC of 0.87. ET and HA predictions had trimmed mean DSCs compared to the original HA maps of 0.73 and 0.74, respectively. These trimmed DSCs between ET and HA were found to be statistically equivalent with a p value of 0.015.
We have demonstrated that ET can create segmentation masks suitable for deep learning semantic segmentation. Future work will integrate ET to produce masks in a faster, more natural manner that distracts less from the typical radiology clinical workflow.
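The Dice similarity coefficient used throughout this study to compare ET and HA masks is 2|A∩B| / (|A| + |B|). A minimal stdlib-Python sketch over binary masks (not the authors' code; the empty-mask convention is an assumption):

```python
# Dice similarity coefficient between two binary 2D masks of equal shape.

def dice(mask_a, mask_b):
    a = [v for row in mask_a for v in row]
    b = [v for row in mask_b for v in row]
    inter = sum(x and y for x, y in zip(a, b))  # |A ∩ B|
    total = sum(a) + sum(b)                     # |A| + |B|
    return 2 * inter / total if total else 1.0  # both empty: treat as agreement

# Two pixels in A, one of which overlaps B's single pixel: DSC = 2*1/3.
d = dice([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```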
Affiliation(s)
- J N Stember
- Department of Radiology, Columbia University Medical Center - NYPH, New York, NY, 10032, USA.
- H Celik
- The National Institutes of Health, Clinical Center, Bethesda, MD, 20892, USA
- E Krupinski
- Department of Radiology & Imaging Sciences, Emory University, Atlanta, GA, 30322, USA
- P D Chang
- Department of Radiology, University of California, Irvine, CA, 92697, USA
- S Mutasa
- Department of Radiology, Columbia University Medical Center - NYPH, New York, NY, 10032, USA
- B J Wood
- The National Institutes of Health, Clinical Center, Bethesda, MD, 20892, USA
- A Lignelli
- Department of Radiology, Columbia University Medical Center - NYPH, New York, NY, 10032, USA
- G Moonis
- Department of Radiology, Columbia University Medical Center - NYPH, New York, NY, 10032, USA
- L H Schwartz
- Department of Radiology, Columbia University Medical Center - NYPH, New York, NY, 10032, USA
- S Jambawalikar
- Department of Radiology, Columbia University Medical Center - NYPH, New York, NY, 10032, USA
- U Bagci
- Center for Research in Computer Vision, University of Central Florida, 4328 Scorpius St. HEC 221, Orlando, FL, 32816, USA
10. Ahsan R, Ebrahimi M. Image processing techniques represent innovative tools for comparative analysis of proteins. Comput Biol Med 2019;117:103584. [PMID: 32072976] [DOI: 10.1016/j.compbiomed.2019.103584]
Abstract
Different bioinformatic and data-mining approaches have been used for the analysis of proteins. Here, we describe a novel, robust, and reliable approach for comparative analysis of a large number of proteins that combines image processing techniques with a convolutional deep neural network (IPT-CNN). As a proof of principle, we used IPT-CNN to predict different subtypes of influenza A virus (IAV). Over 8000 sequences of the surface proteins haemagglutinin (HA) and neuraminidase (NA) from different IAV subtypes were used to create polynomial or binary vector datasets. The datasets were then converted into binary images. Analysis of these images enabled the classification of IAV subtypes with 100% accuracy and, compared to non-image-based approaches, within a shorter time frame. The proteome-based IPT-CNN approach described here may be used for the analysis and proteome-based classification of other proteins.
Affiliation(s)
- Reza Ahsan
- Department of Information Technology, School of Engineering, University of Qom, Qom, Iran
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran; School of Agriculture and Veterinary Sciences, University of Adelaide, Adelaide, Australia.