1. Zhao J, Zhou Y, Chen Z, Fu H, Wan L. Topicwise Separable Sentence Retrieval for Medical Report Generation. IEEE Transactions on Medical Imaging 2025; 44:1505-1517. [PMID: 40030345] [DOI: 10.1109/tmi.2024.3507076]
Abstract
Automated radiology reporting holds immense clinical potential in alleviating the burdensome workload of radiologists and mitigating diagnostic bias. Recently, retrieval-based report generation methods have garnered increasing attention. These methods predefine a set of candidate queries and compose reports by searching for sentences in an off-the-shelf sentence gallery that best match these candidate queries. However, due to the long-tail distribution of the training data, these models tend to learn frequently occurring sentences and topics, overlooking the rare topics. Regrettably, in many cases, the descriptions of rare topics often indicate critical findings that should be mentioned in the report. To address this problem, we introduce a Topicwise Separable Sentence Retrieval (Teaser) for medical report generation. To ensure comprehensive learning of both common and rare topics, we categorize queries into common and rare types to learn differentiated topics, and then propose Topic Contrastive Loss to effectively align topics and queries in the latent space. Moreover, we integrate an Abstractor module following the extraction of visual features, which aids the topic decoder in gaining a deeper understanding of the visual observational intent. Experiments on the MIMIC-CXR and IU X-ray datasets demonstrate that Teaser surpasses state-of-the-art models, while also validating its capability to effectively represent rare topics and establish more dependable correspondences between queries and topics. The code is available at https://github.com/CindyZJT/Teaser.git.
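For readers who want a concrete picture, the following minimal PyTorch sketch shows a symmetric contrastive loss that pulls each query embedding toward its matched topic embedding in a shared latent space, in the spirit of the Topic Contrastive Loss described above; the InfoNCE form, the function name, and the temperature are illustrative assumptions rather than the authors' exact formulation.

```python
# Minimal sketch of a contrastive loss aligning queries with topics in a shared
# latent space. Names, dimensions, and the InfoNCE form are assumptions for
# illustration, not the exact Teaser loss.
import torch
import torch.nn.functional as F

def topic_contrastive_loss(query_emb, topic_emb, temperature=0.07):
    """query_emb: (N, d) queries; topic_emb: (N, d) their matched topics.
    The i-th query is pulled toward the i-th topic and pushed from the others."""
    q = F.normalize(query_emb, dim=-1)
    t = F.normalize(topic_emb, dim=-1)
    logits = q @ t.T / temperature          # (N, N) cosine-similarity logits
    targets = torch.arange(q.size(0), device=q.device)
    # Symmetric InfoNCE: match queries to topics and topics to queries.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

# Toy usage: 8 query/topic pairs in a 256-d latent space.
loss = topic_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```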
2. Feng CM, Yang Z, Fu H, Xu Y, Yang J, Shao L. DONet: Dual-Octave Network for Fast MR Image Reconstruction. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3965-3975. [PMID: 34197326] [DOI: 10.1109/tnnls.2021.3090303]
Abstract
Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration has long been the subject of research. This is commonly achieved by obtaining multiple undersampled images, simultaneously, through parallel imaging. In this article, we propose the dual-octave network (DONet), which is capable of learning multiscale spatial-frequency features from both the real and imaginary components of MR data, for parallel fast MR image reconstruction. More specifically, our DONet consists of a series of dual-octave convolutions (Dual-OctConvs), which are connected in a dense manner for better reuse of features. In each Dual-OctConv, the input feature maps and convolutional kernels are first split into two components (i.e., real and imaginary) and then divided into four groups according to their spatial frequencies. Then, our Dual-OctConv conducts intragroup information updating and intergroup information exchange to aggregate the contextual information across different groups. Our framework provides three appealing benefits: 1) it encourages information interaction and fusion between the real and imaginary components at various spatial frequencies to achieve richer representational capacity; 2) the dense connections between the real and imaginary groups in each Dual-OctConv make the propagation of features more efficient by feature reuse; and 3) DONet enlarges the receptive field by learning multiple spatial-frequency features of both the real and imaginary components. Extensive experiments on two popular datasets (i.e., clinical knee and fastMRI), under different undersampling patterns and acceleration factors, demonstrate the superiority of our model in accelerated parallel MR image reconstruction.
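The grouping idea behind the Dual-OctConv can be sketched as follows: real and imaginary feature maps are each kept at a full-resolution (high-frequency) and half-resolution (low-frequency) scale, each group is updated by its own convolution, and the groups exchange information after resizing to a common scale. Channel counts, kernel sizes, and the exchange rule below are simplifying assumptions, not the paper's exact design.

```python
# Highly simplified sketch of the four-group (real/imag x high/low frequency)
# idea: intra-group convolutions plus inter-group exchange after resizing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualOctaveBlockSketch(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        # One 3x3 convolution per group: (real/imag) x (high/low frequency).
        self.convs = nn.ModuleDict({k: nn.Conv2d(ch, ch, 3, padding=1)
                                    for k in ["rh", "rl", "ih", "il"]})

    def forward(self, feats):
        # feats: dict with "rh", "ih" at full resolution and "rl", "il" at half.
        updated = {k: conv(feats[k]) for k, conv in self.convs.items()}
        out = {}
        for k in updated:
            target = updated[k]
            exchange = torch.zeros_like(target)
            for j, v in updated.items():
                if j == k:
                    continue
                # Inter-group exchange: resize other groups to this resolution.
                exchange = exchange + F.interpolate(v, size=target.shape[-2:],
                                                    mode="bilinear",
                                                    align_corners=False)
            out[k] = F.relu(target + exchange)
        return out

# Toy usage: 32x32 "real/imag" maps with 16x16 low-frequency counterparts.
x = {"rh": torch.randn(1, 16, 32, 32), "ih": torch.randn(1, 16, 32, 32),
     "rl": torch.randn(1, 16, 16, 16), "il": torch.randn(1, 16, 16, 16)}
y = DualOctaveBlockSketch()(x)
```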
3. Zhang M, Hu X, Gu L, Liu L, Kobayashi K, Harada T, Yan Y, Summers RM, Zhu Y. A New Benchmark: Clinical Uncertainty and Severity Aware Labeled Chest X-Ray Images With Multi-Relationship Graph Learning. IEEE Transactions on Medical Imaging 2025; 44:338-347. [PMID: 39120990] [DOI: 10.1109/tmi.2024.3441494]
Abstract
Chest radiography, commonly known as CXR, is frequently utilized in clinical settings to detect cardiopulmonary conditions. However, even seasoned radiologists might offer different evaluations regarding the seriousness and uncertainty associated with observed abnormalities. Previous research has attempted to utilize clinical notes to extract abnormal labels for training deep-learning models in CXR image diagnosis. However, these methods often neglected the varying degrees of severity and uncertainty linked to different labels. In our study, we initially assembled a comprehensive new dataset of CXR images based on clinical textual data, which incorporated radiologists' assessments of uncertainty and severity. Using this dataset, we introduced a multi-relationship graph learning framework that leverages spatial and semantic relationships while addressing expert uncertainty through a dedicated loss function. Our research showcases a notable enhancement in CXR image diagnosis and the interpretability of the diagnostic model, surpassing existing state-of-the-art methodologies. The disease severity and uncertainty dataset we extracted is available at: https://physionet.org/content/cad-chest/1.0/.
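One simple way to fold expert uncertainty into a multi-label training objective, sketched below for illustration, is to down-weight loss terms whose labels radiologists marked as uncertain; this weighting scheme is an assumption and the paper's dedicated loss function may differ.

```python
# Illustrative sketch: down-weight the loss on labels flagged as uncertain.
# The weighting rule is assumed for illustration, not taken from the paper.
import torch
import torch.nn.functional as F

def uncertainty_weighted_bce(logits, labels, uncertain_mask, uncertain_weight=0.3):
    """logits, labels: (N, C); uncertain_mask: (N, C) with 1 where the label is
    marked uncertain. Certain labels get full weight, uncertain ones less."""
    weights = torch.where(uncertain_mask.bool(),
                          torch.full_like(labels, uncertain_weight),
                          torch.ones_like(labels))
    return F.binary_cross_entropy_with_logits(logits, labels, weight=weights)

# Toy usage: 4 studies, 5 findings, with a few uncertain annotations.
logits = torch.randn(4, 5)
labels = torch.randint(0, 2, (4, 5)).float()
uncertain = torch.zeros(4, 5); uncertain[0, 2] = 1; uncertain[3, 0] = 1
loss = uncertainty_weighted_bce(logits, labels, uncertain)
```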
4. Huang Q, Li G. Knowledge graph based reasoning in medical image analysis: A scoping review. Comput Biol Med 2024; 182:109100. [PMID: 39244959] [DOI: 10.1016/j.compbiomed.2024.109100]
Abstract
Automated computer-aided diagnosis (CAD) is becoming more significant in the field of medicine due to advancements in computer hardware performance and the progress of artificial intelligence. The knowledge graph is a structure for visually representing knowledge facts. In the last decade, a large body of work based on knowledge graphs has effectively improved the organization and interpretability of large-scale complex knowledge. Introducing knowledge graph inference into CAD is a research direction with significant potential. In this review, we first briefly introduce the basic principles and application methods of knowledge graphs. Then, we systematically organize and analyze the research and application of knowledge graphs in medical imaging-assisted diagnosis. We also summarize the shortcomings of the current research, such as medical data barriers and deficiencies, low utilization of multimodal information, and weak interpretability. Finally, we propose promising future research directions to address the shortcomings of current approaches.
Affiliation(s)
- Qinghua Huang
- School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, 127 West Youyi Road, Beilin District, Xi'an, 710072, Shaanxi, China.
- Guanghui Li
- School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, 127 West Youyi Road, Beilin District, Xi'an, 710072, Shaanxi, China; School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Road, Chang'an District, Xi'an, 710129, Shaanxi, China.
5. Hu X, Gu L, Kobayashi K, Liu L, Zhang M, Harada T, Summers RM, Zhu Y. Interpretable medical image Visual Question Answering via multi-modal relationship graph learning. Med Image Anal 2024; 97:103279. [PMID: 39079429] [DOI: 10.1016/j.media.2024.103279]
Abstract
Medical Visual Question Answering (VQA) is an important task in medical multi-modal Large Language Models (LLMs), aiming to answer clinically relevant questions regarding input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on the public health system, particularly in resource-poor countries. However, existing medical VQA datasets are small and only contain simple questions (equivalent to classification tasks), which lack semantic reasoning and clinical knowledge. Our previous work proposed a clinical knowledge-driven image difference VQA benchmark using a rule-based approach (Hu et al., 2023). However, given the same breadth of information coverage, the rule-based approach shows an 85% error rate on extracted labels. We trained an LLM-based method to extract labels with 62% higher accuracy. We also comprehensively evaluated our labels with two clinical experts on 100 samples to help us fine-tune the LLM. Based on the trained LLM, we proposed a large-scale medical VQA dataset, Medical-CXR-VQA, using LLMs focused on chest X-ray images. The questions involve detailed information, such as abnormalities, locations, levels, and types. Based on this dataset, we proposed a novel VQA method by constructing three different relationship graphs: spatial, semantic, and implicit relationship graphs on the image regions, questions, and semantic labels. We leveraged graph attention to learn the logical reasoning paths for different questions. These learned graph VQA reasoning paths can be further used for LLM prompt engineering and chain-of-thought prompting, which are crucial for further fine-tuning and training multi-modal large language models. Moreover, we demonstrate that our approach has the qualities of evidence and faithfulness, which are crucial in the clinical field. The code and the dataset are available at https://github.com/Holipori/Medical-CXR-VQA.
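Relationship-graph VQA models of this kind typically build on graph attention; the sketch below shows a generic single-head graph-attention layer that re-weights node features (image regions, question tokens, semantic labels) along the edges of a relationship graph. It is a standard GAT-style layer for illustration, not the paper's exact architecture.

```python
# Generic single-head graph-attention (GAT-style) layer over a relationship
# graph. Dimensions and the attention scoring rule are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionSketch(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 relationship graph.
        h = self.proj(x)                                  # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.attn(pairs).squeeze(-1))    # (N, N) edge scores
        e = e.masked_fill(adj == 0, float("-inf"))        # keep graph edges only
        alpha = torch.softmax(e, dim=-1)
        return F.elu(alpha @ h)                           # attention-weighted update

# Toy usage: 6 nodes (e.g. regions + question tokens) with a sparse relation graph.
adj = torch.eye(6); adj[0, 1] = adj[1, 0] = adj[2, 3] = adj[3, 2] = 1
out = GraphAttentionSketch(32, 64)(torch.randn(6, 32), adj)
```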
Affiliation(s)
- Xinyue Hu
- The University of Texas Arlington, Arlington, 76010, TX, USA
- Lin Gu
- RIKEN, Tokyo, Japan; University of Tokyo, Tokyo, Japan
- Liangchen Liu
- National Institutes of Health Clinical Center, Bethesda, 20892, MD, USA
- Mengliang Zhang
- The University of Texas Arlington, Arlington, 76010, TX, USA
- Ronald M Summers
- National Institutes of Health Clinical Center, Bethesda, 20892, MD, USA
- Yingying Zhu
- The University of Texas Arlington, Arlington, 76010, TX, USA.
6. Zhang H, Liu J, Liu W, Chen H, Yu Z, Yuan Y, Wang P, Qin J. MHD-Net: Memory-Aware Hetero-Modal Distillation Network for Thymic Epithelial Tumor Typing With Missing Pathology Modality. IEEE J Biomed Health Inform 2024; 28:3003-3014. [PMID: 38470599] [DOI: 10.1109/jbhi.2024.3376462]
Abstract
Fusing multi-modal radiology and pathology data with complementary information can improve the accuracy of tumor typing. However, collecting pathology data is difficult since it is high-cost and sometimes only obtainable after surgery, which limits the application of multi-modal methods in diagnosis. To address this problem, we propose comprehensively learning from multi-modal radiology-pathology data during training, while using only uni-modal radiology data during testing. Concretely, a Memory-aware Hetero-modal Distillation Network (MHD-Net) is proposed, which can distill well-learned multi-modal knowledge from the teacher to the student with the assistance of memory. In the teacher, to tackle the challenge of hetero-modal feature fusion, we propose a novel spatial-differentiated hetero-modal fusion module (SHFM) that models spatial-specific tumor information correlations across modalities. As only radiology data is accessible to the student, we store pathology features in the proposed contrast-boosted typing memory module (CTMM), which achieves type-wise memory updating and stage-wise contrastive memory boosting to ensure the effectiveness and generalization of memory items. In the student, to improve cross-modal distillation, we propose a multi-stage memory-aware distillation (MMD) scheme that reads memory-aware pathology features from the CTMM to remedy missing modal-specific information. Furthermore, we construct a Radiology-Pathology Thymic Epithelial Tumor (RPTET) dataset containing paired CT and WSI images with annotations. Experiments on the RPTET and CPTAC-LUAD datasets demonstrate that MHD-Net significantly improves tumor typing and outperforms existing multi-modal methods in missing-modality situations.
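The general recipe can be illustrated with a small sketch: the radiology-only student retrieves pathology-like features from a learned memory (plain attention over a memory matrix here) and is trained to match the teacher's features. The memory size, retrieval rule, and MSE matching term are illustrative assumptions, not the exact CTMM and MMD modules.

```python
# Sketch of memory-assisted feature distillation: attention over a memory of
# pathology prototypes, then an MSE match to the teacher. All details assumed.
import torch
import torch.nn.functional as F

def memory_readout(student_feat, memory):
    """student_feat: (N, d) radiology features; memory: (M, d) stored items."""
    attn = torch.softmax(student_feat @ memory.T / memory.size(1) ** 0.5, dim=-1)
    return attn @ memory                     # (N, d) memory-aware features

def distillation_loss(student_feat, memory, teacher_feat):
    recalled = memory_readout(student_feat, memory)
    return F.mse_loss(recalled, teacher_feat.detach())

# Toy usage: 4 cases, 128-d features, a memory of 32 pathology prototypes.
loss = distillation_loss(torch.randn(4, 128), torch.randn(32, 128),
                         torch.randn(4, 128))
```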
7. Li J, Jiang P, An Q, Wang GG, Kong HF. Medical image identification methods: A review. Comput Biol Med 2024; 169:107777. [PMID: 38104516] [DOI: 10.1016/j.compbiomed.2023.107777]
Abstract
The identification of medical images is an essential task in computer-aided diagnosis, medical image retrieval, and mining. Medical image data mainly include electronic health record data, gene information data, and similar sources. Although intelligent imaging provides a better scheme for medical image analysis than traditional methods that rely on handcrafted features, it remains challenging due to the diversity of imaging modalities and clinical pathologies. This paper analyzes and summarizes the concepts behind the relevant methods, such as machine learning, deep learning, convolutional neural networks, transfer learning, and other image processing technologies for medical images. We reviewed these recent studies to provide a comprehensive overview of applying these methods to various medical image analysis tasks, such as object detection, image classification, image registration, and segmentation. In particular, we emphasized the latest progress and contributions of different methods in medical image analysis, summarized according to application scenario, including classification, segmentation, detection, and image registration. In addition, the applications of different methods are summarized across clinical areas, such as pulmonary, brain, digital pathology, skin, renal, breast, neuromyelitis, vertebral, and musculoskeletal applications. A critical discussion of open challenges and directions for future research is provided at the end. In particular, successful algorithms from computer vision, natural language processing, and autonomous driving are expected to be applied to medical image recognition in the future.
Affiliation(s)
- Juan Li
- School of Information Engineering, Wuhan Business University, Wuhan, 430056, China; School of Artificial Intelligence, Wuchang University of Technology, Wuhan, 430223, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China
- Pan Jiang
- School of Information Engineering, Wuhan Business University, Wuhan, 430056, China
- Qing An
- School of Artificial Intelligence, Wuchang University of Technology, Wuhan, 430223, China
- Gai-Ge Wang
- School of Computer Science and Technology, Ocean University of China, Qingdao, 266100, China.
- Hua-Feng Kong
- School of Information Engineering, Wuhan Business University, Wuhan, 430056, China.
8. Sugibayashi T, Walston SL, Matsumoto T, Mitsuyama Y, Miki Y, Ueda D. Deep learning for pneumothorax diagnosis: a systematic review and meta-analysis. Eur Respir Rev 2023; 32(168):220259. [PMID: 37286217] [DOI: 10.1183/16000617.0259-2022]
Abstract
Background: Deep learning (DL), a subset of artificial intelligence (AI), has been applied to pneumothorax diagnosis to aid physician diagnosis, but no meta-analysis has been performed.
Methods: A search of multiple electronic databases through September 2022 was performed to identify studies that applied DL for pneumothorax diagnosis using imaging. Meta-analysis via a hierarchical model to calculate the summary area under the curve (AUC) and pooled sensitivity and specificity for both DL and physicians was performed. Risk of bias was assessed using a modified Prediction Model Study Risk of Bias Assessment Tool.
Results: In 56 of the 63 primary studies, pneumothorax was identified from chest radiography. The total AUC was 0.97 (95% CI 0.96-0.98) for both DL and physicians. The total pooled sensitivity was 84% (95% CI 79-89%) for DL and 85% (95% CI 73-92%) for physicians, and the pooled specificity was 96% (95% CI 94-98%) for DL and 98% (95% CI 95-99%) for physicians. More than half of the original studies (57%) had a high risk of bias.
Conclusions: Our review found the diagnostic performance of DL models was similar to that of physicians, although the majority of studies had a high risk of bias. Further pneumothorax AI research is needed.
Affiliation(s)
- Takahiro Sugibayashi
- Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Shannon L Walston
- Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Toshimasa Matsumoto
- Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Smart Life Science Lab, Center for Health Science Innovation, Osaka Metropolitan University, Osaka, Japan
- Yasuhito Mitsuyama
- Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Yukio Miki
- Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Daiju Ueda
- Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan
- Smart Life Science Lab, Center for Health Science Innovation, Osaka Metropolitan University, Osaka, Japan
9. Gül Y, Yaman S, Avcı D, Çilengir AH, Balaban M, Güler H. A Novel Deep Transfer Learning-Based Approach for Automated Pes Planus Diagnosis Using X-ray Image. Diagnostics (Basel) 2023; 13(9):1662. [PMID: 37175053] [PMCID: PMC10178173] [DOI: 10.3390/diagnostics13091662]
Abstract
Pes planus, colloquially known as flatfoot, is a deformity defined as the collapse, flattening or loss of the medial longitudinal arch of the foot. The first-line standard radiographic examination for diagnosing pes planus involves lateral and dorsoplantar weight-bearing radiographs. Recently, many artificial intelligence-based computer-aided diagnosis (CAD) systems and models have been developed for the detection of various diseases from radiological images. However, to the best of our knowledge, no model or system has been proposed in the literature for automated pes planus diagnosis using X-ray images. This study presents a novel deep learning-based model for automated pes planus diagnosis using X-ray images, a first in the literature. To perform this study, a new pes planus dataset consisting of weight-bearing X-ray images was collected and labeled by specialist radiologists. In the preprocessing stage, the X-ray images were augmented, and each image was then divided into 4 and 16 patches, respectively, in a pyramidal fashion. Thus, a total of 21 images were obtained for each radiograph, comprising 20 patches and the original image. These 21 images were then fed to the pre-trained MobileNetV2, and 21,000 features were extracted from the Logits layer. Among the extracted deep features, the most important 1312 features were selected using the proposed iterative ReliefF algorithm and then classified with a support vector machine (SVM). The proposed deep learning-based framework achieved 95.14% accuracy using 10-fold cross-validation. The results demonstrate that our transfer learning-based model can be used as an auxiliary tool for diagnosing pes planus in clinical practice.
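The shape of this pipeline can be sketched as follows: split each radiograph into a pyramid of 21 images, extract deep features with MobileNetV2, select a feature subset, and classify with an SVM. In the sketch, SelectKBest stands in for the paper's iterative ReliefF, the backbone is left unpretrained for brevity, and all sizes are toy values.

```python
# Toy sketch of the patch-pyramid -> deep features -> selection -> SVM pipeline.
# SelectKBest replaces the paper's iterative ReliefF; sizes are illustrative.
import numpy as np
import torch
from torchvision.models import mobilenet_v2
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

def pyramid_patches(img):
    """img: (3, H, W) tensor -> [original, 4 half-size patches, 16 quarter-size]."""
    patches = [img]
    for splits in (2, 4):
        for chunk_rows in img.chunk(splits, dim=1):
            patches.extend(chunk_rows.chunk(splits, dim=2))
    return patches

backbone = mobilenet_v2(weights=None).eval()   # use pretrained weights in practice

@torch.no_grad()
def extract_features(img):
    feats = []
    for p in pyramid_patches(img):
        p = torch.nn.functional.interpolate(p[None], size=(224, 224), mode="bilinear")
        feats.append(backbone(p).squeeze(0))   # 1000-d logits per patch
    return torch.cat(feats).numpy()            # 21 x 1000 features concatenated

# Toy training data: 8 random "radiographs" with binary pes planus labels.
X = np.stack([extract_features(torch.rand(3, 256, 256)) for _ in range(8)])
y = np.array([0, 1] * 4)
X_sel = SelectKBest(f_classif, k=256).fit_transform(X, y)
clf = SVC(kernel="linear").fit(X_sel, y)
```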
Affiliation(s)
- Yeliz Gül
- Department of Radiology, Elazig Fethi Sekin City Hospital, 23280 Elazig, Turkey
- Süleyman Yaman
- Biomedical Department, Vocational School of Technical Sciences, Firat University, 23119 Elazig, Turkey
- Derya Avcı
- Department of Software Engineering, Technology Faculty, Firat University, 23119 Elazig, Turkey
- Atilla Hikmet Çilengir
- Department of Radiology, Faculty of Medicine, Izmir Democracy University, 35140 Izmir, Turkey
- Mehtap Balaban
- Department of Radiology, Faculty of Medicine, Ankara Yildirim Beyazit University, 06010 Ankara, Turkey
- Hasan Güler
- Electrical-Electronics Engineering Department, Engineering Faculty, Firat University, 23119 Elazig, Turkey
10. Liu B, Zhan LM, Xu L, Wu XM. Medical Visual Question Answering via Conditional Reasoning and Contrastive Learning. IEEE Transactions on Medical Imaging 2023; 42:1532-1545. [PMID: 37015503] [DOI: 10.1109/tmi.2022.3232411]
Abstract
Medical visual question answering (Med-VQA) aims to accurately answer a clinical question presented with a medical image. Despite its enormous potential in healthcare services, the development of this technology is still at an early stage. On the one hand, Med-VQA tasks are highly challenging due to the massive diversity of clinical questions that require different visual reasoning skills for different types of questions. On the other hand, medical images are complex in nature and very different from natural images, while current Med-VQA datasets are small-scale with a few hundred radiology images, making it difficult to train a well-performing visual feature extractor. This paper addresses the above two critical issues. We propose a novel conditional reasoning mechanism with a question-conditioned reasoning component and a type-conditioned reasoning strategy to learn effective reasoning skills for different Med-VQA tasks adaptively. Further, we propose to pre-train a visual feature extractor for Med-VQA via contrastive learning on large amounts of unlabeled radiology images. The effectiveness of our proposals is validated by extensive experiments on existing Med-VQA benchmarks, which show significant improvement of our model in prediction accuracy over state-of-the-art methods. The source code and pre-training dataset are provided at https://github.com/Awenbocc/CPCR.
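Type-conditioned reasoning can be illustrated with a small sketch in which the question embedding softly selects among several reasoning heads, so that closed- and open-ended questions can follow different reasoning paths; the gating and head design below are generic assumptions rather than the paper's exact mechanism.

```python
# Sketch of type-conditioned reasoning: a question-driven soft gate over a few
# reasoning heads. Head design, gating, and dimensions are assumptions.
import torch
import torch.nn as nn

class TypeConditionedReasoningSketch(nn.Module):
    def __init__(self, dim=256, num_types=2):
        super().__init__()
        self.type_gate = nn.Linear(dim, num_types)
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_types)])

    def forward(self, img_feat, q_feat):
        # img_feat, q_feat: (N, dim). Gate weights come from the question alone.
        gate = torch.softmax(self.type_gate(q_feat), dim=-1)           # (N, T)
        fused = torch.cat([img_feat, q_feat], dim=-1)
        per_head = torch.stack([h(fused) for h in self.heads], dim=1)  # (N, T, dim)
        return (gate.unsqueeze(-1) * per_head).sum(dim=1)              # (N, dim)

# Toy usage: 4 image/question pairs with 256-d features.
out = TypeConditionedReasoningSketch()(torch.randn(4, 256), torch.randn(4, 256))
```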
11. Completion-Attention Ladder Network for Few-Shot Underwater Acoustic Recognition. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11214-3]
12. Peng L, Wang N, Xu J, Zhu X, Li X. GATE: Graph CCA for Temporal Self-Supervised Learning for Label-Efficient fMRI Analysis. IEEE Transactions on Medical Imaging 2023; 42:391-402. [PMID: 36018878] [DOI: 10.1109/tmi.2022.3201974]
Abstract
In this work, we focus on the challenging task of neuro-disease classification using functional magnetic resonance imaging (fMRI). In population graph-based disease analysis, graph convolutional neural networks (GCNs) have achieved remarkable success. However, these achievements depend on abundant labeled data and are sensitive to spurious signals. To improve fMRI representation learning and classification under a label-efficient setting, we propose a novel and theory-driven self-supervised learning (SSL) framework on GCNs, namely Graph CCA for Temporal sElf-supervised learning on fMRI analysis (GATE). Concretely, it is demanding to design a suitable and effective SSL strategy to extract informative and robust features from fMRI. To this end, we investigate several new graph augmentation strategies based on fMRI dynamic functional connectivity (FC) for SSL training. Further, we leverage canonical-correlation analysis (CCA) on different temporal embeddings and present the theoretical implications. Consequently, this yields a novel two-step GCN learning procedure comprised of (i) SSL on an unlabeled fMRI population graph and (ii) fine-tuning on a small labeled fMRI dataset for a classification task. Our method is tested on two independent fMRI datasets, demonstrating superior performance on autism and dementia diagnosis. Our code is available at https://github.com/LarryUESTC/GATE.
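A CCA-flavoured self-supervised objective on two temporal embeddings of the same subject graph can be sketched as follows: standardize both views, pull them together, and decorrelate the feature dimensions. This generic loss is for illustration only and is not the exact objective used in GATE.

```python
# Generic CCA-style SSL loss on two augmented views: invariance + decorrelation.
# The exact weighting and normalization are illustrative assumptions.
import torch

def cca_ssl_loss(z1, z2, lam=1e-3):
    """z1, z2: (N, d) embeddings of two augmented views (e.g. two FC windows)."""
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    invariance = ((z1 - z2) ** 2).sum() / n            # align the two views
    c1, c2 = z1.T @ z1 / n, z2.T @ z2 / n              # (d, d) covariances
    off_diag = (c1 - torch.eye(d)).pow(2).sum() + (c2 - torch.eye(d)).pow(2).sum()
    return invariance + lam * off_diag                 # push covariances toward identity

# Toy usage: 16 subjects, 64-d graph embeddings from two augmented fMRI views.
loss = cca_ssl_loss(torch.randn(16, 64), torch.randn(16, 64))
```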
13. Wang S, Lin M, Ghosal T, Ding Y, Peng Y. Knowledge Graph Applications in Medical Imaging Analysis: A Scoping Review. Health Data Science 2022; 2022:9841548. [PMID: 35800847] [PMCID: PMC9259200] [DOI: 10.34133/2022/9841548]
Abstract
Background: There is an increasing trend to represent domain knowledge in structured graphs, which provide efficient knowledge representations for many downstream tasks. Knowledge graphs are widely used to model prior knowledge in the form of nodes and edges to represent semantically connected knowledge entities, which several works have adopted into different medical imaging applications.
Methods: We systematically searched over five databases to find relevant articles that applied knowledge graphs to medical imaging analysis. After screening, evaluating, and reviewing the selected articles, we performed a systematic analysis.
Results: We looked at four applications in medical imaging analysis, including disease classification, disease localization and segmentation, report generation, and image retrieval. We also identified limitations of current work, such as the limited amount of available annotated data and weak generalizability to other tasks. We further identified potential future directions according to the identified limitations, including employing semisupervised frameworks to alleviate the need for annotated data and exploring task-agnostic models to provide better generalizability.
Conclusions: We hope that our article will provide the readers with aggregated documentation of the state-of-the-art knowledge graph applications for medical imaging to encourage future research.
Affiliation(s)
- Song Wang
- The University of Texas at Austin, Austin, USA
- Mingquan Lin
- Population Health Sciences, Weill Cornell Medicine, New York, USA
- Tirthankar Ghosal
- Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic
- Ying Ding
- The University of Texas at Austin, Austin, USA
- Yifan Peng
- Population Health Sciences, Weill Cornell Medicine, New York, USA
14. Zhao H, Fang Z, Ren J, MacLellan C, Xia Y, Li S, Sun M, Ren K. SC2Net: A Novel Segmentation-based Classification Network for Detection of COVID-19 in Chest X-ray Images. IEEE J Biomed Health Inform 2022; 26:4032-4043. [PMID: 35613061] [DOI: 10.1109/jbhi.2022.3177854]
Abstract
The COVID-19 pandemic has become a global public health crisis, which has led to a massive number of deaths and severe economic degradation. To suppress the spread of COVID-19, accurate diagnosis at an early stage is crucial. As the widely used real-time reverse transcriptase polymerase chain reaction (RT-PCR) swab test can be lengthy and inaccurate, chest screening with radiography imaging is still preferred. However, due to limited image data and the difficulty of early-stage diagnosis, existing models suffer from ineffective feature extraction and poor network convergence and optimisation. To tackle these issues, a segmentation-based COVID-19 classification network, namely SC2Net, is proposed for effective detection of COVID-19 from chest X-ray (CXR) images. The SC2Net consists of two subnets: a COVID-19 lung segmentation network (CLSeg) and a spatial attention network (SANet). To suppress interference from the background, the CLSeg is first applied to segment the lung region from the CXR. The segmented lung region is then fed to the SANet for classification and diagnosis of COVID-19. As a shallow yet effective classifier, the SANet takes ResNet-18 as the feature extractor and enhances high-level features via the proposed spatial attention module. For performance evaluation, the COVIDGR 1.0 dataset is used, which is a high-quality dataset with various severity levels of COVID-19. Experimental results have shown that our SC2Net achieves an average accuracy of 84.23% and an average F1 score of 81.31% in the detection of COVID-19, outperforming several state-of-the-art approaches.
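The second stage can be sketched as a ResNet-18 feature extractor whose high-level feature map is re-weighted by a simple spatial attention map before pooling and classification; the 1x1-convolution-plus-sigmoid attention here is a common generic choice assumed for illustration, not necessarily SC2Net's module.

```python
# Sketch of a ResNet-18 classifier with a simple spatial attention map applied
# to the last feature map of a lung-masked CXR. Attention design is assumed.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SpatialAttentionClassifierSketch(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        backbone = resnet18(weights=None)          # pretrained weights in practice
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # (N,512,h,w)
        self.attn = nn.Sequential(nn.Conv2d(512, 1, kernel_size=1), nn.Sigmoid())
        self.head = nn.Linear(512, num_classes)

    def forward(self, x):
        # x: lung-masked CXR batch, (N, 3, H, W).
        f = self.features(x)
        a = self.attn(f)                           # (N, 1, h, w) spatial weights
        pooled = (f * a).mean(dim=(2, 3))          # attention-weighted pooling
        return self.head(pooled)

# Toy usage on two 224x224 segmented CXR images.
logits = SpatialAttentionClassifierSketch()(torch.randn(2, 3, 224, 224))
```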
15. Zhou T, Li L, Li X, Feng CM, Li J, Shao L. Group-Wise Learning for Weakly Supervised Semantic Segmentation. IEEE Transactions on Image Processing 2022; 31:799-811. [PMID: 34910633] [DOI: 10.1109/tip.2021.3132834]
Abstract
Acquiring sufficient ground-truth supervision to train deep visual models has been a bottleneck over the years due to the data-hungry nature of deep learning. This is exacerbated in some structured prediction tasks, such as semantic segmentation, which require pixel-level annotations. This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. To achieve this, we propose, for the first time, a novel group-wise learning framework for WSSS. The framework explicitly encodes semantic dependencies in a group of images to discover rich semantic context for estimating more reliable pseudo ground-truths, which are subsequently employed to train more effective segmentation models. In particular, we solve the group-wise learning within a graph neural network (GNN), wherein input images are represented as graph nodes, and the underlying relations between a pair of images are characterized by graph edges. We then formulate semantic mining as an iterative reasoning process which propagates the common semantics shared by a group of images to enrich node representations. Moreover, in order to prevent the model from paying excessive attention to common semantics, we further propose a graph dropout layer to encourage the graph model to capture more accurate and complete object responses. With the above efforts, our model lays the foundation for more sophisticated and flexible group-wise semantic mining. We conduct comprehensive experiments on the popular PASCAL VOC 2012 and COCO benchmarks, and our model yields state-of-the-art performance. In addition, our model shows promising performance in weakly supervised object localization (WSOL) on the CUB-200-2011 dataset, demonstrating strong generalizability. Our code is available at: https://github.com/Lixy1997/Group-WSSS.
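The group-wise idea can be sketched with a toy layer in which each image in a group is a node, messages propagate shared semantics between nodes, and dropout on the graph edges discourages over-reliance on the most common semantics; the fully connected edge construction and the dropout rate are illustrative assumptions, not the paper's exact graph dropout layer.

```python
# Toy group-wise graph layer: message passing over a group of image nodes with
# edge dropout during training. All design choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupGraphLayerSketch(nn.Module):
    def __init__(self, dim=128, edge_drop=0.3):
        super().__init__()
        self.message = nn.Linear(dim, dim)
        self.update = nn.Linear(2 * dim, dim)
        self.edge_drop = edge_drop

    def forward(self, node_feats):
        # node_feats: (G, dim) image-level features for a group of G images.
        g = node_feats.size(0)
        adj = torch.ones(g, g) - torch.eye(g)             # fully connected group
        if self.training:                                  # graph (edge) dropout
            adj = adj * (torch.rand(g, g) > self.edge_drop).float()
        msgs = adj @ self.message(node_feats) / max(g - 1, 1)
        return F.relu(self.update(torch.cat([node_feats, msgs], dim=-1)))

# Toy usage: a group of 5 images with 128-d features.
out = GroupGraphLayerSketch().eval()(torch.randn(5, 128))
```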
16. Li X, Jiang Y, Liu Y, Zhang J, Yin S, Luo H. RAGCN: Region Aggregation Graph Convolutional Network for Bone Age Assessment From X-Ray Images. IEEE Transactions on Instrumentation and Measurement 2022; 71:1-12. [DOI: 10.1109/tim.2022.3190025]
Affiliation(s)
- Xiang Li
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, China
- Yuchen Jiang
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, China
- Yiliu Liu
- Department of Mechanical and Industrial Engineering, Faculty of Engineering, Norwegian University of Science and Technology, Trondheim, Norway
- Jiusi Zhang
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, China
- Shen Yin
- Department of Mechanical and Industrial Engineering, Faculty of Engineering, Norwegian University of Science and Technology, Trondheim, Norway
- Hao Luo
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, China