51
Shen W, Wang Y, Liu M, Wang J, Ding R, Zhang Z, Meijering E. Branch Aggregation Attention Network for Robotic Surgical Instrument Segmentation. IEEE Trans Med Imaging 2023; 42:3408-3419. [PMID: 37342952] [DOI: 10.1109/tmi.2023.3288127] [Indexed: 06/23/2023]
Abstract
Surgical instrument segmentation is of great significance to robot-assisted surgery, but noise caused by reflection, water mist, and motion blur during surgery, as well as the varied forms of surgical instruments, greatly increases the difficulty of precise segmentation. A novel method called the Branch Aggregation Attention network (BAANet) is proposed to address these challenges; it adopts a lightweight encoder and two purpose-built modules, the Branch Balance Aggregation module (BBA) and the Block Attention Fusion module (BAF), for efficient feature localization and denoising. By introducing the BBA module, features from multiple branches are balanced and optimized through a combination of addition and multiplication, which complements their strengths and effectively suppresses noise. Furthermore, to fully integrate contextual information and capture the region of interest, the BAF module is introduced in the decoder; it receives adjacent feature maps from the BBA module and localizes the surgical instruments from both global and local perspectives using a dual-branch attention mechanism. Experimental results show that the proposed method is lightweight while outperforming the second-best method by 4.03%, 1.53%, and 1.34% in mIoU on three challenging surgical instrument datasets, respectively. Code is available at https://github.com/SWT-1014/BAANet.
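The abstract describes the BBA module only at the level of "a combination of addition and multiplication". A rough, hypothetical NumPy sketch of that idea (the function name, shapes, and the sigmoid gating are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def branch_balance_fusion(branches):
    """Toy sketch: fuse multi-branch feature maps by combining
    element-wise addition (complements branch strengths) with
    element-wise multiplication (damps activations that any branch
    fails to confirm, which tends to suppress noise)."""
    added = np.sum(branches, axis=0)         # additive complement
    multiplied = np.prod(branches, axis=0)   # multiplicative agreement
    gate = 1.0 / (1.0 + np.exp(-multiplied)) # sigmoid gate from agreement
    return added * gate

# Two 4x4 single-channel "feature maps" from different branches
a = np.full((4, 4), 0.5)
b = np.zeros((4, 4))
b[1:3, 1:3] = 1.0   # only the centre responds in branch b
fused = branch_balance_fusion([a, b])
```

Regions confirmed by both branches (the centre) come out stronger than regions only one branch responds to, which is the balancing behaviour the abstract gestures at.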
52
Yang J, Jiao L, Shang R, Liu X, Li R, Xu L. EPT-Net: Edge Perception Transformer for 3D Medical Image Segmentation. IEEE Trans Med Imaging 2023; 42:3229-3243. [PMID: 37216246] [DOI: 10.1109/tmi.2023.3278461] [Indexed: 05/24/2023]
Abstract
Convolutional neural networks have achieved remarkable results in most medical image segmentation applications. However, the intrinsic locality of the convolution operation limits its ability to model long-range dependencies. Although the Transformer, designed for sequence-to-sequence global prediction, was conceived to solve this problem, it can suffer from limited localization capability due to insufficient low-level detail features. Moreover, low-level features carry rich fine-grained information that greatly influences edge segmentation decisions for different organs. However, a simple CNN module struggles to capture the edge information in fine-grained features, and the computation and memory consumed in processing high-resolution 3D features are costly. This paper proposes EPT-Net, an encoder-decoder network that effectively combines edge perception with a Transformer structure to segment medical images accurately. Within this framework, a Dual Position Transformer is proposed to effectively enhance 3D spatial positioning ability. In addition, since low-level features contain detailed information, we introduce an Edge Weight Guidance module that extracts edge information by minimizing an edge information function without adding network parameters. We verified the effectiveness of the proposed method on three datasets: SegTHOR 2019, Multi-Atlas Labeling Beyond the Cranial Vault, and a re-labeled KiTS19 dataset that we call KiTS19-M. The experimental results show that EPT-Net significantly outperforms state-of-the-art medical image segmentation methods.
53
Tian Y, Zhang Z, Zhao B, Liu L, Liu X, Feng Y, Tian J, Kou D. Coarse-to-fine prior-guided attention network for multi-structure segmentation on dental panoramic radiographs. Phys Med Biol 2023; 68:215010. [PMID: 37816372] [DOI: 10.1088/1361-6560/ad0218] [Received: 06/29/2023] [Accepted: 10/10/2023] [Indexed: 10/12/2023]
Abstract
Objective. Accurate segmentation of various anatomical structures from dental panoramic radiographs is essential for diagnosis and treatment planning in digital dentistry. In this paper, we propose a novel deep learning-based method for accurate and fully automatic segmentation of the maxillary sinus, mandibular condyle, mandibular nerve, alveolar bone, and teeth on panoramic radiographs. Approach. A two-stage coarse-to-fine prior-guided segmentation framework is proposed to segment multiple structures on dental panoramic radiographs. In the coarse stage, a multi-label segmentation network generates a coarse segmentation mask; in the fine-tuning stage, a prior-guided attention network with an encoder-decoder architecture precisely predicts the mask of each anatomical structure. First, a prior-guided edge fusion module is incorporated at the input of each convolution level of the encoder path to generate edge-enhanced image feature maps. Second, a prior-guided spatial attention module guides the network to extract relevant spatial features from foreground regions by combining the prior information with a spatial attention mechanism. Finally, a prior-guided hybrid attention module is integrated at the bottleneck of the network to explore global context from both spatial and category perspectives. Main results. We evaluated the segmentation performance of our method on a testing dataset of 150 panoramic radiographs collected from real-world clinical scenarios. The results indicate that our proposed method achieves more accurate segmentation than state-of-the-art methods, with average Jaccard scores of 87.91%, 85.25%, 63.94%, 93.46%, and 88.96% for the maxillary sinus, mandibular condyle, mandibular nerve, alveolar bone, and teeth, respectively. Significance. The proposed method accurately segments multiple structures on panoramic radiographs and has the potential to become part of an automatic pathology diagnosis pipeline for dental panoramic radiographs.
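The prior-guided spatial attention module is described only at a high level. A toy sketch of re-weighting features with a coarse-stage prior mask (the attention floor `alpha` and all names are assumptions, not the paper's formulation):

```python
import numpy as np

def prior_guided_spatial_attention(feat, prior, alpha=0.5):
    """Toy sketch: re-weight a (H, W) feature map by a coarse-stage
    prior probability map in [0, 1]. Foreground regions suggested by
    the prior are emphasised while background responses are damped
    rather than zeroed out (the floor `alpha` preserves some signal)."""
    attention = alpha + (1.0 - alpha) * prior
    return feat * attention

feat = np.ones((4, 4))
prior = np.zeros((4, 4))
prior[2, 2] = 1.0          # coarse stage flags one foreground pixel
out = prior_guided_spatial_attention(feat, prior)
```

The flagged location keeps its full response while everything else is attenuated to the floor, which is the "guide attention to foreground" behaviour the abstract describes.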
Affiliation(s)
- Yuan Tian
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Zhejia Zhang
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Bailiang Zhao
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Lichao Liu
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Xiaolin Liu
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Yang Feng
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Jie Tian
- Angelalign Inc. No. 500 Zhengli Road, Yangpu District, Shanghai, People's Republic of China
- Dazhi Kou
- Shanghai Supercomputer Center. No. 585 Guoshoujing Road, Pudong New District, Shanghai, People's Republic of China
54
Kuang H, Wang Y, Liang Y, Liu J, Wang J. BEA-Net: Body and Edge Aware Network With Multi-Scale Short-Term Concatenation for Medical Image Segmentation. IEEE J Biomed Health Inform 2023; 27:4828-4839. [PMID: 37578920] [DOI: 10.1109/jbhi.2023.3304662] [Indexed: 08/16/2023]
Abstract
Medical image segmentation is indispensable for the diagnosis and prognosis of many diseases. To improve segmentation performance, this study proposes a new 2D body- and edge-aware network with multi-scale short-term concatenation for medical image segmentation. Multi-scale short-term concatenation modules, which concatenate successive convolution layers with different receptive fields, are proposed to capture multi-scale representations with fewer parameters. Body generation modules, which adjust features using weight maps computed over enlarged receptive fields, and edge generation modules, which apply multi-scale convolutions with Sobel kernels for edge detection, are proposed to separately learn body and edge features from convolutional features in the decoders, making the proposed network body and edge aware. Based on these modules, we design parallel body and edge decoders whose outputs are fused to produce the final segmentation. In addition, deep supervision from the body and edge decoders is applied to ensure the effectiveness of the generated body and edge features and to further improve the final segmentation. The proposed method is trained and evaluated on six public medical image segmentation datasets to demonstrate its effectiveness and generality. Experimental results show that it achieves better average Dice similarity coefficient and 95% Hausdorff distance than several benchmarks on all datasets. Ablation studies validate the effectiveness of the proposed multi-scale representation learning modules, body and edge generation modules, and deep supervision.
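Sobel kernels for edge detection are a standard component. A minimal sketch of how an edge generation module might derive boundary cues (loop-based valid convolution for clarity; the real module operates on learned multi-scale feature maps, not raw images):

```python
import numpy as np

# The standard Sobel kernels for horizontal and vertical gradients
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edge_magnitude(img):
    """Toy sketch of Sobel-based edge extraction (valid convolution,
    no padding): returns the gradient magnitude of a 2D array."""
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * SOBEL_X)
            gy[i, j] = np.sum(patch * SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

# A vertical step edge: left half 0, right half 1
img = np.zeros((6, 6))
img[:, 3:] = 1.0
edges = sobel_edge_magnitude(img)  # responds only at the step
```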
55
Lu X, Liu X, Xiao Z, Zhang S, Huang J, Yang C, Liu S. Self-supervised dual-head attentional bootstrap learning network for prostate cancer screening in transrectal ultrasound images. Comput Biol Med 2023; 165:107337. [PMID: 37672927] [DOI: 10.1016/j.compbiomed.2023.107337] [Received: 04/29/2023] [Revised: 07/13/2023] [Accepted: 08/07/2023] [Indexed: 09/08/2023]
Abstract
Current convolutional neural network-based ultrasound classification models for prostate cancer often rely on extensive manual labeling. Although self-supervised learning (SSL) has shown promise in addressing this problem, data from medical scenarios contain intra-class similarity conflicts, so loss calculations that directly include positive and negative sample pairs can mislead training. Moreover, SSL methods tend to focus on global consistency at the image level and do not consider the internal informative relationships of the feature map. To improve the efficiency of prostate cancer diagnosis by using SSL to learn key diagnostic information in ultrasound images, we propose a self-supervised dual-head attentional bootstrap learning network (SDABL) comprising an Online-Net and a Target-Net. A Self-Position Attention Module (SPAM) and an adaptive maximum channel attention module (CAAM) are inserted into both paths simultaneously. They capture positional and inter-channel attention over the original feature map with a small number of parameters, solving the information optimization problem of feature maps in SSL. In the loss calculation, we discard the construction of negative sample pairs and instead guide the network to learn the consistency of the position and channel spaces by continuously drawing the embedding representations of positive samples closer together. We conducted extensive experiments on a prostate transrectal ultrasound (TRUS) dataset; they show that our SDABL pre-training method has significant advantages over both mainstream contrastive learning methods and other attention-based methods. Specifically, the SDABL pre-trained backbone achieves 80.46% accuracy on our TRUS dataset after fine-tuning.
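Discarding negative pairs and drawing positive embeddings together is the bootstrap-style objective family (e.g., BYOL). A toy sketch of such a positive-pair-only loss, not SDABL's exact formulation:

```python
import numpy as np

def positive_pair_loss(online_emb, target_emb):
    """Toy sketch of a negative-free bootstrap objective: pull the
    online network's embedding towards the target network's embedding
    of another view of the same image. Equals 2 - 2*cos_similarity,
    so it is 0 for perfectly aligned directions and 4 for opposed ones."""
    o = online_emb / np.linalg.norm(online_emb)
    t = target_emb / np.linalg.norm(target_emb)
    return 2.0 - 2.0 * float(np.dot(o, t))

aligned = positive_pair_loss(np.array([1.0, 0.0]), np.array([2.0, 0.0]))
opposed = positive_pair_loss(np.array([1.0, 0.0]), np.array([-1.0, 0.0]))
```

In practice the target network's weights would be a slow-moving average of the online network's, which is what prevents collapse without negatives.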
Affiliation(s)
- Xu Lu
- Guangdong Polytechnic Normal University, Guangzhou 510665, China; Guangdong Provincial Key Laboratory of Intellectual Property & Big Data, Guangzhou 510665, China; Pazhou Lab, Guangzhou 510330, China
- Xiangjun Liu
- Guangdong Polytechnic Normal University, Guangzhou 510665, China
- Zhiwei Xiao
- Guangdong Polytechnic Normal University, Guangzhou 510665, China
- Shulian Zhang
- Guangdong Polytechnic Normal University, Guangzhou 510665, China
- Jun Huang
- Department of Ultrasonography, The First Affiliated Hospital of Jinan University, Guangzhou 510630, China
- Chuan Yang
- Department of Ultrasonography, The First Affiliated Hospital of Jinan University, Guangzhou 510630, China
- Shaopeng Liu
- Guangdong Polytechnic Normal University, Guangzhou 510665, China
56
Tian M, Wang H, Liu X, Ye Y, Ouyang G, Shen Y, Li Z, Wang X, Wu S. Delineation of clinical target volume and organs at risk in cervical cancer radiotherapy by deep learning networks. Med Phys 2023; 50:6354-6365. [PMID: 37246619] [DOI: 10.1002/mp.16468] [Received: 04/16/2022] [Revised: 04/17/2023] [Accepted: 04/28/2023] [Indexed: 05/30/2023]
Abstract
PURPOSE Delineation of the clinical target volume (CTV) and organs-at-risk (OARs) is important in cervical cancer radiotherapy, but it is generally labor-intensive, time-consuming, and subjective. This paper proposes a parallel-path attention fusion network (PPAF-net) to overcome these disadvantages in the delineation task. METHODS The PPAF-net utilizes both the texture and structure information of the CTV and OARs by employing a U-Net network to capture high-level texture information and an up-sampling and down-sampling (USDS) network to capture low-level structure information that accentuates the boundaries of the CTV and OARs. Multi-level features extracted from both networks are then fused through an attention module to generate the delineation result. RESULTS The dataset contains 276 computed tomography (CT) scans of patients with cervical cancer of stage IB-IIA, provided by the West China Hospital of Sichuan University. Simulation results demonstrate that PPAF-net performs favorably on the delineation of the CTV and OARs (e.g., rectum, bladder, etc.) and achieves state-of-the-art delineation accuracy in terms of the Dice Similarity Coefficient (DSC) and the Hausdorff Distance (HD): 88.61% and 2.25 cm for the CTV, 92.27% and 0.73 cm for the rectum, 96.74% and 0.68 cm for the bladder, 96.38% and 0.65 cm for the left kidney, 96.79% and 0.63 cm for the right kidney, 93.42% and 0.52 cm for the left femoral head, 93.69% and 0.51 cm for the right femoral head, 87.53% and 1.07 cm for the small intestine, and 91.50% and 0.84 cm for the spinal cord. CONCLUSIONS The proposed automatic delineation network PPAF-net performs well on CTV and OAR segmentation tasks and has great potential for reducing the burden on radiation oncologists and increasing the accuracy of delineation. In the future, radiation oncologists from the West China Hospital of Sichuan University will further evaluate the network's delineations, helping to make this method useful in clinical practice.
Affiliation(s)
- Miao Tian
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Hongqiu Wang
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Xingang Liu
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Yuyun Ye
- Department of Electrical and Computer Engineering, University of Tulsa, Tulsa, USA
- Ganlu Ouyang
- Department of Radiation Oncology, Cancer Center, the West China Hospital of Sichuan University, Chengdu, China
- Yali Shen
- Department of Radiation Oncology, Cancer Center, the West China Hospital of Sichuan University, Chengdu, China
- Zhiping Li
- Department of Radiation Oncology, Cancer Center, the West China Hospital of Sichuan University, Chengdu, China
- Xin Wang
- Department of Radiation Oncology, Cancer Center, the West China Hospital of Sichuan University, Chengdu, China
- Shaozhi Wu
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
57
Li W, Lu W, Chu J, Tian Q, Fan F. Confidence-guided mask learning for semi-supervised medical image segmentation. Comput Biol Med 2023; 165:107398. [PMID: 37688993] [DOI: 10.1016/j.compbiomed.2023.107398] [Received: 04/28/2023] [Revised: 07/29/2023] [Accepted: 08/26/2023] [Indexed: 09/11/2023]
Abstract
Semi-supervised learning aims to train a high-performance model with a minority of labeled data and a majority of unlabeled data. Existing methods mostly adopt a prediction-task mechanism to obtain precise segmentation maps under consistency or pseudo-label constraints, but this mechanism usually fails to overcome confirmation bias. To address this issue, we propose a novel Confidence-Guided Mask Learning (CGML) method for semi-supervised medical image segmentation. Specifically, on top of the prediction task, we introduce an auxiliary generation task with mask learning, which reconstructs masked images to greatly strengthen the model's ability to learn feature representations. Moreover, a confidence-guided masking strategy is developed to enhance model discrimination in uncertain regions. In addition, we introduce a triple-consistency loss that enforces consistent predictions for the masked, original, and reconstructed unlabeled images, yielding more reliable results. Extensive experiments on two datasets demonstrate that our proposed method achieves remarkable performance.
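A confidence-guided masking strategy could, for example, mask the patches where the model is least confident; the sketch below is only a guess at the general idea (patch size, masking ratio, and ranking rule are all assumptions, not the paper's algorithm):

```python
import numpy as np

def confidence_guided_mask(prob_map, patch=2, ratio=0.5):
    """Toy sketch: rank non-overlapping patches of a confidence map
    (e.g. per-pixel max class probability) by mean confidence and mask
    the least confident `ratio` of them, so the reconstruction task
    focuses on uncertain regions. Returns a pixel mask (1 = keep)."""
    h, w = prob_map.shape
    ph, pw = h // patch, w // patch
    conf = prob_map.reshape(ph, patch, pw, patch).mean(axis=(1, 3))
    k = int(ratio * ph * pw)                     # patches to mask out
    thresh = np.sort(conf.ravel())[k - 1] if k > 0 else -np.inf
    keep = conf > thresh
    return np.kron(keep, np.ones((patch, patch)))  # back to pixel grid

# A 4x4 "confidence" map: the bottom half is uncertain
prob = np.full((4, 4), 0.9)
prob[2:, :] = 0.55
mask = confidence_guided_mask(prob, patch=2, ratio=0.5)
```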
Affiliation(s)
- Wenxue Li
- The School of Future Technology, Tianjin University, Tianjin, 300072, China
- Wei Lu
- The School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
- Jinghui Chu
- The School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
- Qi Tian
- Tianjin Children's Hospital, Tianjin, 300204, China
- Fugui Fan
- The School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
58
Li X, Luo G, Wang W, Wang K, Li S. Curriculum label distribution learning for imbalanced medical image segmentation. Med Image Anal 2023; 89:102911. [PMID: 37542795] [DOI: 10.1016/j.media.2023.102911] [Received: 06/28/2022] [Revised: 04/27/2023] [Accepted: 07/25/2023] [Indexed: 08/07/2023]
Abstract
Label distribution learning (LDL) has the potential to resolve boundary ambiguity in semantic segmentation tasks. However, existing LDL-based segmentation methods suffer from severe label distribution imbalance: ambiguous label distributions account for only a small fraction of the data, while unambiguous label distributions occupy the majority. This imbalance induces model-biased distribution learning and makes it challenging to accurately predict ambiguous pixels. In this paper, we propose a curriculum label distribution learning (CLDL) framework that addresses this imbalance through a novel task-oriented curriculum learning strategy. Firstly, region label distribution learning (R-LDL) is proposed to construct more balanced label distributions and improve imbalanced model learning. Secondly, a novel task-oriented curriculum learning (TCL) strategy is proposed to enable easy-to-hard learning in LDL-based segmentation by decomposing the segmentation task into multiple label distribution estimation tasks. Thirdly, a prior perceiving module (PPM) is proposed to effectively connect the easy and hard learning stages using priors generated in the easier stages. Benefiting from the balanced label distribution construction and prior perception, the proposed CLDL effectively conducts curriculum learning-based LDL and significantly improves imbalanced learning. We evaluated the proposed CLDL on the publicly available BRATS2018 and MM-WHS2017 datasets. The experimental results demonstrate that our method significantly improves various segmentation metrics compared with many state-of-the-art methods. The code will be available.
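Constructing label distributions for boundary pixels is commonly done by softening labels according to distance from the boundary; a hypothetical two-class sketch (the sigmoid form and `sigma` are illustrative conventions, not the paper's R-LDL construction):

```python
import numpy as np

def soft_label_distribution(dist_to_boundary, sigma=2.0):
    """Toy sketch of label-distribution construction: a pixel's label
    softens from near-one-hot to maximally ambiguous as its signed
    distance to the organ boundary shrinks (positive = inside).
    Returns [p_background, p_foreground]."""
    p_fg = 1.0 / (1.0 + np.exp(-dist_to_boundary / sigma))
    return np.array([1.0 - p_fg, p_fg])

deep_inside = soft_label_distribution(10.0)  # nearly one-hot foreground
on_boundary = soft_label_distribution(0.0)   # maximally ambiguous
```

The imbalance the abstract describes follows directly: pixels with near-one-hot distributions (far from any boundary) vastly outnumber the ambiguous boundary pixels.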
Affiliation(s)
- Xiangyu Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Gongning Luo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Wei Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
- Kuanquan Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Shuo Li
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, OH 44106, USA
59
Shen L, Wang Q, Zhang Y, Qin F, Jin H, Zhao W. DSKCA-UNet: Dynamic selective kernel channel attention for medical image segmentation. Medicine (Baltimore) 2023; 102:e35328. [PMID: 37773842] [PMCID: PMC10545043] [DOI: 10.1097/md.0000000000035328] [Received: 04/06/2023] [Accepted: 08/31/2023] [Indexed: 10/01/2023]
Abstract
U-Net has attained immense popularity owing to its performance in medical image segmentation. However, it cannot explicitly model long-range dependencies. By contrast, the Transformer can effectively capture long-range dependencies by leveraging self-attention (SA) in the encoder. Although SA, a key characteristic of the Transformer, can find correlations in the original data, its quadratic computational complexity can slow the processing of high-dimensional data such as medical images. Furthermore, SA is limited because correlations between samples are overlooked, leaving considerable scope for improvement. To this end, building on Swin-UNet, we introduce a dynamic selective attention mechanism for the convolution kernels: the weight of each convolution kernel is calculated and the results are fused dynamically. This attention mechanism permits each neuron to adaptively modify its receptive field size in response to multi-scale input information. A local cross-channel interaction strategy without dimensionality reduction is introduced, which effectively eliminates the influence of downscaling on learning channel attention; through suitable cross-channel interactions, model complexity can be significantly reduced while maintaining performance. Subsequently, global interaction between encoder features is used to extract more fine-grained features. Simultaneously, a mixed loss function combining weighted cross-entropy loss and Dice loss is used to alleviate class imbalance and achieve better results when sample numbers are unbalanced. We evaluated the proposed method on abdominal multi-organ and cardiac segmentation datasets, achieving a Dice similarity coefficient of 80.30% and a 95% Hausdorff distance of 14.55 on the Synapse dataset, and a Dice similarity coefficient of 90.80% on the ACDC dataset. The experimental results show that our proposed method has good generalization ability and robustness, and that it is a powerful tool for medical image segmentation.
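The dynamic selective kernel idea follows Selective Kernel Networks: outputs of convolution branches with different receptive fields are fused with learned softmax weights. A toy sketch, where the gating logits stand in for the learned gating sub-network:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def selective_kernel_fusion(branch_outputs, gate_logits):
    """Toy sketch of dynamic kernel selection: fuse the outputs of
    convolution branches (different receptive fields) with softmax
    weights, letting the network adapt its effective receptive field."""
    weights = softmax(np.asarray(gate_logits, dtype=float))
    return sum(w * b for w, b in zip(weights, branch_outputs))

small_rf = np.full((3, 3), 1.0)  # e.g. output of a 3x3-kernel branch
large_rf = np.full((3, 3), 3.0)  # e.g. output of a 5x5-kernel branch
fused = selective_kernel_fusion([small_rf, large_rf], gate_logits=[0.0, 0.0])
```

With equal logits the branches contribute equally; in the real mechanism the logits are produced per-channel from the input features, so the mixture changes with the input.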
Affiliation(s)
- Longfeng Shen
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Anhui Big-Data Research Center on University Management, Huaibei, China
- Qiong Wang
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, China
- Anhui Big-Data Research Center on University Management, Huaibei, China
- Yingjie Zhang
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, China
- Anhui Big-Data Research Center on University Management, Huaibei, China
- Fenglan Qin
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, China
- Anhui Big-Data Research Center on University Management, Huaibei, China
- Hengjun Jin
- People’s Hospital of Huaibei City, Huaibei, China
- Wei Zhao
- People’s Hospital of Huaibei City, Huaibei, China
60
Zhao J, Xing Z, Chen Z, Wan L, Han T, Fu H, Zhu L. Uncertainty-Aware Multi-Dimensional Mutual Learning for Brain and Brain Tumor Segmentation. IEEE J Biomed Health Inform 2023; 27:4362-4372. [PMID: 37155398] [DOI: 10.1109/jbhi.2023.3274255] [Indexed: 05/10/2023]
Abstract
Existing segmentation methods for brain MRI data usually apply 3D CNNs to 3D volumes or 2D CNNs to 2D image slices. We observe that while volume-based approaches respect spatial relationships across slices well, slice-based methods typically excel at capturing fine local features, and there is a wealth of complementary information between their segmentation predictions. Inspired by this observation, we develop an Uncertainty-aware Multi-dimensional Mutual learning framework that trains networks of different dimensionalities simultaneously, each providing useful soft labels as supervision to the others, thus effectively improving generalization ability. Specifically, our framework builds upon a 2D CNN, a 2.5D CNN, and a 3D CNN, and an uncertainty gating mechanism is leveraged to select qualified soft labels, ensuring the reliability of the shared information. The proposed method is a general framework and can be applied to varying backbones. Experimental results on three datasets demonstrate that our method enhances backbone performance by notable margins, achieving Dice improvements of 2.8% on MeniSeg, 1.4% on IBSR, and 1.3% on BraTS2020.
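The uncertainty gating mechanism is described only at a high level; one plausible sketch is to admit a peer network's soft labels only when its predictive entropy is low (the threshold and the averaging rule are assumptions, not the paper's design):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    p = np.clip(p, 1e-8, 1.0)
    return -np.sum(p * np.log(p))

def gate_soft_labels(peer_predictions, max_entropy=0.5):
    """Toy sketch of uncertainty gating in mutual learning: only peers
    whose predictive entropy is below a threshold contribute soft
    labels; the survivors are averaged into the supervision signal.
    Returns None if every peer is rejected."""
    kept = [p for p in peer_predictions if entropy(p) < max_entropy]
    return np.mean(kept, axis=0) if kept else None

confident = np.array([0.95, 0.05])  # low entropy -> admitted
uncertain = np.array([0.50, 0.50])  # high entropy -> gated out
soft = gate_soft_labels([confident, uncertain])
```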
61
Deng R, Liu Q, Cui C, Yao T, Long J, Asad Z, Womick RM, Zhu Z, Fogo AB, Zhao S, Yang H, Huo Y. Omni-Seg: A Scale-Aware Dynamic Network for Renal Pathological Image Segmentation. IEEE Trans Biomed Eng 2023; 70:2636-2644. [PMID: 37030838] [PMCID: PMC10517077] [DOI: 10.1109/tbme.2023.3260739] [Indexed: 04/10/2023]
Abstract
Comprehensive semantic segmentation on renal pathological images is challenging due to the heterogeneous scales of the objects. For example, on a whole slide image (WSI), the cross-sectional areas of glomeruli can be 64 times larger than that of the peritubular capillaries, making it impractical to segment both objects on the same patch, at the same scale. To handle this scaling issue, prior studies have typically trained multiple segmentation networks in order to match the optimal pixel resolution of heterogeneous tissue types. This multi-network solution is resource-intensive and fails to model the spatial relationship between tissue types. In this article, we propose the Omni-Seg network, a scale-aware dynamic neural network that achieves multi-object (six tissue types) and multi-scale (5× to 40× scale) pathological image segmentation via a single neural network. The contribution of this article is three-fold: (1) a novel scale-aware controller is proposed to generalize the dynamic neural network from single-scale to multi-scale; (2) semi-supervised consistency regularization of pseudo-labels is introduced to model the inter-scale correlation of unannotated tissue types into a single end-to-end learning paradigm; and (3) superior scale-aware generalization is evidenced by directly applying a model trained on human kidney images to mouse kidney images, without retraining. By learning from 150,000 human pathological image patches from six tissue types at three different resolutions, our approach achieved superior segmentation performance according to human visual assessment and evaluation of image-omics (i.e., spatial transcriptomics).
62
Wang N, Lin S, Li X, Li K, Shen Y, Gao Y, Ma L. MISSU: 3D Medical Image Segmentation via Self-Distilling TransUNet. IEEE Trans Med Imaging 2023; 42:2740-2750. [PMID: 37018113] [DOI: 10.1109/tmi.2023.3264433] [Indexed: 06/19/2023]
Abstract
U-Nets have achieved tremendous success in medical image segmentation. Nevertheless, they may have limitations in global (long-range) contextual interaction and edge-detail preservation. In contrast, the Transformer module excels at capturing long-range dependencies by leveraging the self-attention mechanism in the encoder. Although the Transformer module was conceived to model long-range dependencies on extracted feature maps, it still incurs high computational and spatial complexity when processing high-resolution 3D feature maps. This motivates us to design an efficient Transformer-based UNet model and to study the feasibility of Transformer-based network architectures for medical image segmentation tasks. To this end, we propose to self-distill a Transformer-based UNet for medical image segmentation that simultaneously learns global semantic information and local spatially detailed features. Furthermore, a local multi-scale fusion block is proposed to refine fine-grained details from the skip connections of the encoder via self-distillation through the main CNN stem; it is computed only during training and removed at inference with minimal overhead. Extensive experiments on the BraTS 2019 and CHAOS datasets show that our MISSU achieves the best performance over previous state-of-the-art methods. Code and models are available at: https://github.com/wangn123/MISSU.git.
63
He Z, Wong ANN, Yoo JS. Co-ERA-Net: Co-Supervision and Enhanced Region Attention for Accurate Segmentation in COVID-19 Chest Infection Images. Bioengineering (Basel) 2023; 10:928. [PMID: 37627813] [PMCID: PMC10451793] [DOI: 10.3390/bioengineering10080928] [Received: 07/11/2023] [Revised: 07/31/2023] [Accepted: 08/03/2023] [Indexed: 08/27/2023]
Abstract
Accurate segmentation of infected lesions in chest images remains a challenging task, partly because lung region information, which could serve as a strong location hint for infection, often goes unused. In this paper, we propose Co-ERA-Net, a novel segmentation network for infections in chest images that leverages lung region information by enhancing the supervision signal and fusing multi-scale lung region and infection information at different levels. To achieve this, we introduce a co-supervision scheme incorporating lung region information to guide the network to accurately locate infections within the lung region. Furthermore, we design an Enhanced Region Attention Module (ERAM) to highlight regions with a high probability of infection by incorporating infection information into the lung region information. The effectiveness of the proposed scheme is demonstrated on COVID-19 CT and X-ray datasets. Relative to the baseline, the co-supervision scheme with lung region information improves the Dice coefficient by 7.41% and 2.22% and the IoU by 8.20% and 3.00% on the CT and X-ray datasets, respectively. When this scheme is combined with the Enhanced Region Attention Module, the Dice coefficient improves further by 14.24% and 2.97%, and the IoU by 28.64% and 4.49%, for the same datasets. Compared with existing approaches across various datasets, our proposed method achieves better segmentation performance on all main metrics and exhibits the best generalization and overall performance.
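The co-supervision idea above — supervising a lung-region branch alongside the infection branch so the lung mask acts as a location prior — can be sketched as a joint Dice-based loss. This is a minimal NumPy sketch under assumed names (`dice`, `co_supervised_loss`, weight `w`); the paper's exact loss formulation is not reproduced here:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    # Soft Dice overlap between a predicted mask and its ground truth.
    inter = np.sum(pred * target)
    return (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def co_supervised_loss(inf_pred, inf_gt, lung_pred, lung_gt, w=0.5):
    # Supervise both the infection mask and the lung-region mask;
    # the lung branch provides a location prior for infections.
    return (1 - dice(inf_pred, inf_gt)) + w * (1 - dice(lung_pred, lung_gt))
```

A perfect prediction on both branches drives the loss to zero; errors on either branch raise it.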
Affiliation(s)
- Jung Sun Yoo
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China; (Z.H.); (A.N.N.W.)

64
Zhou H, Sun C, Huang H, Fan M, Yang X, Zhou L. Feature-guided attention network for medical image segmentation. Med Phys 2023; 50:4871-4886. [PMID: 36746870 DOI: 10.1002/mp.16253]
Abstract
BACKGROUND: U-Net and its variants have achieved remarkable performance in medical image segmentation. However, they have two limitations. First, the shallow-layer features of the encoder always contain background noise. Second, semantic gaps exist between the features of the encoder and the decoder: skip connections directly connect the encoder to the decoder, which leads to the fusion of semantically dissimilar feature maps.
PURPOSE: To overcome these two limitations, this paper proposes a novel medical image segmentation algorithm, called the feature-guided attention network, which consists of U-Net, the cross-level attention filtering module (CAFM), and the attention-guided upsampling module (AUM).
METHODS: The CAFM and the AUM were introduced into the U-Net, where the CAFM learns to filter the background noise in the low-level feature maps of the encoder and the AUM eliminates the semantic gap between the encoder and the decoder. Specifically, the CAFM adopts a top-down pathway, using the high-level feature maps to filter the background noise in the low-level feature maps of the encoder. The AUM uses the encoder features to guide the upsampling of the corresponding decoder features, thus eliminating the semantic gap between them. Four medical image segmentation tasks were used to validate the proposed method: coronary atherosclerotic plaque segmentation (Dataset A), retinal vessel segmentation (Dataset B), skin lesion segmentation (Dataset C), and multiclass retinal edema lesion segmentation (Dataset D).
RESULTS: For Dataset A, the proposed method achieved higher Intersection over Union (IoU) (67.91 ± 3.82%), Dice (79.39 ± 3.37%), accuracy (98.39 ± 0.34%), and sensitivity (85.10 ± 3.74%) than the previous best method, CA-Net. For Dataset B, the proposed method achieved higher sensitivity (83.50%) and accuracy (97.55%) than the previous best method, SCS-Net. For Dataset C, the proposed method had a higher IoU (83.47 ± 0.41%) and Dice (90.81 ± 0.34%) than all compared previous methods. For Dataset D, the proposed method had the highest Dice (average: 81.53%; retinal edema area [REA]: 83.78%; pigment epithelial detachment [PED]: 77.13%), sensitivity (REA: 89.01%; SRF: 85.50%), specificity (REA: 99.35%; PED: 100.00%), and accuracy (98.73%) among all compared networks. In addition, the proposed method has 2.43 M parameters, fewer than CA-Net (3.21 M) and CPF-Net (3.07 M).
CONCLUSIONS: The proposed method demonstrated state-of-the-art performance, outperforming other top medical image segmentation algorithms. The CAFM filters the background noise in the low-level feature maps of the encoder, while the AUM eliminates the semantic gap between the encoder and the decoder. Furthermore, the proposed method is computationally efficient.
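The attention-guided upsampling idea above — encoder features gating the upsampled decoder features at the same resolution — can be sketched minimally in NumPy. This is an illustrative toy, not the paper's AUM; the gating form (sigmoid of raw encoder features) and function names are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nearest_upsample2x(x):
    # Nearest-neighbour 2x upsampling of an (H, W) map.
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def attention_guided_upsample(decoder_feat, encoder_feat):
    # Upsample the coarse decoder map, then gate it with an attention
    # map derived from the same-resolution encoder features, so the
    # encoder guides which upsampled responses are kept.
    up = nearest_upsample2x(decoder_feat)
    gate = sigmoid(encoder_feat)
    return up * gate
```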
Affiliation(s)
- Hao Zhou
- National Key Laboratory of Science and Technology of Underwater Vehicle, Harbin Engineering University, Harbin, China
- Chaoyu Sun
- Fourth Affiliated Hospital, Harbin Medical University, Harbin, China
- Hai Huang
- National Key Laboratory of Science and Technology of Underwater Vehicle, Harbin Engineering University, Harbin, China
- Mingyu Fan
- College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, China
- Xu Yang
- State Key Laboratory of Management and Control for Complex System, Institute of Automation, Chinese Academy of Sciences, Beijing, China
- Linxiao Zhou
- Fourth Affiliated Hospital, Harbin Medical University, Harbin, China

65
Shen L, Zhang Y, Wang Q, Qin F, Sun D, Min H, Meng Q, Xu C, Zhao W, Song X. Feature interaction network based on hierarchical decoupled convolution for 3D medical image segmentation. PLoS One 2023; 18:e0288658. [PMID: 37440581 DOI: 10.1371/journal.pone.0288658]
Abstract
Manual image segmentation is time-consuming. An automatic, accurate method is needed to segment multimodal brain tumors in context-rich three-dimensional medical images for clinical treatment decisions and surgical planning. However, achieving accurate segmentation of medical images with deep learning is challenging due to the diversity of tumors and the complex boundary interactions between sub-regions, while limited computing resources hinder the construction of efficient neural networks. We propose a feature fusion module based on a hierarchical decoupled convolution network and an attention mechanism to improve segmentation performance. We replaced the skip connections of U-shaped networks with this feature fusion module to address the category imbalance problem, thus aiding the segmentation of more complicated medical images. We also introduced a global attention mechanism to further integrate the features learned by the encoder and exploit context information. The proposed method was evaluated on enhancing tumor, whole tumor, and tumor core, achieving Dice similarity coefficients of 0.775, 0.900, and 0.827, respectively, on the BraTS 2019 dataset, and 0.800, 0.902, and 0.841, respectively, on the BraTS 2018 dataset. The results show that our proposed method is inherently general and a powerful tool for brain tumor image studies. Our code is available at: https://github.com/WSake/Feature-interaction-network-based-on-Hierarchical-Decoupled-Convolution.
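A global attention mechanism of the squeeze-and-gate kind mentioned above can be sketched in a few lines. This is a generic channel-attention toy in NumPy, offered only as an illustration; the paper's actual attention design is not specified in the abstract, so `global_channel_attention` is a hypothetical stand-in:

```python
import numpy as np

def global_channel_attention(feat):
    # feat: (C, H, W). Squeeze the spatial dimensions by global average
    # pooling, gate each channel with a sigmoid, and reweight the
    # feature map so globally informative channels are emphasized.
    w = feat.mean(axis=(1, 2))
    w = 1.0 / (1.0 + np.exp(-w))
    return feat * w[:, None, None]
```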
Affiliation(s)
- Longfeng Shen
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, Anhui, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Anhui Big-Data Research Center on University Management, Huaibei, Anhui, China
- Yingjie Zhang
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, Anhui, China
- Qiong Wang
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, Anhui, China
- Fenglan Qin
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, Anhui, China
- Dengdi Sun
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Anhui Provincial Key Laboratory of Multimodal Cognitive Computing, School of Artificial Intelligence, Anhui University, Hefei, China
- Hai Min
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, Anhui, China
- Qianqian Meng
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, Anhui, China
- Chengzhen Xu
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior (ICACB), College of Computer Science and Technology, Huaibei Normal University, Huaibei, Anhui, China
- Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, China
- Wei Zhao
- Huaibei People's Hospital, Huaibei, Anhui, China
- Xin Song
- Huaibei People's Hospital, Huaibei, Anhui, China

66
Wang J, Ma X, Cao L, Leng Y, Li Z, Cheng Z, Cao Y, Huang X, Zheng J. DB-DCAFN: dual-branch deformable cross-attention fusion network for bacterial segmentation. Vis Comput Ind Biomed Art 2023; 6:13. [PMID: 37402101 DOI: 10.1186/s42492-023-00141-8]
Abstract
Sputum smear tests are critical for the diagnosis of respiratory diseases, and automatic segmentation of bacteria from sputum smear images is important for improving diagnostic efficiency. This remains a challenging task, however, owing to the high interclass similarity among bacterial categories and the low contrast of bacterial edges. To exploit more levels of global pattern features, which help distinguish bacterial categories, while retaining sufficient local fine-grained features to accurately localize ambiguous bacteria, we propose a novel dual-branch deformable cross-attention fusion network (DB-DCAFN) for accurate bacterial segmentation. Specifically, we first design a dual-branch encoder consisting of multiple convolution and transformer blocks in parallel to simultaneously extract multilevel local and global features. We then design a sparse, deformable cross-attention module to capture the semantic dependencies between local and global features, which bridges the semantic gap and fuses features effectively. Furthermore, we design a feature assignment fusion module that enhances meaningful features via an adaptive feature weighting strategy to obtain more accurate segmentation. We conducted extensive experiments to evaluate DB-DCAFN on a clinical dataset comprising three bacterial categories: Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa. The experimental results demonstrate that the proposed DB-DCAFN outperforms other state-of-the-art methods and is effective at segmenting bacteria from sputum smear images.
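The cross-attention fusion between the two branches can be sketched as standard scaled dot-product attention in which queries come from one feature stream and keys/values from the other. This NumPy toy shows only that generic pattern; the paper's module is additionally sparse and deformable, which is not reproduced here:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats):
    # Queries from the local (CNN) branch attend over keys/values from
    # the global (transformer) branch, fusing the two feature streams.
    # q_feats: (Nq, d), kv_feats: (Nk, d).
    d = q_feats.shape[-1]
    attn = softmax(q_feats @ kv_feats.T / np.sqrt(d))
    return attn @ kv_feats
```

Each output row is a convex combination of the other branch's features, weighted by query-key similarity.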
Affiliation(s)
- Jingkun Wang
- School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China
- Xinyu Ma
- School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China
- Long Cao
- Department of Infectious Diseases, the First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Yilin Leng
- Institute of Biomedical Engineering, School of Communication and Information Engineering, Shanghai University, Shanghai, 200444, China
- Zeyi Li
- College of Computer and Information, Hohai University, Nanjing, 210098, China
- Zihan Cheng
- School of Electronic and Information Engineering, Changchun University of Science and Technology, Changchun, 130022, China
- Yuzhu Cao
- School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China
- Jinan Guoke Medical Technology Development Co., Ltd, Jinan, 250101, China
- Xiaoping Huang
- Department of Infectious Diseases, the First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Jian Zheng
- School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China
- Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, 215163, China
- Jinan Guoke Medical Technology Development Co., Ltd, Jinan, 250101, China

67
Ding Y, Qin X, Zhang M, Geng J, Chen D, Deng F, Song C. RLSegNet: A Medical Image Segmentation Network Based on Reinforcement Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2565-2576. [PMID: 35914053 DOI: 10.1109/tcbb.2022.3195705]
Abstract
In medical image segmentation, spatial information can be exploited to enhance segmentation performance, and 3D convolution is commonly used for this purpose. However, making good use of spatial information within 2D convolutions remains a challenging task. In this paper, we propose an image segmentation network based on reinforcement learning (RLSegNet), which translates the segmentation process into a series of decision-making problems. RLSegNet is a U-shaped network composed of three components: a feature extraction network, a Mask Prediction Network (MPNet), and an up-sampling network with a cascaded attention module. Deep semantic features are first extracted by the feature extraction network. The MPNet then generates a prediction mask for the current frame based on prior knowledge (the previous segmentation result), and the cascaded attention module generates a weighted feature mask so that the up-sampling network pays more attention to the region of interest. Specifically, the state, action, and reward of the reinforcement learning formulation are redesigned in RLSegNet to cast segmentation as a decision-making process for brain tumor segmentation. Extensive experiments on the BRATS 2015 dataset demonstrate that the proposed method achieves better segmentation performance than other state-of-the-art methods.
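One natural reward for a mask-refinement action in such an RL formulation is the change in Dice overlap with the ground truth. The abstract does not give RLSegNet's actual reward definition, so the following NumPy sketch (`dice`, `step_reward`) is only a plausible illustration of rewarding segmentation-as-decision-making:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    # Soft Dice overlap between a predicted mask and its ground truth.
    inter = np.sum(pred * target)
    return (2 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def step_reward(prev_mask, new_mask, gt):
    # Reward a mask-update action by the improvement in Dice overlap;
    # positive when the action moves the mask toward the ground truth,
    # negative when it moves the mask away from it.
    return dice(new_mask, gt) - dice(prev_mask, gt)
```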
68
Zhao Y, Wang S, Zhang Y, Qiao S, Zhang M. WRANet: wavelet integrated residual attention U-Net network for medical image segmentation. COMPLEX INTELL SYST 2023:1-13. [PMID: 37361970 PMCID: PMC10248349 DOI: 10.1007/s40747-023-01119-y]
Abstract
Medical image segmentation is crucial for the diagnosis and analysis of disease. Deep convolutional neural networks have achieved great success in medical image segmentation, but they are highly susceptible to noise as signals propagate through the network, where even weak noise can dramatically alter the output; as networks deepen, they can also face exploding and vanishing gradients. To improve the robustness and segmentation performance of the network, we propose a wavelet residual attention network (WRANet) for medical image segmentation. We replace the standard downsampling modules in CNNs (e.g., max pooling and average pooling) with the discrete wavelet transform, decompose the features into low- and high-frequency components, and discard the high-frequency components to suppress noise. At the same time, the resulting feature loss is effectively addressed by introducing an attention mechanism. Experimental results show that our method performs aneurysm segmentation effectively, achieving a Dice score of 78.99%, an IoU of 68.96%, a precision of 85.21%, and a sensitivity of 80.98%; in polyp segmentation, it achieves a Dice score of 88.89%, an IoU of 81.74%, a precision of 91.32%, and a sensitivity of 91.07%. Furthermore, comparison with state-of-the-art techniques demonstrates the competitiveness of WRANet.
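The wavelet downsampling step above is concrete enough to sketch: a single-level 2D Haar transform splits a feature map into one low-frequency (LL) and three high-frequency sub-bands, and keeping only LL halves the resolution while removing high-frequency noise. A minimal NumPy sketch for a single channel (the network itself, residual blocks, and attention are not reproduced):

```python
import numpy as np

def haar_ll_downsample(x):
    # Single-level 2D Haar transform of an (H, W) map with even H, W;
    # only the low-low (approximation) sub-band is kept, and the three
    # high-frequency detail sub-bands are discarded to suppress noise.
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    return (a + b + c + d) / 2.0  # LL sub-band (orthonormal Haar scaling)
```

Note that high-frequency checkerboard noise, which sums to zero within each 2x2 block, is removed entirely by keeping only the LL sub-band.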
Affiliation(s)
- Yawu Zhao
- School of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong, China
- Shudong Wang
- School of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong, China
- Yulin Zhang
- College of Mathematics and System Science, Shandong University of Science and Technology, Qingdao, Shandong, China
- Sibo Qiao
- School of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong, China
- Mufei Zhang
- Inspur Cloud Information Technology Co., Inspur, Jinan, Shandong, China

69
Shi Z, Li Y, Zou H, Zhang X. TCU-Net: Transformer Embedded in Convolutional U-Shaped Network for Retinal Vessel Segmentation. SENSORS (BASEL, SWITZERLAND) 2023; 23:4897. [PMID: 37430810 PMCID: PMC10223195 DOI: 10.3390/s23104897]
Abstract
Optical coherence tomography angiography (OCTA) provides detailed visualization of the vascular system to aid the detection and diagnosis of ophthalmic disease. However, accurately extracting microvascular details from OCTA images remains challenging owing to the limitations of pure convolutional networks. We propose TCU-Net, a novel end-to-end transformer-based network architecture for OCTA retinal vessel segmentation. To address the loss of vascular features caused by convolutional operations, an efficient cross-fusion transformer module is introduced to replace the original skip connections of U-Net. The transformer module interacts with the encoder's multiscale vascular features to enrich vascular information while achieving linear computational complexity. Additionally, we design an efficient channel-wise cross-attention module to fuse the multiscale features and fine-grained details from the decoding stages, resolving the semantic bias between them and enhancing effective vascular information. The model was evaluated on the dedicated Retinal OCTA Segmentation (ROSE) dataset. The accuracy of TCU-Net on ROSE-1 with SVC, DVC, and SVC+DVC is 0.9230, 0.9912, and 0.9042, respectively, with corresponding AUC values of 0.9512, 0.9823, and 0.9170. On ROSE-2, the accuracy and AUC are 0.9454 and 0.8623, respectively. The experiments demonstrate that TCU-Net outperforms state-of-the-art approaches in vessel segmentation performance and robustness.
Affiliation(s)
- Zidi Shi
- School of Electronic and Electrical Engineering, Wuhan Textile University, Wuhan 430077, China
- Yu Li
- School of Electronic and Electrical Engineering, Wuhan Textile University, Wuhan 430077, China
- Hua Zou
- School of Computer Science, Wuhan University, Wuhan 430072, China
- Xuedong Zhang
- School of Information Engineering, Tarim University, Alaer 843300, China

70
Yuan L, Song J, Fan Y. FM-Unet: Biomedical image segmentation based on feedback mechanism Unet. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:12039-12055. [PMID: 37501431 DOI: 10.3934/mbe.2023535]
Abstract
With the development of deep learning, medical image segmentation has made significant progress in computer vision. U-Net was a pioneering work, and many researchers have conducted further research based on this architecture. However, most of these architectures improve the backward propagation and integration of the network, while few change its forward propagation and information integration. We therefore propose a feedback mechanism U-Net (FM-Unet), which adds feedback paths to the encoder and decoder paths of the network to help each fuse, at the current step, information from the next step. In this way, encoder information loss and decoder information shortage are both addressed. The proposed model has a moderate number of parameters, and simultaneous multi-node information fusion alleviates gradient vanishing. Experiments on two public datasets show that FM-Unet achieves satisfactory results.
Affiliation(s)
- Lei Yuan
- The Key Laboratory of Intelligent Optimization and Information Processing, Minnan Normal University, Zhangzhou 363000, China
- Jianhua Song
- The Key Laboratory of Intelligent Optimization and Information Processing, Minnan Normal University, Zhangzhou 363000, China
- College of Physics and Information Engineering, Minnan Normal University, Zhangzhou 363000, China
- Yazhuo Fan
- College of Physics and Information Engineering, Minnan Normal University, Zhangzhou 363000, China

71
Rondinella A, Crispino E, Guarnera F, Giudice O, Ortis A, Russo G, Di Lorenzo C, Maimone D, Pappalardo F, Battiato S. Boosting multiple sclerosis lesion segmentation through attention mechanism. Comput Biol Med 2023; 161:107021. [PMID: 37216775 DOI: 10.1016/j.compbiomed.2023.107021]
Abstract
Magnetic resonance imaging is a fundamental tool for diagnosing multiple sclerosis and monitoring its progression. Although several attempts have been made to segment multiple sclerosis lesions using artificial intelligence, fully automated analysis is not yet available. State-of-the-art methods rely on slight variations of standard segmentation architectures (e.g., U-Net). However, recent research has demonstrated that exploiting temporal-aware features and attention mechanisms can provide a significant boost to traditional architectures. This paper proposes a framework that augments a U-Net architecture with a convolutional long short-term memory layer and an attention mechanism to segment and quantify multiple sclerosis lesions detected in magnetic resonance images. Quantitative and qualitative evaluation on challenging examples shows that the method outperforms previous state-of-the-art approaches, reporting an overall Dice score of 89% and demonstrating robustness and generalization on unseen test samples from a new dedicated dataset under construction.
Affiliation(s)
- Alessia Rondinella
- Department of Mathematics and Computer Science, University of Catania, Viale Andrea Doria 6, Catania, 95125, Italy
- Elena Crispino
- Department of Biomedical and Biotechnological Sciences, University of Catania, Via Santa Sofia 97, Catania, 95125, Italy
- Francesco Guarnera
- Department of Mathematics and Computer Science, University of Catania, Viale Andrea Doria 6, Catania, 95125, Italy
- Oliver Giudice
- Department of Mathematics and Computer Science, University of Catania, Viale Andrea Doria 6, Catania, 95125, Italy
- Alessandro Ortis
- Department of Mathematics and Computer Science, University of Catania, Viale Andrea Doria 6, Catania, 95125, Italy
- Giulia Russo
- Department of Drug and Health Sciences, University of Catania, Viale Andrea Doria 6, Catania, 95125, Italy
- Clara Di Lorenzo
- UOC Radiologia, ARNAS Garibaldi, P.zza S. Maria di Gesù, Catania, 95124, Italy
- Davide Maimone
- Centro Sclerosi Multipla, UOC Neurologia, ARNAS Garibaldi, P.zza S. Maria di Gesù, Catania, 95124, Italy
- Francesco Pappalardo
- Department of Drug and Health Sciences, University of Catania, Viale Andrea Doria 6, Catania, 95125, Italy
- Sebastiano Battiato
- Department of Mathematics and Computer Science, University of Catania, Viale Andrea Doria 6, Catania, 95125, Italy

72
Choi Y, Yu W, Nagarajan MB, Teng P, Goldin JG, Raman SS, Enzmann DR, Kim GHJ, Brown MS. Translating AI to Clinical Practice: Overcoming Data Shift with Explainability. Radiographics 2023; 43:e220105. [PMID: 37104124 PMCID: PMC10190133 DOI: 10.1148/rg.220105]
Abstract
Translating artificial intelligence (AI) algorithms into clinical practice requires that models generalize to real-world data. One of the main obstacles to generalizability is data shift: a mismatch between the data distributions of the model-training and real deployment environments. Explainable AI techniques offer tools to detect and mitigate the data shift problem and to develop reliable AI for clinical practice. Most medical AI is trained with datasets gathered from limited environments, such as restricted disease populations and center-dependent acquisition conditions. The data shift that commonly exists in such limited training sets often causes a significant performance decrease in the deployment environment. To develop a medical application, it is important to detect potential data shift and its impact on clinical translation. Throughout the AI training stages, from premodel analysis to in-model and post hoc explanations, explainability can play a key role in detecting a model's susceptibility to data shift, which is otherwise hidden because the test data share the same biased distribution as the training data. Performance-based model assessments cannot effectively detect a model's overfitting to training-data bias without enriched test sets from external environments. In the absence of such external data, explainability techniques can aid in translating AI to clinical practice as tools to detect and mitigate potential failures due to data shift. ©RSNA, 2023. Quiz questions for this article are available in the supplemental material.
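A simple premodel check for the data shift discussed above is to compare summary statistics of features between the training set and a candidate deployment set. This NumPy sketch (a standardized mean difference per feature) is one generic screening heuristic, not a method from the article:

```python
import numpy as np

def feature_shift_score(train_feats, deploy_feats):
    # train_feats, deploy_feats: (N, F) arrays of per-case features.
    # Per-feature standardized mean difference between the training
    # distribution and the deployment distribution; large values flag
    # a potential data shift before the model is trusted clinically.
    mu_t, mu_d = train_feats.mean(axis=0), deploy_feats.mean(axis=0)
    sd = train_feats.std(axis=0) + 1e-8
    return np.abs(mu_t - mu_d) / sd
```

In practice one would inspect the highest-scoring features (e.g., scanner-dependent intensity statistics) and enrich the test set accordingly.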
Affiliation(s)
- Youngwon Choi, Wenxi Yu, Mahesh B. Nagarajan, Pangyu Teng, Jonathan G. Goldin, Steven S. Raman, Dieter R. Enzmann, Grace Hyun J. Kim, Matthew S. Brown
- From the Center for Computer Vision and Imaging Biomarkers, 924 Westwood Blvd, Los Angeles, CA 90024 (Y.C., W.Y., M.B.N., P.T., J.G.G., G.H.J.K., M.S.B.); and Department of Radiology, University of California–Los Angeles, Los Angeles, Calif (Y.C., W.Y., M.B.N., P.T., J.G.G., S.S.R., D.R.E., G.H.J.K., M.S.B.)

73
Li Z, Zhang N, Gong H, Qiu R, Zhang W. MFA-Net: Multiple Feature Association Network for medical image segmentation. Comput Biol Med 2023; 158:106834. [PMID: 37003067 DOI: 10.1016/j.compbiomed.2023.106834]
Abstract
Medical image segmentation plays a crucial role in computer-aided diagnosis. However, because of the large variability of medical images, accurate segmentation is a highly challenging task. In this paper, we present a novel deep learning-based medical image segmentation network, the Multiple Feature Association Network (MFA-Net). MFA-Net uses an encoder-decoder architecture with skip connections as its backbone, and a parallelly dilated convolutions arrangement (PDCA) module is integrated between the encoder and the decoder to capture more representative deep features. Furthermore, a multi-scale feature restructuring module (MFRM) is introduced to restructure and fuse the deep features of the encoder. To enhance global attention perception, the proposed global attention stacking (GAS) modules are cascaded on the decoder. MFA-Net leverages these global attention mechanisms to improve segmentation performance at different feature scales. We evaluated MFA-Net on four segmentation tasks: intestinal polyp, liver tumor, prostate cancer, and skin lesion segmentation. Our experimental results and ablation study demonstrate that MFA-Net outperforms state-of-the-art methods in terms of global positioning and local edge recognition.
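The core idea behind a parallel dilated-convolution arrangement like PDCA is to filter the same features at several dilation rates in parallel, so branches with different receptive fields see the input at different scales. The sketch below is a 1D toy illustration under our own assumptions (three branches with dilations 1, 2, 4, outputs summed), not the authors' implementation:

```python
import numpy as np

def dilated_conv1d(x, w, d):
    """'Same'-padded 1D convolution with dilation factor d.
    x: (N,) signal; w: (K,) filter taps spaced d samples apart."""
    K = len(w)
    pad = d * (K - 1) // 2
    xp = np.pad(x, (pad, pad))
    out = np.zeros_like(x, dtype=float)
    for k in range(K):
        out += w[k] * xp[k * d : k * d + len(x)]
    return out

def pdca_branchbank(x, w, dilations=(1, 2, 4)):
    """Sketch of a parallel dilated-convolution arrangement: the same input
    is filtered at several dilation rates (different receptive fields) and
    the branch outputs are summed. Shown in 1D for brevity; the actual PDCA
    module operates on 2D feature maps."""
    return sum(dilated_conv1d(x, w, d) for d in dilations)
```

A larger dilation widens the receptive field without adding filter taps, which is why such branch banks are a cheap way to mix scales.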
Collapse
Affiliation(s)
- Zhixun Li
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China
| | - Nan Zhang
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China
| | - Huiling Gong
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China.
| | - Ruiyun Qiu
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China
| | - Wei Zhang
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China
| |
Collapse
|
74
|
Huang X, Deng Z, Li D, Yuan X, Fu Y. MISSFormer: An Effective Transformer for 2D Medical Image Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:1484-1494. [PMID: 37015444 DOI: 10.1109/tmi.2022.3230943] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Transformer-based methods have recently become popular in vision tasks because of their ability to model global dependencies. However, modeling global dependencies alone limits network performance, since local context and the global-local correlations of multi-scale features are left unmodeled. In this paper, we present MISSFormer, a Medical Image Segmentation tranSFormer. MISSFormer is a hierarchical encoder-decoder network with two appealing designs: 1) the feed-forward network in the transformer blocks of the U-shaped encoder-decoder structure is redesigned as ReMix-FFN, which re-integrates local context with global dependencies for better feature discrimination; 2) a ReMixed Transformer Context Bridge is proposed to extract the correlations of global dependencies and local context in the multi-scale features generated by our hierarchical transformer encoder. MISSFormer shows a solid capacity to capture discriminative dependencies and context in medical image segmentation. Experiments on multi-organ, cardiac, and retinal vessel segmentation tasks demonstrate the superiority, effectiveness, and robustness of MISSFormer. Notably, MISSFormer trained from scratch even outperforms state-of-the-art methods pre-trained on ImageNet, and its core designs generalize to other visual segmentation tasks. The code has been released on GitHub: https://github.com/ZhifangDeng/MISSFormer.
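A feed-forward network that re-integrates local context can be pictured as a standard transformer FFN with a depthwise convolution inserted between the two linear layers. The NumPy sketch below is a minimal stand-in under our own assumptions (ReLU activation, residual depthwise 3x3 step); the actual ReMix-FFN design differs in its details:

```python
import numpy as np

def depthwise_conv3x3(x, w):
    """Depthwise 3x3 convolution with zero padding.
    x: (H, W, C) feature map; w: (3, 3, C), one kernel per channel."""
    H, W, C = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W, :] * w[i, j, :]
    return out

def remix_ffn(tokens, hw, W1, b1, Wdw, W2, b2):
    """Sketch of a 'ReMix'-style feed-forward block: expand with a linear
    layer (channel mixing on globally attended tokens), re-inject local
    context with a depthwise conv over the spatial grid, then project back.
    tokens: (N, C) with N == H*W; hw = (H, W)."""
    H, W = hw
    h = np.maximum(tokens @ W1 + b1, 0.0)       # linear + ReLU (GELU in practice)
    h = h.reshape(H, W, -1)
    h = h + depthwise_conv3x3(h, Wdw)           # local context, residual
    h = np.maximum(h, 0.0).reshape(H * W, -1)
    return h @ W2 + b2                          # project back to C channels
```

The depthwise step is what restores spatial locality that pure token-wise linear layers cannot see.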
Collapse
|
75
|
Huang Y, Wang W, Li M. FNSAM: Image super-resolution using a feedback network with self-attention mechanism. Technol Health Care 2023; 31:383-395. [PMID: 37066938 DOI: 10.3233/thc-236033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/18/2023]
Abstract
BACKGROUND High-resolution (HR) magnetic resonance imaging (MRI) provides rich pathological information that is of great significance in the diagnosis and treatment of brain lesions. However, obtaining HR brain MRI images comes at the cost of longer scan times and sophisticated, expensive instruments. OBJECTIVE This study aims to reconstruct HR MRI images from low-resolution (LR) images by developing a deep learning-based super-resolution (SR) method. METHODS We propose a feedback network with a self-attention mechanism (FNSAM) for SR reconstruction of brain MRI images. Specifically, a feedback network is built to correct shallow features using a recurrent neural network (RNN), and the self-attention mechanism (SAM) is integrated into the feedback network to extract important information as the feedback signal, which enriches the feature hierarchy. RESULTS Experimental results show that the proposed FNSAM achieves better SR reconstruction of brain MRI images, in terms of both peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), than several state-of-the-art methods. CONCLUSION Our proposed method is suitable for SR reconstruction of MRI images.
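The feedback mechanism described here can be reduced to a toy loop: a recurrent state is updated from the input features each iteration, and a self-attention readout of that state serves as the feedback signal for the next step. The sketch assumes single-head attention and tanh recurrence, which are our own simplifications rather than the paper's network:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def feedback_refine(feat, Wx, Wh, Wq, Wk, steps=3):
    """Toy feedback loop: RNN-style state update from the input features,
    followed by a self-attention readout of the state, which is fed back
    as the corrected state for the next iteration.
    feat: (N, C) input features; Wx: (C, D); Wh, Wq, Wk: (D, D)."""
    h = np.zeros((feat.shape[0], Wh.shape[0]))
    for _ in range(steps):
        h = np.tanh(feat @ Wx + h @ Wh)                          # recurrent update
        a = softmax((h @ Wq) @ (h @ Wk).T / np.sqrt(Wk.shape[1]))
        h = a @ h                                                # attention-selected feedback
    return h
```

Because the state is re-read through attention before the next update, later iterations can emphasize the features the readout found important.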
Collapse
|
76
|
Gai D, Zhang J, Xiao Y, Min W, Chen H, Wang Q, Su P, Huang Z. GL-Segnet: Global-Local representation learning net for medical image segmentation. Front Neurosci 2023; 17:1153356. [PMID: 37077320 PMCID: PMC10106565 DOI: 10.3389/fnins.2023.1153356] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2023] [Accepted: 03/20/2023] [Indexed: 04/05/2023] Open
Abstract
Medical image segmentation has long been a compelling and fundamental problem in the realm of neuroscience. It is an extremely challenging task because irrelevant background information strongly interferes with segmenting the target. State-of-the-art methods fail to address long-range and short-range dependencies simultaneously, and commonly emphasize semantic characterization while ignoring the geometric detail carried by shallow feature maps, which drops crucial features. To tackle these problems, we propose a Global-Local representation learning net for medical image segmentation, namely GL-Segnet. In the feature encoder, we utilize the Multi-Scale Convolution (MSC) and Multi-Scale Pooling (MSP) modules to encode global semantic representation information at the shallow level of the network, and multi-scale feature fusion operations are applied to enrich local geometric detail information in a cross-level manner. Beyond that, we adopt a global semantic feature extraction module to filter out irrelevant background information. In the attention-enhancing decoder, we use an attention-based feature decoding module to refine the multi-scale fused feature information, which provides effective cues for attention decoding. We exploit the structural similarity between images and the edge gradient information to propose a hybrid loss that improves the segmentation accuracy of the model. Extensive experiments on medical image segmentation on Glas, ISIC, Brain Tumors, and SIIM-ACR demonstrated that GL-Segnet is superior to existing state-of-the-art methods in subjective visual performance and objective evaluation.
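A hybrid loss combining structural similarity with edge-gradient agreement can take many forms; the minimal version below (global single-window SSIM, finite-difference gradients, a weighting alpha of our own choosing) is an illustration of the idea, not the paper's exact definition:

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Global (single-window) SSIM between two images scaled to [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def edge_grad_loss(pred, target):
    """L1 distance between finite-difference gradients (edge agreement)."""
    gx = np.abs(np.diff(pred, axis=1) - np.diff(target, axis=1)).mean()
    gy = np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0)).mean()
    return gx + gy

def hybrid_loss(pred, target, alpha=0.5):
    """Weighted sum of a structural term (1 - SSIM) and an edge term."""
    return alpha * (1.0 - ssim_global(pred, target)) + (1 - alpha) * edge_grad_loss(pred, target)
```

The SSIM term rewards agreement in overall structure, while the gradient term penalizes blurred or misplaced boundaries, which is the usual motivation for pairing them.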
Collapse
Affiliation(s)
- Di Gai
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China
- Jiangxi Key Laboratory of Smart City, Nanchang, China
- Institute of Metaverse, Nanchang University, Nanchang, China
| | - Jiqian Zhang
- School of Software, Nanchang University, Nanchang, China
| | - Yusong Xiao
- School of Software, Nanchang University, Nanchang, China
| | - Weidong Min
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China
- Jiangxi Key Laboratory of Smart City, Nanchang, China
- Institute of Metaverse, Nanchang University, Nanchang, China
- *Correspondence: Weidong Min
| | - Hui Chen
- Office of Administration, Jiangxi Provincial Institute of Cultural Relics and Archaeology, Nanchang, China
| | - Qi Wang
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China
- Jiangxi Key Laboratory of Smart City, Nanchang, China
- Institute of Metaverse, Nanchang University, Nanchang, China
| | - Pengxiang Su
- School of Software, Nanchang University, Nanchang, China
| | - Zheng Huang
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China
- Jiangxi Key Laboratory of Smart City, Nanchang, China
- Institute of Metaverse, Nanchang University, Nanchang, China
| |
Collapse
|
77
|
Karri M, Annavarapu CSR, Acharya UR. Skin lesion segmentation using two-phase cross-domain transfer learning framework. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 231:107408. [PMID: 36805279 DOI: 10.1016/j.cmpb.2023.107408] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/25/2022] [Revised: 01/31/2023] [Accepted: 02/04/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND AND OBJECTIVE Deep learning (DL) models have been used for medical imaging for a long time, but they did not achieve their full potential in the past because of insufficient computing power and scarcity of training data. In recent years, we have seen substantial growth in DL networks because of improved technology and an abundance of data. However, previous studies indicate that even a well-trained DL algorithm may struggle to generalize to data from multiple sources because of domain shifts. Additionally, the ineffectiveness of basic data fusion methods, the complexity of segmentation targets, and the low interpretability of current DL models limit their use in clinical decisions. To meet these challenges, we present a new two-phase cross-domain transfer learning system for effective skin lesion segmentation from dermoscopic images. METHODS Our system is based on two significant technical inventions. We examine a two-phase cross-domain transfer learning approach, including model-level and data-level transfer learning, by fine-tuning the system on two datasets, MoleMap and ImageNet. We then present nSknRSUNet, a high-performing DL network, for skin lesion segmentation using broad receptive fields and spatial edge attention feature fusion. We examine the trained model's generalization capabilities on skin lesion segmentation to quantify these two inventions. We cross-examine the model using two skin lesion image datasets, MoleMap and HAM10000, obtained from varied clinical contexts. RESULTS With data-level transfer learning on the HAM10000 dataset, the proposed model obtained a DSC of 94.63% and an accuracy of 99.12%. In cross-examination with data-level transfer learning on the MoleMap dataset, the proposed model obtained a DSC of 93.63% and an accuracy of 97.01%. CONCLUSION Numerous experiments reveal that our system produces excellent performance and improves upon state-of-the-art methods on both qualitative and quantitative measures.
Collapse
Affiliation(s)
- Meghana Karri
- Department of Computer Science and Engineering, Indian Institute of Technology (Indian School of Mines), Dhanbad, 826004, Jharkhand, India.
| | - Chandra Sekhara Rao Annavarapu
- Department of Computer Science and Engineering, Indian Institute of Technology (Indian School of Mines), Dhanbad, 826004, Jharkhand, India.
| | - U Rajendra Acharya
- Ngee Ann Polytechnic, Department of Electronics and Computer Engineering, 599489, Singapore; Department of Biomedical Engineering, School of science and Technology, SUSS university, Singapore; Department of Biomedical Informatics and Medical Engineering, Asia university, Taichung, Taiwan.
| |
Collapse
|
78
|
Tang S, Yu X, Cheang CF, Liang Y, Zhao P, Yu HH, Choi IC. Transformer-based multi-task learning for classification and segmentation of gastrointestinal tract endoscopic images. Comput Biol Med 2023; 157:106723. [PMID: 36907035 DOI: 10.1016/j.compbiomed.2023.106723] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2022] [Revised: 02/04/2023] [Accepted: 02/26/2023] [Indexed: 03/07/2023]
Abstract
Despite being widely utilized to help endoscopists identify gastrointestinal (GI) tract diseases through classification and segmentation, models based on convolutional neural networks (CNNs) have difficulty distinguishing similar-looking, ambiguous types of lesions in endoscopic images, and are hard to train when labeled datasets are scarce. Both issues prevent CNNs from further improving diagnostic accuracy. To address these challenges, we first propose a Multi-task Network (TransMT-Net) capable of simultaneously learning two tasks (classification and segmentation); it uses a transformer designed to learn global features and combines it with the advantages of CNNs in learning local features, so as to achieve more accurate prediction of lesion types and regions in GI tract endoscopic images. We further adopt active learning in TransMT-Net to tackle the shortage of labeled images. A dataset was created from the CVC-ClinicDB dataset, Macau Kiang Wu Hospital, and Zhongshan Hospital to evaluate model performance. The experimental results show that our model not only achieved 96.94% accuracy in the classification task and a 77.76% Dice Similarity Coefficient in the segmentation task but also outperformed other models on our test set. Meanwhile, active learning produced positive results for the performance of our model with a small-scale initial training set; even with 30% of the initial training set, its performance was comparable to that of most comparable models trained on the full training set. Consequently, the proposed TransMT-Net demonstrates strong potential on GI tract endoscopic images and, through active learning, can alleviate the shortage of labeled images.
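The active-learning loop such systems rely on is typically uncertainty sampling: after each training round, the model scores the unlabeled pool by predictive entropy and the most uncertain images are sent for labeling. A generic sketch of that selection step (not tied to TransMT-Net's specifics):

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Predictive entropy per sample; p: (N, K) class probabilities."""
    return -(p * np.log(p + eps)).sum(axis=1)

def select_for_labeling(probs, k):
    """Uncertainty sampling: return indices of the k samples the current
    model is least sure about, to be labeled and added to the training set."""
    return np.argsort(-entropy(probs))[:k]
```

In a full loop, the model is retrained on the grown labeled set and selection repeats until the labeling budget runs out.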
Collapse
Affiliation(s)
- Suigu Tang
- Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China
| | - Xiaoyuan Yu
- Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China
| | - Chak Fong Cheang
- Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China.
| | - Yanyan Liang
- Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China
| | - Penghui Zhao
- Faculty of Innovation Engineering-School of Computer Science and Engineering, Macau University of Science and Technology, Macao Special Administrative Region of China
| | - Hon Ho Yu
- Kiang Wu Hospital, Macao Special Administrative Region of China
| | - I Cheong Choi
- Kiang Wu Hospital, Macao Special Administrative Region of China
| |
Collapse
|
79
|
Qin C, Zheng B, Li W, Chen H, Zeng J, Wu C, Liang S, Luo J, Zhou S, Xiao L. MAD-Net: Multi-attention dense network for functional bone marrow segmentation. Comput Biol Med 2023; 154:106428. [PMID: 36682178 DOI: 10.1016/j.compbiomed.2022.106428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2022] [Revised: 12/04/2022] [Accepted: 12/13/2022] [Indexed: 01/15/2023]
Abstract
Radiotherapy is the main treatment modality for various pelvic malignancies. However, high-intensity radiation can damage the functional bone marrow (FBM), resulting in hematological toxicity (HT). Accurate identification and protection of the FBM during radiotherapy planning can reduce pelvic HT. The traditional manual method for contouring the FBM is time-consuming and laborious, so an efficient and accurate automatic segmentation model would provide a distinct advantage in clinical settings. In this paper, we propose the first network for the FBM segmentation task, referred to as the multi-attention dense network (MAD-Net). First, we introduce the dense convolution block to promote gradient flow in the network and encourage feature reuse. Next, a novel slide-window attention module is proposed to emphasize long-range dependencies and exploit interdependencies between features. Finally, we design a residual-dual attention module as the bottleneck layer, which further aggregates useful spatial details and explores the intra-class responsiveness of high-level features. We conduct extensive experiments on our dataset of 3838 two-dimensional pelvic slices. Experimental results demonstrate that the proposed MAD-Net outperforms previous state-of-the-art models on various metrics. In addition, the contributions of the proposed components are verified by ablation analysis, and we conduct experiments on three other datasets to demonstrate the generalizability of MAD-Net.
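Window-restricted self-attention of the kind a slide-window module builds on can be sketched with non-overlapping windows of tokens; a true sliding variant would overlap the windows, and the paper's module differs in its details. This version is our own minimal illustration:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sliding_window_attention(x, Wq, Wk, Wv, win=4):
    """Self-attention computed inside windows of `win` tokens, a cheap
    stand-in for windowed attention: cost grows linearly in the number of
    windows instead of quadratically in sequence length.
    x: (N, C) tokens, N divisible by win."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    out = np.empty_like(v)
    for s in range(0, x.shape[0], win):
        qs, ks, vs = q[s:s+win], k[s:s+win], v[s:s+win]
        a = softmax(qs @ ks.T / np.sqrt(ks.shape[1]))
        out[s:s+win] = a @ vs
    return out
```

Restricting attention to windows keeps memory bounded while still modeling dependencies within each window.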
Collapse
Affiliation(s)
- Chuanbo Qin
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, 529020, China
| | - Bin Zheng
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, 529020, China
| | - Wanying Li
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, 529020, China
| | - Hongbo Chen
- Radiotherapy Center, Jiangmen Central Hospital, Jiangmen, 529020, China
| | - Junying Zeng
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, 529020, China
| | - Chenwang Wu
- Radiotherapy Center, Jiangmen Central Hospital, Jiangmen, 529020, China
| | - Shufen Liang
- Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen, 529020, China
| | - Jun Luo
- School of Economics and Management, Wuyi University, Jiangmen, 529020, China
| | - Shuquan Zhou
- Radiotherapy Center, Jiangmen Central Hospital, Jiangmen, 529020, China
| | - Lin Xiao
- Radiotherapy Center, Jiangmen Central Hospital, Jiangmen, 529020, China.
| |
Collapse
|
80
|
Kang S, Yang M, Sharon Qi X, Jiang J, Tan S. Bridging Feature Gaps to Improve Multi-Organ Segmentation on Abdominal Magnetic Resonance Image. IEEE J Biomed Health Inform 2023; 27:1477-1487. [PMID: 37015687 DOI: 10.1109/jbhi.2022.3229315] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
Accurate segmentation of abdominal organs on MRI is crucial for computer-aided surgery and computer-aided diagnosis. Most state-of-the-art methods for MRI segmentation employ an encoder-decoder structure, with skip connections concatenating shallow features from the encoder and deep features from the decoder. In this work, we noticed that simply concatenating shallow and deep features is insufficient for segmentation because of the feature gap between them. To mitigate this problem, we quantified the feature gap from spatial and semantic aspects and proposed a spatial loss and a semantic loss to bridge it. The spatial loss enhances spatial details in deep features, and the semantic loss introduces semantic information into shallow features. The proposed method successfully aggregates the complementary information between shallow and deep features by formulating and bridging the feature gap. Experiments on two abdominal MRI datasets demonstrated the effectiveness of the proposed method, which improved segmentation performance over a baseline with nearly zero additional parameters. In particular, the proposed method has advantages for segmenting organs with blurred boundaries or at a small scale, achieving performance superior to state-of-the-art methods.
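One plausible minimal form of the two losses is sketched below; these formulas are our own illustration, not the paper's definitions, and they assume the deep features have been upsampled and channel-matched to the shallow ones. The spatial loss matches gradient magnitudes so deep features keep shallow detail, and the semantic loss pulls projected shallow features toward the deep ones:

```python
import numpy as np

def grad_mag(f):
    """Spatial-gradient magnitude of a feature map, f: (H, W, C)."""
    gy = np.diff(f, axis=0)[:, :-1, :]
    gx = np.diff(f, axis=1)[:-1, :, :]
    return np.sqrt(gx**2 + gy**2)

def spatial_loss(deep_up, shallow):
    """Encourage upsampled deep features to retain the spatial detail
    (edges) present in the shallow features."""
    return np.abs(grad_mag(deep_up) - grad_mag(shallow)).mean()

def semantic_loss(shallow_proj, deep_up):
    """Pull (projected) shallow features toward the deep, semantic ones."""
    return ((shallow_proj - deep_up) ** 2).mean()
```

Both terms would be added to the usual segmentation loss during training, so the skip-connected features become mutually consistent before concatenation.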
Collapse
|
81
|
Li H, Nan Y, Del Ser J, Yang G. Large-Kernel Attention for 3D Medical Image Segmentation. Cognit Comput 2023; 16:2063-2077. [PMID: 38974012 PMCID: PMC11226511 DOI: 10.1007/s12559-023-10126-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2022] [Accepted: 02/09/2023] [Indexed: 03/03/2023]
Abstract
Automated segmentation of multiple organs and tumors from 3D medical images such as magnetic resonance imaging (MRI) and computed tomography (CT) scans using deep learning methods can aid in diagnosing and treating cancer. However, organs often overlap and are complexly connected, characterized by extensive anatomical variation and low contrast. In addition, the diversity of tumor shape, location, and appearance, coupled with the dominance of background voxels, makes accurate 3D medical image segmentation difficult. In this paper, a novel 3D large-kernel (LK) attention module is proposed to address these problems and achieve accurate multi-organ and tumor segmentation. The advantages of biologically inspired self-attention and convolution are combined in the proposed LK attention module, including local contextual information, long-range dependencies, and channel adaptation. The module also decomposes the LK convolution to optimize the computational cost and can be easily incorporated into CNNs such as U-Net. Comprehensive ablation experiments demonstrated the feasibility of convolutional decomposition and explored the most efficient and effective network design. Among them, the best Mid-type 3D LK attention-based U-Net network was evaluated on the CT-ORG and BraTS 2020 datasets, achieving state-of-the-art segmentation performance compared with leading CNN- and Transformer-based methods for medical image segmentation. The performance improvement due to the proposed 3D LK attention module was statistically validated.
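The payoff of decomposing a large-kernel convolution can be quantified with a parameter count. Assuming a VAN-style LKA split into a (2d-1)^3 depthwise conv, a ceil(k/d)^3 depthwise dilated conv, and a 1x1x1 channel conv (the paper's exact split may differ):

```python
def lk_param_counts(k=21, d=3, channels=32):
    """Compare the parameter count of one full k^3 depthwise 3D convolution
    against an LKA-style decomposition: a (2d-1)^3 depthwise conv, a
    ceil(k/d)^3 depthwise conv with dilation d, and a 1x1x1 channel mix."""
    import math
    full = channels * k**3
    decomposed = (channels * (2*d - 1)**3          # local depthwise conv
                  + channels * math.ceil(k/d)**3   # dilated depthwise conv
                  + channels * channels)           # 1x1x1 channel conv
    return full, decomposed
```

For k = 21, d = 3 and 32 channels this gives 296,352 versus 16,000 parameters, roughly an 18x reduction, which is why the decomposition makes 3D large-kernel attention affordable.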
Collapse
Affiliation(s)
- Hao Li
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, London, UK
- Department of Bioengineering, Faculty of Engineering, Imperial College London, London, UK
| | - Yang Nan
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, London, UK
| | - Javier Del Ser
- TECNALIA, Basque Research & Technology Alliance (BRTA), Derio, Spain
- University of the Basque Country (UPV/EHU), Bilbao, Spain
| | - Guang Yang
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, London, UK
- Royal Brompton Hospital, London, UK
| |
Collapse
|
82
|
Abstract
Medical data refers to health-related information collected during regular patient care or as part of a clinical trial program. There are many categories of such data, including clinical imaging data, bio-signal data, electronic health records (EHR), and multi-modality medical data. With the development of deep neural networks over the last decade, the pre-training paradigm has become dominant, as it significantly improves the performance of machine learning methods in data-limited scenarios. In recent years, studies of pre-training in the medical domain have achieved significant progress. To summarize these technological advancements, this work provides a comprehensive survey of recent advances in pre-training on several major types of medical data. We summarize a large number of related publications and the existing benchmarks in the medical domain. In particular, the survey briefly describes how pre-training methods are applied to or developed for medical data. From a data-driven perspective, we examine the extensive use of pre-training in many medical scenarios. Finally, based on this summary of recent pre-training studies, we identify several challenges in the field to provide insights for future work.
Collapse
Affiliation(s)
- Yixuan Qiu
- The University of Queensland, Brisbane, 4072 Australia
| | - Feng Lin
- The University of Queensland, Brisbane, 4072 Australia
| | - Weitong Chen
- The University of Adelaide, Adelaide, 5005 Australia
| | - Miao Xu
- The University of Queensland, Brisbane, 4072 Australia
| |
Collapse
|
83
|
Marthin P, Tutkun NA. Recurrent neural network for complex survival problems. J STAT COMPUT SIM 2023. [DOI: 10.1080/00949655.2023.2176504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/18/2023]
Affiliation(s)
- Pius Marthin
- Department of Statistics, Graduate School of Science and Engineering, Hacettepe University, Ankara, Turkey
| | - N. Ata Tutkun
- Department of Statistics, Graduate School of Science and Engineering, Hacettepe University, Ankara, Turkey
| |
Collapse
|
84
|
Bahan Pal J, Mj D. Improving multi-scale attention networks: Bayesian optimization for segmenting medical images. THE IMAGING SCIENCE JOURNAL 2023. [DOI: 10.1080/13682199.2023.2174657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/17/2023]
Affiliation(s)
- Jimut Bahan Pal
- Department of Computer Science, Ramakrishna Mission Vivekananda Educational and Research Institute, Howrah, India
| | - Dripta Mj
- Department of Mathematics, Ramakrishna Mission Vivekananda Educational and Research Institute Belur Math, Howrah, India
| |
Collapse
|
85
|
An advanced W-shaped network with adaptive multi-scale supervision for osteosarcoma segmentation. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
86
|
Yu W, Zhou H, Choi Y, Goldin JG, Teng P, Wong WK, McNitt-Gray MF, Brown MS, Kim GHJ. Multi-scale, domain knowledge-guided attention + random forest: a two-stage deep learning-based multi-scale guided attention models to diagnose idiopathic pulmonary fibrosis from computed tomography images. Med Phys 2023; 50:894-905. [PMID: 36254789 PMCID: PMC10082682 DOI: 10.1002/mp.16053] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Revised: 07/25/2022] [Accepted: 09/06/2022] [Indexed: 11/08/2022] Open
Abstract
BACKGROUND Idiopathic pulmonary fibrosis (IPF) is a progressive, irreversible, and usually fatal lung disease of unknown cause, generally affecting the elderly population. Early diagnosis of IPF is crucial for triaging patients' treatment planning into anti-fibrotic treatment or treatments for other causes of pulmonary fibrosis. However, the current IPF diagnosis workflow is complicated and time-consuming; it involves collaborative efforts from radiologists, pathologists, and clinicians and is largely subject to inter-observer variability. PURPOSE The purpose of this work is to develop a deep learning-based automated system that can diagnose IPF among subjects with interstitial lung disease (ILD) using an axial chest computed tomography (CT) scan. This work can potentially enable timely diagnosis decisions and reduce inter-observer variability. METHODS Our dataset contains CT scans from 349 IPF patients and 529 non-IPF ILD patients. We used 80% of the dataset for training and validation and 20% as the holdout test set. We proposed a two-stage model: in stage one, we built a multi-scale, domain knowledge-guided attention model (MSGA) that encouraged the model to focus on specific areas of interest to enhance model explainability, including both high- and medium-resolution attention; in stage two, we collected the output from MSGA and constructed a random forest (RF) classifier for patient-level diagnosis to further boost model accuracy. An RF classifier is utilized as the final decision stage since it is interpretable, computationally fast, and can handle correlated variables. Model utility was examined by (1) accuracy, represented by the area under the receiver operating characteristic curve (AUC) with standard deviation (SD), and (2) explainability, illustrated by visual examination of the estimated attention maps, which showed the areas important for model diagnostics.
RESULTS During the training and validation stage, we observe that when we provide no guidance from domain knowledge, the IPF diagnosis model reaches acceptable performance (AUC±SD = 0.93±0.07), but lacks explainability; when including only guided high- or medium-resolution attention, the learned attention maps are not satisfactory; when including both high- and medium-resolution attention, under certain hyperparameter settings, the model reaches the highest AUC among all experiments (AUC±SD = 0.99±0.01) and the estimated attention maps concentrate on the regions of interests for this task. Three best-performing hyperparameter selections according to MSGA were applied to the holdout test set and reached comparable model performance to that of the validation set. CONCLUSIONS Our results suggest that, for a task with only scan-level labels available, MSGA+RF can utilize the population-level domain knowledge to guide the training of the network, which increases both model accuracy and explainability.
Collapse
Affiliation(s)
- Wenxi Yu
- Department of Biostatistics, University of California, Los Angeles, California, USA
| | - Hua Zhou
- Department of Biostatistics, University of California, Los Angeles, California, USA
| | - Youngwon Choi
- Department of Biostatistics, University of California, Los Angeles, California, USA
| | - Jonathan G Goldin
- Department of Biostatistics, University of California, Los Angeles, California, USA
| | - Pangyu Teng
- Department of Biostatistics, University of California, Los Angeles, California, USA
| | - Weng Kee Wong
- Department of Biostatistics, University of California, Los Angeles, California, USA
| | | | - Matthew S Brown
- Department of Biostatistics, University of California, Los Angeles, California, USA
| | - Grace Hyun J Kim
- Department of Biostatistics, University of California, Los Angeles, California, USA
| |
Collapse
|
87
|
Lian S, Li L, Luo Z, Zhong Z, Wang B, Li S. Learning multi-organ segmentation via partial- and mutual-prior from single-organ datasets. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
|
88
|
Bozdag Z, Talu MF. Pyramidal position attention model for histopathological image segmentation. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
89
|
Chen C, Qi S, Zhou K, Lu T, Ning H, Xiao R. Pairwise attention-enhanced adversarial model for automatic bone segmentation in CT images. Phys Med Biol 2023; 68. [PMID: 36634367 DOI: 10.1088/1361-6560/acb2ab] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2022] [Accepted: 01/12/2023] [Indexed: 01/14/2023]
Abstract
Objective. Bone segmentation is a critical step in screw placement navigation. Although deep learning methods have promoted the rapid development of bone segmentation, local bone separation is still challenging due to irregular shapes and similar representational features. Approach. In this paper, we propose the pairwise attention-enhanced adversarial model (Pair-SegAM) for automatic bone segmentation in computed tomography images, which consists of two parts: the segmentation model and the discriminator. Considering that the distribution of predictions from the segmentation model contains complicated semantics, we improve the discriminator to strengthen its awareness of the target region, improving its parsing of semantic features. The Pair-SegAM has a pairwise structure, which uses two calculation mechanisms to set up pairwise attention maps; we then utilize semantic fusion to filter unstable regions. The improved discriminator therefore provides more refined information to capture the bone outline, effectively enhancing the segmentation model. Main results. To test the Pair-SegAM, we selected two bone datasets for assessment. We evaluated our method against several bone segmentation models and the latest adversarial models on both datasets. The experimental results show that our method not only exhibits superior bone segmentation performance but also generalizes effectively. Significance. Our method provides a more efficient segmentation of specific bones and has the potential to be extended to other semantic segmentation domains.
Collapse
Affiliation(s)
- Cheng Chen
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, People's Republic of China
| | - Siyu Qi
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, People's Republic of China
| | - Kangneng Zhou
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, People's Republic of China
| | - Tong Lu
- Visual 3D Medical Science and Technology Development Co. Ltd, Beijing 100082, People's Republic of China
| | - Huansheng Ning
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, People's Republic of China
| | - Ruoxiu Xiao
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, People's Republic of China; Shunde Innovation School, University of Science and Technology Beijing, Foshan 100024, People's Republic of China
| |
Collapse
|
90
|
Tong G, Jiang H, Yao YD. SDA-UNet: a hepatic vein segmentation network based on the spatial distribution and density awareness of blood vessels. Phys Med Biol 2023; 68. [PMID: 36623320 DOI: 10.1088/1361-6560/acb199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2022] [Accepted: 01/09/2023] [Indexed: 01/11/2023]
Abstract
Objective. Hepatic vein segmentation is a fundamental task for liver diagnosis and surgical navigation planning. Unlike other organs, the liver is the only organ with two sets of venous systems, and the segmentation target distribution in hepatic vein scenes is extremely unbalanced: the hepatic veins occupy only a small area of each abdominal CT slice. The morphology of the hepatic veins also differs from person to person, which further complicates segmentation. The purpose of this study is to develop an automated hepatic vein segmentation model that guides clinical diagnosis. Approach. We introduce 3D spatial distribution and density awareness (SDA) of the hepatic veins and propose an automatic segmentation network based on 3D U-Net that includes a multi-axial squeeze and excitation module (MASE) and a distribution correction module (DCM). The MASE restricts activation to areas containing hepatic veins, while the DCM improves awareness of their sparse spatial distribution. To obtain global axial information and spatial information at the same time, we study the effect of different training strategies on hepatic vein segmentation. Our method was evaluated on a public dataset and a private dataset, achieving Dice coefficients of 71.37% and 69.58%, improvements of 3.60% and 3.30% over the other state-of-the-art models, respectively. Distance- and volume-based metrics also show the superiority of our method. Significance. The proposed method greatly reduces false-positive areas and improves hepatic vein segmentation in CT images. It will assist doctors in making accurate diagnoses and planning surgical navigation.
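The MASE module generalizes the standard squeeze-and-excitation (SE) mechanism to multiple axes of a 3D volume. As background for that abstract, here is a minimal NumPy sketch of plain channel-wise SE on a (C, D, H, W) volume; the multi-axial variant, reduction ratio, and weight shapes used in the paper are not given in the abstract, so the weights below are hypothetical.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """Channel squeeze-and-excitation on a 3D feature volume.

    x: (C, D, H, W). Squeeze: global average pool over D, H, W.
    Excite: small MLP (ReLU then sigmoid) produces one gate per channel,
    which rescales that channel of x.
    """
    z = x.mean(axis=(1, 2, 3))             # squeeze -> (C,)
    s = np.maximum(0.0, w1 @ z)            # reduction layer + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # expansion layer + sigmoid gate
    return x * s[:, None, None, None]      # per-channel rescaling

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2, 3, 3))
w1 = rng.normal(size=(3, 6))   # hypothetical reduction 6 -> 3
w2 = rng.normal(size=(6, 3))   # hypothetical expansion 3 -> 6
y = squeeze_excite(x, w1, w2)
print(y.shape)  # (6, 2, 3, 3)
```

Because the gates lie in (0, 1), each output channel is a damped copy of its input; a multi-axial version would repeat the same squeeze-gate pattern along the D, H, and W axes as well.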
Collapse
Affiliation(s)
- Guoyu Tong
- Software College, Northeastern University, Shenyang 110819, People's Republic of China
| | - Huiyan Jiang
- Software College, Northeastern University, Shenyang 110819, People's Republic of China; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, Northeastern University, Shenyang 110819, People's Republic of China
| | - Yu-Dong Yao
- Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, United States of America
| |
Collapse
|
91
|
Wang X, Yang B, Pan X, Liu F, Zhang S. BPCN: bilateral progressive compensation network for lung infection image segmentation. Phys Med Biol 2023; 68. [PMID: 36580682 DOI: 10.1088/1361-6560/acaf21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Accepted: 12/29/2022] [Indexed: 12/31/2022]
Abstract
Lung infection image segmentation is a key technology for automatically understanding potential illness. However, current approaches usually lose low-level details, which leads to a considerable drop in accuracy for lung infection areas of varied shape and size. In this paper, we propose the bilateral progressive compensation network (BPCN), which improves the accuracy of lung lesion segmentation through complementary learning of spatial and semantic features. BPCN is mainly composed of two deep branches: one performs multi-scale progressive fusion of the main region features, while the other is a flow-field-based adaptive body-edge aggregation operation that explicitly learns detail features of lung infection areas to supplement the region features. In addition, we propose a bilateral spatial-channel down-sampling scheme that generates a hierarchical complementary feature and avoids the loss of discriminative features caused by pooling operations. Experimental results show that our network outperforms state-of-the-art segmentation methods on two public lung infection datasets, with or without a pseudo-label training strategy.
Collapse
Affiliation(s)
- Xiaoyan Wang
- Zhejiang University of Technology, Zhejiang Province, People's Republic of China
| | - Baoqi Yang
- Zhejiang University of Technology, Zhejiang Province, People's Republic of China
| | - Xiang Pan
- Zhejiang University of Technology, Zhejiang Province, People's Republic of China
| | - Fuchang Liu
- Hangzhou Normal University, Zhejiang Province, People's Republic of China
| | - Sanyuan Zhang
- Zhejiang University, Zhejiang Province, People's Republic of China
| |
Collapse
|
92
|
Chen Y, Dong Y, Si L, Yang W, Du S, Tian X, Li C, Liao Q, Ma H. Dual Polarization Modality Fusion Network for Assisting Pathological Diagnosis. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:304-316. [PMID: 36155433 DOI: 10.1109/tmi.2022.3210113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
Polarization imaging is sensitive to the sub-wavelength microstructures of various cancer tissues, providing abundant optical characteristics and microstructure information about complex pathological specimens. However, how to make good use of polarization information to strengthen pathological diagnosis remains a challenging issue. To take full advantage of both pathological image information and the polarization features of samples, we propose a dual polarization modality fusion network (DPMFNet), which consists of a multi-stream CNN structure and a switched attention fusion module that complementarily aggregates the features of the different modality images. The switched attention mechanism obtains joint feature embeddings by switching the attention maps of the different modality images, improving their semantic relatedness. A dual-polarization contrastive training scheme allows our method to synthesize and align the interactions and representations of the two polarization features. Experimental evaluations on three cancer datasets show the superiority of our method in assisting pathological diagnosis, especially with small datasets and at low imaging resolution. Grad-CAM visualizations of the important regions of the pathological and polarization images indicate that the two modalities play different roles, allowing insightful explanations and analysis of the cancer diagnoses made by DPMFNet. This technique has the potential to improve computer-aided pathological diagnosis and to broaden the current boundary of digital pathology based on pathological image features.
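The "switched" attention idea of the abstract above, applying one modality's attention map to the other modality's features, can be sketched in a few lines. This is only an illustration of the general mechanism: the spatial gate (sigmoid of the channel-averaged map), the summation-based fusion, and the variable names are all assumptions, not the paper's implementation.

```python
import numpy as np

def spatial_gate(f):
    """One attention weight per pixel: sigmoid of the channel-mean map."""
    return 1.0 / (1.0 + np.exp(-f.mean(axis=0)))

def switched_attention_fusion(f_a, f_b):
    """Each modality stream is re-weighted by the OTHER stream's
    attention map (the 'switch'), then the two streams are summed."""
    return f_a * spatial_gate(f_b) + f_b * spatial_gate(f_a)

rng = np.random.default_rng(1)
f_a = rng.normal(size=(4, 5, 5))   # e.g. pathology-image features (hypothetical)
f_b = rng.normal(size=(4, 5, 5))   # e.g. polarization features (hypothetical)
fused = switched_attention_fusion(f_a, f_b)
print(fused.shape)  # (4, 5, 5)
```

Swapping the maps rather than applying each stream's own attention is what couples the two modalities and, per the abstract, improves their semantic relatedness.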
Collapse
|
93
|
Nazir K, Mustafa Madni T, Iqbal Janjua U, Javed U, Attique Khan M, Tariq U, Cha JH. 3D Kronecker Convolutional Feature Pyramid for Brain Tumor Semantic Segmentation in MR Imaging. COMPUTERS, MATERIALS & CONTINUA 2023; 76:2861-2877. [DOI: 10.32604/cmc.2023.039181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/13/2023] [Accepted: 04/10/2023] [Indexed: 08/25/2024]
|
94
|
Bao H, Zhu Y, Li Q. Hybrid-scale contextual fusion network for medical image segmentation. Comput Biol Med 2023; 152:106439. [PMID: 36566623 DOI: 10.1016/j.compbiomed.2022.106439] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Revised: 11/24/2022] [Accepted: 12/15/2022] [Indexed: 12/24/2022]
Abstract
Medical image segmentation results are an essential reference for disease diagnosis. With the development and application of convolutional neural networks, medical image processing has advanced significantly. However, most automatic segmentation tasks remain challenging due to the varied positions, sizes, and shapes of the targets, resulting in poor segmentation performance. In addition, most current methods use an encoder-decoder architecture for feature extraction, focusing on acquiring semantic information but ignoring the specific target and the global context. In this work, we propose a hybrid-scale contextual fusion network to capture richer spatial and semantic information. First, a hybrid-scale embedding layer (HEL) is employed before the transformer: by mixing each embedding with multiple patches, object information at different scales can be captured effectively. We then apply a standard transformer to model long-range dependencies in the first two skip connections, while a pooling transformer (PTrans) handles the long input sequences in the following two skip connections; by leveraging global average pooling and the corresponding transformer block, the spatial structure of the target is learned effectively. Finally, a dual-branch channel attention module (DCA) is proposed to focus on crucial channel features and perform multi-level feature fusion simultaneously. With this fusion scheme, richer context and fine-grained features are captured and encoded efficiently. Extensive experiments on three public datasets demonstrate that the proposed method outperforms state-of-the-art methods.
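The reason a pooling transformer helps with long input sequences is that attention cost scales with the key/value length. A minimal NumPy sketch of that idea, not the paper's PTrans, is shown below: queries stay at full length while keys and values come from an average-pooled token sequence, so the attention matrix shrinks from n x n to n x (n/pool). The pooling factor and single-head layout are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pooled_attention(x, pool=4):
    """Single-head attention with average-pooled keys/values.

    x: (n, d) token sequence. Tokens are grouped in windows of `pool`
    and averaged (any remainder is dropped for simplicity), so the
    score matrix is (n, n // pool) instead of (n, n).
    """
    n, d = x.shape
    kv = x[: n - n % pool].reshape(-1, pool, d).mean(axis=1)  # pooled tokens
    scores = x @ kv.T / np.sqrt(d)      # queries attend to pooled keys
    return softmax(scores) @ kv         # (n, d) output

rng = np.random.default_rng(2)
tokens = rng.normal(size=(16, 8))
out = pooled_attention(tokens)
print(out.shape)  # (16, 8)
```

With pool=4 the 16-token sequence attends over only 4 pooled keys, which is the same efficiency trade-off that lets PTrans handle the longer sequences in the shallower skip connections.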
Collapse
Affiliation(s)
- Hua Bao
- The Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Hefei 230601, China; The School of Artificial Intelligence, Anhui University, Hefei 230601, China
| | - Yuqing Zhu
- The Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Hefei 230601, China; The School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
| | - Qing Li
- The Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Hefei 230601, China; The School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China
| |
Collapse
|
95
|
Wang S, Xu X, Du H, Chen Y, Mei W. Attention feature fusion methodology with additional constraint for ovarian lesion diagnosis on magnetic resonance images. Med Phys 2023; 50:297-310. [PMID: 35975618 DOI: 10.1002/mp.15937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2021] [Revised: 06/25/2022] [Accepted: 07/24/2022] [Indexed: 01/25/2023] Open
Abstract
PURPOSE It is challenging for radiologists and gynecologists to identify the type of an ovarian lesion by reading magnetic resonance (MR) images. Recently developed convolutional neural networks (CNNs) have made great progress in computer vision, but their architectures still need modification for processing medical images. This study aims to improve the feature extraction capability of CNNs and thereby the diagnostic performance in discriminating between benign and malignant ovarian lesions. METHODS We introduce a feature fusion architecture and insert attention models into the neural network. The features extracted from different middle layers are integrated with re-optimized spatial and channel weights, and a loss function constrains the additional probability vector generated from the integrated features, guiding the middle layers to emphasize useful information. We analyzed 159 lesions imaged by dynamic contrast-enhanced MR imaging (DCE-MRI), including 73 benign and 86 malignant lesions. Senior radiologists selected and labeled the tumor regions based on the pathology reports, and the tumor regions were cropped into 7494 non-overlapping image patches for training and testing. The type of a single tumor was determined by the average probability score of the image patches belonging to it. RESULTS We used fivefold cross-validation to characterize the proposed method and report the distribution of performance metrics. Over all test image patches, the average accuracy of our method is 70.5% with an average area under the curve (AUC) of 0.785, versus 69.4% and 0.773 for the baseline; for the diagnosis of single tumors, our model achieved an average accuracy of 82.4% and an average AUC of 0.916, again better than the baseline (81.8% and 0.899). We also evaluated the proposed method with different CNN backbones and different attention mechanisms.
CONCLUSIONS The texture features extracted from different middle layers are crucial for ovarian lesion diagnosis. Our proposed method enhances the feature extraction capabilities of the network's layers, thereby improving diagnostic performance.
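The tumor-level decision rule described in the abstract, averaging the probability scores of all patches belonging to one tumor, is simple enough to sketch directly. The threshold of 0.5 and the toy probabilities below are illustrative assumptions; the abstract does not state the operating point.

```python
import numpy as np

def tumor_score(patch_probs, thresh=0.5):
    """Aggregate per-patch malignancy probabilities into one tumor-level
    decision: average the patch scores, then threshold the mean."""
    p = float(np.mean(patch_probs))
    return p, ("malignant" if p >= thresh else "benign")

# Three hypothetical patch probabilities from one cropped tumor region.
print(tumor_score([0.5, 0.75, 0.25]))  # (0.5, 'malignant')
```

Averaging over 7494 patches grouped by tumor is also why the tumor-level accuracy (82.4%) exceeds the patch-level accuracy (70.5%): the mean suppresses per-patch noise.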
Collapse
Affiliation(s)
- Shuai Wang
- School of Information and Electronics, Beijing Institute of Technology, Beijing, China
| | - Xiaojuan Xu
- Department of Diagnostic Imaging, National Cancer Center, National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking, Union Medical College, Beijing, China
| | - Huiqian Du
- School of Integrated Circuits and Electronics, Beijing Institute of Technology, Beijing, China
| | - Yan Chen
- Department of Diagnostic Imaging, National Cancer Center, National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking, Union Medical College, Beijing, China
| | - Wenbo Mei
- School of Information and Electronics, Beijing Institute of Technology, Beijing, China
| |
Collapse
|
96
|
MACFNet: multi-attention complementary fusion network for image denoising. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04313-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
|
97
|
SWAR: A Deep Multi-Model Ensemble Forecast Method with Spatial Grid and 2-D Time Structure Adaptability for Sea Level Pressure. INFORMATION 2022. [DOI: 10.3390/info13120577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
The multi-model ensemble (MME) forecast of meteorological elements has repeatedly been shown to be more skillful than any single model: it improves forecast quality by integrating multiple sets of numerical forecasts with different spatial-temporal characteristics. Current numerical forecast products have a grid structure formed by lines of longitude and latitude in space and a special two-dimensional time structure, namely initial time and lead time, as opposed to traditional one-dimensional time. These characteristics limit how far many MME methods can improve forecast quality. Addressing this problem, we propose a deep MME forecast method suited to this special structure. At the spatial level, our model uses window self-attention and shifted-window attention to aggregate information. At the temporal level, we propose a recurrent-like neural network with a rolling structure (Roll-RLNN) that is better suited to the two-dimensional time structure widely found at operational numerical weather prediction (NWP) institutions. In this paper, we test MME forecasting of sea level pressure, since the forecast characteristics of this essential meteorological element vary clearly across institutions, and the results show that our model structure is effective and yields significant forecast improvements.
Collapse
|
98
|
Chen X, Peng Y, Guo Y, Sun J, Li D, Cui J. MLRD-Net: 3D multiscale local cross-channel residual denoising network for MRI-based brain tumor segmentation. Med Biol Eng Comput 2022; 60:3377-3395. [DOI: 10.1007/s11517-022-02673-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2021] [Accepted: 09/17/2022] [Indexed: 11/11/2022]
|
99
|
Celard P, Iglesias EL, Sorribes-Fdez JM, Romero R, Vieira AS, Borrajo L. A survey on deep learning applied to medical images: from simple artificial neural networks to generative models. Neural Comput Appl 2022; 35:2291-2323. [PMID: 36373133 PMCID: PMC9638354 DOI: 10.1007/s00521-022-07953-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2022] [Accepted: 10/12/2022] [Indexed: 11/06/2022]
Abstract
Deep learning techniques, in particular generative models, have taken on great importance in medical image analysis. This paper surveys fundamental deep learning concepts related to medical image generation. It provides concise overviews of studies that apply some of the latest state-of-the-art models to medical images of different injured body areas or organs associated with a disease (e.g., brain tumors and COVID-19 lung pneumonia). The motivation for this study is to offer a comprehensive overview of artificial neural networks (NNs) and deep generative models in medical imaging, so that more groups and authors who are not yet familiar with deep learning consider its use in medical work. We review the use of generative models, such as generative adversarial networks and variational autoencoders, as techniques for semantic segmentation, data augmentation, and better classification algorithms, among other purposes. In addition, we present a collection of widely used public medical datasets containing magnetic resonance (MR) images, computed tomography (CT) scans, and common photographs. Finally, we summarize the current state of generative models in medical imaging, including key features, current challenges, and future research paths.
Collapse
Affiliation(s)
- P. Celard
- Computer Science Department, Universidade de Vigo, Escuela Superior de Ingeniería Informática, Campus Universitario As Lagoas, 32004 Ourense, Spain
- CINBIO - Biomedical Research Centre, Universidade de Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
| | - E. L. Iglesias
- Computer Science Department, Universidade de Vigo, Escuela Superior de Ingeniería Informática, Campus Universitario As Lagoas, 32004 Ourense, Spain
- CINBIO - Biomedical Research Centre, Universidade de Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
| | - J. M. Sorribes-Fdez
- Computer Science Department, Universidade de Vigo, Escuela Superior de Ingeniería Informática, Campus Universitario As Lagoas, 32004 Ourense, Spain
- CINBIO - Biomedical Research Centre, Universidade de Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
| | - R. Romero
- Computer Science Department, Universidade de Vigo, Escuela Superior de Ingeniería Informática, Campus Universitario As Lagoas, 32004 Ourense, Spain
- CINBIO - Biomedical Research Centre, Universidade de Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
| | - A. Seara Vieira
- Computer Science Department, Universidade de Vigo, Escuela Superior de Ingeniería Informática, Campus Universitario As Lagoas, 32004 Ourense, Spain
- CINBIO - Biomedical Research Centre, Universidade de Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
| | - L. Borrajo
- Computer Science Department, Universidade de Vigo, Escuela Superior de Ingeniería Informática, Campus Universitario As Lagoas, 32004 Ourense, Spain
- CINBIO - Biomedical Research Centre, Universidade de Vigo, Campus Universitario Lagoas-Marcosende, 36310 Vigo, Spain
- SING Research Group, Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
| |
Collapse
|
100
|
D’Souza G, Reddy NVS, Manjunath KN. Localization of lung abnormalities on chest X-rays using self-supervised equivariant attention. Biomed Eng Lett 2022; 13:21-30. [PMID: 36711159 PMCID: PMC9873849 DOI: 10.1007/s13534-022-00249-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2022] [Revised: 10/02/2022] [Accepted: 10/08/2022] [Indexed: 11/06/2022] Open
Abstract
Chest X-ray (CXR) images present most anatomical details and abnormalities on a 2D plane, so a 2D view of the 3D anatomy is sometimes sufficient for an initial diagnosis. However, some of the roughly fourteen commonly occurring diseases are difficult to identify by visual inspection alone, so there is a drift toward computer-aided assistive systems that help radiologists. This paper proposes a deep learning model for the classification and localization of chest diseases using only image-level annotations. The model consists of a modified ResNet50 backbone that extracts a feature corpus from the images, a classifier, and a pixel correlation module (PCM). During PCM training, the network is a weight-shared siamese architecture in which the first branch applies an affine transform to the image before feeding it to the network, while the second applies the same transform to the network's output. The method was evaluated on CXRs from the clinical center, split 70:20 between training and testing. The model was developed and tested on the Google Colaboratory cloud platform (NVIDIA Tesla P100 GPU, 16 GB of RAM), and a radiologist subjectively validated the results. Our model, trained with the configurations described in this paper, outperformed benchmark results. Supplementary Information The online version contains supplementary material available at 10.1007/s13534-022-00249-5.
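The siamese training scheme described above enforces equivariance: transforming the input and then applying the network should give the same result as applying the network and then transforming its output. A minimal NumPy sketch of that consistency check follows; the stand-in "networks" (a scaling map and a cumulative sum) and the horizontal flip transform are illustrative assumptions, not the paper's affine transforms or ResNet50 backbone.

```python
import numpy as np

def equivariance_loss(f, x, transform):
    """Siamese-style equivariance check: branch 1 transforms the input
    before the network f, branch 2 transforms f's output; the loss
    penalizes any mismatch between the two branches."""
    y1 = f(transform(x))   # transform, then predict
    y2 = transform(f(x))   # predict, then transform
    return float(np.mean((y1 - y2) ** 2))

flip = lambda a: a[:, ::-1]              # toy transform: horizontal flip
x = np.arange(12.0).reshape(3, 4)

# A pixel-wise map (scaling) commutes with the flip: zero loss.
assert equivariance_loss(lambda a: 2 * a, x, flip) == 0.0
# A direction-dependent map (left-to-right cumulative sum) does not.
print(equivariance_loss(lambda a: np.cumsum(a, axis=1), x, flip) > 0)  # True
```

Minimizing this loss during training pushes the attention maps to track the anatomy under the transform, which is what makes image-level labels sufficient for localization.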
Collapse
Affiliation(s)
- Gavin D’Souza
- Department of Instrumentation and Control Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104 India
| | - N. V. Subba Reddy
- Department of Information Technology, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, Karnataka 560064 India
| | - K. N. Manjunath
- Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104 India
| |
Collapse
|