1
Liu W, Kang X, Duan P, Xie Z, Wei X, Li S. SOSNet: Real-Time Small Object Segmentation via Hierarchical Decoding and Example Mining. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:3071-3083. [PMID: 38090866] [DOI: 10.1109/tnnls.2023.3338732]
Abstract
Real-time semantic segmentation plays an important role in autonomous vehicles. However, most real-time semantic segmentation methods fail to achieve satisfactory performance on small objects, such as cars and sign symbols, because large objects usually contribute more to the segmentation result. To solve this issue, we propose an efficient and effective architecture, termed the small object segmentation network (SOSNet), to improve segmentation performance on small objects. SOSNet works from two perspectives: methodology and data. For the former, we propose a dual-branch hierarchical decoder (DBHD), which serves as a small-object-sensitive segmentation head. The DBHD consists of a top segmentation head that predicts whether a pixel belongs to a small-object class and a bottom head that estimates the pixel's class. In this way, the latent correlation among small objects can be fully exploited. For the latter, we propose a small object example mining (SOEM) algorithm that automatically balances examples between small and large objects. The core idea of SOEM is that most of the hard examples of small-object classes are reserved for training while most of the easy examples of large-object classes are discarded. Experiments on three commonly used datasets show that the proposed SOSNet architecture greatly improves accuracy over existing real-time semantic segmentation methods while maintaining efficiency. The code will be available at https://github.com/StuLiu/SOSNet.
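The SOEM selection rule sketched in the abstract (reserve hard small-object examples, discard easy large-object examples) can be illustrated with a per-pixel loss mask. The function name and the keep-fraction thresholds below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def soem_mask(pixel_loss, labels, small_classes,
              keep_hard_frac=0.9, keep_easy_frac=0.3):
    """Select pixels to keep for loss computation (sketch of small-object
    example mining; the keep fractions are illustrative assumptions)."""
    keep = np.zeros_like(pixel_loss, dtype=bool)
    small = np.isin(labels, list(small_classes))
    # Small-object pixels: reserve most of the hard (high-loss) examples.
    if small.any():
        thr = np.quantile(pixel_loss[small], 1.0 - keep_hard_frac)
        keep |= small & (pixel_loss >= thr)
    # Large-object pixels: ban most of the easy (low-loss) examples.
    large = ~small
    if large.any():
        thr = np.quantile(pixel_loss[large], 1.0 - keep_easy_frac)
        keep |= large & (pixel_loss >= thr)
    return keep
```

The mask would then gate the per-pixel loss before averaging, so small-object gradients are not drowned out by abundant easy large-object pixels.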
2
Sun X, Yao F, Ding C. Modeling High-Order Relationships: Brain-Inspired Hypergraph-Induced Multimodal-Multitask Framework for Semantic Comprehension. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12142-12156. [PMID: 37028292] [DOI: 10.1109/tnnls.2023.3252359]
Abstract
Semantic comprehension aims to reasonably reproduce people's real intentions or thoughts, e.g., sentiment, humor, sarcasm, motivation, and offensiveness, from multiple modalities. It can be instantiated as a multimodal-oriented multitask classification problem and applied to scenarios such as online public opinion supervision and political stance analysis. Previous methods generally employ multimodal learning alone to deal with varied modalities or solely exploit multitask learning to solve various tasks; few unify both in an integrated framework. Moreover, multimodal-multitask cooperative learning inevitably encounters the challenge of modeling high-order relationships, i.e., intramodal, intermodal, and intertask relationships. Research in brain science shows that the human brain achieves multimodal perception and multitask cognition for semantic comprehension via decomposing, associating, and synthesizing processes. Establishing a brain-inspired semantic comprehension framework that bridges the gap between multimodal and multitask learning is thus the primary motivation of this work. Motivated by the strength of hypergraphs in modeling high-order relations, in this article we propose a hypergraph-induced multimodal-multitask (HIMM) network for semantic comprehension. HIMM incorporates monomodal, multimodal, and multitask hypergraph networks that, respectively, mimic the decomposing, associating, and synthesizing processes to tackle the intramodal, intermodal, and intertask relationships. Furthermore, temporal and spatial hypergraph constructions are designed to model relationships in modalities with sequential and spatial structures, respectively. We also elaborate a hypergraph alternating-update algorithm that ensures vertices aggregate to update hyperedges and hyperedges converge to update their connected vertices.
Experiments on the dataset with two modalities and five tasks verify the effectiveness of HIMM on semantic comprehension.
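The alternating update described above, where vertices aggregate to update hyperedges and hyperedges in turn update their connected vertices, can be sketched with a vertex-edge incidence matrix. This is a minimal degree-normalized version under assumed names, not the paper's learned operator:

```python
import numpy as np

def hypergraph_step(X, H):
    """One alternating hypergraph update (illustrative sketch).
    X: (num_vertices, dim) vertex features.
    H: (num_vertices, num_edges) binary incidence matrix."""
    deg_e = H.sum(axis=0, keepdims=True)   # vertices per hyperedge
    E = (H.T @ X) / deg_e.T                # vertices aggregate into hyperedge features
    deg_v = H.sum(axis=1, keepdims=True)   # hyperedges per vertex
    X_new = (H @ E) / deg_v                # hyperedges update their connected vertices
    return X_new, E
```

Iterating this step lets information propagate through shared hyperedges, which is how high-order (more-than-pairwise) relations are mixed into each vertex's representation.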
3
Lin Q, Tan W, Cai S, Yan B, Li J, Zhong Y. Lesion-Decoupling-Based Segmentation With Large-Scale Colon and Esophageal Datasets for Early Cancer Diagnosis. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:11142-11156. [PMID: 37028330] [DOI: 10.1109/tnnls.2023.3248804]
Abstract
Lesions of early cancers often appear flat, small, and isochromatic in medical endoscopy images, which makes them difficult to capture. By analyzing the differences between the internal and external features of the lesion area, we propose a lesion-decoupling-based segmentation (LDS) network to assist early cancer diagnosis. We introduce a plug-and-play module, the self-sampling similar feature disentangling module (FDM), to obtain accurate lesion boundaries. We then propose a feature separation loss (FSL) to separate pathological features from normal ones. Moreover, since physicians make diagnoses with multimodal data, we propose a multimodal cooperative segmentation network that takes two modalities as input: white-light images (WLIs) and narrow-band images (NBIs). Our FDM and FSL show good performance for both single-modal and multimodal segmentation. Extensive experiments on five backbones show that FDM and FSL can be easily applied to different backbones for a significant improvement in lesion segmentation accuracy, with a maximum gain in mean Intersection over Union (mIoU) of 4.58. For colonoscopy, we achieve an mIoU of up to 91.49 on our Dataset A and 84.41 on the three public datasets. For esophagoscopy, the best mIoU is 64.32 on the WLI dataset and 66.31 on the NBI dataset.
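A separation objective of the kind the FSL pursues, pushing pathological features away from normal ones, can be sketched as a centroid-contrast loss. This is a generic stand-in under assumed names and an assumed margin hyperparameter, not the paper's exact FSL:

```python
import numpy as np

def feature_separation_loss(feats, lesion_mask, margin=1.0):
    """Illustrative contrast-style separation loss (not the paper's FSL):
    pull features toward their class centroid, and push the lesion and
    normal centroids at least `margin` apart."""
    lesion, normal = feats[lesion_mask], feats[~lesion_mask]
    c_l, c_n = lesion.mean(axis=0), normal.mean(axis=0)
    # Intra-class compactness: mean squared distance to each centroid.
    compact = (((lesion - c_l) ** 2).sum(axis=1).mean()
               + ((normal - c_n) ** 2).sum(axis=1).mean())
    # Inter-class separation: hinge penalty when centroids are too close.
    gap = np.linalg.norm(c_l - c_n)
    return compact + max(0.0, margin - gap) ** 2
```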
4
Wang Z, Yang L, Sun T, Yan W. Fusion PCAM R-CNN of Automatic Segmentation for Magnetic Flux Leakage Defects. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:11424-11435. [PMID: 37027265] [DOI: 10.1109/tnnls.2023.3261363]
Abstract
Magnetic flux leakage (MFL) detection technology plays an important role in long-distance oil pipeline inspection, and automatic segmentation of defect images is crucial for MFL detection. At present, accurate segmentation of small defects remains a difficult problem. In contrast to state-of-the-art MFL detection methods based on convolutional neural networks (CNNs), our study devises an optimization method that integrates a mask region-based CNN (Mask R-CNN) with an information entropy constraint (IEC). Specifically, principal component analysis (PCA) is utilized to improve the feature learning and segmentation ability of the convolution kernels, and a similarity constraint rule based on information entropy is inserted into the convolutional layers of the Mask R-CNN. The network optimizes convolution kernels with similar weights or higher similarity, while the PCA step reduces the dimensionality of the feature maps to reconstruct the original feature vectors. As a result, feature extraction for MFL defects is optimized at the convolution-kernel level. The results can be applied in the field of MFL detection.
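The PCA step described above, reducing the dimensionality of kernel features and reconstructing the original vectors, can be sketched on a bank of flattened convolution kernels. This is a generic PCA projection and reconstruction, not the paper's exact integration into Mask R-CNN:

```python
import numpy as np

def pca_reconstruct_kernels(kernels, k):
    """Project flattened conv kernels onto the top-k principal components
    and reconstruct them (a rough stand-in for the PCA step).
    kernels: array of shape (num_kernels, h, w)."""
    flat = kernels.reshape(kernels.shape[0], -1)
    mu = flat.mean(axis=0)
    # SVD of the centered kernel matrix gives the principal directions.
    U, S, Vt = np.linalg.svd(flat - mu, full_matrices=False)
    recon = (U[:, :k] * S[:k]) @ Vt[:k] + mu
    return recon.reshape(kernels.shape)
```

When the kernel bank is well approximated by k directions, the reconstruction is near-lossless; otherwise it acts as a low-rank regularizer on the kernels.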
5
Liu R, Huang ZA, Hu Y, Zhu Z, Wong KC, Tan KC. Attention-Like Multimodality Fusion With Data Augmentation for Diagnosis of Mental Disorders Using MRI. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:7627-7641. [PMID: 36374900] [DOI: 10.1109/tnnls.2022.3219551]
Abstract
The globally rising prevalence of mental disorders leads to shortfalls in timely diagnosis and therapy, prolonging patients' suffering. Facing such an urgent public health problem, professional efforts based on symptom criteria are severely overstretched. Recently, successful applications of computer-aided diagnosis approaches have provided timely opportunities to relieve the strain on healthcare services. In particular, multimodal representation learning is gaining increasing attention thanks to the high-temporal- and high-spatial-resolution information extracted by neuroimaging fusion. In this work, we propose an efficient multimodality fusion framework that identifies multiple mental disorders from the combination of functional and structural magnetic resonance imaging. A multioutput conditional generative adversarial network (GAN) is developed to address the scarcity of multimodal data through augmentation. On the augmented training data, a multiheaded gating fusion model is proposed for classification by extracting complementary features across modalities. Experiments demonstrate that the proposed model achieves robust accuracies of 75.1 ± 1.5%, 72.9 ± 1.1%, and 87.2 ± 1.5% for autism spectrum disorder (ASD), attention deficit/hyperactivity disorder, and schizophrenia, respectively. In addition, the interpretability of our model is expected to enable the identification of notable neuropathological diagnostic biomarkers, leading to well-informed therapeutic decisions.
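The gating fusion mentioned above can be illustrated with a single gating head that weighs functional against structural MRI features element-wise. `Wg` and `bg` stand in for learned parameters, and the whole sketch is an assumption about the general mechanism rather than the paper's model:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_func, f_struct, Wg, bg):
    """One gating head (sketch): a learned gate decides, per feature
    dimension, how much to trust the functional vs structural modality."""
    z = np.concatenate([f_func, f_struct])   # joint view of both modalities
    g = sigmoid(Wg @ z + bg)                 # per-dimension gate in (0, 1)
    return g * f_func + (1.0 - g) * f_struct
```

A multiheaded variant would run several such heads with separate parameters and concatenate their outputs, letting different heads specialize in different complementary feature patterns.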
6
Zhao X, Zang D, Wang S, Shen Z, Xuan K, Wei Z, Wang Z, Zheng R, Wu X, Li Z, Wang Q, Qi Z, Zhang L. sTBI-GAN: An adversarial learning approach for data synthesis on traumatic brain segmentation. Comput Med Imaging Graph 2024; 112:102325. [PMID: 38228021] [DOI: 10.1016/j.compmedimag.2024.102325]
Abstract
Automatic brain segmentation of magnetic resonance images (MRIs) from severe traumatic brain injury (sTBI) patients is critical for brain abnormality assessment and brain network analysis. Constructing an sTBI brain segmentation model requires manually annotated MR scans of sTBI patients, which is challenging because it is impractical to obtain sufficient annotations for sTBI images with large deformations and lesion erosion. Data augmentation can alleviate the issue of limited training samples; however, conventional strategies such as spatial and intensity transformations cannot synthesize the deformations and lesions of traumatic brains, which limits the performance of the subsequent segmentation task. To address these issues, we propose a medical image inpainting model, sTBI-GAN, that synthesizes labeled sTBI MR scans by adversarial inpainting. The main strength of sTBI-GAN is that it generates sTBI images and the corresponding labels simultaneously, which previous inpainting methods for medical images have not achieved. We first generate the inpainted image in a coarse-to-fine manner under the guidance of edge information, and then use the synthesized MR image as the prior for label inpainting. Furthermore, we introduce a registration-based template augmentation pipeline to increase the diversity of the synthesized image pairs and enhance the capacity of data augmentation. Experimental results show that sTBI-GAN synthesizes high-quality labeled sTBI images, which greatly improves 2D and 3D traumatic brain segmentation performance compared with the alternatives. Code is available at .
Affiliation(s)
- Xiangyu Zhao
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Di Zang
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China; National Center for Neurological Disorders, Shanghai, China; Shanghai Key Laboratory of Brain Function and Restoration and Neural Regeneration, Shanghai, China; State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences and Institutes of Brain Science, Fudan University, China
- Sheng Wang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Zhenrong Shen
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Kai Xuan
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Zeyu Wei
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China; National Center for Neurological Disorders, Shanghai, China; Shanghai Key Laboratory of Brain Function and Restoration and Neural Regeneration, Shanghai, China; State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences and Institutes of Brain Science, Fudan University, China
- Zhe Wang
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China; National Center for Neurological Disorders, Shanghai, China; Shanghai Key Laboratory of Brain Function and Restoration and Neural Regeneration, Shanghai, China; State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences and Institutes of Brain Science, Fudan University, China
- Ruizhe Zheng
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China; National Center for Neurological Disorders, Shanghai, China; Shanghai Key Laboratory of Brain Function and Restoration and Neural Regeneration, Shanghai, China; State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences and Institutes of Brain Science, Fudan University, China
- Xuehai Wu
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China; National Center for Neurological Disorders, Shanghai, China; Shanghai Key Laboratory of Brain Function and Restoration and Neural Regeneration, Shanghai, China; State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences and Institutes of Brain Science, Fudan University, China
- Zheren Li
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Qian Wang
- School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Zengxin Qi
- Department of Neurosurgery, Huashan Hospital, Shanghai Medical College, Fudan University, Shanghai, China; National Center for Neurological Disorders, Shanghai, China; Shanghai Key Laboratory of Brain Function and Restoration and Neural Regeneration, Shanghai, China; State Key Laboratory of Medical Neurobiology and MOE Frontiers Center for Brain Science, School of Basic Medical Sciences and Institutes of Brain Science, Fudan University, China
- Lichi Zhang
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
7
Wei J, Wu Z, Wang L, Bui TD, Qu L, Yap PT, Xia Y, Li G, Shen D. A cascaded nested network for 3T brain MR image segmentation guided by 7T labeling. Pattern Recognition 2022; 124:108420. [PMID: 38469076] [PMCID: PMC10927017] [DOI: 10.1016/j.patcog.2021.108420]
Abstract
Accurate segmentation of the brain into gray matter, white matter, and cerebrospinal fluid using magnetic resonance (MR) imaging is critical for the visualization and quantification of brain anatomy. Compared to 3T MR images, 7T MR images exhibit higher tissue contrast, which contributes to accurate tissue delineation for training segmentation models. In this paper, we propose a cascaded nested network (CaNes-Net) for segmenting 3T brain MR images, trained with tissue labels delineated from the corresponding 7T images. We first train a nested network (Nes-Net) for a rough segmentation. A second Nes-Net then uses tissue-specific geodesic distance maps as contextual information to refine the segmentation. This process is iterated to build CaNes-Net as a cascade of Nes-Net modules that gradually refine the segmentation. To alleviate the misalignment between 3T images and the corresponding 7T MR images, we incorporate a correlation coefficient map so that well-aligned voxels play a more important role in supervising the training process. We compared CaNes-Net with the SPM and FSL tools, as well as four deep-learning models, on 18 adult subjects and the ADNI dataset. Our results indicate that CaNes-Net reduces segmentation errors caused by the misalignment and substantially improves segmentation accuracy over the competing methods.
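The correlation-coefficient weighting described above can be sketched as a per-voxel weighted cross-entropy, where a precomputed 3T-7T correlation map scales each voxel's contribution so well-aligned voxels dominate supervision. The function name is illustrative, and the map is assumed to be clipped to [0, 1]:

```python
import numpy as np

def cc_weighted_ce(pred_logp, target, cc_map):
    """Correlation-weighted cross-entropy (sketch).
    pred_logp: (num_voxels, num_classes) log-probabilities.
    target:    (num_voxels,) integer tissue labels from the 7T delineation.
    cc_map:    (num_voxels,) 3T-7T correlation weights in [0, 1]."""
    ce = -pred_logp[np.arange(target.size), target]  # per-voxel cross-entropy
    # Poorly aligned voxels (low correlation) contribute little to training.
    return (cc_map * ce).sum() / (cc_map.sum() + 1e-8)
```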
Affiliation(s)
- Jie Wei
- National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an 710072, China
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Zhengwang Wu
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Li Wang
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Toan Duc Bui
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Liangqiong Qu
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Pew-Thian Yap
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yong Xia
- National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an 710072, China
- Gang Li
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Dinggang Shen
- School of Biomedical Engineering, ShanghaiTech University, Shanghai 201210, China
- Shanghai United Imaging Intelligence Co., Ltd., Shanghai 200232, China