1. Li H, Chen X, Yang W, Huang J, Sun K, Wang Y, Huang A, Mei L. Global Semantic-Sense Aggregation Network for Salient Object Detection in Remote Sensing Images. Entropy (Basel) 2024;26:445. [PMID: 38920454] [PMCID: PMC11203128] [DOI: 10.3390/e26060445]
Abstract
Salient object detection (SOD) aims to accurately identify significant geographical objects in remote sensing images (RSI), providing reliable support and guidance for extensive geographical information analyses and decisions. However, SOD in RSI faces numerous challenges, including shadow interference, inter-class feature confusion, and unclear target edge contours. Therefore, we designed an effective Global Semantic-aware Aggregation Network (GSANet) to aggregate salient information in RSI. GSANet computes the information entropy of different regions, prioritizing areas with high information entropy as potential target regions and thereby achieving precise localization and semantic understanding of salient objects in remote sensing imagery. Specifically, we propose a Semantic Detail Embedding Module (SDEM), which explores the potential connections among multi-level features, adaptively fusing shallow texture details with deep semantic features, efficiently aggregating the information entropy of salient regions, and enhancing the information content of salient targets. Additionally, we propose a Semantic Perception Fusion Module (SPFM) to analyze the mapping relationships between contextual information and local details, enhancing the perceptual capability for salient objects while suppressing irrelevant information entropy, thereby addressing the semantic dilution of salient objects during up-sampling. Experimental results on two publicly available datasets, ORSSD and EORSSD, demonstrate the strong performance of our method, which achieves 93.91% Sα, 98.36% Eξ, and 89.37% Fβ on the EORSSD dataset.
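To make the fusion idea concrete, here is a minimal PyTorch sketch of the kind of adaptive shallow/deep blending the SDEM describes. The module name, gating design, and channel sizes are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of adaptive shallow/deep feature fusion in the spirit
# of the SDEM; layer names and channel sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Gate predicts a per-pixel weight for blending detail vs. semantics.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )
        self.out = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # Upsample low-resolution semantic features to the detail resolution.
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                             align_corners=False)
        w = self.gate(torch.cat([shallow, deep], dim=1))
        # Convex per-pixel blend: texture detail where w is high, semantics elsewhere.
        return self.out(w * shallow + (1.0 - w) * deep)

fused = AdaptiveFusion(64)(torch.randn(1, 64, 64, 64), torch.randn(1, 64, 8, 8))
```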
Affiliation(s)
- Hongli Li
- School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
- Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
- Xuhui Chen
- School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
- Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
- Wei Yang
- School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430064, China
- Jian Huang
- School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430064, China
- Kaimin Sun
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
- Ying Wang
- School of Information Science and Engineering, Wuchang Shouyi University, Wuhan 430064, China
- Andong Huang
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- Liye Mei
- School of Computer Science, Hubei University of Technology, Wuhan 430068, China
- The Institute of Technological Sciences, Wuhan University, Wuhan 430072, China
2. Li J, Qiao S, Zhao Z, Xie C, Chen X, Xia C. Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff. IEEE Transactions on Image Processing 2023;32:5664-5677. [PMID: 37773905] [DOI: 10.1109/tip.2023.3318959]
Abstract
Existing salient object detection methods often adopt deeper and wider networks for better performance, resulting in heavy computational burden and slow inference. This inspires us to rethink saliency detection to achieve a favorable balance between efficiency and accuracy. To this end, we design a lightweight framework while maintaining competitive accuracy. Specifically, we propose a novel trilateral decoder framework that decouples the U-shape structure into three complementary branches, devised to confront the dilution of semantic context, the loss of spatial structure, and the absence of boundary detail, respectively. As the three branches are fused, the coarse segmentation results are gradually refined in structural detail and boundary quality. Without adding learnable parameters, we further propose a Scale-Adaptive Pooling Module to obtain multi-scale receptive fields. In particular, on the premise of inheriting this framework, we rethink the relationship among accuracy, parameters, and speed via a network depth-width tradeoff. With these considerations, we comprehensively design shallower and narrower models to explore the maximum potential of lightweight SOD. Our models target different application environments: 1) a tiny version, CTD-S (1.7M parameters, 125 FPS), for resource-constrained devices; 2) a fast version, CTD-M (12.6M, 158 FPS), for speed-demanding scenarios; and 3) a standard version, CTD-L (26.5M, 84 FPS), for high-performance platforms. Extensive experiments validate the superiority of our method, which achieves a better efficiency-accuracy balance across five benchmarks.
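A parameter-free multi-scale pooling step of the kind the Scale-Adaptive Pooling Module describes can be sketched in a few lines; the pooling scales below are assumptions for illustration, not the paper's configuration.

```python
# Minimal parameter-free multi-scale pooling sketch, in the spirit of the
# Scale-Adaptive Pooling Module; the pooling scales are assumptions.
import torch
import torch.nn.functional as F

def scale_adaptive_pool(x: torch.Tensor, scales=(1, 2, 4)) -> torch.Tensor:
    """Aggregate multi-scale context without any learnable parameters."""
    h, w = x.shape[-2:]
    out = x
    for s in scales:
        # Pool to a coarser grid, then restore resolution: a larger receptive
        # field at zero parameter cost.
        pooled = F.adaptive_avg_pool2d(x, (max(h // s, 1), max(w // s, 1)))
        out = out + F.interpolate(pooled, size=(h, w), mode="bilinear",
                                  align_corners=False)
    return out / (len(scales) + 1)

y = scale_adaptive_pool(torch.randn(1, 32, 40, 40))
```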
3. Ndayikengurukiye D, Mignotte M. CoSOV1Net: A Cone- and Spatial-Opponent Primary Visual Cortex-Inspired Neural Network for Lightweight Salient Object Detection. Sensors (Basel) 2023;23:6450. [PMID: 37514744] [PMCID: PMC10386563] [DOI: 10.3390/s23146450]
Abstract
Salient object-detection models attempt to mimic the human visual system's ability to select relevant objects in images. To this end, deep neural networks developed on high-end computers have recently achieved high performance. However, developing deep neural network models with the same performance for resource-limited vision sensors or mobile devices remains a challenge. In this work, we propose CoSOV1Net, a novel lightweight salient object-detection neural network model inspired by the cone- and spatial-opponent processes of the primary visual cortex (V1), which inextricably link color and shape in human color perception. Our proposed model is trained from scratch, without using backbones from image classification or other tasks. Experiments on the most widely used and challenging datasets for salient object detection show that CoSOV1Net achieves competitive performance (e.g., Fβ = 0.931 on the ECSSD dataset) with state-of-the-art salient object-detection models while having few parameters (1.14 M), low FLOPS (1.4 G), and high FPS (211.2) on a GPU (Nvidia GeForce RTX 3090 Ti) compared with the state of the art in lightweight and non-lightweight salient object detection. CoSOV1Net is thus a lightweight salient object-detection model that can be adapted to mobile environments and resource-constrained devices.
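For readers unfamiliar with opponent processing, a textbook-style cone-opponent color transform can be computed as below; the exact channel definitions are a common convention assumed here for illustration, not the CoSOV1Net code.

```python
# Illustrative cone-opponent color transform of the kind that inspires
# CoSOV1Net; channel definitions are a textbook-style assumption.
import numpy as np

def opponent_channels(rgb: np.ndarray) -> np.ndarray:
    """rgb: HxWx3 float image in [0, 1] -> HxWx3 opponent representation."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g                      # red-green opponency
    by = b - (r + g) / 2.0          # blue-yellow opponency
    lum = (r + g + b) / 3.0         # achromatic (luminance) channel
    return np.stack([rg, by, lum], axis=-1)

ops = opponent_channels(np.random.rand(64, 64, 3))
```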
Affiliation(s)
- Didier Ndayikengurukiye
- Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montreal, QC H3C 3J7, Canada
- Max Mignotte
- Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montreal, QC H3C 3J7, Canada
4. Xu T, Zhao W, Cai L, Shi X, Wang X. Lightweight saliency detection method for real-time localization of livestock meat bones. Scientific Reports 2023;13:4510. [PMID: 36934170] [PMCID: PMC10024766] [DOI: 10.1038/s41598-023-31551-6]
Abstract
Existing salient object detection networks are large, with many parameters and heavy demands on computational resources, which seriously hinders their application in boning robots. To solve this problem, this paper proposes a lightweight saliency detection algorithm for real-time localization of livestock meat bones. First, a lightweight feature extraction network based on multi-scale attention is constructed in the encoding stage to ensure that sufficient salient object features are extracted with fewer parameters. Second, the fusion of skip connections is introduced in the decoding phase to capture fine-grained and coarse-grained semantics at full scale. Finally, we add a residual refinement module at the end of the backbone network to optimize salient target regions and boundaries. Experimental results on both publicly available datasets and a self-made Pig Leg X-ray (PLX) dataset show that the proposed method ensures first-class detection accuracy with 40 times fewer parameters than conventional models. On the most challenging SOD dataset, the proposed algorithm achieves an Fωβ value of 0.699, and it can effectively segment livestock bones on the self-made PLX dataset. Our model has a detection speed of 5 fps on industrial control equipment.
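The residual refinement step mentioned at the end of the pipeline can be sketched as a small conv stack that learns only a correction to the coarse saliency map; the layer widths here are assumptions, not the paper's design.

```python
# Hedged sketch of a residual refinement module: a small conv stack predicts
# a residual correction to the coarse saliency map. Widths are assumptions.
import torch
import torch.nn as nn

class ResidualRefine(nn.Module):
    def __init__(self, mid: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, 1, 3, padding=1),
        )

    def forward(self, coarse: torch.Tensor) -> torch.Tensor:
        # Only the residual is learned, which mainly sharpens object
        # regions and boundaries of the coarse prediction.
        return torch.sigmoid(coarse + self.body(coarse))

refined = ResidualRefine()(torch.randn(1, 1, 128, 128))
```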
Affiliation(s)
- Tao Xu
- School of Artificial Intelligence, Henan Institute of Science and Technology, Xinxiang, 453003, China
- Weishuo Zhao
- School of Information Engineering, Henan Institute of Science and Technology, Xinxiang, 453003, China
- Lei Cai
- School of Artificial Intelligence, Henan Institute of Science and Technology, Xinxiang, 453003, China
- Xiaoli Shi
- School of Information Engineering, Henan Institute of Science and Technology, Xinxiang, 453003, China
- Xinfa Wang
- School of Information Engineering, Henan Institute of Science and Technology, Xinxiang, 453003, China
5. Li S, Liu F, Jiao L, Liu X, Chen P. Learning Salient Feature for Salient Object Detection Without Labels. IEEE Transactions on Cybernetics 2023;53:1012-1025. [PMID: 36227820] [DOI: 10.1109/tcyb.2022.3209978]
Abstract
Supervised salient object detection (SOD) methods achieve state-of-the-art performance by relying on human-annotated saliency maps, while unsupervised methods attempt to achieve SOD without using any annotations. In unsupervised SOD, obtaining saliency in a completely unsupervised manner is a major challenge; existing unsupervised methods usually obtain saliency by introducing other handcrafted feature-based saliency methods. In general, the location information of salient objects is contained in the feature maps. If the features belonging to salient objects are called salient features, and those that do not belong to salient objects, such as the background, are called nonsalient features, then dividing the feature maps into salient and nonsalient features in an unsupervised way identifies the object at the location of the salient features as the salient object. Based on this motivation, a novel method called learning salient feature (LSF) is proposed, which achieves unsupervised SOD by learning salient features from the data itself. The method takes enhancing salient features and suppressing nonsalient features as its objective. Furthermore, a salient object localization method is proposed to roughly locate the objects where the salient features lie, yielding a salient activation map. Usually, the object in the salient activation map is incomplete and contains substantial noise. To address this issue, a saliency map update strategy is introduced to gradually remove noise and strengthen boundaries. Visualizations of images and their salient activation maps show that our method can effectively learn salient visual objects. Experiments show that we achieve superior unsupervised performance on a series of datasets.
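A salient activation map of the kind described can be illustrated by aggregating feature channels and normalizing per image; this sketch assumes simple channel-mean aggregation and min-max normalization, and is not the LSF training procedure itself.

```python
# Minimal sketch of deriving a salient activation map from feature maps,
# assuming channel-mean aggregation and per-image min-max normalization.
import torch

def salient_activation_map(feats: torch.Tensor) -> torch.Tensor:
    """feats: NxCxHxW deep features -> Nx1xHxW activation map in [0, 1]."""
    act = feats.abs().mean(dim=1, keepdim=True)          # aggregate channels
    flat = act.flatten(1)
    lo = flat.min(dim=1).values.view(-1, 1, 1, 1)
    hi = flat.max(dim=1).values.view(-1, 1, 1, 1)
    return (act - lo) / (hi - lo + 1e-8)                 # normalize per image

amap = salient_activation_map(torch.randn(2, 256, 20, 20))
```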
6. Kamath V, Renuka A. Deep Learning Based Object Detection for Resource Constrained Devices - Systematic Review, Future Trends and Challenges Ahead. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.02.006]
7. Zhou X, Shen K, Weng L, Cong R, Zheng B, Zhang J, Yan C. Edge-Guided Recurrent Positioning Network for Salient Object Detection in Optical Remote Sensing Images. IEEE Transactions on Cybernetics 2023;53:539-552. [PMID: 35417369] [DOI: 10.1109/tcyb.2022.3163152]
Abstract
Optical remote sensing images (RSIs) have been widely used in many applications, and salient object detection (SOD) in optical RSIs is an issue of particular interest. However, due to diverse object types, various object scales, numerous object orientations, and cluttered backgrounds in optical RSIs, the performance of existing SOD models often degrades considerably. Meanwhile, cutting-edge SOD models targeting optical RSIs typically focus on suppressing cluttered backgrounds while neglecting edge information, which is crucial for obtaining precise saliency maps. To address this dilemma, this article proposes an edge-guided recurrent positioning network (ERPNet) to pop out salient objects in optical RSIs, whose key component is the edge-aware position attention unit (EPAU). First, the encoder gives salient objects a good representation, that is, multilevel deep features, which are then delivered into two parallel decoders: 1) an edge extraction part and 2) a feature fusion part. The edge extraction module and the encoder form a U-shape architecture, which not only provides accurate salient edge clues but also ensures the integrality of edge information by additionally deploying intraconnections; that is, edge features can be generated and reinforced by incorporating object features from the encoder. Meanwhile, each decoding step of the feature fusion module provides position attention for salient objects, where position cues are sharpened by the effective edge information and are used to recurrently calibrate the misaligned decoding process. After that, the final saliency map is obtained by fusing all position attention cues. Extensive experiments are conducted on two public optical RSI datasets, and the results show that the proposed ERPNet can accurately and completely pop out salient objects, consistently outperforming state-of-the-art SOD models.
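One way to picture the edge-aware attention step is a decoder feature map recalibrated by an attention map computed from edge features; the module below is a hypothetical sketch, with names and shapes assumed for illustration rather than taken from ERPNet.

```python
# Hypothetical sketch of an edge-aware position attention step in the spirit
# of the EPAU; names and shapes are assumptions.
import torch
import torch.nn as nn

class EdgePositionAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.to_attn = nn.Sequential(
            nn.Conv2d(2 * channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, dec: torch.Tensor, edge: torch.Tensor) -> torch.Tensor:
        # Edge cues sharpen the position attention, which then recalibrates
        # the (possibly misaligned) decoder features.
        attn = self.to_attn(torch.cat([dec, edge], dim=1))
        return dec * attn + dec

out = EdgePositionAttention(64)(torch.randn(1, 64, 32, 32),
                                torch.randn(1, 64, 32, 32))
```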
8. Li G, Liu Z, Zeng D, Lin W, Ling H. Adjacent Context Coordination Network for Salient Object Detection in Optical Remote Sensing Images. IEEE Transactions on Cybernetics 2023;53:526-538. [PMID: 35417367] [DOI: 10.1109/tcyb.2022.3162945]
Abstract
Salient object detection (SOD) in optical remote sensing images (RSIs), or RSI-SOD, is an emerging topic in understanding optical RSIs. However, due to the differences between optical RSIs and natural scene images (NSIs), directly applying NSI-SOD methods to optical RSIs fails to achieve satisfactory results. In this article, we propose a novel adjacent context coordination network (ACCoNet) to explore the coordination of adjacent features in an encoder-decoder architecture for RSI-SOD. Specifically, ACCoNet consists of three parts: 1) an encoder; 2) adjacent context coordination modules (ACCoMs); and 3) a decoder. As the key component of ACCoNet, ACCoM activates the salient regions of the encoder's output features and transmits them to the decoder. ACCoM contains a local branch and two adjacent branches to coordinate multilevel features simultaneously: the local branch highlights salient regions in an adaptive way, while the adjacent branches introduce global information from adjacent levels to enhance salient regions. In addition, to extend the capabilities of the classic decoder block (i.e., several cascaded convolutional layers), we augment it with two bifurcations and propose a bifurcation-aggregation block (BAB) to capture contextual information in the decoder. Extensive experiments on two benchmark datasets demonstrate that the proposed ACCoNet outperforms 22 state-of-the-art methods under nine evaluation metrics and runs at up to 81 fps on a single NVIDIA Titan X GPU. The code and results of our method are available at https://github.com/MathLee/ACCoNet.
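The local-plus-adjacent coordination idea can be sketched as fusing a feature level with resized projections of its two neighbors; the resize-project-sum fusion below is an assumption for illustration, not the ACCoM design itself.

```python
# Minimal sketch of coordinating a feature level with its adjacent levels,
# in the spirit of the ACCoM's local and adjacent branches; fusion details
# (resize, 1x1 projection, sum) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentCoordination(nn.Module):
    def __init__(self, c_prev: int, c_cur: int, c_next: int):
        super().__init__()
        self.p_prev = nn.Conv2d(c_prev, c_cur, 1)   # finer neighbor
        self.p_next = nn.Conv2d(c_next, c_cur, 1)   # coarser neighbor
        self.local = nn.Conv2d(c_cur, c_cur, 3, padding=1)

    def forward(self, f_prev, f_cur, f_next):
        size = f_cur.shape[-2:]
        prev = F.interpolate(self.p_prev(f_prev), size=size, mode="bilinear",
                             align_corners=False)
        nxt = F.interpolate(self.p_next(f_next), size=size, mode="bilinear",
                            align_corners=False)
        # Adjacent levels contribute context that highlights salient regions
        # in the current level's local features.
        return torch.relu(self.local(f_cur) + prev + nxt)

m = AdjacentCoordination(64, 128, 256)
y = m(torch.randn(1, 64, 64, 64), torch.randn(1, 128, 32, 32),
      torch.randn(1, 256, 16, 16))
```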
9. Wu YH, Liu Y, Xu J, Bian JW, Gu YC, Cheng MM. MobileSal: Extremely Efficient RGB-D Salient Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022;44:10261-10269. [PMID: 34898430] [DOI: 10.1109/tpami.2021.3134684]
Abstract
The high computational cost of neural networks has prevented recent successes in RGB-D salient object detection (SOD) from benefiting real-world applications. Hence, this article introduces a novel network, MobileSal, which focuses on efficient RGB-D SOD using mobile networks for deep feature extraction. However, mobile networks are less powerful in feature representation than cumbersome networks. To this end, we observe that the depth information of color images can strengthen feature representations related to SOD if leveraged properly. Therefore, we propose an implicit depth restoration (IDR) technique to strengthen the mobile networks' feature representation capability for RGB-D SOD. IDR is adopted only in the training phase and is omitted during testing, so it is computationally free at inference. In addition, we propose compact pyramid refinement (CPR) for efficient multi-level feature aggregation to derive salient objects with clear boundaries. With IDR and CPR incorporated, MobileSal performs favorably against state-of-the-art methods on six challenging RGB-D SOD datasets with much faster speed (450 fps for an input size of 320×320) and fewer parameters (6.5M). The code is released at https://mmcheng.net/mobilesal.
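The train-time-only character of IDR can be illustrated with an auxiliary head that regresses depth from RGB features during training and is skipped at test time; the head design and L1 loss below are assumptions, not the MobileSal architecture.

```python
# Hedged sketch of the implicit depth restoration (IDR) idea: an auxiliary
# head regresses depth during training only, so it costs nothing at inference.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SODWithIDR(nn.Module):
    def __init__(self, c: int = 64):
        super().__init__()
        self.backbone = nn.Conv2d(3, c, 3, padding=1)   # stand-in encoder
        self.sal_head = nn.Conv2d(c, 1, 1)
        self.idr_head = nn.Conv2d(c, 1, 1)              # used only in training

    def forward(self, rgb, depth=None):
        feat = torch.relu(self.backbone(rgb))
        sal = self.sal_head(feat)
        if self.training and depth is not None:
            # Auxiliary loss: restoring depth strengthens the RGB features.
            idr_loss = F.l1_loss(self.idr_head(feat), depth)
            return sal, idr_loss
        return sal                                      # depth branch skipped at test time

model = SODWithIDR().train()
sal, aux = model(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64))
```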
10. Xu C, Liu X, Zhao W. Attention-guided salient object detection using autoencoder regularization. Applied Intelligence 2022. [DOI: 10.1007/s10489-022-03917-2]
11. Wu YH, Liu Y, Zhang L, Cheng MM, Ren B. EDN: Salient Object Detection via Extremely-Downsampled Network. IEEE Transactions on Image Processing 2022;31:3125-3136. [PMID: 35412981] [DOI: 10.1109/tip.2022.3164550]
Abstract
Recent progress on salient object detection (SOD) mainly benefits from multi-scale learning, where high-level and low-level features collaborate in locating salient objects and discovering fine details, respectively. However, most efforts are devoted to low-level feature learning by fusing multi-scale features or enhancing boundary representations. High-level features, although long proven effective for many other tasks, have barely been studied for SOD. In this paper, we tap into this gap and show that enhancing high-level features is essential for SOD as well. To this end, we introduce an Extremely-Downsampled Network (EDN), which employs an extreme downsampling technique to effectively learn a global view of the whole image, leading to accurate salient object localization. To accomplish better multi-level feature fusion, we construct a Scale-Correlated Pyramid Convolution (SCPC) to build an elegant decoder for recovering object details from the extreme downsampling. Extensive experiments demonstrate that EDN achieves state-of-the-art performance at real-time speed. Our efficient EDN-Lite also achieves competitive performance at 316 fps. Hence, this work is expected to spark new thinking in SOD. Code is available at https://github.com/yuhuan-wu/EDN.
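The core intuition of extreme downsampling can be sketched as squeezing the deepest features to a tiny spatial grid, mixing them, and broadcasting the global context back; the grid size and residual fusion below are illustrative assumptions, not the EDN design.

```python
# Minimal sketch of the extreme-downsampling idea: trade almost all spatial
# detail for global context, then add the context back. Sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExtremeDownsample(nn.Module):
    def __init__(self, channels: int, grid: int = 2):
        super().__init__()
        self.grid = grid
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # Global view: a tiny grid captures whole-image context, which helps
        # localize the salient object as a whole.
        g = F.adaptive_avg_pool2d(x, (self.grid, self.grid))
        g = F.interpolate(self.mix(g), size=(h, w), mode="bilinear",
                          align_corners=False)
        return x + g

y = ExtremeDownsample(128)(torch.randn(1, 128, 20, 20))
```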
12. Guo S. Fundus image segmentation via hierarchical feature learning. Computers in Biology and Medicine 2021;138:104928. [PMID: 34662814] [DOI: 10.1016/j.compbiomed.2021.104928]
Abstract
Fundus Image Segmentation (FIS) is an essential procedure for the automated diagnosis of ophthalmic diseases. Recently, deep fully convolutional networks have been widely used for FIS with state-of-the-art performance. The representative deep model is the U-Net, which follows an encoder-decoder architecture. I believe it is suboptimal for FIS because consecutive pooling operations in the encoder lead to low-resolution representations and the loss of detailed spatial information, which is particularly important for the segmentation of tiny vessels and lesions. Motivated by this, a high-resolution hierarchical network (HHNet) is proposed to learn semantic-rich high-resolution representations while preserving spatial details. Specifically, a High-resolution Feature Learning (HFL) module with increasing dilation rates is first designed to learn high-level, high-resolution representations. The HHNet is then constructed by incorporating three HFL modules and two feature aggregation modules. The HHNet runs in a coarse-to-fine manner, and fine segmentation maps are output at the last level. Extensive experiments were conducted on fundus lesion segmentation, vessel segmentation, and optic cup segmentation. The results reveal that the proposed method shows highly competitive or even superior performance in terms of segmentation quality and computational cost, indicating its potential advantages in clinical application.
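An increasing-dilation conv stack of the kind the HFL module describes grows the receptive field without any pooling, so resolution is preserved; the channel width and dilation rates below are assumptions for illustration.

```python
# Hedged sketch of a high-resolution feature learning block with increasing
# dilation rates: no pooling, so spatial detail is preserved throughout.
import torch
import torch.nn as nn

def hfl_block(channels: int, rates=(1, 2, 4, 8)) -> nn.Sequential:
    layers = []
    for r in rates:
        # padding == dilation keeps the resolution fixed for 3x3 convs.
        layers += [nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                   nn.BatchNorm2d(channels), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

feat = hfl_block(32)(torch.randn(1, 32, 96, 96))  # output stays 96x96
```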
Affiliation(s)
- Song Guo
- School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an, 710055, China
13. Wu YH, Liu Y, Zhang L, Gao W, Cheng MM. Regularized Densely-Connected Pyramid Network for Salient Instance Segmentation. IEEE Transactions on Image Processing 2021;30:3897-3907. [PMID: 33750689] [DOI: 10.1109/tip.2021.3065822]
Abstract
Much of the recent effort on salient object detection (SOD) has been devoted to producing accurate saliency maps without being aware of their instance labels. To this end, we propose a new pipeline for end-to-end salient instance segmentation (SIS) that predicts a class-agnostic mask for each detected salient instance. To better use the rich feature hierarchies in deep networks and enhance the side predictions, we propose regularized dense connections, which attentively promote informative features and suppress non-informative ones from all feature pyramids. A novel multi-level RoIAlign-based decoder is introduced to adaptively aggregate multi-level features for better mask predictions. These strategies can be well encapsulated into the Mask R-CNN pipeline. Extensive experiments on popular benchmarks demonstrate that our design significantly outperforms existing state-of-the-art competitors by 6.3% (58.6% vs. 52.3%) in terms of the AP metric. The code is available at https://github.com/yuhuan-wu/RDPNet.
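Multi-level RoIAlign aggregation can be illustrated by pooling the same instance box from every pyramid level and combining the results; the simple averaging below is an assumption (the paper aggregates adaptively), and uses torchvision's roi_align.

```python
# Illustrative multi-level RoIAlign aggregation: each box is pooled from all
# pyramid levels. The cross-level averaging is an assumption for illustration.
import torch
from torchvision.ops import roi_align

def multilevel_roi_feats(pyramid, boxes, out_size=7, image_size=256):
    """pyramid: list of NxCxHxW tensors; boxes: per-image list of Kx4 boxes."""
    pooled = []
    for level in pyramid:
        scale = level.shape[-1] / image_size     # map image coords to this level
        pooled.append(roi_align(level, boxes, (out_size, out_size),
                                spatial_scale=scale, aligned=True))
    return torch.stack(pooled).mean(dim=0)       # naive cross-level aggregation

pyr = [torch.randn(1, 32, s, s) for s in (64, 32, 16)]
feats = multilevel_roi_feats(pyr, [torch.tensor([[16., 16., 128., 128.]])])
```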
14. Liu Y, Zhang XY, Bian JW, Zhang L, Cheng MM. SAMNet: Stereoscopically Attentive Multi-Scale Network for Lightweight Salient Object Detection. IEEE Transactions on Image Processing 2021;30:3804-3814. [PMID: 33735077] [DOI: 10.1109/tip.2021.3065239]
Abstract
Recent progress on salient object detection (SOD) mostly benefits from the explosive development of Convolutional Neural Networks (CNNs). However, much of the improvement comes with larger network sizes and heavier computation overhead, which, in our view, is not mobile-friendly and thus difficult to deploy in practice. To promote more practical SOD systems, we introduce a novel Stereoscopically Attentive Multi-scale (SAM) module, which adopts a stereoscopic attention mechanism to adaptively fuse features of various scales. Building on this module, we propose an extremely lightweight network, SAMNet, for SOD. Extensive experiments on popular benchmarks demonstrate that SAMNet yields accuracy comparable with state-of-the-art methods while running at a GPU speed of 343 fps and a CPU speed of 5 fps for 336×336 inputs with only 1.33M parameters. Therefore, SAMNet paves a new path toward lightweight SOD. The source code is available on the project page https://mmcheng.net/SAMNet/.
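Attention-weighted fusion across scale branches can be sketched with a softmax attention map predicted from the concatenated branches; the details below are assumptions in the spirit of the SAM module, not its actual design.

```python
# Minimal sketch of attention-weighted multi-scale fusion in the spirit of
# the SAM module; the attention head and fusion details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttnFuse(nn.Module):
    def __init__(self, channels: int, n_scales: int = 3):
        super().__init__()
        self.attn = nn.Conv2d(n_scales * channels, n_scales, 1)

    def forward(self, branches):
        # branches: list of NxCxHxW tensors, one per scale (same resolution).
        w = F.softmax(self.attn(torch.cat(branches, dim=1)), dim=1)  # NxSxHxW
        stacked = torch.stack(branches, dim=1)                       # NxSxCxHxW
        return (w.unsqueeze(2) * stacked).sum(dim=1)                 # NxCxHxW

f = MultiScaleAttnFuse(64)
y = f([torch.randn(1, 64, 28, 28) for _ in range(3)])
```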
15. Wu YH, Gao SH, Mei J, Xu J, Fan DP, Zhang RG, Cheng MM. JCS: An Explainable COVID-19 Diagnosis System by Joint Classification and Segmentation. IEEE Transactions on Image Processing 2021;30:3113-3126. [PMID: 33600316] [DOI: 10.1109/tip.2021.3058783]
Abstract
Recently, the coronavirus disease 2019 (COVID-19) has caused a pandemic in over 200 countries, affecting billions of people. To control the infection, identifying and isolating infected people is the most crucial step. The main diagnostic tool is the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test, yet its sensitivity is not high enough to effectively prevent the pandemic. The chest CT scan provides a valuable complementary tool to the RT-PCR test, and it can identify patients in the early stage with high sensitivity. However, the chest CT scan is usually time-consuming, requiring about 21.5 minutes per case. This paper develops a novel Joint Classification and Segmentation (JCS) system to perform real-time and explainable COVID-19 chest CT diagnosis. To train our JCS system, we construct a large-scale COVID-19 Classification and Segmentation (COVID-CS) dataset, with 144,167 chest CT images of 400 COVID-19 patients and 350 uninfected cases; 3,855 chest CT images of 200 patients are annotated with fine-grained pixel-level labels of opacifications, which are areas of increased attenuation of the lung parenchyma. We have also annotated lesion counts, opacification areas, and locations, benefiting various diagnostic aspects. Extensive experiments demonstrate that the proposed JCS diagnosis system is very efficient for COVID-19 classification and segmentation. It obtains an average sensitivity of 95.0% and a specificity of 93.0% on the classification test set, and a 78.5% Dice score on the segmentation test set of our COVID-CS dataset. The COVID-CS dataset and code are available at https://github.com/yuhuan-wu/JCS.
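The joint classification-and-segmentation layout can be sketched as one shared encoder feeding both an image-level head and a pixel-level head, so the segmentation output can explain the classification decision; all layer choices below are assumptions, not the JCS architecture.

```python
# Hedged sketch of a joint classification-and-segmentation layout: a shared
# encoder feeds a classification head and a segmentation head. All layer
# choices are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointClsSeg(nn.Module):
    def __init__(self, c: int = 32, n_classes: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, c, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.cls_head = nn.Linear(c, n_classes)          # infected vs. not
        self.seg_head = nn.Conv2d(c, 1, 1)               # opacification mask

    def forward(self, ct_slice):
        feat = self.encoder(ct_slice)
        logits = self.cls_head(feat.mean(dim=(2, 3)))    # global pooled features
        mask = F.interpolate(self.seg_head(feat), size=ct_slice.shape[-2:],
                             mode="bilinear", align_corners=False)
        return logits, mask

logits, mask = JointClsSeg()(torch.randn(2, 1, 128, 128))
```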