1. Chen J, Cong R, Luo Y, Ip HHS, Kwong S. Replay Without Saving: Prototype Derivation and Distribution Rebalance for Class-Incremental Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:4699-4716. PMID: 40031667. DOI: 10.1109/tpami.2025.3545966.
Abstract
Research on class-incremental semantic segmentation (CISS) seeks to enhance semantic segmentation methods by enabling the progressive learning of new classes while preserving knowledge of previously learned ones. A significant yet often neglected challenge in this domain is class imbalance. In CISS, each task focuses on different foreground classes, and the training set for each task exclusively comprises images that contain these currently focused classes. This results in an overrepresentation of these classes within the single-task training set, leading to a classification bias towards them. To address this issue, we propose a novel CISS method named STAR, whose core principle is to reintegrate the missing proportions of previous classes into the current single-task training samples by replaying their prototypes. Moreover, we develop a prototype derivation technique that enables past-class prototypes to be derived by integrating the recognition patterns of the classifiers and the extraction patterns of the feature extractor. With this technique, replay can be accomplished without using any storage to save prototypes. Complementing our method, we devise two loss functions to enforce cross-task feature constraints: the Old-Class Features Maintaining (OCFM) loss and the Similarity-Aware Discriminative (SAD) loss. The OCFM loss is designed to stabilize the feature space of old classes, thus preserving previously acquired knowledge without compromising the ability to learn new classes. The SAD loss aims to enhance feature distinctions between similar old and new class pairs, minimizing potential confusion. Our experiments on two public datasets, Pascal VOC 2012 and ADE20K, demonstrate that STAR achieves state-of-the-art performance.
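To make the replay idea above concrete, here is a minimal PyTorch-style sketch of storage-free prototype replay plus an OCFM-style feature constraint. It assumes, purely for illustration, that a past-class prototype can be read off the frozen old classifier's weight vector and that old-class regions are known from a mask; the function names (derive_prototype, replay_loss, ocfm_loss) are hypothetical and do not reproduce the exact STAR formulation.

```python
import torch
import torch.nn.functional as F

def derive_prototype(old_classifier_weight: torch.Tensor, cls_id: int) -> torch.Tensor:
    # Hypothetical surrogate: treat the old classifier's weight vector for a past
    # class as that class's feature prototype (no stored exemplars or features).
    return old_classifier_weight[cls_id].detach()

def replay_loss(new_classifier: torch.nn.Linear, prototypes: torch.Tensor,
                old_class_ids: torch.Tensor) -> torch.Tensor:
    # Classify the derived prototypes with the current classifier so past classes
    # keep a presence in the otherwise imbalanced single-task training signal.
    logits = new_classifier(prototypes)                 # (num_old, num_all_classes)
    return F.cross_entropy(logits, old_class_ids)

def ocfm_loss(feat_new: torch.Tensor, feat_old: torch.Tensor,
              old_class_mask: torch.Tensor) -> torch.Tensor:
    # OCFM-style constraint: keep features at old-class pixels close to those of
    # the frozen old extractor, leaving new-class regions free to adapt.
    diff = (feat_new - feat_old.detach()) ** 2          # (B, C, H, W)
    return (diff * old_class_mask.unsqueeze(1)).mean()
```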
2. Ni Z, Xiao R, Yang W, Wang H, Wang Z, Xiang L, Sun L. M2Trans: Multi-Modal Regularized Coarse-to-Fine Transformer for Ultrasound Image Super-Resolution. IEEE J Biomed Health Inform 2025; 29:3112-3123. PMID: 39226206. DOI: 10.1109/jbhi.2024.3454068.
Abstract
Ultrasound image super-resolution (SR) aims to transform low-resolution images into high-resolution ones, thereby restoring intricate details crucial for improved diagnostic accuracy. However, prevailing methods relying solely on image modality guidance and pixel-wise loss functions struggle to capture the distinct characteristics of medical images, such as unique texture patterns and specific colors harboring critical diagnostic information. To overcome these challenges, this paper introduces the Multi-Modal Regularized Coarse-to-Fine Transformer (M2Trans) for ultrasound image SR. By integrating the text modality, we establish joint image-text guidance during training, leveraging the medical CLIP model to incorporate richer priors from text descriptions into the SR optimization process, enhancing detail, structure, and semantic recovery. Furthermore, we propose a novel coarse-to-fine transformer comprising multiple branches infused with self-attention and frequency transforms to efficiently capture signal dependencies across different scales. Extensive experimental results demonstrate significant improvements over state-of-the-art methods on benchmark datasets, including CCA-US, US-CASE, and our newly created dataset MMUS1K, with minimum improvements of 0.17 dB, 0.30 dB, and 0.28 dB in PSNR, respectively.
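As a rough illustration of the joint image-text guidance described above, the sketch below adds a CLIP-style text regularizer to a standard pixel loss. The clip_model handle with encode_image/encode_text methods, the weighting factor alpha, and the L1 pixel term are assumptions for illustration rather than the published M2Trans training recipe.

```python
import torch
import torch.nn.functional as F

def sr_training_loss(sr_out, hr_target, clip_model, text_tokens, alpha=0.01):
    # Standard pixel-wise reconstruction term.
    pixel_loss = F.l1_loss(sr_out, hr_target)

    # Text-guided regularizer: pull the SR image embedding toward the embedding
    # of its clinical text description in the shared CLIP space.
    img_emb = F.normalize(clip_model.encode_image(sr_out), dim=-1)
    txt_emb = F.normalize(clip_model.encode_text(text_tokens), dim=-1)
    clip_loss = 1.0 - (img_emb * txt_emb).sum(dim=-1).mean()

    return pixel_loss + alpha * clip_loss
```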
3. Xie C, Fei L, Tao H, Hu Y, Zhou W, Hoe JT, Hu W, Tan YP. Residual Quotient Learning for Zero-Reference Low-Light Image Enhancement. IEEE Transactions on Image Processing 2024; PP:365-378. PMID: 40030647. DOI: 10.1109/tip.2024.3519997.
Abstract
Recently, neural networks have become the dominant approach to low-light image enhancement (LLIE), with at least one-third of them adopting a Retinex-related architecture. However, through in-depth analysis, we contend that this most widely accepted LLIE structure is suboptimal, particularly when addressing the non-uniform illumination commonly observed in natural images. In this paper, we present a novel variant learning framework, termed residual quotient learning, to substantially alleviate this issue. Instead of following the existing Retinex-related decomposition-enhancement-reconstruction process, our basic idea is to explicitly reformulate the light enhancement task as adaptively predicting the latent quotient with reference to the original low-light input in a residual-learning fashion. By leveraging the proposed residual quotient learning, we develop a lightweight yet effective network called ResQ-Net. This network features enhanced non-uniform illumination modeling capabilities, making it more suitable for real-world LLIE tasks. Moreover, due to its well-designed structure and reference-free loss function, ResQ-Net is flexible in training, as it allows for zero-reference optimization, which further enhances the generalization and adaptability of our entire framework. Extensive experiments on various benchmark datasets demonstrate the merits and effectiveness of the proposed residual quotient learning, and our trained ResQ-Net outperforms state-of-the-art methods both qualitatively and quantitatively. Furthermore, a practical application in dark face detection is explored, and the preliminary results confirm the potential and feasibility of our method in real-world scenarios.
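One plausible reading of residual quotient learning, sketched below under stated assumptions: the network predicts a residual over an identity quotient map, and the enhanced image is the low-light input scaled by that quotient, so the untrained model starts near the identity mapping. The backbone is a placeholder and the zero-reference losses are omitted; this is not the actual ResQ-Net.

```python
import torch
import torch.nn as nn

class ResidualQuotientEnhancer(nn.Module):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # any image-to-image network, 3 -> 3 channels

    def forward(self, low: torch.Tensor) -> torch.Tensor:
        # Predict the quotient as 1 + residual, so the enhancement starts from
        # the identity mapping and learns only the deviation from it.
        quotient = 1.0 + self.backbone(low)
        enhanced = low * quotient
        return enhanced.clamp(0.0, 1.0)
```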
4. Chen C, Ma G, Song W, Li S, Hao A, Qin H. Saliency-Free and Aesthetic-Aware Panoramic Video Navigation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; PP:2037-2054. PMID: 40030675. DOI: 10.1109/tpami.2024.3516874.
Abstract
Most existing panoramic video navigation approaches are saliency-driven: off-the-shelf saliency detection tools are directly employed to help localize video content that should be incorporated into the navigation path. In view of the dilemma faced by our research community, we reconsider whether such "saliency clues" are really appropriate for the panoramic video navigation task. According to our in-depth investigation, we argue that saliency clues alone cannot generate a satisfying navigation path: the path fails to represent the given panoramic video well, and the views along it also have low aesthetic quality. In this paper, we present a brand-new navigation paradigm. Although our model is still trained on eye fixations, our methodology additionally enables the trained model to perceive how "meaningful" the given panoramic video content is. Outwardly, the proposed approach is saliency-free; inwardly, it is developed from saliency but biased toward being "meaningful-driven", so it can generate a navigation path with more appropriate content coverage. In addition, this paper makes the first attempt to devise an unsupervised learning scheme that ensures all localized meaningful views in the navigation path have high aesthetics, so the navigation path generated by our approach can also bring users an enjoyable viewing experience. As this is a new topic in its infancy, we have devised a series of quantitative evaluation schemes, including objective verifications and subjective user studies. These innovative attempts have great potential to inspire and promote this research field in the near future.
5. Liu L, Jan H, Tang C, He H, Zhang L, Lei Z. Dual-channel lightweight GAN for enhancing color retinal images with noise suppression and structural protection. Journal of the Optical Society of America A, Optics, Image Science, and Vision 2024; 41:1948-1958. PMID: 39889019. DOI: 10.1364/josaa.530601.
Abstract
Suppressing noise while maintaining detailed structure has long been a challenging problem in image enhancement, especially for color retinal images. In this paper, a dual-channel lightweight GAN named dilated shuffle generative adversarial network (DS-GAN) is proposed to solve this problem. The lightweight generator consists of an RB branch used for the red and blue channels and a GN branch used for the green channel; the branch outputs are then fused by channel concatenation to generate enhanced images. The RB branch cascades six identical RB-enhanced modules and adds skip connections, and the GN branch has a similar structure. The generator simultaneously leverages the local context extraction capability of normal convolution and the global information extraction capability of dilated convolution, and it facilitates the fusion and communication of feature information between channels through channel shuffle. Additionally, we utilize the lightweight image classification model ShuffleNetV2 as the discriminator to distinguish enhanced images from the corresponding labels. We also construct a dataset for color retinal image enhancement using traditional methods, and we design a hybrid loss function combining MS-SSIM and perceptual loss to train the generator. With the proposed dataset and loss function, we train DS-GAN successfully. We test our method on four publicly available datasets (Messidor, DIARETDB0, DRIVE, and FIRE) and a clinical dataset from the Tianjin Eye Hospital (China), and compare it with six existing image enhancement methods. The results show that the proposed method can simultaneously suppress noise, preserve structure, and enhance contrast in color retinal image enhancement, and it achieves better results than the compared methods in all cases. Furthermore, the model has fewer parameters, which opens the possibility of real-time image enhancement on portable devices.
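The dual-branch channel routing described above can be pictured with the short sketch below: red and blue channels pass through one branch, the green channel through another, and the outputs are concatenated back into an RGB image. The branch modules are placeholders, not the published RB/GN-enhanced blocks of DS-GAN.

```python
import torch
import torch.nn as nn

class DualChannelGenerator(nn.Module):
    def __init__(self, rb_branch: nn.Module, gn_branch: nn.Module):
        super().__init__()
        self.rb_branch = rb_branch  # operates on a 2-channel (R, B) input
        self.gn_branch = gn_branch  # operates on a 1-channel (G) input

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        r, g, b = rgb[:, 0:1], rgb[:, 1:2], rgb[:, 2:3]
        rb_out = self.rb_branch(torch.cat([r, b], dim=1))  # (B, 2, H, W)
        g_out = self.gn_branch(g)                           # (B, 1, H, W)
        # Reassemble channels in R, G, B order (the concatenation fusion above).
        return torch.cat([rb_out[:, 0:1], g_out, rb_out[:, 1:2]], dim=1)
```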
6. Jia Y, Yu W, Chen G, Zhao L. Nighttime road scene image enhancement based on cycle-consistent generative adversarial network. Sci Rep 2024; 14:14375. PMID: 38909068. PMCID: PMC11193765. DOI: 10.1038/s41598-024-65270-3.
Abstract
Images of nighttime road scenes are often affected by contrast distortion, loss of detailed information, and a significant amount of noise, which can degrade the accuracy of segmentation and object detection in such scenes. To address this issue, a cycle-consistent generative adversarial network is proposed to improve the quality of nighttime road scene images. The network includes two generative networks with identical structures and two adversarial networks with identical structures. Each generative network comprises an encoder and a corresponding decoder. A context feature extraction module is designed as the foundational element of the encoder-decoder network to capture more contextual semantic information with different receptive fields, and a receptive field residual module is designed to enlarge the receptive field in the encoder. An illumination attention module is inserted between the encoder and decoder to transfer critical features extracted by the encoder to the decoder. The network also includes a multiscale discriminative network to better discriminate whether an image is a real high-quality image or a generated one. Additionally, an improved loss function is proposed to enhance the efficacy of image enhancement. Compared to state-of-the-art methods, the proposed approach achieves the highest performance in enhancing nighttime images, making them clearer and more natural.
Affiliation(s)
- Yanfei Jia: College of Electrical and Information Engineering, Beihua University, Jilin, 132013, China
- Wenshuo Yu: College of Electrical Engineering, Northeast Electric Power University, Jilin, 132012, China
- Guangda Chen: College of Electrical and Information Engineering, Beihua University, Jilin, 132013, China
- Liquan Zhao: College of Electrical Engineering, Northeast Electric Power University, Jilin, 132012, China
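As a loose illustration of the context feature extraction idea in entry 6 above, the sketch below builds a block from parallel dilated convolutions with different receptive fields and a residual connection. The channel counts and dilation rates are arbitrary choices, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ContextFeatureBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Parallel branches with increasing dilation = increasing receptive field.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Capture contextual information at several receptive fields, then fuse.
        feats = [self.act(branch(x)) for branch in self.branches]
        return x + self.fuse(torch.cat(feats, dim=1))  # residual connection
```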
7. Cao Y, Min X, Sun W, Zhai G. Attention-Guided Neural Networks for Full-Reference and No-Reference Audio-Visual Quality Assessment. IEEE Transactions on Image Processing 2023; 32:1882-1896. PMID: 37030730. DOI: 10.1109/tip.2023.3251695.
Abstract
With the popularity of the mobile Internet, audio and video (A/V) have become the main way people entertain themselves and socialize daily. However, to reduce the cost of media storage and transmission, A/V signals are compressed by service providers before being transmitted to end-users, which inevitably introduces distortions and degrades the end-user's Quality of Experience (QoE). This motivates us to research objective audio-visual quality assessment (AVQA). In the field of AVQA, most previous works focus only on single-mode audio or visual signals, ignoring the fact that the perceptual quality experienced by users depends on both the audio and video signals. Therefore, we propose an objective AVQA architecture for multi-mode signals based on attentional neural networks. Specifically, we first utilize an attention prediction model to extract the salient regions of video frames. Then, a pre-trained convolutional neural network is used to extract short-time features of the salient regions and the corresponding audio signals. Next, the short-time features are fed into Gated Recurrent Unit (GRU) networks to model the temporal relationship between adjacent frames. Finally, fully connected layers are utilized to fuse the temporally related A/V features modeled by the GRU networks into the final quality score. The proposed architecture is flexible and can be applied to both full-reference and no-reference AVQA. Experimental results on the LIVE-SJTU Database and UnB-AVC Database demonstrate that our model outperforms state-of-the-art AVQA methods. The code of the proposed method will be made publicly available to promote the development of the field of AVQA.
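The pipeline described above (salient-region CNN features and audio features fed to GRUs, then fused by fully connected layers) can be condensed into the sketch below. Feature dimensions, layer sizes, and the use of the final GRU hidden state are illustrative assumptions rather than the published architecture.

```python
import torch
import torch.nn as nn

class AVQualityRegressor(nn.Module):
    def __init__(self, vis_dim=2048, aud_dim=128, hidden=256):
        super().__init__()
        self.vis_gru = nn.GRU(vis_dim, hidden, batch_first=True)
        self.aud_gru = nn.GRU(aud_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, vis_feats: torch.Tensor, aud_feats: torch.Tensor) -> torch.Tensor:
        # vis_feats: (B, T, vis_dim) CNN features of salient frame regions
        # aud_feats: (B, T, aud_dim) short-time audio features
        _, v_last = self.vis_gru(vis_feats)   # (1, B, hidden) final hidden state
        _, a_last = self.aud_gru(aud_feats)   # (1, B, hidden) final hidden state
        fused = torch.cat([v_last.squeeze(0), a_last.squeeze(0)], dim=-1)
        return self.head(fused).squeeze(-1)   # predicted quality score per clip
```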
8. Khan RA, Luo Y, Wu FX. Multi-level GAN based enhanced CT scans for liver cancer diagnosis. Biomed Signal Process Control 2023. DOI: 10.1016/j.bspc.2022.104450.
9. Li C, Guo C, Han L, Jiang J, Cheng MM, Gu J, Loy CC. Low-Light Image and Video Enhancement Using Deep Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:9396-9416. PMID: 34752382. DOI: 10.1109/tpami.2021.3126387.
Abstract
Low-light image enhancement (LLIE) aims at improving the perception or interpretability of an image captured in an environment with poor illumination. Recent advances in this area are dominated by deep learning-based solutions, in which many learning strategies, network structures, loss functions, training datasets, and other components have been employed. In this paper, we provide a comprehensive survey covering various aspects ranging from algorithm taxonomy to unsolved open issues. To examine the generalization of existing methods, we propose a low-light image and video dataset in which the images and videos are taken by the cameras of different mobile phones under diverse illumination conditions. Besides, for the first time, we provide a unified online platform that covers many popular LLIE methods, whose results can be produced through a user-friendly web interface. In addition to qualitative and quantitative evaluation of existing methods on publicly available datasets and our proposed dataset, we also validate their performance in face detection in the dark. This survey, together with the proposed dataset and online platform, can serve as a reference source for future study and promote the development of this research field. The proposed platform and dataset, as well as the collected methods, datasets, and evaluation metrics, are publicly available and will be regularly updated. Project page: https://www.mmlab-ntu.com/project/lliv_survey/index.html.
10. Xu H, Long X, Wang M. UUGAN: a GAN-based approach towards underwater image enhancement using non-pairwise supervision. Int J Mach Learn Cyb 2022. DOI: 10.1007/s13042-022-01659-8.
11. Zhao R, Han Y, Zhao J. End-to-End Retinex-Based Illumination Attention Low-Light Enhancement Network for Autonomous Driving at Night. Computational Intelligence and Neuroscience 2022; 2022:4942420. PMID: 36039345. PMCID: PMC9420063. DOI: 10.1155/2022/4942420.
Abstract
Low-light image enhancement is a preprocessing step for many recognition and tracking tasks in autonomous driving at night. It needs to handle various factors simultaneously, including uneven lighting, low contrast, and artifacts. We propose a novel end-to-end Retinex-based illumination attention low-light enhancement network. Specifically, our method adopts a multibranch architecture to extract rich features at different depth levels, and it considers features from different scales in a built-in illumination attention module. In each submodule, we encode reflectance features and illumination features into a latent space based on Retinex, which caters to the highly ill-posed image decomposition task and aims to enhance the desired illumination features under different receptive fields. Subsequently, we propose a memory gate mechanism that adaptively learns long-term and short-term memory, whose weights control how many high-level and low-level features are retained. This design improves image quality across both feature scales and feature levels. Comprehensive experiments on the BDD10K and Cityscapes datasets demonstrate that our method outperforms various types of methods in terms of visual quality and quantitative metrics. We also show that our method has a certain anti-noise capability and generalizes well to unseen images without fine-tuning, while its restoration performance is comparable to that of advanced computationally intensive models.
Affiliation(s)
- Ruini Zhao: School of Automobile, Chang'an University, Xi'an, Shaanxi 710064, China
- Yi Han: School of Automobile, Chang'an University, Xi'an, Shaanxi 710064, China
- Jian Zhao: School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
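As a toy interpretation of the memory gate mentioned in entry 11 above, the sketch below mixes low-level (short-term) and high-level (long-term) feature maps with a learned sigmoid weight. The actual network is considerably more involved; this only illustrates the gating idea.

```python
import torch
import torch.nn as nn

class MemoryGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low_level: torch.Tensor, high_level: torch.Tensor) -> torch.Tensor:
        # The gate decides, per position and channel, how much of each feature
        # stream (short-term/low-level vs. long-term/high-level) to retain.
        g = self.gate(torch.cat([low_level, high_level], dim=1))
        return g * high_level + (1.0 - g) * low_level
```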