1. Liu W, Cui R, Li Y, Zhang S. Hybrid-Input Convolutional Neural Network-Based Underwater Image Quality Assessment. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:1790-1798. [PMID: 37943644] [DOI: 10.1109/tnnls.2023.3328340]
Abstract
Since precisely sensing the underwater environment is a challenging prerequisite for safe and reliable underwater operation, interest in underwater image processing is growing at a rapid pace. In engineering applications, a remotely operated vehicle (ROV) must handle a large volume of redundant underwater images in real time, which places great pressure on the equipment and its operators. To relieve this pressure by transmitting images selectively according to their degree of degradation, we propose an end-to-end hybrid-input convolutional neural network (HI-CNN) to predict the degradation of underwater images. First, we propose a feature extraction module that extracts features from original underwater images and their saliency maps concurrently and is composed of two branches with the same structure and shared parameters. Second, we design an end-to-end model to predict the quality scores of the original images, consisting of the feature extraction module and a prediction module. Finally, we establish a real-world dataset so that the proposed model can be reproduced in a practical underwater environment. Through several experiments, we demonstrate that the proposed model outperforms existing models in predicting underwater image quality.
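As a rough illustration of the two-branch, shared-parameter feature extraction the abstract describes, here is a minimal PyTorch-style sketch. Every layer size, the fusion by concatenation, and the assumption that the saliency map is replicated to three channels are illustrative guesses, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SharedBranchExtractor(nn.Module):
    """One backbone applied to both inputs, so the two branches share parameters."""
    def __init__(self):
        super().__init__()
        # Hypothetical small backbone; the paper's actual layers are not specified here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, image, saliency_map):
        # The saliency map is assumed replicated to 3 channels so the
        # shared backbone can process both inputs concurrently.
        return torch.cat([self.backbone(image), self.backbone(saliency_map)], dim=1)

class HICNNSketch(nn.Module):
    """Feature extraction module followed by a quality-score prediction module."""
    def __init__(self):
        super().__init__()
        self.features = SharedBranchExtractor()
        self.predictor = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, image, saliency_map):
        return self.predictor(self.features(image, saliency_map))
```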
2. Kumar Malladi SP, Mukhopadhyay J, Larabi MC, Chaudhury S. Lighter and Faster Two-Pathway CMRNet for Video Saliency Prediction. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) 2022. [DOI: 10.1109/icip46576.2022.9897252]
Affiliation(s)
- Jayanta Mukhopadhyay: IIT Kharagpur, Visual Information Processing Lab, Dept. of Computer Science & Engg., India
3. Yan K, Wang X, Kim J, Zuo W, Feng D. Deep Cognitive Gate: Resembling Human Cognition for Saliency Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:4776-4792. [PMID: 33755558] [DOI: 10.1109/tpami.2021.3068277]
Abstract
Saliency detection by humans refers to the ability to identify pertinent information using our perceptive and cognitive capabilities. While human perception is attracted by visual stimuli, our cognitive capability derives from the construction of concepts for reasoning. Saliency detection has gained intensive interest with the aim of resembling the human 'perceptual' system. However, saliency related to human 'cognition', particularly the analysis of complex salient regions (the 'cogitating' process), is yet to be fully exploited. We propose to resemble human cognition, coupled with human perception, to improve saliency detection. We recognize saliency in three phases ('Seeing', 'Perceiving', 'Cogitating'), mimicking a human's perceptive and cognitive reading of an image. In our method, the 'Seeing' phase corresponds to human perception, and we formulate the 'Perceiving' and 'Cogitating' phases of human cognition via deep neural networks (DNNs) to construct a new module (the Cognitive Gate) that enhances DNN features for saliency detection. To the best of our knowledge, this is the first work to model human cognition with DNNs for saliency detection. In our experiments, our approach outperformed 17 benchmark DNN methods on six well-recognized datasets, demonstrating that resembling human cognition improves saliency detection.
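As a hedged sketch of what a feature-enhancing gate of this kind could look like, consider the following PyTorch module. The name CognitiveGateSketch, the single-convolution gate, and the residual combination are all illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CognitiveGateSketch(nn.Module):
    """Illustrative gating module: learns a per-pixel, per-channel gate from
    the incoming features and uses it to re-weight (enhance) those features."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # gate values in (0, 1)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        g = self.gate(features)
        # Residual gating: keep the original features and add the gated ones.
        return features + g * features
```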
4. Review of Visual Saliency Prediction: Development Process from Neurobiological Basis to Deep Models. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app12010309]
Abstract
The human attention mechanism can be understood and simulated by closely associating the saliency prediction task with neuroscience and psychology. Furthermore, saliency prediction is widely used in computer vision and interdisciplinary subjects. In recent years, with the rapid development of deep learning, deep models have made remarkable progress in saliency prediction. Deep models can learn features automatically, thus avoiding drawbacks of the classic models such as handcrafted features and rigid task settings. Nevertheless, deep models still have limitations, for example in tasks involving multi-modality and semantic understanding. This study summarizes the relevant achievements in the field of saliency prediction, including the early neurological and psychological mechanisms and the guiding role of classic models, followed by the development process and data comparison of classic and deep saliency prediction models. It also discusses the relationship between models and human vision, the factors that cause semantic gaps, the influence of attention in cognitive research, the limitations of saliency models, and emerging applications, in order to provide direction and practical advice for follow-up work.
5. Malladi SPK, Mukhopadhyay J, Larabi C, Chaudhury S. Lighter and Faster Cross-Concatenated Multi-Scale Residual Block Based Network for Visual Saliency Prediction. 2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) 2021. [DOI: 10.1109/icip42928.2021.9506710]
Affiliation(s)
- Jayanta Mukhopadhyay: IIT Kharagpur, Visual Information Processing Lab, Dept. of Computer Science & Engg., India
6. Deep Multimodal Fusion Autoencoder for Saliency Prediction of RGB-D Images. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:6610997. [PMID: 34035801] [PMCID: PMC8116150] [DOI: 10.1155/2021/6610997]
Abstract
In recent years, the prediction of salient regions in RGB-D images has become a focus of research. Compared to its RGB counterpart, the saliency prediction of RGB-D images is more challenging. In this study, we propose a novel deep multimodal fusion autoencoder for the saliency prediction of RGB-D images. The core trainable autoencoder of the RGB-D saliency prediction model takes two raw modalities (RGB and depth/disparity information) as inputs and their corresponding eye-fixation attributes as labels. The autoencoder comprises four main networks: a color channel network, a disparity channel network, a feature concatenation network, and a feature learning network. It can mine the complex relationship between color and disparity cues and make the most of their complementary characteristics. Finally, the saliency map is predicted via a feature combination subnetwork, which combines the deep features extracted from prior-learning and convolutional feature-learning subnetworks. We compare the proposed autoencoder with other saliency prediction models on two publicly available benchmark datasets. The results demonstrate that the proposed autoencoder outperforms these models by a significant margin.
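A minimal sketch of the four-network layout the abstract names (color branch, disparity branch, concatenation, feature learning) might look like the following; every layer size and the one-convolution prediction head are assumptions made for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

def conv_branch(in_channels: int) -> nn.Sequential:
    # Hypothetical per-modality encoder; the paper's real depth is not given here.
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    )

class RGBDFusionSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.color_net = conv_branch(3)      # RGB input
        self.disparity_net = conv_branch(1)  # depth/disparity input
        # Feature learning applied after channel-wise concatenation.
        self.fusion = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),  # single-channel saliency logits
        )

    def forward(self, rgb, disparity):
        fused = torch.cat([self.color_net(rgb), self.disparity_net(disparity)], dim=1)
        return torch.sigmoid(self.fusion(fused))  # predicted saliency map
```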
7. Borji A. Saliency Prediction in the Deep Learning Era: Successes and Limitations. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:679-700. [PMID: 31425064] [DOI: 10.1109/tpami.2019.2935715]
Abstract
Visual saliency models have enjoyed a big leap in performance in recent years, thanks to advances in deep learning and large-scale annotated data. Despite enormous effort and huge breakthroughs, however, models still fall short of human-level accuracy. In this work, I explore the landscape of the field with an emphasis on new deep saliency models, benchmarks, and datasets. A large number of image and video saliency models are reviewed and compared over two image benchmarks and two large-scale video datasets. Further, I identify factors that contribute to the gap between models and humans and discuss the remaining issues that need to be addressed to build the next generation of more powerful saliency models. Specific questions addressed include: in what ways current models fail, how to remedy them, what can be learned from cognitive studies of attention, how explicit saliency judgments relate to fixations, how to conduct fair model comparisons, and what the emerging applications of saliency models are.
8. Zhou F, Yao R, Liao G, Liu B, Qiu G. Visual Saliency via Embedding Hierarchical Knowledge in a Deep Neural Network. IEEE TRANSACTIONS ON IMAGE PROCESSING 2020; 29:8490-8505. [PMID: 32813655] [DOI: 10.1109/tip.2020.3016464]
Abstract
Deep neural networks (DNNs) have been extensively applied in image processing, including visual saliency map prediction. A major difficulty in using a DNN for visual saliency prediction is the lack of labeled ground truth for visual saliency. A powerful DNN usually contains a large number of trainable parameters, a condition that can easily lead to model overfitting. In this study, we develop a novel method that overcomes this difficulty by embedding hierarchical knowledge of existing visual saliency models in a DNN. We exploit the knowledge contained in existing visual saliency models by using saliency maps generated by local, global, and semantic models to tune and fix about 92.5% of the parameters in our network in a hierarchical manner. As a result, the number of trainable parameters that need to be tuned by the ground truth is considerably reduced. This reduction enables us to fully utilize the power of a large DNN and overcome the issue of overfitting at the same time. Furthermore, we introduce a simple but very effective center prior in designing the learning cost function of the DNN by attaching high importance to the errors around the image center. We also present extensive experimental results on four commonly used public databases to demonstrate the superiority of the proposed method over classical and state-of-the-art methods on various evaluation metrics.
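The center prior described above, weighting errors near the image center more heavily, is easy to sketch. The Gaussian weighting and the squared-error base loss below are assumptions about the general idea, not the paper's exact cost function.

```python
import torch

def center_weighted_mse(pred: torch.Tensor, target: torch.Tensor,
                        sigma_frac: float = 0.25) -> torch.Tensor:
    """Mean squared error in which each pixel's error is scaled by a
    Gaussian centered on the image center, so central errors matter more.
    pred, target: tensors of shape (batch, 1, H, W)."""
    _, _, h, w = pred.shape
    ys = torch.arange(h, device=pred.device, dtype=torch.float32).view(h, 1) - (h - 1) / 2
    xs = torch.arange(w, device=pred.device, dtype=torch.float32).view(1, w) - (w - 1) / 2
    sigma_sq = (sigma_frac * min(h, w)) ** 2
    weight = torch.exp(-(ys**2 + xs**2) / (2 * sigma_sq))  # peaks at 1 in the center
    return ((pred - target) ** 2 * weight).mean()
```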
9. Krasovskaya S, MacInnes WJ. Salience Models: A Computational Cognitive Neuroscience Review. Vision (Basel) 2019; 3:E56. [PMID: 31735857] [PMCID: PMC6969943] [DOI: 10.3390/vision3040056]
Abstract
The seminal model by Laurent Itti and Christof Koch demonstrated that we can compute the entire flow of visual processing from input to resulting fixations. Despite many replications and follow-ups, few have matched the impact of the original model, so what made it so groundbreaking? We have selected five key contributions that distinguish the original salience model by Itti and Koch; namely, its contribution to our theoretical, neural, and computational understanding of visual processing, as well as its spatial and temporal predictions for fixation distributions. During the last 20 years, advances in the field have produced various techniques and approaches to salience modelling, many of which try to improve on or add to the initial Itti and Koch model. One of the most recent trends has been to adopt the computational power of deep learning neural networks; however, this has also shifted the primary focus of such models to spatial classification. We present a review of recent approaches to modelling salience, starting from direct variations of the Itti and Koch salience model and moving to sophisticated deep-learning architectures, and discuss the models from the point of view of their contribution to computational cognitive neuroscience.
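For orientation, a toy sketch of the center-surround idea at the core of the Itti and Koch model follows: feature maps are compared across scales and combined into a single salience map. Using only the intensity channel and Gaussian blurs as scale surrogates is a deliberate simplification, not the full model, which also includes color and orientation channels and a winner-take-all stage.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def toy_salience(image_gray: np.ndarray) -> np.ndarray:
    """Toy Itti-Koch-style salience: center-surround differences of the
    intensity channel at a few scale pairs, summed and normalized to [0, 1]."""
    intensity = image_gray.astype(np.float64)
    salience = np.zeros_like(intensity)
    for center_sigma, surround_sigma in [(1, 4), (2, 8), (3, 12)]:
        center = gaussian_filter(intensity, center_sigma)
        surround = gaussian_filter(intensity, surround_sigma)
        salience += np.abs(center - surround)  # across-scale difference
    span = salience.max() - salience.min()
    return (salience - salience.min()) / span if span > 0 else salience
```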
Affiliation(s)
- Sofia Krasovskaya: Vision Modelling Laboratory, Faculty of Social Science, National Research University Higher School of Economics, 101000 Moscow, Russia; School of Psychology, National Research University Higher School of Economics, 101000 Moscow, Russia
- W. Joseph MacInnes: Vision Modelling Laboratory, Faculty of Social Science, National Research University Higher School of Economics, 101000 Moscow, Russia; School of Psychology, National Research University Higher School of Economics, 101000 Moscow, Russia
10. Zhou L, Zhang Y, Jiang Y, Zhang T, Fan W. Re-Caption: Saliency-Enhanced Image Captioning through Two-Phase Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING 2019; 29:694-709. [PMID: 31331893] [DOI: 10.1109/tip.2019.2928144]
Abstract
Visual and semantic saliency are important in image captioning. However, single-phase image captioning benefits little from saliency when no saliency predictor is available. In this paper, a novel saliency-enhanced re-captioning framework via two-phase learning is proposed to enhance single-phase image captioning. In the framework, visual saliency and semantic saliency are distilled from the first-phase model and fused with the second-phase model for model self-boosting. The visual saliency mechanism can generate a saliency map and a saliency mask for an image without learning a saliency map predictor. The semantic saliency mechanism sheds light on the properties of words tagged as nouns in a caption. In addition, a third type of saliency, sample saliency, is proposed to explicitly compute the saliency degree of each sample, which helps produce more robust image captioning. We also examine how to combine the three types of saliency for a further performance boost. Our framework can treat an image captioning model as a saliency extractor, which may benefit other captioning models and related tasks. Experimental results on both the Flickr30k and MSCOCO datasets show that the saliency-enhanced models obtain promising performance gains.
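As a hedged illustration of the sample-saliency idea, weighting each training sample by a computed saliency degree, here is a generic per-sample weighted loss. The normalization scheme shown is an assumption, since the abstract does not specify how sample saliency is computed or applied.

```python
import torch

def sample_weighted_loss(per_sample_loss: torch.Tensor,
                         sample_saliency: torch.Tensor) -> torch.Tensor:
    """Scale each sample's loss by its saliency degree.
    per_sample_loss: shape (batch,), e.g. per-caption cross-entropy.
    sample_saliency: shape (batch,), nonnegative saliency degrees."""
    weights = sample_saliency / sample_saliency.sum().clamp(min=1e-8)
    return (weights * per_sample_loss).sum()
```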