51. Zhou J, Ren Y, Yan Y, Pan L. A Multiple Graph Label Propagation Integration Framework for Salient Object Detection. Neural Process Lett 2015. [DOI: 10.1007/s11063-015-9488-4]
52. Luo Y, Jiang M, Wong Y, Zhao Q. Multi-Camera Saliency. IEEE Trans Pattern Anal Mach Intell 2015; 37:2057-2070. [PMID: 26340257] [DOI: 10.1109/tpami.2015.2392783]
Abstract
A significant body of literature on saliency modeling predicts where humans look in a single image or video. Besides the scientific goal of understanding how information is fused from multiple visual sources to identify regions of interest in a holistic manner, there are tremendous engineering applications of multi-camera saliency due to the widespread deployment of cameras. This paper proposes a principled framework to smoothly integrate visual information from multiple views into a global scene map, and to employ a saliency algorithm incorporating high-level features to identify the most important regions by fusing visual information. The proposed method has the following key distinguishing features compared with its counterparts: (1) the proposed saliency detection is global (salient regions from one local view may not be important in a global context), (2) it does not require special camera deployment or overlapping fields of view, and (3) the key saliency algorithm is effective in highlighting interesting object regions even though no single object detector is used. Experiments on several data sets confirm the effectiveness of the proposed framework.
53.
Abstract
In this paper, visual attention spreading is formulated as a nonlocal diffusion equation. Different from other diffusion-based methods, a nonlocal diffusion tensor is introduced that accounts for both the diffusion strength and the diffusion direction. With the help of the diffusion tensor, diffusion along the principal direction is suppressed to preserve the dissimilarity between the foreground and background, while in the other directions it is boosted to merge similar regions and highlight the salient object as a whole. The final saliency maps are obtained through a two-stage diffusion. Extensive quantitative and visual comparisons are performed on three widely used benchmark datasets, i.e. the MSRA-ASD, MSRA-B and PASCAL-1500 datasets. Experimental results demonstrate the superior performance of our method.
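As a loose illustration of the edge-modulated diffusion described in this abstract, here is a minimal NumPy sketch. It replaces the paper's nonlocal diffusion tensor with a scalar Perona-Malik-style conduction term (a simplification, not the authors' method); the toy image, `k`, `dt`, and iteration count are all assumptions:

```python
import numpy as np

def edge_stopping_diffuse(sal, img, iters=30, k=0.1, dt=0.2):
    """Diffuse a saliency map: conduction is suppressed across strong image
    edges (preserving the foreground/background boundary) and strong within
    smooth regions (merging similar regions into a whole object)."""
    s = sal.astype(float).copy()
    for _ in range(iters):
        for shift, axis in ((-1, 0), (1, 0), (-1, 1), (1, 1)):
            ds = np.roll(s, shift, axis) - s       # neighbor difference
            di = np.roll(img, shift, axis) - img   # image-edge strength
            g = np.exp(-(di / k) ** 2)             # edge-stopping weight
            s = s + dt * g * ds
    return s

# toy scene: bright square object on a dark ground, with an initial
# saliency seed covering only part of the object
img = np.zeros((32, 32)); img[10:22, 10:22] = 1.0
seed = np.zeros((32, 32)); seed[12:16, 12:16] = 1.0
sal = edge_stopping_diffuse(seed, img)
inside = sal[10:22, 10:22].mean()   # saliency spreads within the object
outside = sal[:6, :].mean()         # but barely leaks across its edge
```

The edge-stopping weight nearly vanishes across the object boundary, so the seed's saliency mass spreads over the object but stays out of the background, mimicking the "suppress across the boundary, boost within regions" behavior the abstract describes.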
Affiliation(s)
- Xiujun Zhang
- College of Information Engineering, Shenzhen University, Nanhai Ave 3688, Shenzhen 518060, Guangdong, P. R. China
- Chen Xu
- Institute of Intelligent Computing Science, Shenzhen University, Nanhai Ave 3688, Shenzhen 518060, Guangdong, P. R. China
- Xiaoli Sun
- College of Mathematics and Computational Science, Shenzhen University, Nanhai Ave 3688, Shenzhen 518060, Guangdong, P. R. China
- George Baciu
- GAMA Lab, Department of Computing, The Hong Kong Polytechnic University, Hong Kong
54. Souly N, Shah M. Visual Saliency Detection Using Group Lasso Regularization in Videos of Natural Scenes. Int J Comput Vis 2015. [DOI: 10.1007/s11263-015-0853-6]
55. Chen C, Li S, Qin H, Hao A. Structure-sensitive saliency detection via multilevel rank analysis in intrinsic feature space. IEEE Trans Image Process 2015; 24:2303-2316. [PMID: 25700446] [DOI: 10.1109/tip.2015.2403232]
Abstract
This paper advocates a novel multiscale, structure-sensitive saliency detection method, which can distinguish multilevel, reliable saliency from various natural pictures in a robust and versatile way. One key challenge for saliency detection is to guarantee that the entire salient object is characterized differently from the nonsalient background. To tackle this, our strategy is to design a structure-aware descriptor based on the intrinsic biharmonic distance metric. One benefit of introducing this descriptor is its ability to simultaneously integrate local and global structure information, which is extremely valuable for separating the salient object from the nonsalient background in a multiscale sense. Having devised such a powerful shape descriptor, the remaining challenge is to capture the saliency so that salient subparts actually stand out among all possible candidates. Toward this goal, we conduct multilevel low-rank and sparse analysis in the intrinsic feature space spanned by the shape descriptors defined on over-segmented super-pixels. Since the low-rank property places much more emphasis on stronger similarities among super-pixels, we naturally obtain a scale space along the rank dimension. Multiscale saliency can then be obtained by simply computing differences among the low-rank components across the rank scale. We conduct extensive experiments on public benchmarks and provide a comprehensive, quantitative comparison between our method and existing state-of-the-art techniques. All the results demonstrate the superiority of our method in accuracy, reliability, robustness, and versatility.
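The low-rank-plus-sparse intuition behind this abstract can be sketched with a plain SVD. This is a crude stand-in (the paper operates multiscale on biharmonic shape descriptors; the rank-1 "background", outlier positions, and magnitudes below are synthetic assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
# feature matrix over 40 superpixels: a rank-1 "background" component plus
# two sparse, strong "salient" outliers (positions are made up)
u = rng.random((40, 1)); v = rng.random((1, 10))
background = u @ v
sparse = np.zeros_like(background)
sparse[5, 3] = 5.0; sparse[17, 7] = 5.0
X = background + sparse

# crude stand-in for low-rank/sparse analysis: keep the leading singular
# component as "background" and read saliency from the residual magnitude
U, s, Vt = np.linalg.svd(X, full_matrices=False)
low_rank = s[0] * np.outer(U[:, 0], Vt[0])
residual = np.abs(X - low_rank)
salient_rows = np.argsort(residual.max(axis=1))[-2:]   # most salient superpixels
```

Because the background is (near-)rank-1, it is absorbed by the leading singular component, while the sparse outliers survive in the residual, which serves as a saliency score per superpixel.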
56. Zhang L, Li K, Ou Z, Wang F. Seam warping: a new approach for image retargeting for small displays. Soft Comput 2015. [DOI: 10.1007/s00500-015-1795-1]
57. Warnell G, David P, Chellappa R. Ray Saliency: Bottom-Up Visual Saliency for a Rotating and Zooming Camera. Int J Comput Vis 2015. [DOI: 10.1007/s11263-015-0842-9]
58. Bylinskii Z, DeGennaro EM, Rajalingham R, Ruda H, Zhang J, Tsotsos JK. Towards the quantitative evaluation of visual attention models. Vision Res 2015; 116:258-68. [PMID: 25951756] [DOI: 10.1016/j.visres.2015.04.007]
Abstract
Scores of visual attention models have been developed over the past several decades of research. Differences in implementation, assumptions, and evaluations have made comparison of these models very difficult. Taxonomies have been constructed in an attempt to organize and classify models, but are not sufficient for quantifying which classes of models are most capable of explaining available data. At the same time, a multitude of physiological and behavioral findings have been published, measuring various aspects of human and non-human primate visual attention. All of these elements highlight the need to integrate the computational models with the data by (1) operationalizing the definitions of visual attention tasks and (2) designing benchmark datasets to measure success on specific tasks, under these definitions. In this paper, we provide some examples of operationalizing and benchmarking different visual attention tasks, along with the relevant design considerations.
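One widely used quantitative benchmark for fixation-prediction models (one of the task evaluations this paper discusses in general terms) is ROC AUC: treat the saliency map as a classifier separating fixated from non-fixated pixels. A minimal sketch, using the exact Mann-Whitney form of the AUC (the toy maps below are assumptions for illustration):

```python
import numpy as np

def fixation_auc(saliency, fixations):
    """Exact ROC AUC (Mann-Whitney form): the probability that a fixated
    pixel receives higher saliency than a non-fixated one (ties count 1/2)."""
    pos = saliency[fixations > 0].ravel()   # saliency at fixated pixels
    neg = saliency[fixations == 0].ravel()  # saliency elsewhere
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

# a map that peaks exactly at the sole fixation scores 1.0;
# a constant map carries no information and scores 0.5 (chance)
sal = np.zeros((8, 8)); sal[4, 4] = 1.0
fix = np.zeros((8, 8)); fix[4, 4] = 1
auc_peaked = fixation_auc(sal, fix)
auc_flat = fixation_auc(np.ones((8, 8)), fix)
```

The pairwise form avoids choosing a threshold grid and makes the chance level (0.5) exact, which is why it is convenient for benchmark code.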
Affiliation(s)
- Z Bylinskii
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge 02141, USA; Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge 02141, USA.
- E M DeGennaro
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge 02141, USA
- R Rajalingham
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge 02141, USA
- H Ruda
- Computational Vision Laboratory, Department of Communication Sciences and Disorders, Northeastern University, Boston 02115, USA
- J Zhang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; Visual Attention Lab, Brigham and Women's Hospital, Cambridge, MA 02139, USA
- J K Tsotsos
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge 02141, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge 02141, USA; Electrical Engineering and Computer Science, Centre for Vision Research, York University, Toronto M3J 1P3, Canada
59. Gan L, Duan H. Chemical Reaction Optimization for Feature Combination in Bio-inspired Visual Attention. Int J Comput Int Sys 2015. [DOI: 10.1080/18756891.2015.1036220]
60. Sun X, Yao H, Ji R, Liu XM. Toward statistical modeling of saccadic eye-movement and visual saliency. IEEE Trans Image Process 2014; 23:4649-4662. [PMID: 25029460] [DOI: 10.1109/tip.2014.2337758]
Abstract
In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. This observation inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGCs using projection pursuit, and generates eye movements by selecting the location with maximum SGC response. Besides simulating human saccadic behavior, we also demonstrate the effectiveness and robustness of our approach over state-of-the-art methods through extensive experiments on synthetic patterns and human eye fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences and the effects of scale and blur, are also explored. Based on extensive qualitative and quantitative experimental results, we show the promising potential of statistical approaches for human behavior research.
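Super-Gaussianity is commonly measured by excess kurtosis, and projection pursuit searches for the direction maximizing it. A toy NumPy sketch of that idea (scanning a fixed candidate bank rather than the paper's sequential optimization; the data and bank are assumptions):

```python
import numpy as np

def excess_kurtosis(x):
    x = x - x.mean()
    return float((x ** 4).mean() / (x.var() ** 2 + 1e-12) - 3.0)

rng = np.random.default_rng(0)
# toy image "patches": Gaussian noise, plus a rare sparse structure on one
# coordinate -- exactly the kind of direction SGC analysis should find
patches = rng.normal(size=(500, 16))
patches[:10, 3] += 8.0

# crude projection pursuit: scan a bank of candidate unit directions and
# keep the one with the most super-Gaussian (highest-kurtosis) response
cand = rng.normal(size=(200, 16))
cand /= np.linalg.norm(cand, axis=1, keepdims=True)
bank = np.vstack([np.eye(16), cand])
kurts = np.array([excess_kurtosis(patches @ d) for d in bank])
sgc = bank[np.argmax(kurts)]

# "saccade": attend the patch with the maximum response on that component
fixated_patch = int(np.argmax(np.abs(patches @ sgc)))
```

Responses that are rare but strong produce heavy-tailed (super-Gaussian) projections, so maximizing kurtosis locates the sparse structure, and the maximum-response selection lands on one of the structured patches.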
61. Han S, Vasconcelos N. Object recognition with hierarchical discriminant saliency networks. Front Comput Neurosci 2014; 8:109. [PMID: 25249971] [PMCID: PMC4158795] [DOI: 10.3389/fncom.2014.00109]
Abstract
The benefits of integrating attention and object recognition are investigated. While attention is frequently modeled as a pre-processor for recognition, we investigate the hypothesis that attention is an intrinsic component of recognition and vice-versa. This hypothesis is tested with a recognition model, the hierarchical discriminant saliency network (HDSN), whose layers are top-down saliency detectors, tuned for a visual class according to the principles of discriminant saliency. As a model of neural computation, the HDSN has two possible implementations. In a biologically plausible implementation, all layers comply with the standard neurophysiological model of visual cortex, with sub-layers of simple and complex units that implement a combination of filtering, divisive normalization, pooling, and non-linearities. In a convolutional neural network implementation, all layers are convolutional and implement a combination of filtering, rectification, and pooling. The rectification is performed with a parametric extension of the now popular rectified linear units (ReLUs), whose parameters can be tuned for the detection of target object classes. This enables a number of functional enhancements over neural network models that lack a connection to saliency, including optimal feature denoising mechanisms for recognition, modulation of saliency responses by the discriminant power of the underlying features, and the ability to detect both feature presence and absence. In either implementation, each layer has a precise statistical interpretation, and all parameters are tuned by statistical learning. Each saliency detection layer learns more discriminant saliency templates than its predecessors and higher layers have larger pooling fields. This enables the HDSN to simultaneously achieve high selectivity to target object classes and invariance. 
The performance of the network in saliency and object recognition tasks is compared to those of models from the biological and computer vision literatures. This demonstrates benefits for all the functional enhancements of the HDSN, the class tuning inherent to discriminant saliency, and saliency layers based on templates of increasing target selectivity and invariance. Altogether, these experiments suggest that there are non-trivial benefits in integrating attention and recognition.
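The abstract mentions a parametric extension of ReLUs with class-tunable parameters. A generic thresholded rectifier illustrates what "tunable rectification" can look like; this exact form is an assumption for illustration, not necessarily the HDSN's parameterization:

```python
import numpy as np

def tunable_rectifier(x, theta):
    """Thresholded rectification max(x - theta, 0): with theta = 0 this is a
    standard ReLU; raising theta suppresses weak responses, so a per-channel
    theta can be tuned toward a target class (generic illustration only)."""
    return np.maximum(x - theta, 0.0)

x = np.array([-2.0, -0.5, 0.5, 2.0])
standard = tunable_rectifier(x, 0.0)   # behaves as a plain ReLU
selective = tunable_rectifier(x, 1.0)  # only the strongest response survives
```

Raising the threshold acts as a feature-denoising step: responses below the learned level are treated as noise and zeroed, which is one of the functional enhancements the abstract attributes to saliency-derived rectification.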
Affiliation(s)
- Sunhyoung Han
- Analytics Department, ID Analytics San Diego, CA, USA
- Nuno Vasconcelos
- Statistical and Visual Computing Lab, Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
62. Jing H, He X, Han Q, Abd El-Latif AA, Niu X. Saliency detection based on integrated features. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2013.02.048]
63. Interval type-2 fuzzy kernel based support vector machine algorithm for scene classification of humanoid robot. Soft Comput 2013. [DOI: 10.1007/s00500-013-1080-0]
64. Xie Y, Lu H, Yang MH. Bayesian saliency via low and mid level cues. IEEE Trans Image Process 2013; 22:1689-1698. [PMID: 22955904] [DOI: 10.1109/tip.2012.2216276]
Abstract
Visual saliency detection is a challenging problem in computer vision, but one of great importance with numerous applications. In this paper, we propose a novel model for bottom-up saliency within the Bayesian framework by exploiting low and mid level cues. In contrast to most existing methods that operate directly on low level cues, we propose an algorithm in which a coarse saliency region is first obtained via a convex hull of interest points. We also analyze the saliency information with mid level visual cues via superpixels. We present a Laplacian sparse subspace clustering method to group superpixels with local features, and analyze the results with respect to the coarse saliency region to compute the prior saliency map. We use the low level visual cues based on the convex hull to compute the observation likelihood, thereby facilitating inference of Bayesian saliency at each pixel. Extensive experiments on a large data set show that our Bayesian saliency model performs favorably against state-of-the-art algorithms.
Affiliation(s)
- Yulin Xie
- School of Information and Communication Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China.
65. Li J, Levine MD, An X, Xu X, He H. Visual saliency based on scale-space analysis in the frequency domain. IEEE Trans Pattern Anal Mach Intell 2013; 35:996-1010. [PMID: 22802112] [DOI: 10.1109/tpami.2012.147]
Abstract
We address the issue of visual saliency from three perspectives. First, we consider saliency detection as a frequency domain analysis problem. Second, we achieve this by employing the concept of nonsaliency. Third, we simultaneously consider the detection of salient regions of different size. The paper proposes a new bottom-up paradigm for detecting visual saliency, characterized by a scale-space analysis of the amplitude spectrum of natural images. We show that the convolution of the image amplitude spectrum with a low-pass Gaussian kernel of an appropriate scale is equivalent to an image saliency detector. The saliency map is obtained by reconstructing the 2D signal using the original phase and the amplitude spectrum, filtered at a scale selected by minimizing saliency map entropy. A Hypercomplex Fourier Transform performs the analysis in the frequency domain. Using available databases, we demonstrate experimentally that the proposed model can predict human fixation data. We also introduce a new image database and use it to show that the saliency detector can highlight both small and large salient regions, as well as inhibit repeated distractors in cluttered images. In addition, we show that it is able to predict salient regions on which people focus their attention.
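The amplitude-spectrum scale-space idea can be sketched with a plain 2-D FFT (a simplification: the paper uses a Hypercomplex Fourier Transform over color channels, and the scale set, blur, and entropy estimator below are assumptions):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing via direct 1-D convolutions."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2)); k /= k.sum()
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 0, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode='same'), 1, out)

def entropy(m):
    p, _ = np.histogram(m, bins=32)
    p = p / p.sum(); p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def spectral_scale_space_saliency(img, sigmas=(1.0, 2.0, 4.0)):
    """Smooth the amplitude spectrum at several scales, reconstruct with the
    original phase, keep the scale whose saliency map has minimum entropy."""
    F = np.fft.fft2(img)
    amp, phase = np.abs(F), np.angle(F)
    best = None
    for s in sigmas:
        amp_s = gaussian_blur(np.fft.fftshift(amp), s)   # smooth |F| around DC
        F_s = np.fft.ifftshift(amp_s) * np.exp(1j * phase)
        sal = np.abs(np.fft.ifft2(F_s)) ** 2
        sal = gaussian_blur(sal, 2.0)                    # final smoothing
        e = entropy(sal)
        if best is None or e < best[0]:
            best = (e, sal)
    return best[1]

# toy image: flat background with one small bright square (the "salient" part)
img = np.zeros((64, 64)); img[28:36, 28:36] = 1.0
sal_map = spectral_scale_space_saliency(img)
```

Suppressing (smoothing away) the sharp amplitude structure that encodes repeated background content, while keeping the original phase, leaves a reconstruction whose energy concentrates on the non-repetitive, salient region.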
Affiliation(s)
- Jian Li
- Institute of Automation, National University of Defense Technology, Changsha 410073, Hunan Province, P.R. China.
66. Mahadevan V, Vasconcelos N. Biologically Inspired Object Tracking Using Center-Surround Saliency Mechanisms. IEEE Trans Pattern Anal Mach Intell 2013; 35:541-554. [PMID: 22529325] [DOI: 10.1109/tpami.2012.98]
Abstract
A biologically inspired discriminant object tracker is proposed. It is argued that discriminant tracking is a consequence of top-down tuning of the saliency mechanisms that guide the deployment of visual attention. The principle of discriminant saliency is then used to derive a tracker that implements a combination of center-surround saliency, a spatial spotlight of attention, and feature-based attention. In this framework, the tracking problem is formulated as one of continuous target-background classification, implemented in two stages. The first, or learning stage, combines a focus of attention (FoA) mechanism, and bottom-up saliency to identify a maximally discriminant set of features for target detection. The second, or detection stage, uses a feature-based attention mechanism and a target-tuned top-down discriminant saliency detector to detect the target. Overall, the tracker iterates between learning discriminant features from the target location in a video frame and detecting the location of the target in the next. The statistics of natural images are exploited to derive an implementation which is conceptually simple and computationally efficient. The saliency formulation is also shown to establish a unified framework for classifier design, target detection, automatic tracker initialization, and scale adaptation. Experimental results show that the proposed discriminant saliency tracker outperforms a number of state-of-the-art trackers in the literature.
67.
68. RETRACTED ARTICLE: From the human visual system to the computational models of visual attention: a survey. Artif Intell Rev 2013. [DOI: 10.1007/s10462-012-9385-4]
69. Borji A, Itti L. State-of-the-art in visual attention modeling. IEEE Trans Pattern Anal Mach Intell 2013; 35:185-207. [PMID: 22487985] [DOI: 10.1109/tpami.2012.89]
Abstract
Modeling visual attention (particularly stimulus-driven, saliency-based attention) has been a very active research area over the past 25 years. Many different models of attention are now available which, aside from lending theoretical contributions to other fields, have demonstrated successful applications in computer vision, mobile robotics, and cognitive systems. Here we review, from a computational perspective, the basic concepts of attention implemented in these models. We present a taxonomy of nearly 65 models, which provides a critical comparison of approaches, their capabilities, and shortcomings. In particular, 13 criteria derived from behavioral and computational studies are formulated for qualitative comparison of attention models. Furthermore, we address several challenging issues with models, including biological plausibility of the computations, correlation with eye movement datasets, bottom-up and top-down dissociation, and constructing meaningful performance measures. Finally, we highlight current research trends in attention modeling and provide insights for future work.
Affiliation(s)
- Ali Borji
- Department of Computer Science, University of Southern California, 3641 Watt Way, Los Angeles, CA 90089, USA.
70. Lin L, Zhou W. LGOH-Based Discriminant Centre-Surround Saliency Detection. Int J Adv Robot Syst 2013. [DOI: 10.5772/57222]
Abstract
Discriminant saliency is a recently proposed, decision-theoretic approach to saliency detection. Based on local gradient distributions, this paper proposes a simple but efficient discriminant centre-surround hypothesis, and builds local and global saliency models by combining multi-scale intensity contrast with colour and orientation features. The method makes three important contributions. First, a circular, multi-scale hierarchical centre-surround profile is designed for local saliency detection. Secondly, the dense local gradient orientation histogram (LGOH) of the centre-surround region is computed and used for local saliency analysis. Thirdly, a new strategy for integrating local and global saliency is proposed and applied to the final saliency discriminant. Experiments demonstrate the effectiveness of the proposed method. Compared with 12 state-of-the-art saliency detection models, it achieves better precision-recall, F-measure and mean absolute error (MAE) scores, and produces more complete salient objects.
Affiliation(s)
- Lili Lin
- College of Information and Electronic Engineering, Zhejiang Gongshang University, China
- Wenhui Zhou
- School of Computer Science and Technology, Hangzhou Dianzi University, China
71. Wu J, Lin W, Shi G, Liu A. Perceptual quality metric with internal generative mechanism. IEEE Trans Image Process 2013; 22:43-54. [PMID: 22910116] [DOI: 10.1109/tip.2012.2214048]
Abstract
Objective image quality assessment (IQA) aims to evaluate image quality consistently with human perception. Most existing perceptual IQA metrics cannot accurately represent degradations from different types of distortion: e.g., existing structural similarity metrics perform well on content-dependent distortions but not as well as peak signal-to-noise ratio (PSNR) on content-independent distortions. In this paper, we integrate the merits of existing IQA metrics, guided by the recently revealed internal generative mechanism (IGM). The IGM indicates that the human visual system actively predicts sensory information and tries to avoid residual uncertainty for image perception and understanding. Inspired by the IGM theory, we adopt an autoregressive prediction algorithm to decompose an input scene into two portions: the predicted portion with the predicted visual content, and the disorderly portion with the residual content. Distortions on the predicted portion degrade the primary visual information, and structural similarity procedures are employed to measure their degradation; distortions on the disorderly portion mainly change the uncertain information, and the PSNR is employed for them. Finally, according to the noise energy deployed on the two portions, we combine the two evaluation results to acquire the overall quality score. Experimental results on six publicly available databases demonstrate that the proposed metric is comparable with state-of-the-art quality metrics.
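The decompose-then-score pipeline can be sketched in NumPy. Note the simplifications: a 3x3 box mean stands in for the paper's autoregressive predictor, PSNR stands in for the structural term as well, and the energy weights are illustrative, not the paper's:

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, peak]."""
    mse = np.mean((a - b) ** 2)
    return float(10 * np.log10(peak ** 2 / (mse + 1e-12)))

def decompose(img):
    """Toy stand-in for an autoregressive predictor: predict each pixel by
    its 3x3 neighborhood mean; the residual is the 'disorderly' portion."""
    pad = np.pad(img, 1, mode='edge')
    pred = sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0
    return pred, img - pred

rng = np.random.default_rng(1)
ref = rng.random((32, 32))                                   # reference image
dist = np.clip(ref + rng.normal(scale=0.05, size=ref.shape), 0, 1)  # distorted

ref_pred, ref_res = decompose(ref)
dist_pred, dist_res = decompose(dist)

score_pred = psnr(ref_pred, dist_pred)   # degradation of the predicted portion
score_res = psnr(ref_res, dist_res)      # degradation of the disorderly portion
# energy-weighted combination of the two portion scores (weights illustrative)
e_pred = np.mean((ref_pred - dist_pred) ** 2)
e_res = np.mean((ref_res - dist_res) ** 2)
overall = (e_pred * score_pred + e_res * score_res) / (e_pred + e_res + 1e-12)
```

Weighting each portion's score by the noise energy that actually landed on it mirrors the abstract's combination rule: the portion that absorbed more distortion dominates the final quality score.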
Affiliation(s)
- Jinjian Wu
- Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education of China, School of Electronic Engineering, Xidian University, Xi’an 710071, China.
72. Vig E, Dorr M, Martinetz T, Barth E. Intrinsic dimensionality predicts the saliency of natural dynamic scenes. IEEE Trans Pattern Anal Mach Intell 2012; 34:1080-1091. [PMID: 22516647] [DOI: 10.1109/tpami.2011.198]
Abstract
Since visual attention-based computer vision applications have gained popularity, ever more complex, biologically inspired models seem to be needed to predict salient locations (or interest points) in naturalistic scenes. In this paper, we explore how far one can go in predicting eye movements by using only basic signal processing, such as image representations derived from efficient coding principles, and machine learning. To this end, we gradually increase the complexity of a model from simple single-scale saliency maps computed on grayscale videos to spatiotemporal multiscale and multispectral representations. Using a large collection of eye movements on high-resolution videos, supervised learning techniques fine-tune the free parameters whose addition is inevitable with increasing complexity. The proposed model, although very simple, demonstrates significant improvement in predicting salient locations in naturalistic videos over four selected baseline models and two distinct data labeling scenarios.
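Intrinsic dimensionality of a local image region is classically read off the eigenvalues of the smoothed structure tensor: both eigenvalues are large only where the gradient varies in two directions. A minimal spatial-only sketch (the paper works spatiotemporally on video; the box smoothing and toy image below are assumptions):

```python
import numpy as np

def box3(a):
    """3x3 box average (a crude stand-in for Gaussian tensor smoothing)."""
    p = np.pad(a, 1, mode='edge')
    return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def i2d_energy(img):
    """Determinant of the smoothed structure tensor: the eigenvalue product
    is large only at corners/junctions -- the intrinsically two-dimensional
    regions that this line of work links to saliency."""
    gy, gx = np.gradient(img.astype(float))
    Jxx, Jxy, Jyy = box3(gx * gx), box3(gx * gy), box3(gy * gy)
    return Jxx * Jyy - Jxy ** 2

# a flat square: i2D energy concentrates at its corners, not its straight edges
img = np.zeros((16, 16)); img[4:12, 4:12] = 1.0
i2d = i2d_energy(img)
corner, edge_mid, flat = i2d[4, 4], i2d[4, 8], i2d[0, 0]
```

Flat regions (i0D) and straight edges (i1D) both yield a rank-deficient tensor with zero determinant, so only the rarer i2D structure survives, matching the intuition that higher intrinsic dimensionality predicts salience.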
Affiliation(s)
- Eleonora Vig
- Institute for Neuro- and Bioinformatics, University of Lübeck, Ratzeburger Allee 160, Lübeck D-23538, Germany.
73. Strasburger H, Rentschler I, Jüttner M. Peripheral vision and pattern recognition: a review. J Vis 2011; 11:13. [PMID: 22207654] [PMCID: PMC11073400] [DOI: 10.1167/11.5.13]
Abstract
We summarize the various strands of research on peripheral vision and relate them to theories of form perception. After a historical overview, we describe quantifications of the cortical magnification hypothesis, including an extension of Schwartz's cortical mapping function. The merits of this concept are considered across a wide range of psychophysical tasks, followed by a discussion of its limitations and the need for non-spatial scaling. We also review the eccentricity dependence of other low-level functions including reaction time, temporal resolution, and spatial summation, as well as perimetric methods. A central topic is then the recognition of characters in peripheral vision, both at low and high levels of contrast, and the impact of surrounding contours known as crowding. We demonstrate how Bouma's law, specifying the critical distance for the onset of crowding, can be stated in terms of the retinocortical mapping. The recognition of more complex stimuli, like textures, faces, and scenes, reveals a substantial impact of mid-level vision and cognitive factors. We further consider eccentricity-dependent limitations of learning, both at the level of perceptual learning and pattern category learning. Generic limitations of extrafoveal vision are observed for the latter in categorization tasks involving multiple stimulus classes. Finally, models of peripheral form vision are discussed. We report that peripheral vision is limited with regard to pattern categorization by a distinctly lower representational complexity and processing speed. Taken together, the limitations of cognitive processing in peripheral vision appear to be as significant as those imposed on low-level functions and by way of crowding.
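Bouma's law, as reviewed here, states that flankers crowd a peripheral target when they fall within roughly half the target's eccentricity. A tiny sketch of that rule of thumb (the 0.5 proportionality constant is Bouma's classical estimate; it varies across observers and tasks):

```python
def critical_spacing_deg(eccentricity_deg, bouma=0.5):
    """Bouma's rule of thumb: the critical flanker distance for crowding is
    about 0.5 x eccentricity, in degrees of visual angle."""
    return bouma * eccentricity_deg

def is_crowded(eccentricity_deg, flanker_distance_deg, bouma=0.5):
    return flanker_distance_deg < critical_spacing_deg(eccentricity_deg, bouma)

# a letter at 8 deg eccentricity: flankers closer than ~4 deg crowd it
crowded = is_crowded(8.0, 2.0)      # 2 deg < 4 deg -> crowded
uncrowded = is_crowded(8.0, 5.0)    # 5 deg > 4 deg -> recognizable
```

Because the critical distance scales with eccentricity, the same flanker spacing that is harmless near fixation becomes crowding further out, which is what makes the law restatable in terms of the retinocortical mapping.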
Affiliation(s)
- Hans Strasburger
- Institut für Medizinische Psychologie, Ludwig-Maximilians-Universität, München, Germany
- Ingo Rentschler
- Institut für Medizinische Psychologie, Ludwig-Maximilians-Universität, München, Germany
- Martin Jüttner
- Department of Psychology, School of Life & Health Sciences, Aston University, Birmingham, UK
74. Lu Z, Lee S. Probabilistic 3D object recognition and pose estimation using multiple interpretations generation. J Opt Soc Am A Opt Image Sci Vis 2011; 28:2607-2618. [PMID: 22193274] [DOI: 10.1364/josaa.28.002607]
Abstract
This paper presents a probabilistic object recognition and pose estimation method using multiple interpretations generation in cluttered indoor environments. How to handle pose ambiguity and uncertainty is the main challenge in most recognition systems; we approach this problem in a probabilistic manner. First, given a three-dimensional (3D) polyhedral object model, the parallel and perpendicular line pairs, which are detected from stereo images and 3D point clouds, generate pose hypotheses as multiple interpretations, with ambiguity from partial occlusion and fragmentation of 3D lines especially taken into account. Unlike previous methods, each pose interpretation is represented as a region instead of a point in pose space, reflecting the measurement uncertainty. Then, for each pose interpretation, more features around the estimated pose are utilized as additional evidence for computing the probability using the Bayesian principle in terms of likelihood and unlikelihood. Finally, a fusion strategy is applied to the top-ranked interpretations with high probabilities, which are further verified and refined to give a more accurate pose estimate in real time. The experimental results show the performance and potential of the proposed approach in real cluttered domestic environments.
Affiliation(s)
- Zhaojin Lu
- School of Information and Communication Engineering, Sungkyunkwan University, Seoul, South Korea
75.
76. Xu J, Yang Z, Tsien JZ. Emergence of visual saliency from natural scenes via context-mediated probability distributions coding. PLoS One 2010; 5:e15796. [PMID: 21209963] [PMCID: PMC3012104] [DOI: 10.1371/journal.pone.0015796]
Abstract
Visual saliency is the perceptual quality that makes some items in visual scenes stand out from their immediate contexts. Visual saliency plays important roles in natural vision in that saliency can direct eye movements, deploy attention, and facilitate tasks like object detection and scene understanding. A central unsolved issue is: What features should be encoded in the early visual cortex for detecting salient features in natural scenes? To explore this important issue, we propose a hypothesis that visual saliency is based on efficient encoding of the probability distributions (PDs) of visual variables in specific contexts in natural scenes, referred to as context-mediated PDs in natural scenes. In this concept, computational units in the model of the early visual system do not act as feature detectors but rather as estimators of the context-mediated PDs of a full range of visual variables in natural scenes, which directly give rise to a measure of visual saliency of any input stimulus. To test this hypothesis, we developed a model of the context-mediated PDs in natural scenes using a modified algorithm for independent component analysis (ICA) and derived a measure of visual saliency based on these PDs estimated from a set of natural scenes. We demonstrated that visual saliency based on the context-mediated PDs in natural scenes effectively predicts human gaze in free-viewing of both static and dynamic natural scenes. This study suggests that the computation based on the context-mediated PDs of visual variables in natural scenes may underlie the neural mechanism in the early visual cortex for detecting salient features in natural scenes.
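A probability-distribution account of saliency naturally yields a rarity score: saliency as the self-information -log p(x) of a feature under the distribution estimated from natural scenes. A one-dimensional sketch (the histogram estimator and synthetic "scene" samples are assumptions; the paper estimates context-mediated distributions over many visual variables via a modified ICA):

```python
import numpy as np

rng = np.random.default_rng(2)
# "natural scene" feature samples, with contexts collapsed into a single
# 1-D variable for brevity
scene_samples = rng.normal(0.0, 1.0, size=5000)
# estimate the probability distribution of the feature from these samples
hist, edges = np.histogram(scene_samples, bins=50, range=(-5, 5), density=True)

def self_information(x):
    """Saliency as rarity: -log p(x) under the scene-derived distribution."""
    i = np.clip(np.searchsorted(edges, x) - 1, 0, len(hist) - 1)
    p = hist[i] * (edges[1] - edges[0])   # bin probability mass
    return float(-np.log(p + 1e-9))

common = self_information(0.0)    # typical feature value -> low saliency
rare = self_information(4.5)      # improbable feature value -> high saliency
```

Stimuli that are improbable under the context-conditioned distribution get high self-information, so they "stand out from their immediate contexts" exactly in the sense the abstract describes.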
Affiliation(s)
- Jinhua Xu
- Brain and Behavior Discovery Institute, Georgia Health Sciences University, Augusta, Georgia, United States of America
- Department of Computer Science and Technology, East China Normal University, Shanghai, China
- Zhiyong Yang
- Brain and Behavior Discovery Institute, Georgia Health Sciences University, Augusta, Georgia, United States of America
- Department of Ophthalmology, Georgia Health Sciences University, Augusta, Georgia, United States of America
- Joe Z. Tsien
- Brain and Behavior Discovery Institute, Georgia Health Sciences University, Augusta, Georgia, United States of America
- Department of Neurology, Georgia Health Sciences University, Augusta, Georgia, United States of America