1. Anil Meera A, Novicky F, Parr T, Friston K, Lanillos P, Sajid N. Reclaiming saliency: Rhythmic precision-modulated action and perception. Front Neurorobot 2022;16:896229. PMID: 35966370; PMCID: PMC9368584; DOI: 10.3389/fnbot.2022.896229.
Abstract
Computational models of visual attention in artificial intelligence and robotics have been inspired by the concept of a saliency map. These models account for the mutual information between the (current) visual information and its estimated causes. However, they fail to consider the circular causality between perception and action. In other words, they do not consider where to sample next, given current beliefs. Here, we reclaim salience as an active inference process that relies on two basic principles: uncertainty minimization and rhythmic scheduling. For this, we make a distinction between attention and salience. Briefly, we associate attention with precision control, i.e., the confidence with which beliefs can be updated given sampled sensory data, and salience with uncertainty minimization that underwrites the selection of future sensory data. Using this, we propose a new account of attention based on rhythmic precision-modulation and discuss its potential in robotics, providing numerical experiments that showcase its advantages for state and noise estimation, system identification and action selection for informative path planning.
Affiliation(s)
- Ajith Anil Meera (corresponding author), Department of Cognitive Robotics, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Delft, Netherlands
- Filip Novicky, Department of Neurophysiology, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Thomas Parr, Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Karl Friston, Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Pablo Lanillos, Department of Artificial Intelligence, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Noor Sajid, Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
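To make the precision-control reading of attention concrete, here is a minimal sketch, not taken from the paper, of precision-weighted belief updating for a single Gaussian state. Attention corresponds to the sensory precision pi_sensory: raising it lets sampled data dominate the posterior, while lowering it keeps the belief near the prior. Function and parameter names are illustrative.

def precision_weighted_update(mu, y, pi_sensory, pi_prior, mu_prior,
                              lr=0.1, steps=50):
    # Gradient descent on a Gaussian free energy: the belief mu is pulled
    # toward the datum y and toward the prior mu_prior, each prediction
    # error weighted by its precision (inverse variance).
    for _ in range(steps):
        eps_y = y - mu          # sensory prediction error
        eps_p = mu_prior - mu   # prior prediction error
        mu += lr * (pi_sensory * eps_y + pi_prior * eps_p)
    return mu

# "Attended" condition: precise sensory data dominate (converges near 0.8).
print(precision_weighted_update(0.0, 1.0, pi_sensory=4.0, pi_prior=1.0, mu_prior=0.0))
# "Unattended" condition: imprecise data, the belief stays near the prior (near 0.2).
print(precision_weighted_update(0.0, 1.0, pi_sensory=0.25, pi_prior=1.0, mu_prior=0.0))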
2. de Santana Correia A, Colombini EL. Attention, please! A survey of neural attention models in deep learning. Artif Intell Rev 2022. DOI: 10.1007/s10462-022-10148-x.
3. Mount J, Xu M, Dawes L, Milford M. Unsupervised Selection of Optimal Operating Parameters for Visual Place Recognition Algorithms Using Gaussian Mixture Models. IEEE Robot Autom Lett 2021. DOI: 10.1109/lra.2020.3043171.
4.
5. Visual Saliency Detection Using a Rule-Based Aggregation Approach. Appl Sci (Basel) 2019. DOI: 10.3390/app9102015.
Abstract
In this paper, we propose an approach for salient pixel detection using a rule-based system. In our proposal, rules are automatically learned by combining four saliency models. The learned rules are utilized for the detection of pixels of the salient object in a visual scene. The proposed methodology consists of two main stages. Firstly, in the training stage, the knowledge extracted from outputs of four state-of-the-art saliency models is used to induce an ensemble of rough-set-based rules. Secondly, the induced rules are utilized by our system to determine, in a binary manner, the pixels corresponding to the salient object within a scene. Being independent of any threshold value, such a method eliminates any midway uncertainty and exempts us from performing a post-processing step as is required in most approaches to saliency detection. The experimental results on three datasets show that our method obtains stable and better results than state-of-the-art models. Moreover, it can be used as a pre-processing stage in computer vision-based applications in diverse areas such as robotics, image segmentation, marketing, and image compression.
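As a toy illustration of aggregating several saliency models into one binary decision, the sketch below uses a fixed majority vote. The paper instead induces rough-set rules from the training outputs of four models, so the vote is only a simplifying stand-in, and the thresholds are illustrative.

import numpy as np

def aggregate_saliency(maps, thresholds, min_votes=3):
    # Binarize each model's saliency map with its own threshold, then mark
    # a pixel as salient when at least min_votes models agree.
    votes = sum((m >= t).astype(int) for m, t in zip(maps, thresholds))
    return votes >= min_votes

rng = np.random.default_rng(0)
maps = [rng.random((4, 4)) for _ in range(4)]       # four saliency models
mask = aggregate_saliency(maps, thresholds=[0.5] * 4)
print(mask.astype(int))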
6. Utility function generated saccade strategies for robot active vision: a probabilistic approach. Auton Robots 2019. DOI: 10.1007/s10514-018-9752-3.
7.
8. Dodge S, Karam L. Visual Saliency Prediction Using a Mixture of Deep Neural Networks. IEEE Trans Image Process 2018;27:4080-4090. PMID: 29993885; DOI: 10.1109/tip.2018.2834826.
Abstract
Visual saliency models have recently begun to incorporate deep learning to achieve predictive capacity much greater than previous unsupervised methods. However, most existing models predict saliency without explicit knowledge of global scene semantic information. We propose a model (MxSalNet) that incorporates global scene semantic information in addition to local information gathered by a convolutional neural network. Our model is formulated as a mixture of experts. Each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. This gating network is guided by global scene information to predict weights. The expert networks and the gating network are trained simultaneously in an end-to-end manner. We show that our mixture formulation leads to improvement in performance over an otherwise identical non-mixture model that does not incorporate global scene information. Additionally, we show that our model achieves better performance than several other visual saliency models.
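The mixture formulation reduces to a weighted sum of expert outputs, with the weights produced by a gate from global features. A minimal sketch under that reading follows; the linear gate and toy experts are stand-ins for the paper's trained networks.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def moe_saliency(features, experts, gate_matrix):
    # Each expert maps features to a saliency map; the gate turns global
    # features into mixture weights; the final map is the weighted sum.
    expert_maps = [expert(features) for expert in experts]
    weights = softmax(gate_matrix @ features.ravel())
    return sum(w * m for w, m in zip(weights, expert_maps))

rng = np.random.default_rng(1)
feats = rng.random((8, 8))
experts = [lambda f, k=k: np.roll(f, k, axis=0) for k in range(3)]  # toy experts
gate = rng.standard_normal((3, feats.size))
print(moe_saliency(feats, experts, gate).shape)   # (8, 8)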
9.
10.
Abstract
This paper presents a novel system for human–robot interaction in object-grasping applications. Consisting of an RGB-D camera, a projector and a robot manipulator, the proposed system provides intuitive information to the human by analyzing the scene, detecting graspable objects and directly projecting numbers or symbols in front of objects. Objects are detected using a visual attention model that incorporates color, shape and depth information. The positions and orientations of the projected numbers are based on the shapes, positions and orientations of the corresponding objects. Users select a grasping target by indicating the corresponding number. Projected arrows are then created on the fly to guide a robotic arm to grasp the selected object using visual servoing and deliver the object to the human user. Experimental results are presented to demonstrate how the system is used in robot grasping tasks.
11.
12. Mu B, Paull L, Agha-Mohammadi AA, Leonard JJ, How JP. Two-Stage Focused Inference for Resource-Constrained Minimal Collision Navigation. IEEE Trans Robot 2017. DOI: 10.1109/tro.2016.2623344.
13. Becerra I, Valentín-Coronado LM, Murrieta-Cid R, Latombe JC. Reliable confirmation of an object identity by a mobile robot: A mixed appearance/localization-driven motion approach. Int J Rob Res 2016. DOI: 10.1177/0278364915620848.
Abstract
This paper investigates the problem of confirming the identity of a candidate object (expected to be a target based on crude visual clues) with a mobile robot equipped with visual sensing capabilities. We present a method whose main novelty is to combine localizing the robot relative to the candidate object with confirming that the object is the sought target. This twofold approach drastically reduces false positives. Identity confirmation with this twofold goal is modeled as a Partially Observable Markov Decision Process (POMDP), where the states are the cells of a decomposition of the space. It is solved using stochastic dynamic programming with imperfect state information. A robotic system using this method has been implemented, and tests have been carried out both in simulation and with a real robot. The experiments empirically validate the use of various metrics and demonstrate the method's ability to perform well in different settings.
Affiliation(s)
- Jean-Claude Latombe, Artificial Intelligence Laboratory, Computer Science Department, Stanford University, USA
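For readers new to the formalism, here is a minimal sketch of the two ingredients the abstract names: a discrete POMDP belief update over the cells of a space decomposition, and uncertainty-reducing action selection. The one-step entropy-greedy rule is a simplifying stand-in for the paper's stochastic dynamic programming, and all matrices are illustrative.

import numpy as np

def belief_update(b, a, z, T, O):
    # T[a][s, s'] = P(s' | s, a); O[a][z, s'] = P(z | s', a).
    b_pred = T[a].T @ b           # predict the next state distribution
    b_new = O[a][z] * b_pred      # weight by the observation likelihood
    return b_new / b_new.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def greedy_info_action(b, T, O, n_obs):
    # Choose the action with the lowest expected posterior entropy.
    best_a, best_h = None, np.inf
    for a in T:
        h = 0.0
        for z in range(n_obs):
            bz = O[a][z] * (T[a].T @ b)
            pz = bz.sum()
            if pz > 0:
                h += pz * entropy(bz / pz)
        if h < best_h:
            best_a, best_h = a, h
    return best_a

T = {0: np.array([[0.9, 0.1], [0.1, 0.9]])}   # one action over two cells
O = {0: np.array([[0.8, 0.3], [0.2, 0.7]])}   # two possible observations
b = np.array([0.5, 0.5])
print(belief_update(b, 0, 1, T, O), greedy_info_action(b, T, O, n_obs=2))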
14. Tao D, Cheng J, Song M, Lin X. Manifold Ranking-Based Matrix Factorization for Saliency Detection. IEEE Trans Neural Netw Learn Syst 2016;27:1122-1134. PMID: 26277008; DOI: 10.1109/tnnls.2015.2461554.
Abstract
Saliency detection is used to identify the most important and informative area in a scene, and it is widely used in various vision tasks, including image quality assessment, image matching, and object recognition. Manifold ranking (MR) has been used to great effect for saliency detection, since it not only incorporates local spatial information but also utilizes the labeling information from background queries. However, MR completely ignores the feature information extracted from each superpixel. In this paper, we propose an MR-based matrix factorization (MRMF) method to overcome this limitation. MRMF models the ranking problem in the matrix factorization framework and embeds query sample labels in the coefficients. By incorporating spatial information and embedding labels, MRMF enforces similar saliency values on neighboring superpixels and ranks superpixels according to the learned coefficients. We prove that MRMF has good generalizability, and we develop an efficient optimization algorithm based on Nesterov's method. Experiments using popular benchmark data sets illustrate the promise of MRMF compared with other state-of-the-art saliency detection methods.
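Because MRMF builds on manifold ranking, a sketch of the classic MR step may help: rank scores solve f = (I - alpha S)^(-1) y on a superpixel affinity graph, with S the symmetrically normalized affinity matrix and y a query indicator. This is only the baseline that MRMF extends; the toy affinity matrix is illustrative.

import numpy as np

def manifold_ranking(W, y, alpha=0.99):
    # Symmetrically normalize the affinity matrix, then solve the linear
    # system (I - alpha * S) f = y for the rank scores f.
    d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    S = d_inv_sqrt @ W @ d_inv_sqrt
    return np.linalg.solve(np.eye(len(W)) - alpha * S, y)

# Toy affinity over four superpixels; superpixel 0 is a background query.
W = np.array([[0.0, 1.0, 0.2, 0.1],
              [1.0, 0.0, 0.3, 0.1],
              [0.2, 0.3, 0.0, 1.0],
              [0.1, 0.1, 1.0, 0.0]])
y = np.array([1.0, 0.0, 0.0, 0.0])
print(manifold_ranking(W, y))   # high scores = similar to the query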
15. Lu Y, Song D. Visual Navigation Using Heterogeneous Landmarks and Unsupervised Geometric Constraints. IEEE Trans Robot 2015. DOI: 10.1109/tro.2015.2424032.
16.
17. Catenacci Volpi N, Quinton JC, Pezzulo G. How active perception and attractor dynamics shape perceptual categorization: a computational model. Neural Netw 2014;60:1-16. PMID: 25105744; DOI: 10.1016/j.neunet.2014.06.008.
Abstract
We propose a computational model of perceptual categorization that fuses elements of grounded and sensorimotor theories of cognition with dynamic models of decision-making. We assume that category information consists of anticipated patterns of agent-environment interactions that can be elicited through overt or covert (simulated) eye movements, object manipulation, etc. This information is first encoded when category information is acquired, and then re-enacted during perceptual categorization. Perceptual categorization consists of a dynamic competition between attractors that encode the sensorimotor patterns typical of each category; action prediction success counts as "evidence" for a given category and contributes to falling into the corresponding attractor. The evidence accumulation process is guided by an active perception loop, and the active exploration of objects (e.g., visual exploration) aims at eliciting expected sensorimotor patterns that count as evidence for the object category. We present a computational model incorporating these elements, describing action prediction, active perception, and attractor dynamics as key elements of perceptual categorization. We test the model in three simulated perceptual categorization tasks, and we discuss its relevance for grounded and sensorimotor theories of cognition.
Affiliation(s)
- Nicola Catenacci Volpi, Adaptive Systems Research Group, School of Computer Science, University of Hertfordshire, College Lane Campus, Hatfield, Hertfordshire AL10 9AB, United Kingdom
- Jean-Charles Quinton, Clermont University, Blaise Pascal University, Pascal Institute, BP 10448, F-63000 Clermont-Ferrand, France; CNRS, UMR 6602, Pascal Institute, F-63171 Aubiere, France
- Giovanni Pezzulo, Istituto di Scienze e Tecnologie della Cognizione - CNR, Via S. Martino della Battaglia 44, 00185 Rome, Italy
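A toy competitive accumulator can stand in for the attractor dynamics the abstract describes: each unit accumulates prediction-success evidence for one category, excites itself, and inhibits its rivals, so the best-supported category eventually dominates. All constants are illustrative, not the paper's.

import numpy as np

def attractor_competition(evidence, n_steps=200, dt=0.05, inhibition=1.2):
    # Units accumulate evidence, self-excite, and suppress one another;
    # the unit with the strongest evidence saturates while rivals decay.
    x = np.full(len(evidence), 0.1)
    for _ in range(n_steps):
        recurrent = x - inhibition * (x.sum() - x)   # self minus rivals
        x += dt * (-x + np.maximum(recurrent + evidence, 0.0))
        x = np.clip(x, 0.0, 1.0)
    return x

# Category 1's sensorimotor predictions succeed slightly more often.
print(attractor_competition(np.array([0.50, 0.55, 0.45])))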
18. An adaptive scheme for robot localization and mapping with dynamically configurable inter-beacon range measurements. Sensors (Basel) 2014;14:7684-7710. PMID: 24776938; PMCID: PMC4063007; DOI: 10.3390/s140507684.
Abstract
This work is motivated by robot-sensor network cooperation techniques where sensor nodes (beacons) are used as landmarks for range-only (RO) simultaneous localization and mapping (SLAM). This paper presents an RO-SLAM scheme that adapts the measurement-gathering process, using mechanisms that dynamically modify the rate and variety of measurements integrated in the SLAM filter. It includes a measurement-gathering module that can be configured to collect direct robot-beacon and inter-beacon measurements with different inter-beacon depth levels and at different rates. It also includes a supervision module that monitors SLAM performance and dynamically selects the measurement-gathering configuration, balancing SLAM accuracy against resource consumption. The proposed scheme has been applied to an extended Kalman filter SLAM with auxiliary particle filters for beacon initialization (PF-EKF SLAM) and validated with experiments performed in the CONET Integrated Testbed. It achieved lower map and robot errors (34% and 14%, respectively) than traditional methods, with a lower computational burden (16%) and similar beacon energy consumption.
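The building block of such a filter is the EKF correction for a single range measurement, applied to both robot-beacon and inter-beacon ranges. A minimal sketch follows; the state layout and noise value are illustrative rather than the paper's configuration.

import numpy as np

def ekf_range_update(x, P, beacon, r_meas, sigma_r=0.1):
    # EKF correction for one range-only measurement to a known beacon,
    # assuming the first two state entries are the robot position.
    dx, dy = x[0] - beacon[0], x[1] - beacon[1]
    r_pred = np.hypot(dx, dy)
    H = np.zeros((1, len(x)))
    H[0, 0], H[0, 1] = dx / r_pred, dy / r_pred    # Jacobian of the range
    S = (H @ P @ H.T).item() + sigma_r ** 2        # innovation variance
    K = P @ H.T / S                                # Kalman gain
    x = x + (K * (r_meas - r_pred)).ravel()
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 0.0]), np.eye(2)
x, P = ekf_range_update(x, P, beacon=(3.0, 4.0), r_meas=5.2)
print(x, np.diag(P))   # position nudged away from the beacon; variance shrinks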
19. Choi H, Kim R, Kim E. An Efficient Ceiling-view SLAM Using Relational Constraints Between Landmarks. Int J Adv Robot Syst 2014. DOI: 10.5772/57225.
Abstract
In this paper, we present a new indoor simultaneous localization and mapping (SLAM) technique based on an upward-looking ceiling camera. Adapted from our previous work [17], the proposed method employs sparsely distributed line and point landmarks in an indoor environment to aid data association and reduce extended Kalman filter computation compared with earlier techniques. Further, the proposed method exploits geometric relationships between the two types of landmarks to provide added information about the environment. This geometric information is measured with an upward-looking ceiling camera and is used as a constraint in Kalman filtering. The performance of the proposed ceiling-view (CV) SLAM is demonstrated through simulations and experiments. The proposed method performs localization and mapping more accurately than methods that use the two types of landmarks without taking into account their relative geometries.
Affiliation(s)
- Hyukdoo Choi, School of Electrical and Electronic Engineering, Yonsei University
- Ryunseok Kim, School of Electrical and Electronic Engineering, Yonsei University
- Euntai Kim, School of Electrical and Electronic Engineering, Yonsei University
20.
21. RETRACTED ARTICLE: From the human visual system to the computational models of visual attention: a survey. Artif Intell Rev 2013. DOI: 10.1007/s10462-012-9385-4.
22. Borji A, Itti L. State-of-the-art in visual attention modeling. IEEE Trans Pattern Anal Mach Intell 2013;35:185-207. PMID: 22487985; DOI: 10.1109/tpami.2012.89.
Abstract
Modeling visual attention (particularly stimulus-driven, saliency-based attention) has been a very active research area over the past 25 years. Many different models of attention are now available which, aside from lending theoretical contributions to other fields, have demonstrated successful applications in computer vision, mobile robotics, and cognitive systems. Here we review, from a computational perspective, the basic concepts of attention implemented in these models. We present a taxonomy of nearly 65 models, which provides a critical comparison of approaches, their capabilities, and shortcomings. In particular, 13 criteria derived from behavioral and computational studies are formulated for qualitative comparison of attention models. Furthermore, we address several challenging issues with models, including biological plausibility of the computations, correlation with eye movement datasets, bottom-up and top-down dissociation, and constructing meaningful performance measures. Finally, we highlight current research trends in attention modeling and provide insights for future research.
Affiliation(s)
- Ali Borji, Department of Computer Science, University of Southern California, 3641 Watt Way, Los Angeles, CA 90089, USA
23.
24. Choi H, Kim DY, Hwang JP, Park CW, Kim E. Efficient Simultaneous Localization and Mapping Based on Ceiling-View: Ceiling Boundary Feature Map Approach. Adv Robot 2012. DOI: 10.1163/156855311x617542.
Affiliation(s)
- Hyukdoo Choi, School of Electrical and Electronic Engineering, Yonsei University, C613, Sinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea
- Dong Yeop Kim, Korea Electronics Technology Institute, Bucheon Technopark, 4 Danji Apartments, Yakdae-dong, Wonmi-gu, Bucheon-si, Gyeonggi-do 420-734, South Korea
- Jae Pil Hwang, School of Electrical and Electronic Engineering, Yonsei University, C613, Sinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea
- Chang-Woo Park, Korea Electronics Technology Institute, Bucheon Technopark, 4 Danji Apartments, Yakdae-dong, Wonmi-gu, Bucheon-si, Gyeonggi-do 420-734, South Korea
- Euntai Kim, School of Electrical and Electronic Engineering, Yonsei University, C613, Sinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea
25. Lee YJ, Song JB. Autonomous Salient Feature Detection through Salient Cues in an HSV Color Space for Visual Indoor Simultaneous Localization and Mapping. Adv Robot 2012. DOI: 10.1163/016918610x512613.
Affiliation(s)
- Yong-Ju Lee, Department of Mechanical Engineering, Korea University, 5-ga, Anam-dong, Seongbuk-gu, Seoul 136-713, South Korea
- Jae-Bok Song, Department of Mechanical Engineering, Korea University, 5-ga, Anam-dong, Seongbuk-gu, Seoul 136-713, South Korea
26.
Abstract
In this paper we provide a broad survey of developments in active vision in robotic applications over the last 15 years. With increasing demand for robotic automation, research in this area has received much attention. Among the many factors that contribute to a high-performance robotic system, the planned sensing or acquisition of perceptions of the operating environment is a crucial component. The aim of sensor planning is to determine the pose and settings of vision sensors for undertaking a vision-based task, which usually requires obtaining multiple views of the object to be manipulated. Planning for robot vision is a complex problem for an active system due to its sensing uncertainty and environmental uncertainty. This paper describes such problems arising from many applications, e.g., object recognition and modeling, site reconstruction and inspection, surveillance, tracking and search, as well as robotic manipulation and assembly, localization and mapping, and navigation and exploration. A range of solutions and methods have been proposed to solve these problems in the past; they are summarized in this review so that readers can easily locate solution methods for practical applications. Representative contributions, their evaluations and analyses, and future research trends are also addressed at an abstract level.
27.