1. Bergoin R, Boucenna S, D'Urso R, Cohen D, Pitti A. A developmental model of audio-visual attention (MAVA) for bimodal language learning in infants and robots. Sci Rep 2024;14:20492. doi: 10.1038/s41598-024-69245-2. PMID: 39242623; PMCID: PMC11379723.
Abstract
A social individual must manage the large amount of complex information in its environment effectively, relative to its own goals, in order to extract what is relevant. This paper presents a neural architecture that aims to reproduce in robots the attention mechanisms (alerting, orienting, selecting) that humans deploy efficiently during audiovisual tasks. We evaluated the system on its ability to identify relevant sources of information in the faces of subjects uttering vowels. We propose a developmental model of audio-visual attention (MAVA) that combines Hebbian learning with a competition between saliency maps based on visual movement and audio energy. MAVA effectively combines bottom-up and top-down information to orient the system toward pertinent areas. The system has several advantages, including online and autonomous learning, low computation time, and robustness to environmental noise. MAVA outperforms other artificial models for detecting speech sources under various noise conditions.
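The core mechanism described in the abstract, a competition between a motion-based visual saliency map and an audio-energy map, can be made concrete with a minimal Python sketch. This is a toy illustration of the general idea, not the authors' implementation; the frame-differencing saliency, the Gaussian projection of audio energy onto the image plane, and all parameter values are assumptions.

```python
import numpy as np

def motion_saliency(prev_frame, frame):
    """Bottom-up visual saliency from simple frame differencing
    (frames are 2-D grayscale arrays)."""
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    return diff / (diff.max() + 1e-8)

def audio_energy_map(shape, source_col, energy, sigma=20.0):
    """Project short-term audio energy onto the image plane as a Gaussian
    around an estimated horizontal source position (hypothetical layout)."""
    _, cols = np.indices(shape)
    return energy * np.exp(-((cols - source_col) ** 2) / (2 * sigma ** 2))

def attend(visual_map, audio_map, w_visual=0.5, w_audio=0.5):
    """Let the weighted maps compete; attention goes to the global maximum."""
    combined = w_visual * visual_map + w_audio * audio_map
    return np.unravel_index(np.argmax(combined), combined.shape)
```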
Affiliation(s)
- Raphaël Bergoin: ETIS, UMR 8051, ENSEA, CY Cergy Paris Université, CNRS, Cergy-Pontoise, France
- Sofiane Boucenna: ETIS, UMR 8051, ENSEA, CY Cergy Paris Université, CNRS, Cergy-Pontoise, France
- Raphaël D'Urso: ETIS, UMR 8051, ENSEA, CY Cergy Paris Université, CNRS, Cergy-Pontoise, France
- David Cohen: Service de Psychiatrie de l'Enfant et de l'Adolescent, Hôpital Pitié-Salpêtrière, AP-HP, Paris, France; Institut des Systèmes Intelligents et de Robotique, Université Pierre et Marie Curie, Paris, France
- Alexandre Pitti: ETIS, UMR 8051, ENSEA, CY Cergy Paris Université, CNRS, Cergy-Pontoise, France
2. Anil Meera A, Novicky F, Parr T, Friston K, Lanillos P, Sajid N. Reclaiming saliency: Rhythmic precision-modulated action and perception. Front Neurorobot 2022;16:896229. doi: 10.3389/fnbot.2022.896229. PMID: 35966370; PMCID: PMC9368584.
Abstract
Computational models of visual attention in artificial intelligence and robotics have been inspired by the concept of a saliency map. These models account for the mutual information between the (current) visual information and its estimated causes, but they fail to consider the circular causality between perception and action; in other words, they do not consider where to sample next, given current beliefs. Here, we reclaim salience as an active inference process that relies on two basic principles: uncertainty minimization and rhythmic scheduling. To do so, we distinguish attention from salience. Briefly, we associate attention with precision control, i.e., the confidence with which beliefs can be updated given sampled sensory data, and salience with the uncertainty minimization that underwrites the selection of future sensory data. On this basis, we propose a new account of attention based on rhythmic precision modulation and discuss its potential in robotics, providing numerical experiments that showcase its advantages for state and noise estimation, system identification, and action selection for informative path planning.
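The distinction the authors draw between attention (precision control) and salience (uncertainty minimization) can be illustrated with a toy precision-weighted belief update. This is a generic sketch under Gaussian assumptions, not the paper's generative model; the rhythmic modulation function and its parameters are illustrative assumptions.

```python
import numpy as np

def precision_weighted_update(mu_prior, pi_prior, y, pi_sensory):
    """Fuse a Gaussian prior belief with a noisy observation, weighting each
    term by its precision (inverse variance); returns posterior mean and precision."""
    pi_post = pi_prior + pi_sensory
    mu_post = (pi_prior * mu_prior + pi_sensory * y) / pi_post
    return mu_post, pi_post

def rhythmic_precision(t, base=1.0, depth=0.9, freq=4.0):
    """Hypothetical rhythmic gain on sensory precision: high-precision phases
    favor belief updating (attention), low-precision phases favor resampling."""
    return base * (1.0 + depth * np.sin(2 * np.pi * freq * t))
```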
Affiliation(s)
- Ajith Anil Meera (corresponding author): Department of Cognitive Robotics, Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Delft, Netherlands
- Filip Novicky: Department of Neurophysiology, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Thomas Parr: Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Karl Friston: Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
- Pablo Lanillos: Department of Artificial Intelligence, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Noor Sajid: Wellcome Centre for Human Neuroimaging, University College London, London, United Kingdom
3. Aarthi R, Amudha J. Weight modulation in top-down computational model for target search. J Intell Fuzzy Syst 2021. doi: 10.3233/jifs-189863.
Abstract
Computer vision research aims to build models that mimic the human visual system. Recent developments in modeling visual information have been used to derive computational models that address a variety of applications. Biologically inspired models help identify salient objects in an image, but identifying non-salient target objects in a heterogeneous environment is a challenging task that requires a better understanding of the visual system. In this work, a weight-modulation-based top-down model is proposed that integrates visual features according to their importance for the target-search task. The model learns optimal weights that bias the target's features against those of the surrounding regions. Experiments are performed on various scenes from a standard dataset, each containing a selected target object. Metrics such as area under the curve, average hit number, and correlation show that the method is well suited to target identification, since it suppresses non-target regions.
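A minimal sketch of the weight-modulation idea follows, assuming precomputed feature maps and a binary target mask. The signal-to-noise weighting rule used here is a common heuristic from top-down saliency work and is not claimed to be the authors' exact learning procedure.

```python
import numpy as np

def learn_topdown_weights(feature_maps, target_mask, eps=1e-8):
    """Weight each feature map by how strongly it responds inside the target
    region relative to the background (target_mask is a boolean array)."""
    weights = []
    for fmap in feature_maps:
        target_resp = fmap[target_mask].mean()
        background_resp = fmap[~target_mask].mean()
        weights.append(target_resp / (background_resp + eps))
    weights = np.array(weights)
    return weights / weights.sum()

def topdown_saliency(feature_maps, weights):
    """Combine feature maps with the learned top-down weights."""
    return sum(w * f for w, f in zip(weights, feature_maps))
```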
Affiliation(s)
- R. Aarthi: Department of Computer Science and Engineering, Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India
- J. Amudha: Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India
4. Mantelli M, Pittol D, Maffei R, Torresen J, Prestes E, Kolberg M. Semantic Active Visual Search System Based on Text Information for Large and Unknown Environments. J Intell Robot Syst 2021;101:32. doi: 10.1007/s10846-020-01298-7. PMID: 33519083; PMCID: PMC7825386.
Abstract
Many high-level robotics tasks require the robot to manipulate or interact with objects that lie in an unexplored part of the environment or outside its current field of view. Although many works search for objects based on their colour or 3D context, we argue that text is a useful and functional visual cue to guide the search. In this paper, we study the problem of active visual search (AVS) in large unknown environments and present an AVS system that relies on semantic information inferred from texts found in the environment, which allows the robot to reduce search costs by avoiding unpromising regions. Our semantic planner reasons over the numbers detected on door signs to decide whether to perform goal-directed exploration towards unknown parts of the environment or to search carefully within the already known parts. We compared our semantic AVS system with two alternatives in four simulated environments: a greedy search system that considers no semantic information, and human participants who teleoperated the robot while performing the search. Our results from simulation and real-world experiments show that text is a promising source of information that provides varied semantic cues for AVS systems.
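The door-sign reasoning can be illustrated with a toy decision rule. This is an illustrative heuristic under the assumption that room numbers vary monotonically along corridors; it is not the authors' planner.

```python
def choose_search_mode(target_number, observed_numbers):
    """Toy door-sign planner: if the target room number falls inside the range
    of numbers seen so far, it is likely in the mapped area, so search locally;
    otherwise, explore toward the frontier where numbers trend toward the target."""
    if not observed_numbers:
        return "explore"
    lo, hi = min(observed_numbers), max(observed_numbers)
    if lo <= target_number <= hi:
        return "search_known_area"
    return "explore_increasing" if target_number > hi else "explore_decreasing"
```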
Affiliation(s)
- Mathias Mantelli: Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Diego Pittol: Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Renan Maffei: Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Jim Torresen: Institute of Informatics, University of Oslo, Oslo, Norway
- Edson Prestes: Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
- Mariana Kolberg: Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil
5. Learning to Perform Visual Tasks from Human Demonstrations. Pattern Recognition and Image Analysis 2019. doi: 10.1007/978-3-030-31321-0_30.
6. Rast AD, Adams SV, Davidson S, Davies S, Hopkins M, Rowley A, Stokes AB, Wennekers T, Furber S, Cangelosi A. Behavioral Learning in a Cognitive Neuromorphic Robot: An Integrative Approach. IEEE Trans Neural Netw Learn Syst 2018;29:6132-6144. doi: 10.1109/tnnls.2018.2816518. PMID: 29994007.
Abstract
We present a learning system that uses the iCub humanoid robot and the SpiNNaker neuromorphic chip to solve the real-world task of object-specific attention. Integrating spiking neural networks with robots introduces considerable complexity for questionable benefit if the objective is simply task performance. But, we suggest, in a cognitive robotics context, where the goal is understanding how to compute, such an approach may yield useful insights into neural architecture as well as learned behavior, especially if dedicated neural hardware is available. Recent advances in cognitive robotics and neuromorphic processing now make such systems possible. Using a scalable, structured, modular approach, we build a spiking neural network in which the effects and impact of learning can be predicted and tested, and the network can be scaled or extended to new tasks automatically. We introduce several enhancements to a basic network and show how they can be used to direct performance toward behaviorally relevant goals. Results show that, using a simple classical spike-timing-dependent plasticity (STDP) rule on selected connections, we can get the robot (and network) to progress from poor task-specific performance to good performance. Behaviorally relevant STDP appears to contribute strongly to positive learning ("do this") but less to negative learning ("don't do that"). In addition, we observe that the effect of structural enhancements tends to be cumulative. The overall system suggests that it is by exploiting combinations of effects, rather than any single effect or property in isolation, that spiking networks can achieve compelling, task-relevant behavior.
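For reference, the classical pair-based STDP rule mentioned in the abstract can be written in a few lines. The parameter values (learning rates, time constants) below are typical textbook choices, not those used on the SpiNNaker hardware.

```python
import numpy as np

def stdp_weight_update(w, dt, a_plus=0.01, a_minus=0.012,
                       tau_plus=20.0, tau_minus=20.0, w_max=1.0):
    """Classical pair-based STDP: dt = t_post - t_pre in milliseconds.
    Pre-before-post (dt > 0) potentiates; post-before-pre depresses."""
    if dt > 0:
        dw = a_plus * np.exp(-dt / tau_plus)
    else:
        dw = -a_minus * np.exp(dt / tau_minus)
    return float(np.clip(w + dw, 0.0, w_max))
```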
7. Zhao D, Chen Y, Lv L. Deep Reinforcement Learning With Visual Attention for Vehicle Classification. IEEE Trans Cogn Dev Syst 2017. doi: 10.1109/tcds.2016.2614675.
8. Potapova E, Zillich M, Vincze M. Survey of recent advances in 3D visual attention for robotics. Int J Rob Res 2017. doi: 10.1177/0278364917726587.
Affiliation(s)
- Ekaterina Potapova: Automation and Control Institute, Vienna University of Technology, Austria
- Michael Zillich: Automation and Control Institute, Vienna University of Technology, Austria
- Markus Vincze: Automation and Control Institute, Vienna University of Technology, Austria
9. A salient region detection model combining background distribution measure for indoor robots. PLoS One 2017;12:e0180519. doi: 10.1371/journal.pone.0180519. PMID: 28742089; PMCID: PMC5524399.
Abstract
Vision systems play an important role in indoor robotics. Saliency detection methods, which capture regions perceived as important, are used to improve the performance of the visual perception system. Most state-of-the-art saliency detection methods, though they perform outstandingly on natural images, fail in complicated indoor environments. We therefore propose a new method comprising graph-based RGB-D segmentation, a primary saliency measure, a background distribution measure, and their combination. In addition, region roundness is proposed to describe the compactness of a region, so that background distribution can be measured more robustly. To validate the proposed approach, eleven influential methods are compared on the DSD and ECSSD datasets. Moreover, we build a mobile robot platform for application in an actual environment and design three kinds of experimental conditions: different viewpoints, illumination variations, and partial occlusions. Experimental results demonstrate that our model outperforms existing methods and is useful for indoor mobile robots.
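The region roundness cue can be sketched as the standard isoperimetric compactness measure, 4πA/P². The boundary-pixel perimeter estimate below is a coarse simplification, and the paper's exact definition may differ.

```python
import numpy as np

def region_roundness(mask):
    """Isoperimetric compactness 4*pi*area / perimeter**2 of a binary region:
    close to 1.0 for a disc, smaller for elongated or ragged regions."""
    mask = mask.astype(bool)
    area = mask.sum()
    # Boundary pixels: inside the region but with at least one 4-neighbour outside.
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = (mask & ~interior).sum()
    return 4 * np.pi * area / (perimeter ** 2 + 1e-8)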
10. de Figueiredo RP, Bernardino A, Santos-Victor J, Araújo H. On the advantages of foveal mechanisms for active stereo systems in visual search tasks. Auton Robots 2017. doi: 10.1007/s10514-017-9617-1.
11. A Novel Saliency Prediction Method Based on Fast Radial Symmetry Transform and Its Generalization. Cognit Comput 2016. doi: 10.1007/s12559-016-9406-8.
13. Liu Z, Xu S, Zhang Y, Chen CLP. A multiple-feature and multiple-kernel scene segmentation algorithm for humanoid robot. IEEE Trans Cybern 2014;44:2232-2241. doi: 10.1109/tsmc.2013.2297398. PMID: 25248211.
Abstract
This technical correspondence presents a multiple-feature and multiple-kernel support vector machine (MFMK-SVM) methodology that achieves more reliable and robust segmentation performance for humanoid robots. Pixel-wise intensity, gradient, and C1 SMF features are extracted via a local homogeneity model and Gabor filters and used as inputs to the MFMK-SVM model. This provides multiple features per sample for easier implementation and efficient computation of the MFMK-SVM model. A new clustering method, the feature validity interval type-2 fuzzy C-means (FV-IT2FCM) algorithm, is proposed, integrating a type-2 fuzzy criterion into the iterative clustering optimization to improve the robustness and reliability of the clustering results. Furthermore, clustering validity is employed to select the training samples for learning the MFMK-SVM model. The MFMK-SVM scene segmentation method takes full advantage of the multiple features of the scene image and the expressive power of multiple kernels. Experiments on the BSDS dataset and real natural scene images demonstrate the superior performance of our proposed method.
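The multiple-kernel component can be illustrated as a convex combination of per-feature kernel matrices. This is the generic multiple-kernel-learning construction, with RBF kernels and fixed weights as assumptions, rather than the paper's trained MFMK-SVM.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def multiple_kernel(feats_a, feats_b, gammas, weights):
    """Convex combination of one RBF kernel per feature family
    (e.g., intensity, gradient, C1 SMF features)."""
    return sum(w * rbf_kernel(Xa, Xb, g)
               for w, g, Xa, Xb in zip(weights, gammas, feats_a, feats_b))
```

The combined Gram matrix can then be fed to an SVM that accepts precomputed kernels, for example sklearn.svm.SVC(kernel='precomputed').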
14. Boccignone G, Ferraro M. Ecological sampling of gaze shifts. IEEE Trans Cybern 2014;44:266-279. doi: 10.1109/tcyb.2013.2253460. PMID: 23757548.
Abstract
Visual attention guides our gaze to relevant parts of the viewed scene, yet the moment-to-moment relocation of gaze can differ among observers even when the same locations are taken into account. Surprisingly, this variability of eye movements has so far been overlooked by the great majority of computational models of visual attention. In this paper we present the ecological sampling model, a stochastic model of eye guidance that explains such variability. The gaze shift mechanism is conceived as an active random sampling that the foraging eye carries out upon the visual landscape, under the constraints set by the observable features and the global complexity of the landscape. Drawing on results from the foraging literature, the actual gaze relocation is driven by a stochastic differential equation whose noise source is sampled from a mixture of α-stable distributions. In this way, the proposed sampling strategy mimics a fundamental property of the eye guidance mechanism: where we choose to look next at any given moment in time is not completely deterministic, but neither is it completely random. To show that the model yields gaze shift motor behaviors whose statistics are similar to those displayed by human observers, we compare simulation outputs with those obtained from eye-tracked subjects viewing complex dynamic scenes.
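A single gaze relocation of this kind can be sketched by drawing step lengths from an α-stable distribution, which is heavy-tailed and so produces occasional long, saccade-like jumps. The uniform direction choice and parameter values are illustrative assumptions; the full model conditions the α-stable mixture on local scene complexity.

```python
import numpy as np
from scipy.stats import levy_stable

def gaze_shift(pos, alpha=1.6, scale=5.0):
    """One stochastic gaze relocation: heavy-tailed alpha-stable step length
    combined with a uniformly random direction (pos is a 2-D point)."""
    step = abs(levy_stable.rvs(alpha, beta=0.0, scale=scale))
    theta = np.random.uniform(0.0, 2.0 * np.pi)
    return pos + step * np.array([np.cos(theta), np.sin(theta)])
```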
15. Liang J, Yuen SY. An edge detection with automatic scale selection approach to improve coherent visual attention model. Pattern Recognit Lett 2013. doi: 10.1016/j.patrec.2013.06.004.
16. Interval type-2 fuzzy kernel based support vector machine algorithm for scene classification of humanoid robot. Soft Comput 2013. doi: 10.1007/s00500-013-1080-0.
17. Yücel Z, Salah AA, Meriçli Ç, Meriçli T, Valenti R, Gevers T. Joint attention by gaze interpolation and saliency. IEEE Trans Cybern 2013;43:829-842. doi: 10.1109/tsmcb.2012.2216979. PMID: 23047879.
Abstract
Joint attention, the ability to coordinate a common point of reference with a communicating party, is a key factor in various interaction scenarios. This paper presents an image-based method for establishing joint attention between an experimenter and a robot. Precise analysis of the experimenter's eye region requires stable, high-resolution image acquisition, which is not always available. We therefore investigate regression-based interpolation of the gaze direction from the experimenter's head pose, which is easier to track. Gaussian process regression and neural networks are contrasted as interpolators of the gaze direction. We then combine gaze interpolation with image-based saliency to improve the target point estimates, testing three different saliency schemes. We demonstrate the proposed method in a human-robot interaction scenario. Cross-subject evaluations, as well as experiments under adverse conditions (such as dimmed or artificial illumination or motion blur), show that our method generalizes well and achieves rapid gaze estimation for establishing joint attention.
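The gaze-interpolation step can be sketched with scikit-learn's Gaussian process regressor, mapping head pose to gaze direction. The synthetic data below stand in for tracked head poses and ground-truth gaze directions, and the kernel choice is an assumption for illustration only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical data: head pose (yaw, pitch, roll) -> gaze direction (yaw, pitch).
rng = np.random.default_rng(0)
head_poses = rng.uniform(-40, 40, size=(200, 3))
gaze_dirs = 0.8 * head_poses[:, :2] + rng.normal(0, 2, size=(200, 2))

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=15.0) + WhiteKernel(noise_level=1.0),
    normalize_y=True)
gp.fit(head_poses, gaze_dirs)

# Predicted gaze direction (with uncertainty) for a new head pose.
pred_gaze, pred_std = gp.predict([[12.0, -5.0, 0.0]], return_std=True)
```

The predictive standard deviation is a natural byproduct of the GP and could gate when the saliency map should override the interpolated gaze estimate.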
Affiliation(s)
- Zeynep Yücel: Intelligent Robotics and Communication Laboratories, Advanced Telecommunications Research Institute International, Kyoto 619-0288, Japan