1. Munguia-Galeano F, Tan AH, Ji Z. Deep Reinforcement Learning With Explicit Context Representation. IEEE Trans Neural Netw Learn Syst 2025; 36:419-432. [PMID: 37906492] [DOI: 10.1109/tnnls.2023.3325633]
Abstract
Though reinforcement learning (RL) has shown an outstanding capability for solving complex computational problems, most RL algorithms lack an explicit method that would allow learning from contextual information. Humans, on the other hand, often use context to identify patterns and relations among elements in the environment and to avoid taking wrong actions. However, what may seem like an obviously wrong decision from a human perspective could take hundreds of steps for an RL agent to learn to avoid. This article proposes a framework for discrete environments called Iota explicit context representation (IECR). The framework involves representing each state using contextual key frames (CKFs), which can then be used to extract a function that represents the affordances of the state; two loss functions defined over these affordances are also introduced. The novelty of the IECR framework lies in its capacity to extract contextual information from the environment and learn from the CKFs' representation. We validate the framework by developing four new algorithms that learn using context: Iota deep Q-network (IDQN), Iota double deep Q-network (IDDQN), Iota dueling deep Q-network (IDuDQN), and Iota dueling double deep Q-network (IDDDQN). Furthermore, we evaluate the framework and the new algorithms in five discrete environments. We show that all four algorithms, which use contextual information, converge in around 40,000 training steps of the neural networks, significantly outperforming their state-of-the-art equivalents.
2. Nguyen HS, Cruz F, Dazeley R. Towards a Broad-Persistent Advising Approach for Deep Interactive Reinforcement Learning in Robotic Environments. Sensors (Basel) 2023; 23:2681. [PMID: 36904885] [PMCID: PMC10007476] [DOI: 10.3390/s23052681]
Abstract
Deep Reinforcement Learning (DeepRL) methods have been widely used in robotics to learn about the environment and acquire behaviours autonomously. Deep Interactive Reinforcement Learning (DeepIRL) includes interactive feedback from an external trainer or expert giving advice to help learners choose actions to speed up the learning process. However, current research has been limited to interactions that offer actionable advice only for the agent's current state. Additionally, the information is discarded by the agent after a single use, forcing it to repeat the same process when it revisits that state. In this paper, we present Broad-Persistent Advising (BPA), an approach that retains and reuses the processed information. It not only helps trainers give more general advice relevant to similar states rather than only the current state, but also allows the agent to speed up the learning process. We tested the proposed approach in two continuous robotic scenarios, namely a cart pole balancing task and a simulated robot navigation task. The results demonstrated that the agent's learning speed increased, as evidenced by reward gains of up to 37%, while the number of interactions required from the trainer remained comparable to the DeepIRL approach.
Affiliation(s)
- Hung Son Nguyen
- School of Information Technology, Deakin University, Geelong 3220, Australia
- Francisco Cruz
- School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia
- Escuela de Ingeniería, Universidad Central de Chile, Santiago 8330601, Chile
- Richard Dazeley
- School of Information Technology, Deakin University, Geelong 3220, Australia
3. Harnack D, Pivin-Bachler J, Navarro-Guerrero N. Quantifying the effect of feedback frequency in interactive reinforcement learning for robotic tasks. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07949-0]
Abstract
Reinforcement learning (RL) has become widely adopted in robot control. Despite many successes, one major persistent problem is very low data efficiency. One solution is interactive feedback, which has been shown to speed up RL considerably. As a result, there is an abundance of different strategies, which are, however, primarily tested on discrete grid-world and small-scale optimal control scenarios. In the literature, there is no consensus about which feedback frequency is optimal or at which time the feedback is most beneficial. To resolve these discrepancies, we isolate and quantify the effect of feedback frequency in robotic tasks with continuous state and action spaces. The experiments encompass inverse kinematics learning for robotic manipulator arms of different complexity. We show that seemingly contradictory reported phenomena occur at different complexity levels. Furthermore, our results suggest that no single ideal feedback frequency exists. Rather, the feedback frequency should be changed as the agent's proficiency in the task increases.
4. Andriella A, Torras C, Abdelnour C, Alenyà G. Introducing CARESSER: A framework for in situ learning robot social assistance from expert knowledge and demonstrations. User Model User-Adapt Interact 2022; 33:441-496. [PMID: 35311217] [PMCID: PMC8916953] [DOI: 10.1007/s11257-021-09316-5]
Abstract
Socially assistive robots have the potential to augment and enhance therapists' effectiveness in repetitive tasks such as cognitive therapies. However, their contribution has generally been limited because domain experts have not been fully involved in the entire design pipeline or in the automation of the robots' behaviour. In this article, we present aCtive leARning agEnt aSsiStive bEhaviouR (CARESSER), a novel framework that actively learns robotic assistive behaviour by leveraging the therapist's expertise (knowledge-driven approach) and their demonstrations (data-driven approach). By exploiting this hybrid approach, the presented method enables fast in situ learning, in a fully autonomous fashion, of personalised patient-specific policies. To evaluate our framework, we conducted two user studies in a daily care centre in which older adults affected by mild dementia and mild cognitive impairment (N = 22) were asked to solve cognitive exercises with the support of a therapist and later of a robot endowed with CARESSER. Results showed that: (i) the robot managed to keep the patients' performance stable during the sessions, even more so than the therapist; (ii) the assistance offered by the robot during the sessions eventually matched the therapist's preferences. We conclude that CARESSER, with its stakeholder-centric design, can pave the way to new AI approaches that learn by leveraging human-human interactions along with human expertise, which has the benefits of speeding up the learning process, eliminating the need to design complex reward functions, and avoiding undesired states.
Affiliation(s)
- Antonio Andriella
- CSIC-UPC, Institut de Robòtica i Informàtica Industrial, C/Llorens i Artigas 4-6, 08028 Barcelona, Spain
- Carme Torras
- CSIC-UPC, Institut de Robòtica i Informàtica Industrial, C/Llorens i Artigas 4-6, 08028 Barcelona, Spain
- Carla Abdelnour
- Research Center and Memory Clinic, Fundació ACE, Institut Català de Neurociències Aplicades, Universitat Internacional de Catalunya, Barcelona, Spain
- Guillem Alenyà
- CSIC-UPC, Institut de Robòtica i Informàtica Industrial, C/Llorens i Artigas 4-6, 08028 Barcelona, Spain
5.
Abstract
The challenge in human–robot interaction is to build an agent that can act upon implicit human statements, where the agent is instructed to execute tasks without explicit utterance. Understanding what to do under such scenarios requires the agent to have the capability to process object grounding and affordance learning from acquired knowledge. Affordance has been the driving force for agents to construct relationships between objects, their effects, and actions, whereas grounding is effective in the understanding of spatial maps of objects present in the environment. The main contribution of this paper is to propose a methodology for the extension of object affordance and grounding, the Bloom-based cognitive cycle, and the formulation of perceptual semantics for context-based human–robot interaction. In this study, we implemented YOLOv3 to formulate visual perception and LSTM to identify the level of the cognitive cycle, with cognitive processes synchronized within the cycle. In addition, we used semantic networks and conceptual graphs as a method to represent knowledge in various dimensions related to the cognitive cycle. The visual perception module achieved an average precision of 0.78, an average recall of 0.87, and an average F1 score of 0.80, indicating an improvement in the generation of semantic networks and conceptual graphs. The similarity index used for the lingual and visual association showed promising results and improves the overall experience of human–robot interaction.
6. Singh B, Kumar R, Singh VP. Reinforcement learning in robotic applications: a comprehensive survey. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-09997-9]
7. Koert D, Kircher M, Salikutluk V, D'Eramo C, Peters J. Multi-Channel Interactive Reinforcement Learning for Sequential Tasks. Front Robot AI 2021; 7:97. [PMID: 33501264] [PMCID: PMC7805623] [DOI: 10.3389/frobt.2020.00097]
Abstract
The ability to learn new tasks by sequencing already known skills is an important requirement for future robots. Reinforcement learning is a powerful tool for this, as it allows a robot to learn and improve how to combine skills for sequential tasks. However, in real robotic applications, the cost of sample collection and exploration prevents the application of reinforcement learning to a variety of tasks. To overcome these limitations, human input during reinforcement learning can speed up learning, guide exploration, and prevent the choice of disastrous actions. Nevertheless, there is a lack of experimental evaluations of multi-channel interactive reinforcement learning systems solving robotic tasks with input from inexperienced human users, in particular for cases where human input might be partially wrong. Therefore, in this paper, we present an approach that incorporates multiple human input channels for interactive reinforcement learning in a unified framework and evaluate it on two robotic tasks with 20 inexperienced human subjects. To enable the robot to also handle potentially incorrect human input, we incorporate a novel concept of self-confidence, which allows the robot to question human input after an initial learning phase. The second robotic task is specifically designed to investigate whether this self-confidence can enable the robot to achieve learning progress even if the human input is partially incorrect. Further, we evaluate how humans react to the robot's suggestions once it suspects that human input might be wrong. Our experimental evaluations show that our approach can successfully incorporate human input to accelerate the learning process in both robotic tasks even when that input is partially wrong. However, not all humans were willing to accept the robot's suggestions or its questioning of their input, particularly if they did not understand the learning process and the reasons behind the robot's suggestions. We believe that the findings from this experimental evaluation can benefit the future design of algorithms and interfaces for interactive reinforcement learning systems used by inexperienced users.
Affiliation(s)
- Dorothea Koert
- Intelligent Autonomous Systems Group, Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany.,Center for Cognitive Science, Technische Universität Darmstadt, Darmstadt, Germany
- Maximilian Kircher
- Intelligent Autonomous Systems Group, Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
- Vildan Salikutluk
- Center for Cognitive Science, Technische Universität Darmstadt, Darmstadt, Germany.,Models of Higher Cognition Group, Department of Psychology, Technische Universität Darmstadt, Darmstadt, Germany
- Carlo D'Eramo
- Intelligent Autonomous Systems Group, Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
- Jan Peters
- Intelligent Autonomous Systems Group, Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany.,Robot Learning Group, Max Planck Institute for Intelligent Systems, Tübingen, Germany
8. Rath M, Chatterjee JM. Exploration of Information Retrieval Approaches With Focus on Medical Information Retrieval. Ontology-Based Information Retrieval for Healthcare Systems 2020:275-291. [DOI: 10.1002/9781119641391.ch13]
9. Kerzel M, Pekarek-Rosin T, Strahl E, Heinrich S, Wermter S. Teaching NICO How to Grasp: An Empirical Study on Crossmodal Social Interaction as a Key Factor for Robots Learning From Humans. Front Neurorobot 2020; 14:28. [PMID: 32581759] [PMCID: PMC7297081] [DOI: 10.3389/fnbot.2020.00028]
Abstract
To overcome novel challenges in complex domestic environments, humanoid robots can learn from human teachers. We propose that the capability for social interaction should be a key factor in this teaching process and benefits both the subjective experience of the human user and the learning process itself. To support our hypothesis, we present a Human-Robot Interaction study on human-assisted visuomotor learning with the robot NICO, the Neuro-Inspired COmpanion, a child-sized humanoid. NICO is a flexible, social platform with sensing and manipulation abilities. We give a detailed description of NICO's design and a comprehensive overview of studies that use or evaluate NICO. To engage in social interaction, NICO can express stylized facial expressions and utter speech via an Embodied Dialogue System. NICO is characterized in particular by combining these social interaction capabilities with abilities for human-like object manipulation and crossmodal perception. In the presented study, NICO acquires visuomotor grasping skills by interacting with its environment. In contrast to methods like motor babbling, the learning process is, in part, supported by a human teacher. To begin the learning process, an object is placed into NICO's hand, and if this object is accidentally dropped, the human assistant has to recover it. The study is conducted with 24 participants with little or no prior experience with robots. In the robot-guided experimental condition, assistance is actively requested by NICO via the Embodied Dialogue System. In the human-guided condition, instructions are given by a human experimenter while NICO remains silent. Evaluation using established questionnaires like Godspeed, Mind Perception, and Uncanny Valley Indices, along with a structured interview and video analysis of the interaction, shows that the robot's active requests for assistance foster the participants' engagement and benefit the learning process. This result supports the hypothesis that the ability for social interaction is a key factor for companion robots that learn with the help of non-expert teachers, as these robots become capable of communicating active requests or questions that are vital to their learning process. We also show how the design of NICO both enables and is driven by this approach.
Affiliation(s)
- Matthias Kerzel
- Knowledge Technology, Department of Informatics, University of Hamburg, Hamburg, Germany
10. Bhattacharyya R, Hazarika SM. A knowledge-driven layered inverse reinforcement learning approach for recognizing human intents. J Exp Theor Artif Intell 2020. [DOI: 10.1080/0952813x.2020.1718773]
Affiliation(s)
- R. Bhattacharyya
- Computer Science and Engineering, Indian Institute of Information Technology Bhagalpur, Bihar, India
- S. M. Hazarika
- Biomimetic Robotics and Artificial Intelligence Lab, Mechanical Engineering, Indian Institute of Technology Guwahati, Assam, India
11. Huang K, Ma X, Song R, Rong X, Tian X, Li Y. A self-organizing developmental cognitive architecture with interactive reinforcement learning. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.07.109]
12. Cleaning Tasks Knowledge Transfer Between Heterogeneous Robots: a Deep Learning Approach. J Intell Robot Syst 2019. [DOI: 10.1007/s10846-019-01072-4]
13. Kim J, Mishra AK, Limosani R, Scafuro M, Cauli N, Santos-Victor J, Mazzolai B, Cavallo F. Control strategies for cleaning robots in domestic applications: A comprehensive review. Int J Adv Robot Syst 2019. [DOI: 10.1177/1729881419857432]
Abstract
Service robots are built and developed for various applications to support humans as companions, caretakers, or domestic support. As the number of elderly people grows, service robots will be in increasing demand. In particular, one of the main tasks performed by elderly people, and others, is the complex task of cleaning. Therefore, cleaning tasks such as sweeping floors, washing dishes, and wiping windows have been developed for the domestic environment using service robots or robot manipulators with several control approaches. This article is primarily focused on control methodology used for cleaning tasks. Specifically, this work mainly discusses classical control and learning-based control methods. The classical control approaches, which consist of position control, force control, and impedance control, are commonly used for cleaning purposes in a highly controlled environment. However, classical control methods cannot be generalized to cluttered environments, so learning-based control methods can be an alternative solution. Learning-based control methods for cleaning tasks encompass three approaches: learning from demonstration (LfD), supervised learning (SL), and reinforcement learning (RL). These control approaches have their own capabilities to generalize cleaning tasks to new environments. For example, LfD, which many research groups have used for cleaning tasks, can generate complex cleaning trajectories based on human demonstration. Also, SL can support the prediction of dirt areas and cleaning motion using large data sets. Finally, RL allows the robot itself to learn cleaning actions and interact with the new environment. In this context, this article aims to provide a general overview of robotic cleaning tasks based on different types of control methods using a manipulator. It also suggests future directions for cleaning tasks based on an evaluation of the control approaches.
Affiliation(s)
- Jaeseok Kim
- Istituto di BioRobotica, Scuola Superiore Sant’Anna, Pontedera, Italy
- Anand Kumar Mishra
- Centro di Micro-BioRobotica, Istituto Italiano di Tecnologia, Pontedera, Italy
- Raffaele Limosani
- Istituto di BioRobotica, Scuola Superiore Sant’Anna, Pontedera, Italy
- Marco Scafuro
- Istituto di BioRobotica, Scuola Superiore Sant’Anna, Pontedera, Italy
- Nino Cauli
- Institute for Systems and Robotics, Instituto Superior Tecnico, Universidade de Lisboa, Lisboa, Portugal
- Jose Santos-Victor
- Institute for Systems and Robotics, Instituto Superior Tecnico, Universidade de Lisboa, Lisboa, Portugal
- Barbara Mazzolai
- Centro di Micro-BioRobotica, Istituto Italiano di Tecnologia, Pontedera, Italy
- Filippo Cavallo
- Istituto di BioRobotica, Scuola Superiore Sant’Anna, Pontedera, Italy
14. Deng Z, Guan H, Huang R, Liang H, Zhang L, Zhang J. Combining Model-Based Q-Learning With Structural Knowledge Transfer for Robot Skill Learning. IEEE Trans Cogn Dev Syst 2019. [DOI: 10.1109/tcds.2017.2718938]
15. Shukla D, Erkent Ö, Piater J. Learning Semantics of Gestural Instructions for Human-Robot Collaboration. Front Neurorobot 2018; 12:7. [PMID: 29615888] [PMCID: PMC5868127] [DOI: 10.3389/fnbot.2018.00007]
Abstract
Designed to work safely alongside humans, collaborative robots need to be capable partners in human-robot teams. Besides having key capabilities like detecting gestures, recognizing objects, grasping them, and handing them over, these robots need to seamlessly adapt their behavior for efficient human-robot collaboration. In this context we present the fast, supervised Proactive Incremental Learning (PIL) framework for learning associations between human hand gestures and the intended robotic manipulation actions. With the proactive aspect, the robot is able to predict the human's intent and perform an action without waiting for an instruction. The incremental aspect enables the robot to learn associations on the fly while performing a task. It is a probabilistic, statistically-driven approach. As a proof of concept, we focus on a table assembly task where the robot assists its human partner. We investigate how the accuracy of gesture detection affects the number of interactions required to complete the task. We also conducted a human-robot interaction study with non-roboticist users comparing the proactive robot with a reactive robot that waits for instructions.
Affiliation(s)
- Dadhichi Shukla
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Özgür Erkent
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria
- Justus Piater
- Intelligent and Interactive Systems, Department of Computer Science, University of Innsbruck, Innsbruck, Austria