For:	Vien NA, Ertel W, Chung TC. Learning via human feedback in continuous state and action spaces. APPL INTELL 2013. [DOI: 10.1007/s10489-012-0412-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 10/27/2022]

Cited by Other Article(s)

Mourad N, Ezzeddine A, Nadjar Araabi B, Nili Ahmadabadi M. Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach. Journal of Robotics 2020;2020:1-18. [DOI: 10.1155/2020/3849309] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]

Abstract: Programming by demonstrations is one of the most efficient methods for knowledge transfer to develop advanced learning systems, provided that teachers deliver abundant and correct demonstrations, and learners correctly perceive them. Nevertheless, demonstrations are sparse and inaccurate in almost all real-world problems. Complementary information is needed to compensate for these shortcomings of demonstrations. In this paper, we target programming by a combination of nonoptimal and sparse demonstrations and a limited number of binary evaluative feedbacks, where the learner uses its own evaluated experiences as new demonstrations in an extended inverse reinforcement learning method. This provides the learner with broader generalization and less regret, as well as robustness in the face of sparsity and nonoptimality in demonstrations and feedbacks. Our method alleviates the unrealistic burden on teachers to provide optimal and abundant demonstrations. Employing evaluative feedback, which is easy for teachers to deliver, provides the opportunity to correct the learner’s behavior in an interactive social setting without requiring teachers to know and use their own accurate reward function. Here, we enhance inverse reinforcement learning (IRL) to estimate the reward function using a mixture of nonoptimal and sparse demonstrations and evaluative feedbacks. Our method, called IRL from demonstration and human’s critique (IRLDC), has two phases. The teacher first provides some demonstrations for the learner to initialize its policy. Next, the learner interacts with the environment and the teacher provides binary evaluative feedbacks. Taking into account possible inconsistencies and mistakes in issuing and receiving feedbacks, the learner revises the estimated reward function by solving a single optimization problem. The IRLDC is devised to handle errors and sparsities in demonstrations and feedbacks and can generalize different combinations of these two sources of expertise. We apply our method to three domains: a simulated navigation task, a simulated car driving problem with human interactions, and a navigation experiment of a mobile robot. The results indicate that the IRLDC significantly enhances the learning process where the standard IRL methods fail and learning from feedbacks (LfF) methods have high regret. Also, the IRLDC works well at different levels of sparsity and optimality of the teacher’s demonstrations and feedbacks, where other state-of-the-art methods fail.

Zhao X, Ding S, An Y, Jia W. Applications of asynchronous deep reinforcement learning based on dynamic updating weights. APPL INTELL 2018. [DOI: 10.1007/s10489-018-1296-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 02/04/2023]

Celemin C, Ruiz-del-Solar J. An Interactive Framework for Learning Continuous Actions Policies Based on Corrective Feedback. J INTELL ROBOT SYST 2019;95:77-97. [DOI: 10.1007/s10846-018-0839-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 10/16/2022]

Jagodnik KM, Thomas PS, van den Bogert AJ, Branicky MS, Kirsch RF. Training an Actor-Critic Reinforcement Learning Controller for Arm Movement Using Human-Generated Rewards. IEEE Trans Neural Syst Rehabil Eng 2017;25:1892-1905. [PMID: 28475063 DOI: 10.1109/tnsre.2017.2700395] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Indexed: 11/10/2022]

Vien NA, Lee S, Chung T. Bayes-adaptive hierarchical MDPs. APPL INTELL 2016. [DOI: 10.1007/s10489-015-0742-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/22/2022]

Ngo H, Luciw M, Nagi J, Forster A, Schmidhuber J, Vien NA. Efficient Interactive Multiclass Learning from Binary Feedback. ACM T INTERACT INTEL 2014. [DOI: 10.1145/2629631] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]

Kusy M, Zajdel R. Probabilistic neural network training procedure based on Q(0)-learning algorithm in medical data classification. APPL INTELL 2014;41:837-54. [DOI: 10.1007/s10489-014-0562-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 11/26/2022]

Vien NA, Ngo H, Lee S, Chung T. Approximate planning for bayesian hierarchical reinforcement learning. APPL INTELL 2014;41:808-19. [DOI: 10.1007/s10489-014-0565-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]

Wu B, Zheng HY, Feng YP. Point-based online value iteration algorithm in large POMDP. APPL INTELL 2013. [DOI: 10.1007/s10489-013-0479-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]

Abdoos M, Mozayani N, Bazzan ALC. Hierarchical control of traffic signals using Q-learning with tile coding. APPL INTELL 2013. [DOI: 10.1007/s10489-013-0455-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 10/26/2022]