1
Shu M, Lü S, Gong X, An D, Li S. Episodic Memory-Double Actor-Critic Twin Delayed Deep Deterministic Policy Gradient. Neural Netw 2025; 187:107286. [PMID: 40048754 DOI: 10.1016/j.neunet.2025.107286] [Received: 08/27/2024] [Revised: 02/03/2025] [Accepted: 02/13/2025] [Indexed: 04/29/2025]
Abstract
Existing deep reinforcement learning (DRL) algorithms suffer from low sample efficiency. Episodic memory allows DRL algorithms to remember and reuse past experiences with high return, thereby improving sample efficiency. However, because of the high dimensionality of the state-action space in continuous action tasks, previous methods for such tasks typically only utilize the information stored in episodic memory, rather than directly employing episodic memory for action selection as is done in discrete action tasks. We posit that episodic memory retains the potential to guide action selection in continuous control tasks. Our objective is to enhance sample efficiency by leveraging episodic memory for action selection in such tasks: either reducing the number of training steps required to achieve comparable performance, or enabling the agent to obtain higher rewards within the same number of training steps. To this end, we propose an "Episodic Memory-Double Actor-Critic (EMDAC)" framework, which uses episodic memory for action selection in continuous action tasks. The critics and the episodic memory evaluate the value of the state-action pairs selected by the two actors to determine the final action. In addition, we design an episodic memory based on a Kalman filter optimizer, which is updated using the episodic rewards of collected state-action pairs. The Kalman filter optimizer assigns different weights to experiences collected at different time periods during the memory update. In our episodic memory, state-action pair clusters serve as indices, recording both the occurrence frequency of each cluster and the value estimates of the corresponding state-action pairs. This enables the value of a state-action pair cluster to be estimated by querying the episodic memory.
We then design an intrinsic reward based on the novelty of state-action pairs with respect to the episodic memory, defined by the occurrence frequency of state-action pair clusters, to enhance the exploration capability of the agent. Finally, we propose the "EMDAC-TD3" algorithm by applying these three modules to the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm within an Actor-Critic framework. In evaluations on MuJoCo environments from the OpenAI Gym domain, EMDAC-TD3 achieves higher sample efficiency than the baseline algorithms. EMDAC-TD3 also demonstrates superior final performance compared to state-of-the-art episodic control algorithms and advanced Actor-Critic algorithms, as measured by final rewards, Median, Interquartile Mean, Mean, and Optimality Gap. The final rewards directly demonstrate the advantages of the algorithms: based on them, EMDAC-TD3 achieves an average performance improvement of 11.01% over TD3, surpassing the current state-of-the-art algorithms in the same category.
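As a rough sketch, a count-based novelty bonus over state-action clusters of the kind the abstract describes might look like the following. The clustering (simple rounding of the concatenated state-action vector), the bonus form beta/sqrt(count), and all constants are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

class EpisodicNoveltyBonus:
    """Hypothetical count-based intrinsic reward over state-action clusters."""

    def __init__(self, bin_size=0.5, beta=0.1):
        self.bin_size = bin_size   # coarseness of the state-action clusters (assumed)
        self.beta = beta           # bonus scale (assumed)
        self.counts = {}           # cluster index -> occurrence frequency

    def _cluster(self, state, action):
        # Discretize the concatenated state-action vector into a cluster key.
        sa = np.concatenate([state, action])
        return tuple(np.round(sa / self.bin_size).astype(int))

    def intrinsic_reward(self, state, action):
        key = self._cluster(state, action)
        self.counts[key] = self.counts.get(key, 0) + 1
        # Rarely visited clusters receive a larger novelty bonus.
        return self.beta / np.sqrt(self.counts[key])

bonus = EpisodicNoveltyBonus()
s, a = np.zeros(3), np.zeros(1)
r1 = bonus.intrinsic_reward(s, a)  # first visit: full bonus
r2 = bonus.intrinsic_reward(s, a)  # repeat visit: smaller bonus
```

The decaying bonus encourages the agent to revisit under-explored regions of the state-action space without permanently inflating rewards for familiar ones.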
Affiliation(s)
- Man Shu
- Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China; Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China.
- Shuai Lü
- Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China; College of Software, Jilin University, Changchun 130012, China.
- Xiaoyu Gong
- Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China.
- Daolong An
- Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China.
- Songlin Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012, China; College of Computer Science and Technology, Jilin University, Changchun 130012, China.
2
Yang C, Huang J, Wu S, Liu Q. Neural-network-based practical specified-time resilient formation maneuver control for second-order nonlinear multi-robot systems under FDI attacks. Neural Netw 2025; 186:107288. [PMID: 40020307 DOI: 10.1016/j.neunet.2025.107288] [Received: 07/31/2024] [Revised: 12/03/2024] [Accepted: 02/13/2025] [Indexed: 03/03/2025]
Abstract
This paper presents a specified-time resilient formation maneuver control approach for second-order nonlinear multi-robot systems under false data injection (FDI) attacks, incorporating an offline neural network. Building on existing work on integrated distributed localization and specified-time formation maneuver control, the proposed approach introduces a hierarchical topology framework based on (d+1)-reachability theory to achieve downward decoupling, ensuring that each robot in a given layer remains unaffected by attacks on lower-layer robots. The framework enhances resilience by restricting the flow of follower information to the current and previous layers and the leader, thereby improving distributed relative localization accuracy. An offline radial basis function neural network (RBFNN) is employed to mitigate unknown nonlinearities and FDI attacks, enabling the control protocol to achieve specified-time convergence while reducing system errors compared to traditional finite-time and fixed-time methods. Simulation results validate the effectiveness of the method, showing enhanced robustness and reduced error under adversarial conditions.
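A minimal radial basis function network of the kind used as a function approximator in such controllers might be sketched as follows. The Gaussian kernels, the offline ridge-regularized least-squares fit, and the target nonlinearity are illustrative assumptions rather than the paper's design.

```python
import numpy as np

class RBFN:
    """Toy RBF network: Gaussian hidden layer, linear output weights."""

    def __init__(self, centers, width=1.0):
        self.centers = np.asarray(centers)        # Gaussian centers (assumed fixed)
        self.width = width                        # shared kernel width (assumed)
        self.weights = np.zeros(len(self.centers))  # output-layer weights

    def _phi(self, x):
        # Gaussian activations of input x against each center.
        d = np.linalg.norm(self.centers - x, axis=1)
        return np.exp(-(d / self.width) ** 2)

    def predict(self, x):
        return self._phi(x) @ self.weights

    def fit_offline(self, X, y, reg=1e-6):
        # Offline ridge-regularized least-squares fit of the output weights.
        Phi = np.array([self._phi(x) for x in X])
        self.weights = np.linalg.solve(
            Phi.T @ Phi + reg * np.eye(Phi.shape[1]), Phi.T @ y
        )

# Approximate a scalar unknown nonlinearity, here f(x) = sin(x), offline.
centers = np.linspace(-3, 3, 15).reshape(-1, 1)
net = RBFN(centers, width=0.8)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
net.fit_offline(X, np.sin(X[:, 0]))
err = abs(net.predict(np.array([1.0])) - np.sin(1.0))
```

Because the weights enter linearly, an offline fit reduces to a small linear solve, which is one reason RBFNNs are popular as nonlinearity compensators in control loops.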
Affiliation(s)
- Chuanhai Yang
- School of Cyber Science and Engineering, Southeast University, Nanjing 210096, China.
- Jingyi Huang
- School of Mathematics, Southeast University, Nanjing 210096, China.
- Shuang Wu
- School of Mathematics, Southeast University, Nanjing 210096, China.
- Qingshan Liu
- School of Mathematics, Southeast University, Nanjing 210096, China; Purple Mountain Laboratories, Nanjing 211111, China.
3
Li C, Dong S, Yang S, Hu Y, Li W, Gao Y. Coordinating Multi-Agent Reinforcement Learning via Dual Collaborative Constraints. Neural Netw 2025; 182:106858. [PMID: 39550797 DOI: 10.1016/j.neunet.2024.106858] [Received: 02/07/2024] [Revised: 10/03/2024] [Accepted: 10/27/2024] [Indexed: 11/19/2024]
Abstract
Many real-world multi-agent tasks exhibit a nearly decomposable structure, in which interactions among agents within the same interaction set are strong while interactions between different sets are relatively weak. Efficiently modeling this nearly decomposable structure and leveraging it to coordinate agents can improve the learning efficiency of multi-agent reinforcement learning algorithms on cooperative tasks, yet existing works typically fail to do so. To overcome this limitation, this paper proposes a novel algorithm named Dual Collaborative Constraints (DCC) that identifies the interaction sets as subtasks and achieves both intra-subtask and inter-subtask coordination. Specifically, DCC employs a bi-level structure to periodically distribute agents into multiple subtasks, and proposes both local and global collaborative constraints based on mutual information to facilitate intra-subtask and inter-subtask coordination among agents. These two constraints ensure that agents within the same subtask reach a consensus on their local action selections and that all of them select superior joint actions that maximize overall task performance. Experimentally, we evaluate DCC on various cooperative multi-agent tasks, and its superior performance against multiple state-of-the-art baselines demonstrates its effectiveness.
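As a toy illustration of quantifying interaction strength with mutual information, the snippet below estimates the empirical mutual information between agents' discrete action sequences: strongly coupled agents score high, independent ones score near zero. This is only loosely inspired by DCC's constraints; the `empirical_mi` helper and the synthetic data are hypothetical, not the paper's formulation.

```python
import numpy as np
from collections import Counter

def empirical_mi(x, y):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

# Agents 0 and 1 act in lockstep (strong interaction); agent 2 is independent.
rng = np.random.default_rng(0)
a0 = rng.integers(0, 2, 500)
a1 = a0.copy()               # identical actions -> high mutual information
a2 = rng.integers(0, 2, 500)  # independent actions -> near-zero MI

mi_01 = empirical_mi(a0, a1)
mi_02 = empirical_mi(a0, a2)
```

A dependence measure of this kind could, in principle, be thresholded to partition agents into interaction sets, though DCC's actual constraints operate inside the learning objective rather than as a post-hoc statistic.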
Affiliation(s)
- Chao Li
- State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China.
- Shaokang Dong
- State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China.
- Shangdong Yang
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China.
- Yujing Hu
- NetEase Fuxi AI Lab, Netease Inc, Hangzhou, 310052, China.
- Wenbin Li
- State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China.
- Yang Gao
- State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China.
4
Li X, Yang X, Ju X. A novel fractional-order memristive Hopfield neural network for traveling salesman problem and its FPGA implementation. Neural Netw 2024; 179:106548. [PMID: 39128274 DOI: 10.1016/j.neunet.2024.106548] [Received: 01/24/2024] [Revised: 06/20/2024] [Accepted: 07/14/2024] [Indexed: 08/13/2024]
Abstract
This paper proposes a novel fractional-order memristive Hopfield neural network (HNN) to address the traveling salesman problem (TSP). The fractional-order memristive HNN can efficiently converge to a globally optimal solution, whereas a conventional HNN tends to become stuck at a local minimum when solving TSP. Incorporating fractional-order calculus and memristors gives the system long-term memory properties and complex chaotic characteristics, resulting in faster convergence and shorter average tour distances in solving TSP. Moreover, a novel chaotic optimization algorithm based on the fractional-order memristive HNN is designed to handle the mutual constraint between convergence accuracy and convergence speed, which avoids random search and reduces the rate of invalid solutions. Numerical simulations demonstrate the effectiveness and merits of the proposed algorithm. Furthermore, Field Programmable Gate Array (FPGA) technology is utilized to implement the proposed neural network.
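For context, the classic Hopfield-Tank TSP energy that Hopfield-style networks minimize can be sketched as below: a permutation-matrix encoding where V[x, i] = 1 means city x occupies tour position i, with penalty terms for constraint violations plus the tour length. The penalty weight A and the cyclic-successor form are standard textbook choices, not details of this paper, whose fractional-order memristive dynamics are omitted here.

```python
import numpy as np

def tsp_energy(V, dist, A=500.0):
    """Hopfield-Tank style TSP energy for a city-by-position matrix V."""
    row = np.sum((V.sum(axis=1) - 1.0) ** 2)  # each city in exactly one position
    col = np.sum((V.sum(axis=0) - 1.0) ** 2)  # each position holds exactly one city
    Vnext = np.roll(V, -1, axis=1)            # successor positions (cyclic tour)
    # Tour length term: sum over positions i of dist between city at i and i+1.
    length = np.einsum('xi,xy,yi->', V, dist, Vnext)
    return A * (row + col) + length

# Four cities on the corners of a unit square.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

e_valid = tsp_energy(np.eye(4), dist)        # valid tour around the square
e_invalid = tsp_energy(np.zeros((4, 4)), dist)  # violates both constraints
```

For a valid permutation matrix the penalty terms vanish and the energy equals the tour length, so network dynamics that decrease this energy are biased toward short, feasible tours; the fractional-order memristive dynamics in the paper are one way to escape the local minima of this landscape.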
Affiliation(s)
- Xiangping Li
- College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China.
- Xinsong Yang
- College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China.
- Xingxing Ju
- College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China.
5
Wang H, Liu Q, Xu C. Predefined-time distributed optimization and anti-disturbance control for nonlinear multi-agent system with neural network estimator: A hierarchical framework. Neural Netw 2024; 175:106270. [PMID: 38569458 DOI: 10.1016/j.neunet.2024.106270] [Received: 01/10/2024] [Revised: 02/22/2024] [Accepted: 03/24/2024] [Indexed: 04/05/2024]
Abstract
This paper addresses the predefined-time distributed optimization of nonlinear multi-agent systems using a hierarchical control approach. Considering unknown nonlinear functions and external disturbances, we propose a two-layer hierarchical control framework. At the first layer, a predefined-time distributed estimator is employed to produce optimal consensus trajectories. At the second layer, a neural-network-based predefined-time disturbance observer is introduced to estimate the disturbances, with neural networks used to approximate the unknown nonlinear functions. A neural-network-based anti-disturbance sliding mode control mechanism is presented to ensure that the system trajectories track the optimal trajectories within a predefined time. The feasibility of this hierarchical control framework is verified using the Lyapunov method. Numerical simulations are conducted separately on models of robotic arms and mobile robots to validate the effectiveness of the proposed method.
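To illustrate only the anti-disturbance sliding-mode idea, here is a minimal sketch on a double integrator with a constant matched disturbance: once the sliding variable reaches zero, the tracking error decays regardless of the disturbance. The gains, the forward-Euler integration, and the plant are assumptions; the paper's predefined-time design, estimator layer, and neural compensation are not reproduced.

```python
import numpy as np

def simulate(T=10.0, dt=1e-3, lam=2.0, k=2.0, d=0.5):
    """Sliding-mode regulation of a disturbed double integrator e'' = u + d."""
    e, e_dot = 1.0, 0.0                        # tracking error and its rate
    for _ in range(int(T / dt)):
        s = e_dot + lam * e                    # sliding variable
        u = -lam * e_dot - k * np.sign(s)      # sliding-mode control law, k > |d|
        e_ddot = u + d                         # plant with constant disturbance
        e_dot += e_ddot * dt                   # forward-Euler integration
        e += e_dot * dt
    return e

final_error = simulate()
```

With k larger than the disturbance bound, s satisfies s' = -k sign(s) + d and reaches zero in finite time; on the surface, e' = -lam * e drives the error to zero. Predefined-time designs sharpen this so the convergence deadline can be set in advance, independent of initial conditions.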
Affiliation(s)
- Haitao Wang
- School of Mathematics, Southeast University, Nanjing 210096, China.
- Qingshan Liu
- School of Mathematics, Southeast University, Nanjing 210096, China.
- Chentao Xu
- School of Cyber Science and Engineering, Southeast University, Nanjing 210096, China.