1. Chen B, Cao Z, Bai Q. SATF: A Scalable Attentive Transfer Framework for Efficient Multiagent Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:6627-6641. [PMID: 38648131] [DOI: 10.1109/tnnls.2024.3387397]
Abstract
Training an efficient learning procedure with multiagent reinforcement learning (MARL) becomes challenging as the number of agents increases, because the observation space expands exponentially, especially in large-scale multiagent systems. In this article, we propose a scalable attentive transfer framework (SATF) for efficient MARL that achieves goals faster and more accurately in homogeneous and heterogeneous combat tasks by transferring knowledge learned with a small number of agents (four) to a large number of agents (up to 64). To reduce and align the dimensionality of the observed state, which varies with the number of agents, SATF deploys a novel state representation network with a self-attention mechanism, the dynamic observation representation network (DorNet), to extract the dominant observed information cost-effectively. Experiments on the MAgent platform show that SATF outperforms distributed MARL baselines (independent Q-learning (IQL) and A2C) on task sequences scaling from 8 to 64 agents. Experiments on StarCraft II show that SATF outperforms a centralized-training-with-decentralized-execution baseline (QMIX), requiring fewer training steps and reaching a desired win rate of up to approximately 90% as the number of agents increases from 4 to 32. These findings indicate great potential for improving the efficiency of MARL training in large-scale agent combat missions.
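To make the dimension-alignment idea concrete, here is a minimal PyTorch sketch, not the authors' DorNet, of self-attention pooling that compresses a variable number of per-agent observations into a fixed-size state vector; the module name, embedding sizes, and single learned query are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    """Compress a variable number of per-agent observations into a
    fixed-size state vector via self-attention (DorNet-style idea)."""

    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        self.encode = nn.Linear(obs_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, embed_dim))  # learned pooling query

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim); n_agents may differ between calls
        keys = self.encode(obs)
        q = self.query.expand(obs.size(0), -1, -1)
        pooled, _ = self.attn(q, keys, keys)  # (batch, 1, embed_dim)
        return pooled.squeeze(1)              # fixed size regardless of n_agents

# The same module handles 8 or 64 agents without changing its parameters.
state = AttentivePooling(obs_dim=16)(torch.randn(2, 8, 16))
```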
2. Li L, Zhu Y. Boosting On-Policy Actor-Critic With Shallow Updates in Critic. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:5644-5653. [PMID: 38619961] [DOI: 10.1109/tnnls.2024.3378913]
Abstract
Deep reinforcement learning (DRL) benefits from the representation power of deep neural networks (NNs) to approximate the value function and policy during learning. Batch reinforcement learning (BRL) benefits from stable training and data efficiency with a fixed representation and enjoys solid theoretical analysis. This work proposes least-squares deep policy gradient (LSDPG), a hybrid approach that combines least-squares reinforcement learning (RL) with online DRL to achieve the best of both worlds. LSDPG leverages a shared network to share useful features between the policy (actor) and value function (critic), and it learns the policy, value function, and representation separately. First, LSDPG views the critic's deep NN as a linear combination of representation features weighted by the last layer and performs policy evaluation with regularized least-squares temporal difference (LSTD) methods. Second, arbitrary policy gradient algorithms can be applied to improve the policy. Third, an auxiliary task periodically distills the features from the critic into the representation. Unlike most DRL methods, where critic algorithms operate in a nonstationary setting, i.e., the policy being evaluated keeps changing, the critic in LSDPG works on a stationary problem within each iteration of the critic update. We prove that, under some conditions, the critic converges to the regularized TD fixpoint of the current policy and the actor converges to a locally optimal policy. Experimental results on the challenging Procgen benchmark illustrate the improved sample efficiency of LSDPG over proximal policy optimization (PPO) and phasic policy gradient (PPG).
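As a concrete illustration of the stationary critic step described above, the following is a small sketch, under simplified assumptions, of a regularized LSTD solve for the last-layer weights given a fixed feature matrix; the function name and toy data are illustrative.

```python
import numpy as np

def regularized_lstd(phi, phi_next, rewards, gamma=0.99, reg=1e-3):
    """Regularized least-squares temporal difference (LSTD) evaluation.

    Solves (A + reg*I) w = b with A = Phi^T (Phi - gamma * Phi') and
    b = Phi^T r, treating the representation phi as fixed.
    """
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    return np.linalg.solve(A + reg * np.eye(phi.shape[1]), b)

# Toy usage: 128 transitions with a 16-dimensional fixed representation.
rng = np.random.default_rng(0)
phi, phi_next = rng.normal(size=(128, 16)), rng.normal(size=(128, 16))
w = regularized_lstd(phi, phi_next, rng.normal(size=128))
values = phi @ w  # critic estimates V(s) = phi(s)^T w
```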
3. Chai J, Zhu Y, Zhao D. NVIF: Neighboring Variational Information Flow for Cooperative Large-Scale Multiagent Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:17829-17841. [PMID: 37672377] [DOI: 10.1109/tnnls.2023.3309608]
Abstract
Communication-based multiagent reinforcement learning (MARL) has shown promising results in promoting cooperation by enabling agents to exchange information. However, existing methods have limitations in large-scale multiagent systems due to high information redundancy, and they tend to overlook the unstable training process caused by an online-trained communication protocol. In this work, we propose a novel method called neighboring variational information flow (NVIF), which enhances communication among neighboring agents by providing them with the maximum information set (MIS), containing more information than existing methods provide. NVIF compresses the MIS into a compact latent state while adopting neighboring communication. To stabilize the overall training process, we introduce a two-stage training mechanism: we first pretrain the NVIF module on a randomly sampled offline dataset to create a task-agnostic and stable communication protocol, and then use the pretrained protocol for online policy training with RL algorithms. Our theoretical analysis indicates that NVIF-proximal policy optimization (PPO), which combines NVIF with PPO, has the potential to promote cooperation with agent-specific rewards. Experimental results demonstrate the superiority of our method in both heterogeneous and homogeneous settings, and additional experiments demonstrate its potential for multitask learning.
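A minimal sketch of the compression idea follows, assuming a fixed neighborhood size and a Gaussian latent; the paper's actual MIS construction and two-stage training schedule are more involved, and all names and sizes here are illustrative.

```python
import torch
import torch.nn as nn

class NeighborMessageEncoder(nn.Module):
    """Variational encoder that compresses the concatenated messages of
    neighboring agents into a compact latent state (NVIF-style idea)."""

    def __init__(self, msg_dim: int, n_neighbors: int, latent_dim: int = 32):
        super().__init__()
        hidden = 128
        self.net = nn.Sequential(
            nn.Linear(msg_dim * n_neighbors, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_var = nn.Linear(hidden, latent_dim)

    def forward(self, neighbor_msgs: torch.Tensor):
        # neighbor_msgs: (batch, n_neighbors, msg_dim)
        h = self.net(neighbor_msgs.flatten(1))
        mu, log_var = self.mu(h), self.log_var(h)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        kl = 0.5 * (log_var.exp() + mu**2 - 1 - log_var).sum(-1).mean()
        return z, kl  # latent message plus KL term for a pretraining loss
```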
4. Ma H, Dong D, Ding SX, Chen C. Curriculum-Based Deep Reinforcement Learning for Quantum Control. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:8852-8865. [PMID: 35263262] [DOI: 10.1109/tnnls.2022.3153502]
Abstract
Deep reinforcement learning (DRL) has been recognized as an efficient technique for designing optimal strategies for complex systems without prior knowledge of the control landscape. To achieve fast and precise control of quantum systems, we propose a novel DRL approach that constructs a curriculum of intermediate tasks defined by fidelity thresholds, where the tasks in a curriculum can be statically determined before learning or dynamically generated during learning. By transferring knowledge between successive tasks and sequencing tasks according to their difficulty, the proposed curriculum-based DRL (CDRL) method enables the agent to focus on easy tasks in the early stage, move on to harder tasks, and eventually approach the final task. Numerical comparison with traditional methods [gradient method (GD), genetic algorithm (GA), and several other DRL methods] demonstrates that CDRL improves control performance for quantum systems and provides an efficient way to identify optimal strategies with few control pulses.
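The curriculum mechanism can be summarized in a few lines. The sketch below assumes a hypothetical environment hook set_target_fidelity and an agent with a run_episode update step; it shows only the static-curriculum control flow, not the paper's dynamic task generation.

```python
def curriculum_train(agent, env, thresholds=(0.9, 0.99, 0.999), episodes=500):
    """Train through a static curriculum of fidelity thresholds: each
    intermediate task ends an episode once its fidelity goal is met,
    and the policy learned there seeds the next, harder task."""
    for goal in thresholds:            # easy -> hard task sequence
        env.set_target_fidelity(goal)  # assumed environment hook
        for _ in range(episodes):
            agent.run_episode(env)     # assumed DRL update step
    return agent
```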
5. Zhang R, Zong Q, Zhang X, Dou L, Tian B. Game of Drones: Multi-UAV Pursuit-Evasion Game With Online Motion Planning by Deep Reinforcement Learning. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7900-7909. [PMID: 35157597] [DOI: 10.1109/tnnls.2022.3146976]
Abstract
As some of the smallest flying vehicles, unmanned aerial vehicles (UAVs) are often deployed as a "swarm" to execute missions. In this article, we investigate the multiquadcopter pursuit-evasion game against a target in an environment with obstacles. For high-quality simulation of urban environments, we propose the pursuit-evasion scenario (PES) framework, which creates the environment with a physics engine and enables quadcopter agents to take actions and interact with the environment. On this basis, we construct the multiagent coronal bidirectionally coordinated with target prediction network (CBC-TP Net) with a vectorized extension of the multiagent deep deterministic policy gradient (MADDPG) formulation to keep a damaged "swarm" system effective in pursuit-evasion missions. Unlike traditional reinforcement learning, we innovatively design a target prediction network (TP Net) within the common framework to imitate the way humans think: situation prediction always precedes decision-making. Pursuit-evasion experiments verify the state-of-the-art performance of the proposed strategy in both normal and damaged situations.
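To illustrate the prediction-before-decision idea, here is a minimal PyTorch sketch, not the paper's CBC-TP Net, in which a target prediction head feeds the actor; class and dimension names are assumptions.

```python
import torch
import torch.nn as nn

class PredictThenAct(nn.Module):
    """Prediction-before-decision sketch: a target prediction head
    estimates the evader's next position, and the actor conditions
    its action on the observation plus that prediction."""

    def __init__(self, obs_dim: int, act_dim: int, pos_dim: int = 3):
        super().__init__()
        self.predictor = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, pos_dim))
        self.actor = nn.Sequential(
            nn.Linear(obs_dim + pos_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh())  # bounded continuous action

    def forward(self, obs: torch.Tensor):
        target_next = self.predictor(obs)
        action = self.actor(torch.cat([obs, target_next], dim=-1))
        return action, target_next  # predictor also trained on true positions
```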
6. Chai J, Li W, Zhu Y, Zhao D, Ma Z, Sun K, Ding J. UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:2093-2104. [PMID: 34460404] [DOI: 10.1109/tnnls.2021.3105869]
Abstract
Multiagent reinforcement learning methods that adopt the centralized training with decentralized execution (CTDE) framework, such as VDN, QMIX, and QTRAN, have shown promising results in cooperation and competition. However, in some multiagent scenarios, the number of agents and the size of the action set vary over time. We call these unshaped scenarios, and in them the methods mentioned above perform unsatisfactorily. In this article, we propose a new method, Unshaped Networks for Multiagent Systems (UNMAS), which adapts to changes in the number of agents and the size of the action set. We propose a self-weighting mixing network to factorize the joint action-value; its adaptation to a changing agent number is attributed to the nonlinear mapping from each agent's action-value to the joint action-value with individual weights. Besides, to address changes in the action set, each agent constructs an individual action-value network composed of two streams that evaluate the constant environment-oriented subset and the varying unit-oriented subset. We evaluate UNMAS on various StarCraft II micromanagement scenarios and compare the results with several state-of-the-art MARL algorithms. The superiority of UNMAS is demonstrated by the highest winning rates, especially on the most difficult scenario, 3s5z_vs_3s6z, where the agents learn effective cooperative behaviors while other MARL algorithms fail. Animated demonstrations and source code are provided at https://sites.google.com/view/unmas.
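A minimal sketch of the agent-count-independent mixing idea follows; the weight generation here is simplified relative to the paper's self-weighting mixing network, and all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class SelfWeightingMixer(nn.Module):
    """Mix a variable number of per-agent Q-values into a joint
    action-value using individually generated nonnegative weights,
    so the mapping is independent of the agent count."""

    def __init__(self, state_dim: int):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, agent_qs: torch.Tensor, agent_states: torch.Tensor):
        # agent_qs: (batch, n_agents); agent_states: (batch, n_agents, state_dim)
        w = torch.abs(self.weight_net(agent_states)).squeeze(-1)  # monotonic mix
        return (w * agent_qs).sum(dim=1)  # joint Q for any number of agents

mixer = SelfWeightingMixer(state_dim=8)
q_tot = mixer(torch.randn(4, 5), torch.randn(4, 5, 8))  # works for 5 agents or 50
```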
7. Shi H, Li J, Mao J, Hwang KS. Lateral Transfer Learning for Multiagent Reinforcement Learning. IEEE Transactions on Cybernetics 2023; 53:1699-1711. [PMID: 34506297] [DOI: 10.1109/tcyb.2021.3108237]
Abstract
Some researchers have introduced transfer learning mechanisms to multiagent reinforcement learning (MARL). However, the existing works on cross-task transfer for multiagent systems were designed only for homogeneous agents or similar domains. This work proposes an all-purpose cross-task transfer method, called multiagent lateral transfer (MALT), that assists MARL by alleviating the training burden. We discuss several challenges in developing an all-purpose multiagent cross-task transfer learning method and provide a feasible way of reusing knowledge for MARL. In the developed method, inspired by the progressive network, we take features rather than policies or experiences as the transfer object. To achieve more efficient transfer, we assign pretrained policy networks to agents based on clustering, and an attention module is introduced to enhance the transfer framework. The proposed method places no strict requirements on the source and target tasks. Compared with existing works, our method can transfer knowledge among heterogeneous agents and also avoids negative transfer when tasks are fully different. As far as we know, this article is the first work devoted to all-purpose cross-task transfer for MARL. Experiments in various scenarios compare the performance of the proposed method with baselines; the results demonstrate that the method is sufficiently flexible for most settings, including cooperative, competitive, homogeneous, and heterogeneous configurations.
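To make the feature-transfer idea concrete, here is a simplified PyTorch sketch in the progressive-network spirit, where a sigmoid gate stands in for the paper's attention module; the class names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class LateralTransfer(nn.Module):
    """Progressive-network-style lateral transfer: features from a frozen
    pretrained column are gated and added to the target column."""

    def __init__(self, source: nn.Module, feat_dim: int, act_dim: int):
        super().__init__()
        self.source = source.eval()               # frozen source column
        for p in self.source.parameters():
            p.requires_grad_(False)
        self.target = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.attn_gate = nn.Linear(feat_dim, 64)  # scores source features
        self.head = nn.Linear(64, act_dim)

    def forward(self, obs: torch.Tensor):
        with torch.no_grad():
            src = self.source(obs)                # (batch, 64) source features
        gate = torch.sigmoid(self.attn_gate(obs)) # attention-like gating
        return self.head(self.target(obs) + gate * src)

src_col = nn.Sequential(nn.Linear(10, 64), nn.ReLU())  # stands in for a pretrained column
logits = LateralTransfer(src_col, feat_dim=10, act_dim=4)(torch.randn(2, 10))
```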
8. Ding Z, Chen Y, Li N, Zhao D, Sun Z, Chen CLP. BNAS: Efficient Neural Architecture Search Using Broad Scalable Architecture. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:5004-5018. [PMID: 33788694] [DOI: 10.1109/tnnls.2021.3067028]
Abstract
Efficient neural architecture search (ENAS) achieves remarkable efficiency in learning high-performance architectures via parameter sharing and reinforcement learning (RL). In the architecture search phase, ENAS employs a deep scalable architecture as the search space, whose training consumes most of the search cost, and this time-consuming model training is proportional to the depth of the architecture. Through experiments with ENAS on CIFAR-10, we find that reducing the number of layers in the scalable architecture effectively accelerates the search but causes a prohibitive performance drop in the architecture estimation phase. In this article, we propose broad neural architecture search (BNAS), in which we elaborately design a broad scalable architecture, dubbed the broad convolutional neural network (BCNN), to solve this issue. On the one hand, the broad scalable architecture trains quickly due to its shallow topology, and since BNAS also adopts the RL and parameter sharing used in ENAS, the proposed approach achieves higher search efficiency. On the other hand, the broad scalable architecture extracts multi-scale features and enhancement representations and feeds them into a global average pooling (GAP) layer to yield more reasonable and comprehensive representations, so its performance is preserved. We also develop two variants of BNAS that modify the topology of the BCNN. Several experiments verify the effectiveness of BNAS: 1) BNAS takes 0.19 days, which is 2.37x less expensive than ENAS, the best-ranked RL-based NAS approach; 2) among small-size (0.5 million parameters) and medium-size (1.1 million parameters) models, the architecture learned by BNAS obtains state-of-the-art performance (3.58% and 3.24% test error) on CIFAR-10; and 3) the learned architecture achieves 25.3% top-1 error on ImageNet with just 3.9 million parameters.
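The broad-topology idea can be sketched as parallel multi-scale branches feeding a GAP layer; the block below is an illustrative toy, not the BCNN itself.

```python
import torch
import torch.nn as nn

class BroadCell(nn.Module):
    """Shallow 'broad' block: multi-scale branches whose outputs are
    concatenated, then summarized by global average pooling (GAP)."""

    def __init__(self, in_ch: int = 3, ch: int = 32, n_classes: int = 10):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, ch, k, padding=k // 2) for k in (1, 3, 5)])
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(3 * ch, n_classes)

    def forward(self, x: torch.Tensor):
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return self.fc(self.gap(feats).flatten(1))

logits = BroadCell()(torch.randn(2, 3, 32, 32))  # CIFAR-10-sized input
```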
9. Xie D, Zhong X. Semicentralized Deep Deterministic Policy Gradient in Cooperative StarCraft Games. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1584-1593. [PMID: 33351767] [DOI: 10.1109/tnnls.2020.3042943]
Abstract
In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multiagent games. Specifically, we design a two-level actor-critic structure to help the agents interact and cooperate in StarCraft combat. A local actor-critic structure is established for each kind of agent with the partially observable information it receives from the environment. A global actor-critic structure is then built to give the local design an overall view of the combat based on limited centralized information, such as health values. These two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Compared with fully centralized methods, this design reduces the communication burden by sending only limited information to the global level during learning. Furthermore, reward functions are designed for both the local and global structures based on the agents' attributes to further improve learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in a real-time strategy game, StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios.
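A minimal sketch of the two-level critic idea follows, where the global critic sees only limited shared information such as health values; the structure and names are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class TwoLevelCritic(nn.Module):
    """Semicentralized critic sketch: a local critic scores each agent's
    own observation-action pair, while a global critic sees only limited
    shared information (e.g., health values) and refines the estimate."""

    def __init__(self, obs_dim: int, act_dim: int, shared_dim: int):
        super().__init__()
        self.local = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.globl = nn.Sequential(
            nn.Linear(shared_dim + 1, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, obs, act, shared):
        q_local = self.local(torch.cat([obs, act], dim=-1))
        return self.globl(torch.cat([shared, q_local], dim=-1))
```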
10. Zhu Y, Zhao D. Online Minimax Q Network Learning for Two-Player Zero-Sum Markov Games. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1228-1241. [PMID: 33306474] [DOI: 10.1109/tnnls.2020.3041469]
Abstract
The Nash equilibrium is an important concept in game theory. It describes the least exploitability of one player by any opponent. We combine game theory, dynamic programming, and recent deep reinforcement learning (DRL) techniques to learn the Nash equilibrium policy for two-player zero-sum Markov games (TZMGs) online. The problem is first formulated as a Bellman minimax equation, and generalized policy iteration (GPI) provides a double-loop iterative way to find the equilibrium. Neural networks are then introduced to approximate Q functions for large-scale problems, and an online minimax Q network learning algorithm is proposed to train the network from observations. Experience replay, dueling networks, and double Q-learning are applied to improve the learning process. The contributions are twofold: 1) DRL techniques are combined with GPI to find the TZMG Nash equilibrium for the first time, and 2) the convergence of the online learning algorithm with a lookup table and experience replay is proven, and the proof is not only useful for TZMGs but also instructive for single-agent Markov decision problems. Experiments on different examples validate the effectiveness of the proposed algorithm on TZMG problems.
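The inner minimax step of GPI at a single state reduces to a matrix game, which can be solved exactly by linear programming; the sketch below uses SciPy and a rock-paper-scissors Q-table as a check.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q: np.ndarray):
    """Solve max_pi min_o sum_a pi[a] * Q[a, o] by linear programming.
    This is the inner minimax step of generalized policy iteration
    at one state, given the Q-table Q[a, o] for that state."""
    m, n = Q.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # maximize v == minimize -v
    A_ub = np.hstack([-Q.T, np.ones((n, 1))])    # v <= sum_a pi[a] Q[a, o]
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                            # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[:m], res.x[-1]                  # equilibrium policy, game value

pi, v = matrix_game_value(np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]]))
# Rock-paper-scissors: pi ~ [1/3, 1/3, 1/3], v ~ 0.
```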
11. Barros P, Sciutti A. All by myself: Learning individualized competitive behaviour with a contrastive reinforcement learning optimization. Neural Networks 2022; 150:364-376. [DOI: 10.1016/j.neunet.2022.03.013]
12. Hou Y, Sun M, Zhu W, Zeng Y, Piao H, Chen X, Zhang Q. Behavior Reasoning for Opponent Agents in Multi-Agent Learning Systems. IEEE Transactions on Emerging Topics in Computational Intelligence 2022. [DOI: 10.1109/tetci.2022.3147011]
13. Recent Advances in Deep Reinforcement Learning Applications for Solving Partially Observable Markov Decision Processes (POMDP) Problems: Part 1—Fundamentals and Applications in Games, Robotics and Natural Language Processing. Machine Learning and Knowledge Extraction 2021. [DOI: 10.3390/make3030029]
Abstract
This first part of a two-part series of papers provides a survey of recent advances in deep reinforcement learning (DRL) applications for solving partially observable Markov decision process (POMDP) problems. Reinforcement learning (RL) is an approach that simulates humans' natural learning process, whose key idea is to let the agent learn by interacting with a stochastic environment. Because the agent has only limited access to information about the environment, such methods can be applied efficiently in most fields that require self-learning. Although efficient algorithms are widely used, an organized investigation seems essential so that good comparisons can be made and the best structures or algorithms chosen when applying DRL to various applications. In this overview, we introduce Markov decision process (MDP) problems and reinforcement learning, together with applications of DRL for solving POMDP problems in games, robotics, and natural language processing. A follow-up paper will cover applications in transportation, communications and networking, and industries.
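For readers new to POMDPs, the defining operation is the belief update, sketched below for tabular transition and observation models; the array shapes are illustrative conventions.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One POMDP belief update: b'(s') ~ O[o, s', a] * sum_s T[s, a, s'] * b(s).
    T[s, a, s'] is the transition model, O[o, s', a] the observation model."""
    b_pred = np.einsum("s,st->t", b, T[:, a, :])  # predict next-state distribution
    b_new = O[o, :, a] * b_pred                   # weight by observation likelihood
    return b_new / b_new.sum()                    # normalize

# Two states, one action, two observations.
T = np.zeros((2, 1, 2)); T[:, 0, :] = [[0.9, 0.1], [0.2, 0.8]]
O = np.zeros((2, 2, 1)); O[:, :, 0] = [[0.7, 0.1], [0.3, 0.9]]
b = belief_update(np.array([0.5, 0.5]), a=0, o=1, T=T, O=O)
```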
14. Liu M, Zhang H, Hao W, Qi X, Cheng K, Jin D, Feng X. Introduction of a new dataset and method for location predicting based on deep learning in wargame. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-201726]
Abstract
It is challenging for existing artificial intelligence algorithms to deal with the incomplete information of computer tactical wargames in military research, and one effective approach is to exploit game replays through data mining or supervised learning. However, open-source datasets of wargame replays are extremely rare, which obstructs research on computer wargames. In this paper, we release a dataset of wargame replays for prediction under incomplete information; specifically, we propose a dataset processing method for deep learning and a network model for predicting enemy locations. We first introduce the criteria and methods for data preprocessing, parsing, and feature extraction, and predefine the training and test sets for deep learning. Furthermore, we design a new network model for predicting enemy locations, including multi-head inputs, multi-head outputs, and CNN and GRU layers, to handle the multi-agent and long-term memory problems. The experimental results demonstrate that our method achieves a top-50 accuracy of 84.9%. Finally, we open-source the dataset and methods at https://github.com/daman043/AAGWS-Wargame-master.
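A minimal sketch of such a multi-head CNN + GRU predictor follows; layer sizes, the per-unit output heads, and the map-cell discretization are illustrative assumptions rather than the released model.

```python
import torch
import torch.nn as nn

class LocationPredictor(nn.Module):
    """Multi-head CNN + GRU sketch: per-frame map features are encoded by
    a CNN, a GRU carries long-term memory across frames, and one output
    head per enemy unit predicts a distribution over map cells."""

    def __init__(self, in_ch: int, n_units: int, n_cells: int, hid: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())   # (batch*time, 16*4*4)
        self.gru = nn.GRU(16 * 16, hid, batch_first=True)
        self.heads = nn.ModuleList(
            [nn.Linear(hid, n_cells) for _ in range(n_units)])

    def forward(self, frames: torch.Tensor):
        # frames: (batch, time, channels, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.gru(feats)
        last = out[:, -1]                            # summary after the sequence
        return [head(last) for head in self.heads]   # per-unit cell logits
```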
Affiliation(s)
- Man Liu, Hongjun Zhang, Wenning Hao, Xiuli Qi, Kai Cheng, Dawei Jin: Army Engineering University of PLA, Nanjing, Jiangsu, China
- Xinliang Feng: Army Infantry Academy of PLA, Nanchang, Jiangxi, China
15. Miyashita Y, Sugawara T. Analysis of coordinated behavior structures with multi-agent deep reinforcement learning. Applied Intelligence 2021. [DOI: 10.1007/s10489-020-01832-y]
Abstract
Cooperation and coordination are major issues in studies on multi-agent systems because the entire performance of such systems is greatly affected by these activities. The issues are challenging, however, because appropriate coordinated behaviors depend not only on environmental characteristics but also on other agents' strategies. On the other hand, advances in multi-agent deep reinforcement learning (MADRL) have recently attracted attention, because MADRL can considerably improve the entire performance of multi-agent systems in certain domains. The characteristics of learned coordination structures and the agents' resulting behaviors, however, have not been clarified sufficiently. Therefore, we focus here on MADRL in which agents have their own deep Q-networks (DQNs), and we analyze their coordinated behaviors and structures for the pickup and floor laying problem, which is an abstraction of our target application. In particular, we analyze the behaviors around scarce resources and long narrow passages, in which conflicts such as collisions are likely to occur. We show that different types of inputs to the networks exhibit similar performance but generate various coordination structures with associated behaviors, such as division of labor and a shared social norm, with no direct communication.
16. Zha Z, Wang B, Tang X. Evaluate, explain, and explore the state more exactly: an improved Actor-Critic algorithm for complex environment. Neural Computing and Applications 2021. [DOI: 10.1007/s00521-020-05663-3]
17. Lu Y, Chen Y, Zhao D, Li D. MGRL: Graph neural network based inference in a Markov network with reinforcement learning for visual navigation. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.07.091]
18. A Confrontation Decision-Making Method with Deep Reinforcement Learning and Knowledge Transfer for Multi-Agent System. Symmetry 2020. [DOI: 10.3390/sym12040631]
Abstract
In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve effective control of the learning agent for confrontation in multi-agent systems. Firstly, a multi-agent deep deterministic policy gradient (DDPG) algorithm with parameter sharing is proposed to achieve multi-agent confrontation decision-making. During training, the information of other agents is introduced into the critic network to improve the confrontation strategy, and the parameter sharing mechanism reduces the cost of experience storage. In the DDPG algorithm, we use four neural networks to generate real-time actions and Q-value functions, respectively, and use a momentum mechanism to optimize the training process and accelerate the convergence of the neural networks. Secondly, this paper introduces an auxiliary controller that uses a policy-based reinforcement learning (RL) method to provide assistant decision-making for the game agent, and an effective reward function helps agents balance the losses of the enemy and our side. Furthermore, this paper also uses a knowledge transfer method to extend the learning model to more complex scenes and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight the competitors and achieves a good winning rate. For large-scale confrontation scenarios, the knowledge transfer method gradually improves the decision-making level of the learning agent.
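The parameter sharing mechanism can be sketched as one actor serving every agent, distinguished by an agent-ID feature; this is an illustrative simplification, not the paper's four-network DDPG implementation.

```python
import torch
import torch.nn as nn

class SharedActor(nn.Module):
    """Parameter sharing sketch: every agent runs the same actor network,
    distinguished only by a one-hot agent ID appended to its observation,
    so adding agents adds no new parameters."""

    def __init__(self, obs_dim: int, act_dim: int, n_agents: int):
        super().__init__()
        self.n_agents = n_agents
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_agents, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs: torch.Tensor, agent_id: int):
        one_hot = torch.zeros(obs.size(0), self.n_agents)
        one_hot[:, agent_id] = 1.0
        return self.net(torch.cat([obs, one_hot], dim=-1))

actor = SharedActor(obs_dim=10, act_dim=2, n_agents=4)
a0 = actor(torch.randn(8, 10), agent_id=0)  # the same weights serve every agent
```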
19. A Novel Digital Modulation Recognition Algorithm Based on Deep Convolutional Neural Network. Applied Sciences 2020. [DOI: 10.3390/app10031166]
Abstract
The modulation recognition of digital signals under non-cooperative conditions is an important research topic. With the rapid development of artificial intelligence technology, deep learning theory is increasingly being applied to modulation recognition. In this paper, a novel digital signal modulation recognition algorithm is proposed that combines the InceptionResNetV2 network with transfer adaptation, called InceptionResNetV2-TA. Firstly, the received signal is preprocessed to generate a constellation diagram. Then, the constellation diagram is used as the input of the InceptionResNetV2 network to identify different kinds of signals; transfer adaptation is used for feature extraction, and an SVM classifier identifies the modulation mode of the digital signal. Constellation diagrams of three typical signals, binary phase shift keying (BPSK), quadrature phase shift keying (QPSK), and 8 phase shift keying (8PSK), were generated for the experiments. When the signal-to-noise ratio (SNR) is 4 dB, the recognition rates of BPSK, QPSK, and 8PSK obtained by InceptionResNetV2-TA are 1.0, 0.9966, and 0.9633, respectively, about 3% higher than those of other algorithms. Compared with traditional modulation recognition algorithms, the experimental results show that the proposed algorithm achieves higher accuracy for digital signal modulation recognition at low SNR.
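A minimal sketch of the backbone-plus-SVM pipeline follows, using the Keras InceptionResNetV2 as a frozen feature extractor and random placeholder images; the paper's transfer-adaptation step is omitted, and the data here are synthetic.

```python
# Backbone features feed an SVM classifier; `images` and `y` are placeholders
# standing in for constellation diagrams (299x299 RGB) and their labels.
import numpy as np
from sklearn.svm import SVC
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.applications.inception_resnet_v2 import preprocess_input

backbone = InceptionResNetV2(include_top=False, pooling="avg",
                             weights="imagenet")         # frozen feature extractor

images = np.random.rand(12, 299, 299, 3).astype("float32") * 255  # placeholder data
y = np.tile([0, 1, 2], 4)                                # BPSK / QPSK / 8PSK labels

features = backbone.predict(preprocess_input(images), verbose=0)
clf = SVC(kernel="rbf").fit(features, y)                 # SVM makes the final decision
print(clf.predict(features[:3]))
```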
20. Fuzzy Reinforcement Learning and Curriculum Transfer Learning for Micromanagement in Multi-Robot Confrontation. Information 2019. [DOI: 10.3390/info10110341]
Abstract
Multi-robot confrontation on physics-based simulators is a complex and time-consuming task, but simulators are required to evaluate the performance of advanced algorithms. Recently, a few advanced algorithms have produced considerably complex behavior in robot confrontation systems where agents face multiple opponents; meanwhile, current confrontation decision-making systems suffer from difficulties in optimization and generalization. In this paper, fuzzy reinforcement learning (RL) and curriculum transfer learning are applied to micromanagement in a robot confrontation system. Firstly, an improved Q-learning in the semi-Markov decision process is designed to train the agents, and an efficient RL model is defined to avoid the curse of dimensionality. Secondly, a multi-agent RL algorithm with parameter sharing is proposed to train the agents; we use a neural network with adaptive momentum acceleration as a function approximator to estimate the state-action function, and a fuzzy logic method regulates the learning rate of RL. Thirdly, a curriculum transfer learning method extends the RL model to more difficult scenarios, which ensures the generalization of the decision-making system. The experimental results show that the proposed method is effective.
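The fuzzy regulation of the learning rate can be sketched with two complementary memberships over the TD-error magnitude; the membership shapes and rate bounds below are illustrative assumptions, not the paper's rule base.

```python
def fuzzy_learning_rate(td_error: float, lr_min=0.01, lr_max=0.5) -> float:
    """Fuzzy-style regulation of the learning rate: triangular memberships
    for 'small' and 'large' |TD error| are blended so a large, surprising
    error raises the rate and a small one lowers it."""
    e = min(abs(td_error), 1.0)    # normalized error magnitude in [0, 1]
    small, large = 1.0 - e, e      # complementary triangular memberships
    return (small * lr_min + large * lr_max) / (small + large)

# Example: a surprising transition (|TD error| = 0.8) trains faster.
alpha = fuzzy_learning_rate(0.8)
q_update = lambda q, target: q + alpha * (target - q)
```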