1
High-accuracy model-based reinforcement learning, a survey. Artif Intell Rev 2023. [DOI: 10.1007/s10462-022-10335-w]
2
Zhang J, Liu Q, Han X. Dynamic sub-route-based self-adaptive beam search Q-learning algorithm for traveling salesman problem. PLoS One 2023; 18:e0283207. [PMID: 36943840 PMCID: PMC10030033 DOI: 10.1371/journal.pone.0283207]
Abstract
In this paper, a dynamic sub-route-based self-adaptive beam search Q-learning (DSRABSQL) algorithm is proposed that provides a reinforcement learning (RL) framework combined with local search to solve the traveling salesman problem (TSP). DSRABSQL builds upon the Q-learning (QL) algorithm. To address QL's slow convergence and low accuracy, four strategies are first designed within the QL framework: a weighting-function-based reward matrix, a power-function-based initial Q-table, a self-adaptive ε-beam search strategy, and a new Q-value update formula, yielding the self-adaptive beam search Q-learning (ABSQL) algorithm. Because the sub-route is not fully optimized in ABSQL, a dynamic sub-route optimization strategy is then introduced outside the QL framework, yielding DSRABSQL. Experiments compare QL, ABSQL, DSRABSQL, our previously proposed variable neighborhood discrete whale optimization algorithm, and two advanced reinforcement learning algorithms. The experimental results show that DSRABSQL significantly outperforms the other algorithms. In addition, two groups of algorithms based on QL and DSRABSQL are designed to test the effectiveness of the five strategies. The results show that the dynamic sub-route optimization strategy and the self-adaptive ε-beam search strategy contribute the most on small-, medium-, and large-scale instances, and that the four strategies within the QL framework act synergistically, with the synergy growing as the instance scale expands.
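As a point of reference for the QL core that DSRABSQL extends, the sketch below shows plain tabular Q-learning for the TSP (state = current city, action = next unvisited city) with ordinary ε-greedy selection. It is a minimal baseline under assumed hyperparameters, not the authors' algorithm: the paper's four strategies are only flagged in comments.

```python
import numpy as np

def q_learning_tsp(dist, episodes=2000, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Plain tabular Q-learning for the TSP. A minimal baseline sketch;
    DSRABSQL replaces the zero Q-init (power-function init), the raw
    -distance reward (weighted reward matrix), the eps-greedy choice
    (self-adaptive eps-beam search), and this update rule, and adds
    dynamic sub-route optimization outside the QL framework."""
    dist = np.asarray(dist, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(dist)
    Q = np.zeros((n, n))
    best_tour, best_len = None, np.inf
    for _ in range(episodes):
        start = int(rng.integers(n))
        tour, unvisited = [start], set(range(n)) - {start}
        while unvisited:
            s, cand = tour[-1], list(unvisited)
            if rng.random() < eps:                       # explore
                a = cand[rng.integers(len(cand))]
            else:                                        # exploit best known edge
                a = max(cand, key=lambda c: Q[s, c])
            r = -dist[s, a]                              # shorter edge -> larger reward
            future = max((Q[a, c] for c in unvisited if c != a), default=0.0)
            Q[s, a] += alpha * (r + gamma * future - Q[s, a])
            tour.append(a)
            unvisited.discard(a)
        length = sum(dist[tour[i], tour[(i + 1) % n]] for i in range(n))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_tour, best_len
```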
Affiliation(s)
- Jin Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan, China
- Qing Liu
- School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China
- XiaoHang Han
- School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China
3
Abstract
Monte Carlo Tree Search (MCTS) is a powerful approach to designing game-playing bots or solving sequential decision problems. The method relies on intelligent tree search that balances exploration and exploitation. MCTS performs random sampling in the form of simulations and stores statistics of actions to make more educated choices in each subsequent iteration. The method has become a state-of-the-art technique for combinatorial games. However, in more complex games (e.g. those with a high branching factor or real-time ones), as well as in various practical domains (e.g. transportation, scheduling or security), an efficient MCTS application often requires its problem-dependent modification or integration with other techniques. Such domain-specific modifications and hybrid approaches are the main focus of this survey. The last major MCTS survey was published in 2012. Contributions that appeared since its release are of particular interest for this review.
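For readers new to the method, the baseline selection rule in MCTS is UCB1 applied to trees (UCT). A minimal sketch of that rule, assuming the usual per-child (total value, visit count) bookkeeping:

```python
import math

def uct_select(children, c=math.sqrt(2)):
    """UCB1 applied to trees (UCT): pick the child maximizing mean value plus
    an exploration bonus that shrinks with repeated visits.
    `children` maps action -> (total_value, visit_count)."""
    parent_visits = sum(n for _, n in children.values())
    def score(stats):
        value, visits = stats
        if visits == 0:
            return math.inf                  # force one visit of every child first
        return value / visits + c * math.sqrt(math.log(parent_visits) / visits)
    return max(children, key=lambda a: score(children[a]))

# e.g. uct_select({"expand": (3.0, 5), "attack": (2.5, 3), "scout": (0.0, 0)}) -> "scout"
```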
4
Li B. Hierarchical Architecture for Multi-Agent Reinforcement Learning in Intelligent Game. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) 2022. [DOI: 10.1109/ijcnn55064.2022.9892666]
Affiliation(s)
- Bin Li
- Nanjing University, Department of Control and Systems Engineering, Nanjing, China
5
Probabilistic Plan Recognition for Multi-Agent Systems under Temporal Logic Tasks. ELECTRONICS 2022. [DOI: 10.3390/electronics11091352]
Abstract
This paper studies the plan recognition problem for multi-agent systems with temporal logic tasks, where the high-level temporal tasks are represented as linear temporal logic (LTL). We present a probabilistic plan recognition algorithm that predicts future goals and identifies the temporal logic tasks of the agents based on observations of their states and actions. We build a plan library composed of Nondeterministic Büchi Automata to model the temporal logic tasks, and we propose a Boolean matrix generation algorithm to map the plan library to multi-agent trajectories along with a task recognition algorithm to parse the Boolean matrix. A probability calculation formula then yields the posterior goal probability distribution, and the cold-start situation of plan recognition is resolved using the Bayes formula. Finally, we validate the proposed algorithm via extensive comparative simulations.
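The posterior computation the abstract refers to is a standard Bayes update over candidate goals. A generic sketch, assuming per-task trajectory likelihoods are already available (the paper derives them from Büchi-automaton runs over the plan library):

```python
import numpy as np

def goal_posterior(prior, likelihoods):
    """Bayes update for probabilistic plan recognition: combine a prior over
    candidate temporal-logic tasks with per-task likelihoods of the observed
    states/actions to get the posterior goal distribution. If no task explains
    the observations (the cold-start case), fall back to a uniform distribution."""
    post = np.asarray(prior, dtype=float) * np.asarray(likelihoods, dtype=float)
    z = post.sum()
    if z == 0.0:
        return np.full(len(post), 1.0 / len(post))
    return post / z

# e.g. goal_posterior([0.5, 0.3, 0.2], [0.01, 0.20, 0.05]) -> array([0.067, 0.800, 0.133])
```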
6
Xie D, Zhong X. Semicentralized Deep Deterministic Policy Gradient in Cooperative StarCraft Games. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:1584-1593. [PMID: 33351767 DOI: 10.1109/tnnls.2020.3042943]
Abstract
In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multiagent games. Specifically, we design a two-level actor-critic structure to help the agents interact and cooperate in StarCraft combat. A local actor-critic structure is established for each kind of agent with partially observable information received from the environment. A global actor-critic structure is then built to provide the local design with an overall view of the combat based on limited centralized information, such as the health value. These two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Compared with fully centralized methods, this design reduces the communication burden by sending only limited information to the global level during the learning process. Furthermore, reward functions are designed for both the local and global structures based on the agents' attributes to further improve learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in a real-time strategy game, i.e., StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios.
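A schematic of the two-level idea, written as an assumed actor network rather than the authors' architecture: a compact global feature (e.g., team health) is encoded from limited centralized information and appended to each agent's partial observation before the local policy acts.

```python
import torch
import torch.nn as nn

class TwoLevelActor(nn.Module):
    """Semicentralized actor in the spirit of the abstract: a small global
    feature, encoded from limited centralized information, is appended to each
    agent's partial observation before the local policy acts. Layer sizes and
    the choice of global signal are assumptions here."""
    def __init__(self, local_dim, global_dim, n_actions):
        super().__init__()
        self.global_enc = nn.Sequential(nn.Linear(global_dim, 8), nn.ReLU())
        self.local_pi = nn.Sequential(
            nn.Linear(local_dim + 8, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, local_obs, global_info):
        g = self.global_enc(global_info)                          # compact "overall view"
        return self.local_pi(torch.cat([local_obs, g], dim=-1))  # per-agent action logits

# actor = TwoLevelActor(local_dim=20, global_dim=4, n_actions=6)
# logits = actor(torch.randn(3, 20), torch.randn(3, 4))  # 3 agents, shared weights
```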
7
Khanna R, Dodge J, Anderson A, Dikkala R, Irvine J, Shureih Z, Lam KH, Matthews CR, Lin Z, Kahng M, Fern A, Burnett M. Finding AI's Faults with AAR/AI: An Empirical Study. ACM T INTERACT INTEL 2022. [DOI: 10.1145/3487065]
Abstract
Would you allow an AI agent to make decisions on your behalf? If the answer is “not always,” the next question becomes “in what circumstances”? Answering this question requires human users to be able to assess an AI agent, and not just with overall pass/fail assessments or statistics. Here users need to be able to localize an agent's bugs so that they can determine when they are willing to rely on the agent and when they are not. After-Action Review for AI (AAR/AI), a new AI assessment process for integration with Explainable AI systems, aims to support human users in this endeavor, and in this article we empirically investigate AAR/AI's effectiveness with domain-knowledgeable users. Our results show that AAR/AI participants not only located significantly more bugs than non-AAR/AI participants did (i.e., showed greater recall) but also located them more precisely (i.e., with greater precision). In fact, AAR/AI participants outperformed non-AAR/AI participants on every bug and were, on average, almost six times as likely as non-AAR/AI participants to find any particular bug. Finally, evidence suggests that incorporating labeling into the AAR/AI process may encourage domain-knowledgeable users to abstract above individual instances of bugs; we hypothesize that doing so may have contributed further to AAR/AI participants' effectiveness.
Affiliation(s)
- Jed Irvine
- Oregon State University, Corvallis, OR, USA
- Kin-Ho Lam
- Oregon State University, Corvallis, OR, USA
- Alan Fern
- Oregon State University, Corvallis, OR, USA
8
Ye D, Chen G, Zhao P, Qiu F, Yuan B, Zhang W, Chen S, Sun M, Li X, Li S, Liang J, Lian Z, Shi B, Wang L, Shi T, Fu Q, Yang W, Huang L. Supervised Learning Achieves Human-Level Performance in MOBA Games: A Case Study of Honor of Kings. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:908-918. [PMID: 33147150 DOI: 10.1109/tnnls.2020.3029475]
Abstract
We present JueWu-SL, the first supervised-learning-based artificial intelligence (AI) program that achieves human-level performance in playing multiplayer online battle arena (MOBA) games. Unlike prior attempts, we integrate the macro-strategy and the micromanagement of MOBA-game-playing into neural networks in a supervised and end-to-end manner. Tested on Honor of Kings, the most popular MOBA at present, our AI performs competitively at the level of High King players in standard 5v5 games.
9
Liu X, Tan Y. Attentive Relational State Representation in Decentralized Multiagent Reinforcement Learning. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:252-264. [PMID: 32224477 DOI: 10.1109/tcyb.2020.2979803]
Abstract
In multiagent reinforcement learning (MARL), it is crucial for each agent to model its relation with its neighbors. Existing approaches usually resort to concatenating the features of multiple neighbors, which fixes both the size and the identity of the inputs; such settings are inflexible and unscalable. In this article, we propose an attentive relational encoder (ARE), a novel scalable feedforward neural module that attentionally aggregates an arbitrary-sized neighboring feature set for state representation in decentralized MARL. The ARE actively selects the relevant information from the neighboring agents and is permutation invariant, computationally efficient, and flexible to interactive multiagent systems. Our method consistently outperforms the latest competing decentralized MARL methods in several multiagent tasks. In particular, it shows strong cooperative performance in challenging StarCraft micromanagement tasks and achieves over a 96% win rate against the most difficult noncheating built-in artificial intelligence bots.
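The core mechanism, attention-weighted aggregation over a variable-sized neighbor set, can be sketched as follows. This is an illustrative module under assumed layer shapes, not the authors' exact ARE: the agent's own feature queries the neighbor features, and a softmax-weighted sum gives a fixed-size, permutation-invariant summary.

```python
import torch
import torch.nn as nn

class AttentivePool(nn.Module):
    """Permutation-invariant attention over a variable number of neighbor
    features: the agent's own feature queries the neighbor set, and a
    softmax-weighted sum yields a fixed-size state summary regardless of how
    many neighbors are present or in what order they arrive."""
    def __init__(self, d_self, d_neigh, d_out):
        super().__init__()
        self.q = nn.Linear(d_self, d_out)
        self.k = nn.Linear(d_neigh, d_out)
        self.v = nn.Linear(d_neigh, d_out)

    def forward(self, self_feat, neigh_feats):
        # self_feat: (d_self,); neigh_feats: (n_neighbors, d_neigh), n may vary
        q = self.q(self_feat)                                    # (d_out,)
        k, v = self.k(neigh_feats), self.v(neigh_feats)          # (n, d_out)
        attn = torch.softmax(k @ q / q.shape[0] ** 0.5, dim=0)   # (n,) weights
        return attn @ v                                          # (d_out,)

# pool = AttentivePool(d_self=8, d_neigh=6, d_out=16)
# s = pool(torch.randn(8), torch.randn(5, 6))   # works for any neighbor count
```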
10
Dodge J, Khanna R, Irvine J, Lam KH, Mai T, Lin Z, Kiddle N, Newman E, Anderson A, Raja S, Matthews C, Perdriau C, Burnett M, Fern A. After-Action Review for AI (AAR/AI). ACM T INTERACT INTEL 2021. [DOI: 10.1145/3453173]
Abstract
Explainable AI is growing in importance as AI pervades modern society, but few have studied how explainable AI can directly support people trying to assess an AI agent. Without a rigorous process, people may approach assessment in ad hoc ways, leading to the possibility of wide variations in assessment of the same agent due only to variations in their processes. AAR, or After-Action Review, is a method some military organizations use to assess human agents, and it has been validated in many domains. Drawing upon this strategy, we derived an After-Action Review for AI (AAR/AI) to organize the ways people assess reinforcement learning agents in a sequential decision-making environment. We then investigated what AAR/AI brought to human assessors in two qualitative studies. The first investigated AAR/AI to gather formative information, and the second built upon those results and also varied the type of explanation (model-free vs. model-based) used in the AAR/AI process. Among the results were the following: (1) participants reported that AAR/AI helped them organize their thoughts and think logically about the agent, (2) AAR/AI encouraged participants to reason about the agent from a wide range of perspectives, and (3) participants were able to leverage AAR/AI with the model-based explanations to falsify the agent's predictions.
Affiliation(s)
- Sai Raja
- Oregon State University, Corvallis, OR
- Alan Fern
- Oregon State University, Corvallis, OR
Collapse
|
11
Huang W, Yin Q, Zhang J, Huang K. Learning Macromanagement in StarCraft by Deep Reinforcement Learning. SENSORS 2021; 21:3332. [PMID: 34065012 PMCID: PMC8150573 DOI: 10.3390/s21103332]
Abstract
StarCraft is a real-time strategy game that provides a complex environment for AI research. Macromanagement, i.e., selecting appropriate units to build depending on the current state, is one of the most important problems in this game. To reduce the requirements for expert knowledge and enhance the coordination of the systematic bot, we use reinforcement learning (RL) to tackle the problem of macromanagement. We propose a novel deep RL method, Mean Asynchronous Advantage Actor-Critic (MA3C), which computes the approximate expected policy gradient instead of the gradient of a sampled action to reduce the variance of the gradient, and encodes the history queue with a recurrent neural network to tackle the problem of imperfect information. The experimental results show that MA3C achieves a very high win rate of approximately 90% against the weaker opponents and improves the win rate by about 30% against the stronger opponents. We also propose a novel method to visualize and interpret the policy learned by MA3C. Combining the visualized results with snapshots of games, we find that the learned macromanagement not only adapts to the game rules and the policy of the opponent bot but also cooperates well with the other modules of MA3C-Bot.
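The variance-reduction idea the abstract names, taking the gradient over the full action distribution rather than one sampled action, can be sketched for a discrete actor as below. A generic sketch with assumed tensor shapes; `q_values` stands in for the critic's advantage estimates.

```python
import torch

def expected_pg_loss(logits, q_values):
    """Actor loss whose gradient is the expected policy gradient
    sum_a pi(a|s) * grad log pi(a|s) * Q(s,a), i.e., an average over the whole
    discrete action distribution instead of the usual single sampled action.
    Averaging out the action sampling removes that source of variance.
    Both arguments are assumed to have shape (batch, n_actions)."""
    probs = torch.softmax(logits, dim=-1)
    log_probs = torch.log_softmax(logits, dim=-1)
    # detach the weights so only grad log pi flows back through the actor
    return -(probs.detach() * log_probs * q_values.detach()).sum(-1).mean()
```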
Affiliation(s)
- Wenzhen Huang
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- CRISE, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Qiyue Yin
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- CRISE, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Junge Zhang
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- CRISE, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- Kaiqi Huang
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
- CRISE, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing 100190, China
12
Cuccu G, Togelius J, Cudré-Mauroux P. Playing Atari with few neurons: Improving the efficacy of reinforcement learning by decoupling feature extraction and decision making. AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS 2021; 35:17. [PMID: 34720684 PMCID: PMC8550197 DOI: 10.1007/s10458-021-09497-8]
Abstract
We propose a new method for learning compact state representations and policies separately but simultaneously for policy approximation in vision-based applications such as Atari games. Approaches based on deep reinforcement learning typically map pixels directly to actions to enable end-to-end training. Internally, however, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it, two objectives that can be addressed independently. Separating the image processing from the action selection allows for a better understanding of each task individually, as well as potentially finding smaller policy representations, which is inherently interesting. Our approach learns state representations using a compact encoder based on two novel algorithms: (i) Increasing Dictionary Vector Quantization builds a dictionary of state representations that grows in size over time, allowing our method to address new observations as they appear in an open-ended online-learning context; and (ii) Direct Residuals Sparse Coding encodes observations as a function of the dictionary, aiming for the highest information inclusion by disregarding reconstruction error and maximizing code sparsity. As the dictionary grows, however, the encoder produces increasingly larger inputs for the neural network; this issue is addressed with a new variant of the Exponential Natural Evolution Strategies algorithm that adapts the dimensionality of its probability distribution during the run. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on each game's controls). These are still capable of achieving results that are not much worse than, and occasionally superior to, the state of the art in direct policy search, which uses two orders of magnitude more neurons.
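A simplified sketch of the dictionary-growth idea behind Increasing Dictionary Vector Quantization, under assumptions (greedy non-negative projections, a norm-based growth test) that stand in for the paper's exact encoding and its Direct Residuals Sparse Coding companion:

```python
import numpy as np

def idvq_encode(obs, dictionary, growth_threshold=0.1):
    """Toy version of the dictionary-growth idea: greedily explain `obs` with
    non-negative contributions of existing atoms, and if the leftover residual
    is still large relative to the observation, add it as a new atom so later
    observations can use it. The greedy projection and the norm test are
    illustrative stand-ins, not the paper's exact procedure."""
    residual = np.asarray(obs, dtype=float).copy()
    code = np.zeros(len(dictionary))
    for i, atom in enumerate(dictionary):
        w = residual @ atom / (atom @ atom + 1e-12)
        if w > 0:                                  # additive contributions only
            code[i] = w
            residual = residual - w * atom
    if np.linalg.norm(residual) > growth_threshold * np.linalg.norm(obs):
        dictionary.append(residual.copy())         # the dictionary grows over time
    return code, dictionary
```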
Affiliation(s)
- Giuseppe Cuccu
- eXascale Infolab, Department of Computer Science, University of Fribourg, Fribourg, Switzerland
- Julian Togelius
- Game Innovation Lab, Tandon School of Engineering, New York University, New York, NY, USA
- Philippe Cudré-Mauroux
- eXascale Infolab, Department of Computer Science, University of Fribourg, Fribourg, Switzerland
13
Penney S, Dodge J, Anderson A, Hilderbrand C, Simpson L, Burnett M. The Shoutcasters, the Game Enthusiasts, and the AI: Foraging for Explanations of Real-time Strategy Players. ACM T INTERACT INTEL 2021. [DOI: 10.1145/3396047]
Abstract
Assessing and understanding intelligent agents is a difficult task for users who lack an AI background. “Explainable AI” (XAI) aims to address this problem, but what should be in an explanation? One route toward answering this question is to turn to theories of how humans try to obtain information they seek. Information Foraging Theory (IFT) is one such theory. In this article, we present a series of studies using IFT: the first investigates how expert explainers supply explanations in the RTS domain, the second investigates what explanations domain experts demand from agents in the RTS domain, and the last focuses on how both populations try to explain a state-of-the-art AI. Our results show that RTS environments like StarCraft offer so many options, changing so rapidly, that foraging tends to be very costly. Foragers attempted to manage such costs with “satisficing” approaches that reduce cognitive load, such as focusing more on What information than on Why information, strategic use of language to communicate a lot of nuanced information in a few words, and optimizing their environment when possible to make their most valuable information patches readily available. Further, when a real AI entered the picture, even very experienced domain experts had difficulty understanding and judging some of the AI's unconventional behaviors. Finally, our results reveal ways Information Foraging Theory can inform future XAI interactive explanation environments, and also how XAI can inform IFT.
14
Zha Z, Wang B, Tang X. Evaluate, explain, and explore the state more exactly: an improved Actor-Critic algorithm for complex environment. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05663-3]
15
Abstract
In general, games pose interesting and complex problems for the implementation of intelligent agents and are a popular domain in the study of artificial intelligence. In fact, games have been at the center of some of the most well-known achievements in artificial intelligence. From classical board games such as chess, checkers, backgammon and Go, to video games such as Dota 2 and StarCraft II, artificial intelligence research has devised computer programs that can play at the level of a human master and even at a human world champion level. Planning and learning, two well-known and successful paradigms of artificial intelligence, have greatly contributed to these achievements. Although representing distinct approaches, planning and learning try to solve similar problems and share some similarities. They can even complement each other. This has led to research on methodologies to combine the strengths of both approaches to derive better solutions. This paper presents a survey of the multiple methodologies that have been proposed to integrate planning and learning in the context of games. In order to provide a richer contextualization, the paper also presents learning and planning techniques commonly used in games, both in terms of their theoretical foundations and applications.
16
Anderson A, Dodge J, Sadarangani A, Juozapaitis Z, Newman E, Irvine J, Chattopadhyay S, Olson M, Fern A, Burnett M. Mental Models of Mere Mortals with Explanations of Reinforcement Learning. ACM T INTERACT INTEL 2020. [DOI: 10.1145/3366485]
Abstract
How should reinforcement learning (RL) agents explain themselves to humans not trained in AI? To gain insights into this question, we conducted a 124-participant, four-treatment experiment to compare participants' mental models of an RL agent in the context of a simple Real-Time Strategy (RTS) game. The four treatments isolated two types of explanations vs. neither vs. both together. The two types of explanations were as follows: (1) saliency maps (an “Input Intelligibility Type” that explains the AI's focus of attention) and (2) reward-decomposition bars (an “Output Intelligibility Type” that explains the AI's predictions of future types of rewards). Our results show that a combined explanation that included saliency and reward bars was needed to achieve a statistically significant difference in participants' mental model scores over the no-explanation treatment. However, this combined explanation was far from a panacea: It exacted disproportionately high cognitive loads from the participants who received the combined explanation. Further, in some situations, participants who saw both explanations predicted the agent's next action worse than all other treatments' participants.
Affiliation(s)
- Jonathan Dodge
- Oregon State University, SW Jefferson Way, Corvallis, OR
- Evan Newman
- Oregon State University, SW Jefferson Way, Corvallis, OR
- Jed Irvine
- Oregon State University, SW Jefferson Way, Corvallis, OR
- Matthew Olson
- Oregon State University, SW Jefferson Way, Corvallis, OR
- Alan Fern
- Oregon State University, SW Jefferson Way, Corvallis, OR
17
Badman RP, Hills TT, Akaishi R. Multiscale Computation and Dynamic Attention in Biological and Artificial Intelligence. Brain Sci 2020; 10:E396. [PMID: 32575758 PMCID: PMC7348831 DOI: 10.3390/brainsci10060396]
Abstract
Biological and artificial intelligence (AI) are often defined by their capacity to achieve a hierarchy of short-term and long-term goals that require incorporating information over time and space at both local and global scales. More advanced forms of this capacity involve the adaptive modulation of integration across scales, which resolves computational inefficiency and explore-exploit dilemmas at the same time. Research in both neuroscience and AI has made progress towards understanding architectures that achieve this. Insights into biological computations come from phenomena such as decision inertia, habit formation, information search, risky choices and foraging. Across these domains, the brain is equipped with mechanisms (such as the dorsal anterior cingulate and dorsolateral prefrontal cortex) that can represent and modulate across scales, both with top-down control processes and by local-to-global consolidation as information progresses from sensory to prefrontal areas. Paralleling these biological architectures, progress in AI is marked by innovations in dynamic multiscale modulation, moving from recurrent and convolutional neural networks (with fixed scalings) to attention, transformers, dynamic convolutions, and consciousness priors (which modulate scale to input and increase scale breadth). The use and development of these multiscale innovations in robotic agents, game AI, and natural language processing (NLP) are pushing the boundaries of AI achievements. By juxtaposing biological and artificial intelligence, the present work underscores the critical importance of multiscale processing to general intelligence, as well as highlighting innovations and differences between the futures of biological and artificial intelligence.
Affiliation(s)
- Rei Akaishi
- Center for Brain Science, RIKEN, Saitama 351-0198, Japan
18
Fea MP, Boisseau RP, Emlen DJ, Holwell GI. Cybernetic combatants support the importance of duels in the evolution of extreme weapons. Proc Biol Sci 2020; 287:20200254. [PMID: 32517625 DOI: 10.1098/rspb.2020.0254]
Abstract
A current evolutionary hypothesis predicts that the most extreme forms of animal weaponry arise in systems where combatants fight each other one-to-one, in duels. It has also been suggested that arms races in human interstate conflicts are more likely to escalate in cases where there are only two opponents. However, directly testing whether duels matter for weapon investment is difficult in animals and impossible in interstate conflicts. Here, we test whether superior combatants experience a disproportionate advantage in duels, as compared with multi-combatant skirmishes, in a system analogous to both animal and military contests: the battles fought by artificial intelligence agents in a computer war game. We found that combatants with experimentally improved fighting power had a large advantage in duels, but that this advantage deteriorated as the complexity of the battlefield was increased by the addition of further combatants. This pattern remained under the two different forms of the advantage granted to our focal artificial intelligence (AI) combatants, and became reversed when we switched the roles to feature a weak focal AI among strong opponents. Our results suggest that one-on-one combat may trigger arms races in diverse systems. These results corroborate the outcomes of studies of both animal and interstate contests, and suggest that elements of animal contest theory may be widely applicable to arms races generally.
Affiliation(s)
- Murray P Fea
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand
- Romain P Boisseau
- Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA
- Douglas J Emlen
- Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA
- Gregory I Holwell
- School of Biological Sciences, The University of Auckland, Auckland, New Zealand
19
A Comparison of Evolutionary and Tree-Based Approaches for Game Feature Validation in Real-Time Strategy Games with a Novel Metric. MATHEMATICS 2020. [DOI: 10.3390/math8050688]
Abstract
When it comes to game playing, evolutionary and tree-based approaches are the most popular approximate methods for decision making in the artificial intelligence field of game research. The evolutionary domain draws its inspiration for the design of approximate methods from nature, while the tree-based domain builds an approximate representation of the world in a tree-like structure and then searches that tree for the optimal path. In this paper, we propose a novel metric for game feature validation in Real-Time Strategy (RTS) games. Firstly, Real-Time Strategy game features are identified and grouped; secondly, the groups are assigned to weighted classes according to their correlation and importance. The novel metric is based on the groups, the weighted classes, and how many times the playtesting agent invalidated the game feature in a given game feature scenario. The metric is used in a series of experiments involving recent state-of-the-art evolutionary and tree-based playtesting agents. The experiments revealed no major difference between evolutionary-based and tree-based playtesting agents.
20
A Confrontation Decision-Making Method with Deep Reinforcement Learning and Knowledge Transfer for Multi-Agent System. Symmetry (Basel) 2020. [DOI: 10.3390/sym12040631]
Abstract
In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve effective control of the learning agent for confrontation in multi-agent systems. Firstly, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed to achieve confrontation decision-making for multiple agents. During training, the information of other agents is introduced to the critic network to improve the confrontation strategy, and the parameter-sharing mechanism reduces the cost of experience storage. In the DDPG algorithm, we use four neural networks to generate real-time actions and Q-value estimates, and a momentum mechanism to optimize the training process and accelerate the convergence of the neural networks. Secondly, this paper introduces an auxiliary controller using a policy-based reinforcement learning (RL) method to provide assistant decision-making for the game agent, and an effective reward function helps agents balance the losses of enemy and friendly units. Furthermore, this paper also uses the knowledge transfer method to extend the learning model to more complex scenes and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight the competitors and achieves a good win rate. For large-scale confrontation scenarios, the knowledge transfer method gradually improves the decision-making level of the learning agent.
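The centralized-critic-with-parameter-sharing idea can be sketched as one critic, shared by all agents, that sees every agent's observation and action during training. Layer sizes and input layout are assumptions, not the authors' network:

```python
import torch
import torch.nn as nn

class SharedCritic(nn.Module):
    """One critic, shared by all agents, that conditions on every agent's
    observation and action during training (the other-agent information the
    abstract mentions) while keeping a single parameter set across agents."""
    def __init__(self, obs_dim, act_dim, n_agents):
        super().__init__()
        joint = n_agents * (obs_dim + act_dim)        # centralized critic input
        self.net = nn.Sequential(nn.Linear(joint, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, all_obs, all_acts):
        # all_obs: (batch, n_agents, obs_dim); all_acts: (batch, n_agents, act_dim)
        x = torch.cat([all_obs.flatten(1), all_acts.flatten(1)], dim=-1)
        return self.net(x)                             # joint-behavior Q-value
```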
21
From Chess and Atari to StarCraft and Beyond: How Game AI is Driving the World of AI. KÜNSTLICHE INTELLIGENZ 2020. [DOI: 10.1007/s13218-020-00647-w]
22
Fuzzy Reinforcement Learning and Curriculum Transfer Learning for Micromanagement in Multi-Robot Confrontation. INFORMATION 2019. [DOI: 10.3390/info10110341]
Abstract
Multi-robot confrontation on physics-based simulators is a complex and time-consuming task, but simulators are required to evaluate the performance of advanced algorithms. Recently, a few advanced algorithms have been able to handle considerably complex levels of the robot confrontation system when the agents face multiple opponents, yet current confrontation decision-making systems suffer from difficulties in optimization and generalization. In this paper, fuzzy reinforcement learning (RL) and curriculum transfer learning are applied to micromanagement for the robot confrontation system. Firstly, an improved Q-learning in the semi-Markov decision process is designed to train the agent, and an efficient RL model is defined to avoid the curse of dimensionality. Secondly, a multi-agent RL algorithm with parameter sharing is proposed to train the agents. We use a neural network with adaptive momentum acceleration as a function approximator to estimate the state-action function, and fuzzy logic is then used to regulate the learning rate of RL. Thirdly, a curriculum transfer learning method is used to extend the RL model to more difficult scenarios, which ensures the generalization of the decision-making system. The experimental results show that the proposed method is effective.
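The fuzzy learning-rate regulation can be illustrated with a toy two-rule controller: a large temporal-difference error pushes the rate up, a small one pushes it down. The membership functions and bounds here are assumptions; the paper's rule base is not reproduced.

```python
def fuzzy_learning_rate(td_error, lr_min=0.01, lr_max=0.5):
    """Two-rule fuzzy regulation of the learning rate: a large TD error means
    the value estimate is unreliable, so learn faster; a small TD error means
    near convergence, so learn slower. Triangular memberships and a weighted
    mean defuzzification; the bounds are illustrative."""
    e = min(abs(td_error), 1.0)          # normalized error magnitude in [0, 1]
    small, large = 1.0 - e, e            # membership in "small" / "large"
    return (small * lr_min + large * lr_max) / (small + large)
```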
23
Shao K, Zhu Y, Zhao D. StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2019. [DOI: 10.1109/tetci.2018.2823329]
24
Pronobis W, Tkatchenko A, Müller KR. Many-Body Descriptors for Predicting Molecular Properties with Machine Learning: Analysis of Pairwise and Three-Body Interactions in Molecules. J Chem Theory Comput 2018; 14:2991-3003. [DOI: 10.1021/acs.jctc.8b00110]
Affiliation(s)
- Wiktor Pronobis
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Alexandre Tkatchenko
- Physics and Materials Science Research Unit, University of Luxembourg, Luxembourg L-1511, Luxembourg
- Klaus-Robert Müller
- Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany
- Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- Department of Brain and Cognitive Engineering, Korea University, Seoul 136-713, South Korea
25
Procedural generation of non-player characters in massively multiplayer online strategy games. Soft comput 2017. [DOI: 10.1007/s00500-016-2238-3]
26
Modified Adversarial Hierarchical Task Network Planning in Real-Time Strategy Games. APPLIED SCIENCES-BASEL 2017. [DOI: 10.3390/app7090872]
27
Bosc G, Tan P, Boulicaut JF, Raissi C, Kaytoue M. A Pattern Mining Approach to Study Strategy Balance in RTS Games. IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 2017. [DOI: 10.1109/tciaig.2015.2511819]
28
29
Synnaeve G, Bessiere P. Multiscale Bayesian Modeling for RTS Games: An Application to StarCraft AI. IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 2016. [DOI: 10.1109/tciaig.2015.2487743]
30
Liu S, Louis SJ, Ballinger CA. Evolving Effective Microbehaviors in Real-Time Strategy Games. IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 2016. [DOI: 10.1109/tciaig.2016.2544844]
31
Ballinger C, Louis S, Liu S. Coevolving Robust Build-Order Iterative Lists for Real-Time Strategy Games. IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 2016. [DOI: 10.1109/tciaig.2016.2544817]
32
GHOST: A Combinatorial Optimization Framework for Real-Time Problems. IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 2016. [DOI: 10.1109/tciaig.2016.2573199]
33
Perez-Liebana D, Samothrakis S, Togelius J, Schaul T, Lucas SM, Couetoux A, Lee J, Lim CU, Thompson T. The 2014 General Video Game Playing Competition. IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 2016. [DOI: 10.1109/tciaig.2015.2402393]
34
35
Togelius J. How to Run a Successful Game-Based AI Competition. IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 2016. [DOI: 10.1109/tciaig.2014.2365470]
36
Stanescu M, Certicky M. Predicting Opponent's Production in Real-Time Strategy Games With Answer Set Programming. IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 2016. [DOI: 10.1109/tciaig.2014.2365414]
37
Yannakakis GN, Togelius J. A Panorama of Artificial and Computational Intelligence in Games. IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES 2015. [DOI: 10.1109/tciaig.2014.2339221]