GLORIA

GEOMAR Library Ocean Research Information Access

  • 1
    Online Resource
    Association for the Advancement of Artificial Intelligence (AAAI) ; 2023
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 37, No. 10 (2023-06-26), p. 11604-11612
    Abstract: Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce the Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward-shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine which states to add shaping rewards to for more efficient learning, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task, thus ensuring efficient convergence to high-performance policies. We demonstrate ROSA's properties in three didactic experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse-reward environments.
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: Unknown
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2023
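The two-agent shaping setup described in the abstract above can be illustrated with a toy sketch: a "shaper" learner decides, via a per-state switching choice, whether to add a small bonus to the reward that a "controller" Q-learner trains on. This is a minimal sketch under assumed simplifications (tabular Q-learning, a chain MDP, a fixed bonus size, and a shaper trained on the raw environment reward); it is not the paper's ROSA algorithm.

```python
# Illustrative sketch only: a toy "shaper + controller" loop on a sparse-reward
# chain MDP, loosely inspired by the two-agent reward-shaping idea in the record
# above. The environment, hyperparameters, and learning rules are assumptions,
# not the paper's ROSA algorithm.
import random
from collections import defaultdict

N, GOAL_R, BONUS = 10, 1.0, 0.1      # chain length, sparse goal reward, shaping bonus size
ALPHA, GAMMA, EPS = 0.5, 0.95, 0.1   # learning rate, discount, exploration rate

q_ctrl = defaultdict(float)    # controller Q[(state, action)], actions move left/right
q_shape = defaultdict(float)   # shaper Q[(state, switch)], switch in {0, 1}: add bonus or not

def eps_greedy(q, s, acts):
    if random.random() < EPS:
        return random.choice(acts)
    return max(acts, key=lambda a: q[(s, a)])

for episode in range(500):
    s = 0
    for _ in range(4 * N):
        a = eps_greedy(q_ctrl, s, [-1, +1])
        switch = eps_greedy(q_shape, s, [0, 1])            # shaper decides whether to add a bonus here
        s2 = min(max(s + a, 0), N - 1)
        env_r = GOAL_R if s2 == N - 1 else 0.0             # sparse environment reward
        shaped_r = env_r + (BONUS if switch == 1 and s2 > s else 0.0)
        # Controller learns from the shaped reward.
        best_next = max(q_ctrl[(s2, -1)], q_ctrl[(s2, +1)])
        q_ctrl[(s, a)] += ALPHA * (shaped_r + GAMMA * best_next - q_ctrl[(s, a)])
        # Shaper learns from the raw environment reward, keeping shaping task-aligned.
        best_next_sh = max(q_shape[(s2, 0)], q_shape[(s2, 1)])
        q_shape[(s, switch)] += ALPHA * (env_r + GAMMA * best_next_sh - q_shape[(s, switch)])
        s = s2
        if env_r > 0:
            break

print("controller's greedy action at state 0:", max([-1, +1], key=lambda a: q_ctrl[(0, a)]))
```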
  • 2
    Online Resource
    Association for the Advancement of Artificial Intelligence (AAAI) ; 2023
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 37, No. 10 (2023-06-26), p. 11717-11725
    Abstract: Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. One possible solution is to exploit inherent task structures to accelerate exploration. In this paper, we present a novel exploration approach for sparse-reward multi-agent tasks that encodes a special structural prior on the reward function into exploration. Specifically, a novel entropic exploration objective that encodes the structural prior is proposed to accelerate the discovery of rewards. By maximizing the lower bound of this objective, we then propose an algorithm with moderate computational cost that can be applied to practical tasks. Under the sparse-reward setting, we show that the proposed algorithm significantly outperforms state-of-the-art algorithms in the multiple-particle environment, Google Research Football, and StarCraft II micromanagement tasks. To the best of our knowledge, on some hard tasks (such as 27m_vs_30m) that have a relatively large number of agents and require non-trivial strategies to defeat enemies, our method is the first to learn winning strategies under the sparse-reward setting.
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: Unknown
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2023
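The abstract above does not state the entropic objective itself, so the sketch below uses a generic stand-in: a surprisal bonus, -log p_hat(s) under empirical state-visitation counts, whose expectation equals the visitation entropy. It only illustrates how an entropy-driven intrinsic term can be mixed into a sparse reward; the class name, bonus form, and mixing coefficient are assumptions, not the paper's objective or algorithm.

```python
# Illustrative sketch only: a surprisal-style intrinsic bonus, -log p_hat(s) under
# empirical visitation counts, whose expectation is the entropy of the visitation
# distribution. A generic stand-in for entropy-driven exploration, not the
# paper's entropic objective or algorithm.
import math
from collections import Counter

class EntropyBonus:
    """Intrinsic reward r_int(s) = -log p_hat(s) from running visitation counts."""
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def bonus(self, state):
        self.counts[state] += 1
        self.total += 1
        p_hat = self.counts[state] / self.total
        return -math.log(p_hat)

# Hypothetical usage inside a training loop (beta and joint_obs are assumptions):
#   shaped_reward = env_reward + beta * tracker.bonus(tuple(joint_obs))
tracker = EntropyBonus()
print(tracker.bonus((0, 0)), tracker.bonus((0, 1)), tracker.bonus((0, 0)))  # rarer states earn larger bonuses
```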
  • 3
    Online Resource
    Association for the Advancement of Artificial Intelligence (AAAI) ; 2023
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 37, No. 7 (2023-06-26), p. 8536-8544
    Abstract: Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem: the learning targets keep changing at every iteration when multiple agents update their policies at the same time. Starting from first principles, in this paper we manage to solve the non-stationarity problem by proposing bidirectional action-dependent Q-learning (ACE). Central to the development of ACE is the sequential decision-making process, wherein only one agent is allowed to take an action at a time. Within this process, each agent maximizes its value function given the actions taken by the preceding agents at the inference stage. In the learning phase, each agent minimizes a TD error that depends on how the subsequent agents have reacted to its chosen action. Given the design of bidirectional dependency, ACE effectively turns a multi-agent MDP into a single-agent MDP. We implement the ACE framework by identifying a proper network representation to formulate the action dependency, so that the sequential decision process is computed implicitly in one forward pass. To validate ACE, we compare it with strong baselines on two MARL benchmarks. Empirical experiments demonstrate that ACE outperforms the state-of-the-art algorithms on Google Research Football and the StarCraft Multi-Agent Challenge by a large margin. In particular, on SMAC tasks, ACE achieves a 100% success rate on almost all the hard and super-hard maps. We further study extensive research problems regarding ACE, including its extension, generalization, and practicality.
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: Unknown
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2023
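A minimal sketch of the sequential, action-dependent decision idea from the abstract above: in a one-shot cooperative matrix game, the first agent scores each of its actions by the best possible completion, and the second agent then reacts to the committed action, so the joint choice reduces to a sequence of single-agent choices. The payoff table and value functions are invented for illustration; this is not the ACE network or its training procedure.

```python
# Illustrative sketch only: sequential, action-dependent greedy selection in a
# one-shot cooperative game, showing how conditioning each agent on the actions
# already committed reduces the joint choice to single-agent choices. The payoff
# table and value functions are invented; this is not the ACE network or training.

# Joint payoff for two agents with 3 actions each; coordination on the same
# action pays, and higher-indexed coordination pays more.
payoff = {(i, j): float(i == j) * (i + 1) for i in range(3) for j in range(3)}

def q_first(a0):
    """First agent scores its action by the best possible completion."""
    return max(payoff[(a0, a1)] for a1 in range(3))

def q_second(a0, a1):
    """Second agent scores its action given the first agent's committed action."""
    return payoff[(a0, a1)]

a0 = max(range(3), key=q_first)                       # first agent decides
a1 = max(range(3), key=lambda a: q_second(a0, a))     # second agent reacts to a0
print("sequential joint action:", (a0, a1), "payoff:", payoff[(a0, a1)])
assert payoff[(a0, a1)] == max(payoff.values())       # the sequence recovers the optimum here
```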
  • 4
    Online Resource
    Association for the Advancement of Artificial Intelligence (AAAI) ; 2022
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 36, No. 8 (2022-06-28), p. 8441-8449
    Abstract: We study the Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. Such an extension enables PeVFA to preserve the values of multiple policies at the same time and brings an appealing characteristic, namely value generalization among policies. We formally analyze value generalization under Generalized Policy Iteration (GPI). Through both theoretical and empirical lenses, we show that generalized value estimates offered by PeVFA may have lower initial approximation error with respect to the true values of successive policies, which is expected to improve consecutive value approximation during GPI. Based on these observations, we introduce a new form of GPI with PeVFA that leverages value generalization along the policy improvement path. Moreover, we propose a representation learning framework for RL policies, providing several approaches to learn effective policy embeddings from policy network parameters or state-action pairs. In our experiments, we evaluate the efficacy of the value generalization offered by PeVFA and of policy representation learning in several OpenAI Gym continuous control tasks. For a representative instance of algorithm implementation, Proximal Policy Optimization (PPO) re-implemented under the paradigm of GPI with PeVFA achieves about 40% performance improvement over its vanilla counterpart in most environments.
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: Unknown
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2022
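A minimal sketch of the interface described in the abstract above: a value approximator that takes a policy representation alongside the state, so one model can hold values for several policies at once. The linear model, the flattened-parameter embedding, and the least-squares fit are illustrative assumptions; the paper's PeVFA architecture and policy-embedding methods are not reproduced here.

```python
# Illustrative sketch only: a value approximator that conditions on an explicit
# policy representation in addition to the state. The linear model, the
# flattened-parameter embedding, and the least-squares fit are assumptions for
# illustration, not the paper's PeVFA architecture or embedding methods.
import numpy as np

rng = np.random.default_rng(0)

def policy_embedding(policy_params):
    """Crude policy representation: the flattened policy parameters themselves."""
    return np.ravel(policy_params)

class PolicyExtendedVFA:
    """V(s, pi) ~ w . [s ; embed(pi)], fit by least squares on (state, policy, return) data."""
    def __init__(self, state_dim, embed_dim):
        self.w = np.zeros(state_dim + embed_dim)

    def features(self, state, policy_params):
        return np.concatenate([state, policy_embedding(policy_params)])

    def value(self, state, policy_params):
        return self.features(state, policy_params) @ self.w

    def fit(self, states, policies, returns):
        X = np.stack([self.features(s, p) for s, p in zip(states, policies)])
        self.w, *_ = np.linalg.lstsq(X, np.asarray(returns), rcond=None)

# Toy usage: one approximator holds values for many (random) policies at once.
states = [rng.normal(size=4) for _ in range(32)]
policies = [rng.normal(size=(2, 3)) for _ in range(32)]
returns = [s.sum() + p.sum() for s, p in zip(states, policies)]  # synthetic targets
vfa = PolicyExtendedVFA(state_dim=4, embed_dim=6)
vfa.fit(states, policies, returns)
print(round(vfa.value(states[0], policies[0]), 3), "vs target", round(returns[0], 3))
```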
  • 5
    Online Resource
    Association for the Advancement of Artificial Intelligence (AAAI) ; 2020
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 34, No. 05 (2020-04-03), p. 7325-7332
    Abstract: Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally, and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a solution for NE selection. In this paper, we treat agents unequally and consider the Stackelberg equilibrium as a potentially better convergence point than the Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem of finding a Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (and thus different levels of intelligence), while their actions can still be executed simultaneously and in a distributed manner. A convergence proof is given, and the resulting learning algorithm is tested against state-of-the-art methods. We find that the proposed bi-level actor-critic algorithm successfully converges to the Stackelberg equilibria in matrix games and finds an asymmetric solution in a highway-merge environment.
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: Unknown
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2020
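To make the equilibrium concept in the abstract above concrete, the sketch below computes a Stackelberg (leader-follower) outcome of a small matrix game by enumeration: the leader picks the action whose anticipated follower best response maximizes the leader's own payoff. The payoff matrices are invented; this shows the solution concept only, not the paper's bi-level actor-critic learning method.

```python
# Illustrative sketch only: computing a Stackelberg (leader-follower) outcome of a
# small matrix game by enumeration, to make the equilibrium concept concrete. The
# payoff matrices are invented; this is the solution concept, not the paper's
# bi-level actor-critic learning method.
import numpy as np

# Rows: leader actions, columns: follower actions.
leader_payoff = np.array([[3.0, 1.0],
                          [4.0, 0.0]])
follower_payoff = np.array([[2.0, 1.0],
                            [1.0, 3.0]])

def follower_best_response(a_leader):
    """The follower reacts optimally to the leader's committed action."""
    return int(np.argmax(follower_payoff[a_leader]))

# The leader picks the action whose anticipated best response maximizes its own payoff.
a_leader = max(range(2), key=lambda a: leader_payoff[a, follower_best_response(a)])
a_follower = follower_best_response(a_leader)
print("Stackelberg outcome:", (a_leader, a_follower),
      "leader payoff:", leader_payoff[a_leader, a_follower],
      "follower payoff:", follower_payoff[a_leader, a_follower])
```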
  • 6
    Online Resource
    Association for the Advancement of Artificial Intelligence (AAAI) ; 2021
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 35, No. 11 (2021-05-18), p. 9834-9842
    Abstract: The value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging since it involves the stochasticity of environmental dynamics and reward signals that can be sparse and delayed in some cases. A typical model-free RL algorithm usually estimates the values of a policy by Temporal Difference (TD) or Monte Carlo (MC) algorithms directly from rewards, without explicitly taking the dynamics into consideration. In this paper, we propose Value Decomposition with Future Prediction (VDFP), providing an explicit two-step understanding of the value estimation process: 1) first foresee the latent future, and 2) then evaluate it. We analytically decompose the value function into a latent future-dynamics part and a policy-independent trajectory-return part, inducing a way to model latent dynamics and returns separately in value estimation. Further, we derive a practical deep RL algorithm consisting of a convolutional model that learns a compact trajectory representation from past experiences, a conditional variational auto-encoder that predicts the latent future dynamics, and a convex return model that evaluates the trajectory representation. In experiments, we empirically demonstrate the effectiveness of our approach for both off-policy and on-policy RL in several OpenAI Gym continuous control tasks, as well as in a few challenging variants with delayed rewards.
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: Unknown
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2021
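A minimal sketch of the two-step "foresee the latent future, then evaluate it" decomposition from the abstract above, on synthetic data: one model predicts a latent future summary from the state, and a separate return model scores that summary. The linear predictor and linear return model stand in for the paper's convolutional encoder, conditional variational auto-encoder, and convex return model; this is not the VDFP algorithm.

```python
# Illustrative sketch only: a two-step "foresee the latent future, then evaluate
# it" value estimate on synthetic data. Linear models stand in for the paper's
# convolutional encoder, conditional VAE, and convex return model; this is not
# the VDFP algorithm.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: state -> noisy latent future summary -> scalar return.
states = rng.normal(size=(256, 4))
true_map = rng.normal(size=(4, 3))
futures = states @ true_map + 0.1 * rng.normal(size=(256, 3))  # latent future summaries
returns = futures @ np.array([1.0, -2.0, 0.5])                 # return depends only on the future

# Step 1: model the latent future dynamics from the state.
W_dyn, *_ = np.linalg.lstsq(states, futures, rcond=None)
# Step 2: model the return from the latent future (policy-independent in this sketch).
w_ret, *_ = np.linalg.lstsq(futures, returns, rcond=None)

def value(state):
    predicted_future = state @ W_dyn   # "first foresee the latent future"
    return predicted_future @ w_ret    # "and then evaluate it"

print("two-step value estimate:", round(float(value(states[0])), 3),
      "vs Monte Carlo target:", round(float(returns[0]), 3))
```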