OpenAI’s Multi-Agent Deep Deterministic Policy Gradients (MADDPG)

An actor-critic approach to multi-agent RL problems

May 26 · 8 min read


Photo by Safar Safarov on Unsplash

A New Approach

Multi-agent reinforcement learning is an ongoing, rich field of research. Naively applying single-agent algorithms in multi-agent contexts “puts us in a pickle.” Learning becomes difficult for many reasons, especially these two:

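Non-stationarity: from each independent agent's point of view, the environment keeps changing because the other agents are learning too
The exponential growth of the joint state and action spaces as agents are added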
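
To make the second hurdle concrete, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that every agent chooses from the same number of discrete actions; the function name and the choice of 5 actions per agent are hypothetical and do not come from the MADDPG paper.

```python
def joint_action_space_size(n_agents: int, actions_per_agent: int) -> int:
    """Count joint actions when each agent picks one of the same discrete actions."""
    # Each agent chooses independently, so the counts multiply across agents.
    return actions_per_agent ** n_agents


# Growth for a hypothetical setting with 5 discrete actions per agent.
sizes = {n: joint_action_space_size(n, 5) for n in (1, 2, 4, 8)}
for n, size in sizes.items():
    print(f"{n} agent(s): {size} joint actions")
```

Going from one agent to eight multiplies the number of joint actions from 5 to 390,625, which is why learning a single joint policy over all agents quickly becomes impractical.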

Researchers have proposed plenty of approaches to mitigate the effects of these challenges. A large subset of these methods falls under the umbrella of “centralized planning with decentralized execution.”

Centralized Planning

Each agent only has direct access to local observations. These observations can be many things: an image of the environment, positions relative to landmarks, or even positions relative to other agents. During learning, however, all agents are guided by a centralized module, or critic.

Even though each agent only has local information and a local policy to train, there is an entity overseeing the entire system of agents and advising them on how to update their policies. Because all agents learn with the help of a module that has global information, the effect of non-stationarity is reduced.
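
The split between local actors and a global critic can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class names `LinearActor` and `CentralizedCritic` are hypothetical, simple linear functions stand in for neural networks, each action is a single scalar, and one shared critic is assumed for simplicity.

```python
import random


class LinearActor:
    """Hypothetical local policy: maps one agent's own observation to an action."""

    def __init__(self, obs_dim, rng):
        self.weights = [rng.uniform(-1, 1) for _ in range(obs_dim)]

    def act(self, local_obs):
        # The action depends only on this agent's local observation.
        return sum(w * o for w, o in zip(self.weights, local_obs))


class CentralizedCritic:
    """Hypothetical critic: scores the joint observations and joint actions."""

    def __init__(self, n_agents, obs_dim, rng):
        # Input is every agent's observation plus one scalar action per agent.
        input_dim = n_agents * obs_dim + n_agents
        self.weights = [rng.uniform(-1, 1) for _ in range(input_dim)]

    def q_value(self, all_obs, all_actions):
        # Flatten the global information into a single input vector.
        joint_input = [x for obs in all_obs for x in obs] + list(all_actions)
        return sum(w * x for w, x in zip(self.weights, joint_input))


rng = random.Random(0)
n_agents, obs_dim = 3, 4
actors = [LinearActor(obs_dim, rng) for _ in range(n_agents)]
critic = CentralizedCritic(n_agents, obs_dim, rng)

observations = [[rng.uniform(-1, 1) for _ in range(obs_dim)] for _ in range(n_agents)]
actions = [actor.act(obs) for actor, obs in zip(actors, observations)]
q = critic.q_value(observations, actions)  # the training signal uses global information
```

The point of the sketch is the difference in inputs: each actor sees `obs_dim` numbers, while the critic sees all observations and all actions at once.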

Decentralized Execution

Then, during testing, the centralized module is removed, leaving only the agents, their policies, and their local observations. This sidesteps the growth of the joint state and action spaces because a joint policy is never explicitly learned. Instead, we hope that the central module has provided enough information during training to guide each local policy so that, once test time comes around, the agents behave well as a whole system.
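
As a rough sketch of what execution looks like once training is over, the snippet below steps a set of agents using nothing but their own policies and local observations. The helper names `make_linear_policy` and `decentralized_step`, and the hand-picked weights, are hypothetical and only serve the illustration.

```python
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], float]


def make_linear_policy(weights: Sequence[float]) -> Policy:
    """Build a toy deterministic policy from fixed (already trained) weights."""
    def policy(local_obs: Sequence[float]) -> float:
        return sum(w * o for w, o in zip(weights, local_obs))
    return policy


def decentralized_step(policies: List[Policy],
                       local_observations: List[Sequence[float]]) -> List[float]:
    # Each agent acts from its own observation; no critic is passed in or consulted.
    return [policy(obs) for policy, obs in zip(policies, local_observations)]


policies = [make_linear_policy([1.0, 0.0]), make_linear_policy([0.0, 2.0])]
observations = [[0.5, -1.0], [0.5, -1.0]]
actions = decentralized_step(policies, observations)
```

Note that `decentralized_step` takes no critic argument at all: whatever the centralized module contributed is already baked into each policy's weights.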

OpenAI

Researchers at OpenAI, UC Berkeley, and McGill University introduced a novel approach called Multi-Agent Deep Deterministic Policy Gradients (MADDPG). Inspired by its single-agent counterpart, DDPG, this approach uses actor-critic style learning and has shown promising results in multi-agent settings. Here, we outline the algorithm, the intuition behind it, and the results.

OpenAI’s Multi-Agent Deep Deterministic Policy Gradients (MADDPG)