DeepMind’s AI Agents Teach Themselves to Play 3D Multiplayer Game

6 Jun 2019 1:07pm, by

We’ve seen the power of artificial intelligence when it comes to strategic two-player games such as chess and Go — not only is AI capable of beating human champions at these games, it’s also able to master them by teaching itself to play in a very short period of time. But despite these big achievements, these games aren’t necessarily the most representative of real-world situations where both cooperation and collective strategy might be needed.

That’s why getting AI to autonomously master more complex environments, such as those found in first-person multiplayer games, is the next logical step. Recent work from DeepMind, the Google-affiliated AI laboratory, reveals that it is indeed possible to have AI agents teach themselves how to cooperate and play multiplayer games just as well as humans might, using unsupervised reinforcement learning.

As DeepMind’s team outlines in their recently published paper in Science, the research involves using a modified version of Quake III Arena, a popular first-person, multiplayer game that has players navigating through three-dimensional maps. In the study, teams of both human players and AI agents played the game in Capture the Flag (CTF) mode, collaborating to capture as many flags as possible from the opposing team within five minutes. The teams begin at two base camps located at opposite ends of the 3D game map, and the layout changes from match to match to prevent agents from memorizing it to their advantage.

The team had their artificial agents use a convolutional neural network to see this world as human players might: as a stream of pixels from which each agent must independently learn the rules and goals of the game. The agents also operate in a decentralized fashion. Players are equipped with a laser weapon that allows them to “tag” opponents by shooting them, causing them to return to their starting point to “respawn.” If they are carrying a flag when tagged, they must drop the flag and return to home base.

Thousands of CTF games are played in parallel to generate training data, using a diverse set of indoor and outdoor 3D maps.

Rather than being given a predetermined goal, agents start out performing random actions. Over time, the researchers found that by playing with each other and with human players, the agents were able to learn their own “internal reward signal,” such as from gaining game points. This self-learned internal reward signal then permits them to generate their own internal goals, such as seizing the rival team’s flag to score more points. A two-tier optimization process is then used: one tier adjusts agents’ internal reward signals toward racking up an increasing number of points, while the other uses reinforcement learning (RL) methods to reinforce each agent’s policy for selecting actions that bring it closer to achieving its internal goals.
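As a rough illustration of the two tiers, the Python sketch below treats the internal reward as a weighted sum of game events (one tier) and chooses among candidate weightings by match win rate (the other tier). The event names, the weight values, and the function names internal_reward and select_reward_weights are assumptions made for this example, not details taken from DeepMind’s paper.

```python
# Hypothetical event weights: the article only says the internal reward
# derives from game points, so these names and values are illustrative.
EXAMPLE_WEIGHTS = {
    "captured_flag": 1.0,
    "picked_up_flag": 0.5,
    "tagged_opponent": 0.25,
    "was_tagged": -0.25,
}


def internal_reward(event_counts, weights):
    """Reward tier: collapse observed game events into one scalar reward."""
    return sum(weights.get(event, 0.0) * count
               for event, count in event_counts.items())


def select_reward_weights(candidates, win_rates):
    """Selection tier: keep the weighting whose agent won most often."""
    best_index = max(range(len(candidates)), key=lambda i: win_rates[i])
    return candidates[best_index]


if __name__ == "__main__":
    episode = {"captured_flag": 1, "picked_up_flag": 2,
               "tagged_opponent": 4, "was_tagged": 2}
    print(internal_reward(episode, EXAMPLE_WEIGHTS))
```

In a real training loop, the scalar from internal_reward would feed the RL update at every step, while a selection step like select_reward_weights would run far less often, after enough matches have been played to estimate win rates.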

To help them evaluate and learn from their progress, the agents are equipped with a “multi-time scale recurrent neural network with external memory,” which helps them keep track of the score both during the game and at its end. Essentially, the agents rely on the outcomes of the games played to keep refining the kinds of tactical actions they will take in the future.

Analysis of AI agent behavior.

Emergent Behaviors

Interestingly, this set-up means that each agent is capable of developing its own specialized policy without any supervision. For instance, some agents were able to emulate human-like behaviors in the game, such as following teammates around, staking out the enemy base, and defending their own base from opponents. After every 1,000 matches, the system evaluates how well the team is performing by comparing the agents’ policies; agents that don’t win as much will then imitate better-performing players.
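The periodic "compare, then imitate" step can be sketched in the style of population-based training. In the minimal Python example below, any agent whose win rate trails the best agent by more than a threshold copies that agent's settings and then jitters them slightly. The dictionary layout, the min_gap and noise values, and the choice to copy only hyperparameters are assumptions for illustration; the article does not specify what exactly is copied.

```python
import random


def exploit_and_explore(population, rng, min_gap=0.1, noise=0.2):
    """Let clearly weaker agents copy the strongest agent, then perturb the copy.

    Each agent is a dict with 'name', 'win_rate', and 'hyperparams' keys
    (a hypothetical layout chosen for this sketch).
    """
    best = max(population, key=lambda agent: agent["win_rate"])
    for agent in population:
        if best["win_rate"] - agent["win_rate"] > min_gap:
            # Exploit: inherit the better agent's settings.
            # Explore: jitter each value so the copy is not identical.
            agent["hyperparams"] = {
                key: value * rng.uniform(1 - noise, 1 + noise)
                for key, value in best["hyperparams"].items()
            }
            agent["copied_from"] = best["name"]
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    agents = [
        {"name": "a", "win_rate": 0.70, "hyperparams": {"learning_rate": 1e-3}},
        {"name": "b", "win_rate": 0.65, "hyperparams": {"learning_rate": 5e-4}},
        {"name": "c", "win_rate": 0.40, "hyperparams": {"learning_rate": 2e-3}},
    ]
    for agent in exploit_and_explore(agents, rng):
        print(agent["name"], agent.get("copied_from"), agent["hyperparams"])
```

Here only agent "c" falls far enough behind to copy from "a"; agent "b" is close enough to the leader that it keeps its own settings. The perturbation keeps the population diverse, so copies can still diverge from the agent they imitated.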

“No one has told [the AI] how to play the game — only if they’ve beaten their opponent or not,” explained paper lead author Max Jaderberg on Venture Beat. “The beauty of using [an] approach like this is that you never know what kind of behaviors will emerge as the agents learn. From a research perspective, it’s the novelty of the algorithmic approach that’s really exciting. The specific way we trained our [AI] … is a good example of how to scale up and operationalize some classic evolutionary ideas.”

The results of the experiments were striking: even with the agents deliberately hobbled by a delayed reaction time, they nevertheless beat intermediate players 88 percent of the time, and advanced human players 79 percent of the time. While it’s still too early to say where this step forward will take us, the team’s findings point to the powerful potential of leveraging reinforcement learning to help AI master new, unknown situations, as well as advancing research into hybrid systems in which humans and machines collaborate toward the same goal.

Images: DeepMind