On Monday, the team at OpenAI launched Neural MMO, a massively multiplayer online (MMO) game environment for reinforcement learning agents. It is intended for training AI in complex, open-world settings, and it supports a large number of agents within a persistent, open-ended task.
The need for Neural MMO
Over the past few years, researchers have explored the suitability of MMOs for modeling real-life events. However, multiagent reinforcement learning still faces two main challenges.
First, there is a need for open-ended tasks with a high complexity ceiling, as current environments tend to be either complex but narrow or open-ended but too simple. The other challenge, the OpenAI team notes, is the need for more benchmark environments in order to quantify learning progress in the presence of large population scales.
Different criteria to overcome challenges
The team suggests certain criteria that an environment must meet to overcome these challenges.
Agents should be able to learn concurrently in the presence of other learning agents, without the need for environment resets. Their strategies should adapt to rapid changes in the behavior of other agents and should also account for long time horizons.
Neural MMO supports a large and variable number of entities. The experiments by the OpenAI team consider up to 100M lifetimes of 128 concurrent agents in each of 100 concurrent servers.
As the computational barrier to entry is low, effective policies can be trained on a single desktop CPU.
Neural MMO is designed to be updated with new content. Its core features include a food-and-water foraging system, procedural generation of tile-based terrain, and a strategic combat system. There are opportunities for open-source-driven expansion in the future.
Players can join any available server, each of which contains an automatically generated tile-based game map of configurable size. Some tiles, such as food-bearing forest tiles and grass tiles, are traversable, while others, such as water and solid stone, are not. To sustain their health, players must obtain food and water and avoid combat damage from other agents. The platform comes with a procedural environment generator and visualization tools for map tile visitation distribution, value functions, and agent-agent dependencies of learned policies.
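The survival loop described above can be sketched in a few lines. The class below is a hypothetical illustration, not Neural MMO's actual API: agents drain food and water every tick, replenish them by foraging on the appropriate tiles, and regain health only while both resources remain above zero. All names and numbers are illustrative assumptions.

```python
# Hypothetical sketch of the foraging/survival mechanic (not Neural MMO's real API).
class Agent:
    def __init__(self):
        # Illustrative starting stats; the real environment's values may differ.
        self.health, self.food, self.water = 10, 10, 10

    def tick(self, on_forest_tile=False, adjacent_to_water=False):
        # Foraging: forest tiles replenish food, nearby water replenishes water.
        if on_forest_tile:
            self.food = min(10, self.food + 2)
        if adjacent_to_water:
            self.water = min(10, self.water + 2)
        # Upkeep: both resources drain every tick.
        self.food = max(0, self.food - 1)
        self.water = max(0, self.water - 1)
        # Health regenerates only while fed and hydrated; otherwise it drains.
        if self.food > 0 and self.water > 0:
            self.health = min(10, self.health + 1)
        else:
            self.health = max(0, self.health - 1)

a = Agent()
a.tick(on_forest_tile=True, adjacent_to_water=True)  # forage, then pay upkeep
```

Even this toy version shows why combat avoidance and resource access trade off: an agent camped on a forest tile near water can survive indefinitely, but contested tiles invite damage from other agents.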
The team trained a fully connected architecture using vanilla policy gradients, with a value-function baseline and reward discounting as the only enhancements. Variable-length observations, such as the list of surrounding players, were converted into a fixed-length vector by computing the maximum across all players.
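The pooling trick can be sketched as follows. This is a minimal NumPy illustration of max-pooling across a variable number of per-player feature vectors, assuming a hypothetical 4-dimensional feature layout; it is not the project's actual observation code.

```python
import numpy as np

def pool_player_observations(player_features, feature_dim=4):
    """Reduce a variable-length list of per-player feature vectors
    to a single fixed-length vector via an elementwise maximum.

    Hypothetical sketch: each surrounding player is described by a small
    feature vector (e.g. position offset, health, food). The number of
    visible players varies per step, so max-pooling over the player axis
    yields an observation of constant size for the policy network."""
    if not player_features:                # no players in view
        return np.zeros(feature_dim)
    stacked = np.stack(player_features)    # shape: (n_players, feature_dim)
    return stacked.max(axis=0)             # shape: (feature_dim,)

# Two visible players or five visible players both yield a length-4 vector.
obs_a = pool_player_observations([np.array([1.0, 0.5, 0.2, 0.0]),
                                  np.array([0.3, 0.9, 0.1, 1.0])])
```

The key property is order- and count-invariance: the output does not depend on how many players are visible or how the list is sorted, so the same fully connected network can process it every step.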
Neural MMO resolves a couple of limitations of previous game-based environments, but many remain unsolved. A few users are excited about this news. One user commented on Hacker News, “What I find interesting about this is that the agents naturally become pacifists.”
Others, however, think the company should produce novel results rather than replicate known ones. Another user commented on Hacker News, “So far, they are replicating known results from evolutionary game theory (pacifism & niches) to economics (distance & diversification). I wonder when and if they will surprise some novel results.”
To know more about this news, check out OpenAI’s official blog post.