OpenAI sets its sights on beating a professional Dota 2 team at The International

Back in June, OpenAI Five smashed amateur human players in the video game Dota 2. Then, early this month, it beat semi-professional Dota 2 players. Now OpenAI Five is set to claim the Dota 2 throne, with plans to beat the world’s best professional Dota 2 players.

OpenAI, the Elon Musk-backed non-profit AI research company, is pitting its team of five neural networks, called OpenAI Five, against a team of top professional Dota 2 players at The International esports tournament. The International 2018 is taking place this week (August 20 to 25) at Rogers Arena in Vancouver, Canada.

With this challenge, the team is using Dota 2 as a testbed for general-purpose AI systems that begin to capture the messiness and continuous nature of the real world, including teamwork, long time horizons, and hidden information. The Dota training system has shown that current AI algorithms can learn long-term planning at a large but achievable scale. The system is not specific to Dota 2: OpenAI has also used it to control a robotic hand, a previously unsolved problem in robotics. OpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity.

How does OpenAI Five work?

OpenAI Five is a team of five artificial neural networks, a kind of simulated “brain” that the team designed to be well suited to learning Dota 2. Each network sees the world as a list of 20,000 numbers that encode the visible game state (limited to the information a human player is permitted to see) and chooses an action by emitting a list of 8 numbers.
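To make the 20,000-in, 8-out interface concrete, here is a minimal sketch of one network taking a single game frame. It is not OpenAI's architecture: a plain recurrent cell stands in for whatever the real networks use, and the names `make_random_weights`, `policy_step` and `HIDDEN_SIZE` (a toy memory size) are hypothetical. Only the sizes 20,000 and 8 come from the article.

```python
import math
import random

OBS_SIZE = 20_000  # numbers describing the visible game state (figure from the article)
ACTION_SIZE = 8    # numbers emitted to choose an action (figure from the article)
HIDDEN_SIZE = 8    # toy memory size: an assumption, far smaller than a real network


def make_random_weights(seed=0):
    """Random starting parameters (hypothetical layout: input, recurrent, output)."""
    rng = random.Random(seed)
    w_in = [[rng.gauss(0.0, 0.01) for _ in range(OBS_SIZE)] for _ in range(HIDDEN_SIZE)]
    w_rec = [[rng.gauss(0.0, 0.1) for _ in range(HIDDEN_SIZE)] for _ in range(HIDDEN_SIZE)]
    w_out = [[rng.gauss(0.0, 0.1) for _ in range(HIDDEN_SIZE)] for _ in range(ACTION_SIZE)]
    return w_in, w_rec, w_out


def policy_step(obs, hidden, weights):
    """One game frame: fold the observation into memory, then read out 8 action numbers.

    The weights are only read, never modified, so a trained policy carries its
    memory (the hidden state) from frame to frame without learning anything new.
    """
    if len(obs) != OBS_SIZE:
        raise ValueError(f"expected {OBS_SIZE} observation numbers, got {len(obs)}")
    w_in, w_rec, w_out = weights
    new_hidden = []
    for i in range(HIDDEN_SIZE):
        total = sum(w * x for w, x in zip(w_in[i], obs))
        total += sum(w * h for w, h in zip(w_rec[i], hidden))
        new_hidden.append(math.tanh(total))
    action = [sum(w * h for w, h in zip(w_out[k], new_hidden)) for k in range(ACTION_SIZE)]
    return action, new_hidden


weights = make_random_weights()
hidden = [0.0] * HIDDEN_SIZE      # empty memory at the start of a game
for frame in range(3):
    obs = [0.5] * OBS_SIZE        # placeholder observation for each frame
    action, hidden = policy_step(obs, hidden, weights)
print(len(action), len(hidden))   # 8 8
```

The hidden state is the only thing that changes between frames; that is what lets a fixed network react to recent history.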

The OpenAI team writes code that maps between game states/actions and lists of numbers. Once trained, the neural networks are creatures of pure instinct: they implement memory but do not otherwise learn further. They play as a team, but OpenAI did not design any special communication structures between them; it only provides an incentive to play as a team.
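The mapping code described above can be sketched as an encoder and a decoder. Everything here except the sizes 20,000 and 8 is an assumption for illustration: the `Unit` fields, the four-number-per-unit layout, the `ACTION_TYPES` vocabulary and the meaning of each output slot are hypothetical, and real Dota 2 state is far richer.

```python
from dataclasses import dataclass

OBS_SIZE = 20_000  # observation length stated in the article
ACTION_TYPES = ["noop", "move", "attack", "cast"]  # hypothetical action vocabulary


@dataclass
class Unit:
    """Hypothetical per-unit game state; real Dota 2 state has far more fields."""
    x: float
    y: float
    health_fraction: float
    is_ally: bool
    visible: bool


def encode_state(units, obs_size=OBS_SIZE):
    """Flatten visible units into a fixed-length list of numbers, zero-padded."""
    obs = []
    for u in units:
        if not u.visible:  # information a human could not see never reaches the network
            continue
        obs.extend([u.x, u.y, u.health_fraction, 1.0 if u.is_ally else 0.0])
    if len(obs) > obs_size:
        raise ValueError("too many units for the observation size")
    obs.extend([0.0] * (obs_size - len(obs)))
    return obs


def decode_action(numbers):
    """Map the network's 8 numbers to a command (hypothetical layout).

    Numbers 0-3 score each action type, 4-5 give a target position,
    and 6-7 are unused in this sketch.
    """
    if len(numbers) != 8:
        raise ValueError("expected 8 numbers")
    scores = numbers[:len(ACTION_TYPES)]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return {"type": ACTION_TYPES[best], "target": (numbers[4], numbers[5])}


units = [Unit(1.0, 2.0, 0.5, True, True), Unit(3.0, 4.0, 1.0, False, False)]
obs = encode_state(units)
print(obs[:4], len(obs))  # [1.0, 2.0, 0.5, 1.0] 20000
print(decode_action([0.1, 0.9, 0.2, 0.0, 5.0, 6.0, 0.0, 0.0]))
# {'type': 'move', 'target': (5.0, 6.0)}
```

Keeping this translation in hand-written code means the networks only ever see and produce plain lists of numbers.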

OpenAI Five Training

The five neural networks start with random parameters and use a general-purpose training system, Rapid, to learn better parameters.

Rapid has OpenAI Five play against copies of itself, generating 180 years of gameplay data each day across thousands of simultaneous games. It uses 128,000 CPU cores and 256 GPUs.
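A quick back-of-the-envelope calculation shows what 180 years of play per day means in speed-up over real time. The 365-day year is a rounding choice, and the per-core figure assumes every core does nothing but simulate games, which is an assumption rather than a description of how Rapid actually divides its hardware.

```python
YEARS_OF_PLAY_PER_DAY = 180  # from the article
DAYS_PER_YEAR = 365          # ignoring leap years for a round figure
CPU_CORES = 128_000          # from the article

# Game-days of experience produced per wall-clock day, i.e. the speed-up over real time.
speedup = YEARS_OF_PLAY_PER_DAY * DAYS_PER_YEAR
# If every core only simulated games (an assumption), each core would cover this many
# seconds of game time per second of wall-clock time.
game_seconds_per_core_second = speedup / CPU_CORES

print(speedup)                       # 65700
print(game_seconds_per_core_second)  # 0.51328125
```

In other words, the system gathers roughly 65,700 days of experience for every day that passes, which is only possible because thousands of games run in parallel.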

At each game frame, Rapid computes a numeric reward which is positive when something favorable happens (e.g. an allied hero gained experience) and negative when something unfavorable happens (e.g. an allied hero is killed).
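A per-frame reward of this kind can be sketched as a weighted sum of the events seen in that frame. The event names and weights below are hypothetical; the article only says that favorable events such as an allied hero gaining experience earn a positive reward and unfavorable ones such as an allied hero being killed earn a negative one.

```python
# Hypothetical event names and weights: the article only says favorable events give a
# positive reward and unfavorable ones a negative reward.
REWARD_WEIGHTS = {
    "ally_experience_gained": 0.01,  # per point of experience gained by an allied hero
    "ally_hero_killed": -1.0,        # per allied hero killed
}


def frame_reward(events, weights=REWARD_WEIGHTS):
    """Weighted sum of the events counted in one frame; unknown events count as zero."""
    return sum(weights.get(name, 0.0) * amount for name, amount in events.items())


print(frame_reward({"ally_experience_gained": 50}))                         # 0.5
print(frame_reward({"ally_experience_gained": 50, "ally_hero_killed": 1}))  # -0.5
```

Because the reward is computed every frame rather than only at the end of a match, the training signal arrives close to the actions that caused it.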

Rapid applies the Proximal Policy Optimization (PPO) algorithm to update the parameters of the neural networks, making actions that occurred shortly before a positive reward more likely and those shortly before a negative reward less likely.
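The two ingredients of such an update can be sketched as follows: discounted returns, which give most credit to actions just before a reward, and PPO's clipped surrogate objective, which the optimizer maximizes. This is a minimal illustration, not Rapid's implementation. The discount `gamma`, the clip range `clip_eps = 0.2` (a commonly used value) and the function names are assumptions. The gradient step itself, and the learned baseline usually subtracted from returns to form advantages, are omitted; advantages are passed in directly.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return at each step = reward now + gamma * return at the next step.

    With gamma < 1, a reward counts most for the actions just before it,
    which is how credit flows to actions that occurred shortly before a reward.
    """
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Average clipped surrogate objective from PPO, to be maximized.

    ratio = pi_new(a|s) / pi_old(a|s). Positive-advantage actions gain from a larger
    ratio and negative-advantage actions from a smaller one, but clipping removes any
    extra gain from moving the ratio outside [1 - clip_eps, 1 + clip_eps].
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]

old = [math.log(0.5)]  # the old policy chose the action with probability 0.5
print(ppo_clip_objective([math.log(0.9)], old, [1.0]))   # 1.2: the gain is capped
print(ppo_clip_objective([math.log(0.9)], old, [-1.0]))  # about -1.8: the penalty is not
```

Taking the minimum of the clipped and unclipped terms keeps each update conservative: the policy is not rewarded for changing a good action's probability too much in one step, but it is still penalized for making a bad action much more likely.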