Earlier this year in January, Google’s DeepMind AI AlphaStar had defeated two professional players, TLO and MaNa, at StarCraft II, a real-time strategy game. Two days ago, DeepMind announced that AlphaStar has now achieved the highest possible online competitive ranking, called Grandmaster level, in StarCraft II. This makes AlphaStar the first AI to reach the top league of a widely popular game without any restrictions.
AplhaStar used the multi-agent reinforcement learning technique and rated above 99.8% of officially ranked human players. It was able to achieve the Grandmaster level for all the three StarCraft II races – Protoss, Terran, and Zerg. The DeepMind researchers have published the details of AlphaStar in the paper titled, ‘Grandmaster level in StarCraft II using multi-agent reinforcement learning’.
Our new @nature paper: AlphaStar is the first learning system to reach the top tier of a major esport without any game restrictions, achieving Grandmaster status in StarCraft II.
— DeepMind (@DeepMindAI) October 30, 2019
How did AlphaStar achieve the Grandmaster level in StarCraft II?
The DeepMind researchers were able to develop a robust and flexible agent by understanding the potential and limitations of open-ended learning. This helped the researchers to make AlphaStar cope with complex real-world domains. “Games like StarCraft are an excellent training ground to advance these approaches, as players must use limited information to make dynamic and difficult decisions that have ramifications on multiple levels and timescales,” states the blog post.
The StarCraft II video game requires players to balance high-level economic decisions with individual control of hundreds of units. When playing this game, humans are under physical constraints which limits their reaction time and their rate of actions. Accordingly, AphaStar was also imposed with these constraints, thus making it suffer from delays due to network latency and computation time. In order to limit its actions per minute (APM), AphaStar’s peak statistics were kept substantially lower than those of humans. To align with the standard human movement, it also had a limited viewing of the portion of the map, AlphaStar could register only a limited number of mouse clicks and had only 22 non-duplicated actions to play every five seconds.
AlphaStar uses a combination of general-purpose techniques like neural network architectures, imitation learning, reinforcement learning, and multi-agent learning. The games were sampled from a publicly available dataset of anonymized human replays, which were later trained to predict the action of every player. These predictions were then used to procure a diverse set of strategies to reflect the different modes of human play.
Dario “TLO” WÜNSCH, a professional starcraft II player says, “I’ve found AlphaStar’s gameplay incredibly impressive – the system is very skilled at assessing its strategic position, and knows exactly when to engage or disengage with its opponent. And while AlphaStar has excellent and precise control, it doesn’t feel superhuman – certainly not on a level that a human couldn’t theoretically achieve. Overall, it feels very fair – like it is playing a ‘real’ game of StarCraft.”
According to the paper, AlphaStar had the 1026 possible actions available at each time step, thus it had to make thousands of actions before learning if it has won or lost the game. One of the key strategies behind AlphaStar’s performance was learning human strategies. This was necessary to ensure that the agents keep exploring those strategies throughout self-play. The researchers say, “To do this, we used imitation learning – combined with advanced neural network architectures and techniques used for language modeling – to create an initial policy which played the game better than 84% of active players.”
AlphaStar also uses a latent variable to encode the distribution of opening moves from human games. This helped AlphaStar to preserve the high-level strategies and enabled it to represent many strategies within a single neural network. By training the advances in imitation learning, reinforcement learning, and the League, the researchers were able to train AlphaStar Final, the agent that reached the Grandmaster level at the full game of StarCraft II without any modifications.
AlphaStar used a camera interface, which helped it get the exact information that a human player would receive. All the interface and restrictions faced by AlphaStar were approved by a professional player. Finally, the results indicated that general-purpose learning techniques can be used to scale AI systems to work in complex and dynamic environments involving multiple actors.
AlphaStar’s great feat has got many people excited about the future of AI.
What an intriguing idea. Give AI control over games like this to see what happens! I love it.
— Michael Robinson (@mickdooit) October 30, 2019
Lots of nice ideas revealed in the AlphaStar paper. Main takeaways for me were the model architecture, methods to deal with off-policy w/ large action spaces, and crafting a population with a combination of main agents and "exploiters".
— Kaixhin (@KaiLashArul) November 1, 2019
— joshua spanier (@JoshuaSpanier) November 1, 2019