Last year, Google’s DeepMind introduced AlphaZero, a reinforcement learning program that masters three different board games, chess, shogi, and Go, defeating a world-champion program in each. Yesterday, they announced that a full evaluation of AlphaZero has been published in the journal Science, confirming and updating the preliminary results. The research paper describes how DeepMind’s AlphaZero learns each game from scratch, with no human intervention and no inbuilt domain knowledge beyond the basic rules of the game.
Unlike traditional game-playing programs, DeepMind’s AlphaZero combines deep neural networks, a general-purpose reinforcement learning algorithm, and a general-purpose tree search algorithm. The program’s first games are played completely at random. Over time, the system learns from wins, losses, and draws, adjusting the parameters of the neural network accordingly. The amount of training varies by game: approximately 9 hours for chess, 12 hours for shogi, and 13 days for Go. To choose moves, it uses Monte-Carlo Tree Search (MCTS) to search for the most promising continuations.
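AlphaZero’s search is a neural-network-guided variant of MCTS (it biases the search with policy-network priors and evaluates positions with a value network). As an illustration of the underlying tree-search idea only, here is a minimal textbook MCTS with random rollouts for a toy take-away game (players alternately remove 1–3 stones; whoever takes the last stone wins). The game and every name in this sketch are illustrative assumptions, not DeepMind’s code.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones remaining (the game state)
        self.parent = parent
        self.move = move          # move that led to this state
        self.children = []
        self.wins = 0.0           # wins for the player who moved into this node
        self.visits = 0

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2, 3) if m <= self.stones and m not in tried]

def uct_select(node):
    # UCB1: trade off exploitation (win rate) against exploration.
    return max(node.children,
               key=lambda c: c.wins / c.visits
                             + math.sqrt(2 * math.log(node.visits) / c.visits))

def rollout(stones):
    # Random play to the end; returns 1 if the player to move wins.
    player = 0
    while True:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return 1 if player == 0 else 0
        player = 1 - player

def backpropagate(node, result):
    # `result` is from the perspective of the player who moved into `node`;
    # flip it at each level up, since the players alternate.
    while node is not None:
        node.visits += 1
        node.wins += result
        result = 1 - result
        node = node.parent

def mcts_best_move(stones, iterations=3000):
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCB1 while fully expanded.
        while not node.untried_moves() and node.children:
            node = uct_select(node)
        # 2. Expansion: add one untried child, if any remain.
        moves = node.untried_moves()
        if moves:
            m = random.choice(moves)
            child = Node(node.stones - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: estimate the value of the new position.
        if node.stones == 0:
            result = 1  # the mover took the last stone and won
        else:
            result = 1 - rollout(node.stones)
        # 4. Backpropagation up to the root.
        backpropagate(node, result)
    # Pick the most-visited move (the standard robust choice).
    return max(root.children, key=lambda c: c.visits).move
```

In this toy game, the losing positions are multiples of 4, so from 5 stones the search converges on taking 1. AlphaZero replaces the random `rollout` with a value-network evaluation and weights the selection step with policy-network priors, which is what lets it search far fewer positions than a brute-force engine.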
Testing and Evaluation
DeepMind’s AlphaZero was tested against the strongest engines for chess (Stockfish), shogi (Elmo), and Go (AlphaGo Zero). Matches were played at a time control of three hours per game, plus an additional 15 seconds per move. AlphaZero beat its opponent in each evaluation.
Per DeepMind’s blog:
In chess, DeepMind’s AlphaZero defeated the 2016 TCEC (Season 9) world champion Stockfish, winning 155 games and losing just six out of 1,000. To verify AlphaZero’s robustness, it also played a series of matches starting from common human openings. In each opening, AlphaZero defeated Stockfish.
It also played a match that started from the set of opening positions used in the 2016 TCEC world championship, along with a series of additional matches against the most recent development version of Stockfish, and a variant of Stockfish that uses a strong opening book. In all matches, AlphaZero won.
In shogi, AlphaZero defeated the 2017 CSA world champion version of Elmo, winning 91.2% of games.
In Go, AlphaZero defeated AlphaGo Zero, winning 61% of games.
AlphaZero’s ability to master three different complex games is an important step towards building a single AI system that can solve a wide range of real-world problems and generalize to new situations.
The chess and AI communities have also reacted enthusiastically to the news:
"Programs usually reflect priorities and prejudices of programmers, but because AlphaZero programs itself, I would say that its style reflects the truth." Awesome words from the brilliant @Kasparov63 – read Kasparov's insightful review of #AlphaZero here: https://t.co/bqhrAF2A5m
— Demis Hassabis (@demishassabis) December 6, 2018
A turning point in artificial intelligence: creating machines that think like humans. AlphaZero destroys Stockfish in chess, dominates go and shogi games too. https://t.co/NdFqvUR2ZC pic.twitter.com/v7TFWvqsnO
— Trevor A. Branch (@TrevorABranch) December 6, 2018
This is interesting in the context of the debate about changing the World Chess Championship rules. It was argued elite play has led to boring games where players neutralise each other. But maybe that's not the case. Maybe human play just needs to evolve more, like AlphaZero https://t.co/0Tdinj3TEY
— Leon Watson ♛ (@LeonWatson) December 6, 2018
I couldn't help but be pleased that AlphaZero plays in open, dynamic style. It's not just my style, but it's not the incomprehensible maneuvering we feared computer chess would become. My @sciencemagazine article: https://t.co/ftcKzYTsw0 https://t.co/85h44ebCrS
— Garry Kasparov (@Kasparov63) December 6, 2018