Researchers from Facebook and Carnegie Mellon University have developed an AI bot that has defeated human professionals in six-player no-limit Texas Hold’em poker. Pluribus defeated pro players in both “five AIs + one human player” format and a “one AI + five human players” format. Pluribus was tested in 10,000 games against five human players, as well as in 10,000 rounds where five copies of the AI played against one professional. This is the first time an AI bot has beaten top human players in a complex game with more than two players or two teams.
Pluribus was developed by Noam Brown of Facebook AI Research and Tuomas Sandholm of Carnegie Mellon University. Pluribus builds on Libratus, their previous poker-playing AI which defeated professionals at Heads-Up Texas Hold ’Em, a two-player game in 2017.
Mastering 6-player Poker for AI bots is difficult considering the number of possible actions. First, obviously since this involves six players, the games have a lot more variables and the bot can’t figure out a perfect strategy for each game – as it would do for a two player game. Second, Poker involves hidden information, in which a player only has access to the cards that they see. AI has to take into account how it would act with different cards so it isn’t obvious when it has a good hand.
Brown wrote on a Hacker News thread, “So much of early AI research was focused on beating humans at chess and later Go. But those techniques don’t directly carry over to an imperfect-information game like poker. The challenge of hidden information was kind of neglected by the AI community. This line of research really has its origins in the game theory community actually (which is why the notation is completely different from reinforcement learning). Fortunately, these techniques now work really really well for poker.”
What went behind Pluribus?
Initially, Pluribus engages in self-play by playing against copies of itself, without any data from human or prior AI play used as input. The AI starts from scratch by playing randomly, and gradually improves as it determines which actions, and which probability distribution over those actions, lead to better outcomes against earlier versions of its strategy. Pluribus’s self-play produces a strategy for the entire game offline, called the blueprint strategy. This online search algorithm can efficiently evaluate its options by searching just a few moves ahead rather than only to the end of the game. Pluribus improves upon the blueprint strategy by searching for a better strategy in real time for the situations it finds itself in during the game.
The blueprint strategy in Pluribus was computed using a variant of counterfactual regret minimization (CFR). The researchers used Monte Carlo CFR (MCCFR) that samples actions in the game tree rather than traversing the entire game tree on each iteration. Pluribus only plays according to this blueprint strategy in the first betting round (of four), where the number of decision points is small enough that the blueprint strategy can afford to not use information abstraction and have a lot of actions in the action abstraction. After the first round, Pluribus instead conducts a real-time search to determine a better, finer-grained strategy for the current situation it is in.
What is astonishing is that Pluribus uses very little processing power and memory, less than $150 worth of cloud computing resources. The researchers trained the blueprint strategy for Pluribus in eight days on a 64-core server and required less than 512 GB of RAM. No GPUs were used.
Stassa Patsantzis, a Ph.D. research student appreciated Pluribus’s resource-friendly compute power. She commented on Hacker News, “That’s the best part in all of this. I’m hoping that there is going to be more of this kind of result, signaling a shift away from Big Data and huge compute and towards well-designed and efficient algorithms.” She also said how this is significantly lesser than ML algorithms used at DeepMind and Open AI. “In fact, I kind of expect it. The harder it gets to do the kind of machine learning that only large groups like DeepMind and OpenAI can do, the more smaller teams will push the other way and find ways to keep making progress cheaply and efficiently”, she added.
AI bots such as Pluribus give a better understanding of how to build general AI that can cope with multi-agent environments, both with other AI agents and with humans.
A six-player AI bot has better implications in reality because two-player zero-sum interactions (in which one player wins and one player loses) are common in recreational games, but they are very rare in real life. These AI bots can be used for handling harmful content, dealing with cybersecurity challenges, or managing an online auction or navigating traffic, all of which involve multiple actors and/or hidden information.
Apart from fighting online harm, four-time World Poker Tour title holder Darren Elias helped test the program’s skills, said, Pluribus could spell the end of high-stakes online poker. “I don’t think many people will play online poker for a lot of money when they know that this type of software might be out there and people could use it to play against them for money.” Poker sites are actively working to detect and root out possible bots. Brown, Pluribus’ developer, on the other hand, is optimistic. He says it’s exciting that a bot could teach humans new strategies and ultimately improve the game. “I think those strategies are going to start penetrating the poker community and really change the way professional poker is played,” he said.
For more information on Pluribus and it’s working, read Facebook’s blog.