We fought the AI, and the AI won. And in this round of machine learning, computers learned to bluff.
At Rivers Casino in Pittsburgh, Carnegie Mellon University pitted its artificial intelligence poker-playing program, called Libratus, against four of the world’s best professional poker players in a 20-day poker competition. Libratus, the AI whose Latin name means “balanced,” beat Jimmy Chou, Dong Kim, Jason Les and Daniel McAulay during the Brains vs. Artificial Intelligence competition.
Oddsmakers favored the human players 4-to-1 or 5-to-1 over Libratus. But they were wrong. Tuomas Sandholm, professor of computer science, and Ph.D. student Noam Brown of Carnegie Mellon University (CMU), the brains behind Libratus, hovered around the casino, watching their brainchild steadily gain ground.
Players spent 11 hours per day battling their AI opponent on-screen, playing heads-up, no-limit Texas Hold’em, a two-player variant of poker with no cap on bet sizes. A total of 120,000 hands were played, with Libratus netting $1,766,250 in chips. Though the human players were not betting with real money, it was nevertheless a crushing victory for a piece of software that apparently taught itself how to outsmart its human counterparts.
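To put that margin in perspective, the two totals quoted above can be turned into an average per hand. The snippet below is only a back-of-the-envelope sketch; the variable names are illustrative, and it uses nothing beyond the chip total and hand count reported here.

```python
# Figures quoted in the article.
chips_won = 1_766_250
hands_played = 120_000

# Average amount Libratus won per hand across the whole match.
chips_per_hand = chips_won / hands_played
print(f"Average margin: {chips_per_hand:.2f} chips per hand")
```

That works out to a little under 15 chips per hand, sustained over 120,000 hands.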
“But every time we find a weakness, it learns from us, and the weakness disappears the next day,” he said.
Hit them so hard their motherboards probably felt it. #BrainsVsAI
— Libratus (@BotsAintLoyal) January 21, 2017
The tournament was grueling. After the daily play had wrapped up at 10 p.m., Brown connected Libratus to the Pittsburgh Supercomputing Center’s Bridges high-performance computer to run algorithms to improve its strategy overnight. In the morning, he would spend two hours getting the newly enhanced bot back up and running before the 11 a.m. start time.
But no-limit Texas Hold’em is the last frontier. Surprisingly enough, despite its relative ease of play, Texas Hold’em is widely regarded as the most complex poker game from an AI perspective. For one thing, the two-player game alone produces 10^160 possible situations.
Michael Shinzaki took down Cepheus like an iceberg took down the Titanic.
The University of Alberta launched the “unbeatable” Cepheus Poker Project in 2015. Cepheus plays limit hold’em, which is much less complex than no-limit. DeepStack, a joint effort including researchers from Alberta, Charles University and Czech Technical University in Prague, plays the no-limit game and has amassed good records against humans.
Sandholm’s previous attempt, called Claudico, lost to the humans in 2015. So the team scrapped the losing program and started from scratch, creating new algorithms using the lessons learned. The 20-day, 120,000-hand heads-up no-limit Texas Hold’em tournament was designed to provide statistically significant data to Sandholm and Brown, which might seem unnecessary after such a trouncing but still matters for science.
Computers Can Now Bluff
The enormous number of possible plays is only part of the problem. Unlike chess, Jeopardy!, or Go, no-limit Texas Hold’em is considered an imperfect-information game: players do not have identical information about the current state of the game. It’s not just that the opponent’s cards are not known and that there is no limit on the number of chips that can be bet on any given hand. Bluffing, which is essentially withholding information, is a crucial part of the game. Developers are programming for incomplete sets of data.
So how did the AI do it? Before the competition, Libratus’ algorithms established a strategy by playing trillions of games against itself, using 15 million processor-core hours on a new supercomputer called Bridges at the Pittsburgh Supercomputing Center. The AI essentially has no built-in or fixed strategy; rather, it uses a variant of an algorithmic technique called counterfactual regret minimization to compute one, IEEE Spectrum reported.
“We don’t write the strategy,” Sandholm explained in the CMU newspaper. “We write the algorithm that computes the strategy.” In an email, Sandholm said the algorithms are written in C++.
Brown explained in a Reddit AMA, “We use a form of Monte Carlo Counterfactual Regret Minimization (CFR) distributed over about 200 nodes. We also incorporate a sampled form of Regret-Based Pruning which speeds up the computation quite a bit.”
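To give a feel for what regret minimization looks like in code, here is a minimal sketch of plain ("vanilla") CFR on Kuhn poker, a three-card toy game often used in tutorials. This is not Libratus' code: the game, the names (`Node`, `cfr`, `train`, `terminal_payoff`) and the iteration count are all assumptions chosen for illustration, and it omits the Monte Carlo sampling, distribution across nodes and regret-based pruning that Brown describes.

```python
from itertools import permutations

ACTIONS = "pb"  # p = pass (check or fold), b = bet (or call)
nodes = {}      # keyed by information set, e.g. "1p" = holding the Jack after a check


class Node:
    """Regret and strategy accumulators for one information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]
        self.strategy = [0.5, 0.5]  # held fixed within an iteration

    def update_strategy(self):
        # Regret matching: play each action in proportion to its positive regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        self.strategy = [p / total for p in positive] if total > 0 else [0.5, 0.5]

    def average_strategy(self):
        # The average over all iterations is what approaches equilibrium.
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def terminal_payoff(cards, history, player):
    """Payoff to `player` if the hand is over, else None."""
    if len(history) < 2:
        return None
    higher = cards[player] > cards[1 - player]
    if history == "pp":       # both checked: showdown for the antes
        return 1 if higher else -1
    if history[-1] == "p":    # opponent folded to a bet
        return 1
    if history[-2:] == "bb":  # bet and call: showdown for two chips
        return 2 if higher else -2
    return None               # "pb": the first player must still respond


def cfr(cards, history, reach):
    """Return the expected payoff for the player to act, updating regrets."""
    player = len(history) % 2
    payoff = terminal_payoff(cards, history, player)
    if payoff is not None:
        return payoff
    node = nodes.setdefault(f"{cards[player]}{history}", Node())
    utils = []
    for i, action in enumerate(ACTIONS):
        next_reach = list(reach)
        next_reach[player] *= node.strategy[i]
        # The opponent's payoff is the negative of ours (zero-sum game).
        utils.append(-cfr(cards, history + action, next_reach))
    node_util = sum(s * u for s, u in zip(node.strategy, utils))
    for i in range(2):
        # Counterfactual regret is weighted by the opponent's reach probability.
        node.regret_sum[i] += reach[1 - player] * (utils[i] - node_util)
        node.strategy_sum[i] += reach[player] * node.strategy[i]
    return node_util


def train(iterations):
    deals = list(permutations((1, 2, 3), 2))  # Jack=1, Queen=2, King=3
    total = 0.0
    for _ in range(iterations):
        for cards in deals:
            total += cfr(cards, "", [1.0, 1.0])
        for node in nodes.values():
            node.update_strategy()
    return total / (iterations * len(deals))


game_value = train(10_000)
print(f"first player's average payoff: {game_value:.4f}")
print(f"bet frequency with the Jack after a check: {nodes['1p'].average_strategy()[1]:.2f}")
```

Nobody tells the program to bluff. Kuhn poker's known equilibrium has the second player betting the worthless Jack about a third of the time after a check, and the averaged strategy should drift toward that frequency simply because doing so lowers regret.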
The Libratus algorithm approximates a Nash equilibrium. A foundation of game theory, a Nash equilibrium is, in essence, a set of strategies in which no player has an incentive to change their own strategy while the others keep theirs. Brown shared during the AMA that the bot trained using a special variant of CFR. Before the competition, it learned only by playing poker against itself. No human hand histories were used.
The AI also engages in “endgame solving” during each hand to determine how much it can afford to risk in the final betting rounds, a technique that apparently proved to be the decisive factor in helping the AI gain the upper hand. At the end of each day of competition, Libratus was hooked up again to the supercomputer so that it could refine its strategy further. It even prioritized fixing the weaknesses its rivals had exploited most. Thanks to these differences between it and its predecessors, Libratus was able to pull ahead, even with the uncertainty of no-limit Texas Hold’em poker, in which each player holds two hidden cards and bets can be of any size.
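The "no incentive to change" idea can be checked mechanically on a small example. The sketch below uses rock-paper-scissors rather than poker, purely as an assumed stand-in, and the helper names (`expected_payoff`, `deviation_gains`, `is_nash`) are hypothetical. It measures how much each player could gain by unilaterally switching to their best single action; at an equilibrium, that gain is zero for both.

```python
# Row player's payoff in rock-paper-scissors; the column player gets the negative.
PAYOFF = [
    [0, -1, 1],   # rock vs. rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def expected_payoff(row_strategy, col_strategy):
    """Row player's expected payoff when both players mix over their actions."""
    return sum(
        p * q * PAYOFF[i][j]
        for i, p in enumerate(row_strategy)
        for j, q in enumerate(col_strategy)
    )


def deviation_gains(row_strategy, col_strategy):
    """Gain each player could get by switching alone to their best pure action."""
    current = expected_payoff(row_strategy, col_strategy)
    best_row = max(
        sum(q * PAYOFF[i][j] for j, q in enumerate(col_strategy)) for i in range(3)
    )
    best_col = max(
        -sum(p * PAYOFF[i][j] for i, p in enumerate(row_strategy)) for j in range(3)
    )
    return best_row - current, best_col + current


def is_nash(row_strategy, col_strategy, tol=1e-9):
    """True if neither player can improve by deviating on their own."""
    return all(gain <= tol for gain in deviation_gains(row_strategy, col_strategy))


uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

print(is_nash(uniform, uniform))      # neither side can do better alone
print(is_nash(always_rock, uniform))  # the column player would switch to paper
```

Checking only pure deviations is enough here because expected payoff is linear in a player's own mix, so no mixed deviation can beat the best pure one.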
“People have this idea that poker is a very human game and that bots can’t bluff, for example. That’s totally wrong. It’s not about reading your opponent and trying to tell if they are lying, it’s about the cards and probabilities,” explained Brown.
On top of that, this loss shares a common thread with previous defeats of human champions in once-unassailable games like Go: humans get tired and can be emotionally affected by a setback, which in turn affects their overall play.
“Libratus turned out to be way better than we imagined. It’s slightly demoralizing,” said Les, who also played against the Claudico AI in 2015. “If you play a human and lose, you can stop, take a break. Here we have to show up to take a beating every day for 11 hours a day. It’s a real different emotional experience when you’re not used to losing that often.”
So we have an AI that used machine learning and algorithms written in C++ to beat one opponent at a time at poker. Although Libratus’ win is a huge step forward, two-player poker is still a restricted universe. Adding a third or fourth player would overload the program, so there’s still a long way to go.
So what can we expect in the future? Artificial intelligence researchers have their sights set on conquering other complex, strategic games like StarCraft, Civilization, and even role-playing games like Skyrim. The biggest challenge that remains is creating an adaptable artificial general intelligence that can learn and master multiple types of games relatively well, not just one, much like a human might. For now, beating humans at this variant of poker is just another piece of this greater puzzle.
Feature image: Noam Brown watches a human play with the AI Libratus as Tuomas Sandholm looks on. Photo courtesy of Carnegie Mellon University.