Overview
The aim of this project is ultimately to “solve” (i.e. find a Nash Equilibrium for) the two-person card game Bluff Poker. The game is similar to Liar’s Poker, a game associated with Wall Street traders, but is more complex, allowing claims of straights, flushes, and so on, as in the Texas Hold’em variant. A full set of rules can be found in the GitHub repository linked below.
The overall approach is to simplify, solve, and then add back complexity.
Approach & Iterations
Step 1: One Card Each
The first sub-goal was to solve a one-card-each version of the game. The state space here is very small: indeed, brute-force value iteration over the entire state space (taking care to allow stochastic strategies) is enough. This provided a ground-truth baseline against which I tried out various Reinforcement Learning techniques:
– Q-Value Tables
– Hyperparameter tuning (learning rate, exploration–exploitation, etc.)
– Best-Response / Mixing Loop (as described in “Fictitious Self-Play in Extensive-Form Games”)
– Monte Carlo vs SARSA methods
These methods were strong enough to give near-optimal performance for the 1-card version of the game. My full (somewhat meandering) thoughts can be found here.
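As a rough illustration of the tabular setup, a SARSA update with epsilon-greedy exploration looks like the sketch below. This is a minimal sketch only: the environment interface (`reset`, `step`, `legal_actions`) and the hyperparameter values are placeholders I have assumed for illustration, not the project's actual code.

```python
import random
from collections import defaultdict

# Minimal tabular SARSA sketch with epsilon-greedy exploration.
# The environment interface (reset / step / legal_actions) and the
# hyperparameter values are illustrative placeholders.

def epsilon_greedy(Q, state, actions, epsilon):
    # Explore with probability epsilon, otherwise pick the greedy action.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa(env, episodes=10_000, alpha=0.1, gamma=1.0, epsilon=0.1):
    Q = defaultdict(float)  # (state, action) -> value estimate
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(Q, state, env.legal_actions(state), epsilon)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            target = reward
            if not done:
                # On-policy bootstrap: evaluate the action the policy would actually take.
                next_action = epsilon_greedy(
                    Q, next_state, env.legal_actions(next_state), epsilon)
                target += gamma * Q[(next_state, next_action)]
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            if not done:
                state, action = next_state, next_action
    return Q
```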
Encouraged, I moved on to the 2-card version of the game.
Step 2: Two Cards Each
The state space is now large enough that exhaustive Q-Tables no longer work: function approximation is needed. I first tried linear function approximation, representing states with feature vectors and strategies with a weight matrix. This restricted the class of strategies that could be learned, but made gradient descent easy to implement. Unfortunately, while this approach had modest success on small decks (n=2 => deck size 12), it failed to learn even basic strategies for larger n, suggesting that scaling up will require more effort. Full thoughts can be read here (coming soon).
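For concreteness, a semi-gradient update under linear function approximation looks roughly like the following sketch. The feature extractor `phi`, the greedy target, and the step sizes are assumptions for illustration rather than the project's actual implementation.

```python
import numpy as np

# Sketch of linear function approximation for action values:
# Q(s, a) ~= w . phi(s, a). The feature extractor `phi` and its
# dimensionality are placeholders.

def q_value(w, phi, state, action):
    return float(np.dot(w, phi(state, action)))

def semi_gradient_update(w, phi, state, action, reward, next_state,
                         next_actions, alpha=0.01, gamma=1.0, done=False):
    # TD target bootstraps from the greedy value of the next state.
    target = reward
    if not done:
        target += gamma * max(q_value(w, phi, next_state, a) for a in next_actions)
    td_error = target - q_value(w, phi, state, action)
    # Semi-gradient step: move the weights along the feature vector.
    return w + alpha * td_error * phi(state, action)
```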
Next Steps
– Assess the feature and output representations: the model should easily be able to associate, e.g., trip 4s with a pair of 4s (see the sketch after this list).
– Move from linear function approximation to full-fledged Neural Networks.
– Explore traditional non-RL methods to establish a baseline (Counterfactual Regret Minimization?)
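As an illustration of the first point, a per-rank count encoding is one possible representation (purely hypothetical, not the project's current features) under which trip 4s and a pair of 4s light up the same feature slot and differ only in magnitude:

```python
from collections import Counter

# One possible hand encoding (illustrative): counting cards per rank means
# "trip 4s" naturally subsumes "pair of 4s", since both use the same rank
# slot with different magnitudes.

RANKS = list(range(2, 15))  # ranks 2..14, with 14 = Ace

def rank_count_features(cards):
    """Map a list of card ranks to a per-rank count vector."""
    counts = Counter(cards)
    return [counts.get(rank, 0) for rank in RANKS]

print(rank_count_features([4, 4]))      # 2 in the 4s slot, 0 elsewhere
print(rank_count_features([4, 4, 4]))   # 3 in the 4s slot, 0 elsewhere
```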