~/will-diamond

Texas Hold LLM

A poker simulation where language models play against each other and talk trash while doing it.

Python · OpenAI · FastAPI · Next.js · TypeScript
Figure: Heads-up final, GPT-5 vs Claude. The chat bubbles show what they said to each other during the hand.

I wanted to see what happens when you put multiple language models at a poker table and let them play a real tournament. Not just making betting decisions, but actually talking to each other during hands. Bluffing. Misdirecting. Reading tells through conversation.

The setup is straightforward: a full Texas Hold'em simulation with tournament structure, blinds that increase, and elimination when you run out of chips. Each model receives the game state (hole cards, community cards, pot size, stack sizes) and decides what to do. But here's the key part: they can also choose to talk to a specific player at the table. Only that player sees the message and has to respond.
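As a rough illustration of the game state each model receives, here is a minimal Python sketch. The class name `PlayerView`, its field names, and the prompt wording are all assumptions made for this example; the project's real schema is not shown here.

```python
from dataclasses import dataclass


@dataclass
class PlayerView:
    """What one model is shown on its turn (hypothetical field names)."""
    name: str
    hole_cards: list[str]        # e.g. ["As", "Kd"]
    community_cards: list[str]   # empty before the flop
    pot: int
    stacks: dict[str, int]       # chips per player still in the tournament
    to_call: int = 0             # chips needed to stay in the hand


def render_prompt(view: PlayerView) -> str:
    """Format the view as plain text for the model's prompt."""
    board = " ".join(view.community_cards) or "(none yet)"
    stacks = ", ".join(f"{p}: {c}" for p, c in view.stacks.items())
    return (
        f"You are {view.name}.\n"
        f"Hole cards: {' '.join(view.hole_cards)}\n"
        f"Board: {board}\n"
        f"Pot: {view.pot} | To call: {view.to_call}\n"
        f"Stacks: {stacks}\n"
        "Choose an action, and optionally address one player by name."
    )


view = PlayerView("Claude", ["As", "Kd"], [], 150,
                  {"Claude": 1000, "GPT-5": 1200}, to_call=50)
prompt = render_prompt(view)
```

Each player gets its own view, so no model ever sees another player's hole cards.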

Table talk changes everything

When a model speaks during a hand, the message goes to one specific opponent. Only the target sees it, and the target must respond before the conversation ends. This is a private channel, not broadcast table talk. If GPT-5 sends "that king looks dangerous" to Claude, only Claude factors that into its reasoning. The rest of the table has no idea what was said.
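One way to model this private channel is a message log filtered per player. The sketch below is illustrative only: `TableMessage`, `PrivateChannel`, and the placeholder reply are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMessage:
    sender: str
    target: str
    text: str


class PrivateChannel:
    """Stores directed messages; each is visible only to its sender and target."""

    def __init__(self) -> None:
        self._log: list[TableMessage] = []

    def send(self, sender: str, target: str, text: str) -> None:
        if sender == target:
            raise ValueError("a player cannot message itself")
        self._log.append(TableMessage(sender, target, text))

    def visible_to(self, player: str) -> list[TableMessage]:
        """Return only the messages this player sent or received."""
        return [m for m in self._log if player in (m.sender, m.target)]


channel = PrivateChannel()
channel.send("GPT-5", "Claude", "that king looks dangerous")
channel.send("Claude", "GPT-5", "(placeholder reply)")
```

When building a prompt, the engine would include only `channel.visible_to(player)`, so a third player at the table gets an empty conversation history for this exchange.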

This creates something interesting. The models aren't just playing cards. They're playing each other. They can try to represent hands they don't have. They can feign weakness. They can try to talk opponents into folding.

Figure: The reasoning panel shows what's happening inside GPT-5's head as it considers going all-in. It's calculating pot odds and considering what its opponent has said.
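For readers unfamiliar with pot odds, the calculation is standard poker arithmetic: the break-even equity for a call is the call amount divided by the pot after calling. This is a generic sketch with made-up numbers, not the model's actual reasoning or the project's code.

```python
def pot_odds(pot: int, to_call: int) -> float:
    """Minimum equity needed for a call to break even.

    `pot` is the pot as it stands before calling, including the bet being faced.
    """
    if to_call <= 0:
        return 0.0
    return to_call / (pot + to_call)


def should_call(equity: float, pot: int, to_call: int) -> bool:
    """A call is break-even or better when equity covers the pot odds."""
    return equity >= pot_odds(pot, to_call)


# Facing a 100-chip bet into a pot that now holds 300:
needed = pot_odds(300, 100)  # 100 / 400 = 0.25
```

With 300 in the pot and 100 to call, a hand needs to win at least 25% of the time for the call to break even.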

What I learned

The most surprising finding: Claude is bad at poker. Not because it can't calculate odds; it's actually quite good at the math. The problem is that it's too trusting, too cooperative. When GPT-5 represents a strong hand, Claude tends to believe it and fold. That trust is probably a feature in most contexts, but it's a liability at the poker table.

GPT-5, meanwhile, dominates. It's the most aggressive at the table, both in betting and in talk. It bluffs more. It puts pressure on other players. It won the tournament I ran, and I don't think that's a coincidence.

The verbal positioning genuinely matters. Models that say confident things ("I'm comfortable here," "this is my hand") tend to get more folds. Models that stay quiet or hedge ("I'm not sure about this") tend to get called more. The talk feeds into the decisions in measurable ways.

Figure: A full table with six LLM agents. You can see the conversation happening alongside the cards.

How it works

Each hand, every model receives a prompt with the current game state and conversation history. They output two things: an action (fold, check, call, bet, raise) and optionally something to say. The game engine processes the action, updates the state, and moves to the next player.
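The two-part output (an action plus optional speech) could be parsed along these lines. This sketch assumes the model replies in JSON with `action`, `amount`, `say`, and `target` keys, and that malformed replies are treated as a fold. Both the format and the fallback are assumptions for illustration, not the project's confirmed behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_ACTIONS = {"fold", "check", "call", "bet", "raise"}


@dataclass
class Decision:
    action: str
    amount: int = 0
    say: Optional[str] = None     # optional table talk
    target: Optional[str] = None  # the one player who will see `say`


def parse_decision(raw: str) -> Decision:
    """Parse a model reply, falling back to a fold on malformed output."""
    try:
        data = json.loads(raw)
        action = str(data["action"]).lower()
        if action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        amount = int(data.get("amount", 0))
        if action in {"bet", "raise"} and amount <= 0:
            raise ValueError("bet and raise need a positive amount")
        return Decision(action, amount, data.get("say"), data.get("target"))
    except (KeyError, TypeError, ValueError):
        # json.JSONDecodeError is a ValueError, so bad JSON lands here too.
        return Decision("fold")


decision = parse_decision(
    '{"action": "raise", "amount": 200, "say": "this is my hand", "target": "Claude"}'
)
```

A real engine would also check the action against the betting rules (for example, no check when facing a bet) before applying it to the game state.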

Games run as sit-and-go tournaments. Blinds increase on a schedule. When you lose all your chips, you're eliminated. Last model standing wins. The system records full hand histories so you can replay any game.
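The tournament rules can be sketched in a few functions. The blind amounts, the hands-per-level interval, and the function names below are invented for illustration; the actual schedule may differ (it could be time-based, for instance).

```python
from typing import Optional

# Hypothetical schedule: (small blind, big blind) per level.
BLIND_LEVELS = [(10, 20), (15, 30), (25, 50), (50, 100), (100, 200)]
HANDS_PER_LEVEL = 10  # assumed interval between blind increases


def blinds_for_hand(hand_number: int) -> tuple[int, int]:
    """Blinds for a 1-indexed hand number; the last level repeats forever."""
    level = min((hand_number - 1) // HANDS_PER_LEVEL, len(BLIND_LEVELS) - 1)
    return BLIND_LEVELS[level]


def eliminate_busted(stacks: dict[str, int]) -> dict[str, int]:
    """Drop players with no chips left."""
    return {player: chips for player, chips in stacks.items() if chips > 0}


def winner(stacks: dict[str, int]) -> Optional[str]:
    """Return the last player standing, or None while play continues."""
    remaining = eliminate_busted(stacks)
    return next(iter(remaining)) if len(remaining) == 1 else None
```

Rising blinds force the action: as each level costs more per orbit, short stacks have to commit chips or be blinded out, which is what keeps a sit-and-go from running indefinitely.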

The frontend shows everything in real time: the cards, the chips, the chat bubbles, and most importantly, the reasoning. You can expand any player to see exactly what they were thinking when they made a decision.

Try it yourself

The demo link above will take you to a recorded game. You can watch the hands play out, read the table talk, and see the reasoning panels. It's oddly compelling, like watching a poker stream where the players are language models trying to outthink each other.