Optimal strategy maximizing expected value for splitting pairs

Introduction

I think the game of blackjack would be relatively boring, mathematically speaking, if not for splitting pairs. But that one additional playing strategy option makes analyzing the game much more challenging and interesting: if you are dealt a pair of cards with the same rank, should you “split” the pair– at a cost of another wager– and play each single-card hand separately? If you are dealt yet another pair card, should you “re-split” to a third, or even fourth, additional hand?

When evaluating splitting strategy, we usually trade some optimality for speed and simplicity. That is, we can compute the exact expected return from splitting a pair very quickly– in milliseconds– if we are willing to constrain the player’s strategy to depend only on the cards in the current hand (and maybe some limited additional contextual information), not all of the cards observed in the entire round.

But there are some analysis questions where it would be useful to know the speed of light: how much can you possibly win with truly perfect play, with every strategy decision accounting for every card dealt from the shoe? The motivation for this post is to capture my notes on an algorithm– code is on GitHub– to compute this optimal pair-splitting strategy and corresponding maximum expected return. See [2] and [3] for related work; the objective here is to significantly reduce execution time and extend the space of feasibly computable resplits.

More retrograde analysis

I was able to borrow a lot of code from the recent solution of the children’s game Gobblet Gobblers. The idea is the same here: we compute the expected value of a pair split in two steps:

  1. Breadth-first search the decision tree of all possible states of the round.
  2. Traverse the tree backward from leaves to the root, in order of decreasing depth, computing the expected value of each state in terms of its children.

In the first step, the breadth-first search trades the memory required to store all possible states for the speed of not repeatedly re-visiting states as in a recursive (effectively depth-first) traversal. Storing all of those states requires some careful packing; more on this shortly.

In the second step, we compute the expected value of each game state in terms of its children. For example, the value of hitting (or doubling down) from a given state is a linear combination of the values of the states reached from each possible drawn card, weighted by their probabilities; the value of a decision state is the maximum of the values obtained by standing vs. hitting, etc. By “solving” for each state in order of decreasing depth, we ensure that we have already computed the expected values of any dependent child states. We start at the bottom, with leaf states, corresponding to the player having completed all split hands in the round, which we “solve” in terms of the probabilities of outcomes of the dealer’s hand, as discussed here before.

Storing packed game states

The most interesting part of this problem was the memory management. As with Gobblet, we need a hash map from each game state to its corresponding (expected) value. And once again, we’re using the lean and mean “mask, step, index” MSI hash map implementation described by Chris Wellons [1]. (Thanks to Chris for very helpful discussions about other optimizations as well!)

But now we need even more states– well over half a billion depending on the rules and number of decks– and even more memory per state. Following is the 16-byte representation of a single game state:

#include <cstdint>
#include <cstring>

struct State
{
    std::int8_t cards[11]; // cards[0]: cards in current hand; cards[1..10]: count of each rank observed
    std::int8_t hands[4];  // completed hand totals, then the current hand total
    std::int8_t current;   // index of the current hand

    bool operator==(const State& rhs) const
    {
        return std::memcmp(this, &rhs, sizeof(State)) == 0;
    }
};

Elements cards[1] through cards[10] count the number of cards of each rank (ace through ten) observed during the round, including the dealer’s up card and the initial two pair cards being split. The entry hands[current] holds the current hand total, negated to indicate a soft total. And cards[0] indicates the number of cards in the current hand: either 1 for a single split pair card, 2 for a two-card hand, or 3 for more than two cards.

Elements hands[0] through hands[current-1] hold totals for hands that are already completed (via stand, double down, or bust), but with a slightly different convention than hands[current]: any total less than 17 is “clamped” to 16; totals of 17 through 21 are stored as-is; and any busted hand is stored as 22, with negation indicating that the hand was doubled down. Furthermore, this subarray of completed hand values is sorted.

The result of all of this clamping and sorting is that we cannot in general reconstruct from a State the chronological “history” of decisions and composition of cards drawn to each hand in a round. But that’s okay: the objective is to preserve only information that is exploitable by strategy decisions. When trying to determine what to do next, we don’t care whether that already-completed hand total is hard or soft, but we do care whether we doubled our wager on it, etc.

But we’re not done compressing just yet. Each element of the MSI hash map array contains not just a game state “key,” but also its expected value:

struct Solved
{
    State state;
    union
    {
        Solved* next; // during search: next state at the same depth
        double value; // after solving: expected value of this state
    };
};

I wrote my first computer program over four decades ago, and this is the first time I’ve found myself using a union in C++. During the first breadth-first search exploration of all possible game states, we use these 8 bytes as linked list pointers to group states by depth. Then in the second step we traverse those linked lists of states, overwriting the list pointers as we compute each state’s expected value.

One interesting thing that I learned from this project is that just zeroing memory takes a long time (“only” gigabytes per second). For this reason, you can input the desired size of the hash map, to speed up scripted evaluation of large numbers of “fast” scenarios that don’t require a large number of states. For example, to evaluate splitting 2-2 against a dealer 4 in 1D, H17, DOA, DAS, SPL1, we only need a couple of million states:

Memory allocation exponent (29=12GB, 30=24GB, etc.): 21
European no hole card rule (1=yes, 0=no): 0
Dealer hits soft 17 (1=yes, 0=no): 1
Double down after split (1=yes, 0=no): 1
Double down on any two cards (1=yes, 0=no): 1
Maximum number of splits (3=SPL3): 1
Shoe (ace through ten, -1 to exit): 4 4 4 4 4 4 4 4 4 16
Pair card: 2
Dealer up card: 4

Split 2 vs. 4:
    Searched 1247392 states to depth 17, 59% capacity.
    Solving depth 0, 100% complete.
    Elapsed time 2.675 seconds.
E(split)= 0.12079971803625923

References:

  1. Wellons, C., The quick and practical “MSI” hash table [nullprogram.com]
  2. iCountNTrack, Composition Dependent Combinatorial Analyzer [blackjacktheforum.com]
  3. kc, Optimal Expected Values for a single split [bjstrat.net]

Another Secret Santa puzzle

Suppose that n couples want to participate in a Secret Santa gift exchange, where each person is secretly assigned to give a gift to one other person in the group. We need a procedure to randomize the gift-giving assignments, subject to some desired constraints:

  1. Every person receives one gift; that is, the function mapping each person to their assigned gift recipient is a bijection.
  2. No person is assigned to give a gift to themselves; that is, the function is a derangement.
  3. No person is assigned to give a gift to their partner.
  4. Every assignment generated by the procedure that satisfies the above constraints is equally likely.

A common approach to generating a random assignment is for each person to write their name on a slip of paper, put all of the slips into a hat, and pass the hat around, with each person drawing a slip to determine their assigned gift recipient. We have discussed this approach before: for example, if a person “peeks” at their slip as it is drawn and sees their own name– violating constraint (2)– then it might be tempting to “fix” the procedure inline by returning the slip to the hat and drawing another. But this turns out to violate constraint (4).

We can preserve constraint (4) by rejection sampling: if someone draws their own name– or that of their partner– then everyone immediately returns their slips to the hat, and the drawing starts again from the beginning. We can compute the probability of “accepting” such a drawing, with the added wrinkle of constraint (3) addressed here.

This post is motivated by a slightly different procedure: suppose that we have a moderator who uniformly randomly permutes the list of 2n participants, and each person is assigned to give a gift to the person immediately following them in the shuffled list (viewed cyclically, so that the last person in the list gives a gift to the first person in the list).

This procedure satisfies constraints (1), (2), and (4), without any need for rejection re-sampling. However, what is the probability that this procedure also satisfies constraint (3)?

It’s a nice problem to compute this probability exactly, but it’s also interesting to show that this probability approaches 1/e as n grows large.