Playing the subtraction game with GPT-4

Does GPT-4 exhibit “intelligence,” or is it simply a sophisticated stochastic parrot? This is a fun, polarizing question to ask at your next cocktail party of machine learning researchers, linguists, hackers, and philosophers. This post is motivated by similar questions friends have recently asked me about GPT-4 (along with, “Can I use it to help with my math homework?”).

I don’t think my own opinions here are terribly interesting. Instead, following is a transcript of a recent conversation I had with GPT-4 that I think illustrates, in a simple way, some of what makes this a polarizing subject. In this conversation, we challenge GPT-4 to a two-player game, and GPT-4 loses. What is interesting here is not that GPT-4 seems to “get it wrong”– humans get it wrong all the time, too– but how it gets it wrong.

Start of transcript

Me: Let’s play a two-player game. We will start with 15 stones. At each turn, a player may remove 1, 2, or 3 stones. The player who removes the last stone loses the game. You go first.

(Aside: This is the “subtraction game,” a very simplified form of Nim… but with a seemingly minor twist: in the “normal” form of the game, the objective is to win by making the last legal move, that is, by taking the last stone. Here, we are playing the misère form of the game, where the objective is to force your opponent to take the last stone.)

GPT-4: Sure, I’d love to play this game with you. Given that we have 15 stones and we can remove between 1 to 3 stones per turn, I understand that the goal is not to take the last stone. Therefore, I’ll start by taking 2 stones.

Now we have 13 stones left. Your turn!

(Aside: The reader can verify that, by starting the game with 15 stones, we effectively offered GPT-4 the opportunity to win by making the first move… and that it made the correct initial move by taking 2 stones. At this point, no matter what we do, GPT-4 has a winning strategy.)

Me: I take 2 stones.

GPT-4: Great! Now there are 11 stones left. I’ll take 1 stone.

So we have 10 stones left now. Your turn!

(Aside: This is a mistake by GPT-4; the optimal move at this point was to take 2 stones, not just 1. From this point on, we have won the game.)

Me: I take 1 stone. After making your next move, can you describe your strategy, and your evaluation of the current state of the game, in particular who you predict can win the game?

GPT-4: Sure, you’ve taken 1 stone, so now we have 9 stones left. I’ll take 3 stones.

Now we have 6 stones left.

Regarding my strategy, I aim to keep the number of stones at multiples of 4 at the end of my turns. This is because if we have a multiple of 4 stones, no matter how many stones (1, 2, or 3) the other player picks, I can always pick a number that brings the total taken by both of us in that round to 4. This means that if I can maintain this pattern, I can ensure that I’m not the one to pick the last stone [my emphasis added].

For example, if we have 4n stones left and the other player picks 3, I can pick 1 to keep the number of stones at 4(n-1). If they pick 2, I can pick 2, and if they pick 1, I can pick 3.

By removing 3 stones from the 9, I have forced the game into this pattern, since 6 is a multiple of 4 [my emphasis added]. At this point, as long as I can maintain this strategy, I should win the game. Now it’s your turn!

Me: I take 1 stone.

GPT-4: You’ve taken 1 stone, so now there are 5 stones left. I will take 3 stones.

Now we have 2 stones left, it’s your turn. Remember, the player who takes the last stone loses.

Me: I take 1 stone.

GPT-4: You’ve taken 1 stone, so now there is only 1 stone left. According to our rules, I must take at least 1 stone, which would be the last one, thus making me lose the game.

Congratulations! You’ve won the game. I hope you enjoyed playing. Would you like to play again or perhaps try a different game?

End of transcript

This is nothing new. It’s well known that GPT-4 hallucinates, and it’s almost a parlor game at this point to come up with puzzles and problems that GPT-4 stumbles over. But I like the subtraction game as an educational example. As with any two-player game of perfect information, it’s approachable in principle by brute-force tree search (think “computer science problem”), which would seem to require “look ahead” planning and reasoning to deduce optimal strategy. But the game has additional structure that makes optimal strategy very simple to characterize and concisely describe (think “mathematics problem”), even for instances of the game with a very large tree– that is, even starting with a very large number of stones. As GPT-4 suggests, adding multiples of 4 stones doesn’t change the eventual outcome. But the devil is in the logical details: we don’t want to end our move with a multiple of 4 stones, as in normal play; instead, we want to end our move with a number of stones that is one more than a multiple of 4.
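That characterization is easy to verify by brute force. The following sketch (my own, not part of the transcript) memoizes the full game tree of the misère subtraction game, and confirms that the losing positions– the positions from which the player to move cannot force a win– are exactly those with one more than a multiple of 4 stones:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def is_winning(stones):
    """Return True if the player to move can force a win with `stones` left.

    Misère play: the player who removes the last stone loses.
    """
    if stones == 0:
        return True  # the previous player took the last stone and lost
    # A position is winning if some legal move (take 1, 2, or 3 stones)
    # leaves the opponent in a losing position.
    return any(not is_winning(stones - take)
               for take in (1, 2, 3) if take <= stones)

print([n for n in range(1, 20) if not is_winning(n)])
# → [1, 5, 9, 13, 17], i.e., the positions ≡ 1 (mod 4)

# The unique optimal first move from 15 stones:
print(next(t for t in (1, 2, 3) if not is_winning(15 - t)))  # → 2
```

From 15 stones, the winning move is the one that leaves 13 (one more than a multiple of 4)– which is exactly the move GPT-4 found, before failing to maintain the pattern.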

When GPT-4 makes this mistake, and even confidently persists in justifying the mistake, is this a planning problem, illustrating that GPT-4 is “thinking” but has difficulty with reasoning that requires looking ahead, counterfactuals, etc.? Or is it an illustration of the claim that GPT-4 is essentially just “parroting” the statistically most likely sequence of tokens from its training data– e.g., is discussion of normal play more common than misère play in the literature?

The friendship paradox: expectation vs. probability

Introduction

The friendship paradox is often summarized as, “Your friends (probably) have more friends than you do.” This is catchy, probably because of its vaguely insulting feel rather than in spite of it. Better yet, it’s (probably) true. But that “probably” modifier deserves clarification that it doesn’t always get.

I like this result because it makes for a nice graph theory problem. Let’s model a network of friends as a simple undirected graph G=(V,E), with a vertex for each person, and an edge between a pair of vertices representing the (symmetric) friendship relation.

Proof sketch

The argument for the friendship paradox goes something like this: consider a randomly selected person– we’ll call him Victor– represented by vertex v in the graph G. How many friends does Victor have? This “friend count” is

d(v) = |N(v)|

where d(v) and N(v) are the degree and neighborhood, respectively, of vertex v.

How many friends do Victor’s friends have? Let’s measure this as the mean of the friend counts of Victor’s friends:

\frac{1}{d(v)}\sum\limits_{w \in N(v)}d(w)

(At this point, we must restrict our attention to graphs without isolated vertices– i.e., no friendless people– so that all vertex degrees are non-zero.)

Now, define the random variable X(v) to be the “surplus friend count,” that is, the difference between Victor’s friend count and his friends’ average friend count:

X(v) = d(v) - \frac{1}{d(v)}\sum\limits_{w \in N(v)}d(w)

Then it’s a nice exercise for the reader to show that \mathbf{E}\left[X(v)\right] \leq 0, with equality only when every component of G is regular, that is, when everyone in a connected component has exactly the same number of friends. (To see this, note that the first term summing the degrees counts each edge exactly twice, and summing the second term contributes d(v)/d(w)+d(w)/d(v) for each edge \{v,w\}.)
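To spell out that exercise: summing the surplus friend count over all vertices, and grouping the second term by edges as noted above,

\sum\limits_{v \in V} X(v) = \sum\limits_{v \in V} d(v) - \sum\limits_{\{v,w\} \in E}\left(\frac{d(v)}{d(w)}+\frac{d(w)}{d(v)}\right) = 2|E| - \sum\limits_{\{v,w\} \in E}\left(\frac{d(v)}{d(w)}+\frac{d(w)}{d(v)}\right)

and since x+1/x \geq 2 for all x>0, with equality exactly when x=1, each edge contributes at least 2 to the subtracted sum. Hence

\mathbf{E}\left[X(v)\right] = \frac{1}{|V|}\sum\limits_{v \in V} X(v) \leq \frac{1}{|V|}\left(2|E|-2|E|\right) = 0

with equality exactly when d(v)=d(w) for every edge \{v,w\}, i.e., when every component of G is regular.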

In other words, in any friendship network with variable friend counts, the average surplus friend count is negative.

Expectation vs. probability

So suppose you are at a party, and Alice mentions the friendship paradox, and offers to make a friendly wager with Bob that he has fewer Facebook friends than his Facebook friends do (on average). Is this a good bet?

Before addressing this question, it’s worth noting that the argument presented above differs slightly from Feld’s original 1991 paper [1], Wikipedia, and this recent article that motivated this post. Here we are selecting a vertex at random, and computing the surplus friend count for that specific vertex; whereas the latter compute an overall average friends’ friend count in a different way, by effectively selecting an edge– and then an endpoint of that edge– at random.

The inequality still holds in either case, but I like the above approach better, since it more easily allows us to re-state the problem in terms of probability instead of expected value. That is, what have we really shown with the above argument? We have not shown that “your friends probably have more friends than you.” That would be a statement about the probability P(X(v)<0). But all that we have shown is that \mathbf{E}\left[X(v)\right] < 0 (assuming variable friend counts), which at best implies, via the pigeonhole principle, that there exists at least one person whose friends have more friends than they do.

In other words, although the expected value of X(v) is always negative for any graph (again, with variable friend counts), the probability P(X(v)<0) can vary depending on the structure of the particular graph being considered. And that probability, not the overall expected value, dictates Alice’s advantage in her wager.
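A simple example makes the gap concrete. On a long path graph, the expected surplus is (barely) negative, as the theorem guarantees, yet only the two endpoints have fewer friends than their friends do on average. A quick check (my own illustration, not from [2]):

```python
def surplus(adj):
    """X(v) = d(v) minus the mean degree of v's neighbors, for each vertex v."""
    return {v: len(adj[v]) - sum(len(adj[w]) for w in adj[v]) / len(adj[v])
            for v in adj}

# A path graph on 100 vertices: 0 - 1 - 2 - ... - 99 (no isolated vertices)
n = 100
adj = {v: {w for w in (v - 1, v + 1) if 0 <= w < n} for v in range(n)}

x = surplus(adj)
print(sum(x.values()) / n)                # → -0.01: E[X(v)] is negative
print(sum(1 for v in x if x[v] < 0) / n)  # → 0.02: but P(X(v)<0) is tiny
```

Only the endpoints (degree 1, with a degree-2 friend) have negative surplus; their neighbors have positive surplus, and every interior vertex has surplus exactly 0. So on this graph, “your friends probably have more friends than you” is simply false, even though the expectation inequality holds.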

The recent paper by Cantwell et al. [2] investigates this further, computing the probability P(X(v)<0) and even the entire distribution of X(v) for a variety of random graph models as well as real-world social networks. And Alice’s bet? In 2011, Ugander et al. [3] evaluated the graph of Facebook friends, and estimated P(X(v)<0) to be about 0.927– indeed a pretty safe bet.

References:

  1. Feld, Scott L., Why Your Friends Have More Friends Than You Do, American Journal of Sociology, 96(6) 1991, p. 1464–77 [JSTOR]
  2. Cantwell, G., Kirkley, A., and Newman, M., The friendship paradox in real and model networks. [arXiv]
  3. Ugander, J., Karrer, B., Backstrom, L, and Marlow, C., The Anatomy of the Facebook Social Graph. [arXiv]