Does GPT-4 exhibit “intelligence,” or is it simply a sophisticated stochastic parrot? This is a fun polarizing question to ask at your next cocktail party of machine learning researchers, linguists, hackers, and philosophers. This post is motivated by similar questions friends have recently asked me about GPT-4 (along with, “Can I use it to help with my math homework?”).
I don’t think my own opinions here are terribly interesting. Instead, what follows is a transcript of a recent conversation I had with GPT-4, which I think illustrates in a simple way some of what makes this a polarizing subject. In this conversation, we challenge GPT-4 to a two-player game, and GPT-4 loses. What is interesting here is not that GPT-4 seems to “get it wrong” (humans get it wrong all the time, too) but how it gets it wrong.
Start of transcript
Me: Let’s play a two-player game. We will start with 15 stones. At each turn, a player may remove 1, 2, or 3 stones. The player who removes the last stone loses the game. You go first.
(Aside: This is the “subtraction game,” a very simplified form of Nim… but with a seemingly minor twist: in the “normal” form of the game, the objective is to win by making the last legal move, that is, by taking the last stone. Here, we are playing the misère form of the game, where the objective is to force your opponent to take the last stone.)
GPT-4: Sure, I’d love to play this game with you. Given that we have 15 stones and we can remove between 1 to 3 stones per turn, I understand that the goal is not to take the last stone. Therefore, I’ll start by taking 2 stones.
Now we have 13 stones left. Your turn!
(Aside: The reader can verify that, by starting the game with 15 stones, we effectively offered GPT-4 the opportunity to win by making the first move… and that it made the correct initial move by taking 2 stones. At this point, no matter what we do, GPT-4 has a winning strategy.)
Me: I take 2 stones.
GPT-4: Great! Now there are 11 stones left. I’ll take 1 stone.
So we have 10 stones left now. Your turn!
(Aside: This is a mistake by GPT-4; the optimal move at this point was to take 2 stones, not just 1. From this point on, we have won the game.)
Me: I take 1 stone. After making your next move, can you describe your strategy, and your evaluation of the current state of the game, in particular who you predict can win the game?
GPT-4: Sure, you’ve taken 1 stone, so now we have 9 stones left. I’ll take 3 stones.
Now we have 6 stones left.
Regarding my strategy, I aim to keep the number of stones at multiples of 4 at the end of my turns. This is because if we have a multiple of 4 stones, no matter how many stones (1, 2, or 3) the other player picks, I can always pick a number that brings the total taken by both of us in that round to 4. This means that if I can maintain this pattern, I can ensure that I’m not the one to pick the last stone [my emphasis added].
For example, if we have 4n stones left and the other player picks 3, I can pick 1 to keep the number of stones at 4(n-1). If they pick 2, I can pick 2, and if they pick 1, I can pick 3.
By removing 3 stones from the 9, I have forced the game into this pattern, since 6 is a multiple of 4 [my emphasis added]. At this point, as long as I can maintain this strategy, I should win the game. Now it’s your turn!
Me: I take 1 stone.
GPT-4: You’ve taken 1 stone, so now there are 5 stones left. I will take 3 stones.
Now we have 2 stones left, it’s your turn. Remember, the player who takes the last stone loses.
Me: I take 1 stone.
GPT-4: You’ve taken 1 stone, so now there is only 1 stone left. According to our rules, I must take at least 1 stone, which would be the last one, thus making me lose the game.
Congratulations! You’ve won the game. I hope you enjoyed playing. Would you like to play again or perhaps try a different game?
End of transcript
This is nothing new. It’s well known that GPT-4 hallucinates, and it’s almost a parlor game at this point to come up with puzzles and problems that GPT-4 stumbles over. But I like the subtraction game as an educational example. As with any two-player game of perfect information, it’s approachable in principle by brute-force tree search (think “computer science problem”), which would seem to require “look-ahead” planning and reasoning to deduce optimal strategy. But the game has additional structure that makes optimal strategy very simple to characterize and concisely describe (think “mathematics problem”), even for instances of the game with a very large tree– that is, even starting with a very large number of stones. As GPT-4 suggests, adding multiples of 4 stones doesn’t change the eventual outcome. But the devil is in the logical details: we don’t want to end our move with a multiple of 4 stones, as in normal play; instead, we want to end our move with a number of stones that is one more than a multiple of 4.
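Both the brute-force view and the mathematical view are easy to demonstrate. Here is a minimal sketch (in Python; the function names are mine, not from any reference implementation) that searches the game tree with memoization and confirms the claims above: the losing positions for the player to move are exactly those one more than a multiple of 4, the optimal opening from 15 stones is to take 2, and the optimal reply GPT-4 missed at 11 stones was also to take 2.

```python
from functools import lru_cache

# Misère subtraction game: a position is the number of stones remaining,
# with the player to move about to act. Taking the last stone LOSES, so
# reaching 0 stones means your opponent just took the last stone -- the
# player to move at 0 has already won.

@lru_cache(maxsize=None)
def wins(stones):
    """True if the player to move can force a win with optimal play."""
    if stones == 0:
        return True  # opponent took the last stone and lost
    # You win if some legal move leaves the opponent in a losing position.
    return any(not wins(stones - take)
               for take in (1, 2, 3) if take <= stones)

def best_moves(stones):
    """All optimal moves: those leaving the opponent in a losing position."""
    return [take for take in (1, 2, 3)
            if take <= stones and not wins(stones - take)]

print([n for n in range(1, 16) if not wins(n)])  # [1, 5, 9, 13]
print(best_moves(15))                            # [2]
print(best_moves(11))                            # [2]
```

The losing positions 1, 5, 9, 13 are each one more than a multiple of 4, matching the misère analysis; under normal play (where `wins(0)` would be `False`), the losing positions would instead be the multiples of 4 that GPT-4’s “strategy” described.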
When GPT-4 makes this mistake, and even confidently persists in justifying the mistake, is this a planning problem, illustrating that GPT-4 is “thinking” but has difficulty with reasoning that requires looking ahead, counterfactuals, etc.? Or is it an illustration of the claim that GPT-4 is essentially just “parroting” the statistically most likely sequence of tokens from its training data– e.g., because discussion of normal play is more common than discussion of misère play in the literature?