June | 2023 | Possibly Wrong

Introduction

I think the pigeonhole principle is the theorem that maximizes the ratio of “easy to state and prove” to “(not so) easy to apply to concrete problems.” The general theorem seems almost annoyingly obvious: if we put pigeons into a smaller number of pigeonholes, then some pigeonhole must contain two or more pigeons. But in the context of a particular problem, it’s often difficult to see what objects are playing the roles of “pigeon” or “pigeonhole.”

But there are other potential pitfalls as well. The motivation for this post is to describe some of these other ways things can go wrong for students, using a particular common exercise as a running example.

Same subset sums

Consider the following problem: show that for any set $S$ of $k=10$ positive integers each at most $n=100$, there exist two distinct subsets of $S$ whose elements have the same sum. For example, given $S=\{2,3,5,7,11,13,17,19,23,29\}$, the subsets $\{3,29\}$ and $\{13,19\}$ both have the same sum of 32.
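To make the problem statement concrete, here is a minimal brute-force sketch that searches a given set for two distinct subsets with equal sums. The function name `find_equal_sum_subsets` is my own, not from any library, and the search order means it may well return a different pair than the one quoted above.

```python
from itertools import combinations

def find_equal_sum_subsets(s):
    """Return two distinct subsets of s with the same sum, or None if none exist."""
    seen = {}  # subset sum -> first subset found with that sum
    items = sorted(s)
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            total = sum(subset)
            if total in seen:
                return seen[total], subset
            seen[total] = subset
    return None

S = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}
first, second = find_equal_sum_subsets(S)
print(first, second, sum(first))
```

This enumerates up to $2^k$ subsets, which is fine for $k=10$ but is only meant as an illustration.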

Here, the pigeons are the subsets of $S$, and the pigeonholes are the possible subset sums. There are $2^k=1024$ subsets; how many possible sums are there? The empty set has sum zero, and the set $\{91,92,93,94,95,96,97,98,99,100\}$ has the largest possible sum 955, for a total of

$1+\frac{n(n+1)}{2}-\frac{(n-k)(n-k+1)}{2}$

or 956 pigeonholes. If we put 1024 subsets into 956 pigeonholes labeled 0 to 955 according to their subset sum, then some pigeonhole must contain at least two pigeons, that is, there must exist two distinct subsets with the same sum.
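The counting above is easy to check numerically. The following sketch just evaluates the formula for the number of pigeonholes and compares it with the number of subsets; the helper name `num_pigeonholes` is an assumption of mine.

```python
def num_pigeonholes(n, k):
    """Count the possible subset sums 0..max for k distinct integers in 1..n."""
    # Largest possible sum: the k largest integers n-k+1, ..., n.
    max_sum = n * (n + 1) // 2 - (n - k) * (n - k + 1) // 2
    return 1 + max_sum  # labels 0 through max_sum inclusive

n, k = 100, 10
pigeons = 2 ** k
holes = num_pigeonholes(n, k)
print(pigeons, holes, pigeons > holes)
```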

The version of this problem at Cut-the-Knot asks for disjoint non-empty subsets with the same sum. It’s an exercise for the reader to show that this doesn’t really affect feasibility of the problem. However, the corresponding proof has an issue:
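For readers who would rather see the exercise than do it, here is one possible sketch of the reduction (skip it if you want to work it out yourself). The idea is to discard the elements common to both subsets; since all elements are positive, neither remainder can be empty unless the two subsets were equal to begin with. The function name `make_disjoint` is hypothetical.

```python
def make_disjoint(a, b):
    """Given distinct subsets a, b of positive integers with equal sums,
    return disjoint non-empty subsets with equal sums."""
    a, b = set(a), set(b)
    common = a & b
    # Removing the shared elements lowers both sums by the same amount.
    return a - common, b - common

left, right = make_disjoint({2, 3, 7}, {5, 7})
print(left, right)
```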

“There are 1024 subsets of the 10 integers, but there can be only 901 (=955-55+1) possible sums, the number of integers between the minimum and maximum sums of ten distinct integers between 1 and 100. With more subsets than possible sums, there must exist at least one sum that corresponds to at least two subsets.”

The argument here is that we only need 901 pigeonholes, labeled 55 through 955, since 55 is the minimum sum of the ten integers 1 through 10. But this is invalid, since it’s possible for some of our 1024 pigeons (subsets) to need to go into pigeonholes with labels (subset sums) that are less than 55, the empty set with sum zero being one extreme example.

This is tricky, since the logic here is wrong, but the conclusion is still true. To make the error more vivid, we need to consider a smaller version of the problem where the same invalid logic actually leads us to a false conclusion. Consider $n=7$ and $k=4$; then by the above reasoning, given any set $S$ of 4 integers selected from $\{1,2,3,4,5,6,7\}$, there are $2^k=16$ subsets of $S$, but only

$\frac{n(n+1)}{2}-\frac{(n-k)(n-k+1)}{2}-\frac{k(k+1)}{2}+1$

or 13 pigeonholes with labels from 1+2+3+4=10 to 4+5+6+7=22, suggesting that we should be able to find two distinct subsets of $S$ with the same sum… and yet the set $S=\{3,5,6,7\}$ is a counterexample, since all 16 of its subsets have distinct sums.
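Here is a short sketch that checks the counterexample directly and shows where the flawed count goes wrong: several subsets have sums below the smallest label 10, so they have no pigeonhole to go into. The variable names are mine.

```python
from itertools import combinations

n, k = 7, 4
S = (3, 5, 6, 7)

# Sums of all 2^k subsets of S, including the empty set.
subset_sums = [sum(c) for r in range(k + 1) for c in combinations(S, r)]

min_full = k * (k + 1) // 2                                  # 1+2+3+4
max_full = n * (n + 1) // 2 - (n - k) * (n - k + 1) // 2     # 4+5+6+7
flawed_holes = max_full - min_full + 1   # labels min_full..max_full only
valid_holes = max_full + 1               # labels 0..max_full

# Subsets whose sums fall below the smallest label in the flawed count.
outside = sorted(t for t in subset_sums if t < min_full)

print(len(subset_sums), len(set(subset_sums)), flawed_holes, valid_holes)
print(outside)
```

With the valid count of pigeonholes, 16 pigeons no longer outnumber the holes, so the pigeonhole principle says nothing here, consistent with the counterexample.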

The converse is false

This problem has parameters $(n,k)$ that we can vary. Let $P(n,k)$ be the predicate asserting that every set $S$ of $k$ positive integers each at most $n$ has distinct subsets with the same sum. Then we have seen above that $P(100,10)$ is true, and that $P(7,4)$ is false.

For another example, we can show that $P(14,8)$ is true, using the pigeonhole principle as above: there are $2^8=256$ possible subsets, and only 85 possible sums from zero to 7+8+9+10+11+12+13+14=84.

Can we make $k$ smaller, and guarantee that smaller sets of integers still must have distinct subsets with the same sum? We can; the reader can verify that the pigeonhole argument works to show that $P(14,7)$ is also true.

What about $P(14,6)$? Here, the same pigeonhole argument doesn’t work: there are only $2^6=64$ pigeons, but 70 available pigeonholes with sum labels from zero to 9+10+11+12+13+14=69.

However, the pigeonhole inequality being false does not imply that the desired conclusion is also necessarily false. It turns out that $P(14,6)$ is true; our pigeonhole argument is simply not sharp enough to show it.
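For parameters this small, we can compare the pigeonhole test against the truth by brute force. The sketch below checks every $k$-subset of $\{1,\ldots,n\}$; the names `P`, `has_equal_sum_subsets`, and `pigeonhole_applies` are my own, and this approach is only practical for small $n$ (certainly not for $n=100$, $k=10$).

```python
from itertools import combinations

def has_equal_sum_subsets(s):
    """True if some two distinct subsets of s (positive integers) share a sum."""
    sums = {0}  # sums of all subsets of the elements processed so far
    for x in s:
        shifted = {t + x for t in sums}  # sums of subsets that include x
        if sums & shifted:
            return True
        sums |= shifted
    return False

def P(n, k):
    """True if every k-subset of {1..n} has two distinct subsets with equal sums."""
    return all(has_equal_sum_subsets(s)
               for s in combinations(range(1, n + 1), k))

def pigeonhole_applies(n, k):
    """True if 2^k subsets outnumber the possible sums 0..max."""
    max_sum = n * (n + 1) // 2 - (n - k) * (n - k + 1) // 2
    return 2 ** k > max_sum + 1

for n, k in [(14, 8), (14, 7), (14, 6), (7, 4)]:
    print((n, k), "pigeonhole:", pigeonhole_applies(n, k),
          "brute force:", P(n, k))
```

The pigeonhole test succeeding implies $P(n,k)$, but the case $(14,6)$ shows the implication does not run the other way.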

This specific instance $P(14,6)$ is a nice exercise to show that we can still use the pigeonhole principle here, but we need to be more careful about how many pigeonholes we really need. But even the linked solution is still imperfect, in the sense that this refined pigeonhole argument still does not characterize those parameter values for which $P(n,k)$ is true. For example, the refined argument doesn’t work to show $P(6,4)$, $P(9,5)$, and other true instances.

Characterizing exactly which parameters $(n,k)$ make $P(n,k)$ true appears to remain an open problem; see OEIS sequences A201052 and A276661 for additional reading.
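For small $n$ we can at least tabulate the boundary by brute force. The sketch below computes, for each $n$, the largest size of a subset of $\{1,\ldots,n\}$ whose subset sums are all distinct, which is the largest $k$ for which $P(n,k)$ is false. I believe this is the quantity that A201052 tabulates, but treat that correspondence as my reading rather than a checked fact. The function names are hypothetical.

```python
from itertools import combinations

def has_distinct_subset_sums(s):
    """True if all subsets of s (positive integers) have different sums."""
    sums = {0}
    for x in s:
        shifted = {t + x for t in sums}
        if sums & shifted:
            return False
        sums |= shifted
    return True

def max_distinct_size(n):
    """Largest k such that some k-subset of {1..n} has all subset sums distinct."""
    best = 0
    for k in range(1, n + 1):
        if any(has_distinct_subset_sums(c)
               for c in combinations(range(1, n + 1), k)):
            best = k
        else:
            # Any subset of a set with distinct subset sums also has
            # distinct subset sums, so no larger k can succeed either.
            break
    return best

print([max_distinct_size(n) for n in range(1, 14)])
```

In particular, $P(n,k)$ is true exactly when $k$ exceeds `max_distinct_size(n)`, which is how one can confirm instances like $P(6,4)$ and $P(9,5)$ without any pigeonhole argument at all.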

The pigeonhole principle: the converse is false