Introduction
I think the pigeonhole principle is the theorem that maximizes the ratio of “easy to state and prove” to “(not so) easy to apply to concrete problems.” The general theorem seems almost annoyingly obvious: if we put a greater number of pigeons into a fewer number of pigeonholes, then some pigeonhole must contain two or more pigeons. But in the context of a particular problem, it’s often difficult to see what objects are playing the roles of “pigeon” or “pigeonhole.”
But there are other potential pitfalls as well. The motivation for this post is to describe some of these other ways things can go wrong for students, using a particular common exercise as a running example.
Same subset sums
Consider the following problem: show that for any set of
positive integers each at most
, there exist two distinct subsets of
whose elements have the same sum. For example, given
, the subsets
and
both have the same sum of 32.
Here, the pigeons are the subsets of , and the pigeonholes are the possible subset sums. There are
subsets; how many possible sums are there? The empty set has sum zero, and the set
has the largest possible sum 955, for a total of
or 956 pigeonholes. If we put 1024 subsets into 956 pigeonholes labeled 0 to 955 according to their subset sum, then some pigeonhole must contain at least two pigeons, that is, there must exist two distinct subsets with the same sum.
The version of this problem at Cut-the-Knot asks for disjoint non-empty subsets with the same sum. It’s an exercise for the reader to show that this doesn’t really affect feasibility of the problem. However, the corresponding proof has an issue:
There are 1024 subsets of the 10 integers, but there can be only 901 (=955-55+1) possible sums, the number of integers between the minimum and maximum sums of ten distinct integers between 1 and 100. With more subsets than possible sums, there must exist at least one sum that corresponds to at least two subsets.
The argument here is that we only need 901 pigeonholes, labeled 55 through 955, since 55 is the minimum sum of the ten integers 1 through 10. But this is invalid, since it’s possible for some of our 1024 pigeons (subsets) to need to go into pigeonholes with labels (subset sums) that are less than 55, the empty set with sum zero being one extreme example.
This is tricky, since the logic here is wrong, but the conclusion is still true. To make the error more vivid, we need to consider a smaller version of the problem where the same invalid logic actually leads us to a false conclusion. Consider and
; then by the above reasoning, given any set
of 4 integers selected from
, there are
subsets of
, but only
or 13 pigeonholes with labels from 1+2+3+4=10 to 4+5+6+7=22, suggesting that we should be able to find two distinct subsets of with the same sum… and yet the set
is a counterexample, since all 16 of its subsets have distinct sums.
The converse is false
This problem has parameters that we can vary. Let
be the predicate asserting that every set
of
positive integers each at most
has distinct subsets with the same sum. Then we have seen above that
is true, and that
is false.
For another example, we can show that is true, using the pigeonhole principle as above: there are
possible subsets, and only 85 possible sums from zero to 7+8+9+10+11+12+13+14=84.
Can we make smaller, and guarantee that smaller sets of integers still must have distinct subsets with the same sum? We can; the reader can verify that the pigeonhole argument works to show that
is also true.
What about ? Here, the same pigeonhole argument doesn’t work: there are only
pigeons, but 70 available pigeonholes with sum labels from zero to 9+10+11+12+13+14=69.
However, the pigeonhole inequality being false does not imply that the desired conclusion is also necessarily false. It turns out that is true; our pigeonhole argument is simply not sharp enough to show it.
This specific instance is a nice exercise to show that we can still use the pigeonhole principle here, but we need to be more careful about how many pigeonholes we really need. But even the linked solution is still imperfect, in the sense that this refined pigeonhole argument still does not characterize those parameter values for which
is true. For example, the refined argument doesn’t work to show
, or
, etc.
This appears to remain an open problem; see OEIS sequences A201052 and A276661 for additional reading.