February | 2016 | Possibly Wrong

Home-field advantage

I encountered the following problem a few weeks ago: suppose that Alice and Bob want to play a match consisting of a series of games, where the first player to win $n$ games wins the match. This is often referred to as a “best-of-$(2n-1)$ series,” with many examples in a variety of sports, such as the World Series in baseball ($n=4$), sets and matches in tennis, volleyball, etc.

There is a problem, though: each game must be played at either Alice’s or Bob’s home field, conferring a slight advantage to the corresponding player. Let’s assume that the probability that Alice wins a game at home is $p$, and the probability that Alice wins a game away (i.e., at Bob’s home field) is $q<p$.

(Note that this asymmetry may arise from something other than where each game is played. In tennis, for example, the serving player has a significant advantage; even against an otherwise evenly-matched opponent (i.e., $p+q=1$), values of $p$ may be as large as 0.9 at the highest levels of play; see Reference (1) below.)

What is the probability that Alice wins the overall match? Of course, this probability surely depends on how Alice and Bob agree on who has “home-field advantage” for each game. Let’s assume without loss of generality that Alice plays at home in the first game, and consider a few different possibilities for how the rest of the series plays out:

(1) Alternating: Alice plays at home in odd-numbered games, and away in even-numbered games. (This is similar to a set in tennis.)
(2) Winner plays at home: The winner of the previous game has home-field advantage in the subsequent game. (This is similar to a set in volleyball.)
(3) Loser plays at home: The loser of the previous game has home-field advantage in the subsequent game.
(4) Coin toss: After Alice’s first game at home, a fair coin toss determines home-field advantage for each subsequent game.

I’m sure there are other reasonable approaches I’m not thinking of as well. It is an interesting exercise to compute the probability of Alice winning the match under each of these approaches. They certainly yield very different distributions of game outcomes: for example, the following figure shows the distribution of the number of games played in a World Series between evenly-matched opponents with a “typical” home-field advantage of $p=0.55$ (and hence $q=0.45$).

Distribution of number of games played in a best-of-7 series with p=0.55, q=0.45.
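A distribution like the one in the figure can be computed exactly rather than simulated. The following is a minimal Python sketch, not the code used to produce the figure: the names `games_played_distribution` and `RULES` are hypothetical, and each rule is encoded (an assumption of this sketch) as a function returning the probability that Alice is at home in the next game, given whether she was at home and whether she won the game just played. The sketch pushes probability mass over the unfinished scores one game at a time and records the mass that finishes at each game number.

```python
from collections import defaultdict

# Hypothetical encoding: rule(alice_home, alice_won) for the game just played
# -> probability that Alice is at home in the NEXT game.
RULES = {
    "alternating": lambda home, won: 0.0 if home else 1.0,
    "winner at home": lambda home, won: 1.0 if won else 0.0,
    "loser at home": lambda home, won: 0.0 if won else 1.0,
    "coin toss": lambda home, won: 0.5,
}


def games_played_distribution(n, p, q, rule):
    """Return {k: P(the match lasts exactly k games)} for a first-to-n match.

    p: P(Alice wins at home), q: P(Alice wins away); Alice is at home in game 1.
    """
    # Probability mass over unfinished states (Alice wins, Bob wins, Alice at home).
    states = {(0, 0, True): 1.0}
    dist = defaultdict(float)
    k = 0
    while states:
        k += 1  # play game k from every unfinished state
        nxt = defaultdict(float)
        for (a, b, home), mass in states.items():
            g = p if home else q  # Alice's chance of winning game k
            for won, pw in ((True, g), (False, 1.0 - g)):
                na, nb = (a + 1, b) if won else (a, b + 1)
                m = mass * pw
                if na == n or nb == n:
                    dist[k] += m  # the match ends with game k
                    continue
                h = rule(home, won)  # P(Alice at home in game k + 1)
                if h > 0.0:
                    nxt[(na, nb, True)] += m * h
                if h < 1.0:
                    nxt[(na, nb, False)] += m * (1.0 - h)
        states = nxt
    return dict(dist)


if __name__ == "__main__":
    for name, rule in RULES.items():
        d = games_played_distribution(4, 0.55, 0.45, rule)
        print(f"{name:>15}:", {k: round(v, 4) for k, v in sorted(d.items())})
```

Because every game adds one to the total score, the loop always terminates after at most $2n-1$ games.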

The motivation for this post is the observation that, despite these differences, approaches (1), (2), and (3) all yield exactly the same overall probability of Alice winning the match! This was certainly not intuitive to me. And the rule determining home-field advantage does matter in general; for example, the coin toss approach in (4) yields a different overall probability of winning, and so lacks something that (1), (2), and (3) have in common. Can you see what it is?
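One way to check this claim numerically is a short recursion over the score. The following is a minimal Python sketch, not the author’s code: `match_win_prob` and `RULES` are hypothetical names, and each rule is encoded (an assumption of this sketch) as a function giving the probability that Alice is at home in the next game, given whether she was at home and whether she won the game just played.

```python
from functools import lru_cache

# Hypothetical encoding: rule(alice_home, alice_won) for the game just played
# -> probability that Alice is at home in the NEXT game.
RULES = {
    "alternating": lambda home, won: 0.0 if home else 1.0,
    "winner at home": lambda home, won: 1.0 if won else 0.0,
    "loser at home": lambda home, won: 0.0 if won else 1.0,
    "coin toss": lambda home, won: 0.5,
}


def match_win_prob(n, p, q, rule):
    """Exact probability that Alice wins a first-to-n match.

    p: P(Alice wins a game at home), q: P(Alice wins a game away).
    Alice is at home in game 1.
    """

    @lru_cache(maxsize=None)
    def win(a, b, home):
        # a, b = games won so far by Alice and Bob; home = Alice at home now
        if a == n:
            return 1.0
        if b == n:
            return 0.0
        g = p if home else q  # Alice's chance of winning this game
        total = 0.0
        for won, prob in ((True, g), (False, 1.0 - g)):
            na, nb = (a + 1, b) if won else (a, b + 1)
            h = rule(home, won)  # P(Alice at home in the next game)
            total += prob * (h * win(na, nb, True) + (1.0 - h) * win(na, nb, False))
        return total

    return win(0, 0, True)


if __name__ == "__main__":
    for name, rule in RULES.items():
        print(f"{name:>15}: {match_win_prob(4, 0.55, 0.45, rule):.6f}")
```

For the World Series case ($n=4$, $p=0.55$, $q=0.45$), the first three rules print the same value, while the coin toss gives a slightly different one.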

References:

(1) Newton, P. and Keller, J., Probability of Winning at Tennis I. Theory and Data, Studies in Applied Mathematics, 114(3), April 2005, pp. 241-269.
(2) Kingston, J. G., Comparison of Scoring Systems in Two-Sided Competitions, Journal of Combinatorial Theory (Series A), 20(3), May 1976, pp. 357-362.
(3) Anderson, C. L., Note on the Advantage of First Serve, Journal of Combinatorial Theory (Series A), 23(3), November 1977, p. 363.

Analysis of the game of HIPE

Introduction

Can you think of a word in which the letters HIPE appear consecutively? What about a word containing HQ? This “game” is described in a not-so-relevant chapter of Peter Winkler’s wonderful book Mathematical Mind-Benders, where he provides many more examples of what he calls HIPEs: a challenge consisting of a short sequence of letters, to which the response must be a word containing that sequence consecutively, with no intervening letters. For example, BV has many solutions, one of which is, well, oBVious.

It turns out that I am really bad at playing this game. My wife, on the other hand, is pretty good at it. As it happens, I also lag well behind her when we work together on crossword puzzles, acrostics, anagrams, etc.; that is, in many situations where it is helpful to consider words as made up of letters. After browsing more examples of HIPEs in Winkler’s book, I wondered what makes a given HIPE easy or difficult. I will describe my attempt at “automating” the generation of difficult HIPEs… but mostly I want to share what I found to be a fascinating anecdote from Winkler’s discussion of the game.

Word Lists

My idea was pretty simple: difficult HIPEs are most likely those sequences of letters that (1) occur very rarely in the dictionary, but (2) occur sufficiently often in natural language to be recognizable. To compute these metrics, I used:

A word list consisting of the union of (a) the ENABLE2k word list (updated from the previous ENABLE1), and (b) the latest (third, 2014) edition of the Scrabble Official Tournament and Club Word List.
The Google Books Ngrams data set (English Version 20120701), used to map each word in my dictionary to a corresponding frequency of occurrence (details of the methodology are described in a previous post).
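A minimal sketch of how the two per-sequence metrics might be computed, assuming the merged word list and the Ngrams frequencies have already been combined into a Python dict mapping each word to its corpus frequency (a hypothetical input; `hipe_stats` is likewise a hypothetical name). Two further assumptions, not confirmed by the post: a sequence is counted at most once per word, and a sequence’s corpus frequency is the total frequency of the words containing it.

```python
from collections import defaultdict


def hipe_stats(word_freq, k):
    """Map each k-letter sequence to (corpus_freq, word_count).

    word_freq: dict of dictionary word -> corpus frequency (hypothetical input).
    word_count: number of dictionary words containing the sequence.
    corpus_freq: summed corpus frequency of those words (an assumption).
    """
    stats = defaultdict(lambda: [0, 0])
    for word, freq in word_freq.items():
        w = word.upper()
        # A set, so a sequence repeated within one word is counted only once.
        for seq in {w[i:i + k] for i in range(len(w) - k + 1)}:
            stats[seq][0] += freq
            stats[seq][1] += 1
    return {seq: (f, c) for seq, (f, c) in stats.items()}


if __name__ == "__main__":
    # Made-up frequencies, for illustration only.
    toy = {"obvious": 120, "subvert": 15, "banana": 40}
    stats = hipe_stats(toy, 2)
    print(stats["BV"])  # (135, 2)
    print(stats["AN"])  # (40, 1): "banana" counted once despite two ANs
```

Plotting `corpus_freq` against `word_count` on log-log axes would give a scatter of the kind described next.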

As usual, you can download the aggregated frequency data and component word lists here.

2- and 3-letter HIPEs

First, let’s focus on just two letters; the following figure shows all possible 2-letter HIPEs, arranged by frequency of occurrence in the Google Books dataset on the x-axis and frequency of occurrence in the word list on the y-axis, with the example HIPEs from Winkler’s chapter shown in red. Note that both axes are on a logarithmic scale to better “spread out” the infrequent HIPEs that we are interested in.

All digraphs with corresponding x=frequency of occurrence in the Google Books dataset and y=frequency of occurrence in the ENABLE+Scrabble word list. Winkler’s example HIPEs are shown in red.

As expected, the digraphs near the top of the figure aren’t very interesting, while Winkler’s examples are clustered near the bottom of the figure… although I don’t see much horizontal organization. To get a better view, let’s zoom in on that lower-left corner of the figure, while still containing all of Winkler’s example HIPEs in red:

A zoomed-in view of the 2-letter HIPEs no more common than Winkler’s examples.

Unfortunately, closer inspection of this figure is a little disappointing: there are certainly “interesting” additional HIPEs in there (DK and GP, for example), but no clear separation between them and other unacceptably weird ones like QI, MK, etc.
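The “zoom” to the lower-left corner amounts to a bounding-box filter: keep every sequence that is no more common, on either axis, than the most common of the reference examples. A sketch, assuming per-sequence statistics stored as a dict mapping each sequence to `(corpus_freq, word_count)`; the function name `zoom_to_examples` and the placeholder sequences and counts in the usage are invented for illustration and are not real data.

```python
def zoom_to_examples(stats, examples):
    """Keep the lower-left box that still contains every example sequence.

    stats: dict seq -> (corpus_freq, word_count) (hypothetical layout).
    examples: reference HIPEs (e.g. Winkler's) that must remain in view.
    """
    max_freq = max(stats[e][0] for e in examples)
    max_words = max(stats[e][1] for e in examples)
    return {
        seq: (f, c)
        for seq, (f, c) in stats.items()
        if f <= max_freq and c <= max_words
    }


if __name__ == "__main__":
    # Placeholder sequences with invented counts, for illustration only.
    stats = {"AA": (10**9, 9000), "BB": (135, 2), "CC": (40, 1),
             "DD": (80, 2), "EE": (500, 3)}
    print(sorted(zoom_to_examples(stats, ["BB", "CC"])))  # ['BB', 'CC', 'DD']
```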

We can do the same thing for 3-letter HIPEs, but things get messy quickly; there are just too many possible HIPEs with valid solutions, even if we again “zoom in” on just the bounding box of Winkler’s examples:

Similarly zoomed-in view of rare/difficult 3-letter HIPEs.

There are quite a few interesting HIPEs even in that very bottom row in the figure. Following are some of my favorites, which appear at most twice in the entire word list: BSM, CEV, CTW, CYI, FCO, IKY, KGA, LFC, UCY, UIU, WDF, XEU, XII.
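A shortlist like this can be pulled out with a simple filter. The sketch below again assumes a hypothetical dict mapping each sequence to `(corpus_freq, word_count)`; it keeps sequences appearing in at most two words and, as an assumed proxy for criterion (2) above, ranks them by corpus frequency so the most recognizable come first. The name `rare_but_recognizable` and the placeholder data are invented.

```python
def rare_but_recognizable(stats, max_words=2, top=20):
    """Sequences in at most max_words dictionary words, most corpus-frequent first.

    stats: dict seq -> (corpus_freq, word_count) (hypothetical layout).
    Ranking by corpus frequency is an assumed proxy for recognizability.
    """
    rare = [(seq, f, c) for seq, (f, c) in stats.items() if c <= max_words]
    rare.sort(key=lambda t: (-t[1], t[0]))  # ties broken alphabetically
    return rare[:top]


if __name__ == "__main__":
    # Placeholder sequences with invented counts, for illustration only.
    stats = {"AAA": (10**8, 9000), "BBB": (900, 2), "CCC": (300, 1),
             "DDD": (300, 2), "EEE": (5, 3)}
    for seq, f, c in rare_but_recognizable(stats):
        print(seq, f, c)
```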

Conclusion

Finally, back to Winkler’s discussion of what makes HIPEs easy or difficult. This is where things get really interesting. He points out that, for most people, it is the “kinetic sense” of producing a word with our mouths that dominates our sense of “knowing” the word. Not its definition, not what it looks like on paper or how it sounds, but the association with the physical act of expressing the word. If this is really true, then he suggests that perhaps deaf people, “especially those who customarily communicate in a sign language,” might play HIPE better than others:

Resolved to test this hypothesis, I introduced HIPE to a group of hearing-impaired employees at a government agency, who sat together and conversed in ASL daily at lunch. They found the game completely trivial; as fast as I wrote HIPEs on napkins, they wrote solutions around them. To them it was a mystery why anyone would think HIPE was any kind of challenge.

This certainly sounds like a significant effect, enough so that I wonder whether any more rigorous study has been done.

References:

Winkler, P., Mathematical Mind-Benders. Wellesley: A K Peters, Ltd., 2007.

Home-field advantage

Analysis of the game of HIPE