April | 2018 | Possibly Wrong

Introduction

The title question came up recently, which I think makes for an interesting combinatorics exercise. The idea is that, given the extent of “borrowing” of past musical ideas by later artists, are we in danger of running out of new music?

We can turn this into a combinatorics problem by focusing solely on pitch and rhythm: in a single bar of music, how many possible melodies are there, consisting of a sequence of notes and rests of varying pitch and duration? The point is that this number is finite; it may be astronomical, but how astronomical?

This is not a new question, and there are plenty of answers out there which make various simplifying assumptions. For example, Vsauce has a video, “Will We Ever Run Out of New Music?,” which in turn refers to a write-up, “How many melodies are there in the universe?,” describing a calculation based on a recurrence relation that effectively requires cutting segments of a bar exactly in halves– undercounting in a way that I suspect was not intentional, implicitly prohibiting even relatively simple melodies like “I’ll Be Home for Christmas.”

But the most common simplifying assumption seems to be a lack of treatment of rests— that is, only counting melodies consisting of a sequence of notes. Rests are an interesting wrinkle that complicates the counting problem: for example, a half note is different from two consecutive quarter notes of the same pitch, but a half rest sounds the same as two consecutive quarter rests. The objective of this post is to add this “expressive power” to the calculation of possible melodies.

Solution

Consider a single bar in 4/4 time, consisting of a sequence of whole, half, quarter, eighth, and sixteenth notes and/or rests, with notes chosen from 13 possible pitches, allowing melodies within an octave of the 12-pitch chromatic scale, but also allowing an octave jump (e.g., “Take Me Out to the Ball Game,” “Over the Rainbow,” etc.).

Twelve notes of the chromatic scale, and a thirteenth allowing melodies with an octave jump.

We can encode the choice of a single note of $n$ possible pitches with the following generating function, weighted by duration:

$g_n(x) = n(x+x^2+x^4+x^8+x^{16})$

and all possible rests with

$h(x) = \frac{x}{1-x}$

Then the generating function for the number of possible melodies is

$f_n(x) = \frac{1+h(x)}{1-g_n(x)(1+h(x))}$

Intuitively, a melody consists of zero or one rest, followed by a sequence of zero or more sub-sequences, each consisting of a note followed by zero or one rest. The coefficient $[x^{16}]f_{13}(x)$ is the number of one-bar melodies given the above constraints… but this counts two melodies as distinct even if they only differ in relative pitch. The number of possible melodies consisting of sequences of intervals confined to at most an octave jump is

$[x^{16}]f_{13}(x) - [x^{16}]f_{12}(x) + 1$

where the +1 accounts for the single “silent melody” of a whole rest. The result is 3,674,912,999,046,911,152, or about 3.7 billion billion possible melodies.

Results

The machinery described above may be easily extended to consider different sets of assumptions: different time signatures, longer or shorter lists of possible notes to choose from, dotted notes and rests, triplets (e.g., the Star Wars theme), etc. The figure below shows the number of possible one-bar melodies for a variety of such assumptions.

As might be expected, dotted notes and/or rests do not affect the “space” of possible melodies nearly as much as note duration: halve the shortest allowable note value, and you very roughly double the number of “bits” in the representation of a melody. If we extend our expressive power to allow 32nd (possibly dotted) notes and rests, then there are 6,150,996,564,625,709,162,647,180,518,925,064,281,006 possible melodies.

Of course, these calculations only address the question of how many melodies are possible— not how many of such melodies are actually appealing to our human ears.

Introduction

Every year, there are upsets and wild outcomes during the NCAA men’s basketball tournament. But this year felt, well, wilder. For example, for the first time in 136 games over the 34 years of the tournament’s current 64-team format, a #16 seed (UMBC) beat a #1 seed (Virginia) in the first round. (I refuse to acknowledge the abomination of the 4 “play in” games in the zero-th round.) And I am a Kansas State fan, who watched my #9 seed Wildcats beat Kentucky, a team that went to the Final Four in 4 of the last 8 years.

So I wondered whether this was indeed the “wildest” tournament ever… and it turns out that it was, by several reasonable metrics.

Modeling game probabilities

To compare the tournaments in different years, we assume that the probability of outcome of any particular game depends only on the seeds of the opposing teams. Schwertman et. al. (see reference below) suggest a reasonable model of the form

$p_{i,j} = 1-p_{j,i} = \frac{1}{2}+k(s_i-s_j)$

where $s_i$ is some measure of the “strength” of seed $i$ (ranging from 1 to 16), and the scale factor $k$ calibrates the range of resulting probabilities, selected here so that the most extreme value $p_{16,1}=1/136$ matches the current maximum likelihood estimate based on the 136 observations over the past 34 years.

One simple strength function is the linear $s_i=-i$ , although this would suggest, for example, that #1 vs. #5 and #11 vs. #15 are essentially identical match-ups. A better fit is

$s_i = \Phi^{-1}(1-\frac{4i}{n})$

where $\Phi$ is the quantile of the normal distribution, and $n=351$ is the number of teams in all of Division I. The idea is that team strength is normally distributed, and the tournament invites the 64 teams in the upper tail of the distribution, as shown in the figure below.

Normally-distributed strength s_i for each seed.

Probability of a perfect bracket

Armed with these candidate models, I looked at all of the tournaments since 1985, the first year of the current 64-team format. I have provided summary data sets before (a search of this blog for “NCAA” will yield several posts on this subject), but this analysis required more raw data, all of which is now available at the usual location here, as well as on GitHub.

For each year of the tournament, we can ask what is the probability of picking a perfect bracket in that year, correctly identifying the winners of all 63 games? Actually, there are three reasonable variants of this question:

If we flip a coin to pick each game, what is the probability of picking every game correctly?
If we pick a “chalk” bracket, always picking the favored higher-seeded (i.e., lower-numbered) team to win each game, what is the probability of picking every game correctly?
If we managed to pick the perfect bracket for a given year, what is the prior probability of that particular outcome?

The answer to the first question is the 1 in $2^{63}$ , or the “1 in 9.2 quintillion” that appears in popular press. And this is always exactly correct, no matter how individual teams actually match up in any given year, as long as we are flipping a coin to guess the outcome of each game. But this isn’t very realistic, since seed match-ups do matter; a #1 seed will beat a #16 seed… well, almost all of the time.

So the second question is more interesting, but also more complicated, since it does depend on our model of how different seeds match up… but it doesn’t depend on which year of the tournament we’re talking about, at least as long as we always use the same model. Using the strength models described above, a chalk bracket has a probability of around 1 in 100 billion of being correct (1 in 186 billion for the linear strength model, or 1 in 90 billion for the normal strength model).

The third question is the motivation for this post: the probability of a given year’s actual outcome will generally lie somewhere between the other two “extremes.” How has this probability varied over the years, and was 2018 really an outlier? The results are shown in the figure below.

Probability of a perfect bracket, 1985-2018.

The constant black line at the bottom is the 1 in 9.2 quintillion coin flip. The constant red and blue lines at the top are the probabilities of a chalk bracket, assuming the linear or normal strength models, respectively.

And in between are the actual outcomes of each tournament. (Aside: I tried a bar chart for this, but I think the line plot more clearly shows the comparison of the two models, as well as both the maximum and minimum behavior that we’re interested in here.) This year’s 2018 tournament was indeed the most unlikely, so to speak, although it has close competition, all in this decade. At the other extreme, 2007 was the most likely bracket.

Reference:

Schwertman, N., McCready, T., and Howard, L., Probability Models for the NCAA Regional Basketball Tournaments, The American Statistician, 45(1) February 1991, p. 35-38 [JSTOR]

Possibly Wrong

On science, mathematics, and computing

Monthly Archives: April 2018

How many melodies are there?

Picking a perfect NCAA bracket: 2018 was the most unlikely tournament so far