November | 2018 | Possibly Wrong

Introduction

“This is supposed to be a random system. It doesn’t feel very random.” – Eric Reid, as quoted on Twitter

Last week, Eric Reid was selected to take his fifth random drug test in eight weeks with the Carolina Panthers. That might seem like a lot. It might even seem that Reid is the target of extra scrutiny by the NFL, particularly given his social activism, viewed as controversial by some, and his involvement in a current collusion lawsuit against the league.

So is the drug testing really random, or is Reid justified in his complaint? From the NFL Players Association Policy on Performance-Enhancing Substances:

“Each week during the preseason and regular season, ten (10) Players on every Club will be tested. By means of a computer program, the Independent Administrator will randomly select the Players to be tested from the Club’s active roster, practice squad list, and reserve list who are not otherwise subject to ongoing reasonable cause testing for performance-enhancing substances.”

As will be shown shortly, this is a pretty straightforward example of our very human habit of perceiving patterns where only randomness exists. But there is an interesting mathematical problem buried here as well, challenging enough that I can only provide an approximate solution.

Let’s make the setup more precise: suppose that in each of $n=8$ weeks we select a uniformly random subset of $s=10$ of the $t=72$ players on a team to take a drug test, independently from week to week (that is, players tested one week go back into the pool for the next). (Reid only signed with the Panthers eight weeks ago, and I am assuming that the 72 players comprise 53 active, 10 practice, and 9 reserve.) There are three reasonable questions to ask:

1. What is the probability that a particular player (e.g., Eric Reid) will be selected for testing $m=5$ or more times over this time period?
2. What is the probability that at least one player on the team (i.e., not necessarily Reid) will be selected for testing $m=5$ or more times?
3. What is the probability that at least one player in the 32-team league will be selected for testing $m=5$ or more times?

Question 1: P(Eric Reid is selected 5 or more times)

The first question is easy to answer, and is unfortunately the only question asked in most of the popular press. The probability of being selected in any single week is $s/t$, and so the probability of being selected at least $m$ times in $n$ weeks is

$q = \sum\limits_{k=m}^n {n \choose k} \left(\frac{s}{t}\right)^k \left(1-\frac{s}{t}\right)^{n-k}$

which equals approximately 0.002, or only slightly more than one chance in 500.
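The binomial tail above is easy to check numerically. The following is a minimal sketch using exact rational arithmetic; the function name `prob_at_least` is mine, chosen for illustration, and the parameters mirror the symbols $m$, $n$, $s$, $t$ in the text.

```python
from fractions import Fraction
from math import comb

def prob_at_least(m, n, s, t):
    """P(a given player is selected in at least m of n weeks),
    when each week s of t players are chosen uniformly at random."""
    p = Fraction(s, t)  # chance of being selected in any single week
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

q = prob_at_least(m=5, n=8, s=10, t=72)
print(float(q))      # roughly 0.002
print(float(1 / q))  # roughly 499, i.e. about one chance in 500
```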

But if there is a moral to this story, it’s this: You will almost certainly not win the lottery… but almost certainly someone will. That is, the second (or really the third) question is the right one to ask: what is the probability that some player will be selected so many times?

Question 2: P(some Carolina player is selected 5 or more times)

This is the hard problem that motivated this post. There is some similarity to the “Double Dixie Cup” version of the coupon collector problem with group drawings, where the players are coupons, but instead of requiring at least $m$ copies of each coupon in $n$ drawings, here we ask for at least $m$ copies of at least one coupon (or the complementary equivalent, for at most $m-1$ copies of each coupon).

If we define the generating function

$g(x_1,x_2,\ldots,x_n) = \prod\limits_{k=1}^n (1+x_k)$

and define $h(\cdot)$ to be the expansion of $g(\cdot)$ with every term removed in which the sum of exponents is at least $m$, then the desired probability may be expressed as

$p = 1-\frac{[\prod\limits_{k=1}^n x_k^s] h(\cdot)^t}{[\prod\limits_{k=1}^n x_k^s] g(\cdot)^t}$

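where the denominator is simply ${t \choose s}^n$. Unfortunately, I don’t see a computationally feasible way to evaluate the coefficient in the numerator. We can, however, get a good lower bound on $p$ by truncating inclusion-exclusion after the pairwise terms (Bonferroni’s inequality), that is, by taking the sum of the $t$ individual probabilities $q$ and subtracting, for each of the ${t \choose 2}$ pairs of players, the probability that both are selected at least $m$ times: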

$p \geq t q - \frac{{t \choose 2}}{{t \choose s}^n}\sum\limits_{i=m}^n \sum\limits_{j=m}^n \sum\limits_{k=\max(0,i+j-n)}^{\min(i,j)} f(i,j,k)$

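where $f(i,j,k)$ counts the ways for two particular players to be selected in exactly $i$ and exactly $j$ weeks, respectively, with both selected together in exactly $k$ of those weeks:

$f(i,j,k) = {n \choose k}{{n-k} \choose {i-k}} {{n-i} \choose {j-k}} {{t-2} \choose {s-2}}^k {{t-2} \choose {s-1}}^{i+j-2k} {{t-2} \choose s}^{n-i-j+k}$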

yielding a probability $p \geq 0.136$ that some Carolina player would be selected 5 or more times over 8 weeks.
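For readers who want to reproduce this number, here is a sketch of the lower bound in exact arithmetic. It transcribes the two formulas above directly; the function names `tail`, `f`, and `lower_bound` are my own labels, not anything from a library.

```python
from fractions import Fraction
from math import comb

def tail(m, n, s, t):
    """q: probability that one given player is selected at least m times."""
    p = Fraction(s, t)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

def f(i, j, k, n, s, t):
    """Number of n-week selections in which player A is chosen in exactly
    i weeks, player B in exactly j weeks, and both together in k weeks."""
    return (comb(n, k) * comb(n - k, i - k) * comb(n - i, j - k)
            * comb(t - 2, s - 2)**k                # weeks with both A and B
            * comb(t - 2, s - 1)**(i + j - 2 * k)  # weeks with exactly one
            * comb(t - 2, s)**(n - i - j + k))     # weeks with neither

def lower_bound(m, n, s, t):
    """Bonferroni lower bound on P(some player is selected >= m times)."""
    pairs = sum(f(i, j, k, n, s, t)
                for i in range(m, n + 1)
                for j in range(m, n + 1)
                for k in range(max(0, i + j - n), min(i, j) + 1))
    return t * tail(m, n, s, t) - Fraction(comb(t, 2) * pairs, comb(t, s)**n)

bound = lower_bound(m=5, n=8, s=10, t=72)
print(float(bound))  # should agree with the 0.136 quoted above
```

Since the first term $tq \approx 0.144$ is itself an upper bound on $p$ (the union bound), the true probability is pinned between roughly 0.136 and 0.144.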

Question 3: P(some NFL player is selected 5 or more times)

Finally, while this is happening, the other 31 teams in the league are subjecting their players to the same random drug testing procedure. The probability that some player on some team will experience 5 or more random drug tests over a span of 8 weeks is

$1-(1-p)^{32} \geq 0.99$

In other words, it is a near certainty that some player in the league would experience the number of random tests that Reid has. Indeed, by linearity of expectation, we should expect an average of $32 t q \approx 4.6$ players to find themselves in a similar situation over a similar time period. How many players actually did experience multiple tests over the last couple of months would be interesting and useful data to add to the discussion.
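As a sanity check on both figures, the whole league can also be simulated directly. This is only a rough Monte Carlo sketch under the same assumptions as above (32 teams, 72 players each, 10 drawn uniformly and independently each week); the function name, trial count, and seed are arbitrary choices of mine.

```python
import random

def simulate_league(trials=1000, teams=32, n=8, s=10, t=72, m=5, seed=1):
    """Estimate (P(at least one player in the league is tested >= m times),
    mean number of such players) over n weeks of random testing."""
    rng = random.Random(seed)
    hits = 0   # trials in which at least one player reached m tests
    total = 0  # players reaching m tests, summed over all trials
    for _ in range(trials):
        flagged = 0
        for _ in range(teams):
            counts = [0] * t
            for _ in range(n):
                # One week: draw s distinct players from the team of t.
                for player in rng.sample(range(t), s):
                    counts[player] += 1
            flagged += sum(c >= m for c in counts)
        hits += flagged > 0
        total += flagged
    return hits / trials, total / trials

p_any, mean_count = simulate_league()
print(p_any)       # expected to land near 0.99
print(mean_count)  # expected to land near 32 * t * q, about 4.6
```

The estimates carry sampling noise, so they will only approximately match the analytical values.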

Random drug testing in the NFL