No Tricks: Probability

Monday, April 22, 2013

The 12 Bonk Rule

I am on a mailing list from Business Insider and against my better judgement (which often comes out when the internet is involved) I followed a link to an article on The Sexiest Scientists Alive. There are 50 scientists listed, and scientist number 43 is Clio Cresswell, a mathematician at the University of Sydney, who is the author of Mathematics and Sex.

The book gained some notoriety for propounding the 12-bonk rule. Bonking is an Australian term for having sex (usually casual sex, as I recall), and Dr. Cresswell has stated that the best strategy for finding a good (sexual) partner is to bonk 12 different partners, note the best one, then keep bonking until you find someone who is better, and settle on that partner. You benchmark on a sample of 12 partners, discard them, then take the next best that comes along. Cresswell reports that this strategy gives you a 75% chance of finding a good mate. So it's not foolproof, but with the confidence of mathematics, it claims to be better than any other trial-and-error approach that leaves behind a trail of discarded lovers. Of course there is more at play in finding a mate than "bonkability", as opined in the Ask Sam column of the Sydney Morning Herald for example.

The result caught my eye as I was recently reviewing some statistical problems, and I surmised that the 12-bonk rule sounded similar to the secretary problem, a classic problem in probability. The secretary problem (which by now should be at least upgraded to the executive assistant problem, or simply the candidate problem) asks for the best strategy to select a secretary for a position when there is a collection of candidates, you interview them one at a time, and after each interview you must either hire that candidate or move on to the next one. It is assumed that the market is competitive and that you will not be able to return to a rejected candidate as they will have found employment elsewhere.

The optimal strategy here is to interview 37% of the possible candidates, make a note of the best one, and then keep interviewing until you find a candidate who is better than your previous best, and employ that candidate. So if you have 100 candidates, interview the first 37, note the best, and then keep interviewing until you find someone better and hire them. The graph below plots the probability of finding the best candidate using this strategy, as the percentage k of candidates interviewed and refused increases.

Here 37 is a double winner: the point marked by the dashed lines shows that the optimal approach is to reject the first 37% of candidates, and that doing so finds the best candidate 37% of the time. This magic 37% is derived from 1/e ≈ 0.37, where e is the base of the natural logarithms.
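For readers who like to check such claims, here is a minimal sketch of the calculation for 100 candidates. It uses the standard closed form for the success probability of a "reject the first r, then take the next best" rule; the function name and the use of exact fractions are my own choices for illustration.

```python
from fractions import Fraction

def success_probability(n: int, r: int) -> Fraction:
    """Chance of hiring the single best of n candidates when the first r are
    rejected outright and the next candidate better than all of them is hired."""
    if r == 0:
        return Fraction(1, n)  # hiring the first candidate wins with probability 1/n
    # The best candidate sits at position i (r < i <= n) with probability 1/n, and
    # is reached only if the best of the first i-1 candidates is among the first r.
    return Fraction(r, n) * sum(Fraction(1, i - 1) for i in range(r + 1, n + 1))

n = 100
best_r = max(range(n), key=lambda r: success_probability(n, r))
print(best_r, round(float(success_probability(n, best_r)), 3))
```

Trying other values of n should show both the best cutoff fraction and the success probability settling near 1/e.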

I just downloaded the e-book version of Mathematics and Sex and took a quick look at the 12-bonk section, and it seems that Cresswell's discussion is based on the work of Peter Todd in his paper Searching for the Next Best Mate. Todd looks at heuristics for finding a mate that are simpler than the 37% rule, which he notes has some drawbacks in practice. If we assume a sample of 100 people who can be rated uniquely on a scale from 1 to 100, then when applying the 37% rule:

On average, 37 additional people need to be interviewed (or bonked) to find the next best beyond the best found in the initial 37 people, for an average total of 74 people being considered from the 100.
On average, the best person found has rank 82, where 100 is the best on the scale. The 37% rule finds the best person 37% of the time, but averaging over the remaining 63% of outcomes lowers the result by about 20%.

Todd decided to explore other decision rules that perform better than the 37% rule on some criteria, and that more closely match our observed behaviour in finding a mate. It is unlikely that anyone will have the time and (emotional) energy to engage with 37% of all their potential mates, which could easily run into the thousands. Todd's computer models found that if you engaged 12 people from a mating population of 1000, then took the next best, you are highly likely to end up with someone in the top 25% of the population. I cannot quite tell from Todd's graph of this result how many people must be engaged in total, but it seems to be around 30 or so (50 at the outside).
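A rough way to get a feel for this result is a small Monte Carlo simulation. The sketch below is my own toy model, not Todd's: it assumes unique ranks from 1 (worst) to 1000 (best), people met in random order, and, as a further assumption, that you settle for the last person if nobody beats the benchmark. The function names are hypothetical.

```python
import random
import statistics

def try_a_dozen(population, sample, rng):
    """One simulated search: returns (rank of the chosen mate, number of people met)."""
    ranks = list(range(1, population + 1))
    rng.shuffle(ranks)                      # people are met in random order
    benchmark = max(ranks[:sample])         # best of the initial sample, then discarded
    for met, rank in enumerate(ranks[sample:], start=sample + 1):
        if rank > benchmark:
            return rank, met                # first person better than the benchmark
    return ranks[-1], population            # assumption: settle for the last person

def estimate(trials=5000, population=1000, sample=12, seed=1):
    rng = random.Random(seed)
    results = [try_a_dozen(population, sample, rng) for _ in range(trials)]
    top_quarter = sum(rank > 0.75 * population for rank, _ in results) / trials
    median_met = statistics.median(met for _, met in results)
    return top_quarter, median_met

top_quarter, median_met = estimate()
print(f"chosen mate in top 25%: {top_quarter:.1%}, median number met: {median_met}")
```

The number of people met has a long tail in this model, so the median is reported rather than the mean; the figures should be read as indicative only.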

So this was the genesis of the 12-bonk rule, and I will read Cresswell's book a bit closer to see if she has teased out any further details or conclusions. A very quick search of the internet on the topic of the number of sexual partners seemed to indicate that for Western men and women 12 sexual partners is on the high side for most of them - actually more like half of that, after discarding "outliers". A further potential glitch in the 12-bonk rule is that it assumes when you have found your post-12-bonk lover that he or she will accept your overtures, and of course you cannot be certain of that. I am sure that someone is working on the mathematics of unrequited love.

Saturday, February 6, 2010

Single DES and Double Yolks

It was reported in the Daily Mail this week that a woman bought a carton of half a dozen eggs, which she later found to be all double-yolked, as shown below

Since the chances of getting a single double-yolk egg are around 1-in-1000, it appears that we have witnessed an extremely rare event, in fact one that is a practical impossibility. If we assume that the likelihood of each egg being double-yolked is independent, then the picture above is conclusive evidence of a 1-in-10^{18} event manifesting. This is a bit less likely than guessing a DES key at random at 1-in-10^{17}. The Daily Mail article goes on to give reasons why this event is not as unlikely as it seems, because at face value the event is so unlikely that we would never expect to witness it over the lifetime of all eggs that have ever been produced.

Apparently the eggs are all likely to come from hens in the same flock and of the same age which reduces the likelihood to “only” 1-in-729 million. And the occurrence becomes even more likely (or less unlikely – take your pick) when we account for eggs of a similar weight being sorted into the same boxes.
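The arithmetic behind these figures is easy to check. The short sketch below redoes it with exact integers; the observation that 729 million is 30 to the sixth power (which would correspond to a per-egg chance of about 1-in-30 for such a flock) is my own inference, not something stated in the article.

```python
from fractions import Fraction

p_double_yolk = Fraction(1, 1000)       # per-egg chance quoted above
six_independent = p_double_yolk ** 6    # all six eggs, assuming independence
des_keys = 2 ** 56                      # size of the 56-bit DES key space

print(six_independent)                  # 1 in 10^18
print(f"{des_keys:.1e}")                # a little under 10^17
print(30 ** 6)                          # 729 million (the 1-in-30 reading is an inference)
```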

A bit more detail is given over at the wonderful Understanding Uncertainty blog. If the 1-in-10^{18} odds were correct then, given the number of eggs consumed in Britain each year, we would be looking at waiting 500 years to see the photo above, so the independence assumption is not plausible. Once we factor in that the eggs come from the same group of hens (who may have a propensity for double yolks), that eggs are packed by weight, and that some supermarkets can detect and sell double-yolked eggs, the event seems less impressive. But impressive nonetheless!

Monday, January 12, 2009

Some books on Scribd

As I mentioned in my last post, there is a lot of very interesting and detailed content of all types being uploaded to Scribd. According to Wikipedia,

Scribd is a document sharing website. It houses 'more than 2 million documents' and 'drew more than 21 million unique visitors in May 2008, little more than a year after launching, and claims 1.5 million registered users.' The site was initially funded with $12,000 funding from Y Combinator, but has since received over $3.7 million from Redpoint Ventures and The Kinsey Hills Group.

You can even find whole books on the site. Here are some interesting documents that I found from an hour or so of searching

I think I will drop my Safari account as I now have enough reading for far more than the foreseeable future. I also uploaded a paper that I co-wrote on Data Centric Security:
A Data Centric Security Model

Friday, November 7, 2008

Weapons of Math Instruction: The Birthday Paradox

In November 2007, a family from Ohio was blessed with the birth of their third child Kayla, who arrived on October 2nd. Kayla will share the same birthday as her two older siblings who were also born on October 2nd, in 2003 and 2006 respectively. Bill Notz, a statistics professor at Ohio State University concluded that the odds of a family having three children born on the same date in different years are less than 8 in a million. That said, the Paez family from San Diego, California, also had their third child born on July 30th of this year. Are these events extraordinary, or just plain ordinary?

For some time I have been wanting to start a series of posts on Weapons of Math Instruction for IT Risk. The name comes from a joke I saw a few years ago about the evil Al-gebra group. I will be aiming for less math and more instruction since the text format of the blog is somewhat formula-challenged. The first post will be on the Birthday Paradox, a perennial favourite, recently recounted to me by a senior manager. Also there was a recent furore over the reported odds of DNA matching which was a misinterpretation of the Birthday Paradox at heart.

Quadratic Football

The Birthday Paradox (BP) poses the following simple question: how many people need be gathered in a room so that the chance of any two people sharing a birthday is at least 50%? We assume that each person is equally likely to be born on each of the 365 days of the year, and we ignore leap years. Given these assumptions, the surprising answer is 23. The BP is not a true paradox (a logical contradiction) but is rather deeply counterintuitive since 23 seems too small a number to produce a common birthday. Keep this in mind next time you are watching a football (soccer!) game which has 23 people on the field (the two teams of 11 plus the referee). If you want to see the mathematics behind the result, Google leaves you spoilt for choice with over 53,000 hits. As is often the case with mathematical topics, the Wikipedia article has an excellent explanation, graphics and references.

The BP is an archetypal problem in probability whose solution is less important than the principles it demonstrates. What is the lesson? If we are trying to obtain a match on an attribute with M distinct values, then just under sqrt(2*M) uniform samples are required for better than 50% success. For birthdays, M = 365 and sqrt(2*365) = 27.0, a little bit higher than the correct value of 23. To generalise, if you were looking for a match amongst 1,000,000 attributes then you would need fewer than 1,500 uniform samples - probably lower than you would have estimated/guessed. Where does the sqrt(2*M) term come from?

Well, if we have N objects to compare for a potential match, then the number of possible comparisons is N*(N-1)/2. This number can be approximated as (N^2)/2, which grows in proportion to the square of N, and is therefore called a quadratic function. The source of the surprise in the BP is that we are not thinking quadratically, and therefore underestimate the number of potential matches. When N = sqrt(2*M) then (N^2)/2 yields M possible matches. If we assume that the probability of a match is 1/M, then the average number of matches over the M possibilities is M*(1/M) = 1. A closer analysis shows that there is a 50% chance of a match when N is approximately sqrt(2*M*ln2) where sqrt(2*ln2) = 1.18. You can find a straightforward 1-page derivation of a slightly higher bound here.
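If you would rather compute than derive, the following sketch compares the exact threshold with the two square-root estimates above. It assumes uniform, independent samples, as in the text; the function names are mine.

```python
import math

def p_shared(n: int, m: int = 365) -> float:
    """Probability that at least two of n uniform samples from m values coincide."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (m - i) / m   # the (i+1)-th sample avoids the i values already seen
    return 1.0 - p_distinct

def smallest_group(m: int = 365, target: float = 0.5) -> int:
    """Smallest n for which the chance of a shared value reaches the target."""
    n = 1
    while p_shared(n, m) < target:
        n += 1
    return n

for m in (365, 1_000_000):
    print(m, smallest_group(m),
          round(math.sqrt(2 * m), 1),                 # rough estimate sqrt(2*M)
          round(math.sqrt(2 * m * math.log(2)), 1))   # closer estimate sqrt(2*M*ln2)
```

For M = 365 the exact answer is 23, with the closer estimate landing just below it and the rough estimate a few samples above.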

Johnny's Fallacy with DNA matching

Another reason why the BP is puzzling is that people often mistake the question to be the following: how many people need to be gathered in a room such that the chance of someone sharing a birthday with me is at least 50%? In this case the person is confusing a 1-to-N matching problem (the person to the group) with the required N-to-N matching problem (the group to itself). Famed US talk show host Johnny Carson made this mistake (as related here, p.79), which we will refer to as Johnny's Fallacy. One night there were about 100 people in his studio audience, and he started to search for someone with the same birthday as him. He was disappointed when no one with his birthday turned up. In 100 people the probability that someone would have the same birthday as Johnny is just under 1/4, while the probability that some pair of people share a birthday is 0.9999, essentially a certainty.
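The gap between the two questions is easy to see numerically. This is a small sketch under the same uniform-birthday assumption, treating Johnny as separate from the 100 audience members; the function names are hypothetical.

```python
def p_match_with_me(n: int, m: int = 365) -> float:
    """1-to-N: chance that at least one of n people shares my birthday."""
    return 1.0 - ((m - 1) / m) ** n

def p_any_pair(n: int, m: int = 365) -> float:
    """N-to-N: chance that some pair among n people shares a birthday."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (m - i) / m
    return 1.0 - p_distinct

print(round(p_match_with_me(100), 3))   # Johnny's question, just under 1/4
print(p_any_pair(100))                  # the Birthday Paradox question, very close to 1

# How large an audience Johnny would have needed for an even chance
needed = next(n for n in range(1, 1000) if p_match_with_me(n) >= 0.5)
print(needed)
```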

Recently there was an occurrence of Johnny's Fallacy on the topic of identifying suspects using DNA testing. The episode was reported in the Freakonomics blog under the title Are the FBI's Probabilities About DNA Matches Crazy?, reporting on a piece from the Los Angeles Times, How reliable is DNA in identifying suspects?. In 2001, Kathryn Troyer, a crime lab analyst in Arizona, was running some tests on the state's DNA database and found two felons with remarkably similar genetic profiles. Remarkable in the sense that the men matched at nine of the 13 locations on chromosomes (or loci) commonly used to distinguish people, and that the FBI estimated the odds of such a match to be 1 in 113 billion. This is the 1-to-N probability of a DNA match. Since her initial discovery Troyer has found that, among about 65,000 felons, there were 122 pairs that matched at nine of 13 loci. The matches here are the result of the N-to-N probabilities. Johnny's Fallacy here would be to conclude that a search of the Arizona DNA database against the given DNA sample would return 122 matches.

Gradually Troyer's findings spread and raised doubts concerning the veracity of DNA testing. The FBI has declared that the Troyer searches (sounds like something from the X Files) are misleading and meaningless, which is not a particularly helpful assessment of Johnny's Fallacy. David Kaye, an expert on science and the law at Arizona State University, remarked that since people's lives are at stake based on DNA evidence, “It [the Troyer matches] has got to be explained.” Steven Levitt in the Freakonomics blog steps up to the plate to provide some numbers. He assumes that the likelihood of a match at a given locus is 7.5%, yielding the odds of a 13-loci match to be about 1 in 400 trillion, and a 9-loci match to be 1 in 13 billion. So a DNA database of 65,000 people yields over 2 billion potential matches, and about 100 expected matches on at least 9 loci.
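Levitt's figures can be roughly reproduced in a few lines. The sketch below takes his 7.5% per-locus figure and, as my own simplifying assumption, treats the 13 loci as independent so that the number of matching loci in a pair follows a binomial distribution; his exact method may differ, so only the order of magnitude should be compared.

```python
from math import comb

p_locus = 0.075                 # Levitt's assumed chance of a match at one locus
profiles = 65_000               # approximate size of the Arizona database

pairs = comb(profiles, 2)       # N*(N-1)/2 potential pairwise matches
odds_13 = 1 / p_locus ** 13     # 1-in-odds_13 for a match at all 13 loci
odds_9 = 1 / p_locus ** 9       # 1-in-odds_9 for a match at 9 specified loci

# Assumption: independent loci, so "at least 9 of 13" is a binomial tail
p_at_least_9 = sum(comb(13, k) * p_locus ** k * (1 - p_locus) ** (13 - k)
                   for k in range(9, 14))
expected_pairs = pairs * p_at_least_9

print(f"{pairs:,} pairs")
print(f"13 loci: 1 in {odds_13:.2e}, 9 loci: 1 in {odds_9:.2e}")
print(f"expected pairs matching on at least 9 loci: {expected_pairs:.0f}")
```

Under these assumptions the expected count comes out at the same order of magnitude as both Levitt's "about 100" and Troyer's 122 observed pairs.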

Levitt's numbers are examples to explain the fallacy, or actually a version of the BP where matching of a single personal trait has been substituted with a number of loci. If you Google "Kathryn Troyer" there are over 1,000 hits, and her results have generated a storm of controversy. Reading a few of the articles returned by Google shows that it will be some time before people fully understand the apparent paradox in this case.

As I mentioned in the introduction to this post, a senior manager recently told me that he used the BP as an example to get people thinking quantitatively in risk assessment workshops. Numbers come with their own logic - calculations can be surprising.
