A Bayesian analysis of Clinton’s 6 heads
Clinton recently won 6 coin flips during an Iowa caucus. On Facebook and in the news, I’ve only seen information about how unlikely this is – the chances of 6 heads are 1.56% with a fair coin.
Yes, 6 heads is unlikely, but these coin flips could have occurred by chance. I mean, on the Washington Post coin flip demo, I got all heads on my 5th try. Instead, it makes more sense to ask a different question: given that we observed these 6 heads, what are the chances this coin wasn’t fair?^{1}
This is a Bayesian approach: given our observations, what probabilities can we infer? This is not the classic frequentist approach that asks “how likely are my observations given all parameters of the model (the coin)?” This Bayesian approach makes complete sense when only the observations are known.
To formulate this problem as a probability problem, we’ll have to define some variables:
 $\theta$ is the probability of getting a heads.
 $f_i$ is the $i$th flip and is either 0 or 1. $f_i = 1$ with probability $\theta$.
 $Y = \sum_{i=1}^6 f_i$ is the number of heads we saw. In Clinton’s case, $Y = 6$. $Y$ is a binomial random variable.
Then, given the probability of a heads $\theta$, the probability of flipping $6$ heads with this binomial random variable is
$$\prob{Y = 6 \given \theta} = \binom{6}{6} \theta^6 (1 - \theta)^0 = \theta^6,$$
which is exactly what the Facebook posts and news articles focus on. In their analysis, they are given $\theta = 1/2$ and show how unlikely the result is: $(1/2)^6 \approx 1.56\%$. However, it could be that Clinton got 6 heads by chance – maybe she got lucky enough to be in the 1.56%?
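As a quick check of that 1.56% figure, here is a minimal Python sketch of the binomial probability. The helper name `binomial_pmf` is my own choice for illustration, not something from the articles.

```python
from math import comb

def binomial_pmf(k, n, theta):
    """Probability of exactly k heads in n flips when P(heads) = theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Six heads in six flips of a fair coin: (1/2)^6 = 1/64.
p_six_heads = binomial_pmf(6, 6, 0.5)
print(p_six_heads)  # 0.015625
```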
To answer the Bayesian question, we need a prior probability: we need to guess how likely it is that Clinton made the coin unfair, and by how much. We’re guessing at something, and that guess will tend to bias our result! This is a big deal; we can’t be certain our estimate is unbiased.
To do that, let’s take it that the probability of a heads $\theta$ has this probability density function (higher values in the graph below are considered more likely):
Here, you can play with two sliders that determine the shape of this beta distribution. At the risk of spoiling the rest of this post, when we take this prior there is a chance Clinton biased her coin.
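To get a feel for what such a prior looks like without the sliders, here is a small sketch that evaluates a beta density in plain Python. The function name `beta_pdf` and the choice $a = b = 5$ are assumptions for illustration only; they are not necessarily the values used in this post.

```python
from math import exp, lgamma, log

def beta_pdf(theta, a, b):
    """Density of a beta(a, b) distribution at theta, for 0 < theta < 1."""
    # Work in log space so the normalizing constant stays numerically stable.
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(theta) + (b - 1) * log(1 - theta))

# Hypothetical "probably fair" prior: a = b makes the density symmetric,
# and larger equal values concentrate it more tightly around 1/2.
a, b = 5, 5
for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(theta, round(beta_pdf(theta, a, b), 3))
```

With $a = b$ the density is highest at $\theta = 1/2$, which encodes the belief that the coin is probably close to fair.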
To do this, we’ll need to use Bayes’ rule,
$$\prob{\theta \given Y = k} = \frac{\prob{Y = k \given \theta} \, \prob{\theta}}{\prob{Y = k}},$$
but this is exceptionally easy because we chose that $\theta \sim \text{beta}(a, b)$. When we do this, we find that $\theta \given Y=k \sim \text{beta}(a + k, b + n - k)$, where $n = 6$ is the number of flips, because the beta distribution is a conjugate prior for the Bernoulli distribution.
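The conjugate update amounts to adding the observed heads and tails to the prior's two parameters. A minimal sketch, where the function name `beta_posterior` and the uniform beta(1, 1) prior are assumptions made for the example:

```python
def beta_posterior(a, b, heads, flips):
    """Update a beta(a, b) prior after seeing `heads` heads in `flips` flips."""
    tails = flips - heads
    return a + heads, b + tails

# Hypothetical uniform prior beta(1, 1), updated with 6 heads in 6 flips.
post_a, post_b = beta_posterior(1, 1, heads=6, flips=6)
print(post_a, post_b)  # 7 1
```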
After we do this, we have $\prob{\theta \given Y=k}$. However, we want to know how likely the coin was unfair, or the probability that $\theta > 1/2$ given $Y=6$. It turns out that this probability is just an integral over the posterior probability density function from 0.5 to 1, or
$$\prob{\theta > 1/2 \given Y=6} = \int_{1/2}^{1} \prob{\theta \given Y=6} \, d\theta.$$
Performing the calculation, $\prob{\theta > 1/2 \given Y=6} = 0.95$. That’s right: there’s a 95% chance Clinton’s coin was biased given the 6 heads we saw!
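For anyone who wants to try this integral with their own prior, here is a rough sketch using only the Python standard library and a simple midpoint rule. The function names and the uniform beta(1, 1) prior are assumptions for illustration; a different prior will give a different number, so this is not meant to reproduce the figures quoted in this post.

```python
from math import exp, lgamma, log

def beta_pdf(theta, a, b):
    """Density of a beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(theta) + (b - 1) * log(1 - theta))

def prob_theta_above_half(a, b, heads, flips, steps=100_000):
    """Midpoint-rule estimate of P(theta > 1/2 | data) under a beta(a, b) prior."""
    # Conjugate update: the posterior is beta(a + heads, b + tails).
    post_a, post_b = a + heads, b + (flips - heads)
    width = 0.5 / steps
    total = 0.0
    for i in range(steps):
        theta = 0.5 + (i + 0.5) * width  # midpoint of the i-th slice of [1/2, 1]
        total += beta_pdf(theta, post_a, post_b)
    return total * width

# Hypothetical uniform prior: the posterior is beta(7, 1), so the exact
# answer is 1 - (1/2)^7, roughly 0.992.
print(prob_theta_above_half(1, 1, heads=6, flips=6))
```

The midpoint rule is chosen here only because it needs no extra libraries; it assumes the posterior parameters are at least 1 so the density stays bounded on the interval.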
We can see that we have weak evidence for Clinton biasing her coin. When we assume that the coin is probably fair and that it was only biased by a small amount, the largest probability of $\theta>1/2$ is 85.2%. This is not a strong probability; we’re looking for at least a 95% chance that Clinton biased her coin to even put moderate faith in this belief.
Yes, we are asserting something is true before we prove it is true. Typically, there is a strong reason behind this. For example, we might know that only a small number of genes are important in a disease, and we typically enforce that. With this, I think it’s reasonable to assume that if the coin isn’t fair, it’s only unfair by a small amount.
The Washington Post article gives a similar conclusion after considering that there were other coin flips that Sanders won:
There were other coin tosses that emerged today which Sanders won – so, yes. Very slim.
I’m chalking this up to chance: Clinton got lucky that she flipped 6 heads. It looks like these 6 flips were performed with a fair coin.
If we wanted to formalize this method, we could take an even more scientific approach. We could use hypothesis testing with two hypotheses to find p-values and numbers for how likely it is that these 6 heads were generated under each hypothesis (null hypothesis: the coins are fair; alternative hypothesis: the coins are not fair).
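One way to make that concrete is an exact binomial test against the null hypothesis $\theta = 1/2$. The sketch below is only one possible formalization, not the analysis done in this post; the function names are my own, and the two-sided p-value here sums the probability of every outcome that is no more likely than the observed one.

```python
from math import comb

def binomial_pmf(k, n, theta):
    """Probability of exactly k heads in n flips when P(heads) = theta."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def two_sided_p_value(heads, flips, theta0=0.5):
    """Exact binomial test p-value under the null hypothesis P(heads) = theta0."""
    observed = binomial_pmf(heads, flips, theta0)
    p_value = 0.0
    for k in range(flips + 1):
        p = binomial_pmf(k, flips, theta0)
        # Small tolerance so outcomes tied with the observed one are counted.
        if p <= observed * (1 + 1e-12):
            p_value += p
    return p_value

# Under a fair coin, only 0 heads and 6 heads are as unlikely as what we saw.
print(two_sided_p_value(6, 6))  # 0.03125
```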

^{1} If we were really testing to see if the coin was unfair, it’d make more sense to do hypothesis testing.