The procedure for producing an unbiased coin from a biased one is usually attributed to von Neumann (who did enormous work in mathematics and many related fields). The procedure is very simple:
- Toss the coin twice.
- If the results match, start over, forgetting both results.
- If the results differ, use the first result, forgetting the second.
This algorithm works because the probability of getting HT is p(1-p), which is exactly the same as the probability of getting TH, (1-p)p. Thus the two outcomes are equally likely regardless of the bias (for example, with p = 0.7 both probabilities equal 0.7 * 0.3 = 0.21).
I am also reading this book, and it asks for the expected running time. The probability that two tosses differ is z = 2p(1-p), so the number of rounds is geometrically distributed and the expected number of rounds is 1/z (each round uses two tosses, so the expected number of tosses is 2/z).
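To sanity-check this claim, here is a small simulation sketch; the function name rounds_until_unbiased, the bias p = 0.9, and the number of trials are my own choices, not part of the original answer:

import random

def biased(p):
    # biased coin: 1 with probability p, 0 otherwise
    return 1 if random.random() < p else 0

def rounds_until_unbiased(p):
    # count the rounds (pairs of tosses) until the two tosses differ
    rounds = 1
    while True:
        n1, n2 = biased(p), biased(p)
        if n1 != n2:
            return rounds
        rounds += 1

p = 0.9
z = 2 * p * (1 - p)
trials = 100000
avg = sum(rounds_until_unbiased(p) for _ in range(trials)) / trials
print("empirical average rounds:", avg, "predicted 1/z:", 1 / z)

For p = 0.9 the empirical average should come out close to 1/z = 1/0.18, i.e. about 5.6 rounds.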
The previous example looks encouraging (after all, if you have a badly biased coin with p = 0.99, then z = 2 * 0.99 * 0.01 ≈ 0.0198, so you will need approximately 50 rounds, or about 100 tosses, which is not that many). So you might think that this is an optimal algorithm. Sadly, it is not.
Here is how it compares with Shannon's theoretical bound (the image is taken from this answer). It shows that the algorithm is good, but far from optimal.
You can come up with an improvement if you consider that HHTT is discarded by this algorithm even though it has the same probability as TTHH. So you can stop there and return H (and return T for TTHH). The same goes for HHHHTTTT and so on. Using these cases improves the expected running time, but does not make it theoretically optimal; a sketch of the first level of this idea follows.
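Here is how that first level might look in code; the function name unbiased_improved is my own, and this sketch exploits only the HHTT/TTHH case, not the deeper HHHHTTTT-style levels:

import random

def biased(p):
    # biased coin: 1 with probability p, 0 otherwise
    return 1 if random.random() < p else 0

def unbiased_improved(p):
    while True:
        a1, a2 = biased(p), biased(p)
        if a1 != a2:
            return a1  # plain von Neumann step on the first pair
        b1, b2 = biased(p), biased(p)
        if b1 != b2:
            return b1  # plain von Neumann step on the second pair
        # Both pairs matched. HHTT and TTHH are equally likely,
        # so the values of the two pairs act as a fair second-level toss.
        if a1 != b1:
            return a1
        # HHHH or TTTT: this sketch discards everything and starts over

This returns H for HHTT and T for TTHH instead of throwing those four tosses away, which is where the saving comes from.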
And finally, the Python code:
import random

def biased(p):
    # simulate a biased coin: 1 with probability p, 0 otherwise
    return 1 if random.random() < p else 0

def unbiased_from_biased(p):
    # von Neumann's trick: toss twice until the results differ,
    # then return the first result
    n1, n2 = biased(p), biased(p)
    while n1 == n2:
        n1, n2 = biased(p), biased(p)
    return n1

p = random.random()
print(p)
tosses = [unbiased_from_biased(p) for _ in range(1000)]
n_1 = sum(tosses)
n_2 = len(tosses) - n_1
print(n_1, n_2)
It is pretty self-explanatory, and here is an example result:
0.0973181652114
505 495
As you can see, even though we had a bias of 0.097, we got approximately the same number of 1s and 0s.