Using python to analyse coin tossing statistics

Question

I have been learning python programming on edX which is a very good course and I can so far fully recommend. Having just watched a TED talk on Statistics I thought great, a simple way of exercising the python skills I have picked up on a real world scenario. The guy gave an example on probability of continually flipping a coin and looking out for two recurring sequences, which he explained, you would think had the same probability of happening which he claimed in fact don't. Put simply he claims the sequence Heads Tails Heads is more likely to occur than Heads Tails Tails as at the end of the first sequence you are are already one third towards repeating the sequence again where at the end of the second sequence you then have to toss a further head to begin the sequence again. This makes perfect sense, so I set about trying to prove it with my small python program shown here.

import random

HTH = 0
HTT = 0
myList = []
i = 0
numberOfTosses = 1000000

while i < numberOfTosses:
    myList.append(random.randint(0,1))
    i += 1

for i in range (len(myList)):

    if i+2 >= len(myList):
        break

    if myList[i] == 1 and myList[i+1] == 0 and myList[i+2] == 1:
        HTH +=1

    if myList[i] == 1 and myList[i+1] == 0 and myList[i+2] == 0:
        HTT +=1

print 'HTT :' ,numberOfTosses, HTT, numberOfTosses/HTT
print 'HTH :' ,numberOfTosses, HTH, numberOfTosses/HTH

So I have run the program many times and changed the max iteration value higher and higher, yet cannot seem to prove his claim that on average the HTH sequence should happen evey 8 tosses and the HTT sequence every 10, as it would seem that I get on average balanced results either way. So my question is where have I gone wrong in my implementation of the problem?

I think the guy's claim is bogus, unless he is saying you are more likely to get HTH for a small number of tosses. In order to get more mileage out of HTH than HTT, for the reason he says, you would also have to get more occurrences of HTHT than HTTH. But if you apply his same logic, the HTTH already has the beginning of the sequence at the end, whereas you have to start all over again with HTHT. — Markku K.
@MarkkuK. Actually, by the time you have HTHT, you already have the first two letters of the next HTHT. For HTTH, you only have the first letter. — Matt Parker
@MattParker, that is true, I was applying what the guy said too narrowly. However, stats for HTHT vs HTTH come out roughly equal as well, at least using the method here. — Markku K.
@MarkkuK. Agreed - I think your point about the total number of flips is probably at the heart of it. HTH has a clear advantage when N = 5 (as EducateMe points out below), probably at 10 and 15 too, but it's probably gone by a million... — Matt Parker

mgkrebbs mgkrebbs · Accepted Answer · 2014-02-13T19:13:40

Your expert is right, and your code for what you stated he said is right, but he actually said something else. He says that when you start flipping coins, you should expect to see HTT first come up in an average of 8 flips, and HTH first come up in an average of 10 flips.

If you revise your program to test that assertion, it might look like this:

import random

HTH = 0
HTT = 0
numberOfTrials = 10000

for t in xrange( numberOfTrials ):
    myList = [ random.randint(0,1), random.randint(0,1), random.randint(0,1) ]
    flips = 3
    HTHflips = HTTflips = 0

    while HTHflips == 0 or HTTflips == 0:
        if HTHflips == 0 and myList[flips-3:flips] == [1,0,1]:
            HTHflips = flips
        if HTTflips == 0 and myList[flips-3:flips] == [1,0,0]:
            HTTflips = flips
        myList.append(random.randint(0,1))
        flips += 1

    HTH += HTHflips
    HTT += HTTflips


print 'HTT :', numberOfTrials, HTT, float(HTT)/numberOfTrials
print 'HTH :', numberOfTrials, HTH, float(HTH)/numberOfTrials

Running that will confirm the expected values of 8 and 10 tosses.

Using python to analyse coin tossing statistics

5 Answers