I use 538's Riddler to practice code. Wrote a simple simulation in Python but had to add another nested loop to get the mean of means in order to reduce output variance. Tried to run it, but after 45 minutes I stopped it, thinking there must be a way to improve the efficiency of the code.
For context the problem is: You have a radio and play 100 songs per day. How big must the playlist be for the probability of playing the same song to be equal to 50%.
My approach is to increment the size of the playlist (starting at 7000) with 1 until the grand mean of the mean probability of a re-play is equal to 50%, using 1000 for both sample sizes and number of samples.
The code is:
import random
playlist = 7000
chance_of_replay = []
sample = 1000
mean_chance_of_replay = 0
replays = 0
temp_sum = 0
while mean_chance_of_replay > 0.5 or mean_chance_of_replay == 0.0:
playlist += 1
for j in range(0, sample):
for i in range(1, sample + 1):
songs_to_play = 100
songs_played = []
while songs_to_play > 0:
song_pick = random.randint(1, playlist + 1)
if song_pick not in songs_played:
songs_played.append(song_pick)
songs_to_play -= 1
else:
replays += 1
break
chance_of_replay.insert(j, (replays / sample))
replays = 0
for element in chance_of_replay:
temp_sum = temp_sum + element
mean_chance_of_replay = temp_sum/sample
print(playlist)