0 votes

I use 538's Riddler to practice coding. I wrote a simple simulation in Python, but I had to add another nested loop to take a mean of means in order to reduce the output variance. When I tried to run it, I stopped it after 45 minutes, figuring there must be a way to improve the efficiency of the code.

For context, the problem is: a radio station plays 100 songs per day. How large must its playlist be for the probability of playing the same song twice in one day to equal 50%?

My approach is to increment the playlist size (starting at 7000) by 1 until the grand mean of the per-sample replay probabilities reaches 50%, using 1000 for both the sample size and the number of samples.

The code is:

import random

playlist = 7000          # starting playlist size
chance_of_replay = []    # one replay-frequency estimate per outer sample
sample = 1000            # both the sample size and the number of samples
mean_chance_of_replay = 0
replays = 0
temp_sum = 0

while mean_chance_of_replay > 0.5 or mean_chance_of_replay == 0.0:

    playlist += 1

    # outer loop: gather `sample` estimates of the replay probability
    for j in range(0, sample):

        # inner loop: simulate `sample` days for one estimate
        for i in range(1, sample + 1):

            songs_to_play = 100
            songs_played = []

            # play until the day ends or the first replay occurs
            while songs_to_play > 0:

                # randint's bounds are inclusive, so this picks from playlist + 1 song IDs
                song_pick = random.randint(1, playlist + 1)

                if song_pick not in songs_played:
                    songs_played.append(song_pick)
                    songs_to_play -= 1
                else:
                    replays += 1
                    break

        chance_of_replay.insert(j, (replays / sample))
        replays = 0

    for element in chance_of_replay:
        temp_sum = temp_sum + element

    mean_chance_of_replay = temp_sum / sample

print(playlist)

2 Answers

0 votes

Before looking at performance issues in your code, there is a bigger problem to fix first: the code is stuck in an infinite loop.

The list chance_of_replay is never cleared and the variable temp_sum is never reset to 0. Because of that, mean_chance_of_replay keeps growing from one playlist size to the next, the loop condition never becomes false, and the code runs forever.

After fixing those two logical errors, you should start worrying about performance optimization.
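
For instance, a minimal sketch of the fix, reusing your variable names and resetting both accumulators each time the playlist grows:

while mean_chance_of_replay > 0.5 or mean_chance_of_replay == 0.0:

    playlist += 1
    chance_of_replay = []    # clear the previous playlist size's estimates
    temp_sum = 0             # restart the running sum from zero

    # ... rest of the loop body unchanged ...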

0
votes

The chance of playing the same song twice decreases as the playlist size goes up. Have you considered that the chance of a replay at a playlist size of 7000 might already be below 50%? If that is the case, then checking any higher values will just produce smaller percentages, so you will never find your answer.
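
You can test that hypothesis analytically before simulating anything: this is the birthday problem, so the probability of at least one replay in a day of 100 plays from a playlist of n songs is 1 - (1 - 0/n)(1 - 1/n)···(1 - 99/n). A quick sketch (the helper function is mine, not part of your code):

def replay_probability(n, plays=100):
    # multiply up the probability that every pick is a new song
    p_no_replay = 1.0
    for k in range(plays):
        p_no_replay *= (n - k) / n    # k songs used so far, n - k still fresh
    return 1 - p_no_replay

print(replay_probability(7000))    # about 0.51

So at 7000 the probability is in fact still just above 50%, and it first drops below 50% at about n = 7175; a scan starting at 7000 would terminate, just slowly.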

If you want to do simulation (as opposed to the pure-mathematical approach sketched above), the main optimisation I can find is that the list membership tests and inserts are a performance killer when done many times: song_pick not in songs_played scans the whole list on every pick, and the repeated inserts add overhead on top of that. What I did instead was create a list of Boolean values recording whether each song has been played. Checking and marking a given song is then a single constant-time index operation, with no list scans or insertions at all.

Here's the code:

from random import randint

playlist_size = 1
samples = 1000
songs_per_sample = 100

while True:
    replays = 0

    for _ in range(samples):
        # songs_played[i] is True once song i has been played this day
        songs_played = [False] * playlist_size

        for _ in range(songs_per_sample):
            song_to_play_index = randint(0, playlist_size - 1)

            if songs_played[song_to_play_index]:
                replays += 1

            songs_played[song_to_play_index] = True

    # fraction of all individual plays that were replays
    replay_chance = replays / (samples * songs_per_sample)

    if replay_chance <= 0.5:
        break

    playlist_size += 1

print(playlist_size)

Running this actually gives a surprising answer, way below 7000! The reason is that this code estimates the fraction of individual plays that are replays, whereas the original question asks for the probability that a day contains at least one replay, so the two thresholds are crossed at very different playlist sizes.
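
If you want to estimate the per-day probability instead, so that the simulation matches the original problem, a minimal variation of the loop body (a hedged sketch, reusing the names above) is to break on the first replay of each simulated day and count days rather than plays:

replays = 0

for _ in range(samples):
    songs_played = [False] * playlist_size

    for _ in range(songs_per_sample):
        song_to_play_index = randint(0, playlist_size - 1)

        if songs_played[song_to_play_index]:
            replays += 1    # count the day as a replay day
            break           # one replay is enough for this day

        songs_played[song_to_play_index] = True

# fraction of simulated days containing at least one replay
replay_chance = replays / samples

With that change (and a starting playlist_size near the analytic estimate rather than 1), the estimate tracks the birthday-problem probability, which crosses 50% at about 7175 songs; with only 1000 samples per step, though, sampling noise will make the exact stopping point wander.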