
I am fairly new to Python and I have been trying to find a fast way to find all primes up to a given number.

When I use the Sieve of Eratosthenes with the following code:

#Finding primes till 40000.
import time
start = time.time()
def prime_eratosthenes(n):
    list = []
    prime_list = []
    for i in range(2, n+1):
        if i not in list:
            prime_list.append(i)
            for j in range(i*i, n+1, i):
                list.append(j)
    return prime_list

lists = prime_eratosthenes(40000)
print lists
end = time.time()
runtime = end - start
print "runtime =",runtime

Along with the list containing the primes, I get a line like the one below as output:

runtime = 20.4290001392

Depending on the available RAM and other factors, the value I get is fairly consistent, varying within a range of about ±0.5 seconds.

However, when I try to find the primes up to 40000 using a brute force method, as in the following code:

import time
start = time.time()
prime_lists = []
for i in range(1,40000+1):
    for j in range(2,i):
        if i%j==0:
            break
    else:
        prime_lists.append(i)
print prime_lists
end = time.time()
runtime = end - start
print "runtime =",runtime

This time, along with the list of primes, I get a smaller value for the runtime:

runtime = 16.0729999542

Again, the value only varies within a range of about ±0.5 seconds.

Clearly, the sieve is slower than the brute force method.

I also observed that the difference between the two runtimes only grows as the value of n, up to which primes are to be found, increases.

Can anyone give a logical explanation for the behavior described above? I expected the sieve to be more efficient than the brute force method, but it seems to work the other way around here.

You should use a set() for your non-prime values (and don't call variables built-in names like list). – Daniel Roseman
Appending to a list is a potentially expensive operation due to all the reallocations. Preallocating an array and "striking them off" will be much faster. Also, as Daniel says, "if i in a_list" is expensive too. – Max
Understood. However, since both sets of code use the append function with a list, should that not make them equally slow? – hashes
The Sieve of Eratosthenes starts with a complete list of the first n integers (from 2 upwards) and eliminates the multiples of each remaining factor, so each iteration does less work the further you get into the algorithm. – Minn

1 Answer


While appending to a list is not the best way to implement this algorithm (the original algorithm uses a fixed-size array), appends are amortized constant time. The bigger issue is "if i not in list", which does a linear scan of the list on every iteration. The most effective change for larger inputs is having the outer for loop only check up to sqrt(n), which saves a lot of computation.
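For example, keeping the struck-off numbers in a set makes the membership test average constant time. A minimal sketch of that change to the question's sieve (the name sieve_with_set and the i * i <= n guard are mine, not from the question):

```python
def sieve_with_set(n):
    composites = set()              # struck-off numbers; 'in' is O(1) on average
    primes = []
    for i in range(2, n + 1):
        if i not in composites:
            primes.append(i)
            if i * i <= n:          # only mark multiples while i*i is in range
                for j in range(i * i, n + 1, i):
                    composites.add(j)
    return primes
```

Only the set and the i * i bound differ from the question's version; the rest of the structure is unchanged.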

A better approach is to keep a boolean array that records which numbers have been struck off, as in the Wikipedia article on the sieve. That way, checking whether a number is already eliminated is a constant-time array access.

For example:

def sieve(n):
    nums = [0] * n                       # 0 = still presumed prime, 1 = struck off
    for i in range(2, int(n**0.5)+1):
        if nums[i] == 0:                 # i has not been struck off, so it is prime
            for j in range(i*i, n, i):
                nums[j] = 1              # strike off the multiples of i

    return [i for i in range(2, n) if nums[i] == 0]

So to answer your question: the linear membership test inside your two for loops makes your sieve do potentially O(n^2) work, while the boolean-array version with the sqrt(n) outer loop runs in O(n log log n) time, which is nearly linear in practice.
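To see the cost of the linear membership test in isolation, here is a rough timeit comparison (the sizes and repeat counts are arbitrary choices for illustration):

```python
import timeit

n = 20000
a_list = list(range(n))
a_set = set(a_list)

# Looking up a value that is absent: the list is scanned end to end,
# while the set does a single hash probe.
list_time = timeit.timeit(lambda: n in a_list, number=1000)
set_time = timeit.timeit(lambda: n in a_set, number=1000)
```

On any reasonable machine, set_time comes out far smaller than list_time, which is exactly the gap that dominates the question's sieve.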