9 votes

Following M. O'Neill's great paper, The Genuine Sieve of Eratosthenes, I tried implementing some lazy, infinite versions of the Sieve of Eratosthenes in Python. I was surprised to find that the heap-based version, which the paper claims should run faster, was actually over two times slower for me.

The paper contains two examples, one based on a dict, which I have translated (from Haskell) thus:

from itertools import count
def dict_sieve():
    yield 2
    yield 3
    candidates = count(5, 2)
    composites = {9:{3}}  # map composites to their prime factors

    for candidate in candidates:
        try:
            factors = composites.pop(candidate)
        except KeyError:  # if it's not in the dict, it's prime
            yield candidate
            composites[candidate**2] = {candidate}  # start crossing off at prime**2; smaller multiples have a smaller prime factor
        else:
            for prime in factors:  # go through the prime factors and increment their keys
                try:
                    composites[candidate+prime*2].add(prime)  # use prime*2 because we are ignoring evens
                except KeyError:
                    composites[candidate+prime*2] = {prime}
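
A quick sanity check that the generator starts off as expected:

from itertools import islice
print(list(islice(dict_sieve(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]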

The second example in the paper demonstrates the use of a priority queue as the data structure. It also uses lazy lists rather than a simple increment, which I have not done, in the interests of a fair test. (Besides, when I used instances of itertools.count as my lazy lists, I found it ran slightly slower; a sketch of that variant follows the code below.)

from itertools import count
from heapq import heappush, heapreplace
def heap_sieve():
    yield 2
    yield 3
    candidates = count(5, 2)
    composites = [(9, 3)]  # a priority queue of (composite, prime factor) pairs, ordered by composite

    for candidate in candidates:
        prime_flag = True
        while composites[0][0] == candidate:  # loop because there may be duplicates
            prime_flag = False  # signal to the if statement below
            composite, prime = composites[0]
            heapreplace(composites, (composite + prime*2, prime))

        if prime_flag:
            yield candidate
            heappush(composites, (candidate**2, candidate))
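
For reference, the lazy-list variant mentioned above can be sketched like this (an illustration of the idea, not the exact code I timed): each heap entry carries its own itertools.count stream of multiples, and the prime is kept in the tuple as a tie-breaker, since two entries can share a key and iterators are not comparable.

from itertools import count
from heapq import heappush, heapreplace
def heap_sieve_lazy():
    yield 2
    yield 3
    # entries are (next_multiple, prime, stream); the prime breaks ties so
    # equal keys never force a comparison of the unorderable count objects
    composites = [(9, 3, count(15, 6))]

    for candidate in count(5, 2):
        prime_flag = True
        while composites[0][0] == candidate:  # again, there may be duplicates
            prime_flag = False
            _, prime, stream = composites[0]
            heapreplace(composites, (next(stream), prime, stream))

        if prime_flag:
            yield candidate
            stream = count(candidate**2 + candidate*2, candidate*2)  # odd multiples past prime**2
            heappush(composites, (candidate**2, candidate, stream))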

I timed the two, along with an 'eager' version, not reproduced here exactly, which produces a list of all primes below a limit (a representative sketch follows the timings):

In [44]: from itertools import islice

In [45]: %timeit list(islice(dict_sieve(), 100000))
    ...: %timeit list(islice(heap_sieve(), 100000))
    ...: %timeit eager_sieve(1299710)  # 1299709 is the 100,000th prime
    ...: 
1 loops, best of 3: 2.12 s per loop
1 loops, best of 3: 4.94 s per loop
1 loops, best of 3: 677 ms per loop
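
For completeness, the eager version was a straightforward bounded sieve. A representative sketch (it matches the eager_sieve(limit) call above, but is not necessarily line-for-line what I timed):

def eager_sieve(limit):
    """Return a list of all primes below limit."""
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit**0.5) + 1):
        if is_prime[n]:
            # cross off multiples of n, starting at n**2
            is_prime[n*n::n] = [False] * len(range(n*n, limit, n))
    return [n for n, flag in enumerate(is_prime) if flag]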

It's not surprising that the 'eager' version is much quicker: it's basically a trade-off between memory use, the inconvenience of having to specify an upper limit up front, and CPU time. However, I did find it surprising that the heapq version is so much slower when the paper claims it to be more efficient. Is it a problem with my implementation? Or is it simply that, as we all know, dicts are super-fast (and heapq is comparatively slow)?

@senderle Thanks for the edit. I've been spelling his name wrong my entire life without realising! – Benjamin Hodgson
Yeah, it was a pretty minor edit but seemed worthwhile, as it's an easy name to misspell. I had to double-check the spelling myself! – senderle

1 Answer

7 votes

Actually, a dictionary-based approach should be expected to be faster than a heap-queue-based approach. Heap insertion and replacement operations are O(log n), while dictionary insertion and replacement operations are O(1) on average.
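
You can see the difference in isolation with a rough micro-benchmark along these lines (a sketch; absolute numbers will vary by machine and Python version):

import timeit

setup = "from heapq import heappush; import random; data = random.sample(range(10**7), 10**5)"
heap_stmt = "h = []\nfor x in data: heappush(h, x)"  # O(log n) per insertion
dict_stmt = "d = {}\nfor x in data: d[x] = None"     # O(1) average per insertion

print(timeit.timeit(heap_stmt, setup=setup, number=10))
print(timeit.timeit(dict_stmt, setup=setup, number=10))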

Indeed, I was surprised to hear that the author of the paper was claiming otherwise. But in fact that's not the case. You've assumed that a Data.Map is implemented as a hash map, but actually it's a size-balanced binary tree. So its performance characteristics are quite similar to those of a heap queue. The difference is that retrieving the minimum key from a heap queue is O(1) (it's simply the first item), which speeds up parts of the sieve code; but a hash map is faster still.