How to improve the performance of this Haskell program?

Question

I'm working through the problems in Project Euler as a way of learning Haskell, and I find that my programs are a lot slower than a comparable C version, even when compiled. What can I do to speed up my Haskell programs?

For example, my brute-force solution to Problem 14 is:

import Data.Int
import Data.Ord
import Data.List

searchTo = 1000000

nextNumber :: Int64 -> Int64
nextNumber n
    | even n    = n `div` 2
    | otherwise = 3 * n + 1

sequenceLength :: Int64 -> Int
sequenceLength 1 = 1
sequenceLength n = 1 + (sequenceLength next)
    where next = nextNumber n

longestSequence = maximumBy (comparing sequenceLength) [1..searchTo]

main = putStrLn $ show $ longestSequence

This takes around 220 seconds, while an "equivalent" brute-force C version takes only 1.2 seconds:

#include <stdio.h>

int main(int argc, char **argv)
{
    int longest = 0;
    int terms = 0;
    int i;
    unsigned long j;

    for (i = 1; i <= 1000000; i++)
    {
        j = i;
        int this_terms = 1;

        while (j != 1)
        {
            this_terms++;

            if (this_terms > terms)
            {
                terms = this_terms;
                longest = i;
            }

            if (j % 2 == 0)
                j = j / 2;
            else
                j = 3 * j + 1;
        }
    }

    printf("%d\n", longest);
    return 0;
}

What am I doing wrong? Or am I naive to think that Haskell could even approach C's speed?

(I'm compiling the C version with gcc -O2, and the Haskell version with ghc --make -O).

Your unsigned long may be only 32 bits wide. For a fair comparison, use unsigned long long or uint64_t. — kennytm
@KennyTM - fair point - I was testing on 32-bit Ubuntu where a long happens to be 64-bits. — stusmith
@stusmith: Are you sure about that? I could have sworn that sizeof(long) is 4 with gcc on a 32 bit platform. — sepp2k
@stusmith: Linux uses ILP32 and LP64, which means that an int is always 32 bit, a long long is always 64 bit (although I believe there were some discussions about moving to 128 bit for DEC Alpha CPUs) and a long is always the same as a pointer. So, if you're running on 32 bit Linux, then your Haskell ints are indeed twice the size. — Jörg W Mittag

kennytm · Accepted Answer · 2010-09-05T14:56:48

For testing purposes I set searchTo = 100000. The time taken is 7.34s. A few modifications lead to big improvements:

Use an Integer instead of Int64. This improves the time to 1.75s.

Use an accumulator (you don't need sequenceLength to be lazy, right?). 1.54s.

seqLen2 :: Int -> Integer -> Int
seqLen2 a 1 = a
seqLen2 a n = seqLen2 (a+1) (nextNumber n)

sequenceLength :: Integer -> Int
sequenceLength = seqLen2 1
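
As a possible variation on the accumulator version above, the counter can be forced explicitly with a bang pattern. This is only a sketch: the name seqLenStrict and the use of the BangPatterns extension are assumptions that are not part of the original answer, and with -O GHC may well infer the same strictness on its own, so no speed-up is claimed here.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Collatz step as in the question, but on Integer (per the first suggestion).
nextNumber :: Integer -> Integer
nextNumber n
    | even n    = n `div` 2
    | otherwise = 3 * n + 1

-- Tail-recursive length with an explicitly strict counter.
-- seqLenStrict is a hypothetical name used only for this sketch.
seqLenStrict :: Int -> Integer -> Int
seqLenStrict !a 1 = a
seqLenStrict !a n = seqLenStrict (a + 1) (nextNumber n)

sequenceLength :: Integer -> Int
sequenceLength = seqLenStrict 1

main :: IO ()
main = print (map sequenceLength [1, 2, 3, 27])
```

Because the recursive call is in tail position and the counter is evaluated at each step, no chain of pending (1 + ...) additions builds up, unlike in the original sequenceLength.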

Rewrite nextNumber using quotRem, thus avoiding computing the division twice (once in even and once in div). For odd n = 2q + 1, 3n + 1 = 6q + 4, so the odd branch can be computed directly from the quotient. 1.27s.

nextNumber :: Integer -> Integer
nextNumber n 
    | r == 0    = q
    | otherwise = 6*q + 4
    where (q,r) = quotRem n 2
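
To be sure the quotRem rewrite computes the same function as the original, the two definitions can be compared over a range of inputs. The following is a minimal sketch; the names nextNumberDiv, nextNumberQR, and agreeUpTo are made up for this check and do not appear in the original code.

```haskell
module Main where

-- Definition from the question, on Integer.
nextNumberDiv :: Integer -> Integer
nextNumberDiv n
    | even n    = n `div` 2
    | otherwise = 3 * n + 1

-- quotRem version: for odd n = 2q + 1, 3n + 1 = 6q + 4.
nextNumberQR :: Integer -> Integer
nextNumberQR n
    | r == 0    = q
    | otherwise = 6 * q + 4
    where (q, r) = quotRem n 2

-- True when both definitions agree on every n in [1 .. limit].
agreeUpTo :: Integer -> Bool
agreeUpTo limit = all (\n -> nextNumberDiv n == nextNumberQR n) [1 .. limit]

main :: IO ()
main = print (agreeUpTo 100000)
```

The check only covers positive inputs, which is all this problem needs.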

Use a Schwartzian transform instead of maximumBy. The problem with maximumBy . comparing is that sequenceLength is called more than once for each value. 0.32s.
```
longestSequence = snd $ maximum [(sequenceLength a, a) | a <- [1..searchTo]]
```
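
Putting the steps above together, a complete program might look like the sketch below. It assumes searchTo = 100000 as in the timings, and it introduces a helper longestUpTo (a hypothetical name, not in the original) so that the search limit can be varied. The timings quoted in this answer were not re-measured for this exact file.

```haskell
module Main where

searchTo :: Integer
searchTo = 100000

-- One Collatz step, using quotRem so the division happens once.
nextNumber :: Integer -> Integer
nextNumber n
    | r == 0    = q
    | otherwise = 6 * q + 4
    where (q, r) = quotRem n 2

-- Accumulator-based sequence length.
seqLen2 :: Int -> Integer -> Int
seqLen2 a 1 = a
seqLen2 a n = seqLen2 (a + 1) (nextNumber n)

sequenceLength :: Integer -> Int
sequenceLength = seqLen2 1

-- Decorate each start value with its length, take the maximum pair,
-- then drop the length again (the Schwartzian transform).
-- longestUpTo is an illustrative helper, not part of the original answer.
longestUpTo :: Integer -> Integer
longestUpTo limit = snd $ maximum [(sequenceLength a, a) | a <- [1 .. limit]]

longestSequence :: Integer
longestSequence = longestUpTo searchTo

main :: IO ()
main = print longestSequence
```

Since each start value is paired with its length exactly once, sequenceLength runs once per value. Pairs compare by length first, so ties are broken in favour of the larger start value.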

Note:

I measured the time by compiling with ghc -O and running with +RTS -s.
My machine runs Mac OS X 10.6. The GHC version is 6.12.2. The compiled file targets the i386 architecture.
The C program runs in 0.078s with the corresponding parameter. It is compiled with gcc -O3 -m32.
