8
votes

I am wondering how Haskell pattern matching is resolved internally and how this impacts performance. Say we have a function that is expensive to compute, so we precompute the first and/or most frequently used values and pattern match on the corresponding inputs before falling through to the actual computation:

expensiveComputation :: Int -> Int
expensiveComputation 1 = 101
expensiveComputation 2 = 3333333
expensiveComputation 3 = 42
-- ... more precomputed cases ...
expensiveComputation n = ...  -- the actual expensive computation

We could precompute a lot of these cases, thousands if we wanted, but should we? I'm assuming it takes time for Haskell to find the pattern that matches the input, so maybe it is actually quicker to compute, say, the 1000th value than to go through 1000 pattern matches?
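One way to check this empirically would be a micro-benchmark. Here is a minimal sketch using the criterion library; the function bodies, the number of precomputed equations, and the benchmark inputs are placeholders for illustration (a real test would list many more equations):

    import Criterion.Main

    -- Variant with a few precomputed equations followed by the general case;
    -- `n * n` stands in for the real expensive computation.
    precomputed :: Int -> Int
    precomputed 1 = 101
    precomputed 2 = 3333333
    precomputed 3 = 42
    precomputed n = n * n

    -- Variant that always computes directly.
    direct :: Int -> Int
    direct n = n * n

    main :: IO ()
    main = defaultMain
      [ bench "precomputed, miss" $ whnf precomputed 1000
      , bench "precomputed, hit"  $ whnf precomputed 2
      , bench "direct"            $ whnf direct 1000
      ]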

Pattern matches are conceptually resolved by trying the equations in order, but a compiler is free to optimize this (and given that there is a clear structure in the numbers, most will). – Willem Van Onsem
Another option is to use memoization to "precompute" (rather, compute on first use only). data-memocombinators has a combinator arrayRange which will memoize a certain range of inputs into a flat array with constant-time lookup, which is basically what you are doing manually here. – luqui
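For reference, a minimal sketch of what that comment suggests, using Data.MemoCombinators.arrayRange from data-memocombinators; the range and the `n * n` body are placeholders for illustration:

    import qualified Data.MemoCombinators as Memo

    -- Results for inputs 0..1000 are cached in a flat array and computed on
    -- first use; inputs outside that range fall back to the plain computation.
    expensiveComputation :: Int -> Int
    expensiveComputation = Memo.arrayRange (0, 1000) go
      where
        go n = n * n  -- placeholder for the real expensive computation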

1 Answer

15
votes

I made a simple test of the form:

f 0 = 4
f 1 = 5
...

And compiled it with ghc -O2 -ddump-asm -c. The key pieces of the generated assembly are:

What appear to be implementations of each top-level equation:

_c2aD:
        movl $28,%ebx    ; N.B. the constant 28 matches my largest value
        jmp *(%rbp)
_c2aC:
        movl $27,%ebx
        jmp *(%rbp)
_c2aB:
        movl $26,%ebx
        jmp *(%rbp)
....

What appears to be a jump table referencing the above implementations:

_n2aN:
        .quad   _c2af-(_n2aN)+0
        .quad   _c2ag-(_n2aN)+0
        .quad   _c2ah-(_n2aN)+0
        .quad   _c2ai-(_n2aN)+0
        .quad   _c2aj-(_n2aN)+0
        .quad   _c2ak-(_n2aN)+0
        ...

What appears to be a pair of range guards followed by a dispatch:

_c2aE:
        cmpq $25,%r14        ; N.B. the last defined input is 24
        jge _c2ae
_u2aH:
        testq %r14,%r14
        jl _c2ae
_u2aI:
        leaq _n2aN(%rip),%rax
        addq (%rax,%r14,8),%rax
        jmp *%rax
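
Semantically, that guard-plus-dispatch sequence amounts to a bounds check followed by a constant-time table lookup. A rough Haskell sketch of the equivalent logic (the table contents follow from the constants noted above; the fallback is a placeholder, not what GHC actually emits):

    import Data.Array

    -- Results for the 25 defined inputs 0..24, analogous to the jump table.
    table :: Array Int Int
    table = listArray (0, 24) [4 .. 28]

    f :: Int -> Int
    f n
      | n >= 0 && n <= 24 = table ! n   -- the two range guards, then one indexed lookup
      | otherwise         = fallback n  -- whatever the catch-all case does
      where
        fallback = error "placeholder for the fall-through case"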

So do I think this will be slow for 1000 entries? No. If they are contiguous, I bet GHC will generate a jump table just like the one I'm seeing here. Another interesting test would be what happens when the inputs aren't contiguous, as in the sketch below.
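
A minimal sketch of that follow-up test (the sparse constants and the file name are arbitrary); compiling it the same way and inspecting the dump would show whether GHC still emits a jump table or falls back to something like a chain or tree of comparisons for sparse literals:

    -- Sparse.hs: the same experiment, but with non-contiguous inputs.
    -- Compile with: ghc -O2 -ddump-asm -c Sparse.hs
    g :: Int -> Int
    g 3    = 4
    g 17   = 5
    g 400  = 6
    g 9001 = 7
    g _    = 0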