1
votes

Unfortunately, the full example involves a lot of code. You can see the full module here (which still won't compile); the pseudocode function f below corresponds to the 'FIXME' tag in the hpaste.

Here is a pseudocode outline:

module Test (run) where
    import Data.Vector.Unboxed as U

    run m i iters = let {get q} in do print $ testWrapper iters m q

    testWrapper :: forall i . Int -> Int -> i -> U.Vector i
    testWrapper iters m q =
        let {get test params: xs, dim, ru}
        in U.map fromIntegral (iterate (f dim ru) xs !! iters)

    {-# INLINE f #-}
    f :: (Int, Int) -> Vector r -> Vector r -> Vector r
    f dim ru = (g dim ru) . U.zipWith (*) ru

    {-# INLINE g #-}
    g :: (Int, Int) -> Vector r -> Vector r -> Vector r
    g dim ru = ...

For certain parameters, this code runs in ~0.5 seconds.

I also tested changing f to f':

f' dim ru = (g dim ru)

(I simply removed the final zipWith, reducing the overall work needed).

On the same input parameters, the modified code takes 4.5 seconds.

This occurs when compiling with optimization (using GHC 7.4.2 with ghc -O2, and also with even more optimizations). The Core for the fast version is about 3000 lines, while the Core for the slow version is about 1900 lines.

This may not be much to go on, but what kind of GHC craziness could be causing my program to slow down by an order of magnitude by reducing the work it does? How might I discover something like this when essentially my smallest test case generates over 2000 lines of core?

Thanks

1
I suspect the slow version recomputes the polymorphic ru while the fast computes it only once. - Daniel Fischer
How might I fix that, and why would that happen? - crockeea
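
To illustrate the kind of recomputation the comment above suspects, here is a minimal sketch. It assumes ru has a class-constrained polymorphic type; the names ruPoly, sumTwicePoly, and sumTwiceShared are hypothetical and do not come from the original module. A value of type `Num a => [a]` is compiled as a function of a type-class dictionary, so each use may rebuild it unless GHC happens to specialize and share it. Binding it once at a concrete type is one way to force sharing.

```haskell
module Sharing (ruPoly, sumTwicePoly, sumTwiceShared) where

import Debug.Trace (trace)

-- Hypothetical stand-in for an expensive polymorphic parameter such as ru.
-- The Num constraint makes this a function of a dictionary, so separate
-- uses are not guaranteed to share one evaluated result.
ruPoly :: Num a => [a]
ruPoly = trace "computing ruPoly" (map fromIntegral [1 .. 5 :: Int])

-- Two independent uses of the polymorphic value: the trace message may
-- fire once per use, depending on optimization and specialization.
sumTwicePoly :: Int
sumTwicePoly = sum (ruPoly :: [Int]) + sum (ruPoly :: [Int])

-- Possible fix: bind the value once at a monomorphic type, then reuse it.
sumTwiceShared :: Int
sumTwiceShared =
  let ru = ruPoly :: [Int]
  in sum ru + sum ru
```

Evaluating both sums in GHCi and counting the trace messages shows whether sharing happened. In the real module, the analogous experiments would be giving ru a monomorphic type signature or adding a SPECIALIZE pragma, and then checking whether the timing gap disappears.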

1 Answer

3
votes

Check out the heap profile. Could the "less work" version be leaving some thunks unevaluated? That can lead to a large memory footprint and hurt speed through garbage collection.
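
One way to test the thunk hypothesis is to replace `iterate step xs !! iters` with a loop that forces each intermediate result. The sketch below is only an experiment under that assumption, not a confirmed fix; iterateStrict is a hypothetical helper that is not part of the original module. For an unboxed vector, forcing to weak head normal form materializes the whole vector, so no chain of pending applications is left behind.

```haskell
{-# LANGUAGE BangPatterns #-}
module StrictIterate (iterateStrict) where

-- Apply f to the start value n times, forcing every intermediate result
-- to weak head normal form before taking the next step. Unlike
-- `iterate f x !! n`, this never builds n nested unevaluated applications.
iterateStrict :: Int -> (a -> a) -> a -> a
iterateStrict n f = go n
  where
    go :: Int -> a -> a
    go k !x
      | k <= 0    = x                 -- done: return the forced value
      | otherwise = go (k - 1) (f x)  -- f x is forced by the bang on entry
    -- Note: the local signature reuses the outer type variable only
    -- informally; `a` here is independent, and f is captured monomorphically.
```

In the question's pseudocode this would read roughly `iterateStrict iters (f dim ru) xs`. To compare the two versions, run each with `+RTS -s` to see garbage collection time and maximum residency, or build with `-prof -fprof-auto -rtsopts` and run with `+RTS -hc` to get a heap profile that can be rendered with hp2ps.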