Unfortunately, the full example involves a lot of code. You can see the full module here (which still won't compile); the pseudocode function f below corresponds to the 'FIXME' tag in the hpaste.
Here is a pseudocode outline:
module Test (run) where

import Data.Vector.Unboxed as U

run m i iters =
    let {get q}
    in do print $ testWrapper iters m q

testWrapper :: forall i . Int -> Int -> i -> U.Vector i
testWrapper iters m q =
    let {get test params: xs, dim, ru}
    in U.map fromIntegral (iterate (f dim ru) xs !! iters)

{-# INLINE f #-}
f :: (Int, Int) -> Vector r -> Vector r -> Vector r
f dim ru = (g dim ru) . zipWith (*) ru

{-# INLINE g #-}
g :: (Int, Int) -> Vector r -> Vector r -> Vector r
g dim ru = ...
For certain parameters, this code runs in ~0.5 seconds.
I also tested changing f to f':
f' dim ru = (g dim ru)
(I simply removed the final zipWith, reducing the overall work needed).
On the same input parameters, the modified code takes 4.5 seconds.
This occurs when compiling with optimization (using GHC 7.4.2, ghc -O2, and also with even more optimizations). The Core for the fast version is about 3000 lines, while the Core for the slow version is about 1900 lines.
This may not be much to go on, but what kind of GHC craziness could be causing my program to slow down by an order of magnitude by reducing the work it does? How might I discover something like this when essentially my smallest test case generates over 2000 lines of core?
Thanks
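One possible way to make the Core easier to search, offered only as a sketch: dump it to a file with most annotations suppressed, and temporarily mark the suspect function NOINLINE so that it shows up as its own named binding. The function step below is a hypothetical stand-in for f (it uses lists so that only base is needed), and the flags are standard GHC debugging flags (-ddump-simpl, -dsuppress-all, -ddump-to-file). NOINLINE itself changes what the optimizer does, so this is for locating code in the dump, not for timing.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-all -ddump-to-file #-}
module Main (main) where

-- Hypothetical stand-in for the question's `f`; not the original code.
-- NOINLINE asks GHC to keep `step` as a separate binding, so it can be
-- found by name in the dumped Core.
{-# NOINLINE step #-}
step :: [Int] -> [Int] -> [Int]
step ru = zipWith (*) ru

main :: IO ()
main = print (sum (iterate (step [2, 2, 2, 2]) [1, 1, 1, 1] !! 3))
```

Compiling the fast and slow variants this way and diffing the two dump files may narrow the search to the bindings that actually differ.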
ru
while the fast computes it only once. – Daniel Fischer
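The comment above points at lost sharing: one version recomputes a value that the other computes once. The following is a minimal sketch of that general effect, not the code from the question. The names expensive, stepShared and stepRecomputed are hypothetical, and Debug.Trace is used only to make each evaluation visible on stderr.

```haskell
module Main (main) where

import Debug.Trace (trace)

-- Hypothetical stand-in for a costly value derived from a parameter such as `ru`.
expensive :: Int -> Int
expensive n = trace "expensive evaluated" (sum [1 .. n])

-- `t` is bound outside the lambda, so every call of the returned
-- function reuses the same thunk.
stepShared :: Int -> Int -> Int
stepShared n = let t = expensive n in \x -> x + t

-- `t` is bound inside the lambda, so it is rebuilt on each call
-- unless the optimizer happens to float it out.
stepRecomputed :: Int -> Int -> Int
stepRecomputed n = \x -> let t = expensive n in x + t

main :: IO ()
main = do
  print (iterate (stepShared 1000) 0 !! 5)
  print (iterate (stepRecomputed 1000) 0 !! 5)
```

Both variants return the same result. Without optimization the trace message typically appears once for stepShared and once per iteration for stepRecomputed; with -O2 the counts can change in either direction, because GHC may float the binding out of the lambda or inline it back in, which is the kind of difference to look for in the Core.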