I have some high-performance Haskell code - the inner loop compiles down to 6 assembly instructions. Modifying the inner loop to be less efficient doesn't have any noticeable impact on performance, suggesting the inner loop is not the bottleneck. However, when I turn on profiling, the assembly code generated for the inner loop becomes dramatically less efficient, and the profiler reports that the inner loop takes 85% of the time.
I suspect something else is unnecessarily slow, but when I turn on profiling to find out what, the profiling itself seems to slow the inner loop enough that it dominates. What techniques can I use to see where the time is going? A sampling profiler would be great, if one existed for Haskell.
-prof puts a call enterFunCCS in the inner loop, plus about 5 additional instructions (but the call will be the one that kills it). – Neil Mitchell