From the docs for GHC 7.6:
[Y]ou often don't even need the SPECIALIZE pragma in the first place. When compiling a module M, GHC's optimiser (with -O) automatically considers each top-level overloaded function declared in M, and specialises it for the different types at which it is called in M. The optimiser also considers each imported INLINABLE overloaded function, and specialises it for the different types at which it is called in M.
and
Moreover, given a SPECIALIZE pragma for a function f, GHC will automatically create specialisations for any type-class-overloaded functions called by f, if they are in the same module as the SPECIALIZE pragma, or if they are INLINABLE; and so on, transitively.
So GHC should automatically specialize some/most/all(?) functions marked INLINABLE without a pragma, and if I use an explicit pragma, the specialization is transitive. My question is: is the auto-specialization transitive?
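To make my reading of the two quoted passages concrete, here is a minimal single-module sketch. The names `scale` and `scaleAll` are hypothetical and only `base` is used. It shows where the pragmas go; whether GHC actually generates the specializations depends on the compiler version and on compiling with -O.

```haskell
-- Hypothetical names for illustration; only base is used.

-- An overloaded helper. INLINABLE keeps its unfolding available,
-- which is what allows importing modules to specialize it.
{-# INLINABLE scale #-}
scale :: Num a => a -> a -> a
scale k x = k * x + 1

-- An overloaded caller of `scale`.
{-# INLINABLE scaleAll #-}
scaleAll :: Num a => a -> [a] -> [a]
scaleAll k = map (scale k)

-- Explicit request for an Int copy of `scaleAll`. Going by the second
-- quoted passage, GHC should then also specialize `scale` at Int.
{-# SPECIALIZE scaleAll :: Int -> [Int] -> [Int] #-}

main :: IO ()
main = print (scaleAll 2 [1, 2, 3 :: Int])
```

Without the SPECIALIZE line, the first quoted passage suggests that the call in `main` alone should trigger an Int copy of `scaleAll`; what is unclear to me is whether `scale` follows along in that case.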
Specifically, here's a small example:
Main.hs:
import Data.Vector.Unboxed as U
import Foo
main =
    let y = Bar $ Qux $ U.replicate 11221184 0 :: Foo (Qux Int)
        (Bar (Qux ans)) = iterate (plus y) y !! 100
    in putStr $ show $ foldl1' (*) ans
Foo.hs:
module Foo (Qux(..), Foo(..), plus) where
import Data.Vector.Unboxed as U
newtype Qux r = Qux (Vector r)
-- GHC inlines `plus` if I remove the bangs or the Baz constructor
data Foo t = Bar !t
           | Baz !t
instance (Num r, Unbox r) => Num (Qux r) where
    {-# INLINABLE (+) #-}
    (Qux x) + (Qux y) = Qux $ U.zipWith (+) x y
{-# INLINABLE plus #-}
plus :: (Num t) => (Foo t) -> (Foo t) -> (Foo t)
plus (Bar v1) (Bar v2) = Bar $ v1 + v2
GHC specializes the call to plus, but does not specialize (+) in the Qux Num instance, which kills performance.
However, an explicit pragma
{-# SPECIALIZE plus :: Foo (Qux Int) -> Foo (Qux Int) -> Foo (Qux Int) #-}
results in transitive specialization as the docs indicate, so (+) is specialized and the code is 30x faster (both compiled with -O2). Is this expected behavior? Should I only expect (+) to be specialized transitively with an explicit pragma?
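For anyone who wants to try the pragma placement without installing vector, here is a single-file sketch of the same shape. It substitutes a plain list for the unboxed Vector and fills in the remaining Num methods; both are assumptions made to keep it dependency-free and warning-free, so it will not reproduce the performance numbers above.

```haskell
-- Single-file sketch: a list stands in for Data.Vector.Unboxed.Vector
-- (an assumption made here, not the original code).
newtype Qux r = Qux [r] deriving (Eq, Show)

-- Same shape as in Foo.hs: two constructors with strict fields.
data Foo t = Bar !t
           | Baz !t
           deriving (Eq, Show)

instance Num r => Num (Qux r) where
    {-# INLINABLE (+) #-}
    Qux x + Qux y = Qux (zipWith (+) x y)
    -- The methods below are not in the original instance; they are
    -- added only so the instance is complete.
    Qux x * Qux y  = Qux (zipWith (*) x y)
    abs (Qux x)    = Qux (map abs x)
    signum (Qux x) = Qux (map signum x)
    negate (Qux x) = Qux (map negate x)
    fromInteger n  = Qux (repeat (fromInteger n))

-- Partial on purpose, as in the original: only the Bar/Bar case is defined.
{-# INLINABLE plus #-}
plus :: Num t => Foo t -> Foo t -> Foo t
plus (Bar v1) (Bar v2) = Bar (v1 + v2)

-- The explicit pragma from the question, placed in the defining module.
{-# SPECIALIZE plus :: Foo (Qux Int) -> Foo (Qux Int) -> Foo (Qux Int) #-}

main :: IO ()
main = do
    let y = Bar (Qux (replicate 4 1)) :: Foo (Qux Int)
        Bar (Qux ans) = iterate (plus y) y !! 3
    print ans
```

Removing the SPECIALIZE line gives the variant that relies on auto-specialization alone; because everything lives in one module here, the result may differ from the two-module setup in the question.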
UPDATE
The docs for 7.8.2 haven't changed, and the behavior is the same, so this question is still relevant.
[...] plus was not marked as INLINABLE and 2) simonpj indicated that there was some inlining going on with the ticket code, but the core from my example shows that none of the functions were inlined (in particular, I couldn't get rid of the second Foo constructor, otherwise GHC inlined stuff). – crockeea
[...] plus (Bar v1) = \(Bar v2)-> Bar $ v1 + v2, so that the LHS is fully-applied at the call-site? Does it get inlined and then does specialization kick in? – jberryman
[...] plus fully applied specifically due to those links, but in fact I got less specialization: the call to plus was not specialized either. I have no explanation for that, but was intending to leave it for another question, or hope that it would get resolved in an answer to this one. – crockeea