Why does GHC consider the LHS *syntactically* when inlining?

19

votes

According to the GHC docs:

...GHC will only inline the function if it is fully applied, where "fully applied" means applied to as many arguments as appear (syntactically) on the LHS of the function definition.

Where the example given is two semantically-equivalent definitions:

comp1 :: (b -> c) -> (a -> b) -> a -> c
{-# INLINE comp1 #-}
comp1 f g = \x -> f (g x)

comp2 :: (b -> c) -> (a -> b) -> a -> c
{-# INLINE comp2 #-}
comp2 f g x = f (g x)

My questions:

Is it only in the presence of INLINE pragmas that we get this strict behavior (i.e. strict syntactic view of LHS, RHS inlined w/out optimizations)?
when no INLINE pragmas are given, does GHC ever transform a function like comp2 to comp1?
if not, why? Is it too difficult in general for the compiler to look at the semantics of the function and decide how much and where to partially-apply and INLINE?
what would happen if GHC just transformed all functions into a cascade of let... in expressions with lambdas and no bindings on the LHS?

optimizationhaskellghc

7

votes

What if, in this example, c is itself a function type? I'm not clear how your proposal would work out in that scenario.

In any event, there are definitely cases where you don't want all of a function's arguments "pulled to the front." For example, you might have some code like this:

foo :: [Int] -> Int -> Int -> Int
foo list = let
  -- expensive precomputation here
  bar x y = ...
  in \ x y -> bar x y

You want foo to get partially applied, and then for multiple applications of the resulting function to share the expensive precomputation work. If instead you pulled it forward as foo list x y, you wouldn't get to share that expensive precomputation. (I've encountered this case in serious applications.)

6

votes

This is a good question. I read through the Secrets of the Glasgow Haskell Compiler Inliner paper for some clues, but didn't find much.

Here's a hand-wavy explanation. GHC actually at times takes comp1 to comp2 -- which it calls "eta expansion". See this thread for some details: http://www.haskell.org/pipermail/glasgow-haskell-users/2011-October/020979.html

There also is (or was) a problem where this eta expansion can subtly change strictness. See this commit to the docs (which doesn't seem to be in the current ones, so either they haven't been rebuilt, or this has been fixed, not sure which): http://permalink.gmane.org/gmane.comp.lang.haskell.cvs.ghc/57721

In any case the above thread has SPJ explaining why we typically want to go in that direction whenever possible. So to deliberately go in the other direction in order to improve inlining seems a bit silly. As the secrets paper discusses, inlining promiscuously is not the greatest idea -- making the pragma even more of a blunt hammer so that functions got inlined whether or not it made sense to do so would probably hurt more than help overall, not to mention increase code bloat, since modules would have to keep the different levels of eta-shifted functions around all at once.

Anyway, as someone very much not a core GHC dev, that's what seems most likely to me.

3

votes

Well, better late than never, I guess.

comp1 and comp2 are not only equivalent semantically, but even syntactically.

Writing arguments on the LHS of an equals sign of a definition is just syntactic sugar, so these two functions are equivalent:

id1 x = x
id2 = \x -> x

Edit: I realised I didn't really answer your questions, so here you are:

There is a difference for GHC when these are annotated with INLINE pragmas, as GHC stores in its Core representation the Unfolding of a function and for which arity it can be unfolded (That's the Guidance=ALWAYS_IF(arity=1,...) part), so it can actually matter in practice.
I don't think it does, as the comp1 and comp2 can't be distinguished after desugaring to Core, on which all optimisations operate. So when GHC wants to create a new unfolding, it will probably do so for manifest arity (e.g. the number of leading lambdas).
Inlining is mostly not beneficial for unsaturated bindings, see below. The same actually goes for the comp1 example: The reason why we want this to happen is not that we care about eliminating a function call. Rather we want comp1 to be specialised to the f and g parameters, regardless of what concrete x we apply the specialisation to. There is actually an optimisation pass that should do this kind of work, called constructor specialisation (more on that below). INLINE is even quite a misfit to use here: This still won't specialise a call like comp1 (const 5), where it's 'obvious' that this should be reduced to const 5.
Consequently, this won't change much, as long as you don't sprinkle every let-bound thing with INLINE pragmas. Even then it's questionable if that brings any benefit: The point is that it just makes no sense to inline unsaturated calls without any further motives (e.g. specialisation of a function to a concrete argument) and other than that it will just blow up code size at some point, so it probably may even make things slower.

End Edit

One reason I think why unsaturated calls to bindings aren't inlined is that mostly they don't bring any new optimization opportunities.

f = \x y -> 1 + (x * y)
g = \x y -> (1 + x) * y

Inlining f 16 yields \y -> 1 + (16*y), which isn't really much simpler than f 16. On the contrary, code size increased considerably (which is the single biggest drawback of inlining).

Now, if there was a call like g 16 this would yield \y -> (1 + 16) * y which would optimize to \y -> 17 * y. But these kinds of opportunities are detected by another optimization pass, constructor or call-pattern specialization. The insight here is that 1 + x can be simplified if we know the value of x. Since we call g with a literal (e.g. a value), it is beneficial to specialize g for that particular call site, e.g. g16 = \y -> 17 *y. No need to inline g, also other call sites might share the code generated for g16.

That's just one example of how inlining doesn't need to be done while still having efficient code. There are many other optimizations that in interplay with the inliner achieve what you want. Eta-expansion for example will make sure that calls are as saturated as possible:

main = print (f 2)

f = g 1 
g x y = x + y

Since f is always called with 1 argument, we can eta-expand it:

f eta = g 1 eta

Now the call to g is saturated and can be inlined. Dito for f, so eventually this reduces to

main = print 3

f eta = 1 + eta
g x y = x + y

Modulo dead-code elimination.

Why does GHC consider the LHS syntactically when inlining?

3 Answers