Inline optimization for recursive functions, with the help of a where clause (in Haskell)

Question

Prelude

Link 1 This post says:

"For a self-recursive function, the loop breaker can only be the function itself, so an INLINE pragma is always ignored," according to the GHC manual

Link 2 This answer describes a method of optimization via in-lining by GHC.
Link 3 This (unrelated) post explains compilers are free to reject explicit inline suggestions anyway. Or perhaps passing the -O2 flag to the compiler achieves the same effect as the OPTIMIZED VERSION shown below?

Example code

(as seen in Link 2)

--PRE-OPTIMIZED VERSION--
func :: (a -> a) -> [a] -> [a]
func f [] = []
func f [x] = [x]
func f (x:s:xs) = x:(f s):(func f xs)

----OPTIMIZED VERSION----
func :: (a -> a) -> [a] -> [a]
func f = loop
  where
    loop []  = []
    loop [x] = [x]
    loop (x:s:xs) = x : f s : loop xs 

------Eg-USAGE----------
> mapToEverySecond (*2) [1..10]
[1,4,3,8,5,12,7,16,9,20]

Question

To quote @amalloy's doubts in the comments of Link 2, surely

f is "still implicitly passed around just as much [as explicitly passing f in the PRE-OPTIMIZED VERSION] by the closure-creating machinery" ?

As an aside, I am reading up on fusion techniques, an alternative approach as described in Link 1, although I am not trying to seek the fastest implementation possible in lieu of idiomatic styles.

Merely curious as to whether/how "where-definition" trick seen in the OPTIMIZED VERSION works (should the compiler decide to inline such a function).

Extra links are most welcome!

The first one is equivalent to \f -> \list -> case list of { …func… } while the second is let { loop = \list -> case list of { …loop… } } in (\f -> loop). The full laziness optimisation in GHC will convert the former to the latter. In general this increases sharing (of f here) and provides opportunities for inlining; INLINE pragmas work quite differently from C++’s inline. — Jon Purdy

dfeuer dfeuer · Accepted Answer · 2021-03-03T16:05:34

This is a rather bad example because there's very little if anything to be gained by inlining this particular code. But similar patterns slow up all the time in situations where it matters. The big idea is that in the "optimized" func, if I write

func myFun

GHC can inline func and get

let
  loop []  = []
  loop [x] = [x]
  loop (x:s:xs) = x : myFun s : loop xs 
in loop

In this context, that's not meaningfully better, but if the result of myFun were demanded strictly, then it would replace a call to an arbitrary address with a call to a known address, which is faster. Furthermore, if myFun were used in an "interesting" context (where something is known about the argument passed to it or what's done with its result), then GHC would likely inline myFun to take advantage of that.

Inline optimization for recursive functions, with the help of a where clause (in Haskell)

Prelude

Example code

Question

1 Answers