3
votes

I'm trying to ensure that GHC specializes a recursive function to so that everything gets unboxed. The full example code (as well as a dump of the GHC core) is available in this gist. The function in question looks like this:

import Data.Bits                                     
import qualified Data.Vector.Unboxed as UV

lookupSorted :: Ord a => (Int -> a) -> Int -> a -> Maybe Int   
lookupSorted lookupIx len needle =                  
  let !r = go 0 (len - 1)                   
   in if r < 0 then Nothing else Just r                    
  where                                     
  go :: Int -> Int -> Int                 
  go !lo !hi = if lo <= hi   
    then                                                
      let !mid = lo + (unsafeShiftR (hi - lo) 1)              
          !val = lookupIx mid                    
       in case compare val needle of                           
            EQ -> mid                                               
            LT -> go (mid + 1) hi                                              
            GT -> go lo (mid - 1)                                
    else (-1)               

It's an algorithm that looks up a value from any sorted container than can be indexed into. The two functions that I want to ensure are specialized versions of this are:

{-# NOINLINE lookupUnboxedWord #-}
lookupUnboxedWord :: UV.Vector Word -> Word -> Maybe Int   
lookupUnboxedWord v w = lookupSorted (UV.unsafeIndex v) (UV.length v) w

{-# NOINLINE lookupUnboxedDouble #-}           
lookupUnboxedDouble :: UV.Vector Double -> Double -> Maybe Int   
lookupUnboxedDouble v w = lookupSorted (UV.unsafeIndex v) (UV.length v) w

The good news is that, from looking at the dumped core, I can see that GHC is already performing the specialization that I'm interested in. This is pretty impressive. However, I'd like to be able to count on it happening. I'm concerned that if I add enough specialized variants to this file or that if I call lookupSorted from another module, GHC might eventually lean in favor of a small generated executable rather than a fast one.

My understanding is that the SPECIALIZE pragma does not help in the situation. GHC currently does not allow specialization based on value arguments. I'm pretty sure that if I'm willing to write a typeclass for the indexing operation, then I could make SPECIALIZE work. I'm trying to avoid this approach though because I don't want to introduce a typeclass unless there's no other solution.

Is there a way to force GHC to create these specialized variants of my function? Additionally, if anyone has any commentary on the dumped core file (if something is not optimal), I would appreciate any feedback on that. Thanks.

----EDIT----

Thinking about this more, it seems like it might be sufficient to simply put an INLINE pragma on lookupSorted. The GHC docs are not clear on the interaction between INLINE and recursive binding inside let or where clauses. Any clarification on this, hopefully with a source to back it up, could be helpful.

1

1 Answers

2
votes

Your final observation is correct: If you put an INLINE annotation on a function, it will be inlined whenver there is a call to it with sufficient arguments.

Sufficient arguments means the number of parameter of your function on the left of the = (as opposed to lambdas on the right). This allows you to do things like

foo op n  = \y -> go n y
  where go acc i = … op … 

fooSpec1 = foo (+) 0 
fooSpec2 = foo (*) 1

and get two specializations of foo, which you then can call many times without further code duplication.

For all this it does not matter what happens in where, and a recursive function will just be inlined with foo.

(Sorry, no source to back this up with.)