7
votes

I saw some question about similarities and differences between function composition and application and the various ways to do it, but one thing that has started to puzzle me a little(and that has not been asked as far as I searched) is about the difference in performance.

When I learned F# I fell in love with the pipe operator |>, which has it's equivalent in haskell's reverse application &. But the F# variant is unarguably more beautiful in my opinion(and I don't think I'm the only one).

Now, one can easily hack the pipe operator into haskell:

(|>) x f = f x

And it works like a charm! Problem solved!

The big difference between the pipe(both in F# and our haskell trick) is that it does not compose functions, it is based on function application. It takes the value on the left and passes it into the function on the right, as opposed to the composition, which takes 2 functions and returns another function, which can then be used as any regular function.

This, at least for me, makes the code much prettier since you only use one operator to direct the flow of information trough the whole function from arguments to final values, since with basic composition(or >>>) you cannot put a value on the left side to have it passed trough the "chain".

But from a performance standpoint, looking at these general options, the result should be the exact same:

f x = x |> func1 |> func2 |> someLambda |> someMap |> someFold |> show

f x = x & (func1 >>> func2 >>> someLambda >>> someMap >>> someFold >>> show)

f x = (func1 >>> func2 >>> someLambda >>> someMap >>> someFold >>> show) x

Which one would be the fastest, the one based on repeated application or the one(s) base on composition and a single application?

2
Have you tried doing tests to see which is faster?AJF
@AJFarmar I thought about it but is there a way to measure execution time in haskell?Rares Dima
If there were no way to measure execution time in haskell, your question "which of these is faster" would be unanswerable.amalloy
In almost all cases, the execution time will be exactly the same, at least if you compile with optimization, because the optimized code will be exactly the same. If you don't compile with optimization, I'm guessing the first may be faster, but why wouldn't you compile with optimization if you care about performance?dfeuer
Consider running criterion and writing a small benchmark. Likely, with optimization on, you will not find any difference.chi

2 Answers

10
votes

There shouldn't be any difference at all, as long as (|>) and (>>>) get inlined. Let's write an example that uses four different functions, two in F# style, and two in Haskell style:

import Data.Char (isUpper)

{-# INLINE (|>) #-}
(|>) :: a -> (a -> b) -> b
(|>) x f = f x

{-# INLINE (>>>) #-}
(>>>) :: (a -> b) -> (b -> c) -> a -> c
(>>>) f g x = g (f x)

compositionF :: String -> String
compositionF = filter isUpper >>> length >>> show 

applicationF :: String -> String
applicationF x = x |> filter isUpper |> length |> show 

compositionH :: String -> String
compositionH = show . length . filter isUpper

applicationH :: String -> String
applicationH x = show $ length $ filter isUpper $ x

main :: IO ()
main = do
  getLine >>= putStrLn . compositionF  -- using the functions
  getLine >>= putStrLn . applicationF  -- to make sure that
  getLine >>= putStrLn . compositionH  -- we actually get the
  getLine >>= putStrLn . applicationH  -- corresponding GHC core

If we compile our code with -ddump-simpl -dsuppress-all -O0 we get:

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 82, types: 104, coercions: 0}

-- RHS size: {terms: 9, types: 11, coercions: 0}
>>>_rqe
>>>_rqe =
  \ @ a_a1cE @ b_a1cF @ c_a1cG f_aqr g_aqs x_aqt ->
    g_aqs (f_aqr x_aqt)

-- RHS size: {terms: 2, types: 0, coercions: 0}
$trModule1_r1gR
$trModule1_r1gR = TrNameS "main"#

-- RHS size: {terms: 2, types: 0, coercions: 0}
$trModule2_r1h6
$trModule2_r1h6 = TrNameS "Main"#

-- RHS size: {terms: 3, types: 0, coercions: 0}
$trModule
$trModule = Module $trModule1_r1gR $trModule2_r1h6

-- RHS size: {terms: 58, types: 73, coercions: 0}
main
main =
  >>
    $fMonadIO
    (>>=
       $fMonadIO
       getLine
       (. putStrLn
          (>>>_rqe
             (>>>_rqe (filter isUpper) (length $fFoldable[]))
             (show $fShowInt))))
    (>>
       $fMonadIO
       (>>=
          $fMonadIO
          getLine
          (. putStrLn
             (\ x_a10M ->
                show $fShowInt (length $fFoldable[] (filter isUpper x_a10M)))))
       (>>
          $fMonadIO
          (>>=
             $fMonadIO
             getLine
             (. putStrLn
                (. (show $fShowInt) (. (length $fFoldable[]) (filter isUpper)))))
          (>>=
             $fMonadIO
             getLine
             (. putStrLn
                (\ x_a10N ->
                   show $fShowInt (length $fFoldable[] (filter isUpper x_a10N)))))))

-- RHS size: {terms: 2, types: 1, coercions: 0}
main
main = runMainIO main

So >>> doesn't get inlined if we don't enable optimizations. If we enable optimizations however, you won't see >>> or (.) at all. Our functions slightly differ, since (.) doesn't get inlined at that stage, but that's somewhat expected.

If we add {-# NOINLINE … #-} to our functions and enable optimiziations we see that the four functions won't differ at all:

$ ghc -ddump-simpl -dsuppress-all -O2 Example.hs
[1 of 1] Compiling Main             ( Example.hs, Example.o )

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 261, types: 255, coercions: 29}

-- RHS size: {terms: 2, types: 0, coercions: 0}
$trModule2
$trModule2 = TrNameS "main"#

-- RHS size: {terms: 2, types: 0, coercions: 0}
$trModule1
$trModule1 = TrNameS "Main"#

-- RHS size: {terms: 3, types: 0, coercions: 0}
$trModule
$trModule = Module $trModule2 $trModule1

Rec {
-- RHS size: {terms: 29, types: 20, coercions: 0}
$sgo_r574
$sgo_r574 =
  \ sc_s55y sc1_s55x ->
    case sc1_s55x of _ {
      [] -> I# sc_s55y;
      : y_a2j9 ys_a2ja ->
        case y_a2j9 of _ { C# c#_a2hF ->
        case {__pkg_ccall base-4.9.1.0 u_iswupper Int#
                                     -> State# RealWorld -> (# State# RealWorld, Int# #)}_a2hE
               (ord# c#_a2hF) realWorld#
        of _ { (# ds_a2hJ, ds1_a2hK #) ->
        case ds1_a2hK of _ {
          __DEFAULT -> $sgo_r574 (+# sc_s55y 1#) ys_a2ja;
          0# -> $sgo_r574 sc_s55y ys_a2ja
        }
        }
        }
    }
end Rec }

-- RHS size: {terms: 15, types: 14, coercions: 0}
applicationH
applicationH =
  \ x_a12X ->
    case $sgo_r574 0# x_a12X of _ { I# ww3_a2iO ->
    case $wshowSignedInt 0# ww3_a2iO []
    of _ { (# ww5_a2iS, ww6_a2iT #) ->
    : ww5_a2iS ww6_a2iT
    }
    }

Rec {
-- RHS size: {terms: 29, types: 20, coercions: 0}
$sgo1_r575
$sgo1_r575 =
  \ sc_s55r sc1_s55q ->
    case sc1_s55q of _ {
      [] -> I# sc_s55r;
      : y_a2j9 ys_a2ja ->
        case y_a2j9 of _ { C# c#_a2hF ->
        case {__pkg_ccall base-4.9.1.0 u_iswupper Int#
                                     -> State# RealWorld -> (# State# RealWorld, Int# #)}_a2hE
               (ord# c#_a2hF) realWorld#
        of _ { (# ds_a2hJ, ds1_a2hK #) ->
        case ds1_a2hK of _ {
          __DEFAULT -> $sgo1_r575 (+# sc_s55r 1#) ys_a2ja;
          0# -> $sgo1_r575 sc_s55r ys_a2ja
        }
        }
        }
    }
end Rec }

-- RHS size: {terms: 15, types: 15, coercions: 0}
compositionH
compositionH =
  \ x_a1jF ->
    case $sgo1_r575 0# x_a1jF of _ { I# ww3_a2iO ->
    case $wshowSignedInt 0# ww3_a2iO []
    of _ { (# ww5_a2iS, ww6_a2iT #) ->
    : ww5_a2iS ww6_a2iT
    }
    }

Rec {
-- RHS size: {terms: 29, types: 20, coercions: 0}
$sgo2_r576
$sgo2_r576 =
  \ sc_s55k sc1_s55j ->
    case sc1_s55j of _ {
      [] -> I# sc_s55k;
      : y_a2j9 ys_a2ja ->
        case y_a2j9 of _ { C# c#_a2hF ->
        case {__pkg_ccall base-4.9.1.0 u_iswupper Int#
                                     -> State# RealWorld -> (# State# RealWorld, Int# #)}_a2hE
               (ord# c#_a2hF) realWorld#
        of _ { (# ds_a2hJ, ds1_a2hK #) ->
        case ds1_a2hK of _ {
          __DEFAULT -> $sgo2_r576 (+# sc_s55k 1#) ys_a2ja;
          0# -> $sgo2_r576 sc_s55k ys_a2ja
        }
        }
        }
    }
end Rec }

-- RHS size: {terms: 15, types: 15, coercions: 0}
compositionF
compositionF =
  \ x_a1jF ->
    case $sgo2_r576 0# x_a1jF of _ { I# ww3_a2iO ->
    case $wshowSignedInt 0# ww3_a2iO []
    of _ { (# ww5_a2iS, ww6_a2iT #) ->
    : ww5_a2iS ww6_a2iT
    }
    }

Rec {
-- RHS size: {terms: 29, types: 20, coercions: 0}
$sgo3_r577
$sgo3_r577 =
  \ sc_s55d sc1_s55c ->
    case sc1_s55c of _ {
      [] -> I# sc_s55d;
      : y_a2j9 ys_a2ja ->
        case y_a2j9 of _ { C# c#_a2hF ->
        case {__pkg_ccall base-4.9.1.0 u_iswupper Int#
                                     -> State# RealWorld -> (# State# RealWorld, Int# #)}_a2hE
               (ord# c#_a2hF) realWorld#
        of _ { (# ds_a2hJ, ds1_a2hK #) ->
        case ds1_a2hK of _ {
          __DEFAULT -> $sgo3_r577 (+# sc_s55d 1#) ys_a2ja;
          0# -> $sgo3_r577 sc_s55d ys_a2ja
        }
        }
        }
    }
end Rec }

-- RHS size: {terms: 15, types: 14, coercions: 0}
applicationF
applicationF =
  \ x_a12W ->
    case $sgo3_r577 0# x_a12W of _ { I# ww3_a2iO ->
    case $wshowSignedInt 0# ww3_a2iO []
    of _ { (# ww5_a2iS, ww6_a2iT #) ->
    : ww5_a2iS ww6_a2iT
    }
    }
...

All go functions are exactly the same (sans variable names), and application* is the same as composition*. So go ahead and create your own F# prelude in Haskell, there shouldn't be any performance issue.

5
votes

My answer is about F#.

In most cases the F# compiler is able to optimize pipelines into the same code:

let f x = x |> (+) 1 |> (*) 2 |> (+) 2
let g x = x |> ((+) 1 >> (*) 2 >> (+) 2)

Decompiling f and g we see that the compiler reaches the same result:

public static int f(int x)
{
    return 2 + 2 * (1 + x);
}
public static int g(int x)
{
    return 2 + 2 * (1 + x);
}

But it doesn't always seem to hold as we can see with slightly more advanced pipelines:

let f x = x |>  Array.map add1 |> Array.map mul2 |> Array.map add2 |> Array.reduce (+)
let g x = x |> (Array.map add1 >> Array.map mul2 >> Array.map add2 >> Array.reduce (+))

Decompiling shows some differences:

public static int f(int[] x)
{
  FSharpFunc<int, FSharpFunc<int, int>> arg_25_0 = new Program.f@9();
  if (x == null)
  {
    throw new ArgumentNullException("array");
  }
  int[] array = new int[x.Length];
  FSharpFunc<int, FSharpFunc<int, int>> fSharpFunc = arg_25_0;
  for (int i = 0; i < array.Length; i++)
  {
    array[i] = x[i] + 1;
  }
  FSharpFunc<int, FSharpFunc<int, int>> arg_6C_0 = fSharpFunc;
  int[] array2 = array;
  if (array2 == null)
  {
    throw new ArgumentNullException("array");
  }
  array = new int[array2.Length];
  fSharpFunc = arg_6C_0;
  for (int i = 0; i < array.Length; i++)
  {
    array[i] = array2[i] * 2;
  }
  FSharpFunc<int, FSharpFunc<int, int>> arg_B3_0 = fSharpFunc;
  int[] array3 = array;
  if (array3 != null)
  {
    array2 = new int[array3.Length];
    fSharpFunc = arg_B3_0;
    for (int i = 0; i < array2.Length; i++)
    {
      array2[i] = array3[i] + 2;
    }
    return ArrayModule.Reduce<int>(fSharpFunc, array2);
  }
  throw new ArgumentNullException("array");
}

public static int g(int[] x)
{
  FSharpFunc<int[], int[]> f = new Program.g@10-1();
  FSharpFunc<int[], int[]> fSharpFunc = new Program.g@10-3(f);
  FSharpFunc<int, FSharpFunc<int, int>> reduction = new Program.g@10-4();
  int[] array = fSharpFunc.Invoke(x);
  return ArrayModule.Reduce<int>(reduction, array);
}

For f F# inlines the pipeline except for the final reduce.

For g the pipeline is constructed and then invoked. This means that g is likely somewhat slower and somewhat more memory intensive than f.

In this particular example it's likely not important as we are creating array objects and iterating over them but if the functions composed are very cheap in terms of CPU and memory the cost of establishing and invoking the pipeline could prove relevant.

If critical performance is important to you I recommend getting a good decompiler tool to make sure that the generated code doesn't contain unexpected overhead. Otherwise you are probably fine with either approach.