10
votes

I'm still in the early phase of learning OCaml and am curious to know what the best way to extract maximum performance out of generic code in OCaml is.

As a small experiment, I've written two polymorphic functions: one in C++ and the other in OCaml that find the largest element in a given array.

What I've observed is that while in C++ you pay no penalty for this sort of abstraction, the penalty in OCaml is a whopping one degree of magnitude drop in performance. And as an aside, the C++ solution I quickly concocted is more generic that the OCaml one, but I blame that primarily on my inexperience with the language.

My question is the following: how to write and use polymorphic functions in OCaml without paying the huge performance penalty that I've just observed?

Another thing I observed for this particular problem is that my functional solution in OCaml was slower than the imperative one, whereas the "functional" solution in C++ suffered no penalties compared to the analogue imperative approach.

C++ code was compiled with g++ -std="c++0x" -O3 -o max_cpp max.cpp, using gcc-4.6.3 and OCaml code with: ocamlopt -o max_ml max.ml using ocamlopt version 3.12.1. Both of the generated executables were 32 bit and run on Ubuntu x64 11.04

I submit the code of both programs.

C++ code (not entirely safe, of course ;) )

#include <iostream>
#include <vector>
#include <numeric>
template <typename T> T max (T a, T b) {
     return (a>b)? a : b;
}

template <typename I> typename I::value_type find_max (I begin, I end) {
    auto max = *begin;
    for (begin++;begin!=end;begin++)
            if (*begin > max) max = *begin;
    return max;
}

template <typename I> typename I::value_type find_max1(I begin, I end) {
    return std::accumulate(begin, end, *begin, max< typename I::value_type> );
}

int main(int argc, char ** argv) {
    const size_t nElem = atoi(argv[1]);
    const size_t nIter = atoi(argv[2]);
    std::vector<int> intA(nElem);
    std::vector<double> dubA(nElem);
    for (int i=0;i<nIter;i++) {
            auto res1 = find_max(intA.begin(), intA.end());
            auto res2 = find_max(dubA.begin(), dubA.end());
            std::cout << "Max int: " << res1 << " " << "Max dub: " << res2 << std::endl;
    }
}

OCaml code

let max a b = if a > b then a else b

(* functional version *)
let find_max vector =
    Array.fold_right max vector vector.(0)

(* imperative version *)
let find_max1 vector =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
            if vector.(i) > !max then max := vector.(i)
    done; max

let nElems = int_of_string Sys.argv.(1)

let nIter  = int_of_string Sys.argv.(2)

let _ = let intA = Array.make nElems 0 in
    let dubA = Array.make nElems 0.0 in
    for i=1 to nIter do
            let res1 = find_max intA in
            let res2 = find_max dubA in
            print_int !res1;
            print_newline ();
            print_float !res2;
    done

However, if I "overload" the function to discriminate ints and floats then the performance of the program even exceeds that of my C++ code by 50%! I wonder why though.

let find_max_int (vector : int array) =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
            if vector.(i) > !max then max := vector.(i)
    done; max

let find_max_float (vector : float array) =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
            if vector.(i) > !max then max := vector.(i)
    done; max

The timings were performed rather crudely with: time ./max_cpp 1000000 100 and time ./max_ml 1000000 100

2
Why are you writing your own find_max when C++ already provides std::max_element?Jerry Coffin
Just so that that I can better highlight the different price abstraction has in these two languages, at least from what I've observedPetarMarendic
@didierc: I don't think the speedup here is related to data representation, rather to the behavior of the (debatable) polymorphic comparison operators.gasche
Thanks for the input! It is unfortunate however that the functional version in OCaml using Array.fold_left or Array.fold_right is slower than the imperative one, whereas in C++ there is no performance penalty for this sort of abstraction. Sorry for being stubborn, but to finally close the case, is there a way in OCaml to tackle this problem at compile time and write a macro that would generate either find_max_int or find_max_float on explicit demand?PetarMarendic
for native compilation, use inlining (-inline 1000 for instance).didierc

2 Answers

8
votes

In OCaml, the comparison operator (<) is a generic function of type 'a -> 'a -> bool (likewise for equality, etc.). This means that it is implemented by a special primitive in the runtime system that effectively run ad-hoc comparison on data structure of any type. The type-checker is able to optimize monomorphic comparisons on integers and floats into the specialized comparison operation, instead of doing the type check at runtime in the polymorphic case. A tenfold difference in efficiency is not surprising if you test only this operation.

Note that to get maximal flexibility you should abstract on the comparison operation in find_max. This would however prevent the monomorphization optimization, so an inlined version will still perform better.

I think your approach (doing micro-benchmarks and hoping to learn interesting things about the performance of real programs) is flawed. You ran into a highly specific case of OCaml performance behavior and wrongly concluded from there that the performance of polymorphic functions "is poor". This is clearly a bad conclusion out of premature optimization. Write an actually representative example of the computations you intend to run, and then reason on the performance of this real-world piece of code. You will learn very little truth from micro-benchmarks of this sort, and a lot of irrelevant details.

(It is true however that C++ code specialization approach can produce more efficient code in general than the OCaml compilation technique, that compiles only one version of the function for all types but has to make corresponding data representation compromises; in OCaml there can be an overhead related to polymorphism. However, that is observable only in fairly specific cases and you can often just make a specific inlining pass in the small critical section of your code. What you gain in return is much faster compilation (no duplication) and actually readable error messages -- like you might have if concepts were integrated in C++.)

Edit: I was in fact wrong in saying that abstracting on the comparison would prevent the type-directed optimization. The following code, while still not as fast as the inlined-by-hand version, is still noticeably faster than the version using polymorphic comparison:

let find_max1 comp vector =
    let length = Array.length vector in
    let max = ref (vector.(0)) in
    for i=1 to length-1 do
      if comp !max vector.(i) then max := vector.(i)
    done;
    !max

let find_max_int (vector : int array) = find_max1 (fun x y -> x < y) vector
let find_max_float (vector : float array) = find_max1 (fun x y -> x < y) vector
2
votes

In fact you are comparing compile-time specialized functions vs runtime-dispatched. More adequate code on ocaml side would be using functors, that theoretically could reduce the number of indirect calls to zero, but I guess it will still suffer from underoptimization. Another problem is that data representation is uniform and not specialized/inlined for user type in any case.