Dynamic dispatch in Haskell

Question

Programs written in, for example, Java rely a lot on dynamic dispatch.

How are such programs expressed in functional languages such as Haskell?

In other words, what is the Haskell way of expressing the idea underneath "dynamic dispatch"?

(Maybe the simple fix would be s/dynamic dispatch/virtual methods/g?) — glaebhoerl
Dynamic dispatch is the way Java gets around the problem of dynamic typing. It doesn't know at compile time what the data will be, so it tags all the data with more data saying what type it is then looks up which method to apply at runtime. This is slow. It's not necessary in a staticly typed language because the compiler knows what type stuff is at compile time and can compile in the correct code or a switch to the correct functions. (There are ways of making things more dynamic - see answers.) Java interfaces become type classes and Java generics are copied from parametric polymorphism. — AndrewC

glaebhoerl glaebhoerl · Accepted Answer · 2012-10-28T16:14:27

The answer is deceptively simple: higher-order functions. An object with virtual methods in an OO language is nothing more than a glorified record of functions together with some local state. In Haskell you can use records of functions directly, and store the local state in their closures.

More concretely, an OO object consists of:

A pointer (vptr) to a vtable (virtual method table), which contains implementations for the virtual methods of the object's class. In other words, a bunch of function pointers; a record of functions. Notably each function has a hidden parameter which is the object itself, which is passed implicitly.
The data members of the object (the local state)

A lot of the time the whole edifice of objects and virtual functions feels like an elaborate workaround for the lack of support for closures.

For example consider Java's Comparator interface:

public interface Comparator<T> {
    int compare(T o1, T o2); // virtual (per default)
}

And suppose you want to use it to sort a list of strings based on the strings' Nth characters (assume they are long enough). You would define a class:

public class MyComparator implements Comparator<String> {
    private final int _n;

    MyComparator(int n) {
        _n = n;
    }

    int compare(String s1, String s2) {
        return s1.charAt(_n) - s2.charAt(_n);
    }
}

And then you use it:

Collections.sort(myList, new MyComparator(5));

In Haskell you would do it like this:

sortBy :: (a -> a -> Ordering) -> [a] -> [a]

myComparator :: Int -> (String -> String -> Ordering)
myComparator n = \s1 s2 -> (s1 !! n) `compare` (s2 !! n)
-- n is implicitly stored in the closure of the function we return

foo = sortBy (myComparator 5) myList

If you're not familiar with Haskell, here's how it would roughly look in a kind of pseudo-Java: (I'm only going to do this once)

public void <T> sortBy(List<T> list, Ordering FUNCTION(T, T) comparator) { ... }

public (Ordering FUNCTION(String, String)) myComparator(int n) {
    return FUNCTION(String s1, String s2) {
        return s1[n].compare(s2[n]);
    }
}

public void foo() {
    sortBy(myList, myComparator(5));
}

Note that we did not define any types. All we used are functions. In both cases the "payload" that we passed to the sort function was a function that takes two elements and gives their relative ordering. In one case this was accomplished by defining a type implementing an interface, implementing its virtual function in the appropriate way, and passing an object of that type; in the other case we just passed a function directly. In both cases we stored an internal integer in the thing we passed to the sort function. In one case this was done by adding a private data member to our type, in the other by simply referring to it in our function, causing it to be retained in the function's closure.

Consider the more elaborate example of a widget with event handlers:

public class Widget {
    public void onMouseClick(int x, int y) { }
    public void onKeyPress(Key key) { }
    public void paint() { }
    ...
}

public class MyWidget extends Widget {
    private Foo _foo;
    private Bar _bar;
    MyWidget(...) {
        _foo = something;
        _bar = something; 
    }
    public void onMouseClick(int x, int y) {
        ...do stuff with _foo and _bar...
    }
}

In Haskell you could do it like this:

data Widget = Widget {
    onMouseClick :: Int -> Int -> IO (),
    onKeyPress   :: Key -> IO (),
    paint        :: IO (),
    ...
}

constructMyWidget :: ... -> IO Widget
constructMyWidget = do
    foo <- newIORef someFoo
    bar <- newIORef someBar
    return $ Widget {
        onMouseClick = \x y -> do
            ... do stuff with foo and bar ...,
        onKeyPress = \key -> do ...,
        paint = do ...
    }

Note again that after the initial Widget, we did not define any types. We only wrote a function to construct a record of functions and store things in their closures. Which, most of the time, is also the only reason to define a subclass in an OO language. The only difference from our previous example is that instead of one function there's multiple, which in the Java case is encoded by simply putting more than one function in the interface (and its implementations), and in Haskell by passing a record of functions instead of a single function. (We could have passed a record containing a single function in the previous example, but we didn't feel like it.)

(It's worth noting somewhere that a lot of the time you don't need dynamic dispatch. If you just wanted to sort a list based on the default ordering for the type, then you would simply use sort :: Ord a => [a] -> [a], which uses the Ord instance defined for the given a type, which is selected statically.)

Type-Based Dynamic Dispatch

One difference between the Java approach and the Haskell approach above is that with the Java approach, the behaviour of an object (except for its local state) is determined by its type (or less charitably, each implementation requires a new type). In Haskell we're making our records of functions however way we like. Most of the time this is a pure win (flexibility gained, nothing lost), but suppose that for some reason we want it the Java way. In that case the way to go, as mentioned by other answers, is type classes and existentials.

To continue our Widget example, suppose that we want the implementation of a Widget to follow from its type (to require a new type for each implementation). We define a type class to map the type to its implementation:

-- the same record as before, we just gave it a different name
data WidgetImpl = WidgetImpl {
    onMouseClick :: Int -> Int -> IO (),
    onKeyPress   :: Key -> IO (),
    paint        :: IO (),
    ...
}

class IsWidget a where
    widgetImpl :: a -> WidgetImpl

data Widget = forall a. IsWidget a => Widget a

sendClick :: Int -> Int -> Widget -> IO ()
sendClick x y (Widget a) = onMouseClick (widgetImpl a) x y

data MyWidget = MyWidget {
    foo :: IORef Foo,
    bar :: IORef Bar
}

constructMyWidget :: ... -> IO MyWidget
constructMyWidget = do
    foo_ <- newIORef someFoo
    bar_ <- newIORef someBar
    return $ MyWidget {
        foo = foo_,
        bar = bar_
    }

instance IsWidget MyWidget where
    widgetImpl myWidget = WidgetImpl {
            onMouseClick = \x y -> do
                ... do stuff with (foo myWidget) and (bar myWidget) ...,
            onKeyPress = \key -> do ...,
            paint = do ...
        }

It's a bit awkward that we have a class only to get a record of functions, which we then have to take the functions out of separately. I only did it this way to illustrate separate aspects of type classes: they're also just glorified records of functions (which we use below) together with some magic where the compiler inserts the appropriate record based on the inferred type (which we use above, and continue using below). Let's simplify:

class IsWidget a where
    onMouseClick :: Int -> Int -> a -> IO ()
    onKeyPress   :: Key -> a -> IO ()
    paint        :: a -> IO ()
    ...

instance IsWidget MyWidget where
    onMouseClick x y myWidget = ... do stuff with (foo myWidget) and (bar myWidget) ...
    onKeyPress key myWidget = ...
    paint myWidget = ...

sendClick :: Int -> Int -> Widget -> IO ()
sendClick x y (Widget a) = onMouseClick x y a

-- the rest is unchanged from above

This style is often adopted by people coming from OO languages, because it's more familiar and closer to a one-for-one mapping from the way OO languages do it. But for most purposes it's just more elaborate and less flexible than the approach described in the first section. The reason is that if the only significant thing about the various Widgets is the way they implement the widget functions, then there's little point in making types, instances of the interface for those types, and then abstracting away the underlying type again by putting them in an existential wrapper: it's simpler to just pass the functions around directly.

One advantage I can think of is that while Haskell doesn't have subtyping, it does have "subclassing" (probably better referred to as sub-interfacing or sub-constraining). For example you could do:

class IsWidget a => IsWidgetExtra a where
    ...additional methods to implement...

and then with any type for which you have IsWidgetExtra, you can also use the methods of IsWidget seamlessly. The only alternative with the record-based approach is to have records-within-records, which involves some manual wrapping-and-unwrapping of the inner records. But this would only be advantageous if you wanted to explicitly emulate the deep class hierarchy of an OO language, which in turn you would only do if you wanted to make life difficult for yourself. (Note also that Haskell doesn't have any built-in way to dynamically down-cast from IsWidget to IsWidgetExtra. But there is ifcxt)

(What about the advantages of the record-based approach? Besides not having to define a new type every time you want to do a new thing, records are simple value-level things, and values are much easier to manipulate than types. You could, for example, write a function that takes a Widget as an argument and makes a new Widget based on it, with some things different and others kept the same. This is sort of like subclassing from a template parameter in C++, just less confusing.)

Glossary

Higher-order function: a function that takes other functions as arguments (or returns them as results)
Record: struct (class with public data members and nothing else). Also known as a dictionary.
Closure: Functional languages (and many others) allow you to define local functions (functions within functions, lambdas) that make reference to things in scope at the definition site (for example, the arguments of the outer function) that you would normally expect not to be kept around, but are, in the function's "closure". Alternately, if you have a function like plus that takes two ints and returns an int, you can apply it to just one argument, say 5, and the result will be a function that takes an int and returns an int, by adding 5 to it - in that case the 5 is also stored in the resulting function's closure. (In other contexts "closure" is also sometimes used to mean "a function with a closure".)
Type class: not the same as a class in an OO language. Sort of like an interface, but also very different. See here.

EDIT 29-11-14: While I think the kernel of this answer is still essentially correct (HOFs in Haskell correspond to virtual methods in OOP), my value judgements have grown some nuance since I wrote it. In particular, I now think that neither Haskell's nor OOP's approach is strictly "more fundamental" than the other. See this reddit comment.

Dynamic dispatch in Haskell

4 Answers

Type-Based Dynamic Dispatch

Glossary