4
votes

I'm working on an implementation of the usual unification algorithm in the usual way: recursive descent through expression trees, adding variable bindings to a hash table along the way, doing the occurs check. In Java as it happens, using override functions to go with the grain of the language, so the part of the implementation dealing with variables is:

@Override
public boolean unify(Term a, Map<Var, Term> map) {
    if (this == a) {
        return true;
    }
    Term x = map.get(this);
    if (x != null) {
        return x.unify(a, map);
    }
    if (a instanceof Var) {
        x = map.get((Var) a);
        if (x != null) {
            return x.unify(this, map);
        }
    }
    if (a.occurs(this)) {
        return false;
    }
    map.put(this, a);
    return true;
}

This version is correct and for many cases quite fast, but it has a problem, which arises particularly when using it for type inference. When unifying lots of variables to the same target, it ends up with a set of bindings that basically looks like this:

a=b
b=c
c=d
d=e

Then each time a new variable must be unified to the same thing, it has to step through the chain one step at a time to find where it's at by now, which takes O(N) time, which means unifying a collection of variables to the same thing takes total time O(N^2).

Probably the best solution is to implement some kind of shortcut, something along the lines of updating a to point directly to the current end target, whatever that may be. It's not entirely obvious how to do this in a way that will be correct and efficient in all cases.

Unification has been well known and quite widely used for decades, so I would imagine the solution to this must also have been known for decades, but the several discussions I've seen of unification, don't seem to mention it.

What exactly is the way to modify the algorithm to deal with it?

2
I have no idea (at first glance), but excellent question!Elliott Frisch

2 Answers

2
votes

I agree that shortcutting is the right approach. You should be able to change this:

    return x.unify(a, map);

to this:

    if (! x.unify(a, map)) {
        return false;
    }
    map.put(this, map.get(x));
    return true;

and this:

        return x.unify(this, map);

to this:

        if (! x.unify(this, map)) {
            return false;
        }
        map.put(a, map.get(x));
        return true;

(Each individual map.put is only cutting out one level of indirection, but because you do it immediately after a recursive call that will also have cut out any unnecessary indirection, you know that there is only one level of indirection for it to cut out.)

This does not entirely prevent chains, because it's possible to unify a with b and then b with c and so on; but each chain is fully dealt with the first time it's subsequently re-encountered, so you still get amortized constant time.

2
votes

Here's an idea: All the vars connected by = are an equivalence class. So you can make the map

unify(Term a, Map<VarClass, Term> map) {...

where a VarClass is implemented with the classical union-find algorithm for disjoint sets.

When you discover a variable pair x=y that you'd formerly have added to the map, add x to the VarClass containing y instead (creating one and adding it with a mutable empty placeholder mapping if none yet exists).

The Term on the right hand side of the map is never a Var.

The union-find operations are for all practical purposes amortized constant time and quite fast in practice.