34
votes

First, let me define what is short-cut fusion for those of you who don't know. Consider the following array transformation in JavaScript:

var a = [1,2,3,4,5].map(square).map(increment);

console.log(a);

function square(x) {
    return x * x;
}

function increment(x) {
    return x + 1;
}

Here we have an array, [1,2,3,4,5], whose elements are first squared, [1,4,9,16,25], and then incremented [2,5,10,17,26]. Hence, although we don't need the intermediate array [1,4,9,16,25], we still create it.

Short-cut fusion is an optimization technique which can get rid of intermediate data structures by merging some functions calls into one. For example, short-cut fusion can be applied to the above code to produce:

var a = [1,2,3,4,5].map(compose(square, increment));

console.log(a);

function square(x) {
    return x * x;
}

function increment(x) {
    return x + 1;
}

function compose(g, f) {
    return function (x) {
        return f(g(x));
    };
}

As you can see, the two separate map calls have been fused into a single map call by composing the square and increment functions. Hence the intermediate array is not created.


Now, I understand that libraries like Immutable.js and Lazy.js emulate lazy evaluation in JavaScript. Lazy evaluation means that results are only computed when required.

For example, consider the above code. Although we square and increment each element of the array, yet we may not need all the results.

Suppose we only want the first 3 results. Using Immutable.js or Lazy.js we can get the first 3 results, [2,5,10], without calculating the last 2 results, [17,26], because they are not needed.

However, lazy evaluation just delays the calculation of results until required. It does not remove intermediate data structures by fusing functions.

To make this point clear, consider the following code which emulates lazy evaluation:

var List = defclass({
    constructor: function (head, tail) {
        if (typeof head !== "function" || head.length > 0)
            Object.defineProperty(this, "head", { value: head });
        else Object.defineProperty(this, "head", { get: head });

        if (typeof tail !== "function" || tail.length > 0)
            Object.defineProperty(this, "tail", { value: tail });
        else Object.defineProperty(this, "tail", { get: tail });
    },
    map: function (f) {
        var l = this;

        if (l === nil) return nil;

        return cons(function () {
            return f(l.head);
        }, function () {
            return l.tail.map(f);
        });
    },
    take: function (n) {
        var l = this;

        if (l === nil || n === 0) return nil;

        return cons(function () {
            return l.head;
        }, function () {
            return l.tail.take(n - 1);
        });
    },
    mapSeq: function (f) {
        var l = this;
        if (l === nil) return nil;
        return cons(f(l.head), l.tail.mapSeq(f));
    }
});

var nil = Object.create(List.prototype);

list([1,2,3,4,5])
    .map(trace(square))
    .map(trace(increment))
    .take(3)
    .mapSeq(log);

function cons(head, tail) {
    return new List(head, tail);
}

function list(a) {
    return toList(a, a.length, 0);
}

function toList(a, length, i) {
    if (i >= length) return nil;

    return cons(a[i], function () {
        return toList(a, length, i + 1);
    });
}

function square(x) {
    return x * x;
}

function increment(x) {
    return x + 1;
}

function log(a) {
    console.log(a);
}

function trace(f) {
    return function () {
        var result = f.apply(this, arguments);
        console.log(f.name, JSON.stringify([...arguments]), result);
        return result;
    };
}

function defclass(prototype) {
    var constructor = prototype.constructor;
    constructor.prototype = prototype;
    return constructor;
}

As you can see, the function calls are interleaved and only the first three elements of the array are processed, proving that the results are indeed computed lazily:

square [1] 1
increment [1] 2
2
square [2] 4
increment [4] 5
5
square [3] 9
increment [9] 10
10

If lazy evaluation is not used then the result would be:

square [1] 1
square [2] 4
square [3] 9
square [4] 16
square [5] 25
increment [1] 2
increment [4] 5
increment [9] 10
increment [16] 17
increment [25] 26
2
5
10

However, if you see the source code then each function list, map, take and mapSeq returns an intermediate List data structure. No short-cut fusion is performed.


This brings me to my main question: do libraries like Immutable.js and Lazy.js perform short-cut fusion?

The reason I ask is because according to the documentation, they “apparently” do. However, I am skeptical. I have my doubts whether they actually perform short-cut fusion.

For example, this is taken from the README.md file of Immutable.js:

Immutable also provides a lazy Seq, allowing efficient chaining of collection methods like map and filter without creating intermediate representations. Create some Seq with Range and Repeat.

So the developers of Immutable.js claim that their Seq data structure allows efficient chaining of collection methods like map and filter without creating intermediate representations (i.e. they perform short-cut fusion).

However, I don't see them doing so in their code anywhere. Perhaps I can't find it because they are using ES6 and my eyes aren't all too familiar with ES6 syntax.

Furthermore, in their documentation for Lazy Seq they mention:

Seq describes a lazy operation, allowing them to efficiently chain use of all the Iterable methods (such as map and filter).

Seq is immutable — Once a Seq is created, it cannot be changed, appended to, rearranged or otherwise modified. Instead, any mutative method called on a Seq will return a new Seq.

Seq is lazy — Seq does as little work as necessary to respond to any method call.

So it is established that Seq is indeed lazy. However, there are no examples to show that intermediate representations are indeed not created (which they claim to be doing).


Moving on to Lazy.js we have the same situation. Thankfully, Daniel Tao wrote a blog post on how Lazy.js works, in which he mentions that at its heart Lazy.js simply does function composition. He gives the following example:

Lazy.range(1, 1000)
    .map(square)
    .filter(multipleOf3)
    .take(10)
    .each(log);

function square(x) {
    return x * x;
}

function multipleOf3(x) {
    return x % 3 === 0;
}

function log(a) {
    console.log(a);
}
<script src="https://rawgit.com/dtao/lazy.js/master/lazy.min.js"></script>

Here the map, filter and take functions produce intermediate MappedSequence, FilteredSequence and TakeSequence objects. These Sequence objects are essentially iterators, which eliminate the need of intermediate arrays.

However, from what I understand, there is still no short-cut fusion taking place. The intermediate array structures are simply replaced with intermediate Sequence structures which are not fused.

I could be wrong, but I believe that expressions like Lazy(array).map(f).map(g) produce two separate MappedSequence objects in which the first MappedSequence object feeds its values to the second one, instead of the second one replacing the first one by doing the job of both (via function composition).

TLDR: Do Immutable.js and Lazy.js indeed perform short-cut fusion? As far as I know they get rid of intermediate arrays by emulating lazy evaluation via sequence objects (i.e. iterators). However, I believe that these iterators are chained: one iterator feeding its values lazily to the next. They are not merged into a single iterator. Hence they do not “eliminate intermediate representations“. They only transform arrays into constant space sequence objects.

1
Hmm, I wonder whether a composed function is not an intermediate representation as well…Bergi
@AaditMShah: You might be interested in checking out transducers; originally from Clojure but there is JS library as well (github.com/jlongster/transducers.js). You can easily compose functions and use it on arrays, objects, and even on immutable-js and get the high performance guarantees because transducers create no intermediate collections. Check out this blog post as well (jlongster.com/…).Brian Park

1 Answers

39
votes

I'm the author of Immutable.js (and a fan of Lazy.js).

Does Lazy.js and Immutable.js's Seq use short-cut fusion? No, not exactly. But they do remove intermediate representation of operation results.

Short-cut fusion is a code compilation/transpilation technique. Your example is a good one:

var a = [1,2,3,4,5].map(square).map(increment);

Transpiled:

var a = [1,2,3,4,5].map(compose(square, increment));

Lazy.js and Immutable.js are not transpilers and will not re-write code. They are runtime libraries. So instead of short-cut fusion (a compiler technique) they use iterable composition (a runtime technique).

You answer this in your TLDR:

As far as I know they get rid of intermediate arrays by emulating lazy evaluation via sequence objects (i.e. iterators). However, I believe that these iterators are chained: one iterator feeding its values lazily to the next. They are not merged into a single iterator. Hence they do not "eliminate intermediate representations". They only transform arrays into constant space sequence objects.

That is exactly right.

Let's unpack:

Arrays store intermediate results when chaining:

var a = [1,2,3,4,5];
var b = a.map(square); // b: [1,4,6,8,10] created in O(n)
var c = b.map(increment); // c: [2,5,7,9,11] created in O(n)

Short-cut fusion transpilation creates intermediate functions:

var a = [1,2,3,4,5];
var f = compose(square, increment); // f: Function created in O(1)
var c = a.map(f); // c: [2,5,7,9,11] created in O(n)

Iterable composition creates intermediate iterables:

var a = [1,2,3,4,5];
var i = lazyMap(a, square); // i: Iterable created in O(1)
var j = lazyMap(i, increment); // j: Iterable created in O(1)
var c = Array.from(j); // c: [2,5,7,9,11] created in O(n)

Note that using iterable composition, we have not created a store of intermediate results. When these libraries say they do not create intermediate representations - what they mean is exactly what is described in this example. No data structure is created holding the values [1,4,6,8,10].

However, of course some intermediate representation is made. Each "lazy" operation must return something. They return an iterable. Creating these is extremely cheap and not related to the size of the data being operated on. Note that in short-cut fusion transpilation, an intermediate representation is also made. The result of compose is a new function. Functional composition (hand-written or the result of a short-cut fusion compiler) is very related to Iterable composition.

The goal of removing intermediate representations is performance, especially regarding memory. Iterable composition is a powerful way to implement this and does not require the overhead that parsing and rewriting code of an optimizing compiler which would be out of place in a runtime library.


Appx:

This is what a simple implementation of lazyMap might look like:

function lazyMap(iterable, mapper) {
  return {
    "@@iterator": function() {
      var iterator = iterable["@@iterator"]();
      return {
        next: function() {
          var step = iterator.next();
          return step.done ? step : { done: false, value: mapper(step.value) }
        }
      };
    }
  };
}