9 votes

In her talk "The Future of Clojure", Bodil makes the following claim:

Guy Steele gave a talk at ICFP called "Organizing Functional Code for Parallel Execution (or, foldl and foldr Considered Slightly Harmful)" (also in ACM).

In it, Guy Steele asserts on slide 70:

As soon as you say “first, SUM = 0” you are hosed. Accumulators are BAD for parallelism. Note that foldl and foldr, though functional, are fundamentally accumulative.

Which is kind of interesting. So Bodil is saying that Guy Steele called out a problem, and she claims that Rich addressed it with Reducers (and with Transducers, which continue the same line of thinking). In the Transducers talk, at 16:11, we see Rich calling out some particular papers about foldr.

Rich effectively says that folds are composable, and that you can use them to build other higher-order functions such as map and filter.

My question is: was Bodil right? Has Rich solved the problem posed by Guy Steele? Do reducers (in Clojure) address the fold-accumulation scaling issue that Guy Steele outlined?

You can remove the clojure tag if you feel the question has a broader audience. – leppie
Be aware that currently (Clojure 1.7.0-alpha4) transducers have no parallel implementation invoking fork-join the way reducers do. Parallel transducers are still in the cards for future versions, though. – NielsK

2 Answers

10 votes

Yes, reducers do solve this problem, because they have slightly different semantics from the type of fold that Guy Steele is referring to (although the effect can be very similar in practice).

foldr and foldl take a single function argument, which is applied to each member of a collection (along with an accumulator value) in turn. As Steele says, they are intrinsically sequential (which is why it's meaningful to have "left" and "right" variants). Clojure's clojure.core/reduce function works this way too.
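To make the sequential nature concrete, here is a minimal sketch (plain Clojure, nothing reducers-specific): reduce threads one accumulator through the collection from left to right, so each step depends on the one before it.

    ;; (reduce + 0 [1 2 3 4]) evaluates as (+ (+ (+ (+ 0 1) 2) 3) 4):
    ;; every step needs the accumulator produced by the previous step,
    ;; so there is no opportunity for parallelism.
    (reduce + 0 [1 2 3 4]) ;=> 10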

clojure.core.reducers/fold, on the other hand, takes two function arguments, a reducing function and a combining function. The collection is divided into chunks, each of which is reduced using the reducing function, and these results are then combined using the combining function. Graphically, it looks like this:

[Diagram: a fold tree. The collection is split into chunks, each chunk is reduced, and the partial results are combined pairwise up the tree.]

(This diagram comes from my book Seven Concurrency Models in Seven Weeks, which includes a section on Reducers.)
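As a rough mental model (not the actual implementation, which splits recursively and runs on the ForkJoin framework), you can picture fold as something like the hedged sketch below. The name fold-model is hypothetical; the one real requirement it reflects is that the combining function, called with no arguments, must produce its identity element, which fold uses to seed each chunk.

    ;; Hypothetical, simplified model of what fold does: divide,
    ;; reduce the chunks, combine the partial results.
    (defn fold-model [combinef reducef coll]
      (->> (partition-all 512 coll)               ; divide into chunks
           (pmap #(reduce reducef (combinef) %))  ; reduce each chunk, seeded with the identity (combinef)
           (reduce combinef)))                    ; combine the partial results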

Sometimes, you can use a single function for both reducing and combining (summing a sequence of integers, for instance). But in other cases, this isn't possible.
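For instance (a hedged sketch, assuming Clojure 1.5+ with clojure.core.reducers available): summing works with + in both roles, but counting doesn't, because the reducing function takes an accumulator and an element while the combining function takes two accumulators.

    (require '[clojure.core.reducers :as r])

    ;; One function for both roles: + combines two partial sums and
    ;; also adds an element onto a partial sum.
    (r/fold + (vec (range 1000)))                    ;=> 499500

    ;; Two functions: counting. The reducing fn takes (count, element);
    ;; the combining fn takes (count, count), so they can't be the same.
    (r/fold + (fn [n _] (inc n)) (vec (range 1000))) ;=> 1000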

When using a single function for both reducing and combining, clojure.core.reducers/fold will give the same results as a sequential fold if and only if the reducing/combining function is associative.
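A hedged demonstration (r/monoid just attaches an identity value to a combining function, which fold needs in order to seed each chunk):

    (require '[clojure.core.reducers :as r])

    ;; + is associative, so the parallel fold agrees with reduce:
    (= (reduce + (vec (range 100000)))
       (r/fold + (vec (range 100000))))          ;=> true

    ;; - is not associative, so once the vector is big enough to be
    ;; subdivided, fold's result differs from reduce's (the exact
    ;; value depends on where the chunks are split):
    (let [v (vec (range 100000))]
      [(reduce - v)
       (r/fold (r/monoid - (constantly 0)) - v)])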

2 votes

The original blog post about reducers, "Reducers - A Library and Model for Collection Processing", introduces the fold function under the section "Simplicity is Opportunity":

The primary signature of fold takes a combining function, a reducing function, and a collection and returns the result of combining the results of reducing subsegments of the collection, potentially in parallel.
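As a hedged illustration of that signature, here is a word-counting sketch (the data is made up; r/monoid supplies the combining function's identity, an empty map):

    (require '[clojure.core.reducers :as r])

    (def words (vec (mapcat #(repeat 100 %) ["foldl" "foldr" "fold"])))

    ;; combinef merges two partial frequency maps; reducef folds a
    ;; single word into one partial map.
    (r/fold (r/monoid (partial merge-with +) hash-map)
            (fn [freqs w] (assoc freqs w (inc (get freqs w 0))))
            words)
    ;=> {"foldl" 100, "foldr" 100, "fold" 100}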

Clojure's data structures take advantage of the possible parallelism:

It does this when the collection is amenable to parallel subdivision. Ideal candidates are data structures built from trees. Clojure vectors and maps are trees, and have parallel implementations of fold based upon the ForkJoin framework.
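For example (hedged; assumes a multi-core machine, and the numbers are purely illustrative): on a vector, fold subdivides the work and reduces the chunks on the ForkJoin pool, and an optional first argument overrides the default chunk size of 512:

    (require '[clojure.core.reducers :as r])

    ;; Subdivide a 10-million-element vector into chunks of at most
    ;; 8192 elements, reduce each chunk with +, combine with +:
    (r/fold 8192 + + (vec (range 10000000)))  ;=> 49999995000000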

And if the collection is a plain sequence, fold simply falls back to the good old reduce function:

What if the underlying collection is not amenable (e.g. is a sequence)? fold just devolves into reduce, producing the same semantic, if not physical, result.
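So, as a hedged sketch, both of these return the same value, but only the vector version could run in parallel; the lazy sequence quietly falls back to a sequential reduce:

    (require '[clojure.core.reducers :as r])

    ;; A lazy seq has no parallel fold implementation, so this runs
    ;; sequentially and behaves exactly like reduce:
    (r/fold + (map inc (range 1000)))   ;=> 500500
    (reduce + (map inc (range 1000)))   ;=> 500500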