I'm experimenting with the Clojure reducers library, and I'm a little confused about when the combining function is called as part of the reducers/fold function. To see what gets called when, I created the example below:

(def input (range 1 100))

(defn combine-f
  ([]
    (println "identity-combine")
    0)
  ([left right]
    (println (str "combine " left " "  right))
    (max (reduce max 0 left)
         (reduce max 0 right))))

(defn reduce-f
  ([]
    (println "identity-reduce")
    0)
  ([result input]
    (println (str "reduce " result " "  input))
    (max result input)))

(clojure.core.reducers/fold 10 combine-f reduce-f input)

;prints
identity-combine
reduce 0 1
reduce 1 2
reduce 2 3
reduce 3 4
.
.
.
reduce 98 99

I was expecting that when fold executes, the input would be partitioned into groups of approximately size 10, with each group reduced using reduce-f and the results then combined using combine-f. However, when I run the above code, the combine function is only called once, in its zero-argument (identity) form, and the entire input is reduced using reduce-f. Can anyone explain why I'm seeing this behaviour?

Thanks,

Matt.

1 Answer

Unfortunately, a range cannot at the moment be folded in parallel. There seem to be foldable implementations floating around in an enhancement ticket, but I can't find right now why they haven't been accepted. As it stands, a fold over a range always proceeds like a straight sequential reduce, except for the single identity call to the combining function, which supplies the seed value for that reduce. For comparison, a vector provides random access and so is foldable:

(def input (vec (range 1 50)))

(defn combine-f
  ([]
    (println "identity-combine")
    Long/MIN_VALUE)
  ([left right]
    (println (str "combine " left " "  right))
    (max left right)))

(defn reduce-f
  ([]
    (println "identity-reduce")
    Long/MIN_VALUE)
  ([result input]
    (println (str "reduce " result " "  input))
    (max result input)))

(clojure.core.reducers/fold 10 combine-f reduce-f input)

with output:

identity-combineidentity-combineidentity-combine


reduce -9223372036854775808 1

reduce -9223372036854775808 25reduce -9223372036854775808 19


reduce -9223372036854775808 13reduce 25 26


reduce 26 27
reduce 1 2

reduce 27 28

reduce 28 29

reduce 29 30



reduce 2 3
reduce 19 20

reduce 3 4

identity-combinereduce 4 5

reduce 5 6reduce 13 14


reduce 14 15


reduce 20 21identity-combine

reduce 21 22

reduce 15 16



reduce -9223372036854775808 31

reduce 22 23reduce 16 17reduce -9223372036854775808 7


reduce 7 8


reduce 8 9

reduce 23 24

reduce 31 32
reduce 17 18


reduce 9 10

reduce 10 11


reduce 11 12

identity-combine
reduce 32 33

combine 18 24


combine 6 12identity-combine
reduce -9223372036854775808 37


reduce 33 34

reduce 37 38reduce -9223372036854775808 43
combine 12 24



reduce 43 44reduce 34 35reduce 38 39

reduce 44 45

reduce 35 36


reduce 45 46
reduce 39 40

reduce 46 47
combine 30 36

reduce 47 48



reduce 48 49

reduce 40 41

reduce 41 42

combine 42 49

combine 36 49
combine 24 49

As you may notice, this output is a lot more jumbled, because the parallel worker threads write to *out* without any synchronization.
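If the interleaving makes the trace hard to read, one option is to build each line first and print it while holding a lock. The following is only a sketch: print-lock and log-line are hypothetical helper names, not part of the reducers library, and the order of the lines will still vary from run to run because the partitions are processed concurrently.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical helpers (not part of clojure.core.reducers):
;; a shared lock object and a printer that emits one whole line at a time.
(def print-lock (Object.))

(defn log-line [& parts]
  (locking print-lock
    (println (apply str parts))))

(defn combine-f
  ([]
   (log-line "identity-combine")
   Long/MIN_VALUE)
  ([left right]
   (log-line "combine " left " " right)
   (max left right)))

;; Only the two-argument arity is given here, on the assumption that
;; the seed of each partition comes from (combine-f), as the trace suggests.
(defn reduce-f [result input]
  (log-line "reduce " result " " input)
  (max result input))

(r/fold 10 combine-f reduce-f (vec (range 1 50)))
;; => 49
```

Each line now comes out intact, but lines from different partitions can still appear in any order.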
(I needed to alter combine-f a little: the original called reduce on its arguments, but by the time combine-f runs each argument is a single, already-reduced long, so that call failed. Switching to Long/MIN_VALUE doesn't affect this example much, but it is the identity element of max over longs, so I figured why not?)
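To compare the range and vector cases without reading interleaved output, you can count the calls with atoms instead of printing. This is a minimal sketch: count-fold-calls is a hypothetical helper, and the exact number of partitions depends on how your Clojure version splits the vector, so only the broad pattern should be relied on (no combines for a range, some combines for a vector of this size).

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical helper: run the same fold over coll and report how often
;; each arity of the combining function was invoked.
(defn count-fold-calls [coll]
  (let [seeds     (atom 0)   ; zero-argument (identity) calls
        combines  (atom 0)   ; two-argument (merge) calls
        combine-f (fn
                    ([] (swap! seeds inc) Long/MIN_VALUE)
                    ([left right] (swap! combines inc) (max left right)))
        result    (r/fold 10 combine-f max coll)]
    {:result result :seeds @seeds :combines @combines}))

;; A range is not foldable, so this behaves like a single reduce.
(count-fold-calls (range 1 50))

;; Pouring the same range into a vector makes it foldable.
(count-fold-calls (vec (range 1 50)))
```

Both calls should return 49 under :result; the difference shows up in :combines, which stays at zero for the range. If you need the parallel behaviour for the original input, wrapping it as (vec input) before folding is the simplest workaround.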