When To Use reduce Or Instead Use pmap

Question

Edit:

The data really looks like this.

1,000-00-000,GRABBUS,OCTOPUS,,M,26-Nev-12,,05 FRENCH TOAST ROAD,,VACANT,ZA,1867,(001) 111-1011,(002) 111-1000,,

I had to make it look silly because it contains proprietary information.

This is what it looks like before I use clojure-csv to parse it into a vector of vectors.

In the example below I used plain numbers in place of the parsed values to keep it simple, but I'm not trying to reduce the row to a single value. I want to cherry-pick certain columns from the clojure-csv parsed data and create a smaller csv row.
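To make the goal concrete, here is a minimal sketch that takes the sample line above, splits it, and rebuilds a smaller csv row from a few columns. It uses only `clojure.string`, so it is a naive stand-in for clojure-csv's parser: it does not handle quoted fields containing commas. The column choice `keep-cols` is hypothetical.

```clojure
(require '[clojure.string :as str])

(def raw-line
  "1,000-00-000,GRABBUS,OCTOPUS,,M,26-Nev-12,,05 FRENCH TOAST ROAD,,VACANT,ZA,1867,(001) 111-1011,(002) 111-1000,,")

;; Naive split on commas; a limit of -1 keeps trailing empty fields.
;; Unlike a real CSV parser, this does not handle quoted commas.
(def fields (str/split raw-line #"," -1))

;; Hypothetical choice of columns to keep.
(def keep-cols [0 2 3 8])

;; Pick the wanted columns (empty string if an index is out of range)
;; and join them back into a smaller csv row.
(def smaller-row
  (str/join "," (map #(nth fields % "") keep-cols)))
;; => "1,GRABBUS,OCTOPUS,05 FRENCH TOAST ROAD"
```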

Please accept my apologies for any confusion.

End Edit:

How do you make a determination of when to use reduce or instead use pmap?

A while ago, I got a comment on my blog concerning reduce. Specifically, the comment said that reduce in general could not be parallelized, but map (pmap) could be.
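The difference the comment points at can be sketched as follows, assuming a hypothetical `csv-rows` vector of already-parsed rows. With map or pmap, each element is handled independently, so pmap may work on several rows at once; with reduce, every step needs the accumulator from the step before. Note that pmap's own docstring says it is only useful for computationally intensive functions where the time of f dominates the coordination overhead, so for cheap per-row work like this, plain map is likely faster.

```clojure
(def col-nums [0 1 4])

;; Hypothetical parsed rows (a vector of vectors).
(def csv-rows [[1 2 3 4 5 6 7 8 9]
               [10 20 30 40 50 60 70 80 90]])

;; Rows are independent of each other, so pmap may run the selection
;; on several rows in parallel. Results come back in input order,
;; exactly as with map. doall forces the lazy results.
(def selected-parallel
  (doall (pmap (fn [row] (mapv #(nth row % nil) col-nums)) csv-rows)))
(def selected-sequential
  (doall (map (fn [row] (mapv #(nth row % nil) col-nums)) csv-rows)))
;; both => ([1 2 5] [10 20 50])

;; reduce threads an accumulator through every step: the step for
;; row k cannot start until row k-1 has produced its accumulator.
(def column-0-total
  (reduce (fn [acc row] (+ acc (first row))) 0 csv-rows))
;; => 11
```

When run as a standalone script, calling `(shutdown-agents)` at the end lets the JVM exit promptly, since pmap runs its work on the agent thread pool.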

When does the choice between reduce and map make a difference, and does it make a difference for examples like the following?

Thank you.

(def csv-row [1 2 3 4 5 6 7 8 9])
(def col-nums [0 1 4])

(defn reduce-csv-rowX
    "Accepts a csv-row and a list of columns to extract, and
     reduces the csv-row to the selected list using a list comprehension."
    [csv-row col-nums]
        (for [col-num col-nums
            :let [part-row (nth csv-row col-num nil)]]
            part-row))

(defn reduce-csv-row
    "Accepts a csv-row and a list of columns to extract, and
     reduces the csv-row to the selected list."
    [csv-row col-nums]
    (reduce
        (fn [out-csv-row col-num]
            (let [out-val (nth csv-row col-num nil)]
                (if-not (nil? out-val)
                    (conj out-csv-row out-val))))
        []
        col-nums))

Edit:

(defn reduce-csv-row
    "Accepts a csv-row and a list of columns to extract, and
     reduces the csv-row to the selected list."
    [csv-row col-nums]
    (reduce
        (fn [out-csv-row col-num]
            (let [out-val (nth csv-row col-num nil)]
                (conj out-csv-row out-val)))
        []
        col-nums))

Retief · Accepted Answer · 2012-03-22T17:17:15

In general, you want to use the function that lets you write the simplest code, which usually means the most specific function possible. In this case, you can think of your operation as converting a list of col-nums into a list of the row's values at those columns. That is exactly what map does, so you probably want to use map. You can write it with reduce, but then you are re-implementing map inside your call to reduce, so it is probably the wrong tool.
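A minimal map-based sketch of the column selection, using the question's `csv-row` and `col-nums`. The name `select-columns` is hypothetical, and this is not necessarily the solution given in any other answer. Swapping `map` for `mapv` returns a vector instead of a lazy sequence.

```clojure
(def csv-row [1 2 3 4 5 6 7 8 9])
(def col-nums [0 1 4])

(defn select-columns
  "Returns the values of csv-row at the indexes in col-nums,
   with nil for any index that is out of range."
  [csv-row col-nums]
  (map #(nth csv-row % nil) col-nums))

(select-columns csv-row col-nums) ; => (1 2 5)
```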

However, there are times when reduce is the right choice. If you are trying to "reduce" a list to an arbitrary value, map will not help you much at all. In that case, reduce is what you want, and if your operation is not parallelizable, reduce cannot be parallelized either: each step consumes the accumulator produced by the previous one.
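As an illustration of reducing to a single value, here is a sketch with a hypothetical `row-hash` function that folds the selected columns into one order-dependent number (h = 31*h + value). The result is not a sequence, so map alone cannot produce it, and each step depends on the hash from the previous step.

```clojure
(def csv-row [1 2 3 4 5 6 7 8 9])

(defn row-hash
  "Hypothetical example: folds the values of csv-row at col-nums
   into a single order-dependent hash, h = 31*h + value."
  [csv-row col-nums]
  (reduce (fn [h col-num] (+ (* 31 h) (nth csv-row col-num)))
          0
          col-nums))

(row-hash csv-row [0 1 4]) ; => 1028
(row-hash csv-row [4 1 0]) ; => 4868, so the order of steps matters
```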

If you are interested in why your reduce code is not ideal, abstracting away the application-specific code gives

(reduce
 (fn [out-list current-val]
   (let [out-val (f current-val)]
     (if-not (nil? out-val)
       (conj out-list out-val))))
 []
 col-nums)

The one complicating factor is the if-not call. At the moment, it's buggy: if out-val is ever nil, you throw away everything you have found up to that point and start over. When out-val is nil, (if-not (nil? out-val) (conj out-list out-val)) returns nil, so nil is used as the next out-list. Since your other implementation has no nil check, and this one is buggy (and so has probably never been exercised), I am assuming that it can be ignored. At this point, your code is

(reduce
 (fn [out-list current-val]
   (let [out-val (f current-val)]
     (conj out-list out-val)))
 []
 col-nums)

which is a perfectly valid (albeit non-lazy) implementation of map. Calling map directly instead lets you eliminate all of the code that is unrelated to your specific problem and focus on what you are actually trying to do. You can see the effect this has by looking at ivant's solution.
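To make the nil-check bug described above concrete, here is a sketch that copies the question's original `reduce-csv-row` and feeds it an out-of-range column. It also shows one possible fix using `keep`, which calls a function on each element and drops nil results; the name `select-non-nil-columns` is hypothetical.

```clojure
(def csv-row [1 2 3 4 5 6 7 8 9])

;; Copy of the question's reduce-csv-row, including the nil check.
(defn reduce-csv-row [csv-row col-nums]
  (reduce
    (fn [out-csv-row col-num]
      (let [out-val (nth csv-row col-num nil)]
        (if-not (nil? out-val)
          (conj out-csv-row out-val))))   ; returns nil when out-val is nil
    []
    col-nums))

;; Column 99 is out of range, so the accumulator becomes nil; the next
;; conj onto nil builds a fresh list, discarding the earlier [1].
(reduce-csv-row csv-row [0 99 1]) ; => (2)

;; Skipping nils without losing earlier values: keep drops nil results.
(defn select-non-nil-columns [csv-row col-nums]
  (keep #(nth csv-row % nil) col-nums))

(select-non-nil-columns csv-row [0 99 1]) ; => (1 2)
```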
