1
votes

Edit:

The data really looks like this.

1,000-00-000,GRABBUS,OCTOPUS,,M,26-Nev-12,,05 FRENCH TOAST ROAD,,VACANT,ZA,1867,(001) 111-1011,(002) 111-1000,,

I've got to make it look silly, because it contains proprietary information.

This is what it looks like before using clojure-csv to create a vector of vectors.

I used post-parsed numbers to make it easy, but they're not being reduced to a value. I want to cherry pick certain columns from the clojure-csv parsed data and create a smaller csv row.

Please accept my apologies for any confusion.

End Edit:

How do you decide when to use reduce and when to use pmap?

A while ago, I got a comment on my blog concerning reduce. Specifically the comment said reduce in general could not be parallelized, but map (pmap) could be.

When would using or not using reduce make a difference, and for examples like the following, does it make a difference?

Thank You.

(def csv-row [1 2 3 4 5 6 7 8 9])
(def col-nums [0 1 4])

(defn reduce-csv-rowX
    "Accepts a csv-row and a list of columns to extract, and
     reduces the csv-row to the selected list using a list comprehension."
    [csv-row col-nums]
        (for [col-num col-nums
            :let [part-row (nth csv-row col-num nil)]]
            part-row))

(defn reduce-csv-row
    "Accepts a csv-row and a list of columns to extract, and
     reduces the csv-row to the selected list."
    [csv-row col-nums]
    (reduce
        (fn [out-csv-row col-num]
            (let [out-val (nth csv-row col-num nil)]
                (if-not (nil? out-val)
                    (conj out-csv-row out-val))))
        []
        col-nums))

Edit:

(defn reduce-csv-row
    "Accepts a csv-row and a list of columns to extract, and
     reduces the csv-row to the selected list."
    [csv-row col-nums]
    (reduce
        (fn [out-csv-row col-num]
            (let [out-val (nth csv-row col-num nil)]
                (conj out-csv-row out-val)))
        []
        col-nums))


4 Answers

5
votes

In general, you want to use the function that lets you write the simplest code. This usually means the most specific function possible. In this case, you can think of your operation as converting a list of col-nums into a list of the values of the row at the column. This corresponds to map, so you probably want to use map. You can write it with reduce, but in this case, you are re-implementing map in your call to reduce, so it is probably the wrong method.

However, there are times when reduce is the right choice. If you are trying to "reduce" a list to an arbitrary value, map will not help you much at all. In this case, reduce is what you want, and if the operation itself cannot be parallelized, neither can the call to reduce.
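As a minimal sketch of the kind of job where reduce fits, assume we want a single number out of a row, such as the total of the selected columns. The name sum-cols is hypothetical, and the row reuses the post-parsed numbers from the question.

```clojure
(def csv-row [1 2 3 4 5 6 7 8 9])
(def col-nums [0 1 4])

;; Hypothetical helper: folds the selected columns into one number.
;; A missing column (nth returns the default nil) is counted as 0.
(defn sum-cols [csv-row col-nums]
  (reduce (fn [total col-num]
            (+ total (or (nth csv-row col-num nil) 0)))
          0
          col-nums))

(sum-cols csv-row col-nums) ;; => 8
```

Here the result is one value rather than a collection, so there is nothing for map to return.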

If you are further interested in why your reduce code is not ideal, if we abstract away the application-specific code, we get

(reduce
 (fn [out-list current-val]
   (let [out-val (f current-val)]
     (if-not (nil? out-val)
       (conj out-list out-val))))
 []
 col-nums)

The one complicating factor is the if-not call. At the moment, it's buggy: if out-val is ever nil, you will throw away everything you have found up to this point and start over (the return from (if-not (nil? out-val) (conj out-list out-val)) is nil when out-val is nil, so nil will be used as the next out-list). Since your other implementation does not have any nil check and this nil check is buggy (and so probably has never been exercised), I am assuming that it can be ignored. At this point, your code is

(reduce
 (fn [out-list current-val]
   (let [out-val (f current-val)]
     (conj out-list out-val)))
 []
 col-nums)

which is a perfectly valid (albeit non-lazy) implementation of map. Using an actual call to map instead lets you eliminate all of this code that is not actually related to your specific problem and instead focus on what you are actually trying to do. You can see the effect this has by looking at ivant's solution.
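To make both points concrete, here is a small sketch using the question's sample row. The names buggy-select and select-cols are hypothetical. It shows the accumulator being reset when a column is missing, and mapv (the eager, vector-returning variant of map) producing the same result as the reduce without the nil check.

```clojure
(def csv-row [1 2 3 4 5 6 7 8 9])

;; The question's reduce, renamed: the if-not has no else branch,
;; so a nil out-val makes the whole step return nil.
(defn buggy-select [csv-row col-nums]
  (reduce (fn [out-csv-row col-num]
            (let [out-val (nth csv-row col-num nil)]
              (if-not (nil? out-val)
                (conj out-csv-row out-val))))
          []
          col-nums))

(buggy-select csv-row [0 1 4])   ;; => [1 2 5]
(buggy-select csv-row [0 42 4])  ;; => (5), the 1 collected earlier is lost

;; The same selection written with mapv.
(defn select-cols [csv-row col-nums]
  (mapv #(nth csv-row % nil) col-nums))

(select-cols csv-row [0 1 4])    ;; => [1 2 5]
(select-cols csv-row [0 42 4])   ;; => [1 nil 5]
```

If missing columns should be dropped rather than kept as nil, (keep #(nth csv-row % nil) col-nums) is one option.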

3
votes

A solution using map (shown here with pmap, its parallel variant) can look like this:

(defn reduce-csv-rowM [csv-row col-nums]
  (pmap (fn [pos] (nth csv-row pos)) col-nums))

It is trivially parallelizable, and if csv-row is a vector nth is quite fast, so it's fine.

So in your case I think the map solution is the best, because it's easier to understand than the other two, and can be faster as well.
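For comparison, here is a sketch of the same function with plain map next to the pmap version. The name select-cols-seq is hypothetical. Both return the same values. Whether pmap is actually faster depends on the cost of the mapped function: its docstring says it is only useful when the time of the function dominates the coordination overhead, which a single nth on a vector is unlikely to do, so it is worth measuring.

```clojure
(def csv-row [1 2 3 4 5 6 7 8 9])
(def col-nums [0 1 4])

;; Sequential version (hypothetical name): one nth call per column.
(defn select-cols-seq [csv-row col-nums]
  (map (fn [pos] (nth csv-row pos)) col-nums))

;; Parallel version, copied from this answer.
(defn reduce-csv-rowM [csv-row col-nums]
  (pmap (fn [pos] (nth csv-row pos)) col-nums))

(select-cols-seq csv-row col-nums)  ;; => (1 2 5)
(reduce-csv-rowM csv-row col-nums)  ;; => (1 2 5)
```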

In the general case, map and reduce are not interchangeable; in fact, they are quite useful together (as in Google's MapReduce technique).
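A tiny sketch of the two working together, assuming a hypothetical table of already-parsed numeric rows: map extracts one column from every row, and reduce aggregates the extracted values.

```clojure
;; Hypothetical parsed rows: each row is a vector of numbers.
(def rows [[1 2 3] [4 5 6] [7 8 9]])

;; map step: pull column 1 out of every row.
(def col-1 (map #(nth % 1) rows))  ;; => (2 5 8)

;; reduce step: aggregate the extracted values into one number.
(reduce + col-1)                   ;; => 15
```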

1
votes

Reduce and map make an excellent pair, and they are used together so often that map-reduce is now a common industry term. In general, map transforms data into a form that can be aggregated, and reduce aggregates that data into a single answer. Reduce can be parallelized if the reducing function is associative (and also commutative, if partial results may be combined out of order). For instance, parallel reduction is fine with + and works less well with /.

  • use map if you want to produce a collection (like a list or vector)
  • use reduce if you want to produce a single value like 42
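As a hedged illustration of the + versus / remark, the sketch below uses fold from clojure.core.reducers, which can split a vector into chunks and combine the partial results in parallel. It expects the combining function to be associative and to return an identity value when called with no arguments, which + does.

```clojure
(require '[clojure.core.reducers :as r])

;; With +, regrouping the operands does not change the result.
(= (+ (+ 8 4) 2) (+ 8 (+ 4 2)))  ;; => true

;; With /, the grouping changes the answer, so chunked
;; evaluation could not be trusted to match a left-to-right reduce.
(/ (/ 8 4) 2)                    ;; => 1
(/ 8 (/ 4 2))                    ;; => 4

;; Summing 0..1000 with fold gives the same total as a plain reduce.
(def total (r/fold + (vec (range 1001))))  ;; => 500500
```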
1
votes

You can also use select-keys, which returns the result in a slightly different format:

(select-keys [1 2 3 4 5 6 7 8 9] [0 1 4])
;==> {4 5, 1 2, 0 1}

That is, a map from each key to its value. It reads more naturally with maps, but it works on vectors as well, because a vector is associative by index.
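If the values are needed back as a row in the requested column order, one possible follow-up step (a sketch, with picked as a hypothetical name) is to map the resulting map over the column numbers:

```clojure
(def csv-row [1 2 3 4 5 6 7 8 9])
(def col-nums [0 1 4])

;; select-keys on a vector treats the indices as keys.
(def picked (select-keys csv-row col-nums))  ;; a map equal to {0 1, 1 2, 4 5}

;; Maps are functions of their keys, so mapping over col-nums
;; recovers the values in the requested column order.
(mapv picked col-nums)                       ;; => [1 2 5]
```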

You can also take a look at clojure.set/project, which is like select-keys (and actually uses it internally), but for the whole table, instead of just one row.
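A short sketch of project, assuming the table is held as a set of row vectors (the table below is made up for illustration). Because the result is a set, rows that become identical after projection collapse into one.

```clojure
(require '[clojure.set :as set])

;; Hypothetical table: a set of rows, each row a vector.
(def table #{[1 2 3 4 5] [6 7 8 9 10]})

(set/project table [0 1 4])
;; => a set equal to #{{0 1, 1 2, 4 5} {0 6, 1 7, 4 10}}
```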