3
votes

What if map and doseq had a baby? I'm trying to write a function or macro like Common Lisp's mapc, but in Clojure. This does essentially what map does, but only for side-effects, so it doesn't need to generate a sequence of results, and wouldn't be lazy. I know that one can iterate over a single sequence using doseq, but map can iterate over multiple sequences, applying a function to each element in turn of all of the sequences. I also know that one can wrap map in dorun. (Note: This question has been extensively edited after many comments and a very thorough answer. The original question focused on macros, but those macro issues turned out to be peripheral.)

This is fast (according to criterium):

(defn domap2
  [f coll]
  (dotimes [i (count coll)]
    (f (nth coll i))))

but it only accepts one collection. This accepts arbitrary collections:

(defn domap3
  [f & colls]
  (dotimes [i (apply min (map count colls))]
    (apply f (map #(nth % i) colls))))

but it's very slow by comparison. I could also write a version like the first, but with different parameter cases [f c1 c2], [f c1 c2 c3], etc., but in the end, I'll need a case that handles arbitrary numbers of collections, like the last example, which is simpler anyway. I've tried many other solutions as well.

Since the second example is very much like the first except for the use of apply and the map inside the loop, I suspect that getting rid of them would speed things up a lot. I have tried to do this by writing domap2 as a macro, but the way that the catch-all variable after & is handled keeps tripping me up, as illustrated above.

Other examples (out of 15 or 20 different versions), benchmark code, and times on a Macbook Pro that's a few years old (full source here):

(defn domap1
  [f coll]
  (doseq [e coll] 
    (f e)))

(defn domap7
  [f coll]
  (dorun (map f coll)))

(defn domap18
  [f & colls]
  (dorun (apply map f colls)))

(defn domap15
  [f coll] 
  (when (seq coll)
    (f (first coll))
    (recur f (rest coll))))

(defn domap17
  [f & colls]
  (let [argvecs (apply (partial map vector) colls)] ; seq of ntuples of interleaved vals
    (doseq [args argvecs]
      (apply f args))))

I'm working on an application that uses core.matrix matrices and vectors, but feel free to substitute your own side-effecting functions below.

(ns tst
  (:use criterium.core
        [clojure.core.matrix :as mx]))

(def howmany 1000)
(def a-coll (vec (range howmany)))
(def maskvec (zero-vector :vectorz howmany))

(defn unmaskit!
  [idx]
  (mx/mset! maskvec idx 1.0)) ; sets element idx of maskvec to 1.0

(defn runbench
  [domapfn label]
  (print (str "\n" label ":\n"))
  (bench (def _ (domapfn unmaskit! a-coll))))

Mean execution times according to Criterium, in microseconds:

domap1: 12.317551 [doseq]
domap2: 19.065317 [dotimes]
domap3: 265.983779 [dotimes with apply, map]
domap7: 53.263230 [map with dorun]
domap18: 54.456801 [map with dorun, multiple collections]
domap15: 32.034993 [recur]
domap17: 95.259984 [doseq, multiple collections interleaved using map]

EDIT: It may be that dorun+map is the best way to implement domap for multiple large lazy sequence arguments, but doseq is still king when it comes to single lazy sequences. Performing the same operation as unmask! above, but running the index through (mod idx 1000), and iterating over (range 100000000), doseq is about twice as fast as dorun+map in my tests (i.e. (def domap25 (comp dorun map))).

1
Your actual question of "How to write variadic Clojure macro that takes collections as arguments" is totally lost. Consider editing to just the parts relevant to the actual question.A. Webb
Thanks @A.Webb . Moved main question information to the top. I expected folks to try to steer me away from the question, as you have. I don't mind that, but so far it seems to me that it's worth answering for my case. I've now added additional versions of domap with timings in the "appendix" section. As you can see, dorun+map (domap7 and domap8) is much slower than doseq (domap1) and dotimes (domap2 and domap3). (I resorted to dotimes because I couldn't figure out a more efficient way to walk collections in parallel (see domap15 and domap17).)Mars
I edited and replaced domap8 with domap18, and replaced times after a new test. domap8 had used apply (partial map f). You reminded me that I can just say apply map f instead.Mars
One more thing: I agree--I would think that dotimes would be slow. Maybe I just haven't tried it on long enough collections. There should be a way to make a multiple-collection version that's as fast as the single-collection doseq version.Mars
Since I posted this question and it was answered, run! has been added to the core language. It doesn't quite do what I wanted, but it's related and it's worthwhile to know about it.Mars

1 Answers

5
votes

You don't need a macro, and I don't see why a macro would be helpful here.

user> (defn do-map [f & lists] (apply mapv f lists) nil)
#'user/do-map
user> (do-map (comp println +) (range 2 6) (range 8 11) (range 22 40))
32
35
38
nil

note do-map here is eager (thanks to mapv) and only executes for side effects

Macros can use varargs lists, as the (useless!) macro version of do-map demonstrates:

user> (defmacro do-map-macro [f & lists] `(do (mapv ~f ~@lists) nil))
#'user/do-map-macro
user> (do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40))
32
35
38
nil
user> (macroexpand-1 '(do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40)))
(do (clojure.core/mapv (comp println +) (range 2 6) (range 8 11) (range 22 40)) nil)

Addendum: addressing the efficiency / garbage-creation concerns:
note that below I truncate the output of the criterium bench function, for conciseness reasons:

(defn do-map-loop
  [f & lists]
  (loop [heads lists]
    (when (every? seq heads)
      (apply f (map first heads))
      (recur (map rest heads)))))


user> (crit/bench (with-out-str (do-map-loop (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
            Execution time mean : 11.367804 µs
...

This looks promising because it doesn't create a data structure that we aren't using anyway (unlike mapv above). But it turns out it is slower than the previous (maybe because of the two map calls?).

user> (crit/bench (with-out-str (do-map-macro (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
             Execution time mean : 7.427182 µs
...
user> (crit/bench (with-out-str (do-map (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
             Execution time mean : 8.355587 µs
...

Since the loop still wasn't faster, let's try a version which specializes on arity, so that we don't need to call map twice on every iteration:

(defn do-map-loop-3
  [f a b c]
  (loop [[a & as] a
         [b & bs] b
         [c & cs] c]
    (when (and a b c)
      (f a b c)
      (recur as bs cs))))

Remarkably, though this is faster, it is still slower than the version that just used mapv:

user> (crit/bench (with-out-str (do-map-loop-3 (comp println +) (range 2 6) (range 8 11) (range 22 40))))
...
             Execution time mean : 9.450108 µs
...

Next I wondered if the size of the input was a factor. With larger inputs...

user> (def test-input (repeatedly 3 #(range (rand-int 100) (rand-int 1000))))
#'user/test-input
user> (map count test-input)
(475 531 511)
user> (crit/bench (with-out-str (apply do-map-loop-3 (comp println +) test-input)))
...
            Execution time mean : 1.005073 ms
...
user> (crit/bench (with-out-str (apply do-map (comp println +) test-input)))
...
             Execution time mean : 756.955238 µs
...

Finally, for completeness, the timing of do-map-loop (which as expected is slightly slower than do-map-loop-3)

user> (crit/bench (with-out-str (apply do-map-loop (comp println +) test-input)))
...
             Execution time mean : 1.553932 ms

As we see, even with larger input sizes, mapv is faster.

(I should note for completeness here that map is slightly faster than mapv, but not by a large degree).