8
votes

I am performing element-wise operations on two vectors on the order of 50,000 elements in size, and having unsatisfactory performance issues (a few seconds). Are there any obvious performance issues to be made, such as using a different data structure?

(defn boolean-compare
  "Sum up 1s if matching 0 otherwise"
  [proposal-img data-img]
  (sum
  (map
    #(Math/abs (- (first %) (second %)))
    (partition 2 (interleave proposal-img data-img)))))
2

2 Answers

11
votes

Try this:

(apply + (map bit-xor proposal-img data-img)))

Some notes:

  • mapping a function to several collections uses an element from each as the arguments to the function - no need to interleave and partition for this.
  • If your data is 1's and 0's, then xor will be faster than absolute difference

Timed example:

(def data-img (repeatedly 50000 #(rand-int 2)))
(def proposal-img (repeatedly 50000 #(rand-int 2)))
(def sum (partial apply +))

After warming up the JVM...

(time (boolean-compare proposal-img data-img))
;=> "Elapsed time: 528.731093 msecs"
;=> 24802

(time (apply + (map bit-xor proposal-img data-img)))
;=> "Elapsed time: 22.481255 msecs"
;=> 24802
5
votes

You should look at adopting core.matrix if you are interested in good performance for large vector operations.

In particular, the vectorz-clj library (a core.matrix implementation) has some very fast implementations for most common vector operations with double values.

(def v1 (array (repeatedly 50000 #(rand-int 2))))
(def v2 (array (repeatedly 50000 #(rand-int 2))))

(time (let [d (sub v2 v1)]    ;; take difference of two vectors
     (.abs d)                 ;; calculate absolute value (mutate d)
     (esum d)))               ;; sum elements and return result

=> "Elapsed time: 0.949985 msecs"
=> 24980.0

i.e. under 20ns per pair of elements - that's pretty quick: you'd be hard pressed to beat that without resorting to low-level array-fiddling code.