Fast vector math in Clojure / Incanter

Question

I'm currently looking into Clojure and Incanter as an alternative to R. (Not that I dislike R, but it's just interesting to try out new languages.) I like Incanter and find the syntax appealing, but vectorized operations are quite slow compared to, e.g., R or Python.

As an example, I wanted to get the first-order difference of a vector using Incanter vector operations, Clojure's map, and R. Below is the code and timing for all versions. As you can see, R is clearly faster.

Incanter and Clojure:

(use '(incanter core stats)) 
(def x (doall (sample-normal 1e7))) 
(time (def y (doall (minus (rest x) (butlast x))))) 
"Elapsed time: 16481.337 msecs" 
(time (def y (doall (map - (rest x) (butlast x))))) 
"Elapsed time: 16457.850 msecs"

R:

rdiff <- function(x){ 
   n = length(x) 
   x[2:n] - x[1:(n-1)]} 
x = rnorm(1e7) 
system.time(rdiff(x)) 
   user  system elapsed 
  1.504   0.900   2.561

So I was wondering: is there a way to speed up vector operations in Incanter/Clojure? Solutions involving loops, Java arrays, and/or libraries from Clojure are also welcome.

I have also posted this question to the Incanter Google group, with no responses so far.

UPDATE: I have marked Jouni's answer as accepted, see below for my own answer where I have cleaned up his code a bit and added some benchmarks.

That's going to be difficult, as the intrinsic looping of R is programmed in C or Fortran. Getting faster than that will take quite some effort... — Joris Meys
This is in line with the experience I had previously; Clojure is slower on basic operations almost by a factor of 10. My advice: don't use Clojure if you're looking for performance; use it if you want to have seamless integration on the JVM, etc. You may also find this question relevant: stackoverflow.com/questions/2186709/…. — Shane
Or use diff, which already exists. For example: { tmp <- rnorm(1e7); all(diff(tmp) == (tmp[-1]-tmp[-length(tmp)])) } #--> TRUE. — Shane
@Matti I can understand you want to compare similar code, but if I compare languages, I use the best tools in each of them. Personally, I like to squeeze the last drop out of the lemon. — Joris Meys

Matti Pastell · Accepted Answer · 2010-09-29T12:15:44

My final solutions

After all the testing I found two slightly different ways to do the calculation with sufficient speed.

First, I've used the function diff with different types of return values. Below is the code returning a vector, but I have also timed a version returning a double-array (replace (vec y) with y) and one returning an Incanter.matrix (replace (vec y) with (matrix y)). This function uses only Java arrays. It is based on Jouni's code with some extra type hints removed.
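A minimal sketch of the double-array variant for the case where the caller already holds a Java array. The name diff-array and the ^doubles type hints are assumptions made for illustration; this is not the code that produced the timings in this answer.

```clojure
;; Hypothetical array-in/array-out variant (name and type hints are illustrative).
(defn diff-array
  ^doubles [^doubles xs]
  (let [n  (dec (alength xs))
        ys (double-array (max n 0))]   ; empty input yields an empty array
    (dotimes [i n]
      (aset ys i (- (aget xs (inc i)) (aget xs i))))
    ys))

(vec (diff-array (double-array [1 3 6 10])))
;; => [2.0 3.0 4.0]
```

Keeping both the input and the output as double arrays avoids the conversions to and from a persistent vector.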

Another approach is to do the calculations with Java arrays and store the values in a transient vector. As you can see from the timings, this is slightly faster than approach 1 if you want the function to return a vector. This is implemented in the function difft.

So the choice really depends on what you want to do with the data. I guess a good option would be to overload the function so that it returns the same type that was used in the call. Actually, passing a Java array to diff instead of a vector makes it ~1 s faster.

Timings for the different functions:

diff returning vector:

(time (def y (diff x)))
"Elapsed time: 4733.259 msecs"

diff returning Incanter.matrix:

(time (def y (diff x)))
"Elapsed time: 2599.728 msecs"

diff returning double-array:

(time (def y (diff x)))
"Elapsed time: 1638.548 msecs"

difft:

(time (def y (difft x)))
"Elapsed time: 3683.237 msecs"

The functions

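(use 'incanter.stats)
(def x (vec (sample-normal 1e7)))

;; First-order difference computed on Java double arrays;
;; the result is copied into a persistent vector.
(defn diff [x]
  (let [y (double-array (dec (count x)))
        x (double-array x)]
    (dotimes [i (dec (count x))]
      (aset y i
            (- (aget x (inc i))
               (aget x i))))
    (vec y)))

;; Same calculation, but the differences are written
;; directly into a transient vector.
(defn difft [x]
  (let [x (double-array x)
        n (dec (alength x))]
    (loop [i 0
           y (transient (vec (range n)))]
      (if (< i n)
        (recur (inc i)
               (assoc! y i (- (aget x (inc i))
                              (aget x i))))
        (persistent! y)))))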
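
The idea of returning the same type that was passed in could be sketched roughly as follows. The names diff-like and diff-doubles are hypothetical, and the dispatch only distinguishes double arrays from everything else; treat it as an illustration under those assumptions, not as part of the timed code.

```clojure
;; Hypothetical helper: differences of a double array, returned as a double array.
(defn- diff-doubles
  ^doubles [^doubles xs]
  (let [n  (dec (alength xs))
        ys (double-array (max n 0))]
    (dotimes [i n]
      (aset ys i (- (aget xs (inc i)) (aget xs i))))
    ys))

;; Hypothetical wrapper: array in, array out; any other collection in, vector out.
(defn diff-like [x]
  (if (instance? (Class/forName "[D") x)
    (diff-doubles x)
    (vec (diff-doubles (double-array x)))))

(diff-like [1 3 6 10])
;; => [2.0 3.0 4.0]
```

Callers that stay in double arrays skip both conversions, while callers passing a vector get a vector back.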
