6 votes

I tried to speed up an R function by porting it to Julia, but to my surprise Julia was slower. The function sequentially updates a list of vectors (an array of arrays in Julia). Neither the index of the list element to be updated nor the length of the new vector is known in advance. I have written a test function that demonstrates the behavior.

Julia

function MyTest(n)
    a = [[0.0] for i in 1:n]
    for i in 1:n
        a[i] = cumsum(ones(i))
    end
    a
end

R

MyTest <- function(n){
  a <- as.list(rep(0, n))
  for (i in 1:n) 
    a[[i]] <- cumsum(rep(1, i))
  a
}

With n set to 5000, 10000, and 20000, typical computing times (median of 21 runs) are:

  • R: 0.14, 0.45, and 1.28 seconds
  • Julia: 0.31, 3.38, and 27.03 seconds

I used a Windows laptop with 64-bit Julia 1.3.1 and 64-bit R 3.6.1.

Both these functions use 64-bit floating-point types. My real problem involves integers, and there R is even more favorable. But the integer comparison isn't quite fair, since R uses 32-bit integers and Julia 64-bit. Is there something I can do to speed up Julia, or is Julia really much slower than R in this case?

Comments:

  • How are you benchmarking the functions? In Julia, with using BenchmarkTools; @btime MyTest(5000); I get 39.342 ms on my personal laptop, and using BenchmarkTools; @btime MyTest(20000); takes 766.407 ms. – giordano
  • Things usually appear slower than they really are when you use @time. – logankilpatrick
  • Also, see the cumsum docs: "Cumulative sum of a vector. See also cumsum! to use a preallocated output array, both for performance and to control the precision of the output (e.g. to avoid overflow)." – logankilpatrick
  • Even with @time MyTest(20000); I get 1.005717 seconds. – giordano
  • You can use Int32 in Julia as well, should you want to. – Fredrik Bagge
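The cumsum! function mentioned in the comments writes into a preallocated output array instead of allocating a fresh result on every call. A minimal sketch of what that looks like (the buffer reuse here is my own illustration, not from the question):

```julia
# cumsum allocates a new result array on every call; cumsum! fills a
# buffer you provide, so reusing one buffer removes one allocation per
# call (only valid when the previous contents may be overwritten).
buf = Vector{Float64}(undef, 5)
cumsum!(buf, ones(5))
println(buf)  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Note that in the question's function every a[i] must remain a distinct vector, so a single shared buffer would not work there directly; cumsum! pays off when results can be overwritten or copied out.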

2 Answers

3 votes

I don't quite see how you get your test results. Assuming you want 32-bit integers, as you said, we have

julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           for i in 1:n
               a[i] = cumsum(ones(i))
           end

           return a
       end
mytest (generic function with 1 method)

julia> @btime mytest(20000);
  1.108 s (111810 allocations: 3.73 GiB)

Just by getting rid of those intermediate allocations (cumsum(ones(i)) builds a temporary Float64 array and then converts it on every iteration), we already get down to the following:

julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           @inbounds for i in 1:n
               a[i] = collect(UnitRange{Int32}(1, i))
           end

           return a
       end
mytest (generic function with 1 method)

julia> @btime mytest(20000);
  115.702 ms (35906 allocations: 765.40 MiB)
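The speed-up comes from computing the values directly rather than materializing ones(i) and running a summation pass over it. A small check (my own sketch) that the two expressions produce the same values:

```julia
# The collect-a-range form yields the same values as cumsum(ones(i)),
# but skips the temporary ones(i) array and the summation pass.
i = 5
via_cumsum = cumsum(ones(i))                  # Float64 results
via_range  = collect(UnitRange{Int32}(1, i))  # Int32 results, no temporary
@assert via_cumsum == via_range               # elementwise equal
```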

Further devectorization does not help; it is actually a bit slower:

julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           @inbounds for i in 1:n
               v = Vector{Int32}(undef, i)
               v[1] = 1
               @inbounds for j = 2:i
                   v[j] = v[j-1] + 1
               end
               a[i] = v
           end

           return a
       end
mytest (generic function with 1 method)

julia> @btime mytest(20000);
  188.856 ms (35906 allocations: 765.40 MiB)

But with a couple of threads (I assume the inner arrays are independent), we get a 2x speed-up over the devectorized version:

julia> Threads.nthreads()
4

julia> function mytest(n)
           a = Vector{Vector{Int32}}(undef, n)
           Threads.@threads for i in 1:n
               v = Vector{Int32}(undef, i)
               v[1] = 1
               @inbounds for j = 2:i
                   v[j] = v[j-1] + 1
               end
               a[i] = v
           end

           return a
       end
mytest (generic function with 1 method)

julia> @btime mytest(20000);
  99.718 ms (35891 allocations: 763.13 MiB)

But this is only about as fast as the second variant above.

That is all specific to the case of cumsum. Other inner functions will be slower, of course, but they can be threaded and optimized in the same ways, possibly with different results.
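A note on the threaded variant: the thread count is fixed when Julia starts (e.g. julia --threads=4), and the unsynchronized writes are only race-free because every iteration touches a distinct slot a[i]. A minimal sketch of that same pattern (my own illustration, not the answer's code):

```julia
# Each iteration writes only its own a[i], so no two tasks ever touch
# the same slot; that independence is what makes the parallel writes safe.
function fill_independent(n)
    a = Vector{Vector{Int32}}(undef, n)
    Threads.@threads for i in 1:n
        a[i] = collect(Int32(1):Int32(i))
    end
    return a
end

a = fill_independent(8)
@assert a[8] == 1:8  # works with any thread count, including 1
```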

(This is on Julia 1.2, 12 GiB RAM, and an older i7.)

0 votes

Perhaps R is doing some kind of caching (memoization) for such simple functions?

Here is the Julia version with memoization:

using Memoize
@memoize function cumsum_ones(i)
  cumsum(ones(i))
end
function MyTest2(n)
  a = Vector{Vector{Float64}}(undef, n)
  for i in 1:n
    a[i] = cumsum_ones(i)
  end
  a
end

After warm-up, the timings look as follows:

julia> @btime MyTest2(5000);
  442.500 μs (10002 allocations: 195.39 KiB)

julia> @btime MyTest2(10000);
  939.499 μs (20002 allocations: 390.70 KiB)

julia> @btime MyTest2(20000);
  3.554 ms (40002 allocations: 781.33 KiB)
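One caveat worth noting with this approach: memoized results are returned by reference, so every call with the same argument shares the same underlying array, and mutating one result silently corrupts the cache. A dependency-free sketch of the same idea using get! from Base, which also makes the aliasing visible (my own illustration, not the Memoize package's internals):

```julia
# Memoization by hand: on a cache miss, get! runs the closure and stores
# the result; on a hit, it returns the stored vector itself, not a copy.
const CACHE = Dict{Int,Vector{Float64}}()
cumsum_ones_cached(i) = get!(() -> cumsum(ones(i)), CACHE, i)

a = cumsum_ones_cached(3)
b = cumsum_ones_cached(3)
@assert a == [1.0, 2.0, 3.0]
@assert a === b  # the very same array: mutating a would poison the cache
```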