47
votes

Following this post I decided to benchmark Julia against GNU Octave and the results were inconsistent with the speed-ups illustrated in julialang.org.

I compiled both Julia and GNU Octave with CXXFLAGS='-std=c++11 -O3', the results I got:

GNU Octave

a=0.9999;

tic;y=a.^(1:10000);toc
Elapsed time is 0.000159025 seconds.

tic;y=a.^(1:10000);toc
Elapsed time is 0.000162125 seconds.

tic;y=a.^(1:10000);toc
Elapsed time is 0.000159979 seconds.

--

tic;y=cumprod(ones(1,10000)*a);toc
Elapsed time is 0.000280142 seconds.

tic;y=cumprod(ones(1,10000)*a);toc
Elapsed time is 0.000280142 seconds.

tic;y=cumprod(ones(1,10000)*a);toc
Elapsed time is 0.000277996 seconds.

Julia

tic();y=a.^(1:10000);toc()
elapsed time: 0.003486508 seconds

tic();y=a.^(1:10000);toc()
elapsed time: 0.003909662 seconds

tic();y=a.^(1:10000);toc()
elapsed time: 0.003465313 seconds

--

tic();y=cumprod(ones(1,10000)*a);toc()
elapsed time: 0.001692931 seconds

tic();y=cumprod(ones(1,10000)*a);toc()
elapsed time: 0.001690245 seconds

tic();y=cumprod(ones(1,10000)*a);toc()
elapsed time: 0.001689241 seconds

Could someone explain why Julia is slower than GNU Octave with these basic operations? After warmed, it should call LAPACK/BLAS without overhead, right?

EDIT:

As explained in the comments and answers, the code above is not a good benchmark nor it illustrates the benefits of using the language in a real application. I used to think of Julia as a faster "Octave/MATLAB", but it is much more than that. It is a huge step towards productive, high-performance, scientific computing. By using Julia, I was able to 1) outperform software in my research field written in Fortran and C++, and 2) provide users with a much nicer API.

2
I'm fairly certain that neither of these operations – .^ or cumprod – are part of BLAS or LAPACK. These operations are just implemented in C as part of Octave's source and in Julia as part of Julia's base distribution.StefanKarpinski
@StefanKarpinski, I mean ones(1,10000)*a and the internals probably using Horner's rule or something. But you're right, I shouldn't have mentioned LAPACK/BLAS for this particular snippet of code, my fingers always type them unconsciously. :)juliohm
under 1 second is too short to be meaningful.Jack Tang

2 Answers

65
votes

Vectorized operations like .^ are exactly the kind of thing that Octave is good at because they're actually entirely implemented in specialized C code. Somewhere in the code that is compiled when Octave is built, there is a C function that computes .^ for a double and an array of doubles – that's what you're really timing here, and it's fast because it's written in C. Julia's .^ operator, on the other hand, is written in Julia:

julia> a = 0.9999;

julia> @which a.^(1:10000)
.^(x::Number,r::Ranges{T}) at range.jl:327

That definition consists of this:

.^(x::Number, r::Ranges) = [ x^y for y=r ]

It uses a one-dimensional array comprehension to raise x to each value y in the range r, returning the result as a vector.

Edward Garson is quite right that one shouldn't use globals for optimal performance in Julia. The reason is that the compiler can't reason very well about the types of globals because they can change at any point where execution leaves the current scope. Leaving the current scope doesn't sound like it happens that often, but in Julia, even basic things like indexing into an array or adding two integers are actually method calls and thus leave the current scope. In the code in this question, however, all the time is spent inside the .^ function, so the fact that a is a global doesn't actually matter:

julia> @elapsed a.^(1:10000)
0.000809698

julia> let a = 0.9999;
         @elapsed a.^(1:10000)
       end
0.000804208

Ultimately, if all you're ever doing is calling vectorized operations on floating point arrays, Octave is just fine. However, this is often not actually where most of the time is spent even in high-level dynamic languages. If you ever find yourself wanting to iterate over an array with a for loop, operating on each element with scalar arithmetic, you'll find that Octave is quite slow at that sort of thing – often thousands of times slower than C or Julia code doing the same thing. Writing for loops in Julia, on the other hand, is a perfectly reasonable thing to do – in fact, all our sorting code is written in Julia and is comparable to C in performance. There are also many other reasons to use Julia that don't have to do with performance. As a Matlab clone, Octave inherits many of Matlab's design problems, and doesn't fare very well as a general purpose programming language. You wouldn't, for example, want to write a web service in Octave or Matlab, but it's quite easy to do so in Julia.

16
votes

You're using global variables which is a performance gotcha in Julia.

The issue is that globals can potentially change type whenever your code calls anther function. As a result, the compiler has to generate extremely slow code that cannot make any assumptions about the types of global variables that are used.

Simple modifications of your code in line with https://docs.julialang.org/en/stable/manual/performance-tips/ should yield more satisfactory results.