3
votes

In Python, it is usually recommended to vectorize code to make computation faster. For example, to compute the inner product of two vectors a and b, usually

Code A

c = np.dot(a, b)

is better than

Code B

c = 0
for i in range(len(a)):
    c += a[i] * b[i]

But in Julia, it seems that vectorization is sometimes not that helpful. I treated '* and dot as vectorized versions and an explicit for loop as the non-vectorized version, and got the following result.

using Random
using LinearAlgebra

len = 1000000
rng1 = MersenneTwister(1234)
a = rand(rng1, len)
rng2 = MersenneTwister(12345)
b = rand(rng2, len)


function inner_product(a, b)
    result = 0
    for i in 1: length(a) 
        result += a[i] * b[i] 
    end
    return result
end


@time a' * b
@time dot(a, b)
@time inner_product(a, b);
  0.013442 seconds (56.76 k allocations: 3.167 MiB)
  0.003484 seconds (106 allocations: 6.688 KiB)
  0.008489 seconds (17.52 k allocations: 976.752 KiB)

(I know using BenchmarkTools.jl is a more standard way to measure the performance.)

From the output, dot runs faster than the for loop, which in turn runs faster than '*. This contradicts what was presumed above.

So my questions are:

  1. Does Julia need (or sometimes need) vectorization to speed up computation?
  2. If it does, when should vectorization be used, and which is the better way to use it (dot or '*)?
  3. If it does not, what is the difference between Julia and Python in the way vectorized and non-vectorized code is executed?

2 Answers

3
votes

You are not benchmarking correctly, and the implementation of your function is suboptimal.

julia> using BenchmarkTools                   
                                              
julia> @btime $a' * $b                        
  429.400 μs (0 allocations: 0 bytes)         
249985.3680190253                             
                                              
julia> @btime dot($a,$b)                      
  426.299 μs (0 allocations: 0 bytes)         
249985.3680190253                             
                                              
julia> @btime inner_product($a, $b)           
  970.500 μs (0 allocations: 0 bytes)         
249985.36801903677                            

The correct implementation:

function inner_product_correct(a, b)
    result = 0.0 # use the same element type as the arguments
    @simd for i in 1:length(a)
        @inbounds result += a[i] * b[i]
    end
    return result
end

julia> @btime inner_product_correct($a, $b)
  530.499 μs (0 allocations: 0 bytes)
249985.36801902478

There is still a difference (though a less significant one) because dot uses the optimized BLAS implementation, which is multi-threaded. You could (following Bogumil's comment) set OPENBLAS_NUM_THREADS=1, and then you will find that the timing of the BLAS function is essentially identical to the Julia implementation.
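
If you prefer not to restart Julia with the environment variable set, here is a minimal sketch of the same experiment from inside the session (reusing a and b from the question); LinearAlgebra exposes the BLAS thread count directly:

using LinearAlgebra, BenchmarkTools

BLAS.get_num_threads()  # how many threads BLAS currently uses (Julia >= 1.6)
BLAS.set_num_threads(1) # force single-threaded BLAS

@btime dot($a, $b)      # should now be close to inner_product_correct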

Note also that working with floating-point numbers is tricky in many ways:

julia> inner_product_correct(a, b)==dot(a,b)
false

julia> inner_product_correct(a, b) ≈ dot(a,b)
true 

Finally, in Julia deciding whether to use vectorized functions or to write the loop yourself is up to you - there is no performance penalty (as long as you write type-stable code and use @simd and @inbounds where required). However, in your code you were not really testing vectorization; you were comparing calling BLAS against writing the loop yourself. Here is the must-read to understand what is going on: https://docs.julialang.org/en/v1/manual/performance-tips/
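
As a small follow-up on the type-stability point, here is a minimal sketch (reusing the two functions above) of how to check it yourself:

using InteractiveUtils # for @code_warntype outside the REPL

# In inner_product the accumulator starts as the integer 0 and becomes a
# Float64 inside the loop, so the compiler has to handle a Union type;
# in inner_product_correct it stays Float64 throughout. A more generic fix
# than hard-coding 0.0 is result = zero(eltype(a)).
@code_warntype inner_product(a, b)          # accumulator flagged as Union{Float64, Int64}
@code_warntype inner_product_correct(a, b)  # accumulator is a plain Float64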

3
votes

Let me add my practical experience (it is too long for a standard comment):

does Julia need (or sometimes need) vectorization to speed up computation?

Julia does not need vectorization the way Python does (see the answer by Przemysław), but in practice, if a well-written vectorized function (like dot) is available, use it: while possible, it can be tricky to write an equally performant function yourself (people have probably spent days optimizing dot, especially to make optimal use of multiple threads).
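
To give a feel for why matching dot is non-trivial, here is a rough sketch (just an illustration, not a tuned implementation; the name inner_product_threaded is made up for this example) of a hand-rolled multi-threaded inner product. Even with @simd and @inbounds it will typically still not beat a well-tuned BLAS kernel:

using Base.Threads

# Split the index range into one chunk per thread, accumulate per-chunk
# partial sums, then reduce them. Assumes a and b are Float64 vectors as in
# the question; start Julia with e.g. `julia -t 4` to actually use threads.
function inner_product_threaded(a, b)
    n  = length(a)
    nt = nthreads()
    partials = zeros(eltype(a), nt)
    @threads for t in 1:nt
        lo = div((t - 1) * n, nt) + 1
        hi = div(t * n, nt)
        s  = zero(eltype(a))
        @inbounds @simd for i in lo:hi
            s += a[i] * b[i]
        end
        partials[t] = s
    end
    return sum(partials)
end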

If it does, then when to use vectorization and which is the better way to use (consider dot and '*)?

When you use vectorized code, everything depends on the implementation of the function you call. In this case dot(a, b) and a' * b are exactly the same: running @edit a' * b shows that in this case the method being called is

*(u::AdjointAbsVec{<:Number}, v::AbstractVector{<:Number}) = dot(u.parent, v)

so you can see it is the same function.
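
A quick way to confirm this yourself in the REPL (a small sketch, reusing a and b from the question):

using InteractiveUtils # @which is available by default in the REPL

@which a' * b        # points at the adjoint-vector * vector method shown above
a' * b == dot(a, b)  # true: a' * b takes exactly the same code path as dot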

If it does not, then what is the difference between Julia and Python in terms of the mechanism of vectorized and non-vectorized codes?

Julia is a compiled language, while Python is an interpreted language. In some cases the Python interpreter can provide fast execution, but in other cases it currently cannot (which does not mean it will not improve in the future). In particular, vectorized functions (like dot in your question) are most likely written in some compiled language, so Julia and Python will not differ much in typical cases, as they both just call this compiled function. However, when you use loops (non-vectorized code), Python will currently be slower than Julia.
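
If you want to see the "compiled" part in practice, Julia lets you inspect the code it generates. As a rough check (reusing inner_product_correct from the other answer), the LLVM IR for the plain loop typically contains packed vector instructions, i.e. the loop is compiled to SIMD machine code much like a hand-written C implementation would be:

using InteractiveUtils

@code_llvm inner_product_correct(a, b)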