As a starting point in Julia I decided to implement a simple Strassen product:
@inbounds function strassen_product(A :: Array{Num, 2}, B :: Array{Num, 2}, k = 2) :: Array{Num, 2} where {Num <: Number}
    A_height, A_width = size(A)
    B_height, B_width = size(B)
    @assert A_height == A_width == B_height == B_width "Matrices are not square or not of equal size."
    @assert isinteger(log2(A_height)) "Size of matrices is not a power of 2."

    # Base case: fall back to the built-in product for small matrices.
    if A_height ≤ k
        return A * B
    end

    # Split both matrices into four equal quadrants.
    middle = A_height ÷ 2
    A₁₁, A₁₂ = A[1:middle, 1:middle], A[1:middle, middle+1:end]
    A₂₁, A₂₂ = A[middle+1:end, 1:middle], A[middle+1:end, middle+1:end]
    B₁₁, B₁₂ = B[1:middle, 1:middle], B[1:middle, middle+1:end]
    B₂₁, B₂₂ = B[middle+1:end, 1:middle], B[middle+1:end, middle+1:end]

    # The seven Strassen products.
    P₁ = strassen_product(A₁₁ + A₂₂, B₁₁ + B₂₂, k)
    P₂ = strassen_product(A₂₁ + A₂₂, B₁₁, k)
    P₃ = strassen_product(A₁₁, B₁₂ - B₂₂, k)
    P₄ = strassen_product(A₂₂, B₂₁ - B₁₁, k)
    P₅ = strassen_product(A₁₁ + A₁₂, B₂₂, k)
    P₆ = strassen_product(A₂₁ - A₁₁, B₁₁ + B₁₂, k)
    P₇ = strassen_product(A₁₂ - A₂₂, B₂₁ + B₂₂, k)

    # Combine the products into the quadrants of the result.
    C₁₁ = P₁ + P₄ - P₅ + P₇
    C₁₂ = P₃ + P₅
    C₂₁ = P₂ + P₄
    C₂₂ = P₁ + P₃ - P₂ + P₆
    return [ C₁₁ C₁₂ ;
             C₂₁ C₂₂ ]
end
All well and good. I actually like the whole idea of unsafe-ish optimizations like @inbounds that really do affect performance by a large margin, not just by a few milliseconds at best.
Now, since I have no for loops, the obvious way to optimize further would be to use views for the A₁₁, A₁₂, etc. submatrices so that no copying takes place.
So I slapped @views in front of the 4 lines containing indexing. Of course I got an error, because a couple of lines later the recursive calls want Array{...} arguments, not SubArray{...}. So I changed the argument types and the return type to AbstractArray{Num, 2}. This time it worked, since AbstractArray is the supertype of all the array types, but... performance took a plunge off a rock: roughly 10 times slower and a ton more allocations.
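For concreteness, the modified version looked roughly like this (only the signature and the four slicing lines changed; it is renamed to strassen_product_views here just so both versions can coexist side by side):
@inbounds function strassen_product_views(A :: AbstractArray{Num, 2}, B :: AbstractArray{Num, 2}, k = 2) :: AbstractArray{Num, 2} where {Num <: Number}
    A_height, A_width = size(A)
    B_height, B_width = size(B)
    @assert A_height == A_width == B_height == B_width "Matrices are not square or not of equal size."
    @assert isinteger(log2(A_height)) "Size of matrices is not a power of 2."
    if A_height ≤ k
        return A * B
    end
    middle = A_height ÷ 2

    # The only real change: the quadrants are now views into A and B, not copies.
    @views A₁₁, A₁₂ = A[1:middle, 1:middle], A[1:middle, middle+1:end]
    @views A₂₁, A₂₂ = A[middle+1:end, 1:middle], A[middle+1:end, middle+1:end]
    @views B₁₁, B₁₂ = B[1:middle, 1:middle], B[1:middle, middle+1:end]
    @views B₂₁, B₂₂ = B[middle+1:end, 1:middle], B[middle+1:end, middle+1:end]

    P₁ = strassen_product_views(A₁₁ + A₂₂, B₁₁ + B₂₂, k)
    P₂ = strassen_product_views(A₂₁ + A₂₂, B₁₁, k)
    P₃ = strassen_product_views(A₁₁, B₁₂ - B₂₂, k)
    P₄ = strassen_product_views(A₂₂, B₂₁ - B₁₁, k)
    P₅ = strassen_product_views(A₁₁ + A₁₂, B₂₂, k)
    P₆ = strassen_product_views(A₂₁ - A₁₁, B₁₁ + B₁₂, k)
    P₇ = strassen_product_views(A₁₂ - A₂₂, B₂₁ + B₂₂, k)

    C₁₁ = P₁ + P₄ - P₅ + P₇
    C₁₂ = P₃ + P₅
    C₂₁ = P₂ + P₄
    C₂₂ = P₁ + P₃ - P₂ + P₆
    return [ C₁₁ C₁₂ ;
             C₂₁ C₂₂ ]
end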
My test case is this:
A = rand(1:10, 4, 4)
B = rand(1:10, 4, 4)
@time C = strassen_product(A, B)
When using @views + AbstractArray:
0.457157 seconds (1.96 M allocations: 98.910 MiB, 5.56% gc time)
When using the no-view version:
0.049756 seconds (126.92 k allocations: 5.603 MiB)
The difference is huge: the version that is supposed to be faster is 9-10x slower, makes about 15x more allocations, and allocates almost 20x as much memory as the other version.
EDIT: These are not first-run timings for either case but the most "median" values out of ~10 test runs each; certainly not a min or max outlier.
EDIT: I am using Julia version 1.0.
My question is: why does this happen? What am I not getting? Is my reasoning that using views instead of copies would be faster... wrong?
@time just once? The first time it JIT compiles, notice how subsequent calls are faster. I recommend using @btime from the package BenchmarkTools for these micro benchmarks instead. – HarmonicaMuse
… @views version) for both versions on Julia 1.0. – fredrikekre
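For reference, a minimal sketch of the @btime setup suggested in the first comment above (assuming the BenchmarkTools package is installed):
using BenchmarkTools

A = rand(1:10, 4, 4)
B = rand(1:10, 4, 4)

# Interpolating the globals with $ keeps the benchmark from measuring access to
# non-constant globals; @btime runs the call many times and reports the minimum,
# so JIT compilation of the first call does not distort the result.
@btime strassen_product($A, $B)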