In a C++ course, I was taught tricks like avoiding repeated computation, replacing multiplications with additions, avoiding power operations, and so on to improve performance. When I applied them to optimize code in Julia, however, I was surprised to see the opposite result.
For example, here are a few equations without mathematical optimization (all code written in Julia 1.1, not JuliaPro):
function OriginalFunction( a,b,c,d,E )
# Operation count:
# sqrt: 4
# ^: 14
# * : 14
# / : 10
# +: 20
# -: 6
# = : 0+4
x1 = (1/(1+c^2))*(-c*d+a+c*b-sqrt(E))
y1 = d-(c^2*d)/(1+c^2)+(c*a)/(1+c^2)+(c^2*b)/(1+c^2)-(c*sqrt(E))/(1+c^2)
x2 = (1/(1+c^2))*(-c*d+a+c*b+sqrt(E))
y2 = d-(c^2*d)/(1+c^2)+(c*a)/(1+c^2)+(c^2*b)/(1+c^2)+(c*sqrt(E))/(1+c^2)
return [ [x1;y1] [x2;y2] ]
end
I optimized them with a few tricks, including:
- (a*b + a*c) -> a*(b+c), because addition is faster than multiplication.
- a^2 -> a*a, to avoid the power operation.
- If a long subexpression is used at least twice, assign it to a variable to avoid repeated computation. For example:
x = a * (1+c^2); y = b * (1+c^2)
->
temp = 1+c^2
x = a * temp; y = b * temp
- Convert Int literals to Float64, so that the computer doesn't have to do the conversion (at run time or compile time). For example:
1/x -> 1.0/x
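Whether this last rewrite matters at all can be checked directly by inspecting the generated code (a quick sketch of my own, assuming InteractiveUtils is loaded, as it is by default in the REPL):

```julia
using InteractiveUtils  # provides @code_llvm outside the REPL

f(x) = 1 / x    # integer literal; conversion is constant-folded
g(x) = 1.0 / x  # explicit Float64 literal

# If both print the same single fdiv instruction, the literal
# conversion happens at compile time and the rewrite gains nothing.
@code_llvm f(2.0)
@code_llvm g(2.0)
```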
The result gives equivalent equations with far fewer operations:
function SimplifiedFunction( a,b,c,d,E )
# Operation count:
# sqrt: 1
# ^: 0
# *: 9
# /: 1
# +: 4
# -: 6
# = : 5+4
temp1 = sqrt(E)
temp2 = c*(b - d) + a
temp3 = 1.0/(1.0+c*c)
temp4 = d - (c*(c*(d - b) - a))*temp3
temp5 = (c*temp1)*temp3
x1 = temp3*(temp2-temp1)
y1 = temp4-temp5
x2 = temp3*(temp2+temp1)
y2 = temp4+temp5
return [ [x1;y1] [x2;y2] ]
end
Then I tested them with the following function, expecting the version with far fewer operations to run faster, or at least at the same speed:
function Test2Functions( NumberOfTests::Real )
local num = Int(NumberOfTests)
# -- Generate random numbers
local rands = Array{Float64,2}(undef, 5,num)
for i in 1:num
rands[:,i:i] = [rand(); rand(); rand(); rand(); rand()]
end
local res1 = Array{Array{Float64,2}}(undef, num)
local res2 = Array{Array{Float64,2}}(undef, num)
# - Test OriginalFunction
@time for i in 1:num
a,b,c,d,E = rands[:,i]
res1[i] = OriginalFunction( a,b,c,d,E )
end
# - Test SimplifiedFunction
@time for i in 1:num
a,b,c,d,E = rands[:,i]
res2[i] = SimplifiedFunction( a,b,c,d,E )
end
return res1, res2
end
Test2Functions( 1e6 )
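As a cross-check, the two functions can also be timed in isolation with BenchmarkTools (assuming the package is installed); interpolating the arguments with `$` keeps the global variables from skewing the measurement:

```julia
using BenchmarkTools

a, b, c, d, E = rand(5)

# Benchmarks each call many times and reports the minimum,
# which filters out GC and measurement noise.
@btime OriginalFunction($a, $b, $c, $d, $E)
@btime SimplifiedFunction($a, $b, $c, $d, $E)
```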
However, it turned out that the two functions perform the same amount of memory allocation, but the simplified one spends more time in garbage collection and runs about 5% slower:
julia> Test2Functions( 1e6 )
1.778731 seconds (7.00 M allocations: 503.540 MiB, 47.35% gc time)
1.787668 seconds (7.00 M allocations: 503.540 MiB, 50.92% gc time)
julia> Test2Functions( 1e6 )
1.969535 seconds (7.00 M allocations: 503.540 MiB, 52.05% gc time)
2.221151 seconds (7.00 M allocations: 503.540 MiB, 56.68% gc time)
julia> Test2Functions( 1e6 )
1.946441 seconds (7.00 M allocations: 503.540 MiB, 55.23% gc time)
2.099875 seconds (7.00 M allocations: 503.540 MiB, 59.33% gc time)
julia> Test2Functions( 1e6 )
1.836350 seconds (7.00 M allocations: 503.540 MiB, 53.37% gc time)
2.011242 seconds (7.00 M allocations: 503.540 MiB, 58.43% gc time)
julia> Test2Functions( 1e6 )
1.856081 seconds (7.00 M allocations: 503.540 MiB, 53.44% gc time)
2.002087 seconds (7.00 M allocations: 503.540 MiB, 58.21% gc time)
julia> Test2Functions( 1e6 )
1.833049 seconds (7.00 M allocations: 503.540 MiB, 53.55% gc time)
1.996548 seconds (7.00 M allocations: 503.540 MiB, 58.41% gc time)
julia> Test2Functions( 1e6 )
1.846894 seconds (7.00 M allocations: 503.540 MiB, 53.53% gc time)
2.053529 seconds (7.00 M allocations: 503.540 MiB, 58.30% gc time)
julia> Test2Functions( 1e6 )
1.896265 seconds (7.00 M allocations: 503.540 MiB, 54.11% gc time)
2.083253 seconds (7.00 M allocations: 503.540 MiB, 58.10% gc time)
julia> Test2Functions( 1e6 )
1.910244 seconds (7.00 M allocations: 503.540 MiB, 53.79% gc time)
2.085719 seconds (7.00 M allocations: 503.540 MiB, 58.36% gc time)
Could anyone kindly tell me why? A 5% speedup probably isn't worth fighting for, even in performance-critical code, but I'm still curious: how can I help the Julia compiler produce faster code?
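One thing I suspect matters far more than the arithmetic: most of the time above is GC, because each call heap-allocates the two inner column vectors plus the 2x2 Matrix in the return statement. A hypothetical variant (my own sketch, not part of the timings above) that returns a plain tuple should be allocation-free per call:

```julia
# Same math as SimplifiedFunction, but returning a tuple
# (x1, y1, x2, y2) instead of building [ [x1;y1] [x2;y2] ],
# so the function itself allocates nothing on the heap.
function SimplifiedFunctionTuple(a, b, c, d, E)
    temp1 = sqrt(E)
    temp2 = c*(b - d) + a
    temp3 = 1.0/(1.0 + c*c)
    temp4 = d - (c*(c*(d - b) - a))*temp3
    temp5 = (c*temp1)*temp3
    return (temp3*(temp2 - temp1), temp4 - temp5,
            temp3*(temp2 + temp1), temp4 + temp5)
end
```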
Comments:
- … local declarations, since that does nothing here but add visual noise. And this assignment is a bit odd: rands[:,i:i] = [rand(); rand(); rand(); rand(); rand()], with the index i repeated as i:i. Instead just write: rands[:, i] .= rand.(). Oh, and just to emphasize it: always use BenchmarkTools for benchmarking :) - DNF
- rands[:,i:i] looks odd. I developed this style because of an odd bug I fixed: arr1[i, :] = arr2[i, :] leads to an error, since Julia treats arr2[i, :] as a 1D Array instead of a 2D row vector. I fixed that with arr1[i, :] = arr2[i:i, :]. Since this confuses my fuzzy brain to no end, I just stay conservative and err on the side of being repetitive. - zyc
- for i in 1:length(randbuffer); randbuffer[i] = rand(); end vs. randbuffer .= rand.(): the for loop is much faster for a small array, whereas broadcasting is faster for a large array. Anyway, speed is irrelevant for a small number of elements. - zyc