I am currently writing a numerical solver in Julia. I don't think the math behind it matters too much. It all boils down to the fact, that a specific operation is executed several times and uses a large percentage (~80%) of running time.
I tried to reduce it as much as possible and present you this piece of code, which can be saved as dummy.jl and then executed via include("dummy.jl") followed by dummy(10) (for compilation) and then dummy(1000).
function dummy(N::Int64)
A = rand(N,N)
@time timethis(A)
end
function timethis(A::Array{Float64,2})
dummyvariable = 0.0
for k=1:100 # just repeat a few times
for i=2:size(A)[1]-1
for j=2:size(A)[2]-1
dummyvariable += slopefit(A[i-1,j],A[i,j],A[i+1,j],2.0)
dummyvariable += slopefit(A[i,j-1],A[i,j],A[i,j+1],2.0)
end
end
end
println(dummyvariable)
end
@inline function minmod(x::Float64, y::Float64)
return sign(x) * max(0.0, min(abs(x),y*sign(x) ) );
end
@inline function slopefit(left::Float64,center::Float64,right::Float64,theta::Float64)
# arg=ccall((:minmod,"libminmod"),Float64,(Float64,Float64),0.5*(right-left),theta*(center-left));
# result=ccall((:minmod,"libminmod"),Float64,(Float64,Float64),theta*(right-center),arg);
# return result
tmp = minmod(0.5*(right-left),theta*(center-left));
return minmod(theta*(right-center),tmp);
#return 1.0
end
Here, timethis shall imitate the part of the code where I spend a lot of time. I notice, that slopefitis extremely expensive to execute.
For example, dummy(1000) takes roughly 4 seconds on my machine. If instead, slopefit would just always return 1 and not compute anything, the time goes down to one tenth of the overall time.
Now, obviously there is no free lunch.
I am aware, that this is simply a costly operation. But I would still try to optimize it as much as possible, given that a lot of time is spend in something that looks like one could optimize it easily as it is just a few lines of code.
So far, I tried to implement minmod and slopefit as C-functions and call them, however that just increased computing time (maybe I did it wrong).
So my question is, what possibilities do I have to optimize the call of slopefit?
Note, that in the actual code, the arguments of slopefit are not the ones mentioned here, but depend on conditional statements which makes everything hard to vectorize (if that would bring any performance gain I am not sure).
@inboundsto the two innerforloops intimethis()reduces the run time a little bit for me. Something curious, though, is that@code_native minmod(1.0,1.0)produces much more code than I would naively expect (but I'm far from an expert). Maybe there's a missed optimization somewhere. - MBaz