Optimize Julia Code by Example

Question

I am currently writing a numerical solver in Julia. I don't think the math behind it matters too much. It all boils down to the fact that one specific operation is executed many times and accounts for a large share (~80%) of the running time.

I reduced it as much as possible to the following piece of code, which can be saved as dummy.jl and then executed via include("dummy.jl") followed by dummy(10) (for compilation) and then dummy(1000).

function dummy(N::Int64)
    A = rand(N,N)
    @time timethis(A)
end

function timethis(A::Array{Float64,2})
    dummyvariable = 0.0
    for k=1:100 # just repeat a few times
        for i=2:size(A)[1]-1
            for j=2:size(A)[2]-1
                dummyvariable += slopefit(A[i-1,j],A[i,j],A[i+1,j],2.0)
                dummyvariable += slopefit(A[i,j-1],A[i,j],A[i,j+1],2.0)
            end
        end
    end
    println(dummyvariable) 
end

@inline function minmod(x::Float64, y::Float64)
    return sign(x) * max(0.0, min(abs(x),y*sign(x) ) );
end

@inline function slopefit(left::Float64,center::Float64,right::Float64,theta::Float64)
    # arg=ccall((:minmod,"libminmod"),Float64,(Float64,Float64),0.5*(right-left),theta*(center-left));
    # result=ccall((:minmod,"libminmod"),Float64,(Float64,Float64),theta*(right-center),arg);
    # return result

    tmp = minmod(0.5*(right-left),theta*(center-left));
    return minmod(theta*(right-center),tmp);
    #return 1.0
end

Here, timethis imitates the part of the code where I spend a lot of time. I notice that slopefit is extremely expensive to execute.

For example, dummy(1000) takes roughly 4 seconds on my machine. If slopefit instead always returns 1.0 and computes nothing, the running time drops to about one tenth of that.

Now, obviously there is no free lunch.

I am aware that this is simply a costly operation. But I would still like to optimize it as much as possible, given that a lot of time is spent in something that looks easy to optimize, as it is just a few lines of code.

So far, I have tried to implement minmod and slopefit as C functions and call them; however, that just increased the computing time (maybe I did it wrong).

So my question is: what possibilities do I have to optimize the call to slopefit?

Note that in the actual code, the arguments of slopefit are not the ones shown here, but depend on conditional statements, which makes everything hard to vectorize (and I am not sure whether vectorizing would bring any performance gain).

Adding @inbounds to the two inner for loops in timethis() reduces the run time a little bit for me. Something curious, though, is that @code_native minmod(1.0,1.0) produces much more code than I would naively expect (but I'm far from an expert). Maybe there's a missed optimization somewhere. — MBaz

Bogumił Kamiński · Accepted Answer · 2018-05-10T21:53:24

There are two levels of optimization I can think of.

First: the following implementation of minmod will be faster as it avoids branching (I understand this is the functionality you want):

@inline minmod(x::Float64, y::Float64) = ifelse(x<0, clamp(y, x, 0.0), clamp(y, 0.0, x))
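Before swapping the function in, it may be worth checking that the rewritten minmod returns the same values as the one in the question. The following is a minimal sketch of such a check; the names minmod_orig, minmod_new, vals and agree are made up for this illustration, and the grid of test values is an arbitrary choice. Note that the two versions can differ in the sign of zero (-0.0 versus 0.0), which == treats as equal.

```julia
# Formulation from the question (renamed for this comparison; hypothetical name).
minmod_orig(x::Float64, y::Float64) = sign(x) * max(0.0, min(abs(x), y * sign(x)))

# Branch-free formulation from the answer (hypothetical name).
minmod_new(x::Float64, y::Float64) = ifelse(x < 0, clamp(y, x, 0.0), clamp(y, 0.0, x))

# Arbitrary sample values covering negative, zero and positive inputs.
vals = [-3.0, -1.5, -0.5, 0.0, 0.5, 1.5, 3.0]

# Compare both versions on every (x, y) pair of the grid.
agree = all(minmod_orig(x, y) == minmod_new(x, y) for x in vals, y in vals)
println(agree)
```

If agree turns out to be false for inputs that matter in the real solver (for example NaN or Inf, which are not covered here), the replacement would need a closer look.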

Second: you can use @inbounds to speed up the loop a bit:

 @inbounds for i=2:size(A)[1]-1
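Putting both suggestions together, the timed function from the question might look roughly like the sketch below. It is only an illustration under a few assumptions: the name timethis_inbounds and the variable acc are hypothetical, and the function returns the accumulated sum instead of printing it so that the result can be inspected. Since @inbounds disables bounds checking, it is only safe as long as the loop ranges stay inside the array, as they do here.

```julia
# Branch-free minmod as suggested in the answer.
@inline minmod(x::Float64, y::Float64) = ifelse(x < 0, clamp(y, x, 0.0), clamp(y, 0.0, x))

# slopefit as in the question, without the commented-out ccall variants.
@inline function slopefit(left::Float64, center::Float64, right::Float64, theta::Float64)
    tmp = minmod(0.5 * (right - left), theta * (center - left))
    return minmod(theta * (right - center), tmp)
end

# Hypothetical variant of timethis: same loops, with @inbounds on the i loop
# (which also covers the nested j loop).
function timethis_inbounds(A::Array{Float64,2})
    acc = 0.0
    for k = 1:100 # just repeat a few times, as in the question
        @inbounds for i = 2:size(A)[1]-1
            for j = 2:size(A)[2]-1
                acc += slopefit(A[i-1,j], A[i,j], A[i+1,j], 2.0)
                acc += slopefit(A[i,j-1], A[i,j], A[i,j+1], 2.0)
            end
        end
    end
    return acc
end

A = rand(50, 50)
result = timethis_inbounds(A)
println(result)
```

To compare timings, one could call @time timethis_inbounds(A) twice and look at the second measurement, mirroring the dummy(10) warm-up in the question.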
