Why is Julia code performance much lower than Fortran's?

Question

I have read in several places that, under certain conditions, the performance of Julia code can be comparable to that of Fortran. I wrote the following code in Julia:

Pi = 3.141592653589793238462643
n = 100000 
function integration_2d(n,Pi,sum)
       h = Pi/n
       for i=1:n
           x = h*(i-0.5)
           for j=1:n
               y = h*(j-0.5)
               sum = sum + cos(x + y)
           end
       end
       sum*h*h
end

and the mean execution time was 180 sec. A Fortran code with a very similar structure to the Julia one, compiled with the -O3 option, had an execution time of 0.013 sec. I wonder where the Julia code is losing performance; any comment is appreciated. Thanks.

For such a claim, it would be good to know 1) the Fortran version for comparison, and 2) the code you used to benchmark this. — phipsgabler
You are computing cos a total of 100000^2 = 10^10 times. You claim that in Fortran this takes 0.013 seconds. This means that each cosine evaluation takes 1.3*10^(-12) seconds. A CPU can do, very approximately, one operation per nanosecond (10^(-9) seconds). So clearly, the Fortran code is not doing the work you think it is doing at runtime. This is a constant danger with benchmarking: you have to make sure that you are measuring what you think you are measuring. — Kristoffer Carlsson
BTW, there is no need to manually define Pi, since pi is already a built-in constant in Julia. — DNF
Is it possible that the Fortran compiler, using the O3 optimisation, rearranges the code, written naively as O(N^2), into something like the O(N) code posted by @Vitality? And, if so, could Julia implement the same kind of optimisation? — Antonello
Hi, it was a mistake that I made in the Fortran code. I collected the partial sums but I didn't print out the result. Because of that the compiler ignored the entire calculation. — armando
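
The back-of-the-envelope check from the comments above can be reproduced in a few lines of Julia. This is only a sketch using the timings reported in the question (nothing is re-measured here), and the variable names are made up for illustration:

```julia
# Figures reported in the question (not re-measured here).
n = 100_000
evaluations = n^2              # 10^10 calls to cos
fortran_seconds = 0.013
julia_seconds = 180.0

# Implied time per cosine evaluation, in seconds.
fortran_per_eval = fortran_seconds / evaluations
julia_per_eval = julia_seconds / evaluations

println("Fortran: ", fortran_per_eval, " s per cos")
println("Julia:   ", julia_per_eval, " s per cos")
```

The Fortran figure works out to roughly a picosecond per cosine, far below what a single core can do, while the Julia figure is on the order of tens of nanoseconds. That gap is consistent with the compiler having removed the unused computation, as the asker later confirmed.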

Vitaliy Yakovchuk · Accepted Answer · 2021-05-18T12:46:08

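Since you did not provide the Fortran code, I assume your Fortran implementation differs from the Julia one. To finish in that time, your O(N^2) algorithm would require a CPU performing more than ~10^12 operations per second (even in hand-written assembly), and I guess you did not use a supercomputer for this test :). The algorithm can be reformulated so that it needs only O(N) operations: cos(x + y) depends only on i + j, so terms with the same i + j can be grouped and weighted by the number of (i, j) pairs that produce them. The Julia code would look like this: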

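function integration_2d(n, sum=0.0)
    h = π / n
    # multiplier = number of (i, j) pairs in the original double loop with i + j == k
    multiplier = 1
    for k = 2:2n
        # x + y = h*(i - 0.5) + h*(j - 0.5) = h*(k - 1)
        z = h * (k - 1)
        sum = sum + multiplier * cos(z)
        if k <= n
            multiplier += 1
        else
            multiplier -= 1
        end
    end
    sum * h * h
end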

julia> @time integration_2d(100000)
  0.002846 seconds

That is 0.002846 seconds on my laptop (more than 4x faster than the Fortran time you mentioned). Since you did not provide your Fortran code, I cannot properly compare performance on the same machine.
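
One way to gain confidence in the O(N) reformulation is to compare it with the direct double loop for a small n. The sketch below is not part of the original answer; the function names `integration_2d_naive` and `integration_2d_grouped` are made up for illustration. It relies on the fact that, with k = i + j, the argument of the cosine is h*(i - 0.5) + h*(j - 0.5) = h*(k - 1), and that the number of (i, j) pairs with a given k is k - 1 for k <= n + 1 and 2n - k + 1 otherwise.

```julia
# Direct O(n^2) midpoint sum, same structure as the code in the question.
function integration_2d_naive(n)
    h = π / n
    total = 0.0
    for i in 1:n
        x = h * (i - 0.5)
        for j in 1:n
            y = h * (j - 0.5)
            total += cos(x + y)
        end
    end
    return total * h * h
end

# O(n) version: group the terms of the double loop by k = i + j.
function integration_2d_grouped(n)
    h = π / n
    total = 0.0
    for k in 2:2n
        # Number of (i, j) pairs with 1 <= i, j <= n and i + j == k.
        npairs = k <= n + 1 ? k - 1 : 2n - k + 1
        total += npairs * cos(h * (k - 1))
    end
    return total * h * h
end

n = 200                      # small enough that the O(n^2) loop is instant
naive = integration_2d_naive(n)
grouped = integration_2d_grouped(n)
println("naive   = ", naive)
println("grouped = ", grouped)
```

Both values should agree up to floating-point rounding and lie close to -4, the exact value of the integral of cos(x + y) over [0, π] × [0, π]. Printing or otherwise using the result also keeps the compiler from discarding the computation, which is what went wrong in the Fortran benchmark.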
