Running the entire thing faster is what any sane person cares about.
As far as I understand, Julia is optimized for running code many times within a session; the very first run is slower because Julia compiles code just before executing it. Python (CPython), by contrast, does no JIT compilation at all, so it runs at the same (interpreted) speed from the very first run.
Julia 1.6
So, I pasted your Julia code into code.jl
and ran it multiple times within the same session:
# New Julia session!
julia> @time include("code.jl")
[long array...]
24.660636 seconds (42.99 M allocations: 2.607 GiB, 3.82% gc time, 0.02% compilation time)
julia> @time include("code.jl")
[long array...]
2.761062 seconds (5.61 M allocations: 240.159 MiB, 10.39% gc time, 57.06% compilation time)
julia> @time include("code.jl")
[long array...]
2.608917 seconds (5.61 M allocations: 240.164 MiB, 4.47% gc time, 61.75% compilation time)
# Restarted Julia
julia> @time include("code.jl")
25.538249 seconds (42.99 M allocations: 2.607 GiB, 3.76% gc time, 0.02% compilation time)
julia> @time include("code.jl")
2.740550 seconds (5.61 M allocations: 240.159 MiB, 9.94% gc time, 56.72% compilation time)
So, it takes about 25 seconds to run the code the first time and around 3 seconds (!) on subsequent runs, even though more than 50% of those 3 seconds is still spent compiling. Oddly, only 0.02% of the initial 25 seconds is reported as compilation time, so by that metric the first-run slowdown isn't compilation? Also note the number of memory allocations: 43 million on the first run vs. about 5.6 million (roughly 7 times fewer) on later runs. Either way, the first run is really slow, while subsequent runs are fast.
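The first-call effect is easy to reproduce in base Julia with a toy function (`poly` below is a made-up example, not from the original code): the first call to a method for a given argument type includes compilation, later calls with the same type reuse the compiled code, and a new argument type triggers compilation again.

```julia
# Minimal base-Julia demo of JIT behavior. `poly` is a hypothetical
# function used only for this demonstration.
poly(x) = 10x^3 + sin(x) - 8x^2

t_first  = @elapsed poly(1.0)   # compiles poly(::Float64) — slow
t_second = @elapsed poly(1.0)   # already compiled — fast
t_int    = @elapsed poly(1)     # new method instance poly(::Int) — slow again
println((t_first, t_second, t_int))
```

The same pattern explains the numbers above: the 25-second run pays the compile cost for everything, and the 3-second runs mostly reuse it.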
Loading packages the first time is slow too:
julia> @time using Symbolics
3.503349 seconds (6.42 M allocations: 460.519 MiB, 3.53% gc time, 0.13% compilation time)
julia> @time using Symbolics
0.000241 seconds (136 allocations: 9.641 KiB)
0.000280 seconds (136 allocations: 9.641 KiB)
0.000249 seconds (136 allocations: 9.641 KiB)
0.000251 seconds (136 allocations: 9.641 KiB)
0.000252 seconds (136 allocations: 9.641 KiB)
0.000246 seconds (136 allocations: 9.641 KiB)
# I didn't import it before,
# but apparently `Symbolics` did
julia> @time using QuadGK
0.000276 seconds (137 allocations: 9.688 KiB)
0.000276 seconds (136 allocations: 9.641 KiB)
0.000240 seconds (136 allocations: 9.641 KiB)
0.000251 seconds (136 allocations: 9.641 KiB)
That is, 3.5 seconds are spent just executing the first line of the code, the imports. Subsequent imports are nearly instantaneous, presumably because the loaded modules are cached in the session.
The first run of the list comprehension is slow as well:
julia> @time m = [i * 10*x^3 + 1/i * sin(x) + 5*i*x^3 * cos(x) - 8i*x^2 + 2/sin(i*3.0)*x + exp(1/(x+10)) for i in 1:500];
2.590259 seconds (4.69 M allocations: 284.672 MiB, 10.86% gc time, 98.69% compilation time)
julia> @time m = [i * 10*x^3 + 1/i * sin(x) + 5*i*x^3 * cos(x) - 8i*x^2 + 2/sin(i*3.0)*x + exp(1/(x+10)) for i in 1:500];
0.102573 seconds (231.21 k allocations: 12.507 MiB, 72.61% compilation time)
0.098871 seconds (231.21 k allocations: 12.508 MiB, 72.39% compilation time)
0.108458 seconds (231.21 k allocations: 12.512 MiB, 7.93% gc time, 67.73% compilation time)
0.099787 seconds (231.22 k allocations: 12.508 MiB, 72.99% compilation time)
0.098378 seconds (231.21 k allocations: 12.507 MiB, 73.80% compilation time)
Again, slow startup (98.69% compilation time), but the next runs are way faster.
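Part of why even the reruns above still report ~70% compilation time is that each top-level evaluation of a comprehension creates a fresh anonymous function that has to be compiled again. Wrapping the work in a named function lets it compile once per session; a base-Julia sketch (the expression here is a numeric stand-in, not the symbolic one from the original code):

```julia
# Wrapping the comprehension in a named function means it is compiled
# once per session; later calls reuse the compiled method.
# (Numeric stand-in for the symbolic expression in the original code.)
build_m(x) = [i * 10x^3 + 1/i * sin(x) + 5i * x^3 * cos(x) -
              8i * x^2 + 2 / sin(i * 3.0) * x + exp(1 / (x + 10))
              for i in 1:500]

t1 = @elapsed build_m(1.0)   # first call: includes compilation
t2 = @elapsed build_m(1.0)   # compiled already: only execution time
println("first: $t1 s, next: $t2 s")
```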
Python 3.9.2
~/t/SO_q $ time python3 thecode.py
________________________________________________________
Executed in 5,88 secs
~/t/SO_q $ time python3 thecode.py
________________________________________________________
Executed in 5,90 secs
Executed in 5,36 secs
Executed in 5,39 secs
Executed in 5,35 secs
Executed in 5,36 secs
Executed in 5,77 secs
Executed in 6,10 secs
Executed in 5,38 secs
Thus, the Python code consistently runs in about 5.5–6 seconds, which is roughly 2 times slower than the subsequent runs of the Julia code! However, Python delivers that speed as soon as the interpreter starts, while Julia first spends time compiling code and doing... whatever else requires 43 million memory allocations. What Julia gives you in exchange for the painful startup is the performance of compiled code: in this example, roughly twice as fast as Python.
How to make Julia faster
- Build a custom sysimage. This looks like overkill to me, unless you really need to restart Julia every time to run your code.
- Simply run your code from the same REPL. The simplest variant is to call `include("your_code.jl")` again after modifying the file. This may lead to weird errors, because the environment keeps data from previous runs.
- Run the code in Pluto, a notebook that also keeps a live Julia session but is smart about managing the environment.
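For the sysimage route, a minimal sketch using PackageCompiler.jl (the package names match the imports used above; the output path is illustrative, and PackageCompiler.jl must be installed first):

```julia
# Sketch: bake Symbolics and QuadGK into a custom sysimage so their
# load and compile cost is paid once, at image-build time.
using PackageCompiler

create_sysimage(
    ["Symbolics", "QuadGK"];          # packages to precompile into the image
    sysimage_path = "sys_custom.so",  # output shared library (illustrative name)
)
```

You would then start Julia with `julia --sysimage sys_custom.so`, and `using Symbolics` becomes essentially free. Note that building the image itself takes several minutes, which is why this only pays off if you restart Julia often.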
(From a comment by user2317421: `quadgk(m_d_f, 0, 1)` takes 264 ms on my computer, measured with `@benchmark`, vs. 1.36 s ± 15.6 ms for `integrate_matrix(m, x, 0, 1)`, measured with `%timeit` in IPython. It seems that the time you're observing includes compilation time.)