Attached is a minimal example:
from numba import jit
import numba as nb
import numpy as np

# Eager compilation: the signature is declared up front,
# so the function is compiled at definition time.
@jit(nb.float64[:, :](nb.int32[:, :]))
def go_fast(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

# Lazy compilation: types are inferred from the arguments
# at the first call.
@jit
def go_fast2(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace
Running in Jupyter:
x = np.arange(10000).reshape(100, 100)
%timeit go_fast(x)
%timeit go_fast2(x)
leads to
5.65 µs ± 27.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3.8 µs ± 46.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Why does the eager compilation lead to slower execution?
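For reference, one way to see what the lazy version actually compiled (a diagnostic sketch; run it in the same session, after the %timeit calls, so go_fast2 has already been specialized for x) is to compare the compiled signatures of the two dispatchers:

# Diagnostic sketch: compare the declared signature with what
# lazy compilation inferred from x at the first call.
print(x.dtype)             # the dtype numba specialized go_fast2 on
print(go_fast.signatures)  # the eagerly declared signature
print(go_fast2.signatures) # the signature(s) inferred at call time

If x.dtype is not int32, the eager version has to work with the declared int32 input type, while the lazy version compiles a specialization exactly matching the argument types, which may explain a timing difference.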