7 votes

I just noticed that numpy's zeros function behaves strangely:

%timeit np.zeros((1000, 1000))
1.06 ms ± 29.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.zeros((5000, 5000))
4 µs ± 66 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

On the other hand, ones seems to behave normally. Does anybody know why initializing a small numpy array with the zeros function takes more time than initializing a large one?

(Python 3.5, numpy 1.11)

So the second matrix is 25 times larger, but only takes 4 times longer to create? That is surprising. - President James K. Polk
@JamesKPolk Read it again: the second, larger array takes 4 microseconds, while the first, smaller one takes 1 millisecond! I'm getting similar, though less extreme, results. - juanpa.arrivillaga
I think this is probably calloc hitting a threshold where it requests zeroed memory from the OS and doesn't need to actually initialize it. - user2357112 supports Monica
When the size S of a 1D array changes from 4,150,000 to 4,200,000, the time to zero it with np.zeros(S) changes from 5.5 ms per loop to 9.6 µs per loop. However, the number of loops in %timeit simultaneously changes from 100 to 100,000. My guess is that for an array of certain size and above, the difference between the slowest and fastest runs becomes large enough to trigger 1000 times more loops, which drastically improves the measurement accuracy and reduces the reported running time. Not because it is shorter, but because it is measured more accurately. - DYZ
@DYZ I'm using the timeit.timeit function, controlling the number at 1000, and I'm getting 0.343710215005558 for (1000,1000) and 0.0028691469924524426 for (5000,5000) - juanpa.arrivillaga
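If you want to reproduce that fixed-loop measurement, a minimal sketch along these lines (shapes and loop count chosen to match the figures above) should show the same gap; note that timeit.timeit's globals argument requires Python 3.5+:

import timeit
import numpy as np

# Fix the number of calls so the comparison doesn't depend on %timeit's
# automatic loop-count selection.
small = timeit.timeit("np.zeros((1000, 1000))", globals=globals(), number=1000)
large = timeit.timeit("np.zeros((5000, 5000))", globals=globals(), number=1000)

print("(1000, 1000): {:.4f} s for 1000 calls".format(small))
print("(5000, 5000): {:.4f} s for 1000 calls".format(large))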

1 Answer

14 votes

This looks like calloc hitting a threshold where it makes an OS request for zeroed memory and doesn't need to initialize it manually. Looking through the source code, numpy.zeros eventually delegates to calloc to acquire a zeroed memory block. Comparing it to numpy.empty, which performs no initialization:

In [15]: %timeit np.zeros((5000, 5000))
The slowest run took 12.65 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10 µs per loop

In [16]: %timeit np.empty((5000, 5000))
The slowest run took 5.05 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.3 µs per loop

you can see that np.zeros has no initialization overhead for the 5000x5000 array.
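A rough way to see the threshold itself is to sweep the array size. Below some platform-dependent cutoff, glibc's calloc zeroes the memory it hands back; above it, it typically mmaps fresh pages that the OS already guarantees to be zeroed, so np.zeros stops paying for initialization. A sketch, with the sizes picked arbitrarily:

import timeit
import numpy as np

# The per-call time for np.zeros usually drops sharply once the request is
# large enough that calloc gets already-zeroed pages from the OS instead of
# clearing the memory itself.
for n in (500, 1000, 2000, 3000, 5000):
    t = timeit.timeit(lambda: np.zeros((n, n)), number=100) / 100
    print("np.zeros(({0}, {0})): {1:10.1f} us per call".format(n, t * 1e6))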

In fact, the OS isn't even "really" allocating that memory until you try to access it. A request for a terabytes-sized array succeeds on a machine that doesn't have terabytes to spare:

In [23]: x = np.zeros(2**40)  # No MemoryError!
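You can also watch the process's resident set size to confirm that pages are only committed once they are written. A minimal sketch, assuming a Unix system where the resource module is available and ru_maxrss is reported in KiB (Linux; macOS reports bytes), with a deliberately modest array so touching all of it is safe:

import resource
import numpy as np

def peak_rss_mib():
    # Peak resident set size of this process; ru_maxrss is in KiB on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

x = np.zeros(2**26)   # nominally 512 MiB of float64 zeros
print("after np.zeros: ~{:.0f} MiB resident".format(peak_rss_mib()))

x[:] = 1.0            # writing forces the OS to actually commit the pages
print("after writing:  ~{:.0f} MiB resident".format(peak_rss_mib()))

The second figure should come out roughly 512 MiB larger than the first, even though the allocation itself barely moved it.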