Vectorization in numpy vs Python map()

Question

I'm comparing methods for doing calculations on large arrays and wanted to compare the speed of NumPy's broadcasting operators against the alternatives. I was surprised by the speed of Python's map() function, though, and am wondering if someone could explain how it can be so much faster than broadcasting.

Broadcasting

%%timeit farenheit = np.linspace( -10, 20, 1000 )
celcius = (farenheit - 32) * (5/9)

4.5 µs ± 99.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

List comprehension

%%timeit farenheit = np.linspace( -10, 20, 1000 )
[(temp - 32) * (5/9) for temp in farenheit]

886 µs ± 4.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Python 3 map()

%%timeit farenheit = np.linspace( -10, 20, 1000 )
celcius = map(lambda temp: (temp - 32) * (5/9), farenheit)

248 ns ± 41.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Why are you including the farenheit = np.linspace( -10, 20, 1000 ) part in the timings too? For better benchmarking (to compare NumPy vs map, etc.), I think it's better to pre-process that part. — Divakar
The cell magic %%timeit should exclude the array creation (code on the first line runs as setup and is not timed). ipython.readthedocs.io/en/stable/interactive/magics.html — Christopher
%%timeit includes everything in that block. So, linspace one is included too. — Divakar
@Divakar, I routinely use the cell %%timeit to precalculate objects. For example %%timeit x = arr.copy() \n x *= 100 lets me time the *= without timing the copy. — hpaulj
@hpaulj My point was that the focus is to compare NumPy vs map, etc., and in this Q&A that seems to mean the calculation of celcius in those different ways. Timing everything doesn't let us do that. I wouldn't mind seeing the timings of the pre-calculation part separately though. — Divakar

Ofer Sadan · Accepted Answer · 2019-09-14T17:06:02

map is so fast because it's not actually running the calculation. It doesn't return a new list/array with new values; instead it returns a map object (an iterator) that does the calculation only when the items are needed.

For a fair comparison, you should do list(celcius) at the end of your map() snippet. Only then are the calculations executed. If your lambda (or another function) had a print somewhere in it, you would see that map() by itself isn't really executing those commands yet.
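
A rough sketch of that fairer comparison, using the standard timeit module rather than the %%timeit magic. The plain Python list standing in for the NumPy array and the helper name to_celcius are assumptions made for illustration; the absolute numbers will differ from the ones in the question, but the gap between building the map object and consuming it should still show up.

```python
import timeit

# Plain Python floats standing in for the NumPy array from the question
# (assumption: any iterable of 1000 numbers shows the same effect).
farenheit = [-10 + 30 * i / 999 for i in range(1000)]

def to_celcius(temp):
    # Hypothetical helper; same formula as the lambda in the question.
    return (temp - 32) * (5 / 9)

loops = 1000

# Only builds the map object; to_celcius is never called here.
lazy_time = timeit.timeit(lambda: map(to_celcius, farenheit), number=loops)

# list() consumes the iterator, so every element is actually computed.
eager_time = timeit.timeit(lambda: list(map(to_celcius, farenheit)), number=loops)

# A list comprehension doing the same work, for reference.
listcomp_time = timeit.timeit(lambda: [to_celcius(t) for t in farenheit], number=loops)

print(f"map() only:         {lazy_time / loops * 1e6:.2f} µs per loop")
print(f"list(map()):        {eager_time / loops * 1e6:.2f} µs per loop")
print(f"list comprehension: {listcomp_time / loops * 1e6:.2f} µs per loop")
```

Once the map object is wrapped in list(), its timing should land in the same range as the list comprehension instead of the nanosecond range.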

To read more on map: https://docs.python.org/3/library/functions.html#map

An example:

def double(x):
    print('hi')
    return x*2

a = [1,2,3]
b = map(double, a)

# note that nothing has been printed yet; the calculation hasn't run either

c = list(b) # this prints 'hi' 3 times and returns the doubled list
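
A small variation on the example above, as a sketch: it records each call in a list (the name calls is made up for illustration) instead of printing, which makes it easy to see that the map object computes items one at a time and can only be consumed once.

```python
calls = []

def double(x):
    calls.append(x)  # record each call instead of printing
    return x * 2

b = map(double, [1, 2, 3])
calls_after_map = list(calls)  # still empty: double() has not run yet

first = next(b)   # computes only the first item
rest = list(b)    # computes the remaining items
again = list(b)   # the iterator is now exhausted, so this is empty

print(calls_after_map, first, rest, again, calls)
```

This is also why timing the map() call alone says nothing about the cost of the calculation itself.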
