An interesting observation that I'd like clarified.

I expected a pandas slice operation to be faster than zipping the columns of a DataFrame, but when I ran %timeit on both operations, the zip version was faster.

import pandas as pd, numpy as np
s = pd.DataFrame({'Column1':range(50), 'Column2':np.random.randn(50), 'Column3':np.random.randn(50)})
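The two expressions timed below do not obviously do the same thing: one looks up index label 30, the other looks up the key 30 in Column1. They agree here only because Column1 is built from range(50) and so matches the default RangeIndex. A quick check, recreating the frame so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

s = pd.DataFrame({'Column1': range(50),
                  'Column2': np.random.randn(50),
                  'Column3': np.random.randn(50)})

# Column1 holds 0..49, the same labels as the default RangeIndex,
# so a .loc label lookup and a dict keyed by Column1 reach the same row.
assert np.array_equal(s.index, s['Column1'])

# .loc[30] on the two-column subset returns both values of that row
# (upcast to a common float dtype); the dict returns only the Column3 value.
row = s[['Column1', 'Column3']].loc[30].values
mapped = dict(zip(s['Column1'], s['Column3']))[30]

assert row[0] == 30 and row[1] == mapped
```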

Timing the two lookups:

%timeit s[['Column1','Column3']].loc[30].values

1.06 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit dict(zip(s['Column1'],s['Column3']))[30]

53.7 µs ± 6.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
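The two timings measure different amounts of work: the dict version rebuilds the whole mapping on every call, while the pandas version first builds an intermediate two-column DataFrame and only then does the label lookup. A sketch of a fairer comparison using the standard-library timeit module (so it also runs outside IPython), separating the one-time dict build from the lookup and adding pandas' scalar accessor `DataFrame.at`. The candidate labels and the call count are arbitrary choices; no timings are shown because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.DataFrame({'Column1': range(50),
                  'Column2': np.random.randn(50),
                  'Column3': np.random.randn(50)})

# Built once, outside the timed code.
lookup = dict(zip(s['Column1'], s['Column3']))

candidates = {
    # The two expressions from the question.
    'pandas subset + .loc': lambda: s[['Column1', 'Column3']].loc[30].values,
    'dict(zip(...)) per call': lambda: dict(zip(s['Column1'], s['Column3']))[30],
    # Scalar access without building an intermediate DataFrame.
    'DataFrame.at': lambda: s.at[30, 'Column3'],
    # Dict lookup only; the mapping already exists.
    'prebuilt dict': lambda: lookup[30],
}

n = 1_000  # calls per candidate
for name, fn in candidates.items():
    per_call = timeit.timeit(fn, number=n) / n
    print(f'{name:28s} {per_call * 1e6:10.2f} us per call')
```

Timing the build and the lookup separately shows how much of the dict version's advantage depends on building the mapping only once.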

Does this mean pandas is significantly slower than zip, and that its main advantage is ease of use?

Would an apply-map operation be faster?
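On the apply-map question: `DataFrame.applymap` (renamed `DataFrame.map` in pandas 2.1) calls a Python function once per cell to transform values, so it is not a lookup tool and is unlikely to help here. If the goal is the Column3 value where Column1 equals some key, without relying on Column1 matching the index, two idiomatic pandas options are a boolean mask or re-indexing by Column1 once. A sketch; which is faster for a given frame is left to measure:

```python
import numpy as np
import pandas as pd

s = pd.DataFrame({'Column1': range(50),
                  'Column2': np.random.randn(50),
                  'Column3': np.random.randn(50)})

# Boolean mask: select rows whose Column1 equals 30. This works even if
# Column1 did not match the index, and returns a Series (possibly several rows).
by_mask = s.loc[s['Column1'] == 30, 'Column3']

# Re-index once by Column1, then do scalar label lookups afterwards.
by_key = s.set_index('Column1')['Column3']
value = by_key.at[30]

assert by_mask.iloc[0] == value
```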

1 Answer

zip is implemented in C and produces its pairs lazily, so it adds very little overhead. It's very fast, as is itertools in general.
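One way to probe where the difference comes from is to rerun the question's two expressions on frames of several sizes. A fixed per-call overhead in pandas' indexing would show up most at 50 rows, while the work done by `dict(zip(...))` grows linearly with the number of rows, since every pair becomes Python objects in a new dict. A rough sketch; `make_frame`, `time_both` and the sizes are arbitrary names and choices for illustration, and no particular numbers are claimed:

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n):
    """Hypothetical helper: a frame shaped like the question's, with n rows."""
    return pd.DataFrame({'Column1': range(n),
                         'Column2': np.random.randn(n),
                         'Column3': np.random.randn(n)})


def time_both(n, repeats=5):
    """Average seconds per call for each of the question's two expressions."""
    s = make_frame(n)
    key = n // 2  # a label that exists for every n
    t_pandas = timeit.timeit(lambda: s[['Column1', 'Column3']].loc[key].values,
                             number=repeats) / repeats
    t_zip = timeit.timeit(lambda: dict(zip(s['Column1'], s['Column3']))[key],
                          number=repeats) / repeats
    return t_pandas, t_zip


for n in (50, 5_000, 500_000):
    t_pandas, t_zip = time_both(n)
    print(f'n={n:>7}: pandas {t_pandas * 1e3:9.3f} ms   dict(zip) {t_zip * 1e3:9.3f} ms')
```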