I ran a comparison of several ways to access data in a DataFrame (see results below). The quickest access was via the get_value method on a DataFrame. I was referred to this method in this post.
What surprised me is that access via get_value is quicker than access via the underlying numpy object, df.values.
Question
Is there a way to access elements of a numpy array as quickly as I can access a pandas DataFrame via get_value?
Setup
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(16).reshape(4, 4))
Testing
%%timeit
df.iloc[2, 2]
10000 loops, best of 3: 108 µs per loop
%%timeit
df.values[2, 2]
The slowest run took 5.42 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.02 µs per loop
%%timeit
df.iat[2, 2]
The slowest run took 4.96 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.85 µs per loop
%%timeit
df.get_value(2, 2)
The slowest run took 19.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.57 µs per loop
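For anyone who wants to reproduce this outside IPython, here is a minimal sketch of the same comparison using the standard-library timeit module. A few assumptions: best_us is a hypothetical helper I made up for illustration, arr is just df.values stored once so the attribute lookup is not part of the timed statement, and get_value is left out because it was deprecated and later removed from pandas, so it may not exist in your version. The numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4))
arr = df.values  # fetch the underlying array once, outside the timed statements


def best_us(stmt, number=10_000, repeat=3):
    """Hypothetical helper: best per-call time in microseconds over `repeat` runs."""
    timings = timeit.repeat(stmt, globals=globals(), number=number, repeat=repeat)
    return min(timings) / number * 1e6


# Each statement reads the same element (row 2, column 2).
statements = ["df.iloc[2, 2]", "df.iat[2, 2]", "df.values[2, 2]", "arr[2, 2]"]
results = {stmt: best_us(stmt) for stmt in statements}

for stmt, us in results.items():
    print(f"{stmt:<18} {us:8.2f} us per call")
```

Comparing the df.values[2, 2] and arr[2, 2] rows should show how much of the cost comes from looking up .values on every call rather than from the numpy indexing itself.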
x = df.values; %timeit x[2,2] gives similar results - perhaps values is not an attribute but a property? – Eric