15
votes

Given the update to pandas 0.20.0 and the deprecation of .ix, I am wondering what the most efficient way is to get the same result using the remaining .loc and .iloc. I just answered this question, but the second option (not using .ix) seems inefficient and verbose.

Snippet:

print(df.iloc[df.loc[df['cap'].astype(float) > 35].index, :-1])

Is this the proper way to go when using both conditional and index position filtering?

4 Answers

9
votes

You can do this with a single .loc call: slice the column index by position to get the labels you need, and pass those labels to .loc together with the Boolean mask.

df.loc[
    df['cap'].astype(float) > 35,
    df.columns[:-1]
]
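
As a minimal sketch of the above on made-up data (the column names, values, and index labels here are assumptions, not from the question), the single .loc call works even when the row index is not the default 0..n-1, because only labels are passed to it:

```python
import pandas as pd

# Hypothetical data; 'cap' is stored as strings, which is why the
# question's snippet calls astype(float) before comparing.
df = pd.DataFrame(
    {"name": ["a", "b", "c"], "qty": [1, 2, 3], "cap": ["30", "40", "50"]},
    index=["x", "y", "z"],  # non-default row labels on purpose
)

mask = df["cap"].astype(float) > 35  # Boolean row filter
cols = df.columns[:-1]               # positional slice turned into labels

result = df.loc[mask, cols]
print(result)
```

The positional part lives entirely in df.columns[:-1], so .loc only ever sees a mask and a list of labels.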
9
votes

Generally, you would prefer to avoid chained indexing in pandas (though, strictly speaking, you're actually using two different indexing methods). You can't modify your dataframe this way (details in the docs), and the docs cite performance as another reason (indexing once vs. twice).

For the latter, the difference is usually insignificant (or rather, unlikely to be a bottleneck in your code), and indexing once does not actually seem to be faster, at least in the following example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.uniform(size=(100000, 10)), columns=list('abcdefghij'))
# Get columns number 2:5 where value in 'a' is greater than 0.5
# (i.e. Boolean mask along axis 0, position slice of axis 1)

# Deprecated .ix method
%timeit df.ix[df['a'] > 0.5,2:5]
100 loops, best of 3: 2.14 ms per loop

# Boolean, then position
%timeit df.loc[df['a'] > 0.5,].iloc[:,2:5]
100 loops, best of 3: 2.14 ms per loop

# Position, then Boolean
%timeit df.iloc[:,2:5].loc[df['a'] > 0.5,]
1000 loops, best of 3: 1.75 ms per loop

# .loc
%timeit df.loc[df['a'] > 0.5, df.columns[2:5]]
100 loops, best of 3: 2.64 ms per loop

# .iloc
%timeit df.iloc[np.where(df['a'] > 0.5)[0],2:5]
100 loops, best of 3: 9.91 ms per loop

Bottom line: If you really want to avoid .ix, and you're not intending to modify values in your dataframe, just go with chained indexing. On the other hand (the 'proper' but arguably messier way), if you do need to modify values, index once: either use .iloc with integer row positions from np.where(), or use .loc with labels taken from positional slices of df.index or df.columns.
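
To illustrate the two single-indexer options for assignment, here is a small sketch on random data (the frame size, seed, and the value 0.0 are arbitrary assumptions). Both write into the same cells, so the resulting frames should be identical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
df = pd.DataFrame(rng.uniform(size=(10, 5)), columns=list("abcde"))
mask = df["a"] > 0.5

# Option 1: .loc with column labels taken from a positional slice
df_loc = df.copy()
df_loc.loc[mask, df.columns[2:5]] = 0.0

# Option 2: .iloc with integer row positions derived from the mask
df_iloc = df.copy()
df_iloc.iloc[np.where(mask)[0], 2:5] = 0.0

print(df_loc.equals(df_iloc))
```

Each option is one indexing operation on the frame being modified, which is what makes the assignment reliable.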

3
votes

How about breaking this into a two-step indexing:

df[df['cap'].astype(float) > 35].iloc[:,:-1]

or even:

df[df['cap'].astype(float) > 35].drop('cap', axis=1)
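
Note that these two forms only agree when 'cap' happens to be the last column. A quick sketch on made-up data (column names and values are assumptions) shows the difference once the columns are reordered:

```python
import pandas as pd

# Hypothetical frame with 'cap' as the last column
df = pd.DataFrame({"name": ["a", "b", "c"], "cap": ["30", "40", "50"]})
mask = df["cap"].astype(float) > 35

by_position = df[mask].iloc[:, :-1]      # drops whichever column is last
by_label = df[mask].drop("cap", axis=1)  # drops 'cap' wherever it sits
print(by_position.equals(by_label))

# Same data with 'cap' moved to the front: the positional form now
# drops 'name' and keeps 'cap'.
reordered = df[["cap", "name"]]
print(list(reordered[mask].iloc[:, :-1].columns))
```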
0
votes

pandas has deprecated .ix and encourages you to use .iloc and .loc instead.

For this, you can refer to the definitions of .iloc and .loc and how they differ from .ix. This might help you:

How are iloc, ix and loc different?