1
votes

I have a DataFrame that needs to be grouped on three levels, and I would like the highest value in each group returned. Each day there is a return (ROI) for each unique combination, and I would like to find the highest return along with its details, such as the date.

data.groupby(['Company','Product','Industry'])['ROI'].idxmax()

The result I am after would show that:

Target  - Dish Soap - House       had a 5% ROI on 9/17
BestBuy - CDs       - Electronics had a 3% ROI on 9/3

were the highest.

Here's some example data:

+----------+-----------+-------------+---------+-----+
| Company  | Product   | Industry    | Date    | ROI |
+----------+-----------+-------------+---------+-----+
| Target   | Dish Soap | House       | 9/17/13 | 5%  |
| Target   | Dish Soap | House       | 9/16/13 | 2%  |
| BestBuy  | CDs       | Electronics | 9/1/13  | 1%  |
| BestBuy  | CDs       | Electronics | 9/3/13  | 3%  |
| ...

I'm not sure whether this calls for a for loop or for .ix.

1 Answer

6
votes

If I understand you correctly, you could collect the index labels of the per-group maxima in a Series using groupby and idxmax(), and then select those rows from data using loc:

idx = data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
data.loc[idx]
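
As a minimal sketch of that approach, here is the sample data from the question rebuilt by hand. It assumes the ROI column arrives as strings such as "5%", which would need converting to numbers before idxmax() can compare them; if your column is already numeric, skip that step.

```python
import pandas as pd

# Sample rows copied from the question (only the four rows shown there).
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})

# Assumption: ROI is stored as text, so strip the percent sign and cast to float.
data["ROI"] = data["ROI"].str.rstrip("%").astype(float)

# One row label per (Company, Product, Industry) group: the label of the max ROI.
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

# Pull the full rows (including Date) for those labels.
best = data.loc[idx]
print(best)
```

The selected rows keep their original row labels, so the Date and any other columns come along with the maximum ROI.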

Another option is to use reindex:

data.reindex(idx)

On a (different) dataframe I happened to have handy, it appears reindex might be the faster option:

In [39]: %timeit df.reindex(idx)
10000 loops, best of 3: 121 us per loop

In [40]: %timeit df.loc[idx]
10000 loops, best of 3: 147 us per loop
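
If you want to repeat the comparison outside IPython, something along these lines might do. The frame below is synthetic, and the column names key and value are made up for illustration; it is not the DataFrame the timings above came from, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data with hypothetical column names, only for comparing the two calls.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 100, size=10_000),
    "value": rng.random(10_000),
})
idx = df.groupby("key")["value"].idxmax()

by_loc = df.loc[idx]
by_reindex = df.reindex(idx)

# Sanity check: both calls should pick the same row labels in the same order.
print(list(by_loc.index) == list(by_reindex.index))

# Total seconds for 200 calls each; no particular result is claimed here.
print("loc:    ", timeit.timeit(lambda: df.loc[idx], number=200))
print("reindex:", timeit.timeit(lambda: df.reindex(idx), number=200))
```

Whichever is faster on your data, it is worth checking first that both return the rows you expect.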