python - Pandas groupby cumulative sum

Question

I would like to add a cumulative sum column to my Pandas dataframe so that:

name | day       | no
-----|-----------|----
Jack | Monday    | 10
Jack | Tuesday   | 20
Jack | Tuesday   | 10
Jack | Wednesday | 50
Jill | Monday    | 40
Jill | Wednesday | 110

becomes:

Jack | Monday     | 10  | 10
Jack | Tuesday    | 30  | 40
Jack | Wednesday  | 50  | 90
Jill | Monday     | 40  | 40
Jill | Wednesday  | 110 | 150

I tried various combos of df.groupby and df.agg(lambda x: cumsum(x)) to no avail.

Are you really sure that you want aggregation over week days? That loses the index, and also the cumulative sum makes less sense if there are multiple weeks. The answers by dmitry-andreev and @vjayky calculates cumsum over the sequence of days for each name instead. Think of how this could be extended if there were a date column too, that the entries could be sorted by before grouping and aggregating. — Elias Hasle

CT Zhu CT Zhu · Accepted Answer · 2014-03-26T03:56:25

This should do it, need groupby() twice:

df.groupby(['name', 'day']).sum() \
  .groupby(level=0).cumsum().reset_index()

Explanation:

print(df)
   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110

# sum per name/day
print( df.groupby(['name', 'day']).sum() )
                 no
name day           
Jack Monday      10
     Tuesday     30
     Wednesday   50
Jill Monday      40
      Wednesday  110

# cumulative sum per name/day
print( df.groupby(['name', 'day']).sum() \
         .groupby(level=0).cumsum() )
                 no
name day           
Jack Monday      10
     Tuesday     40
     Wednesday   90
Jill Monday      40
     Wednesday  150

The dataframe resulting from the first sum is indexed by 'name' and by 'day'. You can see it by printing

df.groupby(['name', 'day']).sum().index

When computing the cumulative sum, you want to do so by 'name', corresponding to the first index (level 0).

Finally, use reset_index to have the names repeated.

df.groupby(['name', 'day']).sum().groupby(level=0).cumsum().reset_index()

   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   40
2  Jack  Wednesday   90
3  Jill     Monday   40
4  Jill  Wednesday  150

python - Pandas groupby cumulative sum

6 Answers