1
votes

Consider a dataframe:

df = pd.DataFrame(
    {'last_year': [1, 2, 3], 'next_year': [4, 5, 6]}, 
    index=['foo', 'bar', 'star']
)
      last_year  next_year
foo           1          4
bar           2          5
star          3          6

I am looking for an easy way to display totals around this table, both column and row wise.

My thought was to throw it through .pivot_table():

pd.pivot_table(
    df,
    index=df.index,
    margins=True,
    aggfunc=sum
)

However, this only works for the first axis (vertically):

      last_year  next_year
bar           2          5
foo           1          4
star          3          6
All           6         15

What did I miss? How come no row wise totals are calculated as well, just like in this documentation's example? Also, why is it messing up my index' order?

I'm not interested in df['All'] = df.sum(axis=1) kind of solutions; I want a dynamic approach which does not affect my original dataframe. A pivot table seems like the most logical way to do this (as far as I know) but maybe there are better ways!

5

5 Answers

3
votes

My guess is that each column is its own group in pivot_table so you don't see the horizontal aggregation (it's rather pointless to aggregate one single element). To demonstrate the difference, consider stack():

(df.stack().reset_index(name='value')
   .pivot_table(index='level_0', columns='level_1', values='value', margins=True,
               aggfunc='sum')
) 

Output:

level_1  last_year  next_year  All
level_0                           
bar              2          5    7
foo              1          4    5
star             3          6    9
All              6         15   21

That said, it might very well be a bug.

2
votes

When you don't specify values and columns parameters in pivot_table. It will use all columns for values. Since you specify only index parameter, the rest of columns which are columns last_year, next_year are used as values for aggregation. I.e., Pandas thinks you want to apply sum function on values of columns last_year, next_year

Since all columns of df are all used for values parameter, pivot_table doesn't pivot anything to column (axis=1). Therefore, there is no reason for it to do margins on axis=1.

Try on this sample

Out[132]:
      last_year  next_year
foo           1          4
bar           2          5
star          3          6
bar          33         66

pd.pivot_table(df, index=df.index, margins=True, aggfunc=sum)

Out[134]:
      last_year  next_year
bar          35         71
foo           1          4
star          3          6
All          39         81

So, pivot_table applies sum on both columns last_year, next_year on each group of index which sum bar to 35 and 71 and finally it computes margins on axis=0. There is no pivoting to axis=1 so it doesn't do margins on axis=1.

To see that pivot_table use all columns for values parameter, you may try this command to see the keyerror

pd.pivot_table(df, index=df.index, margins=True, aggfunc={'last_year': sum})

.....
    220                     grand_margin[k] = getattr(v, aggfunc)()
    221                 elif isinstance(aggfunc, dict):
--> 222                     if isinstance(aggfunc[k], compat.string_types):
    223                         grand_margin[k] = getattr(v, aggfunc[k])()
    224                     else:

KeyError: 'next_year'

When using dict for aggfunc, pivot_table passes each column in values parameter to the dict to get the accordingly aggfunc. As you see above, I don't specify column next_year in the dict. Therefore, when pivot_table looking for its aggfunc in the dict, it returns keyerror.

0
votes

I don't know if this will help, but I made the columns and rows switched in order to make aggregated totals. For improved code, can you explain more in-depth that a dynamic approach? Thanks and I hope this helps!

df43 = pd.DataFrame(
{'last_year': [1, 2, 3], 'next_year': [4, 5, 6]}, 
index=['foo', 'bar', 'star'])
df43 = df43.T #.T is transpose
df43['total'] = df43.sum(axis=1)
df43
0
votes
df = pd.DataFrame(
    {'last_year': [1, 2, 3], 'next_year': [4, 5, 6]}, 
    index=['foo', 'bar', 'star']
)

df.append(df.sum().rename('Total')).assign(Total=lambda d: d.sum(1))

OUTPUT:

     last_year   next_year   Total
foo      1           4         5
bar      2           5         7
star     3           6         9
Total    6          15        21

This doesn't affect the original dataframe!

0
votes

To see row totals you need to specify at least one argument to columns=