15
votes

Pandas Pivot Table Dictionary of Agg function

I am trying to calculate 3 aggregative functions during pivoting:

  1. Count
  2. Mean
  3. StDev

This is the code:

n_page = (pd.pivot_table(Main_DF, 
                         values='SPC_RAW_VALUE',  
                         index=['ALIAS', 'SPC_PRODUCT', 'LABLE', 'RAW_PARAMETER_NAME'], 
                         columns=['LOT_VIRTUAL_LINE'],
                         aggfunc={'N': 'count', 'Mean': np.mean, 'Sigma': np.std})
          .reset_index()
         )

Error I am getting is: KeyError: 'Mean'

How can I calculate those 3 functions?

3

3 Answers

21
votes

As written in approved answer by @Happy001, aggfunc cant take dict is false. we can actually pass the dict to aggfunc.

A really handy feature is the ability to pass a dictionary to the aggfunc so you can perform different functions on each of the values you select. for example:

import pandas as pd
import numpy as np

df = pd.read_excel('sales-funnel.xlsx')  #loading xlsx file

table = pd.pivot_table(df, index=['Manager', 'Status'], columns=['Product'], values=['Quantity','Price'],
           aggfunc={'Quantity':len,'Price':[np.sum, np.mean]},fill_value=0)
table

In the above code, I am passing dictionary to the aggfunc and performing len operation on Quantity and mean, sum operations on Price.

Here is the output attaching:

enter image description here

The example is taken from pivot table explained.

8
votes

The aggfunc argument of pivot_table takes a function or list of functions but not dict

aggfunc : function, default numpy.mean, or list of functions If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)

So try

n_page = (pd.pivot_table(Main_DF, 
                         values='SPC_RAW_VALUE',  
                         index=['ALIAS', 'SPC_PRODUCT', 'LABLE', 'RAW_PARAMETER_NAME'], 
                         columns=['LOT_VIRTUAL_LINE'],
                         aggfunc=[len, np.mean, np.std])
          .reset_index()
         )

You may want to rename the hierarchical columns afterwards.

4
votes

Try using groupby

df = (Main_DF
      .groupby(['ALIAS', 'SPC_PRODUCT', 'LABLE', 'RAW_PARAMETER_NAME'], as_index=False)
      .LOT_VIRTUAL_LINE
      .agg({'N': 'count', 'Mean': np.mean, 'Sigma': np.std})
     )

Setting as_index=False just leaves these as columns in your dataframe so you don't have to reset the index afterwards.