Pivot / Groupby dataframe with duplicates in index column with non numeric data

Question

let's assume I have the following example dataframe:

df1 = pd.DataFrame({'col1': ['A', 'A', 'B', 'A'], 'col2': ['CA', 'DA', 'CA', 'CA'], 'col3': [1, 1, 1, 2]})


Out[25]:

  col1 col2 col3
0  A    CA   1 
1  A    DA   1 
2  B    CA   1 
3  A    CA   2

And I want to group this table by 'col3' as Index, and 'col1' as new columns (in this case columns are 'A' or 'B'), and the values should be 'col2'.

I tried it with pivot_table (also with pivot):

pd.pivot_table(df1,index='col3', columns="col1", values=['col2'])

in this case the error is: No numeric types to aggregatein case of the pivotcommand the error is: multiple indexes. Both cases seem to be reasonable for me. But is there any option to do it? When I have two values for one columns I prefer the following outcome:

  A                B
1 ['CA', 'DA']     CA
2  'CA'            NaN

jezrael jezrael · Accepted Answer · 2019-06-21T12:16:51

For all lists in output with missing values add custom lambda function:

df1 = pd.pivot_table(df1,index='col3', columns="col1", values='col2',
                     aggfunc = lambda x: x.tolist())
print (df1)
col1         A     B
col3                
1     [CA, DA]  [CA]
2         [CA]   NaN

If need scalars instead one element lists add if-else statement:

df1 = pd.pivot_table(df1,index='col3', columns="col1", values='col2', 
                     aggfunc = lambda x: x.tolist() if len(x) > 1 else x.iat[0])
print (df1)
col1         A    B
col3               
1     [CA, DA]   CA
2           CA  NaN

Pivot / Groupby dataframe with duplicates in index column with non numeric data

1 Answers