2
votes

Say I have a dataframe

df = pd.DataFrame({'colA' : ['ABC', 'JKL', 'STU', '123'],
                   'colB' : ['DEF', 'MNO', 'VWX', '456'],
                   'colC' : ['GHI', 'PQR', 'YZ', '789'],}, index = [0,0,1,1])
   colA colB colC
0  ABC   DEF  GHI 
0  JKL   MNO  PQR
1  STU   VWX   YZ
1  123   456  789

Its guranteed that every pair will have the same index, so we would like the end result to be :

     colA        colB       colC
0  ABC_JKL_0   DEF_MNO_0  GHI_PQR_0 
1  STU_123_1   VWX_456_1   YZ_789_1

where the suffix _number is the index of that group.

I tried doing this by iterating through rows but that's taking a lot of time. I was thinking of something like .groupby(level=0) but can't figure out the next aggregation apply part

3
Try: df_out=df.groupby(level=0).agg('_'.join) to start. - Scott Boston
yeah tried that, stuck there! - Himanshu Poddar
df_out=df.groupby(level=0).agg(lambda x: '_'.join(x)+'_'+str(x.index[0])) - Scott Boston
@ScottBoston Is it possible to apply multiple function on aggregate? like say first list then tuple, obviously no one would want that but just thinking if multiple function could be applied or not - Himanshu Poddar
@ScottBoston please add that as answer, ll accept that - Himanshu Poddar

3 Answers

3
votes

IIUC, you can try something like this using .agg and a lambda function or you can add it into the dataframe after the groupby:

df_out=df.groupby(level=0).agg(lambda x: '_'.join(x)+'_'+str(x.index[0]))

Output:

        colA       colB       colC
0  ABC_JKL_0  DEF_MNO_0  GHI_pQR_0
1  STU_123_1  VWX_456_1   YZ_789_1

Or

df_out=df.groupby(level=0).agg('_'.join)
df_out = df_out.add('_'+df_out.index.to_series().astype(str), axis=0)
print(df_out)
2
votes
df.groupby(level=0).agg(lambda x: f"{'_'.join(x)}_{x.index[0]}")

Output:

        colA       colB       colC
0  ABC_JKL_0  DEF_MNO_0  GHI_PQR_0
1  STU_123_1  VWX_456_1   YZ_789_1
2
votes

You can do:

df.groupby(level=0).agg('_'.join).transform(lambda x:x+'_'+str(x.index[0]))