python - Pandas - dataframe groupby - how to get sum of multiple columns

Question

This should be an easy one, but somehow I couldn't find a solution that works.

I have a pandas dataframe which looks like this:

index col1   col2   col3   col4   col5
0     a      c      1      2      f 
1     a      c      1      2      f
2     a      d      1      2      f
3     b      d      1      2      g
4     b      e      1      2      g
5     b      e      1      2      g

I want to group by col1 and col2 and get the sum() of col3 and col4. col5 can be dropped, since its data cannot be aggregated.

Here is what the output should look like. I want both col3 and col4 in the resulting dataframe; it doesn't really matter whether col1 and col2 are part of the index or not.

index col1   col2   col3   col4
0     a      c      2      4
1     a      d      1      2
2     b      d      1      2
3     b      e      2      4
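A minimal sketch of the target computation, assuming col3 and col4 already hold numeric values. The columns to aggregate are selected with a list (double brackets), and as_index=False keeps col1 and col2 as ordinary columns, so the result has the same 0..3 layout as the table above. col5 is left out simply because it is not selected.

```python
import pandas as pd

# Sample frame from the question, with col3/col4 stored as integers
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# List selection picks both columns; as_index=False returns col1/col2
# as regular columns instead of a MultiIndex
df_new = df.groupby(["col1", "col2"], as_index=False)[["col3", "col4"]].sum()
print(df_new)
```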

Here is what I tried:

df_new = df.groupby(['col1', 'col2'])["col3", "col4"].sum()

That, however, only returns the aggregated result for col4.

I am lost here. Every example I found only aggregates one column, where the issue obviously doesn't occur.

Comment from A.Kot: The issue is likely that col3 does not have an int or other numeric dtype (check df.col3.dtype). Try df.col3 = df.col3.astype(int) before doing your groupby.
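A sketch of the situation the comment suspects, assuming (hypothetically) that col3 was loaded as strings, for example from a CSV. It converts the column before grouping with pd.to_numeric, used here in place of the comment's astype(int); both fail loudly on values that are not valid numbers.

```python
import pandas as pd

# Hypothetical reproduction: col3 holds digits as strings, not integers
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": ["1", "1", "1", "1", "1", "1"],
    "col4": [2, 2, 2, 2, 2, 2],
})
print(df.dtypes)  # col3 is not a numeric dtype here

# Convert once, before grouping; raises if a value cannot be parsed
df["col3"] = pd.to_numeric(df["col3"])

df_new = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()
print(df_new)
```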

BENY · Accepted Answer · 2017-09-26T16:14:53

Using apply:

df.groupby(['col1', 'col2'])[['col3', 'col4']].apply(lambda x: x.astype(int).sum())
Out[1257]: 
           col3  col4
col1 col2            
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4

If you want to use agg:

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
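A variant of the agg call, assuming a pandas version with named aggregation (added in 0.25) and numeric col3/col4. Each keyword maps an output column name to an (input column, function) pair; the names col3_sum and col4_sum are illustrative, and as_index=False returns col1/col2 as regular columns.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": [1, 1, 1, 1, 1, 1],
    "col4": [2, 2, 2, 2, 2, 2],
    "col5": ["f", "f", "f", "g", "g", "g"],
})

# Named aggregation: output_name=(input_column, aggfunc)
df_new = df.groupby(["col1", "col2"], as_index=False).agg(
    col3_sum=("col3", "sum"),
    col4_sum=("col4", "sum"),
)
print(df_new)
```

Unlike the dict form, this spells out the result column names, which helps once several functions are applied to the same input column.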

