0
votes

I have a dataframe, I want to groupby Col1 Col2 Col3 and get the 0 frequency of Value column: df =

Col1 Col2 Col3 Value
Val1 Val2  A    0
Val1 Val2  A    1
Val1 Val2  A    2
Val1 Val2  A    0
Val1 Val2  A    1

Val1 Val2  B    0
Val1 Val2  B    0
Val1 Val2  B    0
Val1 Val2  B    0
Val1 Val2  B    1
...

How do I apply groupby to achieve

Col1 Col2 Col3 Fercentage_of_0
Val1 Val2  A       0.2
Val1 Val2  B       0.8
...

Thank you!

4
df['Value'].eq(0).groupby([df['Col1'],df['Col2'],df['Col3']]).mean()?Quang Hoang
@QuangHoang Thank you! Where did you learn this?TensorFrozen

4 Answers

1
votes

A simple lambda function does it for you. Generate a list where Value==0, takes len of this list and len of items in group. You have percentage

df = pd.DataFrame({"Col1":["Val1","Val1","Val1","Val1","Val1","Val1","Val1","Val1","Val1","Val1"],"Col2":["Val2","Val2","Val2","Val2","Val2","Val2","Val2","Val2","Val2","Val2"],"Col3":["A","A","A","A","A","B","B","B","B","B"],"Value":[0,1,2,0,1,0,0,0,0,1]})

df.groupby(["Col1","Col2","Col3"]).\
    agg({"Value":lambda x: len([v for v in x if v==0])/len(x)})

output

                Value
Col1 Col2 Col3       
Val1 Val2 A       0.4
          B       0.8
0
votes

Use groupby on the dataframe and then apply size() method on resulting dataframe. For example lets say you have createda dataframe named df containing these values

df = pd.DataFrame({'Col1': ['Val1','Val1','Val1','Val1','Val1','Val1','Val1','Val1'], 
               'Col2': ['Val2','Val2','Val2','Val2','Val2','Val2','Val2','Val2'],
               'Col3': ['A','A','A','A','B','B','B','B'],
               'Value':[0,1,2,0,0,0,0,1]}) 

then frequenncy count on individual element can be found using

df.groupby(['Col1','Col2','Col3','Value']).size()
Col1  Col2  Col3  Value
Val1  Val2  A     0        2
                  1        1
                  2        1
            B     0        3
                  1        1
dtype: int64
0
votes

Here's another way without using lambda, which seems more understandable to me:

df['is_zero'] = df['Value'] == 0
df.groupby(['Col1', 'Col2', 'Col3'])['is_zero'].mean()
0
votes

Create a boolean column for Value equal to 0, and groupby on the Col columns

(
    df.assign(Percentage_Of_0=lambda x: x.Value.eq(0))
    .groupby(["Col1", "Col2", "Col3"], as_index=False)
    .Percentage_Of_0.mean()
)

    Col1    Col2    Col3    Percentage_Of_0
0   Val1    Val2    A       0.4
1   Val1    Val2    B       0.8