I have a DataFrame in the format below:

    Priority Mined_Category           server date_reported  Count Zscore_Volume
1 - Critical   Memory issue        xxxxxx111    2018-07-11      1      nan
1 - Critical   Memory issue        xxxxxx111    2018-08-11      1      nan
1 - Critical   Memory issue        yyyyyy195    2018-07-06      1      1.71
1 - Critical   Memory issue        yyyyyy195    2018-07-08      1      1.71
    2 - High   Memory issue  abcabcabcba1410    2018-08-21      1     nan

My aim is to replace NaN with 100 whenever the group count for Priority, Mined_Category and server is 1, and to replace NaN with 1000 whenever that group count is greater than 1.

I tried the code below:

> df_aggegrate_Volume.loc[(df_aggegrate_Volume.groupby(["Priority","Mined_Category","server"]).count()>1)&(df_aggegrate_Volume['Zscore_Volume'].isnull()) ,"Zscore_Volume"]= -100

but I get the error below:

ValueError: operands could not be broadcast together with shapes (7410,) (3,)

1 Answer

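You need GroupBy.transform, which returns a Series the same size as the original DataFrame, filled with the aggregated values. The result of groupby(...).count() has one row per group rather than one per original row, so it cannot be combined with the row-wise isnull() mask: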

m1 = (df_aggegrate_Volume.groupby(["Priority","Mined_Category","server"])["server"]
                         .transform('count')>1)

m2 = df_aggegrate_Volume['Zscore_Volume'].isnull()

df_aggegrate_Volume.loc[m1 & m2 ,"Zscore_Volume"]= -100

print (df_aggegrate_Volume)
       Priority Mined_Category           server date_reported  Count  \
0  1 - Critical   Memory issue        xxxxxx111    2018-07-11      1   
1  1 - Critical   Memory issue        xxxxxx111    2018-08-11      1   
2  1 - Critical   Memory issue        yyyyyy195    2018-07-06      1   
3  1 - Critical   Memory issue        yyyyyy195    2018-07-08      1   
4      2 - High   Memory issue  abcabcabcba1410    2018-08-21      1   

   Zscore_Volume  
0        -100.00  
1        -100.00  
2           1.71  
3           1.71  
4            NaN
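
The code above uses -100, following the attempt in the question. The question's text instead asks for 100 when the group has a single row and 1000 when it has more than one. A minimal sketch of that variant follows, assuming the sample rows shown in the question; the names df, keys, group_count and is_nan are illustrative and not from the original post:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Priority": ["1 - Critical"] * 4 + ["2 - High"],
    "Mined_Category": ["Memory issue"] * 5,
    "server": ["xxxxxx111", "xxxxxx111", "yyyyyy195", "yyyyyy195",
               "abcabcabcba1410"],
    "date_reported": ["2018-07-11", "2018-08-11", "2018-07-06",
                      "2018-07-08", "2018-08-21"],
    "Count": [1] * 5,
    "Zscore_Volume": [np.nan, np.nan, 1.71, 1.71, np.nan],
})

keys = ["Priority", "Mined_Category", "server"]

# transform('count') yields one value per original row (the number of
# non-null "server" entries in that row's group), so it aligns with df.
group_count = df.groupby(keys)["server"].transform("count")
is_nan = df["Zscore_Volume"].isna()

# Fill NaN with 100 for single-row groups and 1000 for larger groups.
df.loc[is_nan & (group_count == 1), "Zscore_Volume"] = 100
df.loc[is_nan & (group_count > 1), "Zscore_Volume"] = 1000

print(df["Zscore_Volume"].tolist())
# [1000.0, 1000.0, 1.71, 1.71, 100.0]
```

Rows that already hold a value (1.71 here) are left untouched because both masks require is_nan to be True.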