I'm working with a dataframe similar to this:
Name | Metric 1 | Metric 2 | Country |
---|---|---|---|
John | 0.10 | 5.00 | Canada |
Jane | 0.50 | Canada | |
Jack | 2.00 | Canada | |
Polly | 0.30 | Canada | |
Mike | Canada | ||
Steve | Canada | ||
Lily | 0.15 | 1.20 | Canada |
Kate | 3.00 | Canada | |
Edward | 0.05 | Canada | |
Pete | 0.02 | 0.03 | Canada |
I am trying to define a function that will calculate the percentage of metrics that are greater than 1 of the rows that have metrics. I expect that for Metric 1, I should get 25%, and for Metric 2, I should get 66%. However, my function is returning results based on the total number of rows. Here's my code:
import pandas as pd
import io
df = pd.read_csv(io.BytesIO(data_to_load['metric data.csv']))
df = df.fillna(0)
def metricgreaterthanone(x):
return (x>1).sum()/len(x!=0)
pd.pivot_table(df,index=['Country'],values=["Name","Metric 1","Metric 2"],aggfunc={'Name':pd.Series.nunique, "Metric 1":metricgreaterthanone,"Metric 2":metricgreaterthanone})
The result I get is:
Country | Metric 1 | Metric 2 | Name |
---|---|---|---|
Canada | 0.2 | 0.2 | 10 |
So the function is returning the percent of all rows all that are greater than 1. Any ideas on how to fix this?