1
votes

I'm working with a dataframe similar to this:

Name Metric 1 Metric 2 Country
John 0.10 5.00 Canada
Jane 0.50 Canada
Jack 2.00 Canada
Polly 0.30 Canada
Mike Canada
Steve Canada
Lily 0.15 1.20 Canada
Kate 3.00 Canada
Edward 0.05 Canada
Pete 0.02 0.03 Canada

I am trying to define a function that will calculate the percentage of metrics that are greater than 1 of the rows that have metrics. I expect that for Metric 1, I should get 25%, and for Metric 2, I should get 66%. However, my function is returning results based on the total number of rows. Here's my code:

import pandas as pd
import io
df = pd.read_csv(io.BytesIO(data_to_load['metric data.csv']))

df = df.fillna(0)

def metricgreaterthanone(x):
  return (x>1).sum()/len(x!=0)

pd.pivot_table(df,index=['Country'],values=["Name","Metric 1","Metric 2"],aggfunc={'Name':pd.Series.nunique, "Metric 1":metricgreaterthanone,"Metric 2":metricgreaterthanone})

The result I get is:

Country Metric 1 Metric 2 Name
Canada 0.2 0.2 10

So the function is returning the percent of all rows all that are greater than 1. Any ideas on how to fix this?

2

2 Answers

1
votes

x!=0 returns a boolean array, so len() is not counting the number of Trues.

Try


def metricgreaterthanone(x):
  return (x>1).sum()/(x!=0).sum()

1
votes

It seems that you have empty string "" instead of numbers. You can try:

def metricgreaterthanone(x):
    n = pd.to_numeric(x, errors="coerce")
    return (n > 1).sum() / n.notna().sum()


x = pd.pivot_table(
    df,
    index=["Country"],
    values=["Name", "Metric 1", "Metric 2"],
    aggfunc={
        "Name": pd.Series.nunique,
        "Metric 1": metricgreaterthanone,
        "Metric 2": metricgreaterthanone,
    },
)
print(x)

Prints:

         Metric 1  Metric 2  Name
Country                          
Canada       0.25  0.666667    10