In a pandas pivot table, how do I define a function for a subset of data?

Question

I'm working with a dataframe similar to this:

Name	Metric 1	Metric 2	Country
John	0.10	5.00	Canada
Jane	0.50		Canada
Jack	2.00		Canada
Polly	0.30		Canada
Mike			Canada
Steve			Canada
Lily	0.15	1.20	Canada
Kate	3.00		Canada
Edward	0.05		Canada
Pete	0.02	0.03	Canada

I am trying to define a function that will calculate the percentage of metrics that are greater than 1 of the rows that have metrics. I expect that for Metric 1, I should get 25%, and for Metric 2, I should get 66%. However, my function is returning results based on the total number of rows. Here's my code:

import pandas as pd
import io
df = pd.read_csv(io.BytesIO(data_to_load['metric data.csv']))

df = df.fillna(0)

def metricgreaterthanone(x):
  return (x>1).sum()/len(x!=0)

pd.pivot_table(df,index=['Country'],values=["Name","Metric 1","Metric 2"],aggfunc={'Name':pd.Series.nunique, "Metric 1":metricgreaterthanone,"Metric 2":metricgreaterthanone})

The result I get is:

Country	Metric 1	Metric 2	Name
Canada	0.2	0.2	10

So the function is returning the percent of all rows all that are greater than 1. Any ideas on how to fix this?

Z Li Z Li · Accepted Answer · 2021-04-28T16:38:07

x!=0 returns a boolean array, so len() is not counting the number of Trues.

Try


def metricgreaterthanone(x):
  return (x>1).sum()/(x!=0).sum()

In a pandas pivot table, how do I define a function for a subset of data?

2 Answers