Pandas Pivot Table Weighted Average of Numerical Column Name and Numerical Row Index

Question

How does one calculate the "weighted average axis" value for each axis of "pot" and "graft", weighted by the "rez" values (summed) which make up the belly of the Pivot Table? In this case the row index and column names are not strings, but numbers.

import pandas as pd
import numpy as np
import random

# CREATE DATA
pot = np.linspace(0,80,num=9)
graft = np.linspace(0.30,0.80,num=11)

pots = pd.Series([])
grafts = pd.Series([])
rezs = pd.Series([])

for i in range (1,5):
    for pott in pot:
        for graftt in graft:
            rez = random.uniform(-2,4)
            rez= round(rez, 2)
            pots = pots.append(pd.Series(pott))
            grafts = grafts.append(pd.Series(graftt))
            rezs = rezs.append(pd.Series(rez))

# CREATE DATA FRAME
df = pd.DataFrame({'pot' : pots, 'graft' : grafts, 'rez': rezs })

# CREATE PIVOT TABLE
pivot = pd.pivot_table(data = df, values = 'rez', index = ['pot'], columns = ['graft'], aggfunc = np.sum)

# REPLACE NEGATIVE VALUES WITH ZEROES
pivot[pivot < 0] = 0

# Calculate Weights
weights = pivot / (pivot.sum().sum())
pivotsum = pivot.sum().sum()

Pivot Table:

graft  0.30   0.35  0.40  0.45   0.50   0.55  0.60  0.65  0.70  0.75   0.80
pot                                                                        
0.0    5.66   0.37  3.92  4.99   1.38   7.89  7.04  3.83  3.88  5.48   4.08
10.0   5.44   4.34  2.26  4.55   4.89   4.50  2.07  3.94  2.66  3.77  11.26
20.0   0.00   4.47  7.15  1.20   4.55  11.40  3.33  0.41  2.34  4.20   7.17
30.0   2.88   1.88  8.55  4.60   4.07  13.58  6.06  3.79  9.25  4.21   1.63
40.0   5.06   0.63  4.20  6.68   5.15   7.93  2.03  7.92  6.94  0.00   1.99
50.0   6.57  11.63  3.80  6.69   6.74   5.71  3.48  4.48  0.00  3.20   4.18
60.0   6.46   5.69  5.72  0.00  13.71   5.03  3.82  9.91  4.02  1.12   1.81
70.0   0.82   8.50  4.79  3.82   1.50   5.66  2.57  0.00  6.91  6.12   4.55
80.0   8.90   4.58  5.01  3.47   4.42   0.08  4.63  0.00  2.77  0.96   3.30

I would like to calculate the weighted average "graft" and "pot", weighted by the (summed) "rez" values which make up the belly of the dataframe.

I've gotten as far as to create weights but do not know how to proceed in accessing the indices and column names to calculate the weighted average.

Actual data will not be uniformly distributed.

Desired output:

Weighted Average Graft = (0.30*5.66/pivotsum) + (0.30* 5.44/pivotsum) + ... + (0.80*3.30/pivotsum)

Weighted Average Pot = (0 * 5.66/pivotsum) + (10*5.44/pivotsum) + ... +(80*3.30/pivotsum)

These two weighted averages together then describe a "point" in the pivot table which represents where the pivot table would balance.

So, you want (0.30*(0.0*5.66 + 10.0*5.44 + 20.0*...) + 0.35*(0.0*0.37 + 10.0*4.3.4 + ...) + 0.40*(....) .... 0.80*(...)) / (0.30 + 0.35 + ... + 0.80)? — Mike

tommylicious tommylicious · Accepted Answer · 2019-05-28T18:19:55

Pandas.dataframe.reset_index and pandas.melt work. We can eliminate the last two lines in the original code. We collapse the pivot table multi index and then melt the remaining dataframe columns while preserving one as the id variable for the melt method. We can then calculate the weighted averages on the fully melted pivot table. There may be a more direct solution out there, but this does get to the right numbers.

import pandas as pd
import numpy as np
import random

# CREATE DATA
pot = np.linspace(0,80,num=9)
graft = np.linspace(0.30,0.80,num=11)

pots = pd.Series([])
grafts = pd.Series([])
rezs = pd.Series([])

for i in range (1,5):
    for pott in pot:
        for graftt in graft:
            rez = random.uniform(-2,4)
            rez= round(rez, 2)
            pots = pots.append(pd.Series(pott))
            grafts = grafts.append(pd.Series(graftt))
            rezs = rezs.append(pd.Series(rez))

# CREATE DATA FRAME
df = pd.DataFrame({'pot' : pots, 'graft' : grafts, 'rez': rezs })

# CREATE PIVOT TABLE
pivot = pd.pivot_table(data = df, values = 'rez', index = ['pot'], columns = ['graft'], aggfunc = np.sum)

# PRINT OUT INITIAL PIVOT TABLE
print("\nOriginal Pivot:\n",pivot)

# REPLACE NEGATIVE VALUES WITH ZEROES
pivot[pivot < 0] = 0
print("\nPost negative filtered Pivot:\n",pivot)

# RESET INDEX ON PIVOT TABLE
unpack = pivot.reset_index()

# MELT DATAFRAME COLUMNS
unpack = pd.melt(unpack, id_vars=['pot'], value_name='Rez')

# CALCULATE AND DISPLAY WEIGHTED AVERAGES
totrez = unpack['Rez'].sum()
unpack['Weight'] = unpack['Rez'] / totrez
wavpot = (unpack['pot'] * unpack['Weight']).sum()
wavpot = round(wavpot, 2)
print("\nWeighted Average pot:", wavpot)
wavgraft = (unpack['graft'] * unpack['Weight']).sum()
wavgraft = round(wavgraft, 2)
print("Weighted Average graft:", wavgraft)

Pandas Pivot Table Weighted Average of Numerical Column Name and Numerical Row Index

1 Answers