I am trying to do weighted standard deviation on top of weighted average on my pandas
dataframe. I have a pandas
dataframe like:
import numpy as np
import pandas as pd
df = pd.DataFrame({"Date": pd.date_range(start='2018-01-01', end='2018-01-03 18:00:00', freq='6H'),
"Weight": np.random.uniform(3, 5, 12),
"V1": np.random.uniform(10, 15, 12),
"V2": np.random.uniform(10, 15, 12),
"V3": np.random.uniform(10, 15, 12)})
Currently, to get the weighted mean, inspired by this post, I am doing the following:
def weighted_average_std(grp):
return grp._get_numeric_data().multiply(grp['Weight'], axis=0).sum()/grp['Weight'].sum()
df.index = df["Date"]
df_agg = df.groupby(pd.Grouper(freq='1D')).apply(weighted_average_std).reset_index()
df_agg
Where I get the following:
Date V1 V2 V3 Weight
0 2018-01-01 11.421749 13.090178 11.639424 3.630196
1 2018-01-02 12.142917 11.605284 12.187473 4.056303
2 2018-01-03 12.034015 13.159132 11.658969 4.318753
I want to modify weighted_average_std
so that it returns standard deviation for each column in addition to weighted average
. The idea is to use the weighted average for each group in a vectorized fashion. The new column names for Weighted Standard Deviation
can be something like V1_WSD
, V2_WSD
and V3_WSD
.
PS1: This post goes through the theory of weighted standard deviation.
PS2: Column Weight
in df_agg
is meaningless.