2
votes

I'm trying to calculate Welles Wilder's type of moving average in a panda dataframe (also called cumulative moving average).

The method to calculate the Wilder's moving average for 'n' periods of series 'A' is:

  • Calculate the mean of the first 'n' values in 'A' and set as the mean for the 'n' position.
  • For the following values use the previous mean weighed by (n-1) and the current value of the series weighed by 1 and divide all by 'n'.

My question is: how to implement this in a vectorized way?

I tried to do it iterating over the dataframe (what a I read isn't recommend because is slow). It works, the values are correct, but I get an error

SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

and it's probably not the most efficient way of doing it.

My code so far:

import pandas as pd
import numpy as np

#Building Random sample:
datas = pd.date_range('2020-01-01','2020-01-31')
np.random.seed(693)
A = np.random.randint(40,60, size=(31,1))
df = pd.DataFrame(A,index = datas, columns = ['A'])

period = 12 # Main parameter
initial_mean = A[0:period].mean() # Equation for the first value.

size = len(df.index)
df['B'] = np.full(size, np.nan)
df.B[period-1] = initial_mean

for x in range(period, size):
    df.B[x] = ((df.A[x] + (period-1)*df.B[x-1]) / period) # Equation for the following values. 

print(df)
1

1 Answers

2
votes

You can use the Pandas ewm() method, which behaves exactly as you described when adjust=False:

When adjust is False, weighted averages are calculated recursively as:

weighted_average[0] = arg[0];
weighted_average[i] = (1-alpha)*weighted_average[i-1] + alpha*arg[i]

If you want to do the simple average of the first period items, you can do that first and apply ewm() to the result.

You can calculate a series with the average of the first period items, followed by the other items repeated verbatim, with the formula:

pd.Series(
    data=[df['A'].iloc[:period].mean()],
    index=[df['A'].index[period-1]],
).append(
    df['A'].iloc[period:]
)

So in order to calculate the Wilder moving average and store it in a new column 'C', you can use:

df['C'] = pd.Series(
    data=[df['A'].iloc[:period].mean()],
    index=[df['A'].index[period-1]],
).append(
    df['A'].iloc[period:]
).ewm(
    alpha=1.0 / period,
    adjust=False,
).mean()

At this point, you can calculate df['B'] - df['C'] and you'll see that the difference is almost zero (there's some rounding error with float numbers.) So this is equivalent to your calculation using a loop.

You might want to consider skipping the direct average between the first period items and simply start applying ewm() from the start, which will assume the first row is the previous average in the first calculation. The results will be slightly different but once you've gone through a couple of periods then those initial values will hardly influence the results.

That would be a way more simple calculation:

df['D'] = df['A'].ewm(
    alpha=1.0 / period,
    adjust=False,
).mean()