I have some data in a Pandas Dataframe. I'd like to calculate the moving average from that data, including up to N trailing rows and up to Q leading rows:
import pandas
my_data = pandas.DataFrame({
'values': [5.0, 4.0, 3.0, 5.0, 5.0, 6.0, 4.0, 6.0, 7.0, 4.0, 5.0, 5.0]
})
my_data
values
0 5.0
1 4.0
2 3.0
3 5.0
4 5.0
5 6.0
6 4.0
7 6.0
8 7.0
9 4.0
10 5.0
11 5.0
N=2
Q=3
get_moving_average(my_data, lagging=N, leading=Q)
values mavg
0 5 4.250000
1 4 4.400000
2 3 4.666667
3 5 4.500000
4 5 4.833333
5 6 5.500000
6 4 5.333333
7 6 5.333333
8 7 5.166667
9 4 5.400000
10 5 5.250000
11 5 4.666667
Here, the moving average for row 0 is 4.25 - the value of cell 0 plus 0 bolded lagging rows (as there are none) and 3 italicized leading rows (the value of cells 1:3 - so (5.0 + 4.0 + 3.0 + 5.0) / 4.0 = 17.0 / 4.0 = 4.25
The moving average for row 1 includes all of the above, but also the bolded trailing row 0 (as we accept 2 lagging rows, but only one is present), yielding (5.0 + 4.0 + 3.0 + 5.0 + 5.0) / 5.0 = 22.0 / 5.0 = 4.4.
And so on. But, pandas' rolling method only takes a single argument for window, not something that allows for choosing the size on each side, and seems to have issues where insufficient data is present.
Is there an alternative to rolling that covers this, or do I need to roll my own?
my_data.expanding(4, center=False).mean()? - rafaelc