Pandas - moving average with different time windows

Question

I have some data in a Pandas Dataframe. I'd like to calculate the moving average from that data, including up to N trailing rows and up to Q leading rows:

import pandas

my_data = pandas.DataFrame({
    'values': [5.0, 4.0, 3.0, 5.0, 5.0, 6.0, 4.0, 6.0, 7.0, 4.0, 5.0, 5.0]
})

my_data
    values
0      5.0
1      4.0
2      3.0
3      5.0
4      5.0
5      6.0
6      4.0
7      6.0
8      7.0
9      4.0
10     5.0
11     5.0

N=2
Q=3

get_moving_average(my_data, lagging=N, leading=Q)
    values      mavg
0        5  4.250000
1        4  4.400000
2        3  4.666667
3        5  4.500000
4        5  4.833333
5        6  5.500000
6        4  5.333333
7        6  5.333333
8        7  5.166667
9        4  5.400000
10       5  5.250000
11       5  4.666667

Here, the moving average for row 0 is 4.25 - the value of cell 0 plus 0 bolded lagging rows (as there are none) and 3 italicized leading rows (the value of cells 1:3 - so (5.0 + 4.0 + 3.0 + 5.0) / 4.0 = 17.0 / 4.0 = 4.25

The moving average for row 1 includes all of the above, but also the bolded trailing row 0 (as we accept 2 lagging rows, but only one is present), yielding (5.0 + 4.0 + 3.0 + 5.0 + 5.0) / 5.0 = 22.0 / 5.0 = 4.4.

And so on. But, pandas' rolling method only takes a single argument for window, not something that allows for choosing the size on each side, and seems to have issues where insufficient data is present.

Is there an alternative to rolling that covers this, or do I need to roll my own?

Haven't tested it, but is it by any chance my_data.expanding(4, center=False).mean()? — rafaelc
No, because that only does a side. I need both sides with control — J. Doe

Scott Boston Scott Boston · Accepted Answer · 2018-07-24T19:35:08

I am not sure this will work for all situations without further testing.

Let's get creative. Use rolling and shift backwards in the "rolling window" to get the appropriate trailing and leading row count. To handle the NaN at the of the rolling mean, let's roll backwards and fillna.

N=2
Q=3
T = N+Q+1
my_data.rolling(T, min_periods=1).mean().shift(-Q)\
       .fillna(my_data[::-1].rolling(T, min_periods=Q).mean().shift(-N)[::-1])

Output:

      values
0   4.250000
1   4.400000
2   4.666667
3   4.500000
4   4.833333
5   5.500000
6   5.333333
7   5.333333
8   5.166667
9   5.400000
10  5.250000
11  4.666667

Steps:

Do a rolling calculation over the full interval with, in your case, would be N (Lagging) + Q (leading) and the current which sums to total (T).

my_data.rolling(T, min_periods=1).mean()

Now, let's shift the calculation using shift back such that the window is as expected, N lagging and Q leading, so shift -Q.

my_data.rolling(T, min_periods=1).mean().shift(-Q)

Output:

      values
0   4.250000
1   4.400000
2   4.666667
3   4.500000
4   4.833333
5   5.500000
6   5.333333
7   5.333333
8   5.166667
9        NaN
10       NaN
11       NaN

Now to handle those NaN values at the end of the dataframe, we do a rolling calculation in reverse and use fillna fill the original. [::-1] is a reversing element to start at the end and rolling to the begin, then use [::-1] to flip again to line up with the original result.

my_data[::-1].rolling(T, min_periods=Q).mean().shift(-N)[::-1]

Outputs:

      values
0        NaN
1        NaN
2   4.666667
3   4.500000
4   4.833333
5   5.500000
6   5.333333
7   5.333333
8   5.166667
9   5.400000
10  5.250000
11  4.666667

Use fillna to on the first series with the second series to get:

N=2
Q=3
T = N+Q+1
my_data.rolling(T, min_periods=1).mean().shift(-Q)\
       .fillna(my_data[::-1].rolling(T, min_periods=Q).mean().shift(-N)[::-1])

Output:

      values
0   4.250000
1   4.400000
2   4.666667
3   4.500000
4   4.833333
5   5.500000
6   5.333333
7   5.333333
8   5.166667
9   5.400000
10  5.250000
11  4.666667

Pandas - moving average with different time windows

1 Answers