0
votes

If I have a data frame with only number of clicks at certain fixed time interval looking like this:

1
3
4
2
6
1

And I want to calculate their rolling average with the 5 rows above, would this be legit:

SELECT AVG(value) OVER (ORDER BY 1 ASC ROWS 4 PRECEDING ) AS avg_value FROM df GROUP BY 1

Or should it be

SELECT AVG(value) OVER (PARTITION BY 1 ASC ROWS 4 PRECEDING) AS avg_value FROM df GROUP BY 1

1
Not legit, because your data is lacking a second column which you'd need to tell the order of each number of clicks. There is no internal "order" to a SQL table, in general.Tim Biegeleisen
given the lack of order, is there a way to add a row number then do rolling avg?santoku
Yes, but row number itself requires a column for the order. You need to record your clicks with a timestamp. This is really your only option.Tim Biegeleisen

1 Answers

0
votes

You seem to want:

SELECT df.*,
       AVG(value) OVER (ORDER BY datetimecol ASC
                        ROWS 4 PRECEDING
                       ) AS avg_rolling_5
FROM df;

Notes:

  • A rolling average requires implies an ordering on the data. The datetimecol is for the column that represents that ordering.
  • A rolling average is for the original data, not the aggregated data, so no order by is needed.
  • SQL databases have tables, not dataframes.