0
votes

Im trying to use a window function,for each day of sales of a sku, to have the sum of the last 365 days of qty of the sku. If this was sold on every day then I could use ROWS and PRECEDING etc

ORDER BY
      CalendarFullDate ROWS BETWEEN 364 PRECEDING AND CURRENT ROW

But in this case, the dates are not evenly distributed with many days not having a sale (i.e I cant just go back 364 rows and assume a sale is every day).

So with the test/sample below, is it possible to use windowing and some type of where clause so Im only summing back at most 364 days?

WITH samples AS (
  SELECT "1" AS SKU, DATE("2018-10-27") AS CalendarFullDate, 86.0 AS DailySalesQty UNION ALL (
  SELECT "1" AS SKU, DATE("2018-10-20"), 84.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-09-29"), 88.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-09-14"), 42.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-09-01"), 21.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-05-05"), 25.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-04-28"), 97.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-03-31"), 244.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-03-24"), 68.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-02-23"), 52.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-02-10"), 48.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-01-21"), 243.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-01-18"), 2.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2018-01-06"), 190.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-12-26"), 310.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-12-09"), 240.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-11-03"), 30.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-10-21"), 164.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-09-30"), 44.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-09-09"), 55.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-09-01"), 35.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-05-20"), 60.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-05-06"), 68.0 ) UNION ALL (
  SELECT "1" AS SKU, DATE("2017-04-15"), 136.0) UNION ALL (

  SELECT "2" AS SKU, DATE("2018-10-24"), 46.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-10-18"), 56.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-09-16"), 19.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-09-02"), 42.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-09-01"), 45.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-07-05"), 25.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-06-28"), 210.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-05-31"), 44.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-05-24"), 168.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-04-23"), 152.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-03-10"), 8.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-02-21"), 23.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-01-18"), 20.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2018-01-06"), 10.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-12-26"), 30.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-11-09"), 1240.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-11-03"), 323.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-10-21"), 123.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-09-30"), 444.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-09-09"), 555.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-08-01"), 35.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-06-20"), 6.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-05-06"), 68.0 ) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-04-15"), 136.0) UNION ALL (
  SELECT "2" AS SKU, DATE("2017-04-09"), 136.0)
)

SELECT 
  SKU, 
  CalendarFullDate, 
  SUM(DailySalesQty) OVER(win)
FROM
  samples WINDOW win AS (
    PARTITION BY
      SKU
    ORDER BY
      CalendarFullDate 
    RANGE BETWEEN DATE_TRUNC(CalendarFullDate,INTERVAL 364 DAY) AND CalendarFullDate)

I know above you cant do for RANGE, but its a sort of pseudo code for what I actually want to do. I tried a where clause but thats not allowed.

Is this even possible using windowing? Its a nice clean way to do it but not sure if I can express such a condition for the windowed aggregate?

Note: this is a cut down version of the real data which has 5 fields as the partition, and 20 odd measures to aggregate as well and is a huge dataset (1 TB) so wanting it to be efficient as well.

Thoughts?

Cheers!

1

1 Answers

2
votes

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
    SKU, 
    CalendarFullDate,
    SUM(DailySalesQty) OVER(win) SalesQty365days
FROM (
  SELECT 
    SKU, 
    CalendarFullDate, 
    DailySalesQty,
    UNIX_DATE(CalendarFullDate) unix_days
  FROM samples 
)
WINDOW win AS (
  PARTITION BY SKU ORDER BY unix_days 
  RANGE BETWEEN 364 PRECEDING AND CURRENT ROW
)

the trick here is in 'translating' CalendarFullDate field of DATE type into INTEGER number of days since epoch so it can be used in ORDER BY and RANGE parts of WINDOW expression