2
votes

My question is about sliding window sum up in bigquery.

I have a table like the following

run_id      year_week     value
  001        201451         5
  001        201452         8
  001        201501         1
  001        201505         5
  003        201352         8
  003        201401         1
  003        201405         5

Here for each year the week can range from 01 to 53. For example year 2014 has last week which is 201452 but year 2015 has last week which is 201553. If it makes life easier I only have 5 years, 2013, 2014, 2015, 2016 and 2017 and only year 2015 has weeks those go upto 53.

Now for each run I am trying to get a sliding window sum of the values. Each year_week would assume the sum of the values next 5 year_week (including itself) for the current run_id (e.g. 001). For example the following could be a an output from the current table

run_id    year_week   aggregate_sum
  001      201451      5+8+1+0+0
  001      201452      8+1+0+0+0
  001      201501      1+0+0+0+5
  001      201502      0+0+0+5+0
  001      201503      0+0+5+0+0
  001      201504      0+5+0+0+0
  001      201505      5+0+0+0+0
  003      201352      8+1+0+0+0
  003      201401      1+0+0+0+5
  003      201402      0+0+0+5+0
  003      201403      0+0+5+0+0
  003      201404      0+5+0+0+0
  003      201405      5+0+0+0+0

To explain what is happening, the next 5 weeks for 201451 including itself would be 201451,201452,201501,201502,201503 . If there is a value for those weeks in the table for current run_id we just sum them up which would be, 5+8+1+0+0, because the corresponding value for a year_week is 0 if it is not in the table.

Is it possible to do it using sliding window operation in bigquery?

1

1 Answers

2
votes

Below is for BigQuery Standard SQL

#standardSQL
WITH weeks AS (
  SELECT 100* year + week year_week
  FROM UNNEST([2013, 2014, 2015, 2016, 2017]) year, 
  UNNEST(GENERATE_ARRAY(1, IF(EXTRACT(ISOWEEK FROM DATE(1+year,1,1)) = 1, 52, 53))) week
), temp AS (
  SELECT i.run_id, w.year_week, d.year_week week2, value
  FROM  weeks w 
  CROSS JOIN (SELECT DISTINCT run_id FROM `project.dataset.table`) i
  LEFT JOIN `project.dataset.table` d
  USING(year_week, run_id)
)
SELECT * FROM (
  SELECT run_id, year_week, 
    SUM(value) OVER(win) aggregate_sum
  FROM temp
  WINDOW win AS (
    PARTITION BY run_id ORDER BY year_week ROWS BETWEEN CURRENT row AND 4 FOLLOWING
  )
)
WHERE NOT aggregate_sum IS NULL

You can test / play with above using dummy data from your question as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '001' run_id, 201451 year_week, 5 value UNION ALL
  SELECT '001', 201452, 8 UNION ALL
  SELECT '001', 201501, 1 UNION ALL
  SELECT '001', 201505, 5 
), weeks AS (
  SELECT 100* year + week year_week
  FROM UNNEST([2013, 2014, 2015, 2016, 2017]) year, 
  UNNEST(GENERATE_ARRAY(1, IF(EXTRACT(ISOWEEK FROM DATE(1+year,1,1)) = 1, 52, 53))) week
), temp AS (
  SELECT i.run_id, w.year_week, d.year_week week2, value
  FROM  weeks w 
  CROSS JOIN (SELECT DISTINCT run_id FROM `project.dataset.table`) i
  LEFT JOIN `project.dataset.table` d
  USING(year_week, run_id)
)
SELECT * FROM (
  SELECT run_id, year_week, 
    SUM(value) OVER(win) aggregate_sum
  FROM temp
  WINDOW win AS (
    PARTITION BY run_id ORDER BY year_week ROWS BETWEEN CURRENT row AND 4 FOLLOWING
  )
)
WHERE NOT aggregate_sum IS NULL
-- ORDER BY run_id, year_week

with result as

Row run_id  year_week   aggregate_sum    
1   001     201447      5    
2   001     201448      13   
3   001     201449      14   
4   001     201450      14   
5   001     201451      14   
6   001     201452      9    
7   001     201501      6    
8   001     201502      5        
9   001     201503      5    
10  001     201504      5    
11  001     201505      5
12  003     201348      8    
13  003     201349      9    
14  003     201350      9    
15  003     201351      9    
16  003     201352      9    
17  003     201401      6    
18  003     201402      5    
19  003     201403      5    
20  003     201404      5    
21  003     201405      5        

note; this is for - I only have 5 years, 2013, 2014, 2015, 2016 and 2017 but can easily be extended in weeks CTE