0
votes

This question is about shifting values of a year-week field in bigquery.

run_id    year_week    value
 0001       201451       13
 0001       201452       6
 0001       201503       3
 0003       201351       8
 0003       201352       5
 0003       201403       1

Here for each year the week can range from 01 to 53. For example year 2014 has last week which is 201452 but year 2015 has last week which is 201553.

Now I want to shift the values for each year_week in each run_id by 5 weeks. For the weeks there is no value it is assumed that they have a value of 0. For example the output from the example table above should look like this:

run_id    year_week    value
 0001       201504       13
 0001       201505       6
 0001       201506       0
 0001       201507       0
 0001       201508       3
 0003       201404       8
 0003       201405       5
 0003       201406       0
 0003       201407       0
 0003       201408       1

Explanation of the output: In the table above for run_id 0001 the year_week 201504 has a value of 13 because in the input table we had a value of 13 for year_week 201451 which is 5 weeks before 201504.

I could create a table programmatically by creating a mapping from a year_week to a shifted year_week and then doing a join to get the output, but I was wondering if there is any other way to do it by just using sql.

2

2 Answers

3
votes
#standardSQL
WITH `project.dataset.table` AS (
  SELECT '001' run_id, 201451 year_week, 13 value UNION ALL
  SELECT '001', 201452, 6 UNION ALL
  SELECT '001', 201503, 3 
), weeks AS (
  SELECT 100 * year + week year_week
  FROM UNNEST([2013, 2014, 2015, 2016, 2017]) year, 
  UNNEST(GENERATE_ARRAY(1, IF(EXTRACT(ISOWEEK FROM DATE(1+year,1,1)) = 1, 52, 53))) week
), temp AS (
  SELECT i.run_id, w.year_week, d.year_week week2, value
  FROM  weeks w 
  CROSS JOIN (SELECT DISTINCT run_id FROM `project.dataset.table`) i
  LEFT JOIN `project.dataset.table` d
  USING(year_week, run_id)
)
SELECT * FROM (
  SELECT run_id, year_week, 
    SUM(value) OVER(win) value
  FROM temp
  WINDOW win AS (
    PARTITION BY run_id ORDER BY year_week ROWS BETWEEN 5 PRECEDING AND 5 PRECEDING
  )
)
WHERE NOT value IS NULL
ORDER BY run_id, year_week

with result as

Row     run_id      year_week       value    
1       001         201504          13   
2       001         201505          6    
3       001         201508          3    

if you need to "preserve" zero rows - just change below portion

  SELECT i.run_id, w.year_week, d.year_week week2, value
  FROM  weeks w 

to

  SELECT i.run_id, w.year_week, d.year_week week2, IFNULL(value, 0) value
  FROM  weeks w 

or

    SUM(value) OVER(win) value
  FROM temp

to

    SUM(IFNULL(value, 0)) OVER(win) value
  FROM temp
2
votes

If you have data in the table for all year-weeks, then you can do:

with yw as (
      select year_week, row_number() over (order by year_week) as seqnum
      from t
      group by year_week
     )
select t.*, yw5, year_week as new_year_week
from t join
     yw
     on t.year_week = yw.year_week left join
     yw yw5
     on yw5.seqnum = yw.seqnum + 5;

If you don't have a table of year weeks, then I would advise you to create such a table, so you can do such manipulations -- or a more general calendar table.