1
votes

I'm not good at SQL and I struggle to create concrete query in BigQuery.

I have table with columns: value (int), ts (timestamp), date(string in form yyyy-mm-dd (2018-03-21))

I need to construct query which will calculate for some period of time (days, weeks) everyday change plus latest value for that day.
there can be multiple values for every date (day), thus I thought to get change for current day, get and last value from today and last value from yesterday subtract.

so if I have data:

value   ts                        date   
5   2018-01-01 01:59:34.000 UTC 2018-01-01   
2   2018-01-01 18:39:34.000 UTC 2018-01-01   
6   2018-01-02 14:06:14.000 UTC 2018-01-02   
6   2018-01-02 16:52:54.000 UTC 2018-01-02      
1   2018-01-02 19:39:34.000 UTC 2018-01-02   
5   2018-01-02 22:26:14.000 UTC 2018-01-02   
5   2018-01-03 03:59:34.000 UTC 2018-01-03

output should be list of with rows as this:

date      change  value   
2018-01-02  3      5  

I have started with this query, but it returns the same number of rows as in select (since it's using analytical function, not aggregate), i.e. I don't know how to get one row per date with first and last value. After that I thought to do once more selection and use LEAD/LAG to get value for previous row (date) and calculate change (maybe there is simpler way overall)

SELECT
  FIRST_VALUE(value) OVER (PARTITION BY date ORDER BY ts ASC) AS first_value,
  LAST_VALUE(value) OVER (PARTITION BY date ORDER BY ts ASC) AS last_value,
  date
FROM
  `table`
WHERE
  ts BETWEEN TIMESTAMP("2018-01-01 00:00:00.000")
  AND TIMESTAMP("2018-01-03 23:59:59.000")

Any help is much appreciated

1

1 Answers

4
votes

Below example is for BigQuery Standard SQL

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 5 value, TIMESTAMP '2018-01-01 01:59:34.000 UTC' ts,  '2018-01-01' dt UNION ALL   
  SELECT 2, TIMESTAMP '2018-01-01 18:39:34.000 UTC',  '2018-01-01' UNION ALL   
  SELECT 6, TIMESTAMP '2018-01-02 14:06:14.000 UTC',  '2018-01-02' UNION ALL   
  SELECT 6, TIMESTAMP '2018-01-02 16:52:54.000 UTC',  '2018-01-02' UNION ALL      
  SELECT 1, TIMESTAMP '2018-01-02 19:39:34.000 UTC',  '2018-01-02' UNION ALL   
  SELECT 5, TIMESTAMP '2018-01-02 22:26:14.000 UTC',  '2018-01-02' UNION ALL   
  SELECT 5, TIMESTAMP '2018-01-03 03:59:34.000 UTC',  '2018-01-03' 
)
SELECT dt, 
  IFNULL(lastValue - LAG(lastValue) OVER(ORDER BY dt),0) AS change,
  lastValue AS value
FROM (
  SELECT * FROM (
    SELECT dt,
      FIRST_VALUE(value) OVER(myWindow) AS firstValue,
      LAST_VALUE(value) OVER(myWindow) AS lastValue
    FROM `project.dataset.table`
    WINDOW myWindow AS (PARTITION BY dt ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) 
  ) GROUP BY 1,2,3
) ORDER BY dt

with result as

Row dt          change  value    
1   2018-01-01  0       2    
2   2018-01-02  3       5    
3   2018-01-03  0       5     

I don't know how to get one row per date with first and last value.

As you can see I have changed window for analytical functions in your original select statement, which then allowed easily group by and get one row per day. The rest is as you planned - getting LAG, etc.