1
votes

Here is my attempt to group by rolling week and last week. I understand the error, but when I group by day the minimum becomes minimum of each row and thus not the minimum of the set.

There must be a simple way to group days together in Bigquery.

select
    n_rtb_impressions,
    if(dayy between min(dayy) and (min(dayy) + 6) ,1, 0) as D
from
    (SELECT
    ((dayofyear(datetime))) as dayy,
        sum(IF( c_priority IN (19,20,21), IFNULL(s_impressions,0)-IFNULL(hybrid_p_back,0), 0)) AS n_rtb_impressions,
        FROM
        TABLE_DATE_RANGE(hourly_stats.v1_,DATE_ADD(CURRENT_DATE(),-14,"day"),DATE_ADD(CURRENT_DATE(),-1,"day"))
        group by dayy)

EDIT with the help below and then some playing around I managed to expand it for multiple dimensions and then filter out results such that each group of dimensions only has two rows, one for sum 1-7 days and one for 8-14 days. I then used NTH to pivot the data so the grouping is now dimensions. I'll put it here for others:

select
server,
a_name,
w_name,
rtb_name,

NTH(1, last_7_days_n_rtb_impressions_1) This_week, NTH(2, last_7_days_n_rtb_impressions_1) Last_Week,

from(
    select
        [date],
        dw,
        server,
        a_name,
        w_name,
        rtb_name,
        last_7_days_n_rtb_impressions_1
    from(
        SELECT
            [date],
            dw,
            SUM(n_rtb_impressions) OVER(ORDER BY rtb_name, server, a_name, w_name,[date]
              ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS last_7_days_n_rtb_impressions_1,
            server,
            a_name,
            w_name,
            rtb_name,
        FROM (
          SELECT
              server,
              a_name,
              w_name,
              rtb_name,
              dayofweek(datetime) as dw,
              DATE(datetime) AS [date],
              SUM(IF( c_priority IN (19,20,21), IFNULL(s_impressions,0)-IFNULL(hybrid_p_back,0), 0)) AS n_rtb_impressions,

          FROM TABLE_DATE_RANGE(hourly_stats.v1_,DATE_ADD(CURRENT_DATE(),-14,"day"),
               DATE_ADD(CURRENT_DATE(),-1,"day"))

          GROUP BY [date],dw,server,
                 a_name,
                 w_name,
                 rtb_name,
          order by rtb_name,server, a_name,w_name, [date])

          order by rtb_name,server, a_name,w_name, )

          where dw = dayofweek(DATE_ADD(CURRENT_DATE(),-1,"day"))

          order by rtb_name,server, a_name,w_name, [date])

          group by
          server,
          a_name,
          w_name,
          rtb_name,
1
I tried to make the indentation follow standard conventions, but it may not be 100% correct. Can you check it? - Peter Mortensen
I would, but my new job use oracle and sql server so no longer on bigquery. I appreciate your help! - Oli

1 Answers

2
votes

Trying to be as close to your example as possible

last_7_days_n_rtb_impressions_1 - produces correct output in case if you do not have any gaps in your days within the year. And it will continue counting 7 days back even across the new year

last_7_days_n_rtb_impressions_2 - produces correct output even if there is a gap in the days during the year. And it starts from 0 yeach year. Of course this can be further tuned to reflect your needs

Hope this is good start for you

See window-frame-clause section of Window functions for more details

SELECT 
  [date], dayy, n_rtb_impressions, 
  SUM(n_rtb_impressions) OVER(ORDER BY [date] 
      ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS last_7_days_n_rtb_impressions_1,
  SUM(n_rtb_impressions) OVER(ORDER BY [date] 
      RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS last_7_days_n_rtb_impressions_2  
FROM (
  SELECT
    DAYOFYEAR(datetime) AS dayy,
    DATE(datetime) AS [date],
    SUM(IF( c_priority IN (19,20,21), IFNULL(s_impressions,0)-IFNULL(hybrid_p_back,0), 0)) AS n_rtb_impressions,
  FROM TABLE_DATE_RANGE(hourly_stats.v1_,DATE_ADD(CURRENT_DATE(),-14,"day"),
        DATE_ADD(CURRENT_DATE(),-1,"day"))
  GROUP BY dayy, [date]
)
ORDER BY 1