I'm a BigQuery and SQL newbie that's continuing to tackle grouping problems. Using Standard SQL in BigQuery, I'd like to group data by X days. Here's a table of data:
event_id | url | timestamp
-----------------------------------------------------------
xx a.html 2016-10-18 15:55:16 UTC
xx a.html 2016-10-19 16:68:55 UTC
xx a.html 2016-10-25 20:55:57 UTC
yy b.html 2016-10-18 15:58:09 UTC
yy b.html 2016-10-18 08:32:43 UTC
zz a.html 2016-10-20 04:44:22 UTC
zz c.html 2016-10-21 02:12:34 UTC
I want to count the number of each event that occurred on each url in intervals of X days, starting from a given date. For example: how could I group this in intervals of 3 days, where my first interval starts on 2016-10-18 00:00:00 UTC? In addition, can I assign the 3rd day of the interval to each row? Example output:
event_id | url | count | 3dayIntervalLabel
-----------------------------------------------------------
xx a.html 2 2016-10-20 --> [18th thru 20th]
yy b.html 2 2016-10-20
zz a.html 1 2016-10-20
zz c.html 1 2016-10-23 --> [21th thru 23th]
xx a.html 1 2016-10-26 --> [24th thru 26th]
I added three annotations to clarify the 3dayIntervalLabel values.
In general, I'm hoping to solve: group by intervals of X days, starting from date Y, and label the intervals using the final date of the each interval.
Please let me know if more clarification is needed.
If you're interested, I've also asked similar questions on StackOverflow (and gotten answers) about grouping this data using a rolling window: initial question and follow-up.
Thanks!
GENERATE_DATE_ARRAY
if you need a sequence of dates against which to join. – Elliott Brossard