I have a file that has a record of all usage of our product, which includes user id, number of calls made and the date the calls were made (it's rolled up to the date, by user id).
user_id | num_calls | date
123 | 32 | 2018-04-17
435 | 21 | 2018-04-17
123 | 35 | 2018-04-18
435 | 10 | 2018-04-18
123 | 20 | 2018-04-19
435 | 90 | 2018-04-20
I want to produce a chart that shows, for each day in the past and going forward, users, who were active in the 30 days prior to and including that date, and how many calls they made over that 30 day period. Ultimately, I will be using this to set various thresholds for "high usage" in a given 30 day period. It would look like this:
user_id | num_calls_in_previous_30_days | date
123 | 32 | 2018-04-17
435 | 21 | 2018-04-17
123 | 67 | 2018-04-18
435 | 31 | 2018-04-18
123 | 87 | 2018-04-19
435 | 31 | 2018-04-19
123 | 87 | 2018-04-20
435 | 121 | 2018-04-20
The issue I'm having is that when I try to use the window function
sum(num_calls) over (partition by id ORDER BY UNIX_SECONDS(timestamp(date)) range BETWEEN 2505600 PRECEDING AND CURRENT ROW)
I only get the total number of calls in the last 30 days for users who were active on each specific date as opposed to including all users who were active in the 30 days prior to that date and their usage over that time frame. Using the same data from above, it looks like this:
user_id | num_calls_in_previous_30_days | date
123 | 32 | 2018-04-17
435 | 21 | 2018-04-17
123 | 67 | 2018-04-18
435 | 31 | 2018-04-18
123 | 87 | 2018-04-19
435 | 121 | 2018-04-20
I tried another route, which was getting all unique user_ids from the previous 30 days from each date, but I wasn't sure how to join this with my existing usage data to get my desired result.
I'm sure there's a simple solution here, but I've spent a few hours on it and can't seem to wrap my head around how to solve this.
Thanks in advance!