1
votes

I've been working on this problem for a while now. Basically I have a simple set of data with UserId, and TimeStamp. I want to know how many distinct UserId's appear each week, the catch is my week is measured in Sunday-Saturday, NOT Monday - Sunday, which is what Weekofyear() uses.

Right now I'm hardcoding each week and running the query:

SELECT
count(distinct UserId)
FROM data.table
where from_unixtime((CAST(timestamp as BIGINT))) 
       between TO_DATE("2016-06-05") AND TO_DATE("2016-06-12")

I'm trying to find a way to shift the timestamp back a day to trick weekofyear into thinking my Sunday is actually a Monday, but have not been successful. My latest futile attempt looked like:

SELECT
count(distinct UserId), weekofyear(date_sub(from_unixtime(CAST(timestamp as BIGINT)),1))
FROM table.data
where from_unixtime((CAST(timestamp as BIGINT)))
       between TO_DATE("2016-06-01") AND TO_DATE("2016-06-30")
       group by weekofyear(date_sub(from_unixtime(CAST(timestamp as BIGINT)),1))

This results in the same numbers as if I didn't subtract a day. I not sure why this isn't working. I feel like there should be a way to manage this. Right now if I wanted to pull all the data by week WHERE X is true, I'd have to manually do each week, that won't be sustainable. Any suggestions on how to work smarter?

Thank you.

1

1 Answers

1
votes

Simple Solution

You can simply create your own formula instead of going with pre-defined function for "week of the year" Advantage: you will be able to take any set of 7 days for a week.

In your case since you want the week should start from Sunday-Saturday we will just need the first date of sunday in a year

eg- In 2016, First Sunday is on '2016-01-03' which is 3rd of Jan'16 --assumption considering the timestamp column in the format 'yyyy-mm-dd'

    SELECT
    count(distinct UserId), lower(datediff(timestamp,'2016-01-03') / 7) + 1 as week_of_the_year 
    FROM table.data 
    where timestamp>='2016-01-03'
    group by lower(datediff(timestamp,'2016-01-03') / 7) + 1;