Distinct in Window Functions. BigQuery

Question

I'm trying to do something like this in BigQuery:

    COUNT(DISTINCT user_id) OVER (PARTITION BY DATE_TRUNC(date, month), sample, app_id ORDER BY DATE RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as ACTIVE_USERS

In other words, I have a table with Date, Userid, Sample and Application ID. I need to count the cumulative number of unique active users for each day starting from the beginning of the month and ending with the current day.

The function works properly without DISTINCT, but then it returns the total count of users, which is not what I need.

I tried some tricks with dense_rank, but that doesn't work here either.

Is there any way to calculate the number of distinct users using window functions?

UPDATE: here is the full query, so you can better understand what I need:

    with mtd1 as (select
    'MonthToDate' as TIMELINE
    ,fd.date DATE
    ,td.SAMPLE as SAMPLE
    ,td.APPNAME as APP_ID
    ,sum(fd.revenue) as REVENUE
    ,td.user_id ACTIVE_USERS
    from DWH.DailyUser fd
    join DWH.Depositors td using (userid)
    group by 1,2,3,4,6
    ),
    mtd as (
    select TIMELINE
    ,DATE
    ,SAMPLE
    ,APP_ID
    ,sum(revenue) over (partition by date_trunc(date, month), sample, app_id order by date range BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as REVENUE
    ,COUNT(distinct active_users) over (partition by date_trunc(date, month), sample, app_id order by date range BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as ACTIVE_USERS
    from mtd1
    )
    select * from mtd
    where extract(day from date) = extract(day from current_date)
    group by 1,2,3,4,5,6

Mikhail Berlyant · Accepted Answer · 2018-01-12T15:20:49

"Are there any ways to calculate the number of distinct users using window functions?"

This specific question is a duplicate and has already been answered here

... here is the full query ...

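As for how to apply the above to your particular query, see below (not tested, and based entirely on your code). The idea is to collect the user IDs with ARRAY_AGG over the same window, then count the distinct values in the resulting array: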

#standardSQL
WITH mtd1 AS (
  SELECT  
    'MonthToDate' AS TIMELINE
    ,fd.date DATE
    ,td.SAMPLE AS SAMPLE
    ,td.APPNAME AS APP_ID 
    ,SUM(fd.revenue) AS REVENUE 
    ,td.user_id ACTIVE_USERS 
  FROM `DWH.DailyUser` fd 
  JOIN `DWH.Depositors` td USING (userid)
  GROUP BY 1,2,3,4,6
), mtd2 AS (
  SELECT 
    TIMELINE
    ,DATE
    ,SAMPLE
    ,APP_ID
    ,SUM(REVENUE) OVER (PARTITION BY DATE_TRUNC(DATE, MONTH), SAMPLE, APP_ID ORDER BY DATE RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS REVENUE
    ,ARRAY_AGG(ACTIVE_USERS) OVER (PARTITION BY DATE_TRUNC(DATE, MONTH), SAMPLE, APP_ID ORDER BY DATE RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ACTIVE_USERS 
  FROM mtd1
), mtd AS (
  SELECT * REPLACE((SELECT COUNT(DISTINCT u) FROM UNNEST(ACTIVE_USERS) AS u) AS ACTIVE_USERS)
  FROM mtd2
)
SELECT * FROM mtd 
WHERE EXTRACT(day FROM DATE) = EXTRACT(day FROM CURRENT_DATE)
GROUP BY 1,2,3,4,5,6
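
To make the ARRAY_AGG workaround easier to try in isolation, here is a minimal self-contained sketch. The `events` table, its `dt` and `user_id` columns, and the sample rows are all hypothetical and exist only for illustration; they are not part of the original schema.

```sql
#standardSQL
-- Hypothetical inline sample data (not from the original tables).
WITH events AS (
  SELECT DATE '2018-01-01' AS dt, 'a' AS user_id UNION ALL
  SELECT DATE '2018-01-01', 'b' UNION ALL
  SELECT DATE '2018-01-02', 'a' UNION ALL
  SELECT DATE '2018-01-02', 'c' UNION ALL
  SELECT DATE '2018-01-03', 'b'
), collected AS (
  SELECT
    dt,
    -- Collect every user_id seen from the start of the month
    -- up to and including the current date (duplicates included).
    ARRAY_AGG(user_id) OVER (
      PARTITION BY DATE_TRUNC(dt, MONTH)
      ORDER BY dt
      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS users_so_far
  FROM events
)
SELECT DISTINCT
  dt,
  -- Deduplicate inside the array, since COUNT(DISTINCT ...) OVER
  -- is not accepted with this kind of window.
  (SELECT COUNT(DISTINCT u) FROM UNNEST(users_so_far) AS u) AS active_users
FROM collected
ORDER BY dt
```

With this sample data the result should be 2 for 2018-01-01, 3 for 2018-01-02, and 3 for 2018-01-03. Note that the intermediate arrays grow with the number of rows in each partition, so on large tables this approach can become expensive.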
