1
votes

The output I'm looking for is a record for each individual day, with a count of DISTINCT Active Members and a count of the DISTINCT Rolling 7 day Active Members.

I'm working with two tables -- [transactions] which has a record for every transaction through our system, including a dmn_date_id for the transaction processing date, and a DATE_DIMENSION which has every day between 1990 and 2020 with day_offset_nbr representing the days offset from TODAY() (this table is rebuilt nightly).

SELECT
    dd.date_tms
    , COUNT(DISTINCT(t.user_id)) as Active_Members_Today
    , COUNT(DISTINCT(t2.user_id)) as Rolling_7Day_Active_Members
FROM
    transactions t
    JOIN DATE_DIMENSION dd
        ON t.transaction_processed_date_id = dd.date_id
    JOIN DATE_DIMENSION d2
        ON d2.day_offset_nbr BETWEEN (dd.day_offset_nbr - 6) AND dd.day_offset_nbr
    JOIN transactions t2
        ON d2.dmn_date_id = t2.common_trans_processed_date_id
WHERE
    t.user_initiated_ind = 'Y' -- Only User-initiated Transactions
    AND t2.user_initiated_ind = 'Y'
    AND t.transaction_state_name = 'Accepted' -- Accepted Transactions Only
    AND t2.transaction_state_name = 'Accepted'  
    AND dd.day_offset_nbr >= -731
    AND d2.day_offset_nbr >= -738
GROUP BY
    dd.date_tms

The above query gets me what we need but it takes a long time (over an hour) to process. We've created indexes on user_initiated_ind and common_trans_processed_date_id including the user_id but the query is still spinning away.

Thinking I could resolve this with a window function it was learned you can't use DISTINCT in a window, which led me to using a derived table with Row_Number() to identify DISTINCT user's within the window, which led me to discover Row_Number() can't be used with a ROWS BETWEEN clause in a Window...

I've read Calculate running total / running balance which is similar but deals with a running balance rather than rolling distinct count, and I'm not quite sure how to apply what I've read to my problem; only that the INNER JOIN method I'm using is suggested as NOT the method to use...

...which inspired me to post.

How can I improve my query performance time?

1
Does your execution plan show that it is using an index seek with no additional lookups?Tab Alleman
No sir, @tab-alleman -- Clustered Index Scans all the way down.dijikul

1 Answers

1
votes

The second join to transactions seems entirely unnecessary. You can do the counts using conditional aggregation. So, this version might be faster:

SELECT d2.date_tms,
       count(distinct case when d2.day_offset_nbr = dd.day_offset_nbr then t.user_id end) as Active_Members_Today,
       count(distinct t.user_id) as Rolling_7Day_Active_Members
FROM transactions t JOIN
     DATE_DIMENSION dd
     ON t.transaction_processed_date_id = dd.date_id JOIN
     DATE_DIMENSION d2
     ON d2.day_offset_nbr BETWEEN dd.day_offset_nbr AND (dd.day_offset_nbr + 6) AND 
WHERE t.user_initiated_ind = 'Y' AND -- Only User-initiated Transactions
      t.transaction_state_name = 'Accepted' AND -- Accepted Transactions Only
      dd.day_offset_nbr >= -731
GROUP BY d2.date_tms;