4
votes

Situation:

I have a login table with columns email and login_time. I would like to calculate the daily login time (DAU) and the rolling monthly login time (MAU). The DAU and MAU must count the distinct users. i.e. If someone logs in 20 times in the past 30 days (MAU) then he is only counted once. Same logic goes for DAU. MAU's range is 30 days.

DAU: is calculated by taking the distinct email login per day.

MAU: is calculated by taking the distinct email login for the rolling 30 days.

Desired Result: see fiddle below

Date         MAU     DAU     
2019-04-01   4       2  
2019-04-02   3       2  
2019-04-03   4       2   

Calculating the DAU is rather simple, however getting the rolling MAU, not sure how.

Fiddle:

create table #t1 (email varchar(max), login_time datetime)
insert into #t1 values 
('[email protected]', '2019-03-15 00:00:00.000'),
('[email protected]', '2019-04-01 00:00:00.000'),
('[email protected]', '2019-04-02 00:00:00.000'),
('[email protected]', '2019-04-03 00:00:00.000'),

('[email protected]', '2019-03-19 00:00:00.000'),
('[email protected]', '2019-04-01 00:00:00.000'),
('[email protected]', '2019-04-02 00:00:00.000'),
('[email protected]', '2019-04-02 00:00:00.000'),

('[email protected]', '2019-03-02 00:00:00.000'),
('[email protected]', '2019-04-03 00:00:00.000'),

('[email protected]', '2019-03-06 00:00:00.000')
2
You haven't done a very good job explaining how DAU and MAU are defined. Please explain. - Gordon Linoff
@GordonLinoff Done. - Roger Steinberg
@Cowthulhu > 2016 - Roger Steinberg
@Cowthulhu Extremely important, the table i'm dealing with is handling hundreds of thousands of logins per day. - Roger Steinberg
@D-Shih no it shouldnt. in the past 30 days means within that range of 30 days. anything after that shouldnt be included in the MAU - Roger Steinberg

2 Answers

2
votes

This is one way of doing it.

SELECT login_time, 
    m.MAU,
    COUNT(DISTINCT email) AS DAU
FROM #t1 d
CROSS APPLY (SELECT COUNT( DISTINCT email) AS MAU
            FROM #t1 m
            WHERE m.login_time BETWEEN DATEADD( dd, -30, d.login_time) AND d.login_time) m
GROUP BY login_time, m.MAU
ORDER BY login_time;
0
votes

Thanks Luis, very clever answer, this would be the MySQL 8.X flavor just in case you need it, like myself:

drop table if exists t1;

create table t1 (email text, login_time datetime)
;
insert into t1 values 
('[email protected]', '2019-03-15 00:00:00.000'),
('[email protected]', '2019-04-01 00:00:00.000'),
('[email protected]', '2019-04-02 00:00:00.000'),
('[email protected]', '2019-04-03 00:00:00.000'),

('[email protected]', '2019-03-19 00:00:00.000'),
('[email protected]', '2019-04-01 00:00:00.000'),
('[email protected]', '2019-04-02 00:00:00.000'),
('[email protected]', '2019-04-02 00:00:00.000'),

('[email protected]', '2019-03-02 00:00:00.000'),
('[email protected]', '2019-04-03 00:00:00.000'),

('[email protected]', '2019-03-06 00:00:00.000')

;
SELECT login_time, 
    m.MAU,
    COUNT(DISTINCT email) AS DAU
FROM t1 as d,
LATERAL (SELECT COUNT( DISTINCT email) AS MAU
            FROM t1 m
            WHERE m.login_time BETWEEN d.login_time - interval 30 day AND d.login_time) as m
GROUP BY login_time, m.MAU
ORDER BY login_time; 

The equivalent of CROSS APPLY in MySQL 8.X is the LATERAL keyword (which means “this derived table depends on previous tables on its left side”):