0
votes

I'm trying to calculate the DAU average for each country for a time period of 1 month. The job of the query is to:

  1. identify unique users
  2. find all users who logged in during last month
  3. group them into individual days
  4. segment them into their respective countries
  5. count the average for each country.

So far I've managed steps 1, 2, 3 and 4, but the last one is proving to be tricky.

The query is supposed to first calculate the subquery where it calculates how many active users opened the app in the last month and then group them into days and countries. After this, it should calculate the average DAU for each country using all 30 days data it has calculated in the subquery. The result would then be a list of countries and their average DAU.

query so far looks like this:

SELECT Country, AVG(User_ID)
FROM usersession
WHERE User_ID IN
    (SELECT count(distinct us.User_ID)
     FROM usersession us
     WHERE Opened > current_timestamp - interval 1 month
     GROUP BY DAY(Opened), Country)
GROUP BY Country ORDER BY Country;

The subquery does steps 1,2,3,4 but the secondary query outside the subquery isn't just working as intended.

Table is as follows (just a short example of the relevant information):

ID    |  UserID  | Opened              | Country
-----------------------------------------------
233231          1   2017-11-20 08:00:00      NA
223214          2   2017-11-20 08:53:00      DK

Expected result (around 230 countries total):

Country |  Average  
------------------
     NA    150354
     DK     60345
     FI     50242

Actual result:

+---------+--------------+
| Country | AVG(User_ID) |
+---------+--------------+
| NULL    |  804397.7297 |
|         |  746046.7500 |
| BR      |  893252.0000 |
| GB      |  935599.0000 |
| RU      |  993311.0000 |
| US      |  735568.0000 |
+---------+--------------+
1
fix your db tags please, looks like SQL-SERVER not MYSQLTanner
distinct is not a function! Remove those redundant parentheses to make things clearer, i.e. do count(distinct us.User_ID) instead.jarlh
It is actually mysql :) I deleted the arrows for clearer reading.Suinelas
It's not clear how the query in your question works. How does User_ID IN (count of something GROUP BY something) make sense? Please edit your question to clarify.O. Jones
"isn't just working as intended" is not a problem description. What did you want? What did you get instead? Post expected vs. actual output data.underscore_d

1 Answers

1
votes

I think this is what you want:

select
    country,
    sum(number_of_users) / count(distinct day_of_month) as daily_average_users
from
    (
        select 
           country,
           day(opened)             as day_of_month,
           count(distinct user_id) as number_of_users
        from
           user_session
        where
           opened > current_timestamp - interval 1 month
        group by
           country,
           day_of_month
    ) x
group by 
    country
order by 
    country;

I tested this on MySQL 5.7:

create table user_session
(
    id       int,
    user_id  int,
    opened   timestamp,
    country  varchar(2)
);

insert into user_session (id, user_id, opened, country) values ( 1, 100, '2017-12-20 08:00:00', 'NA');
insert into user_session (id, user_id, opened, country) values ( 2, 100, '2017-12-20 08:00:00', 'NA');
insert into user_session (id, user_id, opened, country) values ( 3, 100, '2017-12-20 08:00:00', 'NA');
insert into user_session (id, user_id, opened, country) values ( 4, 100, '2017-12-21 08:00:00', 'NA');
insert into user_session (id, user_id, opened, country) values ( 5, 100, '2017-12-22 08:00:00', 'NA');
insert into user_session (id, user_id, opened, country) values ( 6, 200, '2017-12-20 08:00:00', 'NA');
insert into user_session (id, user_id, opened, country) values ( 7, 300, '2017-12-21 08:00:00', 'NA');
insert into user_session (id, user_id, opened, country) values ( 8, 400, '2017-12-20 08:00:00', 'NA');
insert into user_session (id, user_id, opened, country) values ( 9, 500, '2017-12-20 08:00:00', 'NA');
insert into user_session (id, user_id, opened, country) values (10, 600, '2017-12-20 08:00:00', 'DK');
insert into user_session (id, user_id, opened, country) values (11, 600, '2017-12-21 08:00:00', 'DK');
insert into user_session (id, user_id, opened, country) values (12, 700, '2017-12-20 08:00:00', 'DK');
insert into user_session (id, user_id, opened, country) values (13, 800, '2017-12-20 08:00:00', 'DK');
insert into user_session (id, user_id, opened, country) values (14, 800, '2017-12-21 08:00:00', 'DK');
insert into user_session (id, user_id, opened, country) values (15, 800, '2017-12-21 08:00:00', 'DK');
insert into user_session (id, user_id, opened, country) values (16, 900, '2017-12-20 08:00:00', 'DK');
insert into user_session (id, user_id, opened, country) values (17, 900, '2017-12-20 08:00:00', 'DK');
insert into user_session (id, user_id, opened, country) values (18, 900, '2017-12-22 08:00:00', 'DK');
insert into user_session (id, user_id, opened, country) values (19, 900, '2017-12-22 08:00:00', 'DK');
insert into user_session (id, user_id, opened, country) values (19, 1000, '2017-12-22 08:00:00', 'DK');

Results:

+---------+---------------------+
| country | daily_average_users |
+---------+---------------------+
| DK      |              2.6667 |
| NA      |              2.3333 |
+---------+---------------------+
2 rows in set (0.00 sec)

For this to be a proper daily average you would need every day of the month to be represented in the data (otherwise the average is over the number of days represented). If that isn't the case then we need to calculate the number of days in the period being considered.