0
votes

I want to model a fact table for our users to help us calculate DAU (Daily active Users), WAU (Weekly active users) and MAU (Monthly active users). The definitions of these measures are as follows: 1. DAU are users who is active every day during last 28 days. 2. WAU are users who are active at least on one day in each 7 days period during last 28 days 3. MAU are users who are active at least 20 days during last 28 days

I have built a SSAS cube with my fact table and user dimension table as follows

Fact : { date, user_id, activity_name} Dimension: { date, user_id, gender, age, country }

Now I want to build a cube over this data so that we can see all the measures in any given day for last 28 days.

I thought of initially storing 28 days of data for all users in the SQL server and then do count distinct on date to see which measures they fall into.. but this proved very expensive since the data per day is huge..almost 10 millions rows.

So my next thought was to model the fact table (before moving it to SQL) such that it has a new column called "active_status" which is a 32 bit binary type column.

Basically, I'll store a binary number (or decimal equivalent) like 11000001101111011111111111111 which has a bit set on the days the user is active and off on the days user is not active.

This way I can compress 28 days worth of data in a single day before loading into data mart Now the problem is , I think MDX doesn't support bitwise operations on columns in the expressions for calculated members like regular SQL does. I was hoping to create calculated measures daily_active_users, weekly_active_users and monthly_active_users using MDX that looks at this active_status bit for the user and does bitwise operation to determine the status.

Any suggestions on how to solve this problem? if MDX doesn't allow bitwise, what else can I do SSAS to achieve this.

thanks for the help

Additonal notes: @Frank

Interesting thought about using a view to do the conversion from bitset to a dimension category..but I'm afraid it won't work. Because I have few dimensions connected to this fact table that have many-many relationships..for ex: I have a dimension called DimLanguage and another dimension called DimCountry and they have many-many relationship. And what ultimately I would like to do in the cube is to calculate the DAU/WAU/MAU which are COUNT(DISTINCT UserId) based on the combination of dimensions. So for ex; If a user is not MAU for dimension country US because he is only active 15 days out of 28 ....but he will be considered

1
Do I understand you correctly that a single user has a fix category of either DAU, or WAU, or MAU and this is not dependent on the date? Or do you want to have this bitset for user for each day representing the last 28 days up to this day?FrankPl
no.. an user can be DAU and WAU and MAU at the same. Yes, I want the bitset for user for each day representing the last 31 days up to this day. The MAU definition can change in future from looking at 28 days to the whole month Ie 31 days, hence I want to keep it at 31user330612

1 Answers

2
votes

You do not want to show the bitmap data to the users of the cube, but just the categories DAU, WAU, MAU, you should do the conversion from bitmap to category on data loading time. Just create a dimension table containing e. g. the following data:

id category
-- --------
 1 DAU
 2 WAU
 3 MAU

Then define a view on your fact table that evaluates the bitmap data, and for each user and each date just calculates the id value of the category the user is in. This is then conceptually a foreign key to the dimension table. Use this view instead of the fact table in your cube.

All the bitmap evaluations are thus done on the relational side, where you have the bit operators available.

EDIT As your requirement is that you need to aggregate the bitmap data in Analysis Services using bitwise OR as the aggregation method, I see no simple way to do that.

What you could do, however, would be to have 28 single columns, say Day1 to Day28, which would be either 0 or 1. These could be of type byte to save some space. You would use Maximum as aggregation method, which is equivalent to binary OR on a single bit.

Then, it would not be really complex to calculate the final measure, as we know the values are either zero or one, and thus we can just sum across the days:

CASE
    WHEN Measures.[Day1] + ... + Measures.[Day28] = 28 THEN 'DAU'
    WHEN Measures.[Day1] + ... + Measures.[Day7] >= 1 AND 
         Measures.[Day8] + ... + Measures.[Day14] >= 1 AND
         Measures.[Day15] + ... + Measures.[Day21] >= 1 AND
         Measures.[Day22] + ... + Measures.[Day28] >= 1 THEN 'WAU'
    WHEN Measures.[Day1] + ... + Measures.[Day28] >= 20 THEN 'MAU'
    ELSE                                                     'Other'
END

The order of the clauses in the CASE is relevant, as the first condition matching is taken, and your definitions of WAU and MAU have some intersection.

If you have finally tested everything, you would make the measures Day1 to Day28 invisible in order not to confuse the users of the cube.