Using intersect with 2 large sets to get the distinct count - MDX

Question

I have a calculated member which represents an active customer. That would be the following;

WITH MEMBER [Measures].[Active Customers] AS
  Count ( nonempty( Filter (
    ( [Customer].[Customer Key].Members, [Measures].[Turnover] ),
    [Measures].[Turnover] > 0
  ) )  )

This works great, when I want to get active customers in the current period and previous ones, as I get my time dimension, and use the CurrentMember, CurrentMember.PrevMember and CurrentMember with the Lag function in order to get customers who were active in previous periods.

My problem is when I want to get the count of customers, who are common in different members. Say I want to get customers who are active in the current period, and NOT in the previous period. Or another case, active in current, and active in previous. Because of this, I would need to use the INTERSECT function, and my customer dimension has 4 million records. This is already a subset of 9 million records.

So when checking for a customer who is active in 2 consecutive periods, I do this (The Active Previous Period, and Active Current Period is basically the calculated member above, however with CurrentMember and CurrentMember.PrevMember) :

set [Previous Active Customers Set] AS
   Filter (
    ( [Customer].[Customer Key].Members, [Measures].[Active Previous Period] ),
    [Measures].[Active Previous Period] > 0
   ) 

set [Current Active Customers Set] AS
   Filter (
    ( [Customer].[Customer Key].Members, [Measures].[Active Current Period] ),
    [Measures].[Active Current Period] > 0
   )  

member [Measures].[Active 2 consecutive periods] as 
count(INTERSECT([Current Active Customers Set],[Previous Active Customers Set]) )

This takes forever. Is there anyway to improve, or go around this performance problem of using the INTERSECT with large sets? Or maybe optimizations on the MDX query? I tried always using a subset of my customers dimension, but this only reduced the number of records to less than 4 million - so it's still large. Any help would be appreciated!

FrankPl FrankPl · Accepted Answer · 2014-09-23T18:00:03

I would assume you can speed this up if you avoid using named sets and calculated members as far as possible.

One step towards this would be as follows: Create a new fact table with foreign keys just to your customer and time dimension, and add a record to it if a customer was active on that day. Build a measure group, let's say "activeCustomers" based on this table, just using "count" as the measure. But make this invisible, as we do not need it.

Then, you can replace

count( nonempty( Filter (
( [Customer].[Customer Key].Members, [Measures].[Turnover] ),
[Measures].[Turnover] > 0
) ) )

with

count( Exists(
[Customer].[Customer Key].Members,
<state your time selection here>,
"activeCustomers"
) )

Exists should be more efficient than Filter.

Another optimization approach could be the observation that instead of intersecting two sets generated via Filter, you could define one set with a more complex filter, avoiding that AS is looping along the customers twice, and then intersecting the results:

set [Active Customers Set] AS
   Filter (
    ( [Customer].[Customer Key].Members, [Measures].[Active Previous Period] ),
    [Measures].[Active Previous Period] > 0
    AND
    [Measures].[Active Current Period] > 0
   )

Using intersect with 2 large sets to get the distinct count - MDX

1 Answers