Calculating an average of a DISTINCTCOUNT efficiently in Dax?

Question

I'm trying to calculate a business-logic in DAX which has turned out to be quite resource-heavy and complex. I have a very large PowerPivot model (call it "sales") with numerous dimensions and measures. A simplified view of the sales model:

+-------+--------+---------+------+---------+-------+
| State |  City  |  Store  | Week | Product | Sales |
+-------+--------+---------+------+---------+-------+
| NY    | NYC    | Charlie |    1 | A       | $5    |
| MA    | Boston | Bravo   |    2 | B       | $10   |
| -     | D.C.   | Delta   |    1 | A       | $20   |
+-------+--------+---------+------+---------+-------+

Essentially what I'm trying to do is calculate a DISTINCTCOUNT of product by store and week:

SUMMARIZE(Sales,[Store],[Week],"Distinct Products",DISTINCTCOUNT([Product]))

+---------+------+-------------------+
|  Store  | Week | Distinct Products |
+---------+------+-------------------+
| Charlie |    1 |                15 |
| Charlie |    2 |                 7 |
| Charlie |    3 |                12 |
| Bravo   |    1 |                20 |
| Bravo   |    2 |                14 |
| Bravo   |    3 |                22 |
+---------+------+-------------------+

I then want to calculate the AVERAGE of these Distinct Products at the store level. The way I approached this was by taking the previous calculation, and running a SUMX on top of it and dividing it by distinct weeks:

SUMX(
SUMMARIZE(Sales,[Store],[Week],"Distinct Products",DISTINCTCOUNT([Product]))
,[Distinct Products]
) / DISTINCTCOUNT([Week])

+---------+------------------+
|  Store  | Average Products |
+---------+------------------+
| Charlie | 11.3             |
| Bravo   | 18.7             |
+---------+------------------+

I stored this calculation in a measure and it worked well when the dataset was smaller. But now the dataset is so huge that when I try to use the measure, it hangs until I have to cancel the process.

Is there a more efficient way to do this?

Leonard Leonard · Accepted Answer · 2017-04-12T21:52:57

SUMX is appropriate in this case since you want the distinct product count calculated independently for each store & for each week, then summed together by store, and then divided by the number of weeks by store. There's no way around that. (If there was, I'd recommend it.)

However, SUMX is an iterator, and so is the likely cause of the slowdown. Since we can't eliminate the SUMX entirely, the biggest factor here is the number of combinations of stores/weeks that you have.

To confirm if the number of combinations of stores/weeks is the source of the slowdown, try filtering or removing 50% from a copy of your data model and see if that speeds things up. If that doesn't time out, add more back in to get a sense of how many combinations are the failing point.

To make things faster with the full dataset:

You may be able to filter to a subset of stores/weeks in your pivot table, before dragging on the measure. This will typically get faster results than dragging on the measure first, then adding filters. (This isn't really a change to your measure, but more of a behaviour change for users of your model).
You might want to consider grouping at a higher level than week (e.g. month), to reduce the number of combinations it has to iterate over
If you're running Excel 32-bit, or only have 4GB of RAM, consider 64-bit Excel and/or a more powerful machine (I doubt this is the case, but am including for comprehensiveness - Power Pivot can be a resource hog)
If you can move your model to Power BI Desktop (I don't believe Calculated Tables are supported in Power Pivot), you could extract out the SUMMARIZE into a calculated table, and then re-write your measure to reference that calculated table instead. This reduces the number of calculations the measure has to perform at run-time, as all the combinations of store/week plus the distinct count of products will be pre-calculated (leaving only the summing & division for your measure to do - a lot less work).

.

Calculated Table =
SUMMARIZE (
    Sales,
    [Store],
    [Week],
    "Distinct Products", DISTINCTCOUNT ( Sales[Product] )
)

Note: The calculated table code above is rudimentary and is mostly designed as a proof of concept. If this is the path you take, you'll want to make sure you have a separate store dimension to join the calculated table to, as this won't join to the source table directly

Measure Using Calc Table =
SUMX (
    'Calculated Table',
    [Distinct Products] / DISTINCTCOUNT ( 'Calculated Table'[Week] )
)

Jason Thomas has a great post on calculated tables and when they can come in useful here: http://sqljason.com/2015/09/my-thoughts-on-calculated-tables-in.html.

If you can't use calculated tables, but your data is coming from a database of some form, then you could do the same logic in SQL and then import a pre-prepared separate table of unique store/months and their distinct counts.

I hope some of this proves useful (or you've solved the problem another way).

Calculating an average of a DISTINCTCOUNT efficiently in Dax?

1 Answers