Imperformant sum of distinct values

Question

I have a tabular table in SSAS containing the stock with a stock date , an SKU and other attributes.

+----------+-----+
| Date     | SKU |
+----------+-----+
| 20180701 | 1   |
+----------+-----+
| 20180701 | 2   |
+----------+-----+
| 20180701 | 3   |
+----------+-----+
| 20180702 | 1   |
+----------+-----+
| 20180702 | 3   |
+----------+-----+
| 20180703 | 3   |
+----------+-----+

Per day I would like to aggregate the data by counting the distinct sku values:

CntSKU:= DISTINCTCOUNT([SKU])

However, per month I would like to sum the daily values and divide them by the days (like an average over days). The DAX seems not so difficult:

 AvgSKU:= DIVIDE(
            sumx(SUMMARIZE(Table;Table[Date];"CntSKU";DISTINCTCOUNT(Table[SKU]));[CntSKU])
            ;DISTINCTCOUNT(Table[Date])
            ;BLANK()
            )

The performance is quite bad, though. For one month, the query time for CntSKU per day is around 7.2 seconds. The AvgSKU-measure takes around 9.5 seconds to calculate. For a bigger datasets over a couple or for several calculations inside Excel Pivot this calculation time s a more fatal lever.

The subsequent question is: Is there a better way to calculate the AvgSKU - Measure?

To be clear, you're saying just the basic DISTINCTCOUNT(Table[SKU]) calculation takes over 7 seconds? How many rows is your data table? Like 100 million rows? — Alexis Olson
The basic Distinct count takes around 1.5 seconds, grouped by Date it takes around 7 seconds however. My sample data consists of 130 mio rows. It's just a month. The whole TM-tables contain 2 years and the size is around 25 GB. — rasenkantenstein
I'm guessing you're just seeing the effects of trying to do calculations on large datasets. Can you upstream any of the aggregation or does it have to be done dynamically with DAX? — Alexis Olson
Possibly. I just can't understand (yet) why the distinct cound by itself is two seconds faster than the distinct count over days divided by the number of days... I thought there might be a way to tune the DAX statement in order to gain more performance. — rasenkantenstein

RADO RADO · Accepted Answer · 2018-07-11T13:50:35

Maybe I misunderstand your goal, but it seems that your Dax formula is over-complicated. I would try these measures:

SKU Count = DISTINCTCOUNT(Table[SKU])

Date Count = DISTINCTCOUNT(Table[Date])

SKU Count per Date = DIVIDE( [SKU Count], [Date Count])

Imperformant sum of distinct values

2 Answers