1
votes

I have a DAX formula that is performing really bad and hopefully someone here can suggest a solution.

I have a table that contains about 400000 rows of data. ProductID's (example field), startdate, enddate and an IsActive flag field. The data out of this table should be reported in several ways. In some reports I want to see all of the active products within a selected period of time and in other reports, I only want to see the number of products that were active on the last day of the month.

So, I have created two DAX queries to calculate this.

First I calculate the active products:

_Calc_Count Fields :=
CALCULATE (
    DISTINCTCOUNT ( MyFactTable[ProductID] ),
    FILTER (
        MyFactTable,
        MyFactTable[StartDate] <= CALCULATE ( MAX ( 'Date'[Date] ) )
            && MyFactTable[EndDate] >= CALCULATE ( MIN ( 'Date'[Date] ) )
    ),
    MyFactTable[IsActive] = 1
)

Please be aware of the fact that the report this calculation is used in can also contain a date range (even a whole year (or multiple years) can be selected with a startdate and enddate selected in the filter). The report also slices on other filters like Client Group.

Then I have a second calculation that uses the first one and applies the LASTNONBLANK function:

Last Non Blank Value :=
CALCULATE (
    [_Calc_Count Fields],
    LASTNONBLANK ( 'Date'[Date], [_Calc_Count Fields] )
)

Both calculations are very, very slow.

Can anyone suggest a better approach? Can the DAX formula be optimized or should it completely be rewritten?

ps. I am using Analysis Services Tabular Model.

Thank you all in advance for your responses!

1

1 Answers

0
votes

there are many points to consider for optimizing.

First of all, you need to understand where is the bottleneck.
I would do three separate preliminary tests:
A) change the DISTINCTCOUNT with a simple COUNT
B) Remove the FILTER
C) Remove the IsActive

Then you can understand where to prioritize your effort, however there are some very simple general optimization you can do anyway:

1.Make use of variables, therefore the formula becomes:

 _Calc_Count Fields 3:=
VAR _startdate = CALCULATE ( MAX ( 'Date'[Date] ) )
VAR _enddate = CALCULATE ( MIN ( 'Date'[Date] ) )
RETURN
CALCULATE (
    DISTINCTCOUNT ( MyFactTable[ProductID] ),
    FILTER (
        MyFactTable,
        MyFactTable[StartDate] <= _startdate 
            && MyFactTable[EndDate] >= _enddate
    ),
    MyFactTable[IsActive] = 1
)

2.If you use as first parameter of FILTER an entire Fact Table, Storage Engine will load in memory the Expanded Table which is very expensive. Therefore, as a second step the formula should become:

_Calc_Count Fields 2:=
VAR _startdate = CALCULATE ( MAX ( 'Date'[Date] ) )
VAR _enddate = CALCULATE ( MIN ( 'Date'[Date] ) )
RETURN
CALCULATE (
    DISTINCTCOUNT ( MyFactTable[ProductID] ),
    MyFactTable[StartDate] <= _startdate && MyFactTable[EndDate] >= _enddate,
    MyFactTable[IsActive] = 1
)

Next, based on the preliminary test you can decide where to invest your effort.
The issue is the DISTINCTCOUNT:
- explore some alternative algorithms for approximating DISTINCTCOUNT (HIGH EFFORT)
- try to sort in the data source (back-end) the table by ProductId to allow better compression in AAS
- make sure ProductId is a Integer Data type with Encoding Hint: Value

The issue is in the FILTER:
- Try to change the "&&" with "," (LOW EFFORT)
- Investigate the cardinality of StartDate and EndDate. If they are DateTime, remove the Time part. (LOW EFFORT)
- Try to change the datasource in the back-end and sort by useful fields (for example, StartDate asc, so when AAS will read the table might perform better compression (LOW EFFORT)
- Make sure StartDate and Date are Whole Number data types, with Encoding Hint: Value (LOW EFFORT)