Optimizing Dax & model for "where date between" type queries

Question

I am building a model to allow reporting on two seperate datasets, for this example we'l say a Students dataset & a Staff dataset.

The datasets are pretty seperate and the only real link between the two is Date, so from a model perspective, there is a Students star schema & a Staff Star Schema.

The data displayed is snapshot type data, answering questions like: - For a selected date, show all active employees - for a selected date, show all enrolled students

This means that when a single date is selected, the model then finds all employees where the selected date falls within the employment start & end date , and finds all students where the selected date falls within the enrolled start & end date.

This means i had to make a decision, how to return the correct data from each schema with a single date dimension. Creating a relationship would not work as relationships in Tabular dont allow "between" type queries, so i instead have one unrelated Date Dimension and the Dax for each model finds applicable rows.

The problem is that its not the most performant. for perhaps 50k rows, adding a measure can take 5-10 seconds.

Im asking if there is a better way to either write the queries, or alter the model to still let me do "between" style queries but give better performance.

Below is an example of a dax query to return all students that were enrolled on a particular date.

Thanks for any advice.

All Enrolled Students:=IF (
HASONEVALUE ( 'Date'[Date] ),
CALCULATE (
    DISTINCTCOUNT ( 'Students'[StudentID] ),
    FILTER (
        'Students',
        'Students'[StudentStartDateID] <= MIN ( 'Date'[DateID] )
            && 'Students'[StudentEndDateID] >= MAX ( 'Date'[DateID] )
    )
),
BLANK ())

In order to see if it is possible you have to add a MCVE, tables with sample data could be useful. — alejandro zuleta

BI_Badger BI_Badger · Accepted Answer · 2016-09-09T13:58:38

Unrelated or "disconnected" tables are good for powering slicers, timelines, and filters in certain situations. As you said in your question, you have two optimization options: Re-structure your data set or optimize the existing measure syntax.

Re-Structure Dataset

Duplicate each row for every day between start and end dates with a column for that iterated date. This can be done a handful of ways depending on how you get your dataset, but could be tedious. Then, relate your tables on this iterated date and use the relation to filter from DATE to FACT. If this is a recurring report and/or you are using SQL to pull the data, this might be worth it to make use of PowerPivot's relational calculation power.

Optimize DAX statement

If this is a one-off request or the dataset would be too tedious to duplicate out by day, then stick with the disconnected table approach and clean up the measure syntax. Since you have already included the MIN() and MAX() functions and your CALCULATE() is returning DISTINCTCOUNT(), the conditional HASONEVALUE() function is unnecessary. I ran this in a simulated environment and had good results, but that can vary with computer performance and dataset size. See below for cleaned syntax.

All Enrolled Students:=CALCULATE (
    DISTINCTCOUNT('Students'[StudentID]),
    FILTER(
        'Students',
        'Students'[StudentStartDateID]<= MIN('Date'[DateID]) &&
        'Students'[StudentEndDateID]  >= MAX('Date'[DateID])
    )
)

If your StudentID column is unique, which would make sense to me, you can further speed this up.

All Enrolled Students:=CALCULATE (
    COUNT('Students'[StudentID]),
    FILTER(
        'Students',
        'Students'[StudentStartDateID]<= MIN('Date'[DateID]) &&
        'Students'[StudentEndDateID]  >= MAX('Date'[DateID])
    )
)

If StudentID is not a number replace COUNT() with COUNTA() to get the desired effect.

Optimizing Dax & model for "where date between" type queries

2 Answers

Re-Structure Dataset

Optimize DAX statement