I have a simple tabular model consisting of a fact table with approx. 20 mio. records (sales transactions) and a dimension table with approx 600.000 records (customers).
A typical reporting scenario is to get the top 10 customers over some measure in the fact table, possibly filtered by some other criteria (typically a time period, a product, etc.).
In Excel, aggregating all of the 20 mio. records to return a total sales amount is instant. However, once I try to group by customers, it takes some time (15-20 seconds) to retrieve all the data, which makes sense since there is quite a lot of customers (600.000) that needs to be displayed.
Now, if I apply a value filter in Excel, to get only the top 10 customers, it still takes around 15-20 seconds to return the result, which is unacceptable to my users (as they would like to instantly see the top 10 customers while slicing on other attributes such as product, time, etc).
Internally, Excel uses the TOPCOUNT
MDX-function when querying the tabular model with a value filter.
Is there anything I can do in the tabular model, to speed up these kinds of queries?
I've tried:
- Creating a calculated column on the dimension table, containing the total sales amount for each customer. While the performance is better, this is not the way to go because the values in this column can not be sliced by other attributes from the facts, and the column obviously ends up containing many distinct values (which is a bad thing in tabular).
- Creating a calculated measure on the fact table, using the DAX
RANKX
function as suggested here. This caused my Tabular instance to crash (too many records in the dimension table?) - Executing a simple DAX statement directly on the tabular model using the DAX
TOPN
function. This was even slower than theTOPCOUNT
MDX-approach, by a factor of 3-4.