2
votes

I have a simple tabular model consisting of a fact table with approx. 20 mio. records (sales transactions) and a dimension table with approx 600.000 records (customers).

A typical reporting scenario is to get the top 10 customers over some measure in the fact table, possibly filtered by some other criteria (typically a time period, a product, etc.).

In Excel, aggregating all of the 20 mio. records to return a total sales amount is instant. However, once I try to group by customers, it takes some time (15-20 seconds) to retrieve all the data, which makes sense since there is quite a lot of customers (600.000) that needs to be displayed.

Now, if I apply a value filter in Excel, to get only the top 10 customers, it still takes around 15-20 seconds to return the result, which is unacceptable to my users (as they would like to instantly see the top 10 customers while slicing on other attributes such as product, time, etc).

Internally, Excel uses the TOPCOUNT MDX-function when querying the tabular model with a value filter.

Is there anything I can do in the tabular model, to speed up these kinds of queries?

I've tried:

  • Creating a calculated column on the dimension table, containing the total sales amount for each customer. While the performance is better, this is not the way to go because the values in this column can not be sliced by other attributes from the facts, and the column obviously ends up containing many distinct values (which is a bad thing in tabular).
  • Creating a calculated measure on the fact table, using the DAX RANKX function as suggested here. This caused my Tabular instance to crash (too many records in the dimension table?)
  • Executing a simple DAX statement directly on the tabular model using the DAX TOPN function. This was even slower than the TOPCOUNT MDX-approach, by a factor of 3-4.
1
I'm not sure of solution but plan to start playing with PowerPivot soon at work - are you using xl 32bit or 64bit? What ram are you using to process this data?whytheq
We are not using PowerPivot - we are using SQL Server Analysis Services in Tabular-mode (which is basically the same thing but deployed to a server environment). The server in question has 32 GB of ram which is plenty for this model, which only takes up about 1 GB of memory once it has been processed.Dan
Is creating a separate dimension, named TopCustomers an option? Pre-calculating the top X customers by your calculated column in your ETL, and then using this as your ROWS in Excel/MDX queries and your measures in Columns? If you cannot achieve good performance as you are saying above, with your hardware, creating a dimension with your TopCustomers might be the best way to go...Mez
Thanks for the suggestion @Mez, but unfortunately, my end users must be able to find the TopCustomers among varying attributes. The top 10 customers in one month may not be the same as the top 10 in the next month, or on a given product, in a given region, etc...Dan
@Dan I'll be very curious if you find a solution for this problem. I had the same issue and posted a question on DBA.SE regarding it. Seems that we are in the same boat.Taryn♦

1 Answers

1
votes

After searching the web some more and performing additional tests, I have come to the conclusion that SSAS Tabular is inherently bad at answering TOPCOUNT/RANK-kind queries in an unfiltered context. The reason seems to be quite simple:

If I have a dimension containing 600.000 customers, and I ask for the top 10 by Sales Amount in an unfiltered context, the tabular engine needs to compute the sum of Sales Amount for each individual customer, before being able to sort and return the top 10 or the rank of each customer.

In multidimensional OLAP, data is typically preaggregated on a pr. customer level, meaning the multidimensional cube can answer these kinds of queries much faster.

When applying one or more filters in SSAS tabular (for example current month, a specific product, etc.), I saw a significant performance increase. The solution, in my case, is then to educate my users to always filter their data before including the customer dimension in their PivotTables.