Typical MDX examples for "give me the top 5 Bs for each A" look like this:
-- return top 5 clients for each firm
select
[Measures].[Amount] on columns,
NON EMPTY generate(
[Firms].[Firm Name].Children,
crossjoin(
[Firms].[Firm Name].CurrentMember,
TopCount([Clients].[Client Name].Children, 5, [Measures].[Amount])
)
)
on rows
from [FirmsAndClients]
I'm prototyping a UI that does a lot of these "top 5" type queries, so I'm looking for any tips to speed them up in particular, especially when crossjoin(A, B) has mostly null measure values.
In this particular case Firms x Clients was big (n=5000 x n=20,000, more or less) and sparse, and I was able to speed things up by a factor of about 100 by replacing the NON EMPTY with a filter(NOT ISEmpty) inside the crossjoin:
-- return top 5 clients for each firm
select
[Measures].[Amount] on columns,
generate(
[Firms].[Firm Name].Children,
crossjoin(
[Firms].[Firm Name].CurrentMember,
TopCount(
filter([Clients].[Client Name].Children, NOT IsEmpty([Measures].[Amount]))
5,
[Measures].[Amount])
)
)
on rows
from [FirmsAndClients]
I was hoping to get a further performance benefit from pre-warming the SSAS caches by running similar queries, but I've discovered the above query (i.e. the one with "filter") is equally slow when run against warm and cold caches. Playing with Sql Profiler, I've discovered part of why this might be: While SSAS is caching portions of the underlying cube data, it doesn't seem to cache the results from the query-as-a-whole, nor does it seem to cache the intermediate sets created by generate or by crossjoin. Thus it has to redo the generate and crossjoin and topcount each time I repeat the query. And even even though each topcount appears to take 1ms or less, those milliseconds add up when iterating across thousands of Firms.
Any tips on what I can do, at the MDX level or the cube level or the Sql Server tuning level? In the SQL world I could get some mileage out of creating an index on the columns I wanted to sort by. As far as I know there's nothing like that for OLAP, though.
Conceivably this is a more general question of what to do when you want to speed up an MDX query that involves complicated sorting and filtering (which apparently doesn't sit 100% well with the SSAS caching system).