Summary: Most of the examples I've seen of MDX joins have involved joining relatively small sets, say with tens or hundreds of items each. But I find myself also wanting to try joining (in particular "non-empty joining") sets that have thousands or tens of thousands of items each, and it's not working well so far. I'm wondering if this could be made to work, or if I perhaps need to consider using something other than Mondrian/OLAP.
To be concrete, I have a cube that records interactions between Firms (n=7000) and Clients (n=27000). Currently both Firm and Client are completely flat hierarchies; there's the All level and the individual-company level, with no other levels in between. There is a central fact table, and separate dimension tables for Firms and for Clients.
My users at least appear to want to get summary reports along these lines, aggregating all the non-empty interactions between Firms and Clients:
select
[Measures].[Amount] on columns,
NonEmptyCrossJoin([Firm].Children,
[Client].Children) on rows
from MyCube
But this query and variations on it don't work in my test Mondrian setup. Either I get an OutOfMemoryException (on a 2GB Java heap), or Java seems to spend impossibly long time in mondrian.rolap.RolapResult$AxisMember.mergeTuple(TupleCursor). (I can provide a more complete stack trace if it would help.) By "impossibly long" I mean Java will stay slaving away at the query for hours and hours before I give up.
I initially expected the above query to perform ok, because conceptually it could be done somewhat efficiently by just doing a SQL query along these lines:
select Firm, Client, Sum(Amount) as n
from fact, firm, client
where fact.firmid = firm.firmid and fact.clientid = client.clientid
group by Firm, Client
(In fact, if I execute something like this directly in MySql it doesn't take more than 15sec to execute.)
But from the debug logs Mondrian doesn't seem to attempt this optimization. Instead it appears to be doing the join internally, and in a way that ends up being particularly slow. I've set mondrian.native.crossjoin.enable=true in my mondrian.properties, but this doesn't seem like one of the join types that Mondrian is able to "make native". (If I turn on mondrian.native.unsupported.alert=ERROR then I get the corresponding exception.)
I'm left wondering whether I need to prevent my users from attempting joins on such large dimensions/sets, or whether Mondrian is maybe not the tool I'm looking for here. But maybe I'm just doing something wrong.