Mondrian Schema, separation of data and presentation

Question

I've been attempting to build a Mondrian schema specifically to use with Pentaho 5.0 (I'm not sure the version matters much here.) One problem I seem to repeatedly come up against, is how to control the presentation of data vs the data itself. Let me offer an example to illustrate.

Imagine a cube such as: (D for Dimension, H for hierarchy, L for level)

D: time
  H: default
    L: year
    L: month
    L: day
D: currency
  H: default
    L: name
    L: code

If we think about the members of time.year, I'm sure we'd all agree they would be ..., 2008, 2009, 2010, 2011, 2012, 2013, .... So let's just move on to time.month. Here things get interesting. Do we represent time.month as numbers or words? Why not have both?

Mondrian provides a way to specify the name of the member as well as the "caption" of the member, which provides a different value for presentation than the member's name. Great! However if I provide a caption, then in Pentaho, you ONLY see the caption. Never the original member name. How can I let my user choose whichever is more appropriate?

The month level (as well as the day level, and any hierarchy with multiple levels,) cause another source of confusion. If the months are represented as one of 12 values, (numbers or words make no difference here,) then the actual member values are time.[2012].[1], time.[2012].[2], ..., time.[2012][12], time.[2013].[1], .... So for June (month 6), there are many members such as ..., time.[2009].[6], time.[2010].[6], time.[2011].[6], .... So if the list of members is presented and it only contains the month portion of the member name, then we see 1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,.... You can't differentiate between equal months. "Just include the year column as well," you say. Yes that makes sense, but there are other places that Pentaho doesn't provide the option to do that, such as in the filtering dialog. I had the idea of including the year in the caption of the member so instead of just 6 or June, you would see 2012 June. Unfortunately, this is also less than ideal. If each level of the hierarchy is present, (and suppose we followed this pattern for day as well), then you have each row looking like 2012 | 2012 June | 2012 June 13 | your_measure. This is of course, silly. But this could arise easily when drilling into a report in Pentaho.

Our second dimension has similar problems. Imagine the data set of world currency types. There's the 3-letter ISO standard currency code, and an official currency name. These two values are 1:1 and fully dependent on each other. Each one is a unique key. There is no actual hierarchical relationship between the two. I see them simply as 2 different representations of the same piece of data. The biggest obstacle here is that if they are not in the same hierarchy then Pentaho freely allows you to place them on opposite axes. This makes ridiculous looking reports like:

     United States Dollar | Canadian Dollar | Euro | ...
USD |               12345 |               - |    - | 
CAD |                   - |           12345 |    - |
EUR |                   - |               - | 1234 |
...

The codes are excellent when you desire conciseness. However maybe you're dealing with a specific situation involving several uncommon currencies and you don't want to make the report reader have to look up the meaning of the more obscure codes. I explored the use of <Property> elements but Pentaho again lacks flexibility in that you MUST display the member column to also display a property value. If the name was a property of the code member, there is no way to display only the currency name in a report without also including the code, which is redundant.

Ultimately, I'm hoping there's some mechanism to control the presentation of data, or some technique in the schema design that results in a sensible, coherent experience for the end user doing analysis in Pentaho.

I guess by now you resolved some of those questions. Would be interested in reading your take on it. — Raffael

Codek Codek · Accepted Answer · 2013-10-22T08:21:12

This is pretty common.

Regarding properties - It is annoying how you can't re-order them in analyzer they seem to be sort of 2nd class citizens. And as they are still not supported or displayed in Saiku it frequently means they can't be used anyway. However in your example that is indeed an intended correct use of a property - a good description in fact!

So there is one solution, but it's not super clean. You can define different hierarchies depending on the user preferences, then use role based security to hide one or other of the hierarchies from the end user.

A variation of a theme I've done with this is to have admin level, senior level and beginner level access to the same cubes where you see different levels and hierarchies depending on permissions.

I havent had time to go into Mondrian4 much which may improve things here as everything is now simply an attribute - but I'm not 100% sure.

Finally I would definately raise this with support (sounds like you have a support contract) and see what improvements can be made. Post the jira here i'll definately upvote it!

Mondrian Schema, separation of data and presentation

1 Answers