I've been attempting to build a Mondrian schema specifically to use with Pentaho 5.0 (I'm not sure the version matters much here.) One problem I seem to repeatedly come up against, is how to control the presentation of data vs the data itself. Let me offer an example to illustrate.
Imagine a cube such as: (D for Dimension, H for hierarchy, L for level)
D: time
H: default
L: year
L: month
L: day
D: currency
H: default
L: name
L: code
If we think about the members of time.year
, I'm sure we'd all agree they would be ..., 2008, 2009, 2010, 2011, 2012, 2013, ...
. So let's just move on to time.month
. Here things get interesting. Do we represent time.month
as numbers or words? Why not have both?
Mondrian provides a way to specify the name of the member as well as the "caption" of the member, which provides a different value for presentation than the member's name. Great! However if I provide a caption, then in Pentaho, you ONLY see the caption. Never the original member name. How can I let my user choose whichever is more appropriate?
The month level (as well as the day level, and any hierarchy with multiple levels,) cause another source of confusion. If the months are represented as one of 12 values, (numbers or words make no difference here,) then the actual member values are time.[2012].[1], time.[2012].[2], ..., time.[2012][12], time.[2013].[1], ...
. So for June (month 6), there are many members such as ..., time.[2009].[6], time.[2010].[6], time.[2011].[6], ...
. So if the list of members is presented and it only contains the month portion of the member name, then we see 1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,...
. You can't differentiate between equal months. "Just include the year column as well," you say. Yes that makes sense, but there are other places that Pentaho doesn't provide the option to do that, such as in the filtering dialog. I had the idea of including the year in the caption of the member so instead of just 6
or June
, you would see 2012 June
. Unfortunately, this is also less than ideal. If each level of the hierarchy is present, (and suppose we followed this pattern for day as well), then you have each row looking like 2012 | 2012 June | 2012 June 13 | your_measure
. This is of course, silly. But this could arise easily when drilling into a report in Pentaho.
Our second dimension has similar problems. Imagine the data set of world currency types. There's the 3-letter ISO standard currency code, and an official currency name. These two values are 1:1 and fully dependent on each other. Each one is a unique key. There is no actual hierarchical relationship between the two. I see them simply as 2 different representations of the same piece of data. The biggest obstacle here is that if they are not in the same hierarchy then Pentaho freely allows you to place them on opposite axes. This makes ridiculous looking reports like:
United States Dollar | Canadian Dollar | Euro | ...
USD | 12345 | - | - |
CAD | - | 12345 | - |
EUR | - | - | 1234 |
...
The codes are excellent when you desire conciseness. However maybe you're dealing with a specific situation involving several uncommon currencies and you don't want to make the report reader have to look up the meaning of the more obscure codes. I explored the use of <Property>
elements but Pentaho again lacks flexibility in that you MUST display the member column to also display a property value. If the name was a property of the code member, there is no way to display only the currency name in a report without also including the code, which is redundant.
Ultimately, I'm hoping there's some mechanism to control the presentation of data, or some technique in the schema design that results in a sensible, coherent experience for the end user doing analysis in Pentaho.