0
votes

I am using Mondrian with Pentaho and Saiku to do OLAP analysis on a MySQL database. I have a data warehouse with 2 dimensions (beneficiary and member) linking to a fact table. Beneficiary has the following fields: beneficiary_type1, beneficiary_type2, beneficiary_type3. Member has the a field for gender.

I have created a Mondrian schema defined with the following:

<Dimension type="StandardDimension" visible="true" foreignKey="beneficiary_id" highCardinality="false" name="beneficiary">
  <Hierarchy visible="true" hasAll="true" primaryKey="id">
    <Table name="beneficiary">
    </Table>
    <Level name="beneficiary_type1" visible="true" column="beneficiary_type1" type="String" uniqueMembers="true" levelType="Regular" hideMemberIf="Never">
    </Level>
    <Level name="beneficiary_type2" visible="true" column="beneficiary_type2" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
    </Level>
    <Level name="beneficiary_type3" visible="true" column="beneficiary_type3" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
    </Level>
  </Hierarchy>
</Dimension>

<Dimension type="StandardDimension" visible="true" foreignKey="member_id" highCardinality="false" name="member">
  <Hierarchy visible="true" hasAll="true" primaryKey="id">
    <Table name="member">
    </Table>
    <Level name="gender" visible="true" column="gender" type="String" uniqueMembers="true" levelType="Regular" hideMemberIf="Never">
    </Level>
  </Hierarchy>
</Dimension>

I want to be able to view (separately): the amount of males / females for beneficiary_type1, the amount of males / females for beneficiary_type2, the amount of males / females for beneficiary_type3,

If I drag gender over beneficiary_type1 then I see the following, which is correct:

olap1

If I drag gender over beneficiary_type2 then I see the following, which is incorrect:

olap2

As this is the beneficiary_type2 column, grouped by benefiiary_type1 as can be seen when adding both the fields:

enter image description here

When viewing gender over beneficiary_type2, I expect to see only 2 rows, namely one for "Adults" and another for "Children to 19". From what I have read it seems that the uniqueMembers attribute should be set on the beneficiary_type2 level, but this results in the following:

enter image description here

Which yields the correct numbered results but the rows are still displayed as if grouped by beneficiary_type1. Also, this way does not allow for the generation of beneficiary_type2 grouped correctly under beneficiary_type1 as in the third image (the numbered results are never grouped according to the parent level).

How should I structure the schema to have the number of rows determined by the parent level that I have included in the analysis? (dragged into the cube using saiku) i.e. when dragging in beneficiary_type1 and beneficiary_type2 then beneficiary_type2 is grouped according to beneficiary_type1 (like the third image), if only beneficiary_type2 then it is grouped according to its own unique values (2 rows, one for "Adults" and one for "Children to 19").

I am relatively new to OLAP, so there might be some basic concepts that I do not understand. Please don't hesitate to branch out with any explanation.

-------------------- Update --------------------

As explained by @nsousa, members inside the same hierarchy imply a parent child relationship. Would a correct change to the schema be the following?

<Dimension type="StandardDimension" visible="true" foreignKey="beneficiary_id" highCardinality="false" name="beneficiary">
  <Hierarchy visible="true" hasAll="true" primaryKey="id">
    <Table name="beneficiary">
    </Table>
    <Level name="beneficiary_type1" visible="true" column="beneficiary_type1" type="String" uniqueMembers="true" levelType="Regular" hideMemberIf="Never">
    </Level>
  </Hierarchy>
  <Hierarchy visible="true" hasAll="true" primaryKey="id">
    <Table name="beneficiary">
    </Table>
    <Level name="beneficiary_type2" visible="true" column="beneficiary_type2" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
    </Level>
  </Hierarchy>
  <Hierarchy visible="true" hasAll="true" primaryKey="id">
    <Table name="beneficiary">
    </Table>
    <Level name="beneficiary_type3" visible="true" column="beneficiary_type3" type="String" uniqueMembers="false" levelType="Regular" hideMemberIf="Never">
    </Level>
  </Hierarchy>
</Dimension>

Am I correct in understanding that an alternative solution would be to load 3 separate dimensions for the beneficiary table? Would this imply that 3 separate tables (one for each beneficiary member) need to exist in the database as well, or can the same table be used for multiple dimensions? Obviously, having a database table for each member is not ideal, are there any advantages of doing it this way?

1

1 Answers

2
votes

Your beneficiary_type2 level is a descendant of beneficiary_type1. As such, even if the member names match, two "Adults" on level 2 are not the same member unless they also have the same parent.

As they have different parents, they are cousins that share a name. Hence, you get multiple rows.

The only safe way to do what you want is to split the various levels into multiple dimensions/hierarchies instead of having a parent-child hierarchy.

On different hierarchies, there's no relationship between type1 and type2, and you get only the two distinct strings you want.