I'm trying to model my data warehouse using a star schema but I have a problem to avoid joins between fact tables.
To give a trivial idea of my problem, I want to collect all the events who occur on my operating system. So, I can create a fact table event
with some dimensions like datetime
or user
. The problem is I want to collect different kinds of event: hardware event and software event.
The problem is those events have not the same dimensions. By instance, for a hardware event, I can have physical_component
or related_driver
dimensions and, for a software event, software_name
or online_application
dimensions (that is just some examples, the idea to keep in mind is the fact event
can be specialized into some specific events with specific dimensions).
In a relational model, I would have 3 tables organized like that:
The problem is : how to handle joins between fact tables in a star schema?
I imagined 4 ideas but I'm not sure one of them are adapted to the situation.
The first one is to keep the model used in a relational database and add the dimension tables like that:
In this case, the problem is we still have join between fact tables and need to use
JOIN
SQL statement in all of our queries.
The second one is to create only 2 fact tables who will duplicate the shared dimensions (datetime and user) and to create a materialized view event who summarized all the events:
The problem here is: what happen if I want to make a query on the materialized view? According to what I read in the Oracle documentation, we don't have to make query directly on materialized view but we have to let the query rewrite process make its work.
The third one is to create only one fact table who will contain all the information possible for an event (hardware or software):
This time, the problem is my fact table will contain a lot of
NULL
value.
And the last one is to create 3 fact tables (without materialized view this time) like this:
This time, the problem is all events are present in the fact table
event
and in its own table. Because we will store a huge quantity of data, I'm not sure this duplication is a good idea.
So what is the best solution? Or does it exist a fifth solution?