0
votes

I am trying to understand how fact tables are form in relation to the dimension tables.

E.g. Sale Fact Table For there is a query for Sale of product by year/month/week/day, do I create a dimension for each type of period: Dim_Year, Dim_Month, Dim_Week and Dim_Day, each with their own respective keys? Or is it possible to just use one dimension for all periods: Dim_Date and only have one date key?

Another area I am confused about is that why do some fact tables not contain their own ID? E.g. Sale fact table does not have SaleID included in the fact table.

Sale Fact Table Textbook Example

1

1 Answers

1
votes

DATES

Your date dimension needs to correspond to the grain of your fact table. So if you had daily sales you would have a Dim_Day, weekly sales you would have a Dim_Week, etc.

You would normally have multiple date dimensions (at different grains) in your data warehouse as you would have facts at different date grains.

Each date dimension would hold hold attributes applicable to levels higher up in the date hierarchy. So a Dim_Day might hold day, week, month, year attributes; Dim_Month might hold month, quarter and year attributes, etc.

PRIMARY KEYS

Primary keys are rarely (never?) a technical requirement when creating tables in a database i.e. you can create a table without defining a PK. So you need to consider why we normally (at least in OLTP DBs) include PKs. Common reasons include:

  • To easily identify an individual record
  • To ensure that duplicate records (those with the same PK value) are not created

So there are good reasons for creating PKs, however there are cost overheads e.g. the PK needs to be checked every time a new record is inserted into the table.

In a dimensional model where you are performing bulk inserts/updates, having PKs would cause a significant performance hit. Additionally, the insert logic/checks should always be implemented in your ETL processes so there is no need to include these types of checks/constraints in the DB itself.

Fact tables do have a primary key but it is often implicit rather than explicit - so a group of the FKs in the fact table uniquely identify each record. This compound PK may be documented but is is never enabled/implemented.

Occasionally a fact table will have an explicit, single column, PK. This is normally used when the fact table needs to be updated and its implicit PK involves a large number of columns. There is normally logic required to identify the record to be updated using its FKs but this returns the PK; then the update statement just has a clause like this:

WHERE table_pk = 12345678

rather than having to include all the columns in the implicit PK:

WHERE table_sk1 = 1234
AND table_sk2 = 5678
AND table_sk3 = 9876
....

Hope this helps?