0
votes

I'm moving my first steps into datawarehousing.

I've bought the excellent book "The Data Warehouse Toolkit - Third Edition" by Kimball & Ross, which explained me how to grasp the fundamental concepts.
Today I've started designing my second data mart, but I'm already stuck with a (maybe stupid) problem. Suppose I'm modeling a simple sales event: a trivial fact table would be:

DATE_ID | CUSTOMER_ID | PRODUCT_ID | QUANTITY

every dimension has a many-to-many relationship with the others, as explained in the book and on the web.
Next, I'd like to add some more dimensions, like the carrier:

DATE_ID | CUSTOMER_ID | PRODUCT_ID | CARRIER_ID | QUANTITY

the dimensions are still in a many-to-many relation.
Now, I'm asked to add a lot (maybe a dozen or more) of details about the delivery, like a bunch of dates, routing, number of boxes and pallets, various flags etc, so I was thinking about a DELIVERY dimension table. My first attempt was:

DATE_ID | CUSTOMER_ID | PRODUCT_ID | CARRIER_ID | DELIVERY_ID | QUANTITY

but... surprise, the fact table now it's not in a many-to-many relationship anymore. So I thought: "well, I could refactor it, because now the other dimensions are in fact attributes of the delivery", but it would become

DELIVERY_ID | PRODUCT_ID | QUANTITY

and my fact table would just have 2 dimensions.
Now, in other situations I would treat the delivery as a degenerate dimension, but since I have to associate to it a lot of attibutes, I don't which route to follow:

  • create a DELIVERY dimension and refactor the fact table?
  • throw them in the fact table?
  • create a DELIVERY dimension and put the DELIVERY_ID in the fact table pretending that it's only a degenerate dimension?

Maybe it's not so simple to choose between dimensions and facts

1
More free resources for data warehouse design from IBM Redbooks are linked in this SO answer. - Mike Sherrill 'Cat Recall'

1 Answers

0
votes

As you describe it, delivery is a separate event in respect to the sale. So delivery should be a separate fact table.

Of course you could always "project" (so to speak) a fact in a dimension, if you do not need the added complexity. For example, say that you only need to know some simple fact about delivery: for example the carrier and the date of delivery. Then you could use a DELIVERY_ID in SALES and register these information in the DELIVERY dimension.

But if you have to register the full complexity of a delivery (there could be two or more deliveries relative to one sale, and two or more sales relative to one delivery) you need two fact tables.