I'm moving my first steps into datawarehousing.
I've bought the excellent book "The Data Warehouse Toolkit - Third Edition" by Kimball & Ross, which explained me how to grasp the fundamental concepts.
Today I've started designing my second data mart, but I'm already stuck with a (maybe stupid) problem. Suppose I'm modeling a simple sales event: a trivial fact table would be:
DATE_ID | CUSTOMER_ID | PRODUCT_ID | QUANTITY
every dimension has a many-to-many relationship with the others, as explained in the book and on the web.
Next, I'd like to add some more dimensions, like the carrier:
DATE_ID | CUSTOMER_ID | PRODUCT_ID | CARRIER_ID | QUANTITY
the dimensions are still in a many-to-many relation.
Now, I'm asked to add a lot (maybe a dozen or more) of details about the delivery, like a bunch of dates, routing, number of boxes and pallets, various flags etc, so I was thinking about a DELIVERY dimension table. My first attempt was:
DATE_ID | CUSTOMER_ID | PRODUCT_ID | CARRIER_ID | DELIVERY_ID | QUANTITY
but... surprise, the fact table now it's not in a many-to-many relationship anymore. So I thought: "well, I could refactor it, because now the other dimensions are in fact attributes of the delivery", but it would become
DELIVERY_ID | PRODUCT_ID | QUANTITY
and my fact table would just have 2 dimensions.
Now, in other situations I would treat the delivery as a degenerate dimension, but since I have to associate to it a lot of attibutes, I don't which route to follow:
- create a DELIVERY dimension and refactor the fact table?
- throw them in the fact table?
- create a DELIVERY dimension and put the DELIVERY_ID in the fact table pretending that it's only a degenerate dimension?
Maybe it's not so simple to choose between dimensions and facts