1
votes

I`m new to data warehousing, but I think my question can be relatively easy answered. I built a star schema, with a dimension table 'product'. This table has a column 'PropertyName' and a column 'PropertyValue'. The dimension therefore looks a little like this:

surrogate_key | natural_key (productID) | PropertyName | PropertyValue | ...
    1              5                          Size           20          ...
    2              5                          Color          red
    3              6                          Size           20
    4              6                          Material       wood

and so on.

In my fact table I always use the surrogate keys of the dimensions. Cause of the PropertyName and PropertyValue columns my natural key isn`t unique / identifying anymore, so I get way too much rows in my fact table.

My question now is, what should I do with the property columns? Would it be best, to put each property into separate dimensions, like dimension size, dimension color and so on? I got about 30 different properties. Or shall I create columns for each property in the fact table? Or make one dimension with all properties?

Thanks in advance for any help.

2

2 Answers

3
votes

Your dimension table 'product' should look like this:

surrogate_key | natural_key (productID) | Color | Material | Size | ...
    1              5                      red     wood       20     ...
    2              6                      red     ...         

If you have to many properties, try to group them in another dimension. For example Color and Material can be attributes of another dimension if you can have the same product with same id and same price in another color or material. Your fact table can identify product with two keys: product_id and colormaterial_id...

Reading recommendation: The Data Warehouse Toolkit, Ralph Kimball

1
votes

Your design is called EAV (entity-attribute-value) table.

It's a nice design for the sparse matrices (large number of properties with only few of them filled at the same time).

However, it has several drawbacks.

  • It cannot be indexed (and hence efficiently searched) on two or more properties at once. A query like this: "get all products made of wood and having size or 20" will be less efficient.

  • Implementing constraints involving several attributes at once is more complex

  • etc.

If it's not a problem for you, you can use EAV design.