7
votes

I've read a lot about star-schema's, about fact/deminsion tables, select statements to quickly report data, however the matter of data entry into a star-schema seems aloof to me. How does one "theoretically" enter data into a star-schema db? while maintaining the fact table. Is a series of INSERT INTO statement within giant stored proc with 20 params my only option (and how to populate the fact table). Many thanks.

2

2 Answers

10
votes

Start with dimensions first -- one by one. Use ECCD (Extract, Clean, Conform, Deliver) approach.

Make sure that each dimension has a BusinessKey that uniquely identifies the "object" that a dimension row describes -- like email for a person.

With dimensions loaded, prepare key-lookup pipeline. In general, for each each dimension table you can prepare a key lookup table (BusinessKey, PrimaryKey). Some designers choose to lookup the dimension table directly, but the key-lookup can be often easily cached into memory which results in faster fact loading.

Use ECCD for fact data too. The ECC part happens in the staging area, you can choose (helper) tables or flat files for each step of the ECC, as you prefer.

While delivering fact tables, replace each BusinessKey in the fact row with the matching PrimaryKey that you get from a key-lookup table. Once all BusinessKeys are replaced with their matching PrimaryKeys, insert the row into the fact table.

Do not waste you time, use ETL tool. You can download Pentaho Kettle (community edition) for free -- it has everything one needs to achieve this.

4
votes

You typically do not insert data into a star schema in the same way you might into a normal form - i.e. with a stored procedure which inserts/updated all the appropriate tables within a single transaction. Remember that the star schema is typically a read-only denormalized model of data - it is (rarely) treated transactionally, and is typically loaded from data that is already denormalized flat - usually one flat file per star.

As Damir points out, typically, you load all the dimensions (handle the slowly changing etc), then load the facts, joining to the appropriate current dimensions to find the dimension IDs (using the business keys).