1
votes

I'm a newbie to data warehousing and I've been reading articles and watching videos on the principles but I'm a bit confused as to how I would take the design below and convert it into a star schema.

In all the examples I've seen the fact table references the dim tables, so I'm assuming the questionId and responseId would be part of the fact table? Any advice would be much appreciated.

enter image description here

3
Just based on cardinalities, responses is many to everything so it's clearly a fact table. If questions is a finite list of textual questions, then clearly it is a dimension table. You would generally combine questions with options to make a single denormalised dimension about questions. Once you combine your options and questions, you actually end up with three tables in a star schema. - Nick.McDermaid
I guess the measures in your fact table would be question count, correct count, incorrect count - Nick.McDermaid

3 Answers

2
votes

I can't see the image at the moment (blocked by my firewall @ the office). but I'll try to give you some ideas.

The general idea is to organize your measurable 'facts' into what are called fact tables. There are 3 main types of facts, but that is a topic for a different day (but I'd be happy to go into this if needed). Each of these facts are what you'd see in the center of typical 'star schema'. The other attributes within the fact tables are typically FK references to the dimension tables.

Regarding dimensions, these are groups of attributes that share commonality (the most notable being a calendar dimension). This is important because when you're doing analysis across multiple facts the dimensions are what you use to connect them.

If you consider this simple example: A product is ordered and then shipped. We could have 2 transaction facts (one that contains the qty ordered - measure, type of product ordered - dimension, and transaction date - dimension). We'd also have a transaction fact for the product shipping ( qty shipped - measure, product type - dimension, and ship date - dimension). This simple schema could be used to answer questions like 'how many products by product type last quarter were ordered but not shipped'.

Hopefully this helps you get started.

0
votes

Usually a fact table is used to aggregate measures - which are always numeric. Examples would be: sales dollars, distances, weights, number of items sold.

The type of data you drew here doesn't have any cut and dry "measure" so you need to decide what you want to measure. Is the number of answers per question? Is it how many responses per sample?

This is often called an Event Fact table (if you want to search for other examples). And you need some sort of reporting requirements before you can turn it into a star schema. So it isn't an easy answer...

0
votes

It's so easy :) Responses is fact, all other is dimensions. And your schema is now star designed, because you can directly connect fact with all dimensions. Example, when you need to redesign its structure where addresses stored in separate table and related with sample. You must add address table id into responses table for get star schema.