When keying a FACT table in a data warehouse, is it better best to use the primary key from the foreign table or the unique key or identifier used by the business?
For example (see below illustration), assume you have two dimension tables "DimStores" and "DimCustomers" and one FACT table named "FactSales". Both of the dimension tables have an indexed primary key field that is an integer data type and is named "ID". They also have an indexed unique business key field that is a alpha-numeric text data type named "Number".
Typically you'd use the primary key of dimension tables as the foreign keys in the FACT table. However, I'm wondering if that is the best approach.
By using the primary key, in order to look up or do calculations on the facts in the FACT table, you'd likely have to always do a join query on the primary key and use the business key as your look up. The reason is because most users won't know the primary key value to do a lookup in the FACT table. They will, however, likely know the business key. Therefore to use that business key you'd have to do a join query to make the relationship.
Since the business key is indexed anyway, would it be better to just use that as the foreign key in the FACT table? That way you wouldn't have to do a join and just do your lookup or calculations directly?
I guess it boils down to whether join queries are that expensive? Imagine you're dealing with a billion record FACT table and dimensions with tens of millions of records.
Example tables:
DimStores:
+------------+-------------+-------------+
| StoreId | StoreNumber | StoreName |
+------------+-------------+-------------+
| 1 | S001 | Los Angeles |
| 2 | S002 | New York |
+------------+-------------+-------------+
DimCustomers:
+------------+----------------+--------------+
| CustomerId | CustomerNumber | CustomerName |
+------------+----------------+--------------+
| 1 | S001 | Michael |
| 2 | S002 | Kareem |
| 3 | S003 | Larry |
| 4 | S004 | Erving |
+------------+----------------+--------------+
FactSales:
+---------+------------+------------+
| StoreId | CustomerId | SaleAmount |
+---------+------------+------------+
| 1 | 1 | $400 |
| 1 | 2 | $300 |
| 2 | 3 | $200 |
| 2 | 4 | $100 |
+---------+------------+------------+
In the above to get the total sales for the Los Angles store I'd have to do this:
Select Sum(SaleAmount)
From FactSales FT
Inner Join DimStores D1 ON FT.StoreId = D1.StoreId
Where D1.StoreNumber = 'S001'
Had I used the "StoreNumber" and "CustomerNumber" fields as the foreign keys instead in the "FactSales" table. I wouldn't have had to do a join query and could have directly done this instead:
Select Sum(SaleAmount)
From FactSales
Where StoreNumber = 'S001'