Just wanted to know whether surrogate keys need to be ordered explicitly on the natural key? Does the Same surrogate key always need to be binded with the natural key when a truncate and reload is used? or does it not matter, I have some big tables that use delta load which basically just does inserts and updates. I don't want to be ordering the data to ensure surrogate key and natural key always bind if they don't need to? Isnt that why they are nonsense keys?
3 Answers
The actual numeric value of a surrogate key has nothing to do with the natural key or other fields in the record. That said, once you assign a surrogate key to a record, you should never break that link or risk leaving orphaned fact records in your data.
You can see this most clearly in a slowly changing dimension table that has multiple versions of some natural keys.
sur_key nat_key description version valid_from valid_through
1 105 UK Office 1 1900-01-01 2017-02-16
2 108 FR Office 1 1900-01-01 2099-12-31
3 109 NL Office 1 1900-01-01 2099-12-31
4 105 UK/IRL Office 2 2017-02-16 2099-12-31
5 102 DK Office 1 1900-01-01 2099-12-31
As you can see, a new version of natural key 105 just gets the next surrogate key and the old record stays in place. A late arriving key 102 also just gets the next key.
Any ordering of natural keys only happens in an index on that column, never in the table itself.
Surrogate keys and natural keys generally should not have any direct relationship. It would be a maintenance nightmare to try to keep them aligned and you would be constantly having to re-assign the keys as new data gets added.
After a truncate and reload of your key table, your dim records may end up with a different SK, requiring your fact record to be updated/reloaded as well.
If that is a recurring scenario, you can include your natural key in the fact table. It does take more space, but it makes reloads and troubleshooting easier.
Inmon described a data warehouse as being a subject-oriented, integrated, time-variant and nonvolatile collection of data .
For implement integrated we have to understand concept of surrogate keys .
ie . We have to create DWH for an group of retail shop , so we have to pull data Item detail data across different shop.
Shope 1
Item
Itemid Name
1 Tea
2 suger
Sales
orderid Itemid tola
1 1 100
2 1 100
3 2 300
Shope 2
Item
Itemid Name
1 cofee
2 tea
Sales
orderid Itemid tola
1 1 100
2 1 100
3 2 300
No itemid 1 may have diffrent item across shopes , when we pull data across shope in warehouse to with natural key (itemid), now we can not identify record by Itemid now we would require tow field ItemID and Shope no (Natural key + data Source identifier ). Now think if there is a transaction data (sales) to be pulled to establish relationship we have to join on two columns (Natural key + data Source identifier ) which will degrade performance .
second scenario you have to implement SCD then also you will required surrogate keys.
In nut shell surrogate keys is improve performance (read) and helps to implement [SCD][1]
Answer to question .
Just wanted to know whether surrogate keys need to be ordered explicitly on the natural key? no
Does the Same surrogate key always need to be binded with the natural key when a truncate and reload is used? Depends on SCD implementaion or does it not matter, I have some big tables that use delta load which basically just does inserts and updates. I don't want to be ordering the data to ensure surrogate key and natural key always bind if they don't need to? you will not able to relate data Isnt that why they are nonsense keys? read