Consider the following two DWH architectures:
DWH with Raw Data Vault, layers:
- Source systems
- Staging area (truncated on every load, exact schema of source tables)
- Raw Data Vault (modelled as Data Vault, contains record history, hubs/sats/links modelled after the source systems' structure, NO business rules applied)
- Data Marts (dimensional models, business rules applied)
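For illustration, this is roughly how I picture the Raw Data Vault load for a single source table. It is only a sketch in Python rather than real ETL: the customer table, its columns, the `customer_hk`/`hash_diff`/`load_ts` names, the function names and the MD5 hashing are my own assumptions, and links are left out.

```python
import hashlib


def md5_hex(*parts):
    """Hash helper; joining the parts with '|' is an illustrative convention."""
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def load_raw_vault(hub, sat, staging_rows, load_ts):
    """Load one staging batch into a hub (dict) and a satellite (list).

    No business rules are applied: only keys and attribute history are stored.
    """
    for row in staging_rows:
        hk = md5_hex(row["customer_id"])
        # Hub: register the business key only the first time it is seen.
        hub.setdefault(hk, row["customer_id"])
        # Satellite: add a row only when the descriptive attributes changed.
        hash_diff = md5_hex(row["name"], row["city"])
        history = [s for s in sat if s["customer_hk"] == hk]
        if not history or history[-1]["hash_diff"] != hash_diff:
            sat.append({
                "customer_hk": hk,
                "load_ts": load_ts,
                "hash_diff": hash_diff,
                "name": row["name"],
                "city": row["city"],
            })


# Three daily loads of the same (hypothetical) customer; the city changes once.
hub_customer, sat_customer = {}, []
load_raw_vault(hub_customer, sat_customer,
               [{"customer_id": "42", "name": "Ann", "city": "Oslo"}], "2024-01-01")
load_raw_vault(hub_customer, sat_customer,
               [{"customer_id": "42", "name": "Ann", "city": "Bergen"}], "2024-01-02")
load_raw_vault(hub_customer, sat_customer,
               [{"customer_id": "42", "name": "Ann", "city": "Bergen"}], "2024-01-03")
```

Even for one table this means a hash key, a hub insert and a satellite insert with change detection, which is the extra ETL effort I am referring to.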
DWH with Persistent Staging Area (called PSA or HDA), layers:
- Source systems
- Staging area (truncated on every load, exact schema of source tables)
- PSA (contains record history, schema of source tables + date_load/date_load_end columns etc.)
- Data Marts (dimensional models, business rules applied)
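For illustration, the PSA load as I understand it could be sketched like this in Python. Only the `date_load` and `date_load_end` columns come from the layer description above; the customer table, the `9999-12-31` open-end marker and the function name are hypothetical, and deletes in the source are not handled.

```python
HIGH_DATE = "9999-12-31"  # hypothetical marker for "still the current version"


def load_psa(psa_rows, staging_rows, load_date, key="customer_id"):
    """Append changed staging rows to the PSA and end-date the old versions."""
    # Index the currently open version of every key.
    current = {r[key]: r for r in psa_rows if r["date_load_end"] == HIGH_DATE}
    for row in staging_rows:
        old = current.get(row[key])
        if old is not None:
            # Compare only the source columns, not the technical date columns.
            if all(old[col] == val for col, val in row.items()):
                continue  # nothing changed, keep the open version
            old["date_load_end"] = load_date  # close the previous version
        new = {**row, "date_load": load_date, "date_load_end": HIGH_DATE}
        psa_rows.append(new)
        current[row[key]] = new
    return psa_rows


# Three daily loads of the same (hypothetical) customer; the city changes once.
psa_customer = []
load_psa(psa_customer, [{"customer_id": "42", "city": "Oslo"}], "2024-01-01")
load_psa(psa_customer, [{"customer_id": "42", "city": "Bergen"}], "2024-01-02")
load_psa(psa_customer, [{"customer_id": "42", "city": "Bergen"}], "2024-01-03")
```

The table keeps the source schema, so the load is a single compare-and-append step per source table.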
Does the Raw Data Vault approach have any benefits over the PSA approach? In my opinion, Data Vault modelling adds unnecessary ETL complexity and also performs worse.
It's hard to find a really good answer on this. Any thoughts?
Thanks!