Data Warehouse modelling: Data Vault vs Persistent Staging Area

Question

Consider the following two DWH architectures:

DWH with Raw Data Vault, layers:

Source systems
Staging area (truncated on every load, exact schema of source tables)
Raw Data Vault (modelled as Data Vault, contains record history, hubs/sats/links modelled after source systems structure, NO business rules applied)
Data Marts (dimensional models, business rules applied)
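To make the Raw Data Vault layer concrete, here is a minimal sketch of how one staging table might be split into a hub and a satellite. Everything in it is an assumption for illustration: the names `hash_key`, `load_raw_vault`, `hash_diff` and `record_source`, the use of MD5, and the in-memory dicts standing in for tables. Links are left out to keep it short.

```python
import hashlib
from datetime import date


def hash_key(*parts):
    """Hash a tuple of values into a fixed-length key (MD5 is an assumed choice)."""
    joined = "|".join(str(p) for p in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def load_raw_vault(hub, sat, staging, key, load_date, source):
    """Load one staging snapshot into a hub (dict) and a satellite (dict of lists)."""
    for src in staging:
        hk = hash_key(src[key])
        # Hub: one row per business key, inserted only the first time it is seen.
        if hk not in hub:
            hub[hk] = {"business_key": src[key],
                       "load_date": load_date,
                       "record_source": source}
        # Satellite: append a row only when the descriptive attributes changed.
        attrs = {col: val for col, val in src.items() if col != key}
        hash_diff = hash_key(*(attrs[col] for col in sorted(attrs)))
        history = sat.setdefault(hk, [])
        if not history or history[-1]["hash_diff"] != hash_diff:
            history.append({**attrs, "hash_diff": hash_diff, "load_date": load_date})
    return hub, sat


hub, sat = {}, {}
load_raw_vault(hub, sat, [{"customer_id": 1, "city": "Oslo"}],
               "customer_id", date(2020, 3, 1), "crm")
load_raw_vault(hub, sat, [{"customer_id": 1, "city": "Bergen"}],
               "customer_id", date(2020, 3, 2), "crm")
```

After the two loads the hub holds one row for the customer and the satellite holds two versions of its attributes. No business rules are applied; the split only restructures what the source delivered.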

DWH with Persistent Staging Area (called PSA or HDA), layers:

Source systems
Staging area (truncated on every load, exact schema of source tables)
PSA (contains record history, schema of source tables + date_load/date_load_end columns etc.)
Data Marts (dimensional models, business rules applied)
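For comparison, the PSA layer can be sketched as the source row plus validity columns. This is only an illustration under assumptions: `load_psa` and the `OPEN_END` sentinel are hypothetical names, the table is an in-memory list of dicts, and rows deleted in the source are not handled.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel meaning "still the current version"


def load_psa(psa, staging, key, load_date):
    """Apply one staging snapshot to a PSA table held as a list of dicts."""
    # Index the currently open version of each key.
    current = {row[key]: row for row in psa if row["date_load_end"] == OPEN_END}
    for src in staging:
        old = current.get(src[key])
        if old is not None:
            # Compare only the source columns, not the PSA bookkeeping columns.
            if all(old[col] == val for col, val in src.items()):
                continue  # nothing changed, keep the open version
            old["date_load_end"] = load_date  # close the superseded version
        psa.append({**src, "date_load": load_date, "date_load_end": OPEN_END})
    return psa


psa = []
load_psa(psa, [{"customer_id": 1, "city": "Oslo"}], "customer_id", date(2020, 3, 1))
load_psa(psa, [{"customer_id": 1, "city": "Bergen"}], "customer_id", date(2020, 3, 2))
```

The schema stays identical to the source table apart from `date_load` and `date_load_end`, which is why the ETL for this layer tends to be simpler than a hub/satellite/link split.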

Does the raw Data Vault concept have any benefits compared to the PSA concept? In my opinion, Data Vault modelling adds unnecessary complexity to the ETL and is also slower performance-wise.

It's hard to find a really good answer on this. Any thoughts?

Thanks!

This is going to get closed as opinion-based. But note that a Persistent Staging Area is now more commonly called a "Data Lake", which should indicate the popularity of the approach :) — David Browne - Microsoft
For me it depends a lot on your source systems. How many are there? How good is the quality of their data models, and so on? My experience is that a raw vault can be a pain if the integrity of the source system data model is poor — Cedersved

Andreas · Accepted Answer · 2020-03-04T21:14:28

Data Vault vs. Persistent Staging Area sounds to me like apples and pears: the two are hard to compare. You should not try to define a Data Vault to capture source data without knowing the business ontology; otherwise you are building a source system vault, which offers little or no benefit to the business. Building a Data Vault on top of a PSA or a data lake makes much more sense to me: land the data as an image of the source systems, and then build a sustainable data collection out of it step by step.
