1 vote

When designing the data model for a Snowflake data warehouse, is there a general rule as to which type of model will perform best? Specifically, how does a traditional star schema compare to a wide-table (one-big-table) approach?

A typical fact table holds surrogate keys to the dimensions, where fields such as descriptions are stored. What if the structure is denormalized further and those descriptions are moved (or copied) onto the fact table? That is more in line with the "one-big-table" approach. A change to a value in any dimension, or to a fact, would then require a new record in the "fact" table, which will obviously generate a lot more data.
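To make the comparison concrete, here is a minimal sketch of the two layouts in Snowflake SQL; all table and column names are hypothetical:

    -- Star schema: fact rows carry surrogate keys; descriptions live in the dimension.
    CREATE TABLE dim_product (
        product_key   INTEGER,      -- surrogate key
        product_name  VARCHAR,
        description   VARCHAR
    );

    CREATE TABLE fact_sales (
        product_key   INTEGER,      -- references dim_product
        sale_date     DATE,
        amount        NUMBER(12,2)
    );

    -- One-big-table: the dimension attributes are denormalized onto the fact.
    CREATE TABLE fact_sales_wide (
        product_name  VARCHAR,
        description   VARCHAR,
        sale_date     DATE,
        amount        NUMBER(12,2)
    );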


2 Answers

2 votes

The answer depends to some degree on your specific situation. When designing the schema, you typically have to balance the ease, speed, and recoverability of ingesting data from many different sources/tables with a model that is easy for consumers to understand (i.e., to write complex analytical queries against) and that performs well under load.

I've found that maintaining the core data model in star/snowflake format enables independent ingest/transformation/conforming of all the corresponding fact & dimension tables.

But then I have another transformation/denormalization layer that flattens that model into an analytics-ready dataset. Depending on the dataset size and the freshness requirements of the data, this can be accomplished with a simple CTAS statement off a view that pulls together the requisite data, followed by a table swap (this solution can be run at any time with no disruption to analytics queries), as sketched below.
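A minimal sketch of that CTAS-plus-swap pattern; the view and table names (vw_sales_flat, sales_flat) are hypothetical, while ALTER TABLE ... SWAP WITH is a real Snowflake operation:

    -- Rebuild the flattened dataset off the view into a staging table.
    CREATE OR REPLACE TABLE sales_flat_staging AS
        SELECT * FROM vw_sales_flat;

    -- Atomically exchange the staging table with the live one;
    -- queries against sales_flat are never disrupted.
    ALTER TABLE sales_flat_staging SWAP WITH sales_flat;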

For performance reasons, flattened tables are essential for BI tools and analysts that connect live to Snowflake. And for analysts who aren't masters of SQL, flattening abstracts away the complexity of all the underlying joins.
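For illustration, the view feeding the flattened table might look like the following; it hides the star-schema joins from consumers (names are again hypothetical):

    -- Consumers query one flat object; the joins live here, out of sight.
    CREATE OR REPLACE VIEW vw_sales_flat AS
    SELECT
        f.sale_date,
        f.amount,
        p.product_name,
        p.description
    FROM fact_sales f
    JOIN dim_product p
      ON p.product_key = f.product_key;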

0 votes

This question has been asked in many variants before; the most recent is "snowflake sproc vs standalone sql".

Snowflake's hybrid columnar/micro-partition table storage (and that of other databases with a pure columnar structure) means the old truths are no longer valid, or at least hold to a lesser degree.

If you have a star schema model, it usually means your data warehouse is updated by batch rather than by many small transactions. This means the cost of maintaining a "one-big-table" may not be prohibitive and is worth investigating. One-big-table is surely the easiest model for most data consumers.
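As a rough illustration of that maintenance cost, using the hypothetical tables sketched above: correcting a single dimension attribute touches one row in a star schema, but rewrites every affected row in a wide table.

    -- Star schema: one row changes in the dimension.
    UPDATE dim_product
       SET description = 'Corrected description'
     WHERE product_key = 42;

    -- One-big-table: every fact row carrying the old value must be rewritten,
    -- which in Snowflake means rewriting all affected micro-partitions.
    UPDATE fact_sales_wide
       SET description = 'Corrected description'
     WHERE product_name = 'Widget';

In a batch-loaded warehouse, that rewrite can simply be folded into the nightly rebuild, which is why the cost may be acceptable.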