1 vote

When designing the data model for a Snowflake data warehouse, is there a general rule as to which type of model will perform best? Specifically, how does a traditional star schema compare to a wide-table (one-big-table) approach?

A typical fact table holds surrogate keys to the dimensions, where fields such as descriptions are stored. What if the structure is denormalized further and those descriptions are moved (or copied) onto the fact table? That is more in line with the "one-big-table" approach. A change to a value in any dimension, or to a fact, would then require a new record in the "fact" table, which will obviously generate a lot more data.
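To make the comparison concrete, here is a minimal sketch of the two layouts in Snowflake SQL; all table and column names are hypothetical:

    -- Star schema: fact rows carry surrogate keys; descriptions live in the dimension.
    CREATE TABLE dim_product (
        product_key   INTEGER,      -- surrogate key
        product_name  VARCHAR,
        description   VARCHAR
    );

    CREATE TABLE fact_sales (
        product_key   INTEGER,      -- references dim_product
        sale_date     DATE,
        amount        NUMBER(12,2)
    );

    -- One-big-table: the dimension attributes are denormalized onto the fact.
    CREATE TABLE fact_sales_wide (
        product_name  VARCHAR,
        description   VARCHAR,
        sale_date     DATE,
        amount        NUMBER(12,2)
    );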


2 Answers

2 votes

The answer depends to some degree on your specific situation. When designing the schema, you typically have to balance the ease, speed, and recoverability of ingesting data from many different sources/tables with a model that is easy for consumers to understand (i.e., to write complex analytical queries against) and that performs well under load.

I've found that maintaining the core data model in star/snowflake format enables independent ingest/transformation/conforming of all the corresponding fact & dimension tables.

But then I have another transformation/denormalization layer that flattens that model into an analytics-ready dataset. Depending on the dataset size and the freshness requirements of the data, this can be accomplished with a simple CTAS statement off a view that pulls together the requisite data, followed by a table swap (this solution can be run at any time with no disruption to analytics queries), as sketched below.
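A minimal sketch of that CTAS-plus-swap pattern; the view and table names (vw_sales_flat, sales_flat) are hypothetical, while ALTER TABLE ... SWAP WITH is a real Snowflake operation:

    -- Rebuild the flattened dataset off the view into a staging table.
    CREATE OR REPLACE TABLE sales_flat_staging AS
        SELECT * FROM vw_sales_flat;

    -- Atomically exchange the staging table with the live one;
    -- queries against sales_flat are never disrupted.
    ALTER TABLE sales_flat_staging SWAP WITH sales_flat;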

For performance reasons, flattened tables are essential for BI tools and analysts that connect live to Snowflake. And for analysts who aren't masters of SQL, flattening abstracts away the complexity of all the underlying joins.
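For illustration, the view feeding the flattened table might look like the following; it hides the star-schema joins from consumers (names are again hypothetical):

    -- Consumers query one flat object; the joins live here, out of sight.
    CREATE OR REPLACE VIEW vw_sales_flat AS
    SELECT
        f.sale_date,
        f.amount,
        p.product_name,
        p.description
    FROM fact_sales f
    JOIN dim_product p
      ON p.product_key = f.product_key;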

0 votes

This question has been asked in many variants before; the most recent is "snowflake sproc vs standalone sql".

Snowflake's hybrid columnar/micro-partition table storage (and that of other databases with a pure columnar structure) means the old truths are no longer valid, or at least hold to a lesser degree.

If you have a star schema model, it usually means your data warehouse is updated by batch rather than by many small transactions. This means the cost of maintaining a "one-big-table" may not be prohibitive and is worth investigating. One-big-table is surely the easiest model for most data consumers.
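As a rough illustration of that maintenance cost, using the hypothetical tables sketched above: correcting a single dimension attribute touches one row in a star schema, but rewrites every affected row in a wide table.

    -- Star schema: one row changes in the dimension.
    UPDATE dim_product
       SET description = 'Corrected description'
     WHERE product_key = 42;

    -- One-big-table: every fact row carrying the old value must be rewritten,
    -- which in Snowflake means rewriting all affected micro-partitions.
    UPDATE fact_sales_wide
       SET description = 'Corrected description'
     WHERE product_name = 'Widget';

In a batch-loaded warehouse, that rewrite can simply be folded into the nightly rebuild, which is why the cost may be acceptable.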