3
votes

How small should a table using Diststyle ALL be in Amazon Redshift?

It says here: http://dwbitechguru.blogspot.com/2014/11/performance-tuning-in-amazon-redshift.html that for vey small tables, redshift should use diststyle ALL instead of EVEN or KEY. How Small is small? If I was to specify a row number in the where clause of the query: select relname, reldiststyle from pg_class how many rows should I specify?

2

2 Answers

6
votes

It really depends on the cluster size you are using. DISTSTYLE ALL will copy the data of your table to all nodes - to mitigate data transfer requirement across nodes. You can find out the size of your table and Redshift nodes available size, if you can afford to copy table multiple times per node, do it!

Also, if you have a requirement of joining other tables with this table very very frequently, like in 70% of your queries, I believe it is worth the space if you want better query performance.

If your Join keys across tables are same in terms of cardinality, then you can also afford to distribute all tables on that key so that similar keys lie in same node which will obviate replication of data.

I would suggest trying out the two options above, and comparing average query run times of around 10 queries and then come to a decision.

3
votes

By considering a Star Schema, the distribution style All is normally used for dimension tables. Doing this have the advantage to speed up joins, let's explain this through an example. If we would like to obtain the quantity saled of each product by country, we would require to join the fact_sales with the dim_store table on store_id key. enter image description here

So, setting diststyle all on dim_store enable us to do a JOIN result in parallel compared to the disvantage of shuffling when enabling diststyle even. However, you can let Redshift automatically handle an optimal distribution style by setting distyle auto, for more info check this link.