I have a very technical question about how Redshift deals with DISTKEY
and SORTKEY
internally in order to fulfill the storing tier and the query execution demands. I have read this amazing post that explains very well what means each of these regarding the table design.
My question is let's suppose I have a table A with three columns:
CREATE TABLE (
orderdate timestamp distkey,
product_id varchar(50),
product_name varchar(250)
) SORTKEY (product_id)
Now, we know that Redshift is a columnar approach DB optimized for data warehousing. In my example is clear that probably the way how the data will be distributed across the slices for the computing nodes is based on the DISTKEY
orderdate. But, what happens with the column product_id
and product_name
? are these distributed along with orderdate
on the same slice and then when I execute a query Redshift uses the zone maps based on my SORTKEY
to point out the zone of the column that has the data and retrieve it?
If Redshift is a columnar approach then shouldn't each column has a different way to be stored? or what this really means is that: Based on a column wisely picked out among all, the whole columns are going to be stored on the same slice along with the DISTKEY
and then to guarantee the performance the user can even focus the query on a specific zone to pull the required data. So I might overall something like:
DISTKEY
storage tier andSORTKEY
query execution behave
Now if I use a DISTKEY
so my data is stored based that punctual column order, so if later on, I use a SORTKEY
the other for my DISTKEY
can't be changed or altered so how this works?
So sorry folks if I'm so wrong but I need to understand well how this architecture drive the data internally. Thanks so much
Update
Based on the @JoeHarris post answering this question I have tried to picture how the data perhaps look stored.
The first level of distribution is my DISTKEY
(dates are not good but just to follow with the same example) and then internally redshift sorts by my SORTKEY
, giving something like:
thanks for the feedback