I'm looking at the performance of some queries that I'm doing in Redshift and noticed something that I can't quite find in the documentation.
I created two tables that have a join key between them (about 10K rows in the child table).
For the parent table, let's call it A, I have a primary key that I've declared to be the distkey and sort key for the table. Let's call this id.
For the child table B, I've made a foreign key field, parent_id that references A.id. parent_id has been declared as the distkey for table B. Table B also has a primary key, id that I've defined. I've created an interleaved sort key on table B for (parent_id,id).
When I try to do an explain joining the two tables, I will always get a Hash Join. If I recreate table B with a normal compound sort key, I will always get a Merge Join.
When I look at the stats of the tables, I don't see any skews that are out of line.
My question is, will Redshift always use Hash Joins with interleaved sort keys or is there something I'm doing wrong?
EDIT - The order of the interleaved sort keys in Table B is actually (parent_id, id). I wrote it above incorrectly. I've updated the above to be clear now.
parent_id
first. – Joe HarrisINTERLEAVED
then. I've had issues whenever I've tried it in the past (check the Redshift forum) so I've been staying away from it unless I really need it. – Joe Harris