I'm writing an ETL process in Amazon Redshift, and I would like it to be more parallelizable. The process creates a number of temporary tables and then merges them into the main table, inside a transaction that looks like this:

CREATE TEMPORARY TABLE temporary_table
    (LIKE production_table);

BEGIN TRANSACTION;
LOCK TABLE production_table;

/* Dedupe and then insert */
DELETE FROM temporary_table
    USING production_table
    WHERE temporary_table.id = production_table.id
        AND temporary_table.date <= production_table.date;

INSERT INTO production_table
SELECT * FROM temporary_table;

END TRANSACTION;

I expect the LOCK TABLE production_table; statement to block until any other transaction finishes, but whenever I run two of these processes concurrently, I get:

ERROR:  deadlock detected
DETAIL: Process 4868 waits for AccessExclusiveLock on relation 165643 of database 154587; blocked by process 4863.
    Process 4863 waits for AccessExclusiveLock on relation 165643 of database 154587; blocked by process 4868.

This happens between the two delete/insert statements of the concurrent transactions. Why doesn't the LOCK TABLE step prevent deadlocks? How can I structure my writes so that they block until the other transaction is finished?

1 Answer

I have figured this out.

There is an implicit lock taken on production_table whenever you do

CREATE TEMPORARY TABLE temporary_table
    (LIKE production_table);

and it is held for the lifetime of the temporary table. Since each session already holds that implicit lock by the time it issues LOCK TABLE, each session's exclusive lock request ends up waiting on the lock the other session still holds, which is exactly the cycle in the error message. I replaced the LIKE clause with the actual column definitions of the production table in the CREATE TEMPORARY TABLE statement, and the deadlocks went away.
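For example, here is a minimal sketch of the fix, assuming production_table has the id and date columns used in the DELETE above plus a hypothetical payload column (copy the real column list, in the same order as the production table since the merge uses SELECT *, and add matching DISTKEY/SORTKEY clauses if they matter for your load):

CREATE TEMPORARY TABLE temporary_table (
    id      BIGINT,        /* must match production_table.id */
    date    TIMESTAMP,     /* must match production_table.date */
    payload VARCHAR(256)   /* hypothetical; copy the rest of the real columns */
);

Because the column list is spelled out, the CREATE statement never references production_table, so the only lock taken on it is the explicit LOCK TABLE inside the transaction. While a temporary table created with LIKE exists, you can confirm the implicit lock from another session by querying the STV_LOCKS system table:

SELECT table_id, lock_owner_pid, lock_status
FROM stv_locks;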