0
votes

I have a table in a Redshift cluster with 5 billion rows. I have a job that tries to update some column values based on some filter. Updating anything at all in this table is incredibly slow. Here's an example:

Update tbl1
set price=tbl2.price, flag=true
from tbl2 join tbl1 on tbl1.id=tbl2.id
where tbl1.time between (some value) and
tbl2.createtime between (some value)

I have sort key on time and dist key on id. When I checked stl_scan table, its shows that my query is scanning 50 million rows on each slice, and only returning 50K rows on each slice. I stopped the query after 20 mins.

For testing, I created same table with 1 billion rows and same update query took 3 mins.
When I run select with same condition I get the results in few seconds.Is there anything I am doing wrong?

1

1 Answers

0
votes

I believe the correct syntax is:

Update tbl1
    set price = tbl2.price,
        flag = true
from tbl2 
where tbl1.id = tbl2.id and
      tbl1.time between (some value) and
      tbl2.createtime between (some value);

Note that tbl1 is only mentoned once, in the update clause. There is no join, just a correlation clause.