2
votes

I have this query in my postgresql db

UPDATE main_table m 
SET new_prod=false 
FROM reseller_product p
WHERE p.distributor_id=896 
AND p.sku = m.sku 
AND p.reseller_id=8;

And it takes too long to execute. I have indexes on sku column in both tables. After using EXPLAIN command I saw that the planer used Hash Join. It seems to me that Merge Join would be faster here. Am I wrong ? Or maybe the planer used Merge Join because query has filters like "p.distributor_id=896"?

PS sku type is character varying(50)

"Update  (cost=80414.35..228437.71 rows=848460 width=212) (actual time=1582872.917..1582872.917 rows=0 loops=1)"
"  ->  Hash Join  (cost=80414.35..228437.71 rows=848460 width=212) (actual time=6906.044..23677.829 rows=706328 loops=1)"
"        Hash Cond: ((p.sku)::text = (m.sku)::text)"
"        ->  Append  (cost=0.00..96319.53 rows=1067877 width=20) (actual time=0.909..7426.880 rows=808287 loops=1)"
"              ->  Index Scan using res_prod_reseller_id on reseller_product p  (cost=0.00..8.74 rows=1 width=20) (actual time=0.729..0.729 rows=0 loops=1)"
"                    Index Cond: (reseller_id = 8)"
"                    Filter: (distributor_id = 896)"
"              ->  Seq Scan on reseller_product_8 p  (cost=0.00..96310.79 rows=1067876 width=20) (actual time=0.168..3850.121 rows=808287 loops=1)"
"                    Filter: ((distributor_id = 896) AND (reseller_id = 8))"
"        ->  Hash  (cost=45779.60..45779.60 rows=848460 width=202) (actual time=6897.658..6897.658 rows=709948 loops=1)"
"              Buckets: 1024  Batches: 256  Memory Usage: 417kB"
"              ->  Seq Scan on main_table m  (cost=0.00..45779.60 rows=848460 width=202) (actual time=0.926..2770.997 rows=709948 loops=1)"
2
would be better if you post the EXPLAIN planAkash
Can you please post both table definitions too?Christian

2 Answers

2
votes

Your plan mentions that you are updating close to a million rows, selecting the data takes 23 seconds, while updating and selecting them takes 26 minutes. updating them is the main problem rather than the plan used for selecting it

A MERGE JOIN would be preferred when you are mainly ordering your results. In your case HASH JOIN seems to be the appropriate choice as why would the planner waste time in ordering the results when its not needed. You could still force the planner to use MERGE JOIN by issuing 'set enable_hashjoin = off', but its likely to make the query slower.

I would suspect any triggers/indexes on the table being updated main_table or multi-column index's that contain the new_prod. I'd suggest to disable the trigger and/or add the indexes once you are done updating your records. Your read/write speed could also depend on other factors considering your hardware to start with.

0
votes

Are sku columns in main_table and reseller_product the same type in both tables?

Does reseller_product have an index in this order?

CREATE INDEX I_reseller_product_1 ON reseller_product (distributor_id,reseller_id,sku);

Does main_table have an index in sku column?

CREATE INDEX I_main_table_1       ON main_table      (sku);

If all questions are yes, try this:

UPDATE main_table m 
SET    new_prod = false 
WHERE  EXISTS (SELECT 1 
               FROM   reseller_product p
               WHERE  p.distributor_id = 896 
               AND    p.reseller_id    = 8
               AND    p.sku            = m.sku 
               );