
See these two Snowflake query profile images. The queries do similar work: each updates the same 370M-row table by joining it with a small table (21k rows in one case, 9k in the other), yet the performance differs by about 5x. The first query finished in around 15 minutes on a single X-Small VDW.

And this query, which updates the same 370M-row table joined with an even smaller DIM table of 9k rows, was still running after 1 hour 30 minutes.

From the query profiles, I cannot explain why the second query runs so much slower than the first. The second query was run right after the first one.

Any idea? Thanks

Spilling to local storage is a sign of producing a lot of data. This looks like a problem we recently opened a bug for: I join a fact table to 3 dimension tables (for 5 rows of data), the 3rd table is joined twice, and if all 4 joins occur with time-ranged join filters then the query explodes and runs for 6 hours (our max run time), whereas if you leave one dim out of the join or remove a time BETWEEN join it does not explode and returns in under a second. Which is to say, without seeing your SQL I can't tell what is happening, but if it's similar to my case I would open a support case. – Simeon Pilgrim
Your queries are different, and you haven't provided the join condition or a data sample. Are you able to provide a bit more detail? – Simon D

2 Answers

2 votes

In the second query you can see that bytes spilled to local storage is 272 GB. This means the intermediate data produced during processing was too large to fit in the cluster's memory and had to spill to locally attached SSD. From a performance perspective this is a costly operation, and it is probably why the second query took so long to run (query 1 spilled only 2 GB). The easiest solution is to increase the size of the VDW, or you could rewrite the query:

https://docs.snowflake.net/manuals/user-guide/ui-query-profile.html#queries-too-large-to-fit-in-memory
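Resizing is a one-line change. A minimal sketch, assuming a warehouse named MY_WH (a placeholder; substitute your own warehouse name):

```sql
-- Bump the warehouse up one size so the join workspace fits in memory;
-- each size step roughly doubles the available compute and memory.
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'SMALL';

-- Run the UPDATE here, then optionally scale back down to control cost:
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'XSMALL';
```

If the spilling disappears and the runtime drops sharply after the resize, memory pressure was the bottleneck.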

Note also that query 1 managed to read 100% of its data set from VDW memory, which is very efficient, whereas query 2 could only find about half of its data set there and had to perform remote I/O (reads from cloud storage) to get the rest. Queries run before query 1 and query 2 had pulled that data into the local VDW cache, which is retained on an LRU basis.
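You can confirm the spilling and elapsed-time difference for both statements from the account usage view. A sketch, assuming the two statements are recent UPDATEs (the ILIKE filter is just one illustrative way to find them):

```sql
-- Compare spilling between the two UPDATE statements.
SELECT query_id,
       total_elapsed_time / 1000 AS elapsed_s,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage
FROM   snowflake.account_usage.query_history
WHERE  query_text ILIKE 'UPDATE%'   -- narrow down to the two UPDATEs
ORDER  BY start_time DESC
LIMIT  10;
```

A large gap in bytes_spilled_to_local_storage between the two rows matches what the query profiles show.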

1 vote

The join in the slow query is producing more rows than flow into it. This can be what you want, but it is often caused by duplicate values in the join keys of one of the tables. I'd do a sanity check on whether that's expected here.
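A quick way to sanity-check this is to count duplicates on the join key in the small dimension table (table and column names below are placeholders for your own):

```sql
-- Any row returned means the 9k-row DIM table has duplicate join keys;
-- each duplicate multiplies the matching fact rows in the join output,
-- which inflates the UPDATE's working set and drives spilling.
SELECT join_key,
       COUNT(*) AS copies
FROM   dim_table
GROUP  BY join_key
HAVING COUNT(*) > 1;
```

If duplicates are expected, deduplicate or aggregate the dimension table before the join so each fact row matches at most one dimension row.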