
We need to run multiple MERGE statements against the same table in Snowflake. The challenge we face right now is that each MERGE statement that results in updates appears to block the other merges. Each MERGE deals with data belonging to a non-overlapping key column (the Tenant), and the data volume is unevenly spread across the tenants.
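To illustrate the situation, here is a sketch of two such concurrent merges (table and column names are illustrative, not from our actual schema). Each statement touches only its own tenant's rows, yet the second one still waits on the first's lock:

```sql
-- Session 1: merge changes for tenant_a only
MERGE INTO events t
USING staging_tenant_a s
  ON t.tenant = 'tenant_a' AND t.id = s.id
WHEN MATCHED THEN UPDATE SET t.payload = s.payload
WHEN NOT MATCHED THEN INSERT (tenant, id, payload)
  VALUES (s.tenant, s.id, s.payload);

-- Session 2 (concurrently): merge changes for tenant_b only.
-- Despite the disjoint row sets, this statement blocks until
-- session 1 commits, because the lock is not row-level.
MERGE INTO events t
USING staging_tenant_b s
  ON t.tenant = 'tenant_b' AND t.id = s.id
WHEN MATCHED THEN UPDATE SET t.payload = s.payload
WHEN NOT MATCHED THEN INSERT (tenant, id, payload)
  VALUES (s.tenant, s.id, s.payload);
```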

So we thought that if we clustered the table using Tenant as the clustering key (the documentation says clustering doesn't help on smaller tables, but I thought I'd give it a shot anyway), we would end up with a micro-partition per tenant, and the blocking on updates would no longer happen. But we noticed that a clustering key does not guarantee a unique micro-partition per tenant. Unfortunately, as the documentation warned, there were only 23 partitions for hundreds of tenants, and the blocking continued.
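For reference, this is roughly what we tried (again with an illustrative table name), along with the function we used to check how rows were actually spread across micro-partitions:

```sql
-- Set Tenant as the clustering key on the existing table
ALTER TABLE events CLUSTER BY (tenant);

-- Inspect the resulting clustering state: depth, overlaps, and
-- total partition count for the tenant column
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(tenant)');
```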

On inspecting the lock, we see it's held on PARTITIONS (I'm assuming that refers to the micro-partitions).


I was wondering whether there is any mechanism in Snowflake that can reduce blocking on updates, or a way to control which data goes into a micro-partition.


1 Answer


As I understand it, there isn't an easy way to reduce blocking on updates, or to control/guarantee which data goes into a micro-partition, in Snowflake. As the docs state, "UPDATE, DELETE, and MERGE statements hold locks that generally prevent them from running in parallel with other UPDATE, DELETE, and MERGE statements" - so the general encouragement is to batch updates/deletes where possible.
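In your case, batching could mean consolidating the per-tenant staging data and running one MERGE over all tenants, so the lock is taken once instead of queueing N per-tenant statements. A sketch, assuming a consolidated staging table (names illustrative):

```sql
-- One MERGE covering every tenant, driven by a single staging table
-- that unions all the per-tenant change sets
MERGE INTO events t
USING all_tenants_staging s
  ON t.tenant = s.tenant AND t.id = s.id
WHEN MATCHED THEN UPDATE SET t.payload = s.payload
WHEN NOT MATCHED THEN INSERT (tenant, id, payload)
  VALUES (s.tenant, s.id, s.payload);
```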

One idea, though: might it be possible to have multiple "staging tables" according to the groups of tenants you have (so each can be operated on by a merge independently, without blocking) and then bring them together into a downstream/final table or view?
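A minimal sketch of that idea, assuming two tenant groups (all names illustrative): each group gets its own physical table, so concurrent merges never touch the same object, and a view presents them as one logical table for readers:

```sql
-- One physical table per tenant group; merges target these
-- independently and therefore never block each other
CREATE TABLE events_group_1 LIKE events;
CREATE TABLE events_group_2 LIKE events;

-- Downstream consumers query a single unified view
CREATE OR REPLACE VIEW events_all AS
  SELECT * FROM events_group_1
  UNION ALL
  SELECT * FROM events_group_2;
```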