Cassandra - update table using IN operator on a clustering column

Question

I have the following table
CREATE TABLE group ( tenant_id uuid, group_id uuid, display_name text, group_members set<uuid>, mail text, user_members set<uuid>, PRIMARY KEY (tenant_id, group_id) )
I would like to update group_members field and execute the following statement:

UPDATE group SET group_members = group_members - {7c796908-8c03-4b5d-afd1-4263dc1e469c} WHERE tenant_id = ac0d32ce-786f-4bfc-9b14-88008be2d19a AND group_id in (1ab11837-656a-463f-b7eb-2cbd1a178e49, aa18d888-8e5e-410f-8a67-ca2cb0e4e92c, 0fecc4c5-f49b-412e-b248-1b7c816a49c8)
I know that using the IN operator on a partition key is not recommended since it causes the query to be executed on multiple nodes. But here I provide the partition key so I see no reason that this will cause a performance issue.

My questions are the following:

1. Does this query is executed once on the node holding the data? Or does it executed 3 times - one for each group_id value in the IN operator?

2. Is there a better way (performance wise) to accomplish this update?
3. I tried to run this query without providing the group_id in the WHERE clause but this resulted with the following error

InvalidRequest: Error from server: code=2200 [Invalid query] message="Some clustering keys are missing: group_id"

Why this error has happened?

dilsingi dilsingi · Accepted Answer · 2018-02-20T15:13:58

1. Does this query is executed once on the node holding the data? Or does it executed 3 times - one for each group_id value in the IN operator?

Since the query is matching for a single partition key, its executed in only the nodes which pertain to have this data. Hoping the load balancing policy is TokenAware and consistency is one or local_quorum. Also this update query is executed only once, since single target partition key. Remember Cassandra is append only system, there is no read before writes/updates, hence these updates are essentially appended to a new SSTable and life is good :)

2. Is there a better way (performance wise) to accomplish this update?

As long as the update statements aren't impacting multiple partitions, it should be ok. Also we are talking about only say three "group_id" column values this query should work fine. If the IN query tends to grow towards three digits say 100 clustering columns value, then you would start to see the latency impact during reads for this partition key.

3. I tried to run this query without providing the group_id in the WHERE clause but this resulted with the following error InvalidRequest: Error from server: code=2200 [Invalid query] message="Some clustering keys are missing: group_id" Why this error has happened?

The error essentially states that you can't updated a row with just Partition column alone. As it doesn't know what are all the clustering columns to apply the write to.

There could be 1000s of clustering columns for a given partition key. So without qualifying clustering column, Cassandra wouldn't know which row within the partition key are impacted and which aren't.

Cassandra - update table using IN operator on a clustering column

1 Answers