0
votes

If I have 500k rows to delete, should I batch the deletes, e.g. 100 rows at a time?

What are the performance characteristics? Other than saving network round trips, would the server benefit from the batching?

Thanks

2 Answers

2
votes

Short answer: you're most likely better off with simple, non-batched async operations.

The batch keyword in Cassandra is not a performance optimization for batching together large buckets of data for bulk loads.

Batches are used to group together operations that must be atomic, i.e. actions that you expect to occur together. A (logged) batch guarantees that if a single part of your batch is successful, the entire batch is successful.

Using batches will probably not make your mass ingestion or deletes run faster.
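For reference, the non-batched async approach could look roughly like the sketch below. It assumes a `session` object that behaves like the DataStax Python driver's `Session` (an `execute_async(statement, parameters)` call that returns a future with a blocking `result()`), a prepared single-row DELETE, and a `window` size that is only a guess to tune, not a recommendation.

```python
from itertools import islice


def delete_async(session, prepared_delete, keys, window=128):
    """Issue one DELETE per key, with at most `window` requests in flight.

    `session` is assumed to expose execute_async(statement, parameters)
    returning a future with a blocking result(), as the DataStax Python
    driver does. `prepared_delete` is a hypothetical prepared statement
    such as: DELETE FROM my_table WHERE id = ?
    """
    keys = iter(keys)
    deleted = 0
    while True:
        # Take the next window of keys; stop when the input is exhausted.
        chunk = list(islice(keys, window))
        if not chunk:
            return deleted
        # Fire every request in the window without waiting in between.
        futures = [session.execute_async(prepared_delete, (key,)) for key in chunk]
        # Wait for the whole window; result() raises if a delete failed.
        for future in futures:
            future.result()
        deleted += len(futures)
```

Each delete is still an independent request, so a failure affects only that row, and the window merely caps how much load the client puts on the cluster at once.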

Okay but what if I use an Unlogged Batch? Will that run super fast?

Cassandra uses a mechanism called batch logging to ensure a batch's atomicity. By specifying an unlogged batch, you turn this functionality off, so the batch is no longer atomic and may fail with partial completion. Naturally, there is a performance penalty for logging your batches and ensuring their atomicity; using unlogged batches removes this penalty.

There are some cases in which you may want to use unlogged batches to ensure that requests (inserts) that belong to the same partition are sent together. If you batch operations together and they need to be performed in different partitions / nodes, you are essentially creating more work for your coordinator. See specific examples of this in Ryan's blog:

Read this post
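To illustrate the same-partition case, here is a minimal sketch that groups deletes by partition key and builds one unlogged batch per partition, so no batch spans several partitions. The table `events` and its columns `device_id` (partition key) and `ts` (clustering key) are hypothetical, as is the `max_statements` cap. The `%s` placeholders follow the style the DataStax Python driver uses for non-prepared statements, so each `(cql, params)` pair should be usable with `session.execute(cql, params)`, but treat that as an assumption to verify against your driver.

```python
from collections import defaultdict


def unlogged_batches_by_partition(rows, max_statements=100):
    """Yield (cql, params) pairs, one unlogged batch per partition chunk.

    `rows` is an iterable of (partition_key, clustering_key) tuples.
    Table and column names below are hypothetical placeholders.
    """
    # Group clustering keys under their partition key.
    by_partition = defaultdict(list)
    for partition_key, clustering_key in rows:
        by_partition[partition_key].append(clustering_key)

    delete = "DELETE FROM events WHERE device_id = %s AND ts = %s;"
    for partition_key, clustering_keys in by_partition.items():
        # Split large partitions so no single batch grows too big.
        for start in range(0, len(clustering_keys), max_statements):
            part = clustering_keys[start:start + max_statements]
            cql = "BEGIN UNLOGGED BATCH\n" + "\n".join([delete] * len(part)) + "\nAPPLY BATCH;"
            # Flatten to [pk, ck, pk, ck, ...] to match the placeholders.
            params = [value for ck in part for value in (partition_key, ck)]
            yield cql, params
```

Because every statement in a batch targets the same partition, the coordinator can hand the whole batch to one replica set instead of fanning it out across nodes.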

0
votes

In Cassandra a delete is just a write (it writes a tombstone), so you should expect the same performance characteristics as for inserts. I would expect some slight benefit from batching, but normal async operations should be just as fast.