Cassandra vnodes performance overhead and changing the number of vnodes

Question

We have a test cluster of 4 nodes, and we've turned on vnodes. Reads seem somewhat slower than with the old method (initial_token). Is there some performance overhead from using vnodes? Should we increase or decrease the default num_tokens (256) if we only have 4 physical nodes?

Another scenario we would like to test is changing the cluster's num_tokens on the fly. Is that possible, or do we have to recreate the whole cluster? If it is possible, how can we accomplish it?

We're using Cassandra 2.0.4.

Evan Chan · Accepted Answer · 2016-02-15T17:10:10

It really depends on your application, but if you are running Spark queries on top of Cassandra, then a high number of vnodes can significantly slow down your queries, by 2x to 5x or more. This is because Spark cannot subdivide queries across vnodes: each vnode results in one Spark partition, and a high number of partitions slows down low-latency queries.
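To get a feel for the scale, here is a rough Python sketch that takes the one-partition-per-vnode model described above at face value. The function name `token_range_count` is made up for illustration and is not part of any Cassandra or Spark API; real partition counts depend on the connector and its settings.

```python
def token_range_count(nodes: int, num_tokens: int) -> int:
    """Total vnode token ranges in the ring: each node owns num_tokens of them."""
    if nodes < 1 or num_tokens < 1:
        raise ValueError("nodes and num_tokens must be positive")
    return nodes * num_tokens


# The 4-node test cluster from the question.
default_ranges = token_range_count(4, 256)  # default num_tokens
reduced_ranges = token_range_count(4, 16)   # a lower setting, for comparison

# Assumption: one Spark partition per token range, as the answer describes.
print(f"num_tokens=256 -> about {default_ranges} Spark partitions")
print(f"num_tokens=16  -> about {reduced_ranges} Spark partitions")
```

Under that assumption, a full scan on the 4-node cluster is split into roughly 1024 tasks with the default setting versus 64 with the lower one, and the per-task scheduling cost is what hurts low-latency queries.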

The recommended number of vnodes is more like 16 per node. In theory this lets you grow a two-node cluster to at most 32 nodes (2 × 16 tokens, with every node needing at least one), which is more than enough of an expansion ratio for most folks.
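The expansion arithmetic can be written out as a small Python sketch. It assumes the theoretical ceiling is simply one node per existing token; `expansion_limit` is a hypothetical helper, not a Cassandra tool.

```python
def expansion_limit(nodes: int, num_tokens: int) -> tuple[int, int]:
    """Return (max_nodes, expansion_ratio) for a ring of nodes * num_tokens tokens.

    Assumption: a node must own at least one token, so the token count
    caps how many nodes the existing ring could be split across.
    """
    max_nodes = nodes * num_tokens
    return max_nodes, max_nodes // nodes


max_nodes, ratio = expansion_limit(nodes=2, num_tokens=16)
print(f"2 nodes x 16 vnodes: up to {max_nodes} nodes ({ratio}x expansion)")
```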
