So, here is our setup so far:
- Cassandra 2.0.10, JDK 1.7.0_65-b17
- 6 nodes (EC2 c3.8xlarge / 32 cores / 60GB RAM, EBS disks for data, ephemeral SSD for commit logs etc)
- pretty heavy write load - hundreds of thousands of writes/second
- RF=2, one dc, 2 racks
- everything works just fine with low CPU consumption - load average tends to be around 4-10
Now, we are trying to add a node. This causes a heavy load on the existing nodes - load average climbs above 100. The cluster becomes unresponsive, and writes and reads mostly fail.
The weird observations are that:
- without adding a new node, CPU stays low
- if we turn off writes while adding a new node, load average on the existing nodes drops back to 4-10 and the new node joins just fine
I've checked with VisualVM sampling, and basically all the CPU time on the existing nodes is consumed by org.jboss.netty.channel.socket.nio.SelectorUtil.select().
What we tried so far:
- throttling streaming - no impact
- disabling internode compression - no impact
- disabling autocompaction on existing nodes - no impact
- even running with -Dorg.jboss.netty.epollBugWorkaround=true - no impact
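For reference, here is roughly how we applied those mitigations. These are standard cassandra.yaml / nodetool knobs in 2.0.x; the specific values and the keyspace name are illustrative, not what we necessarily ran:

```shell
# Throttle streaming throughput cluster-wide (value is MB/s; 10 is illustrative)
nodetool setstreamthroughput 10

# Disable internode compression: in cassandra.yaml on each node, set
#   internode_compression: none      # default is "all"
# then restart the node

# Disable automatic compaction on existing nodes
# ("my_keyspace" is a placeholder for the actual keyspace)
nodetool disableautocompaction my_keyspace

# Netty epoll workaround, added to the JVM options in cassandra-env.sh:
# JVM_OPTS="$JVM_OPTS -Dorg.jboss.netty.epollBugWorkaround=true"
```

None of these made any difference in our case.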
As of now we are somewhat desperate, since this behavior is a blocker for us - we can't afford to lose writes, and we will need to expand C* dynamically.
Has anyone encountered something similar? Any ideas/hints? Thanks
EDIT: OS is Ubuntu 12.04
EDIT: I still have no idea what causes the behavior above, and I'm still curious. OTOH, I've managed to add a couple of nodes without any disruption or CPU usage increase by using the following sequence of actions:
- set auto_bootstrap: false in cassandra.yaml on the new node
- start the node and let it join the ring
- run nodetool rebuild on the new node
- bingo
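Concretely, the sequence looks something like this (the datacenter name passed to rebuild is a placeholder; use whatever nodetool status shows for your existing DC, and note rebuild can also be run without an argument):

```shell
# 1. On the new node, before first start, in cassandra.yaml:
#      auto_bootstrap: false

# 2. Start the node; it joins the ring without streaming any data
sudo service cassandra start

# 3. On the new node, stream its ranges from the existing datacenter
#    ("DC1" is a placeholder for the source DC name)
nodetool rebuild DC1
```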
This scheme will do as a workaround for our case, but it looks somewhat clumsy.