I've set up a 3-node cluster (Cassandra 1.2.1) with a column family in a keyspace that has a replication factor of 3.
The column family is called metric_hour and holds 15-second average values for one hour, with the following columns:
- account name
- metric name
- value type (string)
- unit type (string)
- m0 through m239 (one column per 15-second slot; 240 × 15 seconds = one hour)
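For reference, the schema looks roughly like this (a sketch, not my exact DDL: the primary key layout, the `hour` column used to separate rows, and the value types are assumptions on my part):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class Schema {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Keyspace replicated to all 3 nodes
        session.execute("CREATE KEYSPACE metrics WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 3}");

        // One row per account/metric/hour, with 240 slot columns m0..m239
        StringBuilder ddl = new StringBuilder(
                "CREATE TABLE metrics.metric_hour ("
                + "account_name text, metric_name text, "
                + "value_type text, unit_type text, hour timestamp, ");
        for (int i = 0; i < 240; i++) {
            ddl.append("m").append(i).append(" double, ");
        }
        ddl.append("PRIMARY KEY (account_name, metric_name, hour))");
        session.execute(ddl.toString());

        cluster.shutdown(); // java-driver 1.x
    }
}
```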
I am running 6 clients in parallel, each pushing data to the cluster, for a total of 2.1 million metric values (15-second values for one metric for an entire year). Because I do not want to read and write data for each individual metric value, I calculate the complete list of metric_hours up front and send them to the cluster at the end, just over 8500 inserts. Inserts are batched in groups of 50 with BEGIN BATCH ... APPLY BATCH;
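The batching pattern is essentially this (a sketch of the pattern rather than my exact code; the callers pass in at most 50 pre-built INSERT statements per call):

```java
import com.datastax.driver.core.Session;
import java.util.List;

public class BatchWriter {
    // Wraps up to 50 pre-built INSERT statements in one CQL batch and sends it.
    static void sendBatch(Session session, List<String> inserts) {
        StringBuilder cql = new StringBuilder("BEGIN BATCH\n");
        for (String insert : inserts) {
            cql.append(insert).append(";\n");
        }
        cql.append("APPLY BATCH;");
        session.execute(cql.toString());
    }
}
```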
After about 5-6 minutes the Cassandra cluster is overwhelmed: the heap fills up and nodes start failing (either becoming unresponsive or dying altogether). I've run this setup a number of times with the same result.
Each Cassandra node runs on its own dedicated hardware: a quad-core 2.3 GHz Intel i7 CPU and 16 GB of physical RAM (these are Mac Mini Server machines; the data is persisted to an internal SSD). I have played with setting -Xmx and -Xmn (MAX_HEAP_SIZE and HEAP_NEWSIZE in cassandra-env.sh) in ranges between 2 and 8 GB. Running with 8 GB keeps the cluster alive longer, but it still fails after a short time.
I've also set the consistency level to QUORUM, which keeps the cluster alive for a bit longer, a minute or so.
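This is how I set it through the driver (a sketch; `batchCql` stands for the batch string built above):

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class QuorumWrite {
    // Executes a pre-built batch at QUORUM instead of the driver default (ONE).
    static void execute(Session session, String batchCql) {
        SimpleStatement stmt = new SimpleStatement(batchCql);
        stmt.setConsistencyLevel(ConsistencyLevel.QUORUM);
        session.execute(stmt);
    }
}
```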
All CQL queries are sent to the Cassandra cluster using the DataStax java-driver and CQL3. I have tried with the row cache both on and off.
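For the row cache I toggled the column family's caching attribute, roughly like this (a sketch; in 1.2 caching is a string property, and I believe 'rows_only'/'keys_only' are the relevant values):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ToggleRowCache {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("metrics");
        // 'rows_only' enables the row cache; 'keys_only' is the 1.2 default.
        // row_cache_size_in_mb must also be > 0 in cassandra.yaml.
        session.execute("ALTER TABLE metric_hour WITH caching = 'rows_only'");
        cluster.shutdown();
    }
}
```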
Running the exact same setup against a Riak cluster works without problems for a relatively long period of time, so I am wondering what can be improved in the Cassandra setup, or what might possibly be wrong.