0 votes

I've set up a 3-node cluster (Cassandra 1.2.1) and a column family in a keyspace with a replication factor of 3.

The column family is called metric_hour and holds 15-second average values for one hour, with the following columns (a hypothetical CQL sketch of the table follows the list):

  • account name
  • metric name
  • value type (string)
  • unit type (string)
  • m0 through m239 (one column per 15-second slot in the hour)
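
For reference, the table described above might look something like the following in CQL3. This is only a sketch based on the description here: the column names, types, and primary key are assumptions, and only a few of the 240 slot columns are spelled out.

    -- Hypothetical CQL3 schema for the metric_hour column family described above.
    -- Names, types, and the key are guesses; the real table declares m0 through m239.
    CREATE TABLE metric_hour (
        account_name text,
        metric_name  text,
        value_type   text,
        unit_type    text,
        m0   double,
        m1   double,
        -- ... m2 through m238 elided for brevity ...
        m239 double,
        PRIMARY KEY (account_name, metric_name)
    );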

I am running 6 clients in parallel, each pushing data to the cluster, with a total of 2.1 million metric values (15-second values for one metric for an entire year). Because I do not want to read and write data for each metric value, I am calculating the complete list of metric_hours to store up front and sending them to the cluster at the end, just over 8,500 inserts. Inserts are batched in groups of 50 with BEGIN BATCH ... APPLY BATCH;
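
To illustrate the shape of those batches, here is a hypothetical example against the sketch schema above; the names and values are made up, and each real batch contains 50 inserts rather than two.

    -- Hypothetical batch of metric_hour inserts (real batches hold 50 statements).
    BEGIN BATCH
        INSERT INTO metric_hour (account_name, metric_name, value_type, unit_type, m0, m1)
        VALUES ('acme', 'cpu.load', 'average', 'percent', 0.42, 0.40);
        INSERT INTO metric_hour (account_name, metric_name, value_type, unit_type, m0, m1)
        VALUES ('acme', 'mem.used', 'average', 'megabytes', 1024.0, 1080.5);
    APPLY BATCH;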

After about 5-6 minutes the Cassandra cluster is overwhelmed, with the heap filled, and nodes start failing (either becoming unresponsive or dying altogether). I've run this setup a number of times with the same result.

Each Cassandra node is running on its own dedicated hardware: a quad-core 2.3 GHz Intel i7 CPU and 16 GB of physical RAM (these are Mac Mini Server machines; the data is persisted to an internal SSD). I have played with setting -Xmx and -Xmn via cassandra-env in ranges between 2 and 8 GB. Running with 8 GB keeps the cluster running for longer, but it still fails after a short time.

I've also set the consistency level to QUORUM, which keeps the cluster alive for a bit longer, a minute or so.

All CQL queries are sent to the Cassandra cluster using the Datastax java-driver and CQL3. I have tried with the row cache both on and off.

Running the exact same setup on a Riak cluster works without problems for a relatively long period of time, so I am wondering what can be improved in the Cassandra setup, or what might be wrong.


1 Answer

2 votes

We have 1 billion rows per node (using PlayOrm for Cassandra) and ran into RAM issues at that row count, explained below.

Batches of 50 are good. I can't believe you are running out of RAM with only 8,500 inserts, though; that makes no sense. What version are you on? The RAM issues are tied to bloom filters and index sampling, and you need to get to around one billion rows with 8 GB of RAM to have issues.

To hold more rows, as we are about to do, try 1.2.2 with Leveled Compaction Strategy. Index sampling (cassandra.yaml) can probably be lowered as well. The bloom filters take about 2 GB of RAM per 1 billion rows; we have over 1 billion rows per node, so we hit memory issues. We have 32 GB RAM machines, but Cassandra auto-configures the heap to 8 GB because JDK garbage collection degrades above 8 GB. Lately we raised the JVM heap to 12 GB to get through this until we can move to LCS and turn off the bloom filters (we hope this helps us do 5 billion rows per node).
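
As a rough sketch of the table-level side of that (the metric_hour name is taken from the question, the values are just examples, and index_interval itself is tuned in cassandra.yaml rather than through CQL):

    -- Switch the table to Leveled Compaction Strategy (example, not a recommendation).
    ALTER TABLE metric_hour
        WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

    -- Effectively disable the bloom filter to save RAM (1.0 = always answer "maybe present").
    ALTER TABLE metric_hour WITH bloom_filter_fp_chance = 1.0;

    -- Index sampling is controlled by index_interval in cassandra.yaml (default 128);
    -- raising it samples fewer row keys and uses less RAM, at some read-latency cost.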

Moving from 1.1.4 to 1.2.2 significantly reduced RAM usage (we are actually running the upgrade today, but tested on node 5 in production and found it uses a lot less RAM for the same number of rows). We also hope to get to 5 billion rows of time-series data with the switch we will make to leveled compaction strategy, which by default no longer uses bloom filters (bloom filters eat up RAM; more rows = more RAM used).

Dean