1
vote

I have a 4-node Cassandra 2.1.13 cluster with the configuration below.

Each node has 32 GB RAM, an 8 GB max heap size, and a 250 GB hard disk (not SSD).

I am running a load test on writes and reads. I wrote a multi-threaded program that creates 50 million records; each row has 30 columns.

I was able to insert the 50 million records in 84 minutes, at a rate of about 9.5K inserts per second.

Next, I tried to read those 50 million records randomly using 32 clients, and achieved about 28K reads per second.

The problem is that after some time memory fills up, with most of it (almost 20 GB) used as cache. Eventually the system hangs because it is out of memory.

If I clear the cached memory, my read throughput drops to about 100 reads per second.
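For reference, "clearing" the cached memory here means dropping the Linux page cache; a minimal sketch, assuming a Linux /proc filesystem (the drop itself needs root, which is why it is left commented out):

```shell
# Show how much memory the page cache currently holds (no root needed).
grep -E '^(Buffers|Cached):' /proc/meminfo

# Dropping clean page-cache entries requires root; after this, reads that
# used to be served from RAM go back to disk, which explains the slowdown:
# sync && echo 1 > /proc/sys/vm/drop_caches
```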

How should I manage the cache memory without affecting read performance?

Let me know if you need any more information.

How do you check that "memory gets full"? Are there any OOM exceptions in the Cassandra logs? Did you configure swap space for the system? – Stefan Podkowinski
Using the top command I can see there is less than 500 MB free; 11 GB is used and the rest is cached. Swap is disabled. – Ramkumar KS
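A note on interpreting that top output: "cached" memory is reclaimable, so the kernel's MemAvailable estimate, not the free column, shows how close the machine really is to out-of-memory. A quick check, assuming a Linux /proc filesystem (MemAvailable is reported by kernel 3.14 and later):

```shell
# Most of "cached" can be reclaimed on demand; MemAvailable estimates how
# much memory applications can still use without swapping.
awk '/^MemAvailable:/ {printf "available: %.1f GiB\n", $2 / 1048576}' /proc/meminfo
```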

1 Answer

1
vote

What you noticed is the Linux page cache (disk cache), which serves data from RAM instead of going to disk in order to speed up read access. Please make sure you understand how it works, e.g. see here.
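The effect is easy to demonstrate by reading the same file twice; a small sketch (the /tmp path and file size are arbitrary):

```shell
# Create a 64 MB test file, then read it twice. The second read is served
# from the page cache in RAM and is typically much faster than a cold read.
dd if=/dev/zero of=/tmp/pagecache_demo bs=1M count=64 2>/dev/null
time cat /tmp/pagecache_demo >/dev/null   # may have to touch disk
time cat /tmp/pagecache_demo >/dev/null   # served from the page cache
rm /tmp/pagecache_demo
```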

As you're already using top, I'd recommend adding "cache misses" to the overview as well (hit F and select nMaj). This shows you whenever a disk read cannot be served by the cache. You should see an increase in misses once the page cache becomes saturated.
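If you'd rather script this than watch top, the same nMaj counter is field 12 (majflt) of /proc/&lt;pid&gt;/stat; a sketch using the shell's own PID as a stand-in for the Cassandra process:

```shell
pid=$$   # substitute the Cassandra PID here
# Field 12 of /proc/<pid>/stat is majflt: page faults that could not be
# served from the page cache and required a disk read (top's nMaj).
# (Field counting assumes the process name contains no spaces.)
awk '{print "major page faults:", $12}' /proc/"$pid"/stat
```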

"How should I manage my cache memory without affecting read performance?"

The cache is fully managed by Linux and does not require any action on your side.