We have test code that uses a Java client to perform Cassandra INSERT/READ/QUERY operations against a test schema. We have built a single-node setup on a physical server with the following configuration.
- OS is Linux SuSE 11.SP2
- Memory on physical server is 32GB
- Swap memory is 32GB
- CPU has 4 cores at 2GHz each
- Commit log resides on a 100GB SSD disk (RAID-0 and local to the system)
- Data directory resides on 7TB of SAS disk (5 SAS disks configured as RAID-0 and local to the system)
- JRE version 1.7.0.25
- Cassandra version 1.2.5 (default partitioner)
- MAX_HEAP_SIZE 8GB
- HEAP_NEWSIZE 400MB (100MB per core, as per the Cassandra recommendation)
NOTE: Increasing the CPU from 4 cores to 8 cores improved performance, but only slightly.
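For reference, the young-generation arithmetic behind the HEAP_NEWSIZE value above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the cap at one quarter of the maximum heap is our understanding of how the stock cassandra-env.sh computes its default, so please treat it as an assumption.

```java
// Sketch of the "100MB of young generation per core" rule of thumb.
// HeapSizing and heapNewSizeMb are hypothetical names used for illustration only.
final class HeapSizing {
    // Rule of thumb quoted in the configuration list above.
    static final int MB_PER_CORE = 100;

    // Assumption: the default is also capped at 1/4 of the maximum heap.
    static int heapNewSizeMb(int cores, int maxHeapMb) {
        if (cores <= 0 || maxHeapMb <= 0) {
            throw new IllegalArgumentException("cores and maxHeapMb must be positive");
        }
        int perCoreRule = cores * MB_PER_CORE;
        int quarterOfHeap = maxHeapMb / 4;
        return Math.min(perCoreRule, quarterOfHeap);
    }

    public static void main(String[] args) {
        int maxHeapMb = 8 * 1024; // MAX_HEAP_SIZE of 8GB, as in our setup
        System.out.println("4 cores: " + heapNewSizeMb(4, maxHeapMb) + "MB");
        System.out.println("8 cores: " + heapNewSizeMb(8, maxHeapMb) + "MB");
    }
}
```

Under this rule a 4-core server with an 8GB heap gets 400MB, and an 8-core server would get 800MB, since both stay below the 2048MB cap.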
We are using the test schema below, which has 5 secondary indexes.
"CREATE TABLE test_table (
hash_key text PRIMARY KEY,
ctime timestamp,
ctime_bucket bigint,
extension text,
filename text,
filename_frag text,
filesize bigint,
filesize_bucket bigint,
hostname text,
mtime timestamp,
mtime_bucket bigint
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX test_table_ctime_bucket_idx ON test_table (ctime_bucket);
CREATE INDEX test_table_extension_idx ON test_table (extension);
CREATE INDEX test_table_filename_frag_idx ON test_table (filename_frag);
CREATE INDEX test_table_filesize_bucket_idx ON test_table (filesize_bucket);
CREATE INDEX test_table_mtime_bucket_idx ON test_table (mtime_bucket);"
We are running the following INSERT and READ tests with default tuning parameters, but both read and write performance are very slow, and reads are drastically slower than writes. When we remove the secondary indexes from the schema above, performance improves by roughly 2x, although we still feel there is room to improve it further by tuning Cassandra parameters. With the secondary indexes in place, performance is very poor.
- 1M INSERT provides around 7K ops/sec
- 10M INSERT provides around 5K ops/sec (performance drops slightly)
- 100M INSERT provides around 5K ops/sec
- 1000M INSERT provides around 4.5K ops/sec
If we remove the secondary indexes, we get around 11K ops/sec for all of the INSERT workloads listed above.
- 1M READ provides around 4.5K ops/sec
- 10M READ provides only around 225 ops/sec (performance drops drastically)
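To make clear what the ops/sec figures above mean, here is a minimal sketch of the arithmetic: operations completed divided by elapsed wall-clock time, plus the ratio between two rates. It is not our actual test client; the Throughput class and its method names are hypothetical, and the Cassandra call itself is left out.

```java
// Sketch of the throughput arithmetic behind the figures above.
// Throughput, opsPerSec and slowdown are hypothetical names for illustration.
final class Throughput {
    // Operations per second from an operation count and elapsed nanoseconds.
    static double opsPerSec(long operations, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return operations * 1e9 / elapsedNanos;
    }

    // How many times slower the observed rate is than the baseline rate.
    static double slowdown(double baselineOpsPerSec, double observedOpsPerSec) {
        if (observedOpsPerSec <= 0) {
            throw new IllegalArgumentException("observedOpsPerSec must be positive");
        }
        return baselineOpsPerSec / observedOpsPerSec;
    }

    public static void main(String[] args) {
        // Timing pattern: wrap the batch of operations with System.nanoTime().
        long operations = 1_000_000L;
        long start = System.nanoTime();
        long sink = 0;
        for (long i = 0; i < operations; i++) {
            sink += i; // placeholder for one INSERT or READ issued by the client
        }
        long elapsed = Math.max(1L, System.nanoTime() - start);
        System.out.println("placeholder loop ops/sec: " + opsPerSec(operations, elapsed)
                + " (sink=" + sink + ")");

        // READ rate at 1M rows versus 10M rows, using the figures reported above.
        System.out.println("READ slowdown, 1M to 10M: " + slowdown(4500.0, 225.0) + "x");
    }
}
```

With the reported numbers, the READ rate at 10M rows is 20 times lower than at 1M rows, whereas the INSERT rate between 1M and 1000M rows falls only from about 7K to about 4.5K ops/sec.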
We would like to know from your expert team which specific tuning parameters should be applied for WRITE and READ operations to get better performance. How can we defer compaction and GC so that they do not become a bottleneck during these operations? If there are any system-specific tunings to apply, we would like to know about those as well.
We are trying the following tuning parameters (in cassandra.yaml and cassandra-env.sh), but we are not getting much improvement in write or read performance.