We have a Cassandra cluster and would like to know whether its current performance is normal and what we can do to improve it.
The cluster is formed of 3 nodes located in the same datacenter, with a total capacity of 465GB and 2GB of heap per node. Each node has 8 cores and 8GB of RAM. The versions of the components are: cqlsh 5.0.1 | Cassandra 2.1.11.872 | DSE 4.7.4 | CQL spec 3.2.1 | Native protocol v3
The workload is described as follows:
- The keyspace uses the org.apache.cassandra.locator.SimpleStrategy placement strategy with a replication factor of 3 (this is very important for us)
- The workload consists mainly of write operations into a single table. The table schema is as follows:
CREATE TABLE aiceweb.records (
    process_id timeuuid,
    partition_key int,
    collected_at timestamp,
    received_at timestamp,
    value text,
    PRIMARY KEY ((process_id, partition_key), collected_at, received_at)
) WITH CLUSTERING ORDER BY (collected_at DESC, received_at ASC)
    AND read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.1
    AND gc_grace_seconds = 864000
    AND bloom_filter_fp_chance = 0.01
    AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
    AND comment = ''
    AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
    AND compression = { 'sstable_compression' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
    AND default_time_to_live = 0
    AND speculative_retry = '99.0PERCENTILE'
    AND min_index_interval = 128
    AND max_index_interval = 2048;
Write operations come from a NodeJS-based API server, using the Node.js driver provided by DataStax (recently updated from version 2.1.1 to 3.2.0). The code in charge of performing the write requests groups write operations by primary key and additionally limits the request size to 500 INSERTs per request. Each write request is performed as a BATCH, and the only options explicitly set are prepare: true and logged: false.
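For reference, the write path looks roughly like the minimal sketch below. This is not the actual API code: the contact points, row field names, and the writeGroup helper are placeholders, and only the table schema, the 500-row cap, and the prepare/logged options come from the description above.

// Minimal sketch of the batched write path (illustrative, not the real code).
const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['10.0.0.1', '10.0.0.2', '10.0.0.3'],   // placeholder node addresses
  keyspace: 'aiceweb'
});

const INSERT_CQL =
  'INSERT INTO records (process_id, partition_key, collected_at, received_at, value) ' +
  'VALUES (?, ?, ?, ?, ?)';

// rows: up to 500 entries, all sharing the same (process_id, partition_key),
// already grouped upstream by the API server.
function writeGroup(rows, callback) {
  const queries = rows.map(function (r) {
    return {
      query: INSERT_CQL,
      params: [r.processId, r.partitionKey, r.collectedAt, r.receivedAt, r.value]
    };
  });
  // Unlogged batch of prepared statements, as described above.
  client.batch(queries, { prepare: true, logged: false }, callback);
}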
OpsCenter reflects a historical level of less than one request per second over the last year with this setup (each write request being a BATCH of up to 500 operations directed to the same table and the same partition). Write request latency has been around 1.6ms for 90% of requests for almost the entire year, but lately it has increased to more than 2.6ms for 90% of requests. OS load has been below 2.0 and disk utilization has been below 5% most of the time, with a few peaks at 7%. Average heap usage has been 1.3GB the entire year with peaks at 1.6GB, although this peak has been rising over the last month.
The problem with this setup is that API performance has been degrading all year. Currently the BATCH operation can take from 300ms to more than 12s (leading to an operation timeout). In some cases the NodeJS driver reports all Cassandra nodes as down even when OpsCenter reports all nodes alive and healthy.
Compaction stats always show 0 on each node, and nodetool tpstats shows something like:
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
CounterMutationStage              0         0          10554         0                 0
ReadStage                         0         0         687567         0                 0
RequestResponseStage              0         0         767898         0                 0
MutationStage                     0         0         393407         0                 0
ReadRepairStage                   0         0            411         0                 0
GossipStage                       0         0        1314414         0                 0
CacheCleanupExecutor              0         0             48         0                 0
MigrationStage                    0         0              0         0                 0
ValidationExecutor                0         0            126         0                 0
Sampler                           0         0              0         0                 0
MemtableReclaimMemory             0         0            497         0                 0
InternalResponseStage             0         0            126         0                 0
AntiEntropyStage                  0         0            630         0                 0
MiscStage                         0         0              0         0                 0
CommitLogArchiver                 0         0              0         0                 0
MemtableFlushWriter               0         0            485         0                 0
PendingRangeCalculator            0         0              4         0                 0
MemtablePostFlush                 0         0           7879         0                 0
CompactionExecutor                0         0         263599         0                 0
AntiEntropySessions               0         0              3         0                 0
HintedHandoff                     0         0              8         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                         0
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0
Any help or suggestion with this problem will be deeply appreciated. Feel free to request any other information you need to analyze it.
Best regards