Please help me understand what I'm missing. One node in our cluster behaves strangely on a SELECT with LIMIT and ORDER BY DESC clauses:
SELECT cid FROM test_cf WHERE uid = 0x50236b6de695baa1140004bf ORDER BY tuuid DESC LIMIT 1000;
Tracing output (excerpt; columns are activity | timestamp | source | source_elapsed in microseconds):
…
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:25.117000 | 10.0.23.15 | 7862
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:25.136000 | 10.0.25.57 | 6283
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:38.568000 | 10.0.24.51 | 457931
…
10.0.25.56 - coordinator node
10.0.23.15, 10.0.24.51, 10.0.25.57 - replica nodes holding the data
The coordinator receives the response from 10.0.24.51 about 13 seconds later than from the other replicas. Why does this happen, and how can I fix it?
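The 13 second figure can be read straight off the trace rows above. A small Python sketch of that arithmetic, using the three REQUEST_RESPONSE rows copied from the excerpt (the names TRACE_ROWS and response_gaps are just for illustration). It assumes, as in Cassandra's tracing output, that each timestamp is recorded by the node that logged the event, so the gap between nodes also includes any clock difference between them, while source_elapsed is measured locally on each replica:

```python
from datetime import datetime

# REQUEST_RESPONSE rows from the trace excerpt: (timestamp, source, source_elapsed in microseconds).
# Assumption: each timestamp comes from the clock of the node that logged the event.
TRACE_ROWS = [
    ("2016-02-29 22:17:25.117000", "10.0.23.15", 7862),
    ("2016-02-29 22:17:25.136000", "10.0.25.57", 6283),
    ("2016-02-29 22:17:38.568000", "10.0.24.51", 457931),
]

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def response_gaps(rows):
    """Return (source, gap_s, elapsed_ms) for each replica.

    gap_s: delay of this replica's response timestamp after the earliest one.
    elapsed_ms: the replica's own source_elapsed, converted from microseconds.
    """
    parsed = [(datetime.strptime(ts, TS_FORMAT), src, us) for ts, src, us in rows]
    first = min(t for t, _, _ in parsed)
    return [(src, (t - first).total_seconds(), us / 1000.0) for t, src, us in parsed]


if __name__ == "__main__":
    for src, gap_s, elapsed_ms in response_gaps(TRACE_ROWS):
        print(f"{src}: +{gap_s:.3f} s after first response, source_elapsed {elapsed_ms:.1f} ms")
```

Comparing the two columns for 10.0.24.51 separates the delay between the timestamps from the time the replica itself reports having spent on the request.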
The partition for this key (uid = 0x50236b6de695baa1140004bf) contains about 300 rows.
Everything works fine if we use ORDER BY ASC (our clustering order) or a LIMIT smaller than the number of rows in this partition.
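Since ASC reads of this partition behave normally and the partition is small (about 300 rows), one possible workaround is to read in clustering order and reverse the rows in the application. This works around the symptom rather than fixing the underlying cause. A minimal Python sketch of the client-side step, assuming the rows were already fetched with ORDER BY tuuid ASC; the helper newest_first is hypothetical:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def newest_first(rows_in_clustering_order: Sequence[T], limit: int) -> List[T]:
    """Hypothetical helper: emulate ORDER BY tuuid DESC LIMIT n on the client.

    Assumes the whole partition was fetched with ORDER BY tuuid ASC, so the
    last element carries the newest tuuid.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # Reverse into descending tuuid order, then keep at most `limit` rows.
    return list(reversed(rows_in_clustering_order))[:limit]


if __name__ == "__main__":
    # Stand-in values for cid rows returned in ascending tuuid order.
    asc_rows = ["cid-1", "cid-2", "cid-3"]
    # A LIMIT above the row count simply returns every row, newest first.
    print(newest_first(asc_rows, 1000))
```

This only stays cheap while partitions remain small, because every row of the partition is transferred to the client before trimming.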
The Cassandra (v2.2.5) cluster has 25 nodes, and each node holds about 400 GB of data.
The cluster runs in AWS, with the nodes evenly distributed across 3 subnets in a VPC. The instance type is c3.4xlarge (16 vCPUs, 30 GB RAM), with EBS-backed storage (1 TB General Purpose SSD).
The keyspace replication factor (RF) is 3.
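For scale, the figures above give a rough estimate of the cluster's data volume. A small Python sketch of the arithmetic, assuming "1 TB" means 1000 GB and ignoring overheads such as compaction headroom or uneven token ownership:

```python
NODES = 25
DATA_PER_NODE_GB = 400
REPLICATION_FACTOR = 3
EBS_VOLUME_GB = 1000  # assumption: the "1 TB" volume taken as 1000 GB

# Every row is stored on RF nodes, so on-disk volume is roughly RF x unique data.
total_on_disk_gb = NODES * DATA_PER_NODE_GB
unique_data_gb = total_on_disk_gb / REPLICATION_FACTOR
volume_fill = DATA_PER_NODE_GB / EBS_VOLUME_GB

print(f"on disk across cluster: {total_on_disk_gb} GB")
print(f"approx. unique data:    {unique_data_gb:.0f} GB")
print(f"each EBS volume filled: {volume_fill:.0%}")
```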
Column family:
CREATE TABLE test_cf (
    uid blob,
    tuuid timeuuid,
    cid text,
    cuid blob,
    PRIMARY KEY (uid, tuuid)
) WITH CLUSTERING ORDER BY (tuuid ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 86400
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
nodetool gcstats on the three replicas:
Node        Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
10.0.25.57  1208504        368                  4559                   73                     553798792712       58           305691840
10.0.23.15  1445602        369                  3120                   57                     381929718000       38           277907601
10.0.24.51  1174966        397                  4137                   69                     1900387479552      45           304448986
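To compare these GC numbers with the roughly 13 second delay, a short Python sketch derives the average pause per collection, the share of each reporting interval spent in GC, and the longest pause per node. It assumes the gcstats intervals above actually cover the slow query (nodetool gcstats reports statistics gathered since it was last run). The 13.451 s gap is the difference between the 10.0.24.51 and 10.0.23.15 response timestamps in the trace:

```python
# (node, interval_ms, max_gc_ms, total_gc_ms, collections) from the gcstats table above.
GCSTATS = [
    ("10.0.25.57", 1208504, 368, 4559, 58),
    ("10.0.23.15", 1445602, 369, 3120, 38),
    ("10.0.24.51", 1174966, 397, 4137, 45),
]

# 22:17:38.568 - 22:17:25.117 from the trace excerpt.
SLOW_RESPONSE_GAP_S = 13.451


def summarize(rows):
    """Derive average pause, GC share of the interval and longest pause per node."""
    summary = {}
    for node, interval_ms, max_gc_ms, total_gc_ms, collections in rows:
        summary[node] = {
            "avg_pause_ms": total_gc_ms / collections,
            "gc_share_pct": 100.0 * total_gc_ms / interval_ms,
            "max_pause_s": max_gc_ms / 1000.0,
        }
    return summary


if __name__ == "__main__":
    for node, s in summarize(GCSTATS).items():
        print(
            f"{node}: avg pause {s['avg_pause_ms']:.1f} ms, "
            f"GC {s['gc_share_pct']:.2f}% of interval, "
            f"longest pause {s['max_pause_s']:.3f} s "
            f"({s['max_pause_s'] / SLOW_RESPONSE_GAP_S:.1%} of the gap)"
        )
```

On these numbers the longest reported collection on any replica is under 0.4 s, so a single pause of the reported size would not account for a 13 s delay by itself.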