
Please help me understand what I missed. I see strange behavior from one cluster node on a SELECT with LIMIT and ORDER BY DESC clauses:

SELECT cid FROM test_cf WHERE uid = 0x50236b6de695baa1140004bf ORDER BY tuuid DESC LIMIT 1000;

TRACING (only part):


Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:25.117000 | 10.0.23.15 | 7862
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:25.136000 | 10.0.25.57 | 6283
Sending REQUEST_RESPONSE message to /10.0.25.56 [MessagingService-Outgoing-/10.0.25.56] | 2016-02-29 22:17:38.568000 | 10.0.24.51 | 457931

10.0.25.56 - coordinator node
10.0.23.15, 10.0.24.51, 10.0.25.57 - nodes with data

The coordinator gets the response from 10.0.24.51 13 seconds later than from the other nodes! Why? How can I fix it?
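
To make the gap concrete, the timestamps from the trace excerpt can be compared directly. This is a minimal Python sketch that uses only the three timestamp and source values shown above; the dictionary and variable names are illustrative.

```python
from datetime import datetime

# Timestamp and source columns copied from the trace excerpt above.
responses = {
    "10.0.23.15": "2016-02-29 22:17:25.117000",
    "10.0.25.57": "2016-02-29 22:17:25.136000",
    "10.0.24.51": "2016-02-29 22:17:38.568000",
}

FMT = "%Y-%m-%d %H:%M:%S.%f"
times = {node: datetime.strptime(ts, FMT) for node, ts in responses.items()}
first = min(times.values())

# Delay of each replica's response relative to the earliest one, in seconds.
delays = {node: (t - first).total_seconds() for node, t in times.items()}

for node, delay in sorted(delays.items(), key=lambda kv: kv[1]):
    print(f"{node}: +{delay:.3f} s")
```

The first two replicas respond within about 20 ms of each other, while 10.0.24.51 sends its response roughly 13.45 seconds after the first.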

The number of rows for this partition key (uid = 0x50236b6de695baa1140004bf) is about 300.

Everything is fine if we use ORDER BY ASC (our clustering order) or a LIMIT value smaller than the number of rows for this partition key.

The Cassandra (v2.2.5) cluster contains 25 nodes. Every node holds about 400 GB of data.

The cluster runs in AWS. Nodes are evenly distributed over 3 subnets in a VPC. The instance type for the nodes is c3.4xlarge (16 CPU cores, 30 GB RAM). We use EBS-backed storage (1 TB GP SSD).

Keyspace RF is 3.

Column family:

CREATE TABLE test_cf (
    uid blob,  
    tuuid timeuuid,  
    cid text,  
    cuid blob,  
    PRIMARY KEY (uid, tuuid)  
) WITH CLUSTERING ORDER BY (tuuid ASC)  
    AND bloom_filter_fp_chance = 0.01  
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'  
    AND comment = ''  
    AND compaction = {'class':'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression':'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1  
    AND default_time_to_live = 0  
    AND gc_grace_seconds = 86400  
    AND max_index_interval = 2048  
    AND memtable_flush_period_in_ms = 0  
    AND min_index_interval = 128  
    AND read_repair_chance = 0.0  
    AND speculative_retry = '99.0PERCENTILE';  

nodetool gcstats (10.0.25.57):

Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
      1208504                  368                   4559                     73       553798792712           58            305691840

nodetool gcstats (10.0.23.15):

Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
      1445602                  369                   3120                     57       381929718000           38            277907601

nodetool gcstats (10.0.24.51):

Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
      1174966                  397                   4137                     69      1900387479552           45            304448986
What is the RF? What is the consistency level? What else is running on that node (repair? compaction?)? - Chris Gerlt
RF is 3. Consistency level is ALL. Nothing else is running. - Alexander D.

1 Answer


This could be due to a number of factors, both related and unrelated to Cassandra.

Non-Cassandra Specific

  • How does the hardware (CPU, RAM, disk type: SSD vs. rotational) on this node compare to the other nodes?
  • How is the network configured? Is traffic to this node slower than other nodes? Do you have a routing issue between the nodes?
  • How does the load on this server compare to other nodes?

Cassandra Specific

  • Is the JVM properly configured? Is GC running significantly more frequently than on the other nodes? Check nodetool gcstats on this and other nodes to compare.
  • Has compaction run on this node recently? Check nodetool compactionhistory.
  • Are there any issues with corrupted files on disk?
  • Have you checked system.log to see whether it contains any relevant information?
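
As a rough illustration of the GC comparison, the gcstats figures posted in the question can be normalized by the reporting interval. This is only a sketch: the numbers are copied from the question, and the helper name `summarize` is made up for this example.

```python
# (interval_ms, total_gc_elapsed_ms, collections) copied from the
# nodetool gcstats output posted in the question.
gcstats = {
    "10.0.25.57": (1208504, 4559, 58),
    "10.0.23.15": (1445602, 3120, 38),
    "10.0.24.51": (1174966, 4137, 45),
}

def summarize(interval_ms, total_gc_ms, collections):
    """Return (collections per minute, average pause in ms, % of interval spent in GC)."""
    per_minute = collections / (interval_ms / 60000)
    avg_pause_ms = total_gc_ms / collections
    gc_share_pct = 100 * total_gc_ms / interval_ms
    return per_minute, avg_pause_ms, gc_share_pct

summary = {node: summarize(*values) for node, values in gcstats.items()}

for node, (rate, pause, share) in summary.items():
    print(f"{node}: {rate:.2f} GCs/min, avg pause {pause:.1f} ms, {share:.2f}% of interval in GC")
```

On these figures all three replicas spend well under 1% of the interval in GC, and the slow node (10.0.24.51) does not stand out, so the remaining checks in the list are worth doing as well.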

Besides general Linux troubleshooting, I would suggest comparing the output of specific nodetool commands across the nodes and looking for differences:

https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsNodetool_r.html
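
One way to gather that output side by side is a small wrapper script. The sketch below is only an illustration: it assumes nodetool is on the PATH of the machine running it and that JMX on each replica is reachable remotely via `nodetool -h <host>`; the function names are hypothetical, and the host list is taken from the question.

```python
import subprocess

# Replica addresses from the question; adjust for your cluster.
REPLICAS = ["10.0.23.15", "10.0.25.57", "10.0.24.51"]
# nodetool subcommands mentioned in the answer above.
COMMANDS = ["gcstats", "compactionhistory"]

def run_nodetool(host, command):
    """Run `nodetool -h <host> <command>` and return its stdout as text."""
    result = subprocess.run(
        ["nodetool", "-h", host, command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def collect(hosts, commands, runner=run_nodetool):
    """Return {command: {host: output}} so outputs can be compared per command."""
    return {cmd: {host: runner(host, cmd) for host in hosts} for cmd in commands}

def print_report(report):
    """Print each command's output grouped by host."""
    for cmd, by_host in report.items():
        print(f"== nodetool {cmd} ==")
        for host, output in by_host.items():
            print(f"--- {host} ---")
            print(output)
```

Calling `print_report(collect(REPLICAS, COMMANDS))` prints the output of each command for all three replicas together, which makes it easier to see whether 10.0.24.51 differs from the others.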