Cassandra and read latency

Question

I installed Cassandra from https://bitnami.com/stack/cassandra on a cloud machine, then cloned that machine so that I have two: one runs the Cassandra server (a single-node cluster), and the other acts as a client that issues queries to the server.

I used YCSB (https://github.com/brianfrankcooper/YCSB) to run the benchmark. The READ latency measured on the server was very low, on the order of microseconds (around 50 to 100 us for both the 99th percentile and the maximum), as reported by "nodetool cfhistograms <'db'> <'table'>" and "nodetool cfstats <'db'>". Most likely all the data was served from cache, i.e. all SSTables were in cache.

But the end-to-end latency observed from the client (the other machine) in the YCSB tests was high: the average was 2000 us. Why is the end-to-end latency 2000 us when the server reports about 100 us? Network latency is also low, around 200 us as measured with ping. I want the Cassandra server to respond as quickly as possible. Can somebody help?

Chris Lohfink · Accepted Answer · 2017-07-28T17:59:33

To start with, cfhistograms measures local read latency, which is only the time to read from the memtables and merge the result with the SSTables. It does not include coordination; for that, check proxyhistograms.
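To put the gap in numbers, here is a rough back-of-the-envelope sketch in Python using only the figures from the question. The function name is made up for illustration, and the inputs mix an average with a 99th percentile, so treat the result as an order-of-magnitude estimate rather than a measurement.

```python
def unexplained_latency_us(client_avg_us, server_local_us, network_rtt_us):
    """Latency left over after subtracting what the server and ping report.

    Hypothetical helper for illustration only. The inputs are different
    statistics (average vs. percentile), so the result is a rough estimate.
    """
    return client_avg_us - server_local_us - network_rtt_us


# Figures from the question: 2000 us seen by YCSB, about 100 us local read
# latency from cfhistograms, about 200 us round trip from ping.
remaining = unexplained_latency_us(2000, 100, 200)
print(f"unexplained: {remaining} us")  # unexplained: 1700 us
```

So roughly 1700 us of the client-side figure is spent somewhere that neither cfhistograms nor ping measures.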

Even then, you should expect the client-side time to deviate. Beyond network latency, there is latency in the kernel and deserialization time in the client. The incoming network time and the server-side CQL deserialization are not included either. If a full or young-generation GC (YGC) occurs during that time, it may also not be included in the C* latency figure, and such a pause can easily be 1-500 ms. Depending on version and configuration, the client also does some request coalescing (up to 10 us). You can easily expect 1 ms of delay in the JVM just for reaching a safepoint for a YGC or for revoking a biased lock (if biased locking is enabled, which depends on the version); if that happens before the request's "start time" is recorded, it is not included. Sub-millisecond latency on a TCP network can also vary considerably with Nagle's algorithm (if enabled) and with the TCP window, so the 200 us average seen with ICMP ping may not match the actual TCP round-trip time.
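If you want to compare the ping figure against an actual TCP round trip, something like the following Python sketch can help. It is not part of Cassandra or YCSB: the echo server and both helper functions are made up for illustration, and it runs over loopback by default. To measure your real network path you would run the echo side on the server machine and point the client at its address.

```python
import socket
import statistics
import threading
import time


def run_echo_server(server_sock):
    """Accept one connection and echo every byte back until the peer closes."""
    conn, _ = server_sock.accept()
    with conn:
        # Disable Nagle's algorithm on the reply path as well.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def start_local_echo_server():
    """Start the echo server on an ephemeral loopback port in a daemon thread."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen(1)
    thread = threading.Thread(
        target=run_echo_server, args=(server_sock,), daemon=True
    )
    thread.start()
    return server_sock, thread


def measure_tcp_rtt_us(host, port, samples=200, nodelay=True):
    """Send a 1-byte request `samples` times and time each echo, in microseconds."""
    rtts = []
    with socket.create_connection((host, port)) as sock:
        if nodelay:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(samples):
            start = time.perf_counter()
            sock.sendall(b"x")
            sock.recv(1)  # wait for the echoed byte
            rtts.append((time.perf_counter() - start) * 1e6)
    return rtts


if __name__ == "__main__":
    server_sock, _ = start_local_echo_server()
    port = server_sock.getsockname()[1]
    rtts = measure_tcp_rtt_us("127.0.0.1", port)
    server_sock.close()
    print(f"median TCP round trip: {statistics.median(rtts):.1f} us")
    print(f"max TCP round trip:    {max(rtts):.1f} us")
```

The timing here also includes Python's own overhead, so the absolute numbers will be higher than what a Java client sees. The useful part is comparing the median and the maximum against what ping reports for the same pair of machines.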
