We are suddenly observing high write latency in metrics for one table (devices).
This is a tiny table with fewer than 100 entries in which we update a field regularly.
This is on a 3-node cluster with RF=3. Each node has 8 GB of RAM. We are running Cassandra 3.11.4 in Docker.
There is nothing unusual in logs. The application is running smoothly as well.
nodetool tablehistograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             0.00         263.21          0.00             258          17
75%             0.00        1131.75          0.00             372          20
95%             0.00       12108.97          0.00             642          29
98%             0.00       25109.16          0.00             642          35
99%             0.00       43388.63          0.00             642          35
Min             0.00           8.24          0.00              51           0
Max             0.00      155469.30          0.00             770          35
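To make the write-latency column easier to read, here is a minimal Python sketch that converts the values above from microseconds to milliseconds. The numbers are copied from the histogram; the helper name `to_ms` is just something I made up for illustration.

```python
# Write-latency column from the tablehistograms output above, in microseconds.
write_latency_us = {
    "50%": 263.21,
    "75%": 1131.75,
    "95%": 12108.97,
    "98%": 25109.16,
    "99%": 43388.63,
    "Min": 8.24,
    "Max": 155469.30,
}


def to_ms(micros):
    """Convert microseconds to milliseconds (hypothetical helper)."""
    return micros / 1000.0


for label, micros in write_latency_us.items():
    # Print each percentile with its latency in milliseconds.
    print(f"{label:>4}: {to_ms(micros):9.2f} ms")
```

So the median write is well under a millisecond, while the tail reaches tens of milliseconds and the maximum is over 150 ms.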
nodetool status
Datacenter: datacenter-prod
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns (effective)  Host ID                               Rack
UN  10.164.0.23  2.62 GiB  256     100.0%            e7e2a38a-d4f3-4758-a345-73fcffe26035  rack1
UN  10.164.0.24  2.61 GiB  256     100.0%            0c18b8e4-5ca2-4fb5-9e8c-663b74909fbb  rack1
UN  10.164.0.58  2.62 GiB  256     100.0%            547c0746-72a8-4fec-812a-8b926d2426ae  rack1
What is going on? Are the stats lying or is there an issue coming up?
EDIT: I was able to narrow the issue down to one of the nodes. The exporter on node 2 is showing:
cassandra_stats{cluster="Prod Cluster 2",datacenter="datacenter-prod",keyspace="iot_data",table="devices",name="org:apache:cassandra:metrics:table:iot_data:devices:writelatency:99thpercentile",} 268650.95
While node 1 and node 3 look like this:
cassandra_stats{cluster="Prod Cluster 2",datacenter="datacenter-prod",keyspace="iot_data",table="devices",name="org:apache:cassandra:metrics:table:iot_data:devices:writelatency:99thpercentile",} 10090.808
But I still don't know what is causing this on node 2. It is under no load, and memory usage is fine as well. Any ideas?
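To quantify the gap between the nodes, here is a rough Python sketch that parses the two exporter lines quoted above and compares the 99th percentile write latency. It assumes the exporter passes through the raw JMX value in microseconds (the same unit nodetool reports); the regexes and the function names `parse_sample` and `p99_ms_by_node` are my own, not part of any exporter API.

```python
import re

# Matches one exposition line: metric{labels} value
LINE_RE = re.compile(r'^(?P<metric>\w+)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
# Matches individual key="value" label pairs.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Lines copied verbatim from the exporter output in this post.
samples = {
    "node2": 'cassandra_stats{cluster="Prod Cluster 2",datacenter="datacenter-prod",keyspace="iot_data",table="devices",name="org:apache:cassandra:metrics:table:iot_data:devices:writelatency:99thpercentile",} 268650.95',
    "node1/node3": 'cassandra_stats{cluster="Prod Cluster 2",datacenter="datacenter-prod",keyspace="iot_data",table="devices",name="org:apache:cassandra:metrics:table:iot_data:devices:writelatency:99thpercentile",} 10090.808',
}


def parse_sample(line):
    """Return (labels dict, float value) for one exporter line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unparseable line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return labels, float(match.group("value"))


def p99_ms_by_node(lines_by_node):
    """Map node name to p99 write latency in ms (assumes microsecond input)."""
    result = {}
    for node, line in lines_by_node.items():
        labels, value = parse_sample(line)
        if labels["name"].endswith("writelatency:99thpercentile"):
            result[node] = value / 1000.0
    return result


p99 = p99_ms_by_node(samples)
ratio = p99["node2"] / p99["node1/node3"]
for node, ms in p99.items():
    print(f"{node}: p99 write latency {ms:.2f} ms")
print(f"node2 is {ratio:.1f}x slower at p99")
```

Under that unit assumption, node 2 sits at roughly 269 ms at p99 while the other two nodes are around 10 ms.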
