
I have a Cassandra 2.0.16 cluster with four nodes. The cluster suffers from an inconsistency problem: when I run nodetool status on each node, the results disagree with one another. The status command also runs very slowly. The results from each node follow below.

The nodes with IPs 192.168.148.181 and 192.168.148.121 are the seed nodes. The cluster has never run repair.

Besides, the CPU usage on 181 and 121 is really high, and the logs show that CMS GC runs very frequently on these nodes. I disconnected all the clients, so there is no read or write load, yet the inconsistency and the heavy GC persist.
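A quick way to quantify those pauses is to pull the GCInspector lines out of the Cassandra system log (a sketch; adjust the log path to your install, e.g. logs/system.log under the Cassandra directory or /var/log/cassandra/system.log):

# Show recent CMS collections and their pause times as logged by Cassandra
grep GCInspector logs/system.log | grep ConcurrentMarkSweep | tail -n 20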

So how do I debug and optimize this cluster?

[cassandra@whaty121 apache-cassandra-2.0.16]$ time bin/nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns   Host ID                               Rack
UN  192.168.148.121  10.86 GB   1       25.0%  1d9ba597-c404-481f-af2b-436493c57227  RAC2
UN  192.168.148.181  10.53 GB   1       25.0%  5d90300f-2fb4-4065-9819-10ece285223d  RAC1
DN  192.168.148.182  10.95 GB   1       25.0%  bcb550df-9429-4cae-9fd2-0bfeea9a5649  RAC4
UN  192.168.148.221  10.49 GB   1       25.0%  6867f8b4-1f54-48fc-aaae-da71bc251970  RAC3

real    8m50.506s
user    39m48.718s
sys     76m48.566s
--------------------------------------------------------------------------------
[cassandra@whaty221 apache-cassandra-2.0.16]$  time bin/nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns   Host ID                               Rack
DN  192.168.148.121  10.86 GB   1       25.0%  1d9ba597-c404-481f-af2b-436493c57227  RAC2
UN  192.168.148.181  10.53 GB   1       25.0%  5d90300f-2fb4-4065-9819-10ece285223d  RAC1
DN  192.168.148.182  10.95 GB   1       25.0%  bcb550df-9429-4cae-9fd2-0bfeea9a5649  RAC4
UN  192.168.148.221  10.49 GB   1       25.0%  6867f8b4-1f54-48fc-aaae-da71bc251970  RAC3

real    0m15.075s
user    0m1.606s
sys     0m0.393s
----------------------------------------------------------------------
[cassandra@whaty181 apache-cassandra-2.0.16]$  time bin/nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns   Host ID                               Rack
DN  192.168.148.121  10.86 GB   1       25.0%  1d9ba597-c404-481f-af2b-436493c57227  RAC2
UN  192.168.148.181  10.53 GB   1       25.0%  5d90300f-2fb4-4065-9819-10ece285223d  RAC1
UN  192.168.148.182  10.95 GB   1       25.0%  bcb550df-9429-4cae-9fd2-0bfeea9a5649  RAC4
UN  192.168.148.221  10.49 GB   1       25.0%  6867f8b4-1f54-48fc-aaae-da71bc251970  RAC3

real    0m25.719s
user    0m2.152s
sys     0m1.228s
-------------------------------------------------------------------------

[cassandra@whaty182 apache-cassandra-2.0.16]$  time bin/nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens  Owns   Host ID                               Rack
DN  192.168.148.121  10.86 GB   1       25.0%  1d9ba597-c404-481f-af2b-436493c57227  RAC2
DN  192.168.148.181  10.53 GB   1       25.0%  5d90300f-2fb4-4065-9819-10ece285223d  RAC1
UN  192.168.148.182  10.95 GB   1       25.0%  bcb550df-9429-4cae-9fd2-0bfeea9a5649  RAC4
DN  192.168.148.221  10.49 GB   1       25.0%  6867f8b4-1f54-48fc-aaae-da71bc251970  RAC3

real    0m17.581s
user    0m1.843s
sys     0m1.632s

I printed a histogram of the objects on the heap:

 num     #instances         #bytes  class name
----------------------------------------------
   1:      58584535     1874705120  java.util.concurrent.FutureTask
   2:      58585802     1406059248  java.util.concurrent.Executors$RunnableAdapter
   3:      58584601     1406030424  java.util.concurrent.LinkedBlockingQueue$Node
   4:      58584534     1406028816  org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask
   5:        214682       24087416  [B
   6:        217294       10430112  java.nio.HeapByteBuffer
   7:         37591        5977528  [C
   8:         41843        5676048  <constMethodKlass>
   9:         41843        5366192  <methodKlass>
  10:          4126        4606080  <constantPoolKlass>
  11:        100060        4002400  org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
  12:          4126        2832176  <instanceKlassKlass>
  13:          4880        2686216  [J
  14:          3619        2678784  <constantPoolCacheKlass>
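For reference, a histogram like the one above can be captured on a live node with the stock JDK tools (a sketch, assuming one Cassandra JVM per host and jmap on the PATH):

# Find the Cassandra JVM's PID (assumes a single Cassandra process)
PID=$(pgrep -f CassandraDaemon | head -n 1)

# Print the top heap consumers; plain -histo avoids the full GC
# that -histo:live would force on an already GC-pressured node
jmap -histo "$PID" | head -n 20

The striking entries are rows 1-4: roughly 58 million queued FutureTask/BackgroundCompactionTask objects holding about 6 GB of heap in total, so the compaction queue itself is what is filling memory.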

I ran nodetool compactionstats on one node and found that a huge number of compaction tasks has accumulated in the 3 days since I restarted the cluster (consistent with the ~58 million BackgroundCompactionTask instances in the heap histogram above):

[cassandra@whaty181 apache-cassandra-2.0.16]$ bin/nodetool compactionstats
pending tasks: 64642341
Active compaction remaining time :        n/a
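If the backlog never drains, one thing to try is removing the compaction throughput throttle and watching whether the pending count falls (a sketch; the value is in MB/s and 0 disables throttling, so restore a sensible limit afterwards):

# Temporarily unthrottle compaction (the default limit is 16 MB/s)
bin/nodetool setcompactionthroughput 0

# Re-check the backlog after a while
bin/nodetool compactionstats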

I also checked nodetool compactionhistory. Part of the output follows; it shows many records for the system keyspace.

Compaction History: 
id                                       keyspace_name      columnfamily_name            compacted_at              bytes_in       bytes_out      rows_merged
8e4f8830-b04f-11e5-a211-45b7aa88107c     system             sstable_activity             1451629144115             3342           915            {4:23}
96a6fcb0-b04b-11e5-a211-45b7aa88107c     system             hints                        1451627440123             18970740       18970740       {1:1}
7c42c940-adac-11e5-8bd4-45b7aa88107c     system             hints                        1451339203540             56969835       56782732       {2:3}
585b97a0-ad98-11e5-8bd4-45b7aa88107c     system             sstable_activity             1451330553370             3700           956            {4:24}
aefc3f10-b1b2-11e5-a211-45b7aa88107c     system             sstable_activity             1451781670273             3201           906            {4:23}
1e76f1b0-b180-11e5-a211-45b7aa88107c     system             sstable_activity             1451759952971             3303           700            {4:23}
e7b75b70-aec2-11e5-8bd4-45b7aa88107c     system             hints                        1451458783911             57690316       57497847       {2:3}
ad102280-af6d-11e5-b1dc-45b7aa88107c     webtrn_study_log_formally  SCORM_STU_COURSE  1451532129448             45671877       41137664       {1:11, 3:1, 4:8}
60906970-aec7-11e5-8bd4-45b7aa88107c     system             sstable_activity             1451460704647             3751           974            {4:25}
88aed310-ad91-11e5-8bd4-45b7aa88107c     system             hints                        1451327627969             56984347       56765328       {2:3}
3ad14f00-af6d-11e5-b1dc-45b7aa88107c     webtrn_study_log_formally  SCORM_STU_COURSE  1451531937776             46696097       38827028       {1:8, 3:2, 4:9}
84df8fb0-b00f-11e5-a211-45b7aa88107c     system             hints                        1451601640491             18970740       18970740       {1:1}
657482e0-ad33-11e5-8bd4-45b7aa88107c     system             sstable_activity             1451287196174             3701           931            {4:24}
9cc8af70-b24a-11e5-a211-45b7aa88107c     system             sstable_activity             1451846923239             3134           773            {4:23}
dcbe5e30-afd0-11e5-a211-45b7aa88107c     system             sstable_activity             1451574729619             3357           790            {4:23}
b285ced0-afa0-11e5-84e3-45b7aa88107c     system             hints                        1451554042941             43310718       42137761       {1:1, 2:2}
119770e0-ad4e-11e5-8bd4-45b7aa88107c     system             hints                        1451298651886             57397441       57190519       {2:3}
f1bb37a0-b204-11e5-a211-45b7aa88107c     system             hints                        1451817000986             17713746 
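Since so much of this history is the system.hints table, it may also be worth checking whether hints are piling up and, if so, dropping them (a sketch, assuming your nodetool build includes truncatehints; truncated hints are never replayed, so run a repair afterwards to re-sync the data):

# Count the stored hints on this node (this query can itself be slow)
echo "SELECT count(*) FROM system.hints;" | bin/cqlsh 192.168.148.181

# Drop all stored hints on this node, then repair once the cluster is stable
bin/nodetool truncatehints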

I tried to flush the node with the high GC, but nodetool flush failed with a read timeout.

The cluster only receives inserts. I shut down the client writes and restarted the cluster 3 days ago, but compaction tasks are still accumulating.

1 Answer


Inconsistency in the nodetool status output is nothing to worry about in itself: it is a consequence of the frequent GCs. During a long GC pause a node is considered down by the other nodes' gossipers, so when a node is GCing heavily it flaps between DN and UN very quickly.
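To watch the flapping as it happens, you can poll each node's gossip view (a sketch using standard shell tooling):

# Refresh this node's view of its peers every 5 seconds; flapping peers
# alternate between NORMAL and dead states in the STATUS lines
watch -n 5 "bin/nodetool gossipinfo | grep -E '^/|STATUS'"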

You must understand what is taking up so much space in the Java heap.
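One way to see exactly which objects dominate is to take a binary heap dump and open it in an offline analyser such as Eclipse MAT (a sketch, assuming JDK tools on the node and enough free disk; note that the JVM pauses while the dump is written):

# Dump the live heap of the Cassandra JVM for offline analysis
PID=$(pgrep -f CassandraDaemon | head -n 1)
jmap -dump:live,format=b,file=/tmp/cassandra-heap.hprof "$PID"

Some specific things to check: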

  • Do you see any StatusLogger output in the Cassandra logs?
  • Using nodetool cfstats, do you see a large system.hints table? Hints are mutations that a coordinator stores when a replica is unreachable, to be replayed once that replica comes back. If your cluster has accumulated a lot of hints, they pressure the heap and cause GCs.
  • Is any compaction going through? Check nodetool compactionstats.
  • Does flushing all column families cool down your cluster? Run nodetool flush on every node; with no arguments it flushes every keyspace and column family (see the sketch below).
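A minimal way to do that from a single box, assuming SSH access and the same install path on every node:

# Flush all memtables on every node in the cluster
for host in 192.168.148.121 192.168.148.181 192.168.148.182 192.168.148.221; do
    ssh cassandra@"$host" "apache-cassandra-2.0.16/bin/nodetool flush"
done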