2
votes

I had a 21 Node Cluster (C* 2.2) of m4.2xlarges, each with 5 Volumes of 1TB SSDs.

When it was 50% full (each node at 500GB * 5 = 2.5 TB), I realised I needed more space so I added a new node.

This new node joined the cluster (from UJ to UN), however the disk usage was at 4.2TB.
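
As a rough sanity check on that number, here is the arithmetic for what the new node would be expected to hold. This is only a sketch: it assumes data is spread evenly across all nodes, which I have not verified, and the variable names are purely illustrative.

```python
# Rough estimate of per-node load after adding one node.
# Assumption (not measured): data is distributed evenly across nodes.
nodes_before = 21
per_node_tb = 0.5 * 5  # 500GB on each of 5 volumes = 2.5 TB per node

total_tb = nodes_before * per_node_tb
expected_tb = total_tb / (nodes_before + 1)

observed_tb = 4.2  # disk usage reported on the new node
ratio = observed_tb / expected_tb

print(f"total data in cluster: {total_tb:.1f} TB")
print(f"expected on each node after the join: {expected_tb:.2f} TB")
print(f"observed on new node: {observed_tb:.1f} TB ({ratio:.2f}x expected)")
```

Under that even-distribution assumption, the new node is holding noticeably more than its expected share.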

I figured this was due to compactions lagging behind and waited for a few days. The disk usage did not change even though compactions were taking place. The new box was really CPU bound, so I moved it to a compute-optimised c4.8xlarge box and, to get this done, raised concurrent_compactors to 20 and disabled compaction throughput throttling (compaction_throughput_mb_per_sec).

In the meantime I stopped all writes to the cluster. The number of pending compactions just keeps going up, and the data on disk is not going down.

What am I doing wrong? System time looks really high. I am using org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, and the current compaction thresholds are min = 4, max = 32.

When I do strace -f -c -p cassandra-pid > strace_count:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 49.57 7431.363672      140392     52933     17755 futex
 30.22 4530.012667      482481      9389           epoll_wait
 11.33 1697.685882     2143543       792           recvfrom
  3.68  551.306817        1596    345500         7 write
  3.58  537.257283    14138350        38        33 restart_syscall
  0.78  117.381206      111262      1055           poll
  0.28   41.738677         636     65675           lseek
  0.14   21.138626        1659     12741           pread
  0.10   15.189009        1838      8265           read
  0.07    9.898101         696     14229           sched_yield
  0.06    8.984107       23831       377           sendto
  0.04    6.148230        9759       630           munmap
  0.04    5.760339       21902       263           mprotect
  0.02    3.154839         992      3181       359 fadvise64
  0.02    3.107529         652      4769       215 stat
  0.01    2.006363      167197        12           msync
  0.01    1.956998        7040       278           mmap
  0.01    1.838682        1155      1592         8 unlink
  0.01    1.080512         602      1794           lstat
  0.01    0.861741         578      1490           close
  0.00    0.626903         562      1116           open
  0.00    0.596450         588      1014           fcntl
  0.00    0.440250         644       684           fstat
  0.00    0.318874         630       506           epoll_ctl
  0.00    0.249772        4625        54           fdatasync
  0.00    0.149440        1660        90           fsync
  0.00    0.093154         647       144           rename
  0.00    0.069017         575       120           statfs
  0.00    0.018136         356        51           getpriority
  0.00    0.014358         598        24           rt_sigprocmask
  0.00    0.011584         161        72           times
  0.00    0.009858         616        16           setsockopt
  0.00    0.009396         940        10           link
  0.00    0.008072          24       336         7 rt_sigreturn
  0.00    0.004960        1240         4           getsockopt
  0.00    0.004926         411        12           sched_getaffinity
  0.00    0.004503         500         9           dup2
  0.00    0.002998         500         6           madvise
  0.00    0.002693         449         6           set_robust_list
  0.00    0.002597         433         6           accept
  0.00    0.002000         333         6           clone
  0.00    0.002000         500         4         2 accept4
  0.00    0.001243         207         6           gettid
  0.00    0.001000         500         2           writev
  0.00    0.001000         500         2           recvmsg
  0.00    0.001000         143         7           getsockname
  0.00    0.001000         500         2           getpeername
  0.00    0.001000         167         6         6 setpriority
  0.00    0.000000           0         1           socket
  0.00    0.000000           0         1           bind
------ ----------- ----------- --------- --------- ----------------
100.00 14990.519464                529320     18392 total

When I do top - 1:

Tasks: 1506 total,   8 running, 1496 sleeping,   0 stopped,   2 zombie
Cpu0  :  0.3%us, 47.3%sy, 10.5%ni, 41.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.7%us, 87.6%sy, 11.7%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  3.2%us, 65.0%sy,  0.0%ni, 31.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 11.6%us, 39.9%sy,  0.0%ni, 48.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  1.0%us, 55.3%sy,  9.2%ni, 34.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.3%us, 98.0%sy,  1.7%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.4%us, 90.7%sy,  1.4%ni,  6.8%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  3.4%us, 20.2%sy,  9.4%ni, 64.0%id,  3.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  1.7%us, 24.9%sy,  0.3%ni, 73.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.7%us, 79.4%sy,  0.7%ni, 18.9%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.7%us, 64.9%sy, 13.6%ni, 14.0%id,  6.8%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :  1.0%us, 50.7%sy,  0.0%ni, 18.6%id, 29.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  0.3%us, 58.9%sy,  0.0%ni, 40.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.3%us, 72.5%sy, 26.8%ni,  0.0%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us, 50.2%sy, 49.8%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 :  0.3%us, 54.2%sy,  0.0%ni, 40.5%id,  5.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 :  0.7%us, 46.3%sy, 19.9%ni, 24.0%id,  9.1%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 :  0.7%us, 68.9%sy,  0.0%ni, 30.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu19 :  5.7%us,  3.4%sy,  0.0%ni, 90.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 :  0.7%us, 44.4%sy,  0.0%ni, 54.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu21 :  1.3%us, 67.8%sy,  0.0%ni, 30.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu22 :  0.7%us, 45.5%sy,  7.3%ni, 42.9%id,  3.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 :  1.3%us, 22.7%sy,  0.0%ni, 75.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu24 :  0.0%us, 65.4%sy,  0.0%ni, 34.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu25 :  0.0%us, 62.0%sy, 12.2%ni, 25.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu26 :  1.3%us, 68.9%sy, 12.6%ni, 17.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu27 :  0.0%us, 64.3%sy, 12.9%ni, 22.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu28 :  0.0%us, 75.8%sy,  0.0%ni, 23.5%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu29 :  0.0%us, 60.3%sy,  1.7%ni, 37.4%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu30 :  0.3%us, 48.3%sy, 12.7%ni, 38.0%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu31 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu32 :  0.0%us, 72.1%sy, 25.2%ni,  0.0%id,  2.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu33 :  0.0%us,100.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu34 :  0.3%us, 66.7%sy,  0.0%ni, 33.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu35 :  0.0%us, 67.7%sy,  0.0%ni, 32.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  61820728k total, 61610932k used,   209796k free,      456k buffers
Swap:        0k total,        0k used,        0k free, 35425968k cached

nodetool compactionstats

      pending tasks: 281
  id   compaction type     keyspace               table     completed         total    unit   progress
  id        Compaction   keyspace_1         table_____4    1591902797    2851758523   bytes     55.82%
  id        Compaction   keyspace_1         table_____1     193582898     567222689   bytes     34.13%
  id        Compaction   keyspace_1         table_____2     187022078    2264168754   bytes      8.26%
  id        Compaction   keyspace_1         table_____1   22841754587   24781014960   bytes     92.17%
  id        Compaction   keyspace_1         table_____5     764633368    3904191508   bytes     19.58%
  id        Compaction   keyspace_1         table_____1    1856076066    2326634436   bytes     79.78%
  id        Compaction   keyspace_1         table_____7     254856804     499133271   bytes     51.06%
  id        Compaction   keyspace_1         table_____8    1406859449    1803885628   bytes     77.99%
  id        Compaction   keyspace_1         table_____7    1734201253    2308801656   bytes     75.11%
  id        Compaction   keyspace_1         table_____1     656195289     931867447   bytes     70.42%
  id        Compaction   keyspace_1         table_____1     657036608    1380870812   bytes     47.58%
  id        Compaction   keyspace_1         table_____1     235054945   18957522878   bytes      1.24%
  id        Compaction   keyspace_1         table____10       2351049       3552009   bytes     66.19%
  id        Compaction   keyspace_1         table_____2     810635522     867307196   bytes     93.47%
  id        Compaction   keyspace_1         table_____5     281573682     780375396   bytes     36.08%
  id        Compaction   keyspace_1         table_____6    2350396501    2398745060   bytes     97.98%
  id        Compaction   keyspace_1         table_____1      63122362     434443651   bytes     14.53%
  id        Compaction   keyspace_1         table_____3     287859748     399896319   bytes     71.98%
  id        Compaction   keyspace_1         table_____2    1776310557    2685522257   bytes     66.14%
  id        Compaction   keyspace_1         table_____1     494183426   22432529013   bytes      2.20%

nodetool compactionhistory: There are a lot of lines here, but here is a sample:

id  datatype index        1492056758751             558756         540336         {1:175, 2:6}
id  datatype index     1492075503279             128269         114446         {1:1160, 2:31}
id  datatype index     1492072165446             22914902       22464994       {1:626, 2:37}
id  datatype index   1492060375419             73514456       72842367       {1:398795, 2:7294, 3:300}
id  datatype index    1492075160893             85707          64387          {1:236, 2:41}
id  datatype index      1492151303774             139172156      134666782      {1:9129, 2:3313, 3:935, 4:112}
id  datatype index    1492135037619             30839157       29690968       {1:32854, 2:5702, 3:535, 4:61}
id  datatype index   1492075521048             255030         253531         {1:220, 2:6}
id  datatype index        1492116936213             11391100       10943344       {1:6798, 2:301}
id  datatype index    1492075649703             1527580        1486442        {1:5381, 2:330}
id  datatype index          1492153054713             218401839      216306589      {1:6669, 2:1068, 3:273, 4:22}
id  datatype index   1492169550324             9172160        8724129        {1:42943, 2:2390}
id  datatype index   1492087845445             8086487        7810261        {1:8445, 2:1209, 3:95}
id  datatype index    1492116806390             837169         806946         {1:5984, 2:262}
id  datatype index   1492167939189             275277987      271618327      {1:38585, 2:18745, 3:494}
id  datatype index             277471932      266321389      {1:47184, 2:16047, 3:367, 4:468}
id  datatype index        1492116559239             1569590        1402724        {1:460, 2:62}
id  datatype index 1492173763782             83298080       81977056       {1:36383, 2:7577, 3:3565, 4:95, 6:169}
id  datatype index      1492158247355             42660621       40224352       {1:6565, 2:987, 3:316, 4:521, 6:17, 8:70}
id  datatype index      1492179061558             589874248      568901949      {1:16726, 2:9342, 3:1149, 4:141}
id  datatype index        1492190014331             807975203      786973389      {1:67311, 2:1852}
id  datatype index      1491949569125             45499223       46212100       {1:3944, 2:523, 3:1268, 4:262}
id  datatype index   1492063798113             2401           1134           {1:1, 2:3}
id  datatype index   1492100603829             7693737        7507021        {1:7112, 2:870, 3:235, 4:27}
id  datatype index 1492202653921             114122963      111721885      {1:2038, 2:2997, 3:1095, 4:48, 5:40}
id  datatype index      1492063653695             60700          50728          {1:157, 2:12}
id  datatype index 1492152115922             165656033      159591156      {1:5180, 2:3233, 3:600, 4:564, 5:37, 6:14, 7:12}
id  datatype index   1492160511587             3353867375     3280857307     {1:12265239, 2:409303, 3:16391, 4:1932}
id  datatype index        1492116638632             3226315        2863672        {1:956, 2:137}
id  datatype index 1492050334458             64407          56620          {1:447, 2:31}
id  datatype index      1492150640640             587181         424081         {1:1293, 2:218, 3:1}
id  datatype index   1492116731210             429668507      407404356      {1:2208562, 2:131875, 3:338}
id  datatype index 1492134210449             293003702      275992426      {1:7429, 2:1686, 3:165}
id  datatype index    1492171984560             8467649        8318775        {1:13330, 2:892, 3:11}
id  datatype index    1492150632348             424314         368270         {1:356, 2:72, 3:8}
id  datatype index          1492068676918             677842865      653983357      {1:11042, 2:405}
id  datatype index 1492160695008             11985228       11689655       {1:3684, 2:1390, 3:441, 4:87}
id  datatype index             5906438        5731218        {1:7040, 2:445, 3:27}
id  datatype index        1492132529903             234019313      220261439      {1:80014, 2:5316}
id  datatype index             1646302        1634070        {1:575, 2:17, 3:5}
id  datatype index   1492145903652             1544764        1527844        {1:1807, 2:295, 3:65, 4:3, 5:6}
id  datatype index   1492075180569             1034277        986605         {1:6591, 2:235}
id  datatype index   1491928723944             5823014        5811907        {1:6498}
id  datatype index    1492075323943             573147         526857         {1:4395, 2:250}
2
What type of compaction are you using? What version of C*? Can you provide nodetool compactionhistory? Also, what is your current compaction threshold? - Marko Švaljek
I am using SizeTieredCompactionStrategy, and the current compaction thresholds are min = 4, max = 32. I have also added compactionhistory above. - emraldinho1986

2 Answers

1
votes

Your new node should close the compaction gap eventually...

CPU is not the only bottleneck for compactions. Check the compaction_throughput_mb_per_sec parameter, and review this article: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsConfigureCompaction.html

Please review your nodetool compactionstats output and see whether the number of pending tasks is decreasing over time. Also, please attach the output of nodetool cfstats here.
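
One way to track the pending-task trend is to sample the count periodically. The following is a minimal sketch, not a definitive tool: the helper names `parse_pending_tasks` and `sample_pending` are hypothetical, and it assumes `nodetool` is on the PATH of the node and that the output contains a `pending tasks: N` line like the one in your question.

```python
import re
import subprocess
import time


def parse_pending_tasks(output: str) -> int:
    """Extract the 'pending tasks' count from `nodetool compactionstats` output."""
    match = re.search(r"pending tasks:\s*(\d+)", output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def sample_pending(samples: int = 5, interval_s: int = 60) -> list:
    """Poll nodetool several times and return the pending-task counts.

    Assumes `nodetool` is runnable on this host (an assumption).
    """
    counts = []
    for i in range(samples):
        result = subprocess.run(
            ["nodetool", "compactionstats"],
            capture_output=True, text=True, check=True,
        )
        counts.append(parse_pending_tasks(result.stdout))
        if i < samples - 1:
            time.sleep(interval_s)
    return counts


# Parsing the header line from the output posted in the question:
sample_output = "      pending tasks: 281\n  id   compaction type   keyspace ..."
print(parse_pending_tasks(sample_output))
```

Calling `sample_pending()` on the node and comparing the first and last values gives a quick indication of whether the backlog is shrinking.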

As an alternative, you can try to re-add the new node with auto_bootstrap turned off, run nodetool rebuild afterwards, and run a repair later. It should be faster in your case.

EDIT:

After reviewing your compactionstats: try decreasing the concurrent_compactors property to a lower value. Compaction will take more time to complete, but it should have less impact on overall cluster performance.

0
votes

If you look at bytes_in and bytes_out for your completed compactions, there is not much of a gap between them, which is why you would not see a drastic change in disk space utilization even after so many compactions have completed.
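
To make that concrete, here is a small sketch that computes the fraction of space reclaimed for a few of the larger rows in the compactionhistory sample from the question. It assumes the two numeric columns after the timestamp are bytes_in and bytes_out, and the function name `reclaimed_fraction` is just illustrative.

```python
# (bytes_in, bytes_out) pairs copied from the larger compactionhistory rows
# in the question. Assumption: these two columns are bytes_in and bytes_out.
rows = [
    (3353867375, 3280857307),
    (807975203, 786973389),
    (677842865, 653983357),
    (589874248, 568901949),
    (429668507, 407404356),
]


def reclaimed_fraction(bytes_in: int, bytes_out: int) -> float:
    """Fraction of the input size that a compaction freed."""
    return (bytes_in - bytes_out) / bytes_in


for b_in, b_out in rows:
    frac = reclaimed_fraction(b_in, b_out)
    print(f"{b_in:>12} -> {b_out:>12}  reclaimed {frac:.1%}")

total_in = sum(b_in for b_in, _ in rows)
total_out = sum(b_out for _, b_out in rows)
overall = reclaimed_fraction(total_in, total_out)
print(f"overall reclaimed: {overall:.1%}")
```

For these rows each compaction frees only a few percent of its input, so finishing many of them does not move total disk usage much.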

Note: you should also consider using the Leveled compaction strategy if it suits your use case, as it has a number of advantages over Size-tiered. Leveled compaction usually works best for most use cases. Here is a great blog post describing when to use one over the other: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction