1
votes

I am experiencing really poor performance with Cassandra 2.1.5. I am new to this so would appreciate any advice on how to debug. Here is what my table looks like:

Keyspace: nt_live_october                                                                                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Read Count: 6                                                                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Read Latency: 20837.149166666666 ms.                                                                                                                                                                 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Write Count: 39799                                                                                                                                                                                   x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Write Latency: 0.45696595391844014 ms.                                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Pending Flushes: 0                                                                                                                                                                                   x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Table: nt                                                                                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            SSTable count: 12                                                                                                                                                                            x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Space used (live): 15903191275                                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Space used (total): 15971044770                                                                                                                                                              x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Space used by snapshots (total): 0                                                                                                                                                           x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Off heap memory used (total): 14468424                                                                                                                                                       x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            SSTable Compression Ratio: 0.1308103413354315                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Number of keys (estimate): 740                                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable cell count: 43483                                                                                                                                                                   x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable data size: 9272510                                                                                                                                                                  x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable off heap memory used: 0                                                                                                                                                             x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable switch count: 17                                                                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local read count: 6                                                                                                                                                                          x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local read latency: 20837.150 ms                                                                                                                                                             x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local write count: 39801                                                                                                                                                                     x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local write latency: 0.457 ms                                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Pending flushes: 0                                                                                                                                                                           x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter false positives: 0                                                                                                                                                              x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter false ratio: 0.00000                                                                                                                                                            x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter space used: 4832                                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter off heap memory used: 4736                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Index summary off heap memory used: 576                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compression metadata off heap memory used: 14463112                                                                                                                                          x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compacted partition minimum bytes: 6867                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compacted partition maximum bytes: 30753941057                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compacted partition mean bytes: 44147544                                                                                                                                                     x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Average live cells per slice (last five minutes): 0.0                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Maximum live cells per slice (last five minutes): 0.0                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Average tombstones per slice (last five minutes): 0.0                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Maximum tombstones per slice (last five minutes): 0.0    

I am issuing the following query via cqlsh:

cassandra@cqlsh> TRACING ON;                                                                                                                                                                                          Tracing is already enabled. Use TRACING OFF to disable.                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cassandra@cqlsh> CONSISTENCY;                                                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current consistency level is ONE.                                                                                                                                                                            x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cassandra@cqlsh> select * from nt_live_october.nt where group_id='254358' and epoch >=1444313898 and epoch<=1444348800 LIMIT 1;                                                                              x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OperationTimedOut: errors={}, last_host=XXX.203
Statement trace did not complete within 10 seconds

and here is what system_traces.events shows:

xxx.xxx.xxx.203 | 1281 | Parsing select * from nt_live_october.nt where group_id='254358'\nand epoch >=1443916800 and epoch<=1444348800\nLIMIT 30;
xxx.xxx.xxx.203 | 2604 | Preparing statement
xxx.xxx.xxx.203 | 8454 | Executing single-partition query on users
xxx.xxx.xxx.203 | 8474 | Acquiring sstable references
xxx.xxx.xxx.203 | 8547 | Merging memtable tombstones
xxx.xxx.xxx.203 | 8675 | Key cache hit for sstable 1
xxx.xxx.xxx.203 | 8685 | Seeking to partition beginning in data file
xxx.xxx.xxx.203 | 9040 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
xxx.xxx.xxx.203 | 9056 | Merging data from memtables and 1 sstables
xxx.xxx.xxx.203 | 9120 | Read 1 live and 0 tombstone cells
xxx.xxx.xxx.203 | 9854 | Read-repair DC_LOCAL
xxx.xxx.xxx.203 | 10033 | Executing single-partition query on users
xxx.xxx.xxx.203 | 10046 | Acquiring sstable references
xxx.xxx.xxx.203 | 10105 | Merging memtable tombstones
xxx.xxx.xxx.203 | 10189 | Key cache hit for sstable 1
xxx.xxx.xxx.203 | 10198 | Seeking to partition beginning in data file
xxx.xxx.xxx.203 | 10248 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
xxx.xxx.xxx.203 | 10261 | Merging data from memtables and 1 sstables
xxx.xxx.xxx.203 | 10296 | Read 1 live and 0 tombstone cells
xxx.xxx.xxx.203 | 12511 | Executing single-partition query on nt
xxx.xxx.xxx.203 | 12525 | Acquiring sstable references
xxx.xxx.xxx.203 | 12587 | Merging memtable tombstones
xxx.xxx.xxx.203 | 18067 | speculating read retry on /xxx.xxx.xxx.205
xxx.xxx.xxx.203 | 18577 | Sending READ message to xxx.xxx.xxx.205/xxx.xxx.xxx.205
xxx.xxx.xxx.203 | 25534 | Partition index with 6093 entries found for sstable 8885
xxx.xxx.xxx.203 | 25571 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 34989 | Partition index with 5327 entries found for sstable 8524
xxx.xxx.xxx.203 | 35022 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 36322 | Partition index with 333 entries found for sstable 8477
xxx.xxx.xxx.203 | 36336 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 714242 | Partition index with 299251 entries found for sstable 8541
xxx.xxx.xxx.203 | 714279 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 715717 | Partition index with 501 entries found for sstable 8217
xxx.xxx.xxx.203 | 715745 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 716232 | Partition index with 252 entries found for sstable 8888
xxx.xxx.xxx.203 | 716245 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 87 | READ message received from /xxx.xxx.xxx.203
xxx.xxx.xxx.205 | 50427 | Executing single-partition query on nt
xxx.xxx.xxx.205 | 50535 | Acquiring sstable references
xxx.xxx.xxx.205 | 50628 | Merging memtable tombstones
xxx.xxx.xxx.205 | 170441 | Partition index with 35650 entries found for sstable 6332
xxx.xxx.xxx.203 | 30718026 | Partition index with 199905 entries found for sstable 5958
xxx.xxx.xxx.203 | 30718077 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 170499 | Seeking to partition indexed section in data file
xxx.xxx.xxx.205 | 248898 | Partition index with 30958 entries found for sstable 6797
xxx.xxx.xxx.205 | 248962 | Seeking to partition indexed section in data file
xxx.xxx.xxx.203 | 67814573 | Read timeout: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
xxx.xxx.xxx.203 | 67814675 | Timed out; received 0 of 1 responses

I have 4 nodes, with replication factor of 3(one node is very light but it's not .203) The data I'm trying to read isn't very much -- even if LIMIT 1 is not being pushed to the remote node, the low end of the interval should be about 3 hours ago (I have no epochs past the current time)

Any tips on how to fix this/what might be going wrong? My cassandra version is 2.1.9, running largely with defaults

Table schema is as follows (I can't publish the whole schema for privacy reasons, but showing the keys which I hope is the main thing that matters)

PRIMARY KEY (group_id, epoch, group_name, auto_generated_uuid_field)
) WITH CLUSTERING ORDER BY (epoch ASC, group_name ASC, auto_generated_uuid_field ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 7776000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

___________EDIT_____________ to answer questions from below:

output of status:

--  Address         Load       Tokens  Owns    Host ID                               Rack                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DN  xxx.xxx.xxx.204  15.8 GB    1       ?       32ed196b-f6eb-4e93-b759  r1                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN  xxx.xxx.xxx.205  20.38 GB   1       ?       446d71aa-e9cd-4ca9-a6ac  r1                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN  xxx.xxx.xxx.202  1.48 GB    1       ?       2a6670b2-63f2-43be-b672  r1                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN  xxx.xxx.xxx.203  15.72 GB   1       ?       dd26dfee-82da-454b-8db2  r1

The system.log is trickier as I have a lot of logging in there...one suspect thing I see is

 WARN [CompactionExecutor:6] 2015-10-08 19:44:16,595 SSTableWriter.java (line 240) Compacting large partition nt_live_october/nt:254358 (230692316 bytes)   

but it's just a warning...shortly after I see

 INFO [CompactionExecutor:6] 2015-10-08 19:44:16,642 CompactionTask.java (line 274) Compacted 4 sstables to [/cassandra/data_dir_d/nt_live_october/nt-72813b106b9111e58f1ea1f0942ab78d/nt_live_october-nt-ka-9024,].  35,733,701 bytes to 30,186,394 (~84% of original) in 34,907ms = 0.824705MB/s.  21 total partitions merged to 18.  Partition merge counts were {1:17, 4:1, }

I see quite a few of these pairs in the log...but no ERROR level messages. Compaction seems to be going OK..it does say that this is the largest column family but all messages are INFO level....

1
Can you check the system.log usually in /var/log/cassandra/ and look for any warnings for large partitions, large amounts of tombstones, and nodes coming up and down? An output of nodetool status might help us as well.MarcintheCloud

1 Answers

2
votes

First, the DN status of the node 204 means down. Retrieve its system.log and look for :

  • Exceptions and ERROR level logs
  • Anormal GC activity ( collection longer than 200ms)
  • StatusLogger

Second, the data is badly distributed among the cluster. The load of 202 is only 1.48 GB. I suspect you have some very large partitions replicated on the other nodes. What is the replication factor ? What is the scheme of your keyspace ? You can answer these questions with cqlsh command :

DESCRIBE KEYSPACE nt_live_october;