Trouble tracing cassandra query

Question

I am experiencing really poor performance with Cassandra 2.1.5. I am new to this so would appreciate any advice on how to debug. Here is what my table looks like:

Keyspace: nt_live_october                                                                                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Read Count: 6                                                                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Read Latency: 20837.149166666666 ms.                                                                                                                                                                 x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Write Count: 39799                                                                                                                                                                                   x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Write Latency: 0.45696595391844014 ms.                                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Pending Flushes: 0                                                                                                                                                                                   x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Table: nt                                                                                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            SSTable count: 12                                                                                                                                                                            x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Space used (live): 15903191275                                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Space used (total): 15971044770                                                                                                                                                              x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Space used by snapshots (total): 0                                                                                                                                                           x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Off heap memory used (total): 14468424                                                                                                                                                       x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            SSTable Compression Ratio: 0.1308103413354315                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Number of keys (estimate): 740                                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable cell count: 43483                                                                                                                                                                   x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable data size: 9272510                                                                                                                                                                  x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable off heap memory used: 0                                                                                                                                                             x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Memtable switch count: 17                                                                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local read count: 6                                                                                                                                                                          x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local read latency: 20837.150 ms                                                                                                                                                             x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local write count: 39801                                                                                                                                                                     x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Local write latency: 0.457 ms                                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Pending flushes: 0                                                                                                                                                                           x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter false positives: 0                                                                                                                                                              x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter false ratio: 0.00000                                                                                                                                                            x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter space used: 4832                                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Bloom filter off heap memory used: 4736                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Index summary off heap memory used: 576                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compression metadata off heap memory used: 14463112                                                                                                                                          x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compacted partition minimum bytes: 6867                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compacted partition maximum bytes: 30753941057                                                                                                                                               x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Compacted partition mean bytes: 44147544                                                                                                                                                     x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Average live cells per slice (last five minutes): 0.0                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Maximum live cells per slice (last five minutes): 0.0                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Average tombstones per slice (last five minutes): 0.0                                                                                                                                        x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Maximum tombstones per slice (last five minutes): 0.0

I am issuing the following query via cqlsh:

cassandra@cqlsh> TRACING ON;                                                                                                                                                                                          Tracing is already enabled. Use TRACING OFF to disable.                                                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cassandra@cqlsh> CONSISTENCY;                                                                                                                                                                                x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current consistency level is ONE.                                                                                                                                                                            x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cassandra@cqlsh> select * from nt_live_october.nt where group_id='254358' and epoch >=1444313898 and epoch<=1444348800 LIMIT 1;                                                                              x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OperationTimedOut: errors={}, last_host=XXX.203
Statement trace did not complete within 10 seconds

and here is what system_traces.events shows:

xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.205 | xxx.xxx.xxx.203 | xxx.xxx.xxx.203 | 1281 | Parsing select * from nt_live_october.nt where group_id='254358'\nand epoch >=1443916800 and epoch<=1444348800\nLIMIT 30;
2604 | Preparing statement
8454 | Executing single-partition query on users
8474 | Acquiring sstable references
8547 | Merging memtable tombstones
8675 | Key cache hit for sstable 1
8685 | Seeking to partition beginning in data file
9040 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
9056 | Merging data from memtables and 1 sstables
9120 | Read 1 live and 0 tombstone cells
9854 | Read-repair DC_LOCAL
10033 | Executing single-partition query on users
10046 | Acquiring sstable references
10105 | Merging memtable tombstones
10189 | Key cache hit for sstable 1
10198 | Seeking to partition beginning in data file
10248 | Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones
10261 | Merging data from memtables and 1 sstables
10296 | Read 1 live and 0 tombstone cells
12511 | Executing single-partition query on nt
12525 | Acquiring sstable references
12587 | Merging memtable tombstones
18067 | speculating read retry on /xxx.xxx.xxx.205
18577 | Sending READ message to xxx.xxx.xxx.205/xxx.xxx.xxx.205
25534 | Partition index with 6093 entries found for sstable 8885
25571 | Seeking to partition indexed section in data file
34989 | Partition index with 5327 entries found for sstable 8524
35022 | Seeking to partition indexed section in data file
36322 | Partition index with 333 entries found for sstable 8477
36336 | Seeking to partition indexed section in data file
714242 | Partition index with 299251 entries found for sstable 8541
714279 | Seeking to partition indexed section in data file
715717 | Partition index with 501 entries found for sstable 8217
715745 | Seeking to partition indexed section in data file
716232 | Partition index with 252 entries found for sstable 8888
716245 | Seeking to partition indexed section in data file
87 | READ message received from /xxx.xxx.xxx.203
50427 | Executing single-partition query on nt
50535 | Acquiring sstable references
50628 | Merging memtable tombstones
170441 | Partition index with 35650 entries found for sstable 6332
30718026 | Partition index with 199905 entries found for sstable 5958
30718077 | Seeking to partition indexed section in data file
170499 | Seeking to partition indexed section in data file
248898 | Partition index with 30958 entries found for sstable 6797
248962 | Seeking to partition indexed section in data file
67814573 | Read timeout: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
67814675 | Timed out; received 0 of 1 responses

I have 4 nodes, with replication factor of 3(one node is very light but it's not .203) The data I'm trying to read isn't very much -- even if LIMIT 1 is not being pushed to the remote node, the low end of the interval should be about 3 hours ago (I have no epochs past the current time)

Any tips on how to fix this/what might be going wrong? My cassandra version is 2.1.9, running largely with defaults

Table schema is as follows (I can't publish the whole schema for privacy reasons, but showing the keys which I hope is the main thing that matters)

PRIMARY KEY (group_id, epoch, group_name, auto_generated_uuid_field)
) WITH CLUSTERING ORDER BY (epoch ASC, group_name ASC, auto_generated_uuid_field ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 7776000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';

___________EDIT_____________ to answer questions from below:

output of status:

--  Address         Load       Tokens  Owns    Host ID                               Rack                                                                                                                    x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DN  xxx.xxx.xxx.204  15.8 GB    1       ?       32ed196b-f6eb-4e93-b759  r1                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN  xxx.xxx.xxx.205  20.38 GB   1       ?       446d71aa-e9cd-4ca9-a6ac  r1                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN  xxx.xxx.xxx.202  1.48 GB    1       ?       2a6670b2-63f2-43be-b672  r1                                                                                                                      x~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
UN  xxx.xxx.xxx.203  15.72 GB   1       ?       dd26dfee-82da-454b-8db2  r1

The system.log is trickier as I have a lot of logging in there...one suspect thing I see is

 WARN [CompactionExecutor:6] 2015-10-08 19:44:16,595 SSTableWriter.java (line 240) Compacting large partition nt_live_october/nt:254358 (230692316 bytes)

but it's just a warning...shortly after I see

 INFO [CompactionExecutor:6] 2015-10-08 19:44:16,642 CompactionTask.java (line 274) Compacted 4 sstables to [/cassandra/data_dir_d/nt_live_october/nt-72813b106b9111e58f1ea1f0942ab78d/nt_live_october-nt-ka-9024,].  35,733,701 bytes to 30,186,394 (~84% of original) in 34,907ms = 0.824705MB/s.  21 total partitions merged to 18.  Partition merge counts were {1:17, 4:1, }

I see quite a few of these pairs in the log...but no ERROR level messages. Compaction seems to be going OK..it does say that this is the largest column family but all messages are INFO level....

Can you check the system.log usually in /var/log/cassandra/ and look for any warnings for large partitions, large amounts of tombstones, and nodes coming up and down? An output of nodetool status might help us as well. — MarcintheCloud

Adpi2 Adpi2 · Accepted Answer · 2015-10-09T10:34:33

First, the DN status of the node 204 means down. Retrieve its system.log and look for :

Exceptions and ERROR level logs
Anormal GC activity ( collection longer than 200ms)
StatusLogger

Second, the data is badly distributed among the cluster. The load of 202 is only 1.48 GB. I suspect you have some very large partitions replicated on the other nodes. What is the replication factor ? What is the scheme of your keyspace ? You can answer these questions with cqlsh command :

DESCRIBE KEYSPACE nt_live_october;

Trouble tracing cassandra query

1 Answers