2
votes

I have the following table definition in Cassandra:

CREATE TABLE mytable
(
 colA text,
 colB text,
 startdate timestamp,
 colC text,
 colD text,
 colE text,
 PRIMARY KEY ((colA, colB, startdate), colC)
 ) WITH
 bloom_filter_fp_chance=0.100000 AND
 caching='KEYS_ONLY' AND
 dclocal_read_repair_chance=0.000000 AND
 gc_grace_seconds=864000 AND
 index_interval=128 AND
 read_repair_chance=0.100000 AND
 replicate_on_write='true' AND
 populate_io_cache_on_flush='false' AND
 default_time_to_live=0 AND
 speculative_retry='99.0PERCENTILE' AND
 memtable_flush_period_in_ms=0 AND
 compaction={'class': 'LeveledCompactionStrategy'} AND
 compression={'chunk_length_kb': '64', 'sstable_compression': 'DeflateCompressor'};

 CREATE INDEX colDIdx ON mytable (colD);
 CREATE INDEX colEIdx ON mytable (colE);

This table has barely 400 records. When I run the following query from the cqlsh prompt:

SELECT * FROM mytable WHERE colA = 'colAValue' AND colB = 'colBValue' AND startdate = 1418947200000 and colD = 'XYZ' and colE = 'ABC' ALLOW FILTERING;

I get the following error message and the query doesn't return a result:

"Request did not complete within rpc_timeout"

However, when I remove the last two filter criteria, colD and colE, the query runs successfully.

I don't know what the issue is with using secondary-indexed columns in the filter criteria.

2
It is safe to assume that any query using ALLOW FILTERING will not work on anything but toy data or for poking around during development; it's not there for actual use. - Chris Lohfink
To get to the bottom of why this particular query might be slow, try turning on tracing and check /var/log/cassandra/system.log for errors. - mildewey
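Following the suggestion in the comments, tracing can be enabled directly in cqlsh to see where the time goes. A minimal session might look like this (table and column names taken from the question; the trace output shown is what cqlsh produces in general, not a captured result):

```sql
TRACING ON;

SELECT * FROM mytable
 WHERE colA = 'colAValue' AND colB = 'colBValue'
 AND startdate = 1418947200000
 AND colD = 'XYZ' AND colE = 'ABC'
 ALLOW FILTERING;

-- cqlsh now prints a trace listing each node contacted and the elapsed
-- time per step, which shows whether the index scan, tombstone reads,
-- or cross-node fan-out is the bottleneck.

TRACING OFF;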

2 Answers

2
votes

I probably won't sound very original if I start with the statement that using secondary indexes in Cassandra is not really recommended.

The way secondary indexes work is that, in general, they are implemented as hidden tables, but these index tables are local to each node rather than distributed around the ring. You can read more about secondary indexes here. This means that searching by a secondary index adds calls to all nodes in the ring; the results are then combined, and only afterwards filtered based on the primary key.

This is why queries by primary key are lightning fast compared to queries that filter on a secondary index.
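The usual Cassandra alternative is to denormalize: maintain an extra table whose primary key matches the query, instead of relying on secondary indexes. A sketch under that approach (the table name `mytable_by_d_e` is made up for illustration, and your application would have to write to both tables):

```sql
-- Query-specific table: colD and colE become part of the partition key,
-- so the read is served by one replica set instead of the whole ring.
CREATE TABLE mytable_by_d_e
(
 colA text,
 colB text,
 startdate timestamp,
 colC text,
 colD text,
 colE text,
 PRIMARY KEY ((colA, colB, startdate, colD, colE), colC)
);

-- The original query then needs no secondary index and no ALLOW FILTERING:
SELECT * FROM mytable_by_d_e
 WHERE colA = 'colAValue' AND colB = 'colBValue'
 AND startdate = 1418947200000
 AND colD = 'XYZ' AND colE = 'ABC';
```

The trade-off is extra write amplification and the need to keep both tables in sync, which is generally considered acceptable in Cassandra data modeling.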

Why it is timing out with only 400 records is another question. I assume the timeout values have not been changed from their default of 10000 ms. My guess is that the JVM heap could be too small: since the index data needs to be loaded into memory to be processed, GC pauses may kill the query. You may want to check what happens with your garbage collection.

HTH

2
votes

Turn on tracing and check the log in the system.log file. This error typically appears when there are too many tombstones. Run nodetool compact on your Cassandra cluster and check it again.