0
votes

I am using Cassandra 3.x and have the following table:

CREATE TABLE sp_db.prod_tag 
(
    year int,
    name text,
    group int,
    sn text,
    factory_id bigint,
    PRIMARY KEY ((year), name, group)
) WITH CLUSTERING ORDER BY (name ASC, group ASC);

I have inserted 1 million rows and I started to perform queries:

Query #1 - on partition key

select count(*) 
from sp_db.prod_tag  
where year = 2015;

Output:

    count |
    ------|
    33328 |

Query time: ~90ms

Query #2 - one of clustering keys

select count(*) 
from sp_db.prod_tag 
where group = 104 ALLOW FILTERING;

Result:

    count |
    ------|
    3938  |

Query time: ~800ms

We don't pass primary key, therefore Cassandra gets all the rows from the table and it have performance impact.

So far so good...

Querying first from clustering key:

select count(*) 
from sp_db.prod_tag 
where name = '06d7d' ALLOW FILTERING;

Results:

    count |
    ------|
    3     |

Query time: ~6ms

And here is my surprise: why only 6ms? There is no primary key, so this query should have poor performance.

Studying documentation I did not find the answer.

Can anyone explain this?

1

1 Answers

1
votes

There's no concrete reason why the query on name should be faster than the query on group, assuming you have no secondary index on name.

I suspect, but can not prove, that it's a primarily the result of caching at various layers (including the OS page cache).

Edit: re-read the schema

The data will be sorted on disk sorted first by name, then group, so within each partition, it’ll be able to skip to the relevant block of data by name - yes, this is expected and normal.