CQL partial key filter

Question

I have a table in Cassandra where the key (looking at it from a business perspective) is structured like this. Example of keys and values:

Key (consists of 6 columns)           Date/value
A | B | C | D | E | F |    -> 2000-01 : 100, 2000-02 : 220, ....
A | B | C | D | X | F |    -> 2000-01 : 100, 2000-02 : 233, ....
A | B | C | D | Y | F |    -> 2000-01 : 111, 2000-02 : 210, ....
A | Z | C | D | E | F |    -> 2000-01 : 122, 2000-02 : 230, ....

Each key has one value per date or month. The number of date/value records per key is very low (around 200 for now), but the number of keys is high.

From a business perspective, it's very unusual to query for one specific key like A B C D E F. The user will usually apply only partial filtering, like:

* * C D * *

In this case the query should return all four records shown above. The user might also want to filter on the date/month; however, given the small amount of data per key, this is a lower priority.
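To pin down what `* * C D * *` is meant to return, here is a minimal Python sketch over an in-memory copy of the example rows. The `ROWS` dict, the space-separated pattern syntax and the optional `month` argument are illustrative assumptions, not anything Cassandra provides. Note that the sketch has to look at every key, which is exactly the full-scan cost that makes this query awkward in Cassandra.

```python
# Hypothetical in-memory copy of the example rows from the question:
# a 6-part key tuple -> {month: value}. Only two months per key are shown.
ROWS = {
    ("A", "B", "C", "D", "E", "F"): {"2000-01": 100, "2000-02": 220},
    ("A", "B", "C", "D", "X", "F"): {"2000-01": 100, "2000-02": 233},
    ("A", "B", "C", "D", "Y", "F"): {"2000-01": 111, "2000-02": 210},
    ("A", "Z", "C", "D", "E", "F"): {"2000-01": 122, "2000-02": 230},
}

WILDCARD = "*"


def matches(key, pattern):
    """True if every non-wildcard position in the pattern equals the key component."""
    return all(p == WILDCARD or p == k for k, p in zip(key, pattern))


def partial_filter(rows, pattern, month=None):
    """Keep the keys matching a pattern such as "* * C D * *".

    If month is given, also keep only that month's value for each matching key.
    """
    parts = pattern.split()
    if len(parts) != 6:
        raise ValueError("pattern must have exactly 6 space-separated parts")
    result = {}
    for key, values in rows.items():  # touches every row: a full scan
        if not matches(key, parts):
            continue
        if month is None:
            result[key] = dict(values)
        elif month in values:
            result[key] = {month: values[month]}
    return result


if __name__ == "__main__":
    print(len(partial_filter(ROWS, "* * C D * *")))  # 4
    print(partial_filter(ROWS, "* Z * * * *", month="2000-02"))
    # {('A', 'Z', 'C', 'D', 'E', 'F'): {'2000-02': 230}}
```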

Since CQL does not allow filtering on only part of the key (other than with ALLOW FILTERING), I'm not sure how to structure my table. Any ideas? Or is this a use case that doesn't fit Cassandra well?

Thank you
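Why CQL insists on the full partition key can be sketched as follows, assuming all six columns form one composite partition key. Cassandra hashes the complete partition key into a token to find the nodes that own it; with a wildcard in any position there is no token to compute, so the read has to go to every node. In this Python sketch SHA-256 is only a stand-in for the default Murmur3 partitioner, and `NUM_NODES` and the modulo placement are hypothetical simplifications of the token ring.

```python
import hashlib

# Assumptions for illustration only:
# - all six key columns together form the partition key;
# - SHA-256 stands in for Cassandra's default Murmur3 partitioner;
# - NUM_NODES and the modulo placement are hypothetical (real clusters
#   assign token ranges around a ring, usually with virtual nodes).
NUM_NODES = 4


def token(partition_key):
    """Hash the complete partition key into a 64-bit token."""
    if len(partition_key) != 6 or any(part is None for part in partition_key):
        raise ValueError("a token needs all six partition key components")
    data = "|".join(partition_key).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def nodes_to_contact(key_or_pattern):
    """Which nodes a read must visit; None marks an unknown (wildcard) component."""
    if any(part is None for part in key_or_pattern):
        # No token can be computed, so every node has to be scanned.
        return set(range(NUM_NODES))
    return {token(key_or_pattern) % NUM_NODES}


if __name__ == "__main__":
    print(nodes_to_contact(("A", "B", "C", "D", "E", "F")))      # a single node
    print(nodes_to_contact((None, None, "C", "D", None, None)))  # {0, 1, 2, 3}
```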

Jon Haddad · Accepted Answer · 2016-01-06T23:36:49

When modeling with Cassandra, you'll need to create a table for each way you want to query your data. So if you want to query by D = some_val, you'll need another table that can answer that query; that is, it'll need D as its partition key.
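A rough Python model of this "one table per query" approach, assuming hypothetical column names `a` to `f`: each query pattern gets its own table whose partition key is the set of columns that query fixes, and every row is written to all of them. The `QueryTables` class is an in-memory illustration, not a driver API; in CQL each pattern would be a separate table, for example one declared with `PRIMARY KEY ((c, d), a, b, e, f)`.

```python
from collections import defaultdict

# Hypothetical names for the six key columns.
COLUMNS = ("a", "b", "c", "d", "e", "f")


class QueryTables:
    """In-memory model of one table per query, each partitioned by the columns that query fixes."""

    def __init__(self, query_patterns):
        # Normalize each pattern to COLUMNS order, e.g. ("d", "c") -> ("c", "d").
        self.patterns = [tuple(sorted(p, key=COLUMNS.index)) for p in query_patterns]
        # pattern -> partition key values -> full 6-part key -> {month: value}
        self.tables = {p: defaultdict(dict) for p in self.patterns}

    def insert(self, key, values):
        """Write the row into every query table (denormalization at write time)."""
        row = dict(zip(COLUMNS, key))
        for pattern in self.patterns:
            partition = tuple(row[col] for col in pattern)
            self.tables[pattern][partition][key] = dict(values)

    def query(self, **fixed):
        """Read exactly one partition from the table partitioned by the fixed columns."""
        pattern = tuple(col for col in COLUMNS if col in fixed)
        if pattern not in self.tables:
            raise KeyError(f"no table is partitioned by {pattern}")
        partition = tuple(fixed[col] for col in pattern)
        return dict(self.tables[pattern].get(partition, {}))


if __name__ == "__main__":
    tables = QueryTables([("c", "d"), ("b",)])
    sample = {
        ("A", "B", "C", "D", "E", "F"): {"2000-01": 100},
        ("A", "B", "C", "D", "X", "F"): {"2000-01": 100},
        ("A", "B", "C", "D", "Y", "F"): {"2000-01": 111},
        ("A", "Z", "C", "D", "E", "F"): {"2000-01": 122},
    }
    for key, values in sample.items():
        tables.insert(key, values)
    print(len(tables.query(c="C", d="D")))  # 4
    print(len(tables.query(b="Z")))         # 1
```

The price is write amplification and storage: with a wildcard allowed in any of six positions there are up to 2^6 - 1 = 63 non-empty filter combinations, so in practice you would only create tables for the combinations users actually run, and a low-cardinality partition key such as (c, d) can produce very large partitions.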

Your other option is to do full table scans and manually filter, or to use something like Spark to work with your entire dataset. I wrote a blog post a little while ago that shows what you can do with Spark & Cassandra. http://rustyrazorblade.com/2015/07/cassandra-pyspark-dataframes-revisted/
