query cassandra table with timestamp range without using allow filtering

Question

I want to add secondary index of type timestamp, in order to be able to query by timestamp range without 'ALLOW FILTERING'. I can't use timeuuid, because the timestamp is different than now(). I have datetime which I should convert into timestamp.

I want to query by time, for example: session.execute("SELECT id from my_table WHERE date>'2019-09-01' AND date<'2019-09-04'").

Any suggestions of what is the right way of doing it? or how can I use timuuid from python with an input of the datetime? (convert datetime to uuid field)

Thanks.

my_table = session.execute(""" CREATE TABLE IF NOT EXISTS my_keyspace.my_table
                            (id text,
                            date timestamp,
                            PRIMARY KEY (id) 
                            )""")

session.execute("CREATE INDEX time_idx ON my_keyspace.my_table (date)""")

Aaron Aaron · Accepted Answer · 2019-09-09T12:41:01

Whenever I see a table in Cassandra with a single primary key on a nigh-unique column (ex: PRIMARY KEY (id)) I foresee problems like this happening. You're right about how adding secondary indexes is definitely an anti-pattern. My question to these designs is "Do you ever query by id?" Because if you don't it shouldn't be the sole primary key.

I would definitely partition your table on something else. Essentially, data that you want to query together, you also want to store together.

Let's say that for your use case you really only care about data for the last month. Of course, this also assumes that a month's worth of data on a single partition doesn't violate Cassandra's limitation of 2 billion cells per partition. But it'll make for a good example, so I'd build your table like this:

CREATE TABLE data_by_month (
    id TEXT,
    date TIMESTAMP,
    month_bucket INT,
    value1 TEXT,
    value2 TEXT,
    PRIMARY KEY ((month_bucket),date,id));

This is a modeling technique called "time bucketing." In storing all data for each month in the same "bucket," I can now run queries like this:

aaron@cqlsh:stackoverflow> SELECT * from data_by_month
    WHERE date>'2019-09-01' AND date<'2019-09-04' AND month_bucket=201909 ;

 month_bucket | date                            | id | value1 | value2
--------------+---------------------------------+----+--------+--------
       201909 | 2019-09-02 23:21:00.000000+0000 | 4d |    456 |    abc
       201909 | 2019-09-03 12:34:00.000000+0000 | 1a |    123 |    abc

(2 rows)

query cassandra table with timestamp range without using allow filtering

1 Answers