I have a requirement to answer the following queries:
- return the number of new customers per quarter (up to 36 months)
- list the new customers per quarter (up to 36 months)
I've created the following table in Cassandra to deal with this:
    CREATE TABLE first_purchase_by_shopper_date
    (
        shop_id uuid,
        shopper_id uuid,
        dt_first_purchase timestamp,
        ... (some text fields)
        PRIMARY KEY ((shop_id, shopper_id))
    );
To answer these queries in Cassandra, I need to be able to filter this data on the dt_first_purchase field.
But if I add dt_first_purchase to the primary key, the key is no longer unique per shopper, so the same shopper could end up with multiple entries in the table, and we only ever want one entry per shopper.
With the current primary key, my insert statement would look like this:
    INSERT INTO first_purchase_by_shopper_date (shop_id, shopper_id, dt_first_purchase, ... ) VALUES (...) IF NOT EXISTS;
The IF NOT EXISTS at the end ensures that the entry is only written if none exists already for that primary key (i.e. no update is performed on an existing record).
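To make the problem concrete, here is a toy in-memory model in Python (not Cassandra itself). The helper insert_if_not_exists and the sample ids and dates are made up for illustration; the only assumption is that a write is rejected when a row with the same full primary key already exists. It suggests why the check stops protecting me once the date becomes part of the key:

```python
from datetime import datetime


def insert_if_not_exists(table, key_columns, row):
    """Toy stand-in for INSERT ... IF NOT EXISTS (hypothetical helper).

    Writes the row only when no row with the same primary key is present.
    """
    key = tuple(row[column] for column in key_columns)
    if key in table:
        return False  # a row with this primary key exists, nothing is written
    table[key] = row
    return True


# Two purchases by the same shopper in the same shop (made-up sample data).
first = {"shop_id": "shop-1", "shopper_id": "shopper-1",
         "dt_first_purchase": datetime(2015, 1, 10)}
later = {"shop_id": "shop-1", "shopper_id": "shopper-1",
         "dt_first_purchase": datetime(2015, 6, 2)}

# Current key: (shop_id, shopper_id). The second insert is rejected.
current_key = ("shop_id", "shopper_id")
table_a = {}
applied_a = [insert_if_not_exists(table_a, current_key, r) for r in (first, later)]

# Key extended with dt_first_purchase. Both inserts succeed.
key_with_date = ("shop_id", "shopper_id", "dt_first_purchase")
table_b = {}
applied_b = [insert_if_not_exists(table_b, key_with_date, r) for r in (first, later)]

print(applied_a, len(table_a))  # [True, False] 1
print(applied_b, len(table_b))  # [True, True] 2
```

With the date in the key, the later purchase counts as a different row, so the shopper shows up twice and would be reported as a new customer in two different quarters.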
How can I filter by date on this table? Is a secondary index on the dt_first_purchase column my only option, and isn't that undesirable?