5
votes

I have a list of unordered events and my task is to store first and last occurrences for them.

I have following column family in Cassandra:

CREATE TABLE events (
   event_name TEXT,
   first_occurrence BIGINT,
   last_occurrence BIGINT,
   PRIMARY KEY (event_name)
);

So if I have an event with the name "some_event" and occurrence with 123456, what I want to do is something which in MySQL terms would look like this:

INSERT INTO events (event_name, first_occurence, last_occurence)
VALUES ('some_event', 123456, 123456)
ON DUPLICATE KEY UPDATE 
     first_occurrence = LEAST(first_occurrence, 12345), 
     last_occurrence = GREATEST(last_occurrence, 123456)

I was going to use lightweight transactions in Cassandra to accomplish it, like this:

INSERT INTO events(event_name, first_occurrence, last_occurrence) VALUES ('some_event', 12345, 12345) IF NOT EXISTS;
UPDATE events SET first_occurrence = 123456 WHERE event_name='some_event' IF first_occurrence > 123456;
UPDATE events SET last_occurrence = 123456 WHERE event_name='some_event' IF last_occurrence < 123456;

But as it turns out, CQL3 does not allow < and > operators in lightweight transactions IF clause.

So my question is, what is the pattern for doing such conditional updates?

2

2 Answers

4
votes

What version of cassandra are you running? Support for non-equal conditions with LWTs was added in 2.1.1 via CASSANDRA-6839:

cqlsh:test> UPDATE events SET first_occurrence = 123456 WHERE event_name='some_event' IF first_occurrence > 1;

[applied]
-----------
  True
2
votes

Cassandra does not do read before write with only 2 exceptions - counters and "lightweight transactions". As a result you will not be able to reliably implement your scenario directly in Cassandra. Even if you read the values out and then make an update based on these values, you may overwrite someone else's changes, since there is no locking and isolation in Cassandra, and eventual consistency makes it even worse.

So if you need to implement something like this, you will need to do it outside of Cassandra. Create a synchronization layer which will provide a central point for Cassandra writes, and make this layer responsible for the logic. Just make sure that no writes are making it around this layer.