0
votes

I have a task that insert data into a cassandra table, but i donnot want to overwrite the records that have been inserted before, but the insert cql would overwrite existing data.

Fortunately, I find a operation of 'USING TIMESTAMP ', cql with larger timestamp will overwrite the smaller one, otherwise not. So with 'USING TIMESTAMP' i could use a custom timestamp to determine whether overwrite or not. It works fine in Cqlsh.

But it fails in python-cassandra-driver, how to make 'USING TIMESTAMP' work in python-cassandra-driver? My code is as follow:

insert_sql = ("INSERT INTO activate (rowkey, qualifier, info, act_date, log_time) "
              "VALUES(%s, %s, %s, %s, %s) "
              "USING TIMESTAMP %s")
insert_data = (a_string, a_string, a_string, a_string, a_string, a_custom_timestamp)
session.execute(insert_sql, insert_data)
1
Blake's advice below is sound. Focusing on your original question: your code should work as shown. What makes you think it does not work from the driver? Could it be an issue with resolution in your custom timestamp? How are you generating that?Adam Holmberg

1 Answers

1
votes

This is an incorrect use-case for setting custom timestamps. It's important to not abuse this feature because it can have many unintended side-effects and can ultimately lead to unreliable data.

Use Lightweight Transactions (LWT) instead (also use PreparedStatements instead of raw strings!)

stmt = session.prepare("
           INSERT INTO activate (rowkey, qualifier, info, act_date, log_time)
           VALUES (?, ?, ?, ?, ?)
           IF NOT EXISTS
           ")
results = session.execute(stmt, [arg1, arg2, ...])

Read more here and here. LWT takes a performance hit from having to trigger a paxos consensus check, but it's better than a "read-then-write" approach.

Better yet, if you absolutely need to maximize write performance, consider modifying your data model to include a timestamp:

CREATE TABLE activate (
     rowkey text,
     insert_time timestamp,
     qualifier text,
     info text,
     act_date timestamp,
     log_time timestamp, 
   PRIMARY KEY (rowkey, insert_time));