0
votes

I know that due to Cassandra's distributed nature, it doesn't offer a way to update a column value for an entire table.

However, I do have a requirement where I need to do that. I need to update an "enum" column value to a new value. All the remaining values of that "enum" should be left unchanged. This column is not part of my primary key.

I can think of two alternatives to achieve that, but both seem a bit convoluted and have problems I'd rather not have to deal with:

a) Write code in java, scala, node, or whatever language with drivers for cassandra.

  1. Read a stream from that cassandra table (which will include my primary key)
  2. Update the value I want to update in my object.
  3. Save each row individually based on the primary key.

However, I need to write some code to use only once then throw away, and I will most likely have problems to have approval from my company to run this in other environments (mainly production).

b) Write cql scripts that execute in two separate steps.

  1. First cql script dumps the table to a csv file.
  2. A bash script will replace the values I want to change in the csv file (using sed or something).
  3. A second cql script will drop my table and materialized views, recreate them from scratch, then COPY the updated csv back to the table, and recreate all my materialized views.

This solution is error prone depending how the tables are recreated. Unless I use the result of describe <table>, it may lead to leaving out creation of indexes, or not apply some changes not present when the table was originally created. It also requires down time, while the tables are dropped and recreated, plus the time to replicate data until I have the desired consistency.

Is there any other option? Any way to use a materialized view that includes this field I need to update and somehow use that for updates?

Can I do an update using the token() function? I tried, but got a message saying it can't be used in where clauses for updates.

Any other simple and clean way to do these updates?

1
There were some similar discussions. In this thread, it was necessary to set a null value. Unfortunately, the methods that you described, seem to be the solution...Horia

1 Answers

-1
votes

The big question is should the update be atomic. Also is the data changed a lot for that column. As an option you can create a new column and populate it with an updated value. At the same time restrict i/o operations on affected partitions. After the operation is done enable ops. You need to keep track of processed partitions at the same time.