How to ensure data consistency in Cassandra on different tables?

Question

I'm new in Cassandra and I've read that Cassandra encourages denormalization and duplication of data. This leaves me a little confused. Let us imagine the following scenario:

I have a keyspace with four tables: A,B,C and D.

CREATE TABLE A (
  tableID int,
  column1 int,
  column2 varchar,
  column3 varchar,
  column4 varchar,
  column5 varchar,
  PRIMARY KEY (column1, tableID)
);

Let us imagine that the other tables (B,C,D) have the same structure and the same data that table A, only with a different primary key, in order to respond to other queries.

If I upgrade a row in table A how I can ensure consistency of data in other tables that have the same data?

Jeff Jirsa Jeff Jirsa · Accepted Answer · 2015-12-12T02:37:01

Cassandra provides BATCH for this purpose. From the documentation:

A BATCH statement combines multiple data modification language (DML) statements (INSERT, UPDATE, DELETE) into a single logical operation, and sets a client-supplied timestamp for all columns written by the statements in the batch. Batching multiple statements can save network exchanges between the client/server and server coordinator/replicas. However, because of the distributed nature of Cassandra, spread requests across nearby nodes as much as possible to optimize performance. Using batches to optimize performance is usually not successful, as described in Using and misusing batches section. For information about the fastest way to load data, see "Cassandra: Batch loading without the Batch keyword."

Batches are atomic by default. In the context of a Cassandra batch operation, atomic means that if any of the batch succeeds, all of it will. To achieve atomicity, Cassandra first writes the serialized batch to the batchlog system table that consumes the serialized batch as blob data. When the rows in the batch have been successfully written and persisted (or hinted) the batchlog data is removed. There is a performance penalty for atomicity. If you do not want to incur this penalty, prevent Cassandra from writing to the batchlog system by using the UNLOGGED option: BEGIN UNLOGGED BATCH

UNLOGGED BATCH is almost always undesirable and I believe is removed in future versions. Normal batches provide the functionality you desire.

How to ensure data consistency in Cassandra on different tables?

2 Answers