
Cassandra has a row cache to improve read performance. In my use case, a table has 20 fields, of which only 2 (f1 and f2) change frequently for any given row; the other fields are fairly static.

If the row cache initially contains the entire row for a particular primary key K1, reading that entire row will be fast. If I later update fields f1 and f2 for this row (assume the new values of f1 and f2 are still in the memtable, i.e. in memory), then:

1) Will reading the entire row be equally fast, i.e. will there be any disk access?

2) Will reading just f1 and f2 (whose new values are in the memtable) for this row be fast?

3) Will reading just the other fields of this row (everything except f1 and f2, which have not changed for a long time and are in the row cache) be fast?


Answer


When a write arrives for a row, that row's entry in the row cache is invalidated, and the row is not cached again until it is next read.
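To make the invalidation rule concrete, here is a minimal Python model of it. `RowCacheModel` and its methods are hypothetical names chosen for illustration, not Cassandra's API, and a single dict stands in for the MemTable and SSTables; the sketch only shows that a write evicts the cached entry rather than updating it, and that the next read repopulates it.

```python
class RowCacheModel:
    """Hypothetical model: writes evict the cached row; reads re-cache it."""

    def __init__(self):
        self.row_cache = {}  # partition key -> cached row {column: value}
        self.storage = {}    # stand-in for MemTable + SSTables, already merged

    def write(self, key, columns):
        self.storage.setdefault(key, {}).update(columns)
        # Invalidate instead of updating the cached row in place.
        self.row_cache.pop(key, None)

    def read(self, key):
        if key in self.row_cache:
            return dict(self.row_cache[key]), "cache hit"
        row = dict(self.storage.get(key, {}))
        self.row_cache[key] = row  # cached again only because of this read
        return dict(row), "cache miss"


m = RowCacheModel()
m.write("K1", {"f1": 1, "f2": 1, "f3": "static"})
print(m.read("K1")[1])      # cache miss (first read populates the cache)
print(m.read("K1")[1])      # cache hit
m.write("K1", {"f1": 2, "f2": 2})
print("K1" in m.row_cache)  # False: the write invalidated the entry
print(m.read("K1")[1])      # cache miss
```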

Cassandra read path:

  1. If the row is in the row cache, return the data.
  2. Otherwise, check each SSTable's bloom filter. If a bloom filter indicates that the row does not exist in that SSTable, that SSTable does not have to be read; if no SSTable can contain the row, only the MemTable is read.
  3. Otherwise, read the MemTable and every SSTable that may contain the row, and merge their data with the data from the MemTable.
  4. Update the row cache with the merged data.
  5. Return the merged data.
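The five steps above can be sketched as a toy, in-memory Python model. Everything here is an assumption made for illustration: the class names are hypothetical, Python dicts stand in for on-disk SSTables, each cell carries a write timestamp so the merge is per-column last-write-wins, and real details such as the key cache, partition index, compression, and tombstones are left out. `sstable_reads` counts how often a simulated disk read happens.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


class SSTable:
    """Immutable simulated SSTable: key -> {column: (timestamp, value)}."""

    def __init__(self, rows):
        self.rows = rows
        self.bloom = BloomFilter()
        for key in rows:
            self.bloom.add(key)


def merge(target, cells):
    """Per-column last-write-wins merge: the cell with the higher timestamp wins."""
    for col, (ts, val) in cells.items():
        if col not in target or ts > target[col][0]:
            target[col] = (ts, val)


class Node:
    def __init__(self):
        self.row_cache = {}     # key -> {column: value}
        self.memtable = {}      # key -> {column: (timestamp, value)}
        self.sstables = []
        self.sstable_reads = 0  # counts simulated disk accesses

    def write(self, key, ts, columns):
        merge(self.memtable.setdefault(key, {}), {c: (ts, v) for c, v in columns.items()})
        self.row_cache.pop(key, None)  # writes invalidate the cached row

    def flush(self):
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}

    def read(self, key):
        if key in self.row_cache:                  # 1. row cache hit
            return dict(self.row_cache[key])
        merged = dict(self.memtable.get(key, {}))  # the MemTable is always read
        for sst in self.sstables:
            if not sst.bloom.might_contain(key):   # 2. definitely absent: skip it
                continue
            self.sstable_reads += 1                # 3. read this SSTable and merge
            merge(merged, sst.rows.get(key, {}))
        result = {c: v for c, (_, v) in merged.items()}
        self.row_cache[key] = result               # 4. populate the row cache
        return dict(result)                        # 5. return the merged row
```

Because a Bloom filter never gives a false negative, a flushed key is always found; an absent key is usually, but not always, skipped without a read.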

[Figure: Cassandra read path]

So in your case, the entire row for key K1 is initially in the row cache. Then you update f1 and f2, so the entire row is invalidated from the row cache.

  1. If you read the entire row, it is a row cache miss, and the data will be read from both the MemTable and the SSTables (the unchanged fields exist only in SSTables). So it will be slow.

  2. If you read f1 and f2, it is a row cache miss. If the row is not in any SSTable, the data is read only from the MemTable (fast); otherwise it is read from both the MemTable and the SSTables (slow).

  3. If you read fields other than f1 and f2, they must be in the SSTables, so the data will be read from both the SSTables and the MemTable. So it will be slow.
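The three cases can be replayed in a small, self-contained Python model. This is a hypothetical sketch, not Cassandra code: `Node`, `scenario`, and the `STATIC` field values are illustrative names, an exact key lookup stands in for each SSTable's bloom filter, and each case starts from a fresh node so that the row cache has just been invalidated by the f1/f2 update. The model reports which sources each read had to touch.

```python
# Hypothetical model of the three reads above; names are illustrative only.
STATIC = {f"f{i}": f"v{i}" for i in range(3, 21)}  # f3..f20 rarely change


class Node:
    def __init__(self):
        self.row_cache = {}  # key -> full row {column: (timestamp, value)}
        self.memtable = {}   # key -> {column: (timestamp, value)}
        self.sstables = []   # flushed memtables, oldest first

    def write(self, key, ts, cols):
        # Writes arrive in timestamp order in this sketch, so update() is enough.
        self.memtable.setdefault(key, {}).update({c: (ts, v) for c, v in cols.items()})
        self.row_cache.pop(key, None)  # the write invalidates the cached row

    def flush(self):
        self.sstables.append(self.memtable)
        self.memtable = {}

    def read(self, key, columns=None):
        """Return (values, sources touched). columns=None means the entire row."""
        if key in self.row_cache:
            row, touched = self.row_cache[key], ["row cache"]
        else:
            row, touched = {}, []
            sources = [("memtable", self.memtable)]
            sources += [(f"sstable{i}", s) for i, s in enumerate(self.sstables)]
            for name, src in sources:
                if key not in src:  # exact check standing in for the bloom filter
                    continue
                touched.append(name)
                for col, cell in src[key].items():  # last write wins per column
                    if col not in row or cell[0] > row[col][0]:
                        row[col] = cell
            self.row_cache[key] = row
        wanted = columns or row.keys()
        return {c: row[c][1] for c in wanted if c in row}, touched


def scenario(columns=None):
    node = Node()
    node.write("K1", 1, {"f1": "a", "f2": "b", **STATIC})
    node.flush()       # the original row now lives in an SSTable
    node.read("K1")    # the first read puts K1 in the row cache
    node.write("K1", 2, {"f1": "a2", "f2": "b2"})  # stays in memtable, evicts K1
    return node.read("K1", columns)


for label, cols in [("entire row", None), ("f1, f2", ["f1", "f2"]), ("f3", ["f3"])]:
    values, touched = scenario(cols)
    print(label, "->", touched)  # prints ['memtable', 'sstable0'] in all three cases
```

In this model all three reads touch the SSTable: the original row for K1 was flushed, so that SSTable may contain K1 and has to be checked, which corresponds to the "slow" branch of case 2.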