2
votes

My problem is that Cassandra creates tombstones when inserting NULL values.

From what I understand, Cassandra doesn't support NULLs, and when a NULL is inserted it just deletes the respective column. On one hand this is very space efficient; on the other hand it creates tombstones, which degrade read performance.

This goes against the NoSQL philosophy, because Cassandra is saving space but degrading read performance. In the NoSQL world space is cheap, whereas performance matters. I believe this is the philosophy behind storing tables in denormalized form.

I would like Cassandra to use the same technique for inserting NULL as for any other value: use timestamps and, during compaction, preserve the latest entry, even if that entry is NULL (or we can call it "unset"). Is there any tweak in the Cassandra config, or any approach, that would let me achieve upserts with nulls without creating tombstones?

I came across this issue; however, it only allows NULL values to be ignored.

My use case: I have a stream of events, each identified by a causeID. I receive many events with the same causeID and I want to store only the latest event for each causeID (using upsert). The properties of an event may change from NULL to a specific value, but also from a specific value to NULL. Unfortunately, the latter case generates tombstones and degrades read performance.

Update

It seems there is no way to avoid tombstones. Could you advise me on techniques to minimize them (e.g. setting gc_grace_seconds to a very low value)? What are the risks, and what should be done when a node goes down for longer than gc_grace_seconds?

3
What does null mean in the latter case, when changing from a specific value to null? - Alex Ott
Changing a specific value to NULL, simplified example: create table event (id text PRIMARY KEY, event text); insert into event (id, event) values ('1', 'specific value'); insert into event (id, event) values ('1', null); - Tomas Bartalos

3 Answers

3
votes

You can't insert NULL into Cassandra as an ordinary value: it has a special meaning there and leads to the creation of the tombstones that you observe. If you want to treat NULL as a special value, why not solve this on the application side? When you get a null status, insert a special value that can't otherwise occur in your table, and when you read the data back, check for that special value and return null to the requester.
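As a rough illustration of that application-side idea, here is a minimal sketch. The sentinel string and the helper names `encodeForWrite` and `decodeAfterRead` are hypothetical and not part of any Cassandra driver; the sketch also assumes all nullable columns are text, so other column types would need their own "impossible" value.

```javascript
// Hypothetical sentinel: pick a value that can never occur in real data.
const NULL_SENTINEL = "__NULL__";

// Before writing: replace null/undefined with the sentinel so that
// Cassandra stores a regular value instead of a tombstone.
function encodeForWrite(row) {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    out[key] = value === null || value === undefined ? NULL_SENTINEL : value;
  }
  return out;
}

// After reading: map the sentinel back to null for the caller.
function decodeAfterRead(row) {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    out[key] = value === NULL_SENTINEL ? null : value;
  }
  return out;
}
```

With this, an event whose property changes from a specific value to null is written as an ordinary overwrite, and readers still see null after decoding.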

1
votes

You cannot avoid tombstones if you explicitly specify NULL in your INSERT. C* does not do a read before an insert or any other write, which is what makes writes very fast. Instead, C* simply writes a tombstone so that the old value is shadowed later (reads take the latest update by comparing timestamps). If you want to avoid tombstones (which is recommended), you have to prepare different combinations of queries, checking each field for NULL before adding it to the INSERT. If you have very few fields to check, it is easy to add a few if-else statements, but if there are many, the code gets bigger and less readable. In short, you should not insert NULL, because it will impact read performance later.

Inserting null values into cassandra
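Instead of hand-writing an if-else branch per field, the INSERT can be assembled from only the non-null fields. The following is a sketch under assumptions: `buildInsert` is a hypothetical helper, the table and column names are trusted identifiers (not user input), and the returned query and parameters would be handed to whatever driver you use.

```javascript
// Build a parameterized INSERT that mentions only columns with a value,
// so no explicit NULL (and hence no tombstone) is ever sent.
function buildInsert(table, row) {
  const columns = Object.keys(row).filter(
    (column) => row[column] !== null && row[column] !== undefined
  );
  const placeholders = columns.map(() => "?").join(", ");
  return {
    query: `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${placeholders})`,
    params: columns.map((column) => row[column]),
  };
}
```

Note that a column left out of the INSERT keeps whatever value was written earlier, so this avoids tombstones but does not clear a previously stored value.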

0
votes

When we insert or update rows using null for values that are not specified, Cassandra represents each null as a tombstone, even though our intention is just to leave the value empty. This causes unnecessary overhead that degrades performance.

To avoid such tombstones for save operations, Cassandra has the concept of unset for a parameter value.

So, for example, you can do the following to unset a field value while saving and avoid the tombstone overhead:

1) If you are using express-cassandra:

const user = new models.instance.User({
    user_id: 1235,
    user_name: models.datatypes.unset // this will not create tombstone when we want empty user_name or null
});
user.save(function(err){
    // user_name value is not set and does not create any unnecessary tombstone overhead
});

2) If you are writing a raw Cassandra query, you can pass the following as the value of an empty or null field:

user_name: { unset: true } // this is what I saw when logging models.datatypes.unset

For more detail, read the subtopic "Null and unset values" at https://express-cassandra.readthedocs.io/en/latest/datatypes/#cassandra-to-javascript-datatypes

NOTE: I tested this on Cassandra 4.0.
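For raw queries with many parameters, the substitution can be done in one place. This is only a sketch: `nullsToUnset` is a hypothetical helper, and the unset marker is passed in rather than hard-coded. As far as I know, the DataStax Node.js driver exposes such a marker as `types.unset` and honors it only for prepared statements, but treat both points as assumptions to verify against your driver version.

```javascript
// Replace null/undefined parameters with the driver's "unset" marker.
// `unsetMarker` is supplied by the caller, for example (assumption):
//   const { types } = require("cassandra-driver");
//   const params = nullsToUnset(rawParams, types.unset);
function nullsToUnset(params, unsetMarker) {
  return params.map((param) =>
    param === null || param === undefined ? unsetMarker : param
  );
}
```

Keep in mind that an unset parameter leaves the column untouched: no tombstone is written, but a value stored earlier stays in place, so this does not cover the case of changing a specific value back to null.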

Hope this will help you or somebody else.

Thanks!