
I have a cassandra Column Family, or CQL table with the following schema:

CREATE TABLE user_actions (
    company_id varchar,
    employee_id varchar,
    inserted_at timeuuid,
    action_type varchar,
    PRIMARY KEY ((company_id, employee_id), inserted_at)
) WITH CLUSTERING ORDER BY (inserted_at DESC);

Basically: a composite partition key made up of a company ID and an employee ID, plus a clustering column representing insertion time, used to order the cells within each row in reverse chronological order (newest actions at the beginning of the row).

Here's what an insert looks like:

INSERT INTO user_actions (company_id, employee_id, inserted_at, action_type)
VALUES ('acme', 'xyz', now(), 'started_project')
USING TTL 1209600; // two weeks

Nothing special here, except the TTL which is set to expire in two weeks.
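(As a sanity check, not part of the original write path: the remaining TTL on a cell can be inspected with CQL's TTL() function, which is useful for confirming the two-week expiry actually landed.)

SELECT action_type, TTL(action_type) FROM user_actions
WHERE company_id = 'acme' AND employee_id = 'xyz'
LIMIT 1;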

The read path is also quite simple - we always want the latest 100 actions, so it looks like this:

SELECT action_type FROM user_actions
WHERE company_id = 'acme' and employee_id = 'xyz'
LIMIT 100;

The issue: since we order in reverse chronological order and the TTL is always the same number of seconds at insertion, I would expect such a query not to scan through any tombstones at all: all "dead" cells are at the tail of the row, not the head. In practice, however, we see many warnings in the log in the following format:

WARN [ReadStage:60452] 2014-09-08 09:48:51,259 SliceQueryFilter.java (line 225) Read 40 live and 1164 tombstoned cells in profiles.user_actions (see tombstone_warn_threshold). 100 columns was requested, slices=[-], delInfo={deletedAt=1410169639669000, localDeletion=1410169639}

and on rare occasions the tombstone count is large enough to abort the query completely. Since I see this type of schema design advocated quite often, I wonder if I'm doing something wrong here?
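One way to see exactly how many tombstones a given read touches is to enable tracing in cqlsh before running the query; the trace output includes the tombstone cells scanned:

TRACING ON;
SELECT action_type FROM user_actions
WHERE company_id = 'acme' AND employee_id = 'xyz'
LIMIT 100;
TRACING OFF;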

Can you run a trace on that table and post the output? Maybe in a gist? (Patrick McFadin)
Hi, I'm looking into the same pattern, which I suppose would be a nice use case for Cassandra; have you found a solution? (darma)
@darma We've done a few things over the last few months that helped us bring down the number of tombstones. You should probably look at the gc_grace_seconds and unchecked_tombstone_compaction settings, as well as different compaction strategies such as the new DateTieredCompactionStrategy detailed here: datastax.com/dev/blog/datetieredcompactionstrategy. Hope these help steer you in the right direction. (ozk)
@ozk Thanks a lot, we've checked gc_grace_seconds already; will look into unchecked_tombstone_compaction as well, along with what you're suggesting. (darma)
@ozk Did Dominic's answer make any impact? I am not sure I comprehend the logic. (treefrog)
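For reference, the settings mentioned in the comments above can be applied with ALTER TABLE. The values below are purely illustrative, not recommendations; gc_grace_seconds in particular must be tuned against your repair schedule:

ALTER TABLE user_actions
WITH gc_grace_seconds = 86400
AND compaction = {
    'class': 'DateTieredCompactionStrategy',
    'unchecked_tombstone_compaction': 'true'
};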

2 Answers

1 vote

Your SELECT statement does not give an explicit sort order, so it is defaulting to ASC (even though your clustering order is DESC).

So if you change your query to:

SELECT action_type FROM user_actions
WHERE company_id = 'acme' and employee_id = 'xyz'
ORDER BY inserted_at DESC
LIMIT 100;

you should be fine.

0
votes

Perhaps data is reappearing because a node fails after gc_grace_seconds has already expired: when the node comes back into the cluster, Cassandra can't replay/repair the deletes because the tombstones have already been dropped after gc_grace_seconds: http://www.datastax.com/documentation/cassandra/2.1/cassandra/dml/dml_about_deletes_c.html

The 2.1 incremental repair sounds like it might be right for you: http://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html
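If you go that route, incremental repair in 2.1 is run per node from the command line (check the nodetool documentation for the flags available in your exact version):

nodetool repair -inc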