Cassandra Compound primary key CQL3

Question

If I want to bucket my partition key by time window, would it be better (for storage and retrieval efficiency) to use a textual representation of the time or a truncated native timestamp? That is, either

CREATE TABLE user_data (
    user_id TEXT,
    log_day TEXT, -- store as 'yyyymmdd' string
    log_timestamp TIMESTAMP,
    data_item TEXT,
    PRIMARY KEY ((user_id, log_day), log_timestamp)
);

or

CREATE TABLE user_data (
    user_id TEXT,
    log_day TIMESTAMP, -- store as timestamp-in-millis - (timestamp-in-millis mod 86400000), i.e. midnight UTC of that day
    log_timestamp TIMESTAMP,
    data_item TEXT,
    PRIMARY KEY ((user_id, log_day), log_timestamp)
);

John John · Accepted Answer · 2013-07-05T07:46:55

Regarding your clustering column "log_timestamp": if you have multiple clients writing concurrently (which I recommend, since otherwise you probably won't get near the throughput that a distributed, write-optimized database like C* can deliver), you should consider using TimeUUIDs instead of timestamps. They are conflict-free, assuming MAC addresses are unique. With plain timestamps you would have to guarantee that no two inserts into the same partition carry the same timestamp; if they do, the later write overwrites the earlier one and that data is lost. You can still do column slice queries and other time-based operations on TimeUUIDs.
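
As a rough sketch of what that could look like with the first schema from the question: the column name log_id and the sample values below are made up for illustration, and the built-in functions now(), dateOf() and minTimeuuid() are assumed to be available in your Cassandra version (newer releases offer toTimestamp() in place of dateOf()).

```cql
-- Same layout as in the question, but the clustering column is a TIMEUUID,
-- so two events written in the same millisecond get distinct keys.
CREATE TABLE user_data (
    user_id TEXT,
    log_day TEXT,      -- 'yyyymmdd' bucket, as in the question
    log_id TIMEUUID,   -- hypothetical name; replaces log_timestamp
    data_item TEXT,
    PRIMARY KEY ((user_id, log_day), log_id)
);

-- now() generates the TimeUUID on the coordinator node;
-- a client can also generate a version 1 UUID itself and bind it here.
INSERT INTO user_data (user_id, log_day, log_id, data_item)
VALUES ('john', '20130705', now(), 'example item');

-- Time slice within one partition: everything from 07:00 up to,
-- but not including, 08:00 UTC. dateOf() extracts the embedded timestamp.
SELECT dateOf(log_id), data_item
FROM user_data
WHERE user_id = 'john'
  AND log_day = '20130705'
  AND log_id >= minTimeuuid('2013-07-05 07:00+0000')
  AND log_id <  minTimeuuid('2013-07-05 08:00+0000');
```

Rows inside a partition are still ordered by the time component of the TimeUUID, so range queries behave much as they would with a plain timestamp column.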

