Here I am again asking similar question after getting really a great explanation on How do secondary indexes work in Cassandra?
CREATE TABLE update_audit (
scopeid bigint,
formid bigint,
time timestamp,
operation int,
record_id bigint,
ipaddress text,
user_id bigint,
value text,
PRIMARY KEY ((scopeid), formid, time)
) WITH CLUSTERING ORDER BY (formid ASC, time DESC)
FYI, operation Column possible values are 1,2 and 3. Low cardinality.
record_link_id high-cardinality. every entry can be unique.
user_id is the best candidate for Index according to How do secondary indexes work in Cassandra? and The sweet spot for cassandra secondary indexing.
Search should work based on
- time with limit 100.
- operation and time with limit 100.
- user_id and time with limit 100.
- record_id and time with limit 100.
Problems
total records more than 10,000M
which One is best - creating Index over operation, user_id and record_id and applying limit 100.
1) Does Hidden columnfamily for index operation Will return only 100 results?
2) More seeks will slow down the fetch operation?
OR Create a new columnfamily with definition like
CREATE TABLE audit_operation_idx (
scopeid bigint,
formid bigint,
operation int,
time timeuuid,
PRIMARY KEY ((scopeid), formid, operation, time)
) WITH CLUSTERING ORDER BY (formid ASC, operation ASC, time DESC)
required two select query for single select operation.
So, if I will create new columnfamily for operation, user_id and record_id
I have to make a batch query to insert into these four columnfamilies.
3) Does TCP problems will come? while executing batch query.because writes will be huge.
4) what else should I cover to avoid unnecessary problems.