Cassandra Cql schema best practice

Question

Here I am again asking similar question after getting really a great explanation on How do secondary indexes work in Cassandra?

CREATE TABLE update_audit (
  scopeid bigint,
  formid bigint,
  time timestamp,
  operation int,
  record_id bigint,
  ipaddress text,
  user_id bigint,
  value text,
  PRIMARY KEY ((scopeid), formid, time)
  ) WITH CLUSTERING ORDER BY (formid ASC, time DESC)

FYI, operation Column possible values are 1,2 and 3. Low cardinality.

record_link_id high-cardinality. every entry can be unique.

user_id is the best candidate for Index according to How do secondary indexes work in Cassandra? and The sweet spot for cassandra secondary indexing.

Search should work based on

time with limit 100.
operation and time with limit 100.
user_id and time with limit 100.
record_id and time with limit 100.

Problems

total records more than 10,000M

which One is best - creating Index over operation, user_id and record_id and applying limit 100.

  1) Does Hidden columnfamily for index operation Will return only 100 results?

  2) More seeks will slow down the fetch operation?

OR Create a new columnfamily with definition like

CREATE TABLE audit_operation_idx (
  scopeid bigint,
  formid bigint,
  operation int,
  time timeuuid,
  PRIMARY KEY ((scopeid), formid, operation, time)
) WITH CLUSTERING ORDER BY (formid ASC, operation ASC, time DESC) 

 required two select query for single select operation.

So, if I will create new columnfamily for operation, user_id and record_id

I have to make a batch query to insert into these four columnfamilies.

   3) Does TCP problems will come? while executing batch query.because writes will be huge. 
   4) what else should I cover to avoid unnecessary problems.

Sunil Harak Sunil Harak · Accepted Answer · 2017-11-28T09:21:07

There are three options.

Create a new table and use bulk insert. If the size of insert query becomes huge you'll have to configure its related parameter. Don't worry about writes in Cassandra.
Create a materialized View with required columns of where clause.
Create secondary index if cardinality is low. (Not recommended)

Cassandra Cql schema best practice

1 Answers