1
votes

Here I am again asking similar question after getting really a great explanation on How do secondary indexes work in Cassandra?

CREATE TABLE update_audit (
  scopeid bigint,
  formid bigint,
  time timestamp,
  operation int,
  record_id bigint,
  ipaddress text,
  user_id bigint,
  value text,
  PRIMARY KEY ((scopeid), formid, time)
  ) WITH CLUSTERING ORDER BY (formid ASC, time DESC)

FYI, operation Column possible values are 1,2 and 3. Low cardinality.

record_link_id high-cardinality. every entry can be unique.

user_id is the best candidate for Index according to How do secondary indexes work in Cassandra? and The sweet spot for cassandra secondary indexing.

Search should work based on

  • time with limit 100.
  • operation and time with limit 100.
  • user_id and time with limit 100.
  • record_id and time with limit 100.

Problems

total records more than 10,000M

which One is best - creating Index over operation, user_id and record_id and applying limit 100.

  1) Does Hidden columnfamily for index operation Will return only 100 results?

  2) More seeks will slow down the fetch operation?

OR Create a new columnfamily with definition like

CREATE TABLE audit_operation_idx (
  scopeid bigint,
  formid bigint,
  operation int,
  time timeuuid,
  PRIMARY KEY ((scopeid), formid, operation, time)
) WITH CLUSTERING ORDER BY (formid ASC, operation ASC, time DESC) 

 required two select query for single select operation.

So, if I will create new columnfamily for operation, user_id and record_id

I have to make a batch query to insert into these four columnfamilies.

   3) Does TCP problems will come? while executing batch query.because writes will be huge. 
   4) what else should I cover to avoid unnecessary problems. 
1

1 Answers

0
votes

There are three options.

  1. Create a new table and use bulk insert. If the size of insert query becomes huge you'll have to configure its related parameter. Don't worry about writes in Cassandra.

  2. Create a materialized View with required columns of where clause.

  3. Create secondary index if cardinality is low. (Not recommended)