
We are running Apache Cassandra 2.1.x with the DataStax driver. I have a use case where we need to keep a count of various things. I came up with a schema something like this:

CREATE TABLE count (
    partitionKey bigint,
    type text,
    uniqueId uuid,
    PRIMARY KEY (partitionKey, type, uniqueId)
);

So this is nothing but wide rows. My question: if I run something like

    SELECT COUNT(uniqueId) FROM count WHERE partitionKey = 987 AND type = 'someType';

and it comes back with a count of, say, 150k:

  • Will it be an expensive operation for Cassandra? Is there a better way to compute a count like this? I would also like to know if anyone has solved something like this before.

  • I would prefer to stay away from counters, since they are not that accurate, and keeping the count at the application level is doomed to fail anyway.

  • It would also be great to know how Cassandra computes such a count internally.

A big thanks to folks who help the community!
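For reference, here is roughly what the count query above looks like through the DataStax Java driver mentioned in the question. This is only a sketch: the contact point, keyspace name, and table names are placeholder assumptions, and it needs a running cluster to execute.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CountExample {
    public static void main(String[] args) {
        // Contact point and keyspace are placeholders for this sketch.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");

        // COUNT forces Cassandra to iterate every matching cell server-side.
        Row row = session.execute(
            "SELECT COUNT(uniqueId) FROM count "
            + "WHERE partitionKey = 987 AND type = 'someType'").one();
        long count = row.getLong(0);
        System.out.println("count = " + count);

        cluster.close();
    }
}
```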


1 Answer


Even if you specify the partition key, Cassandra still needs to read all 150k cells in that partition to give you the count.

If you don't specify the partition key, Cassandra needs to scan all rows on all nodes to give you the count.

The best approach is to use a counter table:

CREATE TABLE id_count (
    partitionkey bigint,
    type text,
    count counter,
    PRIMARY KEY ((partitionkey, type))
);

Whenever you insert a uniqueId, increment the count here.
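For example, using the key values from the question, each insert of a uniqueId would be paired with an increment like this (note that counter columns only support UPDATE, never INSERT):

```sql
-- Increment the counter alongside each uniqueId insert
UPDATE id_count SET count = count + 1
WHERE partitionkey = 987 AND type = 'someType';

-- Reading the count back is then a single-cell read
SELECT count FROM id_count
WHERE partitionkey = 987 AND type = 'someType';
```

One caveat, which matches the accuracy concern in the question: counter updates are not idempotent, so if an increment times out and the client retries it, the count can drift.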