1
votes

Let's consider the following table, taken from http://planetcassandra.org/blog/getting-started-with-time-series-data-modeling/

CREATE TABLE temperature 
(
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id,event_time)
);

So weatherstation_id is the partition key and event_time is the clustering column.
Data is loaded into that table, and then we run this query:

SELECT COUNT(1) FROM temperature WHERE weatherstation_id = '1234ABCD'

So we are effectively asking for the number of columns in the underlying Cassandra storage row.

1) Is it an O(1) operation?
2) If not, how can I achieve O(1) when counting the columns in a Cassandra storage row? Use counters?

(I am using Cassandra v2.0.11)

Thank you


2 Answers

3
votes

It is not an O(1) operation, because it must scan the partition and count the number of columns. If you want a constant time count, you'll have to keep track of it some other way. You can use counter columns, but you should read this first.
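
As a rough sketch of the counter-column idea, assuming a separate table whose name (`temperature_count`) and column name (`event_count`) are made up here for illustration: counters have to live in their own table, because a counter table cannot hold regular non-key columns.

```sql
-- Hypothetical companion table: one counter per weather station.
CREATE TABLE temperature_count
(
    weatherstation_id text PRIMARY KEY,
    event_count counter
);

-- Run alongside every insert into temperature.
-- Counters are changed with UPDATE, not INSERT.
UPDATE temperature_count
SET event_count = event_count + 1
WHERE weatherstation_id = '1234ABCD';

-- Single-partition read instead of scanning the whole partition.
SELECT event_count
FROM temperature_count
WHERE weatherstation_id = '1234ABCD';
```

Keep in mind that counter updates are not idempotent, so a write that times out and is retried can be counted twice. The counter can therefore drift from the true number of rows in temperature.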

0
votes

I'd probably use a roll-up approach for a problem like this. You store your events in a table, then periodically you run a task to aggregate whatever stats you need about the data, and then insert it into another table. The second table acts like a cache so that if you are running a webserver for example, it can serve those stats immediately. If you use a partition key that gets you directly to the row with the stat you want, then it would be O(1) access time. The drawback is the roll-up table would not have the exact count at any given moment, but with distributed computing, being close to the right answer is usually good enough.
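
A minimal sketch of what such a roll-up table could look like, assuming one row per weather station. The table name (`temperature_rollup`), its columns, and the values in the INSERT are placeholders for illustration, not real data.

```sql
-- Hypothetical roll-up table: one precomputed row per weather station.
CREATE TABLE temperature_rollup
(
    weatherstation_id text PRIMARY KEY,
    event_count bigint,
    computed_at timestamp
);

-- Periodic task, step 1: compute the expensive count once.
SELECT COUNT(1) FROM temperature WHERE weatherstation_id = '1234ABCD';

-- Periodic task, step 2: store the result (placeholder values shown).
INSERT INTO temperature_rollup (weatherstation_id, event_count, computed_at)
VALUES ('1234ABCD', 1000, '2015-01-01 00:00:00+0000');

-- Readers fetch the cached count with a single-partition lookup.
SELECT event_count, computed_at
FROM temperature_rollup
WHERE weatherstation_id = '1234ABCD';
```

Storing computed_at next to the count lets a reader see how stale the cached value is.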