0
votes

I'm trying to figure out what the best practice is for aggregating and rolling up Cassandra time series data.

I came across this page, which mentions that OpsCenter can be used for roll-ups, but I don't think this will work for me since I'm not using the enterprise version of Cassandra.

I would like to aggregate time series data into several buckets (1 minute, 30 minutes, 1 hour, 4 hours, 12 hours, 1 day, 3 days, etc.).

I would like to use this data to generate charts for various time resolutions, similar to bitcoinwisdom.

What is the recommended approach for implementing this? I'm new to Cassandra.


1 Answer

2
votes

That page describes how OpsCenter does roll-ups, not that it can be used for roll-ups.

From what I can gather, OpsCenter does the following:

  • the individual data points are stored in a table/columnfamily, keyed by (metric id, timestamp)
  • it aggregates (min, max, avg) the individual data points into multiple roll-ups (1min, 5min, 2h & 24h), on the fly and in memory
  • at the end of the roll-up period the aggregates are stored into their own tables/columnfamilies
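The steps above could be sketched roughly as follows. This is only an illustration of the general idea in plain Python, not how OpsCenter is actually implemented: `Aggregate`, `RollupBuffer`, and `flush` are made-up names, and the actual write of closed buckets to their Cassandra tables is left out.

```python
from dataclasses import dataclass

# Roll-up periods in seconds (1min, 5min, 2h, 24h), taken from the list above.
PERIODS = (60, 300, 7200, 86400)


@dataclass
class Aggregate:
    """Running min/max/avg for one bucket (hypothetical helper)."""
    minimum: float
    maximum: float
    total: float
    count: int

    def add(self, value):
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.total += value
        self.count += 1

    @property
    def average(self):
        return self.total / self.count


class RollupBuffer:
    """Keeps the open buckets in memory, keyed by (metric id, period, bucket start)."""

    def __init__(self, periods=PERIODS):
        self.periods = periods
        self.open_buckets = {}

    def add(self, metric_id, timestamp, value):
        # Fold the data point into one bucket per roll-up period.
        ts = int(timestamp)
        for period in self.periods:
            start = ts - ts % period
            key = (metric_id, period, start)
            if key in self.open_buckets:
                self.open_buckets[key].add(value)
            else:
                self.open_buckets[key] = Aggregate(value, value, value, 1)

    def flush(self, now):
        # Return (and forget) every bucket whose period has ended by `now`.
        # The caller would then store these in the per-period tables.
        closed = {k: a for k, a in self.open_buckets.items() if k[2] + k[1] <= now}
        for key in closed:
            del self.open_buckets[key]
        return closed
```

In use, you would call `add` for every incoming point (in addition to writing the raw point to its own table) and call `flush` periodically, writing whatever it returns to the roll-up table for that period.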

Whether that approach works for you depends entirely on your use case: how much data you're receiving and how much of it you want stored, how you want to aggregate the data [i.e. for larger time frames, min and max can be computed precisely from smaller ones, but for something like the average there's some precision loss], and so on.
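To make the point about min/max versus average concrete, here is a small sketch of building a larger roll-up from smaller ones. It shows one way the average can go wrong (averaging the averages when the smaller buckets hold different numbers of points). The `merge_rollups` function and the `count` field are assumptions for illustration; storing a count alongside each roll-up is not something the page mentions.

```python
def merge_rollups(buckets):
    """Combine smaller roll-ups (dicts with min, max, avg, count) into one larger one."""
    total_count = sum(b["count"] for b in buckets)
    return {
        # min and max of the larger window follow exactly from the smaller ones
        "min": min(b["min"] for b in buckets),
        "max": max(b["max"] for b in buckets),
        # averaging the averages ignores how many points each bucket had
        "avg_of_avgs": sum(b["avg"] for b in buckets) / len(buckets),
        # weighting by count recovers the average over all underlying points
        "weighted_avg": sum(b["avg"] * b["count"] for b in buckets) / total_count,
        "count": total_count,
    }


# Two 1-minute buckets built from the points 1, 3 and 5
minute_buckets = [
    {"min": 1.0, "max": 3.0, "avg": 2.0, "count": 2},
    {"min": 5.0, "max": 5.0, "avg": 5.0, "count": 1},
]
merged = merge_rollups(minute_buckets)
print(merged["min"], merged["max"])   # 1.0 5.0
print(merged["avg_of_avgs"])          # 3.5
print(merged["weighted_avg"])         # 3.0 (the average of 1, 3 and 5)
```

If you only keep min, max and avg per bucket, the `avg_of_avgs` figure is all you can compute from the smaller roll-ups, so it's worth deciding up front whether that is accurate enough for your charts.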