I have time series data streaming in point by point, say one point every 5 seconds, and the points might arrive out of order. I want to aggregate them in real time into larger time windows, say 5, 30, and 60 minutes. My primary concern is fast reads.
I'm interested in which techniques are common for this kind of realtime aggregation. I think I'll need a long-term store on disk, but near-realtime points should probably be kept in memory to make them easier to aggregate.
Is the preferred way to store them in a memory cache (Redis) and then have a periodically triggered job that calculates the aggregate and flushes it to disk? If so, what if I get a point that arrives after the periodic job has run? Do I go back and throw away that point and calculate the period again?
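To make the question concrete, here is a minimal Python sketch of one possible approach, not a definitive design. It assumes timestamps are epoch seconds and that the aggregates of interest (count, sum, min, max) can be updated incrementally, so a late point only updates its bucket rather than forcing the whole period to be recalculated. The names `Bucket`, `RollingAggregator`, `ingest`, and `flush` are hypothetical, and a plain dict stands in for the on-disk store.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

# Window sizes in seconds: 5, 30, and 60 minutes.
RESOLUTIONS = (300, 1800, 3600)


@dataclass
class Bucket:
    """Running aggregate for one time window."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else None


class RollingAggregator:
    """Keeps in-memory buckets per resolution and tracks which ones changed."""

    def __init__(self, resolutions=RESOLUTIONS):
        self.resolutions = resolutions
        self.buckets = defaultdict(Bucket)  # (resolution, bucket_start) -> Bucket
        self.dirty = set()                  # keys changed since the last flush

    def ingest(self, ts, value):
        # Arrival order does not matter: the timestamp alone picks the bucket.
        for res in self.resolutions:
            key = (res, ts - ts % res)
            self.buckets[key].add(value)
            self.dirty.add(key)

    def flush(self, store):
        # Upsert only the changed buckets. A late point marks its bucket
        # dirty again, so the next flush overwrites the stored aggregate.
        for key in self.dirty:
            store[key] = replace(self.buckets[key])  # store a copy
        self.dirty.clear()


if __name__ == "__main__":
    agg = RollingAggregator()
    store = {}                 # stand-in for the long-term store
    agg.ingest(10, 1.0)
    agg.ingest(5, 3.0)         # out of order, same 5-minute bucket
    agg.flush(store)
    agg.ingest(20, 5.0)        # arrives after the flush
    agg.flush(store)
    print(store[(300, 0)])
```

The sketch never evicts old buckets from memory, and it would not work as-is for aggregates that cannot be updated incrementally (a median, for example), where the raw points for the period would have to be kept and the period recalculated.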
I'm probably answering my own questions here, but I'm fishing for any alternatives out there.
Thanks in advance. Chris :-)