We periodically collect system statistics and store them in Cassandra as a JSON blob in a single column, once every minute. The table has only one partition, and it will not exceed 100K entries.
This table works fine for writing the data and for reading it back by timestamp. So far, so good.
We are now planning to perform predictive analysis on the system statistics. For example, every minute we would compare the current statistics with the history of the system statistics using our own logic (to be frank, we have not finished that logic yet).
The query
SELECT statisticsjson, timestamp FROM stattable WHERE partitionid = 'stat' AND timestamp > X
returns all the JSON we need.
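To make the intent more concrete, here is a rough sketch of the kind of comparison we have in mind. It is only an illustration, not our finished logic: it assumes each JSON blob is a flat object of numeric metrics, the metric names (cpu, mem) and the function name find_anomalies are made up, and the "more than 3 standard deviations from the historical mean" rule is just a placeholder for whatever logic we end up with.

```python
import json
import statistics


def find_anomalies(history_rows, current_json, threshold=3.0):
    """Return {metric: z_score} for metrics whose current value deviates
    from the historical mean by more than `threshold` standard deviations.

    Assumes every blob is a flat JSON object (hypothetical layout).
    """
    # Collect the historical numeric values per metric name.
    series = {}
    for blob in history_rows:
        for key, value in json.loads(blob).items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                series.setdefault(key, []).append(value)

    anomalies = {}
    for key, value in json.loads(current_json).items():
        values = series.get(key, [])
        # Skip non-numeric metrics and metrics with too little history.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if len(values) < 2:
            continue
        mean = statistics.mean(values)
        stdev = statistics.stdev(values)
        if stdev == 0:
            # A constant history gives no scale to compare against.
            continue
        z_score = (value - mean) / stdev
        if abs(z_score) > threshold:
            anomalies[key] = z_score
    return anomalies


# Made-up sample data standing in for the rows returned by the query.
history = [
    json.dumps({"cpu": cpu, "mem": mem})
    for cpu, mem in [(20, 50), (22, 51), (21, 49), (19, 50), (23, 50)]
]
current = json.dumps({"cpu": 95, "mem": 50})
result = find_anomalies(history, current)
```

With the sample data above, only the cpu metric is flagged, because 95 is far outside its historical range while mem matches its history. Doing this by hand in application code is what we would like to avoid if a proper analytics tool handles it better.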
Now, how can we analyse the history of this JSON data and warn the user when the current state of the system is dangerous? Which tool is best suited for running analytics on this historical JSON data?