
We plan to store our sensor time series data in Cassandra and use Spark/spark-ts to apply machine learning algorithms to it.

Unlike the examples in the documentation, our time series data is irregular (an unevenly spaced time series), because the sensors send their data event-based.

But most algorithms and models require regular time series.

  • Does spark-ts provide any function to transform irregular time series into regular ones (e.g., via interpolation or time-weighted averages)?

  • If not, what would be a recommended approach to solving that problem?


1 Answer


spark-ts does not provide any function to transform irregular time series to regular ones.

How you handle irregularly spaced time series depends on the goals of your analysis. Common use cases for time series include prediction/forecasting, anomaly detection, and understanding or analyzing past behaviour.

If you wish to use the algorithms available in spark-ts (as opposed to modeling your data through other statistical processes designed for event streams), one option is to divide the time axis into equally sized bins and then compute a summary of your data within each bin (e.g., the sum, the mean, etc.). The bin size controls a tradeoff: the more fine-grained the bins, the less information is lost to quantizing the time dimension, but the harder the data may be to model. The binned data then forms an evenly spaced time series, which you can analyze with standard time series techniques, as in the sketch below.
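
Here is a minimal sketch of that binning step using Spark SQL's time-window grouping (available via `functions.window` since Spark 2.0). The column names, the 1-minute bin size, and the sample data are illustrative assumptions, not part of spark-ts itself:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, window}

object RegularizeSeries {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("regularize-sensor-series")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical event-based sensor readings: (sensorId, timestamp, value).
    val events = Seq(
      ("s1", "2016-01-01 00:00:03", 1.0),
      ("s1", "2016-01-01 00:00:41", 2.0),
      ("s1", "2016-01-01 00:01:10", 4.0)
    ).toDF("sensorId", "ts", "value")
      .withColumn("ts", col("ts").cast("timestamp"))

    // Divide the time axis into fixed 1-minute bins and summarize each
    // bin with the mean value; the result is an evenly spaced series
    // per sensor, keyed by the bin's start time.
    val regular = events
      .groupBy(col("sensorId"), window(col("ts"), "1 minute"))
      .agg(avg(col("value")).as("meanValue"))
      .orderBy(col("sensorId"), col("window.start"))

    regular.show(truncate = false)
    spark.stop()
  }
}
```

One caveat: bins that contain no events simply don't appear in the grouped output, so you may still need to fill those gaps (e.g., by joining against a complete calendar of intervals and interpolating) before feeding the series to a model.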