
My data consists of roughly 10 million numerical values (real and binary) per frame (think of one row of an array with 10 million elements), arriving at around 100 frames per second. It is essentially a time series.

The challenges for me are:

(1) Storage - Amount of Data

(2) Handling Velocity of Data

(3) Real Time Analytics

Is Cassandra suitable for this? Can anybody guide me on an application architecture (think Hadoop, Cassandra, Kafka, Storm, etc.) that would work for the above scenario, from a very high-level point of view?

I know I have asked something big. I need a direction before I start experimenting.


1 Answer


As a storage engine, and for handling the velocity of the data, both Cassandra and Hadoop will pass with flying colors.

As for the real-time part, Cassandra can give you a near-real-time solution, whereas Hadoop alone is not enough because it is batch-oriented (MapReduce jobs). You can combine Hadoop with Storm to get near-real-time capability, but that increases the complexity of the solution (you have to write spouts and bolts). You could also try a rule engine, which gives you an added advantage toward a real-time solution.
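Before choosing components, it may help to put rough numbers on the storage and velocity requirements. The sketch below is only a back-of-the-envelope estimate: the frame size and frame rate come from the question, but the 4 bytes per value is an assumption (the question does not say how wide the values are), and the function names are made up for illustration.

```python
# Back-of-the-envelope sizing for the stream described in the question.
VALUES_PER_FRAME = 10_000_000   # from the question
FRAMES_PER_SECOND = 100         # from the question
BYTES_PER_VALUE = 4             # ASSUMPTION: 32-bit values; not stated in the question


def ingest_rate_bytes_per_second(values_per_frame, frames_per_second, bytes_per_value):
    """Raw write bandwidth needed to keep up with the stream."""
    return values_per_frame * frames_per_second * bytes_per_value


def storage_bytes(rate_bytes_per_second, seconds):
    """Uncompressed volume accumulated over a retention period."""
    return rate_bytes_per_second * seconds


rate = ingest_rate_bytes_per_second(VALUES_PER_FRAME, FRAMES_PER_SECOND, BYTES_PER_VALUE)
per_day = storage_bytes(rate, 24 * 60 * 60)

print(f"values/s: {VALUES_PER_FRAME * FRAMES_PER_SECOND:,}")
print(f"ingest:   {rate / 1e9:.1f} GB/s")
print(f"per day:  {per_day / 1e12:.1f} TB (uncompressed, before replication)")
```

Under that assumption the stream is about 4 GB/s, or roughly 345.6 TB per day before compression and replication, so how long you need to keep raw frames will probably matter as much as which engine you pick.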