I am working on a Big Data solution for sensor data and predictive analytics. I am new to Big Data and have read about the lambda architecture. I am thinking about using the Cassandra database together with Hadoop. Cassandra is a highly available, partition-tolerant database, and Hadoop's HDFS is a file system suited to large analytics jobs.
If I receive data from an Internet of Things device, should it be saved first in Hadoop and then in Cassandra? The lambda architecture has Hadoop in the batch layer, receiving the data and sending it on to a NoSQL database in the serving layer.
Why should the data go to Hadoop first? And what kind of data is stored in Cassandra if Hadoop contains the raw data?
The stream layer is out of scope at the moment. I just want to understand how Cassandra and Hadoop are used together.
As I understand it, the data in Hadoop is for large analytics, and Cassandra should hold the results of my Hadoop jobs.
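To make that understanding concrete, here is a minimal sketch of how I currently picture the flow. It is only an illustration under my own assumptions: `raw_store`, `serving_view`, `ingest` and `run_batch_job` are hypothetical names, plain Python structures stand in for HDFS and Cassandra, and no real Hadoop or Cassandra API is used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical in-memory stand-ins; no real Hadoop or Cassandra API is used.
raw_store = []      # batch layer: append-only raw events (the role I assume for HDFS)
serving_view = {}   # serving layer: precomputed results (the role I assume for Cassandra)


def ingest(station_id, event_time, temperature_f):
    """Append one raw sensor reading to the batch layer's raw data set."""
    raw_store.append((station_id, event_time, temperature_f))


def run_batch_job():
    """Recompute the average temperature per station from all raw events."""
    by_station = defaultdict(list)
    for station_id, _event_time, temperature_f in raw_store:
        by_station[station_id].append(temperature_f)
    # Replace the old view completely with the freshly computed one.
    serving_view.clear()
    for station_id, temps in by_station.items():
        serving_view[station_id] = mean(temps)


ingest("1234ABCD", "2013-04-03 07:02:00", 73.0)
ingest("1234ABCD", "2013-04-03 07:03:00", 75.0)
run_batch_job()
print(serving_view["1234ABCD"])
```

In this picture the application would read only `serving_view`, while the raw readings stay in `raw_store` for later batch jobs.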
Does that mean I can store my raw data in both? Can I store my raw data in Cassandra as well as in Hadoop if my application needs more than just the large analytics jobs?
Example:
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
If this is my insert and I have thousands of them every minute, do I use Hadoop for the large jobs?
But my application also needs every single data row without any analytics. Does Cassandra store those too?