First of all I wanted to clarify that I am learning about Hive and Hadoop (and big data in general), so excuse the lack of proper vocabulary.
I am embarking myself in a huge (at least for me) project which requires dealing with enormous quantities of data which I am not use to deal in the past as I always worked mostly with MySQL.
For this project a series of sensors will produce approximately 125.000.000 data points 5 times an hour (15.000.000.000 a day) which is several times more that everything I have ever inserted into every MySQL table combined.
I understand that one approach would be using Hadoop MapReduce and Hive to query and analyze the data.
The problem I am facing is that for what I could learn I understand Hive runs mostly like "cron jobs" and not with real time queries which may take many hours and require a different infrastructure.
I thought of creating MySQL tables based on the results of Hive queries as at most the data which will be needed to be queried in real time would be approximately 1.000.000.000 rows but I was wondering if this is the right way to go or I should look into some other technology.
Is there any technology I should study which is specifically created for real time queries on big data?
Any tip will be much appreciated!