I am building an application that requires a lot of data processing and analytics (processing tons of files at the same time).

I am planning to use Hadoop (MapReduce, and HBase on the HDFS file system) for this.

At the same time, I have small datasets such as user settings, the application's user list, payment information and so on, which could easily be managed in a conventional database such as MySQL (an RDBMS) or MongoDB.

Sometimes it may also hold a few aggregated and analytical results computed by Hadoop, but that data is not very big either.

My question is: should I use two databases, MySQL/MongoDB for the small datasets and HBase for the big dataset?

Or can HBase do both jobs efficiently?

Please go through my answer at stackoverflow.com/questions/37781992/… - Ram Ghadiyaram
Feel free to ask questions. - Ram Ghadiyaram
Sir, thanks for the reply. Just to reframe the question: I am fairly sure I will use Hadoop (MapReduce, HDFS) and HBase for my computation problem. My question is: can I also store my small datasets, like user settings and user info, in HBase? And is it right to do so? - Pradeep Jaiswar
I think you will already have MySQL as a backing store if you have Hive installed (for the Hive metastore). Why not consider that? I am assuming you are looking up this small, static dataset from your HBase + MapReduce jobs (the big dataset). - Ram Ghadiyaram
There is no restriction that prevents a small dataset from being stored in HBase. But you won't be able to join rows, and you will miss other SQL features. - Ram Ghadiyaram

1 Answer

In my opinion, you can't compare apples with bananas. HBase is schema-less, and in terms of the CAP theorem, HBase's main focus is CP (consistency and partition tolerance).

Whereas an RDBMS is CA (consistency and availability); please see my answer. An RDBMS has these properties: it has a schema, is centralized, supports joins, supports ACID transactions, and supports referential integrity.

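Whereas HBase is schema-less (only column families are declared up front; columns within them are free-form), distributed, doesn't support joins, and has no built-in multi-row ACID transactions (operations are atomic only within a single row).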
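To make the "schema-less, no joins" point concrete, here is a toy, in-memory sketch of HBase's data layout (table, then row key, then `family:qualifier` columns). It is not the HBase client API; the `users` and `usage` tables, their row keys, column names and the helper `files_per_user_name` are all hypothetical. It shows two rows in the same table carrying different columns, and a join that an RDBMS would express in one SQL statement being done by hand in application code.

```python
# Toy in-memory model of HBase's data layout, NOT the real HBase client API:
# table -> row key -> {"family:qualifier": value}.
from collections import defaultdict

# Hypothetical "users" table: the two rows store different column sets,
# which is allowed because only column families are fixed, not columns.
users = {
    "user#1": {"info:name": "Alice", "settings:theme": "dark"},
    "user#2": {"info:name": "Bob", "payment:card_last4": "4242"},
}

# Hypothetical "usage" table written by a batch job; row key is "<user>#<date>".
usage = {
    "user#1#2016-06-12": {"stats:files": "120"},
    "user#1#2016-06-13": {"stats:files": "80"},
    "user#2#2016-06-12": {"stats:files": "15"},
}


def files_per_user_name(users, usage):
    """Hand-written equivalent of an SQL JOIN + GROUP BY:
    SELECT u.name, SUM(s.files) FROM users u JOIN usage s ... GROUP BY u.name
    """
    totals = defaultdict(int)
    for row_key, cols in sorted(usage.items()):
        user_key = row_key.rsplit("#", 1)[0]              # drop date -> "user#1"
        name = users.get(user_key, {}).get("info:name")   # per-row lookup = manual join
        if name is not None:
            totals[name] += int(cols["stats:files"])      # values are stored as strings
    return dict(totals)


print(files_per_user_name(users, usage))  # {'Alice': 200, 'Bob': 15}
```

Against a real cluster, each lookup would become a `Get` or `Scan` issued from the Java client or a MapReduce job, while in MySQL the same result is a single `JOIN ... GROUP BY` query. That difference is one practical reason to keep small, relational data (user settings, payments) in an RDBMS.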

Now you can decide which one to use for what, based on your requirements.

Hope this helps!