I am working on a proof-of-concept (POC) task: implementing a feature of our product using Hadoop technology.

The feature is quite simple: we have a UI that lets you enter details about a "Network Issue". All details about such an issue are captured and inserted into a table in an Oracle DB. We then process the data in this table and calculate a Health Score.

I have to use Hadoop instead of a traditional DB, so my question is: which should I go for? Impala on HDFS, Impala on HBase, or HBase alone?

I am using a Cloudera VM for the POC implementation.

As per my understanding, HBase is a NoSQL distributed database, which is actually a layer on top of HDFS and provides Java APIs to access data. Impala is a tool that provides JDBC access to data stored in HBase or directly in HDFS. I am very new to Hadoop; can someone please help?

Could you show some of your requirements? For example, some of your queries. HBase is designed to access <key, value> pairs quickly by key. Impala is designed to run SQL statements in seconds. They are different things and can be used together. - zsxwing
Well, I don't have exact queries as of now. But the requirement is, as I said, that we create a table with around 10-15 columns. Each row in this table represents a network issue. We then frequently run a select query on this table, take the values of one column, and feed them to an algorithm that calculates the health score. Insertion of network issues can happen randomly and frequently as well. - Ameya Y

1 Answer

Well, it depends on several things, like the kind of processing you are going to perform, the desired response time, etc. But going by what you have written here, HBase seems to be fine. I don't see any need for Impala as of now. The HBase API is good and will serve most of your needs.

IMHO, it's better to keep things simple initially and add a tool only if it is really required. The same holds here. If you reach a point where the HBase API can no longer serve the purpose, you can definitely add Impala to your stack.

That being said, there is one thing you should keep in mind. HBase is a NoSQL DB and doesn't follow RDBMS conventions and terminology, so you might find it a bit strange initially. Keep this in mind before you proceed, because you have to design the schema in a way that is totally different from the RDBMS style of schema design.
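To make the schema-design point a bit more concrete, here is a rough sketch of the idea, not of the HBase client API itself. It is a toy in-memory model in Python that only mimics how HBase keeps rows sorted by row key and addresses values as `family:qualifier`. Everything named here is an assumption for illustration: the `d` column family, the `severity` column, the device ids, and the choice of a "device id + reversed timestamp" row key are hypothetical, since the question doesn't give the real columns.

```python
from bisect import bisect_left, insort

# Hypothetical names throughout: the "d" column family, the "severity"
# column and the device ids are illustrative, not from the original post.
MAX_TS = 10**13  # larger than any millisecond timestamp we expect


def make_row_key(device_id: str, ts_millis: int) -> str:
    """Composite key: device id, then reversed timestamp so newest sorts first."""
    return f"{device_id}#{MAX_TS - ts_millis:013d}"


class ToyTable:
    """In-memory stand-in for one HBase table: rows kept sorted by row key."""

    def __init__(self):
        self._keys = []  # sorted row keys
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, columns):
        """Insert or update columns for a row; rows may have different columns."""
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def scan_prefix(self, prefix, column):
        """Yield one column's value for every row whose key starts with prefix."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            row = self._rows[self._keys[i]]
            if column in row:
                yield row[column]
            i += 1


table = ToyTable()
table.put(make_row_key("router-1", 1_000), {"d:severity": 3, "d:type": "packet-loss"})
table.put(make_row_key("router-1", 2_000), {"d:severity": 5})
table.put(make_row_key("router-2", 1_500), {"d:severity": 1})

# Read a single column for one device, newest issue first.
severities = list(table.scan_prefix("router-1#", "d:severity"))
print(severities)
```

The takeaway is that in HBase you usually start from the read pattern ("give me one column for a range of rows") and pick the row key so that those rows sit next to each other, instead of starting from normalized tables and adding indexes later.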